[jira] [Commented] (SPARK-30077) create TEMPORARY VIEW USING should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-30077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984803#comment-16984803 ] Huaxin Gao commented on SPARK-30077: I will work on this > create TEMPORARY VIEW USING should look up catalog/table like v2 commands > - > > Key: SPARK-30077 > URL: https://issues.apache.org/jira/browse/SPARK-30077 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Priority: Major > > create TEMPORARY VIEW USING should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30077) create TEMPORARY VIEW USING should look up catalog/table like v2 commands
Huaxin Gao created SPARK-30077: -- Summary: create TEMPORARY VIEW USING should look up catalog/table like v2 commands Key: SPARK-30077 URL: https://issues.apache.org/jira/browse/SPARK-30077 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Huaxin Gao create TEMPORARY VIEW USING should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
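For context, CREATE TEMPORARY VIEW ... USING registers a temporary view that is backed directly by a data source rather than by a SELECT query; the ticket asks that this command resolve its relation through the same catalog/table lookup framework as the other v2 commands. A minimal sketch of the statement in question, run from PySpark - the view name, format, and path are made-up illustrations, not part of the proposed change:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("temp-view-using-sketch").getOrCreate()

# CREATE TEMPORARY VIEW ... USING defines a view backed by a data source;
# 'people.json' is a hypothetical input file.
spark.sql("""
  CREATE TEMPORARY VIEW people
  USING json
  OPTIONS (path 'people.json')
""")

spark.sql("SELECT * FROM people").show()
{code}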
[jira] [Updated] (SPARK-30069) Clean up non-shuffle disk block manager files following executor exits on YARN
[ https://issues.apache.org/jira/browse/SPARK-30069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lantao Jin updated SPARK-30069: --- Component/s: Spark Core > Clean up non-shuffle disk block manager files following executor exits on YARN > --- > > Key: SPARK-30069 > URL: https://issues.apache.org/jira/browse/SPARK-30069 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN > Affects Versions: 3.0.0 > Reporter: Lantao Jin > Priority: Major > > Currently we only clean up the local directories when the application is removed. However, when executors die and restart repeatedly, many temp files are left untouched in the local directories, which is undesired behavior and could gradually use up disk space. > SPARK-24340 fixed this problem in standalone mode, but in YARN mode the issue still exists. Especially in a long-running service like the Spark thrift-server with dynamic resource allocation disabled, it can easily fill up the local disk. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30078) flatMapGroupsWithState failure
salamani created SPARK-30078: Summary: flatMapGroupsWithState failure Key: SPARK-30078 URL: https://issues.apache.org/jira/browse/SPARK-30078 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 2.4.4 Reporter: salamani I have built Apache Spark v2.4.4 on a big-endian platform with AdoptJDK OpenJ9 1.8.0_202. The build is successful. However, while running the Scala tests of the "Spark Project SQL" module, I am facing failures with FlatMapGroupsWithStateSuite; the error log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30078) flatMapGroupsWithState failure
[ https://issues.apache.org/jira/browse/SPARK-30078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salamani updated SPARK-30078: - Attachment: FlatMapGroupsWithStateSuite.txt > flatMapGroupsWithState failure > -- > > Key: SPARK-30078 > URL: https://issues.apache.org/jira/browse/SPARK-30078 > Project: Spark > Issue Type: Bug > Components: SQL, Tests > Affects Versions: 2.4.4 > Reporter: salamani > Priority: Major > Labels: big-endian > Attachments: FlatMapGroupsWithStateSuite.txt > > > I have built Apache Spark v2.4.4 on a big-endian platform with AdoptJDK OpenJ9 1.8.0_202. > The build is successful. However, while running the Scala tests of the "Spark Project SQL" module, I am facing failures with FlatMapGroupsWithStateSuite; the error log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30079) Tests fail in environments with locale different from en_US
Lukas Menzel created SPARK-30079: Summary: Tests fail in environments with locale different from en_US Key: SPARK-30079 URL: https://issues.apache.org/jira/browse/SPARK-30079 Project: Spark Issue Type: Bug Components: Build, Tests Affects Versions: 3.0.0 Environment: any environment with a non-English locale and/or different separators for numbers. Reporter: Lukas Menzel Tests fail on systems with a locale other than en_US. Assertions on exception messages fail because the messages are localized by Java depending on the system environment (e.g. org.apache.spark.deploy.SparkSubmitSuite). Other tests fail because of assertions about formatted numbers, which use different separators (see [https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html]). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
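The failures described above come from two locale-sensitive sources: exception messages localized by the JVM and numbers formatted with locale-specific separators. As a hedged illustration only (this is plain Python, not the Spark test code, and it assumes both locales are installed on the machine), the same value formats differently under en_US and de_DE, which is exactly the kind of mismatch a string assertion written against en_US output trips over:
{code:python}
import locale

value = 1234567.89

# en_US: ',' groups thousands and '.' marks decimals.
locale.setlocale(locale.LC_NUMERIC, "en_US.UTF-8")
print(locale.format_string("%.2f", value, grouping=True))  # 1,234,567.89

# de_DE: the separators are swapped, so an assertion expecting the
# en_US form fails on such a system.
locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")
print(locale.format_string("%.2f", value, grouping=True))  # 1.234.567,89
{code}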
[jira] [Commented] (SPARK-30080) ADD/LIST Resources should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-30080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984901#comment-16984901 ] Aman Omer commented on SPARK-30080: --- I will work on this > ADD/LIST Resources should look up catalog/table like v2 commands > > > Key: SPARK-30080 > URL: https://issues.apache.org/jira/browse/SPARK-30080 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Aman Omer >Priority: Major > > ADD/LIST Resources should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30080) ADD/LIST Resources should look up catalog/table like v2 commands
Aman Omer created SPARK-30080: - Summary: ADD/LIST Resources should look up catalog/table like v2 commands Key: SPARK-30080 URL: https://issues.apache.org/jira/browse/SPARK-30080 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Aman Omer ADD/LIST Resources should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30081) StreamingAggregationSuite failure on zLinux (big endian)
Dev Leishangthem created SPARK-30081: Summary: StreamingAggregationSuite failure on zLinux (big endian) Key: SPARK-30081 URL: https://issues.apache.org/jira/browse/SPARK-30081 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 2.4.4 Reporter: Dev Leishangthem

The tests fail in 3 instances; the first two are at:

[info] - SPARK-23004: Ensure that TypedImperativeAggregate functions do not throw errors - state format version 1 *** FAILED *** (760 milliseconds)
[info] Assert on query failed: : Query [id = 065b66ad-227a-46a4-9d9d-50d27672f02a, runId = 99c001b7-45df-4977-89b6-f68970378f4b] terminated with exception: Job aborted due to stage failure: Task 0 in stage 192.0 failed 1 times, most recent failure: Lost task 0.0 in stage 192.0 (TID 518, localhost, executor driver): java.lang.AssertionError: sizeInBytes (76) should be a multiple of 8
[info] at org.apache.spark.sql.catalyst.expressions.UnsafeRow.pointTo(UnsafeRow.java:168)
[info] at org.apache.spark.sql.execution.UnsafeKVExternalSorter$KVSorterIterator.next(UnsafeKVExternalSorter.java:297)
[info] at org.apache.spark.sql.execution.aggregate.SortBasedAggregator$$anon$1.(ObjectAggregationIterator.scala:242)
[info] at org.apache.spark.sql.execution.aggregate.SortBasedAggregator.destructiveIterator(ObjectAggregationIterator.scala:239)
[info] at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:198)
[info] at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.(ObjectAggregationIterator.scala:78)
[info] at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:114)
[info] at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
[info] at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
[info] at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
[info] at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)

and the third one is:

[info] - simple count, update mode - recovery from checkpoint uses state format version 1 *** FAILED *** (1 second, 21 milliseconds)
[info] == Results ==
[info] !== Correct Answer - 3 == == Spark Answer - 3 ==
[info] !struct<_1:int,_2:int> struct
[info] [1,1] [1,1]
[info] ![2,2] [2,1]
[info] ![3,3] [3,1]
[info]
[info]
[info] == Progress ==
[info] StartStream(ProcessingTime(0),org.apache.spark.util.SystemClock@f12c12fb,Map(spark.sql.streaming.aggregation.stateFormatVersion -> 2),/scratch/devleish/spark/target/tmp/spark-5a533a9c-da17-41f9-a7d4-c3309d1c2b6f)
[info] AddData to MemoryStream[value#1713]: 3,2,1
[info] => CheckLastBatch: [3,3],[2,2],[1,1]
[info] AssertOnQuery(, name)
[info] AddData to MemoryStream[value#1713]: 4,4,4,4
[info] CheckLastBatch: [4,4]

The commit [https://github.com/apache/spark/commit/ebbe589d12434bc108672268bee05a7b7e571ee6] ensures that the value is a multiple of 8, but it looks like that is not the case on the big-endian platform. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
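The failing assertion ("sizeInBytes (76) should be a multiple of 8") comes from UnsafeRow.pointTo, which expects row buffers to be aligned to 8-byte words. As a rough sketch of that alignment rule only (illustrative arithmetic, not Spark's implementation):
{code:python}
def round_up_to_word(size_in_bytes, word_size=8):
    """Round a byte size up to the nearest multiple of the word size,
    mirroring the 8-byte alignment UnsafeRow asserts on."""
    remainder = size_in_bytes % word_size
    if remainder == 0:
        return size_in_bytes
    return size_in_bytes + (word_size - remainder)

print(round_up_to_word(76))  # 80 -- a 76-byte size, as in the failure, is not word-aligned
print(round_up_to_word(80))  # 80 -- already a multiple of 8, so the assertion would pass
{code}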
[jira] [Created] (SPARK-30082) Zeros are being treated as NaNs
John Ayad created SPARK-30082: - Summary: Zeros are being treated as NaNs Key: SPARK-30082 URL: https://issues.apache.org/jira/browse/SPARK-30082 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.4.4 Reporter: John Ayad

If you attempt to run
{code}
df = df.replace(float('nan'), somethingToReplaceWith)
{code}
It will replace all {{0}}s in columns of type {{Integer}}
Example code snippet to repro this:
{code}
from pyspark.sql import SQLContext
spark = SQLContext(sc).sparkSession
df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
df.show()
df = df.replace(float('nan'), 5)
df.show()
{code}
Here's the output I get when I run this code:
{code}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.7.5 (default, Nov 1 2019 02:16:32)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> spark = SQLContext(sc).sparkSession
>>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|    1|    0|
|    2|    3|
|    3|    0|
+-----+-----+

>>> df = df.replace(float('nan'), 5)
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|    1|    5|
|    2|    3|
|    3|    5|
+-----+-----+

>>>
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
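Until the behaviour is resolved, one possible workaround (a sketch only, not verified against 2.4.4) is to restrict the replacement to floating-point columns via the subset argument of replace, since NaN can only occur in those columns; float_cols is computed here from the schema and df stands for any DataFrame:
{code:python}
# Hedged workaround sketch: only replace NaN in float/double columns so that
# integer columns such as 'value' above are left untouched.
float_cols = [f.name for f in df.schema.fields
              if f.dataType.typeName() in ("float", "double")]
if float_cols:
    df = df.replace(float("nan"), 5, subset=float_cols)
{code}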
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Ayad updated SPARK-30082: -- Description: If you attempt to run {code:java} df = df.replace(float('nan'), somethingToReplaceWith) {code} It will replace all {{0}} s in columns of type {{Integer}} Example code snippet to repro this: {code:java} from pyspark.sql import SQLContext spark = SQLContext(sc).sparkSession df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) df.show() df = df.replace(float('nan'), 5) df.show() {code} Here's the output I get when I run this code: {code:java} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 2.4.4 /_/ Using Python version 3.7.5 (default, Nov 1 2019 02:16:32) SparkSession available as 'spark'. >>> from pyspark.sql import SQLContext >>> spark = SQLContext(sc).sparkSession >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) >>> df.show() +-+-+ |index|value| +-+-+ |1|0| |2|3| |3|0| +-+-+ >>> df = df.replace(float('nan'), 5) >>> df.show() +-+-+ |index|value| +-+-+ |1|5| |2|3| |3|5| +-+-+ >>> {code} was: If you attempt to run {code} df = df.replace(float('nan'), somethingToReplaceWith) {code} It will replace all {{0}}s in columns of type {{Integer}} Example code snippet to repro this: {code} from pyspark.sql import SQLContext spark = SQLContext(sc).sparkSession df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) df.show() df = df.replace(float('nan'), 5) df.show() {code} Here's the output I get when I run this code: {code} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 2.4.4 /_/ Using Python version 3.7.5 (default, Nov 1 2019 02:16:32) SparkSession available as 'spark'. >>> from pyspark.sql import SQLContext >>> spark = SQLContext(sc).sparkSession >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) >>> df.show() +-+-+ |index|value| +-+-+ |1|0| |2|3| |3|0| +-+-+ >>> df = df.replace(float('nan'), 5) >>> df.show() +-+-+ |index|value| +-+-+ |1|5| |2|3| |3|5| +-+-+ >>> {code} > Zeros are being treated as NaNs > --- > > Key: SPARK-30082 > URL: https://issues.apache.org/jira/browse/SPARK-30082 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: John Ayad >Priority: Major > > If you attempt to run > {code:java} > df = df.replace(float('nan'), somethingToReplaceWith) > {code} > It will replace all {{0}} s in columns of type {{Integer}} > Example code snippet to repro this: > {code:java} > from pyspark.sql import SQLContext > spark = SQLContext(sc).sparkSession > df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > df.show() > df = df.replace(float('nan'), 5) > df.show() > {code} > Here's the output I get when I run this code: > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/__ / .__/\_,_/_/ /_/\_\ version 2.4.4 > /_/ > Using Python version 3.7.5 (default, Nov 1 2019 02:16:32) > SparkSession available as 'spark'. 
> >>> from pyspark.sql import SQLContext > >>> spark = SQLContext(sc).sparkSession > >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|0| > |2|3| > |3|0| > +-+-+ > >>> df = df.replace(float('nan'), 5) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|5| > |2|3| > |3|5| > +-+-+ > >>> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Ayad updated SPARK-30082: -- Priority: Critical (was: Major) > Zeros are being treated as NaNs > --- > > Key: SPARK-30082 > URL: https://issues.apache.org/jira/browse/SPARK-30082 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: John Ayad >Priority: Critical > > If you attempt to run > {code:java} > df = df.replace(float('nan'), somethingToReplaceWith) > {code} > It will replace all {{0}} s in columns of type {{Integer}} > Example code snippet to repro this: > {code:java} > from pyspark.sql import SQLContext > spark = SQLContext(sc).sparkSession > df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > df.show() > df = df.replace(float('nan'), 5) > df.show() > {code} > Here's the output I get when I run this code: > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/__ / .__/\_,_/_/ /_/\_\ version 2.4.4 > /_/ > Using Python version 3.7.5 (default, Nov 1 2019 02:16:32) > SparkSession available as 'spark'. > >>> from pyspark.sql import SQLContext > >>> spark = SQLContext(sc).sparkSession > >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|0| > |2|3| > |3|0| > +-+-+ > >>> df = df.replace(float('nan'), 5) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|5| > |2|3| > |3|5| > +-+-+ > >>> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27719) Set maxDisplayLogSize for spark history server
[ https://issues.apache.org/jira/browse/SPARK-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985021#comment-16985021 ] Ajith S commented on SPARK-27719: - Our production environment also encounters this issue, and I would like to work on it. Per the suggestion of [~hao.li], is the idea acceptable, [~dongjoon]? > Set maxDisplayLogSize for spark history server > -- > > Key: SPARK-27719 > URL: https://issues.apache.org/jira/browse/SPARK-27719 > Project: Spark > Issue Type: Improvement > Components: Web UI > Affects Versions: 3.0.0 > Reporter: hao.li > Priority: Minor > > Sometimes a very large event log may be useless, and parsing it may waste many resources. > It may be useful to avoid parsing large event logs by setting a configuration spark.history.fs.maxDisplayLogSize. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985118#comment-16985118 ] John Ayad commented on SPARK-30082: --- Just thought I'd update on this: the {{replace}} function does seem to be correctly replacing {{NaN}}s. Here's a better example that also demonstrates that the problem is limited to columns of type {{Integer}}:
{code:java}
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  NaN|    0|
+-----+-----+

>>> df.replace(float('nan'), 2).show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    2|
|  0.0|    3|
|  2.0|    2|
+-----+-----+
{code}
> Zeros are being treated as NaNs > --- > > Key: SPARK-30082 > URL: https://issues.apache.org/jira/browse/SPARK-30082 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: John Ayad >Priority: Critical > > If you attempt to run > {code:java} > df = df.replace(float('nan'), somethingToReplaceWith) > {code} > It will replace all {{0}} s in columns of type {{Integer}} > Example code snippet to repro this: > {code:java} > from pyspark.sql import SQLContext > spark = SQLContext(sc).sparkSession > df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > df.show() > df = df.replace(float('nan'), 5) > df.show() > {code} > Here's the output I get when I run this code: > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/__ / .__/\_,_/_/ /_/\_\ version 2.4.4 > /_/ > Using Python version 3.7.5 (default, Nov 1 2019 02:16:32) > SparkSession available as 'spark'. > >>> from pyspark.sql import SQLContext > >>> spark = SQLContext(sc).sparkSession > >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|0| > |2|3| > |3|0| > +-+-+ > >>> df = df.replace(float('nan'), 5) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|5| > |2|3| > |3|5| > +-+-+ > >>> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985118#comment-16985118 ] John Ayad edited comment on SPARK-30082 at 11/29/19 5:07 PM: - Just thought i'd update on this, the {{replace}} function seems to be, correctly, replacing {{NaN}}s. Here's a better example that also demonstrates that the problem is limited to columns of type Integer}}: {code:java} >>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], >>> ("index", "value")) >>> df.show() +-+-+ |index|value| +-+-+ | 1.0|0| | 0.0|3| | NaN|0| +-+-+>>> df.replace(float('nan'), 2).show() +-+-+ |index|value| +-+-+ | 1.0|2| | 0.0|3| | 2.0|2| +-+-+ {code} was (Author: jayad): Just thought i'd update on this, the {{replace}} function seems to be, correctly, replacing {{NaN}}s. Here's a better example that also demonstrates that the problem is limited to columns of type {{Integer}}: {code:java} >>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], >>> ("index", "value")) >>> df.show() +-+-+ |index|value| +-+-+ | 1.0|0| | 0.0|3| | NaN|0| +-+-+>>> df.replace(float('nan'), 2).show() +-+-+ |index|value| +-+-+ | 1.0|2| | 0.0|3| | 2.0|2| +-+-+ {code} > Zeros are being treated as NaNs > --- > > Key: SPARK-30082 > URL: https://issues.apache.org/jira/browse/SPARK-30082 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: John Ayad >Priority: Critical > > If you attempt to run > {code:java} > df = df.replace(float('nan'), somethingToReplaceWith) > {code} > It will replace all {{0}} s in columns of type {{Integer}} > Example code snippet to repro this: > {code:java} > from pyspark.sql import SQLContext > spark = SQLContext(sc).sparkSession > df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > df.show() > df = df.replace(float('nan'), 5) > df.show() > {code} > Here's the output I get when I run this code: > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/__ / .__/\_,_/_/ /_/\_\ version 2.4.4 > /_/ > Using Python version 3.7.5 (default, Nov 1 2019 02:16:32) > SparkSession available as 'spark'. > >>> from pyspark.sql import SQLContext > >>> spark = SQLContext(sc).sparkSession > >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|0| > |2|3| > |3|0| > +-+-+ > >>> df = df.replace(float('nan'), 5) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|5| > |2|3| > |3|5| > +-+-+ > >>> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985118#comment-16985118 ] John Ayad edited comment on SPARK-30082 at 11/29/19 5:08 PM: - Just thought i'd update on this, the {{replace}} function seems to be, correctly, replacing {{NaNs. Here's a better example that also demonstrates that the problem is limited to columns of type Integer}}: {code:java} >>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], >>> ("index", "value")) >>> df.show() +-+-+ |index|value| +-+-+ | 1.0|0| | 0.0|3| | NaN|0| +-+-+ >>> df.replace(float('nan'), 2).show() +-+-+ |index|value| +-+-+ | 1.0|2| | 0.0|3| | 2.0|2| +-+-+ {code} was (Author: jayad): Just thought i'd update on this, the {{replace}} function seems to be, correctly, replacing {{NaN}}s. Here's a better example that also demonstrates that the problem is limited to columns of type Integer}}: {code:java} >>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], >>> ("index", "value")) >>> df.show() +-+-+ |index|value| +-+-+ | 1.0|0| | 0.0|3| | NaN|0| +-+-+>>> df.replace(float('nan'), 2).show() +-+-+ |index|value| +-+-+ | 1.0|2| | 0.0|3| | 2.0|2| +-+-+ {code} > Zeros are being treated as NaNs > --- > > Key: SPARK-30082 > URL: https://issues.apache.org/jira/browse/SPARK-30082 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: John Ayad >Priority: Critical > > If you attempt to run > {code:java} > df = df.replace(float('nan'), somethingToReplaceWith) > {code} > It will replace all {{0}} s in columns of type {{Integer}} > Example code snippet to repro this: > {code:java} > from pyspark.sql import SQLContext > spark = SQLContext(sc).sparkSession > df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > df.show() > df = df.replace(float('nan'), 5) > df.show() > {code} > Here's the output I get when I run this code: > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/__ / .__/\_,_/_/ /_/\_\ version 2.4.4 > /_/ > Using Python version 3.7.5 (default, Nov 1 2019 02:16:32) > SparkSession available as 'spark'. > >>> from pyspark.sql import SQLContext > >>> spark = SQLContext(sc).sparkSession > >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value")) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|0| > |2|3| > |3|0| > +-+-+ > >>> df = df.replace(float('nan'), 5) > >>> df.show() > +-+-+ > |index|value| > +-+-+ > |1|5| > |2|3| > |3|5| > +-+-+ > >>> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30083) visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
Kent Yao created SPARK-30083: Summary: visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking Key: SPARK-30083 URL: https://issues.apache.org/jira/browse/SPARK-30083 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Kent Yao For the PLUS case, visitArithmeticUnary does not wrap the expression with UnaryPositive, so it escapes type checking. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
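For readers unfamiliar with the parser detail: unary minus is wrapped in UnaryMinus, whose type check rejects non-numeric operands, while a leading plus is currently dropped, so the same operand can slip through unchecked. A hedged PySpark illustration of the asymmetry - the exact expressions affected are an assumption, behaviour depends on the Spark build, and an active SparkSession named spark is assumed:
{code:python}
from pyspark.sql.utils import AnalysisException

# Unary plus is not wrapped in UnaryPositive, so a non-numeric operand
# may escape type checking (the behaviour this ticket reports).
spark.sql("SELECT +array(1, 2) AS a").show()

# Unary minus is wrapped in UnaryMinus, whose type check rejects the operand.
try:
    spark.sql("SELECT -array(1, 2) AS a").show()
except AnalysisException as e:
    print("rejected by the analyzer as expected:", e)
{code}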
[jira] [Commented] (SPARK-30063) Failure when returning a value from multiple Pandas UDFs
[ https://issues.apache.org/jira/browse/SPARK-30063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985192#comment-16985192 ] Ruben Berenguel commented on SPARK-30063: - Hi [~tkellogg] I’d like to have a look, do you have some small or shareable reproducible data/code? Otherwise it’s a bit hard to pinpoint in which side (Spark, Arrow-Spark, Python) the problem may be (since it may as well be a combination of the 3). My hunch is that the schema may be passed incorrectly (as in your related bug) or the converse, the schema is being passed correctly and the data incorrectly (different order). When that happens the Arrow reader at the JVM won’t make sense of the message received, and the error would look like that one. > Failure when returning a value from multiple Pandas UDFs > > > Key: SPARK-30063 > URL: https://issues.apache.org/jira/browse/SPARK-30063 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3, 2.4.4 > Environment: Happens on Mac & Ubuntu (Docker). Seems to happen on > both 2.4.3 and 2.4.4 >Reporter: Tim Kellogg >Priority: Major > Attachments: spark-debug.txt > > > I have 20 Pandas UDFs that I'm trying to evaluate all at the same time. > * PandasUDFType.GROUPED_AGG > * 3 columns in the input data frame being serialized over Arrow to Python > worker. See below for clarification. > * All functions take 2 parameters, some combination of the 3 received as > Arrow input. > * Varying return types, see details below. > _*I get an IllegalArgumentException on the Scala side of the worker when > deserializing from Python.*_ > h2. Exception & Stack Trace > {code:java} > 19/11/27 11:38:36 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 5) > java.lang.IllegalArgumentException > at java.nio.ByteBuffer.allocate(ByteBuffer.java:334) > at > org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:543) > at > org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:58) > at > org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:132) > at > org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:181) > at > org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:172) > at > org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:65) > at > org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:162) > at > org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:122) > at > org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:410) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 19/11/27 11:38:36 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 5, > localhost, executor driver): java.lang.IllegalArgumentException > at java.nio.ByteBuffer.allocate(ByteBuffer.java:334) > at > org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:543) > at > org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:58) > at > org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowSt
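No reproduction is attached to the thread in this digest, so the following is only a minimal sketch of the shape being described (several GROUPED_AGG pandas UDFs, each taking two of the three serialized columns, evaluated in a single aggregation); the column names and aggregation logic are invented, and the snippet is not claimed to reproduce the failure:
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("grouped-agg-sketch").getOrCreate()

# Hypothetical input: three columns, as in the report.
df = spark.createDataFrame(
    [("a", 1.0, 2.0), ("a", 3.0, 4.0), ("b", 5.0, 6.0)],
    ["key", "x", "y"],
)

@pandas_udf("double", PandasUDFType.GROUPED_AGG)
def weighted_sum(x, y):
    # Each argument arrives as a pandas Series covering the whole group.
    return float((x * y).sum())

@pandas_udf("double", PandasUDFType.GROUPED_AGG)
def ratio_mean(x, y):
    return float((x / y).mean())

# The report evaluates ~20 such UDFs at once; two are enough to show the shape.
df.groupBy("key").agg(
    weighted_sum(df.x, df.y).alias("ws"),
    ratio_mean(df.x, df.y).alias("rm"),
).show()
{code}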
[jira] [Created] (SPARK-30084) Add docs showing how to automatically rebuild Python API docs
Nicholas Chammas created SPARK-30084: Summary: Add docs showing how to automatically rebuild Python API docs Key: SPARK-30084 URL: https://issues.apache.org/jira/browse/SPARK-30084 Project: Spark Issue Type: Improvement Components: Build, Documentation Affects Versions: 3.0.0 Reporter: Nicholas Chammas `jekyll serve --watch` doesn't watch the API docs. That means you have to kill and restart jekyll every time you update your API docs, just to see the effect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29724) Support JDBC/ODBC tab for HistoryServer WebUI
[ https://issues.apache.org/jira/browse/SPARK-29724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985242#comment-16985242 ] Gengliang Wang commented on SPARK-29724: This issue is resolved in https://github.com/apache/spark/pull/26378 > Support JDBC/ODBC tab for HistoryServer WebUI > - > > Key: SPARK-29724 > URL: https://issues.apache.org/jira/browse/SPARK-29724 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Priority: Major > > Support JDBC/ODBC tab for HistoryServerWebUI -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29726) Support KV store for listener HiveThriftServer2Listener
[ https://issues.apache.org/jira/browse/SPARK-29726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-29726. Assignee: shahid Resolution: Fixed > Support KV store for listener HiveThriftServer2Listener > --- > > Key: SPARK-29726 > URL: https://issues.apache.org/jira/browse/SPARK-29726 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Assignee: shahid >Priority: Minor > > Support KVstore for HiveThriftServer2Listener -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29726) Support KV store for listener HiveThriftServer2Listener
[ https://issues.apache.org/jira/browse/SPARK-29726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985243#comment-16985243 ] Gengliang Wang commented on SPARK-29726: This issue is resolved in https://github.com/apache/spark/pull/26378 > Support KV store for listener HiveThriftServer2Listener > --- > > Key: SPARK-29726 > URL: https://issues.apache.org/jira/browse/SPARK-29726 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Priority: Minor > > Support KVstore for HiveThriftServer2Listener -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29724) Support JDBC/ODBC tab for HistoryServer WebUI
[ https://issues.apache.org/jira/browse/SPARK-29724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-29724. Resolution: Fixed > Support JDBC/ODBC tab for HistoryServer WebUI > - > > Key: SPARK-29724 > URL: https://issues.apache.org/jira/browse/SPARK-29724 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Priority: Major > > Support JDBC/ODBC tab for HistoryServerWebUI -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29724) Support JDBC/ODBC tab for HistoryServer WebUI
[ https://issues.apache.org/jira/browse/SPARK-29724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-29724: -- Assignee: shahid > Support JDBC/ODBC tab for HistoryServer WebUI > - > > Key: SPARK-29724 > URL: https://issues.apache.org/jira/browse/SPARK-29724 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: shahid >Assignee: shahid >Priority: Major > > Support JDBC/ODBC tab for HistoryServerWebUI -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29991) Support `test-hive1.2` in PR Builder
[ https://issues.apache.org/jira/browse/SPARK-29991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29991. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26710 [https://github.com/apache/spark/pull/26710] > Support `test-hive1.2` in PR Builder > > > Key: SPARK-29991 > URL: https://issues.apache.org/jira/browse/SPARK-29991 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27719) Set maxDisplayLogSize for spark history server
[ https://issues.apache.org/jira/browse/SPARK-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985270#comment-16985270 ] Jungtaek Lim commented on SPARK-27719: -- If your application is a streaming one, I think we are already taking the right approach to deal with it - see https://issues.apache.org/jira/browse/SPARK-28594. The point is, normally we don't want to stop reading the event log at some offset/size, as what we are really interested in is the "latest" status. And there are some events which we shouldn't ignore or clean up - app start, app end, environment update, etc. > Set maxDisplayLogSize for spark history server > -- > > Key: SPARK-27719 > URL: https://issues.apache.org/jira/browse/SPARK-27719 > Project: Spark > Issue Type: Improvement > Components: Web UI > Affects Versions: 3.0.0 > Reporter: hao.li > Priority: Minor > > Sometimes a very large event log may be useless, and parsing it may waste many resources. > It may be useful to avoid parsing large event logs by setting a configuration spark.history.fs.maxDisplayLogSize. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30085) standardize partition spec in sql reference
Huaxin Gao created SPARK-30085: -- Summary: standardize partition spec in sql reference Key: SPARK-30085 URL: https://issues.apache.org/jira/browse/SPARK-30085 Project: Spark Issue Type: Sub-task Components: Documentation, SQL Affects Versions: 3.0.0 Reporter: Huaxin Gao Use the same partition spec for all the SQL reference docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
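The "partition spec" being standardized is the PARTITION (...) clause that recurs across statements such as ALTER TABLE ... ADD PARTITION, SHOW PARTITIONS, and INSERT. A small hedged example of the clause from PySpark - the table and column names are made up for illustration, and an active SparkSession named spark is assumed:
{code:python}
# Hypothetical partitioned table; the point is only the shape of the PARTITION clause.
spark.sql("""
  CREATE TABLE sales (amount DOUBLE, country STRING, year INT)
  USING parquet
  PARTITIONED BY (country, year)
""")

# The same partition_spec syntax appears in several statements the docs cover:
spark.sql("ALTER TABLE sales ADD PARTITION (country = 'US', year = 2019)")
spark.sql("SHOW PARTITIONS sales PARTITION (country = 'US', year = 2019)").show()
{code}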
[jira] [Updated] (SPARK-29579) Guarantee compatibility of snapshot (live entities, KVstore entities)
[ https://issues.apache.org/jira/browse/SPARK-29579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-29579: - Parent: (was: SPARK-28594) Issue Type: Task (was: Sub-task) > Guarantee compatibility of snapshot (live entities, KVstore entities) > - > > Key: SPARK-29579 > URL: https://issues.apache.org/jira/browse/SPARK-29579 > Project: Spark > Issue Type: Task > Components: Spark Core > Affects Versions: 3.0.0 > Reporter: Jungtaek Lim > Priority: Major > > This issue is a follow-up to SPARK-29111 and SPARK-29261, both of which WILL NOT guarantee compatibility. > To safely clean up old event log files after a snapshot has been written for these files, we have to ensure the snapshot file can restore the same state as replaying the original event log files would. The issue arises when migrating to a newer Spark version - if the snapshot is not readable due to incompatibility, the app cannot be read at all, as we've already removed the old event log files. > If we can guarantee compatibility, we can move on to the next item: cleaning up old event log files to save space. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29579) Guarantee compatibility of snapshot (live entities, KVstore entities)
[ https://issues.apache.org/jira/browse/SPARK-29579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-29579: - Parent: SPARK-28870 Issue Type: Sub-task (was: Task) > Guarantee compatibility of snapshot (live entities, KVstore entities) > - > > Key: SPARK-29579 > URL: https://issues.apache.org/jira/browse/SPARK-29579 > Project: Spark > Issue Type: Sub-task > Components: Spark Core > Affects Versions: 3.0.0 > Reporter: Jungtaek Lim > Priority: Major > > This issue is a follow-up to SPARK-29111 and SPARK-29261, both of which WILL NOT guarantee compatibility. > To safely clean up old event log files after a snapshot has been written for these files, we have to ensure the snapshot file can restore the same state as replaying the original event log files would. The issue arises when migrating to a newer Spark version - if the snapshot is not readable due to incompatibility, the app cannot be read at all, as we've already removed the old event log files. > If we can guarantee compatibility, we can move on to the next item: cleaning up old event log files to save space. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29991) Support `test-hive1.2` in PR Builder
[ https://issues.apache.org/jira/browse/SPARK-29991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-29991: Assignee: Hyukjin Kwon (was: Dongjoon Hyun) > Support `test-hive1.2` in PR Builder > > > Key: SPARK-29991 > URL: https://issues.apache.org/jira/browse/SPARK-29991 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org