[jira] [Commented] (SPARK-21074) Parquet files are read fully even though only count() is requested
[ https://issues.apache.org/jira/browse/SPARK-21074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053532#comment-16053532 ] Kazuaki Ishizaki commented on SPARK-21074: -- I think that [~spektom] expected behavior similar to this [article|https://stackoverflow.com/questions/40629435/fast-parquet-row-count-in-spark]; in summary, the idea is to read only the metadata. I agree with setting this to "Improvement", since it is a kind of performance optimization. > Parquet files are read fully even though only count() is requested > -- > > Key: SPARK-21074 > URL: https://issues.apache.org/jira/browse/SPARK-21074 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.1.0 >Reporter: Michael Spector > > I have the following sample code that creates parquet files: > {code:java} > val spark = SparkSession.builder() > .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", > "2") > .config("spark.hadoop.parquet.metadata.read.parallelism", "50") > .appName("Test Write").getOrCreate() > val sqc = spark.sqlContext > import sqc.implicits._ > val random = new scala.util.Random(31L) > (1465720077 to 1465720077+1000).map(x => Event(x, random.nextString(2))) > .toDS() > .write > .mode(SaveMode.Overwrite) > .parquet("s3://my-bucket/test") > {code} > Afterwards, I'm trying to read these files with the following code: > {code:java} > val spark = SparkSession.builder() > .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", > "2") > .config("spark.hadoop.parquet.metadata.read.parallelism", "50") > .config("spark.sql.parquet.filterPushdown", "true") > .appName("Test Read").getOrCreate() > spark.sqlContext.read > .option("mergeSchema", "false") > .parquet("s3://my-bucket/test") > .count() > {code} > I've enabled the DEBUG log level to see what requests are actually sent through > the S3 API, and I've figured out that in addition to parquet "footer" retrieval > there are requests that ask for the whole file content. > For example, this is a full-content request example: > {noformat} > 17/06/13 05:46:50 DEBUG wire: http-outgoing-1 >> "GET > /test/part-0-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet > HTTP/1.1[\r][\n]" > > 17/06/13 05:46:50 DEBUG wire: http-outgoing-1 << "Content-Range: bytes > 0-7472093/7472094[\r][\n]" > > 17/06/13 05:46:50 DEBUG wire: http-outgoing-1 << "Content-Length: > 7472094[\r][\n]" > {noformat} > And this is a partial request example for the footer only: > {noformat} > 17/06/13 05:46:50 DEBUG headers: http-outgoing-2 >> GET > /test/part-0-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet HTTP/1.1 > > 17/06/13 05:46:50 DEBUG headers: http-outgoing-2 >> Range: > bytes=7472086-7472094 > ... 
> 17/06/13 05:46:50 DEBUG wire: http-outgoing-2 << "Content-Length: 8[\r][\n]" > > {noformat} > Here's what FileScanRDD prints: > {noformat} > 17/06/13 05:46:52 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-4-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7473020, partition values: [empty row] > 17/06/13 05:46:52 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-00011-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7472503, partition values: [empty row] > 17/06/13 05:46:52 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-6-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7472501, partition values: [empty row] > 17/06/13 05:46:52 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-7-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7473104, partition values: [empty row] > 17/06/13 05:46:52 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-3-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7472458, partition values: [empty row] > 17/06/13 05:46:52 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-00012-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7472594, partition values: [empty row] > 17/06/13 05:46:52 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-1-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7472984, partition values: [empty row] > 17/06/13 05:46:52 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-00014-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7472720, partition values: [empty row] > 17/06/13 05:46:53 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/part-8-b8a8a1b7-0581-401f-b520-27fa9600f35e.snappy.parquet, > range: 0-7472339, partition values: [empty row] > 17/06/13 05:46:53 INFO FileScanRDD: Reading File path: > s3://my-bucket/test/
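For reference, the metadata-only count described in the linked article can be sketched roughly as follows in Scala. This is only a sketch, not the proposed fix: it assumes parquet-hadoop (parquet-mr) is on the classpath, reuses the reporter's illustrative s3://my-bucket/test path, and uses ParquetFileReader.readFooter, whose exact API varies across parquet versions. Each file's row count is taken from its footer block metadata, so no column data is fetched.

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import scala.collection.JavaConverters._

val conf = new Configuration()
val dir = new Path("s3://my-bucket/test")   // illustrative path from the report
val fs = dir.getFileSystem(conf)

// Sum the row counts recorded in each footer's block metadata; only footer bytes are read.
val totalRows = fs.listStatus(dir).iterator
  .map(_.getPath)
  .filter(_.getName.endsWith(".parquet"))
  .map { file =>
    ParquetFileReader.readFooter(conf, file).getBlocks.asScala.map(_.getRowCount).sum
  }
  .sum

println(s"total rows: $totalRows")
{code}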
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053509#comment-16053509 ] Dongjoon Hyun commented on SPARK-19644: --- I see. Thank you anyway, [~deenbandhu]. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053502#comment-16053502 ] Deenbandhu Agarwal commented on SPARK-19644: And the Spark Cassandra connector, which is a dependency for us, is also not yet released for those Spark versions. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21101) Error running Hive temporary UDTF on latest Spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-21101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053501#comment-16053501 ] Dayou Zhou commented on SPARK-21101: Hi [~q79969786], thank you kindly for your response. I was not aware of this 'other' version of initialize() method, and will try our suggestion tomorrow. > Error running Hive temporary UDTF on latest Spark 2.2 > - > > Key: SPARK-21101 > URL: https://issues.apache.org/jira/browse/SPARK-21101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: Dayou Zhou > > I'm using temporary UDTFs on Spark 2.2, e.g. > CREATE TEMPORARY FUNCTION myudtf AS 'com.foo.MyUdtf' USING JAR > 'hdfs:///path/to/udf.jar'; > But when I try to invoke it, I get the following error: > {noformat} > 17/06/14 19:43:50 ERROR SparkExecuteStatementOperation: Error running hive > query: > org.apache.hive.service.cli.HiveSQLException: > org.apache.spark.sql.AnalysisException: No handler for Hive UDF > 'com.foo.MyUdtf': java.lang.NullPointerException; line 1 pos 7 > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:266) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Any help appreciated, thanks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21101) Error running Hive temporary UDTF on latest Spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-21101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053501#comment-16053501 ] Dayou Zhou edited comment on SPARK-21101 at 6/19/17 5:54 AM: - Hi [~q79969786], thank you kindly for your response. I was not aware of this 'other' version of initialize() method, and will try your suggestion tomorrow. was (Author: dyzhou): Hi [~q79969786], thank you kindly for your response. I was not aware of this 'other' version of initialize() method, and will try our suggestion tomorrow. > Error running Hive temporary UDTF on latest Spark 2.2 > - > > Key: SPARK-21101 > URL: https://issues.apache.org/jira/browse/SPARK-21101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: Dayou Zhou > > I'm using temporary UDTFs on Spark 2.2, e.g. > CREATE TEMPORARY FUNCTION myudtf AS 'com.foo.MyUdtf' USING JAR > 'hdfs:///path/to/udf.jar'; > But when I try to invoke it, I get the following error: > {noformat} > 17/06/14 19:43:50 ERROR SparkExecuteStatementOperation: Error running hive > query: > org.apache.hive.service.cli.HiveSQLException: > org.apache.spark.sql.AnalysisException: No handler for Hive UDF > 'com.foo.MyUdtf': java.lang.NullPointerException; line 1 pos 7 > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:266) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Any help appreciated, thanks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
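For context, Hive's GenericUDTF has two initialize variants: the deprecated initialize(ObjectInspector[]) and the newer initialize(StructObjectInspector); the suggestion above presumably refers to overriding the latter. A minimal Scala sketch of a UDTF written against the newer variant is shown below; the class name and the single string output column are illustrative assumptions, not the reporter's actual com.foo.MyUdtf.

{code}
import java.util.{ArrayList => JArrayList}

import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ObjectInspectorFactory, StructObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class MyUdtf extends GenericUDTF {

  // Override the non-deprecated variant that receives a StructObjectInspector.
  override def initialize(argOIs: StructObjectInspector): StructObjectInspector = {
    val fieldNames = new JArrayList[String]()
    val fieldOIs = new JArrayList[ObjectInspector]()
    fieldNames.add("value")
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector)
    ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs)
  }

  // Trivial pass-through for illustration: emit one output row per input row.
  override def process(args: Array[AnyRef]): Unit = {
    forward(Array[AnyRef](String.valueOf(args(0))))
  }

  override def close(): Unit = {}
}
{code}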
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053498#comment-16053498 ] Deenbandhu Agarwal commented on SPARK-19644: Yes, I can try, but are there any reports of such events in that particular version? > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053494#comment-16053494 ] Dongjoon Hyun commented on SPARK-19644: --- Hi, [~deenbandhu]. Could you try this with the latest versions like 2.1.1 or 2.2.0-RC4? > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics
[ https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053488#comment-16053488 ] Deenbandhu Agarwal edited comment on SPARK-17381 at 6/19/17 5:33 AM: - [~joaomaiaduarte] I am facing a similar kind of issue. I am running Spark Streaming in the production environment with 6 executors (1 GB memory and 1 core each) and a driver with 3 GB. The Spark version used is 2.0.1. Objects of some linked list are accumulating over time in the driver's JVM heap, and after 2-3 hours GC becomes very frequent and jobs start queuing up. I tried your solution, but in vain; we are not using a linked list anywhere. You can find details of the issue here: [https://issues.apache.org/jira/browse/SPARK-19644] was (Author: deenbandhu): [~joaomaiaduarte] I am facing a similar kind of issue. I am running spark streaming in the production environment with 6 executors and 1 GB memory and 1 core each and driver with 3 GB.Spark Version used is 2.0.1. Objects of some linked list are getting accumulated over the time in the JVM Heap of driver and after 2-3 hours the GC become very frequent and jobs starts queuing up. I tried your solution but in vain. We are not using linked list anywhere. > Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics > - > > Key: SPARK-17381 > URL: https://issues.apache.org/jira/browse/SPARK-17381 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Environment: EMR 5.0.0 (submitted as yarn-client) > Java Version 1.8.0_101 (Oracle Corporation) > Scala Version version 2.11.8 > Problem also happens when I run locally with similar versions of java/scala. > OS: Ubuntu 16.04 >Reporter: Joao Duarte > > I am running a Spark Streaming application from a Kinesis stream. After some > hours running it gets out of memory. After a driver heap dump I found two > problems: > 1) huge amount of org.apache.spark.sql.execution.ui.SQLTaskMetrics (It seems > this was a problem before: > https://issues.apache.org/jira/browse/SPARK-11192); > To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak just > needed to run the code below: > {code} > val dstream = ssc.union(kinesisStreams) > dstream.foreachRDD((streamInfo: RDD[Array[Byte]]) => { > val toyDF = streamInfo.map(_ => > (1, "data","more data " > )) > .toDF("Num", "Data", "MoreData" ) > toyDF.agg(sum("Num")).first().get(0) > } > ) > {code} > 2) huge amount of Array[Byte] (9Gb+) > After some analysis, I noticed that most of the Array[Byte] where being > referenced by objects that were being referenced by SQLTaskMetrics. The > strangest thing is that those Array[Byte] were basically text that were > loaded in the executors, so they should never be in the driver at all! > Still could not replicate the 2nd problem with a simple code (the original > was complex with data coming from S3, DynamoDB and other databases). However, > when I debug the application I can see that in Executor.scala, during > reportHeartBeat(), the data that should not be sent to the driver is being > added to "accumUpdates" which, as I understand, will be sent to the driver > for reporting. > To be more precise, one of the taskRunner in the loop "for (taskRunner <- > runningTasks.values().asScala)" contains a GenericInternalRow with a lot of > data that should not go to the driver. The path would be in my case: > taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if > not the same) to the data I see when I do a driver heap dump. 
> I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is > fixed I would have less of this undesirable data in the driver and I could > run my streaming app for a long period of time, but I think there will always > be some performance lost. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics
[ https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053488#comment-16053488 ] Deenbandhu Agarwal commented on SPARK-17381: [~joaomaiaduarte] I am facing a similar kind of issue. I am running spark streaming in the production environment with 6 executors and 1 GB memory and 1 core each and driver with 3 GB.Spark Version used is 2.0.1. Objects of some linked list are getting accumulated over the time in the JVM Heap of driver and after 2-3 hours the GC become very frequent and jobs starts queuing up. I tried your solution but in vain. We are not using linked list anywhere. > Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics > - > > Key: SPARK-17381 > URL: https://issues.apache.org/jira/browse/SPARK-17381 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 > Environment: EMR 5.0.0 (submitted as yarn-client) > Java Version 1.8.0_101 (Oracle Corporation) > Scala Version version 2.11.8 > Problem also happens when I run locally with similar versions of java/scala. > OS: Ubuntu 16.04 >Reporter: Joao Duarte > > I am running a Spark Streaming application from a Kinesis stream. After some > hours running it gets out of memory. After a driver heap dump I found two > problems: > 1) huge amount of org.apache.spark.sql.execution.ui.SQLTaskMetrics (It seems > this was a problem before: > https://issues.apache.org/jira/browse/SPARK-11192); > To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak just > needed to run the code below: > {code} > val dstream = ssc.union(kinesisStreams) > dstream.foreachRDD((streamInfo: RDD[Array[Byte]]) => { > val toyDF = streamInfo.map(_ => > (1, "data","more data " > )) > .toDF("Num", "Data", "MoreData" ) > toyDF.agg(sum("Num")).first().get(0) > } > ) > {code} > 2) huge amount of Array[Byte] (9Gb+) > After some analysis, I noticed that most of the Array[Byte] where being > referenced by objects that were being referenced by SQLTaskMetrics. The > strangest thing is that those Array[Byte] were basically text that were > loaded in the executors, so they should never be in the driver at all! > Still could not replicate the 2nd problem with a simple code (the original > was complex with data coming from S3, DynamoDB and other databases). However, > when I debug the application I can see that in Executor.scala, during > reportHeartBeat(), the data that should not be sent to the driver is being > added to "accumUpdates" which, as I understand, will be sent to the driver > for reporting. > To be more precise, one of the taskRunner in the loop "for (taskRunner <- > runningTasks.values().asScala)" contains a GenericInternalRow with a lot of > data that should not go to the driver. The path would be in my case: > taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if > not the same) to the data I see when I do a driver heap dump. > I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is > fixed I would have less of this undesirable data in the driver and I could > run my streaming app for a long period of time, but I think there will always > be some performance lost. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19824) Standalone master JSON not showing cores for running applications
[ https://issues.apache.org/jira/browse/SPARK-19824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-19824. - Resolution: Fixed Assignee: Jiang Xingbo Fix Version/s: 2.3.0 Target Version/s: 2.3.0 > Standalone master JSON not showing cores for running applications > - > > Key: SPARK-19824 > URL: https://issues.apache.org/jira/browse/SPARK-19824 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.0 >Reporter: Dan >Assignee: Jiang Xingbo >Priority: Minor > Fix For: 2.3.0 > > > The JSON API of the standalone master ("/json") does not show the number of > cores for a running application, which is available on the UI. > "activeapps" : [ { > "starttime" : 1488702337788, > "id" : "app-20170305102537-19717", > "name" : "POPAI_Aggregated", > "user" : "ibiuser", > "memoryperslave" : 16384, > "submitdate" : "Sun Mar 05 10:25:37 IST 2017", > "state" : "RUNNING", > "duration" : 1141934 > } ], -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053468#comment-16053468 ] Apache Spark commented on SPARK-20927: -- User 'ZiyueHuang' has created a pull request for this issue: https://github.com/apache/spark/pull/18349 > Add cache operator to Unsupported Operations in Structured Streaming > - > > Key: SPARK-20927 > URL: https://issues.apache.org/jira/browse/SPARK-20927 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Trivial > Labels: starter > > Just [found > out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries] > that {{cache}} is not allowed on streaming datasets. > {{cache}} on streaming datasets leads to the following exception: > {code} > scala> spark.readStream.text("files").cache > org.apache.spark.sql.AnalysisException: Queries with streaming sources must > be executed with writeStream.start();; > FileSource[files] > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104) > at > org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68) > at > org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92) > at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603) > at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613) > ... 48 elided > {code} > It should be included in Structured Streaming's [Unsupported > Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20927: Assignee: (was: Apache Spark) > Add cache operator to Unsupported Operations in Structured Streaming > - > > Key: SPARK-20927 > URL: https://issues.apache.org/jira/browse/SPARK-20927 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Trivial > Labels: starter > > Just [found > out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries] > that {{cache}} is not allowed on streaming datasets. > {{cache}} on streaming datasets leads to the following exception: > {code} > scala> spark.readStream.text("files").cache > org.apache.spark.sql.AnalysisException: Queries with streaming sources must > be executed with writeStream.start();; > FileSource[files] > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104) > at > org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68) > at > org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92) > at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603) > at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613) > ... 48 elided > {code} > It should be included in Structured Streaming's [Unsupported > Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20927: Assignee: Apache Spark > Add cache operator to Unsupported Operations in Structured Streaming > - > > Key: SPARK-20927 > URL: https://issues.apache.org/jira/browse/SPARK-20927 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Assignee: Apache Spark >Priority: Trivial > Labels: starter > > Just [found > out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries] > that {{cache}} is not allowed on streaming datasets. > {{cache}} on streaming datasets leads to the following exception: > {code} > scala> spark.readStream.text("files").cache > org.apache.spark.sql.AnalysisException: Queries with streaming sources must > be executed with writeStream.start();; > FileSource[files] > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104) > at > org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68) > at > org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92) > at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603) > at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613) > ... 48 elided > {code} > It should be included in Structured Streaming's [Unsupported > Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21090) Optimize the unified memory manager code
[ https://issues.apache.org/jira/browse/SPARK-21090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-21090: Fix Version/s: (was: 2.3.0) 2.2.0 > Optimize the unified memory manager code > - > > Key: SPARK-21090 > URL: https://issues.apache.org/jira/browse/SPARK-21090 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: liuxian > Fix For: 2.2.0 > > > 1.In *acquireStorageMemory*, when the MemoryMode is OFF_HEAP ,the *maxMemory* > should be modified to *maxOffHeapStorageMemory* > 2. Borrow memory from execution, *numBytes* modified to *numBytes - > storagePool.memoryFree* will be more reasonable -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21090) Optimize the unified memory manager code
[ https://issues.apache.org/jira/browse/SPARK-21090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-21090: --- Assignee: liuxian > Optimize the unified memory manager code > - > > Key: SPARK-21090 > URL: https://issues.apache.org/jira/browse/SPARK-21090 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: liuxian >Assignee: liuxian > Fix For: 2.2.0 > > > 1.In *acquireStorageMemory*, when the MemoryMode is OFF_HEAP ,the *maxMemory* > should be modified to *maxOffHeapStorageMemory* > 2. Borrow memory from execution, *numBytes* modified to *numBytes - > storagePool.memoryFree* will be more reasonable -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21090) Optimize the unified memory manager code
[ https://issues.apache.org/jira/browse/SPARK-21090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-21090. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18296 [https://github.com/apache/spark/pull/18296] > Optimize the unified memory manager code > - > > Key: SPARK-21090 > URL: https://issues.apache.org/jira/browse/SPARK-21090 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: liuxian > Fix For: 2.3.0 > > > 1.In *acquireStorageMemory*, when the MemoryMode is OFF_HEAP ,the *maxMemory* > should be modified to *maxOffHeapStorageMemory* > 2. Borrow memory from execution, *numBytes* modified to *numBytes - > storagePool.memoryFree* will be more reasonable -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
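To make the second point of the description concrete: when a storage request does not fit in the storage pool, the proposal is to reclaim from execution only the shortfall (numBytes - storagePool.memoryFree) rather than the full request. The following is a simplified illustration of that accounting only; the names are invented and this is not Spark's actual UnifiedMemoryManager code.

{code}
// Illustrative sketch; not Spark source code.
final case class Pools(var storageFree: Long, var executionFree: Long)

def acquireStorageMemory(numBytes: Long, pools: Pools): Boolean = {
  if (numBytes > pools.storageFree) {
    // Borrow from execution only what is missing, not the whole request.
    val shortfall = numBytes - pools.storageFree
    val borrowed = math.min(shortfall, pools.executionFree)
    pools.executionFree -= borrowed
    pools.storageFree += borrowed
  }
  if (numBytes <= pools.storageFree) {
    pools.storageFree -= numBytes
    true
  } else {
    false
  }
}
{code}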
[jira] [Resolved] (SPARK-20948) Built-in SQL Function UnaryMinus/UnaryPositive support string type
[ https://issues.apache.org/jira/browse/SPARK-20948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20948. - Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 2.3.0 > Built-in SQL Function UnaryMinus/UnaryPositive support string type > -- > > Key: SPARK-20948 > URL: https://issues.apache.org/jira/browse/SPARK-20948 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang > Fix For: 2.3.0 > > > {{UnaryMinus}}/{{UnaryPositive}} function should support string type, same as > hive: > {code:sql} > $ bin/hive > Logging initialized using configuration in > jar:file:/home/wym/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties > hive> select positive('-1.11'), negative('-1.11'); > OK > -1.11 1.11 > Time taken: 1.937 seconds, Fetched: 1 row(s) > hive> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
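Assuming the change, the Hive session in the description should have a direct Spark SQL equivalent; a minimal check (assuming an active SparkSession named spark) would be:

{code}
// Expected to return -1.11 and 1.11 once string arguments are accepted, matching the Hive output above.
spark.sql("SELECT positive('-1.11'), negative('-1.11')").show()
{code}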
[jira] [Commented] (SPARK-21120) Increasing the master's metric is conducive to the spark cluster management system monitoring.
[ https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053426#comment-16053426 ] guoxiaolongzte commented on SPARK-21120: Sorry, over the last two or three days I was not able to deal with my JIRA in time. > Increasing the master's metric is conducive to the spark cluster management > system monitoring. > -- > > Key: SPARK-21120 > URL: https://issues.apache.org/jira/browse/SPARK-21120 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: guoxiaolongzte >Priority: Minor > Attachments: 1.png > > > The current number of master metric is very small, unable to meet the needs > of spark large-scale cluster management system. So I am as much as possible > to complete the relevant metric. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21120) Increasing the master's metric is conducive to the spark cluster management system monitoring.
[ https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053424#comment-16053424 ] Apache Spark commented on SPARK-21120: -- User 'guoxiaolongzte' has created a pull request for this issue: https://github.com/apache/spark/pull/18348 > Increasing the master's metric is conducive to the spark cluster management > system monitoring. > -- > > Key: SPARK-21120 > URL: https://issues.apache.org/jira/browse/SPARK-21120 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: guoxiaolongzte >Priority: Minor > Attachments: 1.png > > > The current number of master metric is very small, unable to meet the needs > of spark large-scale cluster management system. So I am as much as possible > to complete the relevant metric. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21120) Increasing the master's metric is conducive to the spark cluster management system monitoring.
[ https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guoxiaolongzte updated SPARK-21120: --- Attachment: 1.png > Increasing the master's metric is conducive to the spark cluster management > system monitoring. > -- > > Key: SPARK-21120 > URL: https://issues.apache.org/jira/browse/SPARK-21120 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: guoxiaolongzte >Priority: Minor > Attachments: 1.png > > > The current number of master metric is very small, unable to meet the needs > of spark large-scale cluster management system. So I am as much as possible > to complete the relevant metric. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21120) Increasing the master's metric is conducive to the spark cluster management system monitoring.
[ https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guoxiaolongzte updated SPARK-21120: --- Description: The current number of master metric is very small, unable to meet the needs of spark large-scale cluster management system. So I am as much as possible to complete the relevant metric. > Increasing the master's metric is conducive to the spark cluster management > system monitoring. > -- > > Key: SPARK-21120 > URL: https://issues.apache.org/jira/browse/SPARK-21120 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: guoxiaolongzte >Priority: Minor > Attachments: 1.png > > > The current number of master metric is very small, unable to meet the needs > of spark large-scale cluster management system. So I am as much as possible > to complete the relevant metric. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
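For context on how such metrics would be consumed: master metrics are exposed through Spark's metrics system and are enabled per instance in conf/metrics.properties. A minimal, illustrative configuration that reports the master instance's metrics to a console sink every 10 seconds (values are examples only) is:

{noformat}
# conf/metrics.properties (illustrative)
master.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
master.sink.console.period=10
master.sink.console.unit=seconds
{noformat}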
[jira] [Commented] (SPARK-20599) ConsoleSink should work with write (batch)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053413#comment-16053413 ] Apache Spark commented on SPARK-20599: -- User 'lubozhan' has created a pull request for this issue: https://github.com/apache/spark/pull/18347 > ConsoleSink should work with write (batch) > -- > > Key: SPARK-20599 > URL: https://issues.apache.org/jira/browse/SPARK-20599 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Minor > Labels: starter > > I think the following should just work. > {code} > spark. > read. // <-- it's a batch query not streaming query if that matters > format("kafka"). > option("subscribe", "topic1"). > option("kafka.bootstrap.servers", "localhost:9092"). > load. > write. > format("console"). // <-- that's not supported currently > save > {code} > The above combination of {{kafka}} source and {{console}} sink leads to the > following exception: > {code} > java.lang.RuntimeException: > org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow > create table as select. > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) > ... 48 elided > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
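Until the console format supports batch writes, a common workaround for the batch query in the description is to render the result directly rather than going through a sink. A minimal sketch, assuming an active SparkSession named spark and the spark-sql-kafka package on the classpath, using the same Kafka options as the description:

{code}
// Batch read from Kafka, then print to the console without using the "console" sink.
val df = spark.read
  .format("kafka")
  .option("subscribe", "topic1")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show(truncate = false)
{code}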
[jira] [Assigned] (SPARK-20599) ConsoleSink should work with write (batch)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20599: Assignee: Apache Spark > ConsoleSink should work with write (batch) > -- > > Key: SPARK-20599 > URL: https://issues.apache.org/jira/browse/SPARK-20599 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Assignee: Apache Spark >Priority: Minor > Labels: starter > > I think the following should just work. > {code} > spark. > read. // <-- it's a batch query not streaming query if that matters > format("kafka"). > option("subscribe", "topic1"). > option("kafka.bootstrap.servers", "localhost:9092"). > load. > write. > format("console"). // <-- that's not supported currently > save > {code} > The above combination of {{kafka}} source and {{console}} sink leads to the > following exception: > {code} > java.lang.RuntimeException: > org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow > create table as select. > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) > ... 48 elided > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20599) ConsoleSink should work with write (batch)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20599: Assignee: (was: Apache Spark) > ConsoleSink should work with write (batch) > -- > > Key: SPARK-20599 > URL: https://issues.apache.org/jira/browse/SPARK-20599 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Minor > Labels: starter > > I think the following should just work. > {code} > spark. > read. // <-- it's a batch query not streaming query if that matters > format("kafka"). > option("subscribe", "topic1"). > option("kafka.bootstrap.servers", "localhost:9092"). > load. > write. > format("console"). // <-- that's not supported currently > save > {code} > The above combination of {{kafka}} source and {{console}} sink leads to the > following exception: > {code} > java.lang.RuntimeException: > org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow > create table as select. > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) > ... 48 elided > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression
[ https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21134: Assignee: (was: Apache Spark) > Codegen-only expressions should not be collapsed with upper CodegenFallback > expression > -- > > Key: SPARK-21134 > URL: https://issues.apache.org/jira/browse/SPARK-21134 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Liang-Chi Hsieh > > The rule {{CollapseProject}} in optimizer collapses lower and upper > expressions if they have common expressions. > We have seen a use case that a codegen-only expression has been collapsed > with an upper {{CodegenFallback}} expression. Because all children > expressions under {{CodegenFallback}} will be evaluated with non-codegen path > {{eval}}, it causes an exception in runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression
[ https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053404#comment-16053404 ] Apache Spark commented on SPARK-21134: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/18346 > Codegen-only expressions should not be collapsed with upper CodegenFallback > expression > -- > > Key: SPARK-21134 > URL: https://issues.apache.org/jira/browse/SPARK-21134 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Liang-Chi Hsieh > > The rule {{CollapseProject}} in optimizer collapses lower and upper > expressions if they have common expressions. > We have seen a use case that a codegen-only expression has been collapsed > with an upper {{CodegenFallback}} expression. Because all children > expressions under {{CodegenFallback}} will be evaluated with non-codegen path > {{eval}}, it causes an exception in runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression
[ https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21134: Assignee: Apache Spark > Codegen-only expressions should not be collapsed with upper CodegenFallback > expression > -- > > Key: SPARK-21134 > URL: https://issues.apache.org/jira/browse/SPARK-21134 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark > > The rule {{CollapseProject}} in optimizer collapses lower and upper > expressions if they have common expressions. > We have seen a use case that a codegen-only expression has been collapsed > with an upper {{CodegenFallback}} expression. Because all children > expressions under {{CodegenFallback}} will be evaluated with non-codegen path > {{eval}}, it causes an exception in runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression
Liang-Chi Hsieh created SPARK-21134: --- Summary: Codegen-only expressions should not be collapsed with upper CodegenFallback expression Key: SPARK-21134 URL: https://issues.apache.org/jira/browse/SPARK-21134 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.1 Reporter: Liang-Chi Hsieh The rule {{CollapseProject}} in optimizer collapses lower and upper expressions if they have common expressions. We have seen a use case that a codegen-only expression has been collapsed with an upper {{CodegenFallback}} expression. Because all children expressions under {{CodegenFallback}} will be evaluated with non-codegen path {{eval}}, it causes an exception in runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-20896) spark executor get java.lang.ClassCastException when trigger two job at same time
[ https://issues.apache.org/jira/browse/SPARK-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] poseidon closed SPARK-20896. Resolution: Fixed Fix Version/s: 1.6.4 Target Version/s: 1.6.2 No a issue > spark executor get java.lang.ClassCastException when trigger two job at same > time > - > > Key: SPARK-20896 > URL: https://issues.apache.org/jira/browse/SPARK-20896 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.6.1 >Reporter: poseidon > Fix For: 1.6.4 > > > 1、zeppelin 0.6.2 in *SCOPE* mode > 2、spark 1.6.2 > 3、HDP 2.4 for HDFS YARN > trigger scala code like : > {noformat} > var tmpDataFrame = sql(" select b1,b2,b3 from xxx.x") > val vectorDf = assembler.transform(tmpDataFrame) > val vectRdd = vectorDf.select("features").map{x:Row => x.getAs[Vector](0)} > val correlMatrix: Matrix = Statistics.corr(vectRdd, "spearman") > val columns = correlMatrix.toArray.grouped(correlMatrix.numRows) > val rows = columns.toSeq.transpose > val vectors = rows.map(row => new DenseVector(row.toArray)) > val vRdd = sc.parallelize(vectors) > import sqlContext.implicits._ > val dfV = vRdd.map(_.toArray).map{ case Array(b1,b2,b3) => (b1,b2,b3) }.toDF() > val rows = dfV.rdd.zipWithIndex.map(_.swap) > > .join(sc.parallelize(Array("b1","b2","b3")).zipWithIndex.map(_.swap)) > .values.map{case (row: Row, x: String) => Row.fromSeq(row.toSeq > :+ x)} > {noformat} > --- > and code : > {noformat} > var df = sql("select b1,b2 from .x") > var i = 0 > var threshold = Array(2.0,3.0) > var inputCols = Array("b1","b2") > var tmpDataFrame = df > for (col <- inputCols){ > val binarizer: Binarizer = new Binarizer().setInputCol(col) > .setOutputCol(inputCols(i)+"_binary") > .setThreshold(threshold(i)) > tmpDataFrame = binarizer.transform(tmpDataFrame).drop(inputCols(i)) > i = i+1 > } > var saveDFBin = tmpDataFrame > val dfAppendBin = sql("select b3 from poseidon.corelatdemo") > val rows = saveDFBin.rdd.zipWithIndex.map(_.swap) > .join(dfAppendBin.rdd.zipWithIndex.map(_.swap)) > .values.map{case (row1: Row, row2: Row) => Row.fromSeq(row1.toSeq > ++ row2.toSeq)} > import org.apache.spark.sql.types.StructType > val rowSchema = StructType(saveDFBin.schema.fields ++ > dfAppendBin.schema.fields) > saveDFBin = sqlContext.createDataFrame(rows, rowSchema) > //save result to table > import org.apache.spark.sql.SaveMode > saveDFBin.write.mode(SaveMode.Overwrite).saveAsTable(".") > sql("alter table . set lifecycle 1") > {noformat} > on zeppelin with two different notebook at same time. 
> Found this exception log in executor : > {quote} > l1.dtdream.com): java.lang.ClassCastException: > org.apache.spark.mllib.linalg.DenseVector cannot be cast to scala.Tuple2 > at > $line127359816836.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:34) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1597) > at > org.apache.spark.rdd.ZippedWithIndexRDD$$anonfun$2.apply(ZippedWithIndexRDD.scala:52) > at > org.apache.spark.rdd.ZippedWithIndexRDD$$anonfun$2.apply(ZippedWithIndexRDD.scala:52) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1875) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1875) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {quote} > OR > {quote} > java.lang.ClassCastException: scala.Tuple2 cannot be cast to > org.apache.spark.mllib.linalg.DenseVector > at > $line34684895436.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:57) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler
[jira] [Commented] (SPARK-20896) spark executor get java.lang.ClassCastException when trigger two job at same time
[ https://issues.apache.org/jira/browse/SPARK-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053386#comment-16053386 ] poseidon commented on SPARK-20896: -- It is related to Zeppelin, so this issue can be closed . > spark executor get java.lang.ClassCastException when trigger two job at same > time > - > > Key: SPARK-20896 > URL: https://issues.apache.org/jira/browse/SPARK-20896 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.6.1 >Reporter: poseidon > > 1、zeppelin 0.6.2 in *SCOPE* mode > 2、spark 1.6.2 > 3、HDP 2.4 for HDFS YARN > trigger scala code like : > {noformat} > var tmpDataFrame = sql(" select b1,b2,b3 from xxx.x") > val vectorDf = assembler.transform(tmpDataFrame) > val vectRdd = vectorDf.select("features").map{x:Row => x.getAs[Vector](0)} > val correlMatrix: Matrix = Statistics.corr(vectRdd, "spearman") > val columns = correlMatrix.toArray.grouped(correlMatrix.numRows) > val rows = columns.toSeq.transpose > val vectors = rows.map(row => new DenseVector(row.toArray)) > val vRdd = sc.parallelize(vectors) > import sqlContext.implicits._ > val dfV = vRdd.map(_.toArray).map{ case Array(b1,b2,b3) => (b1,b2,b3) }.toDF() > val rows = dfV.rdd.zipWithIndex.map(_.swap) > > .join(sc.parallelize(Array("b1","b2","b3")).zipWithIndex.map(_.swap)) > .values.map{case (row: Row, x: String) => Row.fromSeq(row.toSeq > :+ x)} > {noformat} > --- > and code : > {noformat} > var df = sql("select b1,b2 from .x") > var i = 0 > var threshold = Array(2.0,3.0) > var inputCols = Array("b1","b2") > var tmpDataFrame = df > for (col <- inputCols){ > val binarizer: Binarizer = new Binarizer().setInputCol(col) > .setOutputCol(inputCols(i)+"_binary") > .setThreshold(threshold(i)) > tmpDataFrame = binarizer.transform(tmpDataFrame).drop(inputCols(i)) > i = i+1 > } > var saveDFBin = tmpDataFrame > val dfAppendBin = sql("select b3 from poseidon.corelatdemo") > val rows = saveDFBin.rdd.zipWithIndex.map(_.swap) > .join(dfAppendBin.rdd.zipWithIndex.map(_.swap)) > .values.map{case (row1: Row, row2: Row) => Row.fromSeq(row1.toSeq > ++ row2.toSeq)} > import org.apache.spark.sql.types.StructType > val rowSchema = StructType(saveDFBin.schema.fields ++ > dfAppendBin.schema.fields) > saveDFBin = sqlContext.createDataFrame(rows, rowSchema) > //save result to table > import org.apache.spark.sql.SaveMode > saveDFBin.write.mode(SaveMode.Overwrite).saveAsTable(".") > sql("alter table . set lifecycle 1") > {noformat} > on zeppelin with two different notebook at same time. 
> Found this exception log in executor : > {quote} > l1.dtdream.com): java.lang.ClassCastException: > org.apache.spark.mllib.linalg.DenseVector cannot be cast to scala.Tuple2 > at > $line127359816836.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:34) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1597) > at > org.apache.spark.rdd.ZippedWithIndexRDD$$anonfun$2.apply(ZippedWithIndexRDD.scala:52) > at > org.apache.spark.rdd.ZippedWithIndexRDD$$anonfun$2.apply(ZippedWithIndexRDD.scala:52) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1875) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1875) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {quote} > OR > {quote} > java.lang.ClassCastException: scala.Tuple2 cannot be cast to > org.apache.spark.mllib.linalg.DenseVector > at > $line34684895436.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:57) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleM
[jira] [Resolved] (SPARK-20892) Add SQL trunc function to SparkR
[ https://issues.apache.org/jira/browse/SPARK-20892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-20892. -- Resolution: Fixed Assignee: Wayne Zhang Fix Version/s: 2.3.0 Target Version/s: 2.3.0 > Add SQL trunc function to SparkR > > > Key: SPARK-20892 > URL: https://issues.apache.org/jira/browse/SPARK-20892 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 2.1.1 >Reporter: Wayne Zhang >Assignee: Wayne Zhang > Fix For: 2.3.0 > > > Add SQL trunc function to SparkR -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
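For reference, the SparkR wrapper is expected to mirror the existing Scala {{trunc}} function in {{org.apache.spark.sql.functions}}; a small Scala sketch (the sample date and app name are made up):
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_date, trunc}

val spark = SparkSession.builder().appName("trunc-example").master("local[*]").getOrCreate()
import spark.implicits._

// trunc(col, format) keeps the date but resets the smaller fields:
// "month" -> first day of the month, "year" -> first day of the year.
val df = Seq("2017-06-18").toDF("d").select(to_date($"d").as("d"))
df.select(trunc($"d", "month").as("month_start")).show()  // 2017-06-01
df.select(trunc($"d", "year").as("year_start")).show()    // 2017-01-01
{code}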
[jira] [Resolved] (SPARK-20520) R streaming tests failed on Windows
[ https://issues.apache.org/jira/browse/SPARK-20520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-20520. -- Resolution: Cannot Reproduce Fix Version/s: 2.2.0 Tested: this no longer reproduces on Windows with RC4. > R streaming tests failed on Windows > --- > > Key: SPARK-20520 > URL: https://issues.apache.org/jira/browse/SPARK-20520 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.1 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.2.0 > > > Running R CMD check on SparkR 2.2 RC1 packages > {code} > Failed > - > 1. Failure: read.stream, write.stream, awaitTermination, stopQuery > (@test_streaming.R#56) > head(sql("SELECT count(*) FROM people"))[[1]] not equal to 3. > 1/1 mismatches > [1] 0 - 3 == -3 > 2. Failure: read.stream, write.stream, awaitTermination, stopQuery > (@test_streaming.R#60) > head(sql("SELECT count(*) FROM people"))[[1]] not equal to 6. > 1/1 mismatches > [1] 3 - 6 == -3 > 3. Failure: print from explain, lastProgress, status, isActive > (@test_streaming.R#75) > any(grepl("\"description\" : \"MemorySink\"", > capture.output(lastProgress(q isn't true. > 4. Failure: Stream other format (@test_streaming.R#95) > - > head(sql("SELECT count(*) FROM people3"))[[1]] not equal to 3. > 1/1 mismatches > [1] 0 - 3 == -3 > 5. Failure: Stream other format (@test_streaming.R#98) > - > any(...) isn't true. > {code} > Need to investigate -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"
[ https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-21128. -- Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.3.0 Target Version/s: 2.3.0 > Running R tests multiple times failed due to pre-exiting "spark-warehouse" / > "metastore_db" > --- > > Key: SPARK-21128 > URL: https://issues.apache.org/jira/browse/SPARK-21128 > Project: Spark > Issue Type: Test > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.3.0 > > > Currently, running R tests multiple times fails due to pre-exiting > "spark-warehouse" / "metastore_db" as below: > {code} > SparkSQL functions: Spark package found in SPARK_HOME: .../spark > ...1234... > {code} > {code} > Failed > - > 1. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 2. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > 3. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 4. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > DONE > === > {code} > It looks we should remove both "spark-warehouse" and "metastore_db" _before_ > listing files into {{sparkRFilesBefore}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15799) Release SparkR on CRAN
[ https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-15799: -- Target Version/s: (was: 2.2.0) > Release SparkR on CRAN > -- > > Key: SPARK-15799 > URL: https://issues.apache.org/jira/browse/SPARK-15799 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Xiangrui Meng > > Story: "As an R user, I would like to see SparkR released on CRAN, so I can > use SparkR easily in an existing R environment and have other packages built > on top of SparkR." > I made this JIRA with the following questions in mind: > * Are there known issues that prevent us releasing SparkR on CRAN? > * Do we want to package Spark jars in the SparkR release? > * Are there license issues? > * How does it fit into Spark's release process? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20520) R streaming tests failed on Windows
[ https://issues.apache.org/jira/browse/SPARK-20520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-20520: -- Affects Version/s: 2.1.1 Target Version/s: (was: 2.2.0) > R streaming tests failed on Windows > --- > > Key: SPARK-20520 > URL: https://issues.apache.org/jira/browse/SPARK-20520 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.1 >Reporter: Felix Cheung >Assignee: Felix Cheung > > Running R CMD check on SparkR 2.2 RC1 packages > {code} > Failed > - > 1. Failure: read.stream, write.stream, awaitTermination, stopQuery > (@test_streaming.R#56) > head(sql("SELECT count(*) FROM people"))[[1]] not equal to 3. > 1/1 mismatches > [1] 0 - 3 == -3 > 2. Failure: read.stream, write.stream, awaitTermination, stopQuery > (@test_streaming.R#60) > head(sql("SELECT count(*) FROM people"))[[1]] not equal to 6. > 1/1 mismatches > [1] 3 - 6 == -3 > 3. Failure: print from explain, lastProgress, status, isActive > (@test_streaming.R#75) > any(grepl("\"description\" : \"MemorySink\"", > capture.output(lastProgress(q isn't true. > 4. Failure: Stream other format (@test_streaming.R#95) > - > head(sql("SELECT count(*) FROM people3"))[[1]] not equal to 3. > 1/1 mismatches > [1] 0 - 3 == -3 > 5. Failure: Stream other format (@test_streaming.R#98) > - > any(...) isn't true. > {code} > Need to investigate -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18267) Distribute PySpark via Python Package Index (pypi)
[ https://issues.apache.org/jira/browse/SPARK-18267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053253#comment-16053253 ] Sean Owen commented on SPARK-18267: --- [~holdenk] is all of this basically done? for 2.2? > Distribute PySpark via Python Package Index (pypi) > -- > > Key: SPARK-18267 > URL: https://issues.apache.org/jira/browse/SPARK-18267 > Project: Spark > Issue Type: New Feature > Components: Build, Project Infra, PySpark >Reporter: Reynold Xin > > The goal is to distribute PySpark via pypi, so users can simply run Spark on > a single node via "pip install pyspark" (or "pip install apache-spark"). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21125) PySpark context missing function to set Job Description.
[ https://issues.apache.org/jira/browse/SPARK-21125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-21125: -- Target Version/s: (was: 2.2.0) Fix Version/s: (was: 2.2.0) Issue Type: Improvement (was: Bug) > PySpark context missing function to set Job Description. > > > Key: SPARK-21125 > URL: https://issues.apache.org/jira/browse/SPARK-21125 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.1.1 >Reporter: Shane Jarvie >Priority: Trivial > Labels: beginner > Original Estimate: 1h > Remaining Estimate: 1h > > The PySpark API is missing a convenient function currently found in the > Scala API, which sets the Job Description for display in the Spark UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
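For context, the Scala API the report refers to is {{SparkContext.setJobDescription}}; a short sketch (the app name and the toy job are made up):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("job-description-example").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// The string below is shown as the job's description in the Spark UI;
// PySpark currently has no convenient wrapper for this call.
sc.setJobDescription("Demo job: counting a small range")
sc.parallelize(1 to 1000).count()
{code}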
[jira] [Commented] (SPARK-21101) Error running Hive temporary UDTF on latest Spark 2.2
[ https://issues.apache.org/jira/browse/SPARK-21101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053145#comment-16053145 ] Yuming Wang commented on SPARK-21101: - [~dyzhou], Can you try to override https://github.com/apache/hive/blob/release-2.0.0/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.java#L70 It works for me: {code:sql} add jar hdfs://nameservice1/tmp/wym/hive-exec-1.1.0-cdh5.4.3.jar; CREATE TEMPORARY FUNCTION spark_21101 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFStack'; select spark_21101(2,'A',10,date '2015-01-01','B',20,date '2016-01-01'); {code} Ref: https://github.com/apache/hive/blob/release-2.0.0/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFStack.java > Error running Hive temporary UDTF on latest Spark 2.2 > - > > Key: SPARK-21101 > URL: https://issues.apache.org/jira/browse/SPARK-21101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: Dayou Zhou > > I'm using temporary UDTFs on Spark 2.2, e.g. > CREATE TEMPORARY FUNCTION myudtf AS 'com.foo.MyUdtf' USING JAR > 'hdfs:///path/to/udf.jar'; > But when I try to invoke it, I get the following error: > {noformat} > 17/06/14 19:43:50 ERROR SparkExecuteStatementOperation: Error running hive > query: > org.apache.hive.service.cli.HiveSQLException: > org.apache.spark.sql.AnalysisException: No handler for Hive UDF > 'com.foo.MyUdtf': java.lang.NullPointerException; line 1 pos 7 > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:266) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Any help appreciated, thanks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
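To make the suggested workaround concrete, here is a rough sketch of a UDTF that overrides the deprecated {{initialize(ObjectInspector[])}} overload linked above, which appears to be the variant Spark's Hive UDTF wrapper calls. The class name, column name, and behavior are invented for illustration:
{code:scala}
import java.util.{Arrays => JArrays}

import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ObjectInspectorFactory, StructObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

// Hypothetical single-column UDTF; the important part is the overridden
// initialize(ObjectInspector[]) overload rather than what the UDTF does.
class EchoUdtf extends GenericUDTF {

  override def initialize(argOIs: Array[ObjectInspector]): StructObjectInspector = {
    if (argOIs.length != 1) {
      throw new UDFArgumentException("EchoUdtf takes exactly one argument")
    }
    ObjectInspectorFactory.getStandardStructObjectInspector(
      JArrays.asList("value"),
      JArrays.asList[ObjectInspector](PrimitiveObjectInspectorFactory.javaStringObjectInspector))
  }

  // Emit the single input argument back as one output row.
  override def process(args: Array[AnyRef]): Unit = {
    forward(Array[AnyRef](String.valueOf(args(0))))
  }

  override def close(): Unit = {}
}
{code}
The compiled class would then be registered exactly as in the description, via {{CREATE TEMPORARY FUNCTION ... AS ... USING JAR ...}}.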
[jira] [Assigned] (SPARK-21123) Options for file stream source are in a wrong table
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21123: Assignee: (was: Apache Spark) > Options for file stream source are in a wrong table > --- > > Key: SPARK-21123 > URL: https://issues.apache.org/jira/browse/SPARK-21123 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 2.1.1, 2.2.0 >Reporter: Shixiong Zhu >Priority: Minor > Labels: starter > Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png > > > Right now options for file stream source are documented with file sink. We > should create a table for source options and fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21123) Options for file stream source are in a wrong table
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21123: Assignee: Apache Spark > Options for file stream source are in a wrong table > --- > > Key: SPARK-21123 > URL: https://issues.apache.org/jira/browse/SPARK-21123 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 2.1.1, 2.2.0 >Reporter: Shixiong Zhu >Assignee: Apache Spark >Priority: Minor > Labels: starter > Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png > > > Right now options for file stream source are documented with file sink. We > should create a table for source options and fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21123) Options for file stream source are in a wrong table
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053125#comment-16053125 ] Apache Spark commented on SPARK-21123: -- User 'assafmendelson' has created a pull request for this issue: https://github.com/apache/spark/pull/18342 > Options for file stream source are in a wrong table > --- > > Key: SPARK-21123 > URL: https://issues.apache.org/jira/browse/SPARK-21123 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 2.1.1, 2.2.0 >Reporter: Shixiong Zhu >Priority: Minor > Labels: starter > Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png > > > Right now options for file stream source are documented with file sink. We > should create a table for source options and fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-21133: Comment: was deleted (was: I'll create a PR later.) > HighlyCompressedMapStatus#writeExternal throws NPE > -- > > Key: SPARK-21133 > URL: https://issues.apache.org/jira/browse/SPARK-21133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Yuming Wang > > Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for > simple: > {code:sql} > spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " > set spark.sql.shuffle.partitions=2001; > drop table if exists spark_hcms_npe; > create table spark_hcms_npe as select id, count(*) from big_table group by > id; > " > {code} > Error logs: > {noformat} > 17/06/18 15:00:27 ERROR Utils: Exception encountered > java.lang.NullPointerException > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at 
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused
[jira] [Assigned] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21133: Assignee: (was: Apache Spark) > HighlyCompressedMapStatus#writeExternal throws NPE > -- > > Key: SPARK-21133 > URL: https://issues.apache.org/jira/browse/SPARK-21133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Yuming Wang > > Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for > simple: > {code:sql} > spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " > set spark.sql.shuffle.partitions=2001; > drop table if exists spark_hcms_npe; > create table spark_hcms_npe as select id, count(*) from big_table group by > id; > " > {code} > Error logs: > {noformat} > 17/06/18 15:00:27 ERROR Utils: Exception encountered > java.lang.NullPointerException > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > 
org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.l
[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053093#comment-16053093 ] Apache Spark commented on SPARK-21133: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/18343 > HighlyCompressedMapStatus#writeExternal throws NPE > -- > > Key: SPARK-21133 > URL: https://issues.apache.org/jira/browse/SPARK-21133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Yuming Wang > > Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for > simple: > {code:sql} > spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " > set spark.sql.shuffle.partitions=2001; > drop table if exists spark_hcms_npe; > create table spark_hcms_npe as select id, count(*) from big_table group by > id; > " > {code} > Error logs: > {noformat} > 17/06/18 15:00:27 ERROR Utils: Exception encountered > java.lang.NullPointerException > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecu
[jira] [Assigned] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21133: Assignee: Apache Spark > HighlyCompressedMapStatus#writeExternal throws NPE > -- > > Key: SPARK-21133 > URL: https://issues.apache.org/jira/browse/SPARK-21133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark > > Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for > simple: > {code:sql} > spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " > set spark.sql.shuffle.partitions=2001; > drop table if exists spark_hcms_npe; > create table spark_hcms_npe as select id, count(*) from big_table group by > id; > " > {code} > Error logs: > {noformat} > 17/06/18 15:00:27 ERROR Utils: Exception encountered > java.lang.NullPointerException > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at 
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java
[jira] [Assigned] (SPARK-21126) The configuration which named "spark.core.connection.auth.wait.timeout" hasn't been used in spark
[ https://issues.apache.org/jira/browse/SPARK-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-21126: - Assignee: liuzhaokun > The configuration which named "spark.core.connection.auth.wait.timeout" > hasn't been used in spark > - > > Key: SPARK-21126 > URL: https://issues.apache.org/jira/browse/SPARK-21126 > Project: Spark > Issue Type: Bug > Components: Documentation, Spark Core >Affects Versions: 2.1.1 >Reporter: liuzhaokun >Assignee: liuzhaokun >Priority: Trivial > Fix For: 2.2.1 > > > The configuration named "spark.core.connection.auth.wait.timeout" is not > used in Spark, so I think it should be removed from configuration.md. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21126) The configuration which named "spark.core.connection.auth.wait.timeout" hasn't been used in spark
[ https://issues.apache.org/jira/browse/SPARK-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21126. --- Resolution: Fixed Fix Version/s: 2.2.1 Issue resolved by pull request 18333 [https://github.com/apache/spark/pull/18333] > The configuration which named "spark.core.connection.auth.wait.timeout" > hasn't been used in spark > - > > Key: SPARK-21126 > URL: https://issues.apache.org/jira/browse/SPARK-21126 > Project: Spark > Issue Type: Bug > Components: Documentation, Spark Core >Affects Versions: 2.1.1 >Reporter: liuzhaokun >Priority: Trivial > Fix For: 2.2.1 > > > The configuration named "spark.core.connection.auth.wait.timeout" is not > used in Spark, so I think it should be removed from configuration.md. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
Yuming Wang created SPARK-21133: --- Summary: HighlyCompressedMapStatus#writeExternal throws NPE Key: SPARK-21133 URL: https://issues.apache.org/jira/browse/SPARK-21133 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: Yuming Wang Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for simple: {code:sql} spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " set spark.sql.shuffle.partitions=2001; drop table if exists spark_hcms_npe; create table spark_hcms_npe as select id, count(*) from big_table group by id; " {code} Error logs: {noformat} 17/06/18 15:00:27 ERROR Utils: Exception encountered java.lang.NullPointerException at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) at org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at 
org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) at org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
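Until this is fixed, a possible mitigation sketch (not a fix): the NPE in the report only appears once {{spark.sql.shuffle.partitions}} exceeds 2000, which is the threshold above which Spark switches from {{CompressedMapStatus}} to {{HighlyCompressedMapStatus}}, so capping the setting at 2000 should avoid that code path. The app name below is made up and the table names are taken from the reproduction above:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark-21133-mitigation")
  .enableHiveSupport()
  .getOrCreate()

// Stay at or below the 2000-partition threshold so the shuffle in the CTAS
// below uses CompressedMapStatus instead of HighlyCompressedMapStatus.
spark.conf.set("spark.sql.shuffle.partitions", "2000")
spark.sql("create table spark_hcms_npe as select id, count(*) from big_table group by id")
{code}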
[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053091#comment-16053091 ] Yuming Wang commented on SPARK-21133: - I'll create a PR later. > HighlyCompressedMapStatus#writeExternal throws NPE > -- > > Key: SPARK-21133 > URL: https://issues.apache.org/jira/browse/SPARK-21133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Yuming Wang > > Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for > simple: > {code:sql} > spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " > set spark.sql.shuffle.partitions=2001; > drop table if exists spark_hcms_npe; > create table spark_hcms_npe as select id, count(*) from big_table group by > id; > " > {code} > Error logs: > {noformat} > 17/06/18 15:00:27 ERROR Utils: Exception encountered > java.lang.NullPointerException > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at 
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(T