[jira] [Commented] (SPARK-15032) When we create a new JDBC session, we may need to create a new session of executionHive
[ https://issues.apache.org/jira/browse/SPARK-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275874#comment-15275874 ] Sagar commented on SPARK-15032:
---
At the time of JDBC session creation, we use the Thrift server's executionHive, so what you are proposing is to create a new session of executionHive, right?

> When we create a new JDBC session, we may need to create a new session of
> executionHive
> -------------------------------------------------------------------------
>
> Key: SPARK-15032
> URL: https://issues.apache.org/jira/browse/SPARK-15032
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Yin Huai
> Priority: Critical
>
> Right now, we only use executionHive in thriftserver. When we create a new
> jdbc session, we probably need to create a new session of executionHive. I am
> not sure what will break if we leave the code as is. But, I feel it will be
> safer to create a new session of executionHive.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15125) CSV data source recognizes empty quoted strings in the input as null.
[ https://issues.apache.org/jira/browse/SPARK-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272017#comment-15272017 ] Sagar commented on SPARK-15125:
---
Shouldn't we always infer these as empty strings, and then let users do a simple projection to turn them into nulls? I think we should read these as empty strings, and the user can then map those empty strings to NULL as needed.

> CSV data source recognizes empty quoted strings in the input as null.
> ---------------------------------------------------------------------
>
> Key: SPARK-15125
> URL: https://issues.apache.org/jira/browse/SPARK-15125
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Suresh Thalamati
>
> The CSV data source does not differentiate between empty quoted strings and empty
> fields; both are read as null. In some scenarios users would want to differentiate
> between these values, especially in the context of SQL, where NULL and empty string
> have different meanings. If the input data happens to be a dump from a traditional
> relational data source, users will see different results for the same SQL queries.
> {code}
> Repro:
> Test Data: (test.csv)
> year,make,model,comment,price
> 2017,Tesla,Mode 3,looks nice.,35000.99
> 2016,Chevy,Bolt,"",29000.00
> 2015,Porsche,"",,
> scala> val df = sqlContext.read.format("csv").option("header",
> "true").option("inferSchema", "true").option("nullValue",
> null).load("/tmp/test.csv")
> df: org.apache.spark.sql.DataFrame = [year: int, make: string ... 3 more
> fields]
> scala> df.show
> +----+-------+------+-----------+--------+
> |year|   make| model|    comment|   price|
> +----+-------+------+-----------+--------+
> |2017|  Tesla|Mode 3|looks nice.|35000.99|
> |2016|  Chevy|  Bolt|       null| 29000.0|
> |2015|Porsche|  null|       null|    null|
> +----+-------+------+-----------+--------+
> Expected:
> +----+-------+------+-----------+--------+
> |year|   make| model|    comment|   price|
> +----+-------+------+-----------+--------+
> |2017|  Tesla|Mode 3|looks nice.|35000.99|
> |2016|  Chevy|  Bolt|           | 29000.0|
> |2015|Porsche|      |       null|    null|
> +----+-------+------+-----------+--------+
> {code}
> Testing a fix for this issue. I will take a shot at submitting a PR for
> this soon.
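The projection the comment suggests (keep empty quoted strings as empty strings at read time, and let users turn them into NULLs afterwards) can be sketched in plain Python rather than Spark; the rows and column names below are hypothetical parsed records, not Spark's actual output:

```python
# Sketch (plain Python, not Spark) of the projection suggested above:
# keep empty quoted strings as "" at read time, and let the user decide
# afterwards which of them should become NULL (None).

def empty_to_null(row, columns):
    """Map empty strings to None in the given columns of a row dict."""
    return {k: (None if k in columns and v == "" else v) for k, v in row.items()}

# Hypothetical rows as the reporter expects them to be parsed:
# a quoted "" survives as an empty string, a truly missing field is None.
rows = [
    {"make": "Tesla", "model": "Mode 3", "comment": "looks nice."},
    {"make": "Chevy", "model": "Bolt", "comment": ""},
    {"make": "Porsche", "model": "", "comment": None},
]

projected = [empty_to_null(r, {"model", "comment"}) for r in rows]
print(projected[1]["comment"])  # None: the empty string was projected to null
print(projected[2]["model"])    # None
```

This keeps the read path lossless and moves the empty-string-vs-NULL policy into a cheap per-row projection, which is the trade-off the comment proposes.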
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272013#comment-15272013 ] Sagar commented on SPARK-15142:
---
When the Mesos master restarts, the Spark Mesos dispatcher queues all the submitted applications, and running applications lose their reference. I think that after the Mesos master restarts, the dispatcher should pick up the previously queued applications and start running them.

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> ----------------------------------------------------------------------
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
> Issue Type: Bug
> Components: Deploy, Mesos
> Reporter: Devaraj K
> Priority: Minor
>
> While the Spark Mesos dispatcher is running, if the Mesos master gets restarted then
> the Spark Mesos dispatcher will keep running and queue up all the submitted
> applications, but will not launch them.
[jira] [Commented] (SPARK-15063) filtering and joining back doesn't work
[ https://issues.apache.org/jira/browse/SPARK-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270450#comment-15270450 ] Sagar commented on SPARK-15063:
---
Yes, but where we are using the sequence of joins, we can use t1 to get the required results for each filter, as mentioned above. In order to refer to its columns, we need to involve t1.

val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount")
val t2s = t2a.filter(t2a("type") <=> "savings")
t1.
  join(t2c, t1("uid") <=> t2c("uid"), "left").
  join(t2s, t1("uid") <=> t2s("uid"), "left").
  take(10)

> filtering and joining back doesn't work
> ---------------------------------------
>
> Key: SPARK-15063
> URL: https://issues.apache.org/jira/browse/SPARK-15063
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1
> Reporter: Neville Kadwa
>
> I'm trying to filter and join to do a simple pivot but I am getting very odd
> results.
> {quote} {noformat}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val people = Array((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna"))
> val accounts = Array(
>   (1, "checking", 100.0),
>   (1, "savings", 300.0),
>   (2, "savings", 1000.0),
>   (3, "carloan", 12000.0),
>   (3, "checking", 400.0)
> )
> val t1 = sc.makeRDD(people).toDF("uid", "name")
> val t2 = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2c = t2.filter(t2("type") <=> "checking")
> val t2s = t2.filter(t2("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are wrong:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [1,sam,1,checking,100.0,2,savings,1000.0],
>   [2,joe,null,null,null,null,null,null],
>   [3,sally,3,checking,400.0,1,savings,300.0],
>   [3,sally,3,checking,400.0,2,savings,1000.0],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
> The way I can force it to work properly is to create a new df for each filter:
> {quote} {noformat}
> val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2s = t2a.filter(t2a("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are right:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [2,joe,null,null,null,2,savings,1000.0],
>   [3,sally,3,checking,400.0,null,null,null],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
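The join semantics the reporter expects can be sketched outside Spark in plain Python (a hypothetical re-implementation for illustration, not the Spark code path): filter the accounts into independent lists, then left-join each filtered set to the people list on uid.

```python
# Plain-Python sketch (not Spark) of the left-join-per-filter pivot the
# reporter expects. Filtering the accounts into independent lists mirrors the
# "new DataFrame for each filter" workaround described above.

people = [(1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna")]
accounts = [
    (1, "checking", 100.0),
    (1, "savings", 300.0),
    (2, "savings", 1000.0),
    (3, "carloan", 12000.0),
    (3, "checking", 400.0),
]

def left_join_one(rows, matches):
    """Left-join each row to at most one matching account on uid (row[0])."""
    by_uid = {m[0]: m for m in matches}
    return [row + by_uid.get(row[0], (None, None, None)) for row in rows]

checking = [a for a in accounts if a[1] == "checking"]
savings = [a for a in accounts if a[1] == "savings"]

result = left_join_one(left_join_one(people, checking), savings)
for row in result:
    print(row)
```

The output matches the "results are right" array in the report: joe keeps his savings row with nulls for checking, and sally keeps her checking row with nulls for savings.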
[jira] [Commented] (SPARK-15072) Remove SparkSession.withHiveSupport
[ https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270155#comment-15270155 ] Sagar commented on SPARK-15072:
---
[~techaddict] Yes, it fails since assembly/assembly was removed. The test is ignored right now; does that mean it is no longer being considered?

> Remove SparkSession.withHiveSupport
> -----------------------------------
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Sandeep Singh
> Fix For: 2.0.0
>
[jira] [Commented] (SPARK-15072) Remove SparkSession.withHiveSupport
[ https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270032#comment-15270032 ] Sagar commented on SPARK-15072:
---
This helps to build test.jar:

$ ./build/sbt -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive-thriftserver -Phive package assembly/assembly streaming-kafka-assembly/assembly streaming-flume-assembly/assembly streaming-mqtt-assembly/assembly streaming-mqtt/test:assembly streaming-kinesis-asl-assembly/assembly
$ cd sql/hive/src/test/resources/regression-test-SPARK-8489/
$ scalac -classpath ~/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.3.0.jar Main.scala MyCoolClass.scala
$ rm test.jar
$ jar cvf test.jar *.class
$ cd ~/spark
$ ~/bin/spark-submit --conf spark.ui.enabled=false --conf spark.master.rest.enabled=false --driver-java-options -Dderby.system.durability=test --class Main sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar

Let me know if you are still working on it.

> Remove SparkSession.withHiveSupport
> -----------------------------------
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Sandeep Singh
> Fix For: 2.0.0
>
[jira] [Commented] (SPARK-15032) When we create a new JDBC session, we may need to create a new session of executionHive
[ https://issues.apache.org/jira/browse/SPARK-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270018#comment-15270018 ] Sagar commented on SPARK-15032:
---
You are right! It is safer to create a new session of executionHive while creating a JDBC session, but I think the problem is that it terminates the executionHive process. Let me know if you have figured out another way; I can work on it.

> When we create a new JDBC session, we may need to create a new session of
> executionHive
> -------------------------------------------------------------------------
>
> Key: SPARK-15032
> URL: https://issues.apache.org/jira/browse/SPARK-15032
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Yin Huai
> Priority: Critical
>
> Right now, we only use executionHive in thriftserver. When we create a new
> jdbc session, we probably need to create a new session of executionHive. I am
> not sure what will break if we leave the code as is. But, I feel it will be
> safer to create a new session of executionHive.
[jira] [Commented] (SPARK-15063) filtering and joining back doesn't work
[ https://issues.apache.org/jira/browse/SPARK-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270012#comment-15270012 ] Sagar commented on SPARK-15063:
---
What else is required to do it with a new df for each filter? Can you elaborate?

> filtering and joining back doesn't work
> ---------------------------------------
>
> Key: SPARK-15063
> URL: https://issues.apache.org/jira/browse/SPARK-15063
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.6.1
> Reporter: Neville Kadwa
>
> I'm trying to filter and join to do a simple pivot but I am getting very odd
> results.
> {quote} {noformat}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val people = Array((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna"))
> val accounts = Array(
>   (1, "checking", 100.0),
>   (1, "savings", 300.0),
>   (2, "savings", 1000.0),
>   (3, "carloan", 12000.0),
>   (3, "checking", 400.0)
> )
> val t1 = sc.makeRDD(people).toDF("uid", "name")
> val t2 = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2c = t2.filter(t2("type") <=> "checking")
> val t2s = t2.filter(t2("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are wrong:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [1,sam,1,checking,100.0,2,savings,1000.0],
>   [2,joe,null,null,null,null,null,null],
>   [3,sally,3,checking,400.0,1,savings,300.0],
>   [3,sally,3,checking,400.0,2,savings,1000.0],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
> The way I can force it to work properly is to create a new df for each filter:
> {quote} {noformat}
> val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2s = t2a.filter(t2a("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are right:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [2,joe,null,null,null,2,savings,1000.0],
>   [3,sally,3,checking,400.0,null,null,null],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized
[ https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270007#comment-15270007 ] Sagar commented on SPARK-15086:
---
This is about updating the Java API once the Scala one is finalized. Please provide more information on what else this includes so I can work on it.

> Update Java API once the Scala one is finalized
> -----------------------------------------------
>
> Key: SPARK-15086
> URL: https://issues.apache.org/jira/browse/SPARK-15086
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Reporter: Reynold Xin
> Fix For: 2.0.0
>
> We should make sure we update the Java API once the Scala one is finalized.
> This includes adding the equivalent API in Java as well as deprecating the
> old ones.
[jira] [Updated] (SPARK-7523) ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
[ https://issues.apache.org/jira/browse/SPARK-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sagar updated SPARK-7523:
-
Attachment: schema.txt
            spark-0.0.1-SNAPSHOT.jar

Spark jar and schema.txt attached. These are the files I am using while executing the commands.

ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
-----------------------------------------------------------------------

Key: SPARK-7523
URL: https://issues.apache.org/jira/browse/SPARK-7523
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 1.3.0
Environment: Prod
Reporter: sagar
Priority: Blocker
Attachments: schema.txt, spark-0.0.1-SNAPSHOT.jar

Hi Team,
I am using CDH 5.4 with Spark 1.3.0. I am getting the error below while executing the following command. I see JIRAs (SPARK-2906/SPARK-1407) specifying that the issue is resolved, but I did not find what the fix was. Can you please guide/suggest, as this is a production issue?

$ spark-submit --master local[4] --class org.sample.spark.SparkFilter --name "Spark Sample Program" spark-0.0.1-SNAPSHOT.jar /user/user1/schema.txt

==
15/05/11 06:28:36 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onJobEnd(EventLoggingListener.scala:169)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:36)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:792)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1998)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1959)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    ... 19 more
==
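The "java.io.IOException: Filesystem closed" in the trace above typically means some code closed the cached HDFS FileSystem instance that the event-logging listener was still writing to. One commonly suggested workaround (an assumption here, not a confirmed fix for this ticket) is to disable Hadoop's FileSystem cache so each component gets its own instance; `spark.hadoop.*` properties are forwarded into the Hadoop Configuration. A configuration sketch, reusing the reporter's command:

```shell
# Hypothetical workaround sketch: disable the shared Hadoop FileSystem cache so
# the event logger's HDFS stream is not closed out from under it when the
# application closes its own FileSystem handle.
spark-submit \
  --master local[4] \
  --conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
  --class org.sample.spark.SparkFilter \
  spark-0.0.1-SNAPSHOT.jar /user/user1/schema.txt
```

Disabling the cache has a cost (each FileSystem.get call opens a new connection), so it is a diagnostic workaround rather than a general recommendation.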
[jira] [Created] (SPARK-7523) ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
sagar created SPARK-7523:
Summary: ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
Key: SPARK-7523
URL: https://issues.apache.org/jira/browse/SPARK-7523
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 1.3.0
Environment: Prod
Reporter: sagar
Priority: Blocker

Hi Team,
I am using CDH 5.4 with Spark 1.3.0. I am getting the error below while executing the following command. I see JIRAs (SPARK-2906/SPARK-1407) specifying that the issue is resolved, but I did not find what the fix was. Can you please guide/suggest, as this is a production issue?

$ spark-submit --master local[4] --class org.sample.spark.SparkFilter --name "Spark Sample Program" spark-0.0.1-SNAPSHOT.jar /user/user1/schema.txt

==
15/05/11 06:28:36 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onJobEnd(EventLoggingListener.scala:169)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:36)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:792)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1998)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1959)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    ... 19 more
==
[jira] [Commented] (SPARK-1132) Persisting Web UI through refactoring the SparkListener interface
[ https://issues.apache.org/jira/browse/SPARK-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537292#comment-14537292 ] sagar commented on SPARK-1132:
--
Hi Team,
I see the issue is resolved and Fix Version/s is 1.0.0. Is 1.0.0 the Spark version? Where can I get Spark version 1.0.0? Currently I am getting the error below:

15/05/10 08:42:20 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/05/10 08:42:20 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/05/10 08:42:20 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/05/10 08:42:20 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/05/10 08:42:20 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/05/10 08:42:20 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1750 bytes result sent to driver
15/05/10 08:42:20 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 207 ms on localhost (1/1)
15/05/10 08:42:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/05/10 08:42:20 INFO DAGScheduler: Stage 0 (count at SparkFilter.java:22) finished in 0.225 s
15/05/10 08:42:20 INFO DAGScheduler: Job 0 finished: count at SparkFilter.java:22, took 0.314437 s
0
15/05/10 08:42:20 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onStageCompleted(EventLoggingListener.scala:165)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:32)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:792)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1998)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1959)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    ... 19 more

Persisting Web UI through refactoring the SparkListener interface
-----------------------------------------------------------------

Key: SPARK-1132
URL: https://issues.apache.org/jira/browse/SPARK-1132
Project: Spark
Issue Type: Improvement
Components: Spark Core, Web UI
Affects Versions: 0.9.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Blocker
Fix For: 1.0.0

This issue is a spin-off from another issue - https://spark-project.atlassian.net/browse/SPARK-969
The main issue with the existing Spark Web UI is that its information is lost as soon as the application terminates. This is the direct result of the SparkUI being coupled with SparkContext, which is stopped when the application is finished. The attached document proposes to tackle this by logging SparkListenerEvents to persist information displayed on the Web UI. We take this opportunity to replace the existing format for storing this information, HTML, with one that is more flexible, JSON. This allows further post-hoc analysis of a particular
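The event-logging scheme the SPARK-1132 description proposes (persist SparkListenerEvents as JSON so the UI can be rebuilt after the application exits) can be sketched in a few lines of Python; the event payloads below are simplified, hypothetical records, not Spark's actual event-log schema:

```python
import json

# Sketch of the idea in SPARK-1132: persist listener events as one JSON object
# per line (an append-only log), then replay the log to rebuild UI state after
# the application has terminated.

events = [
    {"Event": "SparkListenerStageSubmitted", "Stage ID": 0},
    {"Event": "SparkListenerStageCompleted", "Stage ID": 0},
    {"Event": "SparkListenerJobEnd", "Job ID": 0},
]

# Write side: serialize each event as a standalone JSON line.
log_lines = [json.dumps(e) for e in events]

# Replay side: parse the log back and derive UI state post hoc.
replayed = [json.loads(line) for line in log_lines]
completed_stages = [e for e in replayed if e["Event"] == "SparkListenerStageCompleted"]
print(len(completed_stages))  # 1
```

One-JSON-object-per-line keeps the log append-friendly and lets post-hoc tools parse it incrementally, which is the flexibility the description cites for preferring JSON over the old HTML snapshots.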