[jira] [Comment Edited] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails
[ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030025#comment-16030025 ] Mathieu D edited comment on SPARK-18881 at 5/30/17 7:52 PM: Just to mention a workaround for those experiencing the problem: try increasing {{spark.scheduler.listenerbus.eventqueue.size}} (default 10000). It may only postpone the problem, if the queue fills faster than the listeners drain it for a long time. In our case, we have bursts of activity, and raising this limit helps. > Spark never finishes jobs and stages, JobProgressListener fails > --- > > Key: SPARK-18881 > URL: https://issues.apache.org/jira/browse/SPARK-18881 > Project: Spark > Issue Type: Bug > Affects Versions: 2.0.2 > Environment: yarn, deploy-mode = client > Reporter: Mathieu D > > We have a Spark application that continuously processes a lot of incoming jobs. > Several jobs are processed in parallel, on multiple threads. > During intensive workloads, at some point, we start to see hundreds of > warnings like this: > {code} > 16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379 > 16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610 > 16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405 > 16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406 > 16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622 > {code} > From that point on, the performance of the app plummets, and most Stages and > Jobs never finish. On the Spark UI, I can see figures like 13000 pending jobs. 
> I can't clearly see another related exception happening before. Maybe this > one, but it concerns another listener: > {code} > 16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because > no remaining room in event queue. This likely means one of the SparkListeners > is too slow and cannot keep up with the rate at which tasks are being started > by the scheduler. > 16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since > Thu Jan 01 01:00:00 CET 1970 > {code} > This is very problematic for us, since it's hard to detect, and it requires an > app restart. > *EDIT:* > I confirm the sequence: > 1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining > room in event queue > then > 2- JobProgressListener losing track of jobs and stages. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
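The workaround described in the comment above can be applied when building the Spark context. The sketch below is illustrative, not taken from the ticket: the application name is hypothetical, the value 100000 is an arbitrary example, and `spark.scheduler.listenerbus.eventqueue.size` is the property name for these Spark versions (later releases renamed the setting). A larger queue only buys headroom during bursts; it does not fix a listener that is persistently slower than the event rate.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the workaround: enlarge the listener-bus event queue so that
// bursts of task/job events are buffered instead of dropped.
val conf = new SparkConf()
  .setAppName("bursty-app") // hypothetical app name
  .set("spark.scheduler.listenerbus.eventqueue.size", "100000") // default 10000
val sc = new SparkContext(conf)
```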
[jira] [Commented] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails
[ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030025#comment-16030025 ] Mathieu D commented on SPARK-18881: --- Just to mention a workaround for those experiencing the problem: try increasing {{spark.scheduler.listenerbus.eventqueue.size}} (default 10000). It may only postpone the problem, if the queue fills faster than the listeners drain it for a long time. In our case, we have bursts of activity, and raising this limit helps.
[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017528#comment-16017528 ] Mathieu D commented on SPARK-18838: --- I'm not very familiar with this part of Spark, but I'd like to share a thought. In my experience (SPARK-18881), when events start to be dropped because of full event queues, it's much more serious than just a failed job: the Spark driver became useless and I had to relaunch it. So, besides the improvement of the existing bus, listeners and threads, wouldn't a kind of back-pressure mechanism (on task emission) be better than dropping events? This would obviously degrade job performance, but that's still better than compromising the whole job or even the driver's health. my2cent > High latency of event processing for large jobs > --- > > Key: SPARK-18838 > URL: https://issues.apache.org/jira/browse/SPARK-18838 > Project: Spark > Issue Type: Improvement > Affects Versions: 2.0.0 > Reporter: Sital Kedia > Attachments: SparkListernerComputeTime.xlsx > > > Currently we are observing very high event processing delay in the > driver's `ListenerBus` for large jobs with many tasks. Many critical > components of the scheduler, like `ExecutorAllocationManager` and > `HeartbeatReceiver`, depend on the `ListenerBus` events, and this delay might > hurt job performance significantly or even fail the job. For example, a > significant delay in receiving the `SparkListenerTaskStart` event might cause > the `ExecutorAllocationManager` to mistakenly remove an executor which is > not idle. > The problem is that the event processor in `ListenerBus` is a single thread > which loops through all the listeners for each event and processes each event > synchronously: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94. > This single-threaded processor often becomes the bottleneck for large jobs. > Also, if one of the listeners is very slow, all the listeners pay the > price of the delay incurred by the slow listener. In addition, a slow > listener can cause events to be dropped from the event queue, which might be > fatal to the job. > To solve the above problems, we propose to get rid of the shared event queue and the > single-threaded event processor. Instead, each listener will have its own > dedicated single-threaded executor service. Whenever an event is posted, it > will be submitted to the executor service of each listener. The single-threaded > executor service guarantees in-order processing of the events > per listener. The queue used for the executor service will be bounded, to > guarantee we do not grow memory indefinitely. The downside of this > approach is that a separate event queue per listener will increase the driver memory > footprint.
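The proposed design above can be sketched with plain `java.util.concurrent` primitives. This is a minimal model, not Spark's actual implementation or API: the class name `PerListenerExecutor` and its methods are invented for illustration, and it assumes an event is dropped (counted, not blocked on) when a listener's bounded queue is full.

```scala
import java.util.concurrent.{ArrayBlockingQueue, ThreadPoolExecutor, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Illustrative sketch: one single-threaded executor per listener, each with
// its own bounded queue, as in the proposal above. Names are hypothetical.
final class PerListenerExecutor(queueCapacity: Int) {
  val processed = new AtomicInteger(0)
  val dropped = new AtomicInteger(0)

  // core = max = 1 thread guarantees in-order processing for this listener;
  // the bounded ArrayBlockingQueue caps the memory this listener can hold.
  // When the queue is full, the rejection handler counts a dropped event
  // instead of blocking the thread that posted it.
  private val executor = new ThreadPoolExecutor(
    1, 1, 0L, TimeUnit.MILLISECONDS,
    new ArrayBlockingQueue[Runnable](queueCapacity),
    (_: Runnable, _: ThreadPoolExecutor) => { dropped.incrementAndGet(); () })

  def post(handle: () => Unit): Unit =
    executor.execute(() => { handle(); processed.incrementAndGet(); () })

  def shutdown(): Unit = {
    executor.shutdown()
    executor.awaitTermination(5, TimeUnit.SECONDS)
  }
}

// A listener that keeps up: all five events fit in its queue, none dropped.
val fastListener = new PerListenerExecutor(queueCapacity = 16)
(1 to 5).foreach(_ => fastListener.post(() => ()))
fastListener.shutdown()
```

A slow listener in this model only delays and drops its own events; the other listeners' executors keep draining their own queues, which is the point of the per-listener design.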
[jira] [Closed] (SPARK-20784) Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() in YARN client mode
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D closed SPARK-20784. - Resolution: Not A Bug Oh boy, it was an OOM on the driver. Most of the time it was silent; I just discovered an OOM exception in the middle of the logs, in a task-result-getter thread. I guess the BroadcastExchange was just waiting for it. > Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() > in YARN client mode > - > > Key: SPARK-20784 > URL: https://issues.apache.org/jira/browse/SPARK-20784 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.0.2, 2.1.1 > Reporter: Mathieu D > > Spark hangs and stops executing any job or task (v2.0.2). > Web UI shows *0 active stages* and *0 active tasks* on executors, although a > driver thread is clearly working/finishing a stage (see below). > Our application runs several spark contexts for several users in parallel in > threads. spark version 2.0.2, yarn-client > Extract of thread stack below. 
> {noformat} > "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 > nid=0x484 waiting on condition [0x7fddd0bf > 6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00078c232760> (a > scala.concurrent.impl.Promise$CompletionLatch) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at > org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) > at > 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30) > at > org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218) > at > org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) > at >
[jira] [Updated] (SPARK-20784) Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() in YARN client mode
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-20784: -- Affects Version/s: 2.1.1 Description: Spark hangs and stops executing any job or task (v2.0.2). Web UI shows *0 active stages* and *0 active tasks* on executors, although a driver thread is clearly working/finishing a stage. Our application runs several spark contexts for several users in parallel in threads. spark version 2.0.2, yarn-client
[jira] [Commented] (SPARK-20784) Spark hangs forever after a joinWith() and cache()
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013911#comment-16013911 ] Mathieu D commented on SPARK-20784: --- Changed the title. The request for a self-reproducer is noted, although it's a bit hard to do (and probably related to a YARN context), so it will take some time to prepare.
[jira] [Updated] (SPARK-20784) Spark hangs forever after a joinWith() and cache()
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-20784: -- Summary: Spark hangs forever after a joinWith() and cache() (was: Spark hangs forever)
[jira] [Updated] (SPARK-20784) Spark hangs forever
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-20784: -- Description: Spark hangs and stops executing any job or task. Web UI shows *0 active stages* and *0 active tasks* on executors, although a driver thread is clearly working/finishing a stage. Our application runs several spark contexts for several users in parallel in threads. spark version 2.0.2, yarn-client
[jira] [Commented] (SPARK-20784) Spark hangs forever
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013784#comment-16013784 ] Mathieu D commented on SPARK-20784: --- Well, Spark is blocked, though. I had a couple of occurrences of this, and it seems to be around ThreadUtils$.awaitResultInForkJoinSafely. SPARK-18843 talks about that, but this JIRA is not detailed, so I can't tell for sure.
> at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218) > at > org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40) > at >
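For context, the failure pattern described in this report can be sketched as follows. This is an illustrative reconstruction, not the reporter's code: the app name, thread-pool size, and the join itself are assumptions. The point is several jobs driven from separate threads, each triggering a broadcast exchange whose doExecuteBroadcast is the call seen parking in the stack trace above.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}
import org.apache.spark.sql.SparkSession

object ParallelBroadcastJoins {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parallel-joins-demo").getOrCreate()
    import spark.implicits._

    // One thread per "user", as in the multi-context / multi-thread setup described.
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

    val small = Seq((1, "a"), (2, "b")).toDF("id", "v") // small enough to be broadcast
    val large = spark.range(1000000L).toDF("id")

    // Each Future plans a BroadcastHashJoin; the driver-side
    // BroadcastExchangeExec.doExecuteBroadcast awaits the broadcast result,
    // which is where the ForkJoinPool worker parks in the stack above.
    val jobs = (1 to 8).map(_ => Future(large.join(small, "id").count()))
    jobs.foreach(j => Await.ready(j, Duration.Inf))
    spark.stop()
  }
}
```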
[jira] [Updated] (SPARK-20784) Spark hangs forever / potential deadlock ?
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-20784: -- Description: Spark hangs and stop executing any job or task. Web UI shows 0 active task on executors. Our application runs several spark contexts for several users in parallel in threads. spark version 2.0.2, yarn-client Extract of thread stack below. {noformat} "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 nid=0x484 waiting on condition [0x7fddd0bf 6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00078c232760> (a scala.concurrent.impl.Promise$CompletionLatch) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212) at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30) at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218) at org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218) at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:30) at
[jira] [Updated] (SPARK-20784) Spark hangs forever
[ https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-20784: -- Summary: Spark hangs forever (was: Spark hangs forever / potential deadlock ?) > Spark hangs forever > --- > > Key: SPARK-20784 > URL: https://issues.apache.org/jira/browse/SPARK-20784 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2 >Reporter: Mathieu D > > Spark hangs and stop executing any job or task. > Web UI shows 0 active task on executors. > Our application runs several spark contexts for several users in parallel in > threads. > spark version 2.0.2, yarn-client > Extract of thread stack below. > {noformat} > "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 > nid=0x484 waiting on condition [0x7fddd0bf > 6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00078c232760> (a > scala.concurrent.impl.Promise$CompletionLatch) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) > at > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at > org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) > at > 
org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30) > at > org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218) > at > org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218) > at > org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) > at >
[jira] [Created] (SPARK-20784) Spark hangs forever / potential deadlock ?
Mathieu D created SPARK-20784: - Summary: Spark hangs forever / potential deadlock ? Key: SPARK-20784 URL: https://issues.apache.org/jira/browse/SPARK-20784 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.2 Reporter: Mathieu D Spark hangs and stops executing any job or task. Web UI shows 0 active tasks on executors. Our application runs several Spark contexts for several users in parallel threads. Spark version 2.0.2, yarn-client. Extract of thread stack below. {noformat} "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 nid=0x484 waiting on condition [0x7fddd0bf6000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00078c232760> (a scala.concurrent.impl.Promise$CompletionLatch) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212) at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30) at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218) at org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218) at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40) at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) at
[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
[ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958754#comment-15958754 ] Mathieu D commented on SPARK-20082: --- [~yuhaoyan] or [~josephkb] any feedback on this approach and PR ? > Incremental update of LDA model, by adding initialModel as start point > -- > > Key: SPARK-20082 > URL: https://issues.apache.org/jira/browse/SPARK-20082 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.1.0 >Reporter: Mathieu D > > Some mllib models support an initialModel to start from and update it > incrementally with new data. > From what I understand of OnlineLDAOptimizer, it is possible to incrementally > update an existing model with batches of new documents. > I suggest to add an initialModel as a start point for LDA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
[ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892 ] Mathieu D edited comment on SPARK-20082 at 3/28/17 8:39 PM: [~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel, supported only by the Online optimizer. The implementation is inspired by the KMeans one. The initialModel is used as a replacement for the initial randomized matrix. Regarding the EM optimizer, in the same way, we could use an existing model instead of a randomly weighted graph, by adding new doc vertices and new doc->term edges to the existing graph. But it's unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on that side be initialized? was (Author: mathieude): [~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel, supported only by the Online optimizer. Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the existing graph. But it's unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on that side be initialized? > Incremental update of LDA model, by adding initialModel as start point > -- > > Key: SPARK-20082 > URL: https://issues.apache.org/jira/browse/SPARK-20082 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.1.0 >Reporter: Mathieu D > > Some mllib models support an initialModel to start from and update it > incrementally with new data. > From what I understand of OnlineLDAOptimizer, it is possible to incrementally > update an existing model with batches of new documents. 
> I suggest to add an initialModel as a start point for LDA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
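The proposal discussed above can be illustrated with a short sketch. Note that setInitialModel below is the API proposed in the PR under discussion, NOT a method of released Spark MLlib, and the model path is a placeholder:

```scala
import org.apache.spark.mllib.clustering.{LDA, LocalLDAModel, OnlineLDAOptimizer}

// Load a previously trained model (path is illustrative).
val previous: LocalLDAModel = LocalLDAModel.load(sc, "/models/lda-previous")

val lda = new LDA()
  .setK(previous.k)
  .setOptimizer(new OnlineLDAOptimizer())
  .setInitialModel(previous) // proposed API: replaces the random initial topic matrix

// val updated = lda.run(newDocuments) // newDocuments: RDD[(Long, Vector)] batch of fresh docs
```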
[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
[ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892 ] Mathieu D commented on SPARK-20082: --- [~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel only for the Online optimizer. Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the existing graph. But it's unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on that side be initialized? > Incremental update of LDA model, by adding initialModel as start point > -- > > Key: SPARK-20082 > URL: https://issues.apache.org/jira/browse/SPARK-20082 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.1.0 >Reporter: Mathieu D > > Some mllib models support an initialModel to start from and update it > incrementally with new data. > From what I understand of OnlineLDAOptimizer, it is possible to incrementally > update an existing model with batches of new documents. > I suggest to add an initialModel as a start point for LDA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
[ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892 ] Mathieu D edited comment on SPARK-20082 at 3/28/17 8:26 PM: [~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel only for the Online optimizer. Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the existing graph. But it's unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on that side be initialized? was (Author: mathieude): [~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel only for the Online optimizer. Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the existing graph. But it's unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on that side be initialized? > Incremental update of LDA model, by adding initialModel as start point > -- > > Key: SPARK-20082 > URL: https://issues.apache.org/jira/browse/SPARK-20082 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.1.0 >Reporter: Mathieu D > > Some mllib models support an initialModel to start from and update it > incrementally with new data. > From what I understand of OnlineLDAOptimizer, it is possible to incrementally > update an existing model with batches of new documents. > I suggest to add an initialModel as a start point for LDA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
[ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892 ] Mathieu D edited comment on SPARK-20082 at 3/28/17 8:27 PM: [~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel, supported only by the Online optimizer. Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the existing graph. But it's unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on that side be initialized? was (Author: mathieude): [~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel only for the Online optimizer. Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the existing graph. But it's unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on that side be initialized? > Incremental update of LDA model, by adding initialModel as start point > -- > > Key: SPARK-20082 > URL: https://issues.apache.org/jira/browse/SPARK-20082 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.1.0 >Reporter: Mathieu D > > Some mllib models support an initialModel to start from and update it > incrementally with new data. > From what I understand of OnlineLDAOptimizer, it is possible to incrementally > update an existing model with batches of new documents. > I suggest to add an initialModel as a start point for LDA. 
[jira] [Created] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
Mathieu D created SPARK-20082: - Summary: Incremental update of LDA model, by adding initialModel as start point Key: SPARK-20082 URL: https://issues.apache.org/jira/browse/SPARK-20082 Project: Spark Issue Type: Wish Components: MLlib Affects Versions: 2.1.0 Reporter: Mathieu D Some mllib models support an initialModel to start from and update it incrementally with new data. From what I understand of OnlineLDAOptimizer, it is possible to incrementally update an existing model with batches of new documents. I suggest to add an initialModel as a start point for LDA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17890) scala.ScalaReflectionException
[ https://issues.apache.org/jira/browse/SPARK-17890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874360#comment-15874360 ] Mathieu D edited comment on SPARK-17890 at 2/20/17 11:01 AM: - We experience the same issue. When run from our tests (i.e. no spark-submit) it works fine, but fails when run through spark-submit. Spark 2.0.2 was (Author: mathieude): We experience the same issue. When running from our tests (no spark-submit) it works fine, but fails when run through spark-submit. Spark 2.0.2 > scala.ScalaReflectionException > -- > > Key: SPARK-17890 > URL: https://issues.apache.org/jira/browse/SPARK-17890 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.1 > Environment: x86_64 GNU/Linux > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) >Reporter: Khalid Reid >Priority: Minor > Labels: newbie > > Hello, > I am seeing an error message in spark-shell when I map a DataFrame to a > Seq\[Foo\]. However, things work fine when I use flatMap. > {noformat} > scala> case class Foo(value:String) > defined class Foo > scala> val df = sc.parallelize(List(1,2,3)).toDF > df: org.apache.spark.sql.DataFrame = [value: int] > scala> df.map{x => Seq.empty[Foo]} > scala.ScalaReflectionException: object $line14.$read not found. 
> at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:162) > at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:22) > at $typecreator1$1.apply(:29) > at > scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232) > at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232) > at > org.apache.spark.sql.SQLImplicits$$typecreator9$1.apply(SQLImplicits.scala:125) > at > scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232) > at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:49) > at > org.apache.spark.sql.SQLImplicits.newProductSeqEncoder(SQLImplicits.scala:125) > ... 48 elided > scala> df.flatMap{_ => Seq.empty[Foo]} //flatMap works > res2: org.apache.spark.sql.Dataset[Foo] = [value: string] > {noformat} > I am seeing the same error reported > [here|https://issues.apache.org/jira/browse/SPARK-8465?jql=text%20~%20%22scala.ScalaReflectionException%22] > when I use spark-submit. > I am new to Spark but I don't expect this to throw an exception. > Thanks. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17890) scala.ScalaReflectionException
[ https://issues.apache.org/jira/browse/SPARK-17890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874360#comment-15874360 ] Mathieu D commented on SPARK-17890: --- We experience the same issue. When running from our tests (no spark-submit) it works fine, but fails when ran through spark-submit. Spark 2.0.2 > scala.ScalaReflectionException > -- > > Key: SPARK-17890 > URL: https://issues.apache.org/jira/browse/SPARK-17890 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.1 > Environment: x86_64 GNU/Linux > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) >Reporter: Khalid Reid >Priority: Minor > Labels: newbie > > Hello, > I am seeing an error message in spark-shell when I map a DataFrame to a > Seq\[Foo\]. However, things work fine when I use flatMap. > {noformat} > scala> case class Foo(value:String) > defined class Foo > scala> val df = sc.parallelize(List(1,2,3)).toDF > df: org.apache.spark.sql.DataFrame = [value: int] > scala> df.map{x => Seq.empty[Foo]} > scala.ScalaReflectionException: object $line14.$read not found. > at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:162) > at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:22) > at $typecreator1$1.apply(:29) > at > scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232) > at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232) > at > org.apache.spark.sql.SQLImplicits$$typecreator9$1.apply(SQLImplicits.scala:125) > at > scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232) > at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:49) > at > org.apache.spark.sql.SQLImplicits.newProductSeqEncoder(SQLImplicits.scala:125) > ... 
48 elided > scala> df.flatMap{_ => Seq.empty[Foo]} //flatMap works > res2: org.apache.spark.sql.Dataset[Foo] = [value: string] > {noformat} > I am seeing the same error reported > [here|https://issues.apache.org/jira/browse/SPARK-8465?jql=text%20~%20%22scala.ScalaReflectionException%22] > when I use spark-submit. > I am new to Spark but I don't expect this to throw an exception. > Thanks. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-19136: -- Priority: Minor (was: Major) > Aggregator with case class as output type fails with ClassCastException > --- > > Key: SPARK-19136 > URL: https://issues.apache.org/jira/browse/SPARK-19136 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Mathieu D >Priority: Minor > > {{Aggregator}} with a case-class as output type returns a Row that cannot be > cast back to this type, it fails with {{ClassCastException}}. > Here is a dummy example to reproduce the problem > {code} > import org.apache.spark.sql._ > import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder > import org.apache.spark.sql.expressions.Aggregator > import spark.implicits._ > case class MinMax(min: Int, max: Int) > case class MinMaxAgg() extends Aggregator[Row, (Int, Int), MinMax] with > Serializable { > def zero: (Int, Int) = (Int.MaxValue, Int.MinValue) > def reduce(b: (Int, Int), a: Row): (Int, Int) = (Math.min(b._1, > a.getAs[Int](0)), Math.max(b._2, a.getAs[Int](0))) > def finish(r: (Int, Int)): MinMax = MinMax(r._1, r._2) > def merge(b1: (Int, Int), b2: (Int, Int)): (Int, Int) = (Math.min(b1._1, > b2._1), Math.max(b1._2, b2._2)) > def bufferEncoder: Encoder[(Int, Int)] = ExpressionEncoder() > def outputEncoder: Encoder[MinMax] = ExpressionEncoder() > } > val ds = Seq(1, 2, 3, 4).toDF("col1") > val agg = ds.agg(MinMaxAgg().toColumn.alias("minmax")) > {code} > bq. {code} > ds: org.apache.spark.sql.DataFrame = [col1: int] > agg: org.apache.spark.sql.DataFrame = [minmax: struct] > {code} > {code}agg.printSchema(){code} > bq. {code} > root > |-- minmax: struct (nullable = true) > ||-- min: integer (nullable = false) > ||-- max: integer (nullable = false) > {code} > {code}agg.head(){code} > bq. {code} > res1: org.apache.spark.sql.Row = [[1,4]] > {code} > {code}agg.head().getAs[MinMax](0){code} > bq. 
{code} > java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast > to line4c81e18af34342cda654c381ee91139525.$read$$iw$$iw$$iw$$iw$MinMax > [...] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
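Until the encoder issue is fixed, one possible workaround (a sketch, not from the ticket) is to avoid the cast entirely and read the struct column back field by field, using the {{MinMax}} class from the repro above:

```scala
import org.apache.spark.sql.Row

case class MinMax(min: Int, max: Int)

// Workaround sketch (illustrative, not from the ticket): the aggregate comes
// back as a GenericRowWithSchema, so rebuild the case class by hand instead
// of casting the Row.
def toMinMax(top: Row): MinMax = {
  val s = top.getStruct(0)           // the "minmax" struct column, as a Row
  MinMax(s.getInt(0), s.getInt(1))   // field-by-field reconstruction
}

// usage, instead of agg.head().getAs[MinMax](0):
// val mm = toMinMax(agg.head())
```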
[jira] [Commented] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816422#comment-15816422 ] Mathieu D commented on SPARK-19136: --- And... an RDD version based on treeAggregate is even quicker :-/ At least for that dummy min/max.
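The treeAggregate variant mentioned in the comment could look roughly like this (a sketch assuming a SparkSession named {{spark}}; it sidesteps Dataset encoders entirely):

```scala
// Sketch (assumed setup: a SparkSession named `spark`): min/max via
// RDD.treeAggregate, which avoids the Dataset encoder round-trip.
val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4))

val (min, max) = rdd.treeAggregate((Int.MaxValue, Int.MinValue))(
  (b, v) => (math.min(b._1, v), math.max(b._2, v)),              // seqOp: fold in one value
  (b1, b2) => (math.min(b1._1, b2._1), math.max(b1._2, b2._2))   // combOp: merge partials
)
// expected: min == 1, max == 4
```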
[jira] [Commented] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816366#comment-15816366 ] Mathieu D commented on SPARK-19136: --- The two queries are not equivalent: the dummy group generates a shuffle + repartition. 12 s vs. almost 2 min for 1M rows on my laptop, so the second one is definitely not acceptable. I'll go with the first one, thanks for the hint!
[jira] [Updated] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-19136: -- Summary: Aggregator with case class as output type fails with ClassCastException (was: Aggregator fails with case class as output type)
[jira] [Created] (SPARK-19136) Aggregator fails with case class as output type
Mathieu D created SPARK-19136: - Summary: Aggregator fails with case class as output type Key: SPARK-19136 URL: https://issues.apache.org/jira/browse/SPARK-19136 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0, 2.0.2 Reporter: Mathieu D {{Aggregator}} with a case-class as output type returns a Row that cannot be cast back to this type, it fails with {{ClassCastException}}. Here is a dummy example to reproduce the problem {code} import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.expressions.Aggregator import spark.implicits._ case class MinMax(min: Int, max: Int) case class MinMaxAgg() extends Aggregator[Row, (Int, Int), MinMax] with Serializable { def zero: (Int, Int) = (Int.MaxValue, Int.MinValue) def reduce(b: (Int, Int), a: Row): (Int, Int) = (Math.min(b._1, a.getAs[Int](0)), Math.max(b._2, a.getAs[Int](0))) def finish(r: (Int, Int)): MinMax = MinMax(r._1, r._2) def merge(b1: (Int, Int), b2: (Int, Int)): (Int, Int) = (Math.min(b1._1, b2._1), Math.max(b1._2, b2._2)) def bufferEncoder: Encoder[(Int, Int)] = ExpressionEncoder() def outputEncoder: Encoder[MinMax] = ExpressionEncoder() } val ds = Seq(1, 2, 3, 4).toDF("col1") val agg = ds.agg(MinMaxAgg().toColumn.alias("minmax")) {code} bq. {code} ds: org.apache.spark.sql.DataFrame = [col1: int] agg: org.apache.spark.sql.DataFrame = [minmax: struct] {code} {code}agg.printSchema(){code} bq. {code} root |-- minmax: struct (nullable = true) ||-- min: integer (nullable = false) ||-- max: integer (nullable = false) {code} {code}agg.head(){code} bq. {code} res1: org.apache.spark.sql.Row = [[1,4]] {code} {code}agg.head().getAs[MinMax](0){code} bq. {code} java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to line4c81e18af34342cda654c381ee91139525.$read$$iw$$iw$$iw$$iw$MinMax [...] 
{code}
[jira] [Issue Comment Deleted] (SPARK-17668) Support representing structs with case classes and tuples in spark sql udf inputs
[ https://issues.apache.org/jira/browse/SPARK-17668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-17668: -- Comment: was deleted (was: I experience the same issue with a custom Aggregator having a case class as an output type. The resulting aggregate is a Row, that I can't cast back to my case class. I provided an outputEncoder, using Encoders.product[T]. ) > Support representing structs with case classes and tuples in spark sql udf > inputs > - > > Key: SPARK-17668 > URL: https://issues.apache.org/jira/browse/SPARK-17668 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.0.0 >Reporter: koert kuipers >Priority: Minor > > after having gotten used to have case classes represent complex structures in > Datasets, i am surprised to find out that when i work in DataFrames with udfs > no such magic exists, and i have to fall back to manipulating Row objects, > which is error prone and somewhat ugly. > for example: > {noformat} > case class Person(name: String, age: Int) > val df = Seq((Person("john", 33), 5), (Person("mike", 30), 6)).toDF("person", > "id") > val df1 = df.withColumn("person", udf({ (p: Person) => p.copy(age = p.age + > 1) }).apply(col("person"))) > df1.printSchema > df1.show > {noformat} > leads to: > {noformat} > java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast > to Person > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17668) Support representing structs with case classes and tuples in spark sql udf inputs
[ https://issues.apache.org/jira/browse/SPARK-17668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802399#comment-15802399 ] Mathieu D commented on SPARK-17668: --- I experience the same issue with a custom Aggregator that has a case class as its output type. The resulting aggregate is a Row that I can't cast back to my case class. I provided an outputEncoder using Encoders.product[T].
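Until case-class inputs are supported, the usual workaround (a sketch, using the {{Person}} and {{df}} from the example above) is to accept the struct as a Row and rebuild the case class inside the UDF:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}

case class Person(name: String, age: Int)

// Workaround sketch: take the struct as a Row (error prone, as the ticket
// notes) and rebuild the case class manually; returning a case class works.
val bumpAge = udf { p: Row =>
  Person(p.getAs[String]("name"), p.getAs[Int]("age") + 1)
}

// assuming the `df` from the example above:
// val df1 = df.withColumn("person", bumpAge(col("person")))
```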
[jira] [Updated] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails
[ https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu D updated SPARK-18881: -- Description: We have a Spark application that continuously processes a lot of incoming jobs. Several jobs are processed in parallel, on multiple threads. During intensive workloads, at some point, we start to see hundreds of warnings like this:
{code}
16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610
16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405
16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622
{code}
From that point on, the performance of the app plummets and most Stages and Jobs never finish. On the Spark UI, I can see figures like 13000 pending jobs. I can't clearly see another related exception happening before. Maybe this one, but it concerns another listener:
{code}
16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 01:00:00 CET 1970
{code}
This is very problematic for us, since it's hard to detect and requires an app restart.
*EDIT:* I confirm the sequence: 1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue, then 2- JobProgressListener losing track of jobs and stages.
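As a mitigation for the LiveListenerBus drops described above, the listener bus queue can be enlarged at submit time. A sketch (the default of 10000 applies to Spark 2.x; the class and jar names are illustrative):

```shell
# Raise the listener bus queue capacity so bursts of task events are buffered
# instead of dropped (spark.scheduler.listenerbus.eventqueue.size, default
# 10000 in Spark 2.x). This postpones rather than cures the problem if the
# listeners stay slower than the event rate for a long time.
spark-submit \
  --conf spark.scheduler.listenerbus.eventqueue.size=100000 \
  --class com.example.MyApp \
  myapp.jar
```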
[jira] [Commented] (SPARK-18883) FileNotFoundException on _temporary directory
[ https://issues.apache.org/jira/browse/SPARK-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763747#comment-15763747 ] Mathieu D commented on SPARK-18883: --- The problem does not appear with mapreduce.fileoutputcommitter.algorithm.version=2 so far > FileNotFoundException on _temporary directory > -- > > Key: SPARK-18883 > URL: https://issues.apache.org/jira/browse/SPARK-18883 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2 > Environment: We're on a CDH 5.7, Hadoop 2.6. >Reporter: Mathieu D > > I'm experiencing the following exception, usually after some time with heavy > load : > {code} > 16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. > java.io.FileNotFoundException: File > hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106) > at > org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853) > at > org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334) > at > 
org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525) > at > 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211) > at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194) > at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488) > at > com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36) > at > com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23) > at > com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33) > at > com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13) > at >
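The committer setting that the comment above reports as a fix can be passed through Spark's Hadoop configuration passthrough. A sketch (class and jar names are illustrative):

```shell
# Use file output committer algorithm v2, which moves each task's output into
# place at task commit and skips the final _temporary merge during job commit,
# so commitJob no longer lists a _temporary directory that may be gone.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --class com.example.MyApp \
  myapp.jar
```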
[jira] [Commented] (SPARK-18883) FileNotFoundException on _temporary directory
[ https://issues.apache.org/jira/browse/SPARK-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751582#comment-15751582 ] Mathieu D commented on SPARK-18883: --- As suggested by [~steve_l], I'm going to try mapreduce.fileoutputcommitter.algorithm.version = 2 and update this ticket.
[jira] [Commented] (SPARK-18512) FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and S3A
[ https://issues.apache.org/jira/browse/SPARK-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751577#comment-15751577 ] Mathieu D commented on SPARK-18512: --- [SPARK-18883] > FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and > S3A > > > Key: SPARK-18512 > URL: https://issues.apache.org/jira/browse/SPARK-18512 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.1 > Environment: AWS EMR 5.0.1 > Spark 2.0.1 > S3 EU-West-1 (S3A) >Reporter: Giuseppe Bonaccorso > > After a few hours of streaming processing and data saving in Parquet format, > I got always this exception: > {code:java} > java.io.FileNotFoundException: No such file or directory: > s3a://xxx/_temporary/0/task_ > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1004) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:745) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:426) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:362) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334) > at > org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) > at > 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194) > at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488) > {code} > I've tried also s3:// and s3n:// but it always happens after a 3-5 hours. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18883) FileNotFoundException on _temporary directory
Mathieu D created SPARK-18883: - Summary: FileNotFoundException on _temporary directory Key: SPARK-18883 URL: https://issues.apache.org/jira/browse/SPARK-18883 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.2 Environment: We're on a CDH 5.7, Hadoop 2.6. Reporter: Mathieu D I'm experiencing the following exception, usually after some time with heavy load : {code} 16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. java.io.FileNotFoundException: File hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist. at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795) at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106) at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853) at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334) at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144) at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194) at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488) at com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36) at com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23) at 
com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33) at com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13) at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21) at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at
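The trace above shows FileOutputCommitter failing to list {{_temporary/0}} at job-commit time. A common trigger (an assumption here, not confirmed in this report) is several jobs writing to the same output directory concurrently, so that one job's commit/cleanup removes the shared staging directory another job is about to read. A minimal workaround sketch, assuming concurrent writers can be given separate destinations — the helper name below is illustrative, not a Spark API:

```scala
import java.util.UUID

// Hypothetical helper (not a Spark API): a distinct output directory per
// write job, so the per-job _temporary/0 staging dirs can never collide.
def uniqueOutputPath(base: String): String =
  s"$base/run-${UUID.randomUUID().toString}"

// With a SparkSession in scope, each concurrent job writes to its own dir:
//   df.write.parquet(uniqueOutputPath("hdfs://nameservice1/user/xdstore/rfs/rfsDB"))
```

The results then have to be merged or registered per-run downstream, which is the cost of this approach.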
[jira] [Comment Edited] (SPARK-18512) FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and S3A
[ https://issues.apache.org/jira/browse/SPARK-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751329#comment-15751329 ] Mathieu D edited comment on SPARK-18512 at 12/15/16 1:17 PM: - I'm experiencing a very similar problem, but we're using HDFS, not S3, and it's not a streaming app. Like Giuseppe, we usually see this after some time under heavy load. {code} 16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. java.io.FileNotFoundException: File hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist. at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795) at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106) at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853) at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334) at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144) at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194) at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488) at com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36) at com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23) at 
com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33) at com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13) at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21) at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121) at
[jira] [Commented] (SPARK-18512) FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and S3A
[ https://issues.apache.org/jira/browse/SPARK-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751329#comment-15751329 ] Mathieu D commented on SPARK-18512: --- I'm experiencing the same problem, but we're using HDFS, not S3. {code} 16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. java.io.FileNotFoundException: File hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist. at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795) at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106) at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853) at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334) at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194) at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488) at com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36) at com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23) at com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33) at com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13) at 
com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21) at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at
[jira] [Created] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails
Mathieu D created SPARK-18881: - Summary: Spark never finishes jobs and stages, JobProgressListener fails Key: SPARK-18881 URL: https://issues.apache.org/jira/browse/SPARK-18881 Project: Spark Issue Type: Bug Affects Versions: 2.0.2 Environment: yarn, deploy-mode = client Reporter: Mathieu D We have a Spark application that continuously processes a lot of incoming jobs. Several jobs are processed in parallel, on multiple threads. During intensive workloads, at some point, we start to see hundreds of warnings like these: {code} 16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379 16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610 16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405 16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406 16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622 {code} From that point on, the performance of the app plummets, and most Stages and Jobs never finish. On SparkUI, I can see figures like 13000 pending jobs. I can't clearly see another related exception happening beforehand. Maybe this one, but it concerns another listener: {code} 16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler. 16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 01:00:00 CET 1970 {code} This is very problematic for us, since it's hard to detect and requires an app restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
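The LiveListenerBus error above points at the listener event queue overflowing, after which JobProgressListener misses the start/end events it needs. One mitigation (it may only postpone the problem if the queue fills faster than listeners drain it for a long time) is to raise {{spark.scheduler.listenerbus.eventqueue.size}} above its Spark 2.x default of 10000, for example in spark-defaults.conf — the value below is illustrative, not a recommendation:

```
# spark-defaults.conf
# Default is 10000 in Spark 2.x; 100000 is an illustrative value.
spark.scheduler.listenerbus.eventqueue.size  100000
```

A larger queue trades driver memory for headroom during bursts of activity.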
[jira] [Commented] (SPARK-17168) CSV with header is incorrectly read if file is partitioned
[ https://issues.apache.org/jira/browse/SPARK-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429443#comment-15429443 ] Mathieu D commented on SPARK-17168: --- This is error-prone, because the scenario I show will drop rows while reading CSV, depending on the number of partitions the file has. > CSV with header is incorrectly read if file is partitioned > -- > > Key: SPARK-17168 > URL: https://issues.apache.org/jira/browse/SPARK-17168 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Mathieu D >Priority: Minor > > If a CSV file is stored in a partitioned fashion, the DataframeReader.csv > with option header set to true skips the first line of *each partition* > instead of skipping only the first one. > ex: > {code} > // create a partitioned CSV file with header : > val rdd=sc.parallelize(Seq("hdr","1","2","3","4","5","6"), numSlices=2) > rdd.saveAsTextFile("foo") > {code} > Now, if we try to read it with DataframeReader, the first row of the 2nd > partition is skipped. > {code} > val df = spark.read.option("header","true").csv("foo") > df.show > +---+ > |hdr| > +---+ > | 1| > | 2| > | 4| > | 5| > | 6| > +---+ > // one row is missing > {code} > I more or less understand that this is to be consistent with the save > operation of dataframewriter which saves header on each individual partition. > But this is very error-prone. In our case, we have large CSV files with > headers already stored in a partitioned way, so we will lose rows if we read > with header set to true. So we have to manually handle the headers. > I suggest a tri-valued option for header, with something like > "skipOnFirstPartition"
[jira] [Created] (SPARK-17168) CSV with header is incorrectly read if file is partitioned
Mathieu D created SPARK-17168: - Summary: CSV with header is incorrectly read if file is partitioned Key: SPARK-17168 URL: https://issues.apache.org/jira/browse/SPARK-17168 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Mathieu D Priority: Minor If a CSV file is stored in a partitioned fashion, DataFrameReader.csv with option header set to true skips the first line of *each partition* instead of skipping only the first one. ex: {code} // create a partitioned CSV file with header: val rdd = sc.parallelize(Seq("hdr","1","2","3","4","5","6"), numSlices=2) rdd.saveAsTextFile("foo") {code} Now, if we try to read it with DataFrameReader, the first row of the 2nd partition is skipped. {code} val df = spark.read.option("header","true").csv("foo") df.show +---+ |hdr| +---+ | 1| | 2| | 4| | 5| | 6| +---+ // one row is missing {code} I more or less understand that this is to be consistent with the save operation of DataFrameWriter, which saves a header on each individual partition. But this is very error-prone. In our case, we have large CSV files with headers already stored in a partitioned way, so we will lose rows if we read with header set to true, and we have to handle the headers manually. I suggest a tri-valued option for header, with something like "skipOnFirstPartition"
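The per-partition skip described above can be sketched with plain Scala collections standing in for the saved part-files — a simulation of the observed behavior, not Spark's actual reader code:

```scala
// Each inner Seq stands in for one part-file of the saved RDD.
val partFiles = Seq(
  Seq("hdr", "1", "2", "3"), // part-00000: carries the real header
  Seq("4", "5", "6")         // part-00001: no header, yet its first row goes too
)

// header=true drops the first line of *each* partition:
val rowsRead = partFiles.flatMap(_.drop(1))
// rowsRead == Seq("1", "2", "3", "5", "6") -- the row "4" is silently lost
```

With more partitions, more rows are lost, which is why the failure depends on how the file happens to be partitioned.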
[jira] [Created] (SPARK-16938) Cannot resolve column name after a join
Mathieu D created SPARK-16938: - Summary: Cannot resolve column name after a join Key: SPARK-16938 URL: https://issues.apache.org/jira/browse/SPARK-16938 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Mathieu D Priority: Minor Found a change of behavior on spark-2.0.0, which breaks a query in our code base. The following works on previous spark versions, 1.6.1 up to 2.0.0-preview : {code} val dfa = Seq((1, 2), (2, 3)).toDF("id", "a").alias("dfa") val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").alias("dfb") dfa.join(dfb, dfa("id") === dfb("id")).dropDuplicates(Array("dfa.id", "dfb.id")) {code} but fails with spark-2.0.0 with the exception : {code} Cannot resolve column name "dfa.id" among (id, a, id, b); org.apache.spark.sql.AnalysisException: Cannot resolve column name "dfa.id" among (id, a, id, b); at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819) at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1818) at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1817) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1817) at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1814) at 
org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594) at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1814) at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1840) ... {code}
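A hedged workaround sketch while the alias-qualified names fail to resolve: rename the join column on one side before the join, so dropDuplicates can address both keys by plain, unambiguous names. This assumes a SparkSession in scope as {{spark}}; the name {{id_b}} is illustrative.

```scala
import spark.implicits._

val dfa = Seq((1, 2), (2, 3)).toDF("id", "a")
// Rename the join column on one side so no duplicate name survives the join.
val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").withColumnRenamed("id", "id_b")

// Plain column names now resolve unambiguously in dropDuplicates:
val deduped = dfa.join(dfb, dfa("id") === dfb("id_b"))
  .dropDuplicates(Array("id", "id_b"))
```

The cost is carrying the renamed column through downstream code (or renaming it back after the dedup).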