[jira] [Comment Edited] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

2017-05-30 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030025#comment-16030025
 ] 

Mathieu D edited comment on SPARK-18881 at 5/30/17 7:52 PM:


Just to mention a workaround for those experiencing the problem: try increasing 
{{spark.scheduler.listenerbus.eventqueue.size}} (default 10000). 
It may only postpone the problem, if the queue fills faster than the listeners 
can drain it for a long time. In our case, we have bursts of activity, and 
raising this limit helps (see the sketch below).
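
For reference, a minimal sketch of applying this workaround when building the 
context; the queue size here is only an example value, not a recommendation:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Enlarge the listener bus event queue so bursts of task events are
// buffered instead of dropped. 100000 is an example value.
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.scheduler.listenerbus.eventqueue.size", "100000")
val sc = new SparkContext(conf)
{code}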


was (Author: mathieude):
Just to mention a workaround for those experiencing the problem: try increasing 
{{spark.scheduler.listenerbus.eventqueue.size}} (default 10000). 
It may only postpone the problem, if the queue fills faster than the listeners 
can drain it for a long time. In our case, we have bursts of activity, and 
raising this limit helps.

> Spark never finishes jobs and stages, JobProgressListener fails
> ---
>
> Key: SPARK-18881
> URL: https://issues.apache.org/jira/browse/SPARK-18881
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
> Environment: yarn, deploy-mode = client
>Reporter: Mathieu D
>
> We have a Spark application that continuously processes a lot of incoming 
> jobs. Several jobs are processed in parallel, on multiple threads.
> During intensive workloads, at some point, we start to see hundreds of 
> warnings like this:
> {code}
> 16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
> 16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 
> 64610
> 16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 
> 147405
> 16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
> 16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 
> 64622
> {code}
> Starting from that, the performance of the app plummets, and most stages and 
> jobs never finish. On the Spark UI, I can see figures like 13000 pending jobs.
> I can't clearly see another related exception happening before. Maybe this 
> one, but it concerns another listener:
> {code}
> 16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because 
> no remaining room in event queue. This likely means one of the SparkListeners 
> is too slow and cannot keep up with the rate at which tasks are being started 
> by the scheduler.
> 16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since 
> Thu Jan 01 01:00:00 CET 1970
> {code}
> This is very problematic for us, since it's hard to detect, and it requires an 
> app restart.
> *EDIT:*
> I confirm the sequence:
> 1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining 
> room in event queue
> then
> 2- JobProgressListener losing track of jobs and stages.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

2017-05-30 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030025#comment-16030025
 ] 

Mathieu D commented on SPARK-18881:
---

Just to mention a workaround for those experiencing the problem: try increasing 
{{spark.scheduler.listenerbus.eventqueue.size}} (default 10000). 
It may only postpone the problem, if the queue fills faster than the listeners 
can drain it for a long time. In our case, we have bursts of activity, and 
raising this limit helps.

> Spark never finishes jobs and stages, JobProgressListener fails
> ---
>
> Key: SPARK-18881
> URL: https://issues.apache.org/jira/browse/SPARK-18881
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
> Environment: yarn, deploy-mode = client
>Reporter: Mathieu D
>
> We have a Spark application that continuously processes a lot of incoming 
> jobs. Several jobs are processed in parallel, on multiple threads.
> During intensive workloads, at some point, we start to see hundreds of 
> warnings like this:
> {code}
> 16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
> 16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 
> 64610
> 16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 
> 147405
> 16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
> 16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 
> 64622
> {code}
> Starting from that, the performance of the app plummets, and most stages and 
> jobs never finish. On the Spark UI, I can see figures like 13000 pending jobs.
> I can't clearly see another related exception happening before. Maybe this 
> one, but it concerns another listener:
> {code}
> 16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because 
> no remaining room in event queue. This likely means one of the SparkListeners 
> is too slow and cannot keep up with the rate at which tasks are being started 
> by the scheduler.
> 16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since 
> Thu Jan 01 01:00:00 CET 1970
> {code}
> This is very problematic for us, since it's hard to detect, and it requires an 
> app restart.
> *EDIT:*
> I confirm the sequence:
> 1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining 
> room in event queue
> then
> 2- JobProgressListener losing track of jobs and stages.






[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-05-19 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017528#comment-16017528
 ] 

Mathieu D commented on SPARK-18838:
---

I'm not very familiar with this part of Spark, but I'd like to share a thought.
In my experience (SPARK-18881), when events start to be dropped because of full 
event queues, it's much more serious than just a failed job. The Spark driver 
became useless, and I had to relaunch.
So, besides improving the existing bus, listeners, and threads, wouldn't a kind 
of back-pressure mechanism (on task emission) be better than dropping events? 
It would obviously degrade job performance, but that's still better than 
compromising the whole job or even the driver's health (see the sketch below). 
my2cents
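
To make the back-pressure idea concrete, a minimal sketch with a bounded queue 
and a hypothetical event type (this is not Spark's actual implementation): 
{{offer}} drops when full, which is the behavior behind the ERROR log quoted 
below, while {{put}} blocks the producer instead of losing events:

{code}
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical stand-in for SparkListenerEvent.
case class Event(id: Long)

val queue = new ArrayBlockingQueue[Event](10000)

// Drop semantics: offer() returns false when the queue is full and
// the event is simply lost.
def postWithDrop(e: Event): Unit =
  if (!queue.offer(e)) println(s"Dropping event ${e.id}")

// Back-pressure semantics: put() blocks the posting thread until
// room is available, slowing the producer instead of losing events.
def postWithBackPressure(e: Event): Unit = queue.put(e)
{code}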

> High latency of event processing for large jobs
> ---
>
> Key: SPARK-18838
> URL: https://issues.apache.org/jira/browse/SPARK-18838
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
> Attachments: SparkListernerComputeTime.xlsx
>
>
> Currently we are observing very high event processing delay in the driver's 
> `ListenerBus` for large jobs with many tasks. Many critical components of the 
> scheduler, like `ExecutorAllocationManager` and `HeartbeatReceiver`, depend on 
> `ListenerBus` events, and this delay might hurt job performance significantly 
> or even fail the job. For example, a significant delay in receiving 
> `SparkListenerTaskStart` might cause `ExecutorAllocationManager` to mistakenly 
> remove an executor which is not idle.
> The problem is that the event processor in `ListenerBus` is a single thread 
> which loops through all the listeners for each event and processes each event 
> synchronously: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94.
>  This single-threaded processor often becomes the bottleneck for large jobs. 
> Also, if one of the listeners is very slow, all the listeners pay the price of 
> the delay incurred by the slow one. In addition, a slow listener can cause 
> events to be dropped from the event queue, which might be fatal to the job.
> To solve the above problems, we propose to get rid of the event queue and the 
> single-threaded event processor. Instead, each listener will have its own 
> dedicated single-threaded executor service. Whenever an event is posted, it 
> will be submitted to the executor service of each listener. The 
> single-threaded executor service guarantees in-order processing of the events 
> per listener. The queue used for the executor service will be bounded to 
> guarantee we do not grow memory indefinitely. The downside of this approach is 
> that a separate event queue per listener will increase the driver memory 
> footprint (see the sketch below).
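
A minimal sketch of the proposed scheme, with hypothetical listener and event 
types (a sketch of the proposal, not the actual patch): one single-threaded 
executor per listener over a bounded queue gives in-order delivery per listener 
and isolates slow listeners from fast ones:

{code}
import java.util.concurrent.{ArrayBlockingQueue, ThreadPoolExecutor, TimeUnit}

trait Listener { def onEvent(event: String): Unit }

class PerListenerBus(listeners: Seq[Listener], queueSize: Int = 10000) {
  // One single-threaded executor per listener: events are processed
  // in order per listener, and listeners no longer share one queue.
  private val executors = listeners.map { l =>
    l -> new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
      new ArrayBlockingQueue[Runnable](queueSize)) // bounded per-listener queue
  }

  def post(event: String): Unit =
    executors.foreach { case (l, ex) =>
      // When a queue is full, the default policy throws
      // RejectedExecutionException; a real implementation would count
      // the drop or apply back-pressure instead.
      ex.execute(() => l.onEvent(event))
    }
}
{code}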






[jira] [Closed] (SPARK-20784) Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() in YARN client mode

2017-05-18 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D closed SPARK-20784.
-
Resolution: Not A Bug

Oh boy, it was an OOM on the driver. Most of the time, it was silent. I just 
discovered an OOM exception in the middle of the logs, in a task-result-getter 
thread. I guess the BroadcastExchange was just waiting for it.
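
For anyone hitting something similar: a driver OOM can be made less silent with 
standard JVM flags. A hedged sketch with placeholder values; note that in 
yarn-client mode the driver JVM is already running, so these flags must go on 
the launching JVM (e.g. via {{--driver-java-options}}) rather than being set in 
{{SparkConf}}:

{code}
import org.apache.spark.SparkConf

// Standard HotSpot flags: dump the heap on OutOfMemoryError so a
// silent driver OOM at least leaves evidence. The path is an example.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions",
       "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver.hprof")
{code}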

> Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() 
> in YARN client mode
> -
>
> Key: SPARK-20784
> URL: https://issues.apache.org/jira/browse/SPARK-20784
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.1
>Reporter: Mathieu D
>
> Spark hangs and stops executing any job or task (v2.0.2).
> Web UI shows *0 active stages* and *0 active tasks* on executors, although a 
> driver thread is clearly working/finishing a stage (see below).
> Our application runs several Spark contexts for several users in parallel 
> threads. Spark version 2.0.2, yarn-client.
> Extract of the thread stack below.
> {noformat}
> "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
> nid=0x484 waiting on condition [0x7fddd0bf
> 6000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00078c232760> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at 
> org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
> at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
> at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
> at 
> org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
> at 
> org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
> at 
> 

[jira] [Updated] (SPARK-20784) Spark hangs (v2.0) or Futures timed out (v2.1) after a joinWith() and cache() in YARN client mode

2017-05-18 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-20784:
--
Affects Version/s: 2.1.1
  Description: 
Spark hangs and stops executing any job or task (v2.0.2).
Web UI shows *0 active stages* and *0 active tasks* on executors, although a 
driver thread is clearly working/finishing a stage (see below).

Our application runs several Spark contexts for several users in parallel 
threads. Spark version 2.0.2, yarn-client.

Extract of the thread stack below.

{noformat}
"ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
nid=0x484 waiting on condition [0x7fddd0bf
6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00078c232760> (a 
scala.concurrent.impl.Promise$CompletionLatch)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at 
org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
at 
org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
at 
org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
at 
org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218)
at 
org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at 

[jira] [Commented] (SPARK-20784) Spark hangs forever after a joinWith() and cache()

2017-05-17 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013911#comment-16013911
 ] 

Mathieu D commented on SPARK-20784:
---

Changed the title.
Noted for the self-contained reproducer, although it's a bit hard to produce 
(and probably related to a YARN context), so it will take some time to prepare.

> Spark hangs forever after a joinWith() and cache()
> --
>
> Key: SPARK-20784
> URL: https://issues.apache.org/jira/browse/SPARK-20784
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Mathieu D
>
> Spark hangs and stops executing any job or task.
> Web UI shows *0 active stages* and *0 active tasks* on executors, although a 
> driver thread is clearly working/finishing a stage (see below).
> Our application runs several Spark contexts for several users in parallel 
> threads. Spark version 2.0.2, yarn-client.
> Extract of the thread stack below.
> {noformat}
> "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
> nid=0x484 waiting on condition [0x7fddd0bf
> 6000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00078c232760> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at 
> org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
> at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
> at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
> at 
> org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
> at 
> org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218)
> at 
> 

[jira] [Updated] (SPARK-20784) Spark hangs forever after a joinWith() and cache()

2017-05-17 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-20784:
--
Summary: Spark hangs forever after a joinWith() and cache()  (was: Spark 
hangs forever)

> Spark hangs forever after a joinWith() and cache()
> --
>
> Key: SPARK-20784
> URL: https://issues.apache.org/jira/browse/SPARK-20784
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Mathieu D
>
> Spark hangs and stops executing any job or task.
> Web UI shows *0 active stages* and *0 active tasks* on executors, although a 
> driver thread is clearly working/finishing a stage (see below).
> Our application runs several Spark contexts for several users in parallel 
> threads. Spark version 2.0.2, yarn-client.
> Extract of the thread stack below.
> {noformat}
> "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
> nid=0x484 waiting on condition [0x7fddd0bf
> 6000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00078c232760> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at 
> org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
> at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
> at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
> at 
> org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
> at 
> org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218)
> at 
> org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
> at 
> 

[jira] [Updated] (SPARK-20784) Spark hangs forever

2017-05-17 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-20784:
--
Description: 
Spark hangs and stops executing any job or task.
Web UI shows *0 active stages* and *0 active tasks* on executors, although a 
driver thread is clearly working/finishing a stage (see below).

Our application runs several Spark contexts for several users in parallel 
threads. Spark version 2.0.2, yarn-client.

Extract of the thread stack below.

{noformat}
"ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
nid=0x484 waiting on condition [0x7fddd0bf
6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00078c232760> (a 
scala.concurrent.impl.Promise$CompletionLatch)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at 
org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
at 
org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
at 
org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
at 
org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218)
at 
org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:30)
 

[jira] [Commented] (SPARK-20784) Spark hangs forever

2017-05-17 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013784#comment-16013784
 ] 

Mathieu D commented on SPARK-20784:
---

Well, Spark is blocked, though. I had a couple of occurrences of this, and it 
seems to happen around ThreadUtils$.awaitResultInForkJoinSafely.
SPARK-18843 talks about that, but that JIRA is not very detailed, so I can't 
tell for sure.
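
The "Futures timed out" variant in the v2.1 title suggests the broadcast wait 
hitting a timeout; {{spark.sql.broadcastTimeout}} (in seconds, default 300) 
controls it. A hedged example follows, with an arbitrary value; raising it only 
treats the symptom (the root cause here turned out to be a driver OOM, per the 
resolution above):

{code}
import org.apache.spark.SparkConf

// spark.sql.broadcastTimeout bounds how long a broadcast join waits
// for the broadcast relation to be built. 1200 is an example value.
val conf = new SparkConf()
  .set("spark.sql.broadcastTimeout", "1200")
{code}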

> Spark hangs forever
> ---
>
> Key: SPARK-20784
> URL: https://issues.apache.org/jira/browse/SPARK-20784
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Mathieu D
>
> Spark hangs and stops executing any job or task.
> Web UI shows 0 active tasks on executors.
> Our application runs several Spark contexts for several users in parallel 
> threads.
> Spark version 2.0.2, yarn-client.
> Extract of the thread stack below.
> {noformat}
> "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
> nid=0x484 waiting on condition [0x7fddd0bf
> 6000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00078c232760> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at 
> org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
> at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
> at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
> at 
> org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
> at 
> org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218)
> at 
> org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
> at 
> 

[jira] [Updated] (SPARK-20784) Spark hangs forever / potential deadlock ?

2017-05-17 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-20784:
--
Description: 
Spark hangs and stops executing any job or task.
Web UI shows 0 active tasks on executors.

Our application runs several Spark contexts for several users in parallel 
threads.
Spark version 2.0.2, yarn-client.

Extract of the thread stack below.

{noformat}
"ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
nid=0x484 waiting on condition [0x7fddd0bf
6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00078c232760> (a 
scala.concurrent.impl.Promise$CompletionLatch)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at 
org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
at 
org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
at 
org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
at 
org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218)
at 
org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:30)
at 

[jira] [Updated] (SPARK-20784) Spark hangs forever

2017-05-17 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-20784:
--
Summary: Spark hangs forever  (was: Spark hangs forever / potential 
deadlock ?)

> Spark hangs forever
> ---
>
> Key: SPARK-20784
> URL: https://issues.apache.org/jira/browse/SPARK-20784
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Mathieu D
>
> Spark hangs and stops executing any job or task.
> Web UI shows 0 active tasks on executors.
> Our application runs several Spark contexts for several users in parallel 
> threads.
> Spark version 2.0.2, yarn-client.
> Extract of the thread stack below.
> {noformat}
> "ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
> nid=0x484 waiting on condition [0x7fddd0bf
> 6000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00078c232760> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at 
> org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
> at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
> at 
> org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
> at 
> org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
> at 
> org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
> at 
> org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
> at 
> org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218)
> at 
> org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
> at 
> org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
> at 
> 

[jira] [Created] (SPARK-20784) Spark hangs forever / potential deadlock ?

2017-05-17 Thread Mathieu D (JIRA)
Mathieu D created SPARK-20784:
-

 Summary: Spark hangs forever / potential deadlock ?
 Key: SPARK-20784
 URL: https://issues.apache.org/jira/browse/SPARK-20784
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.2
Reporter: Mathieu D


Spark hangs and stops executing any job or task.
Web UI shows 0 active tasks on executors.

Our application runs several Spark contexts for several users in parallel 
threads.
Spark version 2.0.2, yarn-client.

Extract of the thread stack below.

{noformat}
"ForkJoinPool-1-worker-0" #107 daemon prio=5 os_prio=0 tid=0x7fddf0005800 
nid=0x484 waiting on condition [0x7fddd0bf
6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00078c232760> (a 
scala.concurrent.impl.Promise$CompletionLatch)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at 
org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:212)
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
at 
org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
at 
org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
at 
org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at 
org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218)
at 
org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218)
at 
org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
 
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at 
org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at 

[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-04-06 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958754#comment-15958754
 ] 

Mathieu D commented on SPARK-20082:
---

[~yuhaoyan] or [~josephkb], any feedback on this approach and the PR?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA (see the usage 
> sketch below).
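
A hedged sketch of what the proposed API could look like from user code. 
{{setInitialModel}} on LDA is what this ticket proposes, not an existing method 
(spark.mllib's KMeans has an analogous {{setInitialModel}}); the corpus and the 
previous model are assumed to exist:

{code}
import org.apache.spark.mllib.clustering.{LDA, LDAModel, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// corpus: (docId, termCountVector) pairs from the new batch of documents;
// previousModel: an LDAModel trained on earlier batches.
def incrementalUpdate(corpus: RDD[(Long, Vector)],
                      previousModel: LDAModel): LDAModel = {
  new LDA()
    .setK(previousModel.k)
    .setOptimizer(new OnlineLDAOptimizer())
    // Hypothetical method (the subject of this ticket): start from the
    // existing topic matrix instead of a random initialization.
    .setInitialModel(previousModel)
    .run(corpus)
}
{code}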






[jira] [Comment Edited] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-03-28 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892
 ] 

Mathieu D edited comment on SPARK-20082 at 3/28/17 8:39 PM:


[~yuhaoyan] would you mind having a look at this PR? Right now, I added an 
initialModel, supported only by the Online optimizer.

The implementation is inspired by the KMeans one. The initialModel is used as 
a replacement for the initial randomized matrix.

Regarding the EM optimizer, we could similarly use an existing model instead 
of a randomly weighted graph, by adding new doc vertices and new doc->term 
edges to the existing graph. But it's unclear to me how the new doc vertices 
should be weighted when added. Right now, for a new model, doc and term 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how should the weights on this side 
be initialized?



was (Author: mathieude):
[~yuhaoyan] would you mind having a look at this PR? Right now, I added an 
initialModel, supported only by the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term 
edges to the existing graph. But it's unclear to me how the new doc vertices 
should be weighted when added. Right now, for a new model, doc and term 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how should the weights on this side 
be initialized?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.






[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-03-28 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892
 ] 

Mathieu D commented on SPARK-20082:
---

[~yuhaoyan] would you mind having a look at this PR? Right now, I added an 
initialModel only for the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term 
edges to the existing graph. But it's unclear to me how the new doc vertices 
should be weighted when added. Right now, for a new model, doc and term 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how should the weights on this side 
be initialized?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.






[jira] [Comment Edited] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-03-28 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892
 ] 

Mathieu D edited comment on SPARK-20082 at 3/28/17 8:26 PM:


[~yuhaoyan] would you mind having a look at this PR? Right now, I added an 
initialModel only for the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term 
edges to the existing graph. But it's unclear to me how the new doc vertices 
should be weighted when added. Right now, for a new model, doc and term 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how should the weights on this side 
be initialized?


was (Author: mathieude):
[~yuhaoyan] would you mind having a look at this PR. Right now, I added an 
initialModel only for the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term 
edges to the existing graph. But it's unclear to me how the new doc vertices 
should be weighted when added. Right now, for a new model, doc and term 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how should the weights on this side 
be initialized?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-03-28 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945892#comment-15945892
 ] 

Mathieu D edited comment on SPARK-20082 at 3/28/17 8:27 PM:


[~yuhaoyan] would you mind having a look at this PR? Right now, I added an 
initialModel, supported only by the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term 
edges to the existing graph. But it's unclear to me how the new doc vertices 
should be weighted when added. Right now, for a new model, doc and term 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how should I initialize the weights on 
that side?


was (Author: mathieude):
[~yuhaoyan] would you mind having a look to this PR ? Right now, I added an 
initialModel only for the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term 
edges to the existing graph. But it's unclear for me how the new doc vertices 
should be weighted when added. Right now for a new model, docs and terms 
vertices are weighted randomly, with the same total weight on docs and terms. 
If I add new docs to an existing graph, how to initialize the weights on this 
side ?

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu D
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2017-03-24 Thread Mathieu D (JIRA)
Mathieu D created SPARK-20082:
-

 Summary: Incremental update of LDA model, by adding initialModel 
as start point
 Key: SPARK-20082
 URL: https://issues.apache.org/jira/browse/SPARK-20082
 Project: Spark
  Issue Type: Wish
  Components: MLlib
Affects Versions: 2.1.0
Reporter: Mathieu D


Some mllib models support an initialModel to start from and update it 
incrementally with new data.

From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
update an existing model with batches of new documents.

I suggest adding an initialModel as a starting point for LDA.
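
As a point of reference, mllib's KMeans already supports this pattern. The 
sketch below assumes a previously saved model and a {{newPoints: RDD[Vector]}} 
holding the latest batch (names and path are illustrative):

{code}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}

// sc: SparkContext and newPoints: RDD[Vector] assumed in scope
val previous: KMeansModel = KMeansModel.load(sc, "/models/kmeans-v1") // illustrative path
val refreshed: KMeansModel = new KMeans()
  .setK(previous.k)
  .setInitialModel(previous) // existing mllib API; this ticket proposes the same for LDA
  .run(newPoints)
{code}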



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17890) scala.ScalaReflectionException

2017-02-20 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874360#comment-15874360
 ] 

Mathieu D edited comment on SPARK-17890 at 2/20/17 11:01 AM:
-

We experience the same issue. When run from our tests (i.e. no spark-submit) it 
works fine, but fails when run through spark-submit. Spark 2.0.2.
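
(A possible workaround sketch, not verified against this exact reproduction: 
bypass the implicit derivation of the Seq encoder, which is where the 
reflection fails, by supplying an explicit encoder.)

{code}
import org.apache.spark.sql.{Encoder, Encoders}

// Kryo-based encoder avoids the runtime reflection in newProductSeqEncoder,
// at the cost of an opaque binary column.
implicit val fooSeqEncoder: Encoder[Seq[Foo]] = Encoders.kryo[Seq[Foo]]
df.map { _ => Seq.empty[Foo] }
{code}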


was (Author: mathieude):
We experience the same issue. When running from our tests (no spark-submit) it 
works fine, but fails when ran through spark-submit. Spark 2.0.2

> scala.ScalaReflectionException
> --
>
> Key: SPARK-17890
> URL: https://issues.apache.org/jira/browse/SPARK-17890
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1
> Environment: x86_64 GNU/Linux
> Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
>Reporter: Khalid Reid
>Priority: Minor
>  Labels: newbie
>
> Hello,
> I am seeing an error message in spark-shell when I map a DataFrame to a 
> Seq\[Foo\].  However, things work fine when I use flatMap.  
> {noformat}
> scala> case class Foo(value:String)
> defined class Foo
> scala> val df = sc.parallelize(List(1,2,3)).toDF
> df: org.apache.spark.sql.DataFrame = [value: int]
> scala> df.map{x => Seq.empty[Foo]}
> scala.ScalaReflectionException: object $line14.$read not found.
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:162)
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:22)
>   at $typecreator1$1.apply(:29)
>   at 
> scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232)
>   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
>   at 
> org.apache.spark.sql.SQLImplicits$$typecreator9$1.apply(SQLImplicits.scala:125)
>   at 
> scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232)
>   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:49)
>   at 
> org.apache.spark.sql.SQLImplicits.newProductSeqEncoder(SQLImplicits.scala:125)
>   ... 48 elided
> scala> df.flatMap{_ => Seq.empty[Foo]} //flatMap works
> res2: org.apache.spark.sql.Dataset[Foo] = [value: string]
> {noformat}
> I am seeing the same error reported 
> [here|https://issues.apache.org/jira/browse/SPARK-8465?jql=text%20~%20%22scala.ScalaReflectionException%22]
>  when I use spark-submit.
> I am new to Spark but I don't expect this to throw an exception.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17890) scala.ScalaReflectionException

2017-02-20 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874360#comment-15874360
 ] 

Mathieu D commented on SPARK-17890:
---

We experience the same issue. When running from our tests (no spark-submit) it 
works fine, but fails when run through spark-submit. Spark 2.0.2.

> scala.ScalaReflectionException
> --
>
> Key: SPARK-17890
> URL: https://issues.apache.org/jira/browse/SPARK-17890
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1
> Environment: x86_64 GNU/Linux
> Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
>Reporter: Khalid Reid
>Priority: Minor
>  Labels: newbie
>
> Hello,
> I am seeing an error message in spark-shell when I map a DataFrame to a 
> Seq\[Foo\].  However, things work fine when I use flatMap.  
> {noformat}
> scala> case class Foo(value:String)
> defined class Foo
> scala> val df = sc.parallelize(List(1,2,3)).toDF
> df: org.apache.spark.sql.DataFrame = [value: int]
> scala> df.map{x => Seq.empty[Foo]}
> scala.ScalaReflectionException: object $line14.$read not found.
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:162)
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:22)
>   at $typecreator1$1.apply(:29)
>   at 
> scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232)
>   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
>   at 
> org.apache.spark.sql.SQLImplicits$$typecreator9$1.apply(SQLImplicits.scala:125)
>   at 
> scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232)
>   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:49)
>   at 
> org.apache.spark.sql.SQLImplicits.newProductSeqEncoder(SQLImplicits.scala:125)
>   ... 48 elided
> scala> df.flatMap{_ => Seq.empty[Foo]} //flatMap works
> res2: org.apache.spark.sql.Dataset[Foo] = [value: string]
> {noformat}
> I am seeing the same error reported 
> [here|https://issues.apache.org/jira/browse/SPARK-8465?jql=text%20~%20%22scala.ScalaReflectionException%22]
>  when I use spark-submit.
> I am new to Spark but I don't expect this to throw an exception.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException

2017-01-10 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-19136:
--
Priority: Minor  (was: Major)

> Aggregator with case class as output type fails with ClassCastException
> ---
>
> Key: SPARK-19136
> URL: https://issues.apache.org/jira/browse/SPARK-19136
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Mathieu D
>Priority: Minor
>
> {{Aggregator}} with a case-class as output type returns a Row that cannot be 
> cast back to this type; it fails with {{ClassCastException}}.
> Here is a dummy example to reproduce the problem 
> {code}
> import org.apache.spark.sql._
> import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
> import org.apache.spark.sql.expressions.Aggregator
> import spark.implicits._
> case class MinMax(min: Int, max: Int)
> case class MinMaxAgg() extends Aggregator[Row, (Int, Int), MinMax] with 
> Serializable {
>   def zero: (Int, Int) = (Int.MaxValue, Int.MinValue)
>   def reduce(b: (Int, Int), a: Row): (Int, Int) = (Math.min(b._1, 
> a.getAs[Int](0)), Math.max(b._2, a.getAs[Int](0)))
>   def finish(r: (Int, Int)): MinMax = MinMax(r._1, r._2)
>   def merge(b1: (Int, Int), b2: (Int, Int)): (Int, Int) = (Math.min(b1._1, 
> b2._1), Math.max(b1._2, b2._2))
>   def bufferEncoder: Encoder[(Int, Int)] = ExpressionEncoder()
>   def outputEncoder: Encoder[MinMax] = ExpressionEncoder()
> }
> val ds = Seq(1, 2, 3, 4).toDF("col1")
> val agg = ds.agg(MinMaxAgg().toColumn.alias("minmax"))
> {code}
> bq. {code}
> ds: org.apache.spark.sql.DataFrame = [col1: int]
> agg: org.apache.spark.sql.DataFrame = [minmax: struct]
> {code}
> {code}agg.printSchema(){code}
> bq. {code}
> root
>  |-- minmax: struct (nullable = true)
>  ||-- min: integer (nullable = false)
>  ||-- max: integer (nullable = false)
> {code}
> {code}agg.head(){code}
> bq. {code}
> res1: org.apache.spark.sql.Row = [[1,4]]
> {code}
> {code}agg.head().getAs[MinMax](0){code}
> bq. {code}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
> to line4c81e18af34342cda654c381ee91139525.$read$$iw$$iw$$iw$$iw$MinMax
> [...]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException

2017-01-10 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816422#comment-15816422
 ] 

Mathieu D commented on SPARK-19136:
---

And... an RDD version based on treeAggregate is even quicker :-/
At least for that dummy min/max.
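
(For context, a minimal sketch of such an RDD version, assuming the single Int 
column from the example below:)

{code}
// ds is the DataFrame from the description below; column 0 holds the Int value.
val (min, max) = ds.rdd.treeAggregate((Int.MaxValue, Int.MinValue))(
  (b, row) => (math.min(b._1, row.getInt(0)), math.max(b._2, row.getInt(0))),
  (b1, b2) => (math.min(b1._1, b2._1), math.max(b1._2, b2._2))
)
{code}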


> Aggregator with case class as output type fails with ClassCastException
> ---
>
> Key: SPARK-19136
> URL: https://issues.apache.org/jira/browse/SPARK-19136
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Mathieu D
>
> {{Aggregator}} with a case-class as output type returns a Row that cannot be 
> cast back to this type; it fails with {{ClassCastException}}.
> Here is a dummy example to reproduce the problem 
> {code}
> import org.apache.spark.sql._
> import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
> import org.apache.spark.sql.expressions.Aggregator
> import spark.implicits._
> case class MinMax(min: Int, max: Int)
> case class MinMaxAgg() extends Aggregator[Row, (Int, Int), MinMax] with 
> Serializable {
>   def zero: (Int, Int) = (Int.MaxValue, Int.MinValue)
>   def reduce(b: (Int, Int), a: Row): (Int, Int) = (Math.min(b._1, 
> a.getAs[Int](0)), Math.max(b._2, a.getAs[Int](0)))
>   def finish(r: (Int, Int)): MinMax = MinMax(r._1, r._2)
>   def merge(b1: (Int, Int), b2: (Int, Int)): (Int, Int) = (Math.min(b1._1, 
> b2._1), Math.max(b1._2, b2._2))
>   def bufferEncoder: Encoder[(Int, Int)] = ExpressionEncoder()
>   def outputEncoder: Encoder[MinMax] = ExpressionEncoder()
> }
> val ds = Seq(1, 2, 3, 4).toDF("col1")
> val agg = ds.agg(MinMaxAgg().toColumn.alias("minmax"))
> {code}
> bq. {code}
> ds: org.apache.spark.sql.DataFrame = [col1: int]
> agg: org.apache.spark.sql.DataFrame = [minmax: struct]
> {code}
> {code}agg.printSchema(){code}
> bq. {code}
> root
>  |-- minmax: struct (nullable = true)
>  ||-- min: integer (nullable = false)
>  ||-- max: integer (nullable = false)
> {code}
> {code}agg.head(){code}
> bq. {code}
> res1: org.apache.spark.sql.Row = [[1,4]]
> {code}
> {code}agg.head().getAs[MinMax](0){code}
> bq. {code}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
> to line4c81e18af34342cda654c381ee91139525.$read$$iw$$iw$$iw$$iw$MinMax
> [...]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException

2017-01-10 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816366#comment-15816366
 ] 

Mathieu D commented on SPARK-19136:
---

The two queries are not equivalent: the dummy group generates a shuffle + 
repartition.
12s vs almost 2min for 1M rows on my laptop, so the 2nd one is definitely not 
acceptable.

So, I'll go with the first one, thanks for the hint!
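
(For readers finding this later: the working form is presumably the typed 
select, which keeps the Aggregator's output type and avoids the Row cast, 
along the lines of:)

{code}
// Using the MinMaxAgg / ds definitions from the description below:
val result: MinMax = ds.select(MinMaxAgg().toColumn).head()
{code}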



> Aggregator with case class as output type fails with ClassCastException
> ---
>
> Key: SPARK-19136
> URL: https://issues.apache.org/jira/browse/SPARK-19136
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Mathieu D
>
> {{Aggregator}} with a case-class as output type returns a Row that cannot be 
> cast back to this type; it fails with {{ClassCastException}}.
> Here is a dummy example to reproduce the problem 
> {code}
> import org.apache.spark.sql._
> import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
> import org.apache.spark.sql.expressions.Aggregator
> import spark.implicits._
> case class MinMax(min: Int, max: Int)
> case class MinMaxAgg() extends Aggregator[Row, (Int, Int), MinMax] with 
> Serializable {
>   def zero: (Int, Int) = (Int.MaxValue, Int.MinValue)
>   def reduce(b: (Int, Int), a: Row): (Int, Int) = (Math.min(b._1, 
> a.getAs[Int](0)), Math.max(b._2, a.getAs[Int](0)))
>   def finish(r: (Int, Int)): MinMax = MinMax(r._1, r._2)
>   def merge(b1: (Int, Int), b2: (Int, Int)): (Int, Int) = (Math.min(b1._1, 
> b2._1), Math.max(b1._2, b2._2))
>   def bufferEncoder: Encoder[(Int, Int)] = ExpressionEncoder()
>   def outputEncoder: Encoder[MinMax] = ExpressionEncoder()
> }
> val ds = Seq(1, 2, 3, 4).toDF("col1")
> val agg = ds.agg(MinMaxAgg().toColumn.alias("minmax"))
> {code}
> bq. {code}
> ds: org.apache.spark.sql.DataFrame = [col1: int]
> agg: org.apache.spark.sql.DataFrame = [minmax: struct]
> {code}
> {code}agg.printSchema(){code}
> bq. {code}
> root
>  |-- minmax: struct (nullable = true)
>  ||-- min: integer (nullable = false)
>  ||-- max: integer (nullable = false)
> {code}
> {code}agg.head(){code}
> bq. {code}
> res1: org.apache.spark.sql.Row = [[1,4]]
> {code}
> {code}agg.head().getAs[MinMax](0){code}
> bq. {code}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
> to line4c81e18af34342cda654c381ee91139525.$read$$iw$$iw$$iw$$iw$MinMax
> [...]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19136) Aggregator with case class as output type fails with ClassCastException

2017-01-09 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-19136:
--
Summary: Aggregator with case class as output type fails with 
ClassCastException  (was: Aggregator fails with case class as output type)

> Aggregator with case class as output type fails with ClassCastException
> ---
>
> Key: SPARK-19136
> URL: https://issues.apache.org/jira/browse/SPARK-19136
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Mathieu D
>
> {{Aggregator}} with a case-class as output type returns a Row that cannot be 
> cast back to this type; it fails with {{ClassCastException}}.
> Here is a dummy example to reproduce the problem 
> {code}
> import org.apache.spark.sql._
> import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
> import org.apache.spark.sql.expressions.Aggregator
> import spark.implicits._
> case class MinMax(min: Int, max: Int)
> case class MinMaxAgg() extends Aggregator[Row, (Int, Int), MinMax] with 
> Serializable {
>   def zero: (Int, Int) = (Int.MaxValue, Int.MinValue)
>   def reduce(b: (Int, Int), a: Row): (Int, Int) = (Math.min(b._1, 
> a.getAs[Int](0)), Math.max(b._2, a.getAs[Int](0)))
>   def finish(r: (Int, Int)): MinMax = MinMax(r._1, r._2)
>   def merge(b1: (Int, Int), b2: (Int, Int)): (Int, Int) = (Math.min(b1._1, 
> b2._1), Math.max(b1._2, b2._2))
>   def bufferEncoder: Encoder[(Int, Int)] = ExpressionEncoder()
>   def outputEncoder: Encoder[MinMax] = ExpressionEncoder()
> }
> val ds = Seq(1, 2, 3, 4).toDF("col1")
> val agg = ds.agg(MinMaxAgg().toColumn.alias("minmax"))
> {code}
> bq. {code}
> ds: org.apache.spark.sql.DataFrame = [col1: int]
> agg: org.apache.spark.sql.DataFrame = [minmax: struct]
> {code}
> {code}agg.printSchema(){code}
> bq. {code}
> root
>  |-- minmax: struct (nullable = true)
>  ||-- min: integer (nullable = false)
>  ||-- max: integer (nullable = false)
> {code}
> {code}agg.head(){code}
> bq. {code}
> res1: org.apache.spark.sql.Row = [[1,4]]
> {code}
> {code}agg.head().getAs[MinMax](0){code}
> bq. {code}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
> to line4c81e18af34342cda654c381ee91139525.$read$$iw$$iw$$iw$$iw$MinMax
> [...]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19136) Aggregator fails with case class as output type

2017-01-09 Thread Mathieu D (JIRA)
Mathieu D created SPARK-19136:
-

 Summary: Aggregator fails with case class as output type
 Key: SPARK-19136
 URL: https://issues.apache.org/jira/browse/SPARK-19136
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0, 2.0.2
Reporter: Mathieu D


{{Aggregator}} with a case-class as output type returns a Row that cannot be 
cast back to this type; it fails with {{ClassCastException}}.

Here is a dummy example to reproduce the problem 
{code}
import org.apache.spark.sql._
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator
import spark.implicits._

case class MinMax(min: Int, max: Int)

case class MinMaxAgg() extends Aggregator[Row, (Int, Int), MinMax] with 
Serializable {
  def zero: (Int, Int) = (Int.MaxValue, Int.MinValue)
  def reduce(b: (Int, Int), a: Row): (Int, Int) = (Math.min(b._1, 
a.getAs[Int](0)), Math.max(b._2, a.getAs[Int](0)))
  def finish(r: (Int, Int)): MinMax = MinMax(r._1, r._2)
  def merge(b1: (Int, Int), b2: (Int, Int)): (Int, Int) = (Math.min(b1._1, 
b2._1), Math.max(b1._2, b2._2))
  def bufferEncoder: Encoder[(Int, Int)] = ExpressionEncoder()
  def outputEncoder: Encoder[MinMax] = ExpressionEncoder()
}

val ds = Seq(1, 2, 3, 4).toDF("col1")
val agg = ds.agg(MinMaxAgg().toColumn.alias("minmax"))
{code}

bq. {code}
ds: org.apache.spark.sql.DataFrame = [col1: int]
agg: org.apache.spark.sql.DataFrame = [minmax: struct]
{code}

{code}agg.printSchema(){code}

bq. {code}
root
 |-- minmax: struct (nullable = true)
 ||-- min: integer (nullable = false)
 ||-- max: integer (nullable = false)
{code}

{code}agg.head(){code}

bq. {code}
res1: org.apache.spark.sql.Row = [[1,4]]
{code}

{code}agg.head().getAs[MinMax](0){code}

bq. {code}
java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
to line4c81e18af34342cda654c381ee91139525.$read$$iw$$iw$$iw$$iw$MinMax
[...]
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-17668) Support representing structs with case classes and tuples in spark sql udf inputs

2017-01-09 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-17668:
--
Comment: was deleted

(was: I experience the same issue with a custom Aggregator having a case class 
as an output type. The resulting aggregate is a Row, that I can't cast back to 
my case class.
I provided an outputEncoder, using Encoders.product[T].



)

> Support representing structs with case classes and tuples in spark sql udf 
> inputs
> -
>
> Key: SPARK-17668
> URL: https://issues.apache.org/jira/browse/SPARK-17668
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: koert kuipers
>Priority: Minor
>
> after having gotten used to having case classes represent complex structures in 
> Datasets, I am surprised to find out that when I work in DataFrames with UDFs 
> no such magic exists, and I have to fall back to manipulating Row objects, 
> which is error-prone and somewhat ugly.
> for example:
> {noformat}
> case class Person(name: String, age: Int)
> val df = Seq((Person("john", 33), 5), (Person("mike", 30), 6)).toDF("person", 
> "id")
> val df1 = df.withColumn("person", udf({ (p: Person) => p.copy(age = p.age + 
> 1) }).apply(col("person")))
> df1.printSchema
> df1.show
> {noformat}
> leads to:
> {noformat}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
> to Person
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17668) Support representing structs with case classes and tuples in spark sql udf inputs

2017-01-05 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802399#comment-15802399
 ] 

Mathieu D commented on SPARK-17668:
---

I experience the same issue with a custom Aggregator having a case class as an 
output type. The resulting aggregate is a Row that I can't cast back to my 
case class.
I provided an outputEncoder, using Encoders.product[T].





> Support representing structs with case classes and tuples in spark sql udf 
> inputs
> -
>
> Key: SPARK-17668
> URL: https://issues.apache.org/jira/browse/SPARK-17668
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: koert kuipers
>Priority: Minor
>
> after having gotten used to having case classes represent complex structures in 
> Datasets, I am surprised to find out that when I work in DataFrames with UDFs 
> no such magic exists, and I have to fall back to manipulating Row objects, 
> which is error-prone and somewhat ugly.
> for example:
> {noformat}
> case class Person(name: String, age: Int)
> val df = Seq((Person("john", 33), 5), (Person("mike", 30), 6)).toDF("person", 
> "id")
> val df1 = df.withColumn("person", udf({ (p: Person) => p.copy(age = p.age + 
> 1) }).apply(col("person")))
> df1.printSchema
> df1.show
> {noformat}
> leads to:
> {noformat}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
> to Person
> {noformat}
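
(A workaround sketch for the UDF case described above, using its {{Person}} 
example: the function takes the struct as a Row and rebuilds the case class by 
hand, which is exactly the boilerplate this ticket wants to avoid.)

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}

// Accept the struct column as a Row and reconstruct Person manually.
val bump = udf { (p: Row) => Person(p.getAs[String]("name"), p.getAs[Int]("age") + 1) }
val df1 = df.withColumn("person", bump(col("person")))
{code}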



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

2016-12-20 Thread Mathieu D (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu D updated SPARK-18881:
--
Description: 
We have a Spark application that continuously processes a lot of incoming jobs. 
Several jobs are processed in parallel, on multiple threads.

During intensive workloads, at some point, we start to see hundreds of 
warnings like this:

{code}
16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610
16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405
16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622
{code}

Starting from that, the performance of the app plummets, and most Stages and Jobs 
never finish. On SparkUI, I can see figures like 13000 pending jobs.

I can't clearly see another related exception happening before. Maybe this one, 
but it concerns another listener:

{code}
16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no 
remaining room in event queue. This likely means one of the SparkListeners is 
too slow and cannot keep up with the rate at which tasks are being started by 
the scheduler.
16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu 
Jan 01 01:00:00 CET 1970
{code}

This is very problematic for us, since it's hard to detect, and requires an app 
restart.


*EDIT :*

I confirm the sequence:
1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room 
in event queue
then
2- JobProgressListener losing track of jobs and stages.




  was:
We have a Spark application that process continuously a lot of incoming jobs. 
Several jobs are processed in parallel, on multiple threads.

During intensive workloads, at some point, we start to have hundreds of  
warnings like this :

{code}
16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610
16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405
16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622
{code}

Starting from that, the performance of the app plummet, most of Stages and Jobs 
never finish. On SparkUI, I can see figures like 13000 pending jobs.

I can't see clearly another related exception happening before. Maybe this one, 
but it concerns another listener :

{code}
16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no 
remaining room in event queue. This likely means one of the SparkListeners is 
too slow and cannot keep up with the rate at which tasks are being started by 
the scheduler.
16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu 
Jan 01 01:00:00 CET 1970
{code}

This is very problematic for us, since it's hard to detect, and requires an app 
restart.


> Spark never finishes jobs and stages, JobProgressListener fails
> ---
>
> Key: SPARK-18881
> URL: https://issues.apache.org/jira/browse/SPARK-18881
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
> Environment: yarn, deploy-mode = client
>Reporter: Mathieu D
>
> We have a Spark application that continuously processes a lot of incoming jobs. 
> Several jobs are processed in parallel, on multiple threads.
> During intensive workloads, at some point, we start to see hundreds of 
> warnings like this:
> {code}
> 16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
> 16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 
> 64610
> 16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 
> 147405
> 16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
> 16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 
> 64622
> {code}
> Starting from that, the performance of the app plummets, and most Stages and 
> Jobs never finish. On SparkUI, I can see figures like 13000 pending jobs.
> I can't clearly see another related exception happening before. Maybe this 
> one, but it concerns another listener:
> {code}
> 16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because 
> no remaining room in event queue. This likely means one of the SparkListeners 
> is too slow and cannot keep up with the rate at which tasks are being started 
> by the scheduler.
> 16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since 
> Thu Jan 01 01:00:00 CET 1970
> {code}
> This is very problematic for us, since it's hard to detect, and requires an 
> app restart.

[jira] [Commented] (SPARK-18883) FileNotFoundException on _temporary directory

2016-12-20 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763747#comment-15763747
 ] 

Mathieu D commented on SPARK-18883:
---

So far, the problem does not appear with 
mapreduce.fileoutputcommitter.algorithm.version=2.
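
(For reference, one way to apply this setting, through Spark's 
{{spark.hadoop.*}} passthrough into the Hadoop configuration; a sketch, not 
necessarily how it was done here:)

{code}
import org.apache.spark.sql.SparkSession

// spark.hadoop.* keys are copied into the underlying Hadoop Configuration.
val spark = SparkSession.builder()
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()
{code}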

> FileNotFoundException on _temporary directory 
> --
>
> Key: SPARK-18883
> URL: https://issues.apache.org/jira/browse/SPARK-18883
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
> Environment: We're on a CDH 5.7, Hadoop 2.6.
>Reporter: Mathieu D
>
> I'm experiencing the following exception, usually after some time under heavy 
> load:
> {code}
> 16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> java.io.FileNotFoundException: File 
> hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
> at 
> org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
> at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
> at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
> at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
> at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
> at 
> com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36)
> at 
> com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23)
> at 
> com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33)
> at 
> com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13)
> at 
> 

[jira] [Commented] (SPARK-18883) FileNotFoundException on _temporary directory

2016-12-15 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751582#comment-15751582
 ] 

Mathieu D commented on SPARK-18883:
---

As suggested by [~steve_l], I'm going to try 
mapreduce.fileoutputcommitter.algorithm.version=2 and update this ticket.

> FileNotFoundException on _temporary directory 
> --
>
> Key: SPARK-18883
> URL: https://issues.apache.org/jira/browse/SPARK-18883
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
> Environment: We're on a CDH 5.7, Hadoop 2.6.
>Reporter: Mathieu D
>
> I'm experiencing the following exception, usually after some time under heavy 
> load:
> {code}
> 16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> java.io.FileNotFoundException: File 
> hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
> at 
> org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
> at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
> at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
> at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
> at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
> at 
> com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36)
> at 
> com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23)
> at 
> com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33)
> at 
> 

[jira] [Commented] (SPARK-18512) FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and S3A

2016-12-15 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751577#comment-15751577
 ] 

Mathieu D commented on SPARK-18512:
---

[SPARK-18883]

> FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and 
> S3A
> 
>
> Key: SPARK-18512
> URL: https://issues.apache.org/jira/browse/SPARK-18512
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.1
> Environment: AWS EMR 5.0.1
> Spark 2.0.1
> S3 EU-West-1 (S3A)
>Reporter: Giuseppe Bonaccorso
>
> After a few hours of streaming processing and data saving in Parquet format, 
> I always got this exception:
> {code:java}
> java.io.FileNotFoundException: No such file or directory: 
> s3a://xxx/_temporary/0/task_
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1004)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:745)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:426)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:362)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
>   at 
> org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
>   at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
> {code}
> I've also tried s3:// and s3n://, but it always happens after 3-5 hours. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18883) FileNotFoundException on _temporary directory

2016-12-15 Thread Mathieu D (JIRA)
Mathieu D created SPARK-18883:
-

 Summary: FileNotFoundException on _temporary directory 
 Key: SPARK-18883
 URL: https://issues.apache.org/jira/browse/SPARK-18883
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.2
 Environment: We're on a CDH 5.7, Hadoop 2.6.
Reporter: Mathieu D


I'm experiencing the following exception, usually after some time under heavy 
load:
{code}
16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
java.io.FileNotFoundException: File 
hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist.
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
at 
org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at 
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at 
org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
at 
com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36)
at 
com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23)
at 
com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33)
at 
com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13)
at 
com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21)
at 
com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at 

[jira] [Comment Edited] (SPARK-18512) FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and S3A

2016-12-15 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751329#comment-15751329
 ] 

Mathieu D edited comment on SPARK-18512 at 12/15/16 1:17 PM:
-

I'm experiencing a very similar problem, but we're using HDFS, not S3, and 
it's not a streaming app. 
Like Giuseppe, this usually happens after some time under heavy load.

{code}
16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
java.io.FileNotFoundException: File 
hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist.
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
at 
org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at 
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at 
org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
at 
com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36)
at 
com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23)
at 
com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33)
at 
com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13)
at 
com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21)
at 
com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at 
scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at 

[jira] [Commented] (SPARK-18512) FileNotFoundException on _temporary directory with Spark Streaming 2.0.1 and S3A

2016-12-15 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751329#comment-15751329
 ] 

Mathieu D commented on SPARK-18512:
---

I'm experiencing the same problem, but we're using HDFS, not S3. 

{code}
16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
java.io.FileNotFoundException: File hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
at com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36)
at com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23)
at com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33)
at com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13)
at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21)
at com.bluedme.woda.cmd.ImmediateCommandImpl$$anonfun$run$1.apply(CommandImpl.scala:21)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
{code}
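
For what it's worth, the failing call is a plain parquet append. Stripped down, it is roughly this (a sketch on my side: the path comes from the log above and the method names from the trace, the rest is assumption):

{code}
import org.apache.spark.sql.{DataFrame, SaveMode}

// Minimal sketch of what RfsRepository.append boils down to, judging from
// the trace: DataFrameWriter.parquet runs InsertIntoHadoopFsRelationCommand,
// whose commitJob lists <path>/_temporary/0 and fails if it is already gone.
def append(df: DataFrame): Unit = {
  df.write
    .mode(SaveMode.Append)
    .parquet("hdfs://nameservice1/user/xdstore/rfs/rfsDB")
}
{code}

Since we run several jobs in parallel, my suspicion is two concurrent writes into the same output directory: the first job's commitJob deletes _temporary when it finishes, and a second job committing into the same path then fails to list _temporary/0 exactly like above. Not confirmed yet, though.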

[jira] [Created] (SPARK-18881) Spark never finishes jobs and stages, JobProgressListener fails

2016-12-15 Thread Mathieu D (JIRA)
Mathieu D created SPARK-18881:
-

 Summary: Spark never finishes jobs and stages, JobProgressListener 
fails
 Key: SPARK-18881
 URL: https://issues.apache.org/jira/browse/SPARK-18881
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.2
 Environment: yarn, deploy-mode = client
Reporter: Mathieu D


We have a Spark application that continuously processes a lot of incoming jobs. 
Several jobs are processed in parallel, on multiple threads.

During intensive workloads, at some point, we start to see hundreds of 
warnings like this:

{code}
16/12/14 21:04:03 WARN JobProgressListener: Task end for unknown stage 147379
16/12/14 21:04:03 WARN JobProgressListener: Job completed for unknown job 64610
16/12/14 21:04:04 WARN JobProgressListener: Task start for unknown stage 147405
16/12/14 21:04:04 WARN JobProgressListener: Task end for unknown stage 147406
16/12/14 21:04:04 WARN JobProgressListener: Job completed for unknown job 64622
{code}

From that point on, the performance of the app plummets, and most Stages and 
Jobs never finish. On the Spark UI, I can see figures like 13000 pending jobs.

I can't clearly see another related exception happening earlier. Maybe this 
one, but it concerns another listener:

{code}
16/12/14 21:03:54 ERROR LiveListenerBus: Dropping SparkListenerEvent because no 
remaining room in event queue. This likely means one of the SparkListeners is 
too slow and cannot keep up with the rate at which tasks are being started by 
the scheduler.
16/12/14 21:03:54 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu 
Jan 01 01:00:00 CET 1970
{code}

This is very problematic for us, since it's hard to detect, and requires an app 
restart.
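
For anyone who wants to reproduce the event dropping, a deliberately slow listener is enough to fill the queue (a sketch, assuming a spark-shell session; the SlowListener class and the 100 ms sleep are mine, not something from our app):

{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Pathologically slow listener: 100 ms per task-end event. A burst of many
// short tasks fills the bounded listener-bus queue faster than it drains,
// and LiveListenerBus starts dropping SparkListenerEvents as logged above.
class SlowListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    Thread.sleep(100)
  }
}

sc.addSparkListener(new SlowListener)
sc.parallelize(1 to 100000, numSlices = 2000).count() // many short tasks
{code}

Once events are dropped, a listener that tracks jobs and stages by id has no way to recover, which would explain the "unknown stage" / "unknown job" warnings above.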



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17168) CSV with header is incorrectly read if file is partitioned

2016-08-20 Thread Mathieu D (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429443#comment-15429443
 ] 

Mathieu D commented on SPARK-17168:
---

This is error-prone, because the scenario I show will drop rows while reading 
CSV, depending on the number of partitions the file has.

> CSV with header is incorrectly read if file is partitioned
> --
>
> Key: SPARK-17168
> URL: https://issues.apache.org/jira/browse/SPARK-17168
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Mathieu D
>Priority: Minor
>
> If a CSV file is stored in a partitioned fashion, DataFrameReader.csv with 
> the header option set to true skips the first line of *each partition* 
> instead of only the first line of the file.
> ex:
> {code}
> // create a partitioned CSV file with header : 
> val rdd=sc.parallelize(Seq("hdr","1","2","3","4","5","6"), numSlices=2)
> rdd.saveAsTextFile("foo")
> {code}
> Now, if we try to read it with DataFrameReader, the first row of the 2nd 
> partition is skipped.
> {code}
> val df = spark.read.option("header","true").csv("foo")
> df.show
> +---+
> |hdr|
> +---+
> |  1|
> |  2|
> |  4|
> |  5|
> |  6|
> +---+
> // one row is missing
> {code}
> I more or less understand that this is meant to be consistent with the save 
> operation of DataFrameWriter, which writes a header on each individual 
> partition.
> But this is very error-prone. In our case, we have large CSV files with 
> headers already stored in a partitioned way, so we will lose rows if we read 
> with header set to true. So we have to handle the headers manually.
> I suggest a tri-valued option for header, with something like 
> "skipOnFirstPartition".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17168) CSV with header is incorrectly read if file is partitioned

2016-08-20 Thread Mathieu D (JIRA)
Mathieu D created SPARK-17168:
-

 Summary: CSV with header is incorrectly read if file is partitioned
 Key: SPARK-17168
 URL: https://issues.apache.org/jira/browse/SPARK-17168
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Mathieu D
Priority: Minor


If a CSV file is stored in a partitioned fashion, DataFrameReader.csv with the 
header option set to true skips the first line of *each partition* instead of 
only the first line of the file.

ex:
{code}
// create a partitioned CSV file with header : 
val rdd=sc.parallelize(Seq("hdr","1","2","3","4","5","6"), numSlices=2)
rdd.saveAsTextFile("foo")
{code}

Now, if we try to read it with DataFrameReader, the first row of the 2nd 
partition is skipped.

{code}
val df = spark.read.option("header","true").csv("foo")
df.show
+---+
|hdr|
+---+
|  1|
|  2|
|  4|
|  5|
|  6|
+---+
// one row is missing
{code}

I more or less understand that this is meant to be consistent with the save 
operation of DataFrameWriter, which writes a header on each individual 
partition.
But this is very error-prone. In our case, we have large CSV files with headers 
already stored in a partitioned way, so we will lose rows if we read with 
header set to true. So we have to handle the headers manually.
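
What "handle the headers manually" looks like on our side, roughly (a sketch; it assumes the header row can never be mistaken for a data row):

{code}
// Read without header handling, grab the header row, then drop every
// occurrence of it (one copy per partition) before applying column names.
val raw    = spark.read.option("header", "false").csv("foo")
val header = raw.first()
val names  = header.toSeq.map(_.toString)
val df     = raw.filter(_ != header).toDF(names: _*)
df.show
{code}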

I suggest a tri-valued option for header, with something like 
"skipOnFirstPartition".




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16938) Cannot resolve column name after a join

2016-08-07 Thread Mathieu D (JIRA)
Mathieu D created SPARK-16938:
-

 Summary: Cannot resolve column name after a join
 Key: SPARK-16938
 URL: https://issues.apache.org/jira/browse/SPARK-16938
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Mathieu D
Priority: Minor


Found a change of behavior in spark-2.0.0 which breaks a query in our code 
base.

The following works on previous Spark versions, from 1.6.1 up to 2.0.0-preview:
{code}
val dfa = Seq((1, 2), (2, 3)).toDF("id", "a").alias("dfa")
val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").alias("dfb")
dfa.join(dfb, dfa("id") === dfb("id")).dropDuplicates(Array("dfa.id", "dfb.id"))
{code}

but fails on spark-2.0.0 with the following exception:
{code}
Cannot resolve column name "dfa.id" among (id, a, id, b); 
org.apache.spark.sql.AnalysisException: Cannot resolve column name "dfa.id" among (id, a, id, b);
at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36$$anonfun$apply$12.apply(Dataset.scala:1819)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1818)
at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1$$anonfun$36.apply(Dataset.scala:1817)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1817)
at org.apache.spark.sql.Dataset$$anonfun$dropDuplicates$1.apply(Dataset.scala:1814)
at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:2594)
at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1814)
at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:1840)
...
{code}
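
A workaround for us in the meantime is to make the joined schema unambiguous up front, instead of relying on the dfa/dfb aliases (a sketch based on the snippet above; it sidesteps the alias resolution rather than fixing it):

{code}
// Rename the key columns before the join; the joined schema then contains
// no duplicate names, so dropDuplicates can resolve them without aliases.
val dfa = Seq((1, 2), (2, 3)).toDF("id", "a").withColumnRenamed("id", "ida")
val dfb = Seq((1, 0), (1, 1)).toDF("id", "b").withColumnRenamed("id", "idb")
val joined = dfa.join(dfb, dfa("ida") === dfb("idb"))
joined.dropDuplicates(Array("ida", "idb")).show
{code}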





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org