[jira] [Updated] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21849:

Issue Type: Improvement  (was: Bug)

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Minor
>
> Make sure the `close` function is always called in the `serialize` function.






[jira] [Created] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread DjvuLee (JIRA)
DjvuLee created SPARK-21849:
---

 Summary: Make the serializer function more robust
 Key: SPARK-21849
 URL: https://issues.apache.org/jira/browse/SPARK-21849
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: DjvuLee
Priority: Minor


Make sure the `close` function is always called in the `serialize` function.
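
A minimal sketch of the intent (illustrative only, not the actual Spark code; the helper name is hypothetical): wrap the stream handling in try/finally so `close` runs even when writing the object throws.

{code}
import java.io.ByteArrayOutputStream
import scala.reflect.ClassTag
import org.apache.spark.serializer.SerializerInstance

// Hypothetical helper, shown only to illustrate the try/finally pattern.
def robustSerialize[T: ClassTag](instance: SerializerInstance, value: T): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val stream = instance.serializeStream(bytes)
  try {
    stream.writeObject(value)
  } finally {
    stream.close()  // guaranteed to run even if writeObject throws
  }
  bytes.toByteArray
}
{code}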






[jira] [Commented] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120959#comment-16120959
 ] 

DjvuLee commented on SPARK-21682:
-

Yes, our company has also faced this scalability problem; the driver can 
easily die with a 70K-partition RDD.
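
For reference, a minimal repro sketch along the lines of the description quoted below (the partition count is the one the report uses; this is only a sketch run in spark-shell):

{code}
// Sketch of the reported scenario: caching and counting a very high-partition-count
// RDD drives the driver into long GC pauses as block-status updates pile up.
val rdd = sc.parallelize(1 to 100000, 100000)
rdd.cache()
rdd.count()
{code}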

> Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)
> 
>
> Key: SPARK-21682
> URL: https://issues.apache.org/jira/browse/SPARK-21682
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.1, 2.2.0
>Reporter: Ryan Williams
>
> h3. Summary
> * {{sc.parallelize(1 to 100000, 100000).cache.count}} causes a driver GC 
> stall midway through on every configuration and version I've tried in 2.x.
> * It runs fine with no Full GCs as of 1.6.3
> * I think that {{internal.metrics.updatedBlockStatuses}} is the culprit, and 
> breaks a contract about what big-O sizes accumulators' values can be:
> ** they are each of size O(P), where P is the number of partitions in a 
> cached RDD
> ** ⇒ the driver must process O(P²) data from {{TaskEnd}} events, instead of 
> O(P)
> ** ⇒ the driver also must process O(P*E) work every 10s from 
> {{ExecutorMetricsUpdates}} (where E is the number of executors; cf. 
> {{spark.executor.heartbeatInterval}})
> * when operating on a 100k-partition cached RDD, the driver enters a GC loop 
> due to all the allocations it must do to process {{ExecutorMetricsUpdate}} 
> and {{TaskEnd}} events with {{updatedBlockStatuses}} attached
> * this metric should be disabled, or some ability to blacklist it from the 
> command-line should be added.
> * [SPARK-20084|https://issues.apache.org/jira/browse/SPARK-20084] addressed 
> one part of this - the event-log size had exploded - but the root problem 
> still exists / is worse
> h3. {{count}} a 100k-partition RDD: works fine without {{.cache}}
> In Spark 2.2.0 or 2.1.1:
> {code}
> spark-shell --conf spark.driver.extraJavaOptions="-XX:+PrintGCDetails 
> -XX:+PrintGCTimeStamps -verbose:gc"
> scala> val rdd = sc.parallelize(1 to 100000, 100000)
> scala> rdd.count
> {code}
> In YARN and local modes, this finishes in ~20 seconds with ~20 partial GCs 
> logged, all taking under 0.1s ([example 
> output|https://gist.github.com/ryan-williams/a3d02ea49d2b5021b53e1680f8fedc37#file-local-mode-rdd-not-cached-works-fine]);
>  all is well!
> h3. {{count}} a 100k-partition cached RDD: GC-dies
> If we {{cache}} the RDD first, the same {{count}} job quickly sends the 
> driver into a GC death spiral: full GC's start after a few thousand tasks and 
> increase in frequency and length until they last minutes / become continuous 
> (and, in YARN, the driver loses contact with any executors).
> Example outputs: 
> [local|https://gist.github.com/ryan-williams/a3d02ea49d2b5021b53e1680f8fedc37#file-local-mode-rdd-cached-crashes],
>  
> [YARN|https://gist.github.com/ryan-williams/a3d02ea49d2b5021b53e1680f8fedc37#file-yarn-mode-cached-rdd-dies].
> The YARN example removes any confusion about whether the storing of the 
> blocks is causing memory pressure on the driver; the driver is basically 
> doing no work except receiving executor updates and events, and yet it 
> becomes overloaded. 
> h3. Can't effectively throw driver heap at the problem
> I've tested with 1GB, 10GB, and 20GB heaps, and the larger heaps do what we'd 
> expect: delay the first Full GC, and make Full GCs longer when they happen. 
> I don't have a clear sense on whether the onset is linear or quadratic (i.e. 
> do I get twice as far into the job before the first Full GC with a 20GB as 
> with a 10GB heap, or only sqrt(2) times as far?).
> h3. Mostly memory pressure, not OOMs
> An interesting note is that I'm rarely seeing OOMs as a result of this, even 
> on small heaps.
> I think this is consistent with the idea that all this data is being 
> immediately discarded by the driver, as opposed to kept around to serve web 
> UIs or somesuch.
> h3. Eliminating {{ExecutorMetricsUpdate}}'s doesn't seem to help
> Interestingly, setting large values of {{spark.executor.heartbeatInterval}} 
> doesn't seem to mitigate the problem; GC-stall sets in at about the same 
> point in the {{count}} job.
> This implies that, in this example, the {{TaskEnd}} events are doing most or 
> all of the damage.
> h3. CMS helps but doesn't solve the problem
> In some rough testing, I saw the {{count}} get about twice as far before 
> dying when using the CMS collector.
> h3. What bandwidth do we expect the driver to process events at?
> IIUC, every 10s the driver gets O(T) (~100k?) block updates from each of ~500 
> executors, and allocating objects for these updates is pushing it over a 
> tipping point where it can't keep up. 
> I don't know how to get good numbers on how much data the driver is 
> 

[jira] [Commented] (SPARK-21547) Spark cleaner cost too many time

2017-07-29 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106292#comment-16106292
 ] 

DjvuLee commented on SPARK-21547:
-

OK, I will try it and post the result later.



> Spark cleaner cost too many time
> 
>
> Key: SPARK-21547
> URL: https://issues.apache.org/jira/browse/SPARK-21547
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0
>Reporter: DjvuLee
>
> Spark Streaming sometimes spends a long time on cleaning, and this can 
> become worse when dynamic allocation is enabled.
> I posted the driver's log in the following comments; it shows that the 
> cleaner takes more than 2 minutes.






[jira] [Commented] (SPARK-21547) Spark cleaner cost too many time

2017-07-27 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103459#comment-16103459
 ] 

DjvuLee commented on SPARK-21547:
-

Yes, I agree that this is related to the workload, but doing nothing for 
about 3 minutes is too long for a streaming application.

My proposal is that we inspect whether the current cleaner strategy is 
good enough.
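
For what it's worth, the existing cleaner-related settings below could be part of such an inspection; the values shown are illustrative, not a recommendation from this thread:

{code}
# Illustrative spark-submit/spark-shell flags only; tune per workload.
--conf spark.cleaner.referenceTracking.blocking=true           # whether non-shuffle cleanup blocks the cleaner thread
--conf spark.cleaner.referenceTracking.blocking.shuffle=false  # whether shuffle cleanup blocks as well
--conf spark.cleaner.periodicGC.interval=30min                 # how often the driver triggers a GC to find unreachable objects
{code}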

> Spark cleaner cost too many time
> 
>
> Key: SPARK-21547
> URL: https://issues.apache.org/jira/browse/SPARK-21547
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0
>Reporter: DjvuLee
>
> Spark Streaming sometimes spends a long time on cleaning, and this can 
> become worse when dynamic allocation is enabled.
> I posted the driver's log in the following comments; it shows that the 
> cleaner takes more than 2 minutes.






[jira] [Updated] (SPARK-21547) Spark cleaner cost too many time

2017-07-27 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21547:

Description: 
Spark Streaming sometimes spends a long time on cleaning, and this can 
become worse when dynamic allocation is enabled.

I posted the driver's log in the following comments; it shows that the cleaner 
takes more than 2 minutes.

  was:
Spark Streaming sometime cost so many time deal with cleaning, and this can 
become worse when enable the dynamic allocation.

I post the Driver Log in the following, in this log we can find that the 
cleaner cost more than 2min.


> Spark cleaner cost too many time
> 
>
> Key: SPARK-21547
> URL: https://issues.apache.org/jira/browse/SPARK-21547
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0
>Reporter: DjvuLee
>
> Spark Streaming sometimes spends a long time on cleaning, and this can 
> become worse when dynamic allocation is enabled.
> I posted the driver's log in the following comments; it shows that the 
> cleaner takes more than 2 minutes.






[jira] [Updated] (SPARK-21547) Spark cleaner cost too many time

2017-07-27 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21547:

Description: 
Spark Streaming sometimes spends a long time on cleaning, and this can 
become worse when dynamic allocation is enabled.

I posted the driver log below; in this log we can see that the 
cleaner takes more than 2 minutes.

  was:Spark Streaming sometime cost so many time deal with cleaning, and this 
can become worse when enable the dynamic allocation.


> Spark cleaner cost too many time
> 
>
> Key: SPARK-21547
> URL: https://issues.apache.org/jira/browse/SPARK-21547
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0
>Reporter: DjvuLee
>
> Spark Streaming sometimes spends a long time on cleaning, and this can 
> become worse when dynamic allocation is enabled.
> I posted the driver log below; in this log we can see that the 
> cleaner takes more than 2 minutes.






[jira] [Commented] (SPARK-21547) Spark cleaner cost too many time

2017-07-27 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103005#comment-16103005
 ] 

DjvuLee commented on SPARK-21547:
-

17/07/27 11:29:51 INFO TaskSetManager: Finished task 169.0 in stage 1504.0 (TID 
1504369) in 43975 ms on n6-195-137.byted.org (999/1000)
17/07/27 11:29:55 INFO TaskSetManager: Finished task 882.0 in stage 1504.0 (TID 
1504905) in 44153 ms on n6-195-137.byted.org (1000/1000)
17/07/27 11:29:55 INFO YarnScheduler: Removed TaskSet 1504.0, whose tasks have 
all completed, from pool
17/07/27 11:29:55 INFO DAGScheduler: ResultStage 1504 (call at 
/spark2/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py:2230) finished in 
457.863 s
17/07/27 11:29:55 INFO DAGScheduler: Job 1504 finished: call at 
/spark2/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py:2230, took 
457.877969 s
17/07/27 11:30:02 INFO JobScheduler: Added jobs for time 150112620 ms
17/07/27 11:30:32 INFO JobScheduler: Added jobs for time 150112623 ms
17/07/27 11:31:02 INFO JobScheduler: Added jobs for time 150112626 ms
17/07/27 11:31:32 INFO JobScheduler: Added jobs for time 150112629 ms
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906391
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906392
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906396
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906402
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906404
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492509
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492508
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492507
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492506
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492505
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492504
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492503
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 12492502
...
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906397
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906398
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906395
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906399
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906403
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906400
17/07/27 11:31:53 INFO ContextCleaner: Cleaned accumulator 10906401
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
10.6.131.75:23734 in memory (size: 35.9 KB, free: 2.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n8-157-227.byted.org:13090 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n8-157-158.byted.org:21120 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n6-195-150.byted.org:13277 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n8-156-165.byted.org:35355 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n6-132-023.byted.org:52521 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n8-136-133.byted.org:25696 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n8-150-029.byted.org:34673 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n8-148-038.byted.org:22503 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:31:53 INFO BlockManagerInfo: Removed broadcast_1504_piece0 on 
n8-150-038.byted.org:28209 in memory (size: 35.9 KB, free: 9.4 GB)

...

17/07/27 11:32:01 INFO BlockManagerInfo: Removed broadcast_1442_piece0 on 
n8-163-151.byted.org:33703 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:32:01 INFO BlockManagerInfo: Removed broadcast_1442_piece0 on 
n8-148-028.byted.org:36086 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:32:01 INFO BlockManagerInfo: Removed broadcast_1442_piece0 on 
n8-151-039.byted.org:21081 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:32:01 INFO BlockManagerInfo: Removed broadcast_1442_piece0 on 
n8-157-167.byted.org:29370 in memory (size: 35.9 KB, free: 9.4 GB)
17/07/27 11:32:02 INFO JobScheduler: Added jobs for time 150112632 ms
17/07/27 11:32:32 INFO JobScheduler: Added jobs for time 150112635 ms
17/07/27 11:32:45 INFO JobScheduler: Finished job streaming job 150111696 
ms.0 from job set of time 150111696 ms
17/07/27 11:32:45 INFO JobScheduler: Total delay: 9405.183 s for time 
150111696 ms (execution: 1169.595 s)
17/07/27 11:32:45 INFO JobScheduler: Starting job streaming 

[jira] [Updated] (SPARK-21547) Spark cleaner cost too many time

2017-07-27 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21547:

Description: Spark Streaming sometimes spends a long time on cleaning, 
and this can become worse when dynamic allocation is enabled.

> Spark cleaner cost too many time
> 
>
> Key: SPARK-21547
> URL: https://issues.apache.org/jira/browse/SPARK-21547
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0
>Reporter: DjvuLee
>
> Spark Streaming sometimes spends a long time on cleaning, and this can 
> become worse when dynamic allocation is enabled.






[jira] [Created] (SPARK-21547) Spark cleaner cost too many time

2017-07-27 Thread DjvuLee (JIRA)
DjvuLee created SPARK-21547:
---

 Summary: Spark cleaner cost too many time
 Key: SPARK-21547
 URL: https://issues.apache.org/jira/browse/SPARK-21547
 Project: Spark
  Issue Type: Bug
  Components: DStreams
Affects Versions: 2.0.0
Reporter: DjvuLee









[jira] [Updated] (SPARK-21383) YARN can allocate too many executors

2017-07-17 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21383:

Summary: YARN can allocate too many executors  (was: YARN can allocate to 
many executors)

> YARN can allocate too many executors
> 
>
> Key: SPARK-21383
> URL: https://issues.apache.org/jira/browse/SPARK-21383
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>
> The YarnAllocator doesn't properly track containers that have been launched but are not 
> yet running.  If it takes time to launch the containers on the NM, they don't 
> show up in numExecutorsRunning, but they are already out of the pending list, 
> so if the allocateResources call happens again it can think it has missing 
> executors even when it doesn't (they just haven't been launched yet).
> This was introduced by SPARK-12447.
> Here it checks for missing executors:
> https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L297
> numExecutorsRunning is only updated after the NM has started the container:
> https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L524
> Thus if the NM or the network is slow, it can miscount and start 
> additional executors.
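
A hedged sketch of the miscount described above (the variable names are assumptions for illustration, not quotes from YarnAllocator):

{code}
// Sketch only: containers that are allocated but still being launched on the NM
// are counted in neither the pending nor the running total, so `missing` is
// overestimated and extra containers get requested.
val missing = targetNumExecutors - numPendingAllocate - numExecutorsRunning
{code}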






[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-14 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21082:

Affects Version/s: (was: 2.2.1)
   2.3.0

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.3.0
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-14 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049965#comment-16049965
 ] 

DjvuLee commented on SPARK-21082:
-

Data locality, task input size, and scheduling order all matter a lot, even when all the 
nodes have the same computation capacity.


Suppose there are two Executors with the same computation capacity and four tasks 
with input sizes 10 GB, 3 GB, 10 GB, and 20 GB. 
There is then a chance that one Executor will cache 30 GB and the other 13 GB 
under the current scheduling policy. 
If each Executor has only 25 GB of memory for storage, then not all the data can be 
cached in memory.

I will give a more detailed description of the proposal if this seems OK so far.
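
To spell out the arithmetic in that example (the numbers are the ones above; the split is just one possible outcome under a memory-oblivious policy):

{code}
// Toy illustration only, not Spark scheduler code.
val executorA = Seq(10, 20)  // 30 GB of cached partitions end up on A
val executorB = Seq(3, 10)   // 13 GB end up on B
// With 25 GB of storage memory per executor, A cannot keep all of its
// partitions cached, even though the 43 GB total would fit in 2 x 25 GB
// if the assignment had been memory-aware.
{code}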

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Comment Edited] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-14 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049933#comment-16049933
 ] 

DjvuLee edited comment on SPARK-21082 at 6/15/17 2:47 AM:
--

This is not really a fast-node versus slow-node problem.

Even when all the nodes have equal computation power, there are many factors that 
affect the data cached by Executors, such as the data locality of the task's 
input, the network, the scheduling order, etc.

`it is reasonable to schedule more tasks on to fast node.`
But in fact it schedules more tasks to idle Executors. The scheduler has no 
notion of fast or slow for each Executor; it mostly considers locality and 
idleness.

I agree that it is better not to change the code, but I cannot find any 
configuration that solves the problem.
Is there any good solution to keep the used memory balanced across Executors? 


was (Author: djvulee):
Not a really fast node and slow node problem.

Even all the nodes have equal computation power, but there are lots of factor 
affect the data cached by Executors. Such as the data locality for the task's 
input, the network, and scheduling order etc.

`it is reasonable to schedule more tasks on to fast node.`
but the fact is schedule more tasks to ideal Executors. Scheduler has no 
meaning of fast or slow for each Executor, it considers more about locality and 
idle.

I agree that it is better not to change the code, but I can not find any 
configuration to solve the problem.
Is there any good solution to keep the used memory balanced across the 
Executors? 

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-14 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049933#comment-16049933
 ] 

DjvuLee commented on SPARK-21082:
-

This is not really a fast-node versus slow-node problem.

Even when all the nodes have equal computation power, there are many factors that 
affect the data cached by Executors, such as the data locality of the task's 
input, the network, the scheduling order, etc.

`it is reasonable to schedule more tasks on to fast node.`
But in fact it schedules more tasks to idle Executors. The scheduler has no 
notion of fast or slow for each Executor; it mostly considers locality and 
idleness.

I agree that it is better not to change the code, but I cannot find any 
configuration that solves the problem.
Is there any good solution to keep the used memory balanced across the 
Executors? 

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-14 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21082:

Description: 
 The Spark scheduler does not consider memory usage when dispatching tasks. This 
can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
the memory usage well enough (especially when the RDD type is not flat), so the 
scheduler may dispatch too many tasks to one executor.

We can offer a configuration for users to decide whether the scheduler should 
consider memory usage.

  was:
 Spark Scheduler do not consider the memory usage during dispatch tasks, this 
can lead to Executor OOM if the RDD is cached sometimes, because Spark can not 
estimate the memory usage enough well(especially when the RDD type is not 
flatten), scheduler may dispatch so many tasks on one Executor.

We can offer a configuration for user to decide whether scheduler will consider 
the memory usage.


> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-14 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048835#comment-16048835
 ] 

DjvuLee commented on SPARK-21082:
-

Yes, one of the reasons why Spark does not balance tasks well enough is 
data locality.

Considering data locality is good in most cases, but when we want to cache an RDD 
and analyze it many times,
memory balance is more important than keeping data locality while loading the data.
If we cannot get all of these considerations right, offering a 
configuration to users is valuable when dealing with memory.

I will open a pull request soon if this suggestion does not look flawed at first 
sight.

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Comment Edited] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048653#comment-16048653
 ] 

DjvuLee edited comment on SPARK-21082 at 6/14/17 3:15 AM:
--

[~srowen] This situation occurs when the partition count is larger than the 
number of CPU cores. 

Suppose there are 1000 partitions and 100 CPU cores, and we want to cache an RDD 
across all the Executors.
If one Executor executes tasks faster at first, then the scheduler will 
dispatch more tasks to it, 
so after all the tasks are scheduled, some Executors will have used all of their 
storage memory while others use only a little.
The Executors that used more memory may not be able to cache all the RDD partitions 
scheduled on them, because there is no more memory left for some tasks.
In this situation we cannot cache all the partitions, even though we have enough 
memory overall.

What's more, if some Executors hit OOM during the following computation, the 
scheduler may dispatch tasks to Executors that have no more storage memory, which 
can sometimes lead to more and more OOM if Spark cannot estimate the memory 
well enough.

If the scheduler instead tried to schedule tasks to the Executors with more free 
memory, it could ease this situation.


Maybe we can use `coalesce` to decrease the partition count, but this is 
not good enough for speculation.


was (Author: djvulee):
[~srowen] This situation occurred when the partition number is larger than the 
CPU core. 

Consider there are 1000 partition and 100 CPU core, we want cache RDD among all 
the Executors.
If  one  Executor executes tasks fast at first time, then the scheduler will 
dispatch more tasks to it, 
so after all the tasks is scheduled, some Executors will used all the storage 
memory, but some Executors just use few memory,
Executors which used more memory may not cache all the RDD partition scheduled 
on it, because there is no more memory for some tasks.
Under this situation, we can not cache all the partition even we have enough 
memory.

What's more, if some Executors occurred OOM during following compute, the 
scheduler may dispatch tasks to Executor which have no more storage memory, and 
sometimes can lead to more and more OOM if Spark can not estimate the memory.

But if the scheduler try to schedule tasks to Executors which own more free 
memory can ease this situation.


Maybe we can use the `coalesce` to decrease the partition number, but this is 
not good enough for speculating.

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048654#comment-16048654
 ] 

DjvuLee commented on SPARK-21082:
-

My idea is to consider the BlockManager information when scheduling, if the user 
specifies the configuration.
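
A very rough sketch of what that information looks like from the driver side (this uses the public `getExecutorMemoryStatus` API; feeding it into the scheduler is only the proposal here, not something Spark does today):

{code}
// Sketch only: rank executors by remaining storage memory as reported by the
// block manager; a memory-aware scheduler could prefer this order over a
// purely round-robin / locality-driven one.
val byFreeMemory = sc.getExecutorMemoryStatus.toSeq   // (executor -> (maxMem, remainingMem))
  .sortBy { case (_, (_, remaining)) => -remaining }
byFreeMemory.foreach { case (executor, (max, remaining)) =>
  println(s"$executor: ${remaining / (1024 * 1024)} MB free of ${max / (1024 * 1024)} MB")
}
{code}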

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048653#comment-16048653
 ] 

DjvuLee commented on SPARK-21082:
-

[~srowen] This situation occurs when the partition count is larger than the 
number of CPU cores. 

Suppose there are 1000 partitions and 100 CPU cores, and we want to cache an RDD 
across all the Executors.
If one Executor executes tasks faster at first, then the scheduler will 
dispatch more tasks to it, 
so after all the tasks are scheduled, some Executors will have used all of their 
storage memory while others use only a little.
The Executors that used more memory may not be able to cache all the RDD partitions 
scheduled on them, because there is no more memory left for some tasks.
In this situation we cannot cache all the partitions, even though we have enough 
memory overall.

What's more, if some Executors hit OOM during the following computation, the 
scheduler may dispatch tasks to Executors that have no more storage memory, which 
can sometimes lead to more and more OOM if Spark cannot estimate the memory.

If the scheduler instead tried to schedule tasks to the Executors with more free 
memory, it could ease this situation.


Maybe we can use `coalesce` to decrease the partition count, but this is 
not good enough for speculation.
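
For concreteness, the `coalesce` workaround mentioned above would look roughly like this (`rdd` stands for the cached RDD in the example, and the partition counts are the illustrative 1000/100 from this comment):

{code}
// Workaround sketch: shrink 1000 partitions down to the 100 available cores so
// each executor caches a more predictable share; as noted above, this interacts
// badly with speculation and is not a real fix.
val coalesced = rdd.coalesce(100)
coalesced.cache().count()
{code}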

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21082:

Description: 
 The Spark scheduler does not consider memory usage when dispatching tasks. This 
can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
the memory usage well enough (especially when the RDD type is not flat), so the 
scheduler may dispatch too many tasks to one executor.

We can offer a configuration for users to decide whether the scheduler should 
consider memory usage.

  was:
 Spark Scheduler do not consider the memory usage during dispatch tasks, this 
can lead to Executor OOM if the RDD is cached sometimes, because Spark can not 
estimate the memory usage enough well(especially when the RDD type is not 
flatten), scheduler may dispatch so many task on one Executor.

We can offer a configuration for user to decide whether scheduler will consider 
the memory usage.


> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048146#comment-16048146
 ] 

DjvuLee commented on SPARK-21082:
-

If this feature sounds like a good suggestion (we do encounter this problem in 
practice), I will open a pull request.

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21082:

Description: 
 The Spark scheduler does not consider memory usage when dispatching tasks. This 
can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
the memory usage well enough (especially when the RDD type is not flat), so the 
scheduler may dispatch too many tasks to one executor.

We can offer a configuration for users to decide whether the scheduler should 
consider memory usage.

  was: Spark Scheduler do not consider the memory usage during dispatch tasks, 
this can lead to Executor OOM if the RDD is cached sometimes, because Spark can 
not estimate the memory usage enough well(especially when the RDD type is not 
flatten). We can offer a configuration for user to decide whether scheduler 
will consider the memory usage to relief the OOM.


> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat), so the 
> scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage.






[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21082:

Description:  The Spark scheduler does not consider memory usage when 
dispatching tasks. This can lead to executor OOM when the RDD is cached, because 
Spark cannot estimate the memory usage well enough (especially when the RDD type 
is not flat). We can offer a configuration for users to decide whether the 
scheduler should consider memory usage, to relieve the OOM.  (was: When 
we cache the )

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can lead to executor OOM when the RDD is cached, because Spark cannot estimate 
> the memory usage well enough (especially when the RDD type is not flat). 
> We can offer a configuration for users to decide whether the scheduler should 
> consider memory usage, to relieve the OOM.






[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21082:

Description: When we cache the 

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
> When we cache the 






[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21082:

Component/s: Scheduler

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>
> When we cache the 






[jira] [Created] (SPARK-21082) Consider the Executor's Memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)
DjvuLee created SPARK-21082:
---

 Summary: Consider the Executor's Memory usage when scheduling task 
 Key: SPARK-21082
 URL: https://issues.apache.org/jira/browse/SPARK-21082
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: DjvuLee









[jira] [Updated] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-13 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21082:

Summary: Consider Executor's memory usage when scheduling task   (was: 
Consider the Executor's Memory usage when scheduling task )

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: DjvuLee
>







[jira] [Commented] (SPARK-21064) Fix the default value bug in NettyBlockTransferServiceSuite

2017-06-12 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046475#comment-16046475
 ] 

DjvuLee commented on SPARK-21064:
-

The default value for `spark.port.maxRetries` is 100, but we use 10
in the suite file.
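
A hedged sketch of the kind of fix implied (the suite's actual code is not quoted here; this only shows making the value explicit instead of relying on an assumed default):

{code}
// Sketch only: pin the retry count explicitly in the test configuration;
// the documented default for spark.port.maxRetries is 100, not 10.
val conf = new org.apache.spark.SparkConf()
  .set("spark.port.maxRetries", "100")
{code}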

> Fix the default value bug in NettyBlockTransferServiceSuite
> ---
>
> Key: SPARK-21064
> URL: https://issues.apache.org/jira/browse/SPARK-21064
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>







[jira] [Created] (SPARK-21064) Fix the default value bug in NettyBlockTransferServiceSuite

2017-06-12 Thread DjvuLee (JIRA)
DjvuLee created SPARK-21064:
---

 Summary: Fix the default value bug in 
NettyBlockTransferServiceSuite
 Key: SPARK-21064
 URL: https://issues.apache.org/jira/browse/SPARK-21064
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: DjvuLee









[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications

2017-06-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040633#comment-16040633
 ] 

DjvuLee commented on SPARK-18085:
-

[~vanzin] Loading the history summary page still takes quite a while. 
Is there any further plan to address this?

> Better History Server scalability for many / large applications
> ---
>
> Key: SPARK-18085
> URL: https://issues.apache.org/jira/browse/SPARK-18085
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
> Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues 
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. 
> I'll be attaching a document shortly describing the issues and suggesting a 
> path to how to solve them.






[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications

2017-06-05 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038210#comment-16038210
 ] 

DjvuLee commented on SPARK-18085:
-

[~vanzin] I want to try your branch. Is all the information the same as what 
you posted above? 

If there is anything I can help with, you can post the work below.

> Better History Server scalability for many / large applications
> ---
>
> Key: SPARK-18085
> URL: https://issues.apache.org/jira/browse/SPARK-18085
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
> Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues 
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. 
> I'll be attaching a document shortly describing the issues and suggesting a 
> path to how to solve them.






[jira] [Updated] (SPARK-18085) Better History Server scalability for many / large applications

2017-03-06 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-18085:


Yes, you're right. I just want to have as little impact as possible.




> Better History Server scalability for many / large applications
> ---
>
> Key: SPARK-18085
> URL: https://issues.apache.org/jira/browse/SPARK-18085
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
> Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues 
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. 
> I'll be attaching a document shortly describing the issues and suggesting a 
> path to how to solve them.






[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications

2017-03-05 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896652#comment-15896652
 ] 

DjvuLee commented on SPARK-18085:
-

"A separate jar file" means we generate a new jar file for the history 
function, just like Spark put `network` function in a new jar file, not in the 
Spark-core,I just want do not impact the existing jar file。

You can update the information when this design is ready for complete, maybe 
many people want to try it.

Thanks for your reply!


> Better History Server scalability for many / large applications
> ---
>
> Key: SPARK-18085
> URL: https://issues.apache.org/jira/browse/SPARK-18085
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
> Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues 
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. 
> I'll be attaching a document shortly describing the issues and suggesting a 
> path to how to solve them.






[jira] [Commented] (SPARK-19823) Support Gang Distribution of Task

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896107#comment-15896107
 ] 

DjvuLee commented on SPARK-19823:
-

[~zsxwing] Could you have a look at this?

> Support Gang Distribution of Task 
> --
>
> Key: SPARK-19823
> URL: https://issues.apache.org/jira/browse/SPARK-19823
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>







[jira] [Comment Edited] (SPARK-19823) Support Gang Distribution of Task

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096
 ] 

DjvuLee edited comment on SPARK-19823 at 3/5/17 7:19 AM:
-

When Spark distributes tasks to Executors, it uses a round-robin approach: we 
spread tasks across all Executors as much as possible, which is a good strategy 
in most cases.


However, when we introduce dynamic allocation and concurrent jobs, users 
may need gang distribution for a single job; that is, we allocate the tasks 
of one job to a few Executors, so that the unused resources can be released.

With the current approach, Executors may not get released, since a long-running 
stage will place at least one task on each Executor, even when we have more cores 
in each Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 
We only need to modify this:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L340

val shuffledOffers = shuffleOffers(filteredOffers)
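
A hedged sketch of the kind of change meant here (the configuration name is made up for illustration, and a real gang policy would need more than just a different ordering):

{code}
// Hypothetical sketch around the line above in TaskSchedulerImpl: use a
// deterministic offer order, behind an opt-in flag, instead of always shuffling.
val gangDistribution = conf.getBoolean("spark.scheduler.gangDistribution.enabled", false)  // hypothetical flag
val shuffledOffers =
  if (gangDistribution) filteredOffers.sortBy(_.executorId)  // stable order, as a starting point for packing
  else shuffleOffers(filteredOffers)
{code}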





was (Author: djvulee):
When Spark distributes tasks to Executors,  it uses a Round-Robin way, this 
means we distribute tasks to all Executors as possible, this is a good strategy 
in most case.


However, when we introduce the dynamic allocation and concurrent  jobs, users 
may need the gang distribution for one single job, this means we allocate tasks 
for one job in few Executors, so we can release the unused resource.

Under current way, Executors may not get released since the long-running stage 
will distributed at lease one task in each Executor, even we have more cores in 
each Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 




> Support Gang Distribution of Task 
> --
>
> Key: SPARK-19823
> URL: https://issues.apache.org/jira/browse/SPARK-19823
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>







[jira] [Comment Edited] (SPARK-19823) Support Gang Distribution of Task

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096
 ] 

DjvuLee edited comment on SPARK-19823 at 3/5/17 7:10 AM:
-

When Spark distributes tasks to Executors,  it uses a Round-Robin way, this 
means we distribute tasks to all Executors as possible, this is a good strategy 
in most case.


However, when we introduce the dynamic allocation and concurrent  jobs, users 
may need the gang distribution for one single job, this means we allocate tasks 
for one job in few Executors, so we can release the unused resource.

In current ways, Executors may not get released since the long-running stage 
will distributed at lease one task in each Executor, even we have more cores in 
each Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 





was (Author: djvulee):
When Spark distributes tasks to Executors,  it uses a Round-Robin way, this 
means we distribute tasks to all Executors as possible, this is a good strategy 
in most case.


However, when we introduce the dynamic allocation and concurrent  jobs, users 
may need the gang distribution for one single job, this means we allocate tasks 
for one job in few Executors, so we can release the unused resource.

In current ways, Executors may not get released since the long-running stage 
will distributed at lease one task in each Executors, even we have more cores 
in each Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 




> Support Gang Distribution of Task 
> --
>
> Key: SPARK-19823
> URL: https://issues.apache.org/jira/browse/SPARK-19823
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19823) Support Gang Distribution of Task

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096
 ] 

DjvuLee edited comment on SPARK-19823 at 3/5/17 7:10 AM:
-

When Spark distributes tasks to Executors,  it uses a Round-Robin way, this 
means we distribute tasks to all Executors as possible, this is a good strategy 
in most case.


However, when we introduce the dynamic allocation and concurrent  jobs, users 
may need the gang distribution for one single job, this means we allocate tasks 
for one job in few Executors, so we can release the unused resource.

Under current way, Executors may not get released since the long-running stage 
will distributed at lease one task in each Executor, even we have more cores in 
each Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 





was (Author: djvulee):
When Spark distributes tasks to Executors,  it uses a Round-Robin way, this 
means we distribute tasks to all Executors as possible, this is a good strategy 
in most case.


However, when we introduce the dynamic allocation and concurrent  jobs, users 
may need the gang distribution for one single job, this means we allocate tasks 
for one job in few Executors, so we can release the unused resource.

In current ways, Executors may not get released since the long-running stage 
will distributed at lease one task in each Executor, even we have more cores in 
each Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 




> Support Gang Distribution of Task 
> --
>
> Key: SPARK-19823
> URL: https://issues.apache.org/jira/browse/SPARK-19823
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19823) Support Gang Distribution of Task

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096
 ] 

DjvuLee edited comment on SPARK-19823 at 3/5/17 7:10 AM:
-

When Spark distributes tasks to Executors,  it uses a Round-Robin way, this 
means we distribute tasks to all Executors as possible, this is a good strategy 
in most case.


However, when we introduce the dynamic allocation and concurrent  jobs, users 
may need the gang distribution for one single job, this means we allocate tasks 
for one job in few Executors, so we can release the unused resource.

In current ways, Executors may not get released since the long-running stage 
will distributed at lease one task in each Executors, even we have more cores 
in each Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 





was (Author: djvulee):
When Spark distributes tasks to Executors,  it uses a Round-Robin way, this 
means we distribute tasks to all Executors as possible, this is a good strategy 
in most case.


However, when we introduce the dynamic allocation and concurrent  jobs, users 
may need the gang distribution for one single job, this means we allocate tasks 
for one job in few Executors, so we can release the unused resource.

In current ways, Executors may not get released since the long-running stage 
will distributed on task in each Executors, even we have more cores in each 
Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 




> Support Gang Distribution of Task 
> --
>
> Key: SPARK-19823
> URL: https://issues.apache.org/jira/browse/SPARK-19823
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19823) Support Gang Distribution of Task

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896096#comment-15896096
 ] 

DjvuLee commented on SPARK-19823:
-

When Spark distributes tasks to Executors,  it uses a Round-Robin way, this 
means we distribute tasks to all Executors as possible, this is a good strategy 
in most case.


However, when we introduce the dynamic allocation and concurrent  jobs, users 
may need the gang distribution for one single job, this means we allocate tasks 
for one job in few Executors, so we can release the unused resource.

In current ways, Executors may not get released since the long-running stage 
will distributed on task in each Executors, even we have more cores in each 
Executor.

We can offer a configuration for users, and this will not introduce much 
complexity. 




> Support Gang Distribution of Task 
> --
>
> Key: SPARK-19823
> URL: https://issues.apache.org/jira/browse/SPARK-19823
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19823) Support Gang Distribution of Task

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896097#comment-15896097
 ] 

DjvuLee commented on SPARK-19823:
-

If this is good advice, I will open a Pull Request.

> Support Gang Distribution of Task 
> --
>
> Key: SPARK-19823
> URL: https://issues.apache.org/jira/browse/SPARK-19823
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19823) Support Gang Distribution of Task

2017-03-04 Thread DjvuLee (JIRA)
DjvuLee created SPARK-19823:
---

 Summary: Support Gang Distribution of Task 
 Key: SPARK-19823
 URL: https://issues.apache.org/jira/browse/SPARK-19823
 Project: Spark
  Issue Type: Improvement
  Components: Scheduler, Spark Core
Affects Versions: 2.0.2
Reporter: DjvuLee






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896070#comment-15896070
 ] 

DjvuLee commented on SPARK-18085:
-

[~vanzin] Thanks for your reply!

Will your new solution generate a separate jar file? I would like to try it 
if so.
The history problem is an issue in our production environment now.

> Better History Server scalability for many / large applications
> ---
>
> Key: SPARK-18085
> URL: https://issues.apache.org/jira/browse/SPARK-18085
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
> Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues 
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. 
> I'll be attaching a document shortly describing the issues and suggesting a 
> path to how to solve them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19821) Throw out the Read-only disk information when create file for Shuffle

2017-03-04 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896067#comment-15896067
 ] 

DjvuLee commented on SPARK-19821:
-

Currently, when the disk is read-only, we just throw the FileNotFoundException. 
We could do better by reporting that the disk is read-only, and maybe we can 
achieve better fault tolerance for a single task.
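
For illustration, a rough sketch of the kind of check that could produce a 
clearer error; the helper name `openForWrite` is made up and this is not the 
actual IndexShuffleBlockResolver code:

{code}
import java.io.{File, FileOutputStream, IOException}

// Hypothetical sketch: when opening a shuffle file fails, check whether the
// parent directory has become read-only and say so, instead of surfacing only
// the raw FileNotFoundException / "Input/output error".
object ShuffleFileOpen {
  def openForWrite(file: File): FileOutputStream = {
    try {
      new FileOutputStream(file)
    } catch {
      case e: IOException =>
        val dir = file.getParentFile
        if (dir != null && dir.exists() && !dir.canWrite()) {
          throw new IOException(
            s"Failed to create ${file.getPath}: directory ${dir.getPath} is read-only", e)
        }
        throw e
    }
  }
}
{code}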

> Throw out the Read-only disk information when create file for Shuffle
> -
>
> Key: SPARK-19821
> URL: https://issues.apache.org/jira/browse/SPARK-19821
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>
> java.io.FileNotFoundException: 
> /data01/yarn/nmdata/usercache/tiger/appcache/application_1486364177723_1047735/blockmgr-23098754-a97a-4673-ba73-3de5e167da87/2c/shuffle_55_47_0.index.0347f74b-a9c1-473e-b81f-40be394cc00f
>  (Input/output error)
>   at java.io.FileOutputStream.open0(Native Method)
>   at java.io.FileOutputStream.open(FileOutputStream.java:270)
>   at java.io.FileOutputStream.(FileOutputStream.java:213)
>   at java.io.FileOutputStream.(FileOutputStream.java:162)
>   at 
> org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:143)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:219)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19821) Throw out the Read-only disk information when create file for Shuffle

2017-03-04 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-19821:

Description: 
java.io.FileNotFoundException: 
/data01/yarn/nmdata/usercache/tiger/appcache/application_1486364177723_1047735/blockmgr-23098754-a97a-4673-ba73-3de5e167da87/2c/shuffle_55_47_0.index.0347f74b-a9c1-473e-b81f-40be394cc00f
 (Input/output error)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.(FileOutputStream.java:213)
at java.io.FileOutputStream.(FileOutputStream.java:162)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:143)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:219)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

> Throw out the Read-only disk information when create file for Shuffle
> -
>
> Key: SPARK-19821
> URL: https://issues.apache.org/jira/browse/SPARK-19821
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.0.2
>Reporter: DjvuLee
>
> java.io.FileNotFoundException: 
> /data01/yarn/nmdata/usercache/tiger/appcache/application_1486364177723_1047735/blockmgr-23098754-a97a-4673-ba73-3de5e167da87/2c/shuffle_55_47_0.index.0347f74b-a9c1-473e-b81f-40be394cc00f
>  (Input/output error)
>   at java.io.FileOutputStream.open0(Native Method)
>   at java.io.FileOutputStream.open(FileOutputStream.java:270)
>   at java.io.FileOutputStream.(FileOutputStream.java:213)
>   at java.io.FileOutputStream.(FileOutputStream.java:162)
>   at 
> org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:143)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:219)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19821) Throw out the Read-only disk information when create file for Shuffle

2017-03-04 Thread DjvuLee (JIRA)
DjvuLee created SPARK-19821:
---

 Summary: Throw out the Read-only disk information when create file 
for Shuffle
 Key: SPARK-19821
 URL: https://issues.apache.org/jira/browse/SPARK-19821
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 2.0.2
Reporter: DjvuLee






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications

2017-03-03 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894074#comment-15894074
 ] 

DjvuLee commented on SPARK-18085:
-

[~vanzin] This is a nice design.

There is not much information about deletion. The history log can become 
large after a few weeks; will this local DB delete the data as specified by 
the configuration?

> Better History Server scalability for many / large applications
> ---
>
> Key: SPARK-18085
> URL: https://issues.apache.org/jira/browse/SPARK-18085
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
> Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues 
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. 
> I'll be attaching a document shortly describing the issues and suggesting a 
> path to how to solve them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17300) ClosedChannelException caused by missing block manager when speculative tasks are killed

2017-02-05 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853630#comment-15853630
 ] 

DjvuLee commented on SPARK-17300:
-

[~rdblue] Was there any later fix for this issue?

> ClosedChannelException caused by missing block manager when speculative tasks 
> are killed
> 
>
> Key: SPARK-17300
> URL: https://issues.apache.org/jira/browse/SPARK-17300
> Project: Spark
>  Issue Type: Bug
>Reporter: Ryan Blue
>
> We recently backported SPARK-10530 to our Spark build, which kills 
> unnecessary duplicate/speculative tasks when one completes (either a 
> speculative task or the original). In large jobs with 500+ executors, this 
> caused some executors to die and resulted in the same error that was fixed by 
> SPARK-15262: ClosedChannelException when trying to connect to the block 
> manager on affected hosts.
> {code}
> java.nio.channels.ClosedChannelException
>   at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
>   at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
>   at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>   at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>   at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
>   at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
>   at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
>   at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19327) Improve the partition method when read data from jdbc

2017-01-21 Thread DjvuLee (JIRA)
DjvuLee created SPARK-19327:
---

 Summary: Improve the partition method when read data from jdbc
 Key: SPARK-19327
 URL: https://issues.apache.org/jira/browse/SPARK-19327
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: DjvuLee


The partition method in `jdbc`, when a partition column is specified, uses 
equal-width steps, which can lead to skew between partitions. The new method 
introduces a balanced partitioning based on the number of elements, which 
avoids the skew problem.
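
As an illustration of the idea (names like `balancedPredicates` are made up, 
not a Spark API), one could derive per-partition bounds from approximate 
quantiles of the partition column and feed them to the predicate-based `jdbc` 
reader, so each partition holds roughly the same number of rows:

{code}
import org.apache.spark.sql.SparkSession

object BalancedJdbcRead {
  // Hypothetical sketch: build one WHERE clause per partition from the
  // approximate quantiles of a numeric partition column.
  def balancedPredicates(spark: SparkSession, url: String, table: String,
                         column: String, numPartitions: Int): Array[String] = {
    require(numPartitions >= 2, "need at least two partitions to balance")
    // Read just the partition column once to estimate its distribution.
    val colDf = spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", s"(SELECT $column FROM $table) t")
      .load()
    val probs = (1 until numPartitions).map(_.toDouble / numPartitions).toArray
    val cuts = colDf.stat.approxQuantile(column, probs, 0.01)
    // Clauses: (-inf, c1), [c1, c2), ..., [ck, +inf)
    val first = s"$column < ${cuts.head}"
    val middle = cuts.indices.drop(1).map { i =>
      s"$column >= ${cuts(i - 1)} AND $column < ${cuts(i)}"
    }
    val last = s"$column >= ${cuts.last}"
    (Seq(first) ++ middle ++ Seq(last)).toArray
  }
}
{code}

The resulting array can be passed to `spark.read.jdbc(url, table, predicates, 
connectionProperties)`, which creates one partition per predicate. NULL values 
and duplicate cut points would need extra handling in a real implementation.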



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19239) Check the lowerBound and upperBound whether equal None in jdbc API

2017-01-16 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-19239:

Summary: Check the lowerBound and upperBound whether equal None in jdbc API 
 (was: Check the lowerBound and upperBound equal None in jdbc API)

> Check the lowerBound and upperBound whether equal None in jdbc API
> --
>
> Key: SPARK-19239
> URL: https://issues.apache.org/jira/browse/SPARK-19239
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: DjvuLee
>
> When we use ``jdbc`` in pyspark, checking whether the lowerBound and 
> upperBound are None lets us give a more friendly suggestion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19239) Check the lowerBound and upperBound equal None in jdbc API

2017-01-16 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-19239:

Description: When we use ``jdbc`` in pyspark, checking whether the lowerBound 
and upperBound are None lets us give a more friendly suggestion.

> Check the lowerBound and upperBound equal None in jdbc API
> --
>
> Key: SPARK-19239
> URL: https://issues.apache.org/jira/browse/SPARK-19239
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: DjvuLee
>
> When we use ``jdbc`` in pyspark, checking whether the lowerBound and 
> upperBound are None lets us give a more friendly suggestion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19239) Check the lowerBound and upperBound equal None in jdbc API

2017-01-16 Thread DjvuLee (JIRA)
DjvuLee created SPARK-19239:
---

 Summary: Check the lowerBound and upperBound equal None in jdbc API
 Key: SPARK-19239
 URL: https://issues.apache.org/jira/browse/SPARK-19239
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: DjvuLee






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:46 AM:
--

When I just run ./bin/spark-shell in our environment, the spark-shell hits 
the above error.

I can fix it by passing -usejavacp directly to the spark-shell, e.g. by running 
./bin/spark-shell -usejavacp

My environment is jdk1.8.0_91, and we do not have Scala installed; the OS is 
Debian 4.6.4.
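
For completeness, the programmatic counterpart hinted at by the error message 
itself ("settings.usejavacp.value = true") looks roughly like this when 
embedding the Scala REPL; this is only an illustration, not the actual 
spark-shell fix in the PR:

{code}
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

// Hypothetical sketch: the programmatic equivalent of passing -usejavacp,
// telling an embedded REPL to reuse the JVM's classpath.
object UseJavaCpExample {
  def main(args: Array[String]): Unit = {
    val settings = new Settings
    settings.usejavacp.value = true
    val interpreter = new IMain(settings)
    interpreter.interpret("""println("REPL sees the java classpath")""")
  }
}
{code}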


was (Author: djvulee):
When I just run the ./bin/spark-shell under our environment, the spark-shell 
occurs the above error.

I can fix it by pass the -usejavacp directly to the spark-shell, like running 
./bin/spark-shell -usejavacp

My environment is jdk1.8.0_91, and we do not install the scala.

the -Dscala.usejavacp=true in bin/spark-shell seems  not work.

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731455#comment-15731455
 ] 

DjvuLee commented on SPARK-18778:
-

[~srowen] [~andrewor14] can you take a look?

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:38 AM:
--

I give a fix in https://github.com/apache/spark/pull/16210.

This seems a little weird, since -Dscala.usejavacp=true was meant to fix 
this: [https://issues.apache.org/jira/browse/SPARK-4161]


was (Author: djvulee):
I give a fix in the https://github.com/apache/spark/pull/16210.

This seems a little wired, since the -Dscala.usejavacp=true is try to fix this.

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:38 AM:
--

I give a fix in the https://github.com/apache/spark/pull/16210.

This seems a little wired, since the -Dscala.usejavacp=true is try to fix 
this.[https://issues.apache.org/jira/browse/SPARK-4161]


was (Author: djvulee):
I give a fix in the https://github.com/apache/spark/pull/16210.

This seems a little wired, since the -Dscala.usejavacp=true is try to fix 
this.[https://issues.apache.org/jira/browse/SPARK-4161]

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:35 AM:
--

I give a fix in the https://github.com/apache/spark/pull/16210.

This seems a little wired, since the -Dscala.usejavacp=true is try to fix this.


was (Author: djvulee):
I give a fix in the https://github.com/apache/spark/pull/16210

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731431#comment-15731431
 ] 

DjvuLee commented on SPARK-18778:
-

I give a fix in the https://github.com/apache/spark/pull/16210

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403
 ] 

DjvuLee commented on SPARK-18778:
-

When I just run the ./bin/spark-shell under our environment, the spark-shell 
occurs the above error.

I can fix it by pass the -usejavacp directly to the spark-shell, like running 
./bin/spark-sell -usejavacp

My environment is jdk1.8.0_91, and we do not install the scala.

the -Dscala.usejavacp=true in bin/spark-shell seems  not work.

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731403#comment-15731403
 ] 

DjvuLee edited comment on SPARK-18778 at 12/8/16 7:25 AM:
--

When I just run the ./bin/spark-shell under our environment, the spark-shell 
occurs the above error.

I can fix it by pass the -usejavacp directly to the spark-shell, like running 
./bin/spark-shell -usejavacp

My environment is jdk1.8.0_91, and we do not install the scala.

the -Dscala.usejavacp=true in bin/spark-shell seems  not work.


was (Author: djvulee):
When I just run the ./bin/spark-shell under our environment, the spark-shell 
occurs the above error.

I can fix it by pass the -usejavacp directly to the spark-shell, like running 
./bin/spark-sell -usejavacp

My environment is jdk1.8.0_91, and we do not install the scala.

the -Dscala.usejavacp=true in bin/spark-shell seems  not work.

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-18778:

Affects Version/s: 1.6.1
   2.0.2

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-18778:

Description: 
Failed to initialize compiler: object scala.runtime in compiler mirror not 
found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.AssertionError: assertion failed: null
at scala.Predef$.assert(Predef.scala:179)
at 
org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

> Fix the Scala classpath in the spark-shell
> --
>
> Key: SPARK-18778
> URL: https://issues.apache.org/jira/browse/SPARK-18778
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.0.2
>Reporter: DjvuLee
>
> Failed to initialize compiler: object scala.runtime in compiler mirror not 
> found.
> ** Note that as of 2.8 scala does not assume use of the java classpath.
> ** For the old behavior pass -usejavacp to scala, or if using a Settings
> ** object programatically, settings.usejavacp.value = true.
> Exception in thread "main" java.lang.AssertionError: assertion failed: null
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:247)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:990)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at 
> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18778) Fix the Scala classpath in the spark-shell

2016-12-07 Thread DjvuLee (JIRA)
DjvuLee created SPARK-18778:
---

 Summary: Fix the Scala classpath in the spark-shell
 Key: SPARK-18778
 URL: https://issues.apache.org/jira/browse/SPARK-18778
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: DjvuLee






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18181) Huge managed memory leak (2.7G) when running reduceByKey

2016-11-21 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685568#comment-15685568
 ] 

DjvuLee commented on SPARK-18181:
-

[~barrybecker4] Can you reproduce this on the Spark 2.x version?

> Huge managed memory leak (2.7G) when running reduceByKey
> 
>
> Key: SPARK-18181
> URL: https://issues.apache.org/jira/browse/SPARK-18181
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.2
>Reporter: Barry Becker
>
> For a while now, I have noticed messages like 
> 16/10/31 09:44:25 ERROR Executor: Managed memory leak detected; size = 
> 5251642 bytes, TID = 64204
> when running jobs with spark 1.6.2.
> I have seen others post bugs on this, but they are all marked fixed in 
> earlier versions. I am certain that the issue still exists in 1.6.2.
> In the following case, I can even get it to leak 2.7G all at once.
> The message is:
> 16/10/31 11:12:47 ERROR Executor: Managed memory leak detected; size = 
> 2724723111 bytes, TID = 18
> The code snippet causing it is:
> {code}
> val nonZeros: RDD[((Int, Float), Array[Long])] =
>   featureValues.map(y => (y._1._1 + "," + y._1._2, y._2)).reduceByKey { 
> case (v1, v2) =>
>   (v1, v2).zipped.map(_ + _)
> }.map(y => {
>   val s = y._1.split(",")
>   ((s(0).toInt, s(1).toFloat), y._2)
> })
> {code}
> and the stack trace is:
> {code}
> 16/10/31 11:12:47 ERROR Executor: Exception in task 0.0 in stage 11.0 (TID 18)
> java.lang.OutOfMemoryError: Java heap space
>   at 
> scala.collection.mutable.ArrayBuilder$ofLong.mkArray(ArrayBuilder.scala:388)
>   at 
> scala.collection.mutable.ArrayBuilder$ofLong.resize(ArrayBuilder.scala:394)
>   at 
> scala.collection.mutable.ArrayBuilder$ofLong.sizeHint(ArrayBuilder.scala:399)
>   at scala.collection.mutable.Builder$class.sizeHint(Builder.scala:69)
>   at scala.collection.mutable.ArrayBuilder.sizeHint(ArrayBuilder.scala:22)
>   at scala.runtime.Tuple2Zipped$.map$extension(Tuple2Zipped.scala:41)
>   at 
> org.apache.spark.mllib.feature.MDLPDiscretizer$$anonfun$11.apply(MDLPDiscretizer.scala:151)
>   at 
> org.apache.spark.mllib.feature.MDLPDiscretizer$$anonfun$11.apply(MDLPDiscretizer.scala:150)
>   at 
> org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:187)
>   at 
> org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:186)
>   at 
> org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:144)
>   at 
> org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 16/10/31 11:12:47 ERROR SparkUncaughtExceptionHandler: Uncaught exception in 
> thread Thread[Executor task launch worker-0,5,main]
> java.lang.OutOfMemoryError: Java heap space
>   at 
> scala.collection.mutable.ArrayBuilder$ofLong.mkArray(ArrayBuilder.scala:388)
>   at 
> scala.collection.mutable.ArrayBuilder$ofLong.resize(ArrayBuilder.scala:394)
>   at 
> scala.collection.mutable.ArrayBuilder$ofLong.sizeHint(ArrayBuilder.scala:399)
>   at scala.collection.mutable.Builder$class.sizeHint(Builder.scala:69)
>   at scala.collection.mutable.ArrayBuilder.sizeHint(ArrayBuilder.scala:22)
>   at scala.runtime.Tuple2Zipped$.map$extension(Tuple2Zipped.scala:41)
>   at 
> org.apache.spark.mllib.feature.MDLPDiscretizer$$anonfun$11.apply(MDLPDiscretizer.scala:151)
>   at 
> org.apache.spark.mllib.feature.MDLPDiscretizer$$anonfun$11.apply(MDLPDiscretizer.scala:150)
>   at 
> org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:187)
>   at 
> org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:186)
>   at 
> org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:144)
>   at 
> org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
>   at 
> 

[jira] [Commented] (SPARK-18528) limit + groupBy leads to java.lang.NullPointerException

2016-11-21 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15685451#comment-15685451
 ] 

DjvuLee commented on SPARK-18528:
-

I just tested your example, and it works:

>>> df.limit(3).groupBy("user_id").count().show()
+-------+-----+
|user_id|count|
+-------+-----+
|      1|    2|
|      2|    1|
+-------+-----+


I tested on the spark2.0.1-hadoop2.6 version downloaded from the Spark website.

Can you still reproduce your error?



> limit + groupBy leads to java.lang.NullPointerException
> ---
>
> Key: SPARK-18528
> URL: https://issues.apache.org/jira/browse/SPARK-18528
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.0.1
> Environment: CentOS release 6.6, Linux 2.6.32-504.el6.x86_64
>Reporter: Corey
>
> Using limit on a DataFrame prior to groupBy will lead to a crash. 
> Repartitioning will avoid the crash.
> *will crash:* {{df.limit(3).groupBy("user_id").count().show()}}
> *will work:* {{df.limit(3).coalesce(1).groupBy('user_id').count().show()}}
> *will work:* 
> {{df.limit(3).repartition('user_id').groupBy('user_id').count().show()}}
> Here is a reproducible example along with the error message:
> {quote}
> >>> df = spark.createDataFrame([ (1, 1), (1, 3), (2, 1), (3, 2), (3, 3) ], 
> >>> ["user_id", "genre_id"])
> >>>
> >>> df.show()
> +-------+--------+
> |user_id|genre_id|
> +-------+--------+
> |      1|       1|
> |      1|       3|
> |      2|       1|
> |      3|       2|
> |      3|       3|
> +-------+--------+
> >>> df.groupBy("user_id").count().show()
> +-------+-----+
> |user_id|count|
> +-------+-----+
> |      1|    2|
> |      3|    2|
> |      2|    1|
> +-------+-----+
> >>> df.limit(3).groupBy("user_id").count().show()
> [Stage 8:===>(1964 + 24) / 
> 2000]16/11/21 01:59:27 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 
> 8204, lvsp20hdn012.stubprod.com): java.lang.NullPointerException
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-17500) The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right

2016-09-12 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee closed SPARK-17500.
---
Resolution: Not A Bug

> The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right
> -
>
> Key: SPARK-17500
> URL: https://issues.apache.org/jira/browse/SPARK-17500
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>Reporter: DjvuLee
>
> The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is increased 
> by the file size on each spill, but we only need the final file size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17500) The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right

2016-09-11 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-17500:

Description: The DiskBytesSpilled metric in ExternalMerger && 
ExternalGroupBy is increased by file size in each spill, but we only need the 
file size at the last time.  (was: The DiskBytesSpilled metric in 
ExternalMerger && ExternalGroupBy is increment by file size in each spill, but 
we only need the file size at the last time.)

> The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right
> -
>
> Key: SPARK-17500
> URL: https://issues.apache.org/jira/browse/SPARK-17500
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>Reporter: DjvuLee
>
> The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is increased 
> by the file size on each spill, but we only need the final file size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17500) The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right

2016-09-11 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-17500:

Summary: The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy 
is not right  (was: The DiskBytesSpilled metric in ExternalMerger && 
ExternalGroupBy is wrong)

> The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is not right
> -
>
> Key: SPARK-17500
> URL: https://issues.apache.org/jira/browse/SPARK-17500
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>Reporter: DjvuLee
>
> The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is incremented 
> by the file size on each spill, but we only need the final file size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17500) The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is wrong

2016-09-11 Thread DjvuLee (JIRA)
DjvuLee created SPARK-17500:
---

 Summary: The DiskBytesSpilled metric in ExternalMerger && 
ExternalGroupBy is wrong
 Key: SPARK-17500
 URL: https://issues.apache.org/jira/browse/SPARK-17500
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.0.0, 1.6.2, 1.6.1, 1.6.0
Reporter: DjvuLee


The DiskBytesSpilled metric in ExternalMerger && ExternalGroupBy is incremented 
by the file size on each spill, but we only need the final file size.
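
To make the intended accounting concrete, here is a minimal sketch (illustrative 
Scala with hypothetical names; the real ExternalMerger/ExternalGroupBy code lives 
in PySpark's shuffle.py): the metric should be credited with each spill file's 
final size exactly once, not with the file's running size after every flush.

{code}
import java.io.File

object SpillMetrics {
  var diskBytesSpilled: Long = 0L

  // Called once per completed spill file: credit only the final on-disk size,
  // instead of adding the file's current size after every intermediate flush.
  def recordSpill(file: File): Unit = {
    diskBytesSpilled += file.length()
  }
}
{code}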



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR

2016-08-22 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-3630:
---
Comment: was deleted

(was: How much data do you test?  we encounter this error in our production. 
Our data is about several TB. The Spark version is 1.6.1, and the snappy 
version is 1.1.2.4。)

> Identify cause of Kryo+Snappy PARSING_ERROR
> ---
>
> Key: SPARK-3630
> URL: https://issues.apache.org/jira/browse/SPARK-3630
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Andrew Ash
>Assignee: Josh Rosen
>
> A recent GraphX commit caused non-deterministic exceptions in unit tests so 
> it was reverted (see SPARK-3400).
> Separately, [~aash] observed the same exception stacktrace in an 
> application-specific Kryo registrator:
> {noformat}
> com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to 
> uncompress the chunk: PARSING_ERROR(2)
> com.esotericsoftware.kryo.io.Input.fill(Input.java:142) 
> com.esotericsoftware.kryo.io.Input.require(Input.java:169) 
> com.esotericsoftware.kryo.io.Input.readInt(Input.java:325) 
> com.esotericsoftware.kryo.io.Input.readFloat(Input.java:624) 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:127)
>  
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:117)
>  
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>  
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>  
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> ...
> {noformat}
> This ticket is to identify the cause of the exception in the GraphX commit so 
> the faulty commit can be fixed and merged back into master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR

2016-08-22 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430215#comment-15430215
 ] 

DjvuLee edited comment on SPARK-3630 at 8/22/16 7:10 AM:
-

Can I ask how much data you tested with? We encounter this error in our 
production; our data is about several TB. The Spark version is 1.6.1 and the 
snappy version is 1.1.2.4. When the data is small, we never encounter this error.


was (Author: djvulee):
How much data do you test?  we encounter this error in our production. Our data 
is about several TB. The Spark version is 1.6.1, and the snappy version is 
1.1.2.4。

> Identify cause of Kryo+Snappy PARSING_ERROR
> ---
>
> Key: SPARK-3630
> URL: https://issues.apache.org/jira/browse/SPARK-3630
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Andrew Ash
>Assignee: Josh Rosen
>
> A recent GraphX commit caused non-deterministic exceptions in unit tests so 
> it was reverted (see SPARK-3400).
> Separately, [~aash] observed the same exception stacktrace in an 
> application-specific Kryo registrator:
> {noformat}
> com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to 
> uncompress the chunk: PARSING_ERROR(2)
> com.esotericsoftware.kryo.io.Input.fill(Input.java:142) 
> com.esotericsoftware.kryo.io.Input.require(Input.java:169) 
> com.esotericsoftware.kryo.io.Input.readInt(Input.java:325) 
> com.esotericsoftware.kryo.io.Input.readFloat(Input.java:624) 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:127)
>  
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:117)
>  
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>  
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>  
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> ...
> {noformat}
> This ticket is to identify the cause of the exception in the GraphX commit so 
> the faulty commit can be fixed and merged back into master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR

2016-08-22 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430215#comment-15430215
 ] 

DjvuLee commented on SPARK-3630:


How much data did you test with? We encounter this error in our production. Our 
data is about several TB. The Spark version is 1.6.1, and the snappy version is 
1.1.2.4.

> Identify cause of Kryo+Snappy PARSING_ERROR
> ---
>
> Key: SPARK-3630
> URL: https://issues.apache.org/jira/browse/SPARK-3630
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Andrew Ash
>Assignee: Josh Rosen
>
> A recent GraphX commit caused non-deterministic exceptions in unit tests so 
> it was reverted (see SPARK-3400).
> Separately, [~aash] observed the same exception stacktrace in an 
> application-specific Kryo registrator:
> {noformat}
> com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to 
> uncompress the chunk: PARSING_ERROR(2)
> com.esotericsoftware.kryo.io.Input.fill(Input.java:142) 
> com.esotericsoftware.kryo.io.Input.require(Input.java:169) 
> com.esotericsoftware.kryo.io.Input.readInt(Input.java:325) 
> com.esotericsoftware.kryo.io.Input.readFloat(Input.java:624) 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:127)
>  
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:117)
>  
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>  
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>  
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> ...
> {noformat}
> This ticket is to identify the cause of the exception in the GraphX commit so 
> the faulty commit can be fixed and merged back into master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR

2016-08-22 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430217#comment-15430217
 ] 

DjvuLee commented on SPARK-3630:


How much data did you test with? We encounter this error in our production. Our 
data is about several TB. The Spark version is 1.6.1, and the snappy version is 
1.1.2.4.

> Identify cause of Kryo+Snappy PARSING_ERROR
> ---
>
> Key: SPARK-3630
> URL: https://issues.apache.org/jira/browse/SPARK-3630
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Andrew Ash
>Assignee: Josh Rosen
>
> A recent GraphX commit caused non-deterministic exceptions in unit tests so 
> it was reverted (see SPARK-3400).
> Separately, [~aash] observed the same exception stacktrace in an 
> application-specific Kryo registrator:
> {noformat}
> com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to 
> uncompress the chunk: PARSING_ERROR(2)
> com.esotericsoftware.kryo.io.Input.fill(Input.java:142) 
> com.esotericsoftware.kryo.io.Input.require(Input.java:169) 
> com.esotericsoftware.kryo.io.Input.readInt(Input.java:325) 
> com.esotericsoftware.kryo.io.Input.readFloat(Input.java:624) 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:127)
>  
> com.esotericsoftware.kryo.serializers.DefaultSerializers$FloatSerializer.read(DefaultSerializers.java:117)
>  
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>  
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>  
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> ...
> {noformat}
> This ticket is to identify the cause of the exception in the GraphX commit so 
> the faulty commit can be fixed and merged back into master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11054) WARN ReliableDeliverySupervisor: Association with remote system

2015-10-11 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952225#comment-14952225
 ] 

DjvuLee commented on SPARK-11054:
-

Since the default values for spark.akka.heartbeat.interval and 
spark.akka.heartbeat.pauses are already very large, this error should not be 
related to them.
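
For reference, this is how those settings would be raised explicitly (a sketch 
assuming the standard SparkConf API); as the report notes, raising them did not 
make the warning go away:

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.akka.heartbeat.interval", "1000s")  // default per the report
  .set("spark.akka.heartbeat.pauses", "6000s")    // default per the report
{code}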

> WARN ReliableDeliverySupervisor: Association with remote system
> ---
>
> Key: SPARK-11054
> URL: https://issues.apache.org/jira/browse/SPARK-11054
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: DjvuLee
>Priority: Blocker
>
> WARN ReliableDeliverySupervisor: Association with remote system 
> [akka.tcp://sparkExecutor@10.23.0.36:38375] has failed, address is now gated 
> for [5000] ms. Reason: [Disassociated]
> =
> This WARN occurs very frequently in the 1.5.0 version, and I did not see it in 
> 1.4.1. Some related issue reports say that setting spark.akka.heartbeat.interval 
> (default 1000s) or spark.akka.heartbeat.pauses (default 6000s) can help, but it 
> does not work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11054) WARN ReliableDeliverySupervisor: Association with remote system

2015-10-11 Thread DjvuLee (JIRA)
DjvuLee created SPARK-11054:
---

 Summary: WARN ReliableDeliverySupervisor: Association with remote 
system
 Key: SPARK-11054
 URL: https://issues.apache.org/jira/browse/SPARK-11054
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.5.0
Reporter: DjvuLee
Priority: Blocker


WARN ReliableDeliverySupervisor: Association with remote system 
[akka.tcp://sparkExecutor@10.23.0.36:38375] has failed, address is now gated 
for [5000] ms. Reason: [Disassociated]


=
This WARN occurs very frequently in the 1.5.0 version, and I did not see it in 
1.4.1. Some related issue reports say that setting spark.akka.heartbeat.interval 
(default 1000s) or spark.akka.heartbeat.pauses (default 6000s) can help, but it 
does not work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11054) WARN ReliableDeliverySupervisor: Association with remote system

2015-10-11 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952653#comment-14952653
 ] 

DjvuLee commented on SPARK-11054:
-

I am very sorry for setting the wrong Priority!

I knew this problem has occurred many times, not just for me, so I tried to 
report it.

> WARN ReliableDeliverySupervisor: Association with remote system
> ---
>
> Key: SPARK-11054
> URL: https://issues.apache.org/jira/browse/SPARK-11054
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: DjvuLee
>Priority: Minor
>
> WARN ReliableDeliverySupervisor: Association with remote system 
> [akka.tcp://sparkExecutor@10.23.0.36:38375] has failed, address is now gated 
> for [5000] ms. Reason: [Disassociated]
> =
> This WARN occurs very frequently in the 1.5.0 version, and I did not see it in 
> 1.4.1. Some related issue reports say that setting spark.akka.heartbeat.interval 
> (default 1000s) or spark.akka.heartbeat.pauses (default 6000s) can help, but it 
> does not work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10717) remove the with Logging in the NioBlockTransferService

2015-09-19 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877397#comment-14877397
 ] 

DjvuLee commented on SPARK-10717:
-

Change it to:

final class NioBlockTransferService(conf: SparkConf, securityManager: 
SecurityManager) 
extends BlockTransferService {}

> remove the with Logging in the NioBlockTransferService
> -
>
> Key: SPARK-10717
> URL: https://issues.apache.org/jira/browse/SPARK-10717
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: DjvuLee
>Priority: Minor
>
> Since BlockTransferService implements the Logging interface, we can remove the 
> with Logging in NioBlockTransferService and keep it consistent with 
> NettyBlockTransferService:
> abstract class BlockTransferService extends ShuffleClient with Closeable with 
> Logging {}
> final class NioBlockTransferService(conf: SparkConf, securityManager: 
> SecurityManager) 
> extends BlockTransferService with Logging {}
> class NettyBlockTransferService(conf: SparkConf, securityManager: 
> SecurityManager, numCores: Int) extends BlockTransferService {}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10717) remove the with Logging in the NioBlockTransferService

2015-09-19 Thread DjvuLee (JIRA)
DjvuLee created SPARK-10717:
---

 Summary: remove the with Logging in the NioBlockTransferService
 Key: SPARK-10717
 URL: https://issues.apache.org/jira/browse/SPARK-10717
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.5.0
Reporter: DjvuLee
Priority: Minor


Since BlockTransferService implements the Logging interface, we can remove the 
with Logging in NioBlockTransferService and keep it consistent with 
NettyBlockTransferService:

abstract class BlockTransferService extends ShuffleClient with Closeable with 
Logging {}

final class NioBlockTransferService(conf: SparkConf, securityManager: 
SecurityManager) 
extends BlockTransferService with Logging {}

class NettyBlockTransferService(conf: SparkConf, securityManager: 
SecurityManager, numCores: Int) extends BlockTransferService {}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2015-08-06 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661207#comment-14661207
 ] 

DjvuLee commented on SPARK-4105:


Is there any progress on this bug? 

I have the same error on Spark 1.1.0; it occurred when I used groupByKey just 
as posted above, or a SQL join. When the data is not very big this bug does not 
occur, but when the data is bigger, FAILED_TO_UNCOMPRESS is thrown.

By the way, I am just using the hash-based shuffle.

 FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based 
 shuffle
 -

 Key: SPARK-4105
 URL: https://issues.apache.org/jira/browse/SPARK-4105
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Affects Versions: 1.2.0, 1.2.1, 1.3.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Blocker
 Attachments: JavaObjectToSerialize.java, 
 SparkFailedToUncompressGenerator.scala


 We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during 
 shuffle read.  Here's a sample stacktrace from an executor:
 {code}
 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID 
 33053)
 java.io.IOException: FAILED_TO_UNCOMPRESS(5)
   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
   at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
   at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
   at org.xerial.snappy.Snappy.uncompress(Snappy.java:427)
   at 
 org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127)
   at 
 org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
   at org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58)
   at 
 org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
   at 
 org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090)
   at 
 org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
   at 
 org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
   at 
 org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
   at 
 org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
   at 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
   at 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
   at 
 org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
   at 
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
   at 
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
   at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
   at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at 
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at 
 org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
   at 
 

[jira] [Commented] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map

2015-02-11 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316393#comment-14316393
 ] 

DjvuLee commented on SPARK-5739:


Yes, 1M may be enough for the KMeans algorithm.

But if we consider other machine learning algorithms, such as logistic 
regression, then 10^7 dimensions is not that big. LR on ad-click models at this 
scale may be common in practice (I have heard so from my friends), so how can 
Spark deal well with this? 

Maybe there is only one weight vector in LR, but when the dimension goes up to 
a billion, the data can reach GBs (10^9 double weights alone are about 8 GB).

 Size exceeds Integer.MAX_VALUE in File Map
 --

 Key: SPARK-5739
 URL: https://issues.apache.org/jira/browse/SPARK-5739
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.1
 Environment: Spark1.1.1 on a cluster with 12 node. Every node with 
 128GB RAM, 24 Core. the data is just 40GB, and there is 48 parallel task on a 
 node.
Reporter: DjvuLee

 I just ran the kmeans algorithm on randomly generated data, but this problem 
 occurred after some iterations. I tried several times, and the problem is 
 reproducible. 
 Because the data is randomly generated, I guess this is a bug? Or, if random 
 data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, 
 can we check the size before using the file map?
 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN  
 org.apache.spark.util.SizeEstimator - Failed to check whether 
 UseCompressedOops is set; assuming yes
 [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds 
 Integer.MAX_VALUE
 java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850)
   at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
   at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105)
   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
   at 
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598)
   at 
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869)
   at 
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
   at 
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68)
   at 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
   at 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
   at 
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
   at 
 org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270)
   at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143)
   at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126)
   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338)
   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348)
   at KMeansDataGenerator$.main(kmeans.scala:105)
   at KMeansDataGenerator.main(kmeans.scala)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
   at java.lang.reflect.Method.invoke(Method.java:619)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map

2015-02-11 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316430#comment-14316430
 ] 

DjvuLee commented on SPARK-5739:


OK, got it, I will look at the code in more detail. 

I will close this issue, but adding a check for the size is still good advice?

 Size exceeds Integer.MAX_VALUE in File Map
 --

 Key: SPARK-5739
 URL: https://issues.apache.org/jira/browse/SPARK-5739
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.1
 Environment: Spark1.1.1 on a cluster with 12 node. Every node with 
 128GB RAM, 24 Core. the data is just 40GB, and there is 48 parallel task on a 
 node.
Reporter: DjvuLee

 I just ran the kmeans algorithm on randomly generated data, but this problem 
 occurred after some iterations. I tried several times, and the problem is 
 reproducible. 
 Because the data is randomly generated, I guess this is a bug? Or, if random 
 data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, 
 can we check the size before using the file map?
 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN  
 org.apache.spark.util.SizeEstimator - Failed to check whether 
 UseCompressedOops is set; assuming yes
 [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds 
 Integer.MAX_VALUE
 java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850)
   at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
   at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105)
   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
   at 
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598)
   at 
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869)
   at 
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
   at 
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68)
   at 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
   at 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
   at 
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
   at 
 org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270)
   at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143)
   at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126)
   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338)
   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348)
   at KMeansDataGenerator$.main(kmeans.scala:105)
   at KMeansDataGenerator.main(kmeans.scala)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
   at java.lang.reflect.Method.invoke(Method.java:619)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map

2015-02-11 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317384#comment-14317384
 ] 

DjvuLee commented on SPARK-5739:


Yes, I did not explain it clearly. What I mean is that we can add a check in 
the getBytes method in the DiskStore.scala file, just before calling the 
following:

Some(channel.map(MapMode.READ_ONLY, segment.offset, segment.length))
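
A minimal sketch of the suggested guard (a hypothetical helper; the real 
DiskStore.getBytes internals differ across versions). FileChannel.map cannot 
map a region larger than Integer.MAX_VALUE bytes, so checking first turns the 
bare IllegalArgumentException into an explicit message:

{code}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.channels.FileChannel.MapMode

def mapChecked(channel: FileChannel, offset: Long, length: Long,
               blockId: String): ByteBuffer = {
  // Fail fast with a clear message instead of the bare
  // "Size exceeds Integer.MAX_VALUE" thrown by FileChannel.map.
  require(length < Integer.MAX_VALUE,
    s"Block $blockId is $length bytes, which exceeds the 2GB limit of FileChannel.map")
  channel.map(MapMode.READ_ONLY, offset, length)
}
{code}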

 Size exceeds Integer.MAX_VALUE in File Map
 --

 Key: SPARK-5739
 URL: https://issues.apache.org/jira/browse/SPARK-5739
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.1
 Environment: Spark1.1.1 on a cluster with 12 node. Every node with 
 128GB RAM, 24 Core. the data is just 40GB, and there is 48 parallel task on a 
 node.
Reporter: DjvuLee

 I just ran the kmeans algorithm on randomly generated data, but this problem 
 occurred after some iterations. I tried several times, and the problem is 
 reproducible. 
 Because the data is randomly generated, I guess this is a bug? Or, if random 
 data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, 
 can we check the size before using the file map?
 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN  
 org.apache.spark.util.SizeEstimator - Failed to check whether 
 UseCompressedOops is set; assuming yes
 [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds 
 Integer.MAX_VALUE
 java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850)
   at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
   at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105)
   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
   at 
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598)
   at 
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869)
   at 
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
   at 
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68)
   at 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
   at 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
   at 
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
   at 
 org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270)
   at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143)
   at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126)
   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338)
   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348)
   at KMeansDataGenerator$.main(kmeans.scala:105)
   at KMeansDataGenerator.main(kmeans.scala)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
   at java.lang.reflect.Method.invoke(Method.java:619)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map

2015-02-11 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-5739:
---
Summary: Size exceeds Integer.MAX_VALUE in File Map  (was: Size exceeds 
Integer.MAX_VALUE in FileMap)

 Size exceeds Integer.MAX_VALUE in File Map
 --

 Key: SPARK-5739
 URL: https://issues.apache.org/jira/browse/SPARK-5739
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.1
 Environment: Spark1.1.1 on a cluster with 12 node. Every node with 
 128GB RAM, 24 Core. the data is just 40GB, and there is 48 parallel task on a 
 node.
Reporter: DjvuLee

 I just ran the kmeans algorithm on randomly generated data, but this problem 
 occurred after some iterations. I tried several times, and the problem is 
 reproducible. 
 Because the data is randomly generated, I guess this is a bug? Or, if random 
 data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, 
 can we check the size before using the file map?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map

2015-02-11 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315845#comment-14315845
 ] 

DjvuLee commented on SPARK-5739:


The data is generated by the KMeansDataGenerator example in Spark; I just 
changed the parameters to:

val parts = 480
val numPoints = 480 //1500
val k = 10 //args(3).toInt
val d = 1000//args(4).toInt
val r = 1.0 //args(5).toDouble
val iter = 8
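
For context, a sketch of how these parameters feed the MLlib generator and 
trainer (assuming the standard KMeansDataGenerator and KMeans APIs, with sc 
being the usual SparkContext; the reporter's kmeans.scala is a modified copy of 
the bundled example):

{code}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.KMeansDataGenerator

// parts, numPoints, k, d, r and iter are the values listed above
val data = KMeansDataGenerator
  .generateKMeansRDD(sc, numPoints, k, d, r, parts)
  .map(values => Vectors.dense(values))
  .cache()
val model = KMeans.train(data, k, iter)
{code}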

 Size exceeds Integer.MAX_VALUE in File Map
 --

 Key: SPARK-5739
 URL: https://issues.apache.org/jira/browse/SPARK-5739
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.1
 Environment: Spark1.1.1 on a cluster with 12 node. Every node with 
 128GB RAM, 24 Core. the data is just 40GB, and there is 48 parallel task on a 
 node.
Reporter: DjvuLee

 I just ran the kmeans algorithm on randomly generated data, but this problem 
 occurred after some iterations. I tried several times, and the problem is 
 reproducible. 
 Because the data is randomly generated, I guess this is a bug? Or, if random 
 data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, 
 can we check the size before using the file map?
 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN  
 org.apache.spark.util.SizeEstimator - Failed to check whether 
 UseCompressedOops is set; assuming yes
 [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds 
 Integer.MAX_VALUE
 java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850)
   at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
   at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105)
   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
   at 
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598)
   at 
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869)
   at 
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
   at 
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68)
   at 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
   at 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
   at 
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
   at 
 org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270)
   at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143)
   at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126)
   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338)
   at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348)
   at KMeansDataGenerator$.main(kmeans.scala:105)
   at KMeansDataGenerator.main(kmeans.scala)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
   at java.lang.reflect.Method.invoke(Method.java:619)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5739) Size exceeds Integer.MAX_VALUE in FileMap

2015-02-11 Thread DjvuLee (JIRA)
DjvuLee created SPARK-5739:
--

 Summary: Size exceeds Integer.MAX_VALUE in FileMap
 Key: SPARK-5739
 URL: https://issues.apache.org/jira/browse/SPARK-5739
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.1
 Environment: Spark1.1.1 on a cluster with 12 node. Every node with 
128GB RAM, 24 Core. the data is just 40GB, and there is 48 parallel task on a 
node.
Reporter: DjvuLee


I just ran the kmeans algorithm on randomly generated data, but this problem 
occurred after some iterations. I tried several times, and the problem is 
reproducible.

Because the data is randomly generated, I guess this is a bug? Or, if random 
data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, 
can we check the size before using the file map?





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5739) Size exceeds Integer.MAX_VALUE in File Map

2015-02-11 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-5739:
---
Description: 
I just run the kmeans algorithm using a random generate data,but occurred this 
problem after some iteration. I try several time, and this problem is 
reproduced. 

Because the data is random generate, so I guess is there a bug ? Or if random 
data can lead to such a scenario that the size is bigger than 
Integer.MAX_VALUE, can we check the size before using the file map?



015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN  
org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops 
is set; assuming yes
[error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds 
Integer.MAX_VALUE
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86)
at 
org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140)
at 
org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
at 
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598)
at 
org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869)
at 
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
at 
org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68)
at 
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
at 
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
at 
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
at 
org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270)
at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143)
at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126)
at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338)
at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348)
at KMeansDataGenerator$.main(kmeans.scala:105)
at KMeansDataGenerator.main(kmeans.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:619)

  was:
I just run the kmeans algorithm using a random generate data,but occurred this 
problem after some iteration. I try several time, and this problem is 
reproduced. 

Because the data is random generate, so I guess is there a bug ? Or if random 
data can lead to such a scenario that the size is bigger than 
Integer.MAX_VALUE, can we check the size before using the file map?




 Size exceeds Integer.MAX_VALUE in File Map
 --

 Key: SPARK-5739
 URL: https://issues.apache.org/jira/browse/SPARK-5739
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.1.1
 Environment: Spark1.1.1 on a cluster with 12 node. Every node with 
 128GB RAM, 24 Core. the data is just 40GB, and there is 48 parallel task on a 
 node.
Reporter: DjvuLee

 I just ran the kmeans algorithm on randomly generated data, but this problem 
 occurred after some iterations. I tried several times, and the problem is 
 reproducible. 
 Because the data is randomly generated, I guess this is a bug? Or, if random 
 data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, 
 can we check the size before using the file map?
 015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN  
 org.apache.spark.util.SizeEstimator - Failed to check whether 
 UseCompressedOops is set; assuming yes
 [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds 
 Integer.MAX_VALUE
 java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850)
   at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
   at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140)
   at 
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105)
   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
  

[jira] [Updated] (SPARK-5375) Specify more clearly about the max thread meaning in the ConnectionManager

2015-01-22 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-5375:
---
Description: 
In the ConnectionManager.scala file, there is three thread pool: 
handleMessageExecutor,  handleReadWriteExecutor, handleConnectExecutor.

such as:
private val handleMessageExecutor = new ThreadPoolExecutor(
  conf.getInt(spark.core.connection.handler.threads.min, 20),
 conf.getInt(spark.core.connection.handler.threads.max, 60),
 conf.getInt(spark.core.connection.handler.threads.keepalive, 60), 
 TimeUnit.SECONDS,
 new LinkedBlockingDeque[Runnable](),
 Utils.namedThreadFactory(handle-message-executor))

Since we use a LinkedBlockingDeque, so the max thread parameter has no meaning. 
Every time I read the code, this  can lead to Confusing for me , Maybe we can 
add some comment in those place?

  was:
In the ConnectionManager.scala file, there is three thread pool: 
handleMessageExecutor,  handleReadWriteExecutor, handleConnectExecutor.

such as:
private val handleMessageExecutor = new ThreadPoolExecutor(
  conf.getInt(spark.core.connection.handler.threads.min, 20),
 conf.getInt(spark.core.connection.handler.threads.max, 60),
 conf.getInt(spark.core.connection.handler.threads.keepalive, 60), 
 TimeUnit.SECONDS,
 new LinkedBlockingDeque[Runnable](),
 Utils.namedThreadFactory(handle-message-executor))

Since we use a LinkedBlockingDeque, so the max thread parameter have no 
meaning. Every time I read the code, this  can lead to Confusing for me , Maybe 
we can add some comment in those place?


 Specify more clearly about the max thread meaning in the ConnectionManager
 --

 Key: SPARK-5375
 URL: https://issues.apache.org/jira/browse/SPARK-5375
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: DjvuLee

 In the ConnectionManager.scala file, there are three thread pools: 
 handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor.
 For example:
 private val handleMessageExecutor = new ThreadPoolExecutor(
   conf.getInt("spark.core.connection.handler.threads.min", 20),
   conf.getInt("spark.core.connection.handler.threads.max", 60),
   conf.getInt("spark.core.connection.handler.threads.keepalive", 60),
   TimeUnit.SECONDS,
   new LinkedBlockingDeque[Runnable](),
   Utils.namedThreadFactory("handle-message-executor"))
 Since we use a LinkedBlockingDeque, the max thread parameter has no meaning. 
 Every time I read the code this is confusing to me; maybe we can add a 
 comment in those places?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5375) Specify more clearly about the max thread meaning in the ConnectionManager

2015-01-22 Thread DjvuLee (JIRA)
DjvuLee created SPARK-5375:
--

 Summary: Specify more clearly about the max thread meaning in the 
ConnectionManager
 Key: SPARK-5375
 URL: https://issues.apache.org/jira/browse/SPARK-5375
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: DjvuLee


In the ConnectionManager.scala file, there are three thread pools: 
handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor.

For example:
private val handleMessageExecutor = new ThreadPoolExecutor(
  conf.getInt("spark.core.connection.handler.threads.min", 20),
  conf.getInt("spark.core.connection.handler.threads.max", 60),
  conf.getInt("spark.core.connection.handler.threads.keepalive", 60),
  TimeUnit.SECONDS,
  new LinkedBlockingDeque[Runnable](),
  Utils.namedThreadFactory("handle-message-executor"))

Since we use an unbounded LinkedBlockingDeque, the max-thread parameter has no 
effect. Every time I read the code this is confusing, so maybe we can add some 
comments in those places?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5375) Specify more clearly about the max thread meaning in the ConnectionManager

2015-01-22 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-5375:
---
Description: 
In the ConnectionManager.scala file, there are three thread pools: 
handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor.

For example:
private val handleMessageExecutor = new ThreadPoolExecutor(
  conf.getInt("spark.core.connection.handler.threads.min", 20),
  conf.getInt("spark.core.connection.handler.threads.max", 60),
  conf.getInt("spark.core.connection.handler.threads.keepalive", 60),
  TimeUnit.SECONDS,
  new LinkedBlockingDeque[Runnable](),
  Utils.namedThreadFactory("handle-message-executor"))

Since we use an unbounded LinkedBlockingDeque, the max-thread parameter has no 
effect. Every time I read the code this is confusing, so maybe we can add some 
comments in those places?

  was:
In the ConnectionManager.scala file, there are three thread pools: 
handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor.

For example:
private val handleMessageExecutor = new ThreadPoolExecutor(
  conf.getInt("spark.core.connection.handler.threads.min", 20),
  conf.getInt("spark.core.connection.handler.threads.max", 60),
  conf.getInt("spark.core.connection.handler.threads.keepalive", 60),
  TimeUnit.SECONDS,
  new LinkedBlockingDeque[Runnable](),
  Utils.namedThreadFactory("handle-message-executor"))

Since we use an unbounded LinkedBlockingDeque, the max-thread parameter has no 
effect. Every time I read the code this is confusing, so maybe we can add some 
comments in those places?


 Specify more clearly about the max thread meaning in the ConnectionManager
 --

 Key: SPARK-5375
 URL: https://issues.apache.org/jira/browse/SPARK-5375
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: DjvuLee

 In the ConnectionManager.scala file, there are three thread pools: 
 handleMessageExecutor, handleReadWriteExecutor, and handleConnectExecutor.
 For example:
 private val handleMessageExecutor = new ThreadPoolExecutor(
   conf.getInt("spark.core.connection.handler.threads.min", 20),
   conf.getInt("spark.core.connection.handler.threads.max", 60),
   conf.getInt("spark.core.connection.handler.threads.keepalive", 60),
   TimeUnit.SECONDS,
   new LinkedBlockingDeque[Runnable](),
   Utils.namedThreadFactory("handle-message-executor"))
 Since we use an unbounded LinkedBlockingDeque, the max-thread parameter has no 
 effect. Every time I read the code this is confusing, so maybe we can add some 
 comments in those places?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution

2014-07-24 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073244#comment-14073244
 ] 

DjvuLee commented on SPARK-1112:


Has anyone tested version 0.9.2? I found it also failed, while v1.0.1 and 
v1.1.0 are OK.

 When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
 --

 Key: SPARK-1112
 URL: https://issues.apache.org/jira/browse/SPARK-1112
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0, 1.0.0
Reporter: Guillaume Pitel
Assignee: Xiangrui Meng
Priority: Blocker
 Fix For: 0.9.2


 When I set the spark.akka.frameSize to something over 10, the messages sent 
 from the executors to the driver completely block the execution if the 
 message is bigger than 10MiB and smaller than the frameSize (if it's above 
 the frameSize, it's ok)
 The workaround is to set spark.akka.frameSize to 10. In this case, since 
 0.8.1, the blockManager deals with the data to be sent. It seems slower than 
 akka direct messages though.
 The configuration seems to be correctly read (see actorSystemConfig.txt), so 
 I don't see where the 10MiB could come from 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2156) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks

2014-07-17 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064965#comment-14064965
 ] 

DjvuLee commented on SPARK-2156:


I see this is fixed in the Spark branch-0.9 on GitHub, but has it been updated in 
the Spark v0.9.1 release on the http://spark.apache.org/ site?

 When the size of serialized results for one partition is slightly smaller 
 than 10MB (the default akka.frameSize), the execution blocks
 --

 Key: SPARK-2156
 URL: https://issues.apache.org/jira/browse/SPARK-2156
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.1, 1.0.0
 Environment: AWS EC2 1 master 2 slaves with the instance type of 
 r3.2xlarge
Reporter: Chen Jin
Assignee: Xiangrui Meng
Priority: Blocker
 Fix For: 0.9.2, 1.0.1, 1.1.0

   Original Estimate: 504h
  Remaining Estimate: 504h

  I have done some experiments when the frameSize is around 10MB.
 1) spark.akka.frameSize = 10
 If one of the partition sizes is very close to 10MB, say 9.97MB, the execution 
 blocks without any exception or warning. The worker finished the task of sending the 
 serialized result, and then threw an exception saying the hadoop IPC client 
 connection stopped (with logging changed to debug level). However, the master 
 never receives the results and the program just hangs.
 But if the sizes of all the partitions are less than some number between 9.96MB and 
 9.97MB, the program works fine.
 2) spark.akka.frameSize = 9
 When the partition size is just a little bit smaller than 9MB, it fails as 
 well.
 This bug behavior is not exactly what SPARK-1112 is about.
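
(As an aside, a hedged sketch of the usual workaround while the fix is pending: raise spark.akka.frameSize, which is given in MB in these Spark versions, well above the largest per-partition result. The app name and the value 64 below are only examples.)

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: give the frame size comfortable headroom over ~10MB results.
val conf = new SparkConf()
  .setAppName("frame-size-repro")
  .set("spark.akka.frameSize", "64")
val sc = new SparkContext(conf)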



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-07-15 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061804#comment-14061804
 ] 

DjvuLee commented on SPARK-2138:


[~piotrszul] In my opinion, if your task size is bigger than 
akka.frameSize, it should fail, because akka.frameSize is the 
threshold. 

What confuses me is that the serialized task size should not keep growing. 
In any case, it seems that KMeans can succeed in 
v1.0.1. In v0.9.0, even if you increase the akka.frameSize 
parameter, the KMeans job cannot finish.

 The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
 bigger and bigger
 ---

 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.0, 0.9.1
Reporter: DjvuLee
Assignee: Xiangrui Meng

 When the algorithm runs at a certain stage, while executing the reduceByKey() 
 function, it can lead to Executor Lost and Task Lost; after several such failures 
 the application exits.
 When this error occurred, the size of the serialized task was bigger than 10MB, 
 and the size became larger as the iterations increased.
 the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
 the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-07-15 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061843#comment-14061843
 ] 

DjvuLee commented on SPARK-2138:


Oh, so can we improve this further? 

 The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
 bigger and bigger
 ---

 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.0, 0.9.1
Reporter: DjvuLee
Assignee: Xiangrui Meng

 When the algorithm runs at a certain stage, while executing the reduceByKey() 
 function, it can lead to Executor Lost and Task Lost; after several such failures 
 the application exits.
 When this error occurred, the size of the serialized task was bigger than 10MB, 
 and the size became larger as the iterations increased.
 the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
 the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-07-15 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062026#comment-14062026
 ] 

DjvuLee commented on SPARK-2138:


In my experiment, I set akka.frameSize=200 and my data is 1.8GB. When the 
iteration reached the 18th, the size of the serialized task grew larger than 200MB and the 
Spark application exited, so maybe we should improve this.

As for this issue, it is not actually solved: although the KMeans job can succeed 
when a large akka.frameSize parameter is set, I think keeping this issue open is 
helpful.
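
(A rough way to sanity-check that claim, assuming plain Java serialization roughly matches what ends up in the shipped task: estimate the serialized size of k centers of dimension d and compare it with spark.akka.frameSize. The values of k and d below are made up for illustration.)

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Rough estimate of an object's Java-serialized size in MB.
def serializedMB(obj: AnyRef): Long = {
  val buf = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(buf)
  out.writeObject(obj)
  out.close()
  buf.size() / (1024L * 1024L)
}

val k = 500
val d = 50000
val centers = Array.fill(k)(Array.fill(d)(scala.util.Random.nextDouble()))
println(s"~${serializedMB(centers)} MB of centers")   // ~200MB for these k and d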

 The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
 bigger and bigger
 ---

 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.0, 0.9.1
Reporter: DjvuLee
Assignee: Xiangrui Meng

 When the algorithm runs at a certain stage, while executing the reduceByKey() 
 function, it can lead to Executor Lost and Task Lost; after several such failures 
 the application exits.
 When this error occurred, the size of the serialized task was bigger than 10MB, 
 and the size became larger as the iterations increased.
 the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
 the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-07-15 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062163#comment-14062163
 ] 

DjvuLee commented on SPARK-2138:


Oh, I am a little sorry that I wrote something mistaken in my last comment: 18 is 
not the iteration, it is the stage id.

[~mengxr] your speculation is right. After I read the code, I found it indeed happens in 
the initialization stage, with no relationship to the iteration number. 
And if you ask for a higher number of clusters in the final result, the size of the serialized 
task will become larger during the initialization stage.
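
(A hedged sketch of the mitigation this observation suggests, assuming the 1.x MLlib API: the default k-means|| initialization is what carries the growing set of candidate centers into the init-stage tasks, so asking for "random" initialization keeps those tasks small. The path, k, and iteration count below are made up, and sc is an existing SparkContext, e.g. from spark-shell.)

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Parse whitespace-separated points into MLlib vectors and cache them.
val points = sc.textFile("hdfs:///path/to/points")
  .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
  .cache()

val model = new KMeans()
  .setK(500)
  .setMaxIterations(20)
  .setInitializationMode("random")   // avoid the heavy k-means|| init tasks
  .run(points)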

 The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
 bigger and bigger
 ---

 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.0, 0.9.1
Reporter: DjvuLee
Assignee: Xiangrui Meng

 When the algorithm runs at a certain stage, while executing the reduceByKey() 
 function, it can lead to Executor Lost and Task Lost; after several such failures 
 the application exits.
 When this error occurred, the size of the serialized task was bigger than 10MB, 
 and the size became larger as the iterations increased.
 the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
 the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-06-30 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048488#comment-14048488
 ] 

DjvuLee commented on SPARK-2138:


@[~piotrszul] Thanks for your test; I will also test this as soon as possible.
If I have any results, I will update this issue.

 The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
 bigger and bigger
 ---

 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.0, 0.9.1
Reporter: DjvuLee
Assignee: Xiangrui Meng

 When the algorithm runs at a certain stage, while executing the reduceByKey() 
 function, it can lead to Executor Lost and Task Lost; after several such failures 
 the application exits.
 When this error occurred, the size of the serialized task was bigger than 10MB, 
 and the size became larger as the iterations increased.
 the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
 the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-06-27 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046597#comment-14046597
 ] 

DjvuLee commented on SPARK-2138:


If this bug is fixed, shall we close this issue? [~mengxr]

 The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
 bigger and bigger
 ---

 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.0, 0.9.1
Reporter: DjvuLee
Assignee: Xiangrui Meng

 When the algorithm runs at a certain stage, while executing the reduceByKey() 
 function, it can lead to Executor Lost and Task Lost; after several such failures 
 the application exits.
 When this error occurred, the size of the serialized task was bigger than 10MB, 
 and the size became larger as the iterations increased.
 the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
 the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-06-18 Thread DjvuLee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036892#comment-14036892
 ] 

DjvuLee commented on SPARK-2138:


Thanks very much! I am very glad to see that I reported a real bug, not one caused 
by my own mistake.

 The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
 bigger and bigger
 ---

 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.0, 0.9.1
Reporter: DjvuLee
Assignee: Xiangrui Meng

 When the algorithm runs at a certain stage, while executing the reduceByKey() 
 function, it can lead to Executor Lost and Task Lost; after several such failures 
 the application exits.
 When this error occurred, the size of the serialized task was bigger than 10MB, 
 and the size became larger as the iterations increased.
 the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
 the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-06-13 Thread DjvuLee (JIRA)
DjvuLee created SPARK-2138:
--

 Summary: The KMeans algorithm in the MLlib can lead to the 
Serialized Task size become bigger and bigger
 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.1, 0.9.0
Reporter: DjvuLee


When the algorithm runs at a certain stage, while executing reduceByKey(), it can 
lead to Executor Lost and Task Lost; after several such failures the 
application exits.

When this error occurred, the size of the serialized task was bigger than 10MB, and 
the size became larger as the iterations increased.




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2138) The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger

2014-06-13 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-2138:
---

Description: 
When the algorithm runs at a certain stage, while executing reduceByKey(), it can 
lead to Executor Lost and Task Lost; after several such failures the 
application exits.

When this error occurred, the size of the serialized task was bigger than 10MB, and 
the size became larger as the iterations increased.

the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622

the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5

  was:
When the algorithm runs at a certain stage, while executing reduceByKey(), it can 
lead to Executor Lost and Task Lost; after several such failures the 
application exits.

When this error occurred, the size of the serialized task was bigger than 10MB, and 
the size became larger as the iterations increased.



 The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
 bigger and bigger
 ---

 Key: SPARK-2138
 URL: https://issues.apache.org/jira/browse/SPARK-2138
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 0.9.0, 0.9.1
Reporter: DjvuLee

 When the algorithm runs at a certain stage, while executing reduceByKey(), it can 
 lead to Executor Lost and Task Lost; after several such failures 
 the application exits.
 When this error occurred, the size of the serialized task was bigger than 10MB, 
 and the size became larger as the iterations increased.
 the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
 the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)

