[jira] [Updated] (SPARK-5663) Delete appStagingDir on local file system

2015-02-07 Thread Weizhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weizhong updated SPARK-5663:

Description: 
In YARN mode the Client creates appStagingDir on the file system, and the
AppMaster deletes this appStagingDir when it exits. If the file system is HDFS,
this works fine.

However, to run Spark on Tachyon a core-site.xml is created under
${SPARK_HOME}/conf, so core-site.xml is loaded from
${SPARK_HOME}/conf/core-site.xml. That file does not set fs.defaultFS, so the
default file system resolves to the local file system. As a result, in YARN
mode the Client creates appStagingDir on the local file system, and if the
Client and the AppMaster are not on the same node, the appStagingDir is never
deleted.

To solve this issue, we can:
1. add an fs.defaultFS setting to ${SPARK_HOME}/conf/core-site.xml so that the
resolved default file system is HDFS, or
2. clean up appStagingDir when the Client exits or stops (see the sketch below).
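
For illustration only, a minimal sketch of the two options (the helper name and
the namenode address are placeholders, not the actual Client code):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Option 1: make sure the default file system resolves to HDFS.
// ("hdfs://namenode:8020" is a placeholder for the real namenode address.)
val hadoopConf = new Configuration()
hadoopConf.set("fs.defaultFS", "hdfs://namenode:8020")

// Option 2: explicitly remove the staging directory when the Client stops.
def cleanupStagingDir(appStagingDir: Path, conf: Configuration): Unit = {
  val fs = FileSystem.get(conf)    // with fs.defaultFS set this is HDFS, not the local FS
  if (fs.exists(appStagingDir)) {
    fs.delete(appStagingDir, true) // recursive delete
  }
}
{code}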

  was:
In YARN mode the Client creates appStagingDir on the file system, and the
AppMaster deletes this appStagingDir when it exits. If the file system is HDFS,
this works fine.

But if we don't add HADOOP_CONF_DIR to the classpath, the default file system
is the local file system (FileSystem.get(conf) is used to get the fs). So in
YARN mode the Client creates appStagingDir on the local file system, and if the
Client and the AppMaster are not on the same node, the appStagingDir is never
deleted.


> Delete appStagingDir on local file system
> -
>
> Key: SPARK-5663
> URL: https://issues.apache.org/jira/browse/SPARK-5663
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Weizhong
>Priority: Minor
>
> In YARN mode the Client creates appStagingDir on the file system, and the 
> AppMaster deletes this appStagingDir when it exits. If the file system is HDFS, 
> this works fine.
> However, to run Spark on Tachyon a core-site.xml is created under 
> ${SPARK_HOME}/conf, so core-site.xml is loaded from 
> ${SPARK_HOME}/conf/core-site.xml. That file does not set fs.defaultFS, so the 
> default file system resolves to the local file system. As a result, in YARN 
> mode the Client creates appStagingDir on the local file system, and if the 
> Client and the AppMaster are not on the same node, the appStagingDir is never 
> deleted.
> To solve this issue, we can:
> 1. add an fs.defaultFS setting to ${SPARK_HOME}/conf/core-site.xml so that the 
> resolved default file system is HDFS, or
> 2. clean up appStagingDir when the Client exits or stops.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5663) Delete appStagingDir on local file system

2015-02-07 Thread Weizhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weizhong updated SPARK-5663:

Description: 
In YARN mode the Client creates appStagingDir on the file system, and the
AppMaster deletes this appStagingDir when it exits. If the file system is HDFS,
this works fine.

However, to run Spark on Tachyon a core-site.xml is created under
SPARK_HOME/conf, so core-site.xml is loaded from SPARK_HOME/conf/core-site.xml.
That file does not set fs.defaultFS, so the default file system resolves to the
local file system. As a result, in YARN mode the Client creates appStagingDir
on the local file system, and if the Client and the AppMaster are not on the
same node, the appStagingDir is never deleted.

To solve this issue, we can:
1. add an fs.defaultFS setting to SPARK_HOME/conf/core-site.xml so that the
resolved default file system is HDFS, or
2. clean up appStagingDir when the Client exits or stops.

  was:
In YARN mode the Client creates appStagingDir on the file system, and the
AppMaster deletes this appStagingDir when it exits. If the file system is HDFS,
this works fine.

However, to run Spark on Tachyon a core-site.xml is created under
${SPARK_HOME}/conf, so core-site.xml is loaded from
${SPARK_HOME}/conf/core-site.xml. That file does not set fs.defaultFS, so the
default file system resolves to the local file system. As a result, in YARN
mode the Client creates appStagingDir on the local file system, and if the
Client and the AppMaster are not on the same node, the appStagingDir is never
deleted.

To solve this issue, we can:
1. add an fs.defaultFS setting to ${SPARK_HOME}/conf/core-site.xml so that the
resolved default file system is HDFS, or
2. clean up appStagingDir when the Client exits or stops.


> Delete appStagingDir on local file system
> -
>
> Key: SPARK-5663
> URL: https://issues.apache.org/jira/browse/SPARK-5663
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Weizhong
>Priority: Minor
>
> In YARN mode the Client creates appStagingDir on the file system, and the 
> AppMaster deletes this appStagingDir when it exits. If the file system is HDFS, 
> this works fine.
> However, to run Spark on Tachyon a core-site.xml is created under 
> SPARK_HOME/conf, so core-site.xml is loaded from SPARK_HOME/conf/core-site.xml. 
> That file does not set fs.defaultFS, so the default file system resolves to the 
> local file system. As a result, in YARN mode the Client creates appStagingDir 
> on the local file system, and if the Client and the AppMaster are not on the 
> same node, the appStagingDir is never deleted.
> To solve this issue, we can:
> 1. add an fs.defaultFS setting to SPARK_HOME/conf/core-site.xml so that the 
> resolved default file system is HDFS, or
> 2. clean up appStagingDir when the Client exits or stops.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310643#comment-14310643
 ] 

Apache Spark commented on SPARK-4550:
-

User 'sryza' has created a pull request for this issue:
https://github.com/apache/spark/pull/4450

> In sort-based shuffle, store map outputs in serialized form
> ---
>
> Key: SPARK-4550
> URL: https://issues.apache.org/jira/browse/SPARK-4550
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Critical
> Attachments: SPARK-4550-design-v1.pdf
>
>
> One drawback with sort-based shuffle compared to hash-based shuffle is that 
> it ends up storing many more java objects in memory.  If Spark could store 
> map outputs in serialized form, it could
> * spill less often because the serialized form is more compact
> * reduce GC pressure
> This will only work when the serialized representations of objects are 
> independent from each other and occupy contiguous segments of memory.  E.g. 
> when Kryo reference tracking is left on, objects may contain pointers to 
> objects farther back in the stream, which means that the sort can't relocate 
> objects without corrupting them.
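> As a rough illustration of the idea (an illustrative sketch only, not the 
> design in the attached PDF): records can be appended to one byte buffer with 
> reference tracking disabled, and the sort can then reorder (offset, length) 
> pairs instead of Java objects.
> {code}
> import java.io.ByteArrayOutputStream
> import com.esotericsoftware.kryo.Kryo
> import com.esotericsoftware.kryo.io.Output
> import scala.collection.mutable.ArrayBuffer
>
> // Records are only relocatable if each one is self-contained, so Kryo
> // reference tracking must be off; otherwise a record may point back into
> // an earlier record in the stream.
> val kryo = new Kryo()
> kryo.setReferences(false)
>
> val buffer  = new ByteArrayOutputStream()
> val output  = new Output(buffer)
> val offsets = ArrayBuffer[Long]()
>
> def append(record: Any): Unit = {
>   offsets += output.total()              // start offset of this record
>   kryo.writeClassAndObject(output, record)
>   output.flush()
> }
>
> // A sort can now reorder the offsets (e.g. by partition id) and copy the
> // corresponding byte ranges, without deserializing the records.
> {code}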



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5664) Restore stty settings when exiting for launching spark-shell from SBT

2015-02-07 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-5664:
--

 Summary: Restore stty settings when exiting for launching 
spark-shell from SBT
 Key: SPARK-5664
 URL: https://issues.apache.org/jira/browse/SPARK-5664
 Project: Spark
  Issue Type: Bug
Reporter: Liang-Chi Hsieh






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5664) Restore stty settings when exiting for launching spark-shell from SBT

2015-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310678#comment-14310678
 ] 

Apache Spark commented on SPARK-5664:
-

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/4451

> Restore stty settings when exiting for launching spark-shell from SBT
> -
>
> Key: SPARK-5664
> URL: https://issues.apache.org/jira/browse/SPARK-5664
> Project: Spark
>  Issue Type: Bug
>Reporter: Liang-Chi Hsieh
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4897) Python 3 support

2015-02-07 Thread Jimmy C (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310715#comment-14310715
 ] 

Jimmy C commented on SPARK-4897:


I'm very interested in using Spark in my projects, but the lack of Python 3 
support unfortunately makes this very difficult. I hope this ticket can be 
prioritized.

This has recently been brought up on Reddit as well 
https://www.reddit.com/r/Python/comments/2uz513/is_it_possible_to_use_apache_spark_with_python_3/

> Python 3 support
> 
>
> Key: SPARK-4897
> URL: https://issues.apache.org/jira/browse/SPARK-4897
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Josh Rosen
>Priority: Minor
>
> It would be nice to have Python 3 support in PySpark, provided that we can do 
> it in a way that maintains backwards-compatibility with Python 2.6.
> I started looking into porting this; my WIP work can be found at 
> https://github.com/JoshRosen/spark/compare/python3
> I was able to use the 
> [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
> tool to handle the basic conversion of things like {{print}} statements, etc. 
> and had to manually fix up a few imports for packages that moved / were 
> renamed, but the major blocker that I hit was {{cloudpickle}}:
> {code}
> [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
> Python 3.4.2 (default, Oct 19 2014, 17:52:17)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/shell.py", line 28, 
> in 
> import pyspark
>   File "/Users/joshrosen/Documents/spark/python/pyspark/__init__.py", line 
> 41, in 
> from pyspark.context import SparkContext
>   File "/Users/joshrosen/Documents/spark/python/pyspark/context.py", line 26, 
> in 
> from pyspark import accumulators
>   File "/Users/joshrosen/Documents/spark/python/pyspark/accumulators.py", 
> line 97, in 
> from pyspark.cloudpickle import CloudPickler
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 
> 120, in 
> class CloudPickler(pickle.Pickler):
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 
> 122, in CloudPickler
> dispatch = pickle.Pickler.dispatch.copy()
> AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
> {code}
> This code looks like it will be difficult to port to Python 3, so this 
> might be a good reason to switch to 
> [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later

2015-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310753#comment-14310753
 ] 

Apache Spark commented on SPARK-4267:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4452

> Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later
> --
>
> Key: SPARK-4267
> URL: https://issues.apache.org/jira/browse/SPARK-4267
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: Tsuyoshi OZAWA
>Priority: Blocker
>
> Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 
> uses protobuf 2.5.0 so I compiled with protobuf 2.5.1 like this:
> {code}
>  ./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn 
> -Dhadoop.version=2.5.1 -Dprotobuf.version=2.5.0
> {code}
> Then Spark on YARN fails to launch jobs with NPE.
> {code}
> $ bin/spark-shell --master yarn-client
> scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line 
> => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a 
> + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
> java.lang.NullPointerException
>         at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
>         at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
>         at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
>         at $iwC$$iwC$$iwC$$iwC.(:13)
>         at $iwC$$iwC$$iwC.(:18)
>         at $iwC$$iwC.(:20)
>         at $iwC.(:22)
>         at (:24)
>         at .(:28)
>         at .()
>         at .(:7)
>         at .()
>         at $print()
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
>         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
>         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
>         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
>         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)

[jira] [Updated] (SPARK-5616) Add examples for PySpark API

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5616:
-
Priority: Minor  (was: Major)
Target Version/s:   (was: 1.3.0)
   Fix Version/s: (was: 1.3.0)

> Add examples for PySpark API
> 
>
> Key: SPARK-5616
> URL: https://issues.apache.org/jira/browse/SPARK-5616
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: dongxu
>Priority: Minor
>  Labels: examples, pyspark, python
>
> PySpark has fewer API examples than the Spark Scala API. For example:
> 1. Broadcast: how to use the broadcast operation API.
> 2. Modules: how to import another Python file that lives inside a zip file.
> Add more examples for newcomers who want to use PySpark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5408) MaxPermSize is ignored by ExecutorRunner and DriverRunner

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-5408.
--
   Resolution: Fixed
Fix Version/s: (was: 1.2.1)
   (was: 1.3.0)
   1.4.0

Issue resolved by pull request 4203
[https://github.com/apache/spark/pull/4203]

> MaxPermSize is ignored by ExecutorRunner and DriverRunner
> -
>
> Key: SPARK-5408
> URL: https://issues.apache.org/jira/browse/SPARK-5408
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Jacek Lewandowski
> Fix For: 1.4.0
>
>
> ExecutorRunner and DriverRunner use CommandUtils to build the command that 
> runs the executor or driver. The problem is that it has a hardcoded 
> {{-XX:MaxPermSize=128m}} and uses it regardless of whether it is specified in 
> extraJavaOpts or not. 
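> A sketch of the expected behaviour ({{buildJavaOpts}} here is a made-up 
> helper for illustration, not the real CommandUtils code):
> {code}
> // Only fall back to the hard-coded default when the user has not already
> // set MaxPermSize via extraJavaOpts.
> def buildJavaOpts(extraJavaOpts: Seq[String]): Seq[String] = {
>   val defaultPermGen =
>     if (extraJavaOpts.exists(_.contains("-XX:MaxPermSize="))) Nil
>     else Seq("-XX:MaxPermSize=128m")
>   extraJavaOpts ++ defaultPermGen
> }
>
> buildJavaOpts(Seq("-XX:MaxPermSize=256m"))  // keeps the user's 256m, no 128m appended
> {code}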



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5408) MaxPermSize is ignored by ExecutorRunner and DriverRunner

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5408:
-
Priority: Minor  (was: Major)
Assignee: Jacek Lewandowski

> MaxPermSize is ignored by ExecutorRunner and DriverRunner
> -
>
> Key: SPARK-5408
> URL: https://issues.apache.org/jira/browse/SPARK-5408
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Minor
> Fix For: 1.4.0
>
>
> ExecutorRunner and DriverRunner use CommandUtils to build the command that 
> runs the executor or driver. The problem is that it has a hardcoded 
> {{-XX:MaxPermSize=128m}} and uses it regardless of whether it is specified in 
> extraJavaOpts or not. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5665) Update netlib-java documentation

2015-02-07 Thread Sean Owen (JIRA)
Sean Owen created SPARK-5665:


 Summary: Update netlib-java documentation
 Key: SPARK-5665
 URL: https://issues.apache.org/jira/browse/SPARK-5665
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.2.0
Reporter: Sean Owen
Priority: Minor


Sam Halliday has suggested some updates to the documentation of netlib-java: 
https://github.com/apache/spark/pull/4448  I opened this JIRA to track it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5665) Update netlib-java documentation

2015-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310791#comment-14310791
 ] 

Apache Spark commented on SPARK-5665:
-

User 'fommil' has created a pull request for this issue:
https://github.com/apache/spark/pull/4448

> Update netlib-java documentation
> 
>
> Key: SPARK-5665
> URL: https://issues.apache.org/jira/browse/SPARK-5665
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: Sean Owen
>Priority: Minor
>
> Sam Halliday has suggested some updates to the documentation of netlib-java: 
> https://github.com/apache/spark/pull/4448  I opened this JIRA to track it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-603) add simple Counter API

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-603:

Component/s: Spark Core

> add simple Counter API
> --
>
> Key: SPARK-603
> URL: https://issues.apache.org/jira/browse/SPARK-603
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Priority: Minor
>
> Users need a very simple way to create counters in their jobs.  Accumulators 
> provide a way to do this, but are a little clunky, for two reasons:
> 1) the setup is a nuisance
> 2) w/ delayed evaluation, you don't know when it will actually run, so it's 
> hard to look at the values.
> Consider this code:
> {code}
> def filterBogus(rdd:RDD[MyCustomClass], sc: SparkContext) = {
>   val filterCount = sc.accumulator(0)
>   val filtered = rdd.filter{r =>
> if (isOK(r)) true else {filterCount += 1; false}
>   }
>   println("removed " + filterCount.value + " records)
>   filtered
> }
> {code}
> The println will always say 0 records were filtered, because it's printed 
> before anything has actually run.  I could print out the value later on, but 
> note that it would destroy the modularity of the method -- kinda ugly to 
> return the accumulator just so that it can get printed later on.  (and of 
> course, the caller in turn might not know when the filter is going to get 
> applied, and would have to pass the accumulator up even further ...)
> I'd like to have Counters which just automatically get printed out whenever a 
> stage has been run, and also with some api to get them back.  I realize this 
> is tricky b/c a stage can get re-computed, so maybe you should only increment 
> the counters once.
> Maybe a more general way to do this is to provide some callback for whenever 
> an RDD is computed -- by default, you would just print the counters, but the 
> user could replace w/ a custom handler.
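> For reference, a sketch of the current workaround (it keeps the example above 
> working, but as noted it leaks the accumulator out of the method):
> {code}
> import org.apache.spark.{Accumulator, SparkContext}
> import org.apache.spark.rdd.RDD
>
> // Return the accumulator along with the RDD and read it only after an
> // action has forced the filter to run.
> def filterBogus(rdd: RDD[MyCustomClass], sc: SparkContext): (RDD[MyCustomClass], Accumulator[Int]) = {
>   val filterCount = sc.accumulator(0)
>   val filtered = rdd.filter { r =>
>     if (isOK(r)) true else { filterCount += 1; false }
>   }
>   (filtered, filterCount)
> }
>
> val (filtered, filterCount) = filterBogus(rdd, sc)
> filtered.count()                                      // action: the filter actually runs here
> println("removed " + filterCount.value + " records")  // the value is meaningful now
> {code}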



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-573) Clarify semantics of the parallelized closures

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-573:

Component/s: Spark Core

> Clarify semantics of the parallelized closures
> --
>
> Key: SPARK-573
> URL: https://issues.apache.org/jira/browse/SPARK-573
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: tjhunter
>
> I do not think there is any guideline about which features of scala are 
> allowed/forbidden in the closure that gets sent to the remote nodes. Two 
> examples I have are a return statement and updating mutable variables of 
> singletons.
> Ideally, a compiler plugin could give an error at compile time, but a good 
> error message at run time would be good also.
> Are there any other cases that should not be allowed?
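> Two concrete examples of the unclear cases (illustrative sketches only):
> {code}
> import org.apache.spark.rdd.RDD
>
> // 1. A `return` inside a closure is compiled into a NonLocalReturnControl
> //    exception; thrown on an executor, it cannot unwind the driver-side
> //    method, so it does not behave the way the author expects.
> def firstBig(rdd: RDD[Int]): Int = {
>   rdd.foreach { x =>
>     if (x > 100) return x
>   }
>   -1
> }
>
> // 2. Mutating a singleton from a closure updates the copy loaded in each
> //    executor's JVM; the driver-side object never sees the change.
> object Stats { var count = 0 }
> def countAll(rdd: RDD[Int]): Unit = {
>   rdd.foreach { _ => Stats.count += 1 }  // Stats.count on the driver stays 0
> }
> {code}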



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-634) Track and display a read count for each block replica in BlockManager

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-634:

Component/s: Block Manager

> Track and display a read count for each block replica in BlockManager
> -
>
> Key: SPARK-634
> URL: https://issues.apache.org/jira/browse/SPARK-634
> Project: Spark
>  Issue Type: New Feature
>  Components: Block Manager
>Reporter: Reynold Xin
>Assignee: Patrick Cogan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-516) Improve error reporting when slaves fail to start

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-516:

Component/s: Spark Core

> Improve error reporting when slaves fail to start
> -
>
> Key: SPARK-516
> URL: https://issues.apache.org/jira/browse/SPARK-516
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Reynold Xin
>
> Currently Spark just hangs waiting for resources and slaves to respond. This 
> behavior is very confusing to users, especially first time users.
> If an error message is generated, it should be propagated back to the master 
> so the user is aware of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-625) Client hangs when connecting to standalone cluster using wrong address

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-625:

Component/s: Spark Core

> Client hangs when connecting to standalone cluster using wrong address
> --
>
> Key: SPARK-625
> URL: https://issues.apache.org/jira/browse/SPARK-625
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.7.0, 0.7.1, 0.8.0
>Reporter: Josh Rosen
>Priority: Minor
>
> I launched a standalone cluster on my laptop, connecting the workers to the 
> master using my machine's public IP address (128.32.*.*:7077).  If I try to 
> connect spark-shell to the master using "spark://0.0.0.0:7077", it 
> successfully brings up a Scala prompt but hangs when I try to run a job.
> From the standalone master's log, it looks like the client's messages are 
> being dropped without the client discovering that the connection has failed:
> {code}
> 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message 
> RegisterJob(JobDescription(Spark shell)) for non-local recipient 
> akka://spark@0.0.0.0:7077/user/Master at akka://spark@128.32.*.*:7077 local 
> is akka://spark@128.32.*.*:7077
> 12/11/27 14:00:52 ERROR NettyRemoteTransport(null): dropping message 
> DaemonMsgWatch(Actor[akka://spark@128.32.*.*:57518/user/$a],Actor[akka://spark@0.0.0.0:7077/user/Master])
>  for non-local recipient akka://spark@0.0.0.0:7077/remote at 
> akka://spark@128.32.*.*:7077 local is akka://spark@128.32.*.*:7077
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-665) Create RPM packages for Spark

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-665:

Component/s: Build

> Create RPM packages for Spark
> -
>
> Key: SPARK-665
> URL: https://issues.apache.org/jira/browse/SPARK-665
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Matei Zaharia
>
> This could be doable with the JRPM Maven plugin, similar to how we make 
> Debian packages now, but I haven't looked into it. The plugin is described at 
> http://jrpm.sourceforge.net.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3603) InvalidClassException on a Linux VM - probably problem with serialization

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3603:
-
Component/s: Deploy

> InvalidClassException on a Linux VM - probably problem with serialization
> -
>
> Key: SPARK-3603
> URL: https://issues.apache.org/jira/browse/SPARK-3603
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0, 1.1.0
> Environment: Linux version 2.6.32-358.32.3.el6.x86_64 
> (mockbu...@x86-029.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
> Hat 4.4.7-3) (GCC) ) #1 SMP Fri Jan 17 08:42:31 EST 2014
> java version "1.7.0_25"
> OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64)
> OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
> Spark (either 1.0.0 or 1.1.0)
>Reporter: Tomasz Dudziak
>Priority: Critical
>  Labels: scala, serialization, spark
>
> I have a Scala app connecting to a standalone Spark cluster. It works fine on 
> Windows or on a Linux VM; however, when I try to run the app and the Spark 
> cluster on another Linux VM (the same Linux kernel, Java and Spark - tested 
> for versions 1.0.0 and 1.1.0) I get the below exception. This looks kind of 
> similar to the Big-Endian (IBM Power7) Spark Serialization issue 
> (SPARK-2018), but... my system is definitely little endian and I understand 
> the big endian issue should already be fixed in Spark 1.1.0 anyway. I'd 
> appreciate your help.
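> As a quick diagnostic (just a sketch, not part of the report itself): printing 
> the serialVersionUID each JVM computes for the class named in the exception 
> below shows whether the two sides are loading different scala-library builds.
> {code}
> import java.io.ObjectStreamClass
>
> // Run this on the machine that submits the app and on a worker; if the two
> // printed values differ, the JVMs are loading different scala-library jars.
> val cls = Class.forName("scala.reflect.ClassTag$$anon$1")
> println("local serialVersionUID = " + ObjectStreamClass.lookup(cls).getSerialVersionUID)
> {code}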
> 01:34:53.251 WARN  [Result resolver thread-0][TaskSetManager] Lost TID 2 
> (task 1.0:2)
> 01:34:53.278 WARN  [Result resolver thread-0][TaskSetManager] Loss was due to 
> java.io.InvalidClassException
> java.io.InvalidClassException: scala.reflect.ClassTag$$anon$1; local class 
> incompatible: stream classdesc serialVersionUID = -4937928798201944954, local 
> class serialVersionUID = -8102093212602380348
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
> at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
> at 
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(N

[jira] [Commented] (SPARK-5531) Spark download .tgz file does not get unpacked

2015-02-07 Thread DeepakVohra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310796#comment-14310796
 ] 

DeepakVohra commented on SPARK-5531:


Earlier, the following link was listed under Direct Download:
http://www.apache.org/dyn/closer.cgi/spark/spark-1.2.0/spark-1.2.0-bin-cdh4.tgz

It seems to have been updated to the tgz file link, which is fine. 

> Spark download .tgz file does not get unpacked
> --
>
> Key: SPARK-5531
> URL: https://issues.apache.org/jira/browse/SPARK-5531
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.2.0
> Environment: Linux
>Reporter: DeepakVohra
>
> The spark-1.2.0-bin-cdh4.tgz file downloaded from 
> http://spark.apache.org/downloads.html does not get unpacked.
> tar xvf spark-1.2.0-bin-cdh4.tgz
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4122) Add library to write data back to Kafka

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4122:
-
Component/s: Streaming

> Add library to write data back to Kafka
> ---
>
> Key: SPARK-4122
> URL: https://issues.apache.org/jira/browse/SPARK-4122
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4503) The history server is not compatible with HDFS HA

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4503:
-
Component/s: Deploy

> The history server is not compatible with HDFS HA
> -
>
> Key: SPARK-4503
> URL: https://issues.apache.org/jira/browse/SPARK-4503
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.1.0
>Reporter: MarsXu
>Priority: Minor
>
>   I use an HDFS HA cluster to store the history server data.
>   The event log can be written to HDFS, but the history server cannot be started.
>   
>   Error log when executing "sbin/start-history-server.sh":
> {quote}
> 
> 14/11/20 10:25:04 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(root, ); users 
> with modify permissions: Set(root, )
> 14/11/20 10:25:04 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:187)
> at 
> org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> appcluster
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> 
> {quote}
> When I set SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://s161.zw.db.d:53310/spark_history"
> in spark-env.sh, it can start, but there is no high availability.
> Environment
> {quote}
> spark-1.1.0-bin-hadoop2.4
> hadoop-2.5.1
> zookeeper-3.4.6
> {quote}
>   The config file is as follows:
> {quote}
> !### spark-defaults.conf ###
> spark.eventLog.dir  hdfs://appcluster/history_server/
> spark.yarn.historyServer.address  s161.zw.db.d:18080
> !### spark-env.sh ###
> export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://appcluster/history_server"
> !### core-site.xml ###
> <property>
>   <name>fs.defaultFS</name>
>   <value>hdfs://appcluster</value>
> </property>
> !### hdfs-site.xml ###
> <property>
>   <name>dfs.nameservices</name>
>   <value>appcluster</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.appcluster</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.appcluster.nn1</name>
>   <value>s161.zw.db.d:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.appcluster.nn2</name>
>   <value>s162.zw.db.d:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.servicerpc-address.appcluster.nn1</name>
>   <value>s161.zw.db.d:53310</value>
> </property>
> <property>
>   <name>dfs.namenode.servicerpc-address.appcluster.nn2</name>
>   <value>s162.zw.db.d:53310</value>
> </property>
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4206) BlockManager warnings in local mode: "Block $blockId already exists on this machine; not re-adding it

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4206:
-
Component/s: Block Manager

> BlockManager warnings in local mode: "Block $blockId already exists on this 
> machine; not re-adding it
> -
>
> Key: SPARK-4206
> URL: https://issues.apache.org/jira/browse/SPARK-4206
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
> Environment: local mode, branch-1.1 & master
>Reporter: Imran Rashid
>Priority: Minor
>
> When running in local mode, you often get log warning messages like:
> WARN storage.BlockManager: Block input-0-1415022975000 already exists on this 
> machine; not re-adding it
> (eg., try running the TwitterPopularTags example in local mode)
> I think these warning messages are pretty unsettling for a new user, and 
> should be removed.  If they are truly innocuous, they should be changed to 
> logInfo, or maybe even logDebug.  Or if they might actually indicate a 
> problem, we should find the root cause and fix it.
> I *think* the problem is caused by a replication level > 1 when running in 
> local mode.  In BlockManager.doPut, first the block is put locally:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L692
> and then if the replication level > 1, a request is sent out to replicate the 
> block:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L827
> However, in local mode, there isn't anywhere else to replicate the block; the 
> request comes back to the same node, which then issues the warning that the 
> block has already been added.
> If that analysis is right, the easy fix would be to make sure 
> replicationLevel = 1 in local mode.  But it's a little disturbing that a 
> replication request could result in an attempt to replicate on the same node 
> -- and that if something is wrong, we only issue a warning and keep going.
> If this is really the culprit, then it might be worth taking a closer look at 
> the logic of replication.
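> A sketch of what the "easy fix" could look like (illustrative only, not the 
> actual BlockManager code):
> {code}
> import org.apache.spark.storage.StorageLevel
>
> // Never ask for more than one replica when running in local mode, since
> // there is no other node to replicate to.
> def effectiveStorageLevel(requested: StorageLevel, isLocalMode: Boolean): StorageLevel = {
>   if (isLocalMode && requested.replication > 1) {
>     StorageLevel(requested.useDisk, requested.useMemory, requested.deserialized, 1)
>   } else {
>     requested
>   }
> }
> {code}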



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4566) Multiple --py-files command line options to spark-submit replace instead of adding to previous options

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4566:
-
Component/s: Spark Submit

> Multiple --py-files command line options to spark-submit replace instead of 
> adding to previous options
> --
>
> Key: SPARK-4566
> URL: https://issues.apache.org/jira/browse/SPARK-4566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Reporter: Phil Roth
>Priority: Minor
>
> If multiple --py-files are specified to spark-submit, previous lists of files 
> are replaced instead of added to. This is certainly a minor issue, but it 
> cost me a lot of debugging time.
> If people want the current behavior to stay the same, I would suggest 
> updating the help messages to highlight that the suggested usage is one 
> option with a comma separated list of files.
> If people want this behavior updated, I'd love to submit a pull request in 
> the next day or two. I think it would be a perfect small task to get me 
> started as a contributor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4820) Spark build encounters "File name too long" on some encrypted filesystems

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4820:
-
Component/s: Build

> Spark build encounters "File name too long" on some encrypted filesystems
> -
>
> Key: SPARK-4820
> URL: https://issues.apache.org/jira/browse/SPARK-4820
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Patrick Wendell
>
> This was reported by Luchesar Cekov on github along with a proposed fix. The 
> fix has some potential downstream issues (it will modify the classnames) so 
> until we understand better how many users are affected we aren't going to 
> merge it. However, I'd like to include the issue and workaround here. If you 
> encounter this issue please comment on the JIRA so we can assess the 
> frequency.
> The issue produces this error:
> {code}
> [error] == Expanded type of tree ==
> [error] 
> [error] ConstantType(value = Constant(Throwable))
> [error] 
> [error] uncaught exception during compilation: java.io.IOException
> [error] File name too long
> [error] two errors found
> {code}
> The workaround in Maven is to add, under the compile options: 
> {code}
> +  -Xmax-classfile-name
> +  128
> {code}
> In SBT add:
> {code}
> +scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"),
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5326) Show fetch wait time as optional metric in the UI

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5326:
-
Component/s: Web UI

> Show fetch wait time as optional metric in the UI
> -
>
> Key: SPARK-5326
> URL: https://issues.apache.org/jira/browse/SPARK-5326
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
>Priority: Minor
>
> Time blocked waiting on shuffle read time can be a cause of slow jobs.  We 
> currently store this information but don't show it in the UI; we should add 
> it to the UI as an optional additional metric.
> cc [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5533) Replace explicit dependency on org.codehaus.jackson

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5533:
-
Component/s: Build

> Replace explicit dependency on org.codehaus.jackson
> ---
>
> Key: SPARK-5533
> URL: https://issues.apache.org/jira/browse/SPARK-5533
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>
> We should use the newer com.fasterxml.jackson, which we currently also 
> include and use as a dependency from Tachyon. Instead of having both versions 
> magically work, we should clean up the dependency structure to make sure we 
> only use one version of Jackson.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-636:

Component/s: Spark Core

> Add mechanism to run system management/configuration tasks on all workers
> -
>
> Key: SPARK-636
> URL: https://issues.apache.org/jira/browse/SPARK-636
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Josh Rosen
>
> It would be useful to have a mechanism to run a task on all workers in order 
> to perform system management tasks, such as purging caches or changing system 
> properties.  This is useful for automated experiments and benchmarking; I 
> don't envision this being used for heavy computation.
> Right now, I can mimic this with something like
> {code}
> sc.parallelize(0 until numMachines, numMachines).foreach { } 
> {code}
> but this does not guarantee that every worker runs a task and requires my 
> user code to know the number of workers.
> One sample use case is setup and teardown for benchmark tests.  For example, 
> I might want to drop cached RDDs, purge shuffle data, and call 
> {{System.gc()}} between test runs.  It makes sense to incorporate some of 
> this functionality, such as dropping cached RDDs, into Spark itself, but it 
> might be helpful to have a general mechanism for running ad-hoc tasks like 
> {{System.gc()}}.
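> A sketch of that workaround wrapped as a helper (the same caveat applies: 
> Spark gives no guarantee that every worker actually runs one of the tasks):
> {code}
> import org.apache.spark.SparkContext
>
> def runOnWorkers(sc: SparkContext, numMachines: Int)(body: () => Unit): Unit = {
>   // One task per machine, relying on the scheduler to spread them out.
>   sc.parallelize(0 until numMachines, numMachines).foreach(_ => body())
> }
>
> // e.g. teardown between benchmark runs:
> // runOnWorkers(sc, numMachines)(() => System.gc())
> {code}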



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2610) When spark.serializer is set as org.apache.spark.serializer.KryoSerializer, importing a method causes multiple spark applications creations

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2610:
-
Component/s: Spark Shell

> When spark.serializer is set as org.apache.spark.serializer.KryoSerializer, 
> importing a method causes multiple spark applications creations  
> -
>
> Key: SPARK-2610
> URL: https://issues.apache.org/jira/browse/SPARK-2610
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.0.1
>Reporter: Yin Huai
>Priority: Minor
>
> To reproduce, set
> {code}
> spark.serializerorg.apache.spark.serializer.KryoSerializer
> {code}
> in conf/spark-defaults.conf and launch a spark shell.
> Then, execute
> {code}
> class X() { println("What!"); def y = 3 }
> val x = new X
> import x.y
> case class Person(name: String, age: Int)
> val serializer = org.apache.spark.serializer.Serializer.getSerializer(null)
> val kryoSerializer = serializer.newInstance
> val value = kryoSerializer.serialize(Person("abc", 1))
> kryoSerializer.deserialize(value): Person
> // Once you execute this line, you will see ...
> // What!
> // What!
> // res1: Person = Person(abc,1)
> {code}
> Basically, importing a method of a class causes the constructor of that class 
> to be called twice.
> It affects our branch 1.0 and master.
> For the master, you can use 
> {code}
> val serializer = org.apache.spark.serializer.Serializer.getSerializer(None)
> {code}
> to get the serializer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3133) Piggyback get location RPC call to fetch small blocks

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3133:
-
Component/s: Block Manager

> Piggyback get location RPC call to fetch small blocks
> -
>
> Key: SPARK-3133
> URL: https://issues.apache.org/jira/browse/SPARK-3133
> Project: Spark
>  Issue Type: Sub-task
>  Components: Block Manager
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> We should add a new API to the BlockManagerMasterActor to get location or the 
> data block directly if the data block is small.
> This effectively makes TorrentBroadcast behaves similarly to HttpBroadcast 
> for small blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3210) Flume Polling Receiver must be more tolerant to connection failures.

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3210:
-
Component/s: Streaming

> Flume Polling Receiver must be more tolerant to connection failures.
> 
>
> Key: SPARK-3210
> URL: https://issues.apache.org/jira/browse/SPARK-3210
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Hari Shreedharan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3153) shuffle will run out of space when disks have different free space

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3153:
-
Component/s: Shuffle

> shuffle will run out of space when disks have different free space
> --
>
> Key: SPARK-3153
> URL: https://issues.apache.org/jira/browse/SPARK-3153
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Reporter: Davies Liu
>
> If we have several disks in SPARK_LOCAL_DIRS and one of them is much smaller 
> than the others (maybe added by mistake, or a special disk such as an SSD), 
> then the shuffle will run out of space on that smaller disk.
> PySpark also has this issue during spilling.
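> One possible direction, sketched here only as an illustration (this is not 
> what Spark currently does): pick a local dir with probability proportional to 
> its remaining free space, so the small disk fills more slowly than the large ones.
> {code}
> import java.io.File
> import scala.util.Random
>
> def pickLocalDir(dirs: Seq[File], rnd: Random = new Random): File = {
>   val free  = dirs.map(_.getUsableSpace.toDouble)
>   val total = free.sum
>   if (total <= 0) return dirs(rnd.nextInt(dirs.length))
>   var target = rnd.nextDouble() * total
>   for ((dir, space) <- dirs.zip(free)) {
>     target -= space
>     if (target <= 0) return dir
>   }
>   dirs.last
> }
> {code}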



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5531) Spark download .tgz file does not get unpacked

2015-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310811#comment-14310811
 ] 

Sean Owen commented on SPARK-5531:
--

No, the page didn't change: 
http://svn.apache.org/viewvc/spark/site/downloads.html?view=log
You may not have JavaScript enabled, or you may be mistaken about the download 
link you copied.

> Spark download .tgz file does not get unpacked
> --
>
> Key: SPARK-5531
> URL: https://issues.apache.org/jira/browse/SPARK-5531
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.2.0
> Environment: Linux
>Reporter: DeepakVohra
>
> The spark-1.2.0-bin-cdh4.tgz file downloaded from 
> http://spark.apache.org/downloads.html does not get unpacked.
> tar xvf spark-1.2.0-bin-cdh4.tgz
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3134) Update block locations asynchronously in TorrentBroadcast

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3134:
-
Component/s: Block Manager

> Update block locations asynchronously in TorrentBroadcast
> -
>
> Key: SPARK-3134
> URL: https://issues.apache.org/jira/browse/SPARK-3134
> Project: Spark
>  Issue Type: Sub-task
>  Components: Block Manager
>Reporter: Reynold Xin
>
> Once the TorrentBroadcast gets the data blocks, it needs to tell the master 
> the new location. We should make the location update non-blocking to reduce 
> roundtrips we need to launch tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1910) Add onBlockComplete API to receiver

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1910:
-
Component/s: Block Manager

> Add onBlockComplete API to receiver
> ---
>
> Key: SPARK-1910
> URL: https://issues.apache.org/jira/browse/SPARK-1910
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Reporter: Hari Shreedharan
>
> This can allow the receiver to ACK all data that has already been 
> successfully stored by the block generator. This means the receiver's store 
> methods must now receive the block Id, so the receiver can recognize which 
> events are the ones that have been stored



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1799) Add init script to the debian packaging

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1799:
-
Component/s: Deploy
 Build

> Add init script to the debian packaging
> ---
>
> Key: SPARK-1799
> URL: https://issues.apache.org/jira/browse/SPARK-1799
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, Deploy
>Reporter: Nicolas Lalevée
>
> See https://github.com/apache/spark/pull/733



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2018) Big-Endian (IBM Power7) Spark Serialization issue

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2018:
-
Component/s: Deploy

> Big-Endian (IBM Power7)  Spark Serialization issue
> --
>
> Key: SPARK-2018
> URL: https://issues.apache.org/jira/browse/SPARK-2018
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0
> Environment: hardware : IBM Power7
> OS:Linux version 2.6.32-358.el6.ppc64 
> (mockbu...@ppc-017.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
> Hat 4.4.7-3) (GCC) ) #1 SMP Tue Jan 29 11:43:27 EST 2013
> JDK: Java(TM) SE Runtime Environment (build pxp6470sr5-20130619_01(SR5))
> IBM J9 VM (build 2.6, JRE 1.7.0 Linux ppc64-64 Compressed References 
> 20130617_152572 (JIT enabled, AOT enabled)
> Hadoop:Hadoop-0.2.3-CDH5.0
> Spark:Spark-1.0.0 or Spark-0.9.1
> spark-env.sh:
> export JAVA_HOME=/opt/ibm/java-ppc64-70/
> export SPARK_MASTER_IP=9.114.34.69
> export SPARK_WORKER_MEMORY=1m
> export SPARK_CLASSPATH=/home/test1/spark-1.0.0-bin-hadoop2/lib
> export  STANDALONE_SPARK_MASTER_HOST=9.114.34.69
> #export SPARK_JAVA_OPTS=' -Xdebug 
> -Xrunjdwp:transport=dt_socket,address=9,server=y,suspend=n '
>Reporter: Yanjie Gao
>
> We have an application running on Spark on a Power7 system,
> but we hit an important issue with serialization.
> The example HdfsWordCount reproduces the problem.
> ./bin/run-example  org.apache.spark.examples.streaming.HdfsWordCount 
> localdir
> We used Power7 (Big-Endian arch) and Redhat  6.4.
> Big-Endian  is the main cause since the example ran successfully in another 
> Power-based Little Endian setup.
> here is the exception stack and log:
> Spark Executor Command: "/opt/ibm/java-ppc64-70//bin/java" "-cp" 
> "/home/test1/spark-1.0.0-bin-hadoop2/lib::/home/test1/src/spark-1.0.0-bin-hadoop2/conf:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/test1/src/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/:/home/test1/src/hadoop-2.3.0-cdh5.0.0/etc/hadoop/"
>  "-XX:MaxPermSize=128m"  "-Xdebug" 
> "-Xrunjdwp:transport=dt_socket,address=9,server=y,suspend=n" "-Xms512M" 
> "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" 
> "akka.tcp://spark@9.186.105.141:60253/user/CoarseGrainedScheduler" "2" 
> "p7hvs7br16" "4" "akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker" 
> "app-20140604023054-"
> 
> 14/06/04 02:31:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 14/06/04 02:31:21 INFO spark.SecurityManager: Changing view acls to: 
> test1,yifeng
> 14/06/04 02:31:21 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(test1, yifeng)
> 14/06/04 02:31:22 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 14/06/04 02:31:22 INFO Remoting: Starting remoting
> 14/06/04 02:31:22 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkExecutor@p7hvs7br16:39658]
> 14/06/04 02:31:22 INFO Remoting: Remoting now listens on addresses: 
> [akka.tcp://sparkExecutor@p7hvs7br16:39658]
> 14/06/04 02:31:22 INFO executor.CoarseGrainedExecutorBackend: Connecting to 
> driver: akka.tcp://spark@9.186.105.141:60253/user/CoarseGrainedScheduler
> 14/06/04 02:31:22 INFO worker.WorkerWatcher: Connecting to worker 
> akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker
> 14/06/04 02:31:23 INFO worker.WorkerWatcher: Successfully connected to 
> akka.tcp://sparkWorker@p7hvs7br16:59240/user/Worker
> 14/06/04 02:31:24 INFO executor.CoarseGrainedExecutorBackend: Successfully 
> registered with driver
> 14/06/04 02:31:24 INFO spark.SecurityManager: Changing view acls to: 
> test1,yifeng
> 14/06/04 02:31:24 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(test1, yifeng)
> 14/06/04 02:31:24 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 14/06/04 02:31:24 INFO Remoting: Starting remoting
> 14/06/04 02:31:24 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://spark@p7hvs7br16:58990]
> 14/06/04 02:31:24 INFO Remoting: Remoting now listens on addresses: 
> [akka.tcp://spark@p7hvs7br16:58990]
> 14/06/04 02:31:24 INFO spark.SparkEnv: Connecting to MapOutputTracker: 
> akka.tcp://spark@9.186.105.141:60253/user/MapOutputTracker
> 14/06/04 02:31:25 INFO spark.SparkEnv: Connecting to BlockManagerMaster: 
> akka.tcp://spark@9.186.105.141:60253/user/BlockManagerMaster
> 14/06/04 02:31:25 INFO storage.DiskBlockMa

[jira] [Resolved] (SPARK-1742) Profiler for Spark

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1742.
--
Resolution: Not a Problem

It's since been documented how to use standard profiling tools with Spark:

https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit

> Profiler for Spark
> --
>
> Key: SPARK-1742
> URL: https://issues.apache.org/jira/browse/SPARK-1742
> Project: Spark
>  Issue Type: Wish
>Reporter: Kousuke Saruta
>
> Sometimes I wish there were a profiler for Spark jobs.
> It would be useful for finding the critical path of the DAG or the 
> bottlenecked stages / transformations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1980) problems introduced by broadcast

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1980.
--
  Resolution: Invalid
Target Version/s:   (was: 1.0.0)

> problems introduced by broadcast
> 
>
> Key: SPARK-1980
> URL: https://issues.apache.org/jira/browse/SPARK-1980
> Project: Spark
>  Issue Type: Bug
>Reporter: zhoudi
>
> I am writing a word-embedding job on Spark. The scale of the model is about 
> 600,000 * 100 * Float.size. Because of the large scale, I have to use a 
> broadcast variable to deliver the current model to executors. After each 
> iteration, I update the model and then broadcast it again. The pseudo-code is 
> as follows:
> for (i <- 0 to 100) {
>   broadcast_model <- broadcast(model)
>   e_model = xxx.map(Func(broadcast_model))  // hand broadcast_model to Func
>              .reduce(_ + _)
>   model <- model + e_model                  // update the model
> }
> My problem is that an error appears after six iterations. The error info is 
> as follows:
> ./bin/spark-submit: line 44: 28232 killed
> $SPARK_HOME/bin/spark-class org.apache.spark.deploy.Spark "${ORIG_ARGS[@]}"
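For what it's worth, the re-broadcast loop above keeps every old broadcast alive unless
it is explicitly released, which can exhaust driver or executor memory after enough
iterations. A minimal sketch of the same loop with the previous broadcast unpersisted
each round (the data path, model size, and update functions are stand-ins for the
user's code):

{code}
import org.apache.spark.{SparkConf, SparkContext}

object IterativeBroadcast {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("iterative-broadcast"))
    val data = sc.textFile("hdfs:///training/data").cache()   // stand-in input path

    var model = Array.fill(1000)(0.0f)            // stand-in for the 600k x 100 model
    for (i <- 0 until 100) {
      val bcModel = sc.broadcast(model)           // ship the current model to executors
      val update = data
        .map(line => computeUpdate(line, bcModel.value))
        .reduce(merge)
      bcModel.unpersist(blocking = true)          // release the old copy before the next round
      model = applyUpdate(model, update)
    }
    sc.stop()
  }

  // Stand-ins for the user's actual update logic.
  def computeUpdate(line: String, model: Array[Float]): Array[Float] = model
  def merge(a: Array[Float], b: Array[Float]): Array[Float] = a
  def applyUpdate(m: Array[Float], u: Array[Float]): Array[Float] = u
}
{code}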



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2016) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2016:
-
Component/s: Web UI

> rdd in-memory storage UI becomes unresponsive when the number of RDD 
> partitions is large
> 
>
> Key: SPARK-2016
> URL: https://issues.apache.org/jira/browse/SPARK-2016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Reporter: Reynold Xin
>  Labels: starter
>
> Try run
> {code}
> sc.parallelize(1 to 100, 100).cache().count()
> {code}
> And open the storage UI for this RDD. It takes forever to load the page.
> When the number of partitions is very large, I think there are a few 
> alternatives:
> 0. Only show the top 1000.
> 1. Pagination
> 2. Instead of grouping by RDD blocks, group by executors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2690) Make unidoc part of our test process

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2690:
-
Component/s: Documentation
 Build

> Make unidoc part of our test process
> 
>
> Key: SPARK-2690
> URL: https://issues.apache.org/jira/browse/SPARK-2690
> Project: Spark
>  Issue Type: Test
>  Components: Build, Documentation
>Reporter: Yin Huai
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3511) Create a RELEASE-NOTES.txt file in the repo

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3511:
-
Component/s: Project Infra

> Create a RELEASE-NOTES.txt file in the repo
> ---
>
> Key: SPARK-3511
> URL: https://issues.apache.org/jira/browse/SPARK-3511
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Blocker
>
> There are a few different things we need to do a better job of tracking. This 
> file would allow us to track things:
> 1. When we want to give credit to secondary people for contributing to a patch
> 2. Changes to default configuration values w/ how to restore legacy options
> 3. New features that are disabled by default
> 4. Known API breaks (if any) along w/ explanation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-911) Support map pruning on sorted (K, V) RDD's

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-911:

Component/s: Spark Core

> Support map pruning on sorted (K, V) RDD's
> --
>
> Key: SPARK-911
> URL: https://issues.apache.org/jira/browse/SPARK-911
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>
> If someone has sorted a (K, V) rdd, we should offer them a way to filter a 
> range of the partitions that employs map pruning. This would be simple using 
> a small range index within the rdd itself. A good example: I sort my dataset 
> by time and then want to serve queries that are restricted to a certain time 
> range.
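A rough sketch of what such range pruning could look like, assuming the caller keeps a
small per-partition index of (min, max) key bounds after sorting. The index-building and
query code is illustrative, but PartitionPruningRDD is the existing developer API used
here to skip whole partitions:

{code}
import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

object RangePruning {
  // Build a tiny index: for each partition, the min and max key it contains.
  def buildIndex(sorted: RDD[(Long, String)]): Map[Int, (Long, Long)] =
    sorted.mapPartitionsWithIndex { (idx, it) =>
      if (it.isEmpty) Iterator.empty
      else {
        val keys = it.map(_._1)
        val first = keys.next()
        var min = first
        var max = first
        keys.foreach { k =>
          if (k < min) min = k
          if (k > max) max = k
        }
        Iterator((idx, (min, max)))
      }
    }.collect().toMap

  // Serve a time-range query by scanning only partitions whose bounds overlap it.
  def queryRange(sorted: RDD[(Long, String)], index: Map[Int, (Long, Long)],
                 lo: Long, hi: Long): RDD[(Long, String)] = {
    val pruned = PartitionPruningRDD.create(sorted, idx =>
      index.get(idx).exists { case (min, max) => max >= lo && min <= hi })
    pruned.filter { case (k, _) => k >= lo && k <= hi }
  }
}
{code}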



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3750) Log ulimit settings at warning if they are too low

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3750:
-
Component/s: Deploy

> Log ulimit settings at warning if they are too low
> --
>
> Key: SPARK-3750
> URL: https://issues.apache.org/jira/browse/SPARK-3750
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.1.0
>Reporter: Andrew Ash
>
> In recent versions of Spark the shuffle implementation is much more 
> aggressive about writing many files out to disk at once.  Most Linux kernels 
> have a default limit on the number of open files per process, and Spark can 
> exhaust this limit.  The current hash-based shuffle implementation requires 
> as many files as the product of the map and reduce partition counts in a wide 
> dependency.
> In order to reduce the errors we're seeing on the user list, we should 
> determine a value that is considered "too low" for normal operations and log 
> a warning on executor startup when that value isn't met.
> 1. determine what ulimit is acceptable
> 2. log when that value isn't met
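A minimal sketch of the proposed startup check, assuming a POSIX system where the limit
can be read through a shell; the 4096 threshold is a placeholder, not a value the
project has agreed on:

{code}
import scala.sys.process._
import scala.util.Try

object UlimitCheck {
  // Placeholder threshold; the "acceptable" value still needs to be determined.
  val MinOpenFiles = 4096L

  def checkOpenFileLimit(log: String => Unit): Unit = {
    // `ulimit` is a shell builtin, so it has to run through a shell.
    val limit = Try(Seq("bash", "-c", "ulimit -n").!!.trim)
    limit.toOption match {
      case Some("unlimited") => // nothing to warn about
      case Some(n) if Try(n.toLong).toOption.exists(_ < MinOpenFiles) =>
        log(s"Open file limit ($n) is below $MinOpenFiles; shuffles that open many " +
            "files at once may fail with 'Too many open files'.")
      case _ => // could not determine the limit; stay silent
    }
  }
}
{code}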



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1182) Sort the configuration parameters in configuration.md

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1182:
-
Component/s: Documentation

> Sort the configuration parameters in configuration.md
> -
>
> Key: SPARK-1182
> URL: https://issues.apache.org/jira/browse/SPARK-1182
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Reporter: Reynold Xin
>Assignee: prashant
>Priority: Minor
>
> It is a little bit confusing right now since the config options are all over 
> the place in some arbitrarily sorted order.
> https://github.com/apache/spark/blob/master/docs/configuration.md



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1762) Add functionality to pin RDDs in cache

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1762:
-
Component/s: Spark Core

> Add functionality to pin RDDs in cache
> --
>
> Key: SPARK-1762
> URL: https://issues.apache.org/jira/browse/SPARK-1762
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>
> Right now, all RDDs are created equal, and there is no mechanism to mark a 
> certain RDD as more important than the rest. This is a problem if the storage 
> memory fraction is small, because caching just a few RDDs can evict more 
> important ones.
> A side effect of this feature is that we can now more safely allocate a 
> smaller spark.storage.memoryFraction if we know how large our important RDDs 
> are, without having to worry about them being evicted. This allows us to use 
> more memory for shuffles, for instance, and avoid disk spills.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-650:

Component/s: Spark Core

> Add a "setup hook" API for running initialization code on each executor
> ---
>
> Key: SPARK-650
> URL: https://issues.apache.org/jira/browse/SPARK-650
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Matei Zaharia
>Priority: Minor
>
> Would be useful to configure things like reporting libraries
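There is no such hook today; a common workaround is a JVM-wide lazy singleton that every
task touches, so the initialization runs once per executor. A rough sketch (the
reporting library is a stand-in):

{code}
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Runs once per executor JVM, the first time a task on that executor touches it.
object ExecutorSetup {
  lazy val init: Unit = {
    // e.g. configure a metrics/reporting library, load native code, etc.
    println(s"executor setup on ${java.net.InetAddress.getLocalHost.getHostName}")
  }
}

object SetupHookExample {
  // Wrap a map so every partition forces the one-time initialization first.
  def mapWithSetup[T, U: ClassTag](rdd: RDD[T])(f: T => U): RDD[U] =
    rdd.mapPartitions { it =>
      ExecutorSetup.init
      it.map(f)
    }
}
{code}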



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-746) Automatically Use Avro Serialization for Avro Objects

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-746:

Component/s: Spark Core

> Automatically Use Avro Serialization for Avro Objects
> -
>
> Key: SPARK-746
> URL: https://issues.apache.org/jira/browse/SPARK-746
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Patrick Cogan
>
> All generated objects extend org.apache.avro.specific.SpecificRecordBase (or 
> possibly a class higher up the hierarchy).
> Since Avro records aren't Java-serializable by default, people currently have 
> to wrap their records. It would be good if we could use an implicit 
> conversion to do this for them.
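A rough sketch of the kind of wrapper users currently write by hand, which an implicit
conversion could supply automatically. It serializes the record with Avro's specific
datum writer/reader inside Java serialization hooks; the class name is illustrative and
this is not an existing Spark API:

{code}
import java.io.{ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter, SpecificRecordBase}

class SerializableAvro[T <: SpecificRecordBase](@transient var record: T)
  extends Serializable {

  private def writeObject(out: ObjectOutputStream): Unit = {
    out.writeObject(record.getSchema.toString)        // schema travels with the bytes
    val bytes = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(bytes, null)
    new SpecificDatumWriter[T](record.getSchema).write(record, encoder)
    encoder.flush()
    out.writeObject(bytes.toByteArray)
  }

  private def readObject(in: ObjectInputStream): Unit = {
    val schema = new org.apache.avro.Schema.Parser().parse(in.readObject().asInstanceOf[String])
    val data = in.readObject().asInstanceOf[Array[Byte]]
    val decoder = DecoderFactory.get().binaryDecoder(data, null)
    record = new SpecificDatumReader[T](schema).read(null.asInstanceOf[T], decoder)
  }
}
{code}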



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-913) log the size of each shuffle block in block manager

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-913:

Component/s: Block Manager

> log the size of each shuffle block in block manager
> ---
>
> Key: SPARK-913
> URL: https://issues.apache.org/jira/browse/SPARK-913
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Reporter: Reynold Xin
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4721) Improve first thread to put block failed

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4721:
-
Component/s: Block Manager

> Improve first thread to put block failed
> 
>
> Key: SPARK-4721
> URL: https://issues.apache.org/jira/browse/SPARK-4721
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Reporter: SuYan
>
> The current code assumes that when multiple threads try to put a block with 
> the same blockId into the BlockManager, the thread that first puts the info 
> into blockInfos performs the put, and the others wait until that put fails or 
> succeeds.
> That is fine when the put succeeds, but when it fails there are problems:
> 1. the failed thread removes the info from blockInfos
> 2. the other threads wake up and retry the put using the old info's lock
> 3. if one of them succeeds, "mark success" reports that the block is no 
> longer in pending status, so the mark fails; all remaining threads then do 
> the same thing, acquiring the info's lock and marking success or failure even 
> though one of them has already succeeded.
> First, I don't understand why the info is removed from blockInfos while other 
> threads are still waiting. The comment says it is so other threads can create 
> a new block info, but a block info is just an id and a storage level, so 
> whether the old one or a new one is used doesn't matter to any waiting 
> threads.
> Second, if the first thread fails, the other waiting threads could retry the 
> put one by one, rather than all of them.
> Or, if the first thread fails, all other threads could simply log a warning 
> and return after waking up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-578) Fix interpreter code generation to only capture needed dependencies

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-578:

Component/s: Spark Core

> Fix interpreter code generation to only capture needed dependencies
> ---
>
> Key: SPARK-578
> URL: https://issues.apache.org/jira/browse/SPARK-578
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Matei Zaharia
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-720) Statically guarantee serialization will succeed

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-720:

Component/s: Spark Core

> Statically guarantee serialization will succeed
> ---
>
> Key: SPARK-720
> URL: https://issues.apache.org/jira/browse/SPARK-720
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 0.7.1
>Reporter: Eric Christiansen
>
> First, thanks for developing Spark. It's great.
> Maybe I'm trying to serialize weird objects (eg Shapeless constructs), but I 
> tend to get quite a few NotSerializableExceptions. These are pretty annoying 
> because they happen at runtime, lengthening my code/debug cycle. 
> I'd like it if Spark could introduce a serialization system that could 
> statically check that serialization will succeed. One approach is to use 
> typeclasses, perhaps using Spray-Json as inspiration. An added benefit of 
> typeclasses is they can be used to serialize objects that were not originally 
> intended to be serialized.
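A rough sketch of the typeclass idea, loosely in the style of spray-json: an operation
only compiles if serialization evidence exists for the captured types, so a missing
instance becomes a compile error rather than a runtime NotSerializableException. All
names here are illustrative; this is not an existing Spark API:

{code}
// Evidence that T can be turned into bytes and back.
trait CanSerialize[T] {
  def toBytes(value: T): Array[Byte]
  def fromBytes(bytes: Array[Byte]): T
}

object CanSerialize {
  implicit val intSer: CanSerialize[Int] = new CanSerialize[Int] {
    def toBytes(v: Int): Array[Byte] = BigInt(v).toByteArray
    def fromBytes(b: Array[Byte]): Int = BigInt(b).toInt
  }
  implicit val stringSer: CanSerialize[String] = new CanSerialize[String] {
    def toBytes(v: String): Array[Byte] = v.getBytes("UTF-8")
    def fromBytes(b: Array[Byte]): String = new String(b, "UTF-8")
  }
}

object StaticallyChecked {
  // A hypothetical map that demands serialization evidence for the result type
  // at compile time instead of failing at runtime.
  def mapChecked[T, U](data: Seq[T])(f: T => U)(implicit ev: CanSerialize[U]): Seq[U] =
    data.map(f)   // real code would ship f and its results across the wire using ev
}
{code}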



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2555:
-
Component/s: (was: Mesos)
 (was: Spark Core)
 Scheduler

> Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos 
> mode.
> 
>
> Key: SPARK-2555
> URL: https://issues.apache.org/jira/browse/SPARK-2555
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 1.0.0
>Reporter: Zhihui
>
> In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio 
> was introduced, but it only supports Standalone and YARN modes.
> This issue is to introduce the configuration in Mesos mode as well.
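For reference, a minimal sketch of how the existing configuration is set today on the
modes that already support it (the 0.8 value is just an example):

{code}
import org.apache.spark.SparkConf

// Wait until 80% of the requested executors are registered before scheduling tasks.
val conf = new SparkConf()
  .setAppName("example")
  .set("spark.scheduler.minRegisteredExecutorsRatio", "0.8")
{code}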



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-864) DAGScheduler Exception if A Node is Added then Deleted

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-864:

Component/s: Scheduler

> DAGScheduler Exception if A Node is Added then Deleted
> --
>
> Key: SPARK-864
> URL: https://issues.apache.org/jira/browse/SPARK-864
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 0.8.0
>Reporter: Patrick Cogan
>Assignee: xiajunluan
>
> According to [~markhamstra], if you run the UI tester locally and remove a 
> slave, then add another slave, everything freezes. UPDATE: This appears to be 
> caused by the DAGScheduler:
> {code}
> Exception in thread "DAGScheduler" java.util.NoSuchElementException: key not 
> found: 2
>   at scala.collection.MapLike$class.default(MapLike.scala:225)
>   at scala.collection.mutable.HashMap.default(HashMap.scala:45)
>   at scala.collection.MapLike$class.apply(MapLike.scala:135)
>   at scala.collection.mutable.HashMap.apply(HashMap.scala:45)
>   at 
> spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:515)
>   at 
> spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:481)
>   at 
> spark.scheduler.DAGScheduler$$anonfun$resubmitFailedStages$3.apply(DAGScheduler.scala:383)
>   at 
> spark.scheduler.DAGScheduler$$anonfun$resubmitFailedStages$3.apply(DAGScheduler.scala:382)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
>   at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38)
>   at 
> spark.scheduler.DAGScheduler.resubmitFailedStages(DAGScheduler.scala:382)
>   at 
> spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:433)
>   at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:135)
> {code}
> This code is related to the FairScheduler change. Hey [~andrew xia] - could 
> you take a look at this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-794) Remove sleep() in ClusterScheduler.stop

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-794:

Component/s: (was: Spark Core)
 Scheduler

> Remove sleep() in ClusterScheduler.stop
> ---
>
> Key: SPARK-794
> URL: https://issues.apache.org/jira/browse/SPARK-794
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 0.9.0
>Reporter: Matei Zaharia
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> This temporary change made a while back slows down the unit tests quite a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4654) Clean up DAGScheduler's getMissingParentStages() and stageDependsOn() methods

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4654:
-
Component/s: (was: Spark Core)
 Scheduler

> Clean up DAGScheduler's getMissingParentStages() and stageDependsOn() methods
> -
>
> Key: SPARK-4654
> URL: https://issues.apache.org/jira/browse/SPARK-4654
> Project: Spark
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> DAGScheduler has {{getMissingParentStages()}} and {{stageDependsOn()}} 
> methods, which are suspiciously similar to {{getParentStages()}}.  All of 
> these methods perform traversal of the RDD / Stage graph to inspect parent 
> stages.  We can remove both of these methods, though: the set of parent 
> stages is known when a {{Stage}} instance is constructed and is already 
> stored in {{Stage.parents}}, so we can just check for missing stages by 
> looking for unavailable stages in {{Stage.parents}}.  Similarly, we can 
> determine whether one stage depends on another by searching {{Stage.parents}} 
> rather than performing the entire graph traversal from scratch.
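A rough sketch of the simplification described above, against a stripped-down Stage
model (the real DAGScheduler types differ; this only illustrates replacing graph
traversal with lookups on the already-known parents):

{code}
// Minimal stand-in for the scheduler's Stage: parents are known at construction time.
case class Stage(id: Int, parents: Seq[Stage], isAvailable: Boolean)

object StageHelpers {
  // "Missing" parents are simply the parents whose output isn't available yet.
  def missingParents(stage: Stage): Seq[Stage] =
    stage.parents.filterNot(_.isAvailable)

  // One stage depends on another if it is reachable through the parents chain.
  def dependsOn(stage: Stage, target: Stage): Boolean =
    stage.parents.exists(p => p == target || dependsOn(p, target))
}
{code}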



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4949) shutdownCallback in SparkDeploySchedulerBackend should be enclosed by synchronized block.

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4949:
-
Component/s: (was: Spark Core)
 Scheduler

> shutdownCallback in SparkDeploySchedulerBackend should be enclosed by 
> synchronized block.
> -
>
> Key: SPARK-4949
> URL: https://issues.apache.org/jira/browse/SPARK-4949
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>
> The variable `shutdownCallback` in SparkDeploySchedulerBackend can be accessed 
> from multiple threads, so access to it should be enclosed in a synchronized 
> block.
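A minimal sketch of the kind of guarded access being asked for, with the field and
class simplified from the real SparkDeploySchedulerBackend:

{code}
class ShutdownCallbackHolder {
  // Set from the app-client callback thread and read from stop(), so guard every access.
  private var shutdownCallback: Option[() => Unit] = None

  def setCallback(cb: () => Unit): Unit = synchronized {
    shutdownCallback = Some(cb)
  }

  def runAndClear(): Unit = {
    val cb = synchronized {
      val current = shutdownCallback
      shutdownCallback = None
      current
    }
    cb.foreach(_.apply())   // invoke outside the lock to avoid holding it during user code
  }
}
{code}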



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-985) Support Job Cancellation on Mesos Scheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-985:

Component/s: (was: Mesos)
 Scheduler

> Support Job Cancellation on Mesos Scheduler
> ---
>
> Key: SPARK-985
> URL: https://issues.apache.org/jira/browse/SPARK-985
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 0.9.0
>Reporter: Josh Rosen
>
> https://github.com/apache/incubator-spark/pull/29 added job cancellation but 
> may still need support for Mesos scheduler backends:
> Quote: 
> {quote}
> This looks good except that MesosSchedulerBackend isn't yet calling Mesos's 
> killTask. Do you want to add that too or are you planning to push it till 
> later? I don't think it's a huge change.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1697) Driver error org.apache.spark.scheduler.TaskSetManager - Loss was due to java.io.FileNotFoundException

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1697:
-
Component/s: Scheduler

> Driver error org.apache.spark.scheduler.TaskSetManager - Loss was due to 
> java.io.FileNotFoundException
> --
>
> Key: SPARK-1697
> URL: https://issues.apache.org/jira/browse/SPARK-1697
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Reporter: Arup Malakar
>
> We are running spark-streaming 0.9.0 on top of Yarn (Hadoop 
> 2.2.0-cdh5.0.0-beta-2). It reads from Kafka and processes the data. So far we 
> hadn't seen any issues, but today we saw an exception in the driver log and it 
> is no longer consuming Kafka messages.
> Here is the exception we saw:
> {code}
> 2014-05-01 10:00:43,962 [Result resolver thread-3] WARN  
> org.apache.spark.scheduler.TaskSetManager - Loss was due to 
> java.io.FileNotFoundException
> java.io.FileNotFoundException: http://10.50.40.85:53055/broadcast_2412
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
>   at 
> org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
>   at 
> org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
>   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   

[jira] [Updated] (SPARK-4957) TaskScheduler: when no resources are available: backoff after # of tries and crash.

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4957:
-
Component/s: (was: Spark Core)
 Scheduler

> TaskScheduler: when no resources are available: backoff after # of tries and 
> crash.
> ---
>
> Key: SPARK-4957
> URL: https://issues.apache.org/jira/browse/SPARK-4957
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 1.1.0
>Reporter: Nathan Bijnens
>  Labels: scheduler
>
> Currently the TaskSchedulerImpl retries scheduling if there are no resources 
> available. Unfortunately it keeps retrying with a small delay. It would make 
> sense to throw an exception after a number of tries, instead of hanging 
> indefinitely. 
> https://github.com/apache/spark/blob/cb0eae3b78d7f6f56c0b9521ee48564a4967d3de/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L164-175
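A minimal sketch of the suggested behaviour, outside of the real TaskSchedulerImpl:
retry with a growing delay, then fail with an exception instead of looping forever. The
retry count, delay, and exception type are placeholders, not values the scheduler
actually uses.

{code}
object OfferRetry {
  def awaitResources(hasResources: () => Boolean,
                     maxTries: Int = 20,
                     initialDelayMs: Long = 1000L): Unit = {
    var attempt = 0
    var delay = initialDelayMs
    while (!hasResources()) {
      attempt += 1
      if (attempt >= maxTries) {
        throw new IllegalStateException(
          s"No cluster resources after $attempt attempts; check that workers are " +
          "registered and have sufficient memory/cores.")
      }
      Thread.sleep(delay)
      delay = math.min(delay * 2, 60000L)   // exponential backoff, capped at one minute
    }
  }
}
{code}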



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1715) Ensure actor is self-contained in DAGScheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1715:
-
Component/s: Scheduler

> Ensure actor is self-contained in DAGScheduler
> --
>
> Key: SPARK-1715
> URL: https://issues.apache.org/jira/browse/SPARK-1715
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Reporter: Nan Zhu
>Assignee: Nan Zhu
>
> Though the current supervisor-child structure works fine for fault tolerance, 
> it violates the basic rule that an actor should be self-contained.
> We should forward messages from the supervisor to the child actor, so that we 
> can eliminate the hard-coded timeout threshold for starting the DAGScheduler 
> and provide a more convenient interface for future development, such as a 
> parallel DAGScheduler or other changes to the DAGScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-823) spark.default.parallelism's default is inconsistent across scheduler backends

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-823:

Component/s: (was: Documentation)
 (was: PySpark)
 (was: Spark Core)
 Scheduler

> spark.default.parallelism's default is inconsistent across scheduler backends
> -
>
> Key: SPARK-823
> URL: https://issues.apache.org/jira/browse/SPARK-823
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 0.8.0, 0.7.3, 0.9.1
>Reporter: Josh Rosen
>Priority: Minor
>
> The [0.7.3 configuration 
> guide|http://spark-project.org/docs/latest/configuration.html] says that 
> {{spark.default.parallelism}}'s default is 8, but the default is actually 
> max(totalCoreCount, 2) for the standalone scheduler backend, 8 for the Mesos 
> scheduler, and {{threads}} for the local scheduler:
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala#L157
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/mesos/MesosSchedulerBackend.scala#L317
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/local/LocalScheduler.scala#L150
> Should this be clarified in the documentation?  Should the Mesos scheduler 
> backend's default be revised?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4454:
-
Component/s: Scheduler

> Race condition in DAGScheduler
> --
>
> Key: SPARK-4454
> URL: https://issues.apache.org/jira/browse/SPARK-4454
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.1.0
>Reporter: Rafal Kwasny
>Priority: Minor
>
> It seems to be a race condition in DAGScheduler that manifests on jobs with 
> high concurrency:
> {noformat}
>  Exception in thread "main" java.util.NoSuchElementException: key not found: 
> 35
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:58)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
> at 
> org.apache.spark.scheduler.DAGScheduler.getCacheLocs(DAGScheduler.scala:201)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1292)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1307)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1306)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1306)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1304)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1304)
> at 
> org.apache.spark.scheduler.DAGScheduler.getPreferredLocs(DAGScheduler.scala:1275)
> at 
> org.apache.spark.SparkContext.getPreferredLocs(SparkContext.scala:937)
> at 
> org.apache.spark.rdd.PartitionCoalescer.currPrefLocs(CoalescedRDD.scala:175)
> at

[jira] [Updated] (SPARK-5374) abstract RDD's DAG graph iteration in DAGScheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5374:
-
Component/s: (was: Spark Core)
 Scheduler

> abstract RDD's DAG graph iteration in DAGScheduler
> --
>
> Key: SPARK-5374
> URL: https://issues.apache.org/jira/browse/SPARK-5374
> Project: Spark
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Wenchen Fan
>
> DAGScheduler has many methods that iterate over an RDD's DAG graph; we should 
> abstract the iteration process to reduce code size.
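A rough sketch of the kind of shared traversal helper the issue suggests, written
against Spark's public RDD dependency API; the helper's name and placement are
illustrative:

{code}
import scala.collection.mutable
import org.apache.spark.rdd.RDD

object DagTraversal {
  // Visit every RDD reachable through the dependency graph exactly once.
  def foreachAncestor(root: RDD[_])(visit: RDD[_] => Unit): Unit = {
    val visited = mutable.HashSet[RDD[_]]()
    val stack = mutable.Stack[RDD[_]](root)
    while (stack.nonEmpty) {
      val rdd = stack.pop()
      if (!visited(rdd)) {
        visited += rdd
        visit(rdd)
        rdd.dependencies.foreach(dep => stack.push(dep.rdd))
      }
    }
  }
}
{code}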



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2456) Scheduler refactoring

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2456:
-
Component/s: (was: Spark Core)
 Scheduler

> Scheduler refactoring
> -
>
> Key: SPARK-2456
> URL: https://issues.apache.org/jira/browse/SPARK-2456
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Reynold Xin
>
> This is an umbrella ticket to track scheduler refactoring. We want to clearly 
> define semantics and responsibilities of each component, and define explicit 
> public interfaces for them so it is easier to understand and to contribute 
> (also less buggy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1928) DAGScheduler suspended by local task OOM

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1928:
-
Component/s: (was: Spark Core)
 Scheduler

> DAGScheduler suspended by local task OOM
> 
>
> Key: SPARK-1928
> URL: https://issues.apache.org/jira/browse/SPARK-1928
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 0.9.0
>Reporter: Peng Zhen
>
> DAGScheduler does not handle local task OOM properly, and will wait for the 
> job result forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-790) Implement the reregistered() callback in MesosScheduler to support master failover

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-790:

Component/s: (was: Spark Core)
 Scheduler

> Implement the reregistered() callback in MesosScheduler to support master 
> failover
> --
>
> Key: SPARK-790
> URL: https://issues.apache.org/jira/browse/SPARK-790
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Matei Zaharia
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-912) Take stage breakdown functionality and runLocally out of the main event loop in DAGScheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-912:

Component/s: Scheduler

> Take stage breakdown functionality and runLocally out of the main event loop 
> in DAGScheduler
> 
>
> Key: SPARK-912
> URL: https://issues.apache.org/jira/browse/SPARK-912
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 0.8.0
>Reporter: Reynold Xin
>
> This can reduce the complexity of the main event loop and improve performance 
> (since the main event loop is single threaded).
> We can also take the result task deserialization code out of the main loop 
> (maybe Kay is already working on this?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4653) DAGScheduler refactoring and cleanup

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4653:
-
Component/s: (was: Spark Core)
 Scheduler

> DAGScheduler refactoring and cleanup
> 
>
> Key: SPARK-4653
> URL: https://issues.apache.org/jira/browse/SPARK-4653
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> This is an umbrella JIRA for DAGScheduler refactoring and cleanup.  Please 
> comment or open sub-issues if you have refactoring suggestions that should 
> fall under this umbrella.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4962) Put TaskScheduler.start back in SparkContext to shorten cluster resources occupation period

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4962:
-
Component/s: (was: Spark Core)
 Scheduler

> Put TaskScheduler.start back in SparkContext to shorten cluster resources 
> occupation period
> ---
>
> Key: SPARK-4962
> URL: https://issues.apache.org/jira/browse/SPARK-4962
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: YanTang Zhai
>Priority: Minor
>
> When a SparkContext object is instantiated, the TaskScheduler is started and 
> some resources are allocated from the cluster. However, these resources may 
> not be used for a while, for example while DAGScheduler.JobSubmitted is still 
> being processed, so they are wasted during this period. Thus, we want to move 
> TaskScheduler.start back to shorten the cluster resources occupation period, 
> especially on a busy cluster.
> The TaskScheduler could be started just before running stages.
> We can analyse and compare the resources occupation period before and after 
> the optimization.
> TaskScheduler.start execution time: [time1__]
> DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or 
> TaskScheduler.start) execution time: [time2_]
> HadoopRDD.getPartitions execution time: [time3___]
> Stages execution time: [time4_]
> The cluster resources occupation period before optimization is 
> [time2_][time3___][time4_].
> The cluster resources occupation period after optimization is 
> [time3___][time4_].
> In summary, the cluster resources occupation period after optimization is 
> less than before. If HadoopRDD.getPartitions could also be moved forward 
> (SPARK-4961), the period could be shortened further, to [time4_].
> This resource saving matters for a busy cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2581) complete or withdraw visitedStages optimization in DAGScheduler’s stageDependsOn

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2581:
-
Component/s: (was: Spark Core)
 Scheduler

> complete or withdraw visitedStages optimization in DAGScheduler’s 
> stageDependsOn
> 
>
> Key: SPARK-2581
> URL: https://issues.apache.org/jira/browse/SPARK-2581
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Aaron Staple
>Priority: Minor
>
> Right now the visitedStages HashSet is populated with stages, but never 
> queried to limit examination of previously visited stages.  It may make sense 
> to check whether a mapStage has been visited previously before visiting it 
> again, as in the nearby visitedRdds check.  Or it may be that the existing 
> visitedRdds check sufficiently optimizes this function, and visitedStages can 
> simply be removed.
> See discussion here: 
> https://github.com/apache/spark/pull/1362#discussion-diff-15018046L1107



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3419) Scheduler shouldn't delay running a task when executors don't reside at any of its preferred locations

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3419:
-
Component/s: (was: Spark Core)
 Scheduler

> Scheduler shouldn't delay running a task when executors don't reside at any 
> of its preferred locations 
> ---
>
> Key: SPARK-3419
> URL: https://issues.apache.org/jira/browse/SPARK-3419
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Sandy Ryza
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4732) All application progress on the standalone scheduler can be halted by one systematically faulty node

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4732:
-
Component/s: (was: Spark Core)
 Scheduler

> All application progress on the standalone scheduler can be halted by one 
> systematically faulty node
> 
>
> Key: SPARK-4732
> URL: https://issues.apache.org/jira/browse/SPARK-4732
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.1.0, 1.2.0
> Environment:  - Spark Standalone scheduler
>Reporter: Harry Brundage
>
> We've experienced several cluster wide outages caused by unexpected system 
> wide faults on one of our spark workers if that worker is failing 
> systematically. By systematically, I mean that every executor launched by 
> that worker will definitely fail due to some reason out of Spark's control 
> like the log directory disk being completely out of space, or a permissions 
> error for a file that's always read during executor launch. We screw up all 
> the time on our team and cause stuff like this to happen, but because of the 
> way the standalone scheduler allocates resources, our cluster doesn't recover 
> gracefully from these failures. 
> When there are more tasks to do than executors, I am pretty sure the way the 
> scheduler works is that it just waits for more resource offers and then 
> allocates tasks from the queue to those resources. If an executor dies 
> immediately after starting, the worker monitor process will notice that it's 
> dead. The master will allocate that worker's now free cores/memory to a 
> currently running application that is below its spark.cores.max, which in our 
> case I've observed as usually the app that just had the executor die. A new 
> executor gets spawned on the same worker that the last one just died on, gets 
> allocated that one task that failed, and then the whole process fails again 
> for the same systematic reason, and lather rinse repeat. This happens 10 
> times or whatever the max task failure count is, and then the whole app is 
> deemed a failure by the driver and shut down completely.
> This happens to us for all applications in the cluster as well. We usually 
> run roughly as many cores as we have hadoop nodes. We also usually have many 
> more input splits than we have tasks, which means the locality of the first 
> few tasks which I believe determines where our executors run is well spread 
> out over the cluster, and often covers 90-100% of nodes. This means the 
> likelihood of any application getting an executor scheduled on a broken node 
> is quite high. After an old application goes through the above-mentioned 
> process and dies, the next application to start, or not yet at its requested 
> max capacity, gets an executor scheduled on the broken node, and is promptly 
> taken down as well. This happens over and over as well, to the point where 
> none of our spark jobs are making any progress because of one tiny 
> permissions mistake on one node.
> Now, I totally understand this is usually an "error between keyboard and 
> screen" kind of situation where it is the responsibility of the people 
> deploying spark to ensure it is deployed correctly. The systematic issues 
> we've encountered are almost always of this nature: permissions errors, disk 
> full errors, one node not getting a new spark jar from a configuration error, 
> configurations being out of sync, etc. That said, disks are going to fail or 
> half fail, fill up, node rot is going to ruin configurations, etc etc etc, 
> and as hadoop clusters scale in size this becomes more and more likely, so I 
> think its reasonable to ask that Spark be resilient to this kind of failure 
> and keep on truckin'. 
> I think a good simple fix would be to have applications, or the master, 
> blacklist workers (not executors) at a failure count lower than the task 
> failure count. This would also serve as a belt and suspenders fix for 
> SPARK-4498.
>  If the scheduler stopped trying to schedule on nodes that fail a lot, we 
> could still make progress. These blacklist events are really important and I 
> think would need to be well logged and surfaced in the UI, but I'd rather log 
> and carry on than fail hard. I think the tradeoff here is that you risk 
> blacklisting every worker as well if there is something systematically wrong 
> with communication or whatever else I can't imagine.
> Please let me know if I've misunderstood how the scheduler works or you need 
> more information or anything like that and I'll be happy to provide. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.ap

[jira] [Updated] (SPARK-3545) Put HadoopRDD.getPartitions forward and put TaskScheduler.start back in SparkContext to reduce DAGScheduler.JobSubmitted processing time and shorten cluster resources occ

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3545:
-
Component/s: Scheduler

> Put HadoopRDD.getPartitions forward and put TaskScheduler.start back in 
> SparkContext to reduce DAGScheduler.JobSubmitted processing time and shorten 
> cluster resources occupation period
> 
>
> Key: SPARK-3545
> URL: https://issues.apache.org/jira/browse/SPARK-3545
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: YanTang Zhai
>Priority: Minor
>
> We have two problems:
> (1) HadoopRDD.getPartitions is lazily evaluated inside 
> DAGScheduler.JobSubmitted. If the input directory is large, getPartitions may 
> take a long time.
> For example, in our cluster it needs anywhere from 0.029s to 766.699s. While 
> one JobSubmitted event is being processed, the others have to wait. Thus, we 
> want to move HadoopRDD.getPartitions forward to reduce 
> DAGScheduler.JobSubmitted processing time, so that other JobSubmitted events 
> don't need to wait as long. The HadoopRDD object could get its partitions 
> when it is instantiated.
> (2) When a SparkContext object is instantiated, the TaskScheduler is started 
> and some resources are allocated from the cluster. However, these resources 
> may not be used for a while, for example while DAGScheduler.JobSubmitted is 
> still being processed, so they are wasted during this period. Thus, we want 
> to move TaskScheduler.start back to shorten the cluster resources occupation 
> period, especially on a busy cluster.
> The TaskScheduler could be started just before running stages.
> We can analyse and compare the execution time before and after the optimization.
> TaskScheduler.start execution time: [time1__]
> DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or 
> TaskScheduler.start) execution time: [time2_]
> HadoopRDD.getPartitions execution time: [time3___]
> Stages execution time: [time4_]
> (1) The app has only one job
> (a)
> The execution time of the job before optimization is 
> [time1__][time2_][time3___][time4_].
> The execution time of the job after optimization is 
> [time3___][time2_][time1__][time4_].
> (b)
> The cluster resources occupation period before optimization is 
> [time2_][time3___][time4_].
> The cluster resources occupation period after optimization is [time4_].
> In summary, if the app has only one job, the total execution time is the same 
> before and after the optimization, while the cluster resources occupation 
> period after the optimization is shorter than before.
> (2) The app has 4 jobs
> (a) Before optimization:
> job1 execution time is [time2_][time3___][time4_],
> job2 execution time is [time2__][time3___][time4_],
> job3 execution time is [time2][time3___][time4_],
> job4 execution time is [time2__][time3___][time4_].
> After optimization:
> job1 execution time is [time3___][time2_][time1__][time4_],
> job2 execution time is [time3___][time2__][time4_],
> job3 execution time is [time3___][time2_][time4_],
> job4 execution time is [time3___][time2_][time4_].
> In summary, if the app has multiple jobs, the average execution time after the 
> optimization is shorter than before, and the cluster resources occupation 
> period after the optimization is shorter than before.
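> As a minimal sketch of the first change (illustrative only; the app name and 
> input path are placeholders, and the master is assumed to come from 
> spark-submit), referencing rdd.partitions at job-definition time forces the 
> expensive split computation to run on the user thread instead of inside the 
> DAGScheduler event loop:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
> 
> val sc = new SparkContext(new SparkConf().setAppName("eager-partitions-sketch"))
> 
> // Accessing rdd.partitions here triggers HadoopRDD.getPartitions immediately,
> // rather than lazily while DAGScheduler handles the JobSubmitted event.
> val rdd = sc.textFile("hdfs://namenode/path/to/large/input")  // illustrative path
> val numSplits = rdd.partitions.length
> println("computed " + numSplits + " input splits before submitting any job")
> {code}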



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4961) Put HadoopRDD.getPartitions forward to reduce DAGScheduler.JobSubmitted processing time

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4961:
-
Component/s: (was: Spark Core)
 Scheduler

> Put HadoopRDD.getPartitions forward to reduce DAGScheduler.JobSubmitted 
> processing time
> ---
>
> Key: SPARK-4961
> URL: https://issues.apache.org/jira/browse/SPARK-4961
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: YanTang Zhai
>Priority: Minor
>
> HadoopRDD.getPartitions is evaluated lazily inside DAGScheduler.JobSubmitted. 
> If the input directory is large, getPartitions can take a long time.
> For example, in our cluster it takes anywhere from 0.029s to 766.699s. While 
> one JobSubmitted event is being processed, the others must wait. Thus, we
> want to move HadoopRDD.getPartitions forward to reduce 
> DAGScheduler.JobSubmitted processing time, so that other JobSubmitted events
> don't have to wait as long. The HadoopRDD object could compute its partitions 
> when it is instantiated.
> We can analyse and compare the execution time before and after the optimization.
> TaskScheduler.start execution time: [time1__]
> DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or 
> TaskScheduler.start) execution time: [time2_]
> HadoopRDD.getPartitions execution time: [time3___]
> Stages execution time: [time4_]
> (1) The app has only one job
> (a)
> The execution time of the job before optimization is 
> [time1__][time2_][time3___][time4_].
> The execution time of the job after optimization is 
> [time1__][time3___][time2_][time4_].
> In summary, if the app has only one job, the total execution time is the same 
> before and after the optimization.
> (2) The app has 4 jobs
> (a) Before optimization:
> job1 execution time is [time2_][time3___][time4_],
> job2 execution time is [time2__][time3___][time4_],
> job3 execution time is [time2][time3___][time4_],
> job4 execution time is [time2_][time3___][time4_].
> After optimization:
> job1 execution time is [time3___][time2_][time4_],
> job2 execution time is [time3___][time2__][time4_],
> job3 execution time is [time3___][time2_][time4_],
> job4 execution time is [time3___][time2__][time4_].
> In summary, if the app has multiple jobs, the average execution time after the 
> optimization is shorter than before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3714) Spark workflow scheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3714:
-
Component/s: (was: Project Infra)
 Scheduler

> Spark workflow scheduler
> 
>
> Key: SPARK-3714
> URL: https://issues.apache.org/jira/browse/SPARK-3714
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler
>Reporter: Egor Pakhomov
>Priority: Minor
>
> [Design doc | 
> https://docs.google.com/document/d/1q2Q8Ux-6uAkH7wtLJpc3jz-GfrDEjlbWlXtf20hvguk/edit?usp=sharing]
> The Spark stack is currently hard to use in production processes due to the 
> lack of the following features:
> * Scheduling Spark jobs
> * Retrying a failed Spark job in a big pipeline
> * Sharing a context among jobs in a pipeline
> * Queueing jobs
> A typical use case for such a platform would be: wait for new data, process 
> the new data, train ML models on it, compare the new model with the previous 
> one, and in case of success overwrite the current production model in its 
> HDFS directory with the new one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5191) Pyspark: scheduler hangs when importing a standalone pyspark app

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5191:
-
Component/s: (was: PySpark)
 Scheduler

> Pyspark: scheduler hangs when importing a standalone pyspark app
> 
>
> Key: SPARK-5191
> URL: https://issues.apache.org/jira/browse/SPARK-5191
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.0.2, 1.1.1, 1.3.0, 1.2.1
>Reporter: Daniel Liu
>
> In a.py:
> {code}
> from pyspark import SparkContext
> sc = SparkContext("local", "test spark")
> rdd = sc.parallelize(range(1, 10))
> print rdd.count()
> {code}
> In b.py:
> {code}
> from a import *
> {code}
> {{python a.py}} runs fine
> {{python b.py}} will hang at TaskSchedulerImpl: Removed TaskSet 0.0, whose 
> tasks have all completed, from pool
> {{./bin/spark-submit --py-files a.py b.py}} has the same problem



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5488) SPARK_LOCAL_IP not read by mesos scheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5488:
-
Component/s: (was: Mesos)
 Scheduler

> SPARK_LOCAL_IP not read by mesos scheduler
> --
>
> Key: SPARK-5488
> URL: https://issues.apache.org/jira/browse/SPARK-5488
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.1.1
>Reporter: Martin Tapp
>Priority: Minor
>
> My environment sets SPARK_LOCAL_IP and my driver sees it, but Mesos sees the 
> address of my first available network adapter.
> I can even see that SPARK_LOCAL_IP is read correctly by Utils.localHostName 
> and Utils.localIpAddress 
> (core/src/main/scala/org/apache/spark/util/Utils.scala); it seems the Spark 
> Mesos framework just doesn't use it.
> The workaround for now is to disable my first adapter so that the second one 
> becomes the one Spark sees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5316) DAGScheduler may make shuffleToMapStage leak if getParentStages fails

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5316:
-
Component/s: (was: Spark Core)
 Scheduler

> DAGScheduler may make shuffleToMapStage leak if getParentStages fails
> --
>
> Key: SPARK-5316
> URL: https://issues.apache.org/jira/browse/SPARK-5316
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Reporter: YanTang Zhai
>Priority: Minor
>
> DAGScheduler may leak shuffleToMapStage entries if getParentStages fails.
> If getParentStages throws an exception, for example because an input path 
> does not exist, DAGScheduler fails to handle the job submission, but 
> shuffleToMapStage may already have had some records added during 
> getParentStages. These records are never cleaned up.
> A simple job as follows:
> A simple job as follows:
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.rdd.PairRDDFunctions
> 
> val inputFile1 = ... // Input path does not exist when this job submits
> val inputFile2 = ...
> val outputFile = ...
> val conf = new SparkConf()
> val sc = new SparkContext(conf)
> val rdd1 = sc.textFile(inputFile1)
> .flatMap(line => line.split(" "))
> .map(word => (word, 1))
> .reduceByKey(_ + _, 1)
> val rdd2 = sc.textFile(inputFile2)
> .flatMap(line => line.split(","))
> .map(word => (word, 1))
> .reduceByKey(_ + _, 1)
> try {
>   val rdd3 = new PairRDDFunctions(rdd1).join(rdd2, 1)
>   rdd3.saveAsTextFile(outputFile)
> } catch {
>   case e: Exception =>
>     // The job submission fails here, yet the entries added to
>     // shuffleToMapStage by getParentStages are never removed.
>     e.printStackTrace()
> }
> // print the information of DAGScheduler's shuffleToMapStage to check
> // whether it still has uncleaned records.
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2618) use config spark.scheduler.priority for specifying TaskSet's priority on DAGScheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2618:
-
Component/s: Scheduler

> use config spark.scheduler.priority for specifying TaskSet's priority on 
> DAGScheduler
> -
>
> Key: SPARK-2618
> URL: https://issues.apache.org/jira/browse/SPARK-2618
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Lianhui Wang
>
> We use a Shark server for interactive queries, and every SQL statement runs 
> as a job. Sometimes we want a query that was submitted to the Shark server 
> later to run immediately, so we need to let users define a job's priority and 
> ensure that a high-priority job is launched first.
> I have created a pull request: https://github.com/apache/spark/pull/1528
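> For comparison, the existing per-job knob works through a thread-local 
> property; a minimal sketch, assuming the fair scheduler is enabled 
> (spark.scheduler.mode=FAIR), that sc is an existing SparkContext (e.g. from 
> spark-shell), and that the pool name and path below are illustrative:
> {code}
> // Prioritize one job by assigning it to a fair-scheduler pool before submitting it.
> // The proposed spark.scheduler.priority would be set in a similar per-job fashion.
> sc.setLocalProperty("spark.scheduler.pool", "high-priority")
> val urgentCount = sc.textFile("hdfs://namenode/urgent/input").count()
> sc.setLocalProperty("spark.scheduler.pool", null)  // subsequent jobs fall back to the default pool
> {code}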



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4346) YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4346:
-
Component/s: (was: YARN)
 Scheduler

> YarnClientSchedulerBack.asyncMonitorApplication should be common with 
> Client.monitorApplication
> ---
>
> Key: SPARK-4346
> URL: https://issues.apache.org/jira/browse/SPARK-4346
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Thomas Graves
>
> The YarnClientSchedulerBackend.asyncMonitorApplication routine should move 
> into ClientBase and be made common with monitorApplication.  Make sure stop 
> is handled properly.
> See discussion on https://github.com/apache/spark/pull/3143



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2647) DAGScheduler blocks others when processing one JobSubmitted event

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2647:
-
Component/s: (was: Spark Core)
 Scheduler

> DAGScheduler blocks others when processing one JobSubmitted event
> 
>
> Key: SPARK-2647
> URL: https://issues.apache.org/jira/browse/SPARK-2647
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: YanTang Zhai
>
> If several jobs are submitted, DAGScheduler blocks the others while it 
> processes one JobSubmitted event.
> For example, one JobSubmitted event is processed as follows and takes a long 
> time:
> "spark-akka.actor.default-dispatcher-67" daemon prio=10 
> tid=0x7f75ec001000 nid=0x7dd6 in Object.wait() [0x7f76063e1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at org.apache.hadoopcdh3.ipc.Client.call(Client.java:1130)
>   - locked <0x000783b17330> (a org.apache.hadoopcdh3.ipc.Client$Call)
>   at org.apache.hadoopcdh3.ipc.RPC$Invoker.invoke(RPC.java:241)
>   at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoopcdh3.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:83)
>   at 
> org.apache.hadoopcdh3.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:60)
>   at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
>   at 
> org.apache.hadoopcdh3.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1472)
>   at 
> org.apache.hadoopcdh3.hdfs.DFSClient.getBlockLocations(DFSClient.java:1498)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$1.doCall(Cdh3DistributedFileSystem.java:208)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$1.doCall(Cdh3DistributedFileSystem.java:204)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getFileBlockLocations(Cdh3DistributedFileSystem.java:204)
>   at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812)
>   at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:233)
>   at 
> StorageEngineClient.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:141)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:172)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>   at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:54)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:54)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:54)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions

[jira] [Updated] (SPARK-985) Support Job Cancellation on Mesos Scheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-985:

Component/s: Mesos

> Support Job Cancellation on Mesos Scheduler
> ---
>
> Key: SPARK-985
> URL: https://issues.apache.org/jira/browse/SPARK-985
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Scheduler
>Affects Versions: 0.9.0
>Reporter: Josh Rosen
>
> https://github.com/apache/incubator-spark/pull/29 added job cancellation but 
> may still need support for Mesos scheduler backends:
> Quote: 
> {quote}
> This looks good except that MesosSchedulerBackend isn't yet calling Mesos's 
> killTask. Do you want to add that too or are you planning to push it till 
> later? I don't think it's a huge change.
> {quote}
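> For reference, a minimal sketch of the user-facing cancellation path that the 
> Mesos backend would need to honor via killTask (sc is an existing 
> SparkContext; the group name, description, and job below are illustrative):
> {code}
> import scala.concurrent.Future
> import scala.concurrent.ExecutionContext.Implicits.global
> 
> // Tag jobs so they can be cancelled as a group; the scheduler backend
> // (here, Mesos) must implement task killing for the cancellation to take effect.
> sc.setJobGroup("cancellable-queries", "ad-hoc queries", interruptOnCancel = true)
> Future { sc.parallelize(1 to 1000000).map(_ * 2).count() }  // job runs asynchronously
> sc.cancelJobGroup("cancellable-queries")
> {code}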



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-823) spark.default.parallelism's default is inconsistent across scheduler backends

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-823:

Component/s: PySpark
 Documentation

> spark.default.parallelism's default is inconsistent across scheduler backends
> -
>
> Key: SPARK-823
> URL: https://issues.apache.org/jira/browse/SPARK-823
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark, Scheduler
>Affects Versions: 0.8.0, 0.7.3, 0.9.1
>Reporter: Josh Rosen
>Priority: Minor
>
> The [0.7.3 configuration 
> guide|http://spark-project.org/docs/latest/configuration.html] says that 
> {{spark.default.parallelism}}'s default is 8, but the default is actually 
> max(totalCoreCount, 2) for the standalone scheduler backend, 8 for the Mesos 
> scheduler, and {{threads}} for the local scheduler:
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala#L157
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/mesos/MesosSchedulerBackend.scala#L317
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/local/LocalScheduler.scala#L150
> Should this be clarified in the documentation?  Should the Mesos scheduler 
> backend's default be revised?
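> One way to avoid depending on the backend-specific default is to set the 
> value explicitly; a minimal sketch (the app name and the value 8 are 
> illustrative):
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
> 
> // Overrides whatever default the scheduler backend would otherwise pick
> // (max(totalCoreCount, 2) on standalone, 8 on Mesos, the thread count locally).
> val conf = new SparkConf()
>   .setAppName("explicit-parallelism")
>   .set("spark.default.parallelism", "8")
> val sc = new SparkContext(conf)
> {code}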



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-2555:
-
Component/s: Mesos

> Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos 
> mode.
> 
>
> Key: SPARK-2555
> URL: https://issues.apache.org/jira/browse/SPARK-2555
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Scheduler
>Affects Versions: 1.0.0
>Reporter: Zhihui
>
> In SPARK-1946, the configuration spark.scheduler.minRegisteredExecutorsRatio 
> was introduced, but it only supports Standalone and YARN mode.
> This issue is to introduce the configuration to Mesos mode as well.
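> A minimal sketch of how the setting from SPARK-1946 is applied today on the 
> supported backends (the app name and the ratio 0.8 are illustrative); the 
> request here is for the same knob to take effect on Mesos:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
> 
> // Wait until 80% of requested executors have registered before scheduling tasks.
> // Configuration key as introduced by SPARK-1946; honored only on Standalone/YARN today.
> val conf = new SparkConf()
>   .setAppName("min-registered-ratio")
>   .set("spark.scheduler.minRegisteredExecutorsRatio", "0.8")
> val sc = new SparkContext(conf)
> {code}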



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3714) Spark workflow scheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3714:
-
Component/s: Deploy

> Spark workflow scheduler
> 
>
> Key: SPARK-3714
> URL: https://issues.apache.org/jira/browse/SPARK-3714
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy, Scheduler
>Reporter: Egor Pakhomov
>Priority: Minor
>
> [Design doc | 
> https://docs.google.com/document/d/1q2Q8Ux-6uAkH7wtLJpc3jz-GfrDEjlbWlXtf20hvguk/edit?usp=sharing]
> The Spark stack is currently hard to use in production processes due to the 
> lack of the following features:
> * Scheduling Spark jobs
> * Retrying a failed Spark job in a big pipeline
> * Sharing a context among jobs in a pipeline
> * Queueing jobs
> A typical use case for such a platform would be: wait for new data, process 
> the new data, train ML models on it, compare the new model with the previous 
> one, and in case of success overwrite the current production model in its 
> HDFS directory with the new one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5191) Pyspark: scheduler hangs when importing a standalone pyspark app

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5191:
-
Component/s: PySpark

> Pyspark: scheduler hangs when importing a standalone pyspark app
> 
>
> Key: SPARK-5191
> URL: https://issues.apache.org/jira/browse/SPARK-5191
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Scheduler
>Affects Versions: 1.0.2, 1.1.1, 1.3.0, 1.2.1
>Reporter: Daniel Liu
>
> In a.py:
> {code}
> from pyspark import SparkContext
> sc = SparkContext("local", "test spark")
> rdd = sc.parallelize(range(1, 10))
> print rdd.count()
> {code}
> In b.py:
> {code}
> from a import *
> {code}
> {{python a.py}} runs fine
> {{python b.py}} will hang at TaskSchedulerImpl: Removed TaskSet 0.0, whose 
> tasks have all completed, from pool
> {{./bin/spark-submit --py-files a.py b.py}} has the same problem



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5488) SPARK_LOCAL_IP not read by mesos scheduler

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5488:
-
Component/s: Mesos

> SPARK_LOCAL_IP not read by mesos scheduler
> --
>
> Key: SPARK-5488
> URL: https://issues.apache.org/jira/browse/SPARK-5488
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos, Scheduler
>Affects Versions: 1.1.1
>Reporter: Martin Tapp
>Priority: Minor
>
> My environment sets SPARK_LOCAL_IP and my driver sees it, but Mesos sees the 
> address of my first available network adapter.
> I can even see that SPARK_LOCAL_IP is read correctly by Utils.localHostName 
> and Utils.localIpAddress 
> (core/src/main/scala/org/apache/spark/util/Utils.scala); it seems the Spark 
> Mesos framework just doesn't use it.
> The workaround for now is to disable my first adapter so that the second one 
> becomes the one Spark sees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4346) YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4346:
-
Component/s: YARN

> YarnClientSchedulerBack.asyncMonitorApplication should be common with 
> Client.monitorApplication
> ---
>
> Key: SPARK-4346
> URL: https://issues.apache.org/jira/browse/SPARK-4346
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, YARN
>Reporter: Thomas Graves
>
> The YarnClientSchedulerBackend.asyncMonitorApplication routine should move 
> into ClientBase and be made common with monitorApplication.  Make sure stop 
> is handled properly.
> See discussion on https://github.com/apache/spark/pull/3143



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1452) dynamic partition creation not working on cached table

2015-02-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1452.
--
Resolution: Won't Fix

If this is a Shark issue then I believe it's WontFix, as Shark isn't developed 
anymore.

> dynamic partition creation not working on cached table
> --
>
> Key: SPARK-1452
> URL: https://issues.apache.org/jira/browse/SPARK-1452
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 0.9.0
> Environment: Shark git 
> commit dfc0e81366c0e1d0293ecf9b490eeabcc2a9c904
> Merge: 517ebca 7652f0d
>Reporter: Jai Kumar Singh
>  Labels: Shark
>
> Dynamic partition creation via a Shark QL command is not working with a 
> cached table, though it works fine with non-cached tables.
> Static partitioning also works fine with a cached table. 
> shark> desc sample;
> OK
> cid string  None
> hoststring  None
> url string  None
> bytes   int None
> pckts   int None
> app string  None
> cat string  None
> Time taken: 0.149 seconds
> shark> 
> shark> desc sample_cached;
> OK
> cat string  from deserializer   
> hoststring  from deserializer   
> cid string  None
>  
> # Partition Information  
> # col_name  data_type   comment 
>  
> cid string  None
> Time taken: 0.15 seconds
> shark> 
> shark> insert into table sample_cached partition(cid) select cat,host,cid 
> from sample;
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> shark> 
> shark> insert into table sample_cached partition(cid="my-cid") select 
> cat,host from sample limit 20;
> java.lang.InstantiationException: scala.Some
> Continuing ...
> java.lang.RuntimeException: failed to evaluate: =Class.new();
> Continuing ...
> Loading data to table default.sample_cached partition (cid=my-cid)
> OK
> Time taken: 64.268 seconds
> I am logging this issue here because 
> https://spark-project.atlassian.net/browse/SHARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel
> is not allowing me to log the issue there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5666) Adhere to accepte coding standards

2015-02-07 Thread Prabeesh K (JIRA)
Prabeesh K created SPARK-5666:
-

 Summary: Adhere to accepte coding standards
 Key: SPARK-5666
 URL: https://issues.apache.org/jira/browse/SPARK-5666
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: Prabeesh K
Priority: Minor


Cleanup the the all source code related to the Mqtt Spark Streaming to adhere 
to accept coding standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5666) Adhere to accepte coding standards

2015-02-07 Thread Prabeesh K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabeesh K updated SPARK-5666:
--
Description: Cleanup the source code related to the Mqtt Spark Streaming to 
adhere to accept coding standards.  (was: Cleanup the the all source code 
related to the Mqtt Spark Streaming to adhere to accept coding standards.)

> Adhere to accepte coding standards
> --
>
> Key: SPARK-5666
> URL: https://issues.apache.org/jira/browse/SPARK-5666
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Prabeesh K
>Priority: Minor
>
> Cleanup the source code related to the Mqtt Spark Streaming to adhere to 
> accept coding standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5666) Adhere to accepte coding standards

2015-02-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310842#comment-14310842
 ] 

Apache Spark commented on SPARK-5666:
-

User 'prabeesh' has created a pull request for this issue:
https://github.com/apache/spark/pull/4178

> Adhere to accepte coding standards
> --
>
> Key: SPARK-5666
> URL: https://issues.apache.org/jira/browse/SPARK-5666
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Prabeesh K
>Priority: Minor
>
> Cleanup the source code related to the Mqtt Spark Streaming to adhere to 
> accept coding standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5666) Adhere to accepte coding standards

2015-02-07 Thread Prabeesh K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310845#comment-14310845
 ] 

Prabeesh K commented on SPARK-5666:
---

[~srowen] Please review this

> Adhere to accepte coding standards
> --
>
> Key: SPARK-5666
> URL: https://issues.apache.org/jira/browse/SPARK-5666
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Prabeesh K
>Priority: Minor
>
> Cleanup the source code related to the Mqtt Spark Streaming to adhere to 
> accept coding standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5326) Show fetch wait time as optional metric in the UI

2015-02-07 Thread Kay Ousterhout (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout resolved SPARK-5326.
---
Resolution: Fixed

https://github.com/apache/spark/pull/4110

> Show fetch wait time as optional metric in the UI
> -
>
> Key: SPARK-5326
> URL: https://issues.apache.org/jira/browse/SPARK-5326
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
>Priority: Minor
>
> Time spent blocked waiting on shuffle reads can be a cause of slow jobs.  We 
> currently store this information but don't show it in the UI; we should add 
> it to the UI as an optional additional metric.
> cc [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5326) Show fetch wait time as optional metric in the UI

2015-02-07 Thread Kay Ousterhout (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout updated SPARK-5326:
--
Fix Version/s: 1.3.0

> Show fetch wait time as optional metric in the UI
> -
>
> Key: SPARK-5326
> URL: https://issues.apache.org/jira/browse/SPARK-5326
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Kay Ousterhout
>Assignee: Kay Ousterhout
>Priority: Minor
> Fix For: 1.3.0
>
>
> Time spent blocked waiting on shuffle reads can be a cause of slow jobs.  We 
> currently store this information but don't show it in the UI; we should add 
> it to the UI as an optional additional metric.
> cc [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5666) Adhere to accept coding standards

2015-02-07 Thread Prabeesh K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabeesh K updated SPARK-5666:
--
Summary: Adhere to accept coding standards  (was: Adhere to accepte coding 
standards)

> Adhere to accept coding standards
> -
>
> Key: SPARK-5666
> URL: https://issues.apache.org/jira/browse/SPARK-5666
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Prabeesh K
>Priority: Minor
>
> Cleanup the source code related to the Mqtt Spark Streaming to adhere to 
> accept coding standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5666) To accept coding standards

2015-02-07 Thread Prabeesh K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabeesh K updated SPARK-5666:
--
Summary: To accept coding standards  (was: Adhere to accept coding 
standards)

> To accept coding standards
> --
>
> Key: SPARK-5666
> URL: https://issues.apache.org/jira/browse/SPARK-5666
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Prabeesh K
>Priority: Minor
>
> Cleanup the source code related to the Mqtt Spark Streaming to adhere to 
> accept coding standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5666) To accept coding standards

2015-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310847#comment-14310847
 ] 

Sean Owen commented on SPARK-5666:
--

This concerns MQTT, and the primary motivation isn't code cleanup, is it? For 
example, you're changing the retry and error-handling semantics. I might update 
the title and PR title accordingly.

> To accept coding standards
> --
>
> Key: SPARK-5666
> URL: https://issues.apache.org/jira/browse/SPARK-5666
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Prabeesh K
>Priority: Minor
>
> Cleanup the source code related to the Mqtt Spark Streaming to adhere to 
> accept coding standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5667) Remove version from spark-ec2 example.

2015-02-07 Thread Miguel Peralvo (JIRA)
Miguel Peralvo created SPARK-5667:
-

 Summary: Remove version from spark-ec2 example.
 Key: SPARK-5667
 URL: https://issues.apache.org/jira/browse/SPARK-5667
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.3.0
Reporter: Miguel Peralvo
Priority: Trivial
 Fix For: 1.3.0


Remove version from spark-ec2 example for spark-ec2/Launch Cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


