date:20140805

[jira] [Resolved] (SPARK-1779) Warning when spark.storage.memoryFraction is not between 0 and 1

2014-08-05 Thread Patrick Wendell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1779.


Resolution: Fixed

Fixed via:
https://github.com/apache/spark/pull/714

 Warning when spark.storage.memoryFraction is not between 0 and 1
 

 Key: SPARK-1779
 URL: https://issues.apache.org/jira/browse/SPARK-1779
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 0.9.0, 1.0.0
Reporter: wangfei
 Fix For: 1.1.0


 There should be a warning when memoryFraction is lower than 0 or greater than 1.
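A minimal sketch of such a range check, written in Python for illustration. The function name check_memory_fraction is hypothetical, and the actual change in pull request 714 is not reproduced here:

```python
import warnings


def check_memory_fraction(fraction: float) -> bool:
    """Return True if fraction lies in [0, 1]; otherwise warn and return False.

    Hypothetical helper illustrating the requested validation.
    """
    if 0.0 <= fraction <= 1.0:
        return True
    warnings.warn(
        f"spark.storage.memoryFraction should be between 0 and 1, got {fraction}",
        stacklevel=2,
    )
    return False
```

Whether an out-of-range value should only warn or instead fail fast is a design choice; the issue asks for a warning, so the sketch warns and lets the caller decide.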



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-2859) Update url of Kryo project in tuning.md

2014-08-05 Thread Guancheng Chen (JIRA)

Guancheng Chen created SPARK-2859:
-

 Summary: Update url of Kryo project in tuning.md
 Key: SPARK-2859
 URL: https://issues.apache.org/jira/browse/SPARK-2859
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Guancheng Chen
Priority: Trivial


The Kryo project has moved from Google Code to GitHub, so its URL needs to be 
updated in related docs such as tuning.md.




[jira] [Commented] (SPARK-2862) DoubleRDDFunctions.histogram() throws exception for some inputs

2014-08-05 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086182#comment-14086182
 ] 

Apache Spark commented on SPARK-2862:
-

User 'nrchandan' has created a pull request for this issue:
https://github.com/apache/spark/pull/1787

 DoubleRDDFunctions.histogram() throws exception for some inputs
 ---

 Key: SPARK-2862
 URL: https://issues.apache.org/jira/browse/SPARK-2862
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0, 0.9.1, 1.0.0
 Environment: Scala version 2.9.2 (OpenJDK 64-Bit Server VM, Java 
 1.7.0_55) running on Ubuntu 14.04
Reporter: Chandan Kumar

 The histogram method throws the stack trace below when the chosen bucketCount 
 splits the RDD's range into increments that are not exactly representable, 
 e.g. (99 - 6) / 9:
 scala> val r = sc.parallelize(6 to 99)
 r: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12
 scala> r.histogram(9)
 java.lang.IndexOutOfBoundsException: 9
 at scala.collection.immutable.NumericRange.apply(NumericRange.scala:124)
 at scala.collection.immutable.NumericRange$$anon$1.apply(NumericRange.scala:176)
 at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:66)
 at scala.collection.IterableLike$class.copyToArray(IterableLike.scala:237)
 at scala.collection.AbstractIterable.copyToArray(Iterable.scala:54)
 at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
 at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
 at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
 at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
 at org.apache.spark.rdd.DoubleRDDFunctions.histogram(DoubleRDDFunctions.scala:116)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
 at $iwC$$iwC$$iwC.<init>(<console>:20)
 at $iwC$$iwC.<init>(<console>:22)
 at $iwC.<init>(<console>:24)
 at <init>(<console>:26)
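For illustration, a Python sketch of computing evenly spaced bucket boundaries by index rather than by stepping through a floating-point range, so that exactly bucketCount + 1 boundaries are produced and the last one equals the maximum. This is an assumed approach, not necessarily what pull request 1787 does; bucket_boundaries is a hypothetical name:

```python
from typing import List


def bucket_boundaries(lo: float, hi: float, bucket_count: int) -> List[float]:
    """Return bucket_count + 1 evenly spaced boundaries from lo to hi."""
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    width = (hi - lo) / bucket_count
    # Derive each boundary from its index instead of accumulating a
    # floating-point step, so the number of boundaries never depends on rounding.
    bounds = [lo + i * width for i in range(bucket_count)]
    bounds.append(float(hi))  # pin the final boundary to the exact maximum
    return bounds
```

With the inputs from the report (min 6, max 99, bucketCount 9), this yields 10 boundaries, the last of which is exactly 99.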




[jira] [Updated] (SPARK-2862) DoubleRDDFunctions.histogram() throws exception for some inputs

2014-08-05 Thread Prashant Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-2862:
---

Affects Version/s: 1.0.1

 DoubleRDDFunctions.histogram() throws exception for some inputs
 ---

 Key: SPARK-2862
 URL: https://issues.apache.org/jira/browse/SPARK-2862
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.0.1
 Environment: Scala version 2.9.2 (OpenJDK 64-Bit Server VM, Java 
 1.7.0_55) running on Ubuntu 14.04
Reporter: Chandan Kumar

 The histogram method throws the stack trace below when the chosen bucketCount 
 splits the RDD's range into increments that are not exactly representable, 
 e.g. (99 - 6) / 9:
 scala> val r = sc.parallelize(6 to 99)
 r: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12
 scala> r.histogram(9)
 java.lang.IndexOutOfBoundsException: 9
 at scala.collection.immutable.NumericRange.apply(NumericRange.scala:124)
 at scala.collection.immutable.NumericRange$$anon$1.apply(NumericRange.scala:176)
 at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:66)
 at scala.collection.IterableLike$class.copyToArray(IterableLike.scala:237)
 at scala.collection.AbstractIterable.copyToArray(Iterable.scala:54)
 at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:241)
 at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:105)
 at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:249)
 at scala.collection.AbstractTraversable.toArray(Traversable.scala:105)
 at org.apache.spark.rdd.DoubleRDDFunctions.histogram(DoubleRDDFunctions.scala:116)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
 at $iwC$$iwC$$iwC.<init>(<console>:20)
 at $iwC$$iwC.<init>(<console>:22)
 at $iwC.<init>(<console>:24)
 at <init>(<console>:26)




[jira] [Updated] (SPARK-2861) Doc comment of DoubleRDDFunctions.histogram is incorrect

2014-08-05 Thread Chandan Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandan Kumar updated SPARK-2861:
-

Description: 
The documentation comment of the histogram method of the DoubleRDDFunctions 
class in source file DoubleRDDFunctions.scala is inconsistent. This might 
confuse somebody reading the documentation.

Comment in question:
{code}
  /**
   * Compute a histogram using the provided buckets. The buckets are all open
   * to the left except for the last which is closed
   *  e.g. for the array
   *  [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50]
   *  e.g 1<=x<10 , 10<=x<20, 20<=x<50
   *  And on the input of 1 and 50 we would have a histogram of 1, 0, 0
{code}

The buckets are all open to the right (NOT left) except for the last, which is 
closed. For the example quoted, the last bucket should be 20<=x<=50.
Also, the histogram result on input of 1 and 50 would be 1, 0, 1 (NOT 1, 0, 0). 
Spark computes this correctly; only the doc comment is wrong.
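The intended semantics (every bucket half-open on the right except the last, which is closed) can be sketched in Python as follows. This is an illustrative re-implementation with a hypothetical histogram function, not Spark's code:

```python
from bisect import bisect_right
from typing import List, Sequence


def histogram(values: Sequence[float], buckets: Sequence[float]) -> List[int]:
    """Count values into [b0, b1), [b1, b2), ..., [b(n-1), bn] (last bucket closed)."""
    counts = [0] * (len(buckets) - 1)
    for x in values:
        if x == buckets[-1]:
            counts[-1] += 1  # the last bucket includes its right edge
        elif buckets[0] <= x < buckets[-1]:
            # bisect_right finds the first boundary strictly greater than x,
            # so the bucket index is one less than that position.
            counts[bisect_right(buckets, x) - 1] += 1
        # values outside [b0, bn] are ignored
    return counts
```

Calling histogram([1, 50], [1, 10, 20, 50]) returns [1, 0, 1], matching the corrected example.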


  was:The documentation comment of histogram method of DoubleRDDFunctions class 
in source file DoubleRDDFunctions.scala is partially incorrect, hence 
inconsistent. This might confuse somebody reading the documentation.


 Doc comment of DoubleRDDFunctions.histogram is incorrect
 

 Key: SPARK-2861
 URL: https://issues.apache.org/jira/browse/SPARK-2861
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0, 0.9.1, 1.0.0
Reporter: Chandan Kumar
Priority: Trivial

 The documentation comment of the histogram method of the DoubleRDDFunctions 
 class in source file DoubleRDDFunctions.scala is inconsistent. This might 
 confuse somebody reading the documentation.
 Comment in question:
 {code}
   /**
* Compute a histogram using the provided buckets. The buckets are all open
* to the left except for the last which is closed
*  e.g. for the array
*  [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50]
*  e.g 1<=x<10 , 10<=x<20, 20<=x<50
*  And on the input of 1 and 50 we would have a histogram of 1, 0, 0
 {code}
 The buckets are all open to the right (NOT left) except for the last, which is 
 closed. For the example quoted, the last bucket should be 20<=x<=50.
 Also, the histogram result on input of 1 and 50 would be 1, 0, 1 (NOT 1, 0, 
 0). Spark computes this correctly; only the doc comment is wrong.




[jira] [Created] (SPARK-2863) Emulate Hive type coercion in native reimplementations of Hive UDFs

2014-08-05 Thread William Benton (JIRA)

William Benton created SPARK-2863:
-

 Summary: Emulate Hive type coercion in native reimplementations of 
Hive UDFs
 Key: SPARK-2863
 URL: https://issues.apache.org/jira/browse/SPARK-2863
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: William Benton


Native reimplementations of Hive functions no longer have the same 
type-coercion behavior as they would if executed via Hive. As [Michael 
Armbrust points out|https://github.com/apache/spark/pull/1750#discussion_r15790970], 
queries like {{SELECT SQRT(2) FROM src LIMIT 1}} succeed in Hive but fail if 
{{SQRT}} is implemented natively.

Spark SQL should have Hive-compatible type coercions for arguments to 
natively-implemented functions.
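To illustrate the requested behavior, a Python sketch in which a natively implemented function accepts only a double, and a wrapper widens integer arguments first, mimicking Hive's implicit conversion for the SQRT(2) case. native_sqrt and with_double_coercion are hypothetical names, and only int-to-double widening is modeled:

```python
import math
from typing import Any, Callable


def native_sqrt(x: float) -> float:
    """Hypothetical native function that, like a strictly typed expression, only accepts DOUBLE."""
    if not isinstance(x, float):
        raise TypeError(f"SQRT expects DOUBLE, got {type(x).__name__}")
    return math.sqrt(x)


def with_double_coercion(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap fn so that integer arguments are widened to float before the call."""
    def wrapper(*args: Any) -> Any:
        # bool is a subclass of int in Python; leave it untouched.
        coerced = [float(a) if isinstance(a, int) and not isinstance(a, bool) else a
                   for a in args]
        return fn(*coerced)
    return wrapper
```

Calling native_sqrt(2) raises TypeError, while with_double_coercion(native_sqrt)(2) returns the square root of 2.0.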




[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-08-05 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086494#comment-14086494
 ] 

Marcelo Vanzin commented on SPARK-2636:
---

(BTW, I just checked SPARK-2321. If you really mean the {{Job}} ID, ignore my 
comments; yes, it's kind of a pain to know the ID of a job you're submitting 
to the context.)

 no where to get job identifier while submit spark job through spark API
 ---

 Key: SPARK-2636
 URL: https://issues.apache.org/jira/browse/SPARK-2636
 Project: Spark
  Issue Type: New Feature
Reporter: Chengxiang Li

 In Hive on Spark, we want to track Spark job status through the Spark API. The 
 basic idea is as follows:
 # create a Hive-specific Spark listener and register it with the Spark listener bus.
 # the Hive-specific Spark listener generates job status from Spark listener events.
 # the Hive driver tracks job status through the Hive-specific Spark listener.
 The current problem is that the Hive driver needs a job identifier to track a 
 specific job's status through the Spark listener, but there is no Spark API to 
 get the job identifier (like the job ID) when submitting a Spark job.
 I think other projects that try to track job status with the Spark API would 
 suffer from this as well.
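The tracking scheme described above can be sketched in Python with hypothetical classes (ListenerBus, JobStatusListener and Context are illustrative, not Spark APIs). The key assumption is that submitting a job returns its ID, which is exactly what the issue reports as missing:

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (event kind, job id)


class ListenerBus:
    """Delivers each posted event to every registered listener."""

    def __init__(self) -> None:
        self._listeners: List["JobStatusListener"] = []

    def add_listener(self, listener: "JobStatusListener") -> None:
        self._listeners.append(listener)

    def post(self, event: Event) -> None:
        for listener in self._listeners:
            listener.on_event(event)


class JobStatusListener:
    """Hive-side listener that derives a status per job ID from events."""

    def __init__(self) -> None:
        self.status: Dict[int, str] = {}

    def on_event(self, event: Event) -> None:
        kind, job_id = event
        self.status[job_id] = "RUNNING" if kind == "job_start" else "FINISHED"


class Context:
    """Hypothetical context whose submit() returns the new job's ID."""

    def __init__(self, bus: ListenerBus) -> None:
        self._bus = bus
        self._next_id = 0

    def submit(self) -> int:
        job_id = self._next_id
        self._next_id += 1
        self._bus.post(("job_start", job_id))
        return job_id  # the driver keeps this to query the listener later

    def finish(self, job_id: int) -> None:
        self._bus.post(("job_end", job_id))
```

Without the return value of submit(), the driver would see status entries for every job on the bus but could not tell which one it had just submitted.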




[jira] [Created] (SPARK-2864) fix random seed in Word2Vec

2014-08-05 Thread Xiangrui Meng (JIRA)

Xiangrui Meng created SPARK-2864:


 Summary: fix random seed in Word2Vec
 Key: SPARK-2864
 URL: https://issues.apache.org/jira/browse/SPARK-2864
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng


The random seed is not fixed in word2vec, making the unit tests fail randomly.
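A small Python illustration of why fixing the seed makes such tests deterministic. init_vectors is a hypothetical stand-in for Word2Vec's random initialization, not MLlib code:

```python
import random
from typing import List


def init_vectors(vocab_size: int, dim: int, seed: int) -> List[List[float]]:
    """Initialize vectors from a private RNG seeded explicitly, so runs repeat."""
    rng = random.Random(seed)  # independent of the global random state
    return [[(rng.random() - 0.5) / dim for _ in range(dim)]
            for _ in range(vocab_size)]
```

Two calls with the same seed produce identical vectors, so assertions on the trained model no longer vary from run to run.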




[jira] [Commented] (SPARK-2622) Add Jenkins build numbers to SparkQA messages

2014-08-05 Thread Xiangrui Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086513#comment-14086513
 ] 

Xiangrui Meng commented on SPARK-2622:
--

The build number is included in the SparkQA message; see, for example, 
https://github.com/apache/spark/pull/1788

The build number 17941 is in the URL 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17941/consoleFull.
You just need to be careful to match the number.
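Matching a result to its build can be automated by extracting the number from the console URL. A Python sketch, assuming the URL keeps the .../SparkPullRequestBuilder/<number>/... shape shown in the comment; build_number is a hypothetical helper:

```python
import re
from typing import Optional

# Assumed URL shape: .../job/SparkPullRequestBuilder/<build number>/...
_BUILD_RE = re.compile(r"/SparkPullRequestBuilder/(\d+)/")


def build_number(url: str) -> Optional[int]:
    """Return the Jenkins build number embedded in url, or None if absent."""
    match = _BUILD_RE.search(url)
    return int(match.group(1)) if match else None
```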

 Add Jenkins build numbers to SparkQA messages
 -

 Key: SPARK-2622
 URL: https://issues.apache.org/jira/browse/SPARK-2622
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 1.0.1
Reporter: Xiangrui Meng
Priority: Minor

 It takes Jenkins 2 hours to finish testing. It is possible to have the 
 following:
 {code}
 Build 1 started.
 PR updated.
 Build 2 started.
 Build 1 finished successfully.
 A committer merged the PR because the last build seemed to be okay.
 Build 2 failed.
 {code}
 It would be nice to put the build number in the SparkQA message so it is easy 
 to match the result with the build.




[jira] [Closed] (SPARK-2622) Add Jenkins build numbers to SparkQA messages

2014-08-05 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng closed SPARK-2622.


Resolution: Fixed

 Add Jenkins build numbers to SparkQA messages
 -

 Key: SPARK-2622
 URL: https://issues.apache.org/jira/browse/SPARK-2622
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 1.0.1
Reporter: Xiangrui Meng
Priority: Minor

 It takes Jenkins 2 hours to finish testing. It is possible to have the 
 following:
 {code}
 Build 1 started.
 PR updated.
 Build 2 started.
 Build 1 finished successfully.
 A committer merged the PR because the last build seemed to be okay.
 Build 2 failed.
 {code}
 It would be nice to put the build number in the SparkQA message so it is easy 
 to match the result with the build.




[jira] [Resolved] (SPARK-1890) add modify acls to the web UI for the kill button

2014-08-05 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-1890.
--

Resolution: Fixed

 add modify acls to the web UI for the kill button
 ---

 Key: SPARK-1890
 URL: https://issues.apache.org/jira/browse/SPARK-1890
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.0.0
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Critical
 Fix For: 1.1.0


 A kill button has been added to the UI to allow you to kill tasks. 
 Currently this is either enabled or disabled. 
 I think we should add another set of ACLs to control who has permission to 
 use it. We currently have view ACLs in the Security Manager, which take 
 effect if you have a servlet filter that does authentication installed. We 
 should add another set of ACLs, modify ACLs, that controls who has permission 
 to use the kill button.
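A minimal Python sketch of the proposed check, with hypothetical names. It assumes, as described above for view ACLs, that the ACL only applies when ACLs are enabled and an authentication filter has supplied a user name:

```python
from typing import AbstractSet, Optional


def can_modify(user: Optional[str], acls_enabled: bool,
               modify_acls: AbstractSet[str]) -> bool:
    """Decide whether user may perform a modifying action such as killing a stage."""
    if not acls_enabled:
        return True  # no ACL enforcement configured
    if user is None:
        return True  # assumption: no authentication filter, so no user to check
    return user in modify_acls
```

Keeping modify ACLs separate from view ACLs lets a deployment grant read-only UI access broadly while restricting the kill button to a smaller set of users.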




[jira] [Commented] (SPARK-2854) Finalize _acceptable_types in pyspark.sql

2014-08-05 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086560#comment-14086560
 ] 

Yin Huai commented on SPARK-2854:
-

Since we already do conversions for ByteType and ShortType when we have int 
values, to be consistent, we can also support long values for ByteType, 
ShortType and IntegerType.

For datetime.time and datetime.date values, because there are SQL Time and Date 
types, datetime.time and datetime.date will not be allowed as value types for 
TimestampType.

So, the updated _acceptable_types will be:
{code}
_acceptable_types = {
    BooleanType: (bool,),
    ByteType: (int, long),
    ShortType: (int, long),
    IntegerType: (int, long),
    LongType: (int, long),
    FloatType: (float,),
    DoubleType: (float,),
    DecimalType: (decimal.Decimal,),
    StringType: (str, unicode),
    TimestampType: (datetime.datetime,),
    ArrayType: (list, tuple, array),
    MapType: (dict,),
    StructType: (tuple, list),
}
{code}
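A Python 3 sketch of how a value could be checked against the proposed mapping. It is an adaptation, not PySpark code: type names are plain strings, long and unicode (Python 2 only) are folded into int and str, and value ranges (for example whether an int fits in ByteType) are not checked; is_acceptable is a hypothetical helper:

```python
import datetime
import decimal
from array import array
from typing import Any

# Python 3 adaptation of the proposed mapping, keyed by type name.
ACCEPTABLE_TYPES = {
    "BooleanType": (bool,),
    "ByteType": (int,),
    "ShortType": (int,),
    "IntegerType": (int,),
    "LongType": (int,),
    "FloatType": (float,),
    "DoubleType": (float,),
    "DecimalType": (decimal.Decimal,),
    "StringType": (str,),
    "TimestampType": (datetime.datetime,),
    "ArrayType": (list, tuple, array),
    "MapType": (dict,),
    "StructType": (tuple, list),
}


def is_acceptable(value: Any, type_name: str) -> bool:
    """Type-level check only; ranges and nested element types are not verified."""
    return isinstance(value, ACCEPTABLE_TYPES[type_name])
```

Because datetime.datetime subclasses datetime.date but not the other way round, a plain date or time value is rejected for TimestampType, matching the proposal.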

 Finalize _acceptable_types in pyspark.sql
 -

 Key: SPARK-2854
 URL: https://issues.apache.org/jira/browse/SPARK-2854
 Project: Spark
  Issue Type: Task
  Components: SQL
Reporter: Yin Huai
Priority: Blocker

 In PySpark, _acceptable_types defines accepted Python data types for every 
 Spark SQL data type. The list is shown below. 
 {code}
 _acceptable_types = {
     BooleanType: (bool,),
     ByteType: (int, long),
     ShortType: (int, long),
     IntegerType: (int, long),
     LongType: (int, long),
     FloatType: (float,),
     DoubleType: (float,),
     DecimalType: (decimal.Decimal,),
     StringType: (str, unicode),
     TimestampType: (datetime.datetime, datetime.time, datetime.date),
     ArrayType: (list, tuple, array),
     MapType: (dict,),
     StructType: (tuple, list),
 }
 {code}
 Let's double check this mapping before 1.1 release.




[jira] [Created] (SPARK-2865) Potential deadlock: tasks could hang forever waiting to fetch a remote block even though most tasks finish

2014-08-05 Thread Zongheng Yang (JIRA)

Zongheng Yang created SPARK-2865:


 Summary: Potential deadlock: tasks could hang forever waiting to 
fetch a remote block even though most tasks finish
 Key: SPARK-2865
 URL: https://issues.apache.org/jira/browse/SPARK-2865
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Affects Versions: 1.0.1, 1.1.0
 Environment: 16-node EC2 r3.2xlarge cluster
Reporter: Zongheng Yang
Priority: Blocker


In the application I tested, most of the 128 tasks finished, but sometimes 
(fairly deterministically) either 1 or 3 tasks would hang forever with the 
following stack trace. There were no apparent failures in the UI, and the nodes 
running the stuck tasks showed no apparent memory/CPU/disk pressure.

{noformat}
"Executor task launch worker-0" daemon prio=10 tid=0x7f32ec003800 nid=0xaac waiting on condition [0x7f33f4428000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x7f3e0d7198e8> (a scala.concurrent.impl.Promise$CompletionLatch)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.network.ConnectionManager.sendMessageReliablySync(ConnectionManager.scala:832)
at org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:122)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:497)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:495)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:495)
at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:481)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:524)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

This behavior does *not* appear on 1.0 (reusing the same cluster), but it does 
appear on the master branch as of Aug 4, 2014 *and* on 1.0.1. I also tried 
[this patch|https://github.com/apache/spark/pull/1758], but it did not fix the 
behavior.

In addition, when this behavior happened, the driver repeatedly printed the 
following line:

{noformat}
14/08/04 23:32:42 WARN storage.BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(7, ip-172-31-6-74.us-west-1.compute.internal, 59408, 0) with no 
recent heart beats: 67331ms exceeds 45000ms
{noformat}
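The trace shows the task parked in Await.result inside ConnectionManager.sendMessageReliablySync. If such a wait has no effective timeout, a lost response blocks the task indefinitely. A Python sketch of a bounded wait that fails instead of hanging; await_block and BlockFetchTimeout are hypothetical names, and this is not presented as the fix for this issue:

```python
import concurrent.futures


class BlockFetchTimeout(Exception):
    """Hypothetical error raised instead of waiting forever for a remote block."""


def await_block(future: concurrent.futures.Future, timeout_s: float) -> bytes:
    """Wait at most timeout_s seconds for the block; raise rather than hang."""
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # stop waiting on the lost response
        raise BlockFetchTimeout(f"no response for remote block after {timeout_s}s") from None
```

Raising turns a silent hang into an ordinary task failure, which the scheduler can then retry or report.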




[jira] [Updated] (SPARK-2865) Potential deadlock: tasks could hang forever waiting to fetch a remote block even though most tasks finish

2014-08-05 Thread Zongheng Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongheng Yang updated SPARK-2865:
-

Description: 
In the application I tested, most of the 128 tasks finished, but sometimes 
(fairly deterministically) either 1 or 3 tasks would hang forever (over 5 hours 
with no progress at all) with the following stack trace. There were no apparent 
failures in the UI, and the nodes running the stuck tasks showed no apparent 
memory/CPU/disk pressure.

{noformat}
"Executor task launch worker-0" daemon prio=10 tid=0x7f32ec003800 nid=0xaac waiting on condition [0x7f33f4428000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x7f3e0d7198e8> (a scala.concurrent.impl.Promise$CompletionLatch)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.network.ConnectionManager.sendMessageReliablySync(ConnectionManager.scala:832)
at org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:122)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:497)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:495)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:495)
at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:481)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:524)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

This behavior does *not* appear on 1.0 (reusing the same cluster), but it does 
appear on the master branch as of Aug 4, 2014 *and* on 1.0.1. I also tried 
[this patch|https://github.com/apache/spark/pull/1758], but it did not fix the 
behavior.

When this behavior happened, the driver repeatedly printed the following line:

{noformat}
14/08/04 23:32:42 WARN storage.BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(7, ip-172-31-6-74.us-west-1.compute.internal, 59408, 0) with no 
recent heart beats: 67331ms exceeds 45000ms
{noformat}

  was:
In the application I tested, most of the 128 tasks finished, but sometimes 
(fairly deterministically) either 1 or 3 tasks would hang forever with the 
following stack trace. There were no apparent failures in the UI, and the nodes 
running the stuck tasks showed no apparent memory/CPU/disk pressure.

{noformat}
Executor task launch worker-0 daemon prio=10 tid=0x7f32ec003800

[jira] [Updated] (SPARK-2865) Potential deadlock: tasks could hang forever waiting to fetch a remote block even though most tasks finish

2014-08-05 Thread Zongheng Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongheng Yang updated SPARK-2865:
-

Description: 
In the application I tested, most of the 128 tasks finished, but sometimes 
(fairly deterministically) either 1 or 3 tasks would hang forever with the 
following stack trace. There were no apparent failures in the UI, and the nodes 
running the stuck tasks showed no apparent memory/CPU/disk pressure.

{noformat}
"Executor task launch worker-0" daemon prio=10 tid=0x7f32ec003800 nid=0xaac waiting on condition [0x7f33f4428000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x7f3e0d7198e8> (a scala.concurrent.impl.Promise$CompletionLatch)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.network.ConnectionManager.sendMessageReliablySync(ConnectionManager.scala:832)
at org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:122)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:497)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:495)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:495)
at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:481)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:524)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

This behavior does *not* appear on 1.0 (reusing the same cluster), but it does 
appear on the master branch as of Aug 4, 2014 *and* on 1.0.1. I also tried 
[this patch|https://github.com/apache/spark/pull/1758], but it did not fix the 
behavior.

When this behavior happened, the driver repeatedly printed the following line:

{noformat}
14/08/04 23:32:42 WARN storage.BlockManagerMasterActor: Removing BlockManager 
BlockManagerId(7, ip-172-31-6-74.us-west-1.compute.internal, 59408, 0) with no 
recent heart beats: 67331ms exceeds 45000ms
{noformat}

  was:
In the application I tested, most of the 128 tasks finished, but sometimes 
(fairly deterministically) either 1 or 3 tasks would hang forever with the 
following stack trace. There were no apparent failures in the UI, and the nodes 
running the stuck tasks showed no apparent memory/CPU/disk pressure.

{noformat}
Executor task launch worker-0 daemon prio=10 tid=0x7f32ec003800 nid=0xaac 
waiting on condition

[jira] [Resolved] (SPARK-2860) Resolving CASE WHEN throws None.get exception

2014-08-05 Thread Michael Armbrust (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2860.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

 Resolving CASE WHEN throws None.get exception
 -

 Key: SPARK-2860
 URL: https://issues.apache.org/jira/browse/SPARK-2860
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Critical
 Fix For: 1.1.0







[jira] [Created] (SPARK-2866) ORDER BY attributes must appear in SELECT clause

2014-08-05 Thread Michael Armbrust (JIRA)

Michael Armbrust created SPARK-2866:
---

 Summary: ORDER BY attributes must appear in SELECT clause
 Key: SPARK-2866
 URL: https://issues.apache.org/jira/browse/SPARK-2866
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Critical







[jira] [Resolved] (SPARK-2859) Update url of Kryo project in related docs

2014-08-05 Thread Patrick Wendell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2859.


   Resolution: Fixed
Fix Version/s: 1.1.0
   1.0.3

Issue resolved by pull request 1782
[https://github.com/apache/spark/pull/1782]

 Update url of Kryo project in related docs
 --

 Key: SPARK-2859
 URL: https://issues.apache.org/jira/browse/SPARK-2859
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Guancheng Chen
Priority: Trivial
 Fix For: 1.0.3, 1.1.0


 Kryo project has been migrated from googlecode to github, hence we need to 
 update its URL in related docs such as tuning.md.




[jira] [Updated] (SPARK-2534) Avoid pulling in the entire RDD or PairRDDFunctions in various operators

2014-08-05 Thread Tathagata Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-2534:
-

Component/s: Spark Core

 Avoid pulling in the entire RDD or PairRDDFunctions in various operators
 

 Key: SPARK-2534
 URL: https://issues.apache.org/jira/browse/SPARK-2534
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical
 Fix For: 1.1.0, 1.0.2


 The way groupByKey is written actually pulls the entire PairRDDFunctions into 
 the 3 closures, sometimes resulting in gigantic task sizes:
 {code}
   def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])] = {
     // groupByKey shouldn't use map side combine because map side combine does not
     // reduce the amount of data shuffled and requires all map side data be inserted
     // into a hash table, leading to more objects in the old gen.
     def createCombiner(v: V) = ArrayBuffer(v)
     def mergeValue(buf: ArrayBuffer[V], v: V) = buf += v
     def mergeCombiners(c1: ArrayBuffer[V], c2: ArrayBuffer[V]) = c1 ++ c2
     val bufs = combineByKey[ArrayBuffer[V]](
       createCombiner _, mergeValue _, mergeCombiners _, partitioner, mapSideCombine = false)
     bufs.mapValues(_.toIterable)
   }
 {code}
 Changing the functions from def to val would solve it. 
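 The capture problem can be seen in a language-neutral way with a Python analogy 
 (not Spark's Scala code): a method reference drags its whole enclosing object 
 along, while a standalone function literal that touches no members captures 
 nothing. PairFunctionsLike and captured_objects are hypothetical names used only 
 for illustration.

```python
class PairFunctionsLike:
    """Hypothetical stand-in for an object with heavy state, like PairRDDFunctions."""

    def __init__(self, size):
        self.big_state = list(range(size))  # simulates a large enclosing object

    def _create_combiner(self, v):
        return [v]

    def combiner_from_method(self):
        # Analogous to passing a local `def` as `createCombiner _`: the
        # bound method keeps a reference to the entire enclosing object.
        return self._create_combiner

    def combiner_from_value(self):
        # Analogous to a `val` function literal that uses no members:
        # the lambda references nothing from `self`.
        return lambda v: [v]


def captured_objects(fn):
    """List the objects a closure serializer would have to ship along with `fn`."""
    refs = []
    bound_to = getattr(fn, "__self__", None)
    if bound_to is not None:
        refs.append(bound_to)
    for cell in fn.__closure__ or ():
        refs.append(cell.cell_contents)
    return refs
```

 A serializer that ships closures by value (PySpark's cloudpickle-based one, for 
 example) would carry big_state along with the first callable but not the second.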




[jira] [Created] (SPARK-2867) saveAsHadoopFile() in PairRDDFunction.scala should allow use other OutputCommiter class

2014-08-05 Thread Joseph Su (JIRA)

Joseph Su created SPARK-2867:


 Summary: saveAsHadoopFile() in PairRDDFunction.scala should allow 
use other OutputCommiter class
 Key: SPARK-2867
 URL: https://issues.apache.org/jira/browse/SPARK-2867
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0, 1.1.0
Reporter: Joseph Su
Priority: Minor



saveAsHadoopFile() in PairRDDFunctions.scala hard-codes the OutputCommitter 
class to FileOutputCommitter because of the following line in the source:

   hadoopConf.setOutputCommitter(classOf[FileOutputCommitter])

However, the OutputCommitter is configurable in regular Hadoop MapReduce 
programs: users can set mapred.output.committer.class to choose a different 
committer class.

saveAsHadoopFile() should remove this hard-coded assignment and provide a way 
to specify the OutputCommitter used here.
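
One possible shape for the fix, sketched in Python with a plain dict standing in 
for Hadoop's JobConf (an assumption for illustration): fall back to 
FileOutputCommitter only when the user has not already set 
mapred.output.committer.class. resolve_output_committer is a hypothetical helper, 
and com.example.DirectCommitter is a made-up class name.

```python
# Configuration key named in this issue, and the old-API default committer.
COMMITTER_KEY = "mapred.output.committer.class"
DEFAULT_COMMITTER = "org.apache.hadoop.mapred.FileOutputCommitter"


def resolve_output_committer(hadoop_conf):
    """Set the default committer only if none is configured; return the effective one.

    `hadoop_conf` is a dict used as a stand-in for a Hadoop JobConf.
    """
    if hadoop_conf.get(COMMITTER_KEY) is None:
        hadoop_conf[COMMITTER_KEY] = DEFAULT_COMMITTER
    return hadoop_conf[COMMITTER_KEY]
```

In the Scala code this would presumably amount to guarding the setOutputCommitter 
call with a check of hadoopConf.get("mapred.output.committer.class").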






[jira] [Updated] (SPARK-1977) mutable.BitSet in ALS not serializable with KryoSerializer

2014-08-05 Thread Tathagata Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-1977:
-

Fix Version/s: (was: 1.0.1)
   1.0.2

 mutable.BitSet in ALS not serializable with KryoSerializer
 --

 Key: SPARK-1977
 URL: https://issues.apache.org/jira/browse/SPARK-1977
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Neville Li
Priority: Minor
 Fix For: 1.1.0, 1.0.2


 OutLinkBlock in ALS.scala has an Array[mutable.BitSet] member.
 KryoSerializer uses AllScalaRegistrar from Twitter chill but it doesn't 
 register mutable.BitSet.
 Right now we have to register mutable.BitSet manually. A proper fix would be 
 either to use immutable.BitSet in ALS or to register mutable.BitSet in upstream chill.
 {code}
 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
 Task 1724.0:9 failed 4 times, most recent failure: Exception failure in TID 
 68548 on host lon4-hadoopslave-b232.lon4.spotify.net: 
 com.esotericsoftware.kryo.KryoException: java.lang.ArrayStoreException: 
 scala.collection.mutable.HashSet
 Serialization trace:
 shouldSend (org.apache.spark.mllib.recommendation.OutLinkBlock)
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
 com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)
 org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
 org.apache.spark.scheduler.Task.run(Task.scala:51)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 java.lang.Thread.run(Thread.java:662)
 Driver stacktrace:
   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
   at scala.Option.foreach(Option.scala:236)
   at

[jira] [Commented] (SPARK-1834) NoSuchMethodError when invoking JavaPairRDD.reduce() in Java

2014-08-05 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086649#comment-14086649
 ] 

Sean Owen commented on SPARK-1834:
--

Ah, you're right: 
https://github.com/apache/spark/commit/181ec5030792a10f3ce77e997d0e2eda9bcd6139
It was unlikely to be the problem anyway. Very strange.

 NoSuchMethodError when invoking JavaPairRDD.reduce() in Java
 

 Key: SPARK-1834
 URL: https://issues.apache.org/jira/browse/SPARK-1834
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.1
 Environment: Redhat Linux, Java 7, Hadoop 2.2, Scala 2.10.4
Reporter: John Snodgrass

 I get a java.lang.NoSuchMethodError when invoking JavaPairRDD.reduce(). Here 
 is the partial stack trace:
 Exception in thread "main" java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at 
 org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:39)
 at 
 org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
 Caused by: java.lang.NoSuchMethodError: 
 org.apache.spark.api.java.JavaPairRDD.reduce(Lorg/apache/spark/api/java/function/Function2;)Lscala/Tuple2;
 at JavaPairRDDReduceTest.main(JavaPairRDDReduceTest.java:49)...
 I'm using Spark 0.9.1. I checked to ensure that I'm compiling with the same 
 version of Spark as I am running on the cluster. The reduce() method works 
 fine with JavaRDD, just not with JavaPairRDD. Here is a code snippet that 
 exhibits the problem: 
   ArrayList<Integer> array = new ArrayList<>();
   for (int i = 0; i < 10; ++i) {
     array.add(i);
   }
   JavaRDD<Integer> rdd = javaSparkContext.parallelize(array);
   JavaPairRDD<String, Integer> testRDD = rdd.map(
       new PairFunction<Integer, String, Integer>() {
     @Override
     public Tuple2<String, Integer> call(Integer t) throws Exception {
       return new Tuple2<>("" + t, t);
     }
   }).cache();

   testRDD.reduce(new Function2<Tuple2<String, Integer>, Tuple2<String, Integer>,
       Tuple2<String, Integer>>() {
     @Override
     public Tuple2<String, Integer> call(Tuple2<String, Integer> arg0,
         Tuple2<String, Integer> arg1) throws Exception {
       return new Tuple2<>(arg0._1 + arg1._1, arg0._2 * 10 + arg0._2);
     }
   });




[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2014-08-05 Thread Zhan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086657#comment-14086657
 ] 

Zhan Zhang commented on SPARK-1537:
---

I am also interested in this and am trying to integrate Spark with the YARN 
timeline server. Do you have a concrete plan in mind? I can start prototyping 
it, and then we can work on it together. What do you think?

 Add integration with Yarn's Application Timeline Server
 ---

 Key: SPARK-1537
 URL: https://issues.apache.org/jira/browse/SPARK-1537
 Project: Spark
  Issue Type: New Feature
  Components: YARN
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

 It would be nice to have Spark integrate with Yarn's Application Timeline 
 Server (see YARN-321, YARN-1530). This would allow users running Spark on 
 Yarn to have a single place to go for all their history needs, and avoid 
 having to manage a separate service (Spark's built-in server).
 At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
 although there is still some ongoing work. But the basics are there, and I 
 wouldn't expect them to change (much) at this point.




[jira] [Commented] (SPARK-2848) Shade Guava in Spark deliverables

2014-08-05 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086670#comment-14086670
 ] 

Marcelo Vanzin commented on SPARK-2848:
---

Question for others ([~pwendell], [~sowen], maybe others): how important do you 
think it is to support this from the sbt side of the build?

This is trivial to do on the maven side (just a few pom file changes). But I 
can't seem to find any sbt plugin that does class relocation like 
maven-shade-plugin. I could write the code, but that seems to go in the wrong 
direction of keeping the sbt build code small-ish.

 Shade Guava in Spark deliverables
 -

 Key: SPARK-2848
 URL: https://issues.apache.org/jira/browse/SPARK-2848
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

 As discussed in SPARK-2420, this task covers the work of shading Guava in 
 Spark deliverables so that they don't conflict with the Hadoop classpath (nor 
 user's classpath).
 Since one Guava class is exposed through Spark's API, that class will be 
 forked from 14.0.1 (current version used by Spark) and excluded from any 
 shading.
 The end result is that Spark's Guava won't be exposed to users anymore. This 
 has the side-effect of effectively downgrading to version 11 (the one used by 
 Hadoop) for those that do not explicitly depend on / package Guava with their 
 apps. 
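
 The class-name rewriting that relocation performs can be sketched in Python. This 
 models only the renaming, not the bytecode rewriting that maven-shade-plugin 
 actually does; the target prefix org.example.shaded.guava is a hypothetical 
 placeholder, and com.google.common.base.Optional is assumed to be the one Guava 
 class that stays unshaded.

```python
GUAVA_PACKAGE = "com.google.common"
SHADED_PACKAGE = "org.example.shaded.guava"  # hypothetical relocation target


def relocate(class_name, excludes=frozenset()):
    """Rewrite a Guava class name into the shaded package unless it is excluded."""
    if class_name in excludes:
        return class_name
    # Match the package prefix exactly, so "com.google.commonx" is left alone.
    if class_name == GUAVA_PACKAGE or class_name.startswith(GUAVA_PACKAGE + "."):
        return SHADED_PACKAGE + class_name[len(GUAVA_PACKAGE):]
    return class_name
```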




[jira] [Updated] (SPARK-2699) Improve compatibility with parquet file/table

2014-08-05 Thread Michael Armbrust (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2699:


Target Version/s: 1.2.0  (was: 1.1.0)

 Improve compatibility with parquet file/table
 -

 Key: SPARK-2699
 URL: https://issues.apache.org/jira/browse/SPARK-2699
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Teng Qiu

 After SPARK-2446, compatibility with Parquet files created by older Spark 
 releases (Spark 1.0.x) and by Impala (all versions so far, up to 1.4.x-cdh5) 
 is broken.
 Strings in those Parquet files are either not annotated as UTF8 or use only 
 the ASCII character set (Impala doesn't support UTF8 yet).
 This ticket aims to add a configuration option or a version check to 
 support those Parquet files.
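
 A minimal sketch of the configuration-option approach, in Python: binary values 
 that lack a UTF8 annotation are decoded as strings only when an opt-in flag is 
 set. The flag name binary_as_string and the helper read_binary_value are 
 hypothetical. Because ASCII is a subset of UTF-8, decoding as UTF-8 also covers 
 the ASCII-only files mentioned above.

```python
def read_binary_value(raw, annotated_utf8, binary_as_string=False):
    """Convert a Parquet BINARY value to a Python value.

    Annotated values are always decoded; unannotated ones are decoded only
    when the (hypothetical) binary_as_string option is enabled.
    """
    if annotated_utf8 or binary_as_string:
        return raw.decode("utf-8")
    return raw
```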




[jira] [Updated] (SPARK-2380) Support displaying accumulator contents in the web UI

2014-08-05 Thread Patrick Wendell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2380:
---

Fix Version/s: 1.1.0

 Support displaying accumulator contents in the web UI
 -

 Key: SPARK-2380
 URL: https://issues.apache.org/jira/browse/SPARK-2380
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Web UI
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Critical
 Fix For: 1.1.0







[jira] [Resolved] (SPARK-2380) Support displaying accumulator contents in the web UI

2014-08-05 Thread Patrick Wendell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2380.


Resolution: Fixed

Resolved by: https://github.com/apache/spark/pull/1309

 Support displaying accumulator contents in the web UI
 -

 Key: SPARK-2380
 URL: https://issues.apache.org/jira/browse/SPARK-2380
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Web UI
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Critical






[jira] [Created] (SPARK-2868) Support named accumulators in Python

2014-08-05 Thread Patrick Wendell (JIRA)

Patrick Wendell created SPARK-2868:
--

 Summary: Support named accumulators in Python
 Key: SPARK-2868
 URL: https://issues.apache.org/jira/browse/SPARK-2868
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Reporter: Patrick Wendell


SPARK-2380 added this for Java/Scala. To allow this in Python we'll need to 
make some additional changes. One potential path is to have a 1:1 
correspondence with Scala accumulators (instead of a one-to-many). A challenge 
is exposing the stringified values of the accumulators to the Scala code.




[jira] [Assigned] (SPARK-2583) ConnectionManager cannot distinguish whether error occurred or not

2014-08-05 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reassigned SPARK-2583:
-

Assignee: Josh Rosen  (was: Kousuke Saruta)

 ConnectionManager cannot distinguish whether error occurred or not
 --

 Key: SPARK-2583
 URL: https://issues.apache.org/jira/browse/SPARK-2583
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Kousuke Saruta
Assignee: Josh Rosen
Priority: Critical

 ConnectionManager#handleMessage sends an empty ack message to the peer 
 regardless of whether an error occurred in onReceiveCallback.
 {code}
   val ackMessage = if (onReceiveCallback != null) {
     logDebug("Calling back")
     onReceiveCallback(bufferMessage, connectionManagerId)
   } else {
     logDebug("Not calling back as callback is null")
     None
   }

   if (ackMessage.isDefined) {
     if (!ackMessage.get.isInstanceOf[BufferMessage]) {
       logDebug("Response to " + bufferMessage + " is not a buffer message, it is of type "
         + ackMessage.get.getClass)
     } else if (!ackMessage.get.asInstanceOf[BufferMessage].hasAckId) {
       logDebug("Response to " + bufferMessage + " does not have ack id set")
       ackMessage.get.asInstanceOf[BufferMessage].ackId = bufferMessage.id
     }
   }

   // We have no way to tell peer whether error occurred or not
   sendMessage(connectionManagerId, ackMessage.getOrElse {
     Message.createBufferMessage(bufferMessage.id)
   })
 }
 {code}
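
 One way to let the peer tell the two cases apart is to carry an explicit error 
 flag on the ack. The Python sketch below models only that idea; Ack, has_error 
 and handle_message are hypothetical names, and catching the callback's exception 
 is an assumption about where the error would be detected.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ack:
    ack_id: int              # id of the message being acknowledged
    payload: bytes = b""     # response body; empty when there is nothing to return
    has_error: bool = False  # hypothetical flag the original protocol lacks


def handle_message(msg_id: int, payload: bytes,
                   on_receive: Optional[Callable[[bytes], Optional[bytes]]]) -> Ack:
    """Build the ack for one incoming message, recording callback failures."""
    if on_receive is None:
        # No callback registered: a plain empty ack, as in the original code.
        return Ack(ack_id=msg_id)
    try:
        response = on_receive(payload)
    except Exception:
        # Unlike the original code, the peer can now see that processing failed.
        return Ack(ack_id=msg_id, has_error=True)
    return Ack(ack_id=msg_id, payload=response or b"")
```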




[jira] [Commented] (SPARK-1856) Standardize MLlib interfaces

2014-08-05 Thread Xiangrui Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086697#comment-14086697
 ] 

Xiangrui Meng commented on SPARK-1856:
--

Yes, MLI and MLbase are research projects at AMPLab. They are exploring the 
frontier of practical machine learning. Stable ideas/features from MLI and 
MLbase will be migrated into MLlib, and this is part of the effort.

 Standardize MLlib interfaces
 

 Key: SPARK-1856
 URL: https://issues.apache.org/jira/browse/SPARK-1856
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Blocker

 Instead of expanding MLlib based on the current class naming scheme 
 (ProblemWithAlgorithm),  we should standardize MLlib's interfaces that 
 clearly separate datasets, formulations, algorithms, parameter sets, and 
 models.




[jira] [Resolved] (SPARK-1680) Clean up use of setExecutorEnvs in SparkConf

2014-08-05 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-1680.
--

Resolution: Fixed

 Clean up use of setExecutorEnvs in SparkConf 
 -

 Key: SPARK-1680
 URL: https://issues.apache.org/jira/browse/SPARK-1680
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Thomas Graves
Priority: Blocker
 Fix For: 1.1.0


 We should make this consistent between YARN and Standalone. Basically, YARN 
 mode should just use the executorEnvs from the Spark conf and not need 
 SPARK_YARN_USER_ENV.
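
 Executor environment variables set through SparkConf.setExecutorEnv are stored 
 as ordinary conf entries under the spark.executorEnv. prefix, so any deploy mode 
 can recover them from the conf alone. A Python sketch of that extraction, using 
 a plain dict in place of SparkConf (an assumption for illustration); 
 executor_envs is a hypothetical helper name.

```python
EXECUTOR_ENV_PREFIX = "spark.executorEnv."


def executor_envs(conf):
    """Return {VAR: value} for every spark.executorEnv.VAR entry in `conf`."""
    return {
        key[len(EXECUTOR_ENV_PREFIX):]: value
        for key, value in conf.items()
        if key.startswith(EXECUTOR_ENV_PREFIX)
    }
```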




[jira] [Updated] (SPARK-2863) Emulate Hive type coercion in native reimplementations of Hive functions

2014-08-05 Thread Michael Armbrust (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2863:


Assignee: William Benton

 Emulate Hive type coercion in native reimplementations of Hive functions
 

 Key: SPARK-2863
 URL: https://issues.apache.org/jira/browse/SPARK-2863
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: William Benton
Assignee: William Benton

 Native reimplementations of Hive functions no longer have the same 
 type-coercion behavior as they would if executed via Hive.  As [Michael 
 Armbrust points 
 out|https://github.com/apache/spark/pull/1750#discussion_r15790970], queries 
 like {{SELECT SQRT(2) FROM src LIMIT 1}} succeed in Hive but fail if 
 {{SQRT}} is implemented natively.
 Spark SQL should have Hive-compatible type coercions for arguments to 
 natively-implemented functions.
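
 The kind of coercion meant here can be sketched in Python: before a natively 
 implemented SQRT runs, an integral argument is implicitly widened to DOUBLE, as 
 Hive would do, instead of being rejected. The type names and the widening set 
 below are a simplified assumption, not Spark SQL's or Hive's actual rule tables.

```python
import math

# Simplified, assumed set of types that may be implicitly widened to DOUBLE.
WIDENS_TO_DOUBLE = {"TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE"}


def coerce_to_double(value, sql_type):
    """Implicitly cast a numeric argument to DOUBLE, or fail as an analyzer would."""
    if sql_type not in WIDENS_TO_DOUBLE:
        raise TypeError(f"cannot implicitly cast {sql_type} to DOUBLE")
    return float(value)


def native_sqrt(value, sql_type):
    """SQRT that declares a DOUBLE parameter and coerces its argument first."""
    return math.sqrt(coerce_to_double(value, sql_type))
```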




[jira] [Updated] (SPARK-2844) Existing JVM Hive Context not correctly used in Python Hive Context

2014-08-05 Thread Michael Armbrust (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2844:


Priority: Major  (was: Minor)
Target Version/s: 1.1.0

 Existing JVM Hive Context not correctly used in Python Hive Context
 ---

 Key: SPARK-2844
 URL: https://issues.apache.org/jira/browse/SPARK-2844
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Reporter: Ahir Reddy
Assignee: Ahir Reddy

 Unlike the SQLContext, passing an existing JVM HiveContext object into the 
 Python HiveContext constructor does not actually reuse that object; instead, 
 a new HiveContext is created.
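
 The expected behavior can be sketched in Python: the constructor keeps a JVM 
 context handed to it and only creates a new one when none was supplied. 
 HiveContextLike, jvm_context and _create_jvm_context are hypothetical names; this 
 is not PySpark's actual constructor.

```python
class HiveContextLike:
    """Hypothetical Python wrapper around a JVM-side HiveContext."""

    def __init__(self, spark_context, jvm_context=None):
        self._sc = spark_context
        # Keep the caller's JVM context if one was given; the reported bug
        # is that a new one is created regardless.
        self._jvm_context = jvm_context

    @property
    def jvm_context(self):
        # Create lazily, and only when nothing was passed in.
        if self._jvm_context is None:
            self._jvm_context = self._create_jvm_context()
        return self._jvm_context

    def _create_jvm_context(self):
        # Placeholder for the Py4J call that would build a new JVM HiveContext.
        return object()
```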




[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2014-08-05 Thread Zhan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086808#comment-14086808
 ] 

Zhan Zhang commented on SPARK-1537:
---

Do you mind sharing your thoughts, design document or prototype code?

Thanks.

 Add integration with Yarn's Application Timeline Server
 ---

 Key: SPARK-1537
 URL: https://issues.apache.org/jira/browse/SPARK-1537
 Project: Spark
  Issue Type: New Feature
  Components: YARN
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

 It would be nice to have Spark integrate with Yarn's Application Timeline 
 Server (see YARN-321, YARN-1530). This would allow users running Spark on 
 Yarn to have a single place to go for all their history needs, and avoid 
 having to manage a separate service (Spark's built-in server).
 At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
 although there is still some ongoing work. But the basics are there, and I 
 wouldn't expect them to change (much) at this point.




[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2014-08-05 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086817#comment-14086817
 ] 

Marcelo Vanzin commented on SPARK-1537:
---

Currently busy with other more urgent tasks, but I'll push to my repo and post 
a link when I get some time.

 Add integration with Yarn's Application Timeline Server
 ---

 Key: SPARK-1537
 URL: https://issues.apache.org/jira/browse/SPARK-1537
 Project: Spark
  Issue Type: New Feature
  Components: YARN
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

 It would be nice to have Spark integrate with Yarn's Application Timeline 
 Server (see YARN-321, YARN-1530). This would allow users running Spark on 
 Yarn to have a single place to go for all their history needs, and avoid 
 having to manage a separate service (Spark's built-in server).
 At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
 although there is still some ongoing work. But the basics are there, and I 
 wouldn't expect them to change (much) at this point.




[jira] [Commented] (SPARK-2854) Finalize _acceptable_types in pyspark.sql

2014-08-05 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086877#comment-14086877
 ] 

Apache Spark commented on SPARK-2854:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/1793

 Finalize _acceptable_types in pyspark.sql
 -

 Key: SPARK-2854
 URL: https://issues.apache.org/jira/browse/SPARK-2854
 Project: Spark
  Issue Type: Task
  Components: SQL
Reporter: Yin Huai
Priority: Blocker

 In PySpark, _acceptable_types defines accepted Python data types for every 
 Spark SQL data type. The list is shown below. 
 {code}
 _acceptable_types = {
 BooleanType: (bool,),
 ByteType: (int, long),
 ShortType: (int, long),
 IntegerType: (int, long),
 LongType: (int, long),
 FloatType: (float,),
 DoubleType: (float,),
 DecimalType: (decimal.Decimal,),
 StringType: (str, unicode),
 TimestampType: (datetime.datetime, datetime.time, datetime.date),
 ArrayType: (list, tuple, array),
 MapType: (dict,),
 StructType: (tuple, list),
 }
 {code}
 Let's double check this mapping before 1.1 release.
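
 When double-checking the mapping, note that these tuples are presumably consumed 
 through isinstance checks (an assumption about how PySpark uses them), which 
 accept subclasses. The Python 3 sketch below uses an abridged copy with type names 
 as plain strings (also an assumption, since long and unicode no longer exist in 
 Python 3); it shows that True passes as an IntegerType and that a datetime.date 
 is accepted for TimestampType.

```python
import datetime
import decimal

# Abridged Python 3 adaptation of the mapping above; keys are plain strings here.
ACCEPTABLE_TYPES = {
    "BooleanType": (bool,),
    "IntegerType": (int,),
    "LongType": (int,),
    "DoubleType": (float,),
    "DecimalType": (decimal.Decimal,),
    "StringType": (str,),
    "TimestampType": (datetime.datetime, datetime.time, datetime.date),
    "ArrayType": (list, tuple),
    "MapType": (dict,),
}


def is_acceptable(value, sql_type):
    """True if `value` is an accepted Python object for `sql_type` (None is allowed)."""
    if value is None:
        return True  # nullability is assumed to be checked elsewhere
    return isinstance(value, ACCEPTABLE_TYPES[sql_type])
```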




[jira] [Updated] (SPARK-2650) Wrong initial sizes for in-memory column buffers

2014-08-05 Thread Cheng Lian (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-2650:
--

Target Version/s: 1.2.0  (was: 1.1.0)

 Wrong initial sizes for in-memory column buffers
 

 Key: SPARK-2650
 URL: https://issues.apache.org/jira/browse/SPARK-2650
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0, 1.0.1
Reporter: Michael Armbrust
Assignee: Cheng Lian
Priority: Critical

 The logic for setting up the initial column buffers differs between Spark 
 SQL and Shark, and I'm seeing OOMs when caching tables that are larger 
 than available memory (where Shark was okay).
 Two suspicious things: the initialSize is always set to 0, so we always go 
 with the default. The default looks like it was copied from code like 
 10 * 1024 * 1024, but in Spark SQL it's 10 * 102 * 1024.
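
 The size of the discrepancy, worked out in Python from the two constants quoted 
 above: the Spark SQL default is about 1 MiB rather than 10 MiB, roughly ten 
 times smaller.

```python
intended_bytes = 10 * 1024 * 1024  # the constant the default appears to be copied from
actual_bytes = 10 * 102 * 1024     # the constant reported in Spark SQL

print(intended_bytes)                           # 10485760 (10 MiB)
print(actual_bytes)                             # 1044480 (about 1 MiB)
print(round(intended_bytes / actual_bytes, 2))  # 10.04
```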




[jira] [Commented] (SPARK-2866) ORDER BY attributes must appear in SELECT clause

2014-08-05 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087077#comment-14087077
 ] 

Apache Spark commented on SPARK-2866:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1795

 ORDER BY attributes must appear in SELECT clause
 

 Key: SPARK-2866
 URL: https://issues.apache.org/jira/browse/SPARK-2866
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Critical






[jira] [Commented] (SPARK-2636) no where to get job identifier while submit spark job through spark API

2014-08-05 Thread Chengxiang Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087102#comment-14087102
 ] 

Chengxiang Li commented on SPARK-2636:
--

{quote}
There are two ways I think. One is for DAGScheduler.runJob to return an integer 
(or long) id for the job. An alternative, which I think is better and relates 
to SPARK-2321, is for runJob to return some Job object that has information 
about the id and can be queried about progress.
{quote}
DAGScheduler is an internal Spark class; users can hardly use it directly. I like 
your second idea: return a job info object when a Spark job is submitted at the 
SparkContext (JavaSparkContext in this case) or RDD level. AsyncRDDActions 
already does part of this work, so I think it may be a good place to 
fix this issue.

 no where to get job identifier while submit spark job through spark API
 ---

 Key: SPARK-2636
 URL: https://issues.apache.org/jira/browse/SPARK-2636
 Project: Spark
  Issue Type: New Feature
Reporter: Chengxiang Li

 In Hive on Spark, we want to track Spark job status through the Spark API. The 
 basic idea is as follows:
 # Create a Hive-specific Spark listener and register it with the Spark listener 
 bus.
 # The Hive-specific listener derives job status from Spark listener events.
 # The Hive driver tracks job status through the Hive-specific listener.
 The current problem is that the Hive driver needs a job identifier to track a 
 specific job's status through the listener, but there is no Spark API to get a 
 job identifier (such as the job ID) when submitting a Spark job.
 I think any other project that tries to track job status with the Spark API would 
 suffer from this as well.
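The three steps above, combined with the proposal that submission return a job object carrying its id, can be sketched with a toy in-process listener bus. Every class here (ListenerBus, StatusListener, JobHandle, Scheduler) is hypothetical and models neither Spark's real SparkListener API nor AsyncRDDActions. The point is only that once submit() hands back the id, the driver can look that job up in the status its listener has built from bus events.

```python
import itertools
from typing import Callable, Dict, List


class ListenerBus:
    """Hypothetical stand-in for a listener bus: fans events out to listeners."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str, int], None]] = []

    def add_listener(self, listener: Callable[[str, int], None]) -> None:
        self._listeners.append(listener)

    def post(self, event: str, job_id: int) -> None:
        for listener in self._listeners:
            listener(event, job_id)


class StatusListener:
    """Hive-side listener (step 2): derives per-job status from bus events."""

    _STATES = {"jobStart": "RUNNING", "jobEnd": "SUCCEEDED"}

    def __init__(self) -> None:
        self.status: Dict[int, str] = {}

    def __call__(self, event: str, job_id: int) -> None:
        self.status[job_id] = self._STATES[event]


class JobHandle:
    """Hypothetical object returned at submission: exposes the id and progress."""

    def __init__(self, job_id: int, tracker: StatusListener) -> None:
        self.job_id = job_id
        self._tracker = tracker

    def status(self) -> str:
        return self._tracker.status.get(self.job_id, "UNKNOWN")


class Scheduler:
    """Hypothetical scheduler whose submit() returns a handle instead of nothing."""

    def __init__(self, bus: ListenerBus, tracker: StatusListener) -> None:
        self._bus = bus
        self._tracker = tracker
        self._next_id = itertools.count()

    def submit(self) -> JobHandle:
        job_id = next(self._next_id)
        self._bus.post("jobStart", job_id)
        return JobHandle(job_id, self._tracker)

    def finish(self, job_id: int) -> None:
        self._bus.post("jobEnd", job_id)
```

Usage: register a StatusListener on the bus (step 1), call submit(), and poll handle.status() (step 3) without needing to guess which job id belongs to the submission.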




[jira] [Created] (SPARK-2872) Fix conflict between code and doc in YarnClientSchedulerBackend

2014-08-05 Thread Zhihui (JIRA)

Zhihui created SPARK-2872:
-

 Summary: Fix conflict between code and doc in YarnClientSchedulerBackend
 Key: SPARK-2872
 URL: https://issues.apache.org/jira/browse/SPARK-2872
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Zhihui


The doc says system properties override environment variables:
https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala#L71

But the code contradicts it.
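The documented precedence (system property first, then environment variable, then a default) can be sketched as a small lookup. The function name and the example setting names are hypothetical and do not reproduce YarnClientSchedulerBackend's actual option table; the sketch only shows the order the doc describes, which the report says the code does not follow.

```python
from typing import Mapping, Optional


def resolve_setting(
    prop_name: str,
    env_name: str,
    system_props: Mapping[str, str],
    env: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Documented order: a system property overrides an environment variable."""
    if prop_name in system_props:
        return system_props[prop_name]
    if env_name in env:
        return env[env_name]
    return default


if __name__ == "__main__":
    # Example names only; assumed for illustration.
    props = {"spark.executor.memory": "1g"}
    env = {"SPARK_EXECUTOR_MEMORY": "2g"}
    print(resolve_setting("spark.executor.memory", "SPARK_EXECUTOR_MEMORY", props, env))
```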




[jira] [Commented] (SPARK-2872) Fix conflict between code and doc in YarnClientSchedulerBackend

2014-08-05 Thread Zhihui (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087120#comment-14087120
 ] 

Zhihui commented on SPARK-2872:
---

PR https://github.com/apache/spark/pull/1684

 Fix conflict between code and doc in YarnClientSchedulerBackend
 ---

 Key: SPARK-2872
 URL: https://issues.apache.org/jira/browse/SPARK-2872
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Zhihui

 The doc says system properties override environment variables:
 https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala#L71
 But the code contradicts it.




[jira] [Commented] (SPARK-2848) Shade Guava in Spark deliverables

2014-08-05 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087179#comment-14087179
 ] 

Marcelo Vanzin commented on SPARK-2848:
---

Never mind the question; I got code mostly working to do this on the sbt side.

 Shade Guava in Spark deliverables
 -

 Key: SPARK-2848
 URL: https://issues.apache.org/jira/browse/SPARK-2848
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

 As discussed in SPARK-2420, this task covers the work of shading Guava in 
 Spark deliverables so that they don't conflict with the Hadoop classpath (or 
 with users' classpaths).
 Since one Guava class is exposed through Spark's API, that class will be 
 forked from 14.0.1 (the version currently used by Spark) and excluded from any 
 shading.
 The end result is that Spark's Guava won't be exposed to users anymore. This 
 has the side effect of effectively downgrading to version 11 (the one used by 
 Hadoop) for users who do not explicitly depend on or package Guava with their 
 apps.
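Shading, as described above, amounts to relocating Guava's package prefix to a private one while leaving the one API-exposed class untouched. The following is a minimal sketch of that renaming rule on class names only (real shading tools also rewrite bytecode references). The shaded prefix is hypothetical, and the excluded class shown is an assumption for illustration, since the issue does not name it.

```python
from typing import Iterable

GUAVA_PREFIX = "com.google.common."
SHADED_PREFIX = "org.example.shaded.guava."           # hypothetical relocation target
EXCLUDED = ("com.google.common.base.Optional",)       # assumed example of the exposed class


def relocate(
    class_name: str,
    prefix: str = GUAVA_PREFIX,
    shaded_prefix: str = SHADED_PREFIX,
    excludes: Iterable[str] = EXCLUDED,
) -> str:
    """Rename a class under `prefix` to `shaded_prefix`, unless it is excluded."""
    if class_name in excludes or not class_name.startswith(prefix):
        return class_name
    return shaded_prefix + class_name[len(prefix):]


if __name__ == "__main__":
    for name in ("com.google.common.collect.ImmutableList",
                 "com.google.common.base.Optional",
                 "java.util.List"):
        print(name, "->", relocate(name))
```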




[jira] [Created] (SPARK-2874) Spark SQL related scripts don't show complete usage message

2014-08-05 Thread Cheng Lian (JIRA)

Cheng Lian created SPARK-2874:
-

 Summary: Spark SQL related scripts don't show complete usage message
 Key: SPARK-2874
 URL: https://issues.apache.org/jira/browse/SPARK-2874
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.1, 1.0.2
Reporter: Cheng Lian
Priority: Minor


Due to [SPARK-2678|https://issues.apache.org/jira/browse/SPARK-2678], 
{{--help}} is shadowed by {{spark-submit}}, so {{bin/spark-sql}} and 
{{sbin/start-thriftserver2.sh}} can't show their application-specific usage 
messages.
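The shadowing can be illustrated with Python's argparse standing in for a launcher that wraps an application. This is not how spark-submit parses its arguments; the launcher and its single `--master` option are hypothetical. When the launcher handles `--help` itself, it prints its own usage and exits before the application ever sees the flag. When it does not, `--help` is forwarded along with the other unrecognized arguments.

```python
import argparse
from typing import List, Optional


def forwarded_app_args(argv: List[str], launcher_handles_help: bool) -> Optional[List[str]]:
    """Return the arguments a hypothetical launcher would pass on to the
    application, or None if the launcher consumed --help itself."""
    # Hypothetical launcher with one option; not spark-submit's real option set.
    parser = argparse.ArgumentParser(prog="launcher", add_help=launcher_handles_help)
    parser.add_argument("--master", default="local")
    try:
        _, rest = parser.parse_known_args(argv)
    except SystemExit:
        # argparse printed the launcher's usage and exited: --help was shadowed.
        return None
    return rest
```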



