[jira] [Updated] (SPARK-3469) All TaskCompletionListeners should be called even if some of them fail
[ https://issues.apache.org/jira/browse/SPARK-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-3469: --- Summary: All TaskCompletionListeners should be called even if some of them fail (was: Make sure TaskCompletionListeners are called in presence of failures) All TaskCompletionListeners should be called even if some of them fail -- Key: SPARK-3469 URL: https://issues.apache.org/jira/browse/SPARK-3469 Project: Spark Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Reynold Xin Assignee: Reynold Xin If there are multiple TaskCompletionListeners, and any one of them misbehaves (e.g. throws an exception), then we will skip executing the rest of them. As we are increasingly relying on TaskCompletionListener for cleaning up of resources, we should make sure they are always called, even if the previous ones fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
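The fix described above amounts to a run-all-then-rethrow loop over the listeners. A minimal Python sketch of the pattern (Spark's actual implementation is Scala inside TaskContext; the function and names here are illustrative only):

```python
# Sketch of the desired behavior, not Spark's actual (Scala) code:
# call every completion listener even when earlier ones raise, and
# surface the first failure only after all listeners have run.
def run_all_listeners(listeners):
    errors = []
    for listener in listeners:
        try:
            listener()
        except Exception as exc:  # a misbehaving listener must not block the rest
            errors.append(exc)
    if errors:
        raise errors[0]  # re-raise once every listener has had its turn

calls = []
listeners = [
    lambda: calls.append("a"),
    lambda: (_ for _ in ()).throw(RuntimeError("boom")),  # misbehaving listener
    lambda: calls.append("c"),
]
try:
    run_all_listeners(listeners)
except RuntimeError:
    pass
# calls == ["a", "c"]: the listener after the failing one still ran
```

Collecting the errors and rethrowing only after the loop is what guarantees resource-cleanup listeners always get their turn.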
[jira] [Updated] (SPARK-3469) All TaskCompletionListeners should be called even if some of them fail
[ https://issues.apache.org/jira/browse/SPARK-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-3469: --- Component/s: Spark Core All TaskCompletionListeners should be called even if some of them fail -- Key: SPARK-3469 URL: https://issues.apache.org/jira/browse/SPARK-3469 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.1.0 Reporter: Reynold Xin Assignee: Reynold Xin If there are multiple TaskCompletionListeners, and any one of them misbehaves (e.g. throws an exception), then we will skip executing the rest of them. As we are increasingly relying on TaskCompletionListener for cleaning up of resources, we should make sure they are always called, even if the previous ones fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3282) It should support multiple receivers at one socketInputDStream
[ https://issues.apache.org/jira/browse/SPARK-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong resolved SPARK-3282. - Resolution: Won't Fix It should support multiple receivers at one socketInputDStream --- Key: SPARK-3282 URL: https://issues.apache.org/jira/browse/SPARK-3282 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.0.2 Reporter: shenhong At present, a socketInputDStream supports at most one receiver, which becomes a bottleneck when a large input stream arrives. It should support multiple receivers at one socketInputDStream -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable
Shay Rojansky created SPARK-3470: Summary: Have JavaSparkContext implement Closeable/AutoCloseable Key: SPARK-3470 URL: https://issues.apache.org/jira/browse/SPARK-3470 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.2 Reporter: Shay Rojansky Priority: Minor After discussion in SPARK-2972, it seems like a good idea to allow Java developers to use Java 7 automatic resource management with JavaSparkContext, like so: {code:java} try (JavaSparkContext ctx = new JavaSparkContext(...)) { return br.readLine(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3395) [SQL] DSL uses incorrect attribute ids after a distinct()
[ https://issues.apache.org/jira/browse/SPARK-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3395. - Resolution: Fixed Fix Version/s: 1.2.0 [SQL] DSL uses incorrect attribute ids after a distinct() - Key: SPARK-3395 URL: https://issues.apache.org/jira/browse/SPARK-3395 Project: Spark Issue Type: Bug Components: SQL Reporter: Eric Liang Assignee: Eric Liang Priority: Minor Fix For: 1.2.0 In the following example, val rdd = ... // two columns: {key, value} val derivedRDD = rdd.distinct().limit(1) sql("explain select * from rdd inner join derivedRDD on rdd.key = derivedRDD.key") The inner join executes incorrectly since the two keys end up with the same attribute id after analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3469) All TaskCompletionListeners should be called even if some of them fail
[ https://issues.apache.org/jira/browse/SPARK-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128149#comment-14128149 ] Apache Spark commented on SPARK-3469: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/2343 All TaskCompletionListeners should be called even if some of them fail -- Key: SPARK-3469 URL: https://issues.apache.org/jira/browse/SPARK-3469 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.1.0 Reporter: Reynold Xin Assignee: Reynold Xin If there are multiple TaskCompletionListeners, and any one of them misbehaves (e.g. throws an exception), then we will skip executing the rest of them. As we are increasingly relying on TaskCompletionListener for cleaning up of resources, we should make sure they are always called, even if the previous ones fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3471) Automatic resource manager for SparkContext in Scala?
Shay Rojansky created SPARK-3471: Summary: Automatic resource manager for SparkContext in Scala? Key: SPARK-3471 URL: https://issues.apache.org/jira/browse/SPARK-3471 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.2 Reporter: Shay Rojansky Priority: Minor After discussion in SPARK-2972, it seems like a good idea to add automatic resource management semantics to SparkContext (i.e. the with statement in Python (SPARK-3458), Closeable/AutoCloseable in Java (SPARK-3470)). I have no knowledge of Scala whatsoever, but a quick search seems to indicate that there isn't a standard mechanism for this - someone with real Scala knowledge should take a look and make a decision... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-2972) APPLICATION_COMPLETE not created in Python unless context explicitly stopped
[ https://issues.apache.org/jira/browse/SPARK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shay Rojansky closed SPARK-2972. Resolution: Won't Fix APPLICATION_COMPLETE not created in Python unless context explicitly stopped Key: SPARK-2972 URL: https://issues.apache.org/jira/browse/SPARK-2972 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.0.2 Environment: Cloudera 5.1, yarn master on ubuntu precise Reporter: Shay Rojansky If you don't explicitly stop a SparkContext at the end of a Python application with sc.stop(), an APPLICATION_COMPLETE file isn't created and the job doesn't get picked up by the history server. This can be easily reproduced with pyspark (but affects scripts as well). The current workaround is to wrap the entire script with a try/finally and stop manually. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
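The try/finally workaround mentioned above can be packaged once as a context manager, which is the shape of the with-statement support proposed in SPARK-3458. A minimal sketch, using a stand-in class (FakeSparkContext is hypothetical; the real object would be pyspark.SparkContext):

```python
from contextlib import contextmanager

class FakeSparkContext:
    """Stand-in for pyspark.SparkContext, just to show the pattern."""
    def __init__(self):
        self.stopped = False
    def stop(self):
        self.stopped = True

@contextmanager
def spark_context():
    sc = FakeSparkContext()
    try:
        yield sc
    finally:
        sc.stop()  # runs on normal exit and on exceptions alike

with spark_context() as sc:
    pass  # do work; stop() is guaranteed afterwards
# sc.stopped is now True, even if the body had raised
```

Because the finally clause always runs, the APPLICATION_COMPLETE marker (which depends on stop() being called) would be written even when the script exits via an exception.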
[jira] [Created] (SPARK-3472) Option to take top n elements (unsorted)
Kanwaljit Singh created SPARK-3472: -- Summary: Option to take top n elements (unsorted) Key: SPARK-3472 URL: https://issues.apache.org/jira/browse/SPARK-3472 Project: Spark Issue Type: New Feature Reporter: Kanwaljit Singh Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3472) Option to take top n elements (unsorted)
[ https://issues.apache.org/jira/browse/SPARK-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kanwaljit Singh closed SPARK-3472. -- Resolution: Invalid Option to take top n elements (unsorted) Key: SPARK-3472 URL: https://issues.apache.org/jira/browse/SPARK-3472 Project: Spark Issue Type: New Feature Reporter: Kanwaljit Singh Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3364) Zip equal-length but unequally-partition
[ https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li resolved SPARK-3364. Resolution: Fixed Zip equal-length but unequally-partition Key: SPARK-3364 URL: https://issues.apache.org/jira/browse/SPARK-3364 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.2 Reporter: Kevin Jung Fix For: 1.1.0 ZippedRDD loses some elements after zipping RDDs with equal numbers of partitions but unequal numbers of elements in each of their partitions. This can happen when a user creates an RDD by sc.textFile(path, partitionNumbers) with a physically unbalanced HDFS file. {noformat}
var x = sc.parallelize(1 to 9, 3)
var y = sc.parallelize(Array(1,1,1,1,1,2,2,3,3), 3).keyBy(i => i)
var z = y.partitionBy(new RangePartitioner(3, y))

// expected
x.zip(y).count() // 9
x.zip(y).collect()
// Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(1,1)), (5,(1,1)), (6,(2,2)), (7,(2,2)), (8,(3,3)), (9,(3,3)))

// unexpected
x.zip(z).count() // 7
x.zip(z).collect()
// Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(2,2)), (5,(2,2)), (7,(3,3)), (8,(3,3)))
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
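The loss happens because zip pairs corresponding partitions element-wise and truncates to the shorter side. A plain-Python model of the report's numbers (lists standing in for partitions; this is an illustration, not ZippedRDD's actual code):

```python
# Plain-Python model of RDD.zip: lists stand in for partitions.
# zip pairs corresponding partitions element-wise, and Python's zip
# (like the flawed per-partition iterator) silently truncates to the
# shorter partition.
def zip_partitioned(x_parts, y_parts):
    return [list(zip(xp, yp)) for xp, yp in zip(x_parts, y_parts)]

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # 9 elements, partition sizes 3/3/3
y_ok = [[1, 1, 1], [1, 1, 2], [2, 3, 3]]   # same sizes: nothing lost
y_bad = [[1, 1, 1, 1, 1], [2, 2], [3, 3]]  # range-partitioned: sizes 5/2/2

ok = sum(zip_partitioned(x, y_ok), [])
bad = sum(zip_partitioned(x, y_bad), [])
# len(ok) == 9, but len(bad) == 7: two elements vanish, as in the report
```

Per partition the result has min(len(xp), len(yp)) pairs: 3+2+2 = 7, matching the unexpected count above.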
[jira] [Resolved] (SPARK-3345) Do correct parameters for ShuffleFileGroup
[ https://issues.apache.org/jira/browse/SPARK-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li resolved SPARK-3345. Resolution: Fixed Fix Version/s: (was: 1.1.1) Do correct parameters for ShuffleFileGroup -- Key: SPARK-3345 URL: https://issues.apache.org/jira/browse/SPARK-3345 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Liang-Chi Hsieh Assignee: Liang-Chi Hsieh Priority: Minor Fix For: 1.2.0 In the method newFileGroup of class FileShuffleBlockManager, the parameters for creating a new ShuffleFileGroup object are in the wrong order. Wrong: new ShuffleFileGroup(fileId, shuffleId, files) Correct: new ShuffleFileGroup(shuffleId, fileId, files) Because in the current code the parameters shuffleId and fileId are not used, this doesn't cause a problem now. However, it should be corrected for readability and to avoid future problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3326) can't access a static variable after init in mapper
[ https://issues.apache.org/jira/browse/SPARK-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li resolved SPARK-3326. Resolution: Not a Problem can't access a static variable after init in mapper --- Key: SPARK-3326 URL: https://issues.apache.org/jira/browse/SPARK-3326 Project: Spark Issue Type: Bug Environment: CDH5.1.0 Spark1.0.0 Reporter: Gavin Zhang I wrote an object like: object Foo { private var bar: Bar = null; def init(bar: Bar) { this.bar = bar }; def getSome() = bar.someDef() } In the Spark main def, I read some text from HDFS and init this object, and after that call getSome(). I was successful with this code: sc.textFile(args(0)).take(10).map(println(Foo.getSome())) However, when I changed it to write output to HDFS, I found the bar variable in the Foo object was null: sc.textFile(args(0)).map(line => Foo.getSome()).saveAsTextFile(args(1)) WHY? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3473) Expose task status when converting TaskInfo into JSON representation
Kousuke Saruta created SPARK-3473: - Summary: Expose task status when converting TaskInfo into JSON representation Key: SPARK-3473 URL: https://issues.apache.org/jira/browse/SPARK-3473 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Kousuke Saruta When TaskInfo is converted into JSON by JsonProtocol, status is lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3473) Expose task status when converting TaskInfo into JSON representation
[ https://issues.apache.org/jira/browse/SPARK-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta closed SPARK-3473. - Resolution: Won't Fix The task status can be determined from the failed field and finishTime, so I'm closing this. Expose task status when converting TaskInfo into JSON representation Key: SPARK-3473 URL: https://issues.apache.org/jira/browse/SPARK-3473 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Kousuke Saruta When TaskInfo is converted into JSON by JsonProtocol, status is lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable
[ https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128232#comment-14128232 ] Sean Owen commented on SPARK-3470: -- If you implement {{AutoCloseable}}, then Spark will not work on Java 6, since this class does not exist before Java 7. Implementing {{Closeable}} is fine of course. I assume it would just call {{stop()}} Have JavaSparkContext implement Closeable/AutoCloseable --- Key: SPARK-3470 URL: https://issues.apache.org/jira/browse/SPARK-3470 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.2 Reporter: Shay Rojansky Priority: Minor After discussion in SPARK-2972, it seems like a good idea to allow Java developers to use Java 7 automatic resource management with JavaSparkContext, like so: {code:java} try (JavaSparkContext ctx = new JavaSparkContext(...)) { return br.readLine(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable
[ https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128285#comment-14128285 ] Shay Rojansky commented on SPARK-3470: -- Good point about AutoCloseable. Yes, the idea is for Closeable to call stop(). I'd submit a PR myself but I don't know any Scala whatsoever... Have JavaSparkContext implement Closeable/AutoCloseable --- Key: SPARK-3470 URL: https://issues.apache.org/jira/browse/SPARK-3470 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.2 Reporter: Shay Rojansky Priority: Minor After discussion in SPARK-2972, it seems like a good idea to allow Java developers to use Java 7 automatic resource management with JavaSparkContext, like so: {code:java} try (JavaSparkContext ctx = new JavaSparkContext(...)) { return br.readLine(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3474) Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST
Chunjun Xiao created SPARK-3474: --- Summary: Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST Key: SPARK-3474 URL: https://issues.apache.org/jira/browse/SPARK-3474 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.0.1 Reporter: Chunjun Xiao There's some inconsistency regarding the env variable used to specify the spark master host server. In the spark source code (MasterArguments.scala), the env variable is SPARK_MASTER_HOST, while in the shell scripts (e.g., spark-env.sh, start-master.sh) it's named SPARK_MASTER_IP. This will introduce an issue in some cases, e.g., if the spark master is started via service spark-master start, which is built based on the latest bigtop (refer to bigtop/spark-master.svc). In this case, SPARK_MASTER_IP will have no effect. I suggest we change SPARK_MASTER_IP in the shell scripts to SPARK_MASTER_HOST. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3474) Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST
[ https://issues.apache.org/jira/browse/SPARK-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128300#comment-14128300 ] Sean Owen commented on SPARK-3474: -- (You can deprecate but still support old variable names, right? so SPARK_MASTER_IP has the effect of setting new SPARK_MASTER_HOST but generates a warning. You wouldn't want to or need to remove old vars immediately.) Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST Key: SPARK-3474 URL: https://issues.apache.org/jira/browse/SPARK-3474 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.0.1 Reporter: Chunjun Xiao There's some inconsistency regarding the env variable used to specify the spark master host server. In spark source code (MasterArguments.scala), the env variable is SPARK_MASTER_HOST, while in the shell script (e.g., spark-env.sh, start-master.sh), it's named SPARK_MASTER_IP. This will introduce an issue in some case, e.g., if spark master is started via service spark-master start, which is built based on latest bigtop (refer to bigtop/spark-master.svc). In this case, SPARK_MASTER_IP will have no effect. I suggest we change SPARK_MASTER_IP in the shell script to SPARK_MASTER_HOST. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
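The deprecate-but-still-support approach described in the comment can be sketched in a few lines. This is illustrative Python, not Spark's actual (shell-script) code; the function name resolve_master_host and the localhost default are assumptions:

```python
import os
import warnings

# Sketch of the suggested migration: prefer the new SPARK_MASTER_HOST,
# honor the legacy SPARK_MASTER_IP with a deprecation warning, and only
# then fall back to a default.
def resolve_master_host(env=os.environ):
    if "SPARK_MASTER_HOST" in env:
        return env["SPARK_MASTER_HOST"]
    if "SPARK_MASTER_IP" in env:
        warnings.warn("SPARK_MASTER_IP is deprecated; use SPARK_MASTER_HOST")
        return env["SPARK_MASTER_IP"]
    return "localhost"  # assumed default for illustration
```

With this ordering, existing deployments that only set SPARK_MASTER_IP keep working (with a warning), and setting both lets the new variable win.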
[jira] [Commented] (SPARK-3407) Add Date type support
[ https://issues.apache.org/jira/browse/SPARK-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128421#comment-14128421 ] Apache Spark commented on SPARK-3407: - User 'adrian-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/2344 Add Date type support - Key: SPARK-3407 URL: https://issues.apache.org/jira/browse/SPARK-3407 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3462) parquet pushdown for unionAll
[ https://issues.apache.org/jira/browse/SPARK-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128149#comment-14128149 ] Cody Koeninger commented on SPARK-3462: --- Created a PR for feedback. https://github.com/apache/spark/pull/2345 Seems to do the right thing locally; will see about testing on a cluster. parquet pushdown for unionAll - Key: SPARK-3462 URL: https://issues.apache.org/jira/browse/SPARK-3462 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Cody Koeninger http://apache-spark-developers-list.1001551.n3.nabble.com/parquet-predicate-projection-pushdown-into-unionAll-td8339.html {noformat}
// single table, pushdown
scala> p.where('age > 40).select('name)
res36: org.apache.spark.sql.SchemaRDD = SchemaRDD[97] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [name#3]
 ParquetTableScan [name#3,age#4], (ParquetRelation /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [(age#4 > 40)]

// union of 2 tables, no pushdown
scala> b.where('age > 40).select('name)
res37: org.apache.spark.sql.SchemaRDD = SchemaRDD[99] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [name#3]
 Filter (age#4 > 40)
  Union [ParquetTableScan [name#3,age#4,phones#5], (ParquetRelation /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), []
  ,ParquetTableScan [name#0,age#1,phones#2], (ParquetRelation /var/tmp/people2, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [] ]
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
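The rewrite the PR aims at amounts to pushing a Filter that sits on top of a Union down into each child, so every ParquetTableScan can apply the predicate itself. A toy logical-plan rewrite in Python (class and function names are illustrative, not Catalyst's actual API):

```python
# Toy logical-plan rewrite, not Spark's actual Catalyst code: push a
# Filter sitting on top of a Union down into each child of the union.
class Scan:
    def __init__(self, name):
        self.name = name

class Union:
    def __init__(self, children):
        self.children = children

class Filter:
    def __init__(self, pred, child):
        self.pred, self.child = pred, child

def push_filter_through_union(plan):
    if isinstance(plan, Filter) and isinstance(plan.child, Union):
        # duplicate the predicate onto every child of the union
        return Union([Filter(plan.pred, c) for c in plan.child.children])
    return plan  # leave other plan shapes untouched

plan = Filter("age > 40", Union([Scan("people"), Scan("people2")]))
rewritten = push_filter_through_union(plan)
# rewritten: a Union of two Filters, one per scan, each carrying "age > 40"
```

After the rewrite, each scan node carries the predicate, which is the precondition for pushing it further into the Parquet reader.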
[jira] [Commented] (SPARK-3462) parquet pushdown for unionAll
[ https://issues.apache.org/jira/browse/SPARK-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128478#comment-14128478 ] Apache Spark commented on SPARK-3462: - User 'koeninger' has created a pull request for this issue: https://github.com/apache/spark/pull/2345 parquet pushdown for unionAll - Key: SPARK-3462 URL: https://issues.apache.org/jira/browse/SPARK-3462 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Cody Koeninger http://apache-spark-developers-list.1001551.n3.nabble.com/parquet-predicate-projection-pushdown-into-unionAll-td8339.html {noformat}
// single table, pushdown
scala> p.where('age > 40).select('name)
res36: org.apache.spark.sql.SchemaRDD = SchemaRDD[97] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [name#3]
 ParquetTableScan [name#3,age#4], (ParquetRelation /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [(age#4 > 40)]

// union of 2 tables, no pushdown
scala> b.where('age > 40).select('name)
res37: org.apache.spark.sql.SchemaRDD = SchemaRDD[99] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [name#3]
 Filter (age#4 > 40)
  Union [ParquetTableScan [name#3,age#4,phones#5], (ParquetRelation /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), []
  ,ParquetTableScan [name#0,age#1,phones#2], (ParquetRelation /var/tmp/people2, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [] ]
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable
[ https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128552#comment-14128552 ] Apache Spark commented on SPARK-3470: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/2346 Have JavaSparkContext implement Closeable/AutoCloseable --- Key: SPARK-3470 URL: https://issues.apache.org/jira/browse/SPARK-3470 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.2 Reporter: Shay Rojansky Priority: Minor After discussion in SPARK-2972, it seems like a good idea to allow Java developers to use Java 7 automatic resource management with JavaSparkContext, like so: {code:java} try (JavaSparkContext ctx = new JavaSparkContext(...)) { return br.readLine(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3462) parquet pushdown for unionAll
[ https://issues.apache.org/jira/browse/SPARK-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128682#comment-14128682 ] Cody Koeninger commented on SPARK-3462: --- Tested this on a cluster against unions of 2 and 3 parquet tables, around 2 billion records. Seems like a big performance win - previously, simple queries (e.g. count, approx distinct count of a single column) against a union of 2 tables were taking 5 to 10x as long as a single table. Now it's closer to linear, e.g. 35 secs for one table, 74 for a union of 2, etc. parquet pushdown for unionAll - Key: SPARK-3462 URL: https://issues.apache.org/jira/browse/SPARK-3462 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Cody Koeninger http://apache-spark-developers-list.1001551.n3.nabble.com/parquet-predicate-projection-pushdown-into-unionAll-td8339.html {noformat}
// single table, pushdown
scala> p.where('age > 40).select('name)
res36: org.apache.spark.sql.SchemaRDD = SchemaRDD[97] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [name#3]
 ParquetTableScan [name#3,age#4], (ParquetRelation /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [(age#4 > 40)]

// union of 2 tables, no pushdown
scala> b.where('age > 40).select('name)
res37: org.apache.spark.sql.SchemaRDD = SchemaRDD[99] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [name#3]
 Filter (age#4 > 40)
  Union [ParquetTableScan [name#3,age#4,phones#5], (ParquetRelation /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), []
  ,ParquetTableScan [name#0,age#1,phones#2], (ParquetRelation /var/tmp/people2, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [] ]
{noformat} -- This
message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3286) Cannot view ApplicationMaster UI when Yarn’s url scheme is https
[ https://issues.apache.org/jira/browse/SPARK-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-3286. -- Resolution: Fixed Fix Version/s: 1.2.0 Cannot view ApplicationMaster UI when Yarn’s url scheme is https Key: SPARK-3286 URL: https://issues.apache.org/jira/browse/SPARK-3286 Project: Spark Issue Type: Bug Components: Web UI, YARN Affects Versions: 1.0.2 Reporter: Benoy Antony Fix For: 1.2.0 Attachments: SPARK-3286-branch-1-0.patch, SPARK-3286.patch The spark Application Master starts its web UI at http://host-name:port. When Spark ApplicationMaster registers its URL with Resource Manager , the URL does not contain URI scheme. If the URL scheme is absent, Resource Manager’s web app proxy will use the HTTP Policy of the Resource Manager.(YARN-1553) If the HTTP Policy of the Resource Manager is https, then web app proxy will try to access https://host-name:port. This will result in error. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3475) dev/merge_spark_pr.py fails on mac
Thomas Graves created SPARK-3475: Summary: dev/merge_spark_pr.py fails on mac Key: SPARK-3475 URL: https://issues.apache.org/jira/browse/SPARK-3475 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.2.0 Reporter: Thomas Graves Commit https://github.com/apache/spark/commit/4f4a9884d9268ba9808744b3d612ac23c75f105a#diff-c321b6c82ebb21d8fd225abea9b7b74c added print statements to the run command. When I try to run on mac it errors out when it hits these print statements. Perhaps there is a workaround, or an issue with my environment. {noformat}
Automatic merge went well; stopped before committing as requested
git log HEAD..PR_TOOL_MERGE_PR_2276 --pretty=format:%an %ae
git log HEAD..PR_TOOL_MERGE_PR_2276 --pretty=format:%h [%an] %s
Traceback (most recent call last):
  File "./dev/merge_spark_pr.py", line 332, in <module>
    merge_hash = merge_pr(pr_num, target_ref)
  File "./dev/merge_spark_pr.py", line 156, in merge_pr
    run_cmd(['git', 'commit', '--author=%s' % primary_author] + merge_message_flags)
  File "./dev/merge_spark_pr.py", line 77, in run_cmd
    print " ".join(cmd)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable
[ https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128751#comment-14128751 ] Matthew Farrellee commented on SPARK-3470: -- while you can implement Closeable in java 7+ and use try (Closeable c = new ...) { ... } (at least w/ openjdk 1.8), since spark targets java 7+, why not just use AutoCloseable? Have JavaSparkContext implement Closeable/AutoCloseable --- Key: SPARK-3470 URL: https://issues.apache.org/jira/browse/SPARK-3470 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.2 Reporter: Shay Rojansky Priority: Minor After discussion in SPARK-2972, it seems like a good idea to allow Java developers to use Java 7 automatic resource management with JavaSparkContext, like so: {code:java} try (JavaSparkContext ctx = new JavaSparkContext(...)) { return br.readLine(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable
[ https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128764#comment-14128764 ] Sean Owen commented on SPARK-3470: -- Spark retains compatibility with Java 6 on purpose AFAIK. But implementing Closeable is fine and also works with try-with-resources in Java 7, yes. Have JavaSparkContext implement Closeable/AutoCloseable --- Key: SPARK-3470 URL: https://issues.apache.org/jira/browse/SPARK-3470 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.2 Reporter: Shay Rojansky Priority: Minor After discussion in SPARK-2972, it seems like a good idea to allow Java developers to use Java 7 automatic resource management with JavaSparkContext, like so: {code:java} try (JavaSparkContext ctx = new JavaSparkContext(...)) { return br.readLine(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1484) MLlib should warn if you are using an iterative algorithm on non-cached data
[ https://issues.apache.org/jira/browse/SPARK-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128800#comment-14128800 ] Apache Spark commented on SPARK-1484: - User 'staple' has created a pull request for this issue: https://github.com/apache/spark/pull/2347 MLlib should warn if you are using an iterative algorithm on non-cached data Key: SPARK-1484 URL: https://issues.apache.org/jira/browse/SPARK-1484 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Matei Zaharia Not sure what the best way to warn is, but even printing to the log is probably fine. We may want to print at the end of the training run as well as the beginning to make it more visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
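One minimal shape for the warning Matei describes is to check the training input's storage level before (and again after) the run and log if it was never cached. The sketch below is pure Python: getStorageLevel() mirrors the real RDD API, but FakeRDD, warn_if_uncached, and the string-valued levels are assumptions made so the example is self-contained:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("mllib")

# Hypothetical stand-in for an RDD; in Spark the check would compare
# rdd.getStorageLevel() against StorageLevel.NONE.
class FakeRDD:
    def __init__(self, storage_level="NONE"):
        self._level = storage_level

    def getStorageLevel(self):
        return self._level

def warn_if_uncached(rdd, algorithm="GradientDescent"):
    """Log (and return True) when iterative training would recompute
    an uncached input on every iteration."""
    if rdd.getStorageLevel() == "NONE":
        log.warning("%s: the input RDD is not cached; each iteration will "
                    "recompute it from scratch. Call rdd.cache() first.",
                    algorithm)
        return True
    return False

assert warn_if_uncached(FakeRDD("NONE")) is True
assert warn_if_uncached(FakeRDD("MEMORY_ONLY")) is False
```

Calling this once before the first iteration and once after the last, as the description suggests, makes the warning hard to miss in a long training log.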
[jira] [Commented] (SPARK-3478) Profile Python tasks stage by stage in worker
[ https://issues.apache.org/jira/browse/SPARK-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129503#comment-14129503 ] Apache Spark commented on SPARK-3478: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/2351 Profile Python tasks stage by stage in worker - Key: SPARK-3478 URL: https://issues.apache.org/jira/browse/SPARK-3478 Project: Spark Issue Type: New Feature Components: PySpark Reporter: Davies Liu Assignee: Davies Liu Python code in the driver is easy for users to profile, but the code that runs in the workers is distributed across the cluster and is not easy to profile. So we need a way to do the profiling in the workers and aggregate all the results together for users. This can also be used to analyze bottlenecks in PySpark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
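The aggregation idea can be sketched with only the standard library's cProfile/pstats: each simulated task profiles itself, and the driver merges the per-task Stats into one report. The run_task and busy helpers are invented for illustration; real PySpark would have to ship the profile data back from the worker processes before merging:

```python
import cProfile
import io
import pstats

def run_task(task_fn):
    """Profile one simulated worker task and return its pstats.Stats."""
    prof = cProfile.Profile()
    prof.enable()
    task_fn()
    prof.disable()
    return pstats.Stats(prof)

def busy(n=1000):
    """A stand-in workload for a task body."""
    return sum(i * i for i in range(n))

# Each "task" profiles itself; the driver then merges the per-task stats
# into one aggregate view, roughly what per-stage aggregation would do.
per_task = [run_task(busy) for _ in range(3)]

out = io.StringIO()
merged = pstats.Stats(stream=out)
for s in per_task:
    merged.add(s)

merged.sort_stats("cumulative").print_stats(5)
report = out.getvalue()
assert "function calls" in report
```

Keying such merged Stats objects by stage id would give the per-stage breakdown the issue asks for.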
[jira] [Updated] (SPARK-3480) Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks'
[ https://issues.apache.org/jira/browse/SPARK-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-3480: --- Description: Symptom: Run ./dev/run-tests, which dumps output as follows: SBT_MAVEN_PROFILES_ARGS=-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl [Warn] Java 8 tests will not run because JDK version is 1.8. = Running Apache RAT checks = RAT checks passed. = Running Scala style checks = Scalastyle checks failed at following occurrences: [error] Expected ID character [error] Not a valid command: yarn-alpha [error] Expected project ID [error] Expected configuration [error] Expected ':' (if selecting a configuration) [error] Expected key [error] Not a valid key: yarn-alpha [error] yarn-alpha/scalastyle [error] ^ Possible cause: I checked dev/scalastyle and found that it invokes two tasks, 'yarn-alpha/scalastyle' and 'yarn/scalastyle', like echo -e q\n | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 yarn-alpha/scalastyle \ scalastyle.txt echo -e q\n | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 yarn/scalastyle \ scalastyle.txt From the above error message, sbt seems to reject these because of the '/' separator. The checks run through after I manually changed them to 'yarn-alpha:scalastyle' and 'yarn:scalastyle'. was: Symptom: Run ./dev/run-tests, which dumps output as follows: SBT_MAVEN_PROFILES_ARGS=-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl [Warn] Java 8 tests will not run because JDK version is 1.8. = Running Apache RAT checks = RAT checks passed. 
= Running Scala style checks = Scalastyle checks failed at following occurrences: [error] Expected ID character [error] Not a valid command: yarn-alpha [error] Expected project ID [error] Expected configuration [error] Expected ':' (if selecting a configuration) [error] Expected key [error] Not a valid key: yarn-alpha [error] yarn-alpha/scalastyle [error] ^ Possible cause: I checked dev/scalastyle and found that it invokes two tasks, 'yarn-alpha/scalastyle' and 'yarn/scalastyle', like echo -e q\n | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 yarn-alpha/scalastyle \ scalastyle.txt # Check style with YARN built too echo -e q\n | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 yarn/scalastyle \ scalastyle.txt From the above error message, sbt seems to reject these because of the '/' separator. The checks run through after I manually changed them to 'yarn-alpha:scalastyle' and 'yarn:scalastyle'. Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks' --- Key: SPARK-3480 URL: https://issues.apache.org/jira/browse/SPARK-3480 Project: Spark Issue Type: Bug Components: Build Reporter: Yi Zhou Priority: Minor Symptom: Run ./dev/run-tests, which dumps output as follows: SBT_MAVEN_PROFILES_ARGS=-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl [Warn] Java 8 tests will not run because JDK version is 1.8. = Running Apache RAT checks = RAT checks passed. 
= Running Scala style checks = Scalastyle checks failed at following occurrences: [error] Expected ID character [error] Not a valid command: yarn-alpha [error] Expected project ID [error] Expected configuration [error] Expected ':' (if selecting a configuration) [error] Expected key [error] Not a valid key: yarn-alpha [error] yarn-alpha/scalastyle [error] ^ Possible cause: I checked dev/scalastyle and found that it invokes two tasks, 'yarn-alpha/scalastyle' and 'yarn/scalastyle', like echo -e q\n | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 yarn-alpha/scalastyle \ scalastyle.txt echo -e q\n | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 yarn/scalastyle \ scalastyle.txt From the above error
[jira] [Resolved] (SPARK-3447) Kryo NPE when serializing JListWrapper
[ https://issues.apache.org/jira/browse/SPARK-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3447. - Resolution: Fixed Fix Version/s: 1.2.0 Kryo NPE when serializing JListWrapper -- Key: SPARK-3447 URL: https://issues.apache.org/jira/browse/SPARK-3447 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust Fix For: 1.2.0 Repro (provided by [~davies]): {code} from pyspark.sql import SQLContext; SQLContext(sc).inferSchema(sc.parallelize([{"a": [3]}]))._jschema_rdd.collect() {code} {code} 14/09/05 21:59:47 ERROR TaskResultGetter: Exception while getting task result com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException Serialization trace: underlying (scala.collection.convert.Wrappers$JListWrapper) values (org.apache.spark.sql.catalyst.expressions.GenericRow) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293) at 
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162) at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514) at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1276) at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:701) Caused by: java.lang.NullPointerException at scala.collection.convert.Wrappers$MutableBufferWrapper.add(Wrappers.scala:80) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605) ... 23 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org