[jira] [Commented] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2 -> 2.11.3 or higher
[ https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680885#comment-14680885 ]

Patrick Wendell commented on SPARK-7726:

[~srowen] [~dragos] This is cropping up again when trying to create a release candidate for Spark 1.5: https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Release-All-Java7/26/console

Maven Install Breaks When Upgrading Scala 2.11.2 -> 2.11.3 or higher

Key: SPARK-7726
URL: https://issues.apache.org/jira/browse/SPARK-7726
Project: Spark
Issue Type: Bug
Components: Build
Reporter: Patrick Wendell
Assignee: Iulian Dragos
Priority: Blocker
Fix For: 1.4.0

This one took a long time to track down. The Maven install phase is part of our release process. It runs the scala:doc target to generate doc jars. Between Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in a way that breaks our build. In both cases it reported an error (a long-standing error we've always ignored); however, in 2.11.3 that error became fatal and failed the entire build process. The upgrade occurred in SPARK-7092. Here is a simple reproduction:

{code}
./dev/change-version-to-2.11.sh
mvn clean install -pl network/common -pl network/shuffle -DskipTests -Dscala-2.11
{code}

This command exits successfully when Spark is at Scala 2.11.2 and fails with 2.11.3 or higher. In either case an error is printed:

{code}
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 ---
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type
  protected Type type() { return Type.UPLOAD_BLOCK; }
            ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type
  protected Type type() { return Type.STREAM_HANDLE; }
            ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type
  protected Type type() { return Type.REGISTER_EXECUTOR; }
            ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type
  protected Type type() { return Type.OPEN_BLOCKS; }
            ^
model contains 22 documentable templates
four errors found
{code}

Ideally we'd just dig in and fix this error. Unfortunately it's a very confusing error and I have no idea why it is appearing. I'd propose reverting SPARK-7092 in the meantime.
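A revert along the lines proposed above would presumably just pin the 2.11 build back to Scala 2.11.2 in the parent pom (a minimal sketch, assuming the version is controlled by the scala.version / scala.binary.version properties that the scala-2.11 profile sets):

{code}
<!-- sketch: pin the scala-2.11 profile back to 2.11.2 -->
<properties>
  <scala.version>2.11.2</scala.version>
  <scala.binary.version>2.11</scala.binary.version>
</properties>
{code}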
[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660796#comment-14660796 ]

Patrick Wendell commented on SPARK-1517:

Hey Ryan,

IIRC, the Apache snapshot repository won't let us publish binaries that do not have SNAPSHOT in the version number. The reason is that it expects to see timestamped snapshots so its garbage-collection mechanism can work. We could look at adding sha1 hashes before SNAPSHOT, but I think there is some chance this would break their cleanup.

In terms of posting more binaries: I can look at whether Databricks or Berkeley might be able to donate S3 resources for this, but it would have to be clearly maintained by those organizations and not branded as official Apache releases or anything like that.

Publish nightly snapshots of documentation, maven artifacts, and binary builds

Key: SPARK-1517
URL: https://issues.apache.org/jira/browse/SPARK-1517
Project: Spark
Issue Type: Improvement
Components: Build, Project Infra
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Critical

Should be pretty easy to do with Jenkins. The only thing I can think of that would be tricky is setting up credentials so that Jenkins can publish this stuff somewhere on Apache infra. Ideally we don't want to have to put a private key on every Jenkins box (since they are otherwise pretty stateless). One idea is to encrypt these credentials with a passphrase and post them somewhere publicly visible. Then the Jenkins build can download the credentials, provided we set a passphrase in an environment variable in Jenkins. There may be simpler solutions as well.
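The encrypt-and-post idea in the description could look roughly like this (a sketch only, not what the build actually does; the file names and the PASSPHRASE variable are hypothetical):

{code}
# One time, on a trusted machine: bundle and encrypt the publishing credentials
tar cf credentials.tar publish_key publish_key.pub
openssl enc -aes-256-cbc -salt -in credentials.tar -out credentials.tar.enc
# ...then post credentials.tar.enc somewhere publicly visible.

# In the Jenkins job, with PASSPHRASE set as a build environment variable:
openssl enc -d -aes-256-cbc -pass env:PASSPHRASE \
  -in credentials.tar.enc -out credentials.tar
tar xf credentials.tar
{code}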
[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660420#comment-14660420 ]

Patrick Wendell commented on SPARK-1517:

Hey Ryan,

For the maven snapshot releases: unfortunately we are constrained by Maven's own SNAPSHOT version format, which doesn't allow encoding anything other than the timestamp. It's just not supported in their SNAPSHOT mechanism. However, one thing we could look at is whether we can align the timestamp with the time of the actual Spark commit, rather than the time of publication of the SNAPSHOT release. I'm not sure if Maven lets you provide a custom timestamp when publishing. If we had that feature, users could look at the Spark commit log and do some manual association.

For the binaries, the reason the same commit appears multiple times is that we do the build every four hours and always publish the latest one, even if it's a duplicate. However, this could be modified pretty easily to avoid double-publishing the same commit if there hasn't been any code change. Maybe create a JIRA for this?

In terms of how many older versions are available: the scripts we use for this have a tunable retention window. Right now I'm only keeping the last 4 builds; we could probably extend it to something like 10 builds. However, at some point I'm likely to run out of space in my ASF user account. Since the binaries are quite large, I don't think it's feasible, at least using ASF infrastructure, to keep all past builds. We have 3000 commits in a typical Spark release, and it's a few gigs for each binary build.

Publish nightly snapshots of documentation, maven artifacts, and binary builds

Key: SPARK-1517
URL: https://issues.apache.org/jira/browse/SPARK-1517
Project: Spark
Issue Type: Improvement
Components: Build, Project Infra
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Critical

Should be pretty easy to do with Jenkins. The only thing I can think of that would be tricky is setting up credentials so that Jenkins can publish this stuff somewhere on Apache infra. Ideally we don't want to have to put a private key on every Jenkins box (since they are otherwise pretty stateless). One idea is to encrypt these credentials with a passphrase and post them somewhere publicly visible. Then the Jenkins build can download the credentials, provided we set a passphrase in an environment variable in Jenkins. There may be simpler solutions as well.
Avoiding unnecessary build changes until tests are in better shape
Hey All,

Was wondering if people would be willing to avoid merging build changes until we have put the tests in better shape. The reason is that build changes are the most likely to cause downstream issues with the test matrix, and it's very difficult to reverse-engineer which patches caused which problems when the tests are not in a stable state. For instance, the update to Hive 1.2.1 caused cascading failures that have lasted several days now, and in the meantime a few other build-related patches were also merged. As these pile up, it gets harder for us to have confidence that those other patches didn't introduce problems.

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/

- Patrick
Re: How to help for 1.5 release?
Hey Meihua,

If you are a user of Spark, one thing that is really helpful is to run Spark 1.5 on your workload and report any issues, performance regressions, etc.

- Patrick

On Mon, Aug 3, 2015 at 11:49 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
> I think you can start from here: https://issues.apache.org/jira/browse/SPARK/fixforversion/12332078/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
> Thanks
> Best Regards
>
> On Tue, Aug 4, 2015 at 12:02 PM, Meihua Wu rotationsymmetr...@gmail.com wrote:
>> I think the team is preparing for the 1.5 release. Anything to help with the QA, testing etc?
>> Thanks,
>> MW
Re: PSA: Maven 3.3.3 now required to build
Yeah, the best bet is to use ./build/mvn --force (otherwise we'll still use your system Maven).

- Patrick

On Mon, Aug 3, 2015 at 1:26 PM, Sean Owen so...@cloudera.com wrote:
> That statement is true for Spark 1.4.x. But you've reminded me that I failed to update this doc for 1.5 to say Maven 3.3.3 is required. Patch coming up.
>
> On Mon, Aug 3, 2015 at 9:12 PM, Guru Medasani gdm...@gmail.com wrote:
>> Thanks Sean. The reason I asked is that in the Building Spark documentation for 1.4.1, I still see this: https://spark.apache.org/docs/latest/building-spark.html
>> "Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+."
>> But I noticed the following warnings from the build of Spark version 1.5.0-SNAPSHOT, so I was wondering if the changes you mentioned relate to newer versions of Spark or to 1.4.1 as well.
>> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.
>> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed with message: Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.
>> Guru Medasani gdm...@gmail.com
>>
>> On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:
>>> Using ./build/mvn should always be fine. Your local mvn is fine too if it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users on OS X out there will have, by the way.
>>>
>>> On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:
>>>> Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT this morning.
>>>> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.
>>>> Should we be using Maven 3.3.3 locally, or build/mvn, starting from Spark 1.4.1 or Spark 1.5?
>>>> Guru Medasani gdm...@gmail.com
>>>>
>>>> On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:
>>>>> If you use build/mvn or are already using Maven 3.3.3 locally (i.e. via brew on OS X), then this won't affect you, but I wanted to call attention to https://github.com/apache/spark/pull/7852 which makes Maven 3.3.3 the minimum required to build Spark. This heads off problems from some behavior differences that Patrick and I observed between 3.3 and 3.2 last week, on top of the "dependency reduced POM" glitch from the 1.4.1 release window.
>>>>> Again, all you need to do is use build/mvn if you don't already have the latest Maven installed and all will be well.
>>>>> Sean
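For reference, invoking the bundled Maven that Patrick mentions at the top of this thread looks like this (a sketch; --force is the flag described above, and the goals shown are just a typical build):

{code}
# Use Spark's bundled Maven (3.3.3+) even when an older system mvn is installed
./build/mvn --force -DskipTests clean package
{code}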
[jira] [Created] (SPARK-9547) Allow testing pull requests with different Hadoop versions
Patrick Wendell created SPARK-9547:

Summary: Allow testing pull requests with different Hadoop versions
Key: SPARK-9547
URL: https://issues.apache.org/jira/browse/SPARK-9547
Project: Spark
Issue Type: Improvement
Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell

Similar to SPARK-9545, we should allow testing different Hadoop profiles in the PRB.
[jira] [Updated] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it
[ https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-9545:
Issue Type: Improvement  (was: Bug)

Run Maven tests in pull request builder if title has [maven-test] in it

Key: SPARK-9545
URL: https://issues.apache.org/jira/browse/SPARK-9545
Project: Spark
Issue Type: Improvement
Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell

We have infrastructure now in the build tooling for running Maven tests, but it's not actually used anywhere. With a very minor change we can support running Maven tests if the pull request title has [maven-test] in it.
[jira] [Created] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it
Patrick Wendell created SPARK-9545:

Summary: Run Maven tests in pull request builder if title has [maven-test] in it
Key: SPARK-9545
URL: https://issues.apache.org/jira/browse/SPARK-9545
Project: Spark
Issue Type: Bug
Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell

We have infrastructure now in the build tooling for running Maven tests, but it's not actually used anywhere. With a very minor change we can support running Maven tests if the pull request title has [maven-test] in it.
Re: [ANNOUNCE] Nightly maven and package builds for Spark
Hey All,

I got it up and running. It was a newly surfaced bug in the build scripts.

- Patrick

On Wed, Jul 29, 2015 at 6:05 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
> Hey Patrick,
> Any update on this front please?
> Thanks,
> Bharath
>
> On Fri, Jul 24, 2015 at 8:38 PM, Patrick Wendell pwend...@gmail.com wrote:
>> Hey Bharath,
>> There was actually an incompatible change to the build process that broke several of the Jenkins builds. This should be patched up in the next day or two and nightly builds will resume.
>> - Patrick
>>
>> On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
>>> I noticed the last (1.5) build has a timestamp of 16th July. Have nightly builds been discontinued since then?
>>> Thanks,
>>> Bharath
>>>
>>> On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell pwend...@gmail.com wrote:
>>>> Hi All,
>>>> This week I got around to setting up nightly builds for Spark on Jenkins. I'd like feedback on these, and if it's going well I can merge the relevant automation scripts into Spark mainline and document it on the website. Right now I'm doing:
>>>> 1. SNAPSHOTs of Spark master and release branches published to the ASF Maven snapshot repo: https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
>>>> These are usable by adding this repository in your build and using a snapshot version (e.g. 1.3.2-SNAPSHOT).
>>>> 2. Nightly binary package builds and doc builds of master and release versions: http://people.apache.org/~pwendell/spark-nightly/
>>>> These build 4 times per day and are tagged based on commits.
>>>> If anyone has feedback on these please let me know. Thanks!
>>>> - Patrick
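Consuming the SNAPSHOT artifacts described in the announcement amounts to adding the repository and a snapshot dependency to your pom (a sketch; the artifact and version shown are illustrative):

{code}
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.2-SNAPSHOT</version>
  </dependency>
</dependencies>
{code}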
Re: Should spark-ec2 get its own repo?
Hey All,

I've mostly kept quiet since I am not very active in maintaining this code anymore. However, it is a bit odd that the project is split-brained, with a lot of the code being on GitHub and some in the Spark repo. If the consensus is to migrate everything to GitHub, that seems okay with me. I would vouch for user continuity, for instance still having a shim ec2/spark-ec2 script that could perhaps just download and unpack the real script from GitHub (see the sketch after this thread).

- Patrick

On Fri, Jul 31, 2015 at 2:13 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:
> Yes, it is still in progress, but I have just not gotten time to get to this. I think getting the repo moved from mesos to amplab in the codebase by 1.5 should be possible.
> Thanks
> Shivaram
>
> On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:
>> PS: is this still in progress? It feels like something that would be good to do before 1.5.0, if it's going to happen soon.
>>
>> On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:
>>> Yeah, I'll send a note to the mesos dev list just to make sure they are informed.
>>> Shivaram
>>>
>>> On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:
>>>> I agree it's worth informing Mesos devs and checking that there are no big objections. I presume Shivaram is plugged in enough to Mesos that there won't be any surprises there, and that the project would also agree with moving this Spark-specific bit out. They may also want to leave a pointer to the new location in the mesos repo, of course. I don't think it is something that requires a formal vote. It's not a question of ownership -- neither Apache nor the project PMC owns the code. I don't think it's different from retiring or removing any other code.
>>>>
>>>> On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan mri...@gmail.com wrote:
>>>>> If I am not wrong, since the code was hosted within the mesos project repo, I assume (at least part of it) is owned by the mesos project and so its PMC?
>>>>> - Mridul
>>>>>
>>>>> On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:
>>>>>> There is technically no PMC for the spark-ec2 project (I guess we are kind of establishing one right now). I haven't heard anything from the Spark PMC on the dev list that might suggest a need for a vote so far. I will send another round of email notification to the dev list when we have a JIRA / PR that actually moves the scripts (right now the only thing that changed is the location of some scripts from mesos/ to amplab/).
>>>>>> Thanks
>>>>>> Shivaram
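Such a continuity shim could be very small (purely a sketch; the GitHub archive URL and branch are placeholders, not the real location):

{code}
#!/usr/bin/env bash
# ec2/spark-ec2: backwards-compatibility shim.
# Downloads the relocated spark-ec2 scripts and delegates to them.
set -e
curl -sL https://github.com/amplab/spark-ec2/archive/master.tar.gz | tar xz
exec spark-ec2-master/spark-ec2 "$@"
{code}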
Re: Data source aliasing
Yeah, this could make sense: allowing data sources to register a short name. What mechanism did you have in mind? Using the JAR service loader? The only issue is that there could be conflicts, since many of these are third-party packages. If the same name were registered twice, I'm not sure what the best behavior would be. Ideally, in my mind, if the same short name were registered twice we'd force the user to use a fully qualified name and say the short name is ambiguous.

Patrick

On Jul 30, 2015 9:44 AM, Joseph Batchik josephbatc...@gmail.com wrote:
> Hi all,
> There are now starting to be a lot of data source packages for Spark. An annoyance I see is that I have to type in the full class name, like: sqlContext.read.format("com.databricks.spark.avro").load(path). Spark internally has formats such as "parquet" and "jdbc" registered, and it would be nice to be able to just type in "avro", "redshift", etc. as well. Would it be a good idea to use something like a service loader to allow data sources defined in other packages to register themselves with Spark? I think that this would make it easier for end users. I would be interested in adding this; please let me know what you guys think.
> - Joe
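A service-loader approach might look roughly like this (a sketch only; the DataSourceRegister trait and lookup helper here are illustrative, not an existing Spark API):

{code}
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Hypothetical SPI: a package implements this trait and lists the
// implementation class in META-INF/services so ServiceLoader can find it.
trait DataSourceRegister {
  def shortName(): String
}

def lookupDataSource(name: String): DataSourceRegister = {
  val matches = ServiceLoader.load(classOf[DataSourceRegister])
    .asScala.filter(_.shortName().equalsIgnoreCase(name)).toList
  matches match {
    case provider :: Nil => provider
    case Nil => sys.error(s"No data source registered for short name '$name'")
    // Two packages claimed the same short name: refuse to guess and make
    // the user spell out the fully qualified class name instead.
    case _ => sys.error(s"Short name '$name' is ambiguous; use the fully qualified class name")
  }
}
{code}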
[jira] [Resolved] (SPARK-9423) Why does every other Spark committer keep suggesting to use the spark-submit script
[ https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-9423.
Resolution: Invalid

Why does every other Spark committer keep suggesting to use the spark-submit script

Key: SPARK-9423
URL: https://issues.apache.org/jira/browse/SPARK-9423
Project: Spark
Issue Type: Question
Components: Deploy
Affects Versions: 1.3.1
Reporter: nirav patel

I see that on the Spark forum and Stack Overflow people keep suggesting the spark-submit.sh script as a way (the only way) to launch Spark jobs. Are we still living in the application-server monolithic world where I need to run startup.sh? What if the Spark application is a long-running context that serves multiple requests? What if the user just doesn't want to use a script? They want to embed Spark as a service in their application. Please STOP suggesting that users use the spark-submit script as the only alternative.
[jira] [Commented] (SPARK-9423) Why does every other Spark committer keep suggesting to use the spark-submit script
[ https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645495#comment-14645495 ]

Patrick Wendell commented on SPARK-9423:

This is not a valid issue for JIRA (we use JIRA for project bugs and feature tracking). Please send an email to the spark-users list. Thanks.

Why does every other Spark committer keep suggesting to use the spark-submit script

Key: SPARK-9423
URL: https://issues.apache.org/jira/browse/SPARK-9423
Project: Spark
Issue Type: Question
Components: Deploy
Affects Versions: 1.3.1
Reporter: nirav patel

I see that on the Spark forum and Stack Overflow people keep suggesting the spark-submit.sh script as a way (the only way) to launch Spark jobs. Are we still living in the application-server monolithic world where I need to run startup.sh? What if the Spark application is a long-running context that serves multiple requests? What if the user just doesn't want to use a script? They want to embed Spark as a service in their application. Please STOP suggesting that users use the spark-submit script as the only alternative.
Re: ReceiverTrackerSuite failing in master build
Thanks, Ted, for pointing this out. CC'ing Ryan and TD.

On Tue, Jul 28, 2015 at 8:25 AM, Ted Yu yuzhih...@gmail.com wrote:
> Hi,
> I noticed that ReceiverTrackerSuite is failing in the master Jenkins build for both Hadoop profiles. The failure seems to start with: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3104/
> FYI
Protocol for build breaks
Hi All,

If there is a build break (i.e. a compile issue or consistently failing test) that somehow makes it into master, the best protocol is:

1. Revert the offending patch.
2. File a JIRA and assign it to the committer of the offending patch. The JIRA should contain links to the broken builds.

It's not worth spending any time trying to figure out how to fix it, or blocking on tracking down the commit author. This is because every hour that the PRB is broken is a major cost in terms of developer productivity.

- Patrick
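Mechanically, step 1 is just the standard git revert (a sketch; the commit hash and remote name are placeholders and may differ in your setup):

{code}
# Revert the offending commit on master and push the revert commit
git revert --no-edit <offending-sha>
git push apache master
{code}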
[jira] [Created] (SPARK-9304) Improve backwards compatibility of SPARK-8401
Patrick Wendell created SPARK-9304:

Summary: Improve backwards compatibility of SPARK-8401
Key: SPARK-9304
URL: https://issues.apache.org/jira/browse/SPARK-9304
Project: Spark
Issue Type: Improvement
Components: Build
Reporter: Patrick Wendell
Assignee: Michael Allman
Priority: Critical

In SPARK-8401 a backwards-incompatible change was made to the Scala 2.11 build process. It would be good to add scripts with the older names to avoid breaking compatibility for harnesses or other automated builds that build for Scala 2.11. They can just be one-line shell scripts with a comment explaining that they exist for backwards-compatibility purposes.

/cc [~srowen]
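One such shim might look like this (a sketch, assuming SPARK-8401 replaced dev/change-version-to-2.11.sh with a parameterized script; the delegate's name is illustrative):

{code}
#!/usr/bin/env bash
# dev/change-version-to-2.11.sh
# Kept for backwards compatibility; delegates to the renamed script.
exec "$(dirname "$0")"/change-scala-version.sh 2.11
{code}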
Re: [ANNOUNCE] Nightly maven and package builds for Spark
Hey Bharath,

There was actually an incompatible change to the build process that broke several of the Jenkins builds. This should be patched up in the next day or two and nightly builds will resume.

- Patrick

On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
> I noticed the last (1.5) build has a timestamp of 16th July. Have nightly builds been discontinued since then?
> Thanks,
> Bharath
>
> On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell pwend...@gmail.com wrote:
>> Hi All,
>> This week I got around to setting up nightly builds for Spark on Jenkins. I'd like feedback on these, and if it's going well I can merge the relevant automation scripts into Spark mainline and document it on the website. Right now I'm doing:
>> 1. SNAPSHOTs of Spark master and release branches published to the ASF Maven snapshot repo: https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
>> These are usable by adding this repository in your build and using a snapshot version (e.g. 1.3.2-SNAPSHOT).
>> 2. Nightly binary package builds and doc builds of master and release versions: http://people.apache.org/~pwendell/spark-nightly/
>> These build 4 times per day and are tagged based on commits.
>> If anyone has feedback on these please let me know. Thanks!
>> - Patrick
Policy around backporting bug fixes
Hi All,

A few times I've been asked about backporting and when to backport and not backport fix patches. Since I have managed this for many of the past releases, I wanted to explain how I have been thinking about it. If we have some consensus, I can put it on the wiki.

The trade-off when backporting is that you get to deliver the fix to people running older versions (great!), but you risk introducing new or even worse bugs in maintenance releases (bad!). The decision point is when you have a bug fix and it's not clear whether it is worth backporting. I think the following facets are important to consider:

(a) Backports are an extremely valuable service to the community and should be considered for any bug fix.
(b) Introducing a new bug in a maintenance release must be avoided at all costs. Over time it would erode confidence in our release process.
(c) Distributions or advanced users can always backport risky patches on their own, if they see fit.

For me, the consequence of these is that we should backport in the following situations:

- Both the bug and the fix are well understood and isolated. Code being modified is well tested.
- The bug being addressed is high priority to the community.
- The backported fix does not vary widely from the master branch fix.

We tend to avoid backports in the converse situations:

- The bug or fix is not well understood. For instance, it relates to interactions between complex components or third-party libraries (e.g. Hadoop libraries). The code is not well tested outside of the immediate bug being fixed.
- The bug is not clearly a high priority for the community.
- The backported fix is widely different from the master branch fix.

These are clearly subjective criteria, but ones worth considering. I am always happy to help advise people on specific patches if they want a sounding board to understand whether it makes sense to backport.

- Patrick
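For reference, the mechanics of a backport are a cherry-pick onto the maintenance branch (a sketch; the branch, remote, and hash are placeholders):

{code}
# Apply a fix from master to a maintenance branch; -x records the source commit
git checkout branch-1.4
git cherry-pick -x <fix-sha>
git push apache branch-1.4
{code}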
[jira] [Updated] (SPARK-8703) Add CountVectorizer as a ml transformer to convert document to words count vector
[ https://issues.apache.org/jira/browse/SPARK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8703:
Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-8521

Add CountVectorizer as a ml transformer to convert document to words count vector

Key: SPARK-8703
URL: https://issues.apache.org/jira/browse/SPARK-8703
Project: Spark
Issue Type: Sub-task
Components: ML
Reporter: yuhao yang
Assignee: yuhao yang
Fix For: 1.5.0
Original Estimate: 24h
Remaining Estimate: 24h

Converts a text document to a sparse vector of token counts, similar to http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html. I can further add an estimator to extract the vocabulary from a corpus, if that's appropriate.
[jira] [Updated] (SPARK-8564) Add the Python API for Kinesis
[ https://issues.apache.org/jira/browse/SPARK-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8564:
Target Version/s: 1.5.0

Add the Python API for Kinesis

Key: SPARK-8564
URL: https://issues.apache.org/jira/browse/SPARK-8564
Project: Spark
Issue Type: New Feature
Components: Streaming
Reporter: Shixiong Zhu
Re: KinesisStreamSuite failing in master branch
I think we should just revert this patch on all affected branches. No reason to leave the builds broken until a fix is in place.

- Patrick

On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen rosenvi...@gmail.com wrote:
> Yep, I emailed TD about it; I think that we may need to make a change to the pull request builder to fix this. Pending that, we could just revert the commit that added this.
>
> On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu yuzhih...@gmail.com wrote:
>> Hi,
>> I noticed that KinesisStreamSuite fails for both Hadoop profiles in master Jenkins builds. From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console :
>>
>> KinesisStreamSuite:
>> *** RUN ABORTED ***
>>   java.lang.AssertionError: assertion failed: Kinesis test not enabled, should not attempt to get AWS credentials
>>   at scala.Predef$.assert(Predef.scala:179)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:189)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient$lzycompute(KinesisTestUtils.scala:59)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient(KinesisTestUtils.scala:58)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:121)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:157)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:78)
>>   at org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:45)
>>   at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
>>   at org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:33)
>>
>> FYI
Re: Foundation policy on releases and Spark nightly builds
Sean B.,

Thank you for giving a thorough reply. I will work with Sean O. and see what we can change to make us more in line with the stated policy.

I did some research, and it appears that some time between October [1] and December [2] 2006, this page was modified to include stricter policy surrounding nightly builds. Actually, the original version of the policy page encouraged projects to post nightly builds for the benefit of all developers, just as we have been doing.

If you detect frustration from the Spark community, it's because this type of situation occurs with some regularity. In this case:

(a) A policy exists from ~10 years ago, presumably because some project back then had problematic release-management practices and a policy needed to be created to solve that problem.
(b) The policy is outdated now, and no one is 100% sure why it was created (likely many of the people who helped craft it are no longer involved in the ASF).
(c) The steps for how to change it are unclear, and there isn't clear ownership of the policy document.

I think it's unavoidable given the decentralized organization structure of the ASF, but I just want to be up front about our perspective and why you might sense some frustration.

[1] https://web.archive.org/web/20061020220358/http://www.apache.org/dev/release.html
[2] https://web.archive.org/web/20061231050046/http://www.apache.org/dev/release.html

- Patrick

On Tue, Jul 14, 2015 at 10:09 AM, Sean Busbey bus...@cloudera.com wrote:
> Responses inline, with some liberties on ordering.
>
> On Sun, Jul 12, 2015 at 10:32 PM, Patrick Wendell pwend...@gmail.com wrote:
>> Hey Sean B,
>> Would you mind outlining for me how we go about changing this policy? I think it's outdated and doesn't make much sense. Ideally I'd like to propose a vote to modify the text slightly such that our current behavior is seen as compliant. Specifically:
>> - Who has the authority to change this document?
>
> It's foundation-level policy, so I'd presume the board needs to. Since it's part of our legal position, it might be owned by the legal affairs committee[1]. That would mean they could update it without a board resolution. (legal-discuss@ could tell you for sure.)
>
>> - What concrete steps can I take to change the policy?
>
> The Legal Affairs Committee is reachable either through their mailing list[2] or their issue tracker[3]. Please be sure to read the entire original document; it explains the rationale that has gone into it. You'll need to address the matters raised there.
>
>> - You keep mentioning the incubator@ list; why is this the place for such policy to be discussed or decided on?
>
> It can't be decided on the general@incubator list, but there are already several relevant parties discussing the matter there. You certainly don't *need* to join that conversation, but the participants there have overlap with the folks who can ultimately decide the issue. Thus, it may help avoid having to repeat things.
>
>> - What is a reasonable time frame in which the policy change is likely to be decided?
>
> I am neither a participant on legal affairs nor the board, so I have no idea.
>
>> We've had a few times people from various parts of the ASF come and say we are in violation of a policy. And sometimes other ASF people come and then get in a fight on our mailing list, and there is back and forth, and it turns out there isn't so much a widely followed policy as a doc somewhere that is really old and not actually universally followed. It's difficult for us in such situations to know how to proceed and how much autonomy we as a PMC have to make decisions about our own project.
>
> Please keep in mind that you are also ASF people, as is the entire Spark community (users and all)[4]. Phrasing things in terms of us and them by drawing a distinction on "[they] get in a fight on our mailing list" is not helpful.
>
> Understanding and abiding by ASF legal obligations and policies is the job of each project PMC as a part of their formation by the board[5]. If anyone in your community has questions about what the project can or can not do, then it's the job of the PMC to find out proactively (rather than take an ask-for-forgiveness approach). Where the existing documentation is unclear or where you think it might be out of date, you can often get guidance from general@incubator (since it contains a large number of members and folks from across foundation projects) or comdev[6] (since their charter includes explaining ASF policy). If those resources prove insufficient, matters can be brought up with either legal-discuss@ or board@. If you find out-of-date documentation that is not ASF policy, you can have it removed by notifying the appropriate group (i.e. legal-discuss, comdev, or whomever is hosting it).
>
> [1]: http://apache.org/legal/
> [2]: http://www.apache.org/foundation/mailinglists.html#foundation-legal
> [3]: https://issues.apache.org/jira/browse/LEGAL
Re: Foundation policy on releases and Spark nightly builds
Hey Sean,

One other thing I'd be okay doing is moving the main text about nightly builds to the wiki and just having a header called "Nightly builds" at the end of the downloads page that says: "For developers, Spark maintains nightly builds. More information is available on the [Spark developer Wiki](link)." I think this would preserve discoverability while also placing the information on the wiki, which seems to be the main ask of the policy.

- Patrick

On Sun, Jul 19, 2015 at 2:32 AM, Sean Owen so...@cloudera.com wrote:
> I am going to make an edit to the download page on the web site to start, as that much seems uncontroversial. Proposed change:
>
> Reorder sections to put developer-oriented sections at the bottom, including the info on nightly builds:
> Download Spark
> Link with Spark
> All Releases
> Spark Source Code Management
> Nightly Builds
>
> Change text to emphasize the audience:
> "Packages are built regularly off of Spark's master branch and release branches. These provide *Spark developers* access to the bleeding-edge of Spark master or the most recent fixes not yet incorporated into a maintenance release. *They should not be used by anyone except Spark developers, and may be unstable or have serious bugs. End users should only use official releases above. Please subscribe to dev@spark.apache.org if you are a Spark developer to be aware of issues in nightly builds.* Spark nightly packages are available at:"
>
> On Thu, Jul 16, 2015 at 8:21 AM, Sean Owen so...@cloudera.com wrote:
>> To move this forward, I think one of two things needs to happen:
>> 1. Move this guidance to the wiki. It seems that people gathered here believe that resolves the issue. Done.
>> 2. Put disclaimers on the current downloads page. This may resolve the issue, but then we bring it up on the right mailing list for discussion. It may end up at #1, or may end in a tweak to the policy.
>> I can drive either one. Votes on how to proceed?
Re: [discuss] Removing individual commit messages from the squash commit message
+1 from me too

On Sat, Jul 18, 2015 at 3:32 AM, Ted Yu yuzhih...@gmail.com wrote:
> +1 to removing commit messages.
>
> On Jul 18, 2015, at 1:35 AM, Sean Owen so...@cloudera.com wrote:
>> +1 to removing them. Sometimes there are 50+ commits because people have been merging from master into their branch rather than rebasing.
>>
>> On Sat, Jul 18, 2015 at 8:48 AM, Reynold Xin r...@databricks.com wrote:
>>> I took a look at the commit messages in git log -- it looks like the individual commit messages are not that useful to include, but they do make the merge commit messages more verbose. They are usually just a bunch of extremely concise descriptions of bug fixes, merges, etc.:
>>>
>>> cb3f12d [xxx] add whitespace
>>> 6d874a6 [xxx] support pyspark for yarn-client
>>> 89b01f5 [yyy] Update the unit test to add more cases
>>> 275d252 [yyy] Address the comments
>>> 7cc146d [yyy] Address the comments
>>> 2624723 [yyy] Fix rebase conflict
>>> 45befaa [yyy] Update the unit test
>>> bbc1c9c [yyy] Fix checkpointing doesn't retain driver port issue
>>>
>>> Anybody against removing those from the merge script so the log looks cleaner? If nobody feels strongly about this, we can just create a JIRA to remove them and only keep the author names.
Re: Slight API incompatibility caused by SPARK-4072
One related note here is that we have a Java version of this that is an abstract class; the doc says it exists more or less to allow for binary compatibility (it says it's for Java users, but really Scala could use this also): https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/JavaSparkListener.java#L23

I think it might be reasonable that the Scala trait provides only source compatibility and the Java class provides binary compatibility.

- Patrick

On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote:
> Hey all,
> Just noticed this when some of our tests started to fail. SPARK-4072 added a new method to the SparkListener trait, and even though it has a default implementation, it doesn't seem like that applies retroactively. Namely, if you have an existing, compiled app that has an implementation of SparkListener, that app won't work on 1.5 without a recompile. You'll get something like this:
>
> java.lang.AbstractMethodError
>   at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:62)
>   at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>   at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>   at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
>   at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
>   at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
>   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1235)
>   at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
>
> Now I know that SparkListener is marked as @DeveloperApi, but is this something we should care about? Seems like adding methods to traits is just as backwards-incompatible as adding new methods to Java interfaces.
> -- Marcelo
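The failure mode Marcelo describes can be illustrated with a toy trait (a sketch of the Scala 2.x compilation model, not Spark code):

{code}
// v1 of the trait, which the app was originally compiled against:
//   trait Listener { def onEventA(): Unit = {} }

// v2 adds a method with a default body:
trait Listener {
  def onEventA(): Unit = {}
  def onEventB(): Unit = {}  // new in v2
}

// A class file produced against v1 contains a compiler-generated forwarder
// for onEventA but none for onEventB. When v2 code invokes onEventB on that
// stale bytecode, the JVM throws AbstractMethodError -- the same hazard as
// adding a method to a Java interface.
class AppListener extends Listener {
  override def onEventA(): Unit = println("event A")
}
{code}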
Re: Slight API incompatibility caused by SPARK-4072
Actually the Java one is a concrete class.

On Wed, Jul 15, 2015 at 12:14 PM, Patrick Wendell pwend...@gmail.com wrote:
> One related note here is that we have a Java version of this that is an abstract class; the doc says it exists more or less to allow for binary compatibility (it says it's for Java users, but really Scala could use this also): https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/JavaSparkListener.java#L23
> I think it might be reasonable that the Scala trait provides only source compatibility and the Java class provides binary compatibility.
> - Patrick
>
> On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote:
>> Hey all,
>> Just noticed this when some of our tests started to fail. SPARK-4072 added a new method to the SparkListener trait, and even though it has a default implementation, it doesn't seem like that applies retroactively. Namely, if you have an existing, compiled app that has an implementation of SparkListener, that app won't work on 1.5 without a recompile. You'll get something like this:
>>
>> java.lang.AbstractMethodError
>>   at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:62)
>>   at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>>   at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>>   at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
>>   at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
>>   at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
>>   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1235)
>>   at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
>>
>> Now I know that SparkListener is marked as @DeveloperApi, but is this something we should care about? Seems like adding methods to traits is just as backwards-incompatible as adding new methods to Java interfaces.
>> -- Marcelo
Announcing Spark 1.4.1!
Hi All,

I'm happy to announce the Spark 1.4.1 maintenance release. We recommend all users on the 1.4 branch upgrade to this release, which contains several important bug fixes.

Download Spark 1.4.1: http://spark.apache.org/downloads.html
Release notes: http://spark.apache.org/releases/spark-release-1-4-1.html
Comprehensive list of fixes: http://s.apache.org/spark-1.4.1

Thanks to the 85 developers who worked on this release! Please contact me directly for errata in the release notes.

- Patrick
Announcing Spark 1.4.1!
Hi All,

I'm happy to announce the Spark 1.4.1 maintenance release. We recommend all users on the 1.4 branch upgrade to this release, which contains several important bug fixes.

Download Spark 1.4.1: http://spark.apache.org/downloads.html
Release notes: http://spark.apache.org/releases/spark-release-1-4-1.html
Comprehensive list of fixes: http://s.apache.org/spark-1.4.1

Thanks to the 85 developers who worked on this release! Please contact me directly for errata in the release notes.

- Patrick
[jira] [Updated] (SPARK-7920) Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example)
[ https://issues.apache.org/jira/browse/SPARK-7920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7920:
Labels:  (was: spark.tc)

Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example)

Key: SPARK-7920
URL: https://issues.apache.org/jira/browse/SPARK-7920
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 1.3.1, 1.4.0
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.4.0

The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector. ChiSqSelector should just extend Serializable.

Steps:
1. Locate the MLlib ChiSqSelector documentation example.
2. Fix the example by adding an import statement for ChiSqSelector.
3. Attempt to run; notice that it will fail due to ChiSqSelector not being serializable.
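For reference, the fixed example would be along these lines (a sketch; discretizedData is an assumed RDD[LabeledPoint] from earlier in the doc example; the key points are the added import and that the fitted selector must be serializable to be used inside the map closure):

{code}
import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.regression.LabeledPoint

val selector = new ChiSqSelector(numTopFeatures = 50)
val transformer = selector.fit(discretizedData)
// transform runs inside a closure shipped to executors, hence the
// requirement that the fitted model be Serializable.
val filteredData = discretizedData.map { lp =>
  LabeledPoint(lp.label, transformer.transform(lp.features))
}
{code}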
[jira] [Updated] (SPARK-8927) Doc format wrong for some config descriptions
[ https://issues.apache.org/jira/browse/SPARK-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8927:
Labels:  (was: spark.tc)

Doc format wrong for some config descriptions

Key: SPARK-8927
URL: https://issues.apache.org/jira/browse/SPARK-8927
Project: Spark
Issue Type: Documentation
Components: Documentation
Affects Versions: 1.4.0
Reporter: Jon Alter
Assignee: Jon Alter
Priority: Trivial
Fix For: 1.4.2, 1.5.0

In the docs, a couple of descriptions of configuration (under Network) are not inside <td></td> tags and are being displayed immediately under the section title instead of in their row.
[jira] [Updated] (SPARK-7985) Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples.
[ https://issues.apache.org/jira/browse/SPARK-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7985:
Labels:  (was: spark.tc)

Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples.

Key: SPARK-7985
URL: https://issues.apache.org/jira/browse/SPARK-7985
Project: Spark
Issue Type: Bug
Components: Documentation, ML
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.4.0

Update the ML Doc's Estimator, Transformer, and Param Scala & Java examples to use model.extractParamMap instead of model.fittingParamMap, which no longer exists. Remove all other references to fittingParamMap throughout Spark.
[jira] [Updated] (SPARK-7969) Drop method on Dataframes should handle Column
[ https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7969:
Labels:  (was: spark.tc)

Drop method on Dataframes should handle Column

Key: SPARK-7969
URL: https://issues.apache.org/jira/browse/SPARK-7969
Project: Spark
Issue Type: Improvement
Components: PySpark, SQL
Affects Versions: 1.4.0
Reporter: Olivier Girardot
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.4.1, 1.5.0

Currently, the drop method available on DataFrame since Spark 1.4.0 only accepts a column name (as a string); it should also accept a Column as input.
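The requested overload in use (a sketch; df stands for any DataFrame):

{code}
// Existing behavior: drop by column name
val df2 = df.drop("colName")
// Requested: drop by Column expression
val df3 = df.drop(df("colName"))
{code}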
[jira] [Updated] (SPARK-7830) ML doc cleanup: logreg, classification link
[ https://issues.apache.org/jira/browse/SPARK-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7830:
Labels:  (was: spark.tc)

ML doc cleanup: logreg, classification link

Key: SPARK-7830
URL: https://issues.apache.org/jira/browse/SPARK-7830
Project: Spark
Issue Type: Improvement
Components: Documentation, MLlib
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Trivial
Fix For: 1.4.0

Add logistic regression to the list of Multiclass Classification Supported Methods in the MLlib Classification and Regression documentation, and fix the related broken link.
[jira] [Updated] (SPARK-8343) Improve the Spark Streaming Guides
[ https://issues.apache.org/jira/browse/SPARK-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8343:
Labels:  (was: spark.tc)

Improve the Spark Streaming Guides

Key: SPARK-8343
URL: https://issues.apache.org/jira/browse/SPARK-8343
Project: Spark
Issue Type: Improvement
Components: Documentation, Streaming
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.4.1, 1.5.0

Improve the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.
[jira] [Updated] (SPARK-7977) Disallow println
[ https://issues.apache.org/jira/browse/SPARK-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7977:
Labels: starter  (was: spark.tc starter)

Disallow println

Key: SPARK-7977
URL: https://issues.apache.org/jira/browse/SPARK-7977
Project: Spark
Issue Type: Sub-task
Components: Project Infra
Reporter: Reynold Xin
Assignee: Jon Alter
Labels: starter
Fix For: 1.5.0

Very often we see pull requests that added println for debugging, but the author forgot to remove it before code review. We can use the regex checker to disallow println. For legitimate uses of println, we can then disable the rule where they are used.

Add to the scalastyle-config.xml file:

{code}
<check customId="println" level="error" class="org.scalastyle.scalariform.TokenChecker" enabled="true">
  <parameters><parameter name="regex">^println$</parameter></parameters>
  <customMessage><![CDATA[Are you sure you want to println? If yes, wrap the code block with
// scalastyle:off println
println(...)
// scalastyle:on println]]></customMessage>
</check>
{code}
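Whitelisting a legitimate use then looks like this (directly following the custom message above; the printed string is illustrative):

{code}
// scalastyle:off println
println("Usage: SparkPi [slices]")  // intentional console output
// scalastyle:on println
{code}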
[jira] [Updated] (SPARK-8570) Improve MLlib Local Matrix Documentation.
[ https://issues.apache.org/jira/browse/SPARK-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8570:
Labels:  (was: spark.tc)

Improve MLlib Local Matrix Documentation.

Key: SPARK-8570
URL: https://issues.apache.org/jira/browse/SPARK-8570
Project: Spark
Issue Type: Improvement
Components: Documentation, MLlib
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.5.0

Update the MLlib Data Types Local Matrix documentation as follows:
- Include information on sparse matrices.
- Add sparse matrix examples to the existing Scala and Java examples.
- Add Python examples for both dense and sparse matrices (currently no Python examples exist for the Local Matrix section).
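The sort of Scala example being requested might look like this (a sketch using MLlib's existing Matrices factory methods; the values are illustrative):

{code}
import org.apache.spark.mllib.linalg.{Matrix, Matrices}

// Dense 3x2 matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)), stored column-major
val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

// Sparse 3x2 matrix ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0)) in CSC format;
// arguments are (numRows, numCols, colPtrs, rowIndices, values)
val sm: Matrix = Matrices.sparse(3, 2, Array(0, 1, 3), Array(0, 2, 1), Array(9.0, 6.0, 8.0))
{code}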
[jira] [Updated] (SPARK-7883) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.
[ https://issues.apache.org/jira/browse/SPARK-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7883:
Labels:  (was: spark.tc)

Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.

Key: SPARK-7883
URL: https://issues.apache.org/jira/browse/SPARK-7883
Project: Spark
Issue Type: Bug
Components: Documentation, MLlib
Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.0
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Trivial
Fix For: 1.0.3, 1.1.2, 1.2.3, 1.3.2, 1.4.0

The trainImplicit Scala example near the end of the MLlib Collaborative Filtering documentation refers to an ALS.trainImplicit function signature that does not exist. Rather than add an extra function, let's just fix the example. Currently, the example refers to a function that would have the following signature:

{code}
def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, alpha: Double): MatrixFactorizationModel
{code}

Instead, let's change the example to refer to this function, which does exist (notice the addition of the lambda parameter):

{code}
def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: Double, alpha: Double): MatrixFactorizationModel
{code}
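The corrected example call would then be along these lines (illustrative hyperparameter values; ratings is the RDD[Rating] from the doc example):

{code}
// Matches the five-argument signature above
val model = ALS.trainImplicit(ratings, rank = 10, iterations = 10, lambda = 0.01, alpha = 0.01)
{code}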
[jira] [Updated] (SPARK-7426) spark.ml AttributeFactory.fromStructField should allow other NumericTypes
[ https://issues.apache.org/jira/browse/SPARK-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7426:
Labels:  (was: spark.tc)

spark.ml AttributeFactory.fromStructField should allow other NumericTypes

Key: SPARK-7426
URL: https://issues.apache.org/jira/browse/SPARK-7426
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: Joseph K. Bradley
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.5.0

It currently only supports DoubleType, but it should support others, at least for fromStructField (importing into the ML attribute format, rather than exporting).
[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md
[ https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8639: --- Labels: (was: spark.tc) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md - Key: SPARK-8639 URL: https://issues.apache.org/jira/browse/SPARK-8639 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Rosstin Murphy Assignee: Rosstin Murphy Priority: Trivial Fix For: 1.4.1, 1.5.0 In docs/README.md, the text around line 31 states: "Execute 'jekyll' from the 'docs/' directory. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files." It might be clearer if we said: "Execute 'jekyll build' from the 'docs/' directory to compile the site. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files." In docs/api.md, "Here you can API docs for Spark and its submodules." should be something like: "Here you can read API docs for Spark and its submodules." -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7357) Improving HBaseTest example
[ https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7357: --- Labels: (was: spark.tc) Improving HBaseTest example --- Key: SPARK-7357 URL: https://issues.apache.org/jira/browse/SPARK-7357 Project: Spark Issue Type: Improvement Components: Examples Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Minor Fix For: 1.5.0 Original Estimate: 2m Remaining Estimate: 2m Minor improvement to the HBaseTest example: when HBase-related configurations (e.g. the zookeeper quorum, zookeeper client port, or zookeeper.znode.parent) are not set to the default (localhost:2181), the connection to ZooKeeper can hang, as shown in the following log:
{code}
15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxx.xxx.xxx:2181 sessionTimeout=9 watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown error)
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to xxx.xxx.xxx/9.30.94.121:2181, initiating session
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, negotiated timeout = 4
15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
{code}
This happens because hbase-site.xml is not on the Spark classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
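A hedged sketch of the kind of explicit configuration that avoids the hang when hbase-site.xml is absent (hostnames and values are illustrative):
{code}
import org.apache.hadoop.hbase.HBaseConfiguration

// Set the ZooKeeper connection details explicitly rather than relying on
// hbase-site.xml being discovered on the classpath.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
hbaseConf.set("zookeeper.znode.parent", "/hbase")
{code}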
[jira] [Updated] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)
[ https://issues.apache.org/jira/browse/SPARK-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8746: --- Labels: documentation test (was: documentation spark.tc test) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest) -- Key: SPARK-8746 URL: https://issues.apache.org/jira/browse/SPARK-8746 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Christian Kadner Assignee: Christian Kadner Priority: Trivial Labels: documentation, test Fix For: 1.4.1, 1.5.0 Original Estimate: 1h Remaining Estimate: 1h The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) describes how to generate golden answer files for new hive comparison test cases. However the download link for the Hive 0.13.1 jars points to https://hive.apache.org/downloads.html but none of the linked mirror sites still has the 0.13.1 version. We need to update the link to https://archive.apache.org/dist/hive/hive-0.13.1/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
[ https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6485: --- Labels: (was: spark.tc) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark -- Key: SPARK-6485 URL: https://issues.apache.org/jira/browse/SPARK-6485 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Xiangrui Meng We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark. Internally, we can use DataFrames for serialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
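For context, a short sketch of the existing Scala distributed-matrix APIs that the proposed PySpark wrappers would mirror (sc is an assumed SparkContext; the data is illustrative):
{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRow, IndexedRowMatrix, MatrixEntry, RowMatrix}

// A RowMatrix is an RDD of rows without meaningful row indices.
val rowMat = new RowMatrix(sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0))))

// An IndexedRowMatrix attaches a long row index to each row.
val idxMat = new IndexedRowMatrix(sc.parallelize(Seq(IndexedRow(0L, Vectors.dense(1.0, 2.0)))))

// A CoordinateMatrix is an RDD of (i, j, value) entries.
val coordMat = new CoordinateMatrix(sc.parallelize(Seq(MatrixEntry(0, 0, 1.0), MatrixEntry(1, 1, 2.0))))
{code}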
[jira] [Updated] (SPARK-7744) Distributed matrix section in MLlib Data Types documentation should be reordered.
[ https://issues.apache.org/jira/browse/SPARK-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7744: --- Labels: (was: spark.tc) Distributed matrix section in MLlib Data Types documentation should be reordered. - Key: SPARK-7744 URL: https://issues.apache.org/jira/browse/SPARK-7744 Project: Spark Issue Type: Improvement Components: Documentation, MLlib Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.3.2, 1.4.0 The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve the comprehensibility of the Distributed matrix section, especially for new readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly
[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6785: --- Labels: (was: spark.tc) DateUtils can not handle date before 1970/01/01 correctly - Key: SPARK-6785 URL: https://issues.apache.org/jira/browse/SPARK-6785 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Christian Kadner Fix For: 1.5.0
{code}
scala> val d = new Date(100)
d: java.sql.Date = 1969-12-31

scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
res1: java.sql.Date = 1970-01-01
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
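One plausible mechanism for this off-by-one, offered as an assumption rather than the confirmed root cause: converting milliseconds to days with truncating integer division rounds pre-epoch (negative) values toward zero, so a flooring division is required:
{code}
val millisPerDay = 24L * 60 * 60 * 1000

// -1 ms is 1969-12-31 23:59:59.999 UTC, i.e. day -1, not day 0.
def daysTruncating(millis: Long): Long = millis / millisPerDay              // -1 / 86400000 == 0  (wrong)
def daysFlooring(millis: Long): Long = Math.floorDiv(millis, millisPerDay)  // floorDiv(-1, 86400000) == -1 (right)
{code}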
[jira] [Updated] (SPARK-5562) LDA should handle empty documents
[ https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5562: --- Labels: starter (was: spark.tc starter) LDA should handle empty documents - Key: SPARK-5562 URL: https://issues.apache.org/jira/browse/SPARK-5562 Project: Spark Issue Type: Test Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Assignee: Alok Singh Priority: Minor Labels: starter Fix For: 1.5.0 Original Estimate: 96h Remaining Estimate: 96h Latent Dirichlet Allocation (LDA) could easily be given empty documents when people select a small vocabulary. We should check to make sure it is robust to empty documents. This will hopefully take the form of a unit test, but may require modifying the LDA implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
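A sketch of what such a unit test could look like (names and values are assumed; sc is a test SparkContext):
{code}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

val vocabSize = 3
val docs = sc.parallelize(Seq(
  (0L, Vectors.sparse(vocabSize, Seq())),   // empty document
  (1L, Vectors.dense(1.0, 0.0, 2.0)),
  (2L, Vectors.dense(0.0, 3.0, 1.0))
))
// Training should not throw on the empty document.
val model = new LDA().setK(2).run(docs)
assert(model.vocabSize == vocabSize)
{code}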
[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support
[ https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7265: --- Labels: (was: spark.tc) Improving documentation for Spark SQL Hive support --- Key: SPARK-7265 URL: https://issues.apache.org/jira/browse/SPARK-7265 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Trivial Fix For: 1.5.0 Miscellaneous documentation improvements for Spark SQL Hive support and YARN cluster deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs
[ https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2859: --- Labels: (was: spark.tc) Update url of Kryo project in related docs -- Key: SPARK-2859 URL: https://issues.apache.org/jira/browse/SPARK-2859 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Guancheng Chen Assignee: Guancheng Chen Priority: Trivial Fix For: 1.0.3, 1.1.0 Kryo project has been migrated from googlecode to github, hence we need to update its URL in related docs such as tuning.md. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1403) Spark on Mesos does not set Thread's context class loader
[ https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1403. Resolution: Fixed Target Version/s: (was: 1.5.0) Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible about the environment? Thanks! Spark on Mesos does not set Thread's context class loader - Key: SPARK-1403 URL: https://issues.apache.org/jira/browse/SPARK-1403 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.3.0, 1.4.0 Environment: ubuntu 12.04 on vagrant Reporter: Bharath Bhushan Priority: Blocker Fix For: 1.0.0 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark executor on mesos slave throws a java.lang.ClassNotFoundException for org.apache.spark.serializer.JavaSerializer. The lengthy discussion is here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
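For background, the usual remedy for this class of executor-side ClassNotFoundException is a one-liner; a hedged sketch, not necessarily the exact patch that fixed this issue:
{code}
// Point the executor thread's context class loader at the loader that
// loaded the Spark classes, before any deserialization happens.
Thread.currentThread().setContextClassLoader(getClass.getClassLoader)
{code}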
[jira] [Comment Edited] (SPARK-1403) Spark on Mesos does not set Thread's context class loader
[ https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625739#comment-14625739 ] Patrick Wendell edited comment on SPARK-1403 at 7/14/15 2:59 AM: - Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible about the environment? Thanks! was (Author: pwendell): Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible without the environment? Thanks! Spark on Mesos does not set Thread's context class loader - Key: SPARK-1403 URL: https://issues.apache.org/jira/browse/SPARK-1403 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.3.0, 1.4.0 Environment: ubuntu 12.04 on vagrant Reporter: Bharath Bhushan Priority: Blocker Fix For: 1.0.0 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark executor on mesos slave throws a java.lang.ClassNotFoundException for org.apache.spark.serializer.JavaSerializer. The lengthy discussion is here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC4)
This vote passes with 14 +1 (7 binding) votes and no 0 or -1 votes. +1 (14): Patrick Wendell Reynold Xin Sean Owen Burak Yavuz Mark Hamstra Michael Armbrust Andrew Or York, Brennon Krishna Sankar Luciano Resende Holden Karau Tom Graves Denny Lee Sean McNamara - Patrick On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1125/ [published as version: 1.4.1-rc4] https://repository.apache.org/content/repositories/orgapachespark-1126/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Sunday, July 12, at 06:55 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Foundation policy on releases and Spark nightly builds
Thanks Sean O. I was thinking something like NOTE: Nightly builds are meant for development and testing purposes. They do not go through Apache's release auditing process and are not official releases. - Patrick On Sun, Jul 12, 2015 at 3:39 PM, Sean Owen so...@cloudera.com wrote: (This sounds pretty good to me. Mark it developers-only, not formally tested by the community, etc.) On Sun, Jul 12, 2015 at 7:50 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean B., Thanks for bringing this to our attention. I think putting them on the developer wiki would substantially decrease visibility in a way that is not beneficial to the project - this feature was specifically requested by developers from other projects that integrate with Spark. If the concern underlying that policy is that snapshot builds could be misconstrued as formal releases, I think it would work to put a very clear disclaimer explaining the difference directly adjacent to the link. That's arguably more explicit than just moving the same text to a different page. The formal policy asks us not to include links that encourage non-developers to download the builds. Stating clearly that the audience for those links is developers, in my interpretation that would satisfy the letter and spirit of this policy. - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC4)
I think we can close this vote soon. Any addition votes/testing would be much appreciated! On Fri, Jul 10, 2015 at 11:30 AM, Sean McNamara sean.mcnam...@webtrends.com wrote: +1 Sean On Jul 8, 2015, at 11:55 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1125/ [published as version: 1.4.1-rc4] https://repository.apache.org/content/repositories/orgapachespark-1126/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Sunday, July 12, at 06:55 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624086#comment-14624086 ] Patrick Wendell commented on SPARK-2089: Yeah - we can open it again later if someone who maintains this code wants to work on this feature. I just want this JIRA to reflect the current status (i.e. for 5 versions there hasn't been any action in Spark), which is that it is not actively being fixed, and to make sure the documentation correctly reflects what we have now, to discourage the use of a feature that does not work. With YARN, preferredNodeLocalityData isn't honored --- Key: SPARK-2089 URL: https://issues.apache.org/jira/browse/SPARK-2089 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical When running in YARN cluster mode, apps can pass preferred locality data when constructing a Spark context that will dictate where to request executor containers. This is currently broken because of a race condition. The Spark-YARN code runs the user class and waits for it to start up a SparkContext. During its initialization, the SparkContext will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code that . The Spark-Yarn code then immediately fetches the preferredNodeLocationData from the SparkContext and uses it to start requesting containers. But in the SparkContext constructor that takes the preferredNodeLocationData, setting preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN code comes around quickly enough after being notified, the data that's fetched is the empty unset version. This occurred during all of my runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Foundation policy on releases and Spark nightly builds
Hey Sean B., Thanks for bringing this to our attention. I think putting them on the developer wiki would substantially decrease visibility in a way that is not beneficial to the project - this feature was specifically requested by developers from other projects that integrate with Spark. If the concern underlying that policy is that snapshot builds could be misconstrued as formal releases, I think it would work to put a very clear disclaimer explaining the difference directly adjacent to the link. That's arguably more explicit than just moving the same text to a different page. The formal policy asks us not to include links that encourage non-developers to download the builds. Stating clearly that the audience for those links is developers, in my interpretation that would satisfy the letter and spirit of this policy. - Patrick On Sat, Jul 11, 2015 at 11:53 AM, Sean Owen so...@cloudera.com wrote: From a developer perspective, I also find it surprising to hear that nightly builds should be hidden from non-developer end users. In an age of Github, what on earth is the problem with distributing the content of master? However I do understand why this exists. To the extent the ASF provides any value, it is at least a legal framework for defining what it means for you and I to give software to a bunch of other people. Software artifacts released according to an ASF process becomes something the ASF can take responsibility for as an entity. Nightly builds are not. It might matter to the committers if, say, somebody commits a serious data loss bug. You don't want to be on the hook individually for putting that into end-user hands. More practically, I think this exists to prevent some projects from lazily depending on unofficial nightly builds as pseudo-releases for long periods of time. End users may come to perceive them as official sanctioned releases when they aren't. That's not the case here of course. I think nightlies aren't for end-users anyway, and I think developers who care would know how to get nightlies anyway. There's little cost to moving this info to the wiki, so I'd do it. On Sat, Jul 11, 2015 at 4:29 PM, Reynold Xin r...@databricks.com wrote: I don't get this rule. It is arbitrary, and does not seem like something that should be enforced at the foundation level. By this reasoning, are we not allowed to list source code management on the project public page as well? The download page clearly states the nightly builds are bleeding-edge. Note that technically we did not violate any rules, since the ones we showed were not nightly builds by the foundation's definition: Nightly Builds are simply built from the Subversion trunk, usually once a day.. Spark nightly artifacts were built from git, not svn trunk. :) (joking). On Sat, Jul 11, 2015 at 7:44 AM, Sean Busbey bus...@cloudera.com wrote: That would be great. A note on that page that it's meant for the use of folks working on the project with a link to your get involved howto would be nice additional context. -- Sean On Jul 11, 2015 6:18 AM, Sean Owen so...@cloudera.com wrote: I suggest we move this info to the developer wiki, to keep it out from the place all and users look for downloads. What do you think about that Sean B? On Sat, Jul 11, 2015 at 5:34 AM, Sean Busbey bus...@cloudera.com wrote: Hi Folks! I noticed that Spark website's download page lists nightly builds and instructions for accessing SNAPSHOT maven artifacts[1]. 
The ASF policy on releases expressly forbids this kind of publishing outside of the dev@spark community[2]. If you'd like to discuss having the policy updated (including expanding the definition of in the development community), please contribute to the discussion on general@incubator[3] after removing the offending items. [1]: http://spark.apache.org/downloads.html#nightly-packages-and-artifacts [2]: http://www.apache.org/dev/release.html#what [3]: http://s.apache.org/XFP -- Sean - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
SparkHub: a new community site for Apache Spark
Hi All, Today, I'm happy to announce SparkHub (http://sparkhub.databricks.com), a service for the Apache Spark community to easily find the most relevant Spark resources on the web. SparkHub is a curated list of Spark news, videos and talks, package releases, upcoming events around the world, and a Spark Meetup directory to help you find a meetup close to you. We will continue to expand the site in the coming months and add more content. I hope SparkHub can help you find Spark related information faster and more easily than is currently possible. Everything is sourced from the Spark community, and we welcome input from you as well! - Patrick - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
[jira] [Created] (SPARK-8957) Backport Hive 1.X support to Branch 1.4
Patrick Wendell created SPARK-8957: -- Summary: Backport Hive 1.X support to Branch 1.4 Key: SPARK-8957 URL: https://issues.apache.org/jira/browse/SPARK-8957 Project: Spark Issue Type: Improvement Components: SQL Reporter: Patrick Wendell Assignee: Michael Armbrust We almost never do feature backports. But I think it would be really useful to backport support for newer Hive versions to the 1.4 branch, for the following reasons: 1. It blocks a large number of users from using Hive support. 2. It's a relatively small set of patches, since most of the heavy lifting was done in Spark 1.4.0's classloader refactoring. 3. Some distributions have already done this, with success. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8957) Backport Hive 1.X support to Branch 1.4
[ https://issues.apache.org/jira/browse/SPARK-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8957: --- Priority: Critical (was: Major) Backport Hive 1.X support to Branch 1.4 --- Key: SPARK-8957 URL: https://issues.apache.org/jira/browse/SPARK-8957 Project: Spark Issue Type: Improvement Components: SQL Reporter: Patrick Wendell Assignee: Michael Armbrust Priority: Critical We almost never do feature backports. But I think it would be really useful to backport support for newer Hive versions to the 1.4 branch, for the following reasons: 1. It blocks a large number of users from using Hive support. 2. It's a relatively small set of patches, since most of the heavy lifting was done in Spark 1.4.0's classloader refactoring. 3. Some distributions have already done this, with success. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC4)
+1 On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1125/ [published as version: 1.4.1-rc4] https://repository.apache.org/content/repositories/orgapachespark-1126/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Sunday, July 12, at 06:55 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620051#comment-14620051 ] Patrick Wendell commented on SPARK-2089: Yeah - I think let's get SPARK-4352 merged and then just close this as won't fix and add a JIRA to document its non-working-ness. This hasn't worked since before Spark 1.0, and SPARK-4352 is just a strictly better solution than this. With YARN, preferredNodeLocalityData isn't honored --- Key: SPARK-2089 URL: https://issues.apache.org/jira/browse/SPARK-2089 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical When running in YARN cluster mode, apps can pass preferred locality data when constructing a Spark context that will dictate where to request executor containers. This is currently broken because of a race condition. The Spark-YARN code runs the user class and waits for it to start up a SparkContext. During its initialization, the SparkContext will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code that . The Spark-Yarn code then immediately fetches the preferredNodeLocationData from the SparkContext and uses it to start requesting containers. But in the SparkContext constructor that takes the preferredNodeLocationData, setting preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN code comes around quickly enough after being notified, the data that's fetched is the empty unset version. This occurred during all of my runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8949) Remove references to preferredNodeLocalityData in javadoc and print warning when used
Patrick Wendell created SPARK-8949: -- Summary: Remove references to preferredNodeLocalityData in javadoc and print warning when used Key: SPARK-8949 URL: https://issues.apache.org/jira/browse/SPARK-8949 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Reporter: Patrick Wendell Priority: Blocker The SparkContext constructor that takes preferredNodeLocalityData has not worked since before Spark 1.0. Also, the feature in SPARK-4352 is strictly better than a correct implementation of that feature. We should remove any documentation references to that feature and print a warning when it is used saying it doesn't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
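A hedged sketch of the proposed warning (wording and placement are assumed; logWarning comes from Spark's Logging trait):
{code}
// Inside the deprecated SparkContext constructor path:
if (preferredNodeLocationData.nonEmpty) {
  logWarning("Passing preferredNodeLocationData to SparkContext has no effect. " +
    "See SPARK-8949 and the replacement feature in SPARK-4352.")
}
{code}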
Re: [VOTE] Release Apache Spark 1.4.1 (RC3)
Yeah - we can fix the docs separately from the release. - Patrick On Wed, Jul 8, 2015 at 10:03 AM, Mark Hamstra m...@clearstorydata.com wrote: HiveSparkSubmitSuite is fine for me, but I do see the same issue with DataFrameStatSuite -- OSX 10.10.4, java 1.7.0_75, -Phive -Phive-thriftserver -Phadoop-2.4 -Pyarn On Wed, Jul 8, 2015 at 4:18 AM, Sean Owen so...@cloudera.com wrote: The POM issue is resolved and the build succeeds. The license and sigs still work. The tests pass for me with -Pyarn -Phadoop-2.6, with the following two exceptions. Is anyone else seeing these? this is consistent on Ubuntu 14 with Java 7/8: DataFrameStatSuite: ... - special crosstab elements (., '', null, ``) *** FAILED *** java.lang.NullPointerException: at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$4.apply(StatFunctions.scala:131) at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$4.apply(StatFunctions.scala:121) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.Map$Map4.foreach(Map.scala:181) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.execution.stat.StatFunctions$.crossTabulate(StatFunctions.scala:121) at org.apache.spark.sql.DataFrameStatFunctions.crosstab(DataFrameStatFunctions.scala:94) at org.apache.spark.sql.DataFrameStatSuite$$anonfun$5.apply$mcV$sp(DataFrameStatSuite.scala:97) ... HiveSparkSubmitSuite: - SPARK-8368: includes jars passed in through --jars *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (HiveSparkSubmitSuite.scala:92) - SPARK-8020: set sql conf in spark conf *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (HiveSparkSubmitSuite.scala:92) - SPARK-8489: MissingRequirementError during reflection *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (HiveSparkSubmitSuite.scala:92) On Tue, Jul 7, 2015 at 8:06 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc3 (commit 3e8ae38): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 3e8ae38944f13895daf328555c1ad22cd590b089 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1123/ [published as version: 1.4.1-rc3] https://repository.apache.org/content/repositories/orgapachespark-1124/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Friday, July 10, at 20:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... 
To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf
[ https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619681#comment-14619681 ] Patrick Wendell edited comment on SPARK-8768 at 7/9/15 1:04 AM: So it turns out that build/mvn still uses the system maven even if it downloads the newer version (this was the original design). Is it possible that is why it's breaking? It might be nice to modify that script to have a flag like --force that will always use the downloaded maven. was (Author: pwendell): So it turns out that build/mvn still uses the system maven even if it downloads the newer version (this was the original design). Is it possible that is why it's breaking? SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf - Key: SPARK-8768 URL: https://issues.apache.org/jira/browse/SPARK-8768 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Reporter: Josh Rosen Priority: Blocker The end-to-end SparkSubmitSuite tests (launch simple application with spark-submit, include jars passed in through --jars, and include jars passed in through --packages) are currently failing for the pre-YARN Hadoop builds. I managed to reproduce one of the Jenkins failures locally: {code} build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite -Dtest=none {code} Here's the output from unit-tests.log: {code} = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple application with spark-submit' = 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Class path contains multiple SLF4J bindings. 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joshrosen); users with modify permissions: Set(joshrosen) 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver] 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass1(Native Method) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449
[jira] [Commented] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf
[ https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619687#comment-14619687 ] Patrick Wendell commented on SPARK-8768: I created SPARK-8933 to track improvements to our maven script. SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf - Key: SPARK-8768 URL: https://issues.apache.org/jira/browse/SPARK-8768 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Reporter: Josh Rosen Priority: Blocker The end-to-end SparkSubmitSuite tests (launch simple application with spark-submit, include jars passed in through --jars, and include jars passed in through --packages) are currently failing for the pre-YARN Hadoop builds. I managed to reproduce one of the Jenkins failures locally: {code} build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite -Dtest=none {code} Here's the output from unit-tests.log: {code} = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple application with spark-submit' = 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Class path contains multiple SLF4J bindings. 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joshrosen); users with modify permissions: Set(joshrosen) 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver] 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass1(Native Method) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 15/07/01 13:40:00.010 redirect
[jira] [Created] (SPARK-8933) Provide a --force flag to build/mvn that always uses downloaded maven
Patrick Wendell created SPARK-8933: -- Summary: Provide a --force flag to build/mvn that always uses downloaded maven Key: SPARK-8933 URL: https://issues.apache.org/jira/browse/SPARK-8933 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Brennon York I noticed the other day that build/mvn will still use the system maven if an mvn binary is installed. I think this was intentional, to support just using zinc while keeping the system maven (and to match the semantics of sbt/sbt). It would be nice to have a flag that forces it to use the downloaded maven. I was thinking it could have a --force flag, and then it could swallow that flag and not pass it on to maven. This is useful in some cases like our test runners, where we want to ensure that a specific version of maven is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC3)
Hey All, The issue that Josh pointed out is not just a test failure, it's an issue with an important bug fix that was not correctly back-ported into the 1.4 branch. Unfortunately the overall state of the 1.4 branch tests on Jenkins was not in great shape so this was missed earlier on. Given that this is fixed now, I have prepared another RC and am leaning towards restarting the vote. If anyone feels strongly one way or the other let me know, otherwise I'll restart it in a few hours. I figured since this will likely finalize over the weekend anyways, it's not so bad to wait 1 additional day in order to get that fix. - Patrick On Wed, Jul 8, 2015 at 12:00 PM, Josh Rosen rosenvi...@gmail.com wrote: I've filed https://issues.apache.org/jira/browse/SPARK-8903 to fix the DataFrameStatSuite test failure. The problem turned out to be caused by a mistake made while resolving a merge-conflict when backporting that patch to branch-1.4. I've submitted https://github.com/apache/spark/pull/7295 to fix this issue. On Wed, Jul 8, 2015 at 11:30 AM, Sean Owen so...@cloudera.com wrote: I see, but shouldn't this test not be run when Hive isn't in the build? On Wed, Jul 8, 2015 at 7:13 PM, Andrew Or and...@databricks.com wrote: @Sean You actually need to run HiveSparkSubmitSuite with `-Phive` and `-Phive-thriftserver`. The MissingRequirementsError is just complaining that it can't find the right classes. The other one (DataFrameStatSuite) is a little more concerning. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf
[ https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619681#comment-14619681 ] Patrick Wendell commented on SPARK-8768: So it turns out that build/mvn still uses the system maven even if it downloads the newer version (this was the original design). Is it possible that is why it's breaking? SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf - Key: SPARK-8768 URL: https://issues.apache.org/jira/browse/SPARK-8768 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Reporter: Josh Rosen Priority: Blocker The end-to-end SparkSubmitSuite tests (launch simple application with spark-submit, include jars passed in through --jars, and include jars passed in through --packages) are currently failing for the pre-YARN Hadoop builds. I managed to reproduce one of the Jenkins failures locally: {code} build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite -Dtest=none {code} Here's the output from unit-tests.log: {code} = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple application with spark-submit' = 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Class path contains multiple SLF4J bindings. 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joshrosen); users with modify permissions: Set(joshrosen) 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver] 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass1(Native Method) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark
[VOTE] Release Apache Spark 1.4.1 (RC4)
Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1125/ [published as version: 1.4.1-rc4] https://repository.apache.org/content/repositories/orgapachespark-1126/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Sunday, July 12, at 06:55 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC3)
This vote is cancelled in favor of RC4. - Patrick On Tue, Jul 7, 2015 at 12:06 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc3 (commit 3e8ae38): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 3e8ae38944f13895daf328555c1ad22cd590b089 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1123/ [published as version: 1.4.1-rc3] https://repository.apache.org/content/repositories/orgapachespark-1124/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Friday, July 10, at 20:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC2)
Hey All, This vote is cancelled in favor of RC3. - Patrick On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Updated] (SPARK-6805) ML Pipeline API in SparkR
[ https://issues.apache.org/jira/browse/SPARK-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6805: --- Priority: Critical (was: Major) ML Pipeline API in SparkR - Key: SPARK-6805 URL: https://issues.apache.org/jira/browse/SPARK-6805 Project: Spark Issue Type: Umbrella Components: ML, SparkR Reporter: Xiangrui Meng Priority: Critical SparkR was merged. So let's have this umbrella JIRA for the ML pipeline API in SparkR. The implementation should be similar to the pipeline API implementation in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Can not build master
Hi Tomo, For now you can do that as a workaround. We are working on a fix for this in the master branch but it may take a couple of days since the issue is fairly complicated. - Patrick On Sat, Jul 4, 2015 at 7:00 AM, tomo cocoa cocoatom...@gmail.com wrote: Hi all, I have the same error, and it seems to depend on the Maven version. I tried building Spark with several Maven versions on Jenkins. + Output of /Users/tomohiko/.jenkins/tools/hudson.tasks.Maven_MavenInstallation/mvn-3.3.3/bin/mvn -version: Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T20:57:37+09:00) Maven home: /Users/tomohiko/.jenkins/tools/hudson.tasks.Maven_MavenInstallation/mvn-3.3.3 Java version: 1.8.0, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac + Jenkins Configuration: Jenkins project type: Maven Project Goals and options: -Phadoop-2.6 -DskipTests clean package + Maven versions and results: 3.3.3 - infinite loop 3.3.1 - infinite loop 3.2.5 - SUCCESS So should we prefer to build Spark with Maven 3.2.5? On 4 July 2015 at 12:28, Andrew Or and...@databricks.com wrote: Thanks, I just tried it with 3.3.3 and I was able to reproduce it as well. 2015-07-03 18:51 GMT-07:00 Tarek Auel tarek.a...@gmail.com: That's mine Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T04:57:37-07:00) Maven home: /usr/local/Cellar/maven/3.3.3/libexec Java version: 1.8.0_45, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac On Fri, Jul 3, 2015 at 6:32 PM Ted Yu yuzhih...@gmail.com wrote: Here is mine: Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c; 2015-03-13T13:10:27-07:00) Maven home: /home/hbase/apache-maven-3.3.1 Java version: 1.8.0_45, vendor: Oracle Corporation Java home: /home/hbase/jdk1.8.0_45/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 2.6.32-504.el6.x86_64, arch: amd64, family: unix On Fri, Jul 3, 2015 at 6:05 PM, Andrew Or and...@databricks.com wrote: @Tarek and Ted, what maven versions are you using? 2015-07-03 17:35 GMT-07:00 Krishna Sankar ksanka...@gmail.com: Patrick, I assume an RC3 will be out for folks like me to test the distribution. As usual, I will run the tests when you have a new distribution. Cheers k/ On Fri, Jul 3, 2015 at 4:38 PM, Patrick Wendell pwend...@gmail.com wrote: Patch that added test-jar dependencies: https://github.com/apache/spark/commit/bfe74b34 Patch that originally disabled dependency reduced poms: https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724 Patch that reverted the disabling of dependency reduced poms: https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com wrote: Okay I did some forensics with Sean Owen. Some things about this bug: 1. The underlying cause is that we added some code to make the tests of sub modules depend on the core tests. For unknown reasons this causes Spark to hit MSHADE-148 for *some* combinations of build profiles. 2. MSHADE-148 can be worked around by disabling building of dependency reduced poms because then the buggy code path is circumvented. Andrew Or did this in a patch on the 1.4 branch. 
However, that is not a tenable option for us because our *published* pom files require dependency reduction to substitute in the scala version correctly for the poms published to maven central. 3. As a result, Andrew Or reverted his patch recently, causing some package builds to start failing again (but publishing works now). 4. The reason this is not detected in our test harness or release build is that it is sensitive to the profiles enabled. The combination of profiles we enable in the test harness and release builds do not trigger this bug. The best path I see forward right now is to do the following: 1. Disable creation of dependency reduced poms by default (this doesn't matter for people doing a package build) so typical users won't have this bug. 2. Add a profile that re-enables that setting. 3. Use the above profile when publishing release artifacts to maven central. 4. Hope that we don't hit this bug for publishing. - Patrick On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote: Doesn't change anything for me. On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote: Can you try using the built in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https
Re: [VOTE] Release Apache Spark 1.4.1 (RC2)
Hm - what if you do a fresh git checkout (just to make sure you don't have an older maven version downloaded)? It also might be that this really is an issue even with Maven 3.3.3. I'm just not sure why it's not reflected in our continuous integration or the build of the release packages themselves: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ It could be that it's dependent on which modules are enabled. On Fri, Jul 3, 2015 at 3:46 PM, Robin East robin.e...@xense.co.uk wrote: which got me thinking: build/mvn -version Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c; 2015-03-13T20:10:27+00:00) Maven home: /usr/local/Cellar/maven/3.3.1/libexec Java version: 1.8.0_40, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.2, arch: x86_64, family: mac Seems to be using 3.3.1 On 3 Jul 2015, at 23:44, Robin East robin.e...@xense.co.uk wrote: I used the following build command: build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package this also gave the ‘Dependency-reduced POM’ loop Robin On 3 Jul 2015, at 23:41, Patrick Wendell pwend...@gmail.com wrote: What if you use the built-in maven (i.e. build/mvn). It might be that we require a newer version of maven than you have. The release itself is built with maven 3.3.3: https://github.com/apache/spark/blob/master/build/mvn#L72 - Patrick On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote: Yep, happens to me as well. Build loops. Cheers k/ On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote: Patrick: I used the following command: ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package The build doesn't seem to stop. Here is the tail of the build output: [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml Here is part of the stack trace for the build process: http://pastebin.com/xL2Y0QMU FYI On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. 
[ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC2)
Let's continue the discussion on the other thread relating to the master build. On Fri, Jul 3, 2015 at 4:13 PM, Patrick Wendell pwend...@gmail.com wrote: Thanks - it appears this is just a legitimate issue with the build, affecting all versions of Maven. On Fri, Jul 3, 2015 at 4:02 PM, Krishna Sankar ksanka...@gmail.com wrote: I have 3.3.3 USS-Defiant:NW ksankar$ mvn -version Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T04:57:37-07:00) Maven home: /usr/local/apache-maven-3.3.3 Java version: 1.7.0_60, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac Let me nuke it and reinstall maven. Cheers k/ On Fri, Jul 3, 2015 at 3:41 PM, Patrick Wendell pwend...@gmail.com wrote: What if you use the built-in maven (i.e. build/mvn). It might be that we require a newer version of maven than you have. The release itself is built with maven 3.3.3: https://github.com/apache/spark/blob/master/build/mvn#L72 - Patrick On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote: Yep, happens to me as well. Build loops. Cheers k/ On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote: Patrick: I used the following command: ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package The build doesn't seem to stop. Here is the tail of the build output: [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml Here is part of the stack trace for the build process: http://pastebin.com/xL2Y0QMU FYI On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Can not build master
Okay I did some forensics with Sean Owen. Some things about this bug: 1. The underlying cause is that we added some code to make the tests of sub modules depend on the core tests. For unknown reasons this causes Spark to hit MSHADE-148 for *some* combinations of build profiles. 2. MSHADE-148 can be worked around by disabling building of dependency reduced poms because then the buggy code path is circumvented. Andrew Or did this in a patch on the 1.4 branch. However, that is not a tenable option for us because our *published* pom files require dependency reduction to substitute in the scala version correctly for the poms published to maven central. 3. As a result, Andrew Or reverted his patch recently, causing some package builds to start failing again (but publishing works now). 4. The reason this is not detected in our test harness or release build is that it is sensitive to the profiles enabled. The combination of profiles we enable in the test harness and release builds do not trigger this bug. The best path I see forward right now is to do the following: 1. Disable creation of dependency reduced poms by default (this doesn't matter for people doing a package build) so typical users won't have this bug. 2. Add a profile that re-enables that setting. 3. Use the above profile when publishing release artifacts to maven central. 4. Hope that we don't hit this bug for publishing. - Patrick On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote: Doesn't change anything for me. On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote: Can you try using the built-in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ - Patrick On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193) Cheers On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote: I found a solution; there might be a better one. https://github.com/apache/spark/pull/7217 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote: Yes me too On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote: This is what I got (the last line was repeated non-stop): [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote: Hi all, I am trying to build master, but it gets stuck and prints [INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml build command: mvn -DskipTests clean package Do others have the same issue? Regards, Tarek - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
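For readers following along, steps (1) and (2) of that plan map onto a standard maven-shade-plugin switch, createDependencyReducedPom, which can be driven by a property that a profile flips back on. A minimal sketch of the shape of such a change (the property name and profile id here are illustrative, not the ones actually used in Spark's pom):
{code}
<properties>
  <!-- Off by default so plain package builds avoid the MSHADE-148 loop. -->
  <create.dependency.reduced.poms>false</create.dependency.reduced.poms>
</properties>
...
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <createDependencyReducedPom>${create.dependency.reduced.poms}</createDependencyReducedPom>
  </configuration>
</plugin>
...
<profiles>
  <!-- Enabled only when publishing release artifacts to Maven Central,
       where dependency reduction is needed to substitute the Scala version. -->
  <profile>
    <id>release-publishing</id>
    <properties>
      <create.dependency.reduced.poms>true</create.dependency.reduced.poms>
    </properties>
  </profile>
</profiles>
{code}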
Re: Can not build master
Patch that added test-jar dependencies: https://github.com/apache/spark/commit/bfe74b34 Patch that originally disabled dependency reduced poms: https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724 Patch that reverted the disabling of dependency reduced poms: https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com wrote: Okay I did some forensics with Sean Owen. Some things about this bug: 1. The underlying cause is that we added some code to make the tests of sub modules depend on the core tests. For unknown reasons this causes Spark to hit MSHADE-148 for *some* combinations of build profiles. 2. MSHADE-148 can be worked around by disabling building of dependency reduced poms because then the buggy code path is circumvented. Andrew Or did this in a patch on the 1.4 branch. However, that is not a tenable option for us because our *published* pom files require dependency reduction to substitute in the scala version correctly for the poms published to maven central. 3. As a result, Andrew Or reverted his patch recently, causing some package builds to start failing again (but publishing works now). 4. The reason this is not detected in our test harness or release build is that it is sensitive to the profiles enabled. The combination of profiles we enable in the test harness and release builds do not trigger this bug. The best path I see forward right now is to do the following: 1. Disable creation of dependency reduced poms by default (this doesn't matter for people doing a package build) so typical users won't have this bug. 2. Add a profile that re-enables that setting. 3. Use the above profile when publishing release artifacts to maven central. 4. Hope that we don't hit this bug for publishing. - Patrick On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote: Doesn't change anything for me. On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote: Can you try using the built-in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ - Patrick On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193) Cheers On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote: I found a solution; there might be a better one. https://github.com/apache/spark/pull/7217 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote: Yes me too On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote: This is what I got (the last line was repeated non-stop): [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote: Hi all, I am trying to build master, but it gets stuck and prints [INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml build command: mvn -DskipTests clean package Do others have the same issue? 
Regards, Tarek - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Can not build master
Can you try using the built-in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ - Patrick On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193) Cheers On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote: I found a solution; there might be a better one. https://github.com/apache/spark/pull/7217 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote: Yes me too On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote: This is what I got (the last line was repeated non-stop): [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote: Hi all, I am trying to build master, but it gets stuck and prints [INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml build command: mvn -DskipTests clean package Do others have the same issue? Regards, Tarek - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC2)
Thanks - it appears this is just a legitimate issue with the build, affecting all versions of Maven. On Fri, Jul 3, 2015 at 4:02 PM, Krishna Sankar ksanka...@gmail.com wrote: I have 3.3.3 USS-Defiant:NW ksankar$ mvn -version Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T04:57:37-07:00) Maven home: /usr/local/apache-maven-3.3.3 Java version: 1.7.0_60, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac Let me nuke it and reinstall maven. Cheers k/ On Fri, Jul 3, 2015 at 3:41 PM, Patrick Wendell pwend...@gmail.com wrote: What if you use the built-in maven (i.e. build/mvn). It might be that we require a newer version of maven than you have. The release itself is built with maven 3.3.3: https://github.com/apache/spark/blob/master/build/mvn#L72 - Patrick On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote: Yep, happens to me as well. Build loops. Cheers k/ On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote: Patrick: I used the following command: ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package The build doesn't seem to stop. Here is the tail of the build output: [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml Here is part of the stack trace for the build process: http://pastebin.com/xL2Y0QMU FYI On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.4.1
This vote is cancelled in favor of RC2. Thanks very much to Sean Owen for triaging an important bug associated with RC1. I took a look at the branch-1.4 contents and I think it's safe to cut RC2 from the head of that branch (i.e. no very-high-risk patches that I could see). JIRA management around the time of the RC voting is an interesting topic; Sean, I like your most recent proposal. Maybe we can put that on the wiki or start a DISCUSS thread to cover that topic. On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[VOTE] Release Apache Spark 1.4.1 (RC2)
Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Resolved] (SPARK-8649) Mapr repository is not defined properly
[ https://issues.apache.org/jira/browse/SPARK-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-8649. Resolution: Fixed Fix Version/s: 1.5.0 Mapr repository is not defined properly --- Key: SPARK-8649 URL: https://issues.apache.org/jira/browse/SPARK-8649 Project: Spark Issue Type: Bug Components: Build Reporter: Ashok Kumar Priority: Trivial Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Hey Krishna - this is still the current release candidate. - Patrick On Sun, Jun 28, 2015 at 12:14 PM, Krishna Sankar ksanka...@gmail.com wrote: Patrick, Haven't seen any replies on test results. I will byte ;o) - Should I test this version or is another one in the wings? Cheers k/ On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-8667) Improve Spark UI behavior at scale
[ https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604000#comment-14604000 ] Patrick Wendell commented on SPARK-8667: Thanks Sean. I looked for a while for an older JIRA on this, but couldn't find it. This is definitely a dup of SPARK-2015. Improve Spark UI behavior at scale -- Key: SPARK-8667 URL: https://issues.apache.org/jira/browse/SPARK-8667 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Patrick Wendell Assignee: Shixiong Zhu This is a parent ticket and we can create child tickets when solving specific issues. The main problem I would like to solve is the fact that the Spark UI has issues at very large scale. The worst issue is when there is a stage page with more than a few thousand tasks. In this case: 1. The page itself is very slow to load and becomes unresponsive with huge number of tasks. 2. The Scala XML output can become so large that it crashes the driver program due to OOM for a page with a huge number of tasks. I am not sure if (1) is caused by javascript slowness, or maybe just the raw amount of data sent over the wire. If it is the latter, it might be possible to add compression to the HTTP payload to help improve load time. It would be nice to reproduce+investigate these issues further and create specific sub tasks to improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8667) Improve Spark UI behavior at scale
[ https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-8667. Resolution: Duplicate Improve Spark UI behavior at scale -- Key: SPARK-8667 URL: https://issues.apache.org/jira/browse/SPARK-8667 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Patrick Wendell Assignee: Shixiong Zhu This is a parent ticket and we can create child tickets when solving specific issues. The main problem I would like to solve is the fact that the Spark UI has issues at very large scale. The worst issue is when there is a stage page with more than a few thousand tasks. In this case: 1. The page itself is very slow to load and becomes unresponsive with huge number of tasks. 2. The Scala XML output can become so large that it crashes the driver program due to OOM for a page with a huge number of tasks. I am not sure if (1) is caused by javascript slowness, or maybe just the raw amount of data sent over the wire. If it is the latter, it might be possible to add compression to the HTTP payload to help improve load time. It would be nice to reproduce+investigate these issues further and create specific sub tasks to improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Hey Tom - no one voted on this yet, so I need to keep it open until people vote. But I'm not aware of specific things we are waiting for. Anyone else? - Patrick On Fri, Jun 26, 2015 at 7:10 AM, Tom Graves tgraves...@yahoo.com wrote: So is this open for vote then or are we waiting on other things? Tom On Thursday, June 25, 2015 10:32 AM, Andrew Ash and...@andrewash.com wrote: I would guess that many tickets targeted at 1.4.1 were set that way during the tail end of the 1.4.0 voting process as people realized they wouldn't make the .0 release in time. In that case, they were likely aiming for a 1.4.x release, not necessarily 1.4.1 specifically. Maybe creating a 1.4.x target in Jira in addition to 1.4.0, 1.4.1, 1.4.2, etc would make it more clear that these tickets are targeted at some 1.4 update release rather than specifically the 1.4.1 update. On Thu, Jun 25, 2015 at 5:38 AM, Sean Owen so...@cloudera.com wrote: That makes sense to me -- there's an urgent fix to get out. I missed that part. Not that it really matters but was that expressed elsewhere? I know we tend to start the RC process even when a few more changes are still in progress, to get a first wave or two of testing done early, knowing that the RC won't be the final one. It makes sense for some issues for X to be open when an RC is cut, if they are actually truly intended for X. 44 seems like a lot, and I don't think it's good practice just because that's how it's happened before. It looks like half of them weren't actually important for 1.4.x as we're now down to 21. I don't disagree with the idea that only most of the issues targeted for version X will be in version X; the target expresses a stretch goal. Given the fast pace of change that's probably the only practical view. I think we're just missing a step then: before RC of X, ask people to review and update the target of JIRAs for X? In this case, it was a good point to untarget stuff from 1.4.x entirely; I suspect everything else should then be targeted at 1.4.2 by default with the exception of a handful that people really do intend to get in for 1.4.1 before its final release. I know it sounds like pencil-pushing, but it's a cheap way to bring some additional focus to release planning. RC time has felt like a last-call to *begin* changes ad-hoc when it would go faster if it were more intentional and constrained. Meaning faster RCs, meaning getting back to a 3-month release cycle or less, and meaning less rush to push stuff into a .0 release and less frequent need for a maintenance .1 version. So what happens if all 1.4.1-targeted JIRAs are targeted to 1.4.2? Would that miss something that is definitely being worked on for 1.4.1? On Wed, Jun 24, 2015 at 6:56 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, This is being shipped now because there is a severe bug in 1.4.0 that can cause data corruption for Parquet users. There are no blockers targeted for 1.4.1 - so I don't see that JIRA is inconsistent with shipping a release now. As for the goal of having every single targeted JIRA cleared by the time we start voting - I don't think there is broad consensus on, or cultural adoption of, that principle yet. So I do not take it as a signal this release is premature (the story has been the same for every previous release we've ever done). The fact that we hit 90/124 of issues targeted at this release means we are targeting such that we get around 70% of issues merged. That actually doesn't seem so bad to me since there is some uncertainty in the process. 
- Patrick On Wed, Jun 24, 2015 at 1:54 AM, Sean Owen so...@cloudera.com wrote: There are 44 issues still targeted for 1.4.1. None are Blockers; 12 are Critical. ~80% were opened and/or set by committers. Compare with 90 issues resolved for 1.4.1. I'm concerned that committers are targeting lots more for a release even in the short term than realistically can go in. On its face, it suggests that an RC is premature. Why is 1.4.1 being put forth for release now? It seems like people are saying they want a fair bit more time to work on 1.4.1. I suspect that in fact people would rather untarget / slip (again) these JIRAs, but it calls into question again how the targeting is consistently off by this much. What unresolved JIRAs targeted for 1.4.1 are *really* still open for 1.4.1? like, what would go badly if all 32 non-Critical JIRAs were untargeted now? is the reality that there are a handful of items to get in before the final release, and those are hopefully the ~12 critical ones? How about some review of that before we ask people to seriously test these bits? On Wed, Jun 24, 2015 at 8:37 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted
[jira] [Created] (SPARK-8667) Improve Spark UI behavior at scale
Patrick Wendell created SPARK-8667: -- Summary: Improve Spark UI behavior at scale Key: SPARK-8667 URL: https://issues.apache.org/jira/browse/SPARK-8667 Project: Spark Issue Type: Improvement Reporter: Patrick Wendell Assignee: Shixiong Zhu This is a parent ticket and we can create child tickets when solving specific issues. The main problem I would like to solve is the fact that the Spark UI has issues at very large scale. The worst issue is when there is a stage page with more than a few thousand tasks. In this case: 1. The page itself is very slow to load and becomes unresponsive with huge number of tasks. 2. The Scala XML output can become so large that it crashes the driver program due to OOM for a page with a huge number of tasks. I am not sure if (1) is caused by javascript slowness, or maybe just the raw amount of data sent over the wire. If it is the latter, it might be possible to add compression to the HTTP payload to help improve load time. It would be nice to reproduce+investigate these issues further and create specific sub tasks to improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
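On point (1), if the raw payload size turns out to be the culprit, compression is cheap to experiment with: the Spark UI is served by Jetty, and Jetty can gzip responses by wrapping the root handler. A minimal standalone sketch, assuming a Jetty version that ships org.eclipse.jetty.server.handler.gzip.GzipHandler (the class has moved packages across Jetty releases) and with DefaultHandler standing in for the UI's real handler wiring:
{code}
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.server.handler.DefaultHandler
import org.eclipse.jetty.server.handler.gzip.GzipHandler

object GzipUiSketch {
  def main(args: Array[String]): Unit = {
    val server = new Server(8080)
    // Compress large text responses (e.g. a stage page with thousands of
    // task rows) before they go over the wire; binary content is left alone.
    val gzip = new GzipHandler()
    gzip.setIncludedMimeTypes("text/html", "application/json")
    gzip.setHandler(new DefaultHandler()) // stand-in for the UI's handlers
    server.setHandler(gzip)
    server.start()
    server.join()
  }
}
{code}
Note this would only help the over-the-wire cost; it would not address the rendering or OOM issues described in (2).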
[jira] [Updated] (SPARK-8667) Improve Spark UI behavior at scale
[ https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8667: --- Component/s: Web UI Improve Spark UI behavior at scale -- Key: SPARK-8667 URL: https://issues.apache.org/jira/browse/SPARK-8667 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Patrick Wendell Assignee: Shixiong Zhu This is a parent ticket and we can create child tickets when solving specific issues. The main problem I would like to solve is the fact that the Spark UI has issues at very large scale. The worst issue is when there is a stage page with more than a few thousand tasks. In this case: 1. The page itself is very slow to load and becomes unresponsive with huge number of tasks. 2. The Scala XML output can become so large that it crashes the driver program due to OOM for a page with a huge number of tasks. I am not sure if (1) is caused by javascript slowness, or maybe just the raw amount of data sent over the wire. If it is the latter, it might be possible to add compression to the HTTP payload to help improve load time. It would be nice to reproduce+investigate these issues further and create specific sub tasks to improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Hey Sean, This is being shipped now because there is a severe bug in 1.4.0 that can cause data corruption for Parquet users. There are no blockers targeted for 1.4.1 - so I don't see that JIRA is inconsistent with shipping a release now. As for the goal of having every single targeted JIRA cleared by the time we start voting - I don't think there is broad consensus on, or cultural adoption of, that principle yet. So I do not take it as a signal this release is premature (the story has been the same for every previous release we've ever done). The fact that we hit 90/124 of issues targeted at this release means we are targeting such that we get around 70% of issues merged. That actually doesn't seem so bad to me since there is some uncertainty in the process. - Patrick On Wed, Jun 24, 2015 at 1:54 AM, Sean Owen so...@cloudera.com wrote: There are 44 issues still targeted for 1.4.1. None are Blockers; 12 are Critical. ~80% were opened and/or set by committers. Compare with 90 issues resolved for 1.4.1. I'm concerned that committers are targeting lots more for a release even in the short term than realistically can go in. On its face, it suggests that an RC is premature. Why is 1.4.1 being put forth for release now? It seems like people are saying they want a fair bit more time to work on 1.4.1. I suspect that in fact people would rather untarget / slip (again) these JIRAs, but it calls into question again how the targeting is consistently off by this much. What unresolved JIRAs targeted for 1.4.1 are *really* still open for 1.4.1? like, what would go badly if all 32 non-Critical JIRAs were untargeted now? is the reality that there are a handful of items to get in before the final release, and those are hopefully the ~12 critical ones? How about some review of that before we ask people to seriously test these bits? On Wed, Jun 24, 2015 at 8:37 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[VOTE] Release Apache Spark 1.4.1
Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8494: --- Assignee: (was: Patrick Wendell) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Attachments: spark-test-case.zip I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a ClassNotFoundException otherwise. I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT. Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}
Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
  java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  java.security.AccessController.doPrivileged(Native Method)
  java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  java.lang.Class.forName0(Native Method)
  java.lang.Class.forName(Class.java:270)
  org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
  java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
  java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}
{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"
val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database" % "h2" % "1.4.187",
  "com.typesafe.akka" %% "akka-actor" % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j" % akkaVersion,
  "ch.qos.logback" % "logback-classic" % "1.0.13",
  "io.spray" %% "spray-can" % sprayVersion,
  "io.spray" %% "spray-routing" % sprayVersion,
  "io.spray" %% "spray-json" % "1.3.1",
  "com.databricks" %% "spark-csv" % "1.0.3",
  "org.specs2" %% "specs2" % "2.4.17" % "test",
  "org.specs2" %% "specs2-junit" % "2.4.17" % "test",
  "io.spray" %% "spray-testkit" % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit" % akkaVersion % "test",
  "junit" % "junit" % "4.12" % "test"
)

scalacOptions ++= Seq(
  "-unchecked",
  "-deprecation",
  "-Xlint",
  "-Ywarn-dead-code",
  "-language:_",
  "-target:jvm-1.7",
  "-encoding", "UTF-8"
)

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7292) Provide operator to truncate lineage without persisting RDD's
[ https://issues.apache.org/jira/browse/SPARK-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7292: --- Assignee: Andrew Or Provide operator to truncate lineage without persisting RDD's - Key: SPARK-7292 URL: https://issues.apache.org/jira/browse/SPARK-7292 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Patrick Wendell Assignee: Andrew Or Checkpointing exists in Spark to truncate a lineage chain. I've heard requests from some users to allow truncation of lineage in a way that is cheap and doesn't serialize and persist the RDD. This is possible if the user is willing to forgo fault tolerance for that RDD (for instance, for shorter running jobs or ones that use a small number of machines). It's pretty easy to allow this so we should look into it for Spark 1.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
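A usage sketch of what such an operator looks like from user code; the name localCheckpoint matches the RDD API that eventually shipped in Spark 1.5 for this ticket, and the surrounding job is purely illustrative:
{code}
import org.apache.spark.{SparkConf, SparkContext}

object LineageTruncationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("lineage-truncation"))

    // Iterative jobs like this build up long lineage chains.
    var rdd = sc.parallelize(1 to 1000, 10)
    for (_ <- 1 to 100) rdd = rdd.map(_ + 1)

    // Classic checkpoint() would serialize the RDD to a reliable store.
    // The cheap variant truncates lineage using executor-local storage,
    // trading fault tolerance for speed:
    rdd.localCheckpoint()
    rdd.count() // materializes the RDD, after which the chain is pruned

    sc.stop()
  }
}
{code}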
[jira] [Commented] (SPARK-8416) Thread dump page should highlight Spark executor threads
[ https://issues.apache.org/jira/browse/SPARK-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592411#comment-14592411 ] Patrick Wendell commented on SPARK-8416: It would also be nice to put those threads first in the list. Thread dump page should highlight Spark executor threads Key: SPARK-8416 URL: https://issues.apache.org/jira/browse/SPARK-8416 Project: Spark Issue Type: Bug Components: Web UI Reporter: Josh Rosen On the Spark thread dump page, it's hard to pick out executor threads from other system threads. The UI should employ some color coding or highlighting to make this more apparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8434) Add a pretty parameter to show
[ https://issues.apache.org/jira/browse/SPARK-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8434: --- Component/s: SQL Add a pretty parameter to show Key: SPARK-8434 URL: https://issues.apache.org/jira/browse/SPARK-8434 Project: Spark Issue Type: Bug Components: SQL Reporter: Shixiong Zhu Sometimes the user may want to show the complete content of cells, such as sql("set -v") -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
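Concretely, the behavior being asked for is roughly the truncate flag that show later grew. A hedged sketch (the two-argument show(numRows, truncate) overload is how this surfaced in later Spark releases, not necessarily under the "pretty" name proposed here), assuming a Spark shell where sqlContext is predefined:
{code}
// Default rendering truncates long cell values to 20 characters.
sqlContext.sql("set -v").show()

// With truncation disabled, the complete content of each cell is printed.
sqlContext.sql("set -v").show(100, false)
{code}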
[jira] [Updated] (SPARK-8450) PySpark write.parquet raises Unsupported datatype DecimalType()
[ https://issues.apache.org/jira/browse/SPARK-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8450: --- Component/s: SQL PySpark PySpark write.parquet raises Unsupported datatype DecimalType() --- Key: SPARK-8450 URL: https://issues.apache.org/jira/browse/SPARK-8450 Project: Spark Issue Type: Bug Components: PySpark, SQL Environment: Spark 1.4.0 on Debian Reporter: Peter Hoffmann I'm getting an Exception when I try to save a DataFrame with a DecimalType as a Parquet file. Minimal Example:
{code}
from decimal import Decimal
from pyspark.sql import SQLContext
from pyspark.sql.types import *

sqlContext = SQLContext(sc)
schema = StructType([
    StructField('id', LongType()),
    StructField('value', DecimalType())])
rdd = sc.parallelize([[1, Decimal(0.5)], [2, Decimal(2.9)]])
df = sqlContext.createDataFrame(rdd, schema)
df.write.parquet("hdfs://srv:9000/user/ph/decimal.parquet", 'overwrite')
{code}
Stack Trace:
{code}
---
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-19-a77dac8de5f3> in <module>()
----> 1 sr.write.parquet("hdfs://srv:9000/user/ph/decimal.parquet", 'overwrite')

/home/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/readwriter.pyc in parquet(self, path, mode)
    367         :param mode: one of `append`, `overwrite`, `error`, `ignore` (default: error)
    368
--> 369         return self._jwrite.mode(mode).parquet(path)
    370
    371     @since(1.4)

/home/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538             self.target_id, self.name)
    539
    540         for temp_arg in temp_args:

/home/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    298             raise Py4JJavaError(
    299                 'An error occurred while calling {0}{1}{2}.\n'.
--> 300                 format(target_id, '.', name), value)
    301         else:
    302             raise Py4JError(

Py4JJavaError: An error occurred while calling o361.parquet.
: org.apache.spark.SparkException: Job aborted.
{code}
{code}
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:138)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:114)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:332)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:281)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 158 in stage 35.0 failed 4 times, most recent failure: Lost task 158.3 in stage 35.0 (TID 2736, 10.2.160.14
{code}
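A workaround worth noting for readers hitting this (hedged; whether it applies depends on the Spark version): the Parquet writer in this era only handled fixed-precision decimals, so declaring an explicit precision and scale instead of the unlimited-precision DecimalType() avoids the error. A sketch in Scala, with the precision and scale values chosen purely for illustration (the PySpark spelling, DecimalType(10, 2), is analogous):
{code}
import org.apache.spark.sql.types._

// Fixed precision/scale instead of unlimited-precision DecimalType():
// Parquet can encode this as a fixed-length decimal.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("value", DecimalType(10, 2))))
{code}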
[jira] [Updated] (SPARK-8427) Incorrect ACL checking for partitioned table in Spark SQL-1.4
[ https://issues.apache.org/jira/browse/SPARK-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8427: --- Priority: Critical (was: Blocker) Incorrect ACL checking for partitioned table in Spark SQL-1.4 - Key: SPARK-8427 URL: https://issues.apache.org/jira/browse/SPARK-8427 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Environment: CentOS 6 OS X 10.9.5, Hive-0.13.1, Spark-1.4, Hadoop 2.6.0 Reporter: Karthik Subramanian Priority: Critical Labels: security Problem statement: When querying a partitioned table using Spark SQL (version 1.4.0), an access-denied exception is raised on partitions the user doesn't belong to (user permissions are controlled using HDFS ACLs). The same query works correctly in Hive. Use case: to support multitenancy, consider a table containing multiple customers, each with multiple facilities, partitioned by customer and facility. A user belonging to one facility must not have access to other facilities; this is enforced with HDFS ACLs on the corresponding directories. When querying the table as 'user1' (belonging to 'facility1' and 'customer1') with a 'where' clause restricting the query to that partition, only access to the corresponding directory should be verified, not the entire table. The above use case works as expected with the Hive client, versions 0.13.1 and 1.1.0. The query used:
{code}
select count(*) from customertable where customer='customer1' and facility='facility1'
{code}
Below is the exception received in spark-shell:
{code}
org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=READ_EXECUTE, inode="/data/customertable/customer=customer2/facility=facility2":root:supergroup:drwxrwx---:group::r-x,group:facility2:rwx
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkAccessAcl(FSPermissionChecker.java:351)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:253)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6419)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4954)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4915)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:826)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:612)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1971)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
	at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755
{code}
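For anyone reproducing this from PySpark rather than spark-shell, a minimal sketch of the reporter's scenario (assuming a Hive-enabled Spark build and the partitioned customertable described above):
{code}
# Repro sketch: the 'where' clause restricts the query to one partition, so in
# principle only that partition's directory should be listed and ACL-checked.
from pyspark.sql import HiveContext

hiveContext = HiveContext(sc)  # assumes an existing SparkContext `sc`
count = hiveContext.sql(
    "select count(*) from customertable "
    "where customer='customer1' and facility='facility1'").collect()
{code}
Per the report, Spark 1.4 appears to list all partition directories up front, which is what trips the ACL on the customer2/facility2 directory even though the query never touches it.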
[jira] [Updated] (SPARK-5787) Protect JVM from some not-important exceptions
[ https://issues.apache.org/jira/browse/SPARK-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5787: --- Target Version/s: 1.5.0 (was: 1.4.0) Protect JVM from some not-important exceptions -- Key: SPARK-5787 URL: https://issues.apache.org/jira/browse/SPARK-5787 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Davies Liu Priority: Critical Any uncaught exception will shut down the executor JVM, so we should catch those exceptions that do not seriously hurt the executor (i.e., the executor is still functional). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
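To make the proposal concrete, an illustrative sketch of the intended behavior (not Spark's actual code; the names here are hypothetical):
{code}
import logging

log = logging.getLogger("executor")

# Errors that genuinely compromise the process should still propagate.
FATAL = (MemoryError, SystemExit, KeyboardInterrupt)

def run_task_safely(task):
    try:
        return task()
    except FATAL:
        raise  # fatal: let the process die
    except Exception as e:
        # non-fatal: the executor is still functional, so log and carry on
        log.warning("Task failed but executor is still functional: %s", e)
        return None
{code}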
[jira] [Updated] (SPARK-7448) Implement custom byte array serializer for use in PySpark shuffle
[ https://issues.apache.org/jira/browse/SPARK-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7448: --- Target Version/s: 1.5.0 (was: 1.4.0) Implement custom byte array serializer for use in PySpark shuffle Key: SPARK-7448 URL: https://issues.apache.org/jira/browse/SPARK-7448 Project: Spark Issue Type: Improvement Components: PySpark, Shuffle Reporter: Josh Rosen Priority: Minor PySpark's shuffle typically shuffles Java RDDs that contain byte arrays. We should implement a custom Serializer for use in these shuffles. This will allow us to take advantage of shuffle optimizations like SPARK-7311 for PySpark without requiring users to change the default serializer to KryoSerializer (avoiding that requirement is useful for JobServer-type applications). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
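For context, the global workaround this issue wants to make unnecessary is switching the default serializer via the standard spark.serializer config key; setting it application-wide is what breaks JobServer-type shared deployments:
{code}
from pyspark import SparkConf, SparkContext

# Current workaround: make Kryo the default serializer so byte-array shuffles
# hit the optimized path. A dedicated byte-array serializer would avoid
# changing this application-wide setting.
conf = (SparkConf()
        .setAppName("kryo-workaround")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
sc = SparkContext(conf=conf)
{code}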
[jira] [Updated] (SPARK-7078) Cache-aware binary processing in-memory sort
[ https://issues.apache.org/jira/browse/SPARK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7078: --- Target Version/s: 1.5.0 (was: 1.4.0) Cache-aware binary processing in-memory sort Key: SPARK-7078 URL: https://issues.apache.org/jira/browse/SPARK-7078 Project: Spark Issue Type: New Feature Components: Shuffle Reporter: Reynold Xin Assignee: Josh Rosen A cache-friendly sort algorithm that can eventually be used for: * sort-merge join * shuffle See the original AlphaSort paper: http://research.microsoft.com/pubs/68249/alphasort.doc Note that the state of the art for sorting has improved quite a bit since then, but we can easily optimize the sorting algorithm itself later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
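A toy illustration of the cache-friendly idea from the AlphaSort paper (not Spark's implementation): sort compact (key-prefix, pointer) pairs instead of full records, so most comparisons touch a small, cache-resident array.
{code}
# Toy sketch: records stay in place; we sort small (prefix, index) pairs.
records = [b"charlie record", b"alpha record", b"bravo record"]

# An 8-byte key prefix fits in a machine word and decides most comparisons.
pairs = [(rec[:8], i) for i, rec in enumerate(records)]
pairs.sort()

# A real sorter would fall back to full-record comparison on prefix ties.
sorted_records = [records[i] for _, i in pairs]
{code}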
[jira] [Updated] (SPARK-7041) Avoid writing empty files in BypassMergeSortShuffleWriter
[ https://issues.apache.org/jira/browse/SPARK-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7041: --- Target Version/s: 1.5.0 (was: 1.4.0) Avoid writing empty files in BypassMergeSortShuffleWriter - Key: SPARK-7041 URL: https://issues.apache.org/jira/browse/SPARK-7041 Project: Spark Issue Type: Improvement Components: Shuffle Reporter: Josh Rosen Assignee: Josh Rosen In BypassMergeSortShuffleWriter, we may end up opening disk writer files for empty partitions. This occurs because we manually call {{open()}} after creating the writer, causing serialization and compression output streams to be created; these streams may write headers to the output stream, resulting in non-zero-length files for partitions that contain no records. This is unnecessary, since the disk object writer will automatically open itself when the first write is performed. Removing this eager {{open()}} call and rewriting the consumers to cope with the absence of files for empty partitions results in a large performance benefit for certain sparse workloads when using sort-based shuffle. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
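The lazy-open pattern the fix relies on, sketched in miniature (illustrative only, not Spark's DiskBlockObjectWriter; the header bytes stand in for the serialization/compression headers mentioned above):
{code}
class LazyWriter(object):
    """Opens the underlying file only on the first write, so partitions that
    receive no records never produce header-only files on disk."""

    def __init__(self, path):
        self.path = path
        self._file = None  # no file (and no headers) until the first write

    def write(self, record):
        if self._file is None:
            self._file = open(self.path, "wb")
            self._file.write(b"HDR")  # stand-in for stream headers
        self._file.write(record)

    def close(self):
        if self._file is not None:
            self._file.close()
{code}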
[jira] [Commented] (SPARK-6393) Extra RPC to the AM during killExecutor invocation
[ https://issues.apache.org/jira/browse/SPARK-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590426#comment-14590426 ] Patrick Wendell commented on SPARK-6393: [~sandyryza] I'm un-targeting this. If you are planning on working on this for a specific version, feel free to retarget. Extra RPC to the AM during killExecutor invocation -- Key: SPARK-6393 URL: https://issues.apache.org/jira/browse/SPARK-6393 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 1.3.1 Reporter: Sandy Ryza This was introduced by SPARK-6325 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org