[jira] [Updated] (SPARK-5648) suppot alter view/table tableName unset tblproperties(k)
[ https://issues.apache.org/jira/browse/SPARK-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DoingDone9 updated SPARK-5648: -- Description: make hivecontext support unset tblproperties like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) was: make hivecontext support unset tblproperties like : suppot alter view/table tableName unset tblproperties(k) - Key: SPARK-5648 URL: https://issues.apache.org/jira/browse/SPARK-5648 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: DoingDone9 make hivecontext support unset tblproperties like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5648) suppot alter ... unset tblproperties(key)
[ https://issues.apache.org/jira/browse/SPARK-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DoingDone9 updated SPARK-5648: -- Summary: suppot alter ... unset tblproperties(key) (was: suppot alter ... unset tblproperties(k) ) suppot alter ... unset tblproperties(key) -- Key: SPARK-5648 URL: https://issues.apache.org/jira/browse/SPARK-5648 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: DoingDone9 make hivecontext support unset tblproperties like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5648) suppot alter view/table tableName unset tblproperties(k)
[ https://issues.apache.org/jira/browse/SPARK-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DoingDone9 updated SPARK-5648: -- Description: make hivecontext support unset tblproperties like : suppot alter view/table tableName unset tblproperties(k) - Key: SPARK-5648 URL: https://issues.apache.org/jira/browse/SPARK-5648 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: DoingDone9 make hivecontext support unset tblproperties like : -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5648) suppot alter ... unset tblproperties(k)
[ https://issues.apache.org/jira/browse/SPARK-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DoingDone9 updated SPARK-5648: -- Summary: suppot alter ... unset tblproperties(k) (was: suppot alter view/table tableName unset tblproperties(k) ) suppot alter ... unset tblproperties(k) Key: SPARK-5648 URL: https://issues.apache.org/jira/browse/SPARK-5648 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: DoingDone9 make hivecontext support unset tblproperties like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5648) support alter ... unset tblproperties(key)
[ https://issues.apache.org/jira/browse/SPARK-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DoingDone9 updated SPARK-5648: -- Summary: support alter ... unset tblproperties(key) (was: suppot alter ... unset tblproperties(key) ) support alter ... unset tblproperties(key) --- Key: SPARK-5648 URL: https://issues.apache.org/jira/browse/SPARK-5648 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: DoingDone9 make hivecontext support unset tblproperties(key) like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5648) suppot alter view/table tableName unset tblproperties(k)
DoingDone9 created SPARK-5648: - Summary: suppot alter view/table tableName unset tblproperties(k) Key: SPARK-5648 URL: https://issues.apache.org/jira/browse/SPARK-5648 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: DoingDone9 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5648) suppot alter ... unset tblproperties(key)
[ https://issues.apache.org/jira/browse/SPARK-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DoingDone9 updated SPARK-5648: -- Description: make hivecontext support unset tblproperties(key) like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) was: make hivecontext support unset tblproperties like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) suppot alter ... unset tblproperties(key) -- Key: SPARK-5648 URL: https://issues.apache.org/jira/browse/SPARK-5648 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: DoingDone9 make hivecontext support unset tblproperties(key) like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2789) Apply names to RDD to becoming SchemaRDD
[ https://issues.apache.org/jira/browse/SPARK-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308825#comment-14308825 ] Apache Spark commented on SPARK-2789: - User 'dwmclary' has created a pull request for this issue: https://github.com/apache/spark/pull/4421 Apply names to RDD to becoming SchemaRDD Key: SPARK-2789 URL: https://issues.apache.org/jira/browse/SPARK-2789 Project: Spark Issue Type: New Feature Components: SQL Reporter: Davies Liu In order to simplify applying a schema, we could add an API called applyNames(), which will infer the types in the RDD, create a schema with the given names, and then apply that schema to it to produce a SchemaRDD. The names could be provided as a String, with names separated by spaces. For example: rdd = sc.parallelize([("Alice", 10)]) srdd = sqlCtx.applyNames(rdd, "name age") Users don't need to create a case class or StructType to have all the power of Spark SQL. The string presentation of the schema could also support nested structures (MapType, ArrayType and StructType), for example: name age address(city zip) likes[title stars] props{[value type]} It is equivalent to the unnamed schema:
root
|-- name
|-- age
|-- address
|-- |-- city
|-- |-- zip
|-- likes
|-- |-- element
|-- |-- |-- title
|-- |-- |-- stars
|-- props
|-- |-- key:
|-- |-- value:
|-- |-- |-- element
|-- |-- |-- |-- value
|-- |-- |-- |-- type
All the field names are separated by spaces; the structure of a field (if it is a nested type) follows the name without a space, and should start with ( (StructType), [ (ArrayType) or { (MapType). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
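For context, a minimal Scala sketch of the boilerplate applyNames() would remove, using the applySchema API of that era (the exact import paths for the schema types moved between releases, so treat them as approximate):
{code}
import org.apache.spark.sql._

val sqlCtx = new SQLContext(sc)

// Today: build a Row RDD and spell out the StructType by hand...
val rdd = sc.parallelize(Seq(("Alice", 10))).map { case (name, age) => Row(name, age) }
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)))
val srdd = sqlCtx.applySchema(rdd, schema)

// ...whereas the proposal would infer the types and take only the names:
// val srdd = sqlCtx.applyNames(rdd, "name age")
{code}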
[jira] [Commented] (SPARK-5648) support alter ... unset tblproperties(key)
[ https://issues.apache.org/jira/browse/SPARK-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308835#comment-14308835 ] Apache Spark commented on SPARK-5648: - User 'DoingDone9' has created a pull request for this issue: https://github.com/apache/spark/pull/4423 support alter ... unset tblproperties(key) --- Key: SPARK-5648 URL: https://issues.apache.org/jira/browse/SPARK-5648 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0 Reporter: DoingDone9 make hivecontext support unset tblproperties(key) like : alter view viewName unset tblproperties(k) alter table tableName unset tblproperties(k) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
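To make the requested behavior concrete, a hedged sketch of the statements HiveContext should accept once this lands (table, view and key names are placeholders):
{code}
import org.apache.spark.sql.hive.HiveContext

val hiveCtx = new HiveContext(sc)
hiveCtx.sql("ALTER TABLE tableName SET TBLPROPERTIES ('k' = 'v')")  // setting properties already works
hiveCtx.sql("ALTER TABLE tableName UNSET TBLPROPERTIES ('k')")      // what this issue adds
hiveCtx.sql("ALTER VIEW viewName UNSET TBLPROPERTIES ('k')")
{code}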
[jira] [Commented] (SPARK-5598) Model import/export for ALS
[ https://issues.apache.org/jira/browse/SPARK-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308836#comment-14308836 ] Apache Spark commented on SPARK-5598: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/4422 Model import/export for ALS --- Key: SPARK-5598 URL: https://issues.apache.org/jira/browse/SPARK-5598 Project: Spark Issue Type: Sub-task Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Assignee: Xiangrui Meng Please see parent JIRA for details on model import/export plans. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5655) YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode
Andrew Rowson created SPARK-5655: Summary: YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode Key: SPARK-5655 URL: https://issues.apache.org/jira/browse/SPARK-5655 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.2.0 Environment: Both CDH5.3.0 and CDH5.1.3, latest build on branch-1.2 Reporter: Andrew Rowson When running a Spark job on a YARN cluster which doesn't run containers under the same user as the nodemanager, and also when using the YARN auxiliary shuffle service, jobs fail with something similar to:
{code}
java.io.FileNotFoundException: /data/9/yarn/nm/usercache/username/appcache/application_1423069181231_0032/spark-c434a703-7368-4a05-9e99-41e77e564d1d/3e/shuffle_0_0_0.index (Permission denied)
{code}
The root cause of this is here: https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/util/Utils.scala#L287 Spark will attempt to chmod 700 any application directories it creates during the job, which includes files created in the nodemanager's usercache directory. The owner of these files is the container UID, which on a secure cluster is the name of the user creating the job, and on a nonsecure cluster with yarn.nodemanager.container-executor.class configured is the value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user. The problem with this is that the auxiliary shuffle manager runs as part of the nodemanager, which is typically running as the user 'yarn', and it can't access these files that are only owner-readable. YARN already attempts to secure files created under appcache while keeping them readable by the nodemanager, by setting the group of the appcache directory to 'yarn' and also setting the setgid flag. This means that files and directories created under it should also have the 'yarn' group, so normally the nodemanager should also be able to read these files, but Spark's chmod 700 wipes this out. I'm not sure what the right approach is here. Commenting out the chmod 700 functionality makes this work on YARN, and still makes the application files readable only by the owner and the group:
{code}
/data/1/yarn/nm/usercache/username/appcache/application_1423247249655_0001/spark-c7a6fc0f-e5df-49cf-a8f5-e51a1ca087df/0c # ls -lah
total 206M
drwxr-s---  2 nobody yarn 4.0K Feb 6 18:30 .
drwxr-s--- 12 nobody yarn 4.0K Feb 6 18:30 ..
-rw-r----- 1 nobody yarn 206M Feb 6 18:30 shuffle_0_0_0.data
{code}
But this may not be the right approach on non-YARN. Perhaps an additional check for whether this chmod 700 step is necessary (i.e. non-YARN) is required. Sadly, I don't have a non-YARN environment to test, otherwise I'd be able to suggest a patch. I believe this is a related issue in the MapReduce framework: https://issues.apache.org/jira/browse/MAPREDUCE-3728 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
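For readers following along, a sketch of the permission tightening the report points at, plus one possible shape of a fix. chmod700 mirrors what such a utility does via java.io.File; the runningOnYarn guard is a hypothetical parameter, not Spark's actual API:
{code}
import java.io.File

// Equivalent of "chmod 700": owner-only read/write/execute.
def chmod700(file: File): Boolean = {
  file.setReadable(false, false) && file.setReadable(true, true) &&
  file.setWritable(false, false) && file.setWritable(true, true) &&
  file.setExecutable(false, false) && file.setExecutable(true, true)
}

def secureLocalDir(dir: File, runningOnYarn: Boolean): Unit = {
  // Under YARN the nodemanager already secures appcache dirs via group
  // ownership plus setgid; chmod 700 would strip the 'yarn' group's access,
  // so skip the tightening there.
  if (!runningOnYarn) {
    chmod700(dir)
  }
}
{code}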
[jira] [Closed] (SPARK-5593) Replace BlockManager listener with Executor listener in ExecutorAllocationListener
[ https://issues.apache.org/jira/browse/SPARK-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5593. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Lianhui Wang Target Version/s: 1.3.0 Replace BlockManager listener with Executor listener in ExecutorAllocationListener -- Key: SPARK-5593 URL: https://issues.apache.org/jira/browse/SPARK-5593 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Lianhui Wang Assignee: Lianhui Wang Fix For: 1.3.0 More strictly, in ExecutorAllocationListener we need to replace onBlockManagerAdded and onBlockManagerRemoved with onExecutorAdded and onExecutorRemoved, because the executor events express these meanings more accurately. For example, in SPARK-5529 the BlockManager has been removed but the executor still exists. [~andrewor14] [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
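For illustration, a hedged sketch of the listener shape being proposed, using the executor events from the SparkListener API (simplified; not the actual ExecutorAllocationListener patch):
{code}
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

class ExecutorTrackingListener extends SparkListener {
  private val executorIds = mutable.Set[String]()

  // Track the executor lifecycle directly, instead of inferring it from
  // onBlockManagerAdded/onBlockManagerRemoved.
  override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
    executorIds += added.executorId

  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit =
    executorIds -= removed.executorId
}
{code}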
[jira] [Updated] (SPARK-5653) in ApplicationMaster rename isDriver to isClusterMode
[ https://issues.apache.org/jira/browse/SPARK-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5653: - Affects Version/s: 1.2.0 in ApplicationMaster rename isDriver to isClusterMode - Key: SPARK-5653 URL: https://issues.apache.org/jira/browse/SPARK-5653 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Lianhui Wang Assignee: Lianhui Wang Fix For: 1.3.0 In ApplicationMaster, rename isDriver to isClusterMode: Client already uses isClusterMode, so ApplicationMaster should stay consistent with it and use isClusterMode as well; isClusterMode is also easier to understand. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5470) use defaultClassLoader of Serializer to load classes of classesToRegister in KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-5470: -- Fix Version/s: 1.3.0 use defaultClassLoader of Serializer to load classes of classesToRegister in KryoSerializer --- Key: SPARK-5470 URL: https://issues.apache.org/jira/browse/SPARK-5470 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Lianhui Wang Assignee: Lianhui Wang Fix For: 1.3.0, 1.4.0 Currently KryoSerializer loads the classes in classesToRegister at the time of its initialization. When we set spark.kryo.classesToRegister=class1, it will throw SparkException("Failed to load class to register with Kryo"), because during KryoSerializer's initialization the classLoader cannot include classes from the user's jars. We need to use the defaultClassLoader of Serializer in newKryo(), because the executor resets the defaultClassLoader of Serializer after the Serializer's initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
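A hedged sketch of the fix described above: resolve the registered class names lazily in newKryo(), through the serializer's defaultClassLoader (which the executor installs after construction), rather than in the constructor. Simplified, not the exact patch:
{code}
import com.esotericsoftware.kryo.Kryo

def registerUserClasses(kryo: Kryo,
                        classNames: Seq[String],
                        defaultClassLoader: Option[ClassLoader]): Unit = {
  val loader = defaultClassLoader.getOrElse(Thread.currentThread.getContextClassLoader)
  classNames.foreach { name =>
    // Loading here, not at serializer construction time, means classes from
    // the user's jars are visible once the executor has set its class loader.
    kryo.register(Class.forName(name, true, loader))
  }
}
{code}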
[jira] [Updated] (SPARK-5593) Replace BlockManager listener with Executor listener in ExecutorAllocationListener
[ https://issues.apache.org/jira/browse/SPARK-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5593: - Affects Version/s: 1.2.0 Replace BlockManager listener with Executor listener in ExecutorAllocationListener -- Key: SPARK-5593 URL: https://issues.apache.org/jira/browse/SPARK-5593 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: Lianhui Wang More strictly, in ExecutorAllocationListener we need to replace onBlockManagerAdded and onBlockManagerRemoved with onExecutorAdded and onExecutorRemoved, because the executor events express these meanings more accurately. For example, in SPARK-5529 the BlockManager has been removed but the executor still exists. [~andrewor14] [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5653) in ApplicationMaster rename isDriver to isClusterMode
[ https://issues.apache.org/jira/browse/SPARK-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5653: - Affects Version/s: (was: 1.2.0) 1.0.0 in ApplicationMaster rename isDriver to isClusterMode - Key: SPARK-5653 URL: https://issues.apache.org/jira/browse/SPARK-5653 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Lianhui Wang Assignee: Lianhui Wang Fix For: 1.3.0 In ApplicationMaster, rename isDriver to isClusterMode: Client already uses isClusterMode, so ApplicationMaster should stay consistent with it and use isClusterMode as well; isClusterMode is also easier to understand. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5653) in ApplicationMaster rename isDriver to isClusterMode
[ https://issues.apache.org/jira/browse/SPARK-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5653. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Lianhui Wang Target Version/s: 1.3.0 in ApplicationMaster rename isDriver to isClusterMode - Key: SPARK-5653 URL: https://issues.apache.org/jira/browse/SPARK-5653 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Lianhui Wang Assignee: Lianhui Wang Fix For: 1.3.0 In ApplicationMaster, rename isDriver to isClusterMode: Client already uses isClusterMode, so ApplicationMaster should stay consistent with it and use isClusterMode as well; isClusterMode is also easier to understand. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5396) Syntax error in spark scripts on windows.
[ https://issues.apache.org/jira/browse/SPARK-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5396. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Masayoshi TSUZUKI Target Version/s: 1.3.0 (was: 1.2.0) Syntax error in spark scripts on windows. - Key: SPARK-5396 URL: https://issues.apache.org/jira/browse/SPARK-5396 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.2.0 Environment: Window 7 and Window 8.1. Reporter: Vladimir Protsenko Assignee: Masayoshi TSUZUKI Priority: Critical Fix For: 1.3.0 Attachments: windows7.png, windows8.1.png I made the following steps: 1. downloaded and installed Scala 2.11.5 2. downloaded spark 1.2.0 by git clone git://github.com/apache/spark.git 3. run dev/change-version-to-2.11.sh and mvn -Dscala-2.11 -DskipTests clean package (in git bash) After installation tried to run spark-shell.cmd in cmd shell and it says there is a syntax error in file. The same with spark-shell2.cmd, spark-submit.cmd and spark-submit2.cmd. !windows7.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5656) NegativeArraySizeException in EigenValueDecomposition.symmetricEigs for large n and/or large k
[ https://issues.apache.org/jira/browse/SPARK-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309684#comment-14309684 ] Apache Spark commented on SPARK-5656: - User 'mbittmann' has created a pull request for this issue: https://github.com/apache/spark/pull/4433 NegativeArraySizeException in EigenValueDecomposition.symmetricEigs for large n and/or large k -- Key: SPARK-5656 URL: https://issues.apache.org/jira/browse/SPARK-5656 Project: Spark Issue Type: Bug Components: MLlib Reporter: Mark Bittmann Priority: Minor Large values of n or k in EigenValueDecomposition.symmetricEigs will fail with a NegativeArraySizeException. Specifically, this occurs when 2*n*k > Integer.MAX_VALUE. These values are currently unchecked and allow the array to be initialized to a size greater than Integer.MAX_VALUE. I have written the 'require' below to fail this condition gracefully. I will submit a pull request. require(ncv * n.toLong < Integer.MAX_VALUE, "Product of 2*k*n must be smaller than " + s"Integer.MAX_VALUE. Found required eigenvalues k = $k and matrix dimension n = $n") Here is the exception that occurs from computeSVD with large k and/or n: Exception in thread "main" java.lang.NegativeArraySizeException at org.apache.spark.mllib.linalg.EigenValueDecomposition$.symmetricEigs(EigenValueDecomposition.scala:85) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:258) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:190) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
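The arithmetic behind the guard, as a standalone illustration (values are made up; the point is that the Int product wraps around before the workspace array is allocated):
{code}
val n = 200000          // matrix dimension
val k = 10000           // requested eigenvalues
val ncv = 2 * k
println(ncv * n)        // -294967296: Int multiplication overflowed
println(ncv * n.toLong) // 4000000000: widened to Long, as the proposed require() does
{code}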
[jira] [Updated] (SPARK-540) Add API to customize in-memory representation of RDDs
[ https://issues.apache.org/jira/browse/SPARK-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-540: --- Component/s: Spark Core Add API to customize in-memory representation of RDDs - Key: SPARK-540 URL: https://issues.apache.org/jira/browse/SPARK-540 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Matei Zaharia Right now the choice between serialized caching and just Java objects in dev is fine, but it might be cool to also support structures such as column-oriented storage through arrays of primitives without forcing it through the serialization interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4705) Driver retries in yarn-cluster mode always fail if event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4705: - Target Version/s: 1.4.0 Driver retries in yarn-cluster mode always fail if event logging is enabled --- Key: SPARK-4705 URL: https://issues.apache.org/jira/browse/SPARK-4705 Project: Spark Issue Type: Bug Components: Spark Core, YARN Affects Versions: 1.2.0 Reporter: Marcelo Vanzin yarn-cluster mode will retry running the driver in certain failure modes. If event logging is enabled, this will most probably fail, because:
{noformat}
Exception in thread "Driver" java.io.IOException: Log directory hdfs://vanzin-krb-1.vpc.cloudera.com:8020/user/spark/applicationHistory/application_1417554558066_0003 already exists!
at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
{noformat}
The event log path should be more unique. Or perhaps retries of the same app should clean up the old logs first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
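One possible shape of a fix, sketched (illustrative only, not the merged change): make the event-log directory unique per attempt instead of per application, so a retried driver does not collide with the first attempt's directory:
{code}
// Hypothetical helper; Spark's real event-log path construction differs.
def eventLogDir(base: String, appId: String, attemptId: Option[String]): String =
  attemptId match {
    // e.g. .../applicationHistory/application_1417554558066_0003_attempt2
    case Some(id) => s"$base/${appId}_attempt$id"
    case None     => s"$base/$appId"
  }
{code}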
[jira] [Resolved] (SPARK-5619) Support 'show roles' in HiveContext
[ https://issues.apache.org/jira/browse/SPARK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5619. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4397 [https://github.com/apache/spark/pull/4397] Support 'show roles' in HiveContext --- Key: SPARK-5619 URL: https://issues.apache.org/jira/browse/SPARK-5619 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Yadong Qi Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5657) Add PySpark Avro Output Format example
Stanislav Los created SPARK-5657: Summary: Add PySpark Avro Output Format example Key: SPARK-5657 URL: https://issues.apache.org/jira/browse/SPARK-5657 Project: Spark Issue Type: Improvement Reporter: Stanislav Los There is an Avro Input Format example that shows how to read Avro data in PySpark, but nothing shows how to write from PySpark to Avro. The main challenge is that a Converter needs an Avro schema to build a record, but the current Spark API doesn't provide a way to supply extra parameters to custom converters. A possible workaround is provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
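A heavily hedged sketch of the kind of workaround described: since the save APIs cannot pass parameters to a Converter, the converter can carry the Avro schema itself (here hard-coded). The Converter trait and the Avro calls are real APIs; the class, schema and field names are hypothetical:
{code}
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
import org.apache.spark.api.python.Converter

class MapToAvroRecordConverter extends Converter[Any, Any] {
  // Parsed lazily and marked transient so the converter stays serializable.
  @transient private lazy val schema: Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "User",
      | "fields": [{"name": "name", "type": "string"},
      |            {"name": "age",  "type": "int"}]}""".stripMargin)

  override def convert(obj: Any): Any = {
    val fields = obj.asInstanceOf[java.util.Map[String, AnyRef]]
    val record = new GenericData.Record(schema)
    record.put("name", fields.get("name"))
    record.put("age", fields.get("age"))
    record
  }
}
{code}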
[jira] [Resolved] (SPARK-5278) check ambiguous reference to fields in Spark SQL is incompleted
[ https://issues.apache.org/jira/browse/SPARK-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5278. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4068 [https://github.com/apache/spark/pull/4068] check ambiguous reference to fields in Spark SQL is incompleted --- Key: SPARK-5278 URL: https://issues.apache.org/jira/browse/SPARK-5278 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan Fix For: 1.3.0 In the Hive context, for a JSON string like {code}{"a": {"b": 1, "B": 2}}{code} the SQL `SELECT a.b FROM t` will report an error about an ambiguous reference to fields. But for a JSON string like {code}{"a": [{"b": 1, "B": 2}]}{code} the SQL `SELECT a[0].b FROM t` will pass and silently pick the first `b`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
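A repro sketch of the inconsistency (quotes restored; jsonRDD is the RDD[String] JSON loader of that era, and sqlContext is assumed to be configured for the Hive-style case handling the report describes):
{code}
val flat = sqlContext.jsonRDD(sc.parallelize("""{"a": {"b": 1, "B": 2}}""" :: Nil))
flat.registerTempTable("t1")
sqlContext.sql("SELECT a.b FROM t1")    // fails: ambiguous reference to fields

val nested = sqlContext.jsonRDD(sc.parallelize("""{"a": [{"b": 1, "B": 2}]}""" :: Nil))
nested.registerTempTable("t2")
sqlContext.sql("SELECT a[0].b FROM t2") // passes, silently picking the first "b"
{code}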
[jira] [Updated] (SPARK-5416) Initialize Executor.threadPool before ExecutorSource
[ https://issues.apache.org/jira/browse/SPARK-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-5416: -- Fix Version/s: 1.3.0 Initialize Executor.threadPool before ExecutorSource Key: SPARK-5416 URL: https://issues.apache.org/jira/browse/SPARK-5416 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Ryan Williams Assignee: Ryan Williams Priority: Minor Fix For: 1.3.0, 1.4.0 I recently saw some NPEs from [{{ExecutorSource:44}}|https://github.com/apache/spark/blob/0497ea51ac345f8057d222a18dbbf8eae78f5b92/core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala#L44] in the first couple seconds of my executors being initialized. I think that {{ExecutorSource}} was trying to report these metrics before its threadpool was initialized; there are a few LoC between the source being registered ([Executor.scala:82|https://github.com/apache/spark/blob/0497ea51ac345f8057d222a18dbbf8eae78f5b92/core/src/main/scala/org/apache/spark/executor/Executor.scala#L82]) and the threadpool being initialized ([Executor.scala:106|https://github.com/apache/spark/blob/0497ea51ac345f8057d222a18dbbf8eae78f5b92/core/src/main/scala/org/apache/spark/executor/Executor.scala#L106]). We should initialize the threadpool before the ExecutorSource is registered. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
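The fix is purely an ordering change; schematically (the Sketch classes are hypothetical, not the actual Executor code):
{code}
import java.util.concurrent.{Executors, ThreadPoolExecutor}

class ExecutorSketch {
  // Initialize the pool first...
  val threadPool = Executors.newCachedThreadPool().asInstanceOf[ThreadPoolExecutor]

  // ...so a metrics source registered afterwards can never observe a null
  // pool when its gauges are polled.
  val executorSource = new ExecutorSourceSketch(threadPool)
}

class ExecutorSourceSketch(pool: ThreadPoolExecutor) {
  def activeTasks: Int = pool.getActiveCount // NPE-prone if registered before the pool exists
}
{code}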
[jira] [Updated] (SPARK-5396) Syntax error in spark scripts on windows.
[ https://issues.apache.org/jira/browse/SPARK-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5396: - Affects Version/s: (was: 1.2.0) 1.3.0 Syntax error in spark scripts on windows. - Key: SPARK-5396 URL: https://issues.apache.org/jira/browse/SPARK-5396 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.3.0 Environment: Window 7 and Window 8.1. Reporter: Vladimir Protsenko Assignee: Masayoshi TSUZUKI Priority: Critical Fix For: 1.3.0 Attachments: windows7.png, windows8.1.png I made the following steps: 1. downloaded and installed Scala 2.11.5 2. downloaded spark 1.2.0 by git clone git://github.com/apache/spark.git 3. run dev/change-version-to-2.11.sh and mvn -Dscala-2.11 -DskipTests clean package (in git bash) After installation tried to run spark-shell.cmd in cmd shell and it says there is a syntax error in file. The same with spark-shell2.cmd, spark-submit.cmd and spark-submit2.cmd. !windows7.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5603) Preinsert casting and renaming rule is needed in the Analyzer
[ https://issues.apache.org/jira/browse/SPARK-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5603. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4373 [https://github.com/apache/spark/pull/4373] Preinsert casting and renaming rule is needed in the Analyzer - Key: SPARK-5603 URL: https://issues.apache.org/jira/browse/SPARK-5603 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Priority: Blocker Fix For: 1.3.0 For an INSERT INTO/OVERWRITE statement, we should add necessary Cast and Alias to the output of the query. {code} CREATE TEMPORARY TABLE jsonTable (a int, b string) USING org.apache.spark.sql.json.DefaultSource OPTIONS ( path '...' ) INSERT OVERWRITE TABLE jsonTable SELECT a * 2, a * 4 FROM table {code} For a*2, we should create an Alias, so the InsertableRelation can know it is the column a. For a*4, it is actually the column b in jsonTable. We should first cast it to StringType and add an Alias b to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
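Schematically, what the rule adds for the INSERT above, in Catalyst expression terms (import paths are from the 1.2-era Catalyst and moved in later releases; a sketch, not the actual analyzer rule):
{code}
import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeReference, Cast, Literal, Multiply}
import org.apache.spark.sql.catalyst.types.{IntegerType, StringType}

val a = AttributeReference("a", IntegerType, nullable = true)()

// SELECT a * 2, a * 4 becomes, after the rule:
val col1 = Alias(Multiply(a, Literal(2)), "a")()                   // name it "a" for the InsertableRelation
val col2 = Alias(Cast(Multiply(a, Literal(4)), StringType), "b")() // cast to b's StringType, then name it "b"
{code}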
[jira] [Created] (SPARK-5656) NegativeArraySizeException in EigenValueDecomposition.symmetricEigs for large n and/or large k
Mark Bittmann created SPARK-5656: Summary: NegativeArraySizeException in EigenValueDecomposition.symmetricEigs for large n and/or large k Key: SPARK-5656 URL: https://issues.apache.org/jira/browse/SPARK-5656 Project: Spark Issue Type: Bug Components: MLlib Reporter: Mark Bittmann Priority: Minor Large values of n or k in EigenValueDecomposition.symmetricEigs will fail with a NegativeArraySizeException. Specifically, this occurs when 2*n*k > Integer.MAX_VALUE. These values are currently unchecked and allow the array to be initialized to a size greater than Integer.MAX_VALUE. I have written the 'require' below to fail this condition gracefully. I will submit a pull request. require(ncv * n < Integer.MAX_VALUE, "Product of 2*k*n must be smaller than " + s"Integer.MAX_VALUE. Found required eigenvalues k = $k and matrix dimension n = $n") Here is the exception that occurs from computeSVD with large k and/or n: Exception in thread "main" java.lang.NegativeArraySizeException at org.apache.spark.mllib.linalg.EigenValueDecomposition$.symmetricEigs(EigenValueDecomposition.scala:85) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:258) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:190) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3956) Python API for Distributed Matrix
[ https://issues.apache.org/jira/browse/SPARK-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-3956: Component/s: PySpark Python API for Distributed Matrix - Key: SPARK-3956 URL: https://issues.apache.org/jira/browse/SPARK-3956 Project: Spark Issue Type: New Feature Components: PySpark Reporter: Davies Liu Assignee: Davies Liu Priority: Minor Python API for distributed matrix -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1799) Add init script to the debian packaging
[ https://issues.apache.org/jira/browse/SPARK-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309753#comment-14309753 ] Nicholas Chammas commented on SPARK-1799: - cc [~markhamstra], [~srowen], [~pwendell] Add init script to the debian packaging --- Key: SPARK-1799 URL: https://issues.apache.org/jira/browse/SPARK-1799 Project: Spark Issue Type: New Feature Reporter: Nicolas Lalevée See https://github.com/apache/spark/pull/733 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309784#comment-14309784 ] Patrick Wendell commented on SPARK-5388: On DELETE, I'll defer to you guys, have zero strong feelings either way. Provide a stable application submission gateway in standalone cluster mode -- Key: SPARK-5388 URL: https://issues.apache.org/jira/browse/SPARK-5388 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Blocker Attachments: stable-spark-submit-in-standalone-mode-2-4-15.pdf The existing submission gateway in standalone mode is not compatible across Spark versions. If you have a newer version of Spark submitting to an older version of the standalone Master, it is currently not guaranteed to work. The goal is to provide a stable REST interface to replace this channel. For more detail, please see the most recent design doc attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5595) In memory data cache should be invalidated after insert into/overwrite
[ https://issues.apache.org/jira/browse/SPARK-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5595. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4373 [https://github.com/apache/spark/pull/4373] In memory data cache should be invalidated after insert into/overwrite -- Key: SPARK-5595 URL: https://issues.apache.org/jira/browse/SPARK-5595 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Priority: Blocker Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-4337) Add ability to cancel pending requests to YARN
[ https://issues.apache.org/jira/browse/SPARK-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4337. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Sandy Ryza Target Version/s: 1.3.0 Add ability to cancel pending requests to YARN -- Key: SPARK-4337 URL: https://issues.apache.org/jira/browse/SPARK-4337 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.3.0 This will be useful for things like SPARK-4136 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5656) NegativeArraySizeException in EigenValueDecomposition.symmetricEigs for large n and/or large k
[ https://issues.apache.org/jira/browse/SPARK-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Bittmann updated SPARK-5656: - Description: Large values of n or k in EigenValueDecomposition.symmetricEigs will fail with a NegativeArraySizeException. Specifically, this occurs when 2*n*k > Integer.MAX_VALUE. These values are currently unchecked and allow the array to be initialized to a size greater than Integer.MAX_VALUE. I have written the 'require' below to fail this condition gracefully. I will submit a pull request. require(ncv * n.toLong < Integer.MAX_VALUE, "Product of 2*k*n must be smaller than " + s"Integer.MAX_VALUE. Found required eigenvalues k = $k and matrix dimension n = $n") Here is the exception that occurs from computeSVD with large k and/or n: Exception in thread "main" java.lang.NegativeArraySizeException at org.apache.spark.mllib.linalg.EigenValueDecomposition$.symmetricEigs(EigenValueDecomposition.scala:85) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:258) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:190) was: Large values of n or k in EigenValueDecomposition.symmetricEigs will fail with a NegativeArraySizeException. Specifically, this occurs when 2*n*k > Integer.MAX_VALUE. These values are currently unchecked and allow the array to be initialized to a size greater than Integer.MAX_VALUE. I have written the 'require' below to fail this condition gracefully. I will submit a pull request. require(ncv * n < Integer.MAX_VALUE, "Product of 2*k*n must be smaller than " + s"Integer.MAX_VALUE. Found required eigenvalues k = $k and matrix dimension n = $n") Here is the exception that occurs from computeSVD with large k and/or n: Exception in thread "main" java.lang.NegativeArraySizeException at org.apache.spark.mllib.linalg.EigenValueDecomposition$.symmetricEigs(EigenValueDecomposition.scala:85) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:258) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:190) NegativeArraySizeException in EigenValueDecomposition.symmetricEigs for large n and/or large k -- Key: SPARK-5656 URL: https://issues.apache.org/jira/browse/SPARK-5656 Project: Spark Issue Type: Bug Components: MLlib Reporter: Mark Bittmann Priority: Minor Large values of n or k in EigenValueDecomposition.symmetricEigs will fail with a NegativeArraySizeException. Specifically, this occurs when 2*n*k > Integer.MAX_VALUE. These values are currently unchecked and allow the array to be initialized to a size greater than Integer.MAX_VALUE. I have written the 'require' below to fail this condition gracefully. I will submit a pull request. require(ncv * n.toLong < Integer.MAX_VALUE, "Product of 2*k*n must be smaller than " + s"Integer.MAX_VALUE. Found required eigenvalues k = $k and matrix dimension n = $n") Here is the exception that occurs from computeSVD with large k and/or n: Exception in thread "main" java.lang.NegativeArraySizeException at org.apache.spark.mllib.linalg.EigenValueDecomposition$.symmetricEigs(EigenValueDecomposition.scala:85) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:258) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:190) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5618) Optimise utility code.
[ https://issues.apache.org/jira/browse/SPARK-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5618. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Makoto Fukuhara Target Version/s: 1.3.0 Optimise utility code. -- Key: SPARK-5618 URL: https://issues.apache.org/jira/browse/SPARK-5618 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.3.0 Reporter: Makoto Fukuhara Assignee: Makoto Fukuhara Priority: Minor Fix For: 1.3.0 I refactored the evaluation timing and removed an unnecessary Regex API call, because the Regex API is heavy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-560) Specialize RDDs / iterators
[ https://issues.apache.org/jira/browse/SPARK-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-560: --- Component/s: Spark Core Specialize RDDs / iterators --- Key: SPARK-560 URL: https://issues.apache.org/jira/browse/SPARK-560 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Matei Zaharia When you're working on in-memory data, the overhead of boxing / unboxing starts to matter, and it looks like specializing would give a 2-4x speedup. We can't just throw in @specialized though because Scala's Iterator is not specialized. We probably need to make our own and also ensure that the right methods get called remotely when you have a chain of RDDs (i.e. it doesn't lose its specialization). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309825#comment-14309825 ] Patrick Wendell commented on SPARK-5388: On the boolean and numeric values: I don't mind one way or the other how they are handled programmatically (since we are not exposing this). However, it does seem weird that the wire protocol defines these as string types. I looked at a few other APIs (GitHub, Twitter, etc.) and they all use proper boolean types. So I'd definitely recommend setting them as proper types in the JSON; if that's easier to do by making them nullable Boolean and Long values, that seems like a good approach. Provide a stable application submission gateway in standalone cluster mode -- Key: SPARK-5388 URL: https://issues.apache.org/jira/browse/SPARK-5388 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Blocker Attachments: stable-spark-submit-in-standalone-mode-2-4-15.pdf The existing submission gateway in standalone mode is not compatible across Spark versions. If you have a newer version of Spark submitting to an older version of the standalone Master, it is currently not guaranteed to work. The goal is to provide a stable REST interface to replace this channel. For more detail, please see the most recent design doc attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
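To make the suggestion concrete, a hypothetical sketch of typed, nullable fields on a submission message; the class and field names are illustrative, not the actual protocol:
{code}
// java.lang.Boolean / java.lang.Long (rather than Scala's value types) let
// null stand for "field omitted by the client", while the serialized JSON
// still carries proper boolean and numeric types.
case class SubmitDriverRequest(
  appName: String,
  superviseDriver: java.lang.Boolean, // null => not specified by the client
  driverMemoryMb: java.lang.Long      // null => use the server default
)
{code}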
[jira] [Resolved] (SPARK-5324) Results of describe can't be queried
[ https://issues.apache.org/jira/browse/SPARK-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5324. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4249 [https://github.com/apache/spark/pull/4249] Results of describe can't be queried Key: SPARK-5324 URL: https://issues.apache.org/jira/browse/SPARK-5324 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Michael Armbrust Fix For: 1.3.0 {code} sql("DESCRIBE TABLE test").registerTempTable("describeTest") sql("SELECT * FROM describeTest").collect() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5628) Add option to return spark-ec2 version
[ https://issues.apache.org/jira/browse/SPARK-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5628: Fix Version/s: 1.2.2 Add option to return spark-ec2 version -- Key: SPARK-5628 URL: https://issues.apache.org/jira/browse/SPARK-5628 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Priority: Minor Labels: backport-needed Fix For: 1.3.0, 1.2.2, 1.4.0 We need a {{--version}} option for {{spark-ec2}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5628) Add option to return spark-ec2 version
[ https://issues.apache.org/jira/browse/SPARK-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5628: Labels: backport-needed (was: ) Add option to return spark-ec2 version -- Key: SPARK-5628 URL: https://issues.apache.org/jira/browse/SPARK-5628 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Priority: Minor Labels: backport-needed Fix For: 1.3.0, 1.2.2, 1.4.0 We need a {{--version}} option for {{spark-ec2}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5636) Lower dynamic allocation add interval
[ https://issues.apache.org/jira/browse/SPARK-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5636. Resolution: Fixed Fix Version/s: 1.3.0 Lower dynamic allocation add interval - Key: SPARK-5636 URL: https://issues.apache.org/jira/browse/SPARK-5636 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Andrew Or Assignee: Andrew Or Fix For: 1.3.0 The current default of 1 min is a little long especially since a recent patch causes the number of executors to start at 0 by default. We should ramp up much more quickly in the beginning. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
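For anyone who wants the faster ramp-up without upgrading, a sketch of setting the interval explicitly; the assumption here is that the add interval corresponds to spark.dynamicAllocation.schedulerBacklogTimeout, specified in seconds on these versions:
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // Assumed property name and unit (seconds); lowers the add interval from
  // the old 1-minute default so executors ramp up quickly.
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "5")
{code}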
[jira] [Updated] (SPARK-5655) YARN Auxiliary Shuffle service can't access shuffle files on Hadoop cluster configured in secure mode
[ https://issues.apache.org/jira/browse/SPARK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Rowson updated SPARK-5655: - Description: When running a Spark job on a YARN cluster which doesn't run containers under the same user as the nodemanager, and also when using the YARN auxiliary shuffle service, jobs fail with something similar to: java.io.FileNotFoundException: /data/9/yarn/nm/usercache/username/appcache/application_1423069181231_0032/spark-c434a703-7368-4a05-9e99-41e77e564d1d/3e/shuffle_0_0_0.index (Permission denied) The root cause of this is here: https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/util/Utils.scala#L287 Spark will attempt to chmod 700 any application directories it creates during the job, which includes files created in the nodemanager's usercache directory. The owner of these files is the container UID, which on a secure cluster is the name of the user creating the job, and on a nonsecure cluster with yarn.nodemanager.container-executor.class configured is the value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user. The problem with this is that the auxiliary shuffle manager runs as part of the nodemanager, which typically runs as the user 'yarn', and so can't access these files that are only owner-readable. YARN already attempts to secure files created under appcache but keep them readable by the nodemanager, by setting the group of the appcache directory to 'yarn' and also setting the setgid flag. This means that files and directories created under this should also have the 'yarn' group. Normally this means that the nodemanager should also be able to read these files, but Spark setting chmod 700 wipes this out. I'm not sure what the right approach is here. Commenting out the chmod 700 functionality makes this work on YARN, and still makes the application files only readable by the owner and the group: data/1/yarn/nm/usercache/username/appcache/application_1423247249655_0001/spark-c7a6fc0f-e5df-49cf-a8f5-e51a1ca087df/0c # ls -lah total 206M drwxr-s--- 2 nobody yarn 4.0K Feb 6 18:30 . drwxr-s--- 12 nobody yarn 4.0K Feb 6 18:30 .. -rw-r----- 1 nobody yarn 206M Feb 6 18:30 shuffle_0_0_0.data But this may not be the right approach on non-YARN. Perhaps an additional step to see if this chmod 700 step is necessary (i.e. non-YARN) is required. Sadly, I don't have a non-YARN environment to test, otherwise I'd be able to suggest a patch. I believe this is a related issue in the MapReduce framework: https://issues.apache.org/jira/browse/MAPREDUCE-3728 was: When running a Spark job on a YARN cluster which doesn't run containers under the same user as the nodemanager, and also when using the YARN auxiliary shuffle service, jobs fail with something similar to: {code} java.io.FileNotFoundException: /data/9/yarn/nm/usercache/username/appcache/application_1423069181231_0032/spark-c434a703-7368-4a05-9e99-41e77e564d1d/3e/shuffle_0_0_0.index (Permission denied) {code} The root cause of this is here: https://github.com/apache/spark/blob/branch-1.2/core/src/main/scala/org/apache/spark/util/Utils.scala#L287 Spark will attempt to chmod 700 any application directories it creates during the job, which includes files created in the nodemanager's usercache directory.
The owner of these files is the container UID, which on a secure cluster is the name of the user creating the job, and on a nonsecure cluster with yarn.nodemanager.container-executor.class configured is the value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user. The problem with this is that the auxiliary shuffle manager runs as part of the nodemanager, which typically runs as the user 'yarn', and so can't access these files that are only owner-readable. YARN already attempts to secure files created under appcache but keep them readable by the nodemanager, by setting the group of the appcache directory to 'yarn' and also setting the setgid flag. This means that files and directories created under this should also have the 'yarn' group. Normally this means that the nodemanager should also be able to read these files, but Spark setting chmod 700 wipes this out. I'm not sure what the right approach is here. Commenting out the chmod 700 functionality makes this work on YARN, and still makes the application files only readable by the owner and the group: data/1/yarn/nm/usercache/username/appcache/application_1423247249655_0001/spark-c7a6fc0f-e5df-49cf-a8f5-e51a1ca087df/0c # ls -lah total 206M drwxr-s--- 2 nobody yarn 4.0K Feb 6 18:30 . drwxr-s--- 12 nobody yarn 4.0K Feb 6 18:30 .. -rw-r----- 1 nobody yarn 206M Feb 6 18:30 shuffle_0_0_0.data But this may not be the right approach on non-YARN. Perhaps an additional step to see if this chmod 700 step is necessary (i.e. non-YARN) is
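A minimal sketch of the conditional-chmod idea floated in the description above; the runningUnderYarn flag is purely illustrative (Spark's Utils.scala has no such parameter):
{code}
import java.io.File
import java.nio.file.Files
import java.nio.file.attribute.PosixFilePermissions

// Only tighten to 700 when Spark owns the directory layout itself; under
// YARN's secure container layout, leave the group-readable setgid dirs alone.
def chmodIfNeeded(dir: File, runningUnderYarn: Boolean): Unit = {
  if (!runningUnderYarn) {
    Files.setPosixFilePermissions(dir.toPath, PosixFilePermissions.fromString("rwx------"))
  }
}
{code}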
[jira] [Commented] (SPARK-4877) userClassPathFirst doesn't handle user classes inheriting from parent
[ https://issues.apache.org/jira/browse/SPARK-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309666#comment-14309666 ] Josh Rosen commented on SPARK-4877: --- I've gone ahead and committed this PR because it fixes a known bug and adds a new test case. Both the old and new code overloaded findClass; I think the findClass vs. loadClass change is related to this JIRA, but kind of orthogonal to the fix here. If you think that we should re-work our classloader to change its overriding strategy, let's do that in a separate followup PR. userClassPathFirst doesn't handle user classes inheriting from parent - Key: SPARK-4877 URL: https://issues.apache.org/jira/browse/SPARK-4877 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Stephen Haberman Assignee: Stephen Haberman Fix For: 1.3.0 We're trying out userClassPathFirst. To do so, we make an uberjar that does not contain Spark or Scala classes (because we want those to load from the parent classloader, otherwise we'll get errors like scala.Function0 != scala.Function0 since they'd load from different class loaders). (Tangentially, some isolation classloaders like Jetty whitelist certain packages, like spark/* and scala/*, to only come from the parent classloader, so that technically if the user still messes up and leaks the Scala/Spark jars into their uberjar, it won't blow up; this would be a good enhancement, I think.) Anyway, we have a custom Kryo registrar, which ships in our uberjar, but since it extends spark.KryoRegistrator, which is not in our uberjar, we get a ClassNotFoundException. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-2945) Allow specifying num of executors in the context configuration
[ https://issues.apache.org/jira/browse/SPARK-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-2945. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: WangTaoTheTonic Target Version/s: 1.3.0 Allow specifying num of executors in the context configuration -- Key: SPARK-2945 URL: https://issues.apache.org/jira/browse/SPARK-2945 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 1.0.0 Environment: Ubuntu precise, on YARN (CDH 5.1.0) Reporter: Shay Rojansky Assignee: WangTaoTheTonic Fix For: 1.3.0 Running on YARN, the only way to specify the number of executors seems to be on the command line of spark-submit, via the --num-executors switch. In many cases this is too early. Our Spark app receives some cmdline arguments which determine the amount of work that needs to be done - and that affects the number of executors it ideally requires. Ideally, the Spark context configuration would support specifying this like any other config param. Our current workaround is a wrapper script that determines how much work is needed, and which itself launches spark-submit with the number passed to --num-executors - it's a shame to have to do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
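With the fix, the executor count can be set through the context configuration like any other property. A short sketch, assuming spark.executor.instances is the programmatic equivalent of --num-executors:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Decide the executor count from the app's own arguments at runtime,
// instead of in a wrapper script around spark-submit.
val workItems = 48 // hypothetical: derived from the job's command-line arguments
val conf = new SparkConf()
  .setAppName("sized-by-workload")
  .set("spark.executor.instances", math.min(workItems / 4, 12).toString)
val sc = new SparkContext(conf)
{code}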
[jira] [Commented] (SPARK-5625) Spark binaries do not incude Spark Core
[ https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309935#comment-14309935 ] DeepakVohra commented on SPARK-5625: It's not clear whether the assembly jar is meant to be extracted. Is it? If the jar is added to the classpath as-is, the core classes are not found. Spark binaries do not incude Spark Core --- Key: SPARK-5625 URL: https://issues.apache.org/jira/browse/SPARK-5625 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: CDH4 Reporter: DeepakVohra Spark binaries for CDH 4 do not include the Spark Core Jar. http://spark.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5593) Replace BlockManager listener with Executor listener in ExecutorAllocationListener
[ https://issues.apache.org/jira/browse/SPARK-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5593: - Component/s: Spark Core Replace BlockManager listener with Executor listener in ExecutorAllocationListener -- Key: SPARK-5593 URL: https://issues.apache.org/jira/browse/SPARK-5593 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Lianhui Wang Assignee: Lianhui Wang Fix For: 1.3.0 More precisely, in ExecutorAllocationListener we need to replace onBlockManagerAdded and onBlockManagerRemoved with onExecutorAdded and onExecutorRemoved, because the executor events express the intended meaning more accurately. For example, in SPARK-5529 a BlockManager had been removed while its executor still existed. [~andrewor14] [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
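A sketch of the direction described above, tracking executors through the executor lifecycle events rather than BlockManager events (assuming a Spark version where SparkListener exposes onExecutorAdded/onExecutorRemoved):
{code}
import org.apache.spark.scheduler._

class ExecutorTracker extends SparkListener {
  private val executors = collection.mutable.Set[String]()
  // Fired when an executor registers, even if its BlockManager later drops.
  override def onExecutorAdded(e: SparkListenerExecutorAdded): Unit =
    executors += e.executorId
  override def onExecutorRemoved(e: SparkListenerExecutorRemoved): Unit =
    executors -= e.executorId
}
{code}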
[jira] [Commented] (SPARK-5625) Spark binaries do not incude Spark Core
[ https://issues.apache.org/jira/browse/SPARK-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309655#comment-14309655 ] Sean Owen commented on SPARK-5625: -- These are in the assembly. The idea is that it is one artifact containing the entire Spark distribution. Spark binaries do not incude Spark Core --- Key: SPARK-5625 URL: https://issues.apache.org/jira/browse/SPARK-5625 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 1.2.0 Environment: CDH4 Reporter: DeepakVohra Spark binaries for CDH 4 do not include the Spark Core Jar. http://spark.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-706) Failures in block manager put leads to task hanging
[ https://issues.apache.org/jira/browse/SPARK-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-706: --- Component/s: Block Manager Failures in block manager put leads to task hanging --- Key: SPARK-706 URL: https://issues.apache.org/jira/browse/SPARK-706 Project: Spark Issue Type: Bug Components: Block Manager Affects Versions: 0.6.0, 0.6.1, 0.7.0, 0.6.2 Reporter: Reynold Xin Reported in this thread: https://groups.google.com/forum/?fromgroups=#!topic/shark-users/Q_SiIDzVtZw The following exception in block manager leaves the block marked as pending. {code} 13/02/26 06:14:56 ERROR executor.Executor: Exception in task ID 39 com.esotericsoftware.kryo.SerializationException: Buffer limit exceeded writing object of type: shark.ColumnarWritable at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:492) at spark.KryoSerializationStream.writeObject(KryoSerializer.scala:78) at spark.serializer.SerializationStream$class.writeAll(Serializer.scala:58) at spark.KryoSerializationStream.writeAll(KryoSerializer.scala:73) at spark.storage.DiskStore.putValues(DiskStore.scala:63) at spark.storage.BlockManager.dropFromMemory(BlockManager.scala:779) at spark.storage.MemoryStore.tryToPut(MemoryStore.scala:162) at spark.storage.MemoryStore.putValues(MemoryStore.scala:57) at spark.storage.BlockManager.put(BlockManager.scala:582) at spark.CacheTracker.getOrCompute(CacheTracker.scala:215) at spark.RDD.iterator(RDD.scala:159) at spark.scheduler.ResultTask.run(ResultTask.scala:18) at spark.executor.Executor$TaskRunner.run(Executor.scala:76) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) {code} When the block is read, the task is stuck in BlockInfo.waitForReady(). We should propagate the error back to the master instead of hanging the slave node. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
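The propagation the description asks for can be modeled with a Promise instead of a bare wait: a failed put completes the promise with the exception, so readers get the error rather than blocking forever. A hypothetical sketch, not Spark's BlockInfo implementation:
{code}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object BlockReadiness {
  private val ready = Promise[Unit]()
  def markReady(): Unit = ready.trySuccess(())
  def markFailed(e: Throwable): Unit = ready.tryFailure(e)
  // Re-throws the put failure to the reader instead of hanging the task.
  def waitForReady(): Unit = Await.result(ready.future, 30.seconds)
}
{code}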
[jira] [Updated] (SPARK-3600) RDD[Double] doesn't use primitive arrays for caching
[ https://issues.apache.org/jira/browse/SPARK-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-3600: Component/s: Spark Core RDD[Double] doesn't use primitive arrays for caching Key: SPARK-3600 URL: https://issues.apache.org/jira/browse/SPARK-3600 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.1.0 Reporter: Xiangrui Meng RDD's classTag is not passed in through CacheManager. So RDD[Double] uses object arrays for caching, which leads to huge overhead. However, we need to send the classTag down many levels to make it work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
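For context, the reason the classTag matters: with a ClassTag in scope, new Array[T] picks the primitive-backed representation; without one, values end up boxed in an object array. A minimal sketch:
{code}
import scala.reflect.ClassTag

// With T = Double this allocates a primitive double[]; without a ClassTag
// the element type is erased and a boxed object array is needed instead.
def cacheBuffer[T: ClassTag](n: Int): Array[T] = new Array[T](n)

val doubles = cacheBuffer[Double](1000) // backed by a primitive double[]
{code}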
[jira] [Updated] (SPARK-4024) Remember user preferences for metrics to show in the UI
[ https://issues.apache.org/jira/browse/SPARK-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-4024: Component/s: Web UI Remember user preferences for metrics to show in the UI --- Key: SPARK-4024 URL: https://issues.apache.org/jira/browse/SPARK-4024 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Priority: Minor We should remember the metrics a user has previously chosen to display for each stage, so that the user doesn't need to reselect the interesting metrics each time they open a stage detail page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5654) Integrate SparkR into Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309782#comment-14309782 ] Matei Zaharia commented on SPARK-5654: -- Yup, there's a tradeoff, but given that this is a language API and not an algorithm, input source or anything like that, I think it's important to support it along with the core engine. R is extremely popular for data science, more so than Python, and it fits well with many existing concepts in Spark. Integrate SparkR into Apache Spark -- Key: SPARK-5654 URL: https://issues.apache.org/jira/browse/SPARK-5654 Project: Spark Issue Type: New Feature Reporter: Shivaram Venkataraman The SparkR project [1] provides a light-weight frontend to launch Spark jobs from R. The project was started at the AMPLab around a year ago and has been incubated as its own project to make sure it can be easily merged into upstream Spark, i.e. not introduce any external dependencies etc. SparkR’s goals are similar to PySpark and shares a similar design pattern as described in our meetup talk[2], Spark Summit presentation[3]. Integrating SparkR into the Apache project will enable R users to use Spark out of the box and given R’s large user base, it will help the Spark project reach more users. Additionally, work in progress features like providing R integration with ML Pipelines and Dataframes can be better achieved by development in a unified code base. SparkR is available under the Apache 2.0 License and does not have any external dependencies other than requiring users to have R and Java installed on their machines. SparkR’s developers come from many organizations including UC Berkeley, Alteryx, Intel and we will support future development, maintenance after the integration. [1] https://github.com/amplab-extras/SparkR-pkg [2] http://files.meetup.com/3138542/SparkR-meetup.pdf [3] http://spark-summit.org/2014/talk/sparkr-interactive-r-programs-at-scale-2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5628) Add option to return spark-ec2 version
[ https://issues.apache.org/jira/browse/SPARK-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-5628: -- Assignee: Nicholas Chammas Add option to return spark-ec2 version -- Key: SPARK-5628 URL: https://issues.apache.org/jira/browse/SPARK-5628 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Priority: Minor Fix For: 1.3.0, 1.4.0 We need a {{--version}} option for {{spark-ec2}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5531) Spark download .tgz file does not get unpacked
[ https://issues.apache.org/jira/browse/SPARK-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309938#comment-14309938 ] DeepakVohra commented on SPARK-5531: Why two options, Direct Download and Select Apache Mirror, if both direct to the same HTML page? Spark download .tgz file does not get unpacked -- Key: SPARK-5531 URL: https://issues.apache.org/jira/browse/SPARK-5531 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: Linux Reporter: DeepakVohra The spark-1.2.0-bin-cdh4.tgz file downloaded from http://spark.apache.org/downloads.html does not get unpacked. tar xvf spark-1.2.0-bin-cdh4.tgz gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5371) SparkSQL Fails to parse Query with UNION ALL in subquery
[ https://issues.apache.org/jira/browse/SPARK-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5371: - Component/s: SQL SparkSQL Fails to parse Query with UNION ALL in subquery Key: SPARK-5371 URL: https://issues.apache.org/jira/browse/SPARK-5371 Project: Spark Issue Type: Bug Components: SQL Reporter: David Ross This SQL session: {code} DROP TABLE test1; DROP TABLE test2; CREATE TABLE test1 ( c11 INT, c12 INT, c13 INT, c14 INT ); CREATE TABLE test2 ( c21 INT, c22 INT, c23 INT, c24 INT ); SELECT MIN(t3.c_1), MIN(t3.c_2), MIN(t3.c_3), MIN(t3.c_4) FROM ( SELECT SUM(t1.c11) c_1, NULL c_2, NULL c_3, NULL c_4 FROM test1 t1 UNION ALL SELECT NULL c_1, SUM(t2.c22) c_2, SUM(t2.c23) c_3, SUM(t2.c24) c_4 FROM test2 t2 ) t3; {code} Produces this error: {code} 15/01/23 00:25:21 INFO thriftserver.SparkExecuteStatementOperation: Running query 'SELECT MIN(t3.c_1), MIN(t3.c_2), MIN(t3.c_3), MIN(t3.c_4) FROM ( SELECT SUM(t1.c11) c_1, NULL c_2, NULL c_3, NULL c_4 FROM test1 t1 UNION ALL SELECT NULL c_1, SUM(t2.c22) c_2, SUM(t2.c23) c_3, SUM(t2.c24) c_4 FROM test2 t2 ) t3' 15/01/23 00:25:21 INFO parse.ParseDriver: Parsing command: SELECT MIN(t3.c_1), MIN(t3.c_2), MIN(t3.c_3), MIN(t3.c_4) FROM ( SELECT SUM(t1.c11) c_1, NULL c_2, NULL c_3, NULL c_4 FROM test1 t1 UNION ALL SELECT NULL c_1, SUM(t2.c22) c_2, SUM(t2.c23) c_3, SUM(t2.c24) c_4 FROM test2 t2 ) t3 15/01/23 00:25:21 INFO parse.ParseDriver: Parse Completed 15/01/23 00:25:21 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query: java.util.NoSuchElementException: key not found: c_2#23488 at scala.collection.MapLike$class.default(MapLike.scala:228) at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29) at scala.collection.MapLike$class.apply(MapLike.scala:141) at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$1.applyOrElse(Optimizer.scala:77) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$1.applyOrElse(Optimizer.scala:76) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$.pushToRight(Optimizer.scala:76) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$apply$1$$anonfun$applyOrElse$6.apply(Optimizer.scala:98) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$apply$1$$anonfun$applyOrElse$6.apply(Optimizer.scala:98) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$apply$1.applyOrElse(Optimizer.scala:98) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$apply$1.applyOrElse(Optimizer.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at
scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at
[jira] [Updated] (SPARK-4854) Custom UDTF with Lateral View throws ClassNotFound exception in Spark SQL CLI
[ https://issues.apache.org/jira/browse/SPARK-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4854: - Component/s: SQL Custom UDTF with Lateral View throws ClassNotFound exception in Spark SQL CLI - Key: SPARK-4854 URL: https://issues.apache.org/jira/browse/SPARK-4854 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0, 1.1.1 Reporter: Shenghua Wan Hello, I met a problem when using the Spark SQL CLI: a custom UDTF with lateral view throws a ClassNotFound exception. I did a couple of experiments in the same environment (Spark versions 1.1.0, 1.1.1): select + same custom UDTF (passed); select + lateral view + custom UDTF (ClassNotFoundException); select + lateral view + built-in UDTF (passed). I have done some googling these past days and found one related Spark issue ticket, https://issues.apache.org/jira/browse/SPARK-4811, which is about custom UDTFs not working in Spark SQL. It would be helpful to put the actual code here to reproduce the problem; however, corporate regulations might prohibit this, so sorry about that. Directly using explode's source code in a jar should help reproduce it anyway. Here is a portion of the stack trace from the exception, just in case: java.lang.ClassNotFoundException: XXX at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at org.apache.spark.sql.hive.HiveFunctionFactory$class.createFunction(hiveUdfs.scala:81) at org.apache.spark.sql.hive.HiveGenericUdtf.createFunction(hiveUdfs.scala:247) at org.apache.spark.sql.hive.HiveGenericUdtf.function$lzycompute(hiveUdfs.scala:254) at org.apache.spark.sql.hive.HiveGenericUdtf.function(hiveUdfs.scala:254) at org.apache.spark.sql.hive.HiveGenericUdtf.outputInspectors$lzycompute(hiveUdfs.scala:261) at org.apache.spark.sql.hive.HiveGenericUdtf.outputInspectors(hiveUdfs.scala:260) at org.apache.spark.sql.hive.HiveGenericUdtf.outputDataTypes$lzycompute(hiveUdfs.scala:265) at org.apache.spark.sql.hive.HiveGenericUdtf.outputDataTypes(hiveUdfs.scala:265) at org.apache.spark.sql.hive.HiveGenericUdtf.makeOutput(hiveUdfs.scala:269) at org.apache.spark.sql.catalyst.expressions.Generator.output(generators.scala:60) at org.apache.spark.sql.catalyst.plans.logical.Generate$$anonfun$1.apply(basicOperators.scala:50) at org.apache.spark.sql.catalyst.plans.logical.Generate$$anonfun$1.apply(basicOperators.scala:50) at scala.Option.map(Option.scala:145) at org.apache.spark.sql.catalyst.plans.logical.Generate.generatorOutput(basicOperators.scala:50) at org.apache.spark.sql.catalyst.plans.logical.Generate.output(basicOperators.scala:60) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveChildren$1.apply(LogicalPlan.scala:79) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveChildren$1.apply(LogicalPlan.scala:79) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) the rest is omitted.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5427: - Component/s: SQL Add support for floor function in Spark SQL --- Key: SPARK-5427 URL: https://issues.apache.org/jira/browse/SPARK-5427 Project: Spark Issue Type: Improvement Components: SQL Reporter: Ted Yu floor() function is supported in Hive SQL. This issue is to add floor() function to Spark SQL. Related thread: http://search-hadoop.com/m/JW1q563fc22 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5314) java.lang.OutOfMemoryError in SparkSQL with GROUP BY
[ https://issues.apache.org/jira/browse/SPARK-5314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5314: - Component/s: SQL java.lang.OutOfMemoryError in SparkSQL with GROUP BY Key: SPARK-5314 URL: https://issues.apache.org/jira/browse/SPARK-5314 Project: Spark Issue Type: Bug Components: SQL Reporter: Alex Baretta I am running a SparkSQL GROUP BY query on a largish Parquet table (a few hundred million rows), weighing in at about 50GB. My cluster has 1.7 TB of RAM, so it should have more than enough resources to cope with this query. WARN TaskSetManager: Lost task 279.0 in stage 22.0 (TID 1229, ds-model-w-21.c.eastern-gravity-771.internal): java.lang.OutOfMemoryError: GC overhead limit exceeded at scala.collection.SeqLike$class.distinct(SeqLike.scala:493) at scala.collection.AbstractSeq.distinct(Seq.scala:40) at org.apache.spark.sql.catalyst.expressions.Coalesce.resolved$lzycompute(nullFunctions.scala:33) at org.apache.spark.sql.catalyst.expressions.Coalesce.resolved(nullFunctions.scala:33) at org.apache.spark.sql.catalyst.expressions.Coalesce.dataType(nullFunctions.scala:37) at org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:100) at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:101) at org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:50) at org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:81) at org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:571) at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167) at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:615) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:615) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:264) at org.apache.spark.rdd.RDD.iterator(RDD.scala:231) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:264) at org.apache.spark.rdd.RDD.iterator(RDD.scala:231) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5129) make SqlContext support select date +/- XX DAYS from table
[ https://issues.apache.org/jira/browse/SPARK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5129: - Component/s: SQL make SqlContext support select date +/- XX DAYS from table -- Key: SPARK-5129 URL: https://issues.apache.org/jira/browse/SPARK-5129 Project: Spark Issue Type: Improvement Components: SQL Reporter: DoingDone9 Priority: Minor Example: given create table test (date: Date) with rows 2014-01-01, 2014-01-02, 2014-01-03, when running select date + 10 DAYS from test I want to get 2014-01-11, 2014-01-12, 2014-01-13, and when running select date - 10 DAYS from test, get 2013-12-22, 2013-12-23, 2013-12-24. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5001) BlockRDD removed unreasonablly in streaming
[ https://issues.apache.org/jira/browse/SPARK-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5001: - Component/s: Streaming BlockRDD removed unreasonablly in streaming --- Key: SPARK-5001 URL: https://issues.apache.org/jira/browse/SPARK-5001 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.2, 1.1.1, 1.2.0 Reporter: hanhonggen Attachments: fix_bug_BlockRDD_removed_not_reasonablly_in_streaming.patch I've counted messages using the Kafka input stream of spark-1.1.1. The test app failed when a later batch job completed sooner than the previous one. In the source code, BlockRDDs older than (time - rememberDuration) are removed in clearMetadata after a job completes, and the previous job then aborts due to a block not found. The relevant logs are as follows: 2014-12-25 14:07:12(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-14] INFO :Starting job streaming job 1419487632000 ms.0 from job set of time 1419487632000 ms 2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-14] INFO :Starting job streaming job 1419487635000 ms.0 from job set of time 1419487635000 ms 2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-15] INFO :Finished job streaming job 1419487635000 ms.0 from job set of time 1419487635000 ms 2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-16] INFO :Removing blocks of RDD BlockRDD[3028] at createStream at TestKafka.java:144 of time 1419487635000 ms from DStream clearMetadata java.lang.Exception: Could not compute split, block input-0-1419487631400 not found for 3028 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
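One workaround (not a fix for the cleanup race itself) is to keep generated RDDs around longer than the default rememberDuration, so a slower earlier batch can still find its blocks. A sketch using the public StreamingContext.remember API:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("remember-longer")
val ssc = new StreamingContext(conf, Seconds(5))
// Retain each batch's RDDs (and their blocks) for at least 5 minutes.
ssc.remember(Minutes(5))
{code}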
[jira] [Updated] (SPARK-2802) Improve the Cassandra sample and Add a new sample for Streaming to Cassandra
[ https://issues.apache.org/jira/browse/SPARK-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-2802: - Component/s: Streaming Improve the Cassandra sample and Add a new sample for Streaming to Cassandra Key: SPARK-2802 URL: https://issues.apache.org/jira/browse/SPARK-2802 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Helena Edelson Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5066) Can not get all key that has same hashcode when reading key ordered from different Streaming.
[ https://issues.apache.org/jira/browse/SPARK-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5066: - Component/s: Streaming Can not get all key that has same hashcode when reading key ordered from different Streaming. --- Key: SPARK-5066 URL: https://issues.apache.org/jira/browse/SPARK-5066 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.0 Reporter: DoingDone9 Priority: Critical When spill is enabled, data ordered by hashCode is spilled to disk. When merging values we need to get all keys that have the same hashCode from the different tmp files, but the code only reads the keys that share the minimum hashCode within a tmp file, so we cannot read all keys. Example: if file1 has [k1, k2, k3] and file2 has [k4, k5, k1], and hashcode of k4 < hashcode of k5 < hashcode of k1 < hashcode of k2 < hashcode of k3, we just read k1 from file1 and k4 from file2, and so cannot read all occurrences of k1. Code: private val inputStreams = (Seq(sortedMap) ++ spilledMaps).map(it => it.buffered) inputStreams.foreach { it => val kcPairs = new ArrayBuffer[(K, C)] readNextHashCode(it, kcPairs) if (kcPairs.length > 0) { mergeHeap.enqueue(new StreamBuffer(it, kcPairs)) } } private def readNextHashCode(it: BufferedIterator[(K, C)], buf: ArrayBuffer[(K, C)]): Unit = { if (it.hasNext) { var kc = it.next() buf += kc val minHash = hashKey(kc) while (it.hasNext && it.head._1.hashCode() == minHash) { kc = it.next() buf += kc } } } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5615) Fix testPackage in StreamingContextSuite
[ https://issues.apache.org/jira/browse/SPARK-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5615: - Component/s: Streaming Fix testPackage in StreamingContextSuite Key: SPARK-5615 URL: https://issues.apache.org/jira/browse/SPARK-5615 Project: Spark Issue Type: Bug Components: Streaming Reporter: Liang-Chi Hsieh Priority: Minor testPackage in StreamingContextSuite often throws SparkException because its ssc is not shut down gracefully. It doesn't affect the unit test, but I think we can make it graceful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4174) Streaming: Optionally provide notifications to Receivers when DStream has been generated
[ https://issues.apache.org/jira/browse/SPARK-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4174: - Component/s: Streaming Streaming: Optionally provide notifications to Receivers when DStream has been generated Key: SPARK-4174 URL: https://issues.apache.org/jira/browse/SPARK-4174 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Hari Shreedharan Assignee: Hari Shreedharan Receivers receiving data from message queues, like ActiveMQ, Kafka etc., can replay messages if required. Using the HDFS WAL mechanism for such systems affects efficiency, as we incur an unnecessary HDFS write when we can recover the data from the queue anyway. We can fix this by providing a notification to the receiver when the RDD is generated from the blocks. We need to consider the case where a receiver might fail before the RDD is generated and come back on a different executor when the RDD is generated. Either way, this is likely to cause duplicates and not data loss -- so we may be ok. I am thinking about something of the order of accepting a callback function which gets called when the RDD is generated. We can keep the function local in a map of batch id -> function, which gets called when the RDD gets generated (we can inform the ReceiverSupervisorImpl via Akka when the driver generates the RDD). Of course, just an early thought - I will work on a design doc for this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
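A hypothetical sketch of the batch-id-to-callback map sketched in the description; none of these names are Spark APIs:
{code}
import scala.collection.concurrent.TrieMap

object BatchCallbacks {
  // batch id -> callback, kept local to the receiver side.
  private val callbacks = new TrieMap[Long, () => Unit]()

  def register(batchId: Long)(f: () => Unit): Unit = callbacks.put(batchId, f)

  // Invoked (e.g. via an RPC from the driver) once the batch's RDD has been
  // generated from the receiver's blocks; the receiver can then ack/commit
  // the messages in the upstream queue instead of writing an HDFS WAL.
  def notifyGenerated(batchId: Long): Unit =
    callbacks.remove(batchId).foreach(f => f())
}
{code}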
[jira] [Updated] (SPARK-4874) Report number of records read/written in a task
[ https://issues.apache.org/jira/browse/SPARK-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4874: --- Component/s: Web UI Spark Core Report number of records read/written in a task --- Key: SPARK-4874 URL: https://issues.apache.org/jira/browse/SPARK-4874 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Kostas Sakellis Assignee: Kostas Sakellis Fix For: 1.3.0 This metric will help us find key skew using the WebUI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4874) Report number of records read/written in a task
[ https://issues.apache.org/jira/browse/SPARK-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4874. Resolution: Fixed Fix Version/s: 1.3.0 Target Version/s: 1.3.0 Report number of records read/written in a task --- Key: SPARK-4874 URL: https://issues.apache.org/jira/browse/SPARK-4874 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Kostas Sakellis Assignee: Kostas Sakellis Fix For: 1.3.0 This metric will help us find key skew using the WebUI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4541) Add --version to spark-submit
[ https://issues.apache.org/jira/browse/SPARK-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4541: - Component/s: Spark Submit Add --version to spark-submit - Key: SPARK-4541 URL: https://issues.apache.org/jira/browse/SPARK-4541 Project: Spark Issue Type: Improvement Components: Spark Submit Reporter: Arun Ahuja Priority: Minor In a lot of the release testing and discussion on JIRA, the question of which Spark version users are running and how to verify it comes up. Can we 1) add a flag to spark-submit that prints the version, and 2) log the version/last commit in the logs as well? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310130#comment-14310130 ] Sandy Ryza edited comment on SPARK-4550 at 2/6/15 11:13 PM: I got a working prototype and benchmarked the ExternalSorter changes on my laptop. Each run inserts a bunch of records, each a (Int, (10-character string, Int)) tuple, into an ExternalSorter and then calls writePartitionedFile. The reported memory size is the sum of the shuffle bytes spilled (mem) metric and the remaining size of the collection after insertion has completed. Results are averaged over three runs. Keep in mind that the primary goal here is to reduce GC pressure, so any speed improvements are icing. ||Number of records||Storing as Serialized||Memory Size||Number of Spills||Insert Time (ms)||Write Time (ms)||Total Time|| |1 million|false|194923217|0|1123|3442|4566| |1 million|true|48694072|0|1315|2652|3967| |10 million|false|2050514159|3|26723|17418|44141| |10 million|true|613614392|1|16501|17151|33652| |10 million|false|10166122563|17|101831|89960|191791| |10 million|true|3067937592|5|76801|78361|155161| was (Author: sandyr): I got a working prototype and benchmarked the ExternalSorter changes on my laptop. Each run inserts a bunch of records, each a (Int, (10-character string, Int)) tuple, into an ExternalSorter and then calls writePartitionedFile. The reported memory size is the sum of the shuffle bytes spilled (mem) metric and the remaining size of the collection after insertion has completed. Results are averaged over three runs. Keep in mind that the primary goal here is to reduce GC pressure, so any speed improvements are icing. ||Number of records||Storing as Serialized||Memory Size||Number of Spills||Insert Time(ms)||Write Time (ms)||Total Time|| |1 million|false|194923217|0|1123|3442|4566| |1 million|true|48694072|0|1315|2652|3967| |10 million|false|2050514159|3|26723|17418|44141| |10 million|true|613614392|1|16501|17151|33652| |10 million|false|10166122563|17|101831|89960|191791| |10 million|true|3067937592|5|76801|78361|155161| In sort-based shuffle, store map outputs in serialized form --- Key: SPARK-4550 URL: https://issues.apache.org/jira/browse/SPARK-4550 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical Attachments: SPARK-4550-design-v1.pdf One drawback with sort-based shuffle compared to hash-based shuffle is that it ends up storing many more java objects in memory. If Spark could store map outputs in serialized form, it could * spill less often because the serialized form is more compact * reduce GC pressure This will only work when the serialized representations of objects are independent from each other and occupy contiguous segments of memory. E.g. when Kryo reference tracking is left on, objects may contain pointers to objects farther back in the stream, which means that the sort can't relocate objects without corrupting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310130#comment-14310130 ] Sandy Ryza edited comment on SPARK-4550 at 2/6/15 11:13 PM: I got a working prototype and benchmarked the ExternalSorter changes on my laptop. Each run inserts a bunch of records, each a (Int, (10-character string, Int)) tuple, into an ExternalSorter and then calls writePartitionedFile. The reported memory size is the sum of the shuffle bytes spilled (mem) metric and the remaining size of the collection after insertion has completed. Results are averaged over three runs. Keep in mind that the primary goal here is to reduce GC pressure, so any speed improvements are icing. ||Number of Records||Storing as Serialized||Memory Size||Number of Spills||Insert Time (ms)||Write Time (ms)||Total Time|| |1 million|false|194923217|0|1123|3442|4566| |1 million|true|48694072|0|1315|2652|3967| |10 million|false|2050514159|3|26723|17418|44141| |10 million|true|613614392|1|16501|17151|33652| |10 million|false|10166122563|17|101831|89960|191791| |10 million|true|3067937592|5|76801|78361|155161| was (Author: sandyr): I got a working prototype and benchmarked the ExternalSorter changes on my laptop. Each run inserts a bunch of records, each a (Int, (10-character string, Int)) tuple, into an ExternalSorter and then calls writePartitionedFile. The reported memory size is the sum of the shuffle bytes spilled (mem) metric and the remaining size of the collection after insertion has completed. Results are averaged over three runs. Keep in mind that the primary goal here is to reduce GC pressure, so any speed improvements are icing. ||Number of records||Storing as Serialized||Memory Size||Number of Spills||Insert Time (ms)||Write Time (ms)||Total Time|| |1 million|false|194923217|0|1123|3442|4566| |1 million|true|48694072|0|1315|2652|3967| |10 million|false|2050514159|3|26723|17418|44141| |10 million|true|613614392|1|16501|17151|33652| |10 million|false|10166122563|17|101831|89960|191791| |10 million|true|3067937592|5|76801|78361|155161| In sort-based shuffle, store map outputs in serialized form --- Key: SPARK-4550 URL: https://issues.apache.org/jira/browse/SPARK-4550 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical Attachments: SPARK-4550-design-v1.pdf One drawback with sort-based shuffle compared to hash-based shuffle is that it ends up storing many more java objects in memory. If Spark could store map outputs in serialized form, it could * spill less often because the serialized form is more compact * reduce GC pressure This will only work when the serialized representations of objects are independent from each other and occupy contiguous segments of memory. E.g. when Kryo reference tracking is left on, objects may contain pointers to objects farther back in the stream, which means that the sort can't relocate objects without corrupting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5658) Finalize DDL and write support APIs
Yin Huai created SPARK-5658: --- Summary: Finalize DDL and write support APIs Key: SPARK-5658 URL: https://issues.apache.org/jira/browse/SPARK-5658 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5658) Finalize DDL and write support APIs
[ https://issues.apache.org/jira/browse/SPARK-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310194#comment-14310194 ] Apache Spark commented on SPARK-5658: - User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/4446 Finalize DDL and write support APIs --- Key: SPARK-5658 URL: https://issues.apache.org/jira/browse/SPARK-5658 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4983) Add sleep() before tagging EC2 instances to allow instance metadata to propagate
[ https://issues.apache.org/jira/browse/SPARK-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4983: -- Assignee: Gen TANG Add sleep() before tagging EC2 instances to allow instance metadata to propagate Key: SPARK-4983 URL: https://issues.apache.org/jira/browse/SPARK-4983 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 1.2.0 Reporter: Nicholas Chammas Assignee: Gen TANG Priority: Minor Labels: starter Fix For: 1.3.0, 1.2.2 We launch EC2 instances in {{spark-ec2}} and then immediately tag them in a separate boto call. Sometimes, EC2 doesn't get enough time to propagate information about the just-launched instances, so when we go to tag them we get a server that doesn't know about them yet. This yields the following type of error: {code} Launching instances... Launched 1 slaves in us-east-1b, regid = r-cf780321 Launched master in us-east-1b, regid = r-da7e0534 Traceback (most recent call last): File "./ec2/spark_ec2.py", line 1284, in <module> main() File "./ec2/spark_ec2.py", line 1276, in main real_main() File "./ec2/spark_ec2.py", line 1122, in real_main (master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name) File "./ec2/spark_ec2.py", line 646, in launch_cluster value='{cn}-master-{iid}'.format(cn=cluster_name, iid=master.id)) File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/ec2object.py", line 80, in add_tag self.add_tags({key: value}, dry_run) File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/ec2object.py", line 97, in add_tags dry_run=dry_run File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 4202, in create_tags return self.get_status('CreateTags', params, verb='POST') File ".../spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1223, in get_status raise self.ResponseError(response.status, response.reason, body) boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?> <Response><Errors><Error><Code>InvalidInstanceID.NotFound</Code><Message>The instance ID 'i-585219a6' does not exist</Message></Error></Errors><RequestID>b9f1ad6e-59b9-47fd-a693-527be1f779eb</RequestID></Response> {code} The solution is to tag the instances in the same call that launches them, or less desirably, tag the instances after some short wait. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4983) Add sleep() before tagging EC2 instances to allow instance metadata to propagate
[ https://issues.apache.org/jira/browse/SPARK-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4983: -- Summary: Add sleep() before tagging EC2 instances to allow instance metadata to propagate (was: Tag EC2 instances in the same call that launches them) Add sleep() before tagging EC2 instances to allow instance metadata to propagate Key: SPARK-4983 URL: https://issues.apache.org/jira/browse/SPARK-4983 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 1.2.0 Reporter: Nicholas Chammas Priority: Minor Labels: starter Fix For: 1.3.0, 1.2.2 We launch EC2 instances in {{spark-ec2}} and then immediately tag them in a separate boto call. Sometimes, EC2 doesn't get enough time to propagate information about the just-launched instances, so when we go to tag them we get a server that doesn't know about them yet. This yields the following type of error: {code} Launching instances... Launched 1 slaves in us-east-1b, regid = r-cf780321 Launched master in us-east-1b, regid = r-da7e0534 Traceback (most recent call last): File "./ec2/spark_ec2.py", line 1284, in <module> main() File "./ec2/spark_ec2.py", line 1276, in main real_main() File "./ec2/spark_ec2.py", line 1122, in real_main (master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name) File "./ec2/spark_ec2.py", line 646, in launch_cluster value='{cn}-master-{iid}'.format(cn=cluster_name, iid=master.id)) File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/ec2object.py", line 80, in add_tag self.add_tags({key: value}, dry_run) File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/ec2object.py", line 97, in add_tags dry_run=dry_run File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 4202, in create_tags return self.get_status('CreateTags', params, verb='POST') File ".../spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1223, in get_status raise self.ResponseError(response.status, response.reason, body) boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?> <Response><Errors><Error><Code>InvalidInstanceID.NotFound</Code><Message>The instance ID 'i-585219a6' does not exist</Message></Error></Errors><RequestID>b9f1ad6e-59b9-47fd-a693-527be1f779eb</RequestID></Response> {code} The solution is to tag the instances in the same call that launches them, or less desirably, tag the instances after some short wait. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5369) remove allocatedHostToContainersMap.synchronized in YarnAllocator
[ https://issues.apache.org/jira/browse/SPARK-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5369: - Component/s: YARN remove allocatedHostToContainersMap.synchronized in YarnAllocator - Key: SPARK-5369 URL: https://issues.apache.org/jira/browse/SPARK-5369 Project: Spark Issue Type: Bug Components: YARN Reporter: Lianhui Wang As SPARK-1714 mentioned, because YarnAllocator.allocateResources is a synchronized method, we can remove the allocatedHostToContainersMap.synchronized block inside YarnAllocator.allocateResources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
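Since allocateResources is declared synchronized, every caller already holds the allocator's monitor when touching the map, so the inner lock is pure overhead. A simplified model of the locking shape (this is not the real YarnAllocator):
{code}
import scala.collection.mutable

class AllocatorSketch {
  private val allocatedHostToContainersMap =
    new mutable.HashMap[String, mutable.Set[String]]()

  // The method-level synchronized already serializes all access to the
  // map, so the former allocatedHostToContainersMap.synchronized { ... }
  // wrapper inside it can simply be dropped.
  def allocateResources(host: String, containerId: String): Unit = synchronized {
    val containers =
      allocatedHostToContainersMap.getOrElseUpdate(host, mutable.Set.empty[String])
    containers += containerId
  }
}
{code}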
[jira] [Updated] (SPARK-4360) task only execute on one node when spark on yarn
[ https://issues.apache.org/jira/browse/SPARK-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4360: - Component/s: YARN task only execute on one node when spark on yarn Key: SPARK-4360 URL: https://issues.apache.org/jira/browse/SPARK-4360 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.2 Reporter: seekerak hadoop version: hadoop 2.0.3-alpha spark version: 1.0.2 When I run Spark jobs on YARN, all the tasks run on only one node. My cluster has 4 nodes and there are 3 executors, but only one of them gets tasks; the others get none. My command is like this: /opt/hadoopcluster/spark-1.0.2-bin-hadoop2/bin/spark-submit --class org.sr.scala.Spark_LineCount_G0 --executor-memory 2G --num-executors 12 --master yarn-cluster /home/Spark_G0.jar /data /output/ou_1 Does anyone know why? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2971) Orphaned YARN ApplicationMaster lingers forever
[ https://issues.apache.org/jira/browse/SPARK-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-2971: - Component/s: YARN Orphaned YARN ApplicationMaster lingers forever --- Key: SPARK-2971 URL: https://issues.apache.org/jira/browse/SPARK-2971 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.2 Environment: Python yarn client mode, Cloudera 5.1.0 on Ubuntu precise Reporter: Shay Rojansky We have cases where if CTRL-C is hit during a Spark job startup, a YARN ApplicationMaster is created but cannot connect to the driver (presumably because the driver has terminated). Once an AM enters this state it never exits it, and has to be manually killed in YARN. Here's an excerpt from the AM logs: {noformat} SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/yarn/nm/usercache/roji/filecache/40/spark-assembly-1.0.2-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 14/08/11 16:29:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/08/11 16:29:39 INFO SecurityManager: Changing view acls to: roji 14/08/11 16:29:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(roji) 14/08/11 16:29:40 INFO Slf4jLogger: Slf4jLogger started 14/08/11 16:29:40 INFO Remoting: Starting remoting 14/08/11 16:29:40 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkyar...@g024.grid.eaglerd.local:34075] 14/08/11 16:29:40 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkyar...@g024.grid.eaglerd.local:34075] 14/08/11 16:29:40 INFO RMProxy: Connecting to ResourceManager at master.grid.eaglerd.local/192.168.41.100:8030 14/08/11 16:29:40 INFO ExecutorLauncher: ApplicationAttemptId: appattempt_1407759736957_0014_01 14/08/11 16:29:40 INFO ExecutorLauncher: Registering the ApplicationMaster 14/08/11 16:29:40 INFO ExecutorLauncher: Waiting for Spark driver to be reachable. 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ... 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ... 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ... 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ... 14/08/11 16:29:40 ERROR ExecutorLauncher: Failed to connect to driver at master.grid.eaglerd.local:44911, retrying ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
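A natural mitigation for this failure mode, sketched under the assumption that the AM's wait loop can simply be bounded (the ticket does not record the eventual fix), is to give up after a maximum number of connection attempts and report failure to YARN instead of retrying forever:
{code}
// Sketch: bound the driver-connection retries so an orphaned AM
// eventually unregisters instead of lingering. connect, maxRetries and
// the wait interval are illustrative stand-ins, not Spark's real code.
def waitForSparkDriver(connect: () => Boolean, maxRetries: Int = 100): Boolean = {
  var attempts = 0
  while (attempts < maxRetries) {
    if (connect()) return true
    attempts += 1
    Thread.sleep(100L)
  }
  // At this point the AM should unregister with a FAILED status so the
  // ResourceManager does not keep the attempt alive indefinitely.
  false
}
{code}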
[jira] [Updated] (SPARK-4346) YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication
[ https://issues.apache.org/jira/browse/SPARK-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4346: - Component/s: YARN YarnClientSchedulerBack.asyncMonitorApplication should be common with Client.monitorApplication --- Key: SPARK-4346 URL: https://issues.apache.org/jira/browse/SPARK-4346 Project: Spark Issue Type: Improvement Components: YARN Reporter: Thomas Graves The YarnClientSchedulerBackend.asyncMonitorApplication routine should move into ClientBase and be made common with monitorApplication. Make sure stop is handled properly. See discussion on https://github.com/apache/spark/pull/3143 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
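One plausible shape for the shared routine, assuming a single polling loop parameterized by what to do on exit (the types and names here are illustrative; the real design is in the linked PR discussion):
{code}
sealed trait AppState
case object AppRunning extends AppState
case object AppFinished extends AppState
case object AppFailed extends AppState

// A common polling loop that both the blocking Client.monitorApplication
// and YarnClientSchedulerBackend's async monitor thread could share;
// stop handling is concentrated in the single onExit callback.
def monitorApplication(
    currentState: () => AppState,
    onExit: AppState => Unit,
    intervalMs: Long = 1000L): Unit = {
  var state = currentState()
  while (state == AppRunning) {
    Thread.sleep(intervalMs)
    state = currentState()
  }
  onExit(state)
}
{code}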
[jira] [Updated] (SPARK-4941) Yarn cluster mode does not upload all needed jars to driver node (Spark 1.2.0)
[ https://issues.apache.org/jira/browse/SPARK-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4941: - Component/s: YARN Yarn cluster mode does not upload all needed jars to driver node (Spark 1.2.0) -- Key: SPARK-4941 URL: https://issues.apache.org/jira/browse/SPARK-4941 Project: Spark Issue Type: Bug Components: YARN Reporter: Gurpreet Singh I am specifying additional JARs and a config XML file with the --jars and --files options, to be uploaded to the driver in the following spark-submit command. However, they are not getting uploaded, and this results in job failure. It was working in the Spark 1.0.2 build. Spark build being used: spark-1.2.0.tgz $SPARK_HOME/bin/spark-submit \ --class com.ebay.inc.scala.testScalaXML \ --driver-class-path /apache/hadoop/share/hadoop/common/hadoop-common-2.4.1--2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop/share/hadoop/common/lib/hadoop--0.1--2.jar:/apache/hive/lib/mysql-connector-java-5.0.8-bin.jar:/apache/hadoop/share/hadoop/common/lib/guava-11.0.2.jar \ --master yarn \ --deploy-mode cluster \ --num-executors 3 \ --driver-memory 1G \ --executor-memory 1G \ /export/home/b_incdata_rw/gurpreetsingh/jar/testscalaxml_2.11-1.0.jar /export/home/b_incdata_rw/gurpreetsingh/sqlFramework.xml next_gen_linking \ --queue hdmi-spark \ --jars /export/home/b_incdata_rw/gurpreetsingh/jar/datanucleus-api-jdo-3.2.1.jar,/export/home/b_incdata_rw/gurpreetsingh/jar/datanucleus-core-3.2.2.jar,/export/home/b_incdata_rw/gurpreetsingh/jar/datanucleus-rdbms-3.2.1.jar,/apache/hive/lib/mysql-connector-java-5.0.8-bin.jar,/apache/hadoop/share/hadoop/common/lib/hadoop--0.1--2.jar,/apache/hadoop/share/hadoop/common/lib/hadoop-lzo-0.6.0.jar,/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1--2.jar\ --files /export/home/b_incdata_rw/gurpreetsingh/spark-1.0.2-bin-2.4.1/conf/hive-site.xml Spark assembly has been built with Hive, including Datanucleus jars on classpath 14/12/22 23:00:17 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 14/12/22 23:00:17 INFO yarn.Client: Requesting a new application from cluster with 2026 NodeManagers 14/12/22 23:00:17 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (16384 MB per container) 14/12/22 23:00:17 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead 14/12/22 23:00:17 INFO yarn.Client: Setting up container launch context for our AM 14/12/22 23:00:17 INFO yarn.Client: Preparing resources for our AM container 14/12/22 23:00:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/12/22 23:00:18 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 
14/12/22 23:00:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 6623380 for b_incdata_rw on 10.115.201.75:8020 14/12/22 23:00:21 INFO yarn.Client: Uploading resource file:/home/b_incdata_rw/gurpreetsingh/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> hdfs://-nn.vip.xxx.com:8020/user/b_incdata_rw/.sparkStaging/application_1419242629195_8432/spark-assembly-1.2.0-hadoop2.4.0.jar 14/12/22 23:00:24 INFO yarn.Client: Uploading resource file:/export/home/b_incdata_rw/gurpreetsingh/jar/firstsparkcode_2.11-1.0.jar -> hdfs://-nn.vip.xxx.com:8020/user/b_incdata_rw/.sparkStaging/application_1419242629195_8432/firstsparkcode_2.11-1.0.jar 14/12/22 23:00:25 INFO yarn.Client: Setting up the launch environment for our AM container -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
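One detail worth checking in the command quoted above (an observation about spark-submit's argument parsing, not a confirmed diagnosis of this ticket): spark-submit stops reading its own options at the primary application JAR, and everything after the JAR is passed to the application as arguments. In the reported command, --queue, --jars and --files come after testscalaxml_2.11-1.0.jar, so spark-submit would never see them. Reordered, with the long JAR list elided:
{code}
$SPARK_HOME/bin/spark-submit \
  --class com.ebay.inc.scala.testScalaXML \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --driver-memory 1G \
  --executor-memory 1G \
  --queue hdmi-spark \
  --jars ... \
  --files /export/home/b_incdata_rw/gurpreetsingh/spark-1.0.2-bin-2.4.1/conf/hive-site.xml \
  /export/home/b_incdata_rw/gurpreetsingh/jar/testscalaxml_2.11-1.0.jar \
  /export/home/b_incdata_rw/gurpreetsingh/sqlFramework.xml next_gen_linking
{code}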
[jira] [Resolved] (SPARK-4492) Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
[ https://issues.apache.org/jira/browse/SPARK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4492. -- Resolution: Not a Problem Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil -- Key: SPARK-4492 URL: https://issues.apache.org/jira/browse/SPARK-4492 Project: Spark Issue Type: Bug Reporter: sam When I follow the example here https://spark.apache.org/docs/1.0.2/quick-start.html and run with java -cp my.jar my.main.Class with master set to yarn-client, I get the exception below. Exception in thread "main" java.lang.ExceptionInInitializerError at org.apache.spark.SparkContext.<init>(SparkContext.scala:228) at com.barclays.SimpleApp$.main(SimpleApp.scala:11) at com.barclays.SimpleApp.main(SimpleApp.scala) Caused by: org.apache.spark.SparkException: Unable to load YARN support at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:106) at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:101) at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala) ... 3 more Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:169) at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:102) ... 5 more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
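The "Not a Problem" resolution is consistent with a launch-method issue: plain java -cp my.jar puts only the application JAR on the classpath, while the YARN support classes live in the YARN-enabled Spark assembly that spark-submit arranges onto the classpath. A typical invocation for the same program would likely be something like (paths and class name taken from the report, otherwise illustrative):
{code}
bin/spark-submit --master yarn-client --class my.main.Class my.jar
{code}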
[jira] [Updated] (SPARK-5259) Fix endless retry stage by add task equal() and hashcode() to avoid stage.pendingTasks not empty while stage map output is available
[ https://issues.apache.org/jira/browse/SPARK-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5259: - Fix Version/s: (was: 1.2.0) Fix endless retry stage by add task equal() and hashcode() to avoid stage.pendingTasks not empty while stage map output is available - Key: SPARK-5259 URL: https://issues.apache.org/jira/browse/SPARK-5259 Project: Spark Issue Type: Bug Affects Versions: 1.1.1, 1.2.0 Reporter: SuYan 1. While a shuffle stage is retried, there may be 2 TaskSets running; call them taskSet0.0 and taskSet0.1. taskSet0.1 will re-run taskSet0.0's uncompleted tasks, and if taskSet0.0 finishes all the tasks that taskSet0.1 has not yet completed but that cover its partitions, then stage.isAvailable becomes true: {code} def isAvailable: Boolean = { if (!isShuffleMap) { true } else { numAvailableOutputs == numPartitions } } {code} But stage.pendingTasks is not empty, which prevents registering the map statuses in mapOutputTracker. The reason: when a task completes successfully, pendingTasks -= task removes the Task by reference, because Task does not override hashcode() and equals(), whereas numAvailableOutputs is counted by partition ID. Here is a test case that demonstrates the problem: {code} test("Make sure mapStage.pendingtasks is set()" + " while MapStage.isAvailable is true while stage was retry") { val firstRDD = new MyRDD(sc, 6, Nil) val firstShuffleDep = new ShuffleDependency(firstRDD, null) val firstShuyffleId = firstShuffleDep.shuffleId val shuffleMapRdd = new MyRDD(sc, 6, List(firstShuffleDep)) val shuffleDep = new ShuffleDependency(shuffleMapRdd, null) val shuffleId = shuffleDep.shuffleId val reduceRdd = new MyRDD(sc, 2, List(shuffleDep)) submit(reduceRdd, Array(0, 1)) complete(taskSets(0), Seq( (Success, makeMapStatus("hostB", 1)), (Success, makeMapStatus("hostB", 2)), (Success, makeMapStatus("hostC", 3)), (Success, makeMapStatus("hostB", 4)), (Success, makeMapStatus("hostB", 5)), (Success, makeMapStatus("hostC", 6)) )) complete(taskSets(1), Seq( (Success, makeMapStatus("hostA", 1)), (Success, makeMapStatus("hostB", 2)), (Success, makeMapStatus("hostA", 1)), (Success, makeMapStatus("hostB", 2)), (Success, makeMapStatus("hostA", 1)) )) runEvent(ExecutorLost("exec-hostA")) runEvent(CompletionEvent(taskSets(1).tasks(0), Resubmitted, null, null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(2), Resubmitted, null, null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(0), FetchFailed(null, firstShuyffleId, -1, 0, "Fetch Mata data failed"), null, null, null, null)) scheduler.resubmitFailedStages() runEvent(CompletionEvent(taskSets(1).tasks(0), Success, makeMapStatus("hostC", 1), null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(2), Success, makeMapStatus("hostC", 1), null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(4), Success, makeMapStatus("hostC", 1), null, null, null)) runEvent(CompletionEvent(taskSets(1).tasks(5), Success, makeMapStatus("hostB", 2), null, null, null)) val stage = scheduler.stageIdToStage(taskSets(1).stageId) assert(stage.attemptId == 2) assert(stage.isAvailable) assert(stage.pendingTasks.size == 0) } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
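The reporter's proposed fix in miniature: give Task value-based equality so that removing a completed task from pendingTasks also matches a re-submitted copy of the same partition, keeping pendingTasks consistent with the per-partition numAvailableOutputs count. A stripped-down sketch (the real Task class carries far more state):
{code}
class Task(val stageId: Int, val partitionId: Int) {
  // Two attempts for the same (stage, partition) should count as "the same
  // task" for bookkeeping such as pendingTasks -= task.
  override def equals(other: Any): Boolean = other match {
    case t: Task => t.stageId == stageId && t.partitionId == partitionId
    case _ => false
  }
  override def hashCode(): Int = 31 * stageId + partitionId
}
{code}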
[jira] [Commented] (SPARK-4900) MLlib SingularValueDecomposition ARPACK IllegalStateException
[ https://issues.apache.org/jira/browse/SPARK-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309981#comment-14309981 ] Sean Owen commented on SPARK-4900: -- Do you have any more info, like how to reproduce this? What were you computing? MLlib SingularValueDecomposition ARPACK IllegalStateException -- Key: SPARK-4900 URL: https://issues.apache.org/jira/browse/SPARK-4900 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.1.1, 1.2.0 Environment: Ubuntu 14.10, Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode) spark local mode Reporter: Mike Beyer java.lang.reflect.InvocationTargetException ... Caused by: java.lang.IllegalStateException: ARPACK returns non-zero info = 3 Please refer ARPACK user guide for error message. at org.apache.spark.mllib.linalg.EigenValueDecomposition$.symmetricEigs(EigenValueDecomposition.scala:120) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:235) at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:171) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
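For background (general ARPACK behavior, not information from this report): info = 3 means no shifts could be applied during a cycle of the implicitly restarted Arnoldi iteration, and the ARPACK users' guide suggests increasing NCV relative to NEV. In computeSVD terms that loosely translates to requesting fewer singular values k relative to the matrix's column count. A minimal call for experimenting with k:
{code}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// mat is assumed to be built elsewhere; if ARPACK fails to converge at
// the requested rank, retrying with a smaller k leaves it more room for
// its Krylov subspace (NCV > NEV).
def trySvd(mat: RowMatrix, k: Int) =
  mat.computeSVD(k, computeU = true)
{code}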
[jira] [Resolved] (SPARK-5525) [SPARK][SQL]
[ https://issues.apache.org/jira/browse/SPARK-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5525. -- Resolution: Invalid [SPARK][SQL] Key: SPARK-5525 URL: https://issues.apache.org/jira/browse/SPARK-5525 Project: Spark Issue Type: Bug Components: SQL Reporter: xukun -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2722) Mechanism for escaping spark configs is not consistent
[ https://issues.apache.org/jira/browse/SPARK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2722: - Component/s: Spark Core Mechanism for escaping spark configs is not consistent -- Key: SPARK-2722 URL: https://issues.apache.org/jira/browse/SPARK-2722 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.1 Reporter: Andrew Or Currently, you can specify a spark config in spark-defaults.conf as follows: {code} spark.magic "Mr. Johnson" {code} and this will preserve the double quotes as part of the string. Naturally, if you want to do the equivalent in spark.*.extraJavaOptions, you would use the following: {code} spark.executor.extraJavaOptions -Dmagic=\"Mr. Johnson\" {code} However, this fails because the backslashes go away and it tries to interpret "Johnson" as the main class argument. Instead, you have to do the following: {code} spark.executor.extraJavaOptions -Dmagic=\\\"Mr. Johnson\\\" {code} which is not super intuitive. Note that this only applies to standalone mode. In YARN it's not even possible to use quoted strings in config values (SPARK-2718). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5601) Make streaming algorithms Java-friendly
[ https://issues.apache.org/jira/browse/SPARK-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5601. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4432 [https://github.com/apache/spark/pull/4432] Make streaming algorithms Java-friendly --- Key: SPARK-5601 URL: https://issues.apache.org/jira/browse/SPARK-5601 Project: Spark Issue Type: Improvement Components: MLlib, Streaming Reporter: Xiangrui Meng Assignee: Xiangrui Meng Fix For: 1.3.0 Streaming algorithms take DStream. We should also support JavaDStream and JavaPairDStream for Java users. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5586) Automatically provide sqlContext in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5586. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4387 [https://github.com/apache/spark/pull/4387] Automatically provide sqlContext in Spark shell --- Key: SPARK-5586 URL: https://issues.apache.org/jira/browse/SPARK-5586 Project: Spark Issue Type: Improvement Components: Spark Shell, SQL Reporter: Patrick Wendell Assignee: shengli Priority: Blocker Fix For: 1.3.0 A simple patch, but we should create a sqlContext (and, if supported by the build, a Hive context) in the Spark shell when it's created, and import the DSL. We can just call it sqlContext. This would save us so much time writing code examples :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
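What the change looks like from the user's side once merged: the shell predefines sqlContext, so queries run without any setup. A minimal sketch (my_table is an illustrative name, not something the shell creates):
{code}
scala> val df = sqlContext.sql("SELECT * FROM my_table LIMIT 10")
scala> df.collect().foreach(println)
{code}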
[jira] [Resolved] (SPARK-4983) Tag EC2 instances in the same call that launches them
[ https://issues.apache.org/jira/browse/SPARK-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4983. --- Resolution: Fixed Fix Version/s: 1.2.2 1.3.0 Issue resolved by pull request 3986 [https://github.com/apache/spark/pull/3986] Tag EC2 instances in the same call that launches them - Key: SPARK-4983 URL: https://issues.apache.org/jira/browse/SPARK-4983 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 1.2.0 Reporter: Nicholas Chammas Priority: Minor Labels: starter Fix For: 1.3.0, 1.2.2 We launch EC2 instances in {{spark-ec2}} and then immediately tag them in a separate boto call. Sometimes, EC2 doesn't get enough time to propagate information about the just-launched instances, so when we go to tag them we get a server that doesn't know about them yet. This yields the following type of error: {code} Launching instances... Launched 1 slaves in us-east-1b, regid = r-cf780321 Launched master in us-east-1b, regid = r-da7e0534 Traceback (most recent call last): File "./ec2/spark_ec2.py", line 1284, in <module> main() File "./ec2/spark_ec2.py", line 1276, in main real_main() File "./ec2/spark_ec2.py", line 1122, in real_main (master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name) File "./ec2/spark_ec2.py", line 646, in launch_cluster value='{cn}-master-{iid}'.format(cn=cluster_name, iid=master.id)) File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/ec2object.py", line 80, in add_tag self.add_tags({key: value}, dry_run) File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/ec2object.py", line 97, in add_tags dry_run=dry_run File ".../spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 4202, in create_tags return self.get_status('CreateTags', params, verb='POST') File ".../spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1223, in get_status raise self.ResponseError(response.status, response.reason, body) boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?> <Response><Errors><Error><Code>InvalidInstanceID.NotFound</Code><Message>The instance ID 'i-585219a6' does not exist</Message></Error></Errors><RequestID>b9f1ad6e-59b9-47fd-a693-527be1f779eb</RequestID></Response> {code} The solution is to tag the instances in the same call that launches them, or less desirably, tag the instances after some short wait. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-706) Failures in block manager put leads to task hanging
[ https://issues.apache.org/jira/browse/SPARK-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-706. - Resolution: Cannot Reproduce Failures in block manager put leads to task hanging --- Key: SPARK-706 URL: https://issues.apache.org/jira/browse/SPARK-706 Project: Spark Issue Type: Bug Components: Block Manager Affects Versions: 0.6.0, 0.6.1, 0.7.0, 0.6.2 Reporter: Reynold Xin Reported in this thread: https://groups.google.com/forum/?fromgroups=#!topic/shark-users/Q_SiIDzVtZw The following exception in block manager leaves the block marked as pending. {code} 13/02/26 06:14:56 ERROR executor.Executor: Exception in task ID 39 com.esotericsoftware.kryo.SerializationException: Buffer limit exceeded writing object of type: shark.ColumnarWritable at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:492) at spark.KryoSerializationStream.writeObject(KryoSerializer.scala:78) at spark.serializer.SerializationStream$class.writeAll(Serializer.scala:58) at spark.KryoSerializationStream.writeAll(KryoSerializer.scala:73) at spark.storage.DiskStore.putValues(DiskStore.scala:63) at spark.storage.BlockManager.dropFromMemory(BlockManager.scala:779) at spark.storage.MemoryStore.tryToPut(MemoryStore.scala:162) at spark.storage.MemoryStore.putValues(MemoryStore.scala:57) at spark.storage.BlockManager.put(BlockManager.scala:582) at spark.CacheTracker.getOrCompute(CacheTracker.scala:215) at spark.RDD.iterator(RDD.scala:159) at spark.scheduler.ResultTask.run(ResultTask.scala:18) at spark.executor.Executor$TaskRunner.run(Executor.scala:76) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) {code} When the block is read, the task is stuck in BlockInfo.waitForReady(). We should propagate the error back to the master instead of hanging the slave node. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
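The propagation the ticket asks for amounts to completing the block's waiters with a failure instead of leaving them parked. A self-contained sketch of that idea (names modeled on, but not copied from, Spark's BlockInfo):
{code}
class BlockInfoSketch {
  private var ready = false
  private var failed = false

  def markReady(): Unit = synchronized { ready = true; notifyAll() }

  // Called from the put path's exception handler so readers do not
  // block forever on a block that will never materialize.
  def markFailed(): Unit = synchronized { failed = true; notifyAll() }

  // Returns false when the put failed; the caller then reports the
  // error upstream rather than hanging in waitForReady().
  def waitForReady(): Boolean = synchronized {
    while (!ready && !failed) wait()
    !failed
  }
}
{code}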
[jira] [Closed] (SPARK-5600) Sort order of unfinished apps can be wrong in History Server
[ https://issues.apache.org/jira/browse/SPARK-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5600. Resolution: Fixed Assignee: Marcelo Vanzin Target Version/s: 1.3.0 Sort order of unfinished apps can be wrong in History Server Key: SPARK-5600 URL: https://issues.apache.org/jira/browse/SPARK-5600 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Fix For: 1.3.0 The code that merges new logs with old logs sorts applications by their end time only. Unfinished apps all have the same end time (-1), so the sort order ends up being undefined. This was uncovered by the attempt to fix SPARK-5345 (https://github.com/apache/spark/pull/4133). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
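A minimal way to make the order well-defined, assuming the -1 sentinel for unfinished apps described above (the actual patch may do this differently): keep end time as the primary key and fall back to start time so unfinished apps no longer compare equal:
{code}
case class AppSummary(id: String, startTime: Long, endTime: Long) // endTime == -1 while running

// Newest first; the startTime tiebreaker makes the order deterministic
// for unfinished apps, which all share endTime == -1.
def sortApps(apps: Seq[AppSummary]): Seq[AppSummary] =
  apps.sortBy(a => (-a.endTime, -a.startTime))
{code}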
[jira] [Closed] (SPARK-4994) Cleanup removed executors' ShuffleInfo in yarn shuffle service
[ https://issues.apache.org/jira/browse/SPARK-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4994. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Lianhui Wang Target Version/s: 1.3.0 Cleanup removed executors' ShuffleInfo in yarn shuffle service -- Key: SPARK-4994 URL: https://issues.apache.org/jira/browse/SPARK-4994 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Lianhui Wang Assignee: Lianhui Wang Fix For: 1.3.0 When an application completes, YARN's NodeManager can remove the application's local dirs, but the metadata of all the completed application's executors has not been removed. This forces the YARN shuffle service to use much more memory to store executors' ShuffleInfo, so this metadata needs to be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
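The cleanup in miniature, with a deliberately generic registry (the map shape, key type and method name are illustrative, not the shuffle service's real internals):
{code}
import scala.collection.concurrent.TrieMap

case class AppExecId(appId: String, execId: String)

object ShuffleRegistrySketch {
  // Per-executor shuffle metadata, keyed by (application, executor).
  val executors = TrieMap.empty[AppExecId, AnyRef]

  // Invoked when YARN reports the application finished: drop every entry
  // for that app so memory does not grow with completed applications.
  def applicationRemoved(appId: String): Unit =
    executors.keys.filter(_.appId == appId).foreach(executors.remove)
}
{code}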
[jira] [Commented] (SPARK-5531) Spark download .tgz file does not get unpacked
[ https://issues.apache.org/jira/browse/SPARK-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310147#comment-14310147 ] DeepakVohra commented on SPARK-5531: Thanks for updating the download links. Spark download .tgz file does not get unpacked -- Key: SPARK-5531 URL: https://issues.apache.org/jira/browse/SPARK-5531 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: Linux Reporter: DeepakVohra The spark-1.2.0-bin-cdh4.tgz file downloaded from http://spark.apache.org/downloads.html does not get unpacked. tar xvf spark-1.2.0-bin-cdh4.tgz gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2996) Standalone and Yarn have different settings for adding the user classpath first
[ https://issues.apache.org/jira/browse/SPARK-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2996: - Priority: Major (was: Minor) Standalone and Yarn have different settings for adding the user classpath first --- Key: SPARK-2996 URL: https://issues.apache.org/jira/browse/SPARK-2996 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 1.0.0 Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Standalone uses spark.files.userClassPathFirst while Yarn uses spark.yarn.user.classpath.first. Adding support for the former in Yarn should be pretty trivial. Don't know if Mesos has anything similar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
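The divergence in spark-defaults.conf terms, using exactly the two keys named in the description:
{code}
# standalone mode honors this:
spark.files.userClassPathFirst true
# while YARN reads a differently named setting for the same behavior:
spark.yarn.user.classpath.first true
{code}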
[jira] [Commented] (SPARK-5531) Spark download .tgz file does not get unpacked
[ https://issues.apache.org/jira/browse/SPARK-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310161#comment-14310161 ] Sean Owen commented on SPARK-5531: -- What do you mean? nobody changed the site. I'll recap: - Apache provides mirrors for all its projects' distributions. http://www.apache.org/dyn/closer.cgi/spark/spark-1.2.0/spark-1.2.0-bin-cdh4.tgz is a link to the mirror redirector from Apache. - Spark also provides direct downloads from S3 (Cloudfront). That's the http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-cdh4.tgz link you get - Note that choosing Direct Download or Mirror changes the hyperlink with Javascript. You don't see any change or go to a new page - The archive in question appears correct in both places Spark download .tgz file does not get unpacked -- Key: SPARK-5531 URL: https://issues.apache.org/jira/browse/SPARK-5531 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: Linux Reporter: DeepakVohra The spark-1.2.0-bin-cdh4.tgz file downloaded from http://spark.apache.org/downloads.html does not get unpacked. tar xvf spark-1.2.0-bin-cdh4.tgz gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5531) Spark download .tgz file does not get unpacked
[ https://issues.apache.org/jira/browse/SPARK-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309948#comment-14309948 ] Sean Owen commented on SPARK-5531: -- They don't. Look at the page again. Spark download .tgz file does not get unpacked -- Key: SPARK-5531 URL: https://issues.apache.org/jira/browse/SPARK-5531 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: Linux Reporter: DeepakVohra The spark-1.2.0-bin-cdh4.tgz file downloaded from http://spark.apache.org/downloads.html does not get unpacked. tar xvf spark-1.2.0-bin-cdh4.tgz gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3753) Spark hive join results in empty with shared hive context
[ https://issues.apache.org/jira/browse/SPARK-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3753: - Component/s: SQL Spark hive join results in empty with shared hive context - Key: SPARK-3753 URL: https://issues.apache.org/jira/browse/SPARK-3753 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Hector Yee Priority: Minor When I have two hive tables and do a join with the same hive context, I get the empty set, e.g. val hc = new HiveContext(sc) val table1 = hc.sql("SELECT * from t1") val table2 = hc.sql("SELECT * from t2") val intersect = table1.join(table2).take(10) // empty set but this works if I do val hc1 = new HiveContext(sc) val table1 = hc1.sql("SELECT * from t1") val hc2 = new HiveContext(sc) val table2 = hc2.sql("SELECT * from t2") val intersect = table1.join(table2).take(10) I am not sure whether the take is being propagated up to table1 and table2 before the intersect is done (in the case of large tables that would mean no results), or whether it is some other problem with the hive context. Doing the join in one SQL query also seems to result in the empty set. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5109) Loading multiple parquet files into a single SchemaRDD
[ https://issues.apache.org/jira/browse/SPARK-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5109: - Component/s: SQL Loading multiple parquet files into a single SchemaRDD -- Key: SPARK-5109 URL: https://issues.apache.org/jira/browse/SPARK-5109 Project: Spark Issue Type: New Feature Components: SQL Reporter: Sam Steingold {{[SQLContext.parquetFile(String)|http://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/sql/SQLContext.html#parquetFile%28java.lang.String%29]}} accepts a comma-separated list of files to load. This feature prevents loading files with commas in their names (a rare use case, admittedly), and it is also an _extremely_ unusual API. This feature should be deprecated and new methods {code} SQLContext.parquetFile(String[]) SQLContext.parquetFile(List<String>) {code} should be added instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
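What the proposed overloads would look like at a call site (hypothetical API: these methods are the ticket's proposal, not existing SQLContext members, and sqlContext is assumed to exist as in the shell):
{code}
// A comma in a file name is no longer special when paths travel as a list:
val paths = Seq("/data/2015-01-05/part-0.parquet", "/data/odd,name.parquet")
val schemaRdd = sqlContext.parquetFile(paths.toList)
{code}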
[jira] [Closed] (SPARK-5444) 'spark.blockManager.port' conflict in netty service
[ https://issues.apache.org/jira/browse/SPARK-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5444. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: SaintBacchus Target Version/s: 1.3.0 'spark.blockManager.port' conflict in netty service --- Key: SPARK-5444 URL: https://issues.apache.org/jira/browse/SPARK-5444 Project: Spark Issue Type: Bug Components: Block Manager Affects Versions: 1.2.0 Reporter: SaintBacchus Assignee: SaintBacchus Fix For: 1.3.0 If 'spark.blockManager.port' is set to 4040 in spark-defaults.conf, Spark throws a port-conflict exception and exits directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5041) hive-exec jar should be generated with JDK 6
[ https://issues.apache.org/jira/browse/SPARK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5041: - Component/s: SQL hive-exec jar should be generated with JDK 6 Key: SPARK-5041 URL: https://issues.apache.org/jira/browse/SPARK-5041 Project: Spark Issue Type: Bug Components: SQL Reporter: Ted Yu Labels: jdk1.7, maven Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar cannot be used by a Spark program running on JDK 6 (see http://search-hadoop.com/m/JW1q5YLCNN). hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be generated with JDK 6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4985) Parquet support for date type
[ https://issues.apache.org/jira/browse/SPARK-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4985: - Component/s: SQL Parquet support for date type - Key: SPARK-4985 URL: https://issues.apache.org/jira/browse/SPARK-4985 Project: Spark Issue Type: New Feature Components: SQL Reporter: Adrian Wang This is currently blocked by SPARK-4508 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org