[jira] [Commented] (SPARK-6368) Build a specialized serializer for Exchange operator.

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524117#comment-14524117
 ] 

Apache Spark commented on SPARK-6368:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/5849

 Build a specialized serializer for Exchange operator. 
 --

 Key: SPARK-6368
 URL: https://issues.apache.org/jira/browse/SPARK-6368
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Critical
 Fix For: 1.4.0

 Attachments: Kryo.nps, SchemaBased.nps


 Kryo is still pretty slow because it works on individual objects and is relatively 
 expensive to allocate. For the Exchange operator, because the schemas of the key and 
 value are already defined, we can create a specialized serializer that handles 
 those specific key and value schemas. 
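 As a rough illustration of the idea (not the code in the linked pull request), a serializer that knows the row schema up front can write each field directly, with no per-object class metadata. The sketch below assumes a simple (Int, String) key/value schema; the object name is made up for the example.
{code}
import java.io.{DataInputStream, DataOutputStream}

// Illustrative sketch only: a serializer specialized for a fixed (Int, String)
// key/value pair, writing fields directly instead of serializing generic objects.
object FixedSchemaRowSerializer {
  def write(out: DataOutputStream, key: Int, value: String): Unit = {
    out.writeInt(key)                       // key column: Int
    val bytes = value.getBytes("UTF-8")     // value column: String
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  def read(in: DataInputStream): (Int, String) = {
    val key = in.readInt()
    val len = in.readInt()
    val bytes = new Array[Byte](len)
    in.readFully(bytes)
    (key, new String(bytes, "UTF-8"))
  }
}
{code}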






[jira] [Assigned] (SPARK-6907) Create an isolated classloader for the Hive Client.

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6907:
---

Assignee: Apache Spark  (was: Michael Armbrust)

 Create an isolated classloader for the Hive Client.
 ---

 Key: SPARK-6907
 URL: https://issues.apache.org/jira/browse/SPARK-6907
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Apache Spark








[jira] [Assigned] (SPARK-7316) Add step capability to RDD sliding window

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7316:
---

Assignee: Apache Spark

 Add step capability to RDD sliding window
 -

 Key: SPARK-7316
 URL: https://issues.apache.org/jira/browse/SPARK-7316
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Alexander Ulanov
Assignee: Apache Spark
 Fix For: 1.4.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. 
 The user should be able to define the step. This capability should be implemented.
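 For intuition, the proposed semantics mirror sliding on plain Scala collections, which already takes a step argument; this is only an illustration of the desired behaviour, not the MLlib API.
{code}
// Desired semantics of a configurable step, shown on an ordinary Scala collection.
// The existing MLlib sliding() takes only a window size (the step is effectively 1).
val data = 1 to 9

val step1 = data.sliding(3, 1).toList   // (1,2,3), (2,3,4), ..., (7,8,9)
val step3 = data.sliding(3, 3).toList   // (1,2,3), (4,5,6), (7,8,9)

println(step1)
println(step3)
{code}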






[jira] [Assigned] (SPARK-7316) Add step capability to RDD sliding window

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7316:
---

Assignee: (was: Apache Spark)

 Add step capability to RDD sliding window
 -

 Key: SPARK-7316
 URL: https://issues.apache.org/jira/browse/SPARK-7316
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Alexander Ulanov
 Fix For: 1.4.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. 
 The user should be able to define the step. This capability should be implemented.






[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524309#comment-14524309
 ] 

Apache Spark commented on SPARK-7316:
-

User 'avulanov' has created a pull request for this issue:
https://github.com/apache/spark/pull/5855

 Add step capability to RDD sliding window
 -

 Key: SPARK-7316
 URL: https://issues.apache.org/jira/browse/SPARK-7316
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Alexander Ulanov
 Fix For: 1.4.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. 
 The user should be able to define the step. This capability should be implemented.






[jira] [Updated] (SPARK-7216) Show driver details in Mesos cluster UI

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7216:
-
Affects Version/s: 1.4.0

 Show driver details in Mesos cluster UI
 ---

 Key: SPARK-7216
 URL: https://issues.apache.org/jira/browse/SPARK-7216
 Project: Spark
  Issue Type: Improvement
  Components: Mesos
Affects Versions: 1.4.0
Reporter: Timothy Chen
Assignee: Timothy Chen
 Fix For: 1.4.0


 Show driver details in Mesos cluster UI






[jira] [Closed] (SPARK-7216) Show driver details in Mesos cluster UI

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-7216.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Assignee: Timothy Chen
Target Version/s: 1.4.0

 Show driver details in Mesos cluster UI
 ---

 Key: SPARK-7216
 URL: https://issues.apache.org/jira/browse/SPARK-7216
 Project: Spark
  Issue Type: Improvement
  Components: Mesos
Affects Versions: 1.4.0
Reporter: Timothy Chen
Assignee: Timothy Chen
 Fix For: 1.4.0


 Show driver details in Mesos cluster UI






[jira] [Resolved] (SPARK-6229) Support SASL encryption in network/common module

2015-05-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-6229.

   Resolution: Fixed
Fix Version/s: 1.4.0
 Assignee: Marcelo Vanzin

 Support SASL encryption in network/common module
 

 Key: SPARK-6229
 URL: https://issues.apache.org/jira/browse/SPARK-6229
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
 Fix For: 1.4.0


 After SASL support has been added to network/common, supporting encryption 
 should be rather simple. Encryption is supported for DIGEST-MD5 and GSSAPI. 
 Since the latter requires a valid Kerberos login to work (and so doesn't 
 really work with executors), encryption would require the use of DIGEST-MD5.
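 For reference, below is a minimal sketch of negotiating DIGEST-MD5 with confidentiality through the standard Java SASL API; the identity, protocol, and server name are placeholders, and this is not Spark's actual implementation.
{code}
import javax.security.auth.callback.{Callback, CallbackHandler, NameCallback, PasswordCallback}
import javax.security.sasl.{RealmCallback, Sasl}
import scala.collection.JavaConverters._

// Illustrative only: request "auth-conf" (authentication plus confidentiality,
// i.e. encryption) from a DIGEST-MD5 SASL client.
object SaslEncryptionSketch {
  private val handler = new CallbackHandler {
    override def handle(callbacks: Array[Callback]): Unit = callbacks.foreach {
      case nc: NameCallback     => nc.setName("sparkSaslUser")           // placeholder identity
      case pc: PasswordCallback => pc.setPassword("secret".toCharArray)  // placeholder secret
      case rc: RealmCallback    => rc.setText(rc.getDefaultText)
      case _                    => // ignore any other callbacks
    }
  }

  def main(args: Array[String]): Unit = {
    val props = Map(Sasl.QOP -> "auth-conf").asJava   // "auth" = no encryption, "auth-conf" = encrypted
    val client = Sasl.createSaslClient(
      Array("DIGEST-MD5"), null, "spark", "default", props, handler)
    println(s"mechanism: ${client.getMechanismName}")
  }
}
{code}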






[jira] [Resolved] (SPARK-2808) update kafka to version 0.8.2

2015-05-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-2808.
--
   Resolution: Fixed
Fix Version/s: 1.4.0
 Assignee: Cody Koeninger

 update kafka to version 0.8.2
 -

 Key: SPARK-2808
 URL: https://issues.apache.org/jira/browse/SPARK-2808
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Spark Core
Reporter: Anand Avati
Assignee: Cody Koeninger
 Fix For: 1.4.0


 First kafka_2.11 0.8.1 has to be released






[jira] [Commented] (SPARK-7113) Add the direct stream related information to the streaming listener and web UI

2015-05-01 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524408#comment-14524408
 ] 

Tathagata Das commented on SPARK-7113:
--

[~jerryshao] Since this other sub task is done, can you create a PR for Kafka 
Direct to use the InputInfoTracker?



 Add the direct stream related information to the streaming listener and web UI
 --

 Key: SPARK-7113
 URL: https://issues.apache.org/jira/browse/SPARK-7113
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Saisai Shao
 Fix For: 1.4.0









[jira] [Created] (SPARK-7317) ShuffleHandle needs to be exposed

2015-05-01 Thread Mridul Muralidharan (JIRA)
Mridul Muralidharan created SPARK-7317:
--

 Summary: ShuffleHandle needs to be exposed
 Key: SPARK-7317
 URL: https://issues.apache.org/jira/browse/SPARK-7317
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
Priority: Minor


ShuffleHandle is marked private[spark], while a lot of the code that depends on 
it, and exposes it, is DeveloperApi.
While the actual implementation can remain private[spark], the handle class 
itself should be exposed so that RDDs can leverage it.

Example: 
a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle
b) A ShuffleManager instance is exposed via SparkEnv.get.shuffleManager
c) SparkEnv.get.shuffleManager.getReader is exposed, which takes the handle as a 
parameter and can be used to write RDDs that leverage shuffle without needing to 
go through Spark's shuffle-based RDDs.

So all the machinery for a custom RDD to leverage shuffle exists, except for 
specifying the ShuffleHandle class itself in dependencies.
This allows user code to customize how it leverages shuffle, for example with 
specialized join implementations.
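To make (a)-(c) concrete, here is a sketch of the kind of custom RDD this change would enable. Because ShuffleHandle is still private[spark], code like this currently compiles only inside the org.apache.spark package; the class below is purely illustrative.
{code}
import org.apache.spark.{Partition, ShuffleDependency, SparkEnv, TaskContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch: read shuffle output directly via the ShuffleManager,
// without going through Spark's own shuffle-based RDDs.
class ShuffleReadingRDD[K, V, C](dep: ShuffleDependency[K, V, C])
  extends RDD[(K, C)](dep.rdd.context, Seq(dep)) {

  override protected def getPartitions: Array[Partition] =
    Array.tabulate(dep.partitioner.numPartitions) { i =>
      new Partition { override val index: Int = i }
    }

  override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] =
    SparkEnv.get.shuffleManager                               // (b)
      .getReader[K, C](dep.shuffleHandle,                     // (a), (c)
                       split.index, split.index + 1, context)
      .read()
      .map(p => (p._1, p._2))                                 // Product2[K, C] -> (K, C)
}
{code}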






[jira] [Created] (SPARK-7315) Flaky Test: WriteAheadLogBackedBlockRDDSuite

2015-05-01 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-7315:


 Summary: Flaky Test: WriteAheadLogBackedBlockRDDSuite
 Key: SPARK-7315
 URL: https://issues.apache.org/jira/browse/SPARK-7315
 Project: Spark
  Issue Type: Test
Reporter: Tathagata Das
Assignee: Tathagata Das









[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14

2015-05-01 Thread Russell Alexander Spitzer (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524372#comment-14524372
 ] 

Russell Alexander Spitzer commented on SPARK-6069:
--

Running with --conf spark.files.userClassPathFirst=true yields a different error

{code}
scala> cc.sql("SELECT * FROM test.fun as a JOIN test.fun as b ON (a.k = 
b.v)").collect
15/05/01 17:24:34 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
10.0.2.15): java.lang.NoClassDefFoundError: org/apache/spark/Partition
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
at 
org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:50)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at 
org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$findClass$1.apply(ExecutorClassLoader.scala:57)
at 
org.apache.spark.repl.ExecutorClassLoader$$anonfun$findClass$1.apply(ExecutorClassLoader.scala:57)
at scala.Option.getOrElse(Option.scala:120)
at 
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:57)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:59)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Partition
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at 

[jira] [Resolved] (SPARK-7309) Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler

2015-05-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-7309.
--
   Resolution: Fixed
Fix Version/s: 1.4.0
 Assignee: Shixiong Zhu

 Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler
 --

 Key: SPARK-7309
 URL: https://issues.apache.org/jira/browse/SPARK-7309
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Streaming
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
Priority: Minor
 Fix For: 1.4.0









[jira] [Updated] (SPARK-2691) Allow Spark on Mesos to be launched with Docker

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2691:
-
Assignee: (was: Timothy Chen)

 Allow Spark on Mesos to be launched with Docker
 ---

 Key: SPARK-2691
 URL: https://issues.apache.org/jira/browse/SPARK-2691
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 1.0.0
Reporter: Timothy Chen
  Labels: mesos
 Attachments: spark-docker.patch


 Currently, to launch Spark with Mesos one must upload a tarball and specify 
 the executor URI to be passed in, which is then downloaded on each slave (or 
 even on each execution, depending on whether coarse-grained mode is used).
 We want to make Spark able to support launching executors via a Docker image, 
 building on the recent Docker and Mesos integration work. 
 With that integration, Spark can simply specify a Docker image and whatever 
 options are needed, and everything else should continue to work as-is.
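 A minimal sketch of what this could look like from the application side is below; the property name spark.mesos.executor.docker.image follows the attached patch and should be treated as an assumption, and the image name is a placeholder.
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: point a Mesos-backed application at a Docker image instead of
// an executor tarball URI.
val conf = new SparkConf()
  .setMaster("mesos://zk://zk1:2181/mesos")                        // example Mesos master
  .setAppName("docker-executor-example")
  .set("spark.mesos.executor.docker.image", "example/spark:1.4.0") // assumed property name

val sc = new SparkContext(conf)
{code}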






[jira] [Closed] (SPARK-2691) Allow Spark on Mesos to be launched with Docker

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-2691.

   Resolution: Fixed
Fix Version/s: 1.4.0

 Allow Spark on Mesos to be launched with Docker
 ---

 Key: SPARK-2691
 URL: https://issues.apache.org/jira/browse/SPARK-2691
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 1.0.0
Reporter: Timothy Chen
Assignee: Chris Heller
  Labels: mesos
 Fix For: 1.4.0

 Attachments: spark-docker.patch


 Currently, to launch Spark with Mesos one must upload a tarball and specify 
 the executor URI to be passed in, which is then downloaded on each slave (or 
 even on each execution, depending on whether coarse-grained mode is used).
 We want to make Spark able to support launching executors via a Docker image, 
 building on the recent Docker and Mesos integration work. 
 With that integration, Spark can simply specify a Docker image and whatever 
 options are needed, and everything else should continue to work as-is.






[jira] [Updated] (SPARK-2691) Allow Spark on Mesos to be launched with Docker

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2691:
-
Assignee: Chris Heller

 Allow Spark on Mesos to be launched with Docker
 ---

 Key: SPARK-2691
 URL: https://issues.apache.org/jira/browse/SPARK-2691
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 1.0.0
Reporter: Timothy Chen
Assignee: Chris Heller
  Labels: mesos
 Attachments: spark-docker.patch


 Currently, to launch Spark with Mesos one must upload a tarball and specify 
 the executor URI to be passed in, which is then downloaded on each slave (or 
 even on each execution, depending on whether coarse-grained mode is used).
 We want to make Spark able to support launching executors via a Docker image, 
 building on the recent Docker and Mesos integration work. 
 With that integration, Spark can simply specify a Docker image and whatever 
 options are needed, and everything else should continue to work as-is.






[jira] [Assigned] (SPARK-6166) Add config to limit number of concurrent outbound connections for shuffle fetch

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6166:
---

Assignee: (was: Apache Spark)

 Add config to limit number of concurrent outbound connections for shuffle 
 fetch
 ---

 Key: SPARK-6166
 URL: https://issues.apache.org/jira/browse/SPARK-6166
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: Mridul Muralidharan
Priority: Minor

 spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of 
 size.
 But this is not always sufficient: when the number of hosts in the cluster 
 increases, this can lead to a very large number of inbound connections to one 
 or more nodes, causing workers to fail under the load.
 I propose we also add a spark.reducer.maxReqsInFlight setting, which puts a bound 
 on the number of outstanding outbound connections.
 This might still cause hotspots in the cluster, but in our tests it has 
 significantly reduced the occurrence of worker failures.
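 As a sketch, the two limits would sit side by side like this, assuming the proposed property name is adopted unchanged (the values are arbitrary examples):
{code}
import org.apache.spark.SparkConf

// Existing cap on in-flight bytes plus the proposed cap on outstanding requests.
val conf = new SparkConf()
  .set("spark.reducer.maxMbInFlight", "48")    // existing: MB of data in flight per reducer
  .set("spark.reducer.maxReqsInFlight", "64")  // proposed: outstanding fetch requests per reducer
{code}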






[jira] [Assigned] (SPARK-6166) Add config to limit number of concurrent outbound connections for shuffle fetch

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6166:
---

Assignee: Apache Spark

 Add config to limit number of concurrent outbound connections for shuffle 
 fetch
 ---

 Key: SPARK-6166
 URL: https://issues.apache.org/jira/browse/SPARK-6166
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: Mridul Muralidharan
Assignee: Apache Spark
Priority: Minor

 spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of 
 size.
 But this is not always sufficient: when the number of hosts in the cluster 
 increases, this can lead to a very large number of inbound connections to one 
 or more nodes, causing workers to fail under the load.
 I propose we also add a spark.reducer.maxReqsInFlight setting, which puts a bound 
 on the number of outstanding outbound connections.
 This might still cause hotspots in the cluster, but in our tests it has 
 significantly reduced the occurrence of worker failures.






[jira] [Commented] (SPARK-6166) Add config to limit number of concurrent outbound connections for shuffle fetch

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524205#comment-14524205
 ] 

Apache Spark commented on SPARK-6166:
-

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/5852

 Add config to limit number of concurrent outbound connections for shuffle 
 fetch
 ---

 Key: SPARK-6166
 URL: https://issues.apache.org/jira/browse/SPARK-6166
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: Mridul Muralidharan
Priority: Minor

 spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of 
 size.
 But this is not always sufficient: when the number of hosts in the cluster 
 increases, this can lead to a very large number of inbound connections to one 
 or more nodes, causing workers to fail under the load.
 I propose we also add a spark.reducer.maxReqsInFlight setting, which puts a bound 
 on the number of outstanding outbound connections.
 This might still cause hotspots in the cluster, but in our tests it has 
 significantly reduced the occurrence of worker failures.






[jira] [Resolved] (SPARK-7112) Add InputInfoTracker to have a generic way to track input data rates for all input streams.

2015-05-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-7112.
--
Resolution: Fixed

 Add InputInfoTracker to have a generic way to track input data rates for all 
 input streams.
 ---

 Key: SPARK-7112
 URL: https://issues.apache.org/jira/browse/SPARK-7112
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Saisai Shao
Assignee: Saisai Shao
 Fix For: 1.4.0


 Non-receiver streams like the Kafka direct stream should be able to report input 
 data rates. For that we need a generic way to report input information. This 
 JIRA is to track the addition of an InputInfoTracker for that purpose. 
 Here is the design doc: 
 https://docs.google.com/document/d/122QvcwPoLkI2OW4eM7nyBOAqffk2uxgsNT38WI-M5vQ/edit#heading=h.9eluy73ulzuz
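 Conceptually, the tracker aggregates per-batch record counts reported by every input stream. The sketch below is purely illustrative; the names and shape are assumptions, not the merged API (see the design doc above).
{code}
import scala.collection.mutable

// Illustrative sketch: each input stream (receiver-based or direct) reports how
// many records it contributed to a batch, and the listener/web UI reads the total.
case class InputInfo(inputStreamId: Int, numRecords: Long)

class InputInfoTrackerSketch {
  private val infos = mutable.HashMap.empty[Long, mutable.ArrayBuffer[InputInfo]]

  def reportInfo(batchTimeMs: Long, info: InputInfo): Unit = synchronized {
    infos.getOrElseUpdate(batchTimeMs, mutable.ArrayBuffer.empty[InputInfo]) += info
  }

  def totalRecords(batchTimeMs: Long): Long = synchronized {
    infos.getOrElse(batchTimeMs, mutable.ArrayBuffer.empty[InputInfo]).map(_.numRecords).sum
  }
}
{code}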






[jira] [Closed] (SPARK-6443) Support HA in standalone cluster mode

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6443.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Target Version/s: 1.4.0

 Support HA in standalone cluster mode
 -

 Key: SPARK-6443
 URL: https://issues.apache.org/jira/browse/SPARK-6443
 Project: Spark
  Issue Type: New Feature
  Components: Spark Submit
Affects Versions: 1.0.0
Reporter: Tao Wang
Assignee: Tao Wang
 Fix For: 1.4.0


 == EDIT by Andrew ==
 From a quick survey in the code I can confirm that client mode does support 
 this. [This 
 line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162]
  splits the master URLs by comma and passes these URLs into the AppClient. In 
 standalone cluster mode, there is simply no equivalent logic to even split 
 the master URLs, whether in the old submission gateway (o.a.s.deploy.Client) 
 or in the new one (o.a.s.deploy.rest.StandaloneRestClient).
 Thus, this is an unsupported feature, not a bug!
 == Original description from Tao Wang ==
 After digging through the code, I found that a user could not submit an app in standalone 
 cluster mode when HA is enabled, but in client mode it works.
 I haven't tried it yet, but I will verify this and file a PR to resolve it if the 
 problem exists.
 3/23 update:
 I started an HA cluster with ZooKeeper and tried to submit the SparkPi example with the 
 command:
 ./spark-submit  --class org.apache.spark.examples.SparkPi --master 
 spark://doggie153:7077,doggie159:7077 --deploy-mode cluster 
 ../lib/spark-examples-1.2.0-hadoop2.4.0.jar 
 and it failed with error message:
 Spark assembly has been built with Hive, including Datanucleus jars on 
 classpath
 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: 
 spark://doggie153:7077,doggie159:7077
 akka.actor.ActorInitializationException: exception during creation
 at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
 at akka.actor.ActorCell.create(ActorCell.scala:596)
 at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
 at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
 at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
 at akka.dispatch.Mailbox.run(Mailbox.scala:219)
 at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: org.apache.spark.SparkException: Invalid master URL: 
 spark://doggie153:7077,doggie159:7077
 at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
 at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
 at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
 at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
 at akka.actor.ActorCell.create(ActorCell.scala:580)
 ... 9 more
 But in client mode it ended with the correct result, so my guess is right. I will 
 fix it in the related PR.
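 For context on the splitting behaviour described in the edit above, client mode effectively does something like the following with the comma-separated HA master URL (a simplified illustration, not the exact code at the linked line):
{code}
// Simplified illustration of how client mode handles an HA master URL.
val master = "spark://doggie153:7077,doggie159:7077"

val masterUrls: Array[String] =
  master.stripPrefix("spark://").split(",").map(host => s"spark://$host")

masterUrls.foreach(println)
// spark://doggie153:7077
// spark://doggie159:7077
{code}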






[jira] [Resolved] (SPARK-7317) ShuffleHandle needs to be exposed

2015-05-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-7317.

   Resolution: Fixed
Fix Version/s: 1.4.0

 ShuffleHandle needs to be exposed
 -

 Key: SPARK-7317
 URL: https://issues.apache.org/jira/browse/SPARK-7317
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
Priority: Minor
 Fix For: 1.4.0


 ShuffleHandle is marked private[spark], while a lot of the code that depends on 
 it, and exposes it, is DeveloperApi.
 While the actual implementation can remain private[spark], the handle class 
 itself should be exposed so that RDDs can leverage it.
 Example: 
 a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle
 b) A ShuffleManager instance is exposed via SparkEnv.get.shuffleManager
 c) SparkEnv.get.shuffleManager.getReader is exposed, which takes the handle as a 
 parameter and can be used to write RDDs that leverage shuffle without needing 
 to go through Spark's shuffle-based RDDs.
 So all the machinery for a custom RDD to leverage shuffle exists, except for 
 specifying the ShuffleHandle class itself in dependencies.
 This allows user code to customize how it leverages shuffle, for example with 
 specialized join implementations.






[jira] [Resolved] (SPARK-3444) Provide a way to easily change the log level in the Spark shell while running

2015-05-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3444.

   Resolution: Fixed
Fix Version/s: 1.4.0

 Provide a way to easily change the log level in the Spark shell while running
 -

 Key: SPARK-3444
 URL: https://issues.apache.org/jira/browse/SPARK-3444
 Project: Spark
  Issue Type: Improvement
  Components: Spark Shell
Reporter: holdenk
Assignee: Holden Karau
Priority: Minor
 Fix For: 1.4.0


 Right now it's difficult to change the log level while running. Our log 
 messages can be quite verbose at the more detailed levels, and some users 
 want to run at WARN until they encounter an issue and then increase the 
 logging level to DEBUG without restarting the shell.
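 With the change that resolved this, the level can be adjusted from a running shell; a minimal usage sketch, assuming the SparkContext.setLogLevel method introduced for it:
{code}
// In the Spark shell, where sc is the ambient SparkContext:
sc.setLogLevel("WARN")    // keep output quiet while exploring
// ... an issue appears ...
sc.setLogLevel("DEBUG")   // turn on detailed logging without restarting the shell
{code}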






[jira] [Closed] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-6954.

  Resolution: Fixed
   Fix Version/s: 1.4.0
Assignee: Sandy Ryza  (was: Cheolsoo Park)
Target Version/s: 1.4.0

 ExecutorAllocationManager can end up requesting a negative number of executors
 --

 Key: SPARK-6954
 URL: https://issues.apache.org/jira/browse/SPARK-6954
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.1
Reporter: Cheolsoo Park
Assignee: Sandy Ryza
  Labels: yarn
 Fix For: 1.4.0

 Attachments: with_fix.png, without_fix.png


 I have a simple test case for dynamic allocation on YARN that fails with the 
 following stack trace-
 {code}
 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread 
 spark-dynamic-executor-allocation-0
 java.lang.IllegalArgumentException: Attempted to request a negative number of 
 executor(s) -21 from the cluster manager. Please specify a positive number!
   at 
 org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338)
   at 
 org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137)
   at 
 org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294)
   at 
 org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263)
   at 
 org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {code}
 My test is as follows-
 # Start spark-shell with a single executor.
 # Run a {{select count(\*)}} query. The number of executors rises as input 
 size is non-trivial.
 # After the job finishes, the number of  executors falls as most of them 
 become idle.
 # Rerun the same query again, and the request to add executors fails with the 
 above error. In fact, the job itself continues to run with whatever executors 
 it already has, but it never gets more executors unless the shell is closed 
 and restarted. 
 In fact, this error only happens when I configure {{executorIdleTimeout}} to be 
 very small. For example, I can reproduce it with the following configs:
 {code}
 spark.dynamicAllocation.executorIdleTimeout 5
 spark.dynamicAllocation.schedulerBacklogTimeout 5
 {code}
 Although I can simply increase {{executorIdleTimeout}} to something like 60 
 secs to avoid the error, I think this is still a bug to be fixed.
 The root cause seems to be that {{numExecutorsPending}} accidentally becomes 
 negative if executors are killed too aggressively (i.e. 
 {{executorIdleTimeout}} is too small) because under that circumstance, the 
 new target # of executors can be smaller than the current # of executors. 
 When that happens, {{ExecutorAllocationManager}} ends up trying to add a 
 negative number of executors, which throws an exception.
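 To make the arithmetic concrete, here is an illustration of the failure mode (and the obvious clamp), not the actual patch:
{code}
// Illustration only: when the new target drops below the executors that already
// exist, the naive delta goes negative and the request to the cluster manager fails.
val currentExecutors = 30
val newTarget        = 9                              // idle timeout killed executors aggressively
val naiveDelta       = newTarget - currentExecutors   // -21: the same kind of negative value as above
val toRequest        = math.max(0, naiveDelta)        // clamping avoids the negative request

println(s"naive delta = $naiveDelta, actually requested = $toRequest")
{code}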






[jira] [Updated] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6954:
-
Target Version/s: 1.3.1, 1.4.0  (was: 1.4.0)

 ExecutorAllocationManager can end up requesting a negative number of executors
 --

 Key: SPARK-6954
 URL: https://issues.apache.org/jira/browse/SPARK-6954
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.1
Reporter: Cheolsoo Park
Assignee: Sandy Ryza
  Labels: yarn
 Fix For: 1.4.0

 Attachments: with_fix.png, without_fix.png


 I have a simple test case for dynamic allocation on YARN that fails with the 
 following stack trace-
 {code}
 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread 
 spark-dynamic-executor-allocation-0
 java.lang.IllegalArgumentException: Attempted to request a negative number of 
 executor(s) -21 from the cluster manager. Please specify a positive number!
   at 
 org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338)
   at 
 org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137)
   at 
 org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294)
   at 
 org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263)
   at 
 org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {code}
 My test is as follows-
 # Start spark-shell with a single executor.
 # Run a {{select count(\*)}} query. The number of executors rises as input 
 size is non-trivial.
 # After the job finishes, the number of  executors falls as most of them 
 become idle.
 # Rerun the same query again, and the request to add executors fails with the 
 above error. In fact, the job itself continues to run with whatever executors 
 it already has, but it never gets more executors unless the shell is closed 
 and restarted. 
 In fact, this error only happens when I configure {{executorIdleTimeout}} to be 
 very small. For example, I can reproduce it with the following configs:
 {code}
 spark.dynamicAllocation.executorIdleTimeout 5
 spark.dynamicAllocation.schedulerBacklogTimeout 5
 {code}
 Although I can simply increase {{executorIdleTimeout}} to something like 60 
 secs to avoid the error, I think this is still a bug to be fixed.
 The root cause seems to be that {{numExecutorsPending}} accidentally becomes 
 negative if executors are killed too aggressively (i.e. 
 {{executorIdleTimeout}} is too small) because under that circumstance, the 
 new target # of executors can be smaller than the current # of executors. 
 When that happens, {{ExecutorAllocationManager}} ends up trying to add a 
 negative number of executors, which throws an exception.






[jira] [Updated] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6954:
-
Labels: backport-needed yarn  (was: yarn)

 ExecutorAllocationManager can end up requesting a negative number of executors
 --

 Key: SPARK-6954
 URL: https://issues.apache.org/jira/browse/SPARK-6954
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.1
Reporter: Cheolsoo Park
Assignee: Sandy Ryza
  Labels: backport-needed, yarn
 Fix For: 1.4.0

 Attachments: with_fix.png, without_fix.png


 I have a simple test case for dynamic allocation on YARN that fails with the 
 following stack trace-
 {code}
 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread 
 spark-dynamic-executor-allocation-0
 java.lang.IllegalArgumentException: Attempted to request a negative number of 
 executor(s) -21 from the cluster manager. Please specify a positive number!
   at 
 org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338)
   at 
 org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137)
   at 
 org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294)
   at 
 org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263)
   at 
 org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {code}
 My test is as follows-
 # Start spark-shell with a single executor.
 # Run a {{select count(\*)}} query. The number of executors rises as input 
 size is non-trivial.
 # After the job finishes, the number of  executors falls as most of them 
 become idle.
 # Rerun the same query again, and the request to add executors fails with the 
 above error. In fact, the job itself continues to run with whatever executors 
 it already has, but it never gets more executors unless the shell is closed 
 and restarted. 
 In fact, this error only happens when I configure {{executorIdleTimeout}} to be 
 very small. For example, I can reproduce it with the following configs:
 {code}
 spark.dynamicAllocation.executorIdleTimeout 5
 spark.dynamicAllocation.schedulerBacklogTimeout 5
 {code}
 Although I can simply increase {{executorIdleTimeout}} to something like 60 
 secs to avoid the error, I think this is still a bug to be fixed.
 The root cause seems to be that {{numExecutorsPending}} accidentally becomes 
 negative if executors are killed too aggressively (i.e. 
 {{executorIdleTimeout}} is too small) because under that circumstance, the 
 new target # of executors can be smaller than the current # of executors. 
 When that happens, {{ExecutorAllocationManager}} ends up trying to add a 
 negative number of executors, which throws an exception.






[jira] [Reopened] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reopened SPARK-6954:
--

 ExecutorAllocationManager can end up requesting a negative number of executors
 --

 Key: SPARK-6954
 URL: https://issues.apache.org/jira/browse/SPARK-6954
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.1
Reporter: Cheolsoo Park
Assignee: Sandy Ryza
  Labels: yarn
 Fix For: 1.4.0

 Attachments: with_fix.png, without_fix.png


 I have a simple test case for dynamic allocation on YARN that fails with the 
 following stack trace-
 {code}
 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread 
 spark-dynamic-executor-allocation-0
 java.lang.IllegalArgumentException: Attempted to request a negative number of 
 executor(s) -21 from the cluster manager. Please specify a positive number!
   at 
 org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338)
   at 
 org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137)
   at 
 org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294)
   at 
 org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263)
   at 
 org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {code}
 My test is as follows-
 # Start spark-shell with a single executor.
 # Run a {{select count(\*)}} query. The number of executors rises as input 
 size is non-trivial.
 # After the job finishes, the number of  executors falls as most of them 
 become idle.
 # Rerun the same query again, and the request to add executors fails with the 
 above error. In fact, the job itself continues to run with whatever executors 
 it already has, but it never gets more executors unless the shell is closed 
 and restarted. 
 In fact, this error only happens when I configure {{executorIdleTimeout}} to be 
 very small. For example, I can reproduce it with the following configs:
 {code}
 spark.dynamicAllocation.executorIdleTimeout 5
 spark.dynamicAllocation.schedulerBacklogTimeout 5
 {code}
 Although I can simply increase {{executorIdleTimeout}} to something like 60 
 secs to avoid the error, I think this is still a bug to be fixed.
 The root cause seems to be that {{numExecutorsPending}} accidentally becomes 
 negative if executors are killed too aggressively (i.e. 
 {{executorIdleTimeout}} is too small) because under that circumstance, the 
 new target # of executors can be smaller than the current # of executors. 
 When that happens, {{ExecutorAllocationManager}} ends up trying to add a 
 negative number of executors, which throws an exception.






[jira] [Updated] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6954:
-
Target Version/s: 1.3.2, 1.4.0  (was: 1.3.1, 1.4.0)

 ExecutorAllocationManager can end up requesting a negative number of executors
 --

 Key: SPARK-6954
 URL: https://issues.apache.org/jira/browse/SPARK-6954
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.1
Reporter: Cheolsoo Park
Assignee: Sandy Ryza
  Labels: backport-needed, yarn
 Fix For: 1.4.0

 Attachments: with_fix.png, without_fix.png


 I have a simple test case for dynamic allocation on YARN that fails with the 
 following stack trace-
 {code}
 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread 
 spark-dynamic-executor-allocation-0
 java.lang.IllegalArgumentException: Attempted to request a negative number of 
 executor(s) -21 from the cluster manager. Please specify a positive number!
   at 
 org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338)
   at 
 org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137)
   at 
 org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294)
   at 
 org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263)
   at 
 org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {code}
 My test is as follows-
 # Start spark-shell with a single executor.
 # Run a {{select count(\*)}} query. The number of executors rises as input 
 size is non-trivial.
 # After the job finishes, the number of  executors falls as most of them 
 become idle.
 # Rerun the same query again, and the request to add executors fails with the 
 above error. In fact, the job itself continues to run with whatever executors 
 it already has, but it never gets more executors unless the shell is closed 
 and restarted. 
 In fact, this error only happens when I configure {{executorIdleTimeout}} to be 
 very small. For example, I can reproduce it with the following configs:
 {code}
 spark.dynamicAllocation.executorIdleTimeout 5
 spark.dynamicAllocation.schedulerBacklogTimeout 5
 {code}
 Although I can simply increase {{executorIdleTimeout}} to something like 60 
 secs to avoid the error, I think this is still a bug to be fixed.
 The root cause seems to be that {{numExecutorsPending}} accidentally becomes 
 negative if executors are killed too aggressively (i.e. 
 {{executorIdleTimeout}} is too small) because under that circumstance, the 
 new target # of executors can be smaller than the current # of executors. 
 When that happens, {{ExecutorAllocationManager}} ends up trying to add a 
 negative number of executors, which throws an exception.






[jira] [Commented] (SPARK-7113) Add the direct stream related information to the streaming listener and web UI

2015-05-01 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524536#comment-14524536
 ] 

Saisai Shao commented on SPARK-7113:


Yes, I will do it. Thanks a lot :).

 Add the direct stream related information to the streaming listener and web UI
 --

 Key: SPARK-7113
 URL: https://issues.apache.org/jira/browse/SPARK-7113
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Saisai Shao
 Fix For: 1.4.0









[jira] [Created] (SPARK-7314) Upgrade Pyrolite with patches

2015-05-01 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-7314:


 Summary: Upgrade Pyrolite with patches
 Key: SPARK-7314
 URL: https://issues.apache.org/jira/browse/SPARK-7314
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 1.4.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng


As discussed on SPARK-6288, we are using a really old version of Pyrolite, 
which was published under org.spark-project. It would be nice to upgrade it to 
the latest (and possibly official) version.






[jira] [Commented] (SPARK-7314) Upgrade Pyrolite with patches

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524130#comment-14524130
 ] 

Apache Spark commented on SPARK-7314:
-

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/5850

 Upgrade Pyrolite with patches
 -

 Key: SPARK-7314
 URL: https://issues.apache.org/jira/browse/SPARK-7314
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 1.4.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng

 As discussed on SPARK-6288, we are using a really old version of Pyrolite, 
 which was published under org.spark-project. It would be nice to upgrade it to 
 the latest (and possibly official) version.






[jira] [Assigned] (SPARK-7314) Upgrade Pyrolite with patches

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7314:
---

Assignee: Apache Spark  (was: Xiangrui Meng)

 Upgrade Pyrolite with patches
 -

 Key: SPARK-7314
 URL: https://issues.apache.org/jira/browse/SPARK-7314
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 1.4.0
Reporter: Xiangrui Meng
Assignee: Apache Spark

 As discussed on SPARK-6288, we are using a really old version of Pyrolite, 
 which was published under org.spark-project. It would be nice to upgrade to 
 it the latest (and possibly official) version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7314) Upgrade Pyrolite with patches

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7314:
---

Assignee: Xiangrui Meng  (was: Apache Spark)

 Upgrade Pyrolite with patches
 -

 Key: SPARK-7314
 URL: https://issues.apache.org/jira/browse/SPARK-7314
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 1.4.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng

 As discussed on SPARK-6288, we are using a really old version of Pyrolite, 
 which was published under org.spark-project. It would be nice to upgrade to 
 it the latest (and possibly official) version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6999) infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])

2015-05-01 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid resolved SPARK-6999.
-
   Resolution: Fixed
Fix Version/s: 1.4.0

Issue resolved by pull request 5804
[https://github.com/apache/spark/pull/5804]

 infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])
 -

 Key: SPARK-6999
 URL: https://issues.apache.org/jira/browse/SPARK-6999
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Justin Uang
Priority: Blocker
 Fix For: 1.4.0


 It looks like 
 {code}
   def createDataFrame(rowRDD: JavaRDD[Row], columns: java.util.List[String]): 
 DataFrame = {
 createDataFrame(rowRDD.rdd, columns.toSeq)
   }
 {code}
 is in fact an infinite recursion because it calls itself. Scala implicit 
 conversions convert the arguments back into a JavaRDD and a java.util.List.
 {code}
 15/04/19 16:51:24 INFO BlockManagerMaster: Trying to register BlockManager
 15/04/19 16:51:24 INFO BlockManagerMasterActor: Registering block manager 
 localhost:53711 with 1966.1 MB RAM, BlockManagerId(driver, localhost, 53711)
 15/04/19 16:51:24 INFO BlockManagerMaster: Registered BlockManager
 Exception in thread "main" java.lang.StackOverflowError
 at scala.collection.mutable.AbstractSeq.<init>(Seq.scala:47)
 at scala.collection.mutable.AbstractBuffer.<init>(Buffer.scala:48)
 at 
 scala.collection.convert.Wrappers$JListWrapper.<init>(Wrappers.scala:84)
 at 
 scala.collection.convert.WrapAsScala$class.asScalaBuffer(WrapAsScala.scala:127)
 at 
 scala.collection.JavaConversions$.asScalaBuffer(JavaConversions.scala:53)
 at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408)
 at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408)
 at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408)
 at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408)
 {code}
 Here is the code sample I used to reproduce the issue:
 {code}
 /**
  * @author juang
  */
 public final class InfiniteRecursionExample {
 public static void main(String[] args) {
 JavaSparkContext sc = new JavaSparkContext("local", 
 "infinite_recursion_example");
 List<Row> rows = Lists.newArrayList();
 JavaRDD<Row> rowRDD = sc.parallelize(rows);
 SQLContext sqlContext = new SQLContext(sc);
 sqlContext.createDataFrame(rowRDD, ImmutableList.of("myCol"));
 }
 }
 {code}
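 A self-contained illustration of the mechanism (hypothetical class and method 
 names, not Spark code): with scala.collection.JavaConversions in scope, an 
 overload that tries to delegate to a non-existent Scala overload gets re-routed 
 back to itself, because the converted argument can be implicitly converted back 
 to the Java type.
 {code}
 import scala.collection.JavaConversions._

 class Frames {
   // Only this overload exists; there is no describe(Seq[String]).
   def describe(columns: java.util.List[String]): String = {
     // columns.toSeq compiles via asScalaBuffer, and the resulting Seq[String]
     // is implicitly converted back to java.util.List[String], so this call
     // resolves to describe(java.util.List[String]) -- i.e. to itself.
     describe(columns.toSeq)
   }
 }

 object Demo extends App {
   // Throws StackOverflowError, mirroring the trace above.
   new Frames().describe(java.util.Arrays.asList("myCol"))
 }
 {code}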



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7317) ShuffleHandle needs to be exposed

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7317:
---

Assignee: Apache Spark  (was: Mridul Muralidharan)

 ShuffleHandle needs to be exposed
 -

 Key: SPARK-7317
 URL: https://issues.apache.org/jira/browse/SPARK-7317
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Reporter: Mridul Muralidharan
Assignee: Apache Spark
Priority: Minor

 ShuffleHandle is marked private[spark] - while a lot of code which depends on 
 it, and exposes it, is DeveloperApi.
 While the actual implementation can remain private[spark], the handle class 
 itself should be exposed so that Rdd's can leverage it.
 Example: 
 a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle
 b) ShuffleManager instance is exposed via SparkEnv.get.shuffleManager
 c) SparkEnv.get.shuffleManager.getReader is exposed which needs handle as 
 param : and can be used to write RDD's which leverage shuffle without needing 
 to go through spark's shuffle based rdd's.
 So all the machinery for custom RDD to leverage shuffle exists - except for 
 specifying the ShuffleHandle class itself in dependencies.
 This allows for customizations in user code on how to leverage shuffle.
 For example, specialized join implementations.
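 A hedged sketch of the kind of custom RDD the points above have in mind, 
 assuming ShuffleHandle (and the ShuffleManager accessors) were visible to user 
 code as this ticket requests; the class and member names here are illustrative.
 {code}
 import scala.reflect.ClassTag
 import org.apache.spark.{Partition, ShuffleDependency, SparkEnv, TaskContext}
 import org.apache.spark.rdd.RDD

 // Each partition of this RDD is computed by reading one reduce partition
 // directly from the shuffle machinery, via the dependency's shuffleHandle.
 class ShuffleReadRDD[K, V, C](dep: ShuffleDependency[K, V, C])(implicit ct: ClassTag[(K, C)])
   extends RDD[(K, C)](dep.rdd.context, Seq(dep)) {

   override protected def getPartitions: Array[Partition] =
     Array.tabulate(dep.partitioner.numPartitions) { i =>
       new Partition { override def index: Int = i }
     }

   override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] =
     SparkEnv.get.shuffleManager
       .getReader[K, C](dep.shuffleHandle, split.index, split.index + 1, context)
       .read()
       .map(pair => (pair._1, pair._2))
 }
 {code}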



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7317) ShuffleHandle needs to be exposed

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7317:
---

Assignee: Mridul Muralidharan  (was: Apache Spark)

 ShuffleHandle needs to be exposed
 -

 Key: SPARK-7317
 URL: https://issues.apache.org/jira/browse/SPARK-7317
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
Priority: Minor

 ShuffleHandle is marked private[spark] - while a lot of code which depends on 
 it, and exposes it, is DeveloperApi.
 While the actual implementation can remain private[spark], the handle class 
 itself should be exposed so that Rdd's can leverage it.
 Example: 
 a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle
 b) ShuffleManager instance is exposed via SparkEnv.get.shuffleManager
 c) SparkEnv.get.shuffleManager.getReader is exposed which needs handle as 
 param : and can be used to write RDD's which leverage shuffle without needing 
 to go through spark's shuffle based rdd's.
 So all the machinery for custom RDD to leverage shuffle exists - except for 
 specifying the ShuffleHandle class itself in dependencies.
 This allows for customizations in user code on how to leverage shuffle.
 For example, specialized join implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7317) ShuffleHandle needs to be exposed

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524469#comment-14524469
 ] 

Apache Spark commented on SPARK-7317:
-

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/5857

 ShuffleHandle needs to be exposed
 -

 Key: SPARK-7317
 URL: https://issues.apache.org/jira/browse/SPARK-7317
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
Priority: Minor

 ShuffleHandle is marked private[spark] - while a lot of code which depends on 
 it, and exposes it, is DeveloperApi.
 While the actual implementation can remain private[spark], the handle class 
 itself should be exposed so that Rdd's can leverage it.
 Example: 
 a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle
 b) ShuffleManager instance is exposed via SparkEnv.get.shuffleManager
 c) SparkEnv.get.shuffleManager.getReader is exposed which needs handle as 
 param : and can be used to write RDD's which leverage shuffle without needing 
 to go through spark's shuffle based rdd's.
 So all the machinery for custom RDD to leverage shuffle exists - except for 
 specifying the ShuffleHandle class itself in dependencies.
 This allows for customizations in user code on how to leverage shuffle.
 For example, specialized join implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7315) Flaky Test: WriteAheadLogBackedBlockRDDSuite

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524252#comment-14524252
 ] 

Apache Spark commented on SPARK-7315:
-

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/5853

 Flaky Test: WriteAheadLogBackedBlockRDDSuite
 

 Key: SPARK-7315
 URL: https://issues.apache.org/jira/browse/SPARK-7315
 Project: Spark
  Issue Type: Test
Reporter: Tathagata Das
Assignee: Tathagata Das





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7315) Flaky Test: WriteAheadLogBackedBlockRDDSuite

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7315:
---

Assignee: Apache Spark  (was: Tathagata Das)

 Flaky Test: WriteAheadLogBackedBlockRDDSuite
 

 Key: SPARK-7315
 URL: https://issues.apache.org/jira/browse/SPARK-7315
 Project: Spark
  Issue Type: Test
Reporter: Tathagata Das
Assignee: Apache Spark





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7315) Flaky Test: WriteAheadLogBackedBlockRDDSuite

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7315:
---

Assignee: Tathagata Das  (was: Apache Spark)

 Flaky Test: WriteAheadLogBackedBlockRDDSuite
 

 Key: SPARK-7315
 URL: https://issues.apache.org/jira/browse/SPARK-7315
 Project: Spark
  Issue Type: Test
Reporter: Tathagata Das
Assignee: Tathagata Das





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7241) Pearson correlation for DataFrames

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7241:
---

Assignee: Apache Spark  (was: Burak Yavuz)

 Pearson correlation for DataFrames
 --

 Key: SPARK-7241
 URL: https://issues.apache.org/jira/browse/SPARK-7241
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Xiangrui Meng
Assignee: Apache Spark

 This JIRA is for computing the Pearson linear correlation for two numerical 
 columns in a DataFrame. The method `corr` should live under `df.stat`:
 {code}
 df.stat.corr(col1, col2, method="pearson"): Double
 {code}
 `method` will be used when we add other correlations.
 Similar to SPARK-7240, UDAF will be added later.
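 For reference, a minimal sketch of the statistics a single pass needs in order 
 to compute the Pearson coefficient between two numeric columns (plain Scala, 
 not the DataFrame implementation):
 {code}
 // r = (E[XY] - E[X]E[Y]) / (stddev(X) * stddev(Y)), population form.
 def pearson(pairs: Seq[(Double, Double)]): Double = {
   val n   = pairs.size.toDouble
   val sx  = pairs.map(_._1).sum
   val sy  = pairs.map(_._2).sum
   val sxx = pairs.map(p => p._1 * p._1).sum
   val syy = pairs.map(p => p._2 * p._2).sum
   val sxy = pairs.map(p => p._1 * p._2).sum
   val cov  = sxy / n - (sx / n) * (sy / n)
   val stdX = math.sqrt(sxx / n - (sx / n) * (sx / n))
   val stdY = math.sqrt(syy / n - (sy / n) * (sy / n))
   cov / (stdX * stdY)
 }
 {code}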



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7241) Pearson correlation for DataFrames

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7241:
---

Assignee: Burak Yavuz  (was: Apache Spark)

 Pearson correlation for DataFrames
 --

 Key: SPARK-7241
 URL: https://issues.apache.org/jira/browse/SPARK-7241
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Xiangrui Meng
Assignee: Burak Yavuz

 This JIRA is for computing the Pearson linear correlation for two numerical 
 columns in a DataFrame. The method `corr` should live under `df.stat`:
 {code}
 df.stat.corr(col1, col2, method="pearson"): Double
 {code}
 `method` will be used when we add other correlations.
 Similar to SPARK-7240, UDAF will be added later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7241) Pearson correlation for DataFrames

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524489#comment-14524489
 ] 

Apache Spark commented on SPARK-7241:
-

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/5858

 Pearson correlation for DataFrames
 --

 Key: SPARK-7241
 URL: https://issues.apache.org/jira/browse/SPARK-7241
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Xiangrui Meng
Assignee: Burak Yavuz

 This JIRA is for computing the Pearson linear correlation for two numerical 
 columns in a DataFrame. The method `corr` should live under `df.stat`:
 {code}
 df.stat.corr(col1, col2, method="pearson"): Double
 {code}
 `method` will be used when we add other correlations.
 Similar to SPARK-7240, UDAF will be added later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7313) Allow for configuring max_samples in range partitioner.

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7313:
---

Assignee: Apache Spark  (was: Mridul Muralidharan)

 Allow for configuring max_samples in range partitioner.
 ---

 Key: SPARK-7313
 URL: https://issues.apache.org/jira/browse/SPARK-7313
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Mridul Muralidharan
Assignee: Apache Spark
Priority: Minor

 Currently, we assume that 1e6 is a reasonable upper bound to number of keys 
 while sampling. This works fine when size of keys is 'small' - but breaks for 
 anything non-trivial.
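 A hedged sketch of the configurability being asked for (the property name here 
 is hypothetical, not an existing Spark setting): read the sampling cap from 
 SparkConf instead of hard-coding 1e6.
 {code}
 // Inside RangePartitioner-style sampling logic; conf is a SparkConf.
 val maxSampleSize = conf.getDouble("spark.rangePartitioner.maxSampleSize", 1e6)
 val sampleSize = math.min(20.0 * partitions, maxSampleSize)
 {code}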



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7313) Allow for configuring max_samples in range partitioner.

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524105#comment-14524105
 ] 

Apache Spark commented on SPARK-7313:
-

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/5848

 Allow for configuring max_samples in range partitioner.
 ---

 Key: SPARK-7313
 URL: https://issues.apache.org/jira/browse/SPARK-7313
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
Priority: Minor

 Currently, we assume that 1e6 is a reasonable upper bound to number of keys 
 while sampling. This works fine when size of keys is 'small' - but breaks for 
 anything non-trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7313) Allow for configuring max_samples in range partitioner.

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7313:
---

Assignee: Mridul Muralidharan  (was: Apache Spark)

 Allow for configuring max_samples in range partitioner.
 ---

 Key: SPARK-7313
 URL: https://issues.apache.org/jira/browse/SPARK-7313
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
Priority: Minor

 Currently, we assume that 1e6 is a reasonable upper bound to number of keys 
 while sampling. This works fine when size of keys is 'small' - but breaks for 
 anything non-trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-6907) Create an isolated classloader for the Hive Client.

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6907:
---

Assignee: Michael Armbrust  (was: Apache Spark)

 Create an isolated classloader for the Hive Client.
 ---

 Key: SPARK-6907
 URL: https://issues.apache.org/jira/browse/SPARK-6907
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6907) Create an isolated classloader for the Hive Client.

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524161#comment-14524161
 ] 

Apache Spark commented on SPARK-6907:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/5851

 Create an isolated classloader for the Hive Client.
 ---

 Key: SPARK-6907
 URL: https://issues.apache.org/jira/browse/SPARK-6907
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7312) SPARK-6913 broke jdk6 build

2015-05-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-7312.
-
   Resolution: Fixed
Fix Version/s: 1.4.0

Issue resolved by pull request 5847
[https://github.com/apache/spark/pull/5847]

 SPARK-6913 broke jdk6 build
 ---

 Key: SPARK-7312
 URL: https://issues.apache.org/jira/browse/SPARK-7312
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
Reporter: Thomas Graves
Priority: Blocker
 Fix For: 1.4.0


 https://github.com/apache/spark/pull/5782 uses 
 java.sql.Driver.getParentLogger  which doesn't exist in jdk6, only jdk7
 [error] 
 /home/tgraves/tgravescs_spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/jdbc.scala:198:
  value getParentLogger is not a member of java.sql.Driver
 [error] override def getParentLogger: java.util.logging.Logger = 
 wrapped.getParentLogger
 [error] ^



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7304) Include $@ in call to mvn in make-distribution.sh

2015-05-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7304:
---
Assignee: Rajendra

 Include $@ in call to mvn in make-distribution.sh
 -

 Key: SPARK-7304
 URL: https://issues.apache.org/jira/browse/SPARK-7304
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Rajendra
Assignee: Rajendra
Priority: Minor
 Fix For: 1.4.0

 Attachments: 0001-Include-in-call-to-mvn-in-make-distribution.sh.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The call to mvn does not include $@ in the command line in one place in 
 make-distribution.sh.  This causes that mvn call to ignore additional command 
 line parameters passed to make-distribution.sh in that call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7304) Include $@ in call to mvn in make-distribution.sh

2015-05-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7304.

   Resolution: Fixed
Fix Version/s: 1.4.0

 Include $@ in call to mvn in make-distribution.sh
 -

 Key: SPARK-7304
 URL: https://issues.apache.org/jira/browse/SPARK-7304
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Rajendra
Assignee: Rajendra
Priority: Minor
 Fix For: 1.4.0

 Attachments: 0001-Include-in-call-to-mvn-in-make-distribution.sh.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The call to mvn does not include $@ in the command line in one place in 
 make-distribution.sh.  This causes that mvn call to ignore additional command 
 line parameters passed to make-distribution.sh in that call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7260) Support changing Spark's log level programatically

2015-05-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7260.

Resolution: Duplicate

 Support changing Spark's log level programatically
 --

 Key: SPARK-7260
 URL: https://issues.apache.org/jira/browse/SPARK-7260
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core, Spark Shell
Reporter: Patrick Wendell
Priority: Minor

 There was an earlier PR for this that was basically ready to merge. Just 
 wanted to open a JIRA:
 https://github.com/apache/spark/pull/2433/files
 The main use case I see here is for changing logging in the shell easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524466#comment-14524466
 ] 

Apache Spark commented on SPARK-6954:
-

User 'sryza' has created a pull request for this issue:
https://github.com/apache/spark/pull/5856

 ExecutorAllocationManager can end up requesting a negative number of executors
 --

 Key: SPARK-6954
 URL: https://issues.apache.org/jira/browse/SPARK-6954
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.1
Reporter: Cheolsoo Park
Assignee: Sandy Ryza
  Labels: backport-needed, yarn
 Fix For: 1.4.0

 Attachments: with_fix.png, without_fix.png


 I have a simple test case for dynamic allocation on YARN that fails with the 
 following stack trace-
 {code}
 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread 
 spark-dynamic-executor-allocation-0
 java.lang.IllegalArgumentException: Attempted to request a negative number of 
 executor(s) -21 from the cluster manager. Please specify a positive number!
   at 
 org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338)
   at 
 org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137)
   at 
 org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294)
   at 
 org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263)
   at 
 org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189)
   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
   at 
 org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {code}
 My test is as follows-
 # Start spark-shell with a single executor.
 # Run a {{select count(\*)}} query. The number of executors rises as input 
 size is non-trivial.
 # After the job finishes, the number of  executors falls as most of them 
 become idle.
 # Rerun the same query again, and the request to add executors fails with the 
 above error. In fact, the job itself continues to run with whatever executors 
 it already has, but it never gets more executors unless the shell is closed 
 and restarted. 
 In fact, this error only happens when I configure {{executorIdleTimeout}} 
 very small. For eg, I can reproduce it with the following configs-
 {code}
 spark.dynamicAllocation.executorIdleTimeout 5
 spark.dynamicAllocation.schedulerBacklogTimeout 5
 {code}
 Although I can simply increase {{executorIdleTimeout}} to something like 60 
 secs to avoid the error, I think this is still a bug to be fixed.
 The root cause seems to be that {{numExecutorsPending}} accidentally becomes 
 negative if executors are killed too aggressively (i.e. 
 {{executorIdleTimeout}} is too small) because under that circumstance, the 
 new target # of executors can be smaller than the current # of executors. 
 When that happens, {{ExecutorAllocationManager}} ends up trying to add a 
 negative number of executors, which throws an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7242) Frequent items for DataFrames

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524575#comment-14524575
 ] 

Apache Spark commented on SPARK-7242:
-

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/5859

 Frequent items for DataFrames
 -

 Key: SPARK-7242
 URL: https://issues.apache.org/jira/browse/SPARK-7242
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Xiangrui Meng
Assignee: Burak Yavuz

 Finding frequent items with possibly false positives, using the algorithm 
 described in http://www.cs.umd.edu/~samir/498/karp.pdf.
 {code}
 df.stat.freqItems(cols: Array[String], support: Double = 0.001): DataFrame
 {code}
 The output is a local DataFrame having the input column names. In the first 
 version, we will implement the single pass algorithm that may return false 
 positives, but no false negatives.
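 A minimal sketch of the single-pass heavy-hitters idea from the paper above 
 (Karp et al. / Misra-Gries), written for a single column; the DataFrame version 
 generalizes this per column and, as noted, may return false positives:
 {code}
 def frequentItems[T](items: Iterator[T], support: Double): Set[T] = {
   val k = math.ceil(1.0 / support).toInt          // keep at most k - 1 counters
   val counts = scala.collection.mutable.Map.empty[T, Long]
   for (item <- items) {
     if (counts.contains(item)) counts(item) += 1
     else if (counts.size < k - 1) counts(item) = 1
     else {
       // Decrement every counter; drop the ones that reach zero.
       for ((key, c) <- counts.toList) {
         if (c == 1) counts.remove(key) else counts(key) = c - 1
       }
     }
   }
   // Every item with true frequency > support * n survives; some others may too.
   counts.keySet.toSet
 }
 {code}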



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14

2015-05-01 Thread Russell Alexander Spitzer (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524363#comment-14524363
 ] 

Russell Alexander Spitzer commented on SPARK-6069:
--

We've seen the same issue while developing the Spark Cassandra Connector. 
Unless the connector lib is loaded via spark.executor.extraClassPath, Kryo 
serialization for joins always fails with a ClassNotFoundException, even though 
all operations which don't require a shuffle are fine. 

{code}
com.esotericsoftware.kryo.KryoException: Unable to find class: 
org.apache.spark.sql.cassandra.CassandraSQLRow
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144)
at 
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:80)
at 
org.apache.spark.sql.execution.joins.ShuffledHashJoin$$anonfun$execute$1.apply(ShuffledHashJoin.scala:46)
at 
org.apache.spark.sql.execution.joins.ShuffledHashJoin$$anonfun$execute$1.apply(ShuffledHashJoin.scala:45)
at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}

Adding the jar to executorExtraClasspath rather than --jars solves the issue.

 Deserialization Error ClassNotFoundException with Kryo, Guava 14
 

 Key: SPARK-6069
 URL: https://issues.apache.org/jira/browse/SPARK-6069
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.1
 Environment: Standalone one worker cluster on localhost, or any 
 cluster
Reporter: Pat Ferrel
Priority: Critical

 A class is contained in the jars passed in when creating a context. It is 
 registered with kryo. The class (Guava HashBiMap) is created correctly from 
 an RDD and broadcast but the deserialization fails with ClassNotFound.
 The work around is to hard code the path to the jar and make it available on 
 all workers. Hard code because we are creating a library so there is no easy 
 way to pass in to the app something like:
 spark.executor.extraClassPath  /path/to/some.jar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6986) Makes SparkSqlSerializer2 support sort-based shuffle with sort merge

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525070#comment-14525070
 ] 

Apache Spark commented on SPARK-6986:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/5849

 Makes SparkSqlSerializer2 support sort-based shuffle with sort merge
 

 Key: SPARK-6986
 URL: https://issues.apache.org/jira/browse/SPARK-6986
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Yin Huai
Assignee: Yin Huai

 *Update*: SPARK-4550 has exposed the interfaces. We can safely enable 
 Serializer2 to support sort merge.
 *Original description*:
 Our existing Java and Kryo serializers are both general-purpose serializers. 
 They treat every object individually and encode the type of each object into the 
 underlying stream. For Spark, it is common that we serialize a collection 
 whose records all have the same types (for example, records of a DataFrame). For 
 these cases, we do not need to write out the types of records, and we can take 
 advantage of the type information to build a specialized serializer. To do so, it 
 seems we need to extend the interface of 
 SerializationStream/DeserializationStream, so a 
 SerializationStream/DeserializationStream can have more information about the 
 objects passed in (for example, whether an object is a key/value pair, a key, or a 
 value).
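 A simplified sketch of the schema-based idea (not the actual SparkSqlSerializer2 
 code): with a fixed schema of, say, (Int, String), the stream can write raw 
 field values and never needs to encode class information per record.
 {code}
 import java.io.{DataInputStream, DataOutputStream}

 // Both sides agree on the schema up front, so no type tags are written.
 def writeRecord(out: DataOutputStream, key: Int, value: String): Unit = {
   out.writeInt(key)     // field 1: known to be an Int from the schema
   out.writeUTF(value)   // field 2: known to be a String from the schema
 }

 def readRecord(in: DataInputStream): (Int, String) =
   (in.readInt(), in.readUTF())   // no class names or tags to parse
 {code}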



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Description: 
Fix default system alias problem.

Executing the following sql statement will cause a problem: 

select substr(value, 0, 2), key as c0 from testData order by c0


org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: 
c0#42, c0#41.;

  was:
Fix default system alias problem.

execute the sql statement will cause problem: 

select substr(value, 1, 2), key as c0 from testData order by c0


org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: 
c0#42, c0#41.;


 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is SqlParser problem,when we give no alias to a function in project, 
the parser will give it a default alias like c0,c1so, when we execute the 
sql statement like select isnull(key), key as c0 from testData order by c0, 
it will throw exception.)

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525063#comment-14525063
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is SqlParser problem,when we give no alias to a function in project, 
the parser will give it a default alias like c0,c1so, when we execute the 
sql statement like select isnull(key), key as c0 from testData order by c0, 
it will throw exception.)

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525061#comment-14525061
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525057#comment-14525057
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525059#comment-14525059
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is SqlParser problem,when we give no alias to a function in project, 
the parser will give it a default alias like c0,c1so, when we execute the 
sql statement like select isnull(key), key as c0 from testData order by c0, 
it will throw exception.)

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525062#comment-14525062
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525054#comment-14525054
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7318) DStream isn't cleaning closures correctly

2015-05-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7318:
-
Description: 
{code}
  def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
    transform((r: RDD[T], t: Time) =>
      context.sparkContext.clean(transformFunc(r), false))
  }
{code}
This is cleaning an RDD instead!

  was:
{code}
  def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
    SparkContext.clean
    transform((r: RDD[T], t: Time) =>
      context.sparkContext.clean(transformFunc(r), false))
  }
{code}
This is cleaning an RDD instead!


 DStream isn't cleaning closures correctly
 -

 Key: SPARK-7318
 URL: https://issues.apache.org/jira/browse/SPARK-7318
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Streaming
Affects Versions: 1.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical

 {code}
    def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
      transform((r: RDD[T], t: Time) =>
        context.sparkContext.clean(transformFunc(r), false))
    }
 {code}
 This is cleaning an RDD instead!
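 A hedged sketch of what the description implies the code should do instead 
 (clean the user function once, rather than the RDD it returns); this is a 
 sketch, not necessarily the actual fix:
 {code}
 def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
   // Clean the closure itself so it can be serialized to the executors.
   val cleanedFunc = context.sparkContext.clean(transformFunc, false)
   transform((r: RDD[T], t: Time) => cleanedFunc(r))
 }
 {code}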



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Description: 
Fix default system alias problem.

Executing the following sql statement will cause a problem: 

select substr(value, 1, 2), key as c0 from testData order by c0


org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: 
c0#42, c0#41.;

  was:
Fix default system alias problem.

execute the sql statement will cause problem: 

select substr(concat('value', value), 1, 3), key as c0 from testData order by c0


org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: 
c0#42, c0#41.;


 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 1, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525060#comment-14525060
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525056#comment-14525056
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525055#comment-14525055
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525058#comment-14525058
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix default system alias problem.
 execute the sql statement will cause problem: 
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7318) DStream isn't cleaning closures correctly

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7318:
---

Assignee: Apache Spark  (was: Andrew Or)

 DStream isn't cleaning closures correctly
 -

 Key: SPARK-7318
 URL: https://issues.apache.org/jira/browse/SPARK-7318
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Streaming
Affects Versions: 1.0.0
Reporter: Andrew Or
Assignee: Apache Spark
Priority: Critical

 {code}
    def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
      transform((r: RDD[T], t: Time) =>
        context.sparkContext.clean(transformFunc(r), false))
    }
 {code}
 This is cleaning an RDD instead!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7149) Defalt system alias problem

2015-05-01 Thread haiyang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525051#comment-14525051
 ] 

haiyang commented on SPARK-7149:


This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser gives it a default alias like c0, c1, and so on. So when 
we execute a sql statement like select isnull(key), key as c0 from testData 
order by c0, it will throw an exception.

 Defalt system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7318) DStream isn't cleaning closures correctly

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525052#comment-14525052
 ] 

Apache Spark commented on SPARK-7318:
-

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/5860

 DStream isn't cleaning closures correctly
 -

 Key: SPARK-7318
 URL: https://issues.apache.org/jira/browse/SPARK-7318
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Streaming
Affects Versions: 1.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical

 {code}
   def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
     transform((r: RDD[T], t: Time) =>
       context.sparkContext.clean(transformFunc(r), false))
   }
 {code}
 This is cleaning an RDD instead!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7318) DStream isn't cleaning closures correctly

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7318:
---

Assignee: Andrew Or  (was: Apache Spark)

 DStream isn't cleaning closures correctly
 -

 Key: SPARK-7318
 URL: https://issues.apache.org/jira/browse/SPARK-7318
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Streaming
Affects Versions: 1.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical

 {code}
   def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
     transform((r: RDD[T], t: Time) =>
       context.sparkContext.clean(transformFunc(r), false))
   }
 {code}
 This is cleaning an RDD instead!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7318) DStream isn't cleaning closures correctly

2015-05-01 Thread Andrew Or (JIRA)
Andrew Or created SPARK-7318:


 Summary: DStream isn't cleaning closures correctly
 Key: SPARK-7318
 URL: https://issues.apache.org/jira/browse/SPARK-7318
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Streaming
Affects Versions: 1.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical


{code}
  def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
    transform((r: RDD[T], t: Time) =>
      context.sparkContext.clean(transformFunc(r), false))
  }
{code}
This is cleaning an RDD instead!
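
A sketch of the presumably intended behaviour, cleaning the user function once
instead of the RDD it returns at each batch (illustrative only, not necessarily
the actual fix in the pull request):

{code}
def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
  // Clean the closure itself, up front; do not call clean() on the RDD that
  // the function produces.
  val cleanedFunc = context.sparkContext.clean(transformFunc, checkSerializable = false)
  transform((r: RDD[T], _: Time) => cleanedFunc(r))
}
{code}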



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Default system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser assigns it a default alias like c0, c1, and so on. So 
when we execute a SQL statement like select isnull(key), key as c0 from 
testData order by c0, it will throw an exception.)

 Default system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Default system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser assigns it a default alias like c0, c1, and so on. So 
when we execute a SQL statement like select isnull(key), key as c0 from 
testData order by c0, it will throw an exception.)

 Default system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Default system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser assigns it a default alias like c0, c1, and so on. So 
when we execute a SQL statement like select isnull(key), key as c0 from 
testData order by c0, it will throw an exception.)

 Default system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Default system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser assigns it a default alias like c0, c1, and so on. So 
when we execute a SQL statement like select isnull(key), key as c0 from 
testData order by c0, it will throw an exception.)

 Default system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Default system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser assigns it a default alias like c0, c1, and so on. So 
when we execute a SQL statement like select isnull(key), key as c0 from 
testData order by c0, it will throw an exception.)

 Default system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Default system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser assigns it a default alias like c0, c1, and so on. So 
when we execute a SQL statement like select isnull(key), key as c0 from 
testData order by c0, it will throw an exception.)

 Default system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7149) Default system alias problem

2015-05-01 Thread haiyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haiyang updated SPARK-7149:
---
Comment: was deleted

(was: This is a SqlParser problem: when we give no alias to a function in the 
projection, the parser assigns it a default alias like c0, c1, and so on. So 
when we execute a SQL statement like select isnull(key), key as c0 from 
testData order by c0, it will throw an exception.)

 Default system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7303) push down project if possible when the child is sort

2015-05-01 Thread Fei Wang (JIRA)
Fei Wang created SPARK-7303:
---

 Summary: push down project if possible when the child is sort
 Key: SPARK-7303
 URL: https://issues.apache.org/jira/browse/SPARK-7303
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Fei Wang


Optimize the case of `project(_, sort)`; an example is:

`select key from (select * from testData order by key) t`

optimize it from
```
== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
Subquery testData
 LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Physical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  Exchange (RangePartitioning [key#0 ASC], 5), []
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 
```

to 
```
== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
Subquery testData
 LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Optimized Logical Plan ==
Sort [key#0 ASC], true
 Project [key#0]
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Physical Plan ==
Sort [key#0 ASC], true
 Exchange (RangePartitioning [key#0 ASC], 5), []
  Project [key#0]
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 
```
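
A minimal sketch of what such an optimizer rule could look like with Catalyst's
`Rule[LogicalPlan]` API; the rule name and the safety guard are illustrative,
not the actual Spark change:

```
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, Sort}
import org.apache.spark.sql.catalyst.rules.Rule

// Push a Project below a Sort when the sort ordering only needs attributes
// that the projection keeps, so the sort works on narrower rows.
object PushProjectThroughSort extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case p @ Project(projectList, Sort(order, global, child))
        if order.flatMap(_.references).forall(p.outputSet.contains) =>
      Sort(order, global, Project(projectList, child))
  }
}
```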



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5891) Add Binarizer

2015-05-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5891.
--
   Resolution: Fixed
Fix Version/s: 1.4.0

Issue resolved by pull request 5699
[https://github.com/apache/spark/pull/5699]

 Add Binarizer
 -

 Key: SPARK-5891
 URL: https://issues.apache.org/jira/browse/SPARK-5891
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Reporter: Xiangrui Meng
Assignee: Liang-Chi Hsieh
 Fix For: 1.4.0


 `Binarizer` takes a column of continuous features and outputs a column with 
 binary features, where nonzeros (or values below a threshold) become 1 in the 
 output.
 {code}
 val binarizer = new Binarizer()
   .setInputCol("numVisits")
   .setOutputCol("visited")
 {code}
 The output column should be marked as binary. We need to discuss whether we 
 should process multiple columns or a vector column.
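
For illustration, a minimal usage sketch assuming the API above plus a threshold
parameter, applied to a DataFrame df with a numeric numVisits column (names are
examples only):

{code}
val binarized = new Binarizer()
  .setInputCol("numVisits")
  .setOutputCol("visited")
  .setThreshold(0.0)
  .transform(df)
{code}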



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7299) saving Oracle-source DataFrame to Hive changes scale

2015-05-01 Thread Ken Geis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523236#comment-14523236
 ] 

Ken Geis commented on SPARK-7299:
-

This passes my test!

 saving Oracle-source DataFrame to Hive changes scale
 

 Key: SPARK-7299
 URL: https://issues.apache.org/jira/browse/SPARK-7299
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Ken Geis

 When I load data from Oracle, save it to a table, then select from it, the 
 scale is changed.
 For example, I have a column defined as NUMBER(12, 2). I insert 1 into 
 the column. When I write that to a table and select from it, the result is 
 199.99.
 Some databases (e.g. H2) will return this as 1.00, but Oracle returns it 
 as 1. I believe that when the file is written out to parquet, the scale 
 information is taken from the schema, not the value. In an Oracle (at least) 
 JDBC DataFrame, the scale may be different from row to row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5246) spark/spark-ec2.py cannot start Spark master in VPC if local DNS name does not resolve

2015-05-01 Thread Nick Lipple (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523351#comment-14523351
 ] 

Nick Lipple commented on SPARK-5246:


Is there a workaround for this issue? Any reason why the script uses the 
hostname instead of the IP address?

 spark/spark-ec2.py cannot start Spark master in VPC if local DNS name does 
 not resolve
 --

 Key: SPARK-5246
 URL: https://issues.apache.org/jira/browse/SPARK-5246
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Vladimir Grigor

 How to reproduce: 
 1)  http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html 
 should be sufficient to set up a VPC for this bug. After you have followed that 
 guide, start a new instance in the VPC and ssh to it (through the NAT server).
 2) user starts a cluster in VPC:
 {code}
 ./spark-ec2 -k key20141114 -i ~/aws/key.pem -s 1 --region=eu-west-1 
 --spark-version=1.2.0 --instance-type=m1.large --vpc-id=vpc-2e71dd46 
 --subnet-id=subnet-2571dd4d --zone=eu-west-1a  launch SparkByScript
 Setting up security groups...
 
 (omitted for brevity)
 10.1.1.62
 10.1.1.62: no org.apache.spark.deploy.worker.Worker to stop
 no org.apache.spark.deploy.master.Master to stop
 starting org.apache.spark.deploy.master.Master, logging to 
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out
 failed to launch org.apache.spark.deploy.master.Master:
   at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
   ... 12 more
 full log in 
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out
 10.1.1.62: starting org.apache.spark.deploy.worker.Worker, logging to 
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-10-1-1-62.out
 10.1.1.62: failed to launch org.apache.spark.deploy.worker.Worker:
 10.1.1.62:at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
 10.1.1.62:... 12 more
 10.1.1.62: full log in 
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-10-1-1-62.out
 [timing] spark-standalone setup:  00h 00m 28s
  
 (omitted for brevity)
 {code}
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out
 {code}
 Spark assembly has been built with Hive, including Datanucleus jars on 
 classpath
 Spark Command: /usr/lib/jvm/java-1.7.0/bin/java -cp 
 :::/root/ephemeral-hdfs/conf:/root/spark/sbin/../conf:/root/spark/lib/spark-assembly-1.2.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark/lib/datanucleus-rdbms-3.2.9.jar:/root/spark/lib/datanucleus-core-3.2.10.jar
  -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m 
 org.apache.spark.deploy.master.Master --ip 10.1.1.151 --port 7077 
 --webui-port 8080
 
 15/01/14 07:34:47 INFO master.Master: Registered signal handlers for [TERM, 
 HUP, INT]
 Exception in thread main java.net.UnknownHostException: ip-10-1-1-151: 
 ip-10-1-1-151: Name or service not known
 at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
 at org.apache.spark.util.Utils$.findLocalIpAddress(Utils.scala:620)
 at 
 org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:612)
 at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:612)
 at 
 org.apache.spark.util.Utils$.localIpAddressHostname$lzycompute(Utils.scala:613)
 at 
 org.apache.spark.util.Utils$.localIpAddressHostname(Utils.scala:613)
 at 
 org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:665)
 at 
 org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:665)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.util.Utils$.localHostName(Utils.scala:665)
 at 
 org.apache.spark.deploy.master.MasterArguments.<init>(MasterArguments.scala:27)
 at org.apache.spark.deploy.master.Master$.main(Master.scala:819)
 at org.apache.spark.deploy.master.Master.main(Master.scala)
 Caused by: java.net.UnknownHostException: ip-10-1-1-151: Name or service not 
 known
 at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
 at 
 java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
 at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
 ... 12 more
 {code}
 The problem is that an instance launched in a VPC may not be able to resolve 
 its own local hostname. Please see 
 https://forums.aws.amazon.com/thread.jspa?threadID=92092.
 I am going to submit a fix for this problem since I need this functionality 
 asap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib

2015-05-01 Thread longbao wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523226#comment-14523226
 ] 

longbao wang commented on SPARK-2336:
-

I really agree with you, and I'm already implementing it, but I have run into a 
problem. After the tree is built successfully, you search for the target points' 
kNN by parallelizing the input target points and then searching, but I think 
this raises some questions, since one point's kNN may fall in two or more 
partitions.
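
A sketch of one way to handle that: query each partition's local tree and merge
the per-partition candidates, so neighbours that live in other partitions are
not lost. LocalKdTree, localKnn and Neighbor are hypothetical names, purely for
illustration:

{code}
// treeRDD: RDD[LocalKdTree], one tree per partition (hypothetical type);
// localKnn returns that partition's k nearest neighbours of the query point.
case class Neighbor(pointId: Long, distance: Double)

val candidates = treeRDD.flatMap(tree => tree.localKnn(query, k))
// Keep the k globally closest candidates across all partitions.
val knn = candidates.takeOrdered(k)(Ordering.by(_.distance))
{code}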

 Approximate k-NN Models for MLLib
 -

 Key: SPARK-2336
 URL: https://issues.apache.org/jira/browse/SPARK-2336
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Brian Gawalt
Priority: Minor
  Labels: clustering, features

 After tackling the general k-Nearest Neighbor model as per 
 https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to 
 also offer approximate k-Nearest Neighbor. A promising approach would involve 
 building a kd-tree variant within each partition, a la
 http://www.autonlab.org/autonweb/14714.html?branch=1language=2
 This could offer a simple non-linear ML model that can label new data with 
 much lower latency than the plain-vanilla kNN versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib

2015-05-01 Thread longbao wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523227#comment-14523227
 ] 

longbao wang commented on SPARK-2336:
-

I really agree with you, and I'm already implementing it, but I have run into a 
problem. After the tree is built successfully, you search for the target points' 
kNN by parallelizing the input target points and then searching, but I think 
this raises some questions, since one point's kNN may fall in two or more 
partitions.

 Approximate k-NN Models for MLLib
 -

 Key: SPARK-2336
 URL: https://issues.apache.org/jira/browse/SPARK-2336
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Brian Gawalt
Priority: Minor
  Labels: clustering, features

 After tackling the general k-Nearest Neighbor model as per 
 https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to 
 also offer approximate k-Nearest Neighbor. A promising approach would involve 
 building a kd-tree variant within each partition, a la
 http://www.autonlab.org/autonweb/14714.html?branch=1language=2
 This could offer a simple non-linear ML model that can label new data with 
 much lower latency than the plain-vanilla kNN versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-2336) Approximate k-NN Models for MLLib

2015-05-01 Thread longbao wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

longbao wang updated SPARK-2336:

Comment: was deleted

(was: I really agree with you, and I'm already implementing it, but I have run 
into a problem. After the tree is built successfully, you search for the target 
points' kNN by parallelizing the input target points and then searching, but I 
think this raises some questions, since one point's kNN may fall in two or more 
partitions.)

 Approximate k-NN Models for MLLib
 -

 Key: SPARK-2336
 URL: https://issues.apache.org/jira/browse/SPARK-2336
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Brian Gawalt
Priority: Minor
  Labels: clustering, features

 After tackling the general k-Nearest Neighbor model as per 
 https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to 
 also offer approximate k-Nearest Neighbor. A promising approach would involve 
 building a kd-tree variant within each partition, a la
 http://www.autonlab.org/autonweb/14714.html?branch=1language=2
 This could offer a simple non-linear ML model that can label new data with 
 much lower latency than the plain-vanilla kNN versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7303) push down project if possible when the child is sort

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7303:
---

Assignee: Apache Spark

 push down project if possible when the child is sort
 

 Key: SPARK-7303
 URL: https://issues.apache.org/jira/browse/SPARK-7303
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Fei Wang
Assignee: Apache Spark

 Optimize the case of `project(_, sort)`; an example is:
 `select key from (select * from testData order by key) t`
 optimize it from
 ```
 == Parsed Logical Plan ==
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
'Project [*]
 'UnresolvedRelation [testData], None
 == Analyzed Logical Plan ==
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
Project [key#0,value#1]
 Subquery testData
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
 == Optimized Logical Plan ==
 Project [key#0]
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Physical Plan ==
 Project [key#0]
  Sort [key#0 ASC], true
   Exchange (RangePartitioning [key#0 ASC], 5), []
PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 ```
 to 
 ```
 == Parsed Logical Plan ==
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
'Project [*]
 'UnresolvedRelation [testData], None
 == Analyzed Logical Plan ==
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
Project [key#0,value#1]
 Subquery testData
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Optimized Logical Plan ==
 Sort [key#0 ASC], true
  Project [key#0]
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Physical Plan ==
 Sort [key#0 ASC], true
  Exchange (RangePartitioning [key#0 ASC], 5), []
   Project [key#0]
PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 ```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7294) Add a between function in Column

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7294:
---

Assignee: (was: Apache Spark)

 Add a between function in Column
 

 Key: SPARK-7294
 URL: https://issues.apache.org/jira/browse/SPARK-7294
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: starter

 Column.between(a, b)
 We can just translate it to c >= a and c <= b
 Should add this for both Python and Scala/Java.
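
A minimal sketch of the Scala side, assuming Column's existing >=, <= and &&
operators (illustrative, not the final API):

{code}
// Inside org.apache.spark.sql.Column (sketch): inclusive range check.
def between(lowerBound: Any, upperBound: Any): Column =
  (this >= lowerBound) && (this <= upperBound)
{code}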



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7294) Add a between function in Column

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523359#comment-14523359
 ] 

Apache Spark commented on SPARK-7294:
-

User 'kaka1992' has created a pull request for this issue:
https://github.com/apache/spark/pull/5839

 Add a between function in Column
 

 Key: SPARK-7294
 URL: https://issues.apache.org/jira/browse/SPARK-7294
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
  Labels: starter

 Column.between(a, b)
 We can just translate it to c >= a and c <= b
 Should add this for both Python and Scala/Java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7302) SPARK building documentation still mentions building for yarn 0.23

2015-05-01 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-7302:


 Summary: SPARK building documentation still mentions building for 
yarn 0.23
 Key: SPARK-7302
 URL: https://issues.apache.org/jira/browse/SPARK-7302
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.3.1
Reporter: Thomas Graves


As of SPARK-3445 we deprecated using Hadoop 0.23. It looks like the building 
documentation still references it, though. We should remove that.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7289) Combine Limit and Sort to avoid total ordering

2015-05-01 Thread Fei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Wang updated SPARK-7289:

Description: 
Optimize the following SQL

select key from (select * from testData order by key) t limit 5

from 

== Parsed Logical Plan ==
'Limit 5
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
'Project [*]
 'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
Project [key#0,value#1]
 Subquery testData
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Optimized Logical Plan ==
Limit 5
 Project [key#0]
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
== Physical Plan ==
Limit 5
 Project [key#0]
  Sort [key#0 ASC], true
   Exchange (RangePartitioning [key#0 ASC], 5), []
PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 

to

== Parsed Logical Plan ==
'Limit 5
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
'Project [*]
 'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
Project [key#0,value#1]
 Subquery testData
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Project [key#0]
 Limit 5
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Physical Plan ==
Project [key#0]
 TakeOrdered 5, [key#0 ASC]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]

  was:
Optimize the following SQL
`select key from (select * from testData limit 5) t order by key limit 5`

optimize it from 
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
'Limit 5
 'Project [*]
  'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
Limit 5
 Project [key#0,value#1]
  Subquery testData
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Limit 5
LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 

== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  Limit 5
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 

```
to 
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
'Limit 5
 'Project [*]
  'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
Limit 5
 Project [key#0,value#1]
  Subquery testData
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 
```

Summary: Combine Limit and Sort to avoid total ordering  (was: push 
down sort when its child is Limit)

 Combine Limit and Sort to avoid total ordering
 --

 Key: SPARK-7289
 URL: https://issues.apache.org/jira/browse/SPARK-7289
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Fei Wang

 Optimize the following SQL
 select key from (select * from testData order by key) t limit 5
 from 
 == Parsed Logical Plan ==
 'Limit 5
  'Project ['key]
   'Subquery t
'Sort ['key ASC], true
 'Project [*]
  'UnresolvedRelation [testData], None
 == Analyzed Logical Plan ==
 Limit 5
  Project [key#0]
   Subquery t
Sort [key#0 ASC], true
 Project [key#0,value#1]
  Subquery testData
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Optimized Logical Plan ==
 Limit 5
  Project [key#0]
   Sort [key#0 ASC], true
LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Physical Plan ==
 Limit 5
  Project [key#0]
   Sort [key#0 ASC], true
Exchange (RangePartitioning [key#0 ASC], 5), []
 PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 to
 == Parsed Logical Plan ==
 'Limit 5
  'Project ['key]
   'Subquery t
'Sort ['key ASC], true
 'Project [*]
  'UnresolvedRelation [testData], None
 == Analyzed Logical Plan ==
 Limit 5
  Project [key#0]
   Subquery t
Sort [key#0 ASC], true
 Project [key#0,value#1]
  Subquery testData
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
 == Optimized Logical Plan ==
 Project [key#0]
  Limit 5
   Sort [key#0 ASC], true
LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Physical Plan ==
 Project [key#0]
  TakeOrdered 5, [key#0 ASC]
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
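
A minimal sketch of the kind of rewrite this implies, assuming Catalyst's
Rule[LogicalPlan] API (the rule name is hypothetical): pulling the Project above
the Limit and Sort lets the physical planner recognize Limit-over-Sort and plan
a single TakeOrdered instead of a full sort followed by a limit.

{code}
import org.apache.spark.sql.catalyst.plans.logical.{Limit, LogicalPlan, Project, Sort}
import org.apache.spark.sql.catalyst.rules.Rule

// Rewrite Limit(Project(Sort(...))) into Project(Limit(Sort(...))), assuming
// the project expressions are deterministic, row-wise computations.
object ProjectOverLimitSort extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Limit(limitExpr, Project(projectList, sort @ Sort(_, true, _))) =>
      Project(projectList, Limit(limitExpr, sort))
  }
}
{code}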



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, 

[jira] [Comment Edited] (SPARK-5246) spark/spark-ec2.py cannot start Spark master in VPC if local DNS name does not resolve

2015-05-01 Thread Nick Lipple (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523351#comment-14523351
 ] 

Nick Lipple edited comment on SPARK-5246 at 5/1/15 3:53 PM:


Is there a workaround for this issue? Any reason why the script uses the 
hostname instead of the IP address?

EDIT: nvm, this issue seems to be addressed: 
https://github.com/apache/spark/commit/86403f5525782bc9656ab11790f7020baa6b2c1f


was (Author: nicklipple):
Is there a workaround for this issue? Any reason why the script uses the 
hostname instead of the IP address?

 spark/spark-ec2.py cannot start Spark master in VPC if local DNS name does 
 not resolve
 --

 Key: SPARK-5246
 URL: https://issues.apache.org/jira/browse/SPARK-5246
 Project: Spark
  Issue Type: Bug
  Components: EC2
Reporter: Vladimir Grigor

 How to reproduce: 
 1)  http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html 
 should be sufficient to set up a VPC for this bug. After you have followed that 
 guide, start a new instance in the VPC and ssh to it (through the NAT server).
 2) user starts a cluster in VPC:
 {code}
 ./spark-ec2 -k key20141114 -i ~/aws/key.pem -s 1 --region=eu-west-1 
 --spark-version=1.2.0 --instance-type=m1.large --vpc-id=vpc-2e71dd46 
 --subnet-id=subnet-2571dd4d --zone=eu-west-1a  launch SparkByScript
 Setting up security groups...
 
 (omitted for brevity)
 10.1.1.62
 10.1.1.62: no org.apache.spark.deploy.worker.Worker to stop
 no org.apache.spark.deploy.master.Master to stop
 starting org.apache.spark.deploy.master.Master, logging to 
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out
 failed to launch org.apache.spark.deploy.master.Master:
   at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
   ... 12 more
 full log in 
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out
 10.1.1.62: starting org.apache.spark.deploy.worker.Worker, logging to 
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-10-1-1-62.out
 10.1.1.62: failed to launch org.apache.spark.deploy.worker.Worker:
 10.1.1.62:at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
 10.1.1.62:... 12 more
 10.1.1.62: full log in 
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-10-1-1-62.out
 [timing] spark-standalone setup:  00h 00m 28s
  
 (omitted for brevity)
 {code}
 /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out
 {code}
 Spark assembly has been built with Hive, including Datanucleus jars on 
 classpath
 Spark Command: /usr/lib/jvm/java-1.7.0/bin/java -cp 
 :::/root/ephemeral-hdfs/conf:/root/spark/sbin/../conf:/root/spark/lib/spark-assembly-1.2.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark/lib/datanucleus-rdbms-3.2.9.jar:/root/spark/lib/datanucleus-core-3.2.10.jar
  -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m 
 org.apache.spark.deploy.master.Master --ip 10.1.1.151 --port 7077 
 --webui-port 8080
 
 15/01/14 07:34:47 INFO master.Master: Registered signal handlers for [TERM, 
 HUP, INT]
 Exception in thread main java.net.UnknownHostException: ip-10-1-1-151: 
 ip-10-1-1-151: Name or service not known
 at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
 at org.apache.spark.util.Utils$.findLocalIpAddress(Utils.scala:620)
 at 
 org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:612)
 at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:612)
 at 
 org.apache.spark.util.Utils$.localIpAddressHostname$lzycompute(Utils.scala:613)
 at 
 org.apache.spark.util.Utils$.localIpAddressHostname(Utils.scala:613)
 at 
 org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:665)
 at 
 org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:665)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.util.Utils$.localHostName(Utils.scala:665)
 at 
 org.apache.spark.deploy.master.MasterArguments.<init>(MasterArguments.scala:27)
 at org.apache.spark.deploy.master.Master$.main(Master.scala:819)
 at org.apache.spark.deploy.master.Master.main(Master.scala)
 Caused by: java.net.UnknownHostException: ip-10-1-1-151: Name or service not 
 known
 at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
 at 
 java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
 at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
 ... 12 more
 {code}
 Problem is that instance 

[jira] [Assigned] (SPARK-7149) Default system alias problem

2015-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7149:
---

Assignee: (was: Apache Spark)

 Default system alias problem
 ---

 Key: SPARK-7149
 URL: https://issues.apache.org/jira/browse/SPARK-7149
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: haiyang

 Fix the default system alias problem.
 Executing the following SQL statement causes an error:
 select substr(value, 0, 2), key as c0 from testData order by c0
 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could 
 be: c0#42, c0#41.;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3066) Support recommendAll in matrix factorization model

2015-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522825#comment-14522825
 ] 

Apache Spark commented on SPARK-3066:
-

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/5829

 Support recommendAll in matrix factorization model
 --

 Key: SPARK-3066
 URL: https://issues.apache.org/jira/browse/SPARK-3066
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Debasish Das

 ALS returns a matrix factorization model, which we can use to predict ratings 
 for individual queries as well as small batches. In practice, users may want 
 to compute top-k recommendations offline for all users. It is very expensive 
 but a common problem. We can do some optimization like
 1) collect one side (either user or product) and broadcast it as a matrix
 2) use level-3 BLAS to compute inner products
 3) use Utils.takeOrdered to find top-k
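
A rough sketch of those steps, assuming a model with userFeatures and
productFeatures RDDs of (id, Array[Double]) pairs and with sc, model and k in
scope; a plain dot-product loop stands in for the level-3 BLAS call:

{code}
// 1) collect one side (here: the products) and broadcast it
val products: Array[(Int, Array[Double])] = model.productFeatures.collect()
val bcProducts = sc.broadcast(products)

// 2) score every product for each user (a real implementation would block the
//    factors and call a level-3 BLAS gemm instead of this per-pair loop)
// 3) keep only the top-k products per user
val topK = model.userFeatures.map { case (userId, uFeat) =>
  val scored = bcProducts.value.map { case (productId, pFeat) =>
    var dot = 0.0
    var i = 0
    while (i < uFeat.length) { dot += uFeat(i) * pFeat(i); i += 1 }
    (productId, dot)
  }
  (userId, scored.sortBy(-_._2).take(k))
}
{code}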



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


