[jira] [Commented] (SPARK-6368) Build a specialized serializer for Exchange operator.
[ https://issues.apache.org/jira/browse/SPARK-6368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524117#comment-14524117 ] Apache Spark commented on SPARK-6368: - User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/5849 Build a specialized serializer for Exchange operator. -- Key: SPARK-6368 URL: https://issues.apache.org/jira/browse/SPARK-6368 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Assignee: Yin Huai Priority: Critical Fix For: 1.4.0 Attachments: Kryo.nps, SchemaBased.nps Kryo is still pretty slow because it works on individual objects and is relatively expensive to allocate. For the Exchange operator, because the schemas for the key and value are already defined, we can create a specialized serializer to handle the specific schemas of the key and value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
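A minimal, hypothetical Scala sketch of the idea (it is not the serializer from the linked pull request, and the (Int, String) row shape is assumed purely for illustration): because every record shares one known schema, field values can be written directly, with no per-record class metadata or reflection.
{code}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical row type; in the real Exchange operator the schema comes from the plan.
case class Row2(id: Int, name: String)

// Schema-specialized serializer sketch: every record has the same (Int, String)
// layout, so we stream raw field values instead of generic object graphs.
object IntStringRowSerializer {
  def serialize(rows: Iterator[Row2]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    rows.foreach { r =>
      out.writeInt(r.id)    // fixed-width int field
      out.writeUTF(r.name)  // length-prefixed string field
    }
    out.flush()
    bytes.toByteArray
  }

  def deserialize(data: Array[Byte]): Iterator[Row2] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    Iterator.continually {
      if (in.available() > 0) Some(Row2(in.readInt(), in.readUTF())) else None
    }.takeWhile(_.isDefined).map(_.get)
  }
}
{code}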
[jira] [Assigned] (SPARK-6907) Create an isolated classloader for the Hive Client.
[ https://issues.apache.org/jira/browse/SPARK-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6907: --- Assignee: Apache Spark (was: Michael Armbrust) Create an isolated classloader for the Hive Client. --- Key: SPARK-6907 URL: https://issues.apache.org/jira/browse/SPARK-6907 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7316: --- Assignee: Apache Spark Add step capability to RDD sliding window - Key: SPARK-7316 URL: https://issues.apache.org/jira/browse/SPARK-7316 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Alexander Ulanov Assignee: Apache Spark Fix For: 1.4.0 Original Estimate: 24h Remaining Estimate: 24h RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. The user should be able to define the step. This capability should be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
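The proposed step semantics can be illustrated with plain Scala collections, whose sliding method already accepts a step parameter; this is only a sketch of the intended behavior, not the MLlib API added by the pull request.
{code}
// Plain-Scala illustration of sliding windows with a step; MLlib's
// RDDFunctions.sliding currently behaves like step = 1.
object SlidingStepDemo {
  def main(args: Array[String]): Unit = {
    val data = 1 to 9

    // Current behavior (step 1): windows (1,2,3), (2,3,4), ..., (7,8,9).
    println(data.sliding(3, 1).toList)

    // Proposed behavior (user-defined step, here 2):
    // windows starting at 1, 3, 5, 7 -> (1,2,3), (3,4,5), (5,6,7), (7,8,9).
    println(data.sliding(3, 2).toList)
  }
}
{code}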
[jira] [Assigned] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7316: --- Assignee: (was: Apache Spark) Add step capability to RDD sliding window - Key: SPARK-7316 URL: https://issues.apache.org/jira/browse/SPARK-7316 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Alexander Ulanov Fix For: 1.4.0 Original Estimate: 24h Remaining Estimate: 24h RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. The user should be able to define the step. This capability should be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524309#comment-14524309 ] Apache Spark commented on SPARK-7316: - User 'avulanov' has created a pull request for this issue: https://github.com/apache/spark/pull/5855 Add step capability to RDD sliding window - Key: SPARK-7316 URL: https://issues.apache.org/jira/browse/SPARK-7316 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Alexander Ulanov Fix For: 1.4.0 Original Estimate: 24h Remaining Estimate: 24h RDDFunctions in MLlib contains a sliding window implementation with a fixed step of 1. The user should be able to define the step. This capability should be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7216) Show driver details in Mesos cluster UI
[ https://issues.apache.org/jira/browse/SPARK-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7216: - Affects Version/s: 1.4.0 Show driver details in Mesos cluster UI --- Key: SPARK-7216 URL: https://issues.apache.org/jira/browse/SPARK-7216 Project: Spark Issue Type: Improvement Components: Mesos Affects Versions: 1.4.0 Reporter: Timothy Chen Assignee: Timothy Chen Fix For: 1.4.0 Show driver details in Mesos cluster UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7216) Show driver details in Mesos cluster UI
[ https://issues.apache.org/jira/browse/SPARK-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-7216. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Timothy Chen Target Version/s: 1.4.0 Show driver details in Mesos cluster UI --- Key: SPARK-7216 URL: https://issues.apache.org/jira/browse/SPARK-7216 Project: Spark Issue Type: Improvement Components: Mesos Affects Versions: 1.4.0 Reporter: Timothy Chen Assignee: Timothy Chen Fix For: 1.4.0 Show driver details in Mesos cluster UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6229) Support SASL encryption in network/common module
[ https://issues.apache.org/jira/browse/SPARK-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-6229. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Marcelo Vanzin Support SASL encryption in network/common module Key: SPARK-6229 URL: https://issues.apache.org/jira/browse/SPARK-6229 Project: Spark Issue Type: Sub-task Components: Spark Core Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Fix For: 1.4.0 After SASL support has been added to network/common, supporting encryption should be rather simple. Encryption is supported for DIGEST-MD5 and GSSAPI. Since the latter requires a valid kerberos login to work (and so doesn't really work with executors), encryption would require the use of DIGEST-MD5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
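As a rough usage sketch, SASL encryption layers on top of the existing shared-secret authentication; the property names below are assumptions based on the 1.4-era configuration and should be verified against the security docs for the version in use.
{code}
import org.apache.spark.SparkConf

// Hedged sketch: enable shared-secret authentication (a prerequisite for the
// DIGEST-MD5 SASL mechanism mentioned above), then enable encryption on the
// SASL-negotiated channels. Property names are assumptions, not verified here.
val conf = new SparkConf()
  .setAppName("sasl-encryption-example")
  .set("spark.authenticate", "true")                       // enable shared-secret auth
  .set("spark.authenticate.secret", "change-me")           // secret (ignored on YARN, which generates one)
  .set("spark.authenticate.enableSaslEncryption", "true")  // encrypt the authenticated connections
// pass `conf` to SparkContext / spark-submit as usual
{code}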
[jira] [Resolved] (SPARK-2808) update kafka to version 0.8.2
[ https://issues.apache.org/jira/browse/SPARK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-2808. -- Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Cody Koeninger update kafka to version 0.8.2 - Key: SPARK-2808 URL: https://issues.apache.org/jira/browse/SPARK-2808 Project: Spark Issue Type: Sub-task Components: Build, Spark Core Reporter: Anand Avati Assignee: Cody Koeninger Fix For: 1.4.0 First kafka_2.11 0.8.1 has to be released -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7113) Add the direct stream related information to the streaming listener and web UI
[ https://issues.apache.org/jira/browse/SPARK-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524408#comment-14524408 ] Tathagata Das commented on SPARK-7113: -- [~jerryshao] Since this other sub task is done, can you create a PR for Kafka Direct to use the InputInfoTracker? Add the direct stream related information to the streaming listener and web UI -- Key: SPARK-7113 URL: https://issues.apache.org/jira/browse/SPARK-7113 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Saisai Shao Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7317) ShuffleHandle needs to be exposed
Mridul Muralidharan created SPARK-7317: -- Summary: ShuffleHandle needs to be exposed Key: SPARK-7317 URL: https://issues.apache.org/jira/browse/SPARK-7317 Project: Spark Issue Type: Improvement Components: Shuffle Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan Priority: Minor ShuffleHandle is marked private[spark], while a lot of the code that depends on it and exposes it is DeveloperApi. While the actual implementation can remain private[spark], the handle class itself should be exposed so that RDDs can leverage it. Example: a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle; b) the ShuffleManager instance is exposed via SparkEnv.get.shuffleManager; c) SparkEnv.get.shuffleManager.getReader is exposed, takes a handle as a parameter, and can be used to write RDDs that leverage shuffle without needing to go through Spark's shuffle-based RDDs. So all the machinery for a custom RDD to leverage shuffle exists, except for specifying the ShuffleHandle class itself in dependencies. This allows user code to customize how it leverages shuffle, for example with specialized join implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
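To make the requested usage concrete, below is a hedged Scala sketch of the pattern described in points a) through c): a custom RDD that reads a shuffle directly through the ShuffleManager using the handle exposed on its ShuffleDependency. The signatures follow the 1.x ShuffleManager API and should be treated as assumptions for other versions.
{code}
import scala.reflect.ClassTag
import org.apache.spark.{Dependency, HashPartitioner, Partition, ShuffleDependency, SparkEnv, TaskContext}
import org.apache.spark.rdd.RDD

// Hedged sketch (1.x-era APIs): a custom RDD that repartitions its parent by
// reading shuffle output directly via the ShuffleManager, using the handle
// exposed on its ShuffleDependency.
private case class CustomShuffledPartition(index: Int) extends Partition

class CustomShuffledRDD[K: ClassTag, V: ClassTag](prev: RDD[(K, V)], numPartitions: Int)
  extends RDD[(K, V)](prev.context, Nil) {

  // (a) ShuffleDependency.shuffleHandle exposes the ShuffleHandle for this shuffle.
  private val dep = new ShuffleDependency[K, V, V](prev, new HashPartitioner(numPartitions))

  override protected def getDependencies: Seq[Dependency[_]] = Seq(dep)

  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numPartitions)(CustomShuffledPartition)

  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] = {
    // (b) the ShuffleManager instance comes from SparkEnv;
    // (c) getReader takes the handle plus a partition range and yields the shuffled records.
    SparkEnv.get.shuffleManager
      .getReader(dep.shuffleHandle, split.index, split.index + 1, context)
      .read()
      .asInstanceOf[Iterator[(K, V)]]
  }
}
{code}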
[jira] [Created] (SPARK-7315) Flaky Test: WriteAheadLogBackedBlockRDDSuite
Tathagata Das created SPARK-7315: Summary: Flaky Test: WriteAheadLogBackedBlockRDDSuite Key: SPARK-7315 URL: https://issues.apache.org/jira/browse/SPARK-7315 Project: Spark Issue Type: Test Reporter: Tathagata Das Assignee: Tathagata Das -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524372#comment-14524372 ] Russell Alexander Spitzer commented on SPARK-6069: -- Running with --conf spark.files.userClassPathFirst=true yields a different error {code} scala cc.sql(SELECT * FROM test.fun as a JOIN test.fun as b ON (a.k = b.v)).collect 15/05/01 17:24:34 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.0.2.15): java.lang.NoClassDefFoundError: org/apache/spark/Partition at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42) at org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:50) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30) at org.apache.spark.repl.ExecutorClassLoader$$anonfun$findClass$1.apply(ExecutorClassLoader.scala:57) at org.apache.spark.repl.ExecutorClassLoader$$anonfun$findClass$1.apply(ExecutorClassLoader.scala:57) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:57) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:274) at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:59) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: org.apache.spark.Partition at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
[jira] [Resolved] (SPARK-7309) Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler
[ https://issues.apache.org/jira/browse/SPARK-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-7309. -- Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Shixiong Zhu Shutdown the thread pools in ReceivedBlockHandler and DAGScheduler -- Key: SPARK-7309 URL: https://issues.apache.org/jira/browse/SPARK-7309 Project: Spark Issue Type: Improvement Components: Spark Core, Streaming Reporter: Shixiong Zhu Assignee: Shixiong Zhu Priority: Minor Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
[ https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2691: - Assignee: (was: Timothy Chen) Allow Spark on Mesos to be launched with Docker --- Key: SPARK-2691 URL: https://issues.apache.org/jira/browse/SPARK-2691 Project: Spark Issue Type: New Feature Components: Mesos Affects Versions: 1.0.0 Reporter: Timothy Chen Labels: mesos Attachments: spark-docker.patch Currently, to launch Spark with Mesos one must upload a tarball and specify the executor URI to be passed in, which is downloaded on each slave (or even on each execution, depending on whether coarse-grained mode is used). We want Spark to be able to support launching executors via a Docker image that utilizes the recent Docker and Mesos integration work. With the recent integration, Spark can simply specify a Docker image and the options that are needed, and it should continue to work as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
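For context, the resulting configuration surface can be sketched as follows; the property names are taken from the 1.4-era Mesos documentation and should be treated as assumptions for other versions.
{code}
import org.apache.spark.SparkConf

// Hedged sketch: instead of shipping a Spark tarball via spark.executor.uri,
// executors are launched from a Docker image that already contains Spark.
val conf = new SparkConf()
  .setMaster("mesos://zk://zk1:2181,zk2:2181/mesos")
  .setAppName("docker-on-mesos-example")
  .set("spark.mesos.executor.docker.image", "my-registry/spark:1.4.0") // image with Spark installed
  .set("spark.mesos.executor.docker.volumes", "/data:/data:ro")        // optional host volumes
  .set("spark.mesos.executor.home", "/opt/spark")                      // where Spark lives inside the image
{code}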
[jira] [Closed] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
[ https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-2691. Resolution: Fixed Fix Version/s: 1.4.0 Allow Spark on Mesos to be launched with Docker --- Key: SPARK-2691 URL: https://issues.apache.org/jira/browse/SPARK-2691 Project: Spark Issue Type: New Feature Components: Mesos Affects Versions: 1.0.0 Reporter: Timothy Chen Assignee: Chris Heller Labels: mesos Fix For: 1.4.0 Attachments: spark-docker.patch Currently, to launch Spark with Mesos one must upload a tarball and specify the executor URI to be passed in, which is downloaded on each slave (or even on each execution, depending on whether coarse-grained mode is used). We want Spark to be able to support launching executors via a Docker image that utilizes the recent Docker and Mesos integration work. With the recent integration, Spark can simply specify a Docker image and the options that are needed, and it should continue to work as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
[ https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2691: - Assignee: Chris Heller Allow Spark on Mesos to be launched with Docker --- Key: SPARK-2691 URL: https://issues.apache.org/jira/browse/SPARK-2691 Project: Spark Issue Type: New Feature Components: Mesos Affects Versions: 1.0.0 Reporter: Timothy Chen Assignee: Chris Heller Labels: mesos Attachments: spark-docker.patch Currently, to launch Spark with Mesos one must upload a tarball and specify the executor URI to be passed in, which is downloaded on each slave (or even on each execution, depending on whether coarse-grained mode is used). We want Spark to be able to support launching executors via a Docker image that utilizes the recent Docker and Mesos integration work. With the recent integration, Spark can simply specify a Docker image and the options that are needed, and it should continue to work as-is. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6166) Add config to limit number of concurrent outbound connections for shuffle fetch
[ https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6166: --- Assignee: (was: Apache Spark) Add config to limit number of concurrent outbound connections for shuffle fetch --- Key: SPARK-6166 URL: https://issues.apache.org/jira/browse/SPARK-6166 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Mridul Muralidharan Priority: Minor spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of size. But this is not always sufficient: when the number of hosts in the cluster increases, it can lead to a very large number of inbound connections to one or more nodes, causing workers to fail under the load. I propose we also add a spark.reducer.maxReqsInFlight, which puts a bound on the number of outstanding outbound connections. This might still cause hotspots in the cluster, but in our tests it has significantly reduced the occurrence of worker failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
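A sketch of how the existing and proposed bounds would be configured together; spark.reducer.maxReqsInFlight is the property proposed by this issue, so treat it as an assumption until the pull request lands for the version in use.
{code}
import org.apache.spark.SparkConf

// Hedged sketch: the existing size-based limit plus the proposed request-count limit.
val conf = new SparkConf()
  .set("spark.reducer.maxMbInFlight", "48")    // cap on in-flight shuffle data, in MB (1.x name)
  .set("spark.reducer.maxReqsInFlight", "64")  // proposed cap on concurrent outbound fetch requests
{code}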
[jira] [Assigned] (SPARK-6166) Add config to limit number of concurrent outbound connections for shuffle fetch
[ https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6166: --- Assignee: Apache Spark Add config to limit number of concurrent outbound connections for shuffle fetch --- Key: SPARK-6166 URL: https://issues.apache.org/jira/browse/SPARK-6166 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Mridul Muralidharan Assignee: Apache Spark Priority: Minor spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of size. But this is not always sufficient: when the number of hosts in the cluster increases, it can lead to a very large number of inbound connections to one or more nodes, causing workers to fail under the load. I propose we also add a spark.reducer.maxReqsInFlight, which puts a bound on the number of outstanding outbound connections. This might still cause hotspots in the cluster, but in our tests it has significantly reduced the occurrence of worker failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6166) Add config to limit number of concurrent outbound connections for shuffle fetch
[ https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524205#comment-14524205 ] Apache Spark commented on SPARK-6166: - User 'mridulm' has created a pull request for this issue: https://github.com/apache/spark/pull/5852 Add config to limit number of concurrent outbound connections for shuffle fetch --- Key: SPARK-6166 URL: https://issues.apache.org/jira/browse/SPARK-6166 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0 Reporter: Mridul Muralidharan Priority: Minor spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of size. But this is not always sufficient: when the number of hosts in the cluster increases, it can lead to a very large number of inbound connections to one or more nodes, causing workers to fail under the load. I propose we also add a spark.reducer.maxReqsInFlight, which puts a bound on the number of outstanding outbound connections. This might still cause hotspots in the cluster, but in our tests it has significantly reduced the occurrence of worker failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7112) Add InputInfoTracker to have a generic way to track input data rates for all input streams.
[ https://issues.apache.org/jira/browse/SPARK-7112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-7112. -- Resolution: Fixed Add InputInfoTracker to have a generic way to track input data rates for all input streams. --- Key: SPARK-7112 URL: https://issues.apache.org/jira/browse/SPARK-7112 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Saisai Shao Assignee: Saisai Shao Fix For: 1.4.0 Non-receiver streams like Kafka Direct stream should be able to report input data rates. For that we need a generic way to report input information. This JIRA is to track the addition of an InputInfoTracker for that purpose. Here is the design doc - https://docs.google.com/document/d/122QvcwPoLkI2OW4eM7nyBOAqffk2uxgsNT38WI-M5vQ/edit#heading=h.9eluy73ulzuz -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6443) Support HA in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6443. Resolution: Fixed Fix Version/s: 1.4.0 Target Version/s: 1.4.0 Support HA in standalone cluster mode - Key: SPARK-6443 URL: https://issues.apache.org/jira/browse/SPARK-6443 Project: Spark Issue Type: New Feature Components: Spark Submit Affects Versions: 1.0.0 Reporter: Tao Wang Assignee: Tao Wang Fix For: 1.4.0 == EDIT by Andrew == From a quick survey in the code I can confirm that client mode does support this. [This line|https://github.com/apache/spark/blob/e3202aa2e9bd140effbcf2a7a02b90cb077e760b/core/src/main/scala/org/apache/spark/SparkContext.scala#L2162] splits the master URLs by comma and passes these URLs into the AppClient. In standalone cluster mode, there is simply no equivalent logic to even split the master URLs, whether in the old submission gateway (o.a.s.deploy.Client) or in the new one (o.a.s.deploy.rest.StandaloneRestClient). Thus, this is an unsupported feature, not a bug! == Original description from Tao Wang == After digging some codes, I found user could not submit app in standalone cluster mode when HA is enabled. But in client mode it can work. Haven't try yet. But I will verify this and file a PR to resolve it if the problem exists. 3/23 update: I started a HA cluster with zk, and tried to submit SparkPi example with command: ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://doggie153:7077,doggie159:7077 --deploy-mode cluster ../lib/spark-examples-1.2.0-hadoop2.4.0.jar and it failed with error message: Spark assembly has been built with Hive, including Datanucleus jars on classpath 15/03/23 15:24:45 ERROR actor.OneForOneStrategy: Invalid master URL: spark://doggie153:7077,doggie159:7077 akka.actor.ActorInitializationException: exception during creation at akka.actor.ActorInitializationException$.apply(Actor.scala:164) at akka.actor.ActorCell.create(ActorCell.scala:596) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.spark.SparkException: Invalid master URL: spark://doggie153:7077,doggie159:7077 at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830) at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42) at akka.actor.Actor$class.aroundPreStart(Actor.scala:470) at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35) at akka.actor.ActorCell.create(ActorCell.scala:580) ... 9 more But in client mode it ended with correct result. So my guess is right. I will fix it in the related PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
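A hedged, simplified illustration of the client-mode logic referenced in the edit above: the comma-separated standalone master URL is split into one URL per master before any single URL is parsed, which is exactly the step the cluster-mode submission path lacks and why Master.toAkkaUrl rejects the combined string. This is a standalone sketch, not the actual SparkContext code.
{code}
// Simplified sketch of splitting and validating a multi-master standalone URL,
// mirroring what client mode does before connecting to each master.
object MasterUrls {
  private val SparkUrl = """spark://([^:]+):(\d+)""".r

  def parse(masters: String): Seq[(String, Int)] = {
    require(masters.startsWith("spark://"), s"Invalid master URL: $masters")
    masters.stripPrefix("spark://").split(",").toSeq.map { hostPort =>
      s"spark://$hostPort" match {
        case SparkUrl(host, port) => (host, port.toInt)
        case other                => throw new IllegalArgumentException(s"Invalid master URL: $other")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // The URL from the report is accepted once it is split per master:
    println(parse("spark://doggie153:7077,doggie159:7077"))
    // Seq((doggie153,7077), (doggie159,7077))
  }
}
{code}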
[jira] [Resolved] (SPARK-7317) ShuffleHandle needs to be exposed
[ https://issues.apache.org/jira/browse/SPARK-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-7317. Resolution: Fixed Fix Version/s: 1.4.0 ShuffleHandle needs to be exposed - Key: SPARK-7317 URL: https://issues.apache.org/jira/browse/SPARK-7317 Project: Spark Issue Type: Improvement Components: Shuffle Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan Priority: Minor Fix For: 1.4.0 ShuffleHandle is marked private[spark], while a lot of the code that depends on it and exposes it is DeveloperApi. While the actual implementation can remain private[spark], the handle class itself should be exposed so that RDDs can leverage it. Example: a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle; b) the ShuffleManager instance is exposed via SparkEnv.get.shuffleManager; c) SparkEnv.get.shuffleManager.getReader is exposed, takes a handle as a parameter, and can be used to write RDDs that leverage shuffle without needing to go through Spark's shuffle-based RDDs. So all the machinery for a custom RDD to leverage shuffle exists, except for specifying the ShuffleHandle class itself in dependencies. This allows user code to customize how it leverages shuffle, for example with specialized join implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3444) Provide a way to easily change the log level in the Spark shell while running
[ https://issues.apache.org/jira/browse/SPARK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3444. Resolution: Fixed Fix Version/s: 1.4.0 Provide a way to easily change the log level in the Spark shell while running - Key: SPARK-3444 URL: https://issues.apache.org/jira/browse/SPARK-3444 Project: Spark Issue Type: Improvement Components: Spark Shell Reporter: holdenk Assignee: Holden Karau Priority: Minor Fix For: 1.4.0 Right now it's difficult to change the log level while running. Our log messages can be quite verbose at the more detailed levels, and some users want to run at WARN until they encounter an issue and then increase the logging level to DEBUG without restarting the shell. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
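Assuming the runtime setter introduced by this change is the setLogLevel method on SparkContext (hedged; check the 1.4.0 API docs), usage from spark-shell looks like:
{code}
// sc is the SparkContext that spark-shell creates automatically.
sc.setLogLevel("WARN")   // keep output quiet while exploring
// ... a job misbehaves ...
sc.setLogLevel("DEBUG")  // turn on detailed logging without restarting the shell
sc.setLogLevel("INFO")   // restore the default verbosity
{code}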
[jira] [Closed] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors
[ https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-6954. Resolution: Fixed Fix Version/s: 1.4.0 Assignee: Sandy Ryza (was: Cheolsoo Park) Target Version/s: 1.4.0 ExecutorAllocationManager can end up requesting a negative number of executors -- Key: SPARK-6954 URL: https://issues.apache.org/jira/browse/SPARK-6954 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Cheolsoo Park Assignee: Sandy Ryza Labels: yarn Fix For: 1.4.0 Attachments: with_fix.png, without_fix.png I have a simple test case for dynamic allocation on YARN that fails with the following stack trace- {code} 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0 java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -21 from the cluster manager. Please specify a positive number! at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338) at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294) at org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My test is as follows- # Start spark-shell with a single executor. # Run a {{select count(\*)}} query. The number of executors rises as input size is non-trivial. # After the job finishes, the number of executors falls as most of them become idle. # Rerun the same query again, and the request to add executors fails with the above error. In fact, the job itself continues to run with whatever executors it already has, but it never gets more executors unless the shell is closed and restarted. In fact, this error only happens when I configure {{executorIdleTimeout}} very small. For eg, I can reproduce it with the following configs- {code} spark.dynamicAllocation.executorIdleTimeout 5 spark.dynamicAllocation.schedulerBacklogTimeout 5 {code} Although I can simply increase {{executorIdleTimeout}} to something like 60 secs to avoid the error, I think this is still a bug to be fixed. 
The root cause seems to be that {{numExecutorsPending}} accidentally becomes negative if executors are killed too aggressively (i.e. {{executorIdleTimeout}} is too small) because under that circumstance, the new target # of executors can be smaller than the current # of executors. When that happens, {{ExecutorAllocationManager}} ends up trying to add a negative number of executors, which throws an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
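The defensive fix implied by this analysis can be sketched as follows; this is a hedged illustration of the clamping idea, not the literal patch from the pull request.
{code}
// Hedged sketch: compute how many executors to request next and never let the
// delta go negative, even if idle-timeout kills have pushed the target below
// what is already running or pending.
def executorsToAdd(targetTotal: Int, running: Int, pending: Int): Int = {
  val delta = targetTotal - (running + pending)
  math.max(delta, 0) // a shrinking target yields 0, not a negative request
}

// Example matching the reported failure mode:
// executorsToAdd(targetTotal = 4, running = 20, pending = 5) == 0 (instead of -21).
{code}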
[jira] [Updated] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors
[ https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6954: - Target Version/s: 1.3.1, 1.4.0 (was: 1.4.0) ExecutorAllocationManager can end up requesting a negative number of executors -- Key: SPARK-6954 URL: https://issues.apache.org/jira/browse/SPARK-6954 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Cheolsoo Park Assignee: Sandy Ryza Labels: yarn Fix For: 1.4.0 Attachments: with_fix.png, without_fix.png I have a simple test case for dynamic allocation on YARN that fails with the following stack trace- {code} 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0 java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -21 from the cluster manager. Please specify a positive number! at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338) at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294) at org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My test is as follows- # Start spark-shell with a single executor. # Run a {{select count(\*)}} query. The number of executors rises as input size is non-trivial. # After the job finishes, the number of executors falls as most of them become idle. # Rerun the same query again, and the request to add executors fails with the above error. In fact, the job itself continues to run with whatever executors it already has, but it never gets more executors unless the shell is closed and restarted. In fact, this error only happens when I configure {{executorIdleTimeout}} very small. For eg, I can reproduce it with the following configs- {code} spark.dynamicAllocation.executorIdleTimeout 5 spark.dynamicAllocation.schedulerBacklogTimeout 5 {code} Although I can simply increase {{executorIdleTimeout}} to something like 60 secs to avoid the error, I think this is still a bug to be fixed. 
The root cause seems to be that {{numExecutorsPending}} accidentally becomes negative if executors are killed too aggressively (i.e. {{executorIdleTimeout}} is too small) because under that circumstance, the new target # of executors can be smaller than the current # of executors. When that happens, {{ExecutorAllocationManager}} ends up trying to add a negative number of executors, which throws an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors
[ https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6954: - Labels: backport-needed yarn (was: yarn) ExecutorAllocationManager can end up requesting a negative number of executors -- Key: SPARK-6954 URL: https://issues.apache.org/jira/browse/SPARK-6954 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Cheolsoo Park Assignee: Sandy Ryza Labels: backport-needed, yarn Fix For: 1.4.0 Attachments: with_fix.png, without_fix.png I have a simple test case for dynamic allocation on YARN that fails with the following stack trace- {code} 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0 java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -21 from the cluster manager. Please specify a positive number! at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338) at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294) at org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My test is as follows- # Start spark-shell with a single executor. # Run a {{select count(\*)}} query. The number of executors rises as input size is non-trivial. # After the job finishes, the number of executors falls as most of them become idle. # Rerun the same query again, and the request to add executors fails with the above error. In fact, the job itself continues to run with whatever executors it already has, but it never gets more executors unless the shell is closed and restarted. In fact, this error only happens when I configure {{executorIdleTimeout}} very small. For eg, I can reproduce it with the following configs- {code} spark.dynamicAllocation.executorIdleTimeout 5 spark.dynamicAllocation.schedulerBacklogTimeout 5 {code} Although I can simply increase {{executorIdleTimeout}} to something like 60 secs to avoid the error, I think this is still a bug to be fixed. 
The root cause seems to be that {{numExecutorsPending}} accidentally becomes negative if executors are killed too aggressively (i.e. {{executorIdleTimeout}} is too small) because under that circumstance, the new target # of executors can be smaller than the current # of executors. When that happens, {{ExecutorAllocationManager}} ends up trying to add a negative number of executors, which throws an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors
[ https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-6954: -- ExecutorAllocationManager can end up requesting a negative number of executors -- Key: SPARK-6954 URL: https://issues.apache.org/jira/browse/SPARK-6954 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Cheolsoo Park Assignee: Sandy Ryza Labels: yarn Fix For: 1.4.0 Attachments: with_fix.png, without_fix.png I have a simple test case for dynamic allocation on YARN that fails with the following stack trace- {code} 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0 java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -21 from the cluster manager. Please specify a positive number! at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338) at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294) at org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My test is as follows- # Start spark-shell with a single executor. # Run a {{select count(\*)}} query. The number of executors rises as input size is non-trivial. # After the job finishes, the number of executors falls as most of them become idle. # Rerun the same query again, and the request to add executors fails with the above error. In fact, the job itself continues to run with whatever executors it already has, but it never gets more executors unless the shell is closed and restarted. In fact, this error only happens when I configure {{executorIdleTimeout}} very small. For eg, I can reproduce it with the following configs- {code} spark.dynamicAllocation.executorIdleTimeout 5 spark.dynamicAllocation.schedulerBacklogTimeout 5 {code} Although I can simply increase {{executorIdleTimeout}} to something like 60 secs to avoid the error, I think this is still a bug to be fixed. The root cause seems that {{numExecutorsPending}} accidentally becomes negative if executors are killed too aggressively (i.e. 
{{executorIdleTimeout}} is too small) because under that circumstance, the new target # of executors can be smaller than the current # of executors. When that happens, {{ExecutorAllocationManager}} ends up trying to add a negative number of executors, which throws an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors
[ https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6954: - Target Version/s: 1.3.2, 1.4.0 (was: 1.3.1, 1.4.0) ExecutorAllocationManager can end up requesting a negative number of executors -- Key: SPARK-6954 URL: https://issues.apache.org/jira/browse/SPARK-6954 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Cheolsoo Park Assignee: Sandy Ryza Labels: backport-needed, yarn Fix For: 1.4.0 Attachments: with_fix.png, without_fix.png I have a simple test case for dynamic allocation on YARN that fails with the following stack trace- {code} 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0 java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -21 from the cluster manager. Please specify a positive number! at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338) at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294) at org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My test is as follows- # Start spark-shell with a single executor. # Run a {{select count(\*)}} query. The number of executors rises as input size is non-trivial. # After the job finishes, the number of executors falls as most of them become idle. # Rerun the same query again, and the request to add executors fails with the above error. In fact, the job itself continues to run with whatever executors it already has, but it never gets more executors unless the shell is closed and restarted. In fact, this error only happens when I configure {{executorIdleTimeout}} very small. For eg, I can reproduce it with the following configs- {code} spark.dynamicAllocation.executorIdleTimeout 5 spark.dynamicAllocation.schedulerBacklogTimeout 5 {code} Although I can simply increase {{executorIdleTimeout}} to something like 60 secs to avoid the error, I think this is still a bug to be fixed. 
The root cause seems to be that {{numExecutorsPending}} accidentally becomes negative if executors are killed too aggressively (i.e. {{executorIdleTimeout}} is too small) because under that circumstance, the new target # of executors can be smaller than the current # of executors. When that happens, {{ExecutorAllocationManager}} ends up trying to add a negative number of executors, which throws an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7113) Add the direct stream related information to the streaming listener and web UI
[ https://issues.apache.org/jira/browse/SPARK-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524536#comment-14524536 ] Saisai Shao commented on SPARK-7113: Yes, I will do it. Thanks a lot :). Add the direct stream related information to the streaming listener and web UI -- Key: SPARK-7113 URL: https://issues.apache.org/jira/browse/SPARK-7113 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Saisai Shao Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7314) Upgrade Pyrolite with patches
Xiangrui Meng created SPARK-7314: Summary: Upgrade Pyrolite with patches Key: SPARK-7314 URL: https://issues.apache.org/jira/browse/SPARK-7314 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.4.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng As discussed on SPARK-6288, we are using a really old version of Pyrolite, which was published under org.spark-project. It would be nice to upgrade it to the latest (and possibly official) version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7314) Upgrade Pyrolite with patches
[ https://issues.apache.org/jira/browse/SPARK-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524130#comment-14524130 ] Apache Spark commented on SPARK-7314: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/5850 Upgrade Pyrolite with patches - Key: SPARK-7314 URL: https://issues.apache.org/jira/browse/SPARK-7314 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.4.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng As discussed on SPARK-6288, we are using a really old version of Pyrolite, which was published under org.spark-project. It would be nice to upgrade it to the latest (and possibly official) version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7314) Upgrade Pyrolite with patches
[ https://issues.apache.org/jira/browse/SPARK-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7314: --- Assignee: Apache Spark (was: Xiangrui Meng) Upgrade Pyrolite with patches - Key: SPARK-7314 URL: https://issues.apache.org/jira/browse/SPARK-7314 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.4.0 Reporter: Xiangrui Meng Assignee: Apache Spark As discussed on SPARK-6288, we are using a really old version of Pyrolite, which was published under org.spark-project. It would be nice to upgrade it to the latest (and possibly official) version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7314) Upgrade Pyrolite with patches
[ https://issues.apache.org/jira/browse/SPARK-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7314: --- Assignee: Xiangrui Meng (was: Apache Spark) Upgrade Pyrolite with patches - Key: SPARK-7314 URL: https://issues.apache.org/jira/browse/SPARK-7314 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.4.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng As discussed on SPARK-6288, we are using a really old version of Pyrolite, which was published under org.spark-project. It would be nice to upgrade it to the latest (and possibly official) version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6999) infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String])
[ https://issues.apache.org/jira/browse/SPARK-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-6999. - Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5804 [https://github.com/apache/spark/pull/5804] infinite recursion with createDataFrame(JavaRDD[Row], java.util.List[String]) - Key: SPARK-6999 URL: https://issues.apache.org/jira/browse/SPARK-6999 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Justin Uang Priority: Blocker Fix For: 1.4.0 It looks like {code} def createDataFrame(rowRDD: JavaRDD[Row], columns: java.util.List[String]): DataFrame = { createDataFrame(rowRDD.rdd, columns.toSeq) } {code} is in fact an infinite recursion because it calls itself. Scala implicit conversions convert the arguments back into a JavaRDD and a java.util.List. {code} 15/04/19 16:51:24 INFO BlockManagerMaster: Trying to register BlockManager 15/04/19 16:51:24 INFO BlockManagerMasterActor: Registering block manager localhost:53711 with 1966.1 MB RAM, BlockManagerId(driver, localhost, 53711) 15/04/19 16:51:24 INFO BlockManagerMaster: Registered BlockManager Exception in thread "main" java.lang.StackOverflowError at scala.collection.mutable.AbstractSeq.<init>(Seq.scala:47) at scala.collection.mutable.AbstractBuffer.<init>(Buffer.scala:48) at scala.collection.convert.Wrappers$JListWrapper.<init>(Wrappers.scala:84) at scala.collection.convert.WrapAsScala$class.asScalaBuffer(WrapAsScala.scala:127) at scala.collection.JavaConversions$.asScalaBuffer(JavaConversions.scala:53) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:408) {code} Here is the code sample I used to reproduce the issue: {code} /** * @author juang */ public final class InfiniteRecursionExample { public static void main(String[] args) { JavaSparkContext sc = new JavaSparkContext("local", "infinite_recursion_example"); List<Row> rows = Lists.newArrayList(); JavaRDD<Row> rowRDD = sc.parallelize(rows); SQLContext sqlContext = new SQLContext(sc); sqlContext.createDataFrame(rowRDD, ImmutableList.of("myCol")); } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
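One way to break the cycle is to convert the Java arguments explicitly and delegate to an unambiguous overload; the sketch below is a hedged illustration of that idea (string-typed columns are assumed purely for simplicity, and the helper name is hypothetical), not necessarily the change made in pull request 5804.
{code}
import scala.collection.JavaConverters._
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hedged sketch: build an explicit schema from the column names and delegate to
// the unambiguous (JavaRDD[Row], StructType) overload, so no implicit Java/Scala
// conversion can route the call back into the same Java-friendly method.
def createDataFrameFromColumns(sqlContext: SQLContext,
                               rowRDD: JavaRDD[Row],
                               columns: java.util.List[String]): DataFrame = {
  val schema = StructType(columns.asScala.map(name => StructField(name, StringType, nullable = true)))
  sqlContext.createDataFrame(rowRDD, schema)
}
{code}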
[jira] [Assigned] (SPARK-7317) ShuffleHandle needs to be exposed
[ https://issues.apache.org/jira/browse/SPARK-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7317: --- Assignee: Apache Spark (was: Mridul Muralidharan) ShuffleHandle needs to be exposed - Key: SPARK-7317 URL: https://issues.apache.org/jira/browse/SPARK-7317 Project: Spark Issue Type: Improvement Components: Shuffle Reporter: Mridul Muralidharan Assignee: Apache Spark Priority: Minor ShuffleHandle is marked private[spark], while a lot of the code that depends on it and exposes it is DeveloperApi. While the actual implementation can remain private[spark], the handle class itself should be exposed so that RDDs can leverage it. Example: a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle b) the ShuffleManager instance is exposed via SparkEnv.get.shuffleManager c) SparkEnv.get.shuffleManager.getReader is exposed, takes a handle as a parameter, and can be used to write RDDs which leverage shuffle without needing to go through Spark's shuffle-based RDDs. So all the machinery for a custom RDD to leverage shuffle exists - except for a way to specify the ShuffleHandle class itself in dependencies. This allows user code to customize how it leverages shuffle, for example with specialized join implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
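To make the intent concrete, here is a rough sketch of the kind of custom RDD the description has in mind, essentially what ShuffledRDD.compute already does internally. It assumes ShuffleHandle (reached via ShuffleDependency.shuffleHandle) were publicly accessible; the class name is hypothetical:
{code}
import org.apache.spark.{Partition, ShuffleDependency, SparkContext, SparkEnv, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical custom RDD that reads an existing shuffle directly through its handle.
class ShuffleReadRDD[K, V](sc: SparkContext, dep: ShuffleDependency[K, V, V])
  extends RDD[(K, V)](sc, Seq(dep)) {

  override def getPartitions: Array[Partition] =
    Array.tabulate[Partition](dep.partitioner.numPartitions) { i =>
      new Partition { override def index: Int = i }
    }

  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] = {
    // SparkEnv exposes the ShuffleManager; the handle comes from the dependency.
    SparkEnv.get.shuffleManager
      .getReader(dep.shuffleHandle, split.index, split.index + 1, context)
      .read()
      .asInstanceOf[Iterator[(K, V)]]
  }
}
{code}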
[jira] [Assigned] (SPARK-7317) ShuffleHandle needs to be exposed
[ https://issues.apache.org/jira/browse/SPARK-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7317: --- Assignee: Mridul Muralidharan (was: Apache Spark) ShuffleHandle needs to be exposed - Key: SPARK-7317 URL: https://issues.apache.org/jira/browse/SPARK-7317 Project: Spark Issue Type: Improvement Components: Shuffle Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan Priority: Minor ShuffleHandle is marked private[spark], while a lot of the code that depends on it and exposes it is DeveloperApi. While the actual implementation can remain private[spark], the handle class itself should be exposed so that RDDs can leverage it. Example: a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle b) the ShuffleManager instance is exposed via SparkEnv.get.shuffleManager c) SparkEnv.get.shuffleManager.getReader is exposed, takes a handle as a parameter, and can be used to write RDDs which leverage shuffle without needing to go through Spark's shuffle-based RDDs. So all the machinery for a custom RDD to leverage shuffle exists - except for a way to specify the ShuffleHandle class itself in dependencies. This allows user code to customize how it leverages shuffle, for example with specialized join implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7317) ShuffleHandle needs to be exposed
[ https://issues.apache.org/jira/browse/SPARK-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524469#comment-14524469 ] Apache Spark commented on SPARK-7317: - User 'mridulm' has created a pull request for this issue: https://github.com/apache/spark/pull/5857 ShuffleHandle needs to be exposed - Key: SPARK-7317 URL: https://issues.apache.org/jira/browse/SPARK-7317 Project: Spark Issue Type: Improvement Components: Shuffle Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan Priority: Minor ShuffleHandle is marked private[spark], while a lot of the code that depends on it and exposes it is DeveloperApi. While the actual implementation can remain private[spark], the handle class itself should be exposed so that RDDs can leverage it. Example: a) ShuffleDependency.shuffleHandle exposes a ShuffleHandle b) the ShuffleManager instance is exposed via SparkEnv.get.shuffleManager c) SparkEnv.get.shuffleManager.getReader is exposed, takes a handle as a parameter, and can be used to write RDDs which leverage shuffle without needing to go through Spark's shuffle-based RDDs. So all the machinery for a custom RDD to leverage shuffle exists - except for a way to specify the ShuffleHandle class itself in dependencies. This allows user code to customize how it leverages shuffle, for example with specialized join implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7315) Flaky Test: WriteAheadLogBackedBlockRDDSuite
[ https://issues.apache.org/jira/browse/SPARK-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524252#comment-14524252 ] Apache Spark commented on SPARK-7315: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/5853 Flaky Test: WriteAheadLogBackedBlockRDDSuite Key: SPARK-7315 URL: https://issues.apache.org/jira/browse/SPARK-7315 Project: Spark Issue Type: Test Reporter: Tathagata Das Assignee: Tathagata Das -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7315) Flaky Test: WriteAheadLogBackedBlockRDDSuite
[ https://issues.apache.org/jira/browse/SPARK-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7315: --- Assignee: Apache Spark (was: Tathagata Das) Flaky Test: WriteAheadLogBackedBlockRDDSuite Key: SPARK-7315 URL: https://issues.apache.org/jira/browse/SPARK-7315 Project: Spark Issue Type: Test Reporter: Tathagata Das Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7315) Flaky Test: WriteAheadLogBackedBlockRDDSuite
[ https://issues.apache.org/jira/browse/SPARK-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7315: --- Assignee: Tathagata Das (was: Apache Spark) Flaky Test: WriteAheadLogBackedBlockRDDSuite Key: SPARK-7315 URL: https://issues.apache.org/jira/browse/SPARK-7315 Project: Spark Issue Type: Test Reporter: Tathagata Das Assignee: Tathagata Das -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7241) Pearson correlation for DataFrames
[ https://issues.apache.org/jira/browse/SPARK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7241: --- Assignee: Apache Spark (was: Burak Yavuz) Pearson correlation for DataFrames -- Key: SPARK-7241 URL: https://issues.apache.org/jira/browse/SPARK-7241 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Xiangrui Meng Assignee: Apache Spark This JIRA is for computing the Pearson linear correlation for two numerical columns in a DataFrame. The method `corr` should live under `df.stat`: {code} df.stat.corr(col1, col2, method=pearson): Double {code} `method` will be used when we add other correlations. Similar to SPARK-7240, UDAF will be added later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
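As a reference for what the statistic itself computes, independent of the proposed DataFrame API, a single-pass Pearson correlation over an RDD of (x, y) pairs could look like the following; the helper name is made up for illustration:
{code}
import org.apache.spark.rdd.RDD

// Illustrative helper: Pearson r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2)),
// computed from sums accumulated in a single pass over the data.
def pearsonCorr(pairs: RDD[(Double, Double)]): Double = {
  val (n, sx, sy, sxx, syy, sxy) = pairs
    .map { case (x, y) => (1L, x, y, x * x, y * y, x * y) }
    .reduce { case ((n1, a1, b1, c1, d1, e1), (n2, a2, b2, c2, d2, e2)) =>
      (n1 + n2, a1 + a2, b1 + b2, c1 + c2, d1 + d2, e1 + e2)
    }
  val numerator = n * sxy - sx * sy
  val denominator = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
  numerator / denominator
}
{code}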
[jira] [Assigned] (SPARK-7241) Pearson correlation for DataFrames
[ https://issues.apache.org/jira/browse/SPARK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7241: --- Assignee: Burak Yavuz (was: Apache Spark) Pearson correlation for DataFrames -- Key: SPARK-7241 URL: https://issues.apache.org/jira/browse/SPARK-7241 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Xiangrui Meng Assignee: Burak Yavuz This JIRA is for computing the Pearson linear correlation for two numerical columns in a DataFrame. The method `corr` should live under `df.stat`: {code} df.stat.corr(col1, col2, method=pearson): Double {code} `method` will be used when we add other correlations. Similar to SPARK-7240, UDAF will be added later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7241) Pearson correlation for DataFrames
[ https://issues.apache.org/jira/browse/SPARK-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524489#comment-14524489 ] Apache Spark commented on SPARK-7241: - User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/5858 Pearson correlation for DataFrames -- Key: SPARK-7241 URL: https://issues.apache.org/jira/browse/SPARK-7241 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Xiangrui Meng Assignee: Burak Yavuz This JIRA is for computing the Pearson linear correlation for two numerical columns in a DataFrame. The method `corr` should live under `df.stat`: {code} df.stat.corr(col1, col2, method=pearson): Double {code} `method` will be used when we add other correlations. Similar to SPARK-7240, UDAF will be added later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7313) Allow for configuring max_samples in range partitioner.
[ https://issues.apache.org/jira/browse/SPARK-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7313: --- Assignee: Apache Spark (was: Mridul Muralidharan) Allow for configuring max_samples in range partitioner. --- Key: SPARK-7313 URL: https://issues.apache.org/jira/browse/SPARK-7313 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Mridul Muralidharan Assignee: Apache Spark Priority: Minor Currently, we assume that 1e6 is a reasonable upper bound to number of keys while sampling. This works fine when size of keys is 'small' - but breaks for anything non-trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
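A minimal sketch of what making the cap configurable could look like in RangePartitioner's sampling logic; the property name spark.partitioner.maxSampleSize is an assumption here, not necessarily what the pull request uses:
{code}
import org.apache.spark.rdd.RDD

// Hypothetical helper: the upper bound that is effectively hard-coded to 1e6 today
// (sampleSize = math.min(20.0 * partitions, 1e6)) becomes a configurable value.
def rangePartitionSampleSize(rdd: RDD[_], partitions: Int): Double = {
  val maxSampleSize = rdd.sparkContext.getConf
    .getDouble("spark.partitioner.maxSampleSize", 1e6)  // assumed property name
  math.min(20.0 * partitions, maxSampleSize)
}
{code}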
[jira] [Commented] (SPARK-7313) Allow for configuring max_samples in range partitioner.
[ https://issues.apache.org/jira/browse/SPARK-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524105#comment-14524105 ] Apache Spark commented on SPARK-7313: - User 'mridulm' has created a pull request for this issue: https://github.com/apache/spark/pull/5848 Allow for configuring max_samples in range partitioner. --- Key: SPARK-7313 URL: https://issues.apache.org/jira/browse/SPARK-7313 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan Priority: Minor Currently, we assume that 1e6 is a reasonable upper bound to number of keys while sampling. This works fine when size of keys is 'small' - but breaks for anything non-trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7313) Allow for configuring max_samples in range partitioner.
[ https://issues.apache.org/jira/browse/SPARK-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7313: --- Assignee: Mridul Muralidharan (was: Apache Spark) Allow for configuring max_samples in range partitioner. --- Key: SPARK-7313 URL: https://issues.apache.org/jira/browse/SPARK-7313 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan Priority: Minor Currently, we assume that 1e6 is a reasonable upper bound to number of keys while sampling. This works fine when size of keys is 'small' - but breaks for anything non-trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6907) Create an isolated classloader for the Hive Client.
[ https://issues.apache.org/jira/browse/SPARK-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6907: --- Assignee: Michael Armbrust (was: Apache Spark) Create an isolated classloader for the Hive Client. --- Key: SPARK-6907 URL: https://issues.apache.org/jira/browse/SPARK-6907 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6907) Create an isolated classloader for the Hive Client.
[ https://issues.apache.org/jira/browse/SPARK-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524161#comment-14524161 ] Apache Spark commented on SPARK-6907: - User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/5851 Create an isolated classloader for the Hive Client. --- Key: SPARK-6907 URL: https://issues.apache.org/jira/browse/SPARK-6907 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7312) SPARK-6913 broke jdk6 build
[ https://issues.apache.org/jira/browse/SPARK-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-7312. - Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5847 [https://github.com/apache/spark/pull/5847] SPARK-6913 broke jdk6 build --- Key: SPARK-7312 URL: https://issues.apache.org/jira/browse/SPARK-7312 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.4.0 Reporter: Thomas Graves Priority: Blocker Fix For: 1.4.0 https://github.com/apache/spark/pull/5782 uses java.sql.Driver.getParentLogger which doesn't exist in jdk6, only jdk7 [error] /home/tgraves/tgravescs_spark/sql/core/src/main/scala/org/apache/spark/sql/jdbc/jdbc.scala:198: value getParentLogger is not a member of java.sql.Driver [error] override def getParentLogger: java.util.logging.Logger = wrapped.getParentLogger [error] ^ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
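One JDK6-compatible way to keep a Driver wrapper compiling (illustrative only, not necessarily the approach taken in pull request 5847) is to avoid the compile-time reference and forward the call reflectively:
{code}
// Sketch: java.sql.Driver has no getParentLogger on JDK6, so instead of referencing
// the member directly, look it up reflectively on the wrapped driver at runtime.
def getParentLogger(wrapped: java.sql.Driver): java.util.logging.Logger = {
  wrapped.getClass
    .getMethod("getParentLogger")
    .invoke(wrapped)
    .asInstanceOf[java.util.logging.Logger]
}
{code}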
[jira] [Updated] (SPARK-7304) Include $@ in call to mvn in make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7304: --- Assignee: Rajendra Include $@ in call to mvn in make-distribution.sh - Key: SPARK-7304 URL: https://issues.apache.org/jira/browse/SPARK-7304 Project: Spark Issue Type: Improvement Components: Build Reporter: Rajendra Assignee: Rajendra Priority: Minor Fix For: 1.4.0 Attachments: 0001-Include-in-call-to-mvn-in-make-distribution.sh.patch Original Estimate: 1h Remaining Estimate: 1h The call to mvn does not include $@ in the command line in one place in make-distribution.sh. This causes that mvn call to ignore additional command line parameters passed to make-distribution.sh in that call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7304) Include $@ in call to mvn in make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7304. Resolution: Fixed Fix Version/s: 1.4.0 Include $@ in call to mvn in make-distribution.sh - Key: SPARK-7304 URL: https://issues.apache.org/jira/browse/SPARK-7304 Project: Spark Issue Type: Improvement Components: Build Reporter: Rajendra Assignee: Rajendra Priority: Minor Fix For: 1.4.0 Attachments: 0001-Include-in-call-to-mvn-in-make-distribution.sh.patch Original Estimate: 1h Remaining Estimate: 1h The call to mvn does not include $@ in the command line in one place in make-distribution.sh. This causes that mvn call to ignore additional command line parameters passed to make-distribution.sh in that call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7260) Support changing Spark's log level programatically
[ https://issues.apache.org/jira/browse/SPARK-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7260. Resolution: Duplicate Support changing Spark's log level programatically -- Key: SPARK-7260 URL: https://issues.apache.org/jira/browse/SPARK-7260 Project: Spark Issue Type: New Feature Components: Spark Core, Spark Shell Reporter: Patrick Wendell Priority: Minor There was an earlier PR for this that was basically ready to merge. Just wanted to open a JIRA: https://github.com/apache/spark/pull/2433/files The main use case I see here is for changing logging in the shell easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
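For the shell use case mentioned above, the same effect can already be had by going through log4j directly; shown here only to illustrate what the requested feature would wrap:
{code}
import org.apache.log4j.{Level, LogManager}

// Quiet down INFO chatter for the rest of an interactive session.
LogManager.getRootLogger.setLevel(Level.WARN)
{code}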
[jira] [Commented] (SPARK-6954) ExecutorAllocationManager can end up requesting a negative number of executors
[ https://issues.apache.org/jira/browse/SPARK-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524466#comment-14524466 ] Apache Spark commented on SPARK-6954: - User 'sryza' has created a pull request for this issue: https://github.com/apache/spark/pull/5856 ExecutorAllocationManager can end up requesting a negative number of executors -- Key: SPARK-6954 URL: https://issues.apache.org/jira/browse/SPARK-6954 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Cheolsoo Park Assignee: Sandy Ryza Labels: backport-needed, yarn Fix For: 1.4.0 Attachments: with_fix.png, without_fix.png I have a simple test case for dynamic allocation on YARN that fails with the following stack trace- {code} 15/04/16 00:52:14 ERROR Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0 java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -21 from the cluster manager. Please specify a positive number! at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338) at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294) at org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} My test is as follows- # Start spark-shell with a single executor. # Run a {{select count(\*)}} query. The number of executors rises as input size is non-trivial. # After the job finishes, the number of executors falls as most of them become idle. # Rerun the same query again, and the request to add executors fails with the above error. In fact, the job itself continues to run with whatever executors it already has, but it never gets more executors unless the shell is closed and restarted. In fact, this error only happens when I configure {{executorIdleTimeout}} very small. 
For example, I can reproduce it with the following configs- {code} spark.dynamicAllocation.executorIdleTimeout 5 spark.dynamicAllocation.schedulerBacklogTimeout 5 {code} Although I can simply increase {{executorIdleTimeout}} to something like 60 secs to avoid the error, I think this is still a bug to be fixed. The root cause seems to be that {{numExecutorsPending}} accidentally becomes negative if executors are killed too aggressively (i.e. {{executorIdleTimeout}} is too small) because under that circumstance, the new target # of executors can be smaller than the current # of executors. When that happens, {{ExecutorAllocationManager}} ends up trying to add a negative number of executors, which throws an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
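The failure mode described above boils down to a negative delta being handed to requestTotalExecutors; a minimal sketch of the kind of guard that prevents it (names are illustrative, not the actual patch):
{code}
// Illustrative guard: compute how many executors to add and clamp at zero so a
// shrinking target never turns into a negative request to the cluster manager.
def executorsToAdd(targetTotal: Int, running: Int, pending: Int): Int =
  math.max(targetTotal - (running + pending), 0)
{code}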
[jira] [Commented] (SPARK-7242) Frequent items for DataFrames
[ https://issues.apache.org/jira/browse/SPARK-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524575#comment-14524575 ] Apache Spark commented on SPARK-7242: - User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/5859 Frequent items for DataFrames - Key: SPARK-7242 URL: https://issues.apache.org/jira/browse/SPARK-7242 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Xiangrui Meng Assignee: Burak Yavuz Finding frequent items with possibly false positives, using the algorithm described in http://www.cs.umd.edu/~samir/498/karp.pdf. {code} df.stat.freqItems(cols: Array[String], support: Double = 0.001): DataFrame {code} The output is a local DataFrame having the input column names. In the first version, we will implement the single pass algorithm that may return false positives, but no false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
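A standalone sketch of the counter-based single-pass algorithm referenced above (false positives possible, no false negatives); this is a plain Scala helper for illustration, not the proposed DataFrame API itself:
{code}
import scala.collection.mutable

// Keeps at most ceil(1/support) counters; any item occurring in more than a
// `support` fraction of the input is guaranteed to survive in the result.
def freqItemsSketch[T](items: Iterator[T], support: Double): Set[T] = {
  val maxCounters = math.ceil(1.0 / support).toInt
  val counts = mutable.Map.empty[T, Long]
  items.foreach { item =>
    if (counts.contains(item)) {
      counts(item) += 1
    } else if (counts.size < maxCounters) {
      counts(item) = 1L
    } else {
      // No free counter: decrement every counter and evict the ones that reach zero.
      counts.keys.toList.foreach { k =>
        counts(k) -= 1
        if (counts(k) == 0L) counts.remove(k)
      }
    }
  }
  counts.keySet.toSet
}
{code}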
[jira] [Commented] (SPARK-6069) Deserialization Error ClassNotFoundException with Kryo, Guava 14
[ https://issues.apache.org/jira/browse/SPARK-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524363#comment-14524363 ] Russell Alexander Spitzer commented on SPARK-6069: -- We've seen the same issue while developing the Spark Cassandra Connector. Unless the connector lib is loaded via spark.executor.extraClassPath, kryoSerializaition for joins always returns a classNotFound even though all operations which don't require a shuffle are fine. {code} com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.spark.sql.cassandra.CassandraSQLRow at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721) at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42) at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:144) at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:80) at org.apache.spark.sql.execution.joins.ShuffledHashJoin$$anonfun$execute$1.apply(ShuffledHashJoin.scala:46) at org.apache.spark.sql.execution.joins.ShuffledHashJoin$$anonfun$execute$1.apply(ShuffledHashJoin.scala:45) at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280) at org.apache.spark.rdd.RDD.iterator(RDD.scala:247) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} Adding the jar to executorExtraClasspath rather than --jars solves the issue. 
Deserialization Error ClassNotFoundException with Kryo, Guava 14 Key: SPARK-6069 URL: https://issues.apache.org/jira/browse/SPARK-6069 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.1 Environment: Standalone one worker cluster on localhost, or any cluster Reporter: Pat Ferrel Priority: Critical A class is contained in the jars passed in when creating a context. It is registered with kryo. The class (Guava HashBiMap) is created correctly from an RDD and broadcast but the deserialization fails with ClassNotFound. The work around is to hard code the path to the jar and make it available on all workers. Hard code because we are creating a library so there is no easy way to pass in to the app something like: spark.executor.extraClassPath /path/to/some.jar -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
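For reference, the workaround described in the comment above amounts to putting the jar on the executor classpath up front rather than distributing it with --jars; the jar path below is a placeholder:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder path: ship the user jar on the executor classpath so Kryo
// deserialization on the shuffle read side can resolve the classes.
val conf = new SparkConf()
  .setAppName("kryo-classpath-workaround")
  .set("spark.executor.extraClassPath", "/path/to/user-classes.jar")
val sc = new SparkContext(conf)
{code}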
[jira] [Commented] (SPARK-6986) Makes SparkSqlSerializer2 support sort-based shuffle with sort merge
[ https://issues.apache.org/jira/browse/SPARK-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525070#comment-14525070 ] Apache Spark commented on SPARK-6986: - User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/5849 Makes SparkSqlSerializer2 support sort-based shuffle with sort merge Key: SPARK-6986 URL: https://issues.apache.org/jira/browse/SPARK-6986 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Assignee: Yin Huai *Update*: SPARK-4550 has exposed the interfaces. We can safely enable Serializer2 to support sort merge. *Original description*: Our existing Java and Kryo serializers are both general-purpose serializers. They treat every object individually and encode the type of each object into the underlying stream. For Spark, it is common that we serialize a collection of records having the same type (for example, records of a DataFrame). For these cases, we do not need to write out the types of records and can take advantage of the type information to build a specialized serializer. To do so, it seems we need to extend the interface of SerializationStream/DeserializationStream, so a SerializationStream/DeserializationStream can have more information about the objects passed in (for example, whether an object is a key/value pair, a key, or a value). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
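A toy sketch of the idea in the original description, a stream that knows every record is a key/value pair with a fixed schema and can therefore write raw field values with no per-object type tags; the schema here (Int key, Int/String value) is purely an assumption for illustration:
{code}
import java.io.DataOutputStream

// Toy schema-specialized writer: no class names or type tags go on the wire,
// because both sides of the shuffle already agree on the row layout.
class FixedSchemaPairWriter(out: DataOutputStream) {
  def writeKeyValue(key: Int, valueA: Int, valueB: String): Unit = {
    out.writeInt(key)     // key column
    out.writeInt(valueA)  // first value column
    out.writeUTF(valueB)  // second value column
  }
  def close(): Unit = out.close()
}
{code}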
[jira] [Updated] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haiyang updated SPARK-7149: --- Description: Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; was: Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 1, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haiyang updated SPARK-7149: --- Comment: was deleted (was: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception.) Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525063#comment-14525063 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haiyang updated SPARK-7149: --- Comment: was deleted (was: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception.) Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525061#comment-14525061 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525057#comment-14525057 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525059#comment-14525059 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haiyang updated SPARK-7149: --- Comment: was deleted (was: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception.) Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525062#comment-14525062 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525054#comment-14525054 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7318) DStream isn't cleaning closures correctly
[ https://issues.apache.org/jira/browse/SPARK-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-7318: - Description: {code} def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = { transform((r: RDD[T], t: Time) => context.sparkContext.clean(transformFunc(r), false)) } {code} This is cleaning an RDD instead! was: {code} def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = { SparkContext.clean transform((r: RDD[T], t: Time) => context.sparkContext.clean(transformFunc(r), false)) } {code} This is cleaning an RDD instead! DStream isn't cleaning closures correctly - Key: SPARK-7318 URL: https://issues.apache.org/jira/browse/SPARK-7318 Project: Spark Issue Type: Bug Components: Spark Core, Streaming Affects Versions: 1.0.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical {code} def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = { transform((r: RDD[T], t: Time) => context.sparkContext.clean(transformFunc(r), false)) } {code} This is cleaning an RDD instead! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
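For contrast with the snippet above, a sketch of what cleaning the closure itself (rather than the RDD it returns) would look like; not necessarily the exact change made in the pull request:
{code}
// Clean the user-supplied function once, up front, instead of cleaning the RDD
// that the still-uncleaned function has already produced.
def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = {
  val cleanedF = context.sparkContext.clean(transformFunc, false)
  transform((r: RDD[T], _: Time) => cleanedF(r))
}
{code}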
[jira] [Updated] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haiyang updated SPARK-7149: --- Description: Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 1, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; was: Fix default system alias problem. execute the sql statement will cause problem: select substr(concat('value', value), 1, 3), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 1, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525060#comment-14525060 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525056#comment-14525056 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525055#comment-14525055 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525058#comment-14525058 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7318) DStream isn't cleaning closures correctly
[ https://issues.apache.org/jira/browse/SPARK-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7318: --- Assignee: Apache Spark (was: Andrew Or) DStream isn't cleaning closures correctly - Key: SPARK-7318 URL: https://issues.apache.org/jira/browse/SPARK-7318 Project: Spark Issue Type: Bug Components: Spark Core, Streaming Affects Versions: 1.0.0 Reporter: Andrew Or Assignee: Apache Spark Priority: Critical {code} def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = { transform((r: RDD[T], t: Time) => context.sparkContext.clean(transformFunc(r), false)) } {code} This is cleaning an RDD instead! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525051#comment-14525051 ] haiyang commented on SPARK-7149: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception. Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7318) DStream isn't cleaning closures correctly
[ https://issues.apache.org/jira/browse/SPARK-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525052#comment-14525052 ] Apache Spark commented on SPARK-7318: - User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/5860 DStream isn't cleaning closures correctly - Key: SPARK-7318 URL: https://issues.apache.org/jira/browse/SPARK-7318 Project: Spark Issue Type: Bug Components: Spark Core, Streaming Affects Versions: 1.0.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical {code} def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = { transform((r: RDD[T], t: Time) => context.sparkContext.clean(transformFunc(r), false)) } {code} This is cleaning an RDD instead! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7318) DStream isn't cleaning closures correctly
[ https://issues.apache.org/jira/browse/SPARK-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7318: --- Assignee: Andrew Or (was: Apache Spark) DStream isn't cleaning closures correctly - Key: SPARK-7318 URL: https://issues.apache.org/jira/browse/SPARK-7318 Project: Spark Issue Type: Bug Components: Spark Core, Streaming Affects Versions: 1.0.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical {code} def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = { transform((r: RDD[T], t: Time) => context.sparkContext.clean(transformFunc(r), false)) } {code} This is cleaning an RDD instead! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7318) DStream isn't cleaning closures correctly
Andrew Or created SPARK-7318: Summary: DStream isn't cleaning closures correctly Key: SPARK-7318 URL: https://issues.apache.org/jira/browse/SPARK-7318 Project: Spark Issue Type: Bug Components: Spark Core, Streaming Affects Versions: 1.0.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical {code} def transform[U: ClassTag](transformFunc: RDD[T] => RDD[U]): DStream[U] = { SparkContext.clean transform((r: RDD[T], t: Time) => context.sparkContext.clean(transformFunc(r), false)) } {code} This is cleaning an RDD instead! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haiyang updated SPARK-7149: --- Comment: was deleted (was: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception.) Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-7149) Defalt system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haiyang updated SPARK-7149: --- Comment: was deleted (was: This is SqlParser problem,when we give no alias to a function in project, the parser will give it a default alias like c0,c1so, when we execute the sql statement like select isnull(key), key as c0 from testData order by c0, it will throw exception.) Defalt system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix default system alias problem. execute the sql statement will cause problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7303) push down project if possible when the child is sort
Fei Wang created SPARK-7303: --- Summary: push down project if possible when the child is sort Key: SPARK-7303 URL: https://issues.apache.org/jira/browse/SPARK-7303 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Optimize the case of `project(_, sort)`; an example is: `select key from (select * from testData order by key) t` optimize it from ``` == Parsed Logical Plan == 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Project [key#0] Sort [key#0 ASC], true LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Project [key#0] Sort [key#0 ASC], true Exchange (RangePartitioning [key#0 ASC], 5), [] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] ``` to ``` == Parsed Logical Plan == 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Sort [key#0 ASC], true Project [key#0] LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Sort [key#0 ASC], true Exchange (RangePartitioning [key#0 ASC], 5), [] Project [key#0] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
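The rewrite itself can be sketched as a Catalyst rule. This is a simplified illustration, not the actual patch; it assumes the Spark 1.3/1.4 Catalyst classes (Rule, Project, Sort), whose exact signatures may differ, and the rule name is made up:
{code}
// Push a Project beneath a Sort when the sort only references projected columns.
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, Sort}
import org.apache.spark.sql.catalyst.rules.Rule

object PushProjectBelowSort extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case p @ Project(projectList, Sort(order, global, child))
        if order.flatMap(_.references).forall(p.outputSet.contains) =>
      // Project first, then sort the narrower rows.
      Sort(order, global, Project(projectList, child))
  }
}
{code}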
[jira] [Resolved] (SPARK-5891) Add Binarizer
[ https://issues.apache.org/jira/browse/SPARK-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5891. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5699 [https://github.com/apache/spark/pull/5699] Add Binarizer - Key: SPARK-5891 URL: https://issues.apache.org/jira/browse/SPARK-5891 Project: Spark Issue Type: Sub-task Components: ML Reporter: Xiangrui Meng Assignee: Liang-Chi Hsieh Fix For: 1.4.0 `Binarizer` takes a column of continuous features and outputs a column of binary features, where nonzero values (or, when a threshold is set, values above the threshold) become 1 in the output. {code} val binarizer = new Binarizer() .setInputCol("numVisits") .setOutputCol("visited") {code} The output column should be marked as binary. We need to discuss whether we should process multiple columns or a vector column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
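A hypothetical usage sketch of the new transformer (the data, column names, and threshold are made up for illustration; a SQLContext named sqlContext is assumed, and the API shown is the spark.ml builder pattern targeted at 1.4):
{code}
import org.apache.spark.ml.feature.Binarizer

// Toy DataFrame with a continuous feature column.
val df = sqlContext.createDataFrame(Seq(
  (0, 0.0), (1, 3.0), (2, 12.0)
)).toDF("id", "numVisits")

val binarizer = new Binarizer()
  .setInputCol("numVisits")
  .setOutputCol("visited")
  .setThreshold(0.5)   // values greater than the threshold map to 1.0, the rest to 0.0

binarizer.transform(df).show()
{code}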
[jira] [Commented] (SPARK-7299) saving Oracle-source DataFrame to Hive changes scale
[ https://issues.apache.org/jira/browse/SPARK-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523236#comment-14523236 ] Ken Geis commented on SPARK-7299: - This passes my test! saving Oracle-source DataFrame to Hive changes scale Key: SPARK-7299 URL: https://issues.apache.org/jira/browse/SPARK-7299 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Ken Geis When I load data from Oracle, save it to a table, then select from it, the scale is changed. For example, I have a column defined as NUMBER(12, 2). I insert 1 into the column. When I write that to a table and select from it, the result is 199.99. Some databases (e.g. H2) will return this as 1.00, but Oracle returns it as 1. I believe that when the file is written out to parquet, the scale information is taken from the schema, not the value. In an Oracle (at least) JDBC DataFrame, the scale may be different from row to row. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
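Until the root cause is addressed, one hedged workaround is to normalize the decimal scale explicitly before saving, so the stored value no longer depends on the per-row scale reported by the Oracle driver. The DataFrame df and the column name amount below are assumptions for illustration, not from the reporter's code:
{code}
import org.apache.spark.sql.types.DecimalType

// df is assumed to have been loaded through the JDBC data source from the Oracle table.
val normalized = df.withColumn("amount", df("amount").cast(DecimalType(12, 2)))
normalized.saveAsTable("amounts_copy")   // 1.3-era API; later releases use df.write.saveAsTable
{code}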
[jira] [Commented] (SPARK-5246) spark/spark-ec2.py cannot start Spark master in VPC if local DNS name does not resolve
[ https://issues.apache.org/jira/browse/SPARK-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523351#comment-14523351 ] Nick Lipple commented on SPARK-5246: is there a workaround for this issue? any reason why the script uses hostname instead of ip address? spark/spark-ec2.py cannot start Spark master in VPC if local DNS name does not resolve -- Key: SPARK-5246 URL: https://issues.apache.org/jira/browse/SPARK-5246 Project: Spark Issue Type: Bug Components: EC2 Reporter: Vladimir Grigor How to reproduce: 1) http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html should be sufficient to setup VPC for this bug. After you followed that guide, start new instance in VPC, ssh to it (though NAT server) 2) user starts a cluster in VPC: {code} ./spark-ec2 -k key20141114 -i ~/aws/key.pem -s 1 --region=eu-west-1 --spark-version=1.2.0 --instance-type=m1.large --vpc-id=vpc-2e71dd46 --subnet-id=subnet-2571dd4d --zone=eu-west-1a launch SparkByScript Setting up security groups... (omitted for brevity) 10.1.1.62 10.1.1.62: no org.apache.spark.deploy.worker.Worker to stop no org.apache.spark.deploy.master.Master to stop starting org.apache.spark.deploy.master.Master, logging to /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out failed to launch org.apache.spark.deploy.master.Master: at java.net.InetAddress.getLocalHost(InetAddress.java:1469) ... 12 more full log in /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out 10.1.1.62: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-10-1-1-62.out 10.1.1.62: failed to launch org.apache.spark.deploy.worker.Worker: 10.1.1.62:at java.net.InetAddress.getLocalHost(InetAddress.java:1469) 10.1.1.62:... 
12 more 10.1.1.62: full log in /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-10-1-1-62.out [timing] spark-standalone setup: 00h 00m 28s (omitted for brevity) {code} /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out {code} Spark assembly has been built with Hive, including Datanucleus jars on classpath Spark Command: /usr/lib/jvm/java-1.7.0/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/sbin/../conf:/root/spark/lib/spark-assembly-1.2.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark/lib/datanucleus-rdbms-3.2.9.jar:/root/spark/lib/datanucleus-core-3.2.10.jar -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip 10.1.1.151 --port 7077 --webui-port 8080 15/01/14 07:34:47 INFO master.Master: Registered signal handlers for [TERM, HUP, INT] Exception in thread main java.net.UnknownHostException: ip-10-1-1-151: ip-10-1-1-151: Name or service not known at java.net.InetAddress.getLocalHost(InetAddress.java:1473) at org.apache.spark.util.Utils$.findLocalIpAddress(Utils.scala:620) at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:612) at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:612) at org.apache.spark.util.Utils$.localIpAddressHostname$lzycompute(Utils.scala:613) at org.apache.spark.util.Utils$.localIpAddressHostname(Utils.scala:613) at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:665) at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:665) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.util.Utils$.localHostName(Utils.scala:665) at org.apache.spark.deploy.master.MasterArguments.init(MasterArguments.scala:27) at org.apache.spark.deploy.master.Master$.main(Master.scala:819) at org.apache.spark.deploy.master.Master.main(Master.scala) Caused by: java.net.UnknownHostException: ip-10-1-1-151: Name or service not known at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) at java.net.InetAddress.getLocalHost(InetAddress.java:1469) ... 12 more {code} Problem is that instance launched in VPC may be not able to resolve own local hostname. Please see https://forums.aws.amazon.com/thread.jspa?threadID=92092. I am going to submit a fix for this problem since I need this functionality asap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib
[ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523226#comment-14523226 ] longbao wang commented on SPARK-2336: - I really agree with you, and I'm already implementing it, but I have run into a problem: after the tree is built successfully, you search for the target points' kNN by parallelizing the input target points and then searching, but I think this raises some questions, since one point's kNN may fall in two or more partitions. Approximate k-NN Models for MLLib - Key: SPARK-2336 URL: https://issues.apache.org/jira/browse/SPARK-2336 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Brian Gawalt Priority: Minor Labels: clustering, features After tackling the general k-Nearest Neighbor model as per https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to also offer approximate k-Nearest Neighbor. A promising approach would involve building a kd-tree variant within each partition, a la http://www.autonlab.org/autonweb/14714.html?branch=1language=2 This could offer a simple non-linear ML model that can label new data with much lower latency than the plain-vanilla kNN versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
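One hedged way to handle the cross-partition concern in the comment above is to treat each partition's neighbours only as candidates and merge the per-partition top-k globally. The sketch below uses brute-force distances instead of a kd-tree to stay short; all names and types are illustrative, not from an actual implementation:
{code}
// A brute-force stand-in for the per-partition search, plus a global merge of candidates.
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def approxKnn(sc: SparkContext,
              data: RDD[Array[Double]],
              queries: Array[Array[Double]],
              k: Int): Array[Array[(Double, Array[Double])]] = {
  val bcQueries = sc.broadcast(queries)

  def dist(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Per partition: the k best candidates for every query (query index as the key).
  val candidates = data.mapPartitions { iter =>
    val pts = iter.toArray
    bcQueries.value.iterator.zipWithIndex.flatMap { case (q, i) =>
      pts.map(p => (i, (dist(q, p), p))).sortBy(_._2._1).take(k)
    }
  }

  // Global merge: a point's true neighbours may come from several partitions,
  // so keep the overall best k per query across all partition-local candidates.
  candidates.groupByKey()
    .mapValues(_.toArray.sortBy(_._1).take(k))
    .collect()
    .sortBy(_._1)
    .map(_._2)
}
{code}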
[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib
[ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523227#comment-14523227 ] longbao wang commented on SPARK-2336: - I really agree with you, and I'm already implementing it, but I have run into a problem: after the tree is built successfully, you search for the target points' kNN by parallelizing the input target points and then searching, but I think this raises some questions, since one point's kNN may fall in two or more partitions. Approximate k-NN Models for MLLib - Key: SPARK-2336 URL: https://issues.apache.org/jira/browse/SPARK-2336 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Brian Gawalt Priority: Minor Labels: clustering, features After tackling the general k-Nearest Neighbor model as per https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to also offer approximate k-Nearest Neighbor. A promising approach would involve building a kd-tree variant within each partition, a la http://www.autonlab.org/autonweb/14714.html?branch=1language=2 This could offer a simple non-linear ML model that can label new data with much lower latency than the plain-vanilla kNN versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-2336) Approximate k-NN Models for MLLib
[ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] longbao wang updated SPARK-2336: Comment: was deleted (was: I really agree with you,and i'm already implementing it,but i have a trouble,after build tree successful,you search target points' knn,so parallelize the input target points then search,but i think this have some questions,and one point's knn may in two partitions or more.) Approximate k-NN Models for MLLib - Key: SPARK-2336 URL: https://issues.apache.org/jira/browse/SPARK-2336 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Brian Gawalt Priority: Minor Labels: clustering, features After tackling the general k-Nearest Neighbor model as per https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to also offer approximate k-Nearest Neighbor. A promising approach would involve building a kd-tree variant within from each partition, a la http://www.autonlab.org/autonweb/14714.html?branch=1language=2 This could offer a simple non-linear ML model that can label new data with much lower latency than the plain-vanilla kNN versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7303) push down project if possible when the child is sort
[ https://issues.apache.org/jira/browse/SPARK-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7303: --- Assignee: Apache Spark push down project if possible when the child is sort Key: SPARK-7303 URL: https://issues.apache.org/jira/browse/SPARK-7303 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Assignee: Apache Spark Optimize the case of `project(_, sort)`; an example is: `select key from (select * from testData order by key) t` optimize it from ``` == Parsed Logical Plan == 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Project [key#0] Sort [key#0 ASC], true LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Project [key#0] Sort [key#0 ASC], true Exchange (RangePartitioning [key#0 ASC], 5), [] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] ``` to ``` == Parsed Logical Plan == 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Sort [key#0 ASC], true Project [key#0] LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Sort [key#0 ASC], true Exchange (RangePartitioning [key#0 ASC], 5), [] Project [key#0] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7294) Add a between function in Column
[ https://issues.apache.org/jira/browse/SPARK-7294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7294: --- Assignee: (was: Apache Spark) Add a between function in Column Key: SPARK-7294 URL: https://issues.apache.org/jira/browse/SPARK-7294 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Labels: starter Column.between(a, b) We can just translate it to c >= a and c <= b. Should add this for both Python and Scala/Java. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7294) Add a between function in Column
[ https://issues.apache.org/jira/browse/SPARK-7294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523359#comment-14523359 ] Apache Spark commented on SPARK-7294: - User 'kaka1992' has created a pull request for this issue: https://github.com/apache/spark/pull/5839 Add a between function in Column Key: SPARK-7294 URL: https://issues.apache.org/jira/browse/SPARK-7294 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Labels: starter Column.between(a, b) We can just translate it to c >= a and c <= b. Should add this for both Python and Scala/Java. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
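Expressed with the Column operators that already exist in 1.3, the proposed translation is roughly the following sketch (a standalone helper here, not the final method added to Column; the DataFrame df and its age column are illustrative):
{code}
import org.apache.spark.sql.Column

// between(c, a, b) means a <= c <= b, written with existing comparison and boolean operators.
def between(c: Column, lower: Any, upper: Any): Column = (c >= lower) && (c <= upper)

// Usage sketch: df.filter(between(df("age"), 18, 65))
// is equivalent to df.filter(df("age") >= 18 && df("age") <= 65)
{code}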
[jira] [Created] (SPARK-7302) SPARK building documentation still mentions building for yarn 0.23
Thomas Graves created SPARK-7302: Summary: SPARK building documentation still mentions building for yarn 0.23 Key: SPARK-7302 URL: https://issues.apache.org/jira/browse/SPARK-7302 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.3.1 Reporter: Thomas Graves As of SPARK-3445 we deprecated using Hadoop 0.23. It looks like the building documentation still references it, though. We should remove that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7289) Combine Limit and Sort to avoid total ordering
[ https://issues.apache.org/jira/browse/SPARK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Wang updated SPARK-7289: Description: Optimize following sql select key from (select * from testData order by key) t limit 5 from == Parsed Logical Plan == 'Limit 5 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Limit 5 Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Limit 5 Project [key#0] Sort [key#0 ASC], true LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Limit 5 Project [key#0] Sort [key#0 ASC], true Exchange (RangePartitioning [key#0 ASC], 5), [] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] to == Parsed Logical Plan == 'Limit 5 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Limit 5 Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Project [key#0] Limit 5 Sort [key#0 ASC], true LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Project [key#0] TakeOrdered 5, [key#0 ASC] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] was: Optimize following sql `select key from (select * from testData limit 5) t order by key limit 5` optimize it from ``` == Parsed Logical Plan == 'Limit 5 'Sort ['key ASC], true 'Project ['key] 'Subquery t 'Limit 5 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Limit 5 Sort [key#0 ASC], true Project [key#0] Subquery t Limit 5 Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Limit 5 Sort [key#0 ASC], true Project [key#0] Limit 5 LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == TakeOrdered 5, [key#0 ASC] Project [key#0] Limit 5 PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] ``` to ``` == Parsed Logical Plan == 'Limit 5 'Sort ['key ASC], true 'Project ['key] 'Subquery t 'Limit 5 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Limit 5 Sort [key#0 ASC], true Project [key#0] Subquery t Limit 5 Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Limit 5 Sort [key#0 ASC], true Project [key#0] LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == TakeOrdered 5, [key#0 ASC] Project [key#0] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] ``` Summary: Combine Limit and Sort to avoid total ordering (was: push down sort when it's child is Limit) Combine Limit and Sort to avoid total ordering -- Key: SPARK-7289 URL: https://issues.apache.org/jira/browse/SPARK-7289 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Optimize following sql select key from (select * from testData order by key) t limit 5 from == Parsed Logical Plan == 'Limit 5 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Limit 5 Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Limit 5 Project [key#0] Sort [key#0 ASC], true LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan 
== Limit 5 Project [key#0] Sort [key#0 ASC], true Exchange (RangePartitioning [key#0 ASC], 5), [] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] to == Parsed Logical Plan == 'Limit 5 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Limit 5 Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Project [key#0] Limit 5 Sort [key#0 ASC], true LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Project [key#0] TakeOrdered 5, [key#0 ASC] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe,
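To make the intent of the optimization concrete, the same effect can be seen at the RDD level: a bounded top-k (TakeOrdered) avoids the range-partitioned total sort. The snippet below is only an illustration with made-up data, assuming an existing SparkContext sc; tie order among equal keys may differ between the two results:
{code}
// Toy data: (key, payload) pairs.
val rdd = sc.parallelize(1 to 1000000).map(i => (i % 997, i))

// Full sort followed by take: shuffles and totally orders the whole dataset first.
val viaSort = rdd.sortBy(_._1).take(5)

// Bounded alternative: a size-5 priority queue per partition, then a small merge.
// Conceptually this is what the TakeOrdered physical operator does for LIMIT + ORDER BY.
val viaTakeOrdered = rdd.takeOrdered(5)(Ordering.by(_._1))
{code}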
[jira] [Comment Edited] (SPARK-5246) spark/spark-ec2.py cannot start Spark master in VPC if local DNS name does not resolve
[ https://issues.apache.org/jira/browse/SPARK-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523351#comment-14523351 ] Nick Lipple edited comment on SPARK-5246 at 5/1/15 3:53 PM: is there a workaround for this issue? any reason why the script uses hostname instead of ip address? EDIT: nvm, this issue seems to be addressed: https://github.com/apache/spark/commit/86403f5525782bc9656ab11790f7020baa6b2c1f was (Author: nicklipple): is there a workaround for this issue? any reason why the script uses hostname instead of ip address? spark/spark-ec2.py cannot start Spark master in VPC if local DNS name does not resolve -- Key: SPARK-5246 URL: https://issues.apache.org/jira/browse/SPARK-5246 Project: Spark Issue Type: Bug Components: EC2 Reporter: Vladimir Grigor How to reproduce: 1) http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html should be sufficient to setup VPC for this bug. After you followed that guide, start new instance in VPC, ssh to it (though NAT server) 2) user starts a cluster in VPC: {code} ./spark-ec2 -k key20141114 -i ~/aws/key.pem -s 1 --region=eu-west-1 --spark-version=1.2.0 --instance-type=m1.large --vpc-id=vpc-2e71dd46 --subnet-id=subnet-2571dd4d --zone=eu-west-1a launch SparkByScript Setting up security groups... (omitted for brevity) 10.1.1.62 10.1.1.62: no org.apache.spark.deploy.worker.Worker to stop no org.apache.spark.deploy.master.Master to stop starting org.apache.spark.deploy.master.Master, logging to /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out failed to launch org.apache.spark.deploy.master.Master: at java.net.InetAddress.getLocalHost(InetAddress.java:1469) ... 12 more full log in /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out 10.1.1.62: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-10-1-1-62.out 10.1.1.62: failed to launch org.apache.spark.deploy.worker.Worker: 10.1.1.62:at java.net.InetAddress.getLocalHost(InetAddress.java:1469) 10.1.1.62:... 
12 more 10.1.1.62: full log in /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-10-1-1-62.out [timing] spark-standalone setup: 00h 00m 28s (omitted for brevity) {code} /root/spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-.out {code} Spark assembly has been built with Hive, including Datanucleus jars on classpath Spark Command: /usr/lib/jvm/java-1.7.0/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/sbin/../conf:/root/spark/lib/spark-assembly-1.2.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark/lib/datanucleus-rdbms-3.2.9.jar:/root/spark/lib/datanucleus-core-3.2.10.jar -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip 10.1.1.151 --port 7077 --webui-port 8080 15/01/14 07:34:47 INFO master.Master: Registered signal handlers for [TERM, HUP, INT] Exception in thread main java.net.UnknownHostException: ip-10-1-1-151: ip-10-1-1-151: Name or service not known at java.net.InetAddress.getLocalHost(InetAddress.java:1473) at org.apache.spark.util.Utils$.findLocalIpAddress(Utils.scala:620) at org.apache.spark.util.Utils$.localIpAddress$lzycompute(Utils.scala:612) at org.apache.spark.util.Utils$.localIpAddress(Utils.scala:612) at org.apache.spark.util.Utils$.localIpAddressHostname$lzycompute(Utils.scala:613) at org.apache.spark.util.Utils$.localIpAddressHostname(Utils.scala:613) at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:665) at org.apache.spark.util.Utils$$anonfun$localHostName$1.apply(Utils.scala:665) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.util.Utils$.localHostName(Utils.scala:665) at org.apache.spark.deploy.master.MasterArguments.init(MasterArguments.scala:27) at org.apache.spark.deploy.master.Master$.main(Master.scala:819) at org.apache.spark.deploy.master.Master.main(Master.scala) Caused by: java.net.UnknownHostException: ip-10-1-1-151: Name or service not known at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) at java.net.InetAddress.getLocalHost(InetAddress.java:1469) ... 12 more {code} Problem is that instance
[jira] [Assigned] (SPARK-7149) Default system alias problem
[ https://issues.apache.org/jira/browse/SPARK-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7149: --- Assignee: (was: Apache Spark) Default system alias problem --- Key: SPARK-7149 URL: https://issues.apache.org/jira/browse/SPARK-7149 Project: Spark Issue Type: Bug Components: SQL Reporter: haiyang Fix the default system alias problem. Executing the following SQL statement causes a problem: select substr(value, 0, 2), key as c0 from testData order by c0 org.apache.spark.sql.AnalysisException: Reference 'c0' is ambiguous, could be: c0#42, c0#41.; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3066) Support recommendAll in matrix factorization model
[ https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522825#comment-14522825 ] Apache Spark commented on SPARK-3066: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/5829 Support recommendAll in matrix factorization model -- Key: SPARK-3066 URL: https://issues.apache.org/jira/browse/SPARK-3066 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Xiangrui Meng Assignee: Debasish Das ALS returns a matrix factorization model, which we can use to predict ratings for individual queries as well as small batches. In practice, users may want to compute top-k recommendations offline for all users. It is very expensive but a common problem. We can do some optimization like 1) collect one side (either user or product) and broadcast it as a matrix 2) use level-3 BLAS to compute inner products 3) use Utils.takeOrdered to find top-k -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
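A hedged sketch of that three-step idea follows (not the actual pull request). It uses Breeze, which MLlib already depends on, for the block matrix product; the function name, factor layouts, and the simple sort-based top-k are illustrative assumptions:
{code}
import breeze.linalg.DenseMatrix
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def recommendAll(sc: SparkContext,
                 userFactors: RDD[(Int, Array[Double])],
                 productFactors: Array[(Int, Array[Double])],
                 rank: Int,
                 k: Int): RDD[(Int, Array[(Int, Double)])] = {
  // 1) Collect one side (products) and broadcast it as a rank x numProducts matrix.
  val productIds = productFactors.map(_._1)
  val prodMat = new DenseMatrix(rank, productFactors.length, productFactors.flatMap(_._2))
  val bcIds = sc.broadcast(productIds)
  val bcProd = sc.broadcast(prodMat)

  userFactors.mapPartitions { iter =>
    val users = iter.toArray
    if (users.isEmpty) Iterator.empty
    else {
      // 2) One level-3 style product per partition: (numUsers x rank) * (rank x numProducts).
      val userMat = new DenseMatrix(rank, users.length, users.flatMap(_._2)).t
      val scores = userMat * bcProd.value
      // 3) Keep only the top-k products per user.
      users.iterator.zipWithIndex.map { case ((userId, _), row) =>
        val rowScores = (0 until scores.cols).map(j => (bcIds.value(j), scores(row, j)))
        (userId, rowScores.sortBy(-_._2).take(k).toArray)
      }
    }
  }
}
{code}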