[jira] [Commented] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546846#comment-14546846 ]

Sean Owen commented on SPARK-7670:
--
I can't reproduce this on Ubuntu 14 at master either. I think it's a problem with your environment? Others are using Scala 2.11, presumably without problems. Are there earlier errors?

Failure when building with scala 2.11 (after 1.3.1
--
Key: SPARK-7670
URL: https://issues.apache.org/jira/browse/SPARK-7670
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 1.4.0
Reporter: Fernando Ruben Otero
Attachments: Dockerfile

When trying to build Spark with Scala 2.11 on revision c64ff8036cc6bc7c87743f4c751d7fe91c2e366a (the one on master when I'm submitting this issue) I'm getting:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
dev/change-version-to-2.11.sh
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -Dhadoop.version=2.6.0 -DskipTests clean install
...
...
...
[INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 ---
/Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type
protected Type type() { return Type.UPLOAD_BLOCK; }
^
/Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type
protected Type type() { return Type.STREAM_HANDLE; }
^
/Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type
protected Type type() { return Type.REGISTER_EXECUTOR; }
^
/Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type
protected Type type() { return Type.OPEN_BLOCKS; }

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546793#comment-14546793 ]

Fernando Ruben Otero edited comment on SPARK-7670 at 5/16/15 4:11 PM:
--
This Dockerfile reproduces the error I see on my machine. I reproduced the same behavior on OS X and Ubuntu too.

was (Author: zeos): This docker file reproduces the error on my machine

Failure when building with scala 2.11 (after 1.3.1
--
Key: SPARK-7670
URL: https://issues.apache.org/jira/browse/SPARK-7670

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546793#comment-14546793 ]

Fernando Ruben Otero edited comment on SPARK-7670 at 5/16/15 4:13 PM:
--
This Dockerfile reproduces the error I see on my machine on a clean Fedora. Even though the OS should not be the issue, I reproduced the same behavior on OS X and Ubuntu too, since I usually work with those environments.

was (Author: zeos): This docker file reproduces the error I see on my machine I reproduced the same behavior on a OSX and Ubuntu too

Failure when building with scala 2.11 (after 1.3.1
--
Key: SPARK-7670
URL: https://issues.apache.org/jira/browse/SPARK-7670

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546875#comment-14546875 ]

Sean Owen commented on SPARK-7670:
--
Yeah, I see the same thing with your Dockerfile. The strange thing is, the problem shows up in compiling the Java code. I'm still tempted to say this must be something funny about this environment, since I have two other environments where it's fine, but I don't know whether it's down to the Java version or Scala or what. I have Java 8 / Scala 2.11.6 in both cases and it works. The thing is that the Java code is valid and correct, so at best this is some compiler problem? So I'm not sure what to do if it's not a Spark issue and doesn't affect the build environments that developers will use to produce artifacts.

Failure when building with scala 2.11 (after 1.3.1
--
Key: SPARK-7670
URL: https://issues.apache.org/jira/browse/SPARK-7670

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7527) Wrong detection of REPL mode in ClosureCleaner
[ https://issues.apache.org/jira/browse/SPARK-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546892#comment-14546892 ]

Oleksii Kostyliev commented on SPARK-7527:
--
In the end, due to the additional complexity, it was decided to fix this separately from SPARK-7233.

Wrong detection of REPL mode in ClosureCleaner
--
Key: SPARK-7527
URL: https://issues.apache.org/jira/browse/SPARK-7527
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Oleksii Kostyliev
Priority: Minor

If the REPL class is not present on the classpath, the {{inInterpreter}} boolean switch should be {{false}}, not {{true}}, at: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L247

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
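For context, a minimal hedged sketch of the check being discussed; the class name and structure are illustrative assumptions, not the actual ClosureCleaner code. The point is only that a missing REPL class should yield false:

{code}
// Hedged sketch: detect REPL mode by probing for the interpreter's main class.
// If the class is absent from the classpath we are certainly not in the REPL,
// so the flag should fall back to false rather than true.
val inInterpreter: Boolean =
  try {
    Class.forName(
      "org.apache.spark.repl.Main",   // assumed REPL entry point, for illustration
      false,                          // don't run static initializers
      Thread.currentThread().getContextClassLoader)
    true
  } catch {
    case _: ClassNotFoundException => false  // REPL not on classpath => not REPL mode
  }
{code}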
[jira] [Updated] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fernando Ruben Otero updated SPARK-7670: Attachment: Dockerfile This docker file reproduces the error on my machine Failure when building with scala 2.11 (after 1.3.1 -- Key: SPARK-7670 URL: https://issues.apache.org/jira/browse/SPARK-7670 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.4.0 Reporter: Fernando Ruben Otero Attachments: Dockerfile When trying to build spark with scala 2.11 on revision c64ff8036cc6bc7c87743f4c751d7fe91c2e366a (the one on master when I'm submitting this issue) I'm getting export MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -Dhadoop.version=2.6.0 -DskipTests clean install ... ... ... [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 --- /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type protected Type type() { return Type.UPLOAD_BLOCK; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type protected Type type() { return Type.STREAM_HANDLE; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type protected Type type() { return Type.REGISTER_EXECUTOR; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type protected Type type() { return Type.OPEN_BLOCKS; } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546794#comment-14546794 ]

Fernando Ruben Otero commented on SPARK-7670:
--
I just attached a Dockerfile that reproduces the error on master (I just ran it). I'm doing a binary search on commits to find where the build broke; so far fc17661475443d9f0a8d28e3439feeb7a7bca67b is building.

Failure when building with scala 2.11 (after 1.3.1
--
Key: SPARK-7670
URL: https://issues.apache.org/jira/browse/SPARK-7670

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6439) Show per-task metrics when you hover over a task in the web UI visualization
[ https://issues.apache.org/jira/browse/SPARK-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-6439: -- Assignee: Kousuke Saruta (was: Kay Ousterhout) Show per-task metrics when you hover over a task in the web UI visualization Key: SPARK-6439 URL: https://issues.apache.org/jira/browse/SPARK-6439 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Assignee: Kousuke Saruta Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546847#comment-14546847 ]

Fernando Ruben Otero edited comment on SPARK-7670 at 5/16/15 4:34 PM:
--
I made the Dockerfile because that generates a clean environment. I don't see any other error before that.

was (Author: zeos): I did the docker file because that generates a clean environment

Failure when building with scala 2.11 (after 1.3.1
--
Key: SPARK-7670
URL: https://issues.apache.org/jira/browse/SPARK-7670

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546847#comment-14546847 ]

Fernando Ruben Otero commented on SPARK-7670:
--
I made the Dockerfile because that generates a clean environment.

Failure when building with scala 2.11 (after 1.3.1
--
Key: SPARK-7670
URL: https://issues.apache.org/jira/browse/SPARK-7670

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4412) Parquet logger cannot be configured
[ https://issues.apache.org/jira/browse/SPARK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545677#comment-14545677 ]

Yana Kadiyska edited comment on SPARK-4412 at 5/16/15 5:11 PM:
---
I would like to reopen this, as I believe the issue has regressed again in Spark 1.3.0. This SO thread has a lengthy discussion http://stackoverflow.com/questions/30052889/how-to-suppress-parquet-log-messages-in-spark but the short summary is that the log4j.rootCategory=ERROR, console setting still leaks
{quote}
INFO: parquet.hadoop.InternalParquetRecordReader
{quote}
messages

was (Author: yanakad): I would like to reopen as I believe the issue has again regressed in Spark 1.3.0. This SO thread has a lengthy discussion http://stackoverflow.com/questions/30052889/how-to-suppress-parquet-log-messages-in-spark but the short summary is that log4j.rootCategory=ERROR, console setting still leaks INFO: parquet.hadoop.InternalParquetRecordReader messages

Parquet logger cannot be configured
---
Key: SPARK-4412
URL: https://issues.apache.org/jira/browse/SPARK-4412
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: Jim Carroll

The Spark ParquetRelation.scala code makes the assumption that the parquet.Log class has already been loaded. If ParquetRelation.enableLogForwarding executes prior to the parquet.Log class being loaded, then the code in enableLogForwarding has no effect. ParquetRelation.scala attempts to override the parquet logger but, at least currently (and if your application simply reads a parquet file before it does anything else with Parquet), the parquet.Log class hasn't been loaded yet. Therefore the code in ParquetRelation.enableLogForwarding has no effect. If you look at the code in parquet.Log there's a static initializer that needs to be called prior to enableLogForwarding, or whatever enableLogForwarding does gets undone by this static initializer. The fix would be to force the static initializer to get called in parquet.Log as part of enableLogForwarding. PR will be forthcoming.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
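As a rough illustration of the fix the reporter describes (force parquet.Log's static initializer to run before adjusting the java.util.logging handlers), a hedged sketch; this is not the actual Spark patch, and the handler tweaks are illustrative:

{code}
import java.util.logging.{Level, Logger}

object ParquetLogRedirect {
  def enableLogForwarding(): Unit = {
    // Loading the class runs parquet.Log's static initializer exactly once,
    // so it can no longer undo the configuration applied below.
    Class.forName("parquet.Log")

    val logger = Logger.getLogger("parquet")
    logger.getHandlers.foreach(logger.removeHandler)  // drop the default console handler
    logger.setUseParentHandlers(false)
    logger.setLevel(Level.WARNING)                    // silence the INFO chatter
  }
}
{code}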
[jira] [Updated] (SPARK-6657) Fix Python doc build warnings
[ https://issues.apache.org/jira/browse/SPARK-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6657: - Fix Version/s: (was: 1.3.1) Fix Python doc build warnings - Key: SPARK-6657 URL: https://issues.apache.org/jira/browse/SPARK-6657 Project: Spark Issue Type: Documentation Components: Documentation, MLlib, PySpark, SQL, Streaming Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Priority: Trivial Reported by [~rxin] {code} /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:15: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:16: WARNING: Block quote ends without a blank line; unexpected unindent. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:18: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:22: WARNING: Definition list ends without a blank line; unexpected unindent. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:28: WARNING: Definition list ends without a blank line; unexpected unindent. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainRegressor:13: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainRegressor:14: WARNING: Block quote ends without a blank line; unexpected unindent. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainRegressor:16: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainRegressor:18: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.collect:1: WARNING: Inline interpreted text or phrase reference start-string without end-string. /scratch/rxin/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.orderBy:3: WARNING: Inline interpreted text or phrase reference start-string without end-string. /scratch/rxin/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.sort:3: WARNING: Inline interpreted text or phrase reference start-string without end-string. /scratch/rxin/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.take:1: WARNING: Inline interpreted text or phrase reference start-string without end-string. /scratch/rxin/spark/python/docs/pyspark.streaming.rst:13: WARNING: Title underline too short. pyspark.streaming.kafka module /scratch/rxin/spark/python/docs/pyspark.streaming.rst:13: WARNING: Title underline too short. pyspark.streaming.kafka module {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6197) handle json parse exception for eventlog file not finished writing
[ https://issues.apache.org/jira/browse/SPARK-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-6197.
--
Resolution: Fixed
Fix Version/s: 1.3.2
Target Version/s: 1.3.2, 1.4.0 (was: 1.4.0)

I back-ported to 1.3.x. I infer from the previous target and label that this was the work left to do.

handle json parse exception for eventlog file not finished writing
---
Key: SPARK-6197
URL: https://issues.apache.org/jira/browse/SPARK-6197
Project: Spark
Issue Type: Bug
Components: Web UI
Affects Versions: 1.3.0
Reporter: Zhang, Liye
Assignee: Zhang, Liye
Priority: Minor
Fix For: 1.3.2, 1.4.0

This is a follow-up JIRA for [SPARK-6107|https://issues.apache.org/jira/browse/SPARK-6107]. In [SPARK-6107|https://issues.apache.org/jira/browse/SPARK-6107], the web UI can display event log files with the suffix *.inprogress*. However, the event log file may not be finished writing in some abnormal cases (e.g. Ctrl+C), in which case the file may be truncated in the last line, leaving that line in invalid JSON format, which will cause a JSON parse exception when reading the file. In this case, we can just ignore the content of the last line, since the history shown on the web for abnormal cases is only a reference for the user; it can demonstrate the past status of the app before it terminated abnormally (we cannot guarantee the history shows exactly the last moment when the app encountered the abnormal situation).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
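A hedged sketch of the behavior described above; this is not the actual replay code, just the shape of "tolerate a parse failure on the final line of an .inprogress log":

{code}
import scala.io.Source

// Replays an event log line by line; if the file may still have been in progress,
// a failure on the last line is ignored (the writer was likely interrupted, e.g. Ctrl+C).
def replayEventLog(path: String, maybeTruncated: Boolean)(parseAndPost: String => Unit): Unit = {
  val lines = Source.fromFile(path).getLines().toVector
  lines.zipWithIndex.foreach { case (line, idx) =>
    try {
      parseAndPost(line)  // e.g. JSON-decode the event and post it to listeners
    } catch {
      case e: Exception if maybeTruncated && idx == lines.size - 1 =>
        println(s"Ignoring possibly truncated last line of $path: ${e.getMessage}")
    }
  }
}
{code}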
[jira] [Updated] (SPARK-6197) handle json parse exception for eventlog file not finished writing
[ https://issues.apache.org/jira/browse/SPARK-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-6197:
--
Labels: (was: backport-needed)

handle json parse exception for eventlog file not finished writing
---
Key: SPARK-6197
URL: https://issues.apache.org/jira/browse/SPARK-6197

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3928: - Fix Version/s: (was: 1.3.0) Support wildcard matches on Parquet files - Key: SPARK-3928 URL: https://issues.apache.org/jira/browse/SPARK-3928 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Reporter: Nicholas Chammas Assignee: Cheng Lian Priority: Minor {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}. It would be nice if {{SparkContext.parquetFile()}} did the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
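For concreteness, a hedged sketch of the requested usage (paths are made up; in the 1.x API parquetFile lives on SQLContext, and glob support is the behavior being asked for):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("GlobSketch").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val logs = sc.textFile("hdfs:///logs/2014-??-??/part-*")                 // glob patterns already work here
val events = sqlContext.parquetFile("hdfs:///warehouse/events/part-*")   // the behavior this issue requests
{code}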
[jira] [Updated] (SPARK-4325) Improve spark-ec2 cluster launch times
[ https://issues.apache.org/jira/browse/SPARK-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4325: - Fix Version/s: (was: 1.3.0) Improve spark-ec2 cluster launch times -- Key: SPARK-4325 URL: https://issues.apache.org/jira/browse/SPARK-4325 Project: Spark Issue Type: Umbrella Components: EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Priority: Minor This is an umbrella task to capture several pieces of work related to significantly improving spark-ec2 cluster launch times. There are several optimizations we know we can make to [{{setup.sh}} | https://github.com/mesos/spark-ec2/blob/v4/setup.sh] to make cluster launches faster. There are also some improvements to the AMIs that will help a lot. Potential improvements: * Upgrade the Spark AMIs and pre-install tools like Ganglia on them. This will reduce or eliminate SSH wait time and Ganglia init time. * Replace instances of {{download; rsync to rest of cluster}} with parallel downloads on all nodes of the cluster. * Replace instances of {code} for node in $NODES; do command sleep 0.3 done wait{code} with simpler calls to {{pssh}}. * Remove the [linear backoff | https://github.com/apache/spark/blob/b32734e12d5197bad26c080e529edd875604c6fb/ec2/spark_ec2.py#L665] when we wait for SSH availability now that we are already waiting for EC2 status checks to clear before testing SSH. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6657) Fix Python doc build warnings
[ https://issues.apache.org/jira/browse/SPARK-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6657: - Target Version/s: 1.3.2, 1.4.0 (was: 1.3.1, 1.4.0) Fix Python doc build warnings - Key: SPARK-6657 URL: https://issues.apache.org/jira/browse/SPARK-6657 Project: Spark Issue Type: Documentation Components: Documentation, MLlib, PySpark, SQL, Streaming Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Priority: Trivial Reported by [~rxin] {code} /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:15: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:16: WARNING: Block quote ends without a blank line; unexpected unindent. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:18: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:22: WARNING: Definition list ends without a blank line; unexpected unindent. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainClassifier:28: WARNING: Definition list ends without a blank line; unexpected unindent. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainRegressor:13: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainRegressor:14: WARNING: Block quote ends without a blank line; unexpected unindent. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainRegressor:16: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.RandomForest.trainRegressor:18: ERROR: Unexpected indentation. /scratch/rxin/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.collect:1: WARNING: Inline interpreted text or phrase reference start-string without end-string. /scratch/rxin/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.orderBy:3: WARNING: Inline interpreted text or phrase reference start-string without end-string. /scratch/rxin/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.sort:3: WARNING: Inline interpreted text or phrase reference start-string without end-string. /scratch/rxin/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.take:1: WARNING: Inline interpreted text or phrase reference start-string without end-string. /scratch/rxin/spark/python/docs/pyspark.streaming.rst:13: WARNING: Title underline too short. pyspark.streaming.kafka module /scratch/rxin/spark/python/docs/pyspark.streaming.rst:13: WARNING: Title underline too short. pyspark.streaming.kafka module {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3490) Alleviate port collisions during tests
[ https://issues.apache.org/jira/browse/SPARK-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3490. -- Resolution: Fixed Target Version/s: 1.2.0, 1.1.1, 0.9.3, 1.0.3 (was: 0.9.3, 1.0.3, 1.1.1, 1.2.0) I think it's not likely there would be another 0.9 or 1.0 branch release now. Alleviate port collisions during tests -- Key: SPARK-3490 URL: https://issues.apache.org/jira/browse/SPARK-3490 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Assignee: Andrew Or Fix For: 0.9.3, 1.2.0, 1.1.1 A few tests, notably SparkSubmitSuite and DriverSuite, have been failing intermittently because we open too many ephemeral ports and occasionally can't bind to new ones. We should minimize the use of ports during tests where possible. A great candidate is the SparkUI, which is not needed for most tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
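A minimal sketch of the mitigation described above, assuming the standard spark.ui.enabled switch (the suite name is made up):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Most suites don't need the web UI; skipping it avoids binding one more
// ephemeral port for every SparkContext created during the test run.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("PortFriendlySuite")
  .set("spark.ui.enabled", "false")
val sc = new SparkContext(conf)
{code}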
[jira] [Resolved] (SPARK-3987) NNLS generates incorrect result
[ https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3987. -- Resolution: Fixed From the discussion it sounds like the issue that this JIRA concerns was actually OK. NNLS generates incorrect result --- Key: SPARK-3987 URL: https://issues.apache.org/jira/browse/SPARK-3987 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.1.0 Reporter: Debasish Das Assignee: Shuo Xiang Fix For: 1.2.0, 1.1.1 Hi, Please see the example gram matrix and linear term: val P2 = new DoubleMatrix(20, 20, 333907.312770, -60814.043975, 207935.829941, -162881.367739, -43730.396770, 17511.428983, -243340.496449, -225245.957922, 104700.445881, 32430.845099, 336378.693135, -373497.970207, -41147.159621, 53928.060360, -293517.883778, 53105.278068, 0.00, -85257.781696, 84913.970469, -10584.080103, -60814.043975, 13826.806664, -38032.612640, 33475.833875, 10791.916809, -1040.950810, 48106.552472, 45390.073380, -16310.282190, -2861.455903, -60790.833191, 73109.516544, 9826.614644, -8283.992464, 56991.742991, -6171.366034, 0.00, 19152.382499, -13218.721710, 2793.734234, 207935.829941, -38032.612640, 129661.677608, -101682.098412, -27401.299347, 10787.713362, -151803.006149, -140563.601672, 65067.935324, 20031.263383, 209521.268600, -232958.054688, -25764.179034, 33507.951918, -183046.845592, 32884.782835, 0.00, -53315.811196, 52770.762546, -6642.187643, -162881.367739, 33475.833875, -101682.098412, 85094.407608, 25422.850782, -5437.646141, 124197.166330, 116206.265909, -47093.484134, -11420.168521, -163429.436848, 189574.783900, 23447.172314, -24087.375367, 148311.355507, -20848.385466, 0.00, 46835.814559, -38180.352878, 6415.873901, -43730.396770, 10791.916809, -27401.299347, 25422.850782, 8882.869799, 15.638084, 35933.473986, 34186.371325, -10745.330690, -974.314375, -43537.709621, 54371.010558, 7894.453004, -5408.929644, 42231.381747, -3192.010574, 0.00, 15058.753110, -8704.757256, 2316.581535, 17511.428983, -1040.950810, 10787.713362, -5437.646141, 15.638084, 2794.949847, -9681.950987, -8258.171646, 7754.358930, 4193.359412, 18052.143842, -15456.096769, -253.356253, 4089.672804, -12524.380088, 5651.579348, 0.00, -1513.302547, 6296.461898, 152.427321, -243340.496449, 48106.552472, -151803.006149, 124197.166330, 35933.473986, -9681.950987, 182931.600236, 170454.352953, -72361.174145, -19270.461728, -244518.179729, 279551.060579, 33340.452802, -37103.267653, 219025.288975, -33687.141423, 0.00, 67347.950443, -58673.009647, 8957.800259, -225245.957922, 45390.073380, -140563.601672, 116206.265909, 34186.371325, -8258.171646, 170454.352953, 159322.942894, -66074.960534, -16839.743193, -226173.967766, 260421.044094, 31624.194003, -33839.612565, 203889.695169, -30034.828909, 0.00, 63525.040745, -53572.741748, 8575.071847, 104700.445881, -16310.282190, 65067.935324, -47093.484134, -10745.330690, 7754.358930, -72361.174145, -66074.960534, 35869.598076, 13378.653317, 106033.647837, -111831.682883, -10455.465743, 18537.392481, -88370.612394, 20344.288488, 0.00, -22935.482766, 29004.543704, -2409.461759, 32430.845099, -2861.455903, 20031.263383, -11420.168521, -974.314375, 4193.359412, -19270.461728, -16839.743193, 13378.653317, 6802.081898, 33256.395091, -30421.985199, -1296.785870, 7026.518692, -24443.378205, 9221.982599, 0.00, -4088.076871, 10861.014242, -25.092938, 336378.693135, -60790.833191, 209521.268600, -163429.436848, -43537.709621, 18052.143842, -244518.179729, -226173.967766, 106033.647837, 33256.395091, 
339200.268106, -375442.716811, -41027.594509, 54636.778527, -295133.248586, 54177.278365, 0.00, -85237.666701, 85996.957056, -10503.209968, -373497.970207, 73109.516544, -232958.054688, 189574.783900, 54371.010558, -15456.096769, 279551.060579, 260421.044094, -111831.682883, -30421.985199, -375442.716811, 427793.208465, 50528.074431, -57375.986301, 335203.382015, -52676.385869, 0.00, 102368.307670, -90679.792485, 13509.390393, -41147.159621, 9826.614644, -25764.179034, 23447.172314, 7894.453004, -253.356253, 33340.452802, 31624.194003, -10455.465743, -1296.785870, -41027.594509, 50528.074431, 7255.977434, -5281.636812, 39298.355527, -3440.450858, 0.00, 13717.870243, -8471.405582, 2071.812204, 53928.060360, -8283.992464, 33507.951918, -24087.375367, -5408.929644, 4089.672804, -37103.267653, -33839.612565, 18537.392481, 7026.518692, 54636.778527, -57375.986301, -5281.636812, 9735.061160, -45360.674033, 10634.633559, 0.00, -11652.364691, 15039.566630, -1202.539106, -293517.883778, 56991.742991, -183046.845592, 148311.355507, 42231.381747, -12524.380088, 219025.288975, 203889.695169,
[jira] [Updated] (SPARK-4258) NPE with new Parquet Filters
[ https://issues.apache.org/jira/browse/SPARK-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4258: - Fix Version/s: (was: 1.2.0) NPE with new Parquet Filters Key: SPARK-4258 URL: https://issues.apache.org/jira/browse/SPARK-4258 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Priority: Critical {code} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 21.0 failed 4 times, most recent failure: Lost task 0.3 in stage 21.0 (TID 160, ip-10-0-247-144.us-west-2.compute.internal): java.lang.NullPointerException: parquet.io.api.Binary$ByteArrayBackedBinary.compareTo(Binary.java:206) parquet.io.api.Binary$ByteArrayBackedBinary.compareTo(Binary.java:162) parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:100) parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47) parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162) parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:210) parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47) parquet.filter2.predicate.Operators$Or.accept(Operators.java:302) parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:201) parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47) parquet.filter2.predicate.Operators$And.accept(Operators.java:290) parquet.filter2.statisticslevel.StatisticsFilter.canDrop(StatisticsFilter.java:52) parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:46) parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22) parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108) parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28) parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158) parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138) {code} This occurs when reading parquet data encoded with the older version of the library for TPC-DS query 34. Will work on coming up with a smaller reproduction -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2973) Use LocalRelation for all ExecutedCommands, avoid job for take/collect()
[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-2973: - Fix Version/s: (was: 1.2.0) Use LocalRelation for all ExecutedCommands, avoid job for take/collect() Key: SPARK-2973 URL: https://issues.apache.org/jira/browse/SPARK-2973 Project: Spark Issue Type: Improvement Components: SQL Reporter: Aaron Davidson Assignee: Cheng Lian Priority: Critical Right now, sql(show tables).collect() will start a Spark job which shows up in the UI. There should be a way to get these without this step. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4412) Parquet logger cannot be configured
[ https://issues.apache.org/jira/browse/SPARK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4412: - Fix Version/s: (was: 1.2.0) Parquet logger cannot be configured --- Key: SPARK-4412 URL: https://issues.apache.org/jira/browse/SPARK-4412 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Jim Carroll The Spark ParquetRelation.scala code makes the assumption that the parquet.Log class has already been loaded. If ParquetRelation.enableLogForwarding executes prior to the parquet.Log class being loaded then the code in enableLogForwarding has no affect. ParquetRelation.scala attempts to override the parquet logger but, at least currently (and if your application simply reads a parquet file before it does anything else with Parquet), the parquet.Log class hasn't been loaded yet. Therefore the code in ParquetRelation.enableLogForwarding has no affect. If you look at the code in parquet.Log there's a static initializer that needs to be called prior to enableLogForwarding or whatever enableLogForwarding does gets undone by this static initializer. The fix would be to force the static initializer to get called in parquet.Log as part of enableForwardLogging. PR will be forthcomming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2750) Add Https support for Web UI
[ https://issues.apache.org/jira/browse/SPARK-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-2750:
--
Fix Version/s: (was: 1.0.3)

Add Https support for Web UI

Key: SPARK-2750
URL: https://issues.apache.org/jira/browse/SPARK-2750
Project: Spark
Issue Type: New Feature
Components: Web UI
Reporter: Tao Wang
Labels: https, ssl, webui
Attachments: exception on yarn when https enabled.txt
Original Estimate: 96h
Remaining Estimate: 96h

Now I'm trying to add HTTPS support for the web UI using the Jetty SSL integration. Below is the plan:
1. The web UI includes the Master UI, Worker UI, HistoryServer UI and Spark UI. Users can switch between HTTPS and HTTP by configuring spark.http.policy as a JVM property for each process, with HTTP chosen by default.
2. The web ports of the Master and Worker would be decided in order of launch arguments, JVM property, system env and default port.
3. Below are some other configuration items:
spark.ssl.server.keystore.location - The file or URL of the SSL key store
spark.ssl.server.keystore.password - The password for the key store
spark.ssl.server.keystore.keypassword - The password (if any) for the specific key within the key store
spark.ssl.server.keystore.type - The type of the key store (default JKS)
spark.client.https.need-auth - True if SSL needs client authentication
spark.ssl.server.truststore.location - The file name or URL of the trust store location
spark.ssl.server.truststore.password - The password for the trust store
spark.ssl.server.truststore.type - The type of the trust store (default JKS)
Any feedback is welcome!

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7685) Handle high imbalanced data or apply weights to different samples in Logistic Regression
DB Tsai created SPARK-7685:
--
Summary: Handle high imbalanced data or apply weights to different samples in Logistic Regression
Key: SPARK-7685
URL: https://issues.apache.org/jira/browse/SPARK-7685
Project: Spark
Issue Type: New Feature
Components: ML
Reporter: DB Tsai

In a fraud detection dataset, almost all the samples are negative while only a couple of them are positive. This kind of highly imbalanced data will bias the model toward the negative class, resulting in poor performance. In Python's scikit-learn, they provide a correction allowing users to over-/under-sample the samples of each class according to the given weights. In auto mode, it selects weights inversely proportional to class frequencies in the training set. This can be done in a more efficient way by multiplying the weights into the loss and gradient instead of doing actual over-/under-sampling in the training dataset, which is very expensive. http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html On the other hand, some of the training data may be more important, like the training samples from tenured users, while the training samples from new users may be less important. We should be able to provide another weight: Double field in the LabeledPoint to weight them differently in the learning algorithm.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7685) Handle high imbalanced data and apply weights to different samples in Logistic Regression
[ https://issues.apache.org/jira/browse/SPARK-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-7685: --- Summary: Handle high imbalanced data and apply weights to different samples in Logistic Regression (was: Handle high imbalanced data or apply weights to different samples in Logistic Regression) Handle high imbalanced data and apply weights to different samples in Logistic Regression - Key: SPARK-7685 URL: https://issues.apache.org/jira/browse/SPARK-7685 Project: Spark Issue Type: New Feature Components: ML Reporter: DB Tsai In fraud detection dataset, almost all the samples are negative while only couple of them are positive. This type of high imbalanced data will bias the models toward negative resulting poor performance. In python-scikit, they provide a correction allowing users to Over-/undersample the samples of each class according to the given weights. In auto mode, selects weights inversely proportional to class frequencies in the training set. This can be done in a more efficient way by multiplying the weights into loss and gradient instead of doing actual over/undersampling in the training dataset which is very expensive. http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html On the other hand, some of the training data maybe more important like the training samples from tenure users while the training samples from new users maybe less important. We should be able to provide another weight: Double information in the LabeledPoint to weight them differently in the learning algorithm. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
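A hedged sketch of the weighting idea described above; this is not Spark's implementation, just how a per-example weight scales the logistic loss and gradient instead of physically resampling:

{code}
// label is 0.0 or 1.0; weight could be e.g. the inverse class frequency.
def weightedLogisticLossAndGradient(
    w: Array[Double],
    x: Array[Double],
    label: Double,
    weight: Double): (Double, Array[Double]) = {
  val margin = (w, x).zipped.map(_ * _).sum
  val p = 1.0 / (1.0 + math.exp(-margin))
  // Standard log-loss, scaled by the per-example weight
  val loss = -weight * (label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
  // Gradient of the weighted loss with respect to w
  val gradient = x.map(_ * weight * (p - label))
  (loss, gradient)
}
{code}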
[jira] [Resolved] (SPARK-4556) binary distribution assembly can't run in local mode
[ https://issues.apache.org/jira/browse/SPARK-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4556. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6186 [https://github.com/apache/spark/pull/6186] binary distribution assembly can't run in local mode Key: SPARK-4556 URL: https://issues.apache.org/jira/browse/SPARK-4556 Project: Spark Issue Type: Bug Components: Build, Deploy, Documentation Reporter: Sean Busbey Fix For: 1.4.0 After building the binary distribution assembly, the resultant tarball can't be used for local mode. {code} busbey2-MBA:spark busbey$ mvn -Pbigtop-dist -DskipTests=true package [INFO] Scanning for projects... ...SNIP... [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 32.227 s] [INFO] Spark Project Networking ... SUCCESS [ 31.402 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 8.864 s] [INFO] Spark Project Core . SUCCESS [15:39 min] [INFO] Spark Project Bagel SUCCESS [ 29.470 s] [INFO] Spark Project GraphX ... SUCCESS [05:20 min] [INFO] Spark Project Streaming SUCCESS [11:02 min] [INFO] Spark Project Catalyst . SUCCESS [11:26 min] [INFO] Spark Project SQL .. SUCCESS [11:33 min] [INFO] Spark Project ML Library ... SUCCESS [14:27 min] [INFO] Spark Project Tools SUCCESS [ 40.980 s] [INFO] Spark Project Hive . SUCCESS [11:45 min] [INFO] Spark Project REPL . SUCCESS [03:15 min] [INFO] Spark Project Assembly . SUCCESS [04:22 min] [INFO] Spark Project External Twitter . SUCCESS [ 43.567 s] [INFO] Spark Project External Flume Sink .. SUCCESS [ 50.367 s] [INFO] Spark Project External Flume ... SUCCESS [01:41 min] [INFO] Spark Project External MQTT SUCCESS [ 40.973 s] [INFO] Spark Project External ZeroMQ .. SUCCESS [ 54.878 s] [INFO] Spark Project External Kafka ... SUCCESS [01:23 min] [INFO] Spark Project Examples . SUCCESS [10:19 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 01:47 h [INFO] Finished at: 2014-11-22T02:13:51-06:00 [INFO] Final Memory: 79M/2759M [INFO] busbey2-MBA:spark busbey$ cd assembly/target/ busbey2-MBA:target busbey$ mkdir dist-temp busbey2-MBA:target busbey$ tar -C dist-temp -xzf spark-assembly_2.10-1.3.0-SNAPSHOT-dist.tar.gz busbey2-MBA:target busbey$ cd dist-temp/ busbey2-MBA:dist-temp busbey$ ./bin/spark-shell ls: /Users/busbey/projects/spark/assembly/target/dist-temp/assembly/target/scala-2.10: No such file or directory Failed to find Spark assembly in /Users/busbey/projects/spark/assembly/target/dist-temp/assembly/target/scala-2.10 You need to build Spark before running this program. {code} It looks like the classpath calculations in {{bin/compute_classpath.sh}} don't handle it. If I move all of the spark-*.jar files from the top level into the lib folder and touch the RELEASE file, then the spark shell launches in local mode normally. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4556) Document that make-distribution.sh is required to make a runnable distribution
[ https://issues.apache.org/jira/browse/SPARK-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4556: - Component/s: (was: Spark Shell) Documentation Deploy Priority: Minor (was: Major) Assignee: Sean Owen Issue Type: Improvement (was: Bug) Summary: Document that make-distribution.sh is required to make a runnable distribution (was: binary distribution assembly can't run in local mode) Document that make-distribution.sh is required to make a runnable distribution -- Key: SPARK-4556 URL: https://issues.apache.org/jira/browse/SPARK-4556 Project: Spark Issue Type: Improvement Components: Build, Deploy, Documentation Reporter: Sean Busbey Assignee: Sean Owen Priority: Minor Fix For: 1.4.0 After building the binary distribution assembly, the resultant tarball can't be used for local mode. {code} busbey2-MBA:spark busbey$ mvn -Pbigtop-dist -DskipTests=true package [INFO] Scanning for projects... ...SNIP... [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 32.227 s] [INFO] Spark Project Networking ... SUCCESS [ 31.402 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 8.864 s] [INFO] Spark Project Core . SUCCESS [15:39 min] [INFO] Spark Project Bagel SUCCESS [ 29.470 s] [INFO] Spark Project GraphX ... SUCCESS [05:20 min] [INFO] Spark Project Streaming SUCCESS [11:02 min] [INFO] Spark Project Catalyst . SUCCESS [11:26 min] [INFO] Spark Project SQL .. SUCCESS [11:33 min] [INFO] Spark Project ML Library ... SUCCESS [14:27 min] [INFO] Spark Project Tools SUCCESS [ 40.980 s] [INFO] Spark Project Hive . SUCCESS [11:45 min] [INFO] Spark Project REPL . SUCCESS [03:15 min] [INFO] Spark Project Assembly . SUCCESS [04:22 min] [INFO] Spark Project External Twitter . SUCCESS [ 43.567 s] [INFO] Spark Project External Flume Sink .. SUCCESS [ 50.367 s] [INFO] Spark Project External Flume ... SUCCESS [01:41 min] [INFO] Spark Project External MQTT SUCCESS [ 40.973 s] [INFO] Spark Project External ZeroMQ .. SUCCESS [ 54.878 s] [INFO] Spark Project External Kafka ... SUCCESS [01:23 min] [INFO] Spark Project Examples . SUCCESS [10:19 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 01:47 h [INFO] Finished at: 2014-11-22T02:13:51-06:00 [INFO] Final Memory: 79M/2759M [INFO] busbey2-MBA:spark busbey$ cd assembly/target/ busbey2-MBA:target busbey$ mkdir dist-temp busbey2-MBA:target busbey$ tar -C dist-temp -xzf spark-assembly_2.10-1.3.0-SNAPSHOT-dist.tar.gz busbey2-MBA:target busbey$ cd dist-temp/ busbey2-MBA:dist-temp busbey$ ./bin/spark-shell ls: /Users/busbey/projects/spark/assembly/target/dist-temp/assembly/target/scala-2.10: No such file or directory Failed to find Spark assembly in /Users/busbey/projects/spark/assembly/target/dist-temp/assembly/target/scala-2.10 You need to build Spark before running this program. {code} It looks like the classpath calculations in {{bin/compute_classpath.sh}} don't handle it. If I move all of the spark-*.jar files from the top level into the lib folder and touch the RELEASE file, then the spark shell launches in local mode normally. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7672) Number format exception with spark.kryoserializer.buffer.mb
[ https://issues.apache.org/jira/browse/SPARK-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7672: - Assignee: Nishkam Ravi Number format exception with spark.kryoserializer.buffer.mb --- Key: SPARK-7672 URL: https://issues.apache.org/jira/browse/SPARK-7672 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Nishkam Ravi Assignee: Nishkam Ravi Priority: Critical Fix For: 1.4.0 With spark.kryoserializer.buffer.mb 1000 : Exception in thread main java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m. Fractional values are not supported. Input was: 100.0 at org.apache.spark.network.util.JavaUtils.parseByteString(JavaUtils.java:238) at org.apache.spark.network.util.JavaUtils.byteStringAsKb(JavaUtils.java:259) at org.apache.spark.util.Utils$.byteStringAsKb(Utils.scala:1037) at org.apache.spark.SparkConf.getSizeAsKb(SparkConf.scala:245) at org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:53) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:269) at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7672) Number format exception with spark.kryoserializer.buffer.mb
[ https://issues.apache.org/jira/browse/SPARK-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-7672. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6198 [https://github.com/apache/spark/pull/6198] Number format exception with spark.kryoserializer.buffer.mb --- Key: SPARK-7672 URL: https://issues.apache.org/jira/browse/SPARK-7672 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Nishkam Ravi Priority: Critical Fix For: 1.4.0 With spark.kryoserializer.buffer.mb 1000 : Exception in thread main java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m. Fractional values are not supported. Input was: 100.0 at org.apache.spark.network.util.JavaUtils.parseByteString(JavaUtils.java:238) at org.apache.spark.network.util.JavaUtils.byteStringAsKb(JavaUtils.java:259) at org.apache.spark.util.Utils$.byteStringAsKb(Utils.scala:1037) at org.apache.spark.SparkConf.getSizeAsKb(SparkConf.scala:245) at org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:53) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:269) at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
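For context, a minimal sketch (not from the ticket) of the configuration forms involved: the legacy {{spark.kryoserializer.buffer.mb}} key takes a bare number of megabytes, while the newer key takes a size string with an explicit unit, which {{JavaUtils.parseByteString}} accepts. The 1000-megabyte figure simply mirrors the report.
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
// Legacy key from the report: a bare number of megabytes.
//   spark.kryoserializer.buffer.mb = 1000
// Newer form: a size string with an explicit unit.
conf.set("spark.kryoserializer.buffer", "1000m")
{code}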
[jira] [Commented] (SPARK-7661) Support for dynamic allocation of executors in Kinesis Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546617#comment-14546617 ] Tathagata Das commented on SPARK-7661: -- N+1 is used in the example, but isn't really the recommended way. Here is how it works. You have to give X + Y cores, where X = number of Kinesis streams/receivers and Y = number of cores for processing the data. The X receivers will, in collaboration with each other, receive data from the N shards. If you expect your N to vary from 10 to 20, then having X = 15 isn't a bad idea. At N = 20, the 15 receivers will distribute the work among themselves. And Y should be such that your system can process the data as fast as it is received. Support for dynamic allocation of executors in Kinesis Spark Streaming -- Key: SPARK-7661 URL: https://issues.apache.org/jira/browse/SPARK-7661 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.3.1 Environment: AWS-EMR Reporter: Murtaza Kanchwala Currently the no. of cores is (N + 1), where N is no. of shards in a Kinesis Stream. My Requirement is that if I use this Resharding util for Amazon Kinesis : Amazon Kinesis Resharding : https://github.com/awslabs/amazon-kinesis-scaling-utils Then there should be some way to allocate executors on the basis of no. of shards directly (for Spark Streaming only). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
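For illustration only (not from the ticket), a rough sketch of the pattern described in the comment: X receivers reading one Kinesis stream, with their DStreams unioned for processing. It assumes the 1.3.x {{KinesisUtils.createStream}} signature from the spark-streaming-kinesis-asl module and an existing SparkContext {{sc}}; the stream name, endpoint, and counts are placeholders.
{code}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val ssc = new StreamingContext(sc, Seconds(10))

// X = 15 receivers; give the application X + Y cores, where Y covers processing.
val numReceivers = 15
val streams = (1 to numReceivers).map { _ =>
  KinesisUtils.createStream(ssc, "myKinesisStream",
    "https://kinesis.us-east-1.amazonaws.com", Seconds(10),
    InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
}
// The receivers cooperatively cover however many shards the stream currently has.
val unioned = ssc.union(streams)
{code}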
[jira] [Commented] (SPARK-7671) Fix wrong URLs in MLlib Data Types Documentation
[ https://issues.apache.org/jira/browse/SPARK-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546622#comment-14546622 ] Favio Vázquez commented on SPARK-7671: -- Thanks [~josephkb] and [~srowen] for fixing that Fix wrong URLs in MLlib Data Types Documentation Key: SPARK-7671 URL: https://issues.apache.org/jira/browse/SPARK-7671 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Environment: Ubuntu 14.04. Apache Mesos in cluster mode with HDFS from cloudera 2.6.0-cdh5.4.0. Reporter: Favio Vázquez Assignee: Favio Vázquez Priority: Trivial Labels: Documentation,, Fix, MLlib,, URL Fix For: 1.4.0 There is a mistake in the URL of Matrices in the MLlib Data Types documentation (Local matrix scala section), the URL points to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices which is a mistake, since Matrices is an object that implements factory methods for Matrix that does not have a companion class. The correct link should point to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$ There is another mistake, in the Local Vector section in Scala, Java and Python In the Scala section the URL of Vectors points to the trait Vector (https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and not to the factory methods implemented in Vectors. The correct link should be: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$ In the Java section the URL of Vectors points to the Interface Vector (https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html) and not to the Class Vectors The correct link should be: https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vectors.html In the Python section the URL of Vectors points to the class Vector (https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vector) and not the Class Vectors The correct link should be: https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7661) Support for dynamic allocation of executors in Kinesis Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546623#comment-14546623 ] Murtaza Kanchwala commented on SPARK-7661: -- OK, let me try your solution as well with this scaling util. Support for dynamic allocation of executors in Kinesis Spark Streaming -- Key: SPARK-7661 URL: https://issues.apache.org/jira/browse/SPARK-7661 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.3.1 Environment: AWS-EMR Reporter: Murtaza Kanchwala Currently the no. of cores is (N + 1), where N is no. of shards in a Kinesis Stream. My Requirement is that if I use this Resharding util for Amazon Kinesis : Amazon Kinesis Resharding : https://github.com/awslabs/amazon-kinesis-scaling-utils Then there should be some way to allocate executors on the basis of no. of shards directly (for Spark Streaming only). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7654) DataFrameReader and DataFrameWriter for input/output API
[ https://issues.apache.org/jira/browse/SPARK-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546632#comment-14546632 ] Apache Spark commented on SPARK-7654: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6210 DataFrameReader and DataFrameWriter for input/output API Key: SPARK-7654 URL: https://issues.apache.org/jira/browse/SPARK-7654 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Blocker We have a proliferation of save options now. It'd make more sense to have a builder pattern for write. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
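For reference, a hedged sketch of the builder-style input/output API this issue proposes, using the DataFrameReader/DataFrameWriter names that shipped in 1.4; it assumes an existing SQLContext named {{sqlContext}}, and the paths are placeholders.
{code}
import org.apache.spark.sql.SaveMode

// Reading: sqlContext.read returns a DataFrameReader builder.
val people = sqlContext.read.format("json").load("/tmp/people.json")

// Writing: df.write returns a DataFrameWriter builder, replacing the many save(...) overloads.
people.write.format("parquet").mode(SaveMode.Overwrite).save("/tmp/people.parquet")
{code}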
[jira] [Updated] (SPARK-7646) Create table support to JDBC Datasource
[ https://issues.apache.org/jira/browse/SPARK-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-7646: --- Labels: 1.4.1 (was: ) Create table support to JDBC Datasource --- Key: SPARK-7646 URL: https://issues.apache.org/jira/browse/SPARK-7646 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Venkata Ramana G Labels: 1.4.1 Support Create table for JDBCDataSource. A usage example: {code} df.saveAsTable( "testcreate2", "org.apache.spark.sql.jdbc", org.apache.spark.sql.SaveMode.Overwrite, Map("url" -> s"$url", "dbtable" -> "testcreate2", "user" -> "xx", "password" -> "xx", "driver" -> "com.h2.Driver") ) {code} If the table does not exist, this should create the table and write the DataFrame content to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7661) Support for dynamic allocation of executors in Kinesis Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Murtaza Kanchwala updated SPARK-7661: - Description: Currently the no. of cores is (N + 1), where N is no. of shards in a Kinesis Stream. My Requirement is that if I use this Resharding util for Amazon Kinesis : Amazon Kinesis Resharding : https://github.com/awslabs/amazon-kinesis-scaling-utils Then there should be some way to allocate executors on the basis of no. of shards directly (for Spark Streaming only). was: Currently the logic for the no. of executors is (N + 1), where N is no. of shards in a Kinesis Stream. My Requirement is that if I use this Resharding util for Amazon Kinesis : Amazon Kinesis Resharding : https://github.com/awslabs/amazon-kinesis-scaling-utils Then there should be some way to allocate executors on the basis of no. of shards directly (for Spark Streaming only). Support for dynamic allocation of executors in Kinesis Spark Streaming -- Key: SPARK-7661 URL: https://issues.apache.org/jira/browse/SPARK-7661 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.3.1 Environment: AWS-EMR Reporter: Murtaza Kanchwala Currently the no. of cores is (N + 1), where N is no. of shards in a Kinesis Stream. My Requirement is that if I use this Resharding util for Amazon Kinesis : Amazon Kinesis Resharding : https://github.com/awslabs/amazon-kinesis-scaling-utils Then there should be some way to allocate executors on the basis of no. of shards directly (for Spark Streaming only). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7655) Akka timeout exception
[ https://issues.apache.org/jira/browse/SPARK-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-7655. Resolution: Fixed Fix Version/s: 1.4.0 Akka timeout exception -- Key: SPARK-7655 URL: https://issues.apache.org/jira/browse/SPARK-7655 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: Yin Huai Assignee: Shixiong Zhu Priority: Blocker Fix For: 1.4.0 I got the following exception when I was running a query with broadcast join. {code} 15/05/15 01:15:49 [WARN] AkkaRpcEndpointRef: Error sending message [message = UpdateBlockInfo(BlockManagerId(driver, 10.0.171.162, 54870),broadcast_758_piece0,StorageLevel(false, false, false, false, 1),0,0,0)] in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:58) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:374) at org.apache.spark.storage.BlockManager.reportBlockStatus(BlockManager.scala:350) at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1107) at org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1083) at org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1083) at scala.collection.immutable.Set$Set2.foreach(Set.scala:94) at org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1083) at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65) at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:78) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7655) Akka timeout exception from ask and table broadcast
[ https://issues.apache.org/jira/browse/SPARK-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-7655: --- Summary: Akka timeout exception from ask and table broadcast (was: Akka timeout exception) Akka timeout exception from ask and table broadcast --- Key: SPARK-7655 URL: https://issues.apache.org/jira/browse/SPARK-7655 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: Yin Huai Assignee: Shixiong Zhu Priority: Blocker Fix For: 1.4.0 I got the following exception when I was running a query with broadcast join. {code} 15/05/15 01:15:49 [WARN] AkkaRpcEndpointRef: Error sending message [message = UpdateBlockInfo(BlockManagerId(driver, 10.0.171.162, 54870),broadcast_758_piece0,StorageLevel(false, false, false, false, 1),0,0,0)] in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:58) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:374) at org.apache.spark.storage.BlockManager.reportBlockStatus(BlockManager.scala:350) at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1107) at org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1083) at org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1083) at scala.collection.immutable.Set$Set2.foreach(Set.scala:94) at org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1083) at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65) at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65) at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:78) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7654) DataFrameReader and DataFrameWriter for input/output API
[ https://issues.apache.org/jira/browse/SPARK-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546647#comment-14546647 ] Apache Spark commented on SPARK-7654: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6211 DataFrameReader and DataFrameWriter for input/output API Key: SPARK-7654 URL: https://issues.apache.org/jira/browse/SPARK-7654 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Blocker We have a proliferation of save options now. It'd make more sense to have a builder pattern for write. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7661) Support for dynamic allocation of executors in Kinesis Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546598#comment-14546598 ] Murtaza Kanchwala commented on SPARK-7661: -- OK, I'll correct my terms. My case is exactly like this one: https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3c30d8e3e3-95db-492b-8b49-73a99d587...@gmail.com%3E The difference is that the number of shards is updated by this utility provided by AWS, and when the number of shards increases, my Spark Streaming consumer gets hung up and goes into a waiting state. Support for dynamic allocation of executors in Kinesis Spark Streaming -- Key: SPARK-7661 URL: https://issues.apache.org/jira/browse/SPARK-7661 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.3.1 Environment: AWS-EMR Reporter: Murtaza Kanchwala Currently the no. of cores is (N + 1), where N is no. of shards in a Kinesis Stream. My Requirement is that if I use this Resharding util for Amazon Kinesis : Amazon Kinesis Resharding : https://github.com/awslabs/amazon-kinesis-scaling-utils Then there should be some way to allocate executors on the basis of no. of shards directly (for Spark Streaming only). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7671) Fix wrong URLs in MLlib Data Types Documentation
[ https://issues.apache.org/jira/browse/SPARK-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-7671. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6196 [https://github.com/apache/spark/pull/6196] Fix wrong URLs in MLlib Data Types Documentation Key: SPARK-7671 URL: https://issues.apache.org/jira/browse/SPARK-7671 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Environment: Ubuntu 14.04. Apache Mesos in cluster mode with HDFS from cloudera 2.6.0-cdh5.4.0. Reporter: Favio Vázquez Priority: Trivial Labels: Documentation,, Fix, MLlib,, URL Fix For: 1.4.0 There is a mistake in the URL of Matrices in the MLlib Data Types documentation (Local matrix scala section), the URL points to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices which is a mistake, since Matrices is an object that implements factory methods for Matrix that does not have a companion class. The correct link should point to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$ There is another mistake, in the Local Vector section in Scala, Java and Python In the Scala section the URL of Vectors points to the trait Vector (https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and not to the factory methods implemented in Vectors. The correct link should be: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$ In the Java section the URL of Vectors points to the Interface Vector (https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html) and not to the Class Vectors The correct link should be: https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vectors.html In the Python section the URL of Vectors points to the class Vector (https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vector) and not the Class Vectors The correct link should be: https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7671) Fix wrong URLs in MLlib Data Types Documentation
[ https://issues.apache.org/jira/browse/SPARK-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7671: - Assignee: Favio Vázquez Fix wrong URLs in MLlib Data Types Documentation Key: SPARK-7671 URL: https://issues.apache.org/jira/browse/SPARK-7671 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Environment: Ubuntu 14.04. Apache Mesos in cluster mode with HDFS from cloudera 2.6.0-cdh5.4.0. Reporter: Favio Vázquez Assignee: Favio Vázquez Priority: Trivial Labels: Documentation,, Fix, MLlib,, URL Fix For: 1.4.0 There is a mistake in the URL of Matrices in the MLlib Data Types documentation (Local matrix scala section), the URL points to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices which is a mistake, since Matrices is an object that implements factory methods for Matrix that does not have a companion class. The correct link should point to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$ There is another mistake, in the Local Vector section in Scala, Java and Python In the Scala section the URL of Vectors points to the trait Vector (https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and not to the factory methods implemented in Vectors. The correct link should be: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$ In the Java section the URL of Vectors points to the Interface Vector (https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html) and not to the Class Vectors The correct link should be: https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vectors.html In the Python section the URL of Vectors points to the class Vector (https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vector) and not the Class Vectors The correct link should be: https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546628#comment-14546628 ] Sean Owen commented on SPARK-7670: -- I can't reproduce this. Master builds fine for me with the same commands. Failure when building with scala 2.11 (after 1.3.1 -- Key: SPARK-7670 URL: https://issues.apache.org/jira/browse/SPARK-7670 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.4.0 Reporter: Fernando Ruben Otero Fix For: 1.4.0 When trying to build spark with scala 2.11 on revision c64ff8036cc6bc7c87743f4c751d7fe91c2e366a (the one on master when I'm submitting this issue) I'm getting export MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -Dhadoop.version=2.6.0 -DskipTests clean install ... ... ... [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 --- /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type protected Type type() { return Type.UPLOAD_BLOCK; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type protected Type type() { return Type.STREAM_HANDLE; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type protected Type type() { return Type.REGISTER_EXECUTOR; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type protected Type type() { return Type.OPEN_BLOCKS; } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7654) DataFrameReader and DataFrameWriter for input/output API
[ https://issues.apache.org/jira/browse/SPARK-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546637#comment-14546637 ] Reynold Xin commented on SPARK-7654: TODOs: - Move insertInto also into write. - Python API. - Update usage everywhere outside SQL. - Update programming guide. DataFrameReader and DataFrameWriter for input/output API Key: SPARK-7654 URL: https://issues.apache.org/jira/browse/SPARK-7654 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Priority: Blocker We have a proliferation of save options now. It'd make more sense to have a builder pattern for write. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
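As a small illustration of the first TODO (an assumption about the final shape, based on the 1.4 DataFrameWriter API), insertInto becomes one more builder call on write; the table name and {{df}} below are placeholders.
{code}
// Appends the DataFrame's rows to an existing table through the writer builder.
df.write.insertInto("events_table")
{code}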
[jira] [Created] (SPARK-7682) Size of distributed grids still limited by cPickle
Toby Potter created SPARK-7682: -- Summary: Size of distributed grids still limited by cPickle Key: SPARK-7682 URL: https://issues.apache.org/jira/browse/SPARK-7682 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.3.1 Environment: Redhat Enterprise Linux 6.5, Spark 1.3.1 standalone in cluster mode, 2 nodes with 64 GB spark slaves, Python 2.7.6 Reporter: Toby Potter Priority: Minor I'm trying to explore the possibilities of writing a fault-tolerant distributed computing engine for multidimensional arrays. I'm finding that the Python cPickle serializer is limiting the size of Numpy arrays that I can distribute over the cluster. My example code is below #!/usr/bin/env python # Python app to use Spark from pyspark import SparkContext, SparkConf import numpy appName = "Spark Test App" # Create a spark context conf = SparkConf().setAppName(appName) # Set memory conf = conf.set("spark.executor.memory", "32g") sc = SparkContext(conf=conf) # Make array grid = numpy.zeros((1024, 1024, 1024)) # Now parallelise and persist the data rdd = sc.parallelize([("srcw", grid)]) # Make the data persist in memory rdd.persist() When I run the code I get the following error Traceback (most recent call last): File "test_app.py", line 20, in <module> rdd = sc.parallelize([("srcw", grid)]) File "/spark/1.3.1/python/pyspark/context.py", line 341, in parallelize serializer.dump_stream(c, tempFile) File "/spark/1.3.1/python/pyspark/serializers.py", line 208, in dump_stream self.serializer.dump_stream(self._batched(iterator), stream) File "/spark/1.3.1/python/pyspark/serializers.py", line 127, in dump_stream self._write_with_length(obj, stream) File "/spark/1.3.1/python/pyspark/serializers.py", line 137, in _write_with_length serialized = self.dumps(obj) File "/spark/1.3.1/python/pyspark/serializers.py", line 403, in dumps return cPickle.dumps(obj, 2) SystemError: error return without exception set -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5948) Support writing to partitioned table for the Parquet data source
[ https://issues.apache.org/jira/browse/SPARK-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5948: - Assignee: Michael Armbrust Support writing to partitioned table for the Parquet data source Key: SPARK-5948 URL: https://issues.apache.org/jira/browse/SPARK-5948 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Lian Assignee: Michael Armbrust Priority: Blocker Fix For: 1.4.0 In 1.3.0, we added support for reading partitioned tables declared in Hive metastore for the Parquet data source. However, writing to partitioned tables is not supported yet. This feature should probably built upon SPARK-5947. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
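A hedged sketch of what writing to a partitioned table looks like from the user's side once this lands, using the 1.4 {{DataFrameWriter.partitionBy}} call; {{df}}, the column names, and paths are placeholders, and an existing {{sqlContext}} is assumed.
{code}
// Writes a Hive-style layout such as /data/events/year=2015/month=5/part-*.parquet
df.write.partitionBy("year", "month").parquet("/data/events")

// Partition discovery (SPARK-5947) turns the directory structure back into year/month columns on read.
val events = sqlContext.read.parquet("/data/events")
{code}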
[jira] [Updated] (SPARK-5281) Registering table on RDD is giving MissingRequirementError
[ https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5281: - Assignee: Iulian Dragos Registering table on RDD is giving MissingRequirementError -- Key: SPARK-5281 URL: https://issues.apache.org/jira/browse/SPARK-5281 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0, 1.3.1 Reporter: sarsol Assignee: Iulian Dragos Priority: Critical Fix For: 1.4.0 Application crashes on this line {{rdd.registerTempTable(temp)}} in 1.2 version when using sbt or Eclipse SCALA IDE Stacktrace: {code} Exception in thread main scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with primordial classloader with boot classpath [C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-library.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-reflect.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-actor.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-swing.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-compiler.jar;C:\Program Files\Java\jre7\lib\resources.jar;C:\Program Files\Java\jre7\lib\rt.jar;C:\Program Files\Java\jre7\lib\sunrsasign.jar;C:\Program Files\Java\jre7\lib\jsse.jar;C:\Program Files\Java\jre7\lib\jce.jar;C:\Program Files\Java\jre7\lib\charsets.jar;C:\Program Files\Java\jre7\lib\jfr.jar;C:\Program Files\Java\jre7\classes] not found. at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335) at scala.reflect.api.Universe.typeOf(Universe.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33) at org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94) at org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33) at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111) at com.sar.spark.dq.poc.SparkPOC$delayedInit$body.apply(SparkPOC.scala:43) at scala.Function0$class.apply$mcV$sp(Function0.scala:40) at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12) at scala.App$$anonfun$main$1.apply(App.scala:71) at 
scala.App$$anonfun$main$1.apply(App.scala:71) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32) at scala.App$class.main(App.scala:71) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5632) not able to resolve dot('.') in field name
[ https://issues.apache.org/jira/browse/SPARK-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5632: - Assignee: Wenchen Fan not able to resolve dot('.') in field name -- Key: SPARK-5632 URL: https://issues.apache.org/jira/browse/SPARK-5632 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.2.0, 1.3.0 Environment: Spark cluster: EC2 m1.small + Spark 1.2.0 Cassandra cluster: EC2 m3.xlarge + Cassandra 2.1.2 Reporter: Lishu Liu Assignee: Wenchen Fan Priority: Blocker Fix For: 1.4.0 My cassandra table task_trace has a field sm.result which contains dot in the name. So SQL tried to look up sm instead of full name 'sm.result'. Here is my code: {code} scala import org.apache.spark.sql.cassandra.CassandraSQLContext scala val cc = new CassandraSQLContext(sc) scala val task_trace = cc.jsonFile(/task_trace.json) scala task_trace.registerTempTable(task_trace) scala cc.setKeyspace(cerberus_data_v4) scala val res = cc.sql(SELECT received_datetime, task_body.cerberus_id, task_body.sm.result FROM task_trace WHERE task_id = 'fff7304e-9984-4b45-b10c-0423a96745ce') res: org.apache.spark.sql.SchemaRDD = SchemaRDD[57] at RDD at SchemaRDD.scala:108 == Query Plan == == Physical Plan == java.lang.RuntimeException: No such struct field sm in cerberus_batch_id, cerberus_id, couponId, coupon_code, created, description, domain, expires, message_id, neverShowAfter, neverShowBefore, offerTitle, screenshots, sm.result, sm.task, startDate, task_id, url, uuid, validationDateTime, validity {code} The full schema look like this: {code} scala task_trace.printSchema() root \|-- received_datetime: long (nullable = true) \|-- task_body: struct (nullable = true) \|\|-- cerberus_batch_id: string (nullable = true) \|\|-- cerberus_id: string (nullable = true) \|\|-- couponId: integer (nullable = true) \|\|-- coupon_code: string (nullable = true) \|\|-- created: string (nullable = true) \|\|-- description: string (nullable = true) \|\|-- domain: string (nullable = true) \|\|-- expires: string (nullable = true) \|\|-- message_id: string (nullable = true) \|\|-- neverShowAfter: string (nullable = true) \|\|-- neverShowBefore: string (nullable = true) \|\|-- offerTitle: string (nullable = true) \|\|-- screenshots: array (nullable = true) \|\|\|-- element: string (containsNull = false) \|\|-- sm.result: struct (nullable = true) \|\|\|-- cerberus_batch_id: string (nullable = true) \|\|\|-- cerberus_id: string (nullable = true) \|\|\|-- code: string (nullable = true) \|\|\|-- couponId: integer (nullable = true) \|\|\|-- created: string (nullable = true) \|\|\|-- description: string (nullable = true) \|\|\|-- domain: string (nullable = true) \|\|\|-- expires: string (nullable = true) \|\|\|-- message_id: string (nullable = true) \|\|\|-- neverShowAfter: string (nullable = true) \|\|\|-- neverShowBefore: string (nullable = true) \|\|\|-- offerTitle: string (nullable = true) \|\|\|-- result: struct (nullable = true) \|\|\|\|-- post: struct (nullable = true) \|\|\|\|\|-- alchemy_out_of_stock: struct (nullable = true) \|\|\|\|\|\|-- ci: double (nullable = true) \|\|\|\|\|\|-- value: boolean (nullable = true) \|\|\|\|\|-- meta: struct (nullable = true) \|\|\|\|\|\|-- None_tx_value: array (nullable = true) \|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|\|-- exceptions: array (nullable = true) \|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|\|-- no_input_value: array (nullable = true) \|\|\|\|\|\|\|-- element: string (containsNull = false) 
\|\|\|\|\|\|-- not_mapped: array (nullable = true) \|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|\|-- not_transformed: array (nullable = true) \|\|\|\|\|\|\|-- element: array (containsNull = false) \|\|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|-- now_price_checkout: struct (nullable = true) \|\|\|\|\|\|-- ci: double (nullable = true) \|\|\|\|\|\|-- value: double (nullable = true) \|\|\|\|\|-- shipping_price: struct (nullable = true) \|\|\|\|\|
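For the query in the SPARK-5632 report above, a hedged illustration of quoting the dotted field name with backticks so the analyzer does not split it on the dot (assuming the backtick-quoting syntax Spark SQL supports for such names, and reusing the reporter's {{cc}} context):
{code}
val res = cc.sql(
  """SELECT received_datetime, task_body.cerberus_id, task_body.`sm.result`
    |FROM task_trace
    |WHERE task_id = 'fff7304e-9984-4b45-b10c-0423a96745ce'""".stripMargin)
{code}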
[jira] [Updated] (SPARK-4699) Make caseSensitive configurable in Analyzer.scala
[ https://issues.apache.org/jira/browse/SPARK-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4699: - Assignee: Fei Wang Make caseSensitive configurable in Analyzer.scala - Key: SPARK-4699 URL: https://issues.apache.org/jira/browse/SPARK-4699 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Jacky Li Assignee: Fei Wang Fix For: 1.4.0 Currently, case sensitivity is true by default in Analyzer. It should be configurable by setting SQLConf in the client application -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5947) First class partitioning support in data sources API
[ https://issues.apache.org/jira/browse/SPARK-5947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5947: - Assignee: Michael Armbrust First class partitioning support in data sources API Key: SPARK-5947 URL: https://issues.apache.org/jira/browse/SPARK-5947 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Lian Assignee: Michael Armbrust Priority: Blocker Fix For: 1.4.0 For file system based data sources, implementing Hive style partitioning support can be complex and error prone. To be specific, partitioning support include: # Partition discovery: Given a directory organized similar to Hive partitions, discover the directory structure and partitioning information automatically, including partition column names, data types, and values. # Reading from partitioned tables # Writing to partitioned tables It would be good to have first class partitioning support in the data sources API. For example, add a {{FileBasedScan}} trait with callbacks and default implementations for these features. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6734) Support GenericUDTF.close for Generate
[ https://issues.apache.org/jira/browse/SPARK-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6734: - Assignee: Cheng Hao Support GenericUDTF.close for Generate -- Key: SPARK-6734 URL: https://issues.apache.org/jira/browse/SPARK-6734 Project: Spark Issue Type: Bug Components: SQL Reporter: Cheng Hao Assignee: Cheng Hao Fix For: 1.4.0 Some third-party UDTF extension, will generate more rows in the GenericUDTF.close() method, which is supported by Hive. https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF However, Spark SQL ignores the GenericUDTF.close(), and it causes bug while porting job from Hive to Spark SQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7109) Push down left side filter for left semi join
[ https://issues.apache.org/jira/browse/SPARK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7109: - Assignee: Fei Wang Push down left side filter for left semi join - Key: SPARK-7109 URL: https://issues.apache.org/jira/browse/SPARK-7109 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Assignee: Fei Wang Fix For: 1.4.0 now in spark sql optimizer we only push down right side filter, actually we can push down left side filter for left semi join -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
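A small illustration of the optimization described above, with hypothetical tables {{a(key, value)}} and {{b(key)}} and an existing {{sqlContext}}: the predicate touches only the left side, so it can be evaluated below the LEFT SEMI JOIN instead of on the join output.
{code}
// a.value > 10 references only the left relation, so the optimizer can push it beneath the join.
sqlContext.sql("SELECT * FROM a LEFT SEMI JOIN b ON a.key = b.key WHERE a.value > 10")
{code}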
[jira] [Updated] (SPARK-6439) Show per-task metrics when you hover over a task in the web UI visualization
[ https://issues.apache.org/jira/browse/SPARK-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6439: - Assignee: Kay Ousterhout Show per-task metrics when you hover over a task in the web UI visualization Key: SPARK-6439 URL: https://issues.apache.org/jira/browse/SPARK-6439 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Assignee: Kay Ousterhout Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6418) Add simple per-stage visualization to the UI
[ https://issues.apache.org/jira/browse/SPARK-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6418: - Assignee: Kousuke Saruta Add simple per-stage visualization to the UI Key: SPARK-6418 URL: https://issues.apache.org/jira/browse/SPARK-6418 Project: Spark Issue Type: Sub-task Components: Web UI Reporter: Kay Ousterhout Assignee: Kousuke Saruta Fix For: 1.4.0 Attachments: Screen Shot 2015-03-18 at 6.13.04 PM.png Visualizing how tasks in a stage spend their time can be very helpful to understanding performance. Many folks have started using the visualization tools here: https://github.com/kayousterhout/trace-analysis (see the README at the bottom) to analyze their jobs after they've finished running, but it would be great if this functionality were natively integrated into Spark's UI. I'd propose adding a relatively simple visualization to the stage detail page, that's hidden by default but that users can view by clicking on a drop-down menu. The plan is to implement this using D3; a mock up of how this would look (that uses D3) is attached. One change we'll make for the initial implementation, compared to the attached visualization, is tasks will be sorted by start time. This is intended to be a much simpler and more limited version of SPARK-3468 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7437) Fold literal in (item1, item2, ..., literal, ...) into true or false directly
[ https://issues.apache.org/jira/browse/SPARK-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7437: - Assignee: Zhongshuai Pei Fold literal in (item1, item2, ..., literal, ...) into true or false directly --- Key: SPARK-7437 URL: https://issues.apache.org/jira/browse/SPARK-7437 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Zhongshuai Pei Assignee: Zhongshuai Pei Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
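The description is empty, so the following is only a reading of the summary (an assumption): when the tested value and a matching list item are both literals, the IN predicate can be folded to a constant at optimization time rather than evaluated per row. Table {{t}} and its columns are hypothetical, and an existing {{sqlContext}} is assumed.
{code}
sqlContext.sql("SELECT * FROM t WHERE 1 IN (1, 2, key)")  // the literal 1 matches a literal item: folds to TRUE
sqlContext.sql("SELECT * FROM t WHERE 3 IN (1, 2)")       // no literal item can match: folds to FALSE
{code}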
[jira] [Updated] (SPARK-7504) NullPointerException when initializing SparkContext in YARN-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7504: - Assignee: Zoltán Zvara NullPointerException when initializing SparkContext in YARN-cluster mode Key: SPARK-7504 URL: https://issues.apache.org/jira/browse/SPARK-7504 Project: Spark Issue Type: Bug Components: Deploy, YARN Reporter: Zoltán Zvara Assignee: Zoltán Zvara Labels: deployment, yarn, yarn-client Fix For: 1.4.0 It is not clear for most users that, while running Spark on YARN a {{SparkContext}} with a given execution plan can be run locally as {{yarn-client}}, but can not deploy itself to the cluster. This is currently performed using {{org.apache.spark.deploy.yarn.Client}}. {color:gray} I think we should support deployment through {{SparkContext}}, but this is not the point I wish to make here. {color} Configuring a {{SparkContext}} to deploy itself currently will yield an {{ERROR}} while accessing {{spark.yarn.app.id}} in {{YarnClusterSchedulerBackend}}, and after that a {{NullPointerException}} while referencing the {{ApplicationMaster}} instance. Spark should clearly inform the user that it might be running in {{yarn-cluster}} mode without a proper submission using {{Client}} and that deploying is not supported directly from {{SparkContext}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7595) Window will cause resolve failed with self join
[ https://issues.apache.org/jira/browse/SPARK-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7595: - Assignee: Weizhong Window will cause resolve failed with self join --- Key: SPARK-7595 URL: https://issues.apache.org/jira/browse/SPARK-7595 Project: Spark Issue Type: Bug Components: SQL Reporter: Weizhong Assignee: Weizhong Priority: Minor Fix For: 1.4.0 for example: table: src(key string, value string) sql: with v1 as(select key, count(value) over (partition by key) cnt_val from src), v2 as(select v1.key, v1_lag.cnt_val from v1, v1 v1_lag where v1.key = v1_lag.key) select * from v2 limit 5; then will analyze fail when resolving conflicting references in Join: 'Limit 5 'Project [*] 'Subquery v2 'Project ['v1.key,'v1_lag.cnt_val] 'Filter ('v1.key = 'v1_lag.key) 'Join Inner, None Subquery v1 Project [key#95,cnt_val#94L] Window [key#95,value#96], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(value#96) WindowSpecDefinition [key#95], [], ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS cnt_val#94L], WindowSpecDefinition [key#95], [], ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING Project [key#95,value#96] MetastoreRelation default, src, None Subquery v1_lag Subquery v1 Project [key#97,cnt_val#94L] Window [key#97,value#98], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(value#98) WindowSpecDefinition [key#97], [], ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS cnt_val#94L], WindowSpecDefinition [key#97], [], ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING Project [key#97,value#98] MetastoreRelation default, src, None Conflicting attributes: cnt_val#94L -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7598) Add aliveWorkers metrics in Master
[ https://issues.apache.org/jira/browse/SPARK-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7598: - Assignee: Rex Xiong Add aliveWorkers metrics in Master -- Key: SPARK-7598 URL: https://issues.apache.org/jira/browse/SPARK-7598 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.3.1 Reporter: Rex Xiong Assignee: Rex Xiong Priority: Minor Fix For: 1.4.0 In Spark Standalone setup, when some workers are DEAD, they will stay in master worker list for a while. master.workers metrics for master is only showing the total number of workers, we need to monitor how many real ALIVE workers are there to ensure the cluster is healthy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7601) Support Insert into JDBC Datasource
[ https://issues.apache.org/jira/browse/SPARK-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7601: - Assignee: Venkata Ramana G Support Insert into JDBC Datasource --- Key: SPARK-7601 URL: https://issues.apache.org/jira/browse/SPARK-7601 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Venkata Ramana G Assignee: Venkata Ramana G Fix For: 1.4.0 Support Insert into JDBCDataSource. Following are usage examples {code} sqlContext.sql( s |CREATE TEMPORARY TABLE testram1 |USING org.apache.spark.sql.jdbc |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver') .stripMargin.replaceAll(\n, )) sqlContext.sql(insert into table testram1 select * from testsrc).show {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
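A hedged reconstruction of the example above, assuming the quote characters of a standard interpolated multi-line Scala string were dropped in transit:
{code}
sqlContext.sql(
  s"""
     |CREATE TEMPORARY TABLE testram1
     |USING org.apache.spark.sql.jdbc
     |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver')
   """.stripMargin.replaceAll("\n", " "))

sqlContext.sql("insert into table testram1 select * from testsrc").show
{code}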
[jira] [Updated] (SPARK-5782) Python Worker / Pyspark Daemon Memory Issue
[ https://issues.apache.org/jira/browse/SPARK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5782: - Priority: Major (was: Blocker) Python Worker / Pyspark Daemon Memory Issue --- Key: SPARK-5782 URL: https://issues.apache.org/jira/browse/SPARK-5782 Project: Spark Issue Type: Bug Components: PySpark, Shuffle Affects Versions: 1.2.1, 1.2.2, 1.3.0 Environment: CentOS 7, Spark Standalone Reporter: Mark Khaitman I'm including the Shuffle component on this, as a brief scan through the code (which I'm not 100% familiar with just yet) shows a large amount of memory handling in it: It appears that any type of join between two RDDs spawns up twice as many pyspark.daemon workers compared to the default 1 task - 1 core configuration in our environment. This can become problematic in the cases where you build up a tree of RDD joins, since the pyspark.daemons do not cease to exist until the top level join is completed (or so it seems)... This can lead to memory exhaustion by a single framework, even though is set to have a 512MB python worker memory limit and few gigs of executor memory. Another related issue to this is that the individual python workers are not supposed to even exceed that far beyond 512MB, otherwise they're supposed to spill to disk. Some of our python workers are somehow reaching 2GB each (which when multiplied by the number of cores per executor * the number of joins occurring in some cases), causing the Out-of-Memory killer to step up to its unfortunate job! :( I think with the _next_limit method in shuffle.py, if the current memory usage is close to the memory limit, then a 1.05 multiplier can endlessly cause more memory to be consumed by the single python worker, since the max of (512 vs 511 * 1.05) would end up blowing up towards the latter of the two... Shouldn't the memory limit be the absolute cap in this case? I've only just started looking into the code, and would definitely love to contribute towards Spark, though I figured it might be quicker to resolve if someone already owns the code! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7269) Incorrect aggregation analysis
[ https://issues.apache.org/jira/browse/SPARK-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7269: - Priority: Major (was: Blocker) Incorrect aggregation analysis -- Key: SPARK-7269 URL: https://issues.apache.org/jira/browse/SPARK-7269 Project: Spark Issue Type: Bug Components: SQL Reporter: Cheng Hao In a case insensitive analyzer (HiveContext), the attribute name captial differences will fail the analysis check for aggregation. {code} test(check analysis failed in case in-sensitive) { Seq(1,2,3).map(i = (i, i.toString)).toDF(key, value).registerTempTable(df_analysis) sql(SELECT kEy from df_analysis group by key) } {code} {noformat} expression 'kEy' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.; org.apache.spark.sql.AnalysisException: expression 'kEy' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.; at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:39) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:85) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$4.apply(CheckAnalysis.scala:101) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$4.apply(CheckAnalysis.scala:101) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:101) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:89) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:39) at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:1121) at org.apache.spark.sql.DataFrame.init(DataFrame.scala:133) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:97) at org.apache.spark.sql.hive.execution.SQLQuerySuite$$anonfun$15.apply$mcV$sp(SQLQuerySuite.scala:408) at org.apache.spark.sql.hive.execution.SQLQuerySuite$$anonfun$15.apply(SQLQuerySuite.scala:406) at org.apache.spark.sql.hive.execution.SQLQuerySuite$$anonfun$15.apply(SQLQuerySuite.scala:406) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) {noformat} -- This message was sent by Atlassian JIRA 
(v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
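For reference, a minimal, self-contained sketch of the behavior the aggregation check should have (this is not Spark's actual CheckAnalysis code): with the case-insensitive resolver that HiveContext uses, the select-list name kEy should match the grouping column key. The Resolver type and the column names below are illustrative.
{code}
object CaseInsensitiveGroupByCheck {
  // A resolver decides whether two attribute names refer to the same column.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver   = (a: String, b: String) => a == b
  val caseInsensitive: Resolver = (a: String, b: String) => a.equalsIgnoreCase(b)

  /** Returns the select-list names that are neither grouped on nor aggregated. */
  def invalidAggregateRefs(selectNames: Seq[String],
                           groupByNames: Seq[String],
                           resolver: Resolver): Seq[String] =
    selectNames.filterNot(s => groupByNames.exists(g => resolver(s, g)))

  def main(args: Array[String]): Unit = {
    val select  = Seq("kEy")
    val groupBy = Seq("key")
    // With a case-sensitive resolver the query is (wrongly, for HiveContext) rejected:
    println(invalidAggregateRefs(select, groupBy, caseSensitive))   // List(kEy)
    // With the analyzer's case-insensitive resolver it passes:
    println(invalidAggregateRefs(select, groupBy, caseInsensitive)) // List()
  }
}
{code}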
[jira] [Updated] (SPARK-6680) Be able to specify IP for spark-shell (spark driver), blocker for Docker integration
[ https://issues.apache.org/jira/browse/SPARK-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6680: - Priority: Minor (was: Blocker) Be able to specify IP for spark-shell (spark driver), blocker for Docker integration --- Key: SPARK-6680 URL: https://issues.apache.org/jira/browse/SPARK-6680 Project: Spark Issue Type: New Feature Components: Deploy Affects Versions: 1.3.0 Environment: Docker. Reporter: Egor Pakhomov Priority: Minor Labels: core, deploy, docker Suppose I have 3 Docker containers - spark_master, spark_worker and spark_shell. In Docker, the public IP of a container has an alias like fgsdfg454534 that is only visible inside that container. When Spark uses it for communication, other containers receive this alias and don't know what to do with it. That's why I used SPARK_LOCAL_IP for the master and worker. But it doesn't work for the Spark driver (for spark-shell - I haven't tried other types of drivers). The Spark driver advertises the fgsdfg454534 alias for itself, and then nobody can address it. I've worked around it in https://github.com/epahomov/docker-spark, but it would be better if this were solved at the Spark code level. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
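A sketch of how the driver's advertised address can be pinned from the application side, using the spark.driver.host property (the address executors use to call back the driver). The master URL, container name and IP below are placeholders; this illustrates a workaround, not the code-level fix the reporter asks for.
{code}
import org.apache.spark.SparkConf

object DriverHostConfSketch {
  def main(args: Array[String]): Unit = {
    // Placeholders: "spark_master" is the master container's name, "172.17.0.5" is an
    // address that the other containers can actually reach (not the in-container alias).
    val conf = new SparkConf()
      .setMaster("spark://spark_master:7077")
      .setAppName("docker-driver-host-example")
      .set("spark.driver.host", "172.17.0.5") // what executors use to reach the driver
    conf.getAll.foreach { case (k, v) => println(s"$k = $v") }
    // new SparkContext(conf) would then advertise 172.17.0.5 instead of the Docker alias.
  }
}
{code}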
[jira] [Updated] (SPARK-7119) ScriptTransform doesn't consider the output data type
[ https://issues.apache.org/jira/browse/SPARK-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7119: - Priority: Major (was: Blocker) ScriptTransform doesn't consider the output data type - Key: SPARK-7119 URL: https://issues.apache.org/jira/browse/SPARK-7119 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0, 1.3.1, 1.4.0 Reporter: Cheng Hao {panel} from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2; {panel} {panel} 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57) at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {panel} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
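The ClassCastException suggests the script's stdout comes back as strings but is projected as if it already had the declared types. Below is a minimal sketch, deliberately independent of Spark internals, of converting the raw fields to the declared output schema before they feed expressions such as thing1 + 2; the ColumnType model and convertField helper are illustrative, not Spark classes.
{code}
object ScriptOutputConversion {
  sealed trait ColumnType
  case object IntType    extends ColumnType
  case object StringType extends ColumnType

  // Convert one raw field from the script's stdout into the declared type.
  def convertField(raw: String, dt: ColumnType): Any = dt match {
    case IntType    => raw.toInt  // without this, using "238" as an Int fails with a cast error
    case StringType => raw
  }

  def main(args: Array[String]): Unit = {
    val declared = Seq(IntType, StringType)          // as (thing1 int, thing2 string)
    val rawRow   = "238\tval_238".split("\t").toSeq  // what `cat` echoes back
    val row      = rawRow.zip(declared).map { case (f, t) => convertField(f, t) }
    val thing1   = row.head.asInstanceOf[Int]
    println(thing1 + 2)                              // 240
  }
}
{code}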
[jira] [Resolved] (SPARK-7523) ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
[ https://issues.apache.org/jira/browse/SPARK-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-7523. -- Resolution: Invalid I think this should start as a discussion on the mailing list. It's not clear this is a Spark problem. ERROR LiveListenerBus: Listener EventLoggingListener threw an exception --- Key: SPARK-7523 URL: https://issues.apache.org/jira/browse/SPARK-7523 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.3.0 Environment: Prod Reporter: sagar Priority: Blocker Attachments: schema.txt, spark-0.0.1-SNAPSHOT.jar Hi Team, I am using CDH 5.4 with spark 1.3.0. I am getting below error while executing below command - I see jira's (SPARK-2906/SPARK-1407) specifying the issue is resolved, but i didnt get any solution what the fix for that. Can you pls guide/suggest as this is production issue. $ spark-submit --master local[4] --class org.sample.spark.SparkFilter --name Spark Sample Program spark-0.0.1-SNAPSHOT.jar /user/user1/schema.txt == 15/05/11 06:28:36 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144) at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144) at org.apache.spark.scheduler.EventLoggingListener.onJobEnd(EventLoggingListener.scala:169) at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:36) at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53) at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36) at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76) at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61) at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617) at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60) Caused by: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:792) at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1998) at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1959) at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130) ... 19 more == -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4452: - Priority: Major (was: Critical) Target Version/s: (was: 1.1.2, 1.2.1, 1.3.0) Shuffle data structures can starve others on the same thread for memory Key: SPARK-4452 URL: https://issues.apache.org/jira/browse/SPARK-4452 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Tianshuo Deng Assignee: Tianshuo Deng When an Aggregator is used with ExternalSorter in a task, Spark will create many small files and can hit a "too many open files" error during merging. Currently, ShuffleMemoryManager does not work well when there are 2 spillable objects in a thread, which in this case are ExternalSorter and ExternalAppendOnlyMap (used by Aggregator). Here is an example: due to the use of map-side aggregation, ExternalAppendOnlyMap is created first to read the RDD. It may ask for as much memory as it can, which is totalMem/numberOfThreads. Then, later on, when ExternalSorter is created in the same thread, the ShuffleMemoryManager can refuse to allocate more memory to it, since the memory has already been given to the previously requesting object (ExternalAppendOnlyMap). That causes the ExternalSorter to keep spilling small files (due to the lack of memory). I'm currently working on a PR to address these two issues. It will include the following changes: 1. The ShuffleMemoryManager should track not only the memory usage of each thread, but also the object that holds the memory. 2. The ShuffleMemoryManager should be able to trigger the spilling of a spillable object. In this way, if a new object in a thread requests memory, the old occupant can be evicted/spilled. Previously the spillable objects triggered spilling by themselves, so one might not trigger spilling even if another object in the same thread needed more memory. After this change, the ShuffleMemoryManager can trigger the spilling of an object if it needs to. 3. Make the iterator of ExternalAppendOnlyMap spillable. Previously ExternalAppendOnlyMap returned a destructive iterator and could not be spilled after the iterator was returned. This should be changed so that even after the iterator is returned, the ShuffleMemoryManager can still spill it. Currently, I have a working branch in progress: https://github.com/tsdeng/spark/tree/enhance_memory_manager. I have already made change 3 and have a prototype of changes 1 and 2 to evict spillables from the memory manager, still in progress. I will send a PR when it's done. Any feedback or thoughts on this change are highly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
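A toy sketch of proposed changes 1 and 2 (this is not Spark's ShuffleMemoryManager): memory is tracked per spillable object rather than per thread only, and the manager itself may force an earlier occupant to spill when a new object in the same thread asks for memory. The Spillable trait, acquire method and fixed budget are illustrative assumptions.
{code}
import scala.collection.mutable

object PerObjectMemoryManagerSketch {
  trait Spillable { def spill(): Long }  // returns bytes released

  class MemoryManager(budget: Long) {
    private val held = mutable.LinkedHashMap.empty[Spillable, Long]
    private def used: Long = held.values.sum

    /** Grant up to `requested` bytes, spilling other holders if needed. */
    def acquire(owner: Spillable, requested: Long): Long = synchronized {
      var free = budget - used
      // Change 2: the manager itself can evict earlier occupants (e.g. the
      // ExternalAppendOnlyMap) instead of starving the newly created ExternalSorter.
      val others = held.keys.filter(_ ne owner).toList
      for (victim <- others if free < requested) {
        val released = victim.spill()
        held(victim) = math.max(0L, held(victim) - released)
        free += released
      }
      val granted = math.min(requested, free)
      if (granted > 0) held(owner) = held.getOrElse(owner, 0L) + granted  // change 1: per-object tracking
      granted
    }
  }

  def main(args: Array[String]): Unit = {
    val mm     = new MemoryManager(budget = 100)
    val map    = new Spillable { def spill(): Long = 80 }  // stand-in for ExternalAppendOnlyMap
    val sorter = new Spillable { def spill(): Long = 0  }  // stand-in for ExternalSorter
    println(mm.acquire(map, 90))     // 90: the map grabs most of the budget first
    println(mm.acquire(sorter, 50))  // 50: the map is forced to spill instead of the sorter starving
  }
}
{code}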
[jira] [Updated] (SPARK-5205) Inconsistent behaviour between Streaming job and others, when click kill link in WebUI
[ https://issues.apache.org/jira/browse/SPARK-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5205: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Inconsistent behaviour between Streaming job and others, when click kill link in WebUI -- Key: SPARK-5205 URL: https://issues.apache.org/jira/browse/SPARK-5205 Project: Spark Issue Type: Bug Components: Streaming Reporter: uncleGen The kill link is used to kill a stage in a job. It works for any kind of Spark job except Spark Streaming. To be specific, we can only kill the stage that is used to run the Receiver, but we cannot kill the Receivers themselves. The stage can be killed and cleaned from the UI, but the receivers are still alive and receiving data. I think this does not fit common sense. IMHO, killing the receiver stage should mean killing the receivers and stopping the reception of data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4888) Spark EC2 doesn't mount local disks for i2.8xlarge instances
[ https://issues.apache.org/jira/browse/SPARK-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4888: - Target Version/s: 1.5.0 (was: 1.0.3, 1.1.2, 1.2.1, 1.3.0) Spark EC2 doesn't mount local disks for i2.8xlarge instances Key: SPARK-4888 URL: https://issues.apache.org/jira/browse/SPARK-4888 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 1.0.2, 1.1.1, 1.2.0 Reporter: Josh Rosen Priority: Critical When launching a cluster using {{spark-ec2}} with i2.8xlarge instances, the local disks aren't mounted. The AWS console doesn't show the disks as mounted, either. I think that the issue is that EC2 won't auto-mount the SSDs. We have some code that handles this for some of the {{r3*}} instance types, and I think the right fix is to extend this to the {{i2}} instance types, too: https://github.com/mesos/spark-ec2/blob/v4/setup-slave.sh#L37 /cc [~adav], who originally found this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6174) Improve doc: Python ALS, MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6174: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Improve doc: Python ALS, MatrixFactorizationModel - Key: SPARK-6174 URL: https://issues.apache.org/jira/browse/SPARK-6174 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Priority: Minor The Python docs for recommendation have almost no content except an example. Add class, method attribute descriptions -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4227) Document external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4227: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Document external shuffle service - Key: SPARK-4227 URL: https://issues.apache.org/jira/browse/SPARK-4227 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza Priority: Critical We should add spark.shuffle.service.enabled to the Configuration page and give instructions for launching the shuffle service as an auxiliary service on YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6266) PySpark SparseVector missing doc for size, indices, values
[ https://issues.apache.org/jira/browse/SPARK-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6266: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) PySpark SparseVector missing doc for size, indices, values -- Key: SPARK-6266 URL: https://issues.apache.org/jira/browse/SPARK-6266 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Priority: Minor Need to add doc for size, indices, values attributes -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6173) Python doc parity with Scala/Java in MLlib
[ https://issues.apache.org/jira/browse/SPARK-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6173: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Python doc parity with Scala/Java in MLlib -- Key: SPARK-6173 URL: https://issues.apache.org/jira/browse/SPARK-6173 Project: Spark Issue Type: Umbrella Components: Documentation, MLlib, PySpark Affects Versions: 1.3.0 Reporter: Joseph K. Bradley This is an umbrella JIRA for noting parts of the Python API in MLlib which are significantly less well-documented than the Scala/Java docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6270) Standalone Master hangs when streaming job completes
[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6270: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Standalone Master hangs when streaming job completes Key: SPARK-6270 URL: https://issues.apache.org/jira/browse/SPARK-6270 Project: Spark Issue Type: Bug Components: Deploy, Streaming Affects Versions: 1.2.0, 1.2.1, 1.3.0 Reporter: Tathagata Das Priority: Critical If the event logging is enabled, the Spark Standalone Master tries to recreate the web UI of a completed Spark application from its event logs. However if this event log is huge (e.g. for a Spark Streaming application), then the master hangs in its attempt to read and recreate the web ui. This hang causes the whole standalone cluster to be unusable. Workaround is to disable the event logging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
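The workaround from the description, in configuration form; spark.eventLog.enabled is the property that controls event logging. A minimal sketch:
{code}
import org.apache.spark.SparkConf

object DisableEventLogSketch {
  def main(args: Array[String]): Unit = {
    // Workaround from the description: with event logging off, the standalone Master
    // never tries to rebuild a (possibly huge) streaming application's UI from its log.
    val conf = new SparkConf()
      .setAppName("streaming-app")
      .set("spark.eventLog.enabled", "false")
    println(conf.get("spark.eventLog.enabled"))  // false
  }
}
{code}
The same setting can be passed on submission with --conf spark.eventLog.enabled=false.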
[jira] [Updated] (SPARK-6265) PySpark GLMs missing doc for intercept, weights
[ https://issues.apache.org/jira/browse/SPARK-6265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6265: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) PySpark GLMs missing doc for intercept, weights --- Key: SPARK-6265 URL: https://issues.apache.org/jira/browse/SPARK-6265 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Priority: Minor In PySpark MLlib, the GLMs (e.g., LinearRegressionModel) have no documentation for the intercept and weights attributes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6632) Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself
[ https://issues.apache.org/jira/browse/SPARK-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6632: - Fix Version/s: (was: 1.4.0) Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself --- Key: SPARK-6632 URL: https://issues.apache.org/jira/browse/SPARK-6632 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Yash Datta Currently in ParquetRelation2, schema from all the part files is first merged, and then reconciled with metastore schema. This approach does not scale in case we have thousands of partitions for the table. We can take a different approach where we can go ahead with the metastore schema, and reconcile the names of the columns within each map task , using ReadSupport hooks provided in parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver
[ https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7563: - Fix Version/s: (was: 1.4.0) OutputCommitCoordinator.stop() should only be executed in driver Key: SPARK-7563 URL: https://issues.apache.org/jira/browse/SPARK-7563 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo) Spark 1.3.1 Release Reporter: Hailong Wen Priority: Critical I am from IBM Platform Symphony team and we are integrating Spark 1.3.1 with EGO (a resource management product). In EGO we uses fine-grained dynamic allocation policy, and each Executor will exit after its tasks are all done. When testing *spark-shell*, we find that when executor of first job exit, it will stop OutputCommitCoordinator, which result in all future jobs failing. Details are as follows: We got the following error in executor when submitting job in *spark-shell* the second time (the first job submission is successful): {noformat} 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator Exception in thread main akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), Path(/user/OutputCommitCoordinator)] at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65) at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74) at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {noformat} And in driver side, we see a 
log message telling that the OutputCommitCoordinator is stopped after the first submission: {noformat} 15/05/11 04:01:23 INFO spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped! {noformat} We examine the code of OutputCommitCoordinator, and find that executor will reuse the ref of driver's OutputCommitCoordinatorActor. So when an executor exits, it will eventually call SparkEnv.stop(): {noformat} private[spark] def stop() { isStopped = true pythonWorkers.foreach { case(key, worker) = worker.stop() } Option(httpFileServer).foreach(_.stop()) mapOutputTracker.stop() shuffleManager.stop() broadcastManager.stop() blockManager.stop() blockManager.master.stop() metricsSystem.stop() outputCommitCoordinator.stop() --- actorSystem.shutdown() .. {noformat}
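A minimal sketch of the direction the title implies: the SparkEnv-style shutdown should only stop the coordinator when it runs in the driver, so an exiting executor cannot tear down the driver-side actor that later jobs still need. The Coordinator object and isDriver flag below are illustrative stand-ins, not Spark's actual classes.
{code}
object OutputCommitCoordinatorSketch {
  // Shared, driver-owned resource that executors also hold a reference to.
  object Coordinator {
    @volatile var stopped = false
    def stop(): Unit = { stopped = true }
  }

  // Stand-in for SparkEnv.stop(): only the driver's env may stop the coordinator.
  def envStop(isDriver: Boolean): Unit = {
    if (isDriver) Coordinator.stop()
    // ... components that really are per-process would still be stopped here ...
  }

  def main(args: Array[String]): Unit = {
    envStop(isDriver = false)      // an executor finishing its tasks and exiting
    println(Coordinator.stopped)   // false: later jobs can still reach the coordinator
    envStop(isDriver = true)       // the driver shutting down
    println(Coordinator.stopped)   // true
  }
}
{code}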
[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge
[ https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6378: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) srcAttr in graph.triplets don't update when the size of graph is huge - Key: SPARK-6378 URL: https://issues.apache.org/jira/browse/SPARK-6378 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.2.1 Reporter: zhangzhenyue when the size of the graph is huge(0.2 billion vertex, 6 billion edges), the srcAttr and dstAttr in graph.triplets don't update when using the Graph.outerJoinVertices(when the data in vertex is changed). the code and the log is as follows: {quote} g = graph.outerJoinVertices()... g,vertices,count() g.edges.count() println(example edge + g.triplets.filter(e = e.srcId == 51L).collect() .map(e =(e.srcId + : + e.srcAttr + , + e.dstId + : + e.dstAttr)).mkString(\n)) println(example vertex + g.vertices.filter(e = e._1 == 51L).collect() .map(e = (e._1 + , + e._2)).mkString(\n)) {quote} the result: {quote} example edge 51:0, 2467451620:61 51:0, 1962741310:83 // attr of vertex 51 is 0 in Graph.triplets example vertex 51,2 // attr of vertex 51 is 2 in Graph.vertices {quote} when the graph is smaller(10 million vertex), the code is OK, the triplets will update when the vertex is changed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6701) Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application
[ https://issues.apache.org/jira/browse/SPARK-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6701: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Flaky test: o.a.s.deploy.yarn.YarnClusterSuite Python application - Key: SPARK-6701 URL: https://issues.apache.org/jira/browse/SPARK-6701 Project: Spark Issue Type: Bug Components: Tests, YARN Affects Versions: 1.3.0 Reporter: Andrew Or Priority: Critical Observed in Master and 1.3, both in SBT and in Maven (with YARN). {code} Process List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, --master, yarn-cluster, --num-executors, 1, --properties-file, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) exited with code 1 sbt.ForkMain$ForkError: Process List(/home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop2.3/label/centos/bin/spark-submit, --master, yarn-cluster, --num-executors, 1, --properties-file, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/spark968020731409047027.properties, --py-files, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test2.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/test.py, /tmp/spark-ea49597c-2a95-4d8c-a9ea-23861a02c9bd/result961582960984674264.tmp) exited with code 1 at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1122) at org.apache.spark.deploy.yarn.YarnClusterSuite.org$apache$spark$deploy$yarn$YarnClusterSuite$$runSpark(YarnClusterSuite.scala:259) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply$mcV$sp(YarnClusterSuite.scala:160) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) at org.apache.spark.deploy.yarn.YarnClusterSuite$$anonfun$4.apply(YarnClusterSuite.scala:146) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6981) [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext
[ https://issues.apache.org/jira/browse/SPARK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6981: - Fix Version/s: (was: 1.4.0) [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext Key: SPARK-6981 URL: https://issues.apache.org/jira/browse/SPARK-6981 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0, 1.4.0 Reporter: Edoardo Vacchi Priority: Minor In order to simplify extensibility with new strategies from third-parties, it should be better to factor SparkPlanner and QueryExecution in their own classes. Dependent types add additional, unnecessary complexity; besides, HiveContext would benefit from this change as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6484) Ganglia metrics xml reporter doesn't escape correctly
[ https://issues.apache.org/jira/browse/SPARK-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6484: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Ganglia metrics xml reporter doesn't escape correctly - Key: SPARK-6484 URL: https://issues.apache.org/jira/browse/SPARK-6484 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Michael Armbrust Assignee: Josh Rosen Priority: Critical The following should be escaped: {code} &quot; ' &apos; &lt; &gt; &amp; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
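A small sketch of the kind of escaping being asked for, mapping the XML-special characters to entities before they are written into the metrics XML; the escapeXml helper is illustrative, not the Ganglia reporter's code.
{code}
object XmlEscape {
  def escapeXml(s: String): String =
    s.flatMap {
      case '&'  => "&amp;"
      case '<'  => "&lt;"
      case '>'  => "&gt;"
      case '"'  => "&quot;"
      case '\'' => "&apos;"
      case c    => c.toString
    }

  def main(args: Array[String]): Unit = {
    println(escapeXml("a<b & c>\"d\" & 'e'"))
    // a&lt;b &amp; c&gt;&quot;d&quot; &amp; &apos;e&apos;
  }
}
{code}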
[jira] [Updated] (SPARK-7606) Document all PySpark SQL/DataFrame public methods with @since tag
[ https://issues.apache.org/jira/browse/SPARK-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7606: - Fix Version/s: (was: 1.4.0) Document all PySpark SQL/DataFrame public methods with @since tag - Key: SPARK-7606 URL: https://issues.apache.org/jira/browse/SPARK-7606 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Nicholas Chammas -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7444) Eliminate noisy css warn/error logs for UISeleniumSuite
[ https://issues.apache.org/jira/browse/SPARK-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7444: - Fix Version/s: (was: 1.4.0) Eliminate noisy css warn/error logs for UISeleniumSuite --- Key: SPARK-7444 URL: https://issues.apache.org/jira/browse/SPARK-7444 Project: Spark Issue Type: Improvement Components: Tests Reporter: Shixiong Zhu Priority: Minor Eliminate the following noisy logs for {{UISeleniumSuite}}: {code} 15/05/07 10:09:50.196 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS error: 'http://192.168.0.170:4040/static/bootstrap.min.css' [793:167] Error in style rule. (Invalid token *. Was expecting one of: EOF, S, IDENT, }, ;.) 15/05/07 10:09:50.196 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS warning: 'http://192.168.0.170:4040/static/bootstrap.min.css' [793:167] Ignoring the following declarations in this rule. 15/05/07 10:09:50.197 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS error: 'http://192.168.0.170:4040/static/bootstrap.min.css' [799:325] Error in style rule. (Invalid token *. Was expecting one of: EOF, S, IDENT, }, ;.) 15/05/07 10:09:50.197 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS warning: 'http://192.168.0.170:4040/static/bootstrap.min.css' [799:325] Ignoring the following declarations in this rule. 15/05/07 10:09:50.198 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS error: 'http://192.168.0.170:4040/static/bootstrap.min.css' [805:18] Error in style rule. (Invalid token *. Was expecting one of: EOF, S, IDENT, }, ;.) 15/05/07 10:09:50.198 pool-1-thread-1-ScalaTest-running-UISeleniumSuite WARN DefaultCssErrorHandler: CSS warning: 'http://192.168.0.170:4040/static/bootstrap.min.css' [805:18] Ignoring the following declarations in this rule. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7097) Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold
[ https://issues.apache.org/jira/browse/SPARK-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7097: - Fix Version/s: (was: 1.4.0) Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold --- Key: SPARK-7097 URL: https://issues.apache.org/jira/browse/SPARK-7097 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1 Reporter: Yash Datta Currently, when deciding whether to create a broadcast hash join or a shuffled hash join, the size estimation of the partitioned tables involved considers the size of the entire table. This results in many query plans using shuffled hash joins where, in fact, only a small number of partitions are referenced by the actual query (due to additional filters), and which could therefore be run using a broadcast hash join. The query plan should consider the size of only the referenced partitions in such cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
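A toy sketch of the proposed estimation: sum only the partitions that survive the query's partition filters and compare that to the broadcast threshold, instead of using the whole table's size. The Partition model and the 10 MB threshold below are illustrative assumptions, not Spark SQL code.
{code}
object PrunedSizeEstimation {
  case class Partition(values: Map[String, String], sizeInBytes: Long)

  def estimatedSize(partitions: Seq[Partition],
                    partitionFilter: Map[String, String] => Boolean): Long =
    partitions.filter(p => partitionFilter(p.values)).map(_.sizeInBytes).sum

  def main(args: Array[String]): Unit = {
    val autoBroadcastJoinThreshold = 10L * 1024 * 1024   // 10 MB, the usual default
    val table = (1 to 1000).map(d => Partition(Map("day" -> d.toString), 5L * 1024 * 1024))

    val wholeTable = table.map(_.sizeInBytes).sum
    val referenced = estimatedSize(table, _.get("day").contains("7"))  // e.g. WHERE day = '7'

    println(wholeTable <= autoBroadcastJoinThreshold)  // false: plan falls back to a shuffled join
    println(referenced <= autoBroadcastJoinThreshold)  // true: a broadcast join is possible
  }
}
{code}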
[jira] [Updated] (SPARK-6828) Spark returns misleading message when client is incompatible with server
[ https://issues.apache.org/jira/browse/SPARK-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6828: - Fix Version/s: (was: 1.4.0) Spark returns misleading message when client is incompatible with server Key: SPARK-6828 URL: https://issues.apache.org/jira/browse/SPARK-6828 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Environment: Client: Windows 7 spark-core v.1.3.0 Server: RedHat 6.6 spark-core v.1.2.0 Reporter: Alexander Ulanov Priority: Minor Client code: val conf = new SparkConf(). setMaster(spark://mynetwrok.com:7077). setAppName(myapp). val sc = new SparkContext(conf) Server reply: 5/04/09 15:35:22 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkmas...@mynetwrok.com:7077/user/Master... 15/04/09 15:35:22 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@mynetwork:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7527) Wrong detection of REPL mode in ClosureCleaner
[ https://issues.apache.org/jira/browse/SPARK-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7527: - Fix Version/s: (was: 1.4.0) Wrong detection of REPL mode in ClosureCleaner -- Key: SPARK-7527 URL: https://issues.apache.org/jira/browse/SPARK-7527 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Reporter: Oleksii Kostyliev Priority: Minor If REPL class is not present on the classpath, the {{inIntetpreter}} boolean switch shall be {{false}}, not {{true}} at: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L247 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
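A sketch of the corrected detection described above: try to load the REPL entry point and treat a missing class as "not in the interpreter", rather than defaulting to true. The wrapper object is illustrative, and the class name is used here only to show the lookup.
{code}
object ReplDetectionSketch {
  /** True only when the Spark REPL classes are actually on the classpath. */
  def inInterpreter: Boolean =
    try {
      Class.forName("org.apache.spark.repl.Main")
      true
    } catch {
      // REPL jar absent (e.g. a plain application): this must yield false, not true.
      case _: ClassNotFoundException => false
    }

  def main(args: Array[String]): Unit =
    println(s"running in REPL: $inInterpreter")
}
{code}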
[jira] [Updated] (SPARK-6803) [SparkR] Support SparkR Streaming
[ https://issues.apache.org/jira/browse/SPARK-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6803: - Fix Version/s: (was: 1.4.0) [SparkR] Support SparkR Streaming - Key: SPARK-6803 URL: https://issues.apache.org/jira/browse/SPARK-6803 Project: Spark Issue Type: New Feature Components: SparkR, Streaming Reporter: Hao Adds an R API for Spark Streaming. An experimental version is presented in repo [1], which follows the PySpark streaming design. Also, this PR can be further broken down into sub-task issues. [1] https://github.com/hlin09/spark/tree/SparkR-streaming/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7316) Add step capability to RDD sliding window
[ https://issues.apache.org/jira/browse/SPARK-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7316: - Fix Version/s: (was: 1.4.0) Add step capability to RDD sliding window - Key: SPARK-7316 URL: https://issues.apache.org/jira/browse/SPARK-7316 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Alexander Ulanov Original Estimate: 24h Remaining Estimate: 24h RDDFunctions in MLlib contains a sliding window implementation with step 1. The user should be able to define the step, so this capability should be implemented. Although one can generate sliding windows with step 1 and then keep only every Nth window, that may take much more time and disk space depending on the step size. For example, if your window is 1000, you will generate an amount of data a thousand times bigger than your initial dataset. That does not make sense if you only need every Nth window, in which case the data generated would be 1000/N times smaller. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
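The step-1-then-filter workaround mentioned above, written out with MLlib's RDDFunctions.sliding; a native sliding(window, step) would avoid materializing the dropped windows in the first place. Assumes spark-mllib on the classpath; the local master is for the demo only.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.rdd.RDDFunctions._

object SlidingWithStep {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sliding-step"))
    val data = sc.parallelize(1 to 10)

    val window = 3
    val step   = 2
    // Workaround: build every window (step 1), then keep every `step`-th one.
    val windows = data.sliding(window)
      .zipWithIndex()
      .filter { case (_, i) => i % step == 0 }
      .map(_._1.toList)

    windows.collect().foreach(println)  // List(1, 2, 3), List(3, 4, 5), List(5, 6, 7), ...
    sc.stop()
  }
}
{code}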
[jira] [Updated] (SPARK-6828) Spark returns misleading message when client is incompatible with server
[ https://issues.apache.org/jira/browse/SPARK-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6828: - Target Version/s: (was: 1.4.0) Spark returns misleading message when client is incompatible with server Key: SPARK-6828 URL: https://issues.apache.org/jira/browse/SPARK-6828 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Environment: Client: Windows 7 spark-core v.1.3.0 Server: RedHat 6.6 spark-core v.1.2.0 Reporter: Alexander Ulanov Priority: Minor Client code: val conf = new SparkConf(). setMaster(spark://mynetwrok.com:7077). setAppName(myapp). val sc = new SparkContext(conf) Server reply: 5/04/09 15:35:22 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkmas...@mynetwrok.com:7077/user/Master... 15/04/09 15:35:22 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@mynetwork:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7670: - Fix Version/s: (was: 1.4.0) Failure when building with scala 2.11 (after 1.3.1 -- Key: SPARK-7670 URL: https://issues.apache.org/jira/browse/SPARK-7670 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.4.0 Reporter: Fernando Ruben Otero When trying to build spark with scala 2.11 on revision c64ff8036cc6bc7c87743f4c751d7fe91c2e366a (the one on master when I'm submitting this issue) I'm getting export MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -Dhadoop.version=2.6.0 -DskipTests clean install ... ... ... [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 --- /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type protected Type type() { return Type.UPLOAD_BLOCK; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type protected Type type() { return Type.STREAM_HANDLE; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type protected Type type() { return Type.REGISTER_EXECUTOR; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type protected Type type() { return Type.OPEN_BLOCKS; } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7627) DAG visualization: cached RDDs not shown on job page
[ https://issues.apache.org/jira/browse/SPARK-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7627: - Fix Version/s: (was: 1.4.0) DAG visualization: cached RDDs not shown on job page Key: SPARK-7627 URL: https://issues.apache.org/jira/browse/SPARK-7627 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Andrew Or It's a small styling issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7658) Update the mouse behaviors for the timeline graphs
[ https://issues.apache.org/jira/browse/SPARK-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7658: - Fix Version/s: (was: 1.4.0) Update the mouse behaviors for the timeline graphs -- Key: SPARK-7658 URL: https://issues.apache.org/jira/browse/SPARK-7658 Project: Spark Issue Type: Improvement Components: Streaming, Web UI Reporter: Shixiong Zhu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7224) Mock repositories for testing with --packages
[ https://issues.apache.org/jira/browse/SPARK-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7224: - Fix Version/s: (was: 1.4.0) Mock repositories for testing with --packages - Key: SPARK-7224 URL: https://issues.apache.org/jira/browse/SPARK-7224 Project: Spark Issue Type: Test Components: Spark Submit Reporter: Burak Yavuz Assignee: Burak Yavuz Priority: Critical Create mock repositories (folders with jars and POMs in Maven layout) for testing --packages without the need for an internet connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6197) handle json parse exception for eventlog file not finished writing
[ https://issues.apache.org/jira/browse/SPARK-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6197: - Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) handle json parse exception for eventlog file not finished writing --- Key: SPARK-6197 URL: https://issues.apache.org/jira/browse/SPARK-6197 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.3.0 Reporter: Zhang, Liye Assignee: Zhang, Liye Priority: Minor Labels: backport-needed This is a follow-up JIRA to [SPARK-6107|https://issues.apache.org/jira/browse/SPARK-6107]. In [SPARK-6107|https://issues.apache.org/jira/browse/SPARK-6107], the web UI can display event log files that have the suffix *.inprogress*. However, the event log file may not have finished being written in some abnormal cases (e.g. Ctrl+C), in which case the file may be truncated on its last line, leaving that line in invalid JSON format, which will cause a JSON parse exception when reading the file. In this case, we can simply ignore the content of the last line, since the history shown on the web for abnormal cases is only a reference for the user: it demonstrates the past status of the app before it terminated abnormally (we cannot guarantee that the history shows exactly the last moment when the app encountered the abnormal situation). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
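A sketch of the proposed handling, independent of Spark's replay code: parse the log line by line and swallow a parse failure only when it occurs on the final line (the truncated tail of an .inprogress file). The tiny parseEvent stand-in is illustrative, not the real JSON event parser.
{code}
import scala.util.{Failure, Success, Try}

object TolerantEventLogReplay {
  // Illustrative stand-in for the real JSON event parser: fails on a truncated line.
  def parseEvent(line: String): String = {
    require(line.trim.startsWith("{") && line.trim.endsWith("}"), s"malformed event: $line")
    line.trim
  }

  /** Replay all events, ignoring a parse failure only on the last line. */
  def replay(lines: Seq[String]): Seq[String] =
    lines.zipWithIndex.flatMap { case (line, i) =>
      Try(parseEvent(line)) match {
        case Success(event) => Some(event)
        case Failure(_) if i == lines.size - 1 =>
          None                       // truncated tail of an .inprogress log: skip it
        case Failure(e) => throw e   // corruption elsewhere is still an error
      }
    }

  def main(args: Array[String]): Unit = {
    val log = Seq("""{"Event":"SparkListenerJobStart"}""",
                  """{"Event":"SparkListenerJobEnd"}""",
                  """{"Event":"SparkListen""")            // killed mid-write (e.g. Ctrl+C)
    replay(log).foreach(println)                          // replays the two complete events
  }
}
{code}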
[jira] [Updated] (SPARK-7245) Spearman correlation for DataFrames
[ https://issues.apache.org/jira/browse/SPARK-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7245: - Fix Version/s: (was: 1.4.0) Spearman correlation for DataFrames --- Key: SPARK-7245 URL: https://issues.apache.org/jira/browse/SPARK-7245 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Xiangrui Meng Spearman correlation is harder than Pearson to compute. ~~~ df.stat.corr(col1, col2, method=spearman): Double ~~~ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
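Spearman is Pearson computed on ranks, and MLlib already implements it for pairs of RDD[Double] via Statistics.corr, so a df.stat.corr(col1, col2, "spearman") could delegate to it. A sketch of that delegation, assuming two numeric columns; the SpearmanSketch wrapper itself is illustrative.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.sql.{DataFrame, SQLContext}

object SpearmanSketch {
  // Sketch of what df.stat.corr(col1, col2, "spearman") could delegate to.
  def spearman(df: DataFrame, col1: String, col2: String): Double = {
    val x = df.select(col1).map(_.getDouble(0))
    val y = df.select(col2).map(_.getDouble(0))
    Statistics.corr(x, y, "spearman")   // MLlib handles the ranking
  }

  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("spearman"))
    val ctx = new SQLContext(sc)
    import ctx.implicits._
    val df = sc.parallelize(Seq((1.0, 10.0), (2.0, 40.0), (3.0, 90.0))).toDF("a", "b")
    println(spearman(df, "a", "b"))  // 1.0: monotone relationship, even though it is not linear
    sc.stop()
  }
}
{code}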
[jira] [Updated] (SPARK-7498) Params.setDefault should not use varargs annotation
[ https://issues.apache.org/jira/browse/SPARK-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7498: - Fix Version/s: (was: 1.4.0) Params.setDefault should not use varargs annotation --- Key: SPARK-7498 URL: https://issues.apache.org/jira/browse/SPARK-7498 Project: Spark Issue Type: Bug Components: Java API, ML Affects Versions: 1.4.0 Reporter: Joseph K. Bradley Assignee: Xiangrui Meng In [SPARK-7429] and PR [https://github.com/apache/spark/pull/5960], I added the varargs annotation to Params.setDefault which takes a variable number of ParamPairs. It worked locally and on Jenkins for me. However, @mengxr reported issues compiling on his machine. So I'm reverting the change introduced in [https://github.com/apache/spark/pull/5960] by removing varargs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6216) Check Python version in worker before run PySpark job
[ https://issues.apache.org/jira/browse/SPARK-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6216: - Fix Version/s: (was: 1.4.0) Check Python version in worker before run PySpark job - Key: SPARK-6216 URL: https://issues.apache.org/jira/browse/SPARK-6216 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Davies Liu Assignee: Davies Liu PySpark can only run with the same major version both in driver and worker ( both of the are 2.6 or 2.7), it will cause random error if it have 2.7 in driver or 2.6 in worker (or vice). For example: {code} davies@localhost:~/work/spark$ PYSPARK_PYTHON=python2.6 PYSPARK_DRIVER_PYTHON=python2.7 bin/pyspark Using Python version 2.7.7 (default, Jun 2 2014 12:48:16) SparkContext available as sc, SQLContext available as sqlCtx. sc.textFile('LICENSE').map(lambda l: l.split()).count() org.apache.spark.api.python.PythonException: Traceback (most recent call last): File /Users/davies/work/spark/python/pyspark/worker.py, line 101, in main process() File /Users/davies/work/spark/python/pyspark/worker.py, line 96, in process serializer.dump_stream(func(split_index, iterator), outfile) File /Users/davies/work/spark/python/pyspark/rdd.py, line 2251, in pipeline_func return func(split, prev_func(split, iterator)) File /Users/davies/work/spark/python/pyspark/rdd.py, line 2251, in pipeline_func return func(split, prev_func(split, iterator)) File /Users/davies/work/spark/python/pyspark/rdd.py, line 2251, in pipeline_func return func(split, prev_func(split, iterator)) File /Users/davies/work/spark/python/pyspark/rdd.py, line 281, in func return f(iterator) File /Users/davies/work/spark/python/pyspark/rdd.py, line 931, in lambda return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File /Users/davies/work/spark/python/pyspark/rdd.py, line 931, in genexpr return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File stdin, line 1, in lambda TypeError: 'bool' object is not callable at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:136) at org.apache.spark.api.python.PythonRDD$$anon$1.init(PythonRDD.scala:177) at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:95) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7287) Flaky test: o.a.s.deploy.SparkSubmitSuite --packages
[ https://issues.apache.org/jira/browse/SPARK-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7287: - Fix Version/s: (was: 1.4.0) Flaky test: o.a.s.deploy.SparkSubmitSuite --packages Key: SPARK-7287 URL: https://issues.apache.org/jira/browse/SPARK-7287 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Burak Yavuz Priority: Critical Labels: flaky-test Error message was not helpful (did not complete within 60 seconds or something). Observed only in master: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/2239/ https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/2238/ https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2163/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6197) handle json parse exception for eventlog file not finished writing
[ https://issues.apache.org/jira/browse/SPARK-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6197: - Fix Version/s: 1.4.0 handle json parse exception for eventlog file not finished writing --- Key: SPARK-6197 URL: https://issues.apache.org/jira/browse/SPARK-6197 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.3.0 Reporter: Zhang, Liye Assignee: Zhang, Liye Priority: Minor Labels: backport-needed Fix For: 1.4.0 This is a follow-up JIRA to [SPARK-6107|https://issues.apache.org/jira/browse/SPARK-6107]. In [SPARK-6107|https://issues.apache.org/jira/browse/SPARK-6107], the web UI can display event log files that have the suffix *.inprogress*. However, the event log file may not have finished being written in some abnormal cases (e.g. Ctrl+C), in which case the file may be truncated on its last line, leaving that line in invalid JSON format, which will cause a JSON parse exception when reading the file. In this case, we can simply ignore the content of the last line, since the history shown on the web for abnormal cases is only a reference for the user: it demonstrates the past status of the app before it terminated abnormally (we cannot guarantee that the history shows exactly the last moment when the app encountered the abnormal situation). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6197) handle json parse exception for eventlog file not finished writing
[ https://issues.apache.org/jira/browse/SPARK-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6197: - Fix Version/s: (was: 1.4.0) handle json parse exception for eventlog file not finished writing --- Key: SPARK-6197 URL: https://issues.apache.org/jira/browse/SPARK-6197 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.3.0 Reporter: Zhang, Liye Assignee: Zhang, Liye Priority: Minor Labels: backport-needed This is a follow-up JIRA to [SPARK-6107|https://issues.apache.org/jira/browse/SPARK-6107]. In [SPARK-6107|https://issues.apache.org/jira/browse/SPARK-6107], the web UI can display event log files that have the suffix *.inprogress*. However, the event log file may not have finished being written in some abnormal cases (e.g. Ctrl+C), in which case the file may be truncated on its last line, leaving that line in invalid JSON format, which will cause a JSON parse exception when reading the file. In this case, we can simply ignore the content of the last line, since the history shown on the web for abnormal cases is only a reference for the user: it demonstrates the past status of the app before it terminated abnormally (we cannot guarantee that the history shows exactly the last moment when the app encountered the abnormal situation). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2155) Support effectful / non-deterministic key expressions in CASE WHEN statements
[ https://issues.apache.org/jira/browse/SPARK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-2155: - Assignee: Wenchen Fan Support effectful / non-deterministic key expressions in CASE WHEN statements - Key: SPARK-2155 URL: https://issues.apache.org/jira/browse/SPARK-2155 Project: Spark Issue Type: Bug Components: SQL Reporter: Zongheng Yang Assignee: Wenchen Fan Priority: Minor Fix For: 1.4.0 Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant evaluations of the key expression. Relevant discussions here: https://github.com/apache/spark/pull/1055/files#r13784248 If we are very in need of support for effectful key expressions, at least we can resort to the baseline approach of having both CaseWhen and CaseKeyWhen as expressions, which seem to introduce much code duplication (e.g. see https://github.com/concretevitamin/spark/blob/47d406a58d129e5bba68bfadf9dd1faa9054d834/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L216 for a sketch implementation). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
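A toy illustration of why the rewrite matters for effectful keys: translating CASE key WHEN v THEN r ... into CASE WHEN key = v ... re-evaluates the key expression once per branch tried, while a dedicated CASE KEY WHEN evaluation computes it exactly once. The mini model below is illustrative, not Catalyst code.
{code}
object CaseKeyWhenSketch {
  var evaluations = 0
  // A side-effecting / non-deterministic stand-in for the key expression.
  def key(): Int = { evaluations += 1; 2 }

  // CASE key WHEN ... rewritten as CASE WHEN key = ... : the key is re-evaluated per branch.
  def caseWhenRewrite(branches: Seq[(Int, String)], default: String): String = {
    for ((v, r) <- branches) if (key() == v) return r
    default
  }

  // A dedicated CASE KEY WHEN evaluation: the key is computed exactly once.
  def caseKeyWhen(branches: Seq[(Int, String)], default: String): String = {
    val k = key()
    for ((v, r) <- branches) if (k == v) return r
    default
  }

  def main(args: Array[String]): Unit = {
    val branches = Seq(1 -> "one", 2 -> "two", 3 -> "three")
    evaluations = 0
    println(caseWhenRewrite(branches, "other") + " after " + evaluations + " key evaluation(s)") // two after 2
    evaluations = 0
    println(caseKeyWhen(branches, "other") + " after " + evaluations + " key evaluation(s)")     // two after 1
  }
}
{code}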
[jira] [Updated] (SPARK-7093) Using newPredicate in NestedLoopJoin to enable code generation
[ https://issues.apache.org/jira/browse/SPARK-7093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7093: - Assignee: Fei Wang Using newPredicate in NestedLoopJoin to enable code generation -- Key: SPARK-7093 URL: https://issues.apache.org/jira/browse/SPARK-7093 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.0 Reporter: Fei Wang Assignee: Fei Wang Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7123) support table.star in sqlcontext
[ https://issues.apache.org/jira/browse/SPARK-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-7123: - Assignee: Fei Wang support table.star in sqlcontext Key: SPARK-7123 URL: https://issues.apache.org/jira/browse/SPARK-7123 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Assignee: Fei Wang Fix For: 1.4.0 support this sql SELECT r.* FROM testData l join testData2 r on (l.key = r.a) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org