[jira] [Commented] (SPARK-4094) checkpoint should still be available after rdd actions

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184897#comment-14184897
 ] 

Apache Spark commented on SPARK-4094:
-

User 'liyezhang556520' has created a pull request for this issue:
https://github.com/apache/spark/pull/2956

> checkpoint should still be available after rdd actions
> --
>
> Key: SPARK-4094
> URL: https://issues.apache.org/jira/browse/SPARK-4094
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Zhang, Liye
>
> rdd.checkpoint() must be called before any action on this RDD; if any action 
> has already run on it, the checkpoint never succeeds. Take the following code 
> as an example:
> *rdd = sc.makeRDD(...)*
> *rdd.collect()*
> *rdd.checkpoint()*
> *rdd.count()*
> This RDD will never be checkpointed. RDD caching does not have this problem: 
> cache() always takes effect before subsequent actions, no matter whether any 
> action ran before cache() was called.
> So rdd.checkpoint() should behave the same way as rdd.cache().
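
For illustration only (this is not part of the original report), a minimal spark-shell sketch of the scenario described above; the data, checkpoint directory and variable names are placeholders:

{code}
// Assumes a running SparkContext `sc`, e.g. in spark-shell.
sc.setCheckpointDir("/tmp/spark-checkpoint-sketch")   // placeholder path

val rdd = sc.makeRDD(1 to 100)
rdd.collect()        // an action runs before checkpoint() is called
rdd.checkpoint()     // mark the RDD for checkpointing afterwards
rdd.count()          // another action

// Reported behavior: in this ordering the checkpoint never materializes,
// whereas cache()/persist() would still take effect.
println(rdd.isCheckpointed)
{code}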






[jira] [Commented] (SPARK-4096) Update executor memory description in the help message

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184894#comment-14184894
 ] 

Apache Spark commented on SPARK-4096:
-

User 'WangTaoTheTonic' has created a pull request for this issue:
https://github.com/apache/spark/pull/2955

> Update executor memory description in the help message
> --
>
> Key: SPARK-4096
> URL: https://issues.apache.org/jira/browse/SPARK-4096
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: WangTaoTheTonic
>Priority: Minor
>
> Here `ApplicationMaster` accepts the executor memory argument only in number 
> format (a plain number without a unit suffix), so the description in the help 
> message should be updated.






[jira] [Created] (SPARK-4096) Update executor memory description in the help message

2014-10-26 Thread WangTaoTheTonic (JIRA)
WangTaoTheTonic created SPARK-4096:
--

 Summary: Update executor memory description in the help message
 Key: SPARK-4096
 URL: https://issues.apache.org/jira/browse/SPARK-4096
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Reporter: WangTaoTheTonic
Priority: Minor


Here `ApplicationMaster` accepts the executor memory argument only in number format 
(a plain number without a unit suffix), so the description in the help message 
should be updated.






[jira] [Updated] (SPARK-4094) checkpoint should still be available after rdd actions

2014-10-26 Thread Zhang, Liye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhang, Liye updated SPARK-4094:
---
Description: 
rdd.checkpoint() must be called before any action on this RDD; if any action has 
already run on it, the checkpoint never succeeds. Take the following code as an 
example:

*rdd = sc.makeRDD(...)*
*rdd.collect()*
*rdd.checkpoint()*
*rdd.count()*

This RDD will never be checkpointed. RDD caching does not have this problem: cache() 
always takes effect before subsequent actions, no matter whether any action ran 
before cache() was called.
So rdd.checkpoint() should behave the same way as rdd.cache().



  was:kjh


> checkpoint should still be available after rdd actions
> --
>
> Key: SPARK-4094
> URL: https://issues.apache.org/jira/browse/SPARK-4094
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Zhang, Liye
>
> rdd.checkpoint() must be called before any action on this RDD; if any action 
> has already run on it, the checkpoint never succeeds. Take the following code 
> as an example:
> *rdd = sc.makeRDD(...)*
> *rdd.collect()*
> *rdd.checkpoint()*
> *rdd.count()*
> This RDD will never be checkpointed. RDD caching does not have this problem: 
> cache() always takes effect before subsequent actions, no matter whether any 
> action ran before cache() was called.
> So rdd.checkpoint() should behave the same way as rdd.cache().






[jira] [Commented] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%

2014-10-26 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184889#comment-14184889
 ] 

Patrick Wendell commented on SPARK-4049:


This actually seems alright to me if it means that a single partition is cached 
in two locations.

> Storage web UI "fraction cached" shows as > 100%
> 
>
> Key: SPARK-4049
> URL: https://issues.apache.org/jira/browse/SPARK-4049
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> In the Storage tab of the Spark Web UI, I saw a case where the "Fraction 
> Cached" was greater than 100%:
> !http://i.imgur.com/Gm2hEeL.png!






[jira] [Updated] (SPARK-4094) checkpoint should still be available after rdd actions

2014-10-26 Thread Zhang, Liye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhang, Liye updated SPARK-4094:
---
Description: kjh

> checkpoint should still be available after rdd actions
> --
>
> Key: SPARK-4094
> URL: https://issues.apache.org/jira/browse/SPARK-4094
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Zhang, Liye
>
> kjh






[jira] [Commented] (SPARK-4095) [YARN][Minor]extract val isLaunchingDriver in ClientBase

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184881#comment-14184881
 ] 

Apache Spark commented on SPARK-4095:
-

User 'WangTaoTheTonic' has created a pull request for this issue:
https://github.com/apache/spark/pull/2954

> [YARN][Minor]extract val isLaunchingDriver in ClientBase
> 
>
> Key: SPARK-4095
> URL: https://issues.apache.org/jira/browse/SPARK-4095
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: WangTaoTheTonic
>Priority: Minor
>
> Instead of repeatedly checking whether `args.userClass` is null, we extract the 
> check into a single val, as `ApplicationMaster` already does.






[jira] [Created] (SPARK-4095) [YARN][Minor]extract val isLaunchingDriver in ClientBase

2014-10-26 Thread WangTaoTheTonic (JIRA)
WangTaoTheTonic created SPARK-4095:
--

 Summary: [YARN][Minor]extract val isLaunchingDriver in ClientBase
 Key: SPARK-4095
 URL: https://issues.apache.org/jira/browse/SPARK-4095
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Reporter: WangTaoTheTonic
Priority: Minor


Instead of repeatedly checking whether `args.userClass` is null, we extract the 
check into a single val, as `ApplicationMaster` already does.
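
A rough, self-contained sketch of the kind of refactoring being proposed; the class and method names below are stand-ins rather than the actual ClientBase code (see the pull request above for the real change):

{code}
// Hypothetical stand-in for the YARN client arguments.
case class ClientArguments(userClass: String)

class ClientBaseSketch(args: ClientArguments) {
  // Extracted once, instead of repeating `args.userClass != null` at each
  // call site, mirroring what ApplicationMaster already does.
  private val isLaunchingDriver: Boolean = args.userClass != null

  def setupLaunchEnv(): Unit =
    if (isLaunchingDriver) println("configure driver-specific launch environment")

  def prepareCommand(): Seq[String] =
    if (isLaunchingDriver) Seq("run-user-class", args.userClass)
    else Seq("run-launcher-only")
}
{code}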






[jira] [Created] (SPARK-4094) checkpoint should still be available after rdd actions

2014-10-26 Thread Zhang, Liye (JIRA)
Zhang, Liye created SPARK-4094:
--

 Summary: checkpoint should still be available after rdd actions
 Key: SPARK-4094
 URL: https://issues.apache.org/jira/browse/SPARK-4094
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Zhang, Liye









[jira] [Commented] (SPARK-1442) Add Window function support

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184873#comment-14184873
 ] 

Apache Spark commented on SPARK-1442:
-

User 'guowei2' has created a pull request for this issue:
https://github.com/apache/spark/pull/2953

> Add Window function support
> ---
>
> Key: SPARK-1442
> URL: https://issues.apache.org/jira/browse/SPARK-1442
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Chengxiang Li
>
> Similar to Hive, add window function support to Catalyst.
> https://issues.apache.org/jira/browse/HIVE-4197
> https://issues.apache.org/jira/browse/HIVE-896






[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib

2014-10-26 Thread Ashutosh Trivedi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184848#comment-14184848
 ] 

Ashutosh Trivedi commented on SPARK-2336:
-

Is anybody already working on it? I can take up this task. We can also 
implement kNN joins, which will be a nice utility for data mining.

Here is the link for KNN-joins
http://ww2.cs.fsu.edu/~czhang/knnjedbt/

> Approximate k-NN Models for MLLib
> -
>
> Key: SPARK-2336
> URL: https://issues.apache.org/jira/browse/SPARK-2336
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Brian Gawalt
>Priority: Minor
>  Labels: features, newbie
>
> After tackling the general k-Nearest Neighbor model as per 
> https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to 
> also offer approximate k-Nearest Neighbor. A promising approach would involve 
> building a kd-tree variant within each partition, a la
> http://www.autonlab.org/autonweb/14714.html?branch=1&language=2
> This could offer a simple non-linear ML model that can label new data with 
> much lower latency than the plain-vanilla kNN versions.






[jira] [Comment Edited] (SPARK-3988) Public API for DateType support

2014-10-26 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180933#comment-14180933
 ] 

Adrian Wang edited comment on SPARK-3988 at 10/27/14 4:28 AM:
--

have to investigate solution 3 in spark-2674


was (Author: adrian-wang):
have to investigate solution 3 in spark-2179

> Public API for DateType support
> ---
>
> Key: SPARK-3988
> URL: https://issues.apache.org/jira/browse/SPARK-3988
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Adrian Wang
>Assignee: Adrian Wang
>Priority: Minor
>
> add Python API and something else.






[jira] [Commented] (SPARK-2396) Spark EC2 scripts fail when trying to log in to EC2 instances

2014-10-26 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184803#comment-14184803
 ] 

Anant Daksh Asthana commented on SPARK-2396:


Seems like a Python issue on your system: you are missing the subprocess module.

> Spark EC2 scripts fail when trying to log in to EC2 instances
> -
>
> Key: SPARK-2396
> URL: https://issues.apache.org/jira/browse/SPARK-2396
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.0.0
> Environment: Windows 8, Cygwin and command prompt, Python 2.7
>Reporter: Stephen M. Hopper
>  Labels: aws, ec2, ssh
>
> I cannot seem to successfully start up a Spark EC2 cluster using the 
> spark-ec2 script.
> I'm using variations on the following command:
> ./spark-ec2 --instance-type=m1.small --region=us-west-1 --spot-price=0.05 
> --spark-version=1.0.0 -k my-key-name -i my-key-name.pem -s 1 launch 
> spark-test-cluster
> The script always allocates the EC2 instances without much trouble, but can 
> never seem to complete the SSH step to install Spark on the cluster.  It 
> always complains about my SSH key.  If I try to log in with my ssh key doing 
> something like this:
> ssh -i my-key-name.pem root@
> it fails.  However, if I log in to the AWS console, click on my instance and 
> select "connect", it displays the instructions for SSHing into my instance 
> (which are no different from the ssh command from above).  So, if I rerun the 
> SSH command from above, I'm able to log in.
> Next, if I try to rerun the spark-ec2 command from above (replacing "launch" 
> with "start"), the script logs in and starts installing Spark.  However, it 
> eventually errors out with the following output:
> Cloning into 'spark-ec2'...
> remote: Counting objects: 1465, done.
> remote: Compressing objects: 100% (697/697), done.
> remote: Total 1465 (delta 485), reused 1465 (delta 485)
> Receiving objects: 100% (1465/1465), 228.51 KiB | 287 KiB/s, done.
> Resolving deltas: 100% (485/485), done.
> Connection to ec2-.us-west-1.compute.amazonaws.com closed.
> Searching for existing cluster spark-test-cluster...
> Found 1 master(s), 1 slaves
> Starting slaves...
> Starting master...
> Waiting for instances to start up...
> Waiting 120 more seconds...
> Deploying files to master...
> Traceback (most recent call last):
>   File "./spark_ec2.py", line 823, in 
> main()
>   File "./spark_ec2.py", line 815, in main
> real_main()
>   File "./spark_ec2.py", line 806, in real_main
> setup_cluster(conn, master_nodes, slave_nodes, opts, False)
>   File "./spark_ec2.py", line 450, in setup_cluster
> deploy_files(conn, "deploy.generic", opts, master_nodes, slave_nodes, 
> modules)
>   File "./spark_ec2.py", line 593, in deploy_files
> subprocess.check_call(command)
>   File "E:\windows_programs\Python27\lib\subprocess.py", line 535, in 
> check_call
> retcode = call(*popenargs, **kwargs)
>   File "E:\windows_programs\Python27\lib\subprocess.py", line 522, in call
> return Popen(*popenargs, **kwargs).wait()
>   File "E:\windows_programs\Python27\lib\subprocess.py", line 710, in __init__
> errread, errwrite)
>   File "E:\windows_programs\Python27\lib\subprocess.py", line 958, in 
> _execute_child
> startupinfo)
> WindowsError: [Error 2] The system cannot find the file specified
> So, in short, am I missing something or is this a bug?  Any help would be 
> appreciated.
> Other notes:
> -I've tried both us-west-1 and us-east-1 regions.
> -I've tried several different instance types.
> -I've tried playing with the permissions on the ssh key (600, 400, etc.), but 
> to no avail






[jira] [Commented] (SPARK-3838) Python code example for Word2Vec in user guide

2014-10-26 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184750#comment-14184750
 ] 

Anant Daksh Asthana commented on SPARK-3838:


Pull request for resolution can be found at 
https://github.com/apache/spark/pull/2952

> Python code example for Word2Vec in user guide
> --
>
> Key: SPARK-3838
> URL: https://issues.apache.org/jira/browse/SPARK-3838
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib
>Reporter: Xiangrui Meng
>Assignee: Anant Daksh Asthana
>Priority: Trivial
>







[jira] [Commented] (SPARK-3838) Python code example for Word2Vec in user guide

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184749#comment-14184749
 ] 

Apache Spark commented on SPARK-3838:
-

User 'anantasty' has created a pull request for this issue:
https://github.com/apache/spark/pull/2952

> Python code example for Word2Vec in user guide
> --
>
> Key: SPARK-3838
> URL: https://issues.apache.org/jira/browse/SPARK-3838
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib
>Reporter: Xiangrui Meng
>Assignee: Anant Daksh Asthana
>Priority: Trivial
>







[jira] [Closed] (SPARK-4091) Occasionally spark.local.dir can be deleted twice and causes test failure

2014-10-26 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian closed SPARK-4091.
-
Resolution: Duplicate

> Occasionally spark.local.dir can be deleted twice and causes test failure
> -
>
> Key: SPARK-4091
> URL: https://issues.apache.org/jira/browse/SPARK-4091
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Cheng Lian
>
> By persisting an arbitrary RDD with storage level {{MEMORY_AND_DISK}}, Spark 
> may occasionally throw the following exception when shutting down:
> {code}
> java.io.IOException: Failed to list files for dir: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027005012-5bcd/0b
>   at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:664)
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
> at 
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
>   at 
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
>   at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)
> {code}
> By adding log output to {{Utils.deleteRecursively}}, setting breakpoints at 
> {{File.delete}} in IntelliJ, and asking IntelliJ to evaluate and log 
> {{Thread.currentThread().getStackTrace()}} when the breakpoint is hit rather 
> than suspend execution, we can get the following result, which shows 
> {{spark.local.dir}} is deleted twice from both {{DiskBlockManager.stop}} and 
> the shutdown hook installed in {{Utils}}:
> {code}
> +++ Deleting file: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
> Breakpoint reached at java.io.File.delete(File.java:1028)
> [java.lang.Thread.getStackTrace(Thread.java:1589)
>   java.io.File.delete(File.java:1028)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
>   scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
>   org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)]
> +++ Deleting file: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
> Breakpoint reached at java.io.File.delete(File.java:1028)
> [java.lang.Thread.getStackTrace(Thread.java:1589)
>   java.io.File.delete(File.java:1028)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)
>   
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
>   
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
>   
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
>   
> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:157)
>   
> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:154)
>   
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   
> org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:154)
>   
> org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.appl

[jira] [Commented] (SPARK-4091) Occasionally spark.local.dir can be deleted twice and causes test failure

2014-10-26 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184741#comment-14184741
 ] 

Cheng Lian commented on SPARK-4091:
---

Yes, thanks [~joshrosen], closing this.

> Occasionally spark.local.dir can be deleted twice and causes test failure
> -
>
> Key: SPARK-4091
> URL: https://issues.apache.org/jira/browse/SPARK-4091
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Cheng Lian
>
> By persisting an arbitrary RDD with storage level {{MEMORY_AND_DISK}}, Spark 
> may occasionally throw the following exception when shutting down:
> {code}
> java.io.IOException: Failed to list files for dir: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027005012-5bcd/0b
>   at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:664)
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
> at 
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
>   at 
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
>   at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)
> {code}
> By adding log output to {{Utils.deleteRecursively}}, setting breakpoints at 
> {{File.delete}} in IntelliJ, and asking IntelliJ to evaluate and log 
> {{Thread.currentThread().getStackTrace()}} when the breakpoint is hit rather 
> than suspend execution, we can get the following result, which shows 
> {{spark.local.dir}} is deleted twice from both {{DiskBlockManager.stop}} and 
> the shutdown hook installed in {{Utils}}:
> {code}
> +++ Deleting file: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
> Breakpoint reached at java.io.File.delete(File.java:1028)
> [java.lang.Thread.getStackTrace(Thread.java:1589)
>   java.io.File.delete(File.java:1028)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
>   scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
>   org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)]
> +++ Deleting file: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
> Breakpoint reached at java.io.File.delete(File.java:1028)
> [java.lang.Thread.getStackTrace(Thread.java:1589)
>   java.io.File.delete(File.java:1028)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)
>   
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
>   
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
>   
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
>   
> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:157)
>   
> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:154)
>   
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   
> org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:154)
> 

[jira] [Created] (SPARK-4093) Simplify the unwrap/wrap between HiveUDFs

2014-10-26 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4093:


 Summary: Simplify the unwrap/wrap between HiveUDFs
 Key: SPARK-4093
 URL: https://issues.apache.org/jira/browse/SPARK-4093
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Cheng Hao
Priority: Minor


Currently, invoking nested Hive UDFs causes extra overhead from "unwrapping" and 
"wrapping" data, e.g.:
SELECT cos(sin(a)) FROM t;

We can reuse the ObjectInspector and the output result of the nested Hive UDF (sin 
here) and avoid the extra data "unwrap" and "wrap".
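
A schematic sketch of the overhead being described, using hypothetical wrap/unwrap helpers and value types in place of the real Hive ObjectInspector machinery:

{code}
// Hypothetical Catalyst-side and Hive-side representations of a double value.
case class CatalystValue(v: Double)
case class HiveValue(v: Double)

def wrap(c: CatalystValue): HiveValue   = HiveValue(c.v)      // Catalyst -> Hive
def unwrap(h: HiveValue): CatalystValue = CatalystValue(h.v)  // Hive -> Catalyst

def hiveSin(h: HiveValue): HiveValue = HiveValue(math.sin(h.v))
def hiveCos(h: HiveValue): HiveValue = HiveValue(math.cos(h.v))

val input = CatalystValue(1.0)

// Today, for SELECT cos(sin(a)), every UDF boundary converts in both directions:
val current = unwrap(hiveCos(wrap(unwrap(hiveSin(wrap(input))))))

// The proposal: reuse the inner UDF's ObjectInspector/output so the intermediate
// value stays in the Hive representation and conversion happens only at the edges:
val proposed = unwrap(hiveCos(hiveSin(wrap(input))))
{code}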






[jira] [Updated] (SPARK-3970) Remove duplicate removal of local dirs

2014-10-26 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-3970:
-
Assignee: Liang-Chi Hsieh

> Remove duplicate removal of local dirs
> --
>
> Key: SPARK-3970
> URL: https://issues.apache.org/jira/browse/SPARK-3970
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
> Fix For: 1.2.0
>
>
> The shutdown hook of DiskBlockManager already removes localDirs, so there is no 
> need to also register them with Utils.registerShutdownDeleteDir. Doing so causes 
> these local dirs to be removed twice and leads to corresponding exceptions.






[jira] [Closed] (SPARK-3970) Remove duplicate removal of local dirs

2014-10-26 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3970.

  Resolution: Fixed
   Fix Version/s: 1.2.0
Target Version/s: 1.2.0

> Remove duplicate removal of local dirs
> --
>
> Key: SPARK-3970
> URL: https://issues.apache.org/jira/browse/SPARK-3970
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
> Fix For: 1.2.0
>
>
> The shutdown hook of DiskBlockManager already removes localDirs, so there is no 
> need to also register them with Utils.registerShutdownDeleteDir. Doing so causes 
> these local dirs to be removed twice and leads to corresponding exceptions.






[jira] [Updated] (SPARK-3970) Remove duplicate removal of local dirs

2014-10-26 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-3970:
-
Affects Version/s: 1.1.0

> Remove duplicate removal of local dirs
> --
>
> Key: SPARK-3970
> URL: https://issues.apache.org/jira/browse/SPARK-3970
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
> Fix For: 1.2.0
>
>
> The shutdown hook of DiskBlockManager already removes localDirs, so there is no 
> need to also register them with Utils.registerShutdownDeleteDir. Doing so causes 
> these local dirs to be removed twice and leads to corresponding exceptions.






[jira] [Resolved] (SPARK-2760) Caching tables from multiple databases does not work

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2760.
-
   Resolution: Fixed
Fix Version/s: 1.2.0
 Assignee: Michael Armbrust

This was fixed with the caching overhaul.

> Caching tables from multiple databases does not work
> 
>
> Key: SPARK-2760
> URL: https://issues.apache.org/jira/browse/SPARK-2760
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Critical
> Fix For: 1.2.0
>
>







[jira] [Resolved] (SPARK-4042) append columns ids and names before broadcast

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-4042.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2885
[https://github.com/apache/spark/pull/2885]

> append columns ids and names before broadcast
> -
>
> Key: SPARK-4042
> URL: https://issues.apache.org/jira/browse/SPARK-4042
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: wangfei
> Fix For: 1.2.0
>
>
> The appended column ids and names are not broadcast because we append them 
> after creating the table reader. As a result, the configuration broadcast to the 
> executor side does not contain the entries for the appended column ids and names.






[jira] [Resolved] (SPARK-4061) We cannot use EOL character in the operand of LIKE predicate.

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-4061.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2908
[https://github.com/apache/spark/pull/2908]

> We cannot use EOL character in the operand of LIKE predicate.
> -
>
> Key: SPARK-4061
> URL: https://issues.apache.org/jira/browse/SPARK-4061
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Kousuke Saruta
> Fix For: 1.2.0
>
>
> We cannot use an EOL character such as \n or \r in the operand of a LIKE 
> predicate, so the following condition is never true.
> {code}
> -- someStr is 'hoge\nfuga'
> where someStr LIKE 'hoge_fuga'
> {code}






[jira] [Resolved] (SPARK-3959) SqlParser fails to parse literal -9223372036854775808 (Long.MinValue).

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3959.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2816
[https://github.com/apache/spark/pull/2816]

> SqlParser fails to parse literal -9223372036854775808 (Long.MinValue).
> --
>
> Key: SPARK-3959
> URL: https://issues.apache.org/jira/browse/SPARK-3959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Kousuke Saruta
>Priority: Critical
> Fix For: 1.2.0
>
>
> SqlParser fails to parse -9223372036854775808 (Long.MinValue), so we cannot 
> write queries such as the following.
> {code}
> SELECT value FROM someTable WHERE value > -9223372036854775808
> {code}






[jira] [Resolved] (SPARK-3483) Special chars in column names

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3483.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2927
[https://github.com/apache/spark/pull/2927]

> Special chars in column names
> -
>
> Key: SPARK-3483
> URL: https://issues.apache.org/jira/browse/SPARK-3483
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.0.2
>Reporter: Kuldeep
>Assignee: Ravindra Pesala
> Fix For: 1.2.0
>
>
> For columns with special characters in names, double quoted ANSI syntax would 
> be nice to have.
> select "a/b" from mytable
> Is there a workaround for this? Currently the grammar interprets this as a 
> string value.






[jira] [Commented] (SPARK-3266) JavaDoubleRDD doesn't contain max()

2014-10-26 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184704#comment-14184704
 ] 

Patrick Wendell commented on SPARK-3266:


I think it sort of depends how many people use JavaRDDLike and how they use it. 
In my mind it wasn't intended to be used by user applications, but probably 
some do because there isn't really a way to write functions that pass RDD's 
around and deal with both Pair RDD's and normal ones in Java. [~matei], what do 
you think of this vis-a-vis compatibility?

> JavaDoubleRDD doesn't contain max()
> ---
>
> Key: SPARK-3266
> URL: https://issues.apache.org/jira/browse/SPARK-3266
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.0.1, 1.0.2, 1.1.0, 1.2.0
>Reporter: Amey Chaugule
>Assignee: Josh Rosen
> Attachments: spark-repro-3266.tar.gz
>
>
> While my code compiles, when I try to execute it I see:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.spark.api.java.JavaDoubleRDD.max(Ljava/util/Comparator;)Ljava/lang/Double;
> Stepping into the JavaDoubleRDD class, I don't see max(), although it is 
> clearly listed in the documentation.






[jira] [Resolved] (SPARK-4068) NPE in jsonRDD schema inference

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-4068.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2918
[https://github.com/apache/spark/pull/2918]

> NPE in jsonRDD schema inference
> ---
>
> Key: SPARK-4068
> URL: https://issues.apache.org/jira/browse/SPARK-4068
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Michael Armbrust
>Assignee: Yin Huai
>Priority: Critical
> Fix For: 1.2.0
>
>
> {code}
> val jsonData = """{"data":[[null], [[["Test"}""" :: """{"other": ""}""" 
> :: Nil
> sqlContext.jsonRDD(sc.parallelize(jsonData))
> {code}
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in 
> stage 5.0 failed 4 times, most recent failure: Lost task 13.3 in stage 5.0 
> (TID 347, ip-10-0-234-152.us-west-2.compute.internal): 
> java.lang.NullPointerException: 
> 
> org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$allKeysWithValueTypes$1.org$apache$spark$sql$json$JsonRDD$$anonfun$$buildKeyPathForInnerStructs$1(JsonRDD.scala:252)
> 
> org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$allKeysWithValueTypes$1$$anonfun$org$apache$spark$sql$json$JsonRDD$$anonfun$$buildKeyPathForInnerStructs$1$3.apply(JsonRDD.scala:253)
> 
> org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$allKeysWithValueTypes$1$$anonfun$org$apache$spark$sql$json$JsonRDD$$anonfun$$buildKeyPathForInnerStructs$1$3.apply(JsonRDD.scala:253)
> 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
> ...
> {code}






[jira] [Resolved] (SPARK-4052) Use scala.collection.Map for pattern matching instead of using Predef.Map (it is scala.collection.immutable.Map)

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-4052.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

> Use scala.collection.Map for pattern matching instead of using Predef.Map (it 
> is scala.collection.immutable.Map)
> 
>
> Key: SPARK-4052
> URL: https://issues.apache.org/jira/browse/SPARK-4052
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Minor
> Fix For: 1.2.0
>
>
> Seems ScalaReflection and InsertIntoHiveTable only take 
> scala.collection.immutable.Map as the value type of MapType. Here are test 
> cases showing errors.
> {code}
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> import sqlContext.createSchemaRDD
> val rdd = sc.parallelize(("key", "value") :: Nil)
> // Test1: This one fails.
> case class Test1(m: scala.collection.Map[String, String])
> val rddOfTest1 = rdd.map { case (k, v) => Test1(Map(k->v)) }
> rddOfTest1.registerTempTable("t1")
> /* Stack trace
> scala.MatchError: scala.collection.Map[String,String] (of class 
> scala.reflect.internal.Types$TypeRef$$anon$5)
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:53)
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:64)
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:62)
> ...
> */
> // Test2: This one is fine.
> case class Test2(m: scala.collection.immutable.Map[String, String])
> val rddOfTest2 = rdd.map { case (k, v) => Test2(Map(k->v)) }
> rddOfTest2.registerTempTable("t2")
> sqlContext.sql("SELECT m FROM t2").collect
> sqlContext.sql("SELECT m['key'] FROM t2").collect
> // Test3: This one fails.
> val schema = StructType(StructField("m", MapType(StringType, StringType), 
> true) :: Nil)
> val rowRDD = rdd.map { case (k, v) =>  
> Row(scala.collection.mutable.HashMap(k->v)) }
> val schemaRDD = sqlContext.applySchema(rowRDD, schema)
> schemaRDD.registerTempTable("t3")
> sqlContext.sql("SELECT m FROM t3").collect
> sqlContext.sql("SELECT m['key'] FROM t3").collect
> sqlContext.sql("CREATE TABLE testHiveTable1(m MAP )")
> sqlContext.sql("INSERT OVERWRITE TABLE testHiveTable1 SELECT m FROM t3")
> /* Stack trace
> 14/10/22 19:30:56 INFO DAGScheduler: Job 4 failed: runJob at 
> InsertIntoHiveTable.scala:124, took 1.384579 s
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in 
> stage 4.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4.0 
> (TID 12, yins-mbp): java.lang.ClassCastException: 
> scala.collection.mutable.HashMap cannot be cast to 
> scala.collection.immutable.Map
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$wrapperFor$5.apply(InsertIntoHiveTable.scala:96)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$wrapperFor$5.apply(InsertIntoHiveTable.scala:96)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:148)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:145)
> */
> // Test4: This one is fine.
> val rowRDD = rdd.map { case (k, v) =>  Row(Map(k->v)) }
> val schemaRDD = sqlContext.applySchema(rowRDD, schema)
> schemaRDD.registerTempTable("t4")
> sqlContext.sql("SELECT m FROM t4").collect
> sqlContext.sql("SELECT m['key'] FROM t4").collect
> sqlContext.sql("CREATE TABLE testHiveTable1(m MAP )")
> sqlContext.sql("INSERT OVERWRITE TABLE testHiveTable1 SELECT m FROM t4")
> {code}






[jira] [Resolved] (SPARK-3953) Confusable variable name.

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3953.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2807
[https://github.com/apache/spark/pull/2807]

> Confusable variable name.
> -
>
> Key: SPARK-3953
> URL: https://issues.apache.org/jira/browse/SPARK-3953
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Kousuke Saruta
>Priority: Minor
> Fix For: 1.2.0
>
>
> In SqlParser.scala, there is following code.
> {code}
> case d ~ p ~ r ~ f ~ g ~ h ~ o ~ l  =>
>   val base = r.getOrElse(NoRelation)
>   val withFilter = f.map(f => Filter(f, base)).getOrElse(base)
> {code}
> In the code above, there are two variables with the same name "f" in close 
> proximity: one is the receiver "f" and the other is the bound variable "f".






[jira] [Resolved] (SPARK-3997) scalastyle should output the error location

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3997.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2846
[https://github.com/apache/spark/pull/2846]

> scalastyle should output the error location
> ---
>
> Key: SPARK-3997
> URL: https://issues.apache.org/jira/browse/SPARK-3997
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Guoqiang Li
> Fix For: 1.2.0
>
>
> {{./dev/scalastyle}} =>
> {noformat}
> Scalastyle checks failed at following occurrences:
> java.lang.RuntimeException: exists error
>   at scala.sys.package$.error(package.scala:27)
>   at scala.Predef$.error(Predef.scala:142)
> [error] (mllib/*:scalastyle) exists error
> {noformat}
> scalastyle should output the error location:
> {noformat}
> [error] 
> /Users/witgo/work/code/java/spark/mllib/src/main/scala/org/apache/spark/mllib/feature/TopicModeling.scala:413:
>  File line length exceeds 100 characters
> {noformat}






[jira] [Commented] (SPARK-3266) JavaDoubleRDD doesn't contain max()

2014-10-26 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184696#comment-14184696
 ] 

Josh Rosen commented on SPARK-3266:
---

I've opened a new pull request which tries to work around the Scala issue by 
moving the implementations of these methods from the Java*Like traits into 
abstract base classes that inherit from those traits (essentially making the 
traits act as interfaces).  This breaks binary compatibility from Scala's point 
of view, since the fact that a trait contains a default implementation of a 
method is part of its API contract (it affects implementors of that trait).  I 
don't think there's any legitimate reason for someone to have extended 
JavaRDDLike from their own code, so we shouldn't have to worry about this.

From a simplicity perspective, I prefer the approach from my first PR of 
simply converting JavaRDDLike into an abstract class. This would cause 
problems for Java API users who were invoking methods through the interface, 
though. I can't imagine that most users would have done this, but maybe it's 
important to not break compatibility. On the other hand, the current API is 
functionally broken as long as it's throwing NoSuchMethodErrors.

The one approach that doesn't break _any_ binary compatibility would be to just 
keep the default implementations of methods in JavaRDDLike then copy-paste the 
ones affected by the bugs into the individual JavaRDD classes.  This is a mess, 
but I can do it if necessary.
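
For illustration, a sketch of the restructuring described above using placeholder names (the real types are JavaRDDLike, JavaRDD, JavaDoubleRDD and friends): default method bodies move out of the trait, which is then used as a plain interface, and into an abstract base class that the concrete classes extend.

{code}
// Placeholder types, for illustration only.

// Before: the trait carries default implementations.
trait RDDLikeWithImpls[T] {
  def first(): T
  def describe(): String = "first element: " + first()
}

// After: the trait acts as an interface...
trait RDDLikeInterface[T] {
  def first(): T
  def describe(): String
}

// ...and the shared implementations live in an abstract base class,
// so each concrete class carries the method implementations itself.
abstract class AbstractRDDLike[T] extends RDDLikeInterface[T] {
  override def describe(): String = "first element: " + first()
}

class IntListRDDSketch(xs: Seq[Int]) extends AbstractRDDLike[Int] {
  override def first(): Int = xs.head
}

println(new IntListRDDSketch(Seq(1, 2, 3)).describe())
{code}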

> JavaDoubleRDD doesn't contain max()
> ---
>
> Key: SPARK-3266
> URL: https://issues.apache.org/jira/browse/SPARK-3266
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.0.1, 1.0.2, 1.1.0, 1.2.0
>Reporter: Amey Chaugule
>Assignee: Josh Rosen
> Attachments: spark-repro-3266.tar.gz
>
>
> While my code compiles, when I try to execute it I see:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.spark.api.java.JavaDoubleRDD.max(Ljava/util/Comparator;)Ljava/lang/Double;
> Stepping into the JavaDoubleRDD class, I don't see max(), although it is 
> clearly listed in the documentation.






[jira] [Resolved] (SPARK-3537) Statistics for cached RDDs

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3537.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2860
[https://github.com/apache/spark/pull/2860]

> Statistics for cached RDDs
> --
>
> Key: SPARK-3537
> URL: https://issues.apache.org/jira/browse/SPARK-3537
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Cheng Lian
> Fix For: 1.2.0
>
>
> Right now we only have limited statistics for Hive tables. We could easily 
> collect this data when caching an RDD as well.






[jira] [Commented] (SPARK-3266) JavaDoubleRDD doesn't contain max()

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184688#comment-14184688
 ] 

Apache Spark commented on SPARK-3266:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2951

> JavaDoubleRDD doesn't contain max()
> ---
>
> Key: SPARK-3266
> URL: https://issues.apache.org/jira/browse/SPARK-3266
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.0.1, 1.0.2, 1.1.0, 1.2.0
>Reporter: Amey Chaugule
>Assignee: Josh Rosen
> Attachments: spark-repro-3266.tar.gz
>
>
> While my code compiles, when I try to execute it I see:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.spark.api.java.JavaDoubleRDD.max(Ljava/util/Comparator;)Ljava/lang/Double;
> Stepping into the JavaDoubleRDD class, I don't see max(), although it is 
> clearly listed in the documentation.






[jira] [Resolved] (SPARK-3925) Do not consider the ordering of qualifiers during comparison

2014-10-26 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3925.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2783
[https://github.com/apache/spark/pull/2783]

> Do not consider the ordering of qualifiers during comparison
> 
>
> Key: SPARK-3925
> URL: https://issues.apache.org/jira/browse/SPARK-3925
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
> Fix For: 1.2.0
>
>
> The ordering of qualifiers should not be considered when comparing the old 
> qualifiers with the new qualifiers in 'withQualifiers'.






[jira] [Commented] (SPARK-799) Windows versions of the deploy scripts

2014-10-26 Thread Andrew Tweddle (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184663#comment-14184663
 ] 

Andrew Tweddle commented on SPARK-799:
--

PowerShell is the modern Microsoft shell for Windows.

Do you specifically want .cmd files rather than .ps1? What about .cmd files 
that delegate to .ps1 scripts?

> Windows versions of the deploy scripts
> --
>
> Key: SPARK-799
> URL: https://issues.apache.org/jira/browse/SPARK-799
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Windows
>Reporter: Matei Zaharia
>  Labels: Starter
>
> Although the Spark daemons run fine on Windows with run.cmd, the deploy 
> scripts (bin/start-all.sh and such) don't do so unless you have Cygwin. It 
> would be nice to make .cmd versions of those.






[jira] [Commented] (SPARK-3960) We can apply unary minus only to literal.

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184649#comment-14184649
 ] 

Apache Spark commented on SPARK-3960:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/2949

> We can apply unary minus only to literal.
> -
>
> Key: SPARK-3960
> URL: https://issues.apache.org/jira/browse/SPARK-3960
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Kousuke Saruta
>Priority: Critical
>
> Because of a wrong syntax definition, unary minus can currently be applied only 
> to literals, so we cannot write expressions such as the following.
> {code}
> -(value1 + value2) // Parenthesized expressions
> -column // Columns
> -MAX(column) // Functions
> {code}






[jira] [Commented] (SPARK-3959) SqlParser fails to parse literal -9223372036854775808 (Long.MinValue).

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184648#comment-14184648
 ] 

Apache Spark commented on SPARK-3959:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/2949

> SqlParser fails to parse literal -9223372036854775808 (Long.MinValue).
> --
>
> Key: SPARK-3959
> URL: https://issues.apache.org/jira/browse/SPARK-3959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Kousuke Saruta
>Priority: Critical
>
> SqlParser fails to parse -9223372036854775808 (Long.MinValue), so we cannot 
> write queries such as the following.
> {code}
> SELECT value FROM someTable WHERE value > -9223372036854775808
> {code}






[jira] [Created] (SPARK-4092) Input metrics don't work for coalesce()'d RDD's

2014-10-26 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-4092:
--

 Summary: Input metrics don't work for coalesce()'d RDD's
 Key: SPARK-4092
 URL: https://issues.apache.org/jira/browse/SPARK-4092
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Patrick Wendell
Priority: Critical


In every case where we set input metrics (from both Hadoop and block storage) 
we currently assume that exactly one input partition is computed within the 
task. This is not a correct assumption in the general case. The main example in 
the current API is coalesce(), but user-defined RDDs could also be affected.

To deal with the most general case, we would need to support the notion of a 
single task having multiple input sources. A more surgical and less general fix 
is to simply go to HadoopRDD and check if there are already inputMetrics 
defined for the task with the same "type". If there are, then merge in the new 
data rather than blowing away the old one.

This wouldn't cover the case where, e.g., a single task has input from both 
on-disk and in-memory blocks. It _would_ cover the case where someone calls 
coalesce on a HadoopRDD... which is more common.
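A hedged sketch of the "merge rather than overwrite" idea above (the class and field 
names are illustrative assumptions, not Spark's actual internals):
{code}
// Sketch only: merge input metrics of the same read "type" instead of replacing them.
case class TaskInputMetrics(readMethod: String, var bytesRead: Long)

def recordInput(existing: Option[TaskInputMetrics],
                incoming: TaskInputMetrics): TaskInputMetrics = existing match {
  // The task already recorded input of the same kind: merge the byte counts.
  case Some(m) if m.readMethod == incoming.readMethod =>
    m.bytesRead += incoming.bytesRead
    m
  // Otherwise keep the simple one-input-partition-per-task behavior.
  case _ => incoming
}
{code}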



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2811) update algebird to 0.8.1

2014-10-26 Thread Adam Pingel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184635#comment-14184635
 ] 

Adam Pingel commented on SPARK-2811:


This seemed like an easy first way to contribute to Spark. I created a pull 
request with the 1-line change https://github.com/apache/spark/pull/2947  and 
confirmed that the two uses of Algebird (the streaming examples 
TwitterAlgebirdHLL and TwitterAlgebirdCMS) still work.

> update algebird to 0.8.1
> 
>
> Key: SPARK-2811
> URL: https://issues.apache.org/jira/browse/SPARK-2811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> First algebird_2.11 0.8.1 has to be released



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2811) update algebird to 0.8.1

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184627#comment-14184627
 ] 

Apache Spark commented on SPARK-2811:
-

User 'adampingel' has created a pull request for this issue:
https://github.com/apache/spark/pull/2947

> update algebird to 0.8.1
> 
>
> Key: SPARK-2811
> URL: https://issues.apache.org/jira/browse/SPARK-2811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> First algebird_2.11 0.8.1 has to be released



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1812) Support cross-building with Scala 2.11

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184625#comment-14184625
 ] 

Apache Spark commented on SPARK-1812:
-

User 'adampingel' has created a pull request for this issue:
https://github.com/apache/spark/pull/2947

> Support cross-building with Scala 2.11
> --
>
> Key: SPARK-1812
> URL: https://issues.apache.org/jira/browse/SPARK-1812
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, Spark Core
>Reporter: Matei Zaharia
>Assignee: Prashant Sharma
>
> Since Scala 2.10/2.11 are source compatible, we should be able to cross build 
> for both versions. From what I understand there are basically three things we 
> need to figure out:
> 1. Have a two versions of our dependency graph, one that uses 2.11 
> dependencies and the other that uses 2.10 dependencies.
> 2. Figure out how to publish different poms for 2.10 and 2.11.
> I think (1) can be accomplished by having a scala 2.11 profile. (2) isn't 
> really well supported by Maven since published pom's aren't generated 
> dynamically. But we can probably script around it to make it work. I've done 
> some initial sanity checks with a simple build here:
> https://github.com/pwendell/scala-maven-crossbuild



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4061) We cannot use EOL character in the operand of LIKE predicate.

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184623#comment-14184623
 ] 

Apache Spark commented on SPARK-4061:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/2946

> We cannot use EOL character in the operand of LIKE predicate.
> -
>
> Key: SPARK-4061
> URL: https://issues.apache.org/jira/browse/SPARK-4061
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Kousuke Saruta
>
> We cannot use an EOL character such as \n or \r in the operand of a LIKE predicate,
> so the following condition is never true.
> {code}
> -- someStr is 'hoge\nfuga'
> where someStr LIKE 'hoge_fuga'
> {code}
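One common way to get the expected behavior is to translate the LIKE pattern into a 
regex compiled with DOTALL, so the wildcards also match EOL characters; a hedged Scala 
sketch (illustration only, not Catalyst's actual Like implementation):
{code}
import java.util.regex.Pattern

// Translate a SQL LIKE pattern into a regex; DOTALL lets "_" and "%" span \n and \r.
def likeToRegex(pattern: String): Pattern = {
  val regex = pattern.flatMap {
    case '%' => ".*"
    case '_' => "."
    case c   => Pattern.quote(c.toString)
  }
  Pattern.compile(regex, Pattern.DOTALL)
}

// likeToRegex("hoge_fuga").matcher("hoge\nfuga").matches()   // true
{code}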



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2105) SparkUI doesn't remove active stages that failed

2014-10-26 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184620#comment-14184620
 ] 

Andrew Or commented on SPARK-2105:
--

Hey Josh I think this was fixed by this commit:
https://github.com/apache/spark/commit/d934801d53fc2f1d57d3534ae4e1e9384c7dda99

The root cause is that we were dropping events, which happened because one of 
the listeners was taking too long to process them. We may run into this only if 
the application attaches arbitrary listeners to Spark and those listeners perform 
expensive operations, and from Spark's side I don't think there's anything we can 
do about that.
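For applications that do attach their own listeners, the usual mitigation is to keep 
the callbacks cheap and hand expensive work to a separate thread. A hedged sketch 
(SparkListener and SparkListenerStageCompleted are Spark's public developer API; the 
single-thread executor is just an assumption for illustration):
{code}
import java.util.concurrent.Executors
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

class CheapListener extends SparkListener {
  // Do expensive bookkeeping off the listener bus thread so events are not dropped.
  private val worker = Executors.newSingleThreadExecutor()

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    worker.submit(new Runnable {
      def run(): Unit = {
        // expensive per-stage processing goes here
      }
    })
  }
}
{code}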

> SparkUI doesn't remove active stages that failed
> 
>
> Key: SPARK-2105
> URL: https://issues.apache.org/jira/browse/SPARK-2105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>
> If a stage fails because its tasks cannot be serialized, for instance, the 
> failed stage remains in the Active Stages section forever. This is because 
> the StageCompleted event is never posted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3590) Expose async APIs in the Java API

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3590.
---
   Resolution: Fixed
Fix Version/s: 1.2.0
 Assignee: Josh Rosen

> Expose async APIs in the Java API
> -
>
> Key: SPARK-3590
> URL: https://issues.apache.org/jira/browse/SPARK-3590
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Marcelo Vanzin
>Assignee: Josh Rosen
> Fix For: 1.2.0
>
>
> Currently, a single async method is exposed through the Java API 
> (JavaRDDLike::foreachAsync). That method returns a Scala future 
> (FutureAction).
> We should bring the Java API up to sync with the Scala async APIs, and also 
> expose Java-friendly types (e.g. a proper java.util.concurrent.Future).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3274) Spark Streaming Java API reports java.lang.ClassCastException when calling collectAsMap on JavaPairDStream

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3274.
---
Resolution: Invalid

> Spark Streaming Java API reports java.lang.ClassCastException when calling 
> collectAsMap on JavaPairDStream
> --
>
> Key: SPARK-3274
> URL: https://issues.apache.org/jira/browse/SPARK-3274
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.0.2
>Reporter: Jack Hu
>
> Reproduce code:
> scontext
>   .socketTextStream("localhost", 1)
>   .mapToPair(new PairFunction<String, String, String>() {
>     public Tuple2<String, String> call(String arg0) throws Exception {
>       return new Tuple2<String, String>("1", arg0);
>     }
>   })
>   .foreachRDD(new Function2<JavaPairRDD<String, String>, Time, Void>() {
>     public Void call(JavaPairRDD<String, String> v1, Time v2) throws Exception {
>       System.out.println(v2.toString() + ": " + v1.collectAsMap().toString());
>       return null;
>     }
>   });
> Exception:
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lscala.Tuple2;
> at org.apache.spark.rdd.PairRDDFunctions.collectAsMap(PairRDDFunctions.scala:447)
> at org.apache.spark.api.java.JavaPairRDD.collectAsMap(JavaPairRDD.scala:464)
> at tuk.usecase.failedcall.FailedCall$1.call(FailedCall.java:90)
> at tuk.usecase.failedcall.FailedCall$1.call(FailedCall.java:88)
> at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:282)
> at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:282)
> at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
> at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at scala.util.Try$.apply(Try.scala:161)
> at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
> at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2105) SparkUI doesn't remove active stages that failed

2014-10-26 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184613#comment-14184613
 ] 

Josh Rosen commented on SPARK-2105:
---

I tried and failed to reproduce this: 
https://github.com/apache/spark/commit/bf589fc717c842d1998e3c3a523bc8775cb30269#diff-f346ada4cd59416756b6dd36b6c2605aR97

That doesn't mean that we've fixed the issue, though.  In my tests, the stage 
never becomes active because the ClosureCleaner detects that the task isn't 
serializable.  Maybe there's some UDF that manages to slip through the closure 
cleaning step and fails once the stage is submitted to the scheduler, so it's 
still possible that we could hit this bug.

> SparkUI doesn't remove active stages that failed
> 
>
> Key: SPARK-2105
> URL: https://issues.apache.org/jira/browse/SPARK-2105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>
> If a stage fails because its tasks cannot be serialized, for instance, the 
> failed stage remains in the Active Stages section forever. This is because 
> the StageCompleted event is never posted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3021) Job remains in Active Stages after failing

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3021.
---
   Resolution: Cannot Reproduce
Fix Version/s: 1.2.0
 Assignee: Josh Rosen

I tried to reproduce this in Selenium 
(https://github.com/apache/spark/commit/bf589fc717c842d1998e3c3a523bc8775cb30269#diff-f346ada4cd59416756b6dd36b6c2605aR87),
 but wasn't able to find a reproduction in Spark 1.2.  Therefore, I'm going to 
resolve this as "Cannot Reproduce" for now.

> Job remains in Active Stages after failing
> --
>
> Key: SPARK-3021
> URL: https://issues.apache.org/jira/browse/SPARK-3021
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.1.0
>Reporter: Michael Armbrust
>Assignee: Josh Rosen
> Fix For: 1.2.0
>
>
> It died with the following exception, but i still hanging out in the UI.
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in 
> stage 8.1 failed 4 times, most recent failure: Lost task 20.3 in stage 8.1 
> (TID 710, ip-10-0-166-165.us-west-2.compute.internal): ExecutorLostFailure 
> (executor lost)
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1153)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1142)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1141)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2527) incorrect persistence level shown in Spark UI after repersisting

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-2527:
--
Fix Version/s: 1.2.0

> incorrect persistence level shown in Spark UI after repersisting
> 
>
> Key: SPARK-2527
> URL: https://issues.apache.org/jira/browse/SPARK-2527
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Diana Carroll
>Assignee: Josh Rosen
> Fix For: 1.2.0
>
> Attachments: persistbug1.png, persistbug2.png
>
>
> If I persist an RDD at one level, unpersist it, then repersist it at another 
> level, the UI will continue to show the RDD at the first level...but 
> correctly show individual partitions at the second level.
> {code}
> import org.apache.spark.api.java.StorageLevels
> import org.apache.spark.api.java.StorageLevels._
> val test1 = sc.parallelize(Array(1,2,3))
> test1.persist(StorageLevels.DISK_ONLY)
> test1.count()
> test1.unpersist()
> test1.persist(StorageLevels.MEMORY_ONLY)
> test1.count()
> {code}
> after the first call to persist and count, the Spark App web UI shows:
> RDD Storage Info for 14 Storage Level: Disk Serialized 1x Replicated 
> rdd_14_0  Disk Serialized 1x Replicated
> After the second call, it shows:
> RDD Storage Info for 14 Storage Level: Disk Serialized 1x Replicated 
> rdd_14_0  Memory Deserialized 1x Replicated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2527) incorrect persistence level shown in Spark UI after repersisting

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-2527.
---
Resolution: Cannot Reproduce
  Assignee: Josh Rosen

I think that this was fixed in either 1.1 or 1.2 since I was unable to 
reproduce this when writing a Selenium test to run your example script:

https://github.com/apache/spark/commit/bf589fc717c842d1998e3c3a523bc8775cb30269#diff-f346ada4cd59416756b6dd36b6c2605aR53

Therefore, I'm going to mark this as "Cannot Reproduce" since it was probably 
fixed.  Please re-open this ticket if you observe this in the wild with a newer 
version of Spark.

> incorrect persistence level shown in Spark UI after repersisting
> 
>
> Key: SPARK-2527
> URL: https://issues.apache.org/jira/browse/SPARK-2527
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Diana Carroll
>Assignee: Josh Rosen
> Attachments: persistbug1.png, persistbug2.png
>
>
> If I persist an RDD at one level, unpersist it, then repersist it at another 
> level, the UI will continue to show the RDD at the first level...but 
> correctly show individual partitions at the second level.
> {code}
> import org.apache.spark.api.java.StorageLevels
> import org.apache.spark.api.java.StorageLevels._
> val test1 = sc.parallelize(Array(1,2,3))
> test1.persist(StorageLevels.DISK_ONLY)
> test1.count()
> test1.unpersist()
> test1.persist(StorageLevels.MEMORY_ONLY)
> test1.count()
> {code}
> after the first call to persist and count, the Spark App web UI shows:
> RDD Storage Info for 14 Storage Level: Disk Serialized 1x Replicated 
> rdd_14_0  Disk Serialized 1x Replicated
> After the second call, it shows:
> RDD Storage Info for 14 Storage Level: Disk Serialized 1x Replicated 
> rdd_14_0  Memory Deserialized 1x Replicated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2698) RDD pages shows negative bytes remaining for some executors

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-2698:
--
Summary: RDD pages shows negative bytes remaining for some executors  (was: 
RDD page Spark Web UI bug)

> RDD pages shows negative bytes remaining for some executors
> ---
>
> Key: SPARK-2698
> URL: https://issues.apache.org/jira/browse/SPARK-2698
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Hossein Falaki
> Attachments: spark ui.png
>
>
> The RDD page shows negative bytes remaining for some executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3616) Add Selenium tests to Web UI

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3616.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2474
[https://github.com/apache/spark/pull/2474]

> Add Selenium tests to Web UI
> 
>
> Key: SPARK-3616
> URL: https://issues.apache.org/jira/browse/SPARK-3616
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.2.0
>
>
> We should add basic Selenium tests to Web UI suite.  This will make it easy 
> to write regression tests / reproductions for UI bugs and will be useful in 
> testing some planned refactorings / redesigns that I'm working on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1758) failing test org.apache.spark.JavaAPISuite.wholeTextFiles

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-1758.
---
Resolution: Cannot Reproduce

Resolving this as "Cannot Reproduce" for now, since I haven't observed this 
problem and both PRs for this were closed.

> failing test org.apache.spark.JavaAPISuite.wholeTextFiles
> -
>
> Key: SPARK-1758
> URL: https://issues.apache.org/jira/browse/SPARK-1758
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.0.0
>Reporter: Nishkam Ravi
> Fix For: 1.0.0
>
> Attachments: SPARK-1758.patch
>
>
> Test org.apache.spark.JavaAPISuite.wholeTextFiles fails (during sbt/sbt test) 
> with the following error message:
> Test org.apache.spark.JavaAPISuite.wholeTextFiles failed: 
> java.lang.AssertionError: expected: but was:



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3962) Mark spark dependency as "provided" in external libraries

2014-10-26 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184580#comment-14184580
 ] 

Patrick Wendell commented on SPARK-3962:


[~prashant_] can you take a crack at this? It's pretty simple, we just want the 
streaming external projects to mark spark-core as provided.
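
For reference, a minimal sketch of what "provided" scope looks like in an sbt build 
(assumption: this is the user-application side; the issue itself targets the external 
connectors' Maven poms):
{code}
// sbt sketch (assumption: an application build, not Spark's own poms).
// "provided" dependencies are available at compile time but are not
// packaged into the application's assembly jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.2.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.2.0" % "provided"
)
{code}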

> Mark spark dependency as "provided" in external libraries
> -
>
> Key: SPARK-3962
> URL: https://issues.apache.org/jira/browse/SPARK-3962
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Patrick Wendell
>Assignee: Prashant Sharma
>Priority: Blocker
>
> Right now there is not an easy way for users to link against the external 
> streaming libraries and not accidentally pull Spark into their assembly jar. 
> We should mark Spark as "provided" in the external connector pom's so that 
> user applications can simply include those like any other dependency in the 
> user's jar.
> This is also the best format for third-party libraries that depend on Spark 
> (of which there will eventually be many) so it would be nice for our own 
> build to conform to this nicely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3962) Mark spark dependency as "provided" in external libraries

2014-10-26 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184580#comment-14184580
 ] 

Patrick Wendell edited comment on SPARK-3962 at 10/26/14 6:11 PM:
--

[~prashant_] can you take a crack at this? It's pretty simple, we just want the 
streaming external projects to mark spark-core and spark-streaming as provided.


was (Author: pwendell):
[~prashant_] can you take a crack at this? It's pretty simple, we just want the 
streaming external projects to mark spark-core as provided.

> Mark spark dependency as "provided" in external libraries
> -
>
> Key: SPARK-3962
> URL: https://issues.apache.org/jira/browse/SPARK-3962
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Patrick Wendell
>Assignee: Prashant Sharma
>Priority: Blocker
>
> Right now there is not an easy way for users to link against the external 
> streaming libraries and not accidentally pull Spark into their assembly jar. 
> We should mark Spark as "provided" in the external connector pom's so that 
> user applications can simply include those like any other dependency in the 
> user's jar.
> This is also the best format for third-party libraries that depend on Spark 
> (of which there will eventually be many) so it would be nice for our own 
> build to conform to this nicely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3962) Mark spark dependency as "provided" in external libraries

2014-10-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3962:
---
Assignee: Prashant Sharma

> Mark spark dependency as "provided" in external libraries
> -
>
> Key: SPARK-3962
> URL: https://issues.apache.org/jira/browse/SPARK-3962
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Patrick Wendell
>Assignee: Prashant Sharma
>Priority: Blocker
>
> Right now there is not an easy way for users to link against the external 
> streaming libraries and not accidentally pull Spark into their assembly jar. 
> We should mark Spark as "provided" in the external connector pom's so that 
> user applications can simply include those like any other dependency in the 
> user's jar.
> This is also the best format for third-party libraries that depend on Spark 
> (of which there will eventually be many) so it would be nice for our own 
> build to conform to this nicely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-10-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2633.

Resolution: Duplicate

I believe the design of SPARK-2321 is such that it covers Hive's use case. So 
I'm closing this as a dup of that issue.

> enhance spark listener API to gather more spark job information
> ---
>
> Key: SPARK-2633
> URL: https://issues.apache.org/jira/browse/SPARK-2633
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API
>Reporter: Chengxiang Li
>Priority: Critical
>  Labels: hive
> Attachments: Spark listener enhancement for Hive on Spark job monitor 
> and statistic.docx
>
>
> Based on the Hive-on-Spark job status monitoring and statistics collection 
> requirements, enhance the Spark listener API to gather more Spark job 
> information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4091) Occasionally spark.local.dir can be deleted twice and causes test failure

2014-10-26 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184577#comment-14184577
 ] 

Josh Rosen commented on SPARK-4091:
---

This looks like a duplicate of SPARK-3970.

> Occasionally spark.local.dir can be deleted twice and causes test failure
> -
>
> Key: SPARK-4091
> URL: https://issues.apache.org/jira/browse/SPARK-4091
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Cheng Lian
>
> By persisting an arbitrary RDD with storage level {{MEMORY_AND_DISK}}, Spark 
> may occasionally throw the following exception when shutting down:
> {code}
> java.io.IOException: Failed to list files for dir: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027005012-5bcd/0b
>   at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:664)
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
> at 
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
>   at 
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
>   at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)
> {code}
> By adding log output to {{Utils.deleteRecursively}}, setting breakpoints at 
> {{File.delete}} in IntelliJ, and asking IntelliJ to evaluate and log 
> {{Thread.currentThread().getStackTrace()}} when the breakpoint is hit rather 
> than suspend execution, we can get the following result, which shows 
> {{spark.local.dir}} is deleted twice from both {{DiskBlockManager.stop}} and 
> the shutdown hook installed in {{Utils}}:
> {code}
> +++ Deleting file: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
> Breakpoint reached at java.io.File.delete(File.java:1028)
> [java.lang.Thread.getStackTrace(Thread.java:1589)
>   java.io.File.delete(File.java:1028)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
>   scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
>   org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)]
> +++ Deleting file: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
> Breakpoint reached at java.io.File.delete(File.java:1028)
> [java.lang.Thread.getStackTrace(Thread.java:1589)
>   java.io.File.delete(File.java:1028)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)
>   
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
>   
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
>   
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
>   
> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:157)
>   
> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:154)
>   
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   
> org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:154)
>  

[jira] [Commented] (SPARK-2532) Fix issues with consolidated shuffle

2014-10-26 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184576#comment-14184576
 ] 

Patrick Wendell commented on SPARK-2532:


Hey [~matei] - you created some sub-tasks here that are pretty tersely 
described... would you mind looking through them and deciding whether these are 
still relevant? Not sure whether we can close this.

> Fix issues with consolidated shuffle
> 
>
> Key: SPARK-2532
> URL: https://issues.apache.org/jira/browse/SPARK-2532
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
> Environment: All
>Reporter: Mridul Muralidharan
>Assignee: Mridul Muralidharan
>Priority: Critical
>
> Will file PR with changes as soon as merge is done (earlier merge became 
> outdated in 2 weeks unfortunately :) ).
> Consolidated shuffle is broken in multiple ways in Spark:
> a) Task failure(s) can cause the state to become inconsistent.
> b) Multiple reverts, or combinations of close/revert/close, can cause the state 
> to be inconsistent (as part of exception/error handling).
> c) Some of the APIs in the block writer cause implementation issues - for 
> example, a revert is always followed by a close, but the implementation tries to 
> keep them separate, creating surface area for errors.
> d) Fetching data from consolidated shuffle files can go badly wrong if the 
> file is being actively written to: it computes length by subtracting the next 
> offset from the current offset (or the file length if this is the last offset) - 
> the latter fails when a fetch happens in parallel with a write.
> Note, this happens even if there are no task failures of any kind!
> This usually results in stream corruption or decompression errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3917) Compress data before network transfer

2014-10-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3917:
---
Priority: Major  (was: Critical)

> Compress data before network transfer
> -
>
> Key: SPARK-3917
> URL: https://issues.apache.org/jira/browse/SPARK-3917
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.1.0
> Environment: All
>Reporter: junlong
> Fix For: 1.1.0
>
>
> When training a Gradient Boosted Decision Tree on large sparse data, heavy 
> network traffic pulls down CPU utilization. Compressing the data sent over the 
> network reduces its volume by about 90%.
> So compressing data before transferring it may provide a higher speedup in 
> Spark, and users could configure whether or not to compress.
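For context, shuffle and broadcast traffic can already be compressed through 
configuration; a minimal sketch of the existing knobs (assuming the standard Spark 1.1 
settings, not a new feature):
{code}
import org.apache.spark.SparkConf

// Sketch only: existing compression settings relevant to network-heavy workloads.
val conf = new SparkConf()
  .set("spark.shuffle.compress", "true")      // compress map output files
  .set("spark.broadcast.compress", "true")    // compress broadcast variables
  .set("spark.io.compression.codec", "org.apache.spark.io.SnappyCompressionCodec")
{code}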



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-10-26 Thread koert kuipers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184575#comment-14184575
 ] 

koert kuipers commented on SPARK-3655:
--

Can you assign this to me?
I will have two pull requests ready in a few days.



> Support sorting of values in addition to keys (i.e. secondary sort)
> ---
>
> Key: SPARK-3655
> URL: https://issues.apache.org/jira/browse/SPARK-3655
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: koert kuipers
>Priority: Minor
>
> Now that spark has a sort based shuffle, can we expect a secondary sort soon? 
> There are some use cases where getting a sorted iterator of values per key is 
> helpful.
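For readers unfamiliar with the term, a hedged sketch of the in-memory workaround that 
a secondary sort would replace (assuming an existing SparkContext sc; illustration 
only, not the proposed API):
{code}
// Today: group values per key, then sort each group in memory.
// A shuffle-backed secondary sort would deliver values already sorted,
// without materializing the whole group for very large value lists.
val pairs = sc.parallelize(Seq(("a", 3), ("a", 1), ("b", 2), ("a", 2)))
val sortedValuesPerKey = pairs.groupByKey().mapValues(_.toSeq.sorted)
// sortedValuesPerKey.collect()  // e.g. Array((a, List(1, 2, 3)), (b, List(2)))
{code}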



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2760) Caching tables from multiple databases does not work

2014-10-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2760:
---
Component/s: SQL

> Caching tables from multiple databases does not work
> 
>
> Key: SPARK-2760
> URL: https://issues.apache.org/jira/browse/SPARK-2760
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Michael Armbrust
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4085) Job will fail if a shuffle file that's read locally gets deleted

2014-10-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4085:
---
Component/s: Spark Core

> Job will fail if a shuffle file that's read locally gets deleted
> 
>
> Key: SPARK-4085
> URL: https://issues.apache.org/jira/browse/SPARK-4085
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Kay Ousterhout
>Assignee: Reynold Xin
>Priority: Critical
>
> This commit: 
> https://github.com/apache/spark/commit/665e71d14debb8a7fc1547c614867a8c3b1f806a
>  changed the behavior of fetching local shuffle blocks such that if a shuffle 
> block is not found locally, the shuffle block is no longer marked as failed, 
> and a fetch failed exception is not thrown (this is because the "catch" block 
> here won't ever be invoked: 
> https://github.com/apache/spark/commit/665e71d14debb8a7fc1547c614867a8c3b1f806a#diff-e6e1631fa01e17bf851f49d30d028823R202
>  because the exception called from getLocalFromDisk() doesn't get thrown 
> until next() gets called on the iterator).
> [~rxin] [~matei] it looks like you guys changed the test for this to catch 
> the new exception that gets thrown 
> (https://github.com/apache/spark/commit/665e71d14debb8a7fc1547c614867a8c3b1f806a#diff-9c2e1918319de967045d04caf813a7d1R93).
>   Was that intentional?  Because the new exception is a SparkException and 
> not a FetchFailedException, jobs with missing local shuffle data will now 
> fail, rather than having the map stage get retried.
> This problem is reproducible with this test case:
> {code}
>   test("hash shuffle manager recovers when local shuffle files get deleted") {
> val conf = new SparkConf(false)
> conf.set("spark.shuffle.manager", "hash")
> sc = new SparkContext("local", "test", conf)
> val rdd = sc.parallelize(1 to 10, 2).map((_, 1)).reduceByKey(_+_)
> rdd.count()
> // Delete one of the local shuffle blocks.
> sc.env.blockManager.diskBlockManager.getFile(new ShuffleBlockId(0, 0, 
> 0)).delete()
> rdd.count()
>   }
> {code}
> which will fail on the second rdd.count().
> This is a regression from 1.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4056) Upgrade snappy-java to 1.1.1.5

2014-10-26 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184570#comment-14184570
 ] 

Josh Rosen commented on SPARK-4056:
---

We reverted the 1.1.1.5 upgrade after discovering that it caused a memory leak.  
It looks like this has been fixed in 1.1.1.6, if we still want to upgrade.

> Upgrade snappy-java to 1.1.1.5
> --
>
> Key: SPARK-4056
> URL: https://issues.apache.org/jira/browse/SPARK-4056
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.1.1, 1.2.0
>
>
> We should upgrade snappy-java to 1.1.1.5 across all of our maintenance 
> branches.  This release improves error messages when attempting to 
> deserialize empty inputs using SnappyInputStream (this operation is always an 
> error, but the old error messages made it hard to distinguish failures due to 
> empty streams from ones due to reading invalid / corrupted streams); see 
> https://github.com/xerial/snappy-java/issues/89 for more context.
> This should be a major help in the Snappy debugging work that I've been doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4056) Upgrade snappy-java to 1.1.1.5

2014-10-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4056:
---
Component/s: Spark Core

> Upgrade snappy-java to 1.1.1.5
> --
>
> Key: SPARK-4056
> URL: https://issues.apache.org/jira/browse/SPARK-4056
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.1.1, 1.2.0
>
>
> We should upgrade snappy-java to 1.1.1.5 across all of our maintenance 
> branches.  This release improves error messages when attempting to 
> deserialize empty inputs using SnappyInputStream (this operation is always an 
> error, but the old error messages made it hard to distinguish failures due to 
> empty streams from ones due to reading invalid / corrupted streams); see 
> https://github.com/xerial/snappy-java/issues/89 for more context.
> This should be a major help in the Snappy debugging work that I've been doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-10-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3655:
---
Summary: Support sorting of values in addition to keys (i.e. secondary 
sort)  (was: Secondary sort)

> Support sorting of values in addition to keys (i.e. secondary sort)
> ---
>
> Key: SPARK-3655
> URL: https://issues.apache.org/jira/browse/SPARK-3655
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: koert kuipers
>Priority: Minor
>
> Now that spark has a sort based shuffle, can we expect a secondary sort soon? 
> There are some use cases where getting a sorted iterator of values per key is 
> helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3655) Secondary sort

2014-10-26 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184565#comment-14184565
 ] 

Patrick Wendell commented on SPARK-3655:


Okay, sounds good.

> Secondary sort
> --
>
> Key: SPARK-3655
> URL: https://issues.apache.org/jira/browse/SPARK-3655
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: koert kuipers
>Priority: Minor
>
> Now that spark has a sort based shuffle, can we expect a secondary sort soon? 
> There are some use cases where getting a sorted iterator of values per key is 
> helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4064) NioBlockTransferService should deal with empty messages correctly

2014-10-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4064:
---
Summary: NioBlockTransferService should deal with empty messages correctly  
(was: If we create a lot of  big broadcast variables, Spark has great 
possibility to hang)

> NioBlockTransferService should deal with empty messages correctly
> -
>
> Key: SPARK-4064
> URL: https://issues.apache.org/jira/browse/SPARK-4064
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>Priority: Critical
> Fix For: 1.2.0
>
> Attachments: executor.log, jstack.txt, screenshot.png
>
>
> When I test [the PR 1983|https://github.com/apache/spark/pull/1983], Spark 
> hangs about one time in three.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4090) Memory leak in snappy-java 1.1.1.4/5

2014-10-26 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184562#comment-14184562
 ] 

Davies Liu commented on SPARK-4090:
---

[~joshrosen] 1.1.1.6 is released.

> Memory leak in snappy-java 1.1.1.4/5
> 
>
> Key: SPARK-4090
> URL: https://issues.apache.org/jira/browse/SPARK-4090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Davies Liu
>Priority: Blocker
> Attachments: screenshot-12.png
>
>
> There is a memory-leak bug in snappy-java 1.1.1.4/5; we should roll back to 
> 1.1.1.3 or wait for a bugfix.
> The Jenkins tests timed out or hit OOM multiple times recently. While testing it 
> locally, I got a heap dump of the leaked JVM, and then found that it's a bug in 
> recent releases of snappy-java:
> {code}
> +inputBuffer = inputBufferAllocator.allocate(inputSize);
> +outputBuffer = inputBufferAllocator.allocate(outputSize);
> {code}
> The outputBuffer is allocated from inputBufferAllocator but released to 
> outputBufferAllocator: https://github.com/xerial/snappy-java/issues/91



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4091) Occasionally spark.local.dir can be deleted twice and causes test failure

2014-10-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184560#comment-14184560
 ] 

Apache Spark commented on SPARK-4091:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/2945

> Occasionally spark.local.dir can be deleted twice and causes test failure
> -
>
> Key: SPARK-4091
> URL: https://issues.apache.org/jira/browse/SPARK-4091
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Cheng Lian
>
> By persisting an arbitrary RDD with storage level {{MEMORY_AND_DISK}}, Spark 
> may occasionally throw the following exception when shutting down:
> {code}
> java.io.IOException: Failed to list files for dir: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027005012-5bcd/0b
>   at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:664)
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
> at 
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
>   at 
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   at 
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
>   at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)
> {code}
> By adding log output to {{Utils.deleteRecursively}}, setting breakpoints at 
> {{File.delete}} in IntelliJ, and asking IntelliJ to evaluate and log 
> {{Thread.currentThread().getStackTrace()}} when the breakpoint is hit rather 
> than suspend execution, we can get the following result, which shows 
> {{spark.local.dir}} is deleted twice from both {{DiskBlockManager.stop}} and 
> the shutdown hook installed in {{Utils}}:
> {code}
> +++ Deleting file: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
> Breakpoint reached at java.io.File.delete(File.java:1028)
> [java.lang.Thread.getStackTrace(Thread.java:1589)
>   java.io.File.delete(File.java:1028)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
>   scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   
> org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
>   org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
>   org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)]
> +++ Deleting file: 
> /var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
> Breakpoint reached at java.io.File.delete(File.java:1028)
> [java.lang.Thread.getStackTrace(Thread.java:1589)
>   java.io.File.delete(File.java:1028)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)
>   
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
>   
> org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
>   
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
>   
> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:157)
>   
> org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:154)
>   
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   
> org.apache.spar

[jira] [Updated] (SPARK-4091) Occasionally spark.local.dir can be deleted twice and causes test failure

2014-10-26 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-4091:
--
Description: 
By persisting an arbitrary RDD with storage level {{MEMORY_AND_DISK}}, Spark 
may occasionally throw the following exception when shutting down:
{code}
java.io.IOException: Failed to list files for dir: 
/var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027005012-5bcd/0b
at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:664)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
at 
org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
at 
org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)
{code}
By adding log output to {{Utils.deleteRecursively}}, setting breakpoints at 
{{File.delete}} in IntelliJ, and asking IntelliJ to evaluate and log 
{{Thread.currentThread().getStackTrace()}} when the breakpoint is hit rather 
than suspend execution, we can get the following result, which shows 
{{spark.local.dir}} is deleted twice from both {{DiskBlockManager.stop}} and 
the shutdown hook installed in {{Utils}}:
{code}
+++ Deleting file: 
/var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
Breakpoint reached at java.io.File.delete(File.java:1028)
[java.lang.Thread.getStackTrace(Thread.java:1589)
java.io.File.delete(File.java:1028)
org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
scala.collection.mutable.HashSet.foreach(HashSet.scala:79)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)]
+++ Deleting file: 
/var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
Breakpoint reached at java.io.File.delete(File.java:1028)
[java.lang.Thread.getStackTrace(Thread.java:1589)
java.io.File.delete(File.java:1028)
org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)

org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)

org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)

scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)

org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:157)

org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:154)

scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)

org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:154)

org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:147)

org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:145)

org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:145)
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)

org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:145)]
{code}
When this bug happens during Jenkins build, it fails {{CliSuite}}.
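The race is that {{DiskBlockManager.stop}} and the shutdown hook installed in 
{{Utils}} both try to delete {{spark.local.dir}}. A hedged sketch of one way to make 
the two paths idempotent (illustration only, not the actual fix):
{code}
import java.io.File
import scala.collection.mutable

// Sketch only: remember directories already removed so a second shutdown path
// skips them instead of failing while listing a half-deleted tree.
object ShutdownDeleter {
  private val deleted = mutable.Set[String]()

  def deleteOnce(dir: File): Unit = deleted.synchronized {
    if (deleted.add(dir.getAbsolutePath) && dir.exists()) {
      deleteRecursively(dir)
    }
  }

  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).toSeq.flatten.foreach(deleteRecursively)
    f.delete()
  }
}
{code}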


[jira] [Created] (SPARK-4091) Occasionally spark.local.dir can be deleted twice and causes test failure

2014-10-26 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-4091:
-

 Summary: Occasionally spark.local.dir can be deleted twice and 
causes test failure
 Key: SPARK-4091
 URL: https://issues.apache.org/jira/browse/SPARK-4091
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Cheng Lian


By persisting an arbitrary RDD with storage level {{MEMORY_AND_DISK}}, Spark 
may occasionally throw the following exception when shutting down:
{code}
java.io.IOException: Failed to list files for dir: 
/var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027005012-5bcd/0b
at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:664)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
at 
org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)
at 
org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
at 
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)
{code}
By adding log output to {{Utils.deleteRecursively}}, setting a breakpoint at 
{{File.delete}} in IntelliJ, and asking IntelliJ to evaluate and log 
{{Thread.currentThread().getStackTrace()}} when the breakpoint is hit rather 
than suspending execution, we get the following result, which shows that 
{{spark.local.dir}} is deleted twice: once from {{DiskBlockManager.stop}} and 
once from the shutdown hook installed in {{Utils}}:
{code}
+++ Deleting file: 
/var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
Breakpoint reached at java.io.File.delete(File.java:1028)
[java.lang.Thread.getStackTrace(Thread.java:1589)
java.io.File.delete(File.java:1028)
org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:177)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:175)
scala.collection.mutable.HashSet.foreach(HashSet.scala:79)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:175)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)

org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:173)
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1323)
org.apache.spark.util.Utils$$anon$4.run(Utils.scala:173)]
+++ Deleting file: 
/var/folders/kh/r9ylmzln40n9nrlchnsry2qwgn/T/spark-local-20141027003412-7fae/1d
Breakpoint reached at java.io.File.delete(File.java:1028)
[java.lang.Thread.getStackTrace(Thread.java:1589)
java.io.File.delete(File.java:1028)
org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:695)

org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:680)

org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:678)

scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:678)

org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:157)

org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:154)

scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)

org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:154)

org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:147)

org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:145)

org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:145)
org.apache.spark.util.Utils$.logUncaughtEx

[jira] [Commented] (SPARK-1473) Feature selection for high dimensional datasets

2014-10-26 Thread David Martinez Rego (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184457#comment-14184457
 ] 

David Martinez Rego commented on SPARK-1473:


Dear Sam,

Thank you for the invitation. Funnily enough, I am a regular at the meetups and 
have already been invited by Martin Goodson to give a talk about ... "selected 
topics on ML in Big Data". I currently have a lab in Spain polishing the code 
and deploying it on a cluster to demonstrate its performance (and to support a 
future pull request). Dr. Brown has suggested a couple of improvements to me 
that use semi-supervised data. When we have solid results, at least on my side, 
I would love to share them with the community.

> Feature selection for high dimensional datasets
> ---
>
> Key: SPARK-1473
> URL: https://issues.apache.org/jira/browse/SPARK-1473
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ignacio Zendejas
>Assignee: Alexander Ulanov
>Priority: Minor
>  Labels: features
>
> For classification tasks involving large feature spaces on the order of tens 
> of thousands of features or more (e.g., text classification with n-grams, 
> where n > 1), it is often useful to rank and filter out irrelevant features, 
> thereby reducing the feature space by at least one or two orders of magnitude 
> without impacting performance on key evaluation metrics 
> (accuracy/precision/recall).
> A flexible feature evaluation interface needs to be designed, and at least 
> two methods should be implemented, with Information Gain being a priority, as 
> it has been shown to be among the most reliable.
> Special consideration should be taken in the design to account for wrapper 
> methods (see the research papers below), which are more practical for 
> lower-dimensional data.
> Relevant research:
> * Brown, G., Pocock, A., Zhao, M. J., & Luján, M. (2012). Conditional 
> likelihood maximisation: a unifying framework for information theoretic 
> feature selection. *The Journal of Machine Learning Research*, *13*, 27-66.
> * Forman, George. "An extensive empirical study of feature selection metrics 
> for text classification." The Journal of Machine Learning Research 3 (2003): 
> 1289-1305.
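
For readers skimming the thread, here is a minimal, non-distributed sketch of 
the Information Gain criterion mentioned in the description above. This is 
plain Scala over in-memory sequences; the object and method names are made up 
for illustration and nothing here is MLlib API:

{code}
object InfoGainSketch {
  /** Shannon entropy (in bits) of a distribution given as raw counts. */
  private def entropy(counts: Iterable[Long]): Double = {
    val total = counts.sum.toDouble
    counts.filter(_ > 0).map { c =>
      val p = c / total
      -p * math.log(p) / math.log(2)
    }.sum
  }

  /** IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column X and labels Y. */
  def informationGain(labels: Seq[Int], feature: Seq[Int]): Double = {
    require(labels.size == feature.size && labels.nonEmpty)
    val n = labels.size.toDouble
    val hY = entropy(labels.groupBy(identity).values.map(_.size.toLong))
    val hYGivenX = feature.zip(labels).groupBy(_._1).values.map { rows =>
      (rows.size / n) * entropy(rows.groupBy(_._2).values.map(_.size.toLong))
    }.sum
    hY - hYGivenX
  }
}
{code}

Ranking then amounts to computing this score per feature column and keeping the 
top k; the Brown et al. framework cited above generalises this to criteria that 
also condition on already-selected features.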



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1473) Feature selection for high dimensional datasets

2014-10-26 Thread sam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184453#comment-14184453
 ] 

sam commented on SPARK-1473:


[~gbr...@cs.man.ac.uk] Thanks for taking the time to respond to my questions, 
and thank you again for writing the paper; I always enjoy reading foundational 
(i.e., information-theoretic) approaches to Machine Learning.

Regarding your final point about empiricism, yes, this is better than 
"arbitrary", so my original comment was too strong. I was hoping for the same 
kind of foundational approach used to define the feature selection, and I am 
optimistic that a principled approach to defining independence exists (which I 
think would also link with estimation).

I notice that your email address indicates that you are at Manchester 
University (I must have overlooked this when reading the paper - typical 
mathematician). This is where I learnt about Information Theory - in the maths 
department; Jeff Paris, George Wilmers, Vencovska, etc. have all done sterling 
work.

Do you ever come to London? Do you have any interest in applications? We have a 
Spark Meetup in London, and it would be great if you could attend - it is much 
easier to share ideas in person. Perhaps you and [~torito1984] might even be 
willing to give a talk on "Information Theoretic Feature Selection with 
Implementation in Spark"?

> Feature selection for high dimensional datasets
> ---
>
> Key: SPARK-1473
> URL: https://issues.apache.org/jira/browse/SPARK-1473
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ignacio Zendejas
>Assignee: Alexander Ulanov
>Priority: Minor
>  Labels: features
>
> For classification tasks involving large feature spaces on the order of tens 
> of thousands of features or more (e.g., text classification with n-grams, 
> where n > 1), it is often useful to rank and filter out irrelevant features, 
> thereby reducing the feature space by at least one or two orders of magnitude 
> without impacting performance on key evaluation metrics 
> (accuracy/precision/recall).
> A flexible feature evaluation interface needs to be designed, and at least 
> two methods should be implemented, with Information Gain being a priority, as 
> it has been shown to be among the most reliable.
> Special consideration should be taken in the design to account for wrapper 
> methods (see the research papers below), which are more practical for 
> lower-dimensional data.
> Relevant research:
> * Brown, G., Pocock, A., Zhao, M. J., & Luján, M. (2012). Conditional 
> likelihood maximisation: a unifying framework for information theoretic 
> feature selection. *The Journal of Machine Learning Research*, *13*, 27-66.
> * Forman, George. "An extensive empirical study of feature selection metrics 
> for text classification." The Journal of Machine Learning Research 3 (2003): 
> 1289-1305.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4090) Memory leak in snappy-java 1.1.1.4/5

2014-10-26 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-4090.
---
Resolution: Fixed

> Memory leak in snappy-java 1.1.1.4/5
> 
>
> Key: SPARK-4090
> URL: https://issues.apache.org/jira/browse/SPARK-4090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Davies Liu
>Priority: Blocker
> Attachments: screenshot-12.png
>
>
> There is a memory-leak bug in snappy-java 1.1.1.4/5; we should roll back to 
> 1.1.1.3 or wait for a bugfix.
> The Jenkins tests have timed out or hit OOM multiple times recently. While 
> testing locally, I got a heap dump of the leaked JVM:
> It turned out to be a bug in recent releases of snappy-java:
> {code}
> +inputBuffer = inputBufferAllocator.allocate(inputSize);
> +outputBuffer = inputBufferAllocator.allocate(outputSize);
> {code}
> The outputBuffer is allocated from inputBufferAllocator but released to 
> outputBufferAllocator: https://github.com/xerial/snappy-java/issues/91



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4090) Memory leak in snappy-java 1.1.1.4/5

2014-10-26 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184423#comment-14184423
 ] 

Josh Rosen commented on SPARK-4090:
---

I rolled back earlier today, so the build should be fixed now.

> Memory leak in snappy-java 1.1.1.4/5
> 
>
> Key: SPARK-4090
> URL: https://issues.apache.org/jira/browse/SPARK-4090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Davies Liu
>Priority: Blocker
> Attachments: screenshot-12.png
>
>
> There is a memory-leak bug in snappy-java 1.1.1.4/5; we should roll back to 
> 1.1.1.3 or wait for a bugfix.
> The Jenkins tests have timed out or hit OOM multiple times recently. While 
> testing locally, I got a heap dump of the leaked JVM:
> It turned out to be a bug in recent releases of snappy-java:
> {code}
> +inputBuffer = inputBufferAllocator.allocate(inputSize);
> +outputBuffer = inputBufferAllocator.allocate(outputSize);
> {code}
> The outputBuffer is allocated from inputBufferAllocator but released to 
> outputBufferAllocator: https://github.com/xerial/snappy-java/issues/91



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4090) Memory leak in snappy-java 1.1.1.4/5

2014-10-26 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-4090:
--
Attachment: screenshot-12.png

> Memory leak in snappy-java 1.1.1.4/5
> 
>
> Key: SPARK-4090
> URL: https://issues.apache.org/jira/browse/SPARK-4090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Davies Liu
>Priority: Blocker
> Attachments: screenshot-12.png
>
>
> There is a memory-leak bug in snappy-java 1.1.1.4/5; we should roll back to 
> 1.1.1.3 or wait for a bugfix.
> The Jenkins tests have timed out or hit OOM multiple times recently. While 
> testing locally, I got a heap dump of the leaked JVM:
> It turned out to be a bug in recent releases of snappy-java:
> {code}
> +inputBuffer = inputBufferAllocator.allocate(inputSize);
> +outputBuffer = inputBufferAllocator.allocate(outputSize);
> {code}
> The outputBuffer is allocated from inputBufferAllocator but released to 
> outputBufferAllocator: https://github.com/xerial/snappy-java/issues/91



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4090) Memory leak in snappy-java 1.1.1.4/5

2014-10-26 Thread Davies Liu (JIRA)
Davies Liu created SPARK-4090:
-

 Summary: Memory leak in snappy-java 1.1.1.4/5
 Key: SPARK-4090
 URL: https://issues.apache.org/jira/browse/SPARK-4090
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Davies Liu
Priority: Blocker


There is a memory-leak bug in snappy-java 1.1.1.4/5; we should roll back to 
1.1.1.3 or wait for a bugfix.

The Jenkins tests have timed out or hit OOM multiple times recently. While 
testing locally, I got a heap dump of the leaked JVM:



It turned out to be a bug in recent releases of snappy-java:
{code}
+inputBuffer = inputBufferAllocator.allocate(inputSize);
+outputBuffer = inputBufferAllocator.allocate(outputSize);
{code}

The outputBuffer is allocated from inputBufferAllocator but released to 
outputBufferAllocator: https://github.com/xerial/snappy-java/issues/91
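
As a toy Scala model of the failure mode (these are not snappy-java's actual 
classes; the size-keyed pool below is illustrative, and the presumed fix is 
simply to allocate and release each buffer against the same allocator):

{code}
import scala.collection.mutable

// Illustrative buffer pool keyed by size: released buffers are cached for reuse.
class PooledAllocator {
  private val pools = mutable.Map.empty[Int, mutable.Queue[Array[Byte]]]
  def allocate(size: Int): Array[Byte] = {
    val q = pools.getOrElseUpdate(size, mutable.Queue.empty[Array[Byte]])
    if (q.nonEmpty) q.dequeue() else new Array[Byte](size)
  }
  def release(buf: Array[Byte]): Unit =
    pools.getOrElseUpdate(buf.length, mutable.Queue.empty[Array[Byte]]).enqueue(buf)
}

object AllocatorPairing {
  val inputBufferAllocator = new PooledAllocator
  val outputBufferAllocator = new PooledAllocator

  def roundTrip(inputSize: Int, outputSize: Int): Unit = {
    val inputBuffer = inputBufferAllocator.allocate(inputSize)
    // Mispaired, as in the snippet above: the output buffer comes from the
    // input pool ...
    val outputBuffer = inputBufferAllocator.allocate(outputSize)
    inputBufferAllocator.release(inputBuffer)
    // ... but is handed back to the output pool. Nothing here ever allocates
    // from that pool, so each call caches one more buffer that is never
    // reused, and the cached memory grows without bound.
    outputBufferAllocator.release(outputBuffer)
  }
}
{code}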





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%

2014-10-26 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-4049:
--
Priority: Minor  (was: Major)

> Storage web UI "fraction cached" shows as > 100%
> 
>
> Key: SPARK-4049
> URL: https://issues.apache.org/jira/browse/SPARK-4049
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> In the Storage tab of the Spark Web UI, I saw a case where the "Fraction 
> Cached" was greater than 100%:
> !http://i.imgur.com/Gm2hEeL.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org