[jira] [Updated] (SPARK-1021) sortByKey() launches a cluster job when it shouldn't

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1021:
---
Fix Version/s: (was: 1.3.0)

> sortByKey() launches a cluster job when it shouldn't
> 
>
> Key: SPARK-1021
> URL: https://issues.apache.org/jira/browse/SPARK-1021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 0.8.0, 0.9.0, 1.0.0, 1.1.0
>Reporter: Andrew Ash
>Assignee: Erik Erlandson
>  Labels: starter
>
> The sortByKey() method is listed as a transformation, not an action, in the 
> documentation.  But it launches a cluster job regardless.
> http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
> Some discussion on the mailing list suggested that this is a problem with the 
> rdd.count() call inside Partitioner.scala's rangeBounds method.
> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala#L102
> Josh Rosen suggests that rangeBounds should be made into a lazy variable:
> {quote}
> I wonder whether making RangePartitioner.rangeBounds into a lazy val would 
> fix this 
> (https://github.com/apache/incubator-spark/blob/6169fe14a140146602fb07cfcd13eee6efad98f9/core/src/main/scala/org/apache/spark/Partitioner.scala#L95).
>   We'd need to make sure that rangeBounds() is never called before an action 
> is performed.  This could be tricky because it's called in the 
> RangePartitioner.equals() method.  Maybe it's sufficient to just compare the 
> number of partitions, the ids of the RDDs used to create the 
> RangePartitioner, and the sort ordering.  This still supports the case where 
> I range-partition one RDD and pass the same partitioner to a different RDD.  
> It breaks support for the case where two range partitioners created on 
> different RDDs happened to have the same rangeBounds(), but it seems unlikely 
> that this would really harm performance since it's probably unlikely that the 
> range partitioners are equal by chance.
> {quote}
> Can we please make this happen?  I'll send a PR on GitHub to start the 
> discussion and testing.
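
For illustration, here is a minimal, self-contained Scala sketch of the lazy-val idea quoted above. The class and member names are made up and this is not the actual RangePartitioner code; the expensive sampling step stands in for the job that sortByKey() currently triggers, and equals() compares only cheap identity properties so it never forces the bounds.
{code}
// Hypothetical sketch, not Spark code: defer the expensive bounds computation.
class LazyBoundsPartitioner(val numPartitions: Int, val sourceRddId: Int) {

  // Deferred until first use; in the real code this is where sampling/count() runs.
  lazy val rangeBounds: Array[Int] = {
    println(s"computing bounds for RDD $sourceRddId (this would launch a job)")
    Array.tabulate(numPartitions - 1)(i => (i + 1) * 100)
  }

  // Compare cheap identity properties so equals() never forces rangeBounds.
  override def equals(other: Any): Boolean = other match {
    case o: LazyBoundsPartitioner =>
      o.numPartitions == numPartitions && o.sourceRddId == sourceRddId
    case _ => false
  }
  override def hashCode(): Int = (numPartitions, sourceRddId).##
}

object LazyBoundsDemo extends App {
  val a = new LazyBoundsPartitioner(4, sourceRddId = 7)
  val b = new LazyBoundsPartitioner(4, sourceRddId = 7)
  println(a == b)              // true, and no bounds have been computed yet
  println(a.rangeBounds.toSeq) // only now does the "expensive" step run
}
{code}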






[jira] [Updated] (SPARK-1021) sortByKey() launches a cluster job when it shouldn't

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1021:
---
Target Version/s:   (was: 1.2.0)

> sortByKey() launches a cluster job when it shouldn't
> 
>
> Key: SPARK-1021
> URL: https://issues.apache.org/jira/browse/SPARK-1021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 0.8.0, 0.9.0, 1.0.0, 1.1.0
>Reporter: Andrew Ash
>Assignee: Erik Erlandson
>  Labels: starter
>
> The sortByKey() method is listed as a transformation, not an action, in the 
> documentation.  But it launches a cluster job regardless.
> http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
> Some discussion on the mailing list suggested that this is a problem with the 
> rdd.count() call inside Partitioner.scala's rangeBounds method.
> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala#L102
> Josh Rosen suggests that rangeBounds should be made into a lazy variable:
> {quote}
> I wonder whether making RangePartitioner.rangeBounds into a lazy val would 
> fix this 
> (https://github.com/apache/incubator-spark/blob/6169fe14a140146602fb07cfcd13eee6efad98f9/core/src/main/scala/org/apache/spark/Partitioner.scala#L95).
>   We'd need to make sure that rangeBounds() is never called before an action 
> is performed.  This could be tricky because it's called in the 
> RangePartitioner.equals() method.  Maybe it's sufficient to just compare the 
> number of partitions, the ids of the RDDs used to create the 
> RangePartitioner, and the sort ordering.  This still supports the case where 
> I range-partition one RDD and pass the same partitioner to a different RDD.  
> It breaks support for the case where two range partitioners created on 
> different RDDs happened to have the same rangeBounds(), but it seems unlikely 
> that this would really harm performance since it's probably unlikely that the 
> range partitioners are equal by chance.
> {quote}
> Can we please make this happen?  I'll send a PR on GitHub to start the 
> discussion and testing.






[jira] [Updated] (SPARK-1079) EC2 scripts should allow mounting as XFS or EXT4

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1079:
---
Fix Version/s: (was: 1.3.0)

> EC2 scripts should allow mounting as XFS or EXT4
> 
>
> Key: SPARK-1079
> URL: https://issues.apache.org/jira/browse/SPARK-1079
> Project: Spark
>  Issue Type: New Feature
>  Components: EC2
>Reporter: Patrick Wendell
>
> XFS and EXT4 offer much better performance when running benchmarks. I've done a 
> hacked-together implementation here, but it would be better if you could 
> officially pass a filesystem as an argument to the ec2 scripts:
> https://github.com/pwendell/spark-ec2/blob/c63995ce014df61ec1c61276687767e789eb79f7/setup-slave.sh#L21






[jira] [Updated] (SPARK-1272) Don't fail job if some local directories are buggy

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1272:
---
Fix Version/s: (was: 1.3.0)

> Don't fail job if some local directories are buggy
> --
>
> Key: SPARK-1272
> URL: https://issues.apache.org/jira/browse/SPARK-1272
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Reporter: Patrick Wendell
>
> If Spark cannot create shuffle directories inside a local directory, it 
> might make sense to just log an error and continue, provided that at least 
> one valid shuffle directory exists. Otherwise, if a single disk is wonky, the 
> entire job can fail. 
> The downside is that this might mask failures if the user actually 
> misconfigures the local directories to point to the wrong disk(s).
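
A rough sketch of the proposed behavior in plain Scala (the method and directory names are made up; this is not the actual DiskBlockManager code): skip local directories that cannot be created, and fail only when none are usable.
{code}
import java.io.File

// Hypothetical sketch: tolerate individual bad disks, fail only if all are bad.
def createUsableShuffleDirs(localDirs: Seq[String], appId: String): Seq[File] = {
  val usable = localDirs.flatMap { dir =>
    val shuffleDir = new File(dir, s"spark-shuffle-$appId")
    if (shuffleDir.isDirectory || shuffleDir.mkdirs()) {
      Some(shuffleDir)
    } else {
      // Log and continue instead of failing the whole job on one wonky disk.
      System.err.println(s"Ignoring unusable local directory: $dir")
      None
    }
  }
  require(usable.nonEmpty, "No usable local directories for shuffle files")
  usable
}
{code}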






[jira] [Updated] (SPARK-1369) HiveUDF wrappers are slow

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1369:
---
Fix Version/s: (was: 1.3.0)

> HiveUDF wrappers are slow
> -
>
> Key: SPARK-1369
> URL: https://issues.apache.org/jira/browse/SPARK-1369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>
> The major issues here are that we are using a lot of functional programming 
> (.map) and creating new writable objects for each input row. We should 
> switch to while loops and reuse the writable objects when possible.
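
As a small illustration of the pattern being described (not the actual Hive UDF wrapper code; it assumes hadoop-common is on the classpath for org.apache.hadoop.io.Text):
{code}
import org.apache.hadoop.io.Text

val rows: Array[String] = Array("a", "b", "c")

// Slow pattern: a functional .map that allocates a fresh Writable per input row.
val perRow: Array[Text] = rows.map(s => new Text(s))

// Faster pattern: a while loop that reuses a single mutable Writable.
val reused = new Text()
var i = 0
while (i < rows.length) {
  reused.set(rows(i))  // overwrite in place instead of allocating
  // ... hand `reused` to the UDF / serializer here ...
  i += 1
}
{code}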






[jira] [Updated] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1529:
---
Fix Version/s: (was: 1.3.0)

> Support setting spark.local.dirs to a hadoop FileSystem 
> 
>
> Key: SPARK-1529
> URL: https://issues.apache.org/jira/browse/SPARK-1529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Cheng Lian
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. We should allow setting spark.local.dir to a 
> Hadoop filesystem location. 
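
A hedged sketch of what resolving such a value might look like through the Hadoop FileSystem API instead of java.io.File (the config value, scheme, and directory name below are illustrative; hadoop-common is assumed on the classpath):
{code}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical spark.local.dir value pointing at a Hadoop filesystem location.
val localDir = "maprfs:///tmp/spark-local"

val fs  = FileSystem.get(new URI(localDir), new Configuration())
val dir = new Path(localDir, "executor-42")
if (!fs.exists(dir)) fs.mkdirs(dir)   // create it through the FileSystem interface
{code}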






[jira] [Updated] (SPARK-1652) Fixes and improvements for spark-submit/configs

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1652:
---
Fix Version/s: (was: 1.3.0)
   1.2.1

> Fixes and improvements for spark-submit/configs
> ---
>
> Key: SPARK-1652
> URL: https://issues.apache.org/jira/browse/SPARK-1652
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Blocker
> Fix For: 1.2.0
>
>
> These are almost all a result of my config patch. Unfortunately, the changes 
> were difficult to unit-test and there were several edge cases reported.






[jira] [Updated] (SPARK-1564) Add JavaScript into Javadoc to turn ::Experimental:: and such into badges

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1564:
---
Fix Version/s: (was: 1.3.0)

> Add JavaScript into Javadoc to turn ::Experimental:: and such into badges
> -
>
> Key: SPARK-1564
> URL: https://issues.apache.org/jira/browse/SPARK-1564
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Matei Zaharia
>Assignee: Andrew Or
>Priority: Minor
>







[jira] [Updated] (SPARK-1652) Fixes and improvements for spark-submit/configs

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1652:
---
Fix Version/s: (was: 1.2.1)
   1.2.0

> Fixes and improvements for spark-submit/configs
> ---
>
> Key: SPARK-1652
> URL: https://issues.apache.org/jira/browse/SPARK-1652
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Blocker
> Fix For: 1.2.0
>
>
> These are almost all a result of my config patch. Unfortunately, the changes 
> were difficult to unit-test and there were several edge cases reported.






[jira] [Updated] (SPARK-1684) Merge script should standardize SPARK-XXX prefix

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1684:
---
Labels: starter  (was: )

> Merge script should standardize SPARK-XXX prefix
> 
>
> Key: SPARK-1684
> URL: https://issues.apache.org/jira/browse/SPARK-1684
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Minor
>  Labels: starter
>
> If users write "[SPARK-XXX] Issue" or "SPARK-XXX. Issue" or "SPARK XXX: 
> Issue", we should convert it to "SPARK-XXX: Issue".






[jira] [Resolved] (SPARK-1652) Fixes and improvements for spark-submit/configs

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1652.

Resolution: Fixed

> Fixes and improvements for spark-submit/configs
> ---
>
> Key: SPARK-1652
> URL: https://issues.apache.org/jira/browse/SPARK-1652
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Blocker
> Fix For: 1.2.0
>
>
> These are almost all a result of my config patch. Unfortunately, the changes 
> were difficult to unit-test and there were several edge cases reported.






[jira] [Updated] (SPARK-1684) Merge script should standardize SPARK-XXX prefix

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1684:
---
Fix Version/s: (was: 1.3.0)

> Merge script should standardize SPARK-XXX prefix
> 
>
> Key: SPARK-1684
> URL: https://issues.apache.org/jira/browse/SPARK-1684
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Minor
>  Labels: starter
>
> If users write "[SPARK-XXX] Issue" or "SPARK-XXX. Issue" or "SPARK XXX: 
> Issue", we should convert it to "SPARK-XXX: Issue".






[jira] [Updated] (SPARK-1706) Allow multiple executors per worker in Standalone mode

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1706:
---
Fix Version/s: (was: 1.3.0)

> Allow multiple executors per worker in Standalone mode
> --
>
> Key: SPARK-1706
> URL: https://issues.apache.org/jira/browse/SPARK-1706
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Reporter: Patrick Wendell
>Assignee: Nan Zhu
>
> Right now, if people want to launch multiple executors on each machine, they 
> need to start multiple standalone workers. This is not too difficult, but it 
> means you have extra JVMs sitting around.
> We should just allow users to set the number of cores they want per executor in 
> standalone mode and then allow packing multiple executors on each node. This 
> would make standalone mode more consistent with YARN in the way you request 
> resources.
> It's not too big of a change as far as I can see. You'd need to:
> 1. Introduce a configuration for how many cores you want per executor.
> 2. Change the scheduling logic in Master.scala to take this into account.
> 3. Change CoarseGrainedSchedulerBackend to not assume a 1<->1 correspondence 
> between hosts and executors.
> And maybe modify a few other places.
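
A rough, self-contained sketch of the packing idea in step 2 (the case classes and method below are made up for illustration; the real change would live in Master.scala's scheduling logic):
{code}
case class WorkerInfo(id: String, coresFree: Int, memoryFreeMb: Int)
case class ExecutorSlot(workerId: String, cores: Int, memoryMb: Int)

// Pack as many fixed-size executors as fit on each worker.
def packExecutors(
    workers: Seq[WorkerInfo],
    coresPerExecutor: Int,
    memoryPerExecutorMb: Int): Seq[ExecutorSlot] = {
  workers.flatMap { w =>
    val byCores  = w.coresFree / coresPerExecutor
    val byMemory = w.memoryFreeMb / memoryPerExecutorMb
    val count    = math.min(byCores, byMemory)
    Seq.fill(count)(ExecutorSlot(w.id, coresPerExecutor, memoryPerExecutorMb))
  }
}

// e.g. a 16-core / 64 GB worker with 4 cores and 8 GB per executor yields 4 slots:
// packExecutors(Seq(WorkerInfo("w1", 16, 65536)), 4, 8192)
{code}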






[jira] [Updated] (SPARK-1911) Warn users if their assembly jars are not built with Java 6

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1911:
---
Fix Version/s: (was: 1.3.0)

> Warn users if their assembly jars are not built with Java 6
> ---
>
> Key: SPARK-1911
> URL: https://issues.apache.org/jira/browse/SPARK-1911
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>
> The root cause of the problem is detailed in: 
> https://issues.apache.org/jira/browse/SPARK-1520.
> In short, an assembly jar built with Java 7+ is not always accessible by 
> Python or other versions of Java (especially Java 6). If the assembly jar is 
> not built on the cluster itself, this problem may manifest itself in strange 
> exceptions that are not trivial to debug. This is an issue especially for 
> PySpark on YARN, which relies on the python files included within the 
> assembly jar.
> Currently we warn users only in make-distribution.sh, but most users build 
> the jars directly. At the very least we need to emphasize this in the docs 
> (currently missing entirely). The next step is to add a warning prompt in the 
> mvn scripts whenever Java 7+ is detected.
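
The warning itself would live in the build scripts, but the check amounts to something like the following sketch (the message text is illustrative):
{code}
// Warn when the JDK doing the build is newer than Java 6.
val javaVersion = System.getProperty("java.specification.version")  // e.g. "1.7"
if (javaVersion != "1.6") {
  Console.err.println(
    s"Warning: building with Java $javaVersion; the assembly jar may not be " +
      "readable by Java 6 or by PySpark on YARN. Use JDK 6 if you need that.")
}
{code}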






[jira] [Updated] (SPARK-1866) Closure cleaner does not null shadowed fields when outer scope is referenced

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1866:
---
Fix Version/s: (was: 1.3.0)
   (was: 1.0.1)

> Closure cleaner does not null shadowed fields when outer scope is referenced
> 
>
> Key: SPARK-1866
> URL: https://issues.apache.org/jira/browse/SPARK-1866
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Aaron Davidson
>Assignee: Kan Zhang
>Priority: Critical
>
> Take the following example:
> {code}
> val x = 5
> val instances = new org.apache.hadoop.fs.Path("/") /* non-serializable */
> sc.parallelize(0 until 10).map { _ =>
>   val instances = 3
>   (instances, x)
> }.collect
> {code}
> This produces a "java.io.NotSerializableException: 
> org.apache.hadoop.fs.Path", despite the fact that the outer instances is not 
> actually used within the closure. If you change the name of the outer 
> variable instances to something else, the code executes correctly, indicating 
> that the issue is caused by the two variables sharing a name.
> Additionally, if the outer scope is not used (i.e., we do not reference "x" 
> in the above example), the issue does not appear.






[jira] [Updated] (SPARK-1792) Missing Spark-Shell Configure Options

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1792:
---
Fix Version/s: (was: 1.3.0)

> Missing Spark-Shell Configure Options
> -
>
> Key: SPARK-1792
> URL: https://issues.apache.org/jira/browse/SPARK-1792
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Spark Core
>Reporter: Joseph E. Gonzalez
>
> The `conf/spark-env.sh.template` does not have configuration options for the 
> spark shell. For example, to enable Kryo for GraphX when using the spark 
> shell in standalone mode, it appears you must add:
> {code}
> SPARK_SUBMIT_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer
>  "
> SPARK_SUBMIT_OPTS+="-Dspark.kryo.registrator=org.apache.spark.graphx.GraphKryoRegistrator
>   "
> {code}
> However, SPARK_SUBMIT_OPTS is not documented anywhere.  Perhaps the 
> spark-shell should have its own options (e.g., SPARK_SHELL_OPTS).






[jira] [Updated] (SPARK-1989) Exit executors faster if they get into a cycle of heavy GC

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1989:
---
Fix Version/s: (was: 1.3.0)

> Exit executors faster if they get into a cycle of heavy GC
> --
>
> Key: SPARK-1989
> URL: https://issues.apache.org/jira/browse/SPARK-1989
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Matei Zaharia
>
> I've seen situations where an application is allocating too much memory 
> across its tasks + cache to proceed, but Java gets into a cycle where it 
> repeatedly runs full GCs, frees up a bit of the heap, and continues instead 
> of giving up. This then leads to timeouts and confusing error messages. It 
> would be better to crash with OOM sooner. The JVM has options to support 
> this: http://java.dzone.com/articles/tracking-excessive-garbage.
> The right solution would probably be:
> - Add some config options used by spark-submit to set XX:GCTimeLimit and 
> XX:GCHeapFreeLimit, with more conservative values than the defaults (e.g. 90% 
> time limit, 5% free limit)
> - Make sure we pass these into the Java options for executors in each 
> deployment mode
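
Until such defaults are baked into spark-submit, one possible way to set these limits from the application side is via spark.executor.extraJavaOptions, using the standard HotSpot flags -XX:GCTimeLimit and -XX:GCHeapFreeLimit (defaults 98% and 2%); the values below mirror the ones suggested above:
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("gc-overhead-limits")
  // Throw OutOfMemoryError once >90% of time is spent in GC recovering <5% of the heap.
  .set("spark.executor.extraJavaOptions",
       "-XX:+UseGCOverheadLimit -XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=5")
{code}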






[jira] [Updated] (SPARK-1924) Make local:/ scheme work in more deploy modes

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1924:
---
Fix Version/s: (was: 1.3.0)

> Make local:/ scheme work in more deploy modes
> -
>
> Key: SPARK-1924
> URL: https://issues.apache.org/jira/browse/SPARK-1924
> Project: Spark
>  Issue Type: Sub-task
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Xiangrui Meng
>Priority: Minor
>
> A resource marked "local:/" is assumed to be available on the local file 
> system of every node. In that case, no data should be copied over the network. 
> I tested different deploy modes in v1.0; right now we only support the 
> local:/ scheme for the app jar and secondary jars in the following modes:
> 1) local (jars are copied to the working directory)
> 2) standalone client
> 3) yarn client
> It doesn’t work for --files and python apps (--py-files and app.py). For the 
> next release, we could support more deploy modes.






[jira] [Updated] (SPARK-1921) Allow duplicate jar files among the app jar and secondary jars in yarn-cluster mode

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1921:
---
Fix Version/s: (was: 1.3.0)

> Allow duplicate jar files among the app jar and secondary jars in 
> yarn-cluster mode
> ---
>
> Key: SPARK-1921
> URL: https://issues.apache.org/jira/browse/SPARK-1921
> Project: Spark
>  Issue Type: Sub-task
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Xiangrui Meng
>Priority: Minor
>
> In yarn-cluster mode, jars are uploaded to a staging folder on hdfs. If there 
> are duplicates among the app jar and secondary jars, there will be overwrites 
> that cause inconsistent timestamps. I saw the following message:
> {code}
> Application application_1400965808642_0021 failed 2 times due to AM Container 
> for appattempt_1400965808642_0021_02 exited with  exitCode: -1000 due to: 
> Resource 
> hdfs://localhost.localdomain:8020/user/cloudera/.sparkStaging/application_1400965808642_0021/app_2.10-0.1.jar
>  changed on src filesystem (expected 1400998721965, was 1400998723123
> {code}
> Tested on a CDH-5 quickstart VM.






[jira] [Updated] (SPARK-1972) Add support for setting and visualizing custom task-related metrics

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1972:
---
Fix Version/s: (was: 1.3.0)

> Add support for setting and visualizing custom task-related metrics
> ---
>
> Key: SPARK-1972
> URL: https://issues.apache.org/jira/browse/SPARK-1972
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 0.9.1
>Reporter: Kalpit Shah
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Various RDDs may want to set/track custom metrics for improved monitoring and 
> performance tuning. For example:
> 1. A Task involving a JdbcRDD may want to track some metric related to JDBC 
> execution.
> 2. A Task involving a user-defined RDD may want to track some metric specific 
> to user's application.
> We currently use TaskMetrics for tracking task-related metrics, which 
> provides no way of tracking custom task-related metrics. It is not good to 
> introduce a new field in TaskMetrics every time we want to track a custom 
> metric. That approach would be cumbersome and ugly. Besides, some of these 
> custom metrics may only make sense for a specific RDD-subclass. Therefore, we 
> need TaskMetrics to provide a generic way to allow RDD-subclasses to track 
> custom metrics when computing partitions.
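
A hypothetical sketch of the kind of generic hook being asked for (none of these names exist in Spark today): a free-form name-to-counter map on the task's metrics that any RDD subclass can update while computing a partition.
{code}
import scala.collection.mutable

// Hypothetical container, not a real Spark class.
class CustomTaskMetrics {
  private val counters = mutable.Map.empty[String, Long].withDefaultValue(0L)

  def inc(name: String, delta: Long = 1L): Unit =
    counters(name) = counters(name) + delta

  def snapshot: Map[String, Long] = counters.toMap
}

// e.g. inside a JdbcRDD-like compute():
val metrics = new CustomTaskMetrics
metrics.inc("jdbc.rowsFetched", 500)
metrics.inc("jdbc.roundTrips")
println(metrics.snapshot)  // e.g. Map(jdbc.rowsFetched -> 500, jdbc.roundTrips -> 1)
{code}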






[jira] [Updated] (SPARK-2063) Creating a SchemaRDD via sql() does not correctly resolve nested types

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2063:
---
Fix Version/s: (was: 1.3.0)

> Creating a SchemaRDD via sql() does not correctly resolve nested types
> --
>
> Key: SPARK-2063
> URL: https://issues.apache.org/jira/browse/SPARK-2063
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Aaron Davidson
>Assignee: Cheng Lian
>
> For example, from the typical twitter dataset:
> {code}
> scala> val popularTweets = sql("SELECT retweeted_status.text, 
> MAX(retweeted_status.retweet_count) AS s FROM tweets WHERE retweeted_status 
> is not NULL GROUP BY retweeted_status.text ORDER BY s DESC LIMIT 30")
> scala> popularTweets.toString
> 14/06/06 21:27:48 INFO analysis.Analyzer: Max iterations (2) reached for 
> batch MultiInstanceRelations
> 14/06/06 21:27:48 INFO analysis.Analyzer: Max iterations (2) reached for 
> batch CaseInsensitiveAttributeReferences
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> qualifiers on unresolved object, tree: 'retweeted_status.text
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:51)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:47)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:67)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:65)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>   at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:65)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$3$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:100)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$3$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:97)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1(QueryPlan.scala:51)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1$$anonfun$apply$1.apply(QueryPlan.scala:65)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:64)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:69)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:40)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$3.applyOrElse(Analyzer.scala:97)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$3.applyOrElse(Analyzer.scala:94)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:217)
>   

[jira] [Updated] (SPARK-2068) Remove other uses of @transient lazy val in physical plan nodes

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2068:
---
Fix Version/s: (was: 1.3.0)

> Remove other uses of @transient lazy val in physical plan nodes
> ---
>
> Key: SPARK-2068
> URL: https://issues.apache.org/jira/browse/SPARK-2068
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>
> [SPARK-1994] was caused by this. We fixed it there, but in general, doing 
> planning on the slaves breaks a lot of our assumptions and seems to cause 
> concurrency problems.






[jira] [Updated] (SPARK-2584) Do not mutate block storage level on the UI

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2584:
---
Fix Version/s: (was: 1.3.0)

> Do not mutate block storage level on the UI
> ---
>
> Key: SPARK-2584
> URL: https://issues.apache.org/jira/browse/SPARK-2584
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 1.0.1
>Reporter: Andrew Or
>
> If a block is stored MEMORY_AND_DISK and we drop it from memory, it becomes 
> DISK_ONLY on the UI. We should preserve the original storage level  proposed 
> by the user, in addition to the change in actual storage level.






[jira] [Updated] (SPARK-2167) spark-submit should return exit code based on failure/success

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2167:
---
Fix Version/s: (was: 1.3.0)

> spark-submit should return exit code based on failure/success
> -
>
> Key: SPARK-2167
> URL: https://issues.apache.org/jira/browse/SPARK-2167
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
>Assignee: Guoqiang Li
>
> The spark-submit script and Java class should exit with 0 for success and 
> non-zero for failure so that other command-line tools and workflow managers 
> (like Oozie) can properly tell whether the Spark app succeeded or failed.
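
The contract being asked for is simply that the process exit status reflects the outcome, along the lines of this trivial sketch (runApp is a stand-in for submitting and waiting on the application):
{code}
// Propagate success/failure as the exit status so Oozie et al. can react to it.
def runApp(): Boolean = { /* submit the app and wait for it to finish */ true }

val exitCode = if (runApp()) 0 else 1
sys.exit(exitCode)
{code}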






[jira] [Resolved] (SPARK-2069) MIMA false positives (umbrella)

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2069.

   Resolution: Fixed
Fix Version/s: (was: 1.3.0)
   1.2.0

> MIMA false positives (umbrella)
> ---
>
> Key: SPARK-2069
> URL: https://issues.apache.org/jira/browse/SPARK-2069
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.0
>Reporter: Patrick Wendell
>Assignee: Prashant Sharma
>Priority: Critical
> Fix For: 1.2.0
>
>
> Since we started using MIMA more actively in core, we've been running into 
> situations where we get false positives. We should address these ASAP, as they 
> require having manual excludes in our build files, which is pretty tedious.






[jira] [Updated] (SPARK-2638) Improve concurrency of fetching Map outputs

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2638:
---
Fix Version/s: (was: 1.3.0)

> Improve concurrency of fetching Map outputs
> ---
>
> Key: SPARK-2638
> URL: https://issues.apache.org/jira/browse/SPARK-2638
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Stephen Boesch
>Assignee: Josh Rosen
>Priority: Minor
>  Labels: MapOutput, concurrency
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> This issue was noticed while perusing the MapOutputTracker source code. 
> Notice that the synchronization is on the containing "fetching" collection, 
> which makes ALL fetches wait if any fetch is occurring. 
> The fix is to synchronize instead on the shuffleId (interned as a string to 
> ensure JVM-wide visibility).
> {code}
> def getServerStatuses(shuffleId: Int, reduceId: Int): Array[(BlockManagerId, Long)] = {
>   val statuses = mapStatuses.get(shuffleId).orNull
>   if (statuses == null) {
>     logInfo("Don't have map outputs for shuffle " + shuffleId + ", fetching them")
>     var fetchedStatuses: Array[MapStatus] = null
>     fetching.synchronized {   // This is existing code
>     // shuffleId.toString.intern.synchronized {   // New Code
>       if (fetching.contains(shuffleId)) {
>         // Someone else is fetching it; wait for them to be done
>         while (fetching.contains(shuffleId)) {
>           try {
>             fetching.wait()
>           } catch {
>             case e: InterruptedException =>
>           }
>         }
> {code}
> This is only a small code change, but the test cases to prove (a) proper 
> functionality and (b) proper performance improvement are not so trivial. 
> For (b), it is not worthwhile to add a test case to the codebase. Instead, I 
> have added a git project that demonstrates the concurrency/performance 
> improvement using the fine-grained approach. The GitHub project is at 
> https://github.com/javadba/scalatesting.git. Simply run "sbt test". Note: 
> it is unclear how/where to include this ancillary testing/verification 
> information, since it will not be included in the git PR; I am open to any 
> suggestions, even as far as simply removing references to it.






[jira] [Updated] (SPARK-2703) Make Tachyon related unit tests execute without deploying a Tachyon system locally.

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2703:
---
Fix Version/s: (was: 1.3.0)

> Make Tachyon related unit tests execute without deploying a Tachyon system 
> locally.
> ---
>
> Key: SPARK-2703
> URL: https://issues.apache.org/jira/browse/SPARK-2703
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Haoyuan Li
>Assignee: Rong Gu
>
> Use the LocalTachyonCluster class in tachyon-test.jar from the 0.5.0 release.






[jira] [Updated] (SPARK-2624) Datanucleus jars not accessible in yarn-cluster mode

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2624:
---
Fix Version/s: (was: 1.3.0)

> Datanucleus jars not accessible in yarn-cluster mode
> 
>
> Key: SPARK-2624
> URL: https://issues.apache.org/jira/browse/SPARK-2624
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.1
>Reporter: Andrew Or
>Assignee: Jim Lim
>
> This is because we add them to the classpath of the command that launches 
> spark-submit, but the containers never get them.






[jira] [Updated] (SPARK-2722) Mechanism for escaping spark configs is not consistent

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2722:
---
Fix Version/s: (was: 1.3.0)

> Mechanism for escaping spark configs is not consistent
> --
>
> Key: SPARK-2722
> URL: https://issues.apache.org/jira/browse/SPARK-2722
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Andrew Or
>Priority: Minor
>
> Currently, you can specify a spark config in spark-defaults.conf as follows:
> {code}
> spark.magic "Mr. Johnson"
> {code}
> and this will preserve the double quotes as part of the string. Naturally, if 
> you want to do the equivalent in spark.*.extraJavaOptions, you would use the 
> following:
> {code}
> spark.executor.extraJavaOptions "-Dmagic=\"Mr. Johnson\""
> {code}
> However, this fails because the backslashes go away and it tries to interpret 
> "Johnson" as the main class argument. Instead, you have to do the following:
> {code}
> spark.executor.extraJavaOptions "-Dmagic=\\\"Mr. Johnson\\\""
> {code}
> which is not super intuitive.
> Note that this only applies to standalone mode. In YARN it's not even 
> possible to use quoted strings in config values (SPARK-2718).






[jira] [Updated] (SPARK-2757) Add Mima test for Spark Sink after 1.1.0 is released

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2757:
---
Target Version/s: 1.3.0  (was: 1.2.0)

> Add Mima test for Spark Sink after 1.1.0 is released
> ---
>
> Key: SPARK-2757
> URL: https://issues.apache.org/jira/browse/SPARK-2757
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Hari Shreedharan
>
> We are adding it in 1.1.0, so it is excluded from Mima right now. Once we 
> release 1.1.0, we should add it to Mima so we do binary compat checks.






[jira] [Updated] (SPARK-2757) Add Mima test for Spark Sink after 1.1.0 is released

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2757:
---
Assignee: Hari Shreedharan

> Add Mima test for Spark Sink after 1.1.0 is released
> ---
>
> Key: SPARK-2757
> URL: https://issues.apache.org/jira/browse/SPARK-2757
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>
> We are adding it in 1.1.0, so it is excluded from Mima right now. Once we 
> release 1.1.0, we should add it to Mima so we do binary compat checks.






[jira] [Updated] (SPARK-2757) Add Mima test for Spark Sink after 1.1.0 is released

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2757:
---
Fix Version/s: (was: 1.3.0)

> Add Mima test for Spark Sink after 1.1.0 is released
> ---
>
> Key: SPARK-2757
> URL: https://issues.apache.org/jira/browse/SPARK-2757
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Hari Shreedharan
>
> We are adding it in 1.1.0, so it is excluded from Mima right now. Once we 
> release 1.1.0, we should add it to Mima so we do binary compat checks.






[jira] [Updated] (SPARK-2793) Correctly lock directory creation in DiskBlockManager.getFile

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2793:
---
Fix Version/s: (was: 1.3.0)

> Correctly lock directory creation in DiskBlockManager.getFile
> -
>
> Key: SPARK-2793
> URL: https://issues.apache.org/jira/browse/SPARK-2793
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Matei Zaharia
>







[jira] [Updated] (SPARK-2770) Rename spark-ganglia-lgpl to ganglia-lgpl

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2770:
---
Fix Version/s: (was: 1.3.0)

> Rename spark-ganglia-lgpl to ganglia-lgpl
> -
>
> Key: SPARK-2770
> URL: https://issues.apache.org/jira/browse/SPARK-2770
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Chris Fregly
>Assignee: Chris Fregly
>Priority: Minor
>







[jira] [Updated] (SPARK-2913) Spark's log4j.properties should always appear ahead of Hadoop's on classpath

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2913:
---
Fix Version/s: (was: 1.3.0)

> Spark's log4j.properties should always appear ahead of Hadoop's on classpath
> 
>
> Key: SPARK-2913
> URL: https://issues.apache.org/jira/browse/SPARK-2913
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.0.2, 1.1.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> In the current {{compute-classpath}} scripts, the Hadoop conf directory may 
> appear before Spark's conf directory in the computed classpath.  This leads 
> to Hadoop's log4j.properties being used instead of Spark's, preventing users 
> from easily changing Spark's logging settings.
> To fix this, we should add a new classpath entry for Spark's log4j.properties 
> file.






[jira] [Updated] (SPARK-2794) Use Java 7 isSymlink when available

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2794:
---
Fix Version/s: (was: 1.3.0)

> Use Java 7 isSymlink when available
> ---
>
> Key: SPARK-2794
> URL: https://issues.apache.org/jira/browse/SPARK-2794
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Matei Zaharia
>Priority: Minor
>







[jira] [Updated] (SPARK-2795) Improve DiskBlockObjectWriter API

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2795:
---
Fix Version/s: (was: 1.3.0)

> Improve DiskBlockObjectWriter API
> -
>
> Key: SPARK-2795
> URL: https://issues.apache.org/jira/browse/SPARK-2795
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Matei Zaharia
>







[jira] [Updated] (SPARK-2914) spark.*.extraJavaOptions are evaluated too many times

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2914:
---
Fix Version/s: (was: 1.3.0)

> spark.*.extraJavaOptions are evaluated too many times
> -
>
> Key: SPARK-2914
> URL: https://issues.apache.org/jira/browse/SPARK-2914
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.2.0
>
>
> If we pass the following to spark.executor.extraJavaOptions,
> {code}
> -Dthem.quotes="the \"best\" joke ever" -Dthem.backslashes=" \\ \\  "
> {code}
> These will first be escaped once when the SparkSubmit JVM is launched. This 
> becomes the following string.
> {code}
> scala> sc.getConf.get("spark.driver.extraJavaOptions")
> res0: String = -Dthem.quotes="the "best" joke ever" -Dthem.backslashes=" \ \ 
> \\ "
> {code}
> This will be split incorrectly by Utils.splitCommandString.
> Of course, this also affects spark.driver.extraJavaOptions.






[jira] [Updated] (SPARK-2914) spark.*.extraJavaOptions are evaluated too many times

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2914:
---
Assignee: Andrew Or

> spark.*.extraJavaOptions are evaluated too many times
> -
>
> Key: SPARK-2914
> URL: https://issues.apache.org/jira/browse/SPARK-2914
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.2.0
>
>
> If we pass the following to spark.executor.extraJavaOptions,
> {code}
> -Dthem.quotes="the \"best\" joke ever" -Dthem.backslashes=" \\ \\  "
> {code}
> These will first be escaped once when the SparkSubmit JVM is launched. This 
> becomes the following string.
> {code}
> scala> sc.getConf.get("spark.driver.extraJavaOptions")
> res0: String = -Dthem.quotes="the "best" joke ever" -Dthem.backslashes=" \ \ 
> \\ "
> {code}
> This will be split incorrectly by Utils.splitCommandString.
> Of course, this also affects spark.driver.extraJavaOptions.






[jira] [Updated] (SPARK-2914) spark.*.extraJavaOptions are evaluated too many times

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2914:
---
Fix Version/s: 1.2.0

> spark.*.extraJavaOptions are evaluated too many times
> -
>
> Key: SPARK-2914
> URL: https://issues.apache.org/jira/browse/SPARK-2914
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.2.0
>
>
> If we pass the following to spark.executor.extraJavaOptions,
> {code}
> -Dthem.quotes="the \"best\" joke ever" -Dthem.backslashes=" \\ \\  "
> {code}
> These will first be escaped once when the SparkSubmit JVM is launched. This 
> becomes the following string.
> {code}
> scala> sc.getConf.get("spark.driver.extraJavaOptions")
> res0: String = -Dthem.quotes="the "best" joke ever" -Dthem.backslashes=" \ \ 
> \\ "
> {code}
> This will be split incorrectly by Utils.splitCommandString.
> Of course, this also affects spark.driver.extraJavaOptions.






[jira] [Updated] (SPARK-2973) Add a way to show tables without executing a job

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2973:
---
Fix Version/s: (was: 1.3.0)
   1.2.0

> Add a way to show tables without executing a job
> 
>
> Key: SPARK-2973
> URL: https://issues.apache.org/jira/browse/SPARK-2973
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Aaron Davidson
>Assignee: Cheng Lian
>Priority: Critical
> Fix For: 1.2.0
>
>
> Right now, sql("show tables").collect() will start a Spark job which shows up 
> in the UI. There should be a way to get these without this step.






[jira] [Updated] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3039:
---
Fix Version/s: (was: 1.3.0)

> Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 
> 1 API
> --
>
> Key: SPARK-3039
> URL: https://issues.apache.org/jira/browse/SPARK-3039
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Input/Output, Spark Core
>Affects Versions: 0.9.1, 1.0.0, 1.1.0
> Environment: hadoop2, hadoop-2.4.0, HDP-2.1
>Reporter: Bertrand Bossy
>Assignee: Bertrand Bossy
>
> The spark assembly contains the artifact "org.apache.avro:avro-mapred" as a 
> dependency of "org.spark-project.hive:hive-serde".
> The avro-mapred package provides a hadoop FileInputFormat to read and write 
> avro files. There are two versions of this package, distinguished by a 
> classifier. avro-mapred for the new Hadoop API uses the classifier "hadoop2". 
> avro-mapred for the old Hadoop API uses no classifier.
> E.g. when reading avro files using 
> {code}
> sc.newAPIHadoopFile[AvroKey[SomeClass], NullWritable, AvroKeyInputFormat[SomeClass]]("hdfs://path/to/file.avro")
> {code}
> The following error occurs:
> {code}
> java.lang.IncompatibleClassChangeError: Found interface 
> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
> at 
> org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> This error is usually a hint that the old and the new Hadoop APIs were mixed 
> up. As a workaround, if avro-mapred for hadoop2 is "forced" to 
> appear before the version that is bundled with Spark, reading avro files 
> works fine. 
> Also, if Spark is built using avro-mapred for hadoop2, it works fine as well.
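
For reference, one way to "force" the hadoop2 flavor in an sbt build is to depend on avro-mapred with the hadoop2 classifier (the version number below is illustrative):
{code}
// build.sbt (sbt definitions are Scala)
libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.6" classifier "hadoop2"
{code}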






[jira] [Updated] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3039:
---
Fix Version/s: 1.2.0

> Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 
> 1 API
> --
>
> Key: SPARK-3039
> URL: https://issues.apache.org/jira/browse/SPARK-3039
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Input/Output, Spark Core
>Affects Versions: 0.9.1, 1.0.0, 1.1.0
> Environment: hadoop2, hadoop-2.4.0, HDP-2.1
>Reporter: Bertrand Bossy
>Assignee: Bertrand Bossy
> Fix For: 1.2.0
>
>
> The spark assembly contains the artifact "org.apache.avro:avro-mapred" as a 
> dependency of "org.spark-project.hive:hive-serde".
> The avro-mapred package provides a hadoop FileInputFormat to read and write 
> avro files. There are two versions of this package, distinguished by a 
> classifier. avro-mapred for the new Hadoop API uses the classifier "hadoop2". 
> avro-mapred for the old Hadoop API uses no classifier.
> E.g. when reading avro files using 
> {code}
> sc.newAPIHadoopFile[AvroKey[SomeClass], NullWritable, AvroKeyInputFormat[SomeClass]]("hdfs://path/to/file.avro")
> {code}
> The following error occurs:
> {code}
> java.lang.IncompatibleClassChangeError: Found interface 
> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
> at 
> org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> This error is usually a hint that the old and the new Hadoop APIs were mixed up. As a 
> work-around, if avro-mapred for hadoop2 is "forced" to appear on the classpath before the 
> version that is bundled with Spark, reading avro files 
> works fine. 
> Likewise, if Spark is built using avro-mapred for hadoop2, it works fine as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3039:
---
Target Version/s: 1.2.0  (was: 1.1.1, 1.2.0)

> Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 
> 1 API
> --
>
> Key: SPARK-3039
> URL: https://issues.apache.org/jira/browse/SPARK-3039
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Input/Output, Spark Core
>Affects Versions: 0.9.1, 1.0.0, 1.1.0
> Environment: hadoop2, hadoop-2.4.0, HDP-2.1
>Reporter: Bertrand Bossy
>Assignee: Bertrand Bossy
> Fix For: 1.2.0
>
>
> The spark assembly contains the artifact "org.apache.avro:avro-mapred" as a 
> dependency of "org.spark-project.hive:hive-serde".
> The avro-mapred package provides a hadoop FileInputFormat to read and write 
> avro files. There are two versions of this package, distinguished by a 
> classifier. avro-mapred for the new Hadoop API uses the classifier "hadoop2". 
> avro-mapred for the old Hadoop API uses no classifier.
> E.g. when reading avro files using 
> {code}
> sc.newAPIHadoopFile[AvroKey[SomeClass], NullWritable, AvroKeyInputFormat[SomeClass]]("hdfs://path/to/file.avro")
> {code}
> The following error occurs:
> {code}
> java.lang.IncompatibleClassChangeError: Found interface 
> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
> at 
> org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> This error is usually a hint that the old and the new Hadoop APIs were mixed up. As a 
> work-around, if avro-mapred for hadoop2 is "forced" to appear on the classpath before the 
> version that is bundled with Spark, reading avro files 
> works fine. 
> Likewise, if Spark is built using avro-mapred for hadoop2, it works fine as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3039.

Resolution: Fixed

This fix did appear in Spark 1.2.0, so I'm closing this issue.

> Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 
> 1 API
> --
>
> Key: SPARK-3039
> URL: https://issues.apache.org/jira/browse/SPARK-3039
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Input/Output, Spark Core
>Affects Versions: 0.9.1, 1.0.0, 1.1.0
> Environment: hadoop2, hadoop-2.4.0, HDP-2.1
>Reporter: Bertrand Bossy
>Assignee: Bertrand Bossy
> Fix For: 1.2.0
>
>
> The spark assembly contains the artifact "org.apache.avro:avro-mapred" as a 
> dependency of "org.spark-project.hive:hive-serde".
> The avro-mapred package provides a hadoop FileInputFormat to read and write 
> avro files. There are two versions of this package, distinguished by a 
> classifier. avro-mapred for the new Hadoop API uses the classifier "hadoop2". 
> avro-mapred for the old Hadoop API uses no classifier.
> E.g. when reading avro files using 
> {code}
> sc.newAPIHadoopFile[AvroKey[SomeClass], NullWritable, AvroKeyInputFormat[SomeClass]]("hdfs://path/to/file.avro")
> {code}
> The following error occurs:
> {code}
> java.lang.IncompatibleClassChangeError: Found interface 
> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
> at 
> org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> This error is usually a hint that the old and the new Hadoop APIs were mixed up. As a 
> work-around, if avro-mapred for hadoop2 is "forced" to appear on the classpath before the 
> version that is bundled with Spark, reading avro files 
> works fine. 
> Likewise, if Spark is built using avro-mapred for hadoop2, it works fine as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3403) NaiveBayes crashes with blas/lapack native libraries for breeze (netlib-java)

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3403:
---
Fix Version/s: (was: 1.3.0)

> NaiveBayes crashes with blas/lapack native libraries for breeze (netlib-java)
> -
>
> Key: SPARK-3403
> URL: https://issues.apache.org/jira/browse/SPARK-3403
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.0.2
> Environment: Setup: Windows 7, x64 libraries for netlib-java (as 
> described on https://github.com/fommil/netlib-java). I used OpenBlas x64 and 
> MinGW64 precompiled dlls.
>Reporter: Alexander Ulanov
> Attachments: NativeNN.scala
>
>
> Code:
> {code}
> val model = NaiveBayes.train(train)
> val predictionAndLabels = test.map { point =>
>   val score = model.predict(point.features)
>   (score, point.label)
> }
> predictionAndLabels.foreach(println)
> {code}
> Result: 
> The program crashes with "Process finished with exit code -1073741819 
> (0xC0000005)" after displaying the first prediction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3379) Implement 'POWER' for sql

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3379:
---
Fix Version/s: (was: 1.3.0)

> Implement 'POWER' for sql
> -
>
> Key: SPARK-3379
> URL: https://issues.apache.org/jira/browse/SPARK-3379
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.0.2, 1.1.0
> Environment: All
>Reporter: Xinyun Huang
>Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Add support for the mathematical function "POWER" within Spark SQL. Split 
> off from SPARK-3176.
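> A sketch of the intended usage once the function exists (hypothetical query against an 
> assumed table {{t}} with a numeric column {{x}}; the return type is up to the implementation):
> {code}
> // Run in a Spark shell where sqlContext is available.
> sqlContext.sql("SELECT x, POWER(x, 2) FROM t").collect()
> {code}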



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3505) Augmenting SparkStreaming updateStateByKey API with timestamp

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3505:
---
Fix Version/s: (was: 1.3.0)

> Augmenting SparkStreaming updateStateByKey API with timestamp
> -
>
> Key: SPARK-3505
> URL: https://issues.apache.org/jira/browse/SPARK-3505
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Xi Liu
>Priority: Minor
>
> The current updateStateByKey API in Spark Streaming does not expose the batch timestamp 
> to the application. 
> In our use case, the application needs to know the batch timestamp to decide 
> whether to keep the state or not, and we do not want to use real system time 
> because we want to decouple the two (the same code base is used for 
> streaming and offline processing).
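> One possible shape for such an API (purely hypothetical; the variant signature and names 
> below are assumptions for illustration, not an existing or agreed-upon interface):
> {code}
> import org.apache.spark.streaming.Time
> // Hypothetical variant that also hands the batch time to the update function:
> //   def updateStateByKey[S](updateFunc: (Time, Seq[V], Option[S]) => Option[S]): DStream[(K, S)]
> // With it, state could be expired on batch time instead of wall-clock time:
> val ttlMs = 3600000L  // keep idle state for at most one hour of batch time
> val updateFunc = (batchTime: Time, values: Seq[Int], state: Option[(Int, Long)]) => {
>   val (count, lastSeen) = state.getOrElse((0, batchTime.milliseconds))
>   if (values.isEmpty && batchTime.milliseconds - lastSeen > ttlMs) None  // drop stale state
>   else Some((count + values.size, if (values.nonEmpty) batchTime.milliseconds else lastSeen))
> }
> {code}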



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3628:
---
Fix Version/s: (was: 1.3.0)
   1.2.0

> Don't apply accumulator updates multiple times for tasks in result stages
> -
>
> Key: SPARK-3628
> URL: https://issues.apache.org/jira/browse/SPARK-3628
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Matei Zaharia
>Assignee: Nan Zhu
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.2.0
>
>
> In previous versions of Spark, accumulator updates only got applied once for 
> accumulators that are only used in actions (i.e. result stages), letting you 
> use them to deterministically compute a result. Unfortunately, this got 
> broken in some recent refactorings.
> This is related to https://issues.apache.org/jira/browse/SPARK-732, but that 
> issue is about applying the same semantics to intermediate stages too, which 
> is more work and may not be what we want for debugging.
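> A minimal sketch of the guarantee being relied on (illustrative only; run in a Spark 
> shell where sc is available):
> {code}
> val acc = sc.accumulator(0)
> val data = sc.parallelize(1 to 100)
> // The accumulator is only updated inside an action (a result stage), so under the
> // semantics described above each partition should contribute exactly once,
> // even if tasks are retried.
> data.foreach(i => acc += i)
> assert(acc.value == 5050)
> {code}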



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3628:
---
Labels: backport-needed  (was: )

> Don't apply accumulator updates multiple times for tasks in result stages
> -
>
> Key: SPARK-3628
> URL: https://issues.apache.org/jira/browse/SPARK-3628
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Matei Zaharia
>Assignee: Nan Zhu
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.2.0
>
>
> In previous versions of Spark, accumulator updates only got applied once for 
> accumulators that are only used in actions (i.e. result stages), letting you 
> use them to deterministically compute a result. Unfortunately, this got 
> broken in some recent refactorings.
> This is related to https://issues.apache.org/jira/browse/SPARK-732, but that 
> issue is about applying the same semantics to intermediate stages too, which 
> is more work and may not be what we want for debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3505) Augmenting SparkStreaming updateStateByKey API with timestamp

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3505:
---
Target Version/s: 1.3.0  (was: 1.1.0)

> Augmenting SparkStreaming updateStateByKey API with timestamp
> -
>
> Key: SPARK-3505
> URL: https://issues.apache.org/jira/browse/SPARK-3505
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Xi Liu
>Priority: Minor
>
> The current updateStateByKey API in Spark Streaming does not expose timestamp 
> to the application. 
> In our use case, the application need to know the batch timestamp to decide 
> whether to keep the state or not. And we do not want to use real system time 
> because we want to decouple the two (because the same code base is used for 
> streaming and offline processing).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3632) ConnectionManager can run out of receive threads with authentication on

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3632:
---
Fix Version/s: 1.2.0

> ConnectionManager can run out of receive threads with authentication on
> ---
>
> Key: SPARK-3632
> URL: https://issues.apache.org/jira/browse/SPARK-3632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>  Labels: backport-needed
> Fix For: 1.2.0
>
>
> If you turn authentication on and you are using a lot of executors, there is 
> a chance that all of the threads in the handleMessageExecutor pool could be 
> waiting to send a message because they are blocked waiting on authentication 
> to happen. This can cause a temporary deadlock until the connection times out.
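> A generic sketch of the failure mode described above (not Spark code; purely illustrative):
> {code}
> import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
> val pool = Executors.newFixedThreadPool(4)   // stands in for handleMessageExecutor
> val authDone = new CountDownLatch(1)         // stands in for the pending authentication
> // Every worker blocks waiting for "authentication", so no thread is left to complete it.
> (1 to 4).foreach { _ => pool.submit(new Runnable { def run(): Unit = authDone.await() }) }
> // Nothing makes progress until some timeout fires (here it is released manually).
> authDone.countDown()
> pool.shutdown(); pool.awaitTermination(10, TimeUnit.SECONDS)
> {code}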



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3632) ConnectionManager can run out of receive threads with authentication on

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3632:
---
Labels: backport-needed  (was: )

> ConnectionManager can run out of receive threads with authentication on
> ---
>
> Key: SPARK-3632
> URL: https://issues.apache.org/jira/browse/SPARK-3632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>  Labels: backport-needed
> Fix For: 1.2.0
>
>
> If you turn authentication on and you are using a lot of executors. There is 
> a chance that all the of the threads in the handleMessageExecutor could be 
> waiting to send a message because they are blocked waiting on authentication 
> to happen. This can cause a temporary deadlock until the connection times out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3632) ConnectionManager can run out of receive threads with authentication on

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3632:
---
Fix Version/s: (was: 1.3.0)

> ConnectionManager can run out of receive threads with authentication on
> ---
>
> Key: SPARK-3632
> URL: https://issues.apache.org/jira/browse/SPARK-3632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>  Labels: backport-needed
> Fix For: 1.2.0
>
>
> If you turn authentication on and you are using a lot of executors. There is 
> a chance that all the of the threads in the handleMessageExecutor could be 
> waiting to send a message because they are blocked waiting on authentication 
> to happen. This can cause a temporary deadlock until the connection times out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3987) NNLS generates incorrect result

2014-12-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3987:
---
Fix Version/s: (was: 1.3.0)
   1.2.0

> NNLS generates incorrect result
> ---
>
> Key: SPARK-3987
> URL: https://issues.apache.org/jira/browse/SPARK-3987
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.1.0
>Reporter: Debasish Das
>Assignee: Shuo Xiang
> Fix For: 1.1.1, 1.2.0
>
>
> Hi,
> Please see the example gram matrix and linear term:
> val P2 = new DoubleMatrix(20, 20, 333907.312770, -60814.043975, 
> 207935.829941, -162881.367739, -43730.396770, 17511.428983, -243340.496449, 
> -225245.957922, 104700.445881, 32430.845099, 336378.693135, -373497.970207, 
> -41147.159621, 53928.060360, -293517.883778, 53105.278068, 0.00, 
> -85257.781696, 84913.970469, -10584.080103, -60814.043975, 13826.806664, 
> -38032.612640, 33475.833875, 10791.916809, -1040.950810, 48106.552472, 
> 45390.073380, -16310.282190, -2861.455903, -60790.833191, 73109.516544, 
> 9826.614644, -8283.992464, 56991.742991, -6171.366034, 0.00, 
> 19152.382499, -13218.721710, 2793.734234, 207935.829941, -38032.612640, 
> 129661.677608, -101682.098412, -27401.299347, 10787.713362, -151803.006149, 
> -140563.601672, 65067.935324, 20031.263383, 209521.268600, -232958.054688, 
> -25764.179034, 33507.951918, -183046.845592, 32884.782835, 0.00, 
> -53315.811196, 52770.762546, -6642.187643, -162881.367739, 33475.833875, 
> -101682.098412, 85094.407608, 25422.850782, -5437.646141, 124197.166330, 
> 116206.265909, -47093.484134, -11420.168521, -163429.436848, 189574.783900, 
> 23447.172314, -24087.375367, 148311.355507, -20848.385466, 0.00, 
> 46835.814559, -38180.352878, 6415.873901, -43730.396770, 10791.916809, 
> -27401.299347, 25422.850782, 8882.869799, 15.638084, 35933.473986, 
> 34186.371325, -10745.330690, -974.314375, -43537.709621, 54371.010558, 
> 7894.453004, -5408.929644, 42231.381747, -3192.010574, 0.00, 
> 15058.753110, -8704.757256, 2316.581535, 17511.428983, -1040.950810, 
> 10787.713362, -5437.646141, 15.638084, 2794.949847, -9681.950987, 
> -8258.171646, 7754.358930, 4193.359412, 18052.143842, -15456.096769, 
> -253.356253, 4089.672804, -12524.380088, 5651.579348, 0.00, -1513.302547, 
> 6296.461898, 152.427321, -243340.496449, 48106.552472, -151803.006149, 
> 124197.166330, 35933.473986, -9681.950987, 182931.600236, 170454.352953, 
> -72361.174145, -19270.461728, -244518.179729, 279551.060579, 33340.452802, 
> -37103.267653, 219025.288975, -33687.141423, 0.00, 67347.950443, 
> -58673.009647, 8957.800259, -225245.957922, 45390.073380, -140563.601672, 
> 116206.265909, 34186.371325, -8258.171646, 170454.352953, 159322.942894, 
> -66074.960534, -16839.743193, -226173.967766, 260421.044094, 31624.194003, 
> -33839.612565, 203889.695169, -30034.828909, 0.00, 63525.040745, 
> -53572.741748, 8575.071847, 104700.445881, -16310.282190, 65067.935324, 
> -47093.484134, -10745.330690, 7754.358930, -72361.174145, -66074.960534, 
> 35869.598076, 13378.653317, 106033.647837, -111831.682883, -10455.465743, 
> 18537.392481, -88370.612394, 20344.288488, 0.00, -22935.482766, 
> 29004.543704, -2409.461759, 32430.845099, -2861.455903, 20031.263383, 
> -11420.168521, -974.314375, 4193.359412, -19270.461728, -16839.743193, 
> 13378.653317, 6802.081898, 33256.395091, -30421.985199, -1296.785870, 
> 7026.518692, -24443.378205, 9221.982599, 0.00, -4088.076871, 
> 10861.014242, -25.092938, 336378.693135, -60790.833191, 209521.268600, 
> -163429.436848, -43537.709621, 18052.143842, -244518.179729, -226173.967766, 
> 106033.647837, 33256.395091, 339200.268106, -375442.716811, -41027.594509, 
> 54636.778527, -295133.248586, 54177.278365, 0.00, -85237.666701, 
> 85996.957056, -10503.209968, -373497.970207, 73109.516544, -232958.054688, 
> 189574.783900, 54371.010558, -15456.096769, 279551.060579, 260421.044094, 
> -111831.682883, -30421.985199, -375442.716811, 427793.208465, 50528.074431, 
> -57375.986301, 335203.382015, -52676.385869, 0.00, 102368.307670, 
> -90679.792485, 13509.390393, -41147.159621, 9826.614644, -25764.179034, 
> 23447.172314, 7894.453004, -253.356253, 33340.452802, 31624.194003, 
> -10455.465743, -1296.785870, -41027.594509, 50528.074431, 7255.977434, 
> -5281.636812, 39298.355527, -3440.450858, 0.00, 13717.870243, 
> -8471.405582, 2071.812204, 53928.060360, -8283.992464, 33507.951918, 
> -24087.375367, -5408.929644, 4089.672804, -37103.267653, -33839.612565, 
> 18537.392481, 7026.518692, 54636.778527, -57375.986301, -5281.636812, 
> 9735.061160, -45360.674033, 10634.633559, 0.00, -11652.364691, 
> 15039.566630, -1202.539106, -293517.883778, 56991.742991, -183046.845592, 
> 148311.355507, 42231.381747, -12524.38008

[jira] [Updated] (SPARK-4006) Spark Driver crashes whenever an Executor is registered twice

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4006:
---
Fix Version/s: (was: 1.3.0)
   1.2.0

> Spark Driver crashes whenever an Executor is registered twice
> -
>
> Key: SPARK-4006
> URL: https://issues.apache.org/jira/browse/SPARK-4006
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 0.9.2, 1.0.2, 1.1.0, 1.2.0
> Environment: Mesos, Coarse Grained
>Reporter: Tal Sliwowicz
>Assignee: Tal Sliwowicz
>Priority: Critical
> Fix For: 1.1.1, 1.2.0, 1.0.3
>
>
> This is a huge robustness issue for us (Taboola) in mission-critical, 
> time-sensitive (real-time) Spark jobs.
> We have long-running Spark drivers, and even though we have state-of-the-art 
> hardware, from time to time executors disconnect. In many cases, the 
> RemoveExecutor message is not received, and when the new executor registers, the 
> driver crashes. In Mesos coarse-grained mode, executor ids are fixed. 
> The issue is with the System.exit(1) in BlockManagerMasterActor
> {code}
> private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: ActorRef) {
>   if (!blockManagerInfo.contains(id)) {
>     blockManagerIdByExecutor.get(id.executorId) match {
>       case Some(manager) =>
>         // A block manager of the same executor already exists.
>         // This should never happen. Let's just quit.
>         logError("Got two different block manager registrations on " + id.executorId)
>         System.exit(1)
>       case None =>
>         blockManagerIdByExecutor(id.executorId) = id
>     }
>     logInfo("Registering block manager %s with %s RAM".format(
>       id.hostPort, Utils.bytesToString(maxMemSize)))
>     blockManagerInfo(id) =
>       new BlockManagerInfo(id, System.currentTimeMillis(), maxMemSize, slaveActor)
>   }
>   listenerBus.post(SparkListenerBlockManagerAdded(id, maxMemSize))
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4006) Spark Driver crashes whenever an Executor is registered twice

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4006.

  Resolution: Fixed
Target Version/s: 1.2.0, 1.1.1, 1.0.3  (was: 1.1.1, 1.2.0, 1.0.3)

I verified that this is present in branches 1.2, 1.1, and 1.0, so I'm resolving 
it.

> Spark Driver crashes whenever an Executor is registered twice
> -
>
> Key: SPARK-4006
> URL: https://issues.apache.org/jira/browse/SPARK-4006
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 0.9.2, 1.0.2, 1.1.0, 1.2.0
> Environment: Mesos, Coarse Grained
>Reporter: Tal Sliwowicz
>Assignee: Tal Sliwowicz
>Priority: Critical
> Fix For: 1.0.3, 1.2.0, 1.1.1
>
>
> This is a huge robustness issue for us (Taboola) in mission-critical, 
> time-sensitive (real-time) Spark jobs.
> We have long-running Spark drivers, and even though we have state-of-the-art 
> hardware, from time to time executors disconnect. In many cases, the 
> RemoveExecutor message is not received, and when the new executor registers, the 
> driver crashes. In Mesos coarse-grained mode, executor ids are fixed. 
> The issue is with the System.exit(1) in BlockManagerMasterActor
> {code}
> private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: ActorRef) {
>   if (!blockManagerInfo.contains(id)) {
>     blockManagerIdByExecutor.get(id.executorId) match {
>       case Some(manager) =>
>         // A block manager of the same executor already exists.
>         // This should never happen. Let's just quit.
>         logError("Got two different block manager registrations on " + id.executorId)
>         System.exit(1)
>       case None =>
>         blockManagerIdByExecutor(id.executorId) = id
>     }
>     logInfo("Registering block manager %s with %s RAM".format(
>       id.hostPort, Utils.bytesToString(maxMemSize)))
>     blockManagerInfo(id) =
>       new BlockManagerInfo(id, System.currentTimeMillis(), maxMemSize, slaveActor)
>   }
>   listenerBus.post(SparkListenerBlockManagerAdded(id, maxMemSize))
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4184) Improve Spark Streaming documentation

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4184:
---
Fix Version/s: (was: 1.3.0)

> Improve Spark Streaming documentation
> -
>
> Key: SPARK-4184
> URL: https://issues.apache.org/jira/browse/SPARK-4184
> Project: Spark
>  Issue Type: Documentation
>  Components: Streaming
>Reporter: Chris Fregly
>  Labels: documentation, streaming
>
> Improve Streaming documentation including API descriptions, 
> concurrency/thread safety, fault tolerance, replication, checkpointing, 
> scalability, resource allocation and utilization, back pressure, and 
> monitoring.
> also, add a section to the kinesis streaming guide describing how to use IAM 
> roles with the Spark Kinesis Receiver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4180) SparkContext constructor should throw exception if another SparkContext is already running

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4180:
---
Fix Version/s: (was: 1.3.0)
   1.2.0

> SparkContext constructor should throw exception if another SparkContext is 
> already running
> --
>
> Key: SPARK-4180
> URL: https://issues.apache.org/jira/browse/SPARK-4180
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.2.0
>
>
> Spark does not currently support multiple concurrently-running SparkContexts 
> in the same JVM (see SPARK-2243).  Therefore, SparkContext's constructor 
> should throw an exception if there is an active SparkContext that has not 
> been shut down via {{stop()}}.
> PySpark already does this, but the Scala SparkContext should do the same 
> thing.  The current behavior with multiple active contexts is unspecified / 
> not understood and it may be the source of confusing errors (see the user 
> error report in SPARK-4080, for example).
> This should be pretty easy to add: just add an {{activeSparkContext}} field to 
> the SparkContext companion object and {{synchronize}} on it in the 
> constructor and {{stop()}} methods; see PySpark's {{context.py}} file for an 
> example of this approach.
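> A minimal sketch of such a guard (illustrative only; the field and method names below 
> are assumptions, not the merged implementation, and the code would live inside Spark's 
> own SparkContext companion object):
> {code}
> object SparkContext {
>   private val activationLock = new Object()
>   private var activeSparkContext: Option[SparkContext] = None
>
>   private[spark] def markActive(sc: SparkContext): Unit = activationLock.synchronized {
>     if (activeSparkContext.isDefined) {
>       throw new IllegalStateException(
>         "Only one SparkContext may be running in this JVM; stop() the existing one first.")
>     }
>     activeSparkContext = Some(sc)
>   }
>
>   private[spark] def markStopped(sc: SparkContext): Unit = activationLock.synchronized {
>     if (activeSparkContext.exists(_ eq sc)) activeSparkContext = None
>   }
> }
> {code}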



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4231) Add RankingMetrics to examples.MovieLensALS

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4231:
---
Target Version/s: 1.3.0  (was: 1.2.0)

> Add RankingMetrics to examples.MovieLensALS
> ---
>
> Key: SPARK-4231
> URL: https://issues.apache.org/jira/browse/SPARK-4231
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 1.2.0
>Reporter: Debasish Das
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> examples.MovieLensALS computes RMSE for the MovieLens dataset, but after the addition 
> of RankingMetrics and the enhancements to ALS, it is critical to look at not only 
> the RMSE but also ranking measures like prec@k and MAP.
> In this JIRA we add RMSE and MAP computation for examples.MovieLensALS and 
> also add a flag indicating whether user or product recommendations are 
> being validated.
>  
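> A minimal sketch of the MAP computation (illustrative; the tiny stand-in data below is 
> an assumption, not code from the example, and it assumes a Spark shell with sc available):
> {code}
> import org.apache.spark.mllib.evaluation.RankingMetrics
> // Each pair holds the recommended product ids and the products the user actually liked.
> val predictedAndActual = sc.parallelize(Seq(
>   (Array(1, 2, 3, 4, 5), Array(1, 3, 6)),
>   (Array(7, 8, 9, 10, 11), Array(8, 12))
> ))
> val metrics = new RankingMetrics(predictedAndActual)
> println(s"MAP = ${metrics.meanAveragePrecision}")
> println(s"precision@5 = ${metrics.precisionAt(5)}")
> {code}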



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4237) Generate right Manifest File for maven building

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4237:
---
Fix Version/s: (was: 1.3.0)

> Generate right Manifest File for maven building
> ---
>
> Key: SPARK-4237
> URL: https://issues.apache.org/jira/browse/SPARK-4237
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> Currently, building Spark with Maven produces the manifest file of Guava;
> we should generate the correct manifest file for the Maven build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4258) NPE with new Parquet Filters

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4258:
---
Fix Version/s: (was: 1.3.0)

> NPE with new Parquet Filters
> 
>
> Key: SPARK-4258
> URL: https://issues.apache.org/jira/browse/SPARK-4258
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Cheng Lian
>Priority: Critical
> Fix For: 1.2.0
>
>
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 21.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 21.0 (TID 160, ip-10-0-247-144.us-west-2.compute.internal): 
> java.lang.NullPointerException: 
> parquet.io.api.Binary$ByteArrayBackedBinary.compareTo(Binary.java:206)
> parquet.io.api.Binary$ByteArrayBackedBinary.compareTo(Binary.java:162)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:100)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:210)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$Or.accept(Operators.java:302)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:201)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$And.accept(Operators.java:290)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.canDrop(StatisticsFilter.java:52)
> parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:46)
> parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22)
> 
> parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108)
> 
> parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28)
> 
> parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158)
> 
> parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
> {code}
> This occurs when reading parquet data encoded with the older version of the 
> library for TPC-DS query 34.  Will work on coming up with a smaller 
> reproduction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4237) Generate right Manifest File for maven building

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4237:
---
Target Version/s:   (was: 1.2.0)

> Generate right Manifest File for maven building
> ---
>
> Key: SPARK-4237
> URL: https://issues.apache.org/jira/browse/SPARK-4237
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> Currently, building Spark with Maven produces the manifest file of Guava;
> we should generate the correct manifest file for the Maven build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4231) Add RankingMetrics to examples.MovieLensALS

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4231:
---
Fix Version/s: (was: 1.3.0)

> Add RankingMetrics to examples.MovieLensALS
> ---
>
> Key: SPARK-4231
> URL: https://issues.apache.org/jira/browse/SPARK-4231
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 1.2.0
>Reporter: Debasish Das
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> examples.MovieLensALS computes RMSE for the MovieLens dataset, but after the addition 
> of RankingMetrics and the enhancements to ALS, it is critical to look at not only 
> the RMSE but also ranking measures like prec@k and MAP.
> In this JIRA we add RMSE and MAP computation for examples.MovieLensALS and 
> also add a flag indicating whether user or product recommendations are 
> being validated.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4258) NPE with new Parquet Filters

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4258.

Resolution: Fixed

> NPE with new Parquet Filters
> 
>
> Key: SPARK-4258
> URL: https://issues.apache.org/jira/browse/SPARK-4258
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Cheng Lian
>Priority: Critical
> Fix For: 1.2.0
>
>
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 21.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 21.0 (TID 160, ip-10-0-247-144.us-west-2.compute.internal): 
> java.lang.NullPointerException: 
> parquet.io.api.Binary$ByteArrayBackedBinary.compareTo(Binary.java:206)
> parquet.io.api.Binary$ByteArrayBackedBinary.compareTo(Binary.java:162)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:100)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:210)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$Or.accept(Operators.java:302)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:201)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$And.accept(Operators.java:290)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.canDrop(StatisticsFilter.java:52)
> parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:46)
> parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22)
> 
> parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108)
> 
> parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28)
> 
> parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158)
> 
> parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
> {code}
> This occurs when reading parquet data encoded with the older version of the 
> library for TPC-DS query 34.  Will work on coming up with a smaller 
> reproduction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4258) NPE with new Parquet Filters

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4258:
---
Fix Version/s: 1.2.0

> NPE with new Parquet Filters
> 
>
> Key: SPARK-4258
> URL: https://issues.apache.org/jira/browse/SPARK-4258
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Cheng Lian
>Priority: Critical
> Fix For: 1.2.0
>
>
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 21.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 21.0 (TID 160, ip-10-0-247-144.us-west-2.compute.internal): 
> java.lang.NullPointerException: 
> parquet.io.api.Binary$ByteArrayBackedBinary.compareTo(Binary.java:206)
> parquet.io.api.Binary$ByteArrayBackedBinary.compareTo(Binary.java:162)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:100)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$Eq.accept(Operators.java:162)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:210)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$Or.accept(Operators.java:302)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:201)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.visit(StatisticsFilter.java:47)
> parquet.filter2.predicate.Operators$And.accept(Operators.java:290)
> 
> parquet.filter2.statisticslevel.StatisticsFilter.canDrop(StatisticsFilter.java:52)
> parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:46)
> parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:22)
> 
> parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:108)
> 
> parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:28)
> 
> parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:158)
> 
> parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
> {code}
> This occurs when reading parquet data encoded with the older version of the 
> library for TPC-DS query 34.  Will work on coming up with a smaller 
> reproduction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4355) OnlineSummarizer doesn't merge mean correctly

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4355:
---
Fix Version/s: (was: 1.3.0)
   1.2.0

> OnlineSummarizer doesn't merge mean correctly
> -
>
> Key: SPARK-4355
> URL: https://issues.apache.org/jira/browse/SPARK-4355
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>  Labels: backport-needed
> Fix For: 1.1.1, 1.2.0
>
>
> It happens when the mean on one side is zero. I will send a PR with some 
> code clean-up.
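> A minimal sketch of the problematic case (illustrative; the exact values are assumptions):
> {code}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
> val left  = new MultivariateOnlineSummarizer().add(Vectors.dense(0.0))  // mean is zero here
> val right = new MultivariateOnlineSummarizer().add(Vectors.dense(2.0))
> val merged = left.merge(right)
> // Expected mean: Vectors.dense(1.0); the reported bug can yield a wrong mean in this case.
> println(merged.mean)
> {code}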



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4362) Make prediction probability available in NaiveBayesModel

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4362:
---
Fix Version/s: (was: 1.3.0)

> Make prediction probability available in NaiveBayesModel
> 
>
> Key: SPARK-4362
> URL: https://issues.apache.org/jira/browse/SPARK-4362
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Jatinpreet Singh
>Priority: Minor
>  Labels: naive-bayes
>
> There is currently no way to get the posterior probability of a prediction 
> from the Naive Bayes model. This should be made available 
> along with the predicted label.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4355) OnlineSummarizer doesn't merge mean correctly

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4355:
---
Labels: backport-needed  (was: )

> OnlineSummarizer doesn't merge mean correctly
> -
>
> Key: SPARK-4355
> URL: https://issues.apache.org/jira/browse/SPARK-4355
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>  Labels: backport-needed
> Fix For: 1.1.1, 1.2.0
>
>
> It happens when the mean on one side is zero. I will send a PR with some 
> code clean-up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4422) In some cases, Vectors.fromBreeze get wrong results.

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4422:
---
Fix Version/s: (was: 1.3.0)

> In some cases, Vectors.fromBreeze get wrong results.
> 
>
> Key: SPARK-4422
> URL: https://issues.apache.org/jira/browse/SPARK-4422
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>Priority: Minor
>  Labels: backport-needed
>
> {noformat}
> import breeze.linalg.{DenseVector => BDV, DenseMatrix => BDM, sum => brzSum} 
> var x = BDM.zeros[Double](10, 10)
> val v = Vectors.fromBreeze(x(::, 0))
> assert(v.size == x.rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4261) make right version info for beeline

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4261:
---
Fix Version/s: (was: 1.3.0)

> make right version info for beeline
> ---
>
> Key: SPARK-4261
> URL: https://issues.apache.org/jira/browse/SPARK-4261
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> When running the Spark SQL JDBC/ODBC server, the output is:
> JackydeMacBook-Pro:spark1 jackylee$ bin/beeline 
> Spark assembly has been built with Hive, including Datanucleus jars on 
> classpath
> Beeline version ??? by Apache Hive
> We should report the correct version info for beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4422) In some cases, Vectors.fromBreeze get wrong results.

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4422:
---
Fix Version/s: 1.2.0

> In some cases, Vectors.fromBreeze get wrong results.
> 
>
> Key: SPARK-4422
> URL: https://issues.apache.org/jira/browse/SPARK-4422
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>Priority: Minor
>  Labels: backport-needed
> Fix For: 1.2.0
>
>
> {noformat}
> import breeze.linalg.{DenseVector => BDV, DenseMatrix => BDM, sum => brzSum} 
> var x = BDM.zeros[Double](10, 10)
> val v = Vectors.fromBreeze(x(::, 0))
> assert(v.size == x.rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4449) specify port range in spark

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4449:
---
Fix Version/s: (was: 1.3.0)

> specify port range in spark
> ---
>
> Key: SPARK-4449
> URL: https://issues.apache.org/jira/browse/SPARK-4449
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: wangfei
>
>  In some cases, we need to specify the port range used in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4422) In some cases, Vectors.fromBreeze get wrong results.

2014-12-27 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259310#comment-14259310
 ] 

Patrick Wendell commented on SPARK-4422:


[~mengxr] so should we just close this and decide not to backport it then?

> In some cases, Vectors.fromBreeze get wrong results.
> 
>
> Key: SPARK-4422
> URL: https://issues.apache.org/jira/browse/SPARK-4422
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
>Priority: Minor
>  Labels: backport-needed
> Fix For: 1.2.0
>
>
> {noformat}
> import breeze.linalg.{DenseVector => BDV, DenseMatrix => BDM, sum => brzSum} 
> var x = BDM.zeros[Double](10, 10)
> val v = Vectors.fromBreeze(x(::, 0))
> assert(v.size == x.rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4553) query for parquet table with string fields in spark sql hive get binary result

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4553:
---
Fix Version/s: (was: 1.3.0)

> query for parquet table with string fields in spark sql hive get binary result
> --
>
> Key: SPARK-4553
> URL: https://issues.apache.org/jira/browse/SPARK-4553
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> Run the following:
> create table test_parquet(key int, value string) stored as parquet;
> insert into table test_parquet select * from src;
> select * from test_parquet;
> and get results like the following:
> ...
> 282 [B@38fda3b
> 138 [B@1407a24
> 238 [B@12de6fb
> 419 [B@6c97695
> 15 [B@4885067
> 118 [B@156a8d3
> 72 [B@65d20dd
> 90 [B@4c18906
> 307 [B@60b24cc
> 19 [B@59cf51b
> 435 [B@39fdf37
> 10 [B@4f799d7
> 277 [B@3950951
> 273 [B@596bf4b
> 306 [B@3e91557
> 224 [B@3781d61
> 309 [B@2d0d128
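> A possible work-around sketch, assuming the binary columns really hold UTF-8 strings 
> (whether it applies to this particular Hive-created table is not verified here):
> {code}
> // Ask Spark SQL to interpret Parquet BINARY columns as strings.
> sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
> sqlContext.sql("select * from test_parquet").collect()
> {code}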



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4559) Adding support for ucase and lcase

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4559:
---
Fix Version/s: (was: 1.3.0)

> Adding support for ucase and lcase
> --
>
> Key: SPARK-4559
> URL: https://issues.apache.org/jira/browse/SPARK-4559
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> Adding support for ucase and lcase in Spark SQL.
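> A sketch of the intended usage once the functions exist (hypothetical; ucase/lcase are 
> typically aliases of UPPER/LOWER, and the example assumes a table {{t}} with a string 
> column {{name}}):
> {code}
> sqlContext.sql("SELECT UCASE(name), LCASE(name) FROM t").collect()
> {code}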



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4574) Adding support for defining schema in foreign DDL commands.

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4574:
---
Fix Version/s: (was: 1.3.0)

> Adding support for defining schema in foreign DDL commands.
> ---
>
> Key: SPARK-4574
> URL: https://issues.apache.org/jira/browse/SPARK-4574
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> Adding support for defining a schema in foreign DDL commands. Currently, foreign DDL 
> supports commands like:
>CREATE TEMPORARY TABLE avroTable
>USING org.apache.spark.sql.avro
>OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
> Let users define a schema instead of inferring it from the file, so we can support DDL 
> commands as follows:
>CREATE TEMPORARY TABLE avroTable(a int, b string)
>USING org.apache.spark.sql.avro
>OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4574) Adding support for defining schema in foreign DDL commands.

2014-12-27 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259311#comment-14259311
 ] 

Patrick Wendell commented on SPARK-4574:


[~scwf] when creating issues please do not set the "Fix Version/s" field. That 
field is only meant to be set by a committer once a patch is merged into a 
specific branch.

> Adding support for defining schema in foreign DDL commands.
> ---
>
> Key: SPARK-4574
> URL: https://issues.apache.org/jira/browse/SPARK-4574
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: wangfei
>
> Add support for defining a schema in foreign DDL commands. Currently, foreign DDL 
> supports commands like:
>CREATE TEMPORARY TABLE avroTable
>USING org.apache.spark.sql.avro
>OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
> Letting users define the schema instead of inferring it from the file would allow 
> DDL commands such as:
>CREATE TEMPORARY TABLE avroTable(a int, b string)
>USING org.apache.spark.sql.avro
>OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4699) Make caseSensitive configurable in Analyzer.scala

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4699:
---
Fix Version/s: (was: 1.3.0)

> Make caseSensitive configurable in Analyzer.scala
> -
>
> Key: SPARK-4699
> URL: https://issues.apache.org/jira/browse/SPARK-4699
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Jacky Li
>
> Currently, case sensitivity is true by default in the Analyzer. It should be 
> configurable via SQLConf in the client application.
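
A minimal sketch of what the client-side toggle could look like; the spark.sql.caseSensitive key name is an assumption for illustration, not a committed config.

{code}
// Sketch only; the spark.sql.caseSensitive key is an assumption for illustration.
// Assumes an existing SparkContext named sc and a registered table named src.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Resolve table and column identifiers case-insensitively.
sqlContext.setConf("spark.sql.caseSensitive", "false")

// With case-insensitive analysis, KEY and key refer to the same column.
sqlContext.sql("SELECT KEY, value FROM src").collect()
{code}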



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4501) Create build/mvn to automatically download maven/zinc/scalac

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4501:
---
Assignee: Brennon York  (was: Prashant Sharma)

> Create build/mvn to automatically download maven/zinc/scalac
> 
>
> Key: SPARK-4501
> URL: https://issues.apache.org/jira/browse/SPARK-4501
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Brennon York
> Fix For: 1.3.0
>
>
> For a long time we've had the sbt/sbt script, and this works well for users who 
> want to build Spark with minimal dependencies (only Java). It would be nice to 
> generalize this to Maven as well and have build/sbt and build/mvn, where 
> build/mvn is a script that downloads Maven, Zinc, and Scala locally and sets 
> them up correctly. This would be totally "opt in", and people using system 
> Maven would be able to continue doing so.
> My sense is that very few Maven users are currently using Zinc, even though 
> from some basic tests I saw a huge improvement from using it. Also, having 
> a simple way to use Zinc would make it easier to use Maven on our Jenkins 
> test machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4501) Create build/mvn to automatically download maven/zinc/scalac

2014-12-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4501.

   Resolution: Fixed
Fix Version/s: 1.3.0

> Create build/mvn to automatically download maven/zinc/scalac
> 
>
> Key: SPARK-4501
> URL: https://issues.apache.org/jira/browse/SPARK-4501
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Brennon York
> Fix For: 1.3.0
>
>
> For a long time we've had the sbt/sbt script, and this works well for users who 
> want to build Spark with minimal dependencies (only Java). It would be nice to 
> generalize this to Maven as well and have build/sbt and build/mvn, where 
> build/mvn is a script that downloads Maven, Zinc, and Scala locally and sets 
> them up correctly. This would be totally "opt in", and people using system 
> Maven would be able to continue doing so.
> My sense is that very few Maven users are currently using Zinc, even though 
> from some basic tests I saw a huge improvement from using it. Also, having 
> a simple way to use Zinc would make it easier to use Maven on our Jenkins 
> test machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries

2014-12-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4908:
---
Target Version/s: 1.2.1

> Spark SQL built for Hive 13 fails under concurrent metadata queries
> ---
>
> Key: SPARK-4908
> URL: https://issues.apache.org/jira/browse/SPARK-4908
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: David Ross
>Priority: Critical
>
> We are on trunk: {{1.3.0-SNAPSHOT}}, as of this commit: 
> https://github.com/apache/spark/commit/3d0c37b8118f6057a663f959321a79b8061132b6
> We are using Spark built for Hive 13, using this option:
> {{-Phive-0.13.1}}
> In single-threaded mode, normal operations look fine. However, under 
> concurrency, with at least 2 concurrent connections, metadata queries fail.
> For example, {{USE some_db}}, {{SHOW TABLES}}, and the implicit {{USE}} 
> statement when you pass a default schema in the JDBC URL, all fail.
> {{SELECT}} queries like {{SELECT * FROM some_table}} do not have this issue.
> Here is some example code:
> {code}
> object main extends App {
>   import java.sql._
>   import scala.concurrent._
>   import scala.concurrent.duration._
>   import scala.concurrent.ExecutionContext.Implicits.global
>   Class.forName("org.apache.hive.jdbc.HiveDriver")
>   val host = "localhost" // update this
>   val url = s"jdbc:hive2://${host}:10511/some_db" // update this
>   val future = Future.traverse(1 to 3) { i =>
> Future {
>   println("Starting: " + i)
>   try {
> val conn = DriverManager.getConnection(url)
>   } catch {
> case e: Throwable => e.printStackTrace()
> println("Failed: " + i)
>   }
>   println("Finishing: " + i)
> }
>   }
>   Await.result(future, 2.minutes)
>   println("done!")
> }
> {code}
> Here is the output:
> {code}
> Starting: 1
> Starting: 3
> Starting: 2
> java.sql.SQLException: 
> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation 
> cancelled
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
>   at 
> org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451)
>   at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:195)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:270)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Failed: 3
> Finishing: 3
> java.sql.SQLException: 
> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation 
> cancelled
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
>   at 
> org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451)
>   at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:195)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:270)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedT

[jira] [Updated] (SPARK-5008) Persistent HDFS does not recognize EBS Volumes

2014-12-30 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5008:
---
Component/s: EC2

> Persistent HDFS does not recognize EBS Volumes
> --
>
> Key: SPARK-5008
> URL: https://issues.apache.org/jira/browse/SPARK-5008
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.2.0
> Environment: 8 Node Cluster Generated from 1.2.0 spark-ec2 script.
> -m c3.2xlarge -t c3.8xlarge --ebs-vol-size 300 --ebs-vol-type gp2 
> --ebs-vol-num 1
>Reporter: Brad Willard
>
> The cluster is built with correctly sized EBS volumes. It creates the volume at 
> /dev/xvds and mounts it at /vol0. However, when you start persistent HDFS 
> with the start-all script, it starts but isn't correctly configured to use the 
> EBS volume.
> I'm assuming some symlinks or expected mounts are not correctly configured.
> This has worked flawlessly on all previous versions of Spark.
> I have a stupid workaround: installing pssh and remounting the volume at /vol, 
> which worked; however, it does not survive restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5008) Persistent HDFS does not recognize EBS Volumes

2014-12-30 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5008:
---
Labels:   (was: amazon aws ec2 hdfs persistent)

> Persistent HDFS does not recognize EBS Volumes
> --
>
> Key: SPARK-5008
> URL: https://issues.apache.org/jira/browse/SPARK-5008
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.2.0
> Environment: 8 Node Cluster Generated from 1.2.0 spark-ec2 script.
> -m c3.2xlarge -t c3.8xlarge --ebs-vol-size 300 --ebs-vol-type gp2 
> --ebs-vol-num 1
>Reporter: Brad Willard
>
> The cluster is built with correctly sized EBS volumes. It creates the volume at 
> /dev/xvds and mounts it at /vol0. However, when you start persistent HDFS 
> with the start-all script, it starts but isn't correctly configured to use the 
> EBS volume.
> I'm assuming some symlinks or expected mounts are not correctly configured.
> This has worked flawlessly on all previous versions of Spark.
> I have a stupid workaround: installing pssh and remounting the volume at /vol, 
> which worked; however, it does not survive restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5025) Write a guide for creating well-formed packages for Spark

2014-12-30 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-5025:
--

 Summary: Write a guide for creating well-formed packages for Spark
 Key: SPARK-5025
 URL: https://issues.apache.org/jira/browse/SPARK-5025
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell


There are an increasing number of OSS projects providing utilities and 
extensions to Spark. We should write a guide in the Spark docs that explains 
how to create, package, and publish a third party Spark library. There are a 
few issues here such as how to list your dependency on Spark, how to deal with 
your own third party dependencies, etc. We should also cover how to do this for 
Python libraries.

In general, we should make it easy to build extension points against any of 
Spark's APIs (e.g. for new data sources, streaming receivers, ML algorithms, etc.) 
and self-publish libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4737) Prevent serialization errors from ever crashing the DAG scheduler

2015-01-05 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4737:
---
Affects Version/s: 1.0.2
   1.1.1

> Prevent serialization errors from ever crashing the DAG scheduler
> -
>
> Key: SPARK-4737
> URL: https://issues.apache.org/jira/browse/SPARK-4737
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Patrick Wendell
>Assignee: Matthew Cheah
>Priority: Blocker
>
> Currently in Spark we assume that when tasks are serialized in the 
> TaskSetManager, the serialization cannot fail. We assume this because 
> upstream in the DAGScheduler we attempt to catch any serialization errors by 
> serializing a single partition. However, in some cases this upstream test is 
> not accurate - i.e. an RDD can have one partition that serializes cleanly 
> while others do not.
> To do this the proper way, we need to catch and propagate the exception at 
> the time of serialization. The tricky bit is making sure it gets propagated 
> in the right way.
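
A rough sketch of the proposed behavior, as an illustrative standalone helper rather than the actual TaskSetManager code: serialize each task eagerly and turn a serialization failure into a task-set abort instead of letting the exception escape into the DAGScheduler's event loop.

{code}
import java.nio.ByteBuffer
import scala.util.control.NonFatal

// `serialize` stands in for the real task-serialization call and `abortTaskSet`
// for TaskSetManager#abort; both names are placeholders, not Spark internals.
def serializeTaskSafely(serialize: => ByteBuffer,
                        abortTaskSet: String => Unit): Option[ByteBuffer] = {
  try {
    Some(serialize)
  } catch {
    case NonFatal(e) =>
      // Fail the task set with a clear message instead of letting the
      // exception escape into the scheduler's event loop.
      abortTaskSet(s"Failed to serialize task: ${e.getMessage}")
      None
  }
}
{code}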



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4687) SparkContext#addFile doesn't keep file folder information

2015-01-05 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265416#comment-14265416
 ] 

Patrick Wendell commented on SPARK-4687:


I spent some more time looking at this and talking with [~sandyr] and 
[~joshrosen]. I think having some limited version of this is fine given that, 
from what I can tell, this is pretty difficult to implement outside of Spark. I 
am going to post further comments on the JIRA.

> SparkContext#addFile doesn't keep file folder information
> -
>
> Key: SPARK-4687
> URL: https://issues.apache.org/jira/browse/SPARK-4687
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Jimmy Xiang
>
> Files added with SparkContext#addFile are loaded with Utils#fetchFile before 
> a task starts. However, Utils#fetchFile puts all files under the Spark root 
> on the worker node. We should have an option to keep the folder information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-5113:
--

 Summary: Audit and document use of hostnames and IP addresses in 
Spark
 Key: SPARK-5113
 URL: https://issues.apache.org/jira/browse/SPARK-5113
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
Priority: Critical


Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. In some cases, 
that hostname is used as the bind interface also (e.g. I think this happens in 
the connection manager and possibly akka). In other cases (the web UI and netty 
shuffle) we seem to bind to all interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5113:
---
Description: 
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. In some cases, 
that hostname is used as the bind hostname also (e.g. I think this happens in 
the connection manager and possibly akka) - which will likely internally result 
in a re-resolution of this to an IP address. In other cases (the web UI and 
netty shuffle) we seem to bind to all interfaces.

  was:
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. In some cases, 
that hostname is used as the bind interface also (e.g. I think this happens in 
the connection manager and possibly akka). In other cases (the web UI and netty 
shuffle) we seem to bind to all interfaces.


> Audit and document use of hostnames and IP addresses in Spark
> -
>
> Key: SPARK-5113
> URL: https://issues.apache.org/jira/browse/SPARK-5113
> Project: Spark
>  Issue Type: Bug
>Reporter: Patrick Wendell
>Priority: Critical
>
> Spark has multiple network components that start servers and advertise their 
> network addresses to other processes.
> We should go through each of these components and make sure they have 
> consistent and/or documented behavior wrt (a) what interface(s) they bind to 
> and (b) what hostname they use to advertise themselves to other processes. We 
> should document this clearly and explain to people what to do in different 
> cases (e.g. EC2, dockerized containers, etc).
> When Spark initializes, it will search for a network interface until it finds 
> one that is not a loopback address. Then it will do a reverse DNS lookup for 
> a hostname associated with that interface. Then the network components will 
> use that hostname to advertise the component to other processes. In some 
> cases, that hostname is used as the bind hostname also (e.g. I think this 
> happens in the connection manager and possibly akka) - which will likely 
> internally result in a re-resolution of this to an IP address. In other cases 
> (the web UI and netty shuffle) we seem to bind to all interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5113:
---
Description: 
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier. In some cases, that hostname 
is used as the bind hostname also (e.g. I think this happens in the connection 
manager and possibly akka) - which will likely internally result in a 
re-resolution of this to an IP address. In other cases (the web UI and netty 
shuffle) we seem to bind to all interfaces.

  was:
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. In some cases, 
that hostname is used as the bind hostname also (e.g. I think this happens in 
the connection manager and possibly akka) - which will likely internally result 
in a re-resolution of this to an IP address. In other cases (the web UI and 
netty shuffle) we seem to bind to all interfaces.


> Audit and document use of hostnames and IP addresses in Spark
> -
>
> Key: SPARK-5113
> URL: https://issues.apache.org/jira/browse/SPARK-5113
> Project: Spark
>  Issue Type: Bug
>Reporter: Patrick Wendell
>Priority: Critical
>
> Spark has multiple network components that start servers and advertise their 
> network addresses to other processes.
> We should go through each of these components and make sure they have 
> consistent and/or documented behavior wrt (a) what interface(s) they bind to 
> and (b) what hostname they use to advertise themselves to other processes. We 
> should document this clearly and explain to people what to do in different 
> cases (e.g. EC2, dockerized containers, etc).
> When Spark initializes, it will search for a network interface until it finds 
> one that is not a loopback address. Then it will do a reverse DNS lookup for 
> a hostname associated with that interface. Then the network components will 
> use that hostname to advertise the component to other processes. That 
> hostname is also the one used for the akka system identifier. In some cases, 
> that hostname is used as the bind hostname also (e.g. I think this happens in 
> the connection manager and possibly akka) - which will likely internally 
> result in a re-resolution of this to an IP address. In other cases (the web 
> UI and netty shuffle) we seem to bind to all interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5113:
---
Description: 
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier (akka supports only supplying 
a single name which it uses both as the bind interface and as the actor 
identifier). In some cases, that hostname is used as the bind hostname also 
(e.g. I think this happens in the connection manager and possibly akka) - which 
will likely internally result in a re-resolution of this to an IP address. In 
other cases (the web UI and netty shuffle) we seem to bind to all interfaces.

  was:
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier. In some cases, that hostname 
is used as the bind hostname also (e.g. I think this happens in the connection 
manager and possibly akka) - which will likely internally result in a 
re-resolution of this to an IP address. In other cases (the web UI and netty 
shuffle) we seem to bind to all interfaces.


> Audit and document use of hostnames and IP addresses in Spark
> -
>
> Key: SPARK-5113
> URL: https://issues.apache.org/jira/browse/SPARK-5113
> Project: Spark
>  Issue Type: Bug
>Reporter: Patrick Wendell
>Priority: Critical
>
> Spark has multiple network components that start servers and advertise their 
> network addresses to other processes.
> We should go through each of these components and make sure they have 
> consistent and/or documented behavior wrt (a) what interface(s) they bind to 
> and (b) what hostname they use to advertise themselves to other processes. We 
> should document this clearly and explain to people what to do in different 
> cases (e.g. EC2, dockerized containers, etc).
> When Spark initializes, it will search for a network interface until it finds 
> one that is not a loopback address. Then it will do a reverse DNS lookup for 
> a hostname associated with that interface. Then the network components will 
> use that hostname to advertise the component to other processes. That 
> hostname is also the one used for the akka system identifier (akka supports 
> only supplying a single name which it uses both as the bind interface and as 
> the actor identifier). In some cases, that hostname is used as the bind 
> hostname also (e.g. I think this happens in the connection manager and 
> possibly akka) - which will likely internally result in a re-resolution of 
> this to an IP address. In other cases (the web UI and netty shuffle) we seem 
> to bind to all interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5113:
---
Description: 
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier (akka supports only supplying 
a single name which it uses both as the bind interface and as the actor 
identifier). In some cases, that hostname is used as the bind hostname also 
(e.g. I think this happens in the connection manager and possibly akka) - which 
will likely internally result in a re-resolution of this to an IP address. In 
other cases (the web UI and netty shuffle) we seem to bind to all interfaces.

The best outcome would be to have three configs that can be set on each machine:

{code}
SPARK_LOCAL_IP # Ip address we bind to for all services
SPARK_INTERNAL_HOSTNAME # Hostname we advertise to remote processes within the 
cluster
SPARK_EXTERNAL_HOSTNAME # Hostname we advertise to processes outside the 
cluster (e.g. the UI)
{code}

It's not clear how easily we can support that scheme while providing backwards 
compatibility. The last one (SPARK_EXTERNAL_HOSTNAME) is easy - it's just an 
alias for what is now SPARK_PUBLIC_DNS.

  was:
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier (akka supports only supplying 
a single name which it uses both as the bind interface and as the actor 
identifier). In some cases, that hostname is used as the bind hostname also 
(e.g. I think this happens in the connection manager and possibly akka) - which 
will likely internally result in a re-resolution of this to an IP address. In 
other cases (the web UI and netty shuffle) we seem to bind to all interfaces.


> Audit and document use of hostnames and IP addresses in Spark
> -
>
> Key: SPARK-5113
> URL: https://issues.apache.org/jira/browse/SPARK-5113
> Project: Spark
>  Issue Type: Bug
>Reporter: Patrick Wendell
>Priority: Critical
>
> Spark has multiple network components that start servers and advertise their 
> network addresses to other processes.
> We should go through each of these components and make sure they have 
> consistent and/or documented behavior wrt (a) what interface(s) they bind to 
> and (b) what hostname they use to advertise themselves to other processes. We 
> should document this clearly and explain to people what to do in different 
> cases (e.g. EC2, dockerized containers, etc).
> When Spark initializes, it will search for a network interface until it finds 
> one that is not a loopback address. Then it will do a reverse DNS lookup for 
> a hostname associated with that interface. Then the network components will 
> use that hostname to advertise the component to other processes. That 
> hostname is also the one used for the akka system identifier (akka supports 
> only supplying a single name which it uses both as the bind interface and as 
> the actor identifier). In some cases, that hostname is used as the bind 
> hostname also (e.g. I think this happens in the connection manager and 
> possibly akka) - which will likely internally result in a re-resolution of 
> this to an IP address. In other cases (the web UI and netty shuffle) we seem 
> to bind to all interfaces.
> The best outcome would be to have three configs that can be set on each 
>

[jira] [Updated] (SPARK-5097) Adding data frame APIs to SchemaRDD

2015-01-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5097:
---
Priority: Critical  (was: Major)

> Adding data frame APIs to SchemaRDD
> ---
>
> Key: SPARK-5097
> URL: https://issues.apache.org/jira/browse/SPARK-5097
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Critical
> Attachments: DesignDocAddingDataFrameAPIstoSchemaRDD.pdf
>
>
> SchemaRDD, through its DSL, already provides common data frame 
> functionalities. However, the DSL was originally created for constructing 
> test cases without much end-user usability and API stability consideration. 
> This design doc proposes a set of API changes for Scala and Python to make 
> the SchemaRDD DSL API more usable and stable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem

2015-01-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267419#comment-14267419
 ] 

Patrick Wendell commented on SPARK-1529:


Hey Sean,

From what I remember of this, the issue is that MapR clusters are not 
typically provisioned with much local disk space available, because the MapRFS 
supports accessing "local" volumes in its API, unlike the HDFS API. So in 
general the expectation is that large amounts of local data should be written 
through MapR's API to its local filesystem. They have an NFS mount you can use 
as a workaround to provide POSIX APIs, and I think most MapR users set this 
mount up and then have Spark write shuffle data there.

Option 2, which [~rkannan82] mentions, is not actually feasible in Spark right 
now. We don't support writing shuffle data through the Hadoop APIs, and I think 
Cheng's patch was only a prototype of how we might do that...

> Support setting spark.local.dirs to a hadoop FileSystem 
> 
>
> Key: SPARK-1529
> URL: https://issues.apache.org/jira/browse/SPARK-1529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Cheng Lian
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. We should allow setting spark.local.dir to a 
> Hadoop filesystem location. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem

2015-01-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267424#comment-14267424
 ] 

Patrick Wendell commented on SPARK-1529:


BTW - I think if MapR wants to have a customized shuffle, the direction 
proposed in this patch is probably not the best way to do it. It would make 
more sense to implement a DFS-based shuffle using the new pluggable shuffle 
API, i.e. a shuffle that communicates through the filesystem rather than doing 
transfers through Spark.
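
A minimal sketch of that direction: spark.shuffle.manager is the existing knob for plugging in a custom shuffle implementation, and the class name below is made up for illustration.

{code}
import org.apache.spark.SparkConf

// com.example.shuffle.DfsShuffleManager is hypothetical: it would implement
// Spark's pluggable shuffle interface and read/write shuffle data through a
// Hadoop FileSystem (e.g. MapRFS) instead of the local disks.
val conf = new SparkConf()
  .setAppName("dfs-shuffle-sketch")
  .set("spark.shuffle.manager", "com.example.shuffle.DfsShuffleManager")
{code}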

> Support setting spark.local.dirs to a hadoop FileSystem 
> 
>
> Key: SPARK-1529
> URL: https://issues.apache.org/jira/browse/SPARK-1529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Cheng Lian
>
> In some environments, like with MapR, local volumes are accessed through the 
> Hadoop filesystem interface. We should allow setting spark.local.dir to a 
> Hadoop filesystem location. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6087) Provide actionable exception if Kryo buffer is not large enough

2015-03-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6087:
---
Labels: starter  (was: )

> Provide actionable exception if Kryo buffer is not large enough
> ---
>
> Key: SPARK-6087
> URL: https://issues.apache.org/jira/browse/SPARK-6087
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Patrick Wendell
>Priority: Critical
>  Labels: starter
>
> Right now if you don't have a large enough Kryo buffer, you get a really 
> confusing exception. I noticed this when using Kryo to serialize broadcasted 
> tables in Spark SQL. We should catch-then-rethrow this in the KryoSerializer, 
> wrapping it in a message that suggests increasing the Kryo buffer size 
> configuration variable.
> {code}
> com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, 
> required: 3
> Serialization trace:
> value (org.apache.spark.sql.catalyst.expressions.MutableAny)
> values (org.apache.spark.sql.catalyst.expressions.SpecificMutableRow)
>   at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
>   at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446)
>   at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
>   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
>   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
>   at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
>   at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:167)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:234)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> /cc [~kayousterhout], who helped report this issue
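
A sketch of the catch-then-rethrow idea as a standalone helper, not the actual KryoSerializer change; the config key name spark.kryoserializer.buffer.max.mb is the 1.x-era setting and is quoted here as an assumption.

{code}
import com.esotericsoftware.kryo.{Kryo, KryoException}
import com.esotericsoftware.kryo.io.Output

def serializeWithHint(kryo: Kryo, obj: AnyRef, maxBufferBytes: Int): Array[Byte] = {
  // Fixed-size output buffer, analogous to the Kryo buffer Spark allocates.
  val output = new Output(maxBufferBytes, maxBufferBytes)
  try {
    kryo.writeClassAndObject(output, obj)
    output.toBytes
  } catch {
    case e: KryoException if Option(e.getMessage).exists(_.startsWith("Buffer overflow")) =>
      // Rethrow with an actionable message instead of the bare Kryo error.
      throw new RuntimeException(
        "Kryo buffer overflow while serializing " + obj.getClass.getName +
          "; try increasing spark.kryoserializer.buffer.max.mb", e)
  } finally {
    output.close()
  }
}
{code}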



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6066) Metadata in event log makes it very difficult for external libraries to parse event log

2015-03-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-6066.

   Resolution: Fixed
Fix Version/s: 1.3.0

Thanks Andrew and Marcelo for your work on this patch.

> Metadata in event log makes it very difficult for external libraries to parse 
> event log
> ---
>
> Key: SPARK-6066
> URL: https://issues.apache.org/jira/browse/SPARK-6066
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Kay Ousterhout
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.3.0
>
>
> The fix for SPARK-2261 added a line at the beginning of the event log that 
> encodes metadata.  This line makes it much more difficult to parse the event 
> logs with external libraries (like 
> https://github.com/kayousterhout/trace-analysis, which is used by folks at 
> Berkeley) because:
> (1) The metadata is not written as JSON, unlike the rest of the file
> (2) More annoyingly, if the file is compressed, the metadata is not 
> compressed.  This has a few side-effects: first, someone can't just use the 
> command line to uncompress the file and then look at the logs, because the 
> file is in this weird half-compressed format; and second, now external tools 
> that parse these logs also need to deal with this weird format.
> We should fix this before the 1.3 release, because otherwise we'll have to 
> add a bunch more backward-compatibility code to handle this weird format!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6048) SparkConf.translateConfKey should not translate on set

2015-03-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-6048.

   Resolution: Fixed
Fix Version/s: 1.3.0

> SparkConf.translateConfKey should not translate on set
> --
>
> Key: SPARK-6048
> URL: https://issues.apache.org/jira/browse/SPARK-6048
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.3.0
>
>
> There are several issues with translating on set.
> (1) The most serious one is that if the user has both the deprecated and the 
> latest version of the same config set, then the value picked up by SparkConf 
> will be arbitrary. Why? Because during initialization of the conf we call 
> `conf.set` on each property in `sys.props` in an order arbitrarily defined by 
> Java. As a result, the value of the more recent config may be overridden by 
> that of the deprecated one. Instead, we should always use the value of the 
> most recent config.
> (2) If we translate on set, then we must keep translating everywhere else. In 
> fact, the current code does not translate on remove, which means the 
> following won't work if X is deprecated:
> {code}
> conf.set(X, Y)
> conf.remove(X) // X is not in the conf
> {code}
> This requires us to also translate in remove and other places, as we already 
> do for contains, leading to more duplicate code.
> (3) Since we call `conf.set` on all configs when initializing the conf, we 
> print all deprecation warnings in the beginning. Elsewhere in Spark, however, 
> we warn the user when the deprecated config / option / env var is actually 
> being used.
> We should keep this consistent so users won't expect to find all 
> deprecation messages at the beginning of their logs.
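
A small standalone sketch of the alternative, translating deprecated keys on read rather than on write; the key names, the deprecation map, and the method shapes below are assumptions for illustration, not SparkConf internals.

{code}
import scala.collection.mutable

object ConfSketch {
  // Hypothetical mapping from deprecated keys to their replacements.
  private val deprecatedToNew = Map("spark.old.key" -> "spark.new.key")
  private val settings = mutable.HashMap[String, String]()

  // Store exactly what the caller passed; no translation, no warning.
  def set(key: String, value: String): Unit = { settings(key) = value }

  // Prefer the new key; fall back to a deprecated alias and warn only when it is used.
  def get(key: String): Option[String] =
    settings.get(key).orElse {
      deprecatedToNew.collectFirst {
        case (oldKey, `key`) if settings.contains(oldKey) =>
          println(s"Warning: '$oldKey' is deprecated, use '$key' instead")
          settings(oldKey)
      }
    }
}
{code}

This keeps remove and contains trivial (they operate on whatever keys were set verbatim) and emits deprecation warnings only at the point where a deprecated value is actually consumed.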



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6122) Upgrade Tachyon dependency to 0.6.0

2015-03-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6122:
---
Assignee: (was: Patrick Wendell)

> Upgrade Tachyon dependency to 0.6.0
> ---
>
> Key: SPARK-6122
> URL: https://issues.apache.org/jira/browse/SPARK-6122
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.1
>Reporter: Haoyuan Li
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6122) Upgrade Tachyon dependency to 0.6.0

2015-03-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6122:
---
Target Version/s: 1.4.0

> Upgrade Tachyon dependency to 0.6.0
> ---
>
> Key: SPARK-6122
> URL: https://issues.apache.org/jira/browse/SPARK-6122
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.1
>Reporter: Haoyuan Li
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


