[jira] [Updated] (SPARK-1684) Merge script should standardize SPARK-XXX prefix

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1684: --- Fix Version/s: (was: 1.3.0) Merge script should standardize SPARK-XXX prefix

[jira] [Updated] (SPARK-1706) Allow multiple executors per worker in Standalone mode

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1706: --- Fix Version/s: (was: 1.3.0) Allow multiple executors per worker in Standalone mode

[jira] [Updated] (SPARK-1911) Warn users if their assembly jars are not built with Java 6

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1911: --- Fix Version/s: (was: 1.3.0) Warn users if their assembly jars are not built with Java 6

[jira] [Updated] (SPARK-1866) Closure cleaner does not null shadowed fields when outer scope is referenced

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1866: --- Fix Version/s: (was: 1.3.0) (was: 1.0.1) Closure cleaner does

[jira] [Updated] (SPARK-1792) Missing Spark-Shell Configure Options

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1792: --- Fix Version/s: (was: 1.3.0) Missing Spark-Shell Configure Options

[jira] [Updated] (SPARK-1989) Exit executors faster if they get into a cycle of heavy GC

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1989: --- Fix Version/s: (was: 1.3.0) Exit executors faster if they get into a cycle of heavy GC

[jira] [Updated] (SPARK-1924) Make local:/ scheme work in more deploy modes

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1924: --- Fix Version/s: (was: 1.3.0) Make local:/ scheme work in more deploy modes

[jira] [Updated] (SPARK-1921) Allow duplicate jar files among the app jar and secondary jars in yarn-cluster mode

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1921: --- Fix Version/s: (was: 1.3.0) Allow duplicate jar files among the app jar and secondary

[jira] [Updated] (SPARK-1972) Add support for setting and visualizing custom task-related metrics

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1972: --- Fix Version/s: (was: 1.3.0) Add support for setting and visualizing custom task-related

[jira] [Updated] (SPARK-2063) Creating a SchemaRDD via sql() does not correctly resolve nested types

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2063: --- Fix Version/s: (was: 1.3.0) Creating a SchemaRDD via sql() does not correctly resolve

[jira] [Updated] (SPARK-2068) Remove other uses of @transient lazy val in physical plan nodes

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2068: --- Fix Version/s: (was: 1.3.0) Remove other uses of @transient lazy val in physical plan

[jira] [Updated] (SPARK-2584) Do not mutate block storage level on the UI

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2584: --- Fix Version/s: (was: 1.3.0) Do not mutate block storage level on the UI

[jira] [Updated] (SPARK-2167) spark-submit should return exit code based on failure/success

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2167: --- Fix Version/s: (was: 1.3.0) spark-submit should return exit code based on failure

[jira] [Resolved] (SPARK-2069) MIMA false positives (umbrella)

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2069. Resolution: Fixed Fix Version/s: (was: 1.3.0) 1.2.0 MIMA

[jira] [Updated] (SPARK-2638) Improve concurrency of fetching Map outputs

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2638: --- Fix Version/s: (was: 1.3.0) Improve concurrency of fetching Map outputs

[jira] [Updated] (SPARK-2703) Make Tachyon related unit tests execute without deploying a Tachyon system locally.

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2703: --- Fix Version/s: (was: 1.3.0) Make Tachyon related unit tests execute without deploying

[jira] [Updated] (SPARK-2624) Datanucleus jars not accessible in yarn-cluster mode

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2624: --- Fix Version/s: (was: 1.3.0) Datanucleus jars not accessible in yarn-cluster mode

[jira] [Updated] (SPARK-2722) Mechanism for escaping spark configs is not consistent

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2722: --- Fix Version/s: (was: 1.3.0) Mechanism for escaping spark configs is not consistent

[jira] [Updated] (SPARK-2757) Add Mima test for Spark Sink after 1.10 is released

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2757: --- Target Version/s: 1.3.0 (was: 1.2.0) Add Mima test for Spark Sink after 1.10 is released

[jira] [Updated] (SPARK-2757) Add Mima test for Spark Sink after 1.10 is released

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2757: --- Assignee: Hari Shreedharan Add Mima test for Spark Sink after 1.10 is released

[jira] [Updated] (SPARK-2757) Add Mima test for Spark Sink after 1.10 is released

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2757: --- Fix Version/s: (was: 1.3.0) Add Mima test for Spark Sink after 1.10 is released

[jira] [Updated] (SPARK-2793) Correctly lock directory creation in DiskBlockManager.getFile

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2793: --- Fix Version/s: (was: 1.3.0) Correctly lock directory creation

[jira] [Updated] (SPARK-2770) Rename spark-ganglia-lgpl to ganglia-lgpl

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2770: --- Fix Version/s: (was: 1.3.0) Rename spark-ganglia-lgpl to ganglia-lgpl

[jira] [Updated] (SPARK-2913) Spark's log4j.properties should always appear ahead of Hadoop's on classpath

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2913: --- Fix Version/s: (was: 1.3.0) Spark's log4j.properties should always appear ahead

[jira] [Updated] (SPARK-2794) Use Java 7 isSymlink when available

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2794: --- Fix Version/s: (was: 1.3.0) Use Java 7 isSymlink when available

[jira] [Updated] (SPARK-2795) Improve DiskBlockObjectWriter API

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2795: --- Fix Version/s: (was: 1.3.0) Improve DiskBlockObjectWriter API

[jira] [Updated] (SPARK-2914) spark.*.extraJavaOptions are evaluated too many times

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2914: --- Fix Version/s: (was: 1.3.0) spark.*.extraJavaOptions are evaluated too many times

[jira] [Updated] (SPARK-2914) spark.*.extraJavaOptions are evaluated too many times

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2914: --- Assignee: Andrew Or spark.*.extraJavaOptions are evaluated too many times

[jira] [Updated] (SPARK-2914) spark.*.extraJavaOptions are evaluated too many times

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2914: --- Fix Version/s: 1.2.0 spark.*.extraJavaOptions are evaluated too many times

[jira] [Updated] (SPARK-2973) Add a way to show tables without executing a job

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2973: --- Fix Version/s: (was: 1.3.0) 1.2.0 Add a way to show tables without

[jira] [Updated] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3039: --- Fix Version/s: (was: 1.3.0) Spark assembly for new hadoop API (hadoop 2) contains avro

[jira] [Updated] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3039: --- Fix Version/s: 1.2.0 Spark assembly for new hadoop API (hadoop 2) contains avro-mapred

[jira] [Updated] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3039: --- Target Version/s: 1.2.0 (was: 1.1.1, 1.2.0) Spark assembly for new hadoop API (hadoop 2

[jira] [Resolved] (SPARK-3039) Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3039. Resolution: Fixed This fix did appear in Spark 1.2.0 so I'm closing this issue. Spark

[jira] [Updated] (SPARK-3403) NaiveBayes crashes with blas/lapack native libraries for breeze (netlib-java)

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3403: --- Fix Version/s: (was: 1.3.0) NaiveBayes crashes with blas/lapack native libraries

[jira] [Updated] (SPARK-3379) Implement 'POWER' for sql

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3379: --- Fix Version/s: (was: 1.3.0) Implement 'POWER' for sql

[jira] [Updated] (SPARK-3505) Augmenting SparkStreaming updateStateByKey API with timestamp

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3505: --- Fix Version/s: (was: 1.3.0) Augmenting SparkStreaming updateStateByKey API

[jira] [Updated] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3628: --- Fix Version/s: (was: 1.3.0) 1.2.0 Don't apply accumulator updates

[jira] [Updated] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3628: --- Labels: backport-needed (was: ) Don't apply accumulator updates multiple times for tasks

[jira] [Updated] (SPARK-3505) Augmenting SparkStreaming updateStateByKey API with timestamp

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3505: --- Target Version/s: 1.3.0 (was: 1.1.0) Augmenting SparkStreaming updateStateByKey API

[jira] [Updated] (SPARK-3632) ConnectionManager can run out of receive threads with authentication on

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3632: --- Fix Version/s: 1.2.0 ConnectionManager can run out of receive threads with authentication

[jira] [Updated] (SPARK-3632) ConnectionManager can run out of receive threads with authentication on

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3632: --- Labels: backport-needed (was: ) ConnectionManager can run out of receive threads

[jira] [Updated] (SPARK-3632) ConnectionManager can run out of receive threads with authentication on

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3632: --- Fix Version/s: (was: 1.3.0) ConnectionManager can run out of receive threads

[jira] [Updated] (SPARK-3987) NNLS generates incorrect result

2014-12-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3987: --- Fix Version/s: (was: 1.3.0) 1.2.0 NNLS generates incorrect result

[jira] [Commented] (SPARK-4923) Maven build should keep publishing spark-repl

2014-12-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258772#comment-14258772 ] Patrick Wendell commented on SPARK-4923: Yes - I can retroactively publish

[jira] [Resolved] (SPARK-4953) Fix the description of building Spark with YARN

2014-12-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4953. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Kousuke Saruta Fix

[jira] [Resolved] (SPARK-4926) Spark manipulate Hbase

2014-12-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4926. Resolution: Invalid Resolving in agreement with [~sowen] Spark manipulate Hbase

[jira] [Updated] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries

2014-12-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4908: --- Component/s: SQL Spark SQL built for Hive 13 fails under concurrent metadata queries

[jira] [Resolved] (SPARK-4909) Error communicating with MapOutputTracker when run a big spark job

2014-12-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4909. Resolution: Duplicate Can you see how this job does with Spark 1.2? There was a lot of work

[jira] [Updated] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries

2014-12-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4908: --- Priority: Critical (was: Major) Spark SQL built for Hive 13 fails under concurrent metadata

Re: Problems with large dataset using collect() and broadcast()

2014-12-24 Thread Patrick Wendell
Hi Will, When you call collect() the item you are collecting needs to fit in memory on the driver. Is it possible your driver program does not have enough memory? - Patrick On Wed, Dec 24, 2014 at 9:34 PM, Will Yang era.ye...@gmail.com wrote: Hi all, In my occasion, I have a huge

Re: Question on saveAsTextFile with overwrite option

2014-12-24 Thread Patrick Wendell
: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 25, 2014 3:22 PM To: Shao, Saisai Cc: u...@spark.apache.org; dev@spark.apache.org Subject: Re: Question on saveAsTextFile with overwrite option Is it sufficient to set spark.hadoop.validateOutputSpecs to false? http

Re: Question on saveAsTextFile with overwrite option

2014-12-24 Thread Patrick Wendell
Is it sufficient to set spark.hadoop.validateOutputSpecs to false? http://spark.apache.org/docs/latest/configuration.html - Patrick On Wed, Dec 24, 2014 at 10:52 PM, Shao, Saisai saisai.s...@intel.com wrote: Hi, We have such requirements to save RDD output to HDFS with saveAsTextFile like
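For readers following the suggestion above, here is a minimal sketch of two ways the setting could be supplied. Only the property name spark.hadoop.validateOutputSpecs comes from the thread; the helper names (OVERWRITE_SETTINGS, to_submit_args, build_spark_conf) are hypothetical and exist purely for illustration. Note that disabling the check only skips output validation; it does not clear the existing directory first, so stale part files from an earlier run may remain.

```python
# Hypothetical helpers: only the property name is taken from the thread.
OVERWRITE_SETTINGS = {"spark.hadoop.validateOutputSpecs": "false"}


def to_submit_args(settings):
    """Render settings as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


def build_spark_conf(settings):
    """Apply the settings to a SparkConf (needs PySpark installed)."""
    # Imported lazily so to_submit_args() works without Spark present.
    from pyspark import SparkConf

    conf = SparkConf()
    for key, value in settings.items():
        conf.set(key, value)
    return conf


if __name__ == "__main__":
    # Print the arguments one would append to a spark-submit command line.
    print(" ".join(to_submit_args(OVERWRITE_SETTINGS)))
```

Either route should have the same effect: pass the rendered arguments to spark-submit, or hand the SparkConf to the SparkContext constructor before calling saveAsTextFile.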

Re: Question on saveAsTextFile with overwrite option

2014-12-24 Thread Patrick Wendell
: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 25, 2014 3:22 PM To: Shao, Saisai Cc: user@spark.apache.org; d...@spark.apache.org Subject: Re: Question on saveAsTextFile with overwrite option Is it sufficient to set spark.hadoop.validateOutputSpecs to false? http

[jira] [Resolved] (SPARK-4079) Snappy bundled with Spark does not work on older Linux distributions

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4079. Resolution: Fixed Fix Version/s: 1.3.0 Snappy bundled with Spark does not work

[jira] [Resolved] (SPARK-4864) Add documentation to Netty-based configs

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4864. Resolution: Fixed Fix Version/s: 1.2.1 1.3.0 Add documentation

[jira] [Updated] (SPARK-4520) SparkSQL exception when reading certain columns from a parquet file

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4520: --- Target Version/s: 1.2.1 (was: 1.3.0) SparkSQL exception when reading certain columns from

[jira] [Commented] (SPARK-4923) Maven build should keep publishing spark-repl

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256308#comment-14256308 ] Patrick Wendell commented on SPARK-4923: Hey [~pc...@uowmail.edu.au] - we removed

[jira] [Updated] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4925: --- Fix Version/s: (was: 1.1.2) (was: 1.2.0) Publish Spark SQL hive

[jira] [Commented] (SPARK-4923) Maven build should keep publishing spark-repl

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256393#comment-14256393 ] Patrick Wendell commented on SPARK-4923: Hey [~pc...@uowmail.edu.au], thanks

[jira] [Commented] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256397#comment-14256397 ] Patrick Wendell commented on SPARK-4925: The hive-thriftserver module is just used

[jira] [Updated] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4925: --- Component/s: Build Publish Spark SQL hive-thriftserver maven artifact

[jira] [Updated] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4925: --- Affects Version/s: (was: 1.1.1) 1.2.0 Publish Spark SQL hive

[jira] [Updated] (SPARK-4923) Maven build should keep publishing spark-repl

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4923: --- Component/s: Build Maven build should keep publishing spark-repl

[jira] [Updated] (SPARK-4920) current spark version in UI is not striking

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4920: --- Assignee: uncleGen current spark version in UI is not striking

[jira] [Resolved] (SPARK-4920) current spark version in UI is not striking

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4920. Resolution: Fixed Fix Version/s: 1.2.1 I believe this has been fixed: https://git

[jira] [Commented] (SPARK-4906) Spark master OOMs with exception stack trace stored in JobProgressListener

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256468#comment-14256468 ] Patrick Wendell commented on SPARK-4906: Hey [~mingyu.z...@gmail.com] - could you

[jira] [Updated] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4349: --- Target Version/s: 1.3.0 Spark driver hangs on sc.parallelize() if exception is thrown during

[jira] [Updated] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4349: --- Fix Version/s: (was: 1.3.0) Spark driver hangs on sc.parallelize() if exception

[jira] [Updated] (SPARK-4349) Spark driver hangs on sc.parallelize() if exception is thrown during serialization

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4349: --- Priority: Critical (was: Major) Spark driver hangs on sc.parallelize() if exception

[jira] [Updated] (SPARK-4906) Spark master OOMs with exception stack trace stored in JobProgressListener

2014-12-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4906: --- Component/s: Web UI Spark master OOMs with exception stack trace stored

Re: Use mvn to build Spark 1.2.0 failed

2014-12-22 Thread Patrick Wendell
I also couldn't reproduce this issue. On Mon, Dec 22, 2014 at 2:24 AM, Sean Owen so...@cloudera.com wrote: I just tried the exact same command and do not see any error. Maybe you can make sure you're starting from a clean extraction of the distro, and check your environment. I'm on OSX, Maven

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Xiangrui asked me to report that it's back and running :) On Mon, Dec 22, 2014 at 3:21 PM, peng pc...@uowmail.edu.au wrote: Me 2 :) On 12/22/2014 06:14 PM, Andrew Ash wrote: Hi Xiangrui, That link is currently returning a 503 Over Quota error message. Would you mind pinging back out

Re: More general submitJob API

2014-12-22 Thread Patrick Wendell
A SparkContext is thread safe, so you can just have different threads that create their own RDD's and do actions, etc. - Patrick On Mon, Dec 22, 2014 at 4:15 PM, Alessandro Baretta alexbare...@gmail.com wrote: Andrew, Thanks, yes, this is what I wanted: basically just to start multiple jobs

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Hey Nick, I think Hitesh was just trying to be helpful and point out the policy - not necessarily saying there was an issue. We've taken a close look at this and I think we're in good shape here vis-a-vis this policy. - Patrick On Mon, Dec 22, 2014 at 5:29 PM, Nicholas Chammas

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
missing we should add. - Patrick On Mon, Dec 22, 2014 at 6:17 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Does this include contributions made against the spark-ec2 repo? On Wed Dec 17 2014 at 12:29:19 AM Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to the very high

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
s/Josh/Nick/ - sorry! On Mon, Dec 22, 2014 at 10:52 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Josh, We don't explicitly track contributions to spark-ec2 in the Apache Spark release notes. The main reason is that usually updates to spark-ec2 include a corresponding update to spark so

Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! This release brings operational and performance improvements in Spark

Re: Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
to different commits in https://github.com/apache/spark/releases Best Regards, Shixiong Zhu 2014-12-19 16:52 GMT+08:00 Patrick Wendell pwend...@gmail.com: I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest

Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! This release brings operational and performance improvements in Spark

[jira] [Created] (SPARK-4892) java.io.FileNotFound exceptions when creating EXTERNAL hive tables

2014-12-18 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-4892: -- Summary: java.io.FileNotFound exceptions when creating EXTERNAL hive tables Key: SPARK-4892 URL: https://issues.apache.org/jira/browse/SPARK-4892 Project: Spark

Re: Which committers care about Kafka?

2014-12-18 Thread Patrick Wendell
Hey Cody, Thanks for reaching out with this. The lead on streaming is TD - he is traveling this week though so I can respond a bit. To the high level point of whether Kafka is important - it definitely is. Something like 80% of Spark Streaming deployments (anecdotally) ingest data from Kafka.

Re: [RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-18 Thread Patrick Wendell
Update: An Apache infrastructure issue prevented me from pushing this last night. The issue was resolved today and I should be able to push the final release artifacts tonight. On Tue, Dec 16, 2014 at 9:20 PM, Patrick Wendell pwend...@gmail.com wrote: This vote has PASSED with 12 +1 votes (8

Re: spark streaming kafa best practices ?

2014-12-17 Thread Patrick Wendell
, 2014 at 12:57 AM, Patrick Wendell pwend...@gmail.com wrote: The second choice is better. Once you call collect() you are pulling all of the data onto a single node, you want to do most of the processing in parallel on the cluster, which is what map() will do. Ideally you'd try to summarize

[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-16 Thread Patrick Wendell
This vote has PASSED with 12 +1 votes (8 binding) and no 0 or -1 votes: +1: Matei Zaharia* Madhu Siddalingaiah Reynold Xin* Sandy Ryza Josh Rozen* Mark Hamstra* Denny Lee Tom Graves* GuiQiang Li Nick Pentreath* Sean McNamara* Patrick Wendell* 0: -1: I'll finalize and package this release

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-16 Thread Patrick Wendell
...@databricks.com wrote: +1 Tested on OS X. On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc2 (commit a428c446e2): https://git-wip-us.apache.org/repos/asf?p

[ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-16 Thread Patrick Wendell
Hey All, Due to the very high volume of contributions, we're switching to an automated process for generating release credits. This process relies on JIRA for categorizing contributions, so it's not possible for us to provide credits in the case where users submit pull requests with no associated

[jira] [Commented] (SPARK-4837) NettyBlockTransferService does not abide by spark.blockManager.port config option

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246996#comment-14246996 ] Patrick Wendell commented on SPARK-4837: Hey [~aash] because there is a work

[jira] [Updated] (SPARK-4837) NettyBlockTransferService does not abide by spark.blockManager.port config option

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4837: --- Target Version/s: 1.2.1 NettyBlockTransferService does not abide by spark.blockManager.port

[jira] [Commented] (SPARK-4826) Possible flaky tests in WriteAheadLogBackedBlockRDDSuite: java.lang.IllegalStateException: File exists and there is no append support!

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247043#comment-14247043 ] Patrick Wendell commented on SPARK-4826: I pushed a hotfix disabling these tests

[jira] [Commented] (SPARK-4810) Failed to run collect

2014-12-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247081#comment-14247081 ] Patrick Wendell commented on SPARK-4810: Actually can I suggest we move

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Patrick Wendell
/apache/spark/pull/3701 We might be close to fixing this via one of those PRs, so maybe we should try using one of those instead? On December 15, 2014 at 10:51:46 AM, Patrick Wendell (pwend...@gmail.com) wrote: Hey All, It appears that a single test suite is failing after the jenkins upgrade

Re: zinc invocation examples

2014-12-12 Thread Patrick Wendell
) and would be great to get your initial read on it. Per this thread I need to add in the -scala-home call to zinc, but it's close to ready for a PR. On 12/5/14, 2:10 PM, Patrick Wendell pwend...@gmail.com wrote: One thing I created a JIRA for a while back was to have a similar script to sbt/sbt

Re: Spark Server - How to implement

2014-12-12 Thread Patrick Wendell
Hey Manoj, One proposal potentially of interest is the Spark Kernel project from IBM - you should look for their. The interface in that project is more of a remote REPL interface, i.e. you submit commands (as strings) and get back results (as strings), but you don't have direct programmatic

[jira] [Resolved] (SPARK-4807) Add support for hadoop-2.5 + bump jets3t version

2014-12-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4807. Resolution: Not a Problem Closing this in agreement with [~srowen]'s comment. Add support

[jira] [Created] (SPARK-4820) Spark build encounters File name too long on some encrypted filesystems

2014-12-10 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-4820: -- Summary: Spark build encounters File name too long on some encrypted filesystems Key: SPARK-4820 URL: https://issues.apache.org/jira/browse/SPARK-4820 Project

[jira] [Updated] (SPARK-4820) Spark build encounters File name too long on some encrypted filesystems

2014-12-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4820: --- Description: This was reported by Luchesar Cekov on github along with a proposed fix

[jira] [Closed] (SPARK-4633) Support gzip in spark.compression.io.codec

2014-12-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell closed SPARK-4633. -- Resolution: Won't Fix I'd like to close this issue for now until we get a better understanding

[jira] [Resolved] (SPARK-3526) Docs section on data locality

2014-12-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3526. Resolution: Fixed Fix Version/s: 1.2.0 Thanks [~aash] for contributing. Docs

[jira] [Commented] (SPARK-4687) SparkContext#addFile doesn't keep file folder information

2014-12-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241974#comment-14241974 ] Patrick Wendell commented on SPARK-4687: I commented a bit on the JIRA after
