[jira] [Commented] (SPARK-5845) Time to cleanup intermediate shuffle files not included in shuffle write time

2015-02-24 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335667#comment-14335667 ] Patrick Wendell commented on SPARK-5845: [~kayousterhout] did you mean the time

[jira] [Updated] (SPARK-3851) Support for reading parquet files with different but compatible schema

2015-02-24 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3851: --- Fix Version/s: 1.3.0 Support for reading parquet files with different but compatible schema

Re: Can you add Big Industries to the Powered by Spark page?

2015-02-24 Thread Patrick Wendell
I've added it, thanks! On Fri, Feb 20, 2015 at 12:22 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, Could you please add Big Industries to the Powered by Spark page at https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark ? Company Name: Big Industries URL:

[jira] [Resolved] (SPARK-5904) DataFrame methods with varargs do not work in Java

2015-02-23 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5904. Resolution: Fixed Fix Version/s: 1.3.0 I think rxin just forgot to close

[jira] [Commented] (SPARK-5463) Fix Parquet filter push-down

2015-02-23 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333608#comment-14333608 ] Patrick Wendell commented on SPARK-5463: Bumping to critical. Per our offline

[jira] [Updated] (SPARK-5463) Fix Parquet filter push-down

2015-02-23 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5463: --- Priority: Critical (was: Blocker) Fix Parquet filter push-down

[jira] [Updated] (SPARK-3650) Triangle Count handles reverse edges incorrectly

2015-02-23 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3650: --- Priority: Critical (was: Blocker) Triangle Count handles reverse edges incorrectly

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Patrick Wendell
So actually, the list of blockers on JIRA is a bit outdated. These days I won't cut RC1 unless there are no known issues that I'm aware of that would actually block the release (that's what the snapshot ones are for). I'm going to clean those up and push others to do so also. The main issues I'm

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: So actually, the list of blockers on JIRA is a bit outdated. These days I won't cut RC1 unless there are no known issues that I'm aware of that would actually block the release (that's what the snapshot ones are for). I'm going to clean those up and push

Re: FW: Submitting jobs to Spark EC2 cluster remotely

2015-02-23 Thread Patrick Wendell
is exactly the issue: on my master node UI at the bottom I can see the list of Completed Drivers all with ERROR state... Thanks, Oleg -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Monday, February 23, 2015 12:59 AM To: Oleg Shirokikh Cc: user

[jira] [Updated] (SPARK-5920) Use a BufferedInputStream to read local shuffle data

2015-02-21 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5920: --- Priority: Critical (was: Major) Use a BufferedInputStream to read local shuffle data

[jira] [Updated] (SPARK-5920) Use a BufferedInputStream to read local shuffle data

2015-02-21 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5920: --- Priority: Blocker (was: Critical) Use a BufferedInputStream to read local shuffle data

[jira] [Commented] (SPARK-5920) Use a BufferedInputStream to read local shuffle data

2015-02-21 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331986#comment-14331986 ] Patrick Wendell commented on SPARK-5920: We should definitely do this. Use

[jira] [Commented] (SPARK-5916) $SPARK_HOME/bin/beeline conflicts with $HIVE_HOME/bin/beeline

2015-02-21 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332001#comment-14332001 ] Patrick Wendell commented on SPARK-5916: The naming conflict is unfortunate

[jira] [Resolved] (SPARK-5887) Class not found exception com.datastax.spark.connector.rdd.partitioner.CassandraPartition

2015-02-19 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5887. Resolution: Invalid The Datastax connector is not part of the Apache Spark distribution

[jira] [Updated] (SPARK-5863) Performance regression in Spark SQL/Parquet due to ScalaReflection.convertRowToScala

2015-02-19 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5863: --- Priority: Critical (was: Major) Performance regression in Spark SQL/Parquet due

Merging code into branch 1.3

2015-02-18 Thread Patrick Wendell
Hey Committers, Now that Spark 1.3 rc1 is cut, please restrict branch-1.3 merges to the following: 1. Fixes for issues blocking the 1.3 release (i.e. 1.2.X regressions) 2. Documentation and tests. 3. Fixes for non-blocker issues that are surgical, low-risk, and/or outside of the core. If there

[VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-18 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2 The release files, including signatures, digests, etc.

[jira] [Resolved] (SPARK-5864) support .jar as python package

2015-02-18 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5864. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Davies Liu support .jar

[jira] [Resolved] (SPARK-5850) Remove experimental label for Scala 2.11 and FlumePollingStream

2015-02-18 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5850. Resolution: Fixed Fix Version/s: 1.3.0 Remove experimental label for Scala 2.11

[jira] [Resolved] (SPARK-5856) In Maven build script, launch Zinc with more memory

2015-02-18 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5856. Resolution: Fixed Fix Version/s: 1.3.0 In Maven build script, launch Zinc with more

[jira] [Updated] (SPARK-4579) Scheduling Delay appears negative

2015-02-18 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4579: --- Assignee: Andrew Or Scheduling Delay appears negative

[jira] [Commented] (SPARK-4579) Scheduling Delay appears negative

2015-02-18 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325622#comment-14325622 ] Patrick Wendell commented on SPARK-4579: [~andrewor14] Can you take a look

[jira] [Updated] (SPARK-4579) Scheduling Delay appears negative

2015-02-18 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4579: --- Labels: (was: starter) Scheduling Delay appears negative

[jira] [Updated] (SPARK-4579) Scheduling Delay appears negative

2015-02-18 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4579: --- Priority: Critical (was: Minor) Scheduling Delay appears negative

[jira] [Updated] (SPARK-4579) Scheduling Delay appears negative

2015-02-18 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4579: --- Labels: starter (was: ) Scheduling Delay appears negative

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-18 Thread Patrick Wendell
UISeleniumSuite: *** RUN ABORTED *** java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal ... This is a newer test suite. There is something flaky about it, we should definitely fix it, IMO it's not a blocker though. Patrick this link gives a 404:

Re: [Performance] Possible regression in rdd.take()?

2015-02-18 Thread Patrick Wendell
I believe the heuristic governing the way that take() decides to fetch partitions changed between these versions. It could be that in certain cases the new heuristic is worse, but it might be good to just look at the source code and see, for your number of elements taken and number of partitions,
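One way to reason about this for a given number of elements and partitions is a rough simulation. The sketch below is not Spark's implementation: it models a generic incremental scan in which take(n) reads a single partition first and, when that is not enough, tries a larger batch of partitions in each later round. The function name `partitions_scanned` and the `growth` parameter are hypothetical and exist only for this illustration.

```python
def partitions_scanned(partition_sizes, num, growth=4):
    """Simulate an incremental take(num) over partitions of the given sizes.

    Hypothetical model (an assumption, not Spark's code): round one scans
    1 partition; every later round scans `growth` times as many partitions
    as have been scanned so far.
    Returns (rounds, partitions_scanned, rows_collected).
    """
    collected = 0
    scanned = 0
    rounds = 0
    total = len(partition_sizes)
    while collected < num and scanned < total:
        to_try = 1 if scanned == 0 else scanned * growth
        batch = partition_sizes[scanned:scanned + to_try]
        need = num - collected
        collected += min(need, sum(batch))
        scanned += len(batch)
        rounds += 1
    return rounds, scanned, collected
```

Under this model, `partitions_scanned([10] * 100, 5)` finishes in one round, while `partitions_scanned([0] * 5 + [10] * 5, 10)` needs three rounds and touches all ten partitions even though one non-empty partition would have been enough. Leading empty or sparse partitions are the kind of case where such a heuristic can do more work than expected.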

[jira] [Resolved] (SPARK-5811) Documentation for --packages and --repositories on Spark Shell

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5811. Resolution: Fixed Assignee: Burak Yavuz Documentation for --packages

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4454: --- Labels: backport-needed (was: ) Race condition in DAGScheduler

[jira] [Reopened] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-4454: Actually, re-opening this since we need to back port it. Race condition in DAGScheduler

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4454: --- Target Version/s: 1.3.0, 1.2.2 (was: 1.3.0) Race condition in DAGScheduler

Re: Replacing Jetty with TomCat

2015-02-17 Thread Patrick Wendell
Hey Niranda, It seems to me a lot of effort to support multiple libraries inside of Spark like this, so I'm not sure that's a great solution. If you are building an application that embeds Spark, is it not possible for you to continue to use Jetty for Spark's internal servers and use tomcat for

[jira] [Resolved] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4454. Resolution: Fixed Fix Version/s: 1.3.0 We can't be 100% sure this is fixed because

[jira] [Commented] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324747#comment-14324747 ] Patrick Wendell commented on SPARK-4454: [~srowen] yeah I meant the particular PR

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4454: --- Priority: Critical (was: Minor) Race condition in DAGScheduler

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4454: --- Target Version/s: 1.3.0 Race condition in DAGScheduler

[jira] [Commented] (SPARK-5864) support .jar as python package

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324787#comment-14324787 ] Patrick Wendell commented on SPARK-5864: I merged davies PR, but per Burak's

[jira] [Resolved] (SPARK-5778) Throw if nonexistent spark.metrics.conf file is provided

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5778. Resolution: Fixed Fix Version/s: 1.3.0 Throw if nonexistent spark.metrics.conf file

[jira] [Updated] (SPARK-5778) Throw if nonexistent spark.metrics.conf file is provided

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5778: --- Assignee: Ryan Williams Throw if nonexistent spark.metrics.conf file is provided

[jira] [Updated] (SPARK-5848) ConsoleProgressBar timer thread leaks SparkContext

2015-02-16 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5848: --- Component/s: (was: Web UI) Spark Shell ConsoleProgressBar timer thread

[jira] [Updated] (SPARK-5846) Spark SQL does not correctly set job description and scheduler pool

2015-02-16 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5846: --- Priority: Critical (was: Major) Spark SQL does not correctly set job description

[jira] [Updated] (SPARK-5850) Remove experimental label for Scala 2.11 and FlumePollingStream

2015-02-16 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5850: --- Summary: Remove experimental label for Scala 2.11 and FlumePollingStream (was: Clean up

[jira] [Created] (SPARK-5850) Clean up experimental label for Scala 2.11 and FlumePollingStream

2015-02-16 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-5850: -- Summary: Clean up experimental label for Scala 2.11 and FlumePollingStream Key: SPARK-5850 URL: https://issues.apache.org/jira/browse/SPARK-5850 Project: Spark

[jira] [Created] (SPARK-5856) In Maven build script, launch Zinc with more memory

2015-02-16 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-5856: -- Summary: In Maven build script, launch Zinc with more memory Key: SPARK-5856 URL: https://issues.apache.org/jira/browse/SPARK-5856 Project: Spark Issue

[jira] [Updated] (SPARK-5081) Shuffle write increases

2015-02-16 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5081: --- Priority: Critical (was: Major) Shuffle write increases

[jira] [Updated] (SPARK-5850) Remove experimental label for Scala 2.11 and FlumePollingStream

2015-02-16 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5850: --- Priority: Blocker (was: Major) Remove experimental label for Scala 2.11

[jira] [Commented] (SPARK-5081) Shuffle write increases

2015-02-16 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323629#comment-14323629 ] Patrick Wendell commented on SPARK-5081: Hey [~cb_betz], can you verify a few

[jira] [Commented] (SPARK-5745) Allow to use custom TaskMetrics implementation

2015-02-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322091#comment-14322091 ] Patrick Wendell commented on SPARK-5745: Hey [~jlewandowski] - TaskMetrics

[jira] [Updated] (SPARK-5826) JavaStreamingContext.fileStream cause Configuration NotSerializableException

2015-02-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5826: --- Priority: Critical (was: Minor) JavaStreamingContext.fileStream cause Configuration

[jira] [Commented] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader

2015-02-14 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321269#comment-14321269 ] Patrick Wendell commented on SPARK-5770: If the request is to support hot

[jira] [Updated] (SPARK-5801) Shuffle creates too many nested directories

2015-02-14 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5801: --- Priority: Critical (was: Major) Shuffle creates too many nested directories

[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK

2015-02-14 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321267#comment-14321267 ] Patrick Wendell commented on SPARK-5813: Hey [~florianverhein]. Just wondering

[jira] [Updated] (SPARK-5801) Shuffle creates too many nested directories

2015-02-14 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5801: --- Component/s: Spark Core Shuffle creates too many nested directories

[jira] [Updated] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset

2015-02-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5731: --- Priority: Blocker (was: Major) Flaky Test

[jira] [Commented] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset

2015-02-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320739#comment-14320739 ] Patrick Wendell commented on SPARK-5731: [~c...@koeninger.org] [~tdas] FYI we've

[jira] [Updated] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset

2015-02-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5731: --- Labels: flaky-test (was: ) Flaky Test

[jira] [Resolved] (SPARK-5679) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method

2015-02-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5679. Resolution: Fixed Fix Version/s: 1.2.2 1.3.0 Assignee

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-12 Thread Patrick Wendell
The map will start with a capacity of 64, but will grow to accommodate new data. Are you using the groupBy operator in Spark or are you using Spark SQL's group by? This usually happens if you are grouping or aggregating in a way that doesn't sufficiently condense the data created from each input
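The growth behaviour mentioned here can be pictured with a small open-addressing hash map. What follows is an illustrative Python sketch, not Spark's AppendOnlyMap: only the starting capacity of 64 comes from the message above, while the class name `GrowableMap`, the 0.7 load factor, the doubling step, and the linear probing are assumptions made for the example.

```python
class GrowableMap:
    """Toy append-only hash map that starts at capacity 64 and doubles as it fills."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, initial_capacity=64, load_factor=0.7):
        self.capacity = initial_capacity
        self.load_factor = load_factor  # assumed threshold, not taken from the source
        self.size = 0
        self._keys = [self._EMPTY] * self.capacity
        self._values = [None] * self.capacity

    def _slot(self, key):
        # Linear probing: walk forward until the key or an empty slot is found.
        i = hash(key) % self.capacity
        while self._keys[i] is not self._EMPTY and self._keys[i] != key:
            i = (i + 1) % self.capacity
        return i

    def update(self, key, value):
        i = self._slot(key)
        if self._keys[i] is self._EMPTY:
            self._keys[i] = key
            self.size += 1
        self._values[i] = value
        if self.size > self.capacity * self.load_factor:
            self._grow()

    def get(self, key, default=None):
        i = self._slot(key)
        return default if self._keys[i] is self._EMPTY else self._values[i]

    def _grow(self):
        # Double the table and re-insert every existing entry.
        old = [(k, v) for k, v in zip(self._keys, self._values) if k is not self._EMPTY]
        self.capacity *= 2
        self.size = 0
        self._keys = [self._EMPTY] * self.capacity
        self._values = [None] * self.capacity
        for k, v in old:
            i = self._slot(k)
            self._keys[i] = k
            self._values[i] = v
            self.size += 1
```

In this toy model, a grouping that folds many records into a handful of keys keeps the table at 64 slots, whereas one distinct key per record forces repeated doubling and re-insertion. That is the situation in which an aggregation does little to condense its input.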

Re: driver fail-over in Spark streaming 1.2.0

2015-02-12 Thread Patrick Wendell
It will create and connect to new executors. The executors are mostly stateless, so the program can resume with new executors. On Wed, Feb 11, 2015 at 11:24 PM, lin kurtt@gmail.com wrote: Hi, all In Spark Streaming 1.2.0, when the driver fails and a new driver starts with the most updated

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-12 Thread Patrick Wendell
The map will start with a capacity of 64, but will grow to accommodate new data. Are you using the groupBy operator in Spark or are you using Spark SQL's group by? This usually happens if you are grouping or aggregating in a way that doesn't sufficiently condense the data created from each input

Re: How to track issues that must wait for Spark 2.x in JIRA?

2015-02-12 Thread Patrick Wendell
Yeah my preferred is also having a more open ended 2+ for issues that are clearly desirable but blocked by compatibility concerns. What I would really want to avoid is major feature proposals sitting around in our JIRA and tagged under some 2.X version. IMO JIRA isn't the place for thoughts about

[ANNOUNCE] Spark 1.3.0 Snapshot 1

2015-02-11 Thread Patrick Wendell
Hey All, I've posted Spark 1.3.0 snapshot 1. At this point the 1.3 branch is ready for community testing and we are strictly merging fixes and documentation across all components. The release files, including signatures, digests, etc can be found at:

[jira] [Updated] (SPARK-5606) Support plus sign in HiveContext

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5606: --- Assignee: Yadong Qi Support plus sign in HiveContext

Re: feeding DataFrames into predictive algorithms

2015-02-11 Thread Patrick Wendell
I think there is a minor error here in that the first example needs a tail after the seq: df.map { row => (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double])) }.toDataFrame("label", "features") On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust mich...@databricks.com wrote: It sounds like
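The role of the tail can be seen without Spark. In the plain-Python sketch below, the sample rows and the helper name `split_label` are made up for illustration. Each row holds the label in position 0 followed by the feature values, and slicing with `row[1:]` plays the part of `row.toSeq.tail`, so the label is not repeated inside the features.

```python
def split_label(row):
    """Return (label, features) for a row whose first element is the label."""
    label = float(row[0])
    features = [float(x) for x in row[1:]]  # drop position 0, like Seq.tail
    return label, features


rows = [(1.0, 0.5, 2.0), (0.0, 1.5, 3.0)]  # made-up sample data
pairs = [split_label(r) for r in rows]
```

Without the slice, the first feature of every row would be the label itself, which leaks the target into the inputs of whatever model is trained on them.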

[jira] [Updated] (SPARK-5656) NegativeArraySizeException in EigenValueDecomposition.symmetricEigs for large n and/or large k

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5656: --- Assignee: Mark Bittmann NegativeArraySizeException in EigenValueDecomposition.symmetricEigs

[jira] [Updated] (SPARK-5366) check for mode of private key file

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5366: --- Assignee: liu chang check for mode of private key file

[jira] [Updated] (SPARK-5611) Allow spark-ec2 repo to be specified in CLI of spark_ec2.py

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5611: --- Assignee: Florian Verhein Allow spark-ec2 repo to be specified in CLI of spark_ec2.py

[jira] [Updated] (SPARK-5648) support alter ... unset tblproperties(key)

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5648: --- Assignee: DoingDone9 support alter ... unset tblproperties(key

[jira] [Updated] (SPARK-5568) Python API for the write support of the data source API

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5568: --- Assignee: Yin Huai Python API for the write support of the data source API

[jira] [Updated] (SPARK-5733) Error Link in Pagination of HistroyPage when showing Incomplete Applications

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5733: --- Assignee: Liangliang Gu Error Link in Pagination of HistroyPage when showing Incomplete

[jira] [Updated] (SPARK-5658) Finalize DDL and write support APIs

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5658: --- Assignee: Yin Huai Finalize DDL and write support APIs

[jira] [Updated] (SPARK-5704) createDataFrame replace applySchema/inferSchema

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5704: --- Assignee: Davies Liu createDataFrame replace applySchema/inferSchema

[jira] [Updated] (SPARK-5709) Add EXPLAIN support for DataFrame API for debugging purpose

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5709: --- Assignee: Cheng Hao Add EXPLAIN support for DataFrame API for debugging purpose

[jira] [Updated] (SPARK-5683) Improve the json serialization for DataFrame API

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5683: --- Assignee: Cheng Hao Improve the json serialization for DataFrame API

[jira] [Updated] (SPARK-5509) EqualTo operator doesn't handle binary type properly

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5509: --- Assignee: Cheng Lian EqualTo operator doesn't handle binary type properly

[jira] [Updated] (SPARK-5135) Add support for describe [extended] table to DDL in SQLContext

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5135: --- Assignee: Li Sheng Add support for describe [extended] table to DDL in SQLContext

[jira] [Updated] (SPARK-5380) There will be an ArrayIndexOutOfBoundsException if the format of the source file is wrong

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5380: --- Assignee: Leo_lh There will be an ArrayIndexOutOfBoundsException if the format of the source

[jira] [Updated] (SPARK-5528) Support schema merging while reading Parquet files

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5528: --- Assignee: Cheng Lian Support schema merging while reading Parquet files

[jira] [Updated] (SPARK-5640) org.apache.spark.sql.catalyst.ScalaReflection is not thread safe

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5640: --- Assignee: Tobias Schlatter org.apache.spark.sql.catalyst.ScalaReflection is not thread safe

[jira] [Updated] (SPARK-5619) Support 'show roles' in HiveContext

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5619: --- Assignee: Yadong Qi Support 'show roles' in HiveContext

[jira] [Updated] (SPARK-5686) Support `show current roles`

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5686: --- Assignee: Li Sheng Support `show current roles

[jira] [Updated] (SPARK-5668) spark_ec2.py region parameter could be either mandatory or its value displayed

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5668: --- Assignee: Miguel Peralvo spark_ec2.py region parameter could be either mandatory or its

[jira] [Updated] (SPARK-5716) Support TOK_CHARSETLITERAL in HiveQl

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5716: --- Assignee: Adrian Wang Support TOK_CHARSETLITERAL in HiveQl

[jira] [Updated] (SPARK-5667) Remove version from spark-ec2 example.

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5667: --- Assignee: Miguel Peralvo Remove version from spark-ec2 example

[jira] [Updated] (SPARK-5595) In memory data cache should be invalidated after insert into/overwrite

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5595: --- Assignee: Yin Huai In memory data cache should be invalidated after insert into/overwrite

[jira] [Updated] (SPARK-5278) check ambiguous reference to fields in Spark SQL is incompleted

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5278: --- Assignee: Wenchen Fan check ambiguous reference to fields in Spark SQL is incompleted

[jira] [Updated] (SPARK-5324) Results of describe can't be queried

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5324: --- Assignee: Li Sheng Results of describe can't be queried

[jira] [Updated] (SPARK-5603) Preinsert casting and renaming rule is needed in the Analyzer

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5603: --- Assignee: Yin Huai Preinsert casting and renaming rule is needed in the Analyzer

[jira] [Updated] (SPARK-5650) Optional 'FROM' clause in HiveQl

2015-02-11 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5650: --- Assignee: Liang-Chi Hsieh Optional 'FROM' clause in HiveQl

[jira] [Updated] (SPARK-5679) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method

2015-02-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5679: --- Priority: Major (was: Blocker) Flaky tests in InputOutputMetricsSuite: input metrics

[jira] [Created] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset

2015-02-10 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-5731: -- Summary: Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL

[jira] [Updated] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset

2015-02-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5731: --- Affects Version/s: 1.3.0 Flaky Test

[jira] [Updated] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset

2015-02-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5731: --- Component/s: Tests Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic

[jira] [Updated] (SPARK-5493) Support proxy users under kerberos

2015-02-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5493: --- Assignee: Marcelo Vanzin Support proxy users under kerberos

[jira] [Resolved] (SPARK-5493) Support proxy users under kerberos

2015-02-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5493. Resolution: Fixed Fix Version/s: 1.3.0 Target Version/s: 1.3.0 Support

Re: Powered by Spark: Concur

2015-02-10 Thread Patrick Wendell
Thanks Paolo - I've fixed it. On Mon, Feb 9, 2015 at 11:10 PM, Paolo Platter paolo.plat...@agilelab.it wrote: Hi, I checked the powered by wiki too and Agile Labs should be Agile Lab. The link is wrong too, it should be www.agilelab.it. The description is correct. Thanks a lot Paolo

[jira] [Commented] (SPARK-5613) YarnClientSchedulerBackend fails to get application report when yarn restarts

2015-02-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314821#comment-14314821 ] Patrick Wendell commented on SPARK-5613: I have cherry picked it into the 1.3

[jira] [Updated] (SPARK-4382) Add locations parameter to Twitter Stream

2015-02-10 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4382: --- Component/s: Streaming Add locations parameter to Twitter Stream

[jira] [Created] (SPARK-5735) Replace uses of EasyMock with Mockito

2015-02-10 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-5735: -- Summary: Replace uses of EasyMock with Mockito Key: SPARK-5735 URL: https://issues.apache.org/jira/browse/SPARK-5735 Project: Spark Issue Type
