[jira] [Updated] (SPARK-6942) Umbrella: UI Visualizations for Core and Dataframes

2015-04-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6942: --- Component/s: Web UI Umbrella: UI Visualizations for Core and Dataframes

[jira] [Created] (SPARK-6942) Umbrella: UI Visualizations for Core and Dataframes

2015-04-15 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-6942: -- Summary: Umbrella: UI Visualizations for Core and Dataframes Key: SPARK-6942 URL: https://issues.apache.org/jira/browse/SPARK-6942 Project: Spark Issue

[jira] [Updated] (SPARK-3468) WebUI Timeline-View feature

2015-04-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3468: --- Issue Type: Sub-task (was: New Feature) Parent: SPARK-6942 WebUI Timeline-View

[jira] [Created] (SPARK-6943) Graphically show RDD's included in a stage

2015-04-15 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-6943: -- Summary: Graphically show RDD's included in a stage Key: SPARK-6943 URL: https://issues.apache.org/jira/browse/SPARK-6943 Project: Spark Issue Type: Sub

[jira] [Updated] (SPARK-3468) Provide timeline view in Job and Stage pages

2015-04-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3468: --- Summary: Provide timeline view in Job and Stage pages (was: WebUI Timeline-View feature

[jira] [Updated] (SPARK-3468) Provide timeline view in Job and Stage UI pages

2015-04-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3468: --- Summary: Provide timeline view in Job and Stage UI pages (was: Provide timeline view in Job

[jira] [Updated] (SPARK-6950) Spark master UI believes some applications are in progress when they are actually completed

2015-04-15 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6950: --- Component/s: Web UI Spark master UI believes some applications are in progress when

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-14 Thread Patrick Wendell
I'd like to close this vote to coincide with the 1.3.1 release, however, it would be great to have more people test this release first. I'll leave it open for a bit longer and see if others can give a +1. On Tue, Apr 14, 2015 at 9:55 PM, Patrick Wendell pwend...@gmail.com wrote: +1 from me ass

Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-14 Thread Patrick Wendell
+1 from myself as well On Mon, Apr 13, 2015 at 8:35 PM, GuoQiang Li wi...@qq.com wrote: +1 (non-binding) -- Original -- From: Patrick Wendell;pwend...@gmail.com; Date: Sat, Apr 11, 2015 02:05 PM To: dev@spark.apache.org; Subject

[RESULT] [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-14 Thread Patrick Wendell
This vote passes with 10 +1 votes (5 binding) and no 0 or -1 votes. +1: Sean Owen* Reynold Xin* Krishna Sankar Denny Lee Mark Hamstra* Sean McNamara* Sree V Marcelo Vanzin GuoQiang Li Patrick Wendell* 0: -1: I will work on packaging this release in the next 48 hours. - Patrick
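The tally in the result message can be checked mechanically. The sketch below is a hedged illustration. It assumes the ASF release-vote rule (at least three binding +1 votes and more +1 than -1 votes) and uses the voter list above, where `*` marks a binding vote. The function names are hypothetical.

```python
# Voters from the [RESULT] message; "*" marks a binding (PMC) vote.
PLUS_ONES = [
    "Sean Owen*", "Reynold Xin*", "Krishna Sankar", "Denny Lee",
    "Mark Hamstra*", "Sean McNamara*", "Sree V", "Marcelo Vanzin",
    "GuoQiang Li", "Patrick Wendell*",
]


def tally(plus_ones, minus_ones=()):
    """Return (total +1 votes, binding +1 votes, binding -1 votes)."""
    binding_plus = sum(1 for v in plus_ones if v.endswith("*"))
    binding_minus = sum(1 for v in minus_ones if v.endswith("*"))
    return len(plus_ones), binding_plus, binding_minus


def release_vote_passes(plus_ones, minus_ones=()):
    # Assumed rule: at least three binding +1 votes and more +1 than -1 votes.
    _, binding_plus, _ = tally(plus_ones, minus_ones)
    return binding_plus >= 3 and len(plus_ones) > len(minus_ones)


print(tally(PLUS_ONES))                # (10, 5, 0)
print(release_vote_passes(PLUS_ONES))  # True
```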

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-14 Thread Patrick Wendell
,1/14/15 SPARK-4888,Spark EC2 doesn't mount local disks for i2.8xlarge instances,,Open,1/27/15 SPARK-4879,Missing output partitions after job completes with speculative execution,Josh Rosen,Open,3/5/15 SPARK-4568,Publish release candidates under $VERSION-RCX instead of $VERSION,Patrick Wendell

[jira] [Commented] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492888#comment-14492888 ] Patrick Wendell commented on SPARK-6703: Hey [~ilganeli] - sure thing. I've pinged

[jira] [Updated] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6703: --- Assignee: Ilya Ganelin Provide a way to discover existing SparkContext's

[jira] [Updated] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6703: --- Priority: Critical (was: Major) Provide a way to discover existing SparkContext's

Re: Configuring amount of disk space available to spark executors in mesos?

2015-04-13 Thread Patrick Wendell
Hey Jonathan, Are you referring to disk space used for storing persisted RDDs? For that, Spark does not bound the amount of data persisted to disk. It's a similar story to how Spark's shuffle disk output works (Hadoop and other frameworks also make this assumption for their shuffle

[jira] [Commented] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492898#comment-14492898 ] Patrick Wendell commented on SPARK-6703: /cc [~velvia] Provide a way to discover

[jira] [Comment Edited] (SPARK-6511) Publish hadoop provided build with instructions for different distros

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493183#comment-14493183 ] Patrick Wendell edited comment on SPARK-6511 at 4/13/15 10:11 PM

[jira] [Commented] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493179#comment-14493179 ] Patrick Wendell commented on SPARK-6703: Yes, ideally we get it into 1.4 - though

[jira] [Commented] (SPARK-6511) Publish hadoop provided build with instructions for different distros

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493183#comment-14493183 ] Patrick Wendell commented on SPARK-6511: Just as an example I tried to wire Spark

[jira] [Commented] (SPARK-6511) Publish hadoop provided build with instructions for different distros

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493274#comment-14493274 ] Patrick Wendell commented on SPARK-6511: Can we just run HADOOP_HOME/bin/hadoop

[jira] [Commented] (SPARK-6889) Streamline contribution process with update to Contribution wiki, JIRA rules

2015-04-13 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493254#comment-14493254 ] Patrick Wendell commented on SPARK-6889: Thanks for posting this Sean. Overall, I

[jira] [Updated] (SPARK-6199) Support CTE

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6199: --- Assignee: (was: Cheng Hao) Support CTE --- Key: SPARK-6199

[jira] [Updated] (SPARK-6858) Register Java HashMap for SparkSqlSerializer

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6858: --- Assignee: Liang-Chi Hsieh Register Java HashMap for SparkSqlSerializer

[jira] [Resolved] (SPARK-4760) ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size for tables created from Parquet files

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4760. Resolution: Not A Problem ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating

[jira] [Updated] (SPARK-6611) Add support for INTEGER as synonym of INT to DDLParser

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6611: --- Assignee: Santiago M. Mola Add support for INTEGER as synonym of INT to DDLParser

[jira] [Commented] (SPARK-1529) Support setting spark.local.dirs to a hadoop FileSystem

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491863#comment-14491863 ] Patrick Wendell commented on SPARK-1529: Hey Kannan, We originally considered

[jira] [Reopened] (SPARK-4760) ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size for tables created from Parquet files

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-4760: ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size for tables

[jira] [Updated] (SPARK-6179) Support SHOW PRINCIPALS role_name;

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6179: --- Assignee: Zhongshuai Pei Support SHOW PRINCIPALS role_name

[jira] [Updated] (SPARK-6199) Support CTE

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6199: --- Assignee: Cheng Hao Support CTE --- Key: SPARK-6199

[jira] [Updated] (SPARK-6863) Formatted list broken on Hive compatibility section of SQL programming guide

2015-04-12 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6863: --- Assignee: Santiago M. Mola Formatted list broken on Hive compatibility section of SQL

Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-08 Thread Patrick Wendell
spark on yarn against hadoop 2.6. Tom On Wednesday, April 8, 2015 6:15 AM, Sean Owen so...@cloudera.com wrote: Still a +1 from me; same result (except that now of course the UISeleniumSuite test does not fail) On Wed, Apr 8, 2015 at 1:46 AM, Patrick Wendell pwend...@gmail.com

Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-08 Thread Patrick Wendell
:30 Patrick Wendell pwend...@gmail.com wrote: Hey Denny, I believe the 2.4 bits are there. The 2.6 bits I had done specially (we haven't merged that into our upstream build script). I'll do it again now for RC2. - Patrick On Wed, Apr 8, 2015 at 1:53 PM, Timothy Chen tnac...@gmail.com wrote

[jira] [Resolved] (SPARK-6792) pySpark groupByKey returns rows with the same key

2015-04-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6792. Resolution: Not A Problem Resolving per Josh's comment. pySpark groupByKey returns rows

[jira] [Updated] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly

2015-04-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6785: --- Component/s: SQL DateUtils can not handle date before 1970/01/01 correctly

[jira] [Resolved] (SPARK-6778) SQL contexts in spark-shell and pyspark should both be called sqlContext

2015-04-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6778. Resolution: Duplicate SQL contexts in spark-shell and pyspark should both be called

[jira] [Commented] (SPARK-6399) Code compiled against 1.3.0 may not run against older Spark versions

2015-04-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486595#comment-14486595 ] Patrick Wendell commented on SPARK-6399: It would be good to document more clearly

[jira] [Updated] (SPARK-6784) Clean up all the inbound/outbound conversions for DateType

2015-04-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6784: --- Component/s: SQL Clean up all the inbound/outbound conversions for DateType

[jira] [Updated] (SPARK-6783) Add timing and test output for PR tests

2015-04-08 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6783: --- Component/s: Project Infra Add timing and test output for PR tests

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Patrick Wendell
) Ran standalone and yarn tests on the hadoop-2.6 tarball, with and without the external shuffle service in yarn mode. On Sat, Apr 4, 2015 at 5:09 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.1! The tag to be voted

[RESULT] [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Patrick Wendell
to 1.3.x. - Josh Sent from my phone On Apr 7, 2015, at 4:13 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Today SPARK-6737 came to my attention. This is a bug that causes a memory leak for any long running program that repeatedly saves data out to a Hadoop FileSystem

[VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-07 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.1! The tag to be voted on is v1.3.1-rc2 (commit 7c4473a): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=7c4473aa5a7f5de0323394aaedeefbf9738e8eb5 The list of fixes present in this release can be found

[jira] [Updated] (SPARK-6222) [STREAMING] All data may not be recovered from WAL when driver is killed

2015-04-06 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6222: --- Fix Version/s: 1.4.0 1.3.1 [STREAMING] All data may not be recovered from

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-06 Thread Patrick Wendell
,,Open,3/24/15 SPARK-5098,Number of running tasks become negative after tasks lost,,Open,1/14/15 SPARK-4925,Publish Spark SQL hive-thriftserver maven artifact,Patrick Wendell,Reopened,3/23/15 SPARK-4922,Support dynamic allocation for coarse-grained Mesos,,Open,3/31/15 SPARK-4888,Spark EC2

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
as possible. On Mon, Apr 6, 2015 at 7:31 PM Patrick Wendell pwend...@gmail.com wrote: What if you don't run zinc? I.e. just download maven and run that mvn package It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb sp...@mjhb.com wrote: Similar

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
attempt. Trying to build as clean as possible. On Mon, Apr 6, 2015 at 7:31 PM Patrick Wendell pwend...@gmail.com wrote: What if you don't run zinc? I.e. just download maven and run that mvn package It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
What if you don't run zinc? I.e. just download maven and run that mvn package It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb sp...@mjhb.com wrote: Similar problem on 1.2 branch: [ERROR] Failed to execute goal on project spark-core_2.11: Could not

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
The only thing that can persist outside of Spark is a Zinc process that is still running. We took care to make sure this was a generally stateless mechanism. Both the 1.2.X and 1.3.X releases are built with Scala 2.11 for packaging purposes, and these have been built as recently as in the last

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
Hmm.. Make sure you are building with the right flags. I think you need to pass -Dscala-2.11 to maven. Take a look at the upstream docs - on my phone now so can't easily access. On Apr 7, 2015 1:01 AM, mjhb sp...@mjhb.com wrote: I even deleted my local maven repository (.m2) but still stuck

[jira] [Updated] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-04 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6703: --- Description: Right now it is difficult to write a Spark application in a way that can be run
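The description above is truncated, but the request amounts to a get-or-create lookup: return the context that already exists in the process, or create one if none does. Here is a minimal Python sketch of that pattern, assuming a single active context per process. `ContextRegistry` and `FakeContext` are hypothetical names, not Spark's API.

```python
import threading


class FakeContext:
    """Stand-in for a SparkContext; it only records its application name."""

    def __init__(self, app_name):
        self.app_name = app_name


class ContextRegistry:
    """Hypothetical process-wide registry holding at most one active context."""

    _lock = threading.Lock()
    _active = None

    @classmethod
    def get_or_create(cls, app_name):
        # The lock stops two threads from each creating their own context.
        with cls._lock:
            if cls._active is None:
                cls._active = FakeContext(app_name)
            return cls._active

    @classmethod
    def stop(cls):
        with cls._lock:
            cls._active = None


a = ContextRegistry.get_or_create("job-server")
b = ContextRegistry.get_or_create("ignored-because-one-exists")
print(a is b, b.app_name)  # True job-server
```

The second caller gets the existing context and its requested name is ignored. A real implementation would also have to decide what to do when the caller's configuration conflicts with the running context.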

[jira] [Commented] (SPARK-6676) Add hadoop 2.4+ for profiles in POM.xml

2015-04-04 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395932#comment-14395932 ] Patrick Wendell commented on SPARK-6676: [~srowen] This is such a common source

[jira] [Created] (SPARK-6703) Provide a way to discover existing SparkContext's

2015-04-03 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-6703: -- Summary: Provide a way to discover existing SparkContext's Key: SPARK-6703 URL: https://issues.apache.org/jira/browse/SPARK-6703 Project: Spark Issue

[jira] [Resolved] (SPARK-6627) Clean up of shuffle code and interfaces

2015-04-02 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6627. Resolution: Fixed Fix Version/s: 1.4.0 Clean up of shuffle code and interfaces

[jira] [Updated] (SPARK-6659) Spark SQL 1.3 cannot read json file that only with a record.

2015-04-01 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6659: --- Component/s: SQL Spark SQL 1.3 cannot read json file that only with a record

[jira] [Closed] (SPARK-6659) Spark SQL 1.3 cannot read json file that only with a record.

2015-04-01 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell closed SPARK-6659. -- Resolution: Invalid Per the comment, I think the issue is the JSON is not correctly formatted
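The issue text does not say how the input was malformed. Assuming it was a pretty-printed object spread over several lines, the failure is easy to reproduce. Spark SQL's JSON support in this era expects each line to contain one separate, self-contained JSON object. The hypothetical `parse_json_lines` helper below mimics a line-oriented reader using Python's `json` module.

```python
import json


def parse_json_lines(text):
    """Parse text one line at a time, as a line-oriented JSON reader does.

    Returns (parsed_records, line_numbers_that_failed_to_parse).
    """
    records, bad = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(lineno)
    return records, bad


one_per_line = '{"name": "a", "age": 1}\n{"name": "b", "age": 2}\n'
pretty_printed = '{\n  "name": "a",\n  "age": 1\n}\n'

print(parse_json_lines(one_per_line))    # ([{'name': 'a', 'age': 1}, {'name': 'b', 'age': 2}], [])
print(parse_json_lines(pretty_printed))  # ([], [1, 2, 3, 4])
```

The pretty-printed file holds one valid JSON document, yet none of its individual lines is valid on its own.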

Re: Unit test logs in Jenkins?

2015-04-01 Thread Patrick Wendell
Hey Marcelo, Great question. Right now, some of the more active developers have an account that allows them to log into this cluster to inspect logs (we copy the logs from each run to a node on that cluster). The infrastructure is maintained by the AMPLab. I will put you in touch with someone

[jira] [Created] (SPARK-6627) Clean up of shuffle code and interfaces

2015-03-31 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-6627: -- Summary: Clean up of shuffle code and interfaces Key: SPARK-6627 URL: https://issues.apache.org/jira/browse/SPARK-6627 Project: Spark Issue Type

[jira] [Commented] (SPARK-6561) Add partition support in saveAsParquet

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383413#comment-14383413 ] Patrick Wendell commented on SPARK-6561: FYI - I just removed Affects Version's

[jira] [Updated] (SPARK-6561) Add partition support in saveAsParquet

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6561: --- Affects Version/s: (was: 1.3.1) (was: 1.3.0) Add partition

Re: RDD resiliency -- does it keep state?

2015-03-27 Thread Patrick Wendell
If you invoke this, you will get at-least-once semantics on failure. For instance, if a machine dies in the middle of executing the foreach for a single partition, that will be re-executed on another machine. It could even fully complete on one machine, but the machine dies immediately before
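The at-least-once behaviour described above can be simulated without a cluster. This sketch is a plain Python model with hypothetical helper names, not Spark's scheduler. Each partition's first attempt performs its side effect and then "dies" before reporting success, so the partition is re-executed. An appending sink sees the retried partition twice. A sink keyed by partition id overwrites itself and ends up holding each record once.

```python
def run_with_retries(partitions, write, fail_after_write_once):
    """Run each partition's side effect, re-running any partition whose
    first attempt dies after writing but before reporting success."""
    for pid, records in enumerate(partitions):
        attempt = 0
        while True:
            attempt += 1
            write(pid, records)
            if pid in fail_after_write_once and attempt == 1:
                continue  # machine died before acking: the task is re-run
            break


partitions = [[1, 2], [3], [4, 5]]

# Non-idempotent sink: every write is appended, so a retry duplicates data.
append_log = []
run_with_retries(partitions, lambda pid, recs: append_log.extend(recs), {1})
print(append_log)  # [1, 2, 3, 3, 4, 5]

# Idempotent sink: writes are keyed by partition id, so a retry overwrites.
keyed = {}


def keyed_write(pid, recs):
    keyed[pid] = list(recs)


run_with_retries(partitions, keyed_write, {1})
print(sorted(x for recs in keyed.values() for x in recs))  # [1, 2, 3, 4, 5]
```

One common way to cope with at-least-once execution in `foreach` is therefore to make the side effect idempotent, for example by keying writes on a partition or record id.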

[jira] [Updated] (SPARK-6544) Problem with Avro and Kryo Serialization

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6544: --- Fix Version/s: 1.3.1 Problem with Avro and Kryo Serialization

Re: Spark 1.3 Source - Github and source tar does not seem to match

2015-03-27 Thread Patrick Wendell
The source code should match the Spark commit 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc. Do you see any differences? On Fri, Mar 27, 2015 at 11:28 AM, Manoj Samel manojsamelt...@gmail.com wrote: While looking into an issue, I noticed that the source displayed on Github site does not match the

[jira] [Resolved] (SPARK-4073) Parquet+Snappy can cause significant off-heap memory usage

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4073. Resolution: Won't Fix I have never seen someone else run into this, so closing

[jira] [Resolved] (SPARK-5025) Write a guide for creating well-formed packages for Spark

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5025. Resolution: Won't Fix I'm closing this as won't fix. There are now a bunch of community

[jira] [Resolved] (SPARK-1844) Support maven-style dependency resolution in sbt build

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1844. Resolution: Won't Fix Closing given the combination of (a) this is not that important

[jira] [Updated] (SPARK-2709) Add a tool for certifying Spark API compatiblity

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2709: --- Target Version/s: (was: 1.2.0) Add a tool for certifying Spark API compatiblity

[jira] [Updated] (SPARK-2709) Add a tool for certifying Spark API compatiblity

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2709: --- Priority: Critical (was: Major) Add a tool for certifying Spark API compatiblity

[jira] [Reopened] (SPARK-2709) Add a tool for certifying Spark API compatiblity

2015-03-27 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-2709: This came up in some recent conversations. I actually don't think we ever merged

[jira] [Resolved] (SPARK-6405) Spark Kryo buffer should be forced to be max. 2GB

2015-03-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6405. Resolution: Fixed Assignee: Matthew Cheah Spark Kryo buffer should be forced
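The issue title states only the constraint. A likely reason for a 2 GB ceiling is that JVM arrays are indexed by a 32-bit `int`, so a single `byte[]` buffer cannot reach 2 GiB. The check below is a hypothetical sketch of validating a configured buffer size against that bound. The helper names and accepted size suffixes are assumptions, not Spark's configuration parser.

```python
# JVM arrays are indexed by a 32-bit int, so one byte[] cannot hold 2 GiB.
MAX_BUFFER_MB = 2048

# Hypothetical accepted suffixes, expressed in megabytes.
UNITS_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def parse_size_mb(value):
    """Parse a size such as '64m' or '1g' into megabytes."""
    value = value.strip().lower()
    unit = value[-1]
    if unit not in UNITS_MB:
        raise ValueError(f"missing or unknown unit in {value!r}")
    return float(value[:-1]) * UNITS_MB[unit]


def validate_kryo_buffer(value):
    """Reject buffer sizes that a single JVM byte array could not hold."""
    mb = parse_size_mb(value)
    if mb >= MAX_BUFFER_MB:
        raise ValueError(f"Kryo buffer {value!r} must be less than {MAX_BUFFER_MB}m")
    return mb


print(validate_kryo_buffer("64m"))  # 64.0
```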

[jira] [Resolved] (SPARK-6549) Spark console logger logs to stderr by default

2015-03-26 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6549. Resolution: Won't Fix I think this is a won't fix due to compatibility issues. If I'm wrong

Re: RDD.map does not allowed to preservesPartitioning?

2015-03-26 Thread Patrick Wendell
I think we have a version of mapPartitions that allows you to tell Spark the partitioning is preserved: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L639 We could also add a map function that does same. Or you can just write your map using an
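A small model of hash partitioning shows why the flag exists. A function that leaves keys unchanged keeps every record in the partition its key hashes to. A function that rewrites keys does not, and Spark cannot inspect an arbitrary function to tell which case applies. The sketch below is plain Python with hypothetical helpers. In Spark itself, `mapValues` and `mapPartitions(f, preservesPartitioning=True)` are the existing ways to assert that keys are unchanged.

```python
def hash_partition(pairs, num_partitions):
    """Place each (key, value) pair in partition hash(key) % num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        parts[hash(k) % num_partitions].append((k, v))
    return parts


def map_partitions(parts, f):
    """Apply f to every record inside its current partition (no shuffle)."""
    return [[f(rec) for rec in part] for part in parts]


def still_partitioned(parts):
    """True if every record still sits in the partition its key hashes to."""
    n = len(parts)
    return all(hash(k) % n == i for i, part in enumerate(parts) for k, _ in part)


parts = hash_partition([(k, k * 10) for k in range(20)], 4)

values_only = map_partitions(parts, lambda kv: (kv[0], kv[1] + 1))   # keys kept
keys_changed = map_partitions(parts, lambda kv: (kv[0] + 1, kv[1]))  # keys rewritten

print(still_partitioned(values_only))   # True
print(still_partitioned(keys_changed))  # False
```

When the flag is set on a function that actually changes keys, the mismatch in the second case is what the framework would silently rely on, so it is an assertion the caller must get right.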

[jira] [Updated] (SPARK-6499) pyspark: printSchema command on a dataframe hangs

2015-03-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6499: --- Component/s: PySpark pyspark: printSchema command on a dataframe hangs

[jira] [Updated] (SPARK-6520) Kyro serialization broken in the shell

2015-03-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6520: --- Component/s: Spark Shell Kyro serialization broken in the shell

[jira] [Commented] (SPARK-6481) Set In Progress when a PR is opened for an issue

2015-03-25 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380432#comment-14380432 ] Patrick Wendell commented on SPARK-6481: Hey All, One issue here, (I think

Re: hadoop input/output format advanced control

2015-03-25 Thread Patrick Wendell
and pullreq when i have some time. On Wed, Mar 25, 2015 at 1:23 AM, Patrick Wendell pwend...@gmail.com wrote: I see - if you look, in the saving functions we have the option for the user to pass an arbitrary Configuration. https://github.com/apache/spark/blob/master/core/src/main/scala/org

Re: hadoop input/output format advanced control

2015-03-25 Thread Patrick Wendell
Great - that's even easier. Maybe we could have a simple example in the doc. On Wed, Mar 25, 2015 at 7:06 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Regarding Patrick's question, you can just do new Configuration(oldConf) to get a cloned Configuration object and add any new properties to it.
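Sandy's clone-then-override approach can be sketched as follows. `Configuration` here is a hypothetical dict-backed stand-in for Hadoop's class; only the copy-constructor idea comes from the thread. `save_with_conf` is a hypothetical save function, and `output.compress` is an illustrative key.

```python
class Configuration:
    """Minimal stand-in for Hadoop's Configuration: a string map with a
    copy constructor (hypothetical, not the real class)."""

    def __init__(self, other=None):
        # Copy the other conf's properties so later changes don't leak back.
        self._props = dict(other._props) if other is not None else {}

    def set(self, key, value):
        self._props[key] = value

    def get(self, key, default=None):
        return self._props.get(key, default)


def save_with_conf(records, conf):
    """Hypothetical save function that reads its settings from conf."""
    return {"records": len(records), "compress": conf.get("output.compress", "false")}


shared = Configuration()
shared.set("fs.defaultFS", "hdfs://namenode:8020")

job_conf = Configuration(shared)         # clone, like new Configuration(oldConf)
job_conf.set("output.compress", "true")  # per-job override

print(save_with_conf([1, 2, 3], job_conf))  # {'records': 3, 'compress': 'true'}
print(shared.get("output.compress"))        # None: the shared conf is unchanged
```

Cloning per call keeps job-specific settings out of the shared configuration that other jobs read.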

Re: 1.3 Hadoop File System problem

2015-03-24 Thread Patrick Wendell
Hey Jim, Thanks for reporting this. Can you give a small end-to-end code example that reproduces it? If so, we can definitely fix it. - Patrick On Tue, Mar 24, 2015 at 4:55 PM, Jim Carroll jimfcarr...@gmail.com wrote: I have code that works under 1.2.1 but when I upgraded to 1.3.0 it fails to

Re: Any guidance on when to back port and how far?

2015-03-24 Thread Patrick Wendell
My philosophy has been basically what you suggested, Sean. One thing you didn't mention though is if a bug fix seems complicated, I will think very hard before back-porting it. This is because fixes can introduce their own new bugs, in some cases worse than the original issue. It's really bad to

Re: hadoop input/output format advanced control

2015-03-24 Thread Patrick Wendell
Yeah - to Nick's point, I think the way to do this is to pass in a custom conf when you create a Hadoop RDD (that's AFAIK why the conf field is there). Is there anything you can't do with that feature? On Tue, Mar 24, 2015 at 11:50 AM, Nick Pentreath nick.pentre...@gmail.com wrote: Imran, on

Experience using binary packages on various Hadoop distros

2015-03-24 Thread Patrick Wendell
Hey All, For a while we've published binary packages with different Hadoop clients pre-bundled. We currently have three interfaces to a Hadoop cluster: (a) the HDFS client, (b) the YARN client, and (c) the Hive client. Because (a) and (b) are supposed to be backwards-compatible interfaces, my working

[jira] [Commented] (SPARK-2331) SparkContext.emptyRDD should return RDD[T] not EmptyRDD[T]

2015-03-23 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376495#comment-14376495 ] Patrick Wendell commented on SPARK-2331: By the way - [~rxin] recently pointed out

[jira] [Reopened] (SPARK-6122) Upgrade Tachyon dependency to 0.6.0

2015-03-23 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-6122: I reverted this because it looks like it was responsible for some testing failures due

Re: enum-like types in Spark

2015-03-23 Thread Patrick Wendell
If the official solution from the Scala community is to use Java enums, then it seems strange that they aren't generated in Scaladoc. Maybe we can just fix that with Typesafe's help and then use them. On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen so...@cloudera.com wrote: Yeah the fully realized #4,

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23 Thread Patrick Wendell
Hey Yiannis, If you just perform a count on each (name, date) pair, can it succeed? If so, can you do a count and then order by the count to find the largest group? I'm wondering if there is a single pathologically large group here that is somehow causing the OOM. Also, to be clear, you are getting GC limit
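The suggested diagnostic is to count rows per (name, date) group and then order by the count. It can be sketched locally with `collections.Counter` on made-up sample rows. On a DataFrame the equivalent would be along the lines of `df.groupBy("name", "date").count().orderBy(desc("count"))`. A group far larger than the rest points to skew rather than overall data volume. The 3x-median threshold below is an arbitrary illustration.

```python
from collections import Counter

# Hypothetical sample rows of (name, date).
rows = [
    ("alice", "2015-03-01"), ("bob", "2015-03-01"),
    ("alice", "2015-03-01"), ("alice", "2015-03-01"),
    ("carol", "2015-03-02"), ("bob", "2015-03-02"),
]

# Count rows per (name, date) group, then take the largest group.
group_sizes = Counter(rows)
largest = group_sizes.most_common(1)
print(largest)  # [(('alice', '2015-03-01'), 3)]

# Flag groups at least 3x the median group size as likely skew.
typical = sorted(group_sizes.values())[len(group_sizes) // 2]
skewed = [g for g, n in group_sizes.items() if n >= 3 * typical]
```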

[jira] [Updated] (SPARK-6449) Driver OOM results in reported application result SUCCESS

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6449: --- Component/s: (was: Spark Core) YARN Driver OOM results in reported

[jira] [Updated] (SPARK-6456) Spark Sql throwing exception on large partitioned data

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6456: --- Component/s: (was: Spark Core) Spark Sql throwing exception on large partitioned data

[jira] [Resolved] (SPARK-2858) Default log4j configuration no longer seems to work

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2858. Resolution: Invalid This is really old and I don't think it is still an issue. I'm just

[jira] [Commented] (SPARK-5863) Improve performance of convertToScala codepath.

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375229#comment-14375229 ] Patrick Wendell commented on SPARK-5863: This seems worth potentially fixing

[jira] [Updated] (SPARK-5863) Improve performance of convertToScala codepath.

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5863: --- Target Version/s: 1.3.1, 1.4.0 (was: 1.4.0) Improve performance of convertToScala codepath

[jira] [Updated] (SPARK-4227) Document external shuffle service

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4227: --- Priority: Critical (was: Major) Document external shuffle service

[jira] [Updated] (SPARK-4227) Document external shuffle service

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4227: --- Target Version/s: 1.3.1, 1.4.0 (was: 1.3.0, 1.4.0) Document external shuffle service

[jira] [Updated] (SPARK-5863) Improve performance of convertToScala codepath.

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5863: --- Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Improve performance of convertToScala codepath

[jira] [Updated] (SPARK-6012) Deadlock when asking for partitions from CoalescedRDD on top of a TakeOrdered operator

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6012: --- Target Version/s: 1.4.0 (was: 1.3.1, 1.4.0) Deadlock when asking for partitions from

[jira] [Updated] (SPARK-6012) Deadlock when asking for partitions from CoalescedRDD on top of a TakeOrdered operator

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6012: --- Target Version/s: 1.3.1, 1.4.0 (was: 1.4.0) Deadlock when asking for partitions from

[jira] [Commented] (SPARK-5863) Improve performance of convertToScala codepath.

2015-03-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375230#comment-14375230 ] Patrick Wendell commented on SPARK-5863: Ah actually - I see [~marmbrus

[jira] [Updated] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4925: --- Fix Version/s: (was: 1.2.1) (was: 1.3.0) Publish Spark SQL hive

[jira] [Updated] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4925: --- Priority: Critical (was: Major) Publish Spark SQL hive-thriftserver maven artifact

[jira] [Updated] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4925: --- Affects Version/s: (was: 1.2.0) 1.3.0 1.2.1

[jira] [Updated] (SPARK-4123) Show dependency changes in pull requests

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4123: --- Summary: Show dependency changes in pull requests (was: Show new dependencies added in pull

[jira] [Reopened] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-4925: Thanks for bringing this up. Actually - realized this wasn't fixed by some of the other work

[jira] [Updated] (SPARK-4925) Publish Spark SQL hive-thriftserver maven artifact

2015-03-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4925: --- Target Version/s: 1.3.1 Publish Spark SQL hive-thriftserver maven artifact
