[jira] [Resolved] (SPARK-4796) Spark does not remove temp files

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4796. -- Resolution: Duplicate These are the same issue; not sure which way to resolve this. I am not clear

[jira] [Resolved] (SPARK-5811) Documentation for --packages and --repositories on Spark Shell

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5811. Resolution: Fixed Assignee: Burak Yavuz Documentation for --packages and

[jira] [Resolved] (SPARK-5875) logical.Project should not be resolved if it contains aggregates or generators

2015-02-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5875. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4663

[jira] [Created] (SPARK-5876) generalize the type of categoricalFeaturesInfo to PartialFunction[Int, Int]

2015-02-17 Thread Erik Erlandson (JIRA)
Erik Erlandson created SPARK-5876: - Summary: generalize the type of categoricalFeaturesInfo to PartialFunction[Int, Int] Key: SPARK-5876 URL: https://issues.apache.org/jira/browse/SPARK-5876 Project:

[jira] [Reopened] (SPARK-4299) In spark-submit, the driver-memory value is used for the SPARK_SUBMIT_DRIVER_MEMORY value

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-4299: -- Wait a sec. I am not clear it's resolved on testing with 1.3.0. But, it is a duplicate of SPARK-3884 In

[jira] [Commented] (SPARK-925) Allow ec2 scripts to load default options from a json file

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325317#comment-14325317 ] Nicholas Chammas commented on SPARK-925: I would prefer a format that is more human

[jira] [Resolved] (SPARK-5723) Change the default file format to Parquet for CTAS statements.

2015-02-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5723. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4639

[jira] [Updated] (SPARK-5875) logical.Project should not be resolved if it contains aggregates or generators

2015-02-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-5875: Description: To reproduce... {code} val rdd = sc.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))

[jira] [Commented] (SPARK-5570) No docs stating that `new SparkConf().set("spark.driver.memory", ...) will not work

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325155#comment-14325155 ] Apache Spark commented on SPARK-5570: - User 'ilganeli' has created a pull request for

[jira] [Updated] (SPARK-4449) specify port range in spark

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4449: - Priority: Minor (was: Major) Target Version/s: (was: 1.2.0) Issue Type:

[jira] [Updated] (SPARK-3674) Add support for launching YARN clusters in spark-ec2

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-3674: - Issue Type: Improvement (was: Bug) Add support for launching YARN clusters in spark-ec2

[jira] [Created] (SPARK-5877) Scheduler delay is incorrect for running tasks

2015-02-17 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-5877: - Summary: Scheduler delay is incorrect for running tasks Key: SPARK-5877 URL: https://issues.apache.org/jira/browse/SPARK-5877 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4454: --- Labels: backport-needed (was: ) Race condition in DAGScheduler

[jira] [Reopened] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-4454: Actually, re-opening this since we need to back port it. Race condition in DAGScheduler

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4454: --- Target Version/s: 1.3.0, 1.2.2 (was: 1.3.0) Race condition in DAGScheduler

[jira] [Updated] (SPARK-5821) JSONRelation should check if delete is successful for the overwrite operation.

2015-02-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-5821: Priority: Major (was: Blocker) JSONRelation should check if delete is successful for the overwrite

[jira] [Commented] (SPARK-925) Allow ec2 scripts to load default options from a json file

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325334#comment-14325334 ] Nicholas Chammas commented on SPARK-925: Here's an example of what a spark-ec2

[jira] [Resolved] (SPARK-5785) Pyspark does not support narrow dependencies

2015-02-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-5785. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4629

[jira] [Updated] (SPARK-5785) Pyspark does not support narrow dependencies

2015-02-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-5785: -- Assignee: Davies Liu Pyspark does not support narrow dependencies

[jira] [Created] (SPARK-5880) Change log level of batch pruning string in InMemoryColumnarTableScan from Info to Debug

2015-02-17 Thread Nitin Goyal (JIRA)
Nitin Goyal created SPARK-5880: -- Summary: Change log level of batch pruning string in InMemoryColumnarTableScan from Info to Debug Key: SPARK-5880 URL: https://issues.apache.org/jira/browse/SPARK-5880

[jira] [Commented] (SPARK-5722) Infer_schema_type incorrect for Integers in pyspark

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325209#comment-14325209 ] Apache Spark commented on SPARK-5722: - User 'davies' has created a pull request for

[jira] [Updated] (SPARK-5875) logical.Project should not be resolved if it contains aggregates or generators

2015-02-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-5875: Description: {code} val rdd = sc.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))

[jira] [Created] (SPARK-5879) spary_ec2.py should expose/return master and slave lists (e.g. write to file)

2015-02-17 Thread Florian Verhein (JIRA)
Florian Verhein created SPARK-5879: -- Summary: spary_ec2.py should expose/return master and slave lists (e.g. write to file) Key: SPARK-5879 URL: https://issues.apache.org/jira/browse/SPARK-5879

[jira] [Commented] (SPARK-5629) Add spark-ec2 action to return info about an existing cluster

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325372#comment-14325372 ] Nicholas Chammas commented on SPARK-5629: - YAML is not part of the Python standard

[jira] [Commented] (SPARK-5875) logical.Project should not be resolved if it contains aggregates or generators

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325121#comment-14325121 ] Apache Spark commented on SPARK-5875: - User 'yhuai' has created a pull request for

[jira] [Resolved] (SPARK-5872) pyspark shell should start up with SQL/HiveContext

2015-02-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5872. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4659

[jira] [Commented] (SPARK-5877) Scheduler delay is incorrect for running tasks

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325193#comment-14325193 ] Sean Owen commented on SPARK-5877: -- Same as SPARK-4579? Scheduler delay is incorrect

[jira] [Commented] (SPARK-5878) Python DataFrame.repartition() is broken

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325220#comment-14325220 ] Apache Spark commented on SPARK-5878: - User 'davies' has created a pull request for

[jira] [Created] (SPARK-5878) Python DataFrame.repartition() is broken

2015-02-17 Thread Davies Liu (JIRA)
Davies Liu created SPARK-5878: - Summary: Python DataFrame.repartition() is broken Key: SPARK-5878 URL: https://issues.apache.org/jira/browse/SPARK-5878 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-5629) Add spark-ec2 action to return info about an existing cluster

2015-02-17 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325354#comment-14325354 ] Shivaram Venkataraman commented on SPARK-5629: -- This sounds fine to me and I

[jira] [Resolved] (SPARK-5852) Fail to convert a newly created empty metastore parquet table to a data source parquet table.

2015-02-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5852. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4655

[jira] [Commented] (SPARK-5507) Add user guide for block matrix and its operations

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325145#comment-14325145 ] Apache Spark commented on SPARK-5507: - User 'brkyvz' has created a pull request for

[jira] [Resolved] (SPARK-4368) Ceph integration?

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4368. -- Resolution: Won't Fix Ceph integration? - Key: SPARK-4368

[jira] [Updated] (SPARK-4368) Ceph integration?

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4368: - Issue Type: Improvement (was: Bug) I don't think this ever evolved into a proposal to change Spark, and

[jira] [Resolved] (SPARK-4299) In spark-submit, the driver-memory value is used for the SPARK_SUBMIT_DRIVER_MEMORY value

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4299. -- Resolution: Not a Problem This may have been fixed along the way, but from examining related issues

[jira] [Resolved] (SPARK-4299) In spark-submit, the driver-memory value is used for the SPARK_SUBMIT_DRIVER_MEMORY value

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4299. -- Resolution: Duplicate In spark-submit, the driver-memory value is used for the

[jira] [Updated] (SPARK-5877) Scheduler delay is incorrect for running tasks

2015-02-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-5877: -- Priority: Minor (was: Major) Scheduler delay is incorrect for running tasks

[jira] [Closed] (SPARK-5877) Scheduler delay is incorrect for running tasks

2015-02-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout closed SPARK-5877. - Resolution: Duplicate Scheduler delay is incorrect for running tasks

[jira] [Commented] (SPARK-4579) Scheduling Delay appears negative

2015-02-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325202#comment-14325202 ] Kay Ousterhout commented on SPARK-4579: --- This is only for running tasks, I'm

[jira] [Resolved] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4454. Resolution: Fixed Fix Version/s: 1.3.0 We can't be 100% sure this is fixed because

[jira] [Commented] (SPARK-5629) Add spark-ec2 action to return info about an existing cluster

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325346#comment-14325346 ] Nicholas Chammas commented on SPARK-5629: - For example, you run: {code} $

[jira] [Comment Edited] (SPARK-4912) Persistent data source tables

2015-02-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325388#comment-14325388 ] Kay Ousterhout edited comment on SPARK-4912 at 2/18/15 4:23 AM:

[jira] [Created] (SPARK-5881) RDD remains cached after the table gets overridden by CACHE TABLE

2015-02-17 Thread Yin Huai (JIRA)
Yin Huai created SPARK-5881: --- Summary: RDD remains cached after the table gets overridden by CACHE TABLE Key: SPARK-5881 URL: https://issues.apache.org/jira/browse/SPARK-5881 Project: Spark Issue

[jira] [Commented] (SPARK-4912) Persistent data source tables

2015-02-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325388#comment-14325388 ] Kay Ousterhout commented on SPARK-4912: --- Is it possible to backport this to 1.2? It

[jira] [Commented] (SPARK-4903) RDD remains cached after DROP TABLE

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325434#comment-14325434 ] Apache Spark commented on SPARK-4903: - User 'yhuai' has created a pull request for

[jira] [Comment Edited] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7

2015-02-17 Thread saravanan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325533#comment-14325533 ] saravanan edited comment on SPARK-5389 at 2/18/15 7:31 AM: --- I

[jira] [Comment Edited] (SPARK-4912) Persistent data source tables

2015-02-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325388#comment-14325388 ] Kay Ousterhout edited comment on SPARK-4912 at 2/18/15 4:29 AM:

[jira] [Commented] (SPARK-5880) Change log level of batch pruning string in InMemoryColumnarTableScan from Info to Debug

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325406#comment-14325406 ] Apache Spark commented on SPARK-5880: - User 'nitin2goyal' has created a pull request

[jira] [Commented] (SPARK-4912) Persistent data source tables

2015-02-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325426#comment-14325426 ] Yin Huai commented on SPARK-4912: - [~kayousterhout] Seems the master still has the first

[jira] [Commented] (SPARK-4912) Persistent data source tables

2015-02-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325433#comment-14325433 ] Yin Huai commented on SPARK-4912: - The backport for the second issue is at

[jira] [Resolved] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset

2015-02-17 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-5731. -- Resolution: Fixed Fix Version/s: 1.3.0 Flaky Test:

[jira] [Commented] (SPARK-5389) spark-shell.cmd does not run from DOS Windows 7

2015-02-17 Thread saravanan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325533#comment-14325533 ] saravanan commented on SPARK-5389: -- I got the same issue in windows 7 and i set the PATH

[jira] [Commented] (SPARK-4879) Missing output partitions after job completes with speculative execution

2015-02-17 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324418#comment-14324418 ] Andrew Ash commented on SPARK-4879: --- [~romi-totango] what filesystem are you writing to?

[jira] [Commented] (SPARK-5480) GraphX pageRank: java.lang.ArrayIndexOutOfBoundsException:

2015-02-17 Thread Stephane Maarek (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324349#comment-14324349 ] Stephane Maarek commented on SPARK-5480: Hi Sean, We have included the following

[jira] [Closed] (SPARK-5026) PySpark rdd.randomSpit() is not documented

2015-02-17 Thread Sebastián Ramírez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastián Ramírez closed SPARK-5026. You are right. It's OK now. Thanks. PySpark rdd.randomSpit() is not documented

[jira] [Commented] (SPARK-5854) Implement Personalized PageRank with GraphX

2015-02-17 Thread Baoxu Shi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324374#comment-14324374 ] Baoxu Shi commented on SPARK-5854: -- Oh sure, there are lots of paper talking about

[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory

2015-02-17 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324466#comment-14324466 ] Marcelo Vanzin commented on SPARK-5861: --- Yeah, most probably the same issue.

[jira] [Resolved] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5861. -- Resolution: Duplicate [yarn-client mode] Application master should not use memory =

[jira] [Created] (SPARK-5863) Performance regression in Spark SQL/Parquet due to ScalaReflection.convertRowToScala

2015-02-17 Thread Cristian O (JIRA)
Cristian O created SPARK-5863: - Summary: Performance regression in Spark SQL/Parquet due to ScalaReflection.convertRowToScala Key: SPARK-5863 URL: https://issues.apache.org/jira/browse/SPARK-5863

[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324460#comment-14324460 ] Sean Owen commented on SPARK-5861: -- OK, so is it a pretty good bet that this is exactly

[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory

2015-02-17 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324446#comment-14324446 ] Marcelo Vanzin commented on SPARK-5861: --- bq. In ClientArguments.scala, cluster mode

[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324451#comment-14324451 ] Sean Owen commented on SPARK-5861: -- What about calling {{run-example}} with

[jira] [Commented] (SPARK-5861) [yarn-client mode] Application master should not use memory = spark.driver.memory

2015-02-17 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324457#comment-14324457 ] Marcelo Vanzin commented on SPARK-5861: --- {{run-example}} uses SparkSubmit.

[jira] [Updated] (SPARK-5863) Performance regression in Spark SQL/Parquet due to ScalaReflection.convertRowToScala

2015-02-17 Thread Cristian O (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristian O updated SPARK-5863: -- Description: Was doing some perf testing on reading parquet files and noticed that moving from Spark

[jira] [Commented] (SPARK-5862) Only transformUp the given plan once in HiveMetastoreCatalog

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324290#comment-14324290 ] Apache Spark commented on SPARK-5862: - User 'viirya' has created a pull request for

[jira] [Created] (SPARK-5862) Only transformUp the given plan once in HiveMetastoreCatalog

2015-02-17 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-5862: -- Summary: Only transformUp the given plan once in HiveMetastoreCatalog Key: SPARK-5862 URL: https://issues.apache.org/jira/browse/SPARK-5862 Project: Spark

[jira] [Commented] (SPARK-2579) Reading from S3 returns an inconsistent number of items with Spark 0.9.1

2015-02-17 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324285#comment-14324285 ] Romi Kuntsman commented on SPARK-2579: -- Does this still happen with Spark 1.2.1?

[jira] [Commented] (SPARK-5005) Failed to start spark-shell when using yarn-client mode with the Spark1.2.0

2015-02-17 Thread anuj (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324292#comment-14324292 ] anuj commented on SPARK-5005: - this command work in 1.1.1 but NOT in 1.2.1

[jira] [Commented] (SPARK-4879) Missing output partitions after job completes with speculative execution

2015-02-17 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324278#comment-14324278 ] Romi Kuntsman commented on SPARK-4879: -- Could this happen very very rarely when not

[jira] [Resolved] (SPARK-5688) Splits for Categorical Variables in DecisionTrees

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5688. -- Resolution: Not a Problem Closing this per Joseph's comments. Splits for Categorical Variables in

[jira] [Commented] (SPARK-5852) Fail to convert a newly created empty metastore parquet table to a data source parquet table.

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324731#comment-14324731 ] Apache Spark commented on SPARK-5852: - User 'yhuai' has created a pull request for

[jira] [Updated] (SPARK-5866) pyspark read from s3

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5866: - Priority: Major (was: Blocker) The immediate error is: {code} :

[jira] [Commented] (SPARK-5866) pyspark read from s3

2015-02-17 Thread venu k tangirala (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324741#comment-14324741 ] venu k tangirala commented on SPARK-5866: - yes, the path exists. The same thing

[jira] [Commented] (SPARK-5005) Failed to start spark-shell when using yarn-client mode with the Spark1.2.0

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324737#comment-14324737 ] Sean Owen commented on SPARK-5005: -- {code} #!/usr/bin/env bash # This file is sourced

[jira] [Commented] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324749#comment-14324749 ] Sean Owen commented on SPARK-4454: -- My fault, I misunderstood the resolution. yes please

[jira] [Resolved] (SPARK-5869) Exception when deleting Spark local dirs when shutting down DiskBlockManager

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5869. -- Resolution: Duplicate Just fixed as a follow on to SPARK-5841 Exception when deleting Spark local

[jira] [Updated] (SPARK-5867) Update spark.ml docs with DataFrame, Python examples

2015-02-17 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-5867: - Description: The spark.ml programming guide needs to be updated to use the new SQL

[jira] [Updated] (SPARK-5627) Enhance spark-ec2 for some programmatic use cases

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5627: Description: There are some cases where users may want to programmatically invoke

[jira] [Updated] (SPARK-5627) Enhance spark-ec2 to return machine-readable output

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5627: Summary: Enhance spark-ec2 to return machine-readable output (was: Enhance spark-ec2 for

[jira] [Updated] (SPARK-5627) Enhance spark-ec2 for some programmatic use cases

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5627: Description: There are some cases where users may want to programmatically invoke

[jira] [Created] (SPARK-5867) Update spark.ml docs with DataFrame

2015-02-17 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-5867: Summary: Update spark.ml docs with DataFrame Key: SPARK-5867 URL: https://issues.apache.org/jira/browse/SPARK-5867 Project: Spark Issue Type:

[jira] [Reopened] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reopened SPARK-4454: --- Assignee: Josh Rosen I'm re-opening this issue. [~srowen], we shouldn't resolve this as Won't

[jira] [Commented] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324747#comment-14324747 ] Patrick Wendell commented on SPARK-4454: [~srowen] yeah I meant the particular PR

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4454: --- Priority: Critical (was: Minor) Race condition in DAGScheduler

[jira] [Commented] (SPARK-5871) Explain in python should output using python

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324819#comment-14324819 ] Apache Spark commented on SPARK-5871: - User 'davies' has created a pull request for

[jira] [Updated] (SPARK-5629) Add spark-ec2 action to return info about an existing cluster

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5629: Description: You can launch multiple clusters using spark-ec2. At some point, you might

[jira] [Issue Comment Deleted] (SPARK-4544) Spark JVM Metrics doesn't have context.

2015-02-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4544: -- Comment: was deleted (was: User 'JoshRosen' has created a pull request for this issue:

[jira] [Updated] (SPARK-5541) Allow running Maven or SBT in run-tests

2015-02-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5541: - Issue Type: Improvement (was: Bug) Allow running Maven or SBT in run-tests

[jira] [Commented] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324858#comment-14324858 ] Apache Spark commented on SPARK-4454: - User 'JoshRosen' has created a pull request for

[jira] [Updated] (SPARK-5865) Add doc warnings for methods that return local data structures

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5865: Description: We should include a note in the doc string for any method that collects an RDD

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4454: --- Target Version/s: 1.3.0 Race condition in DAGScheduler

[jira] [Created] (SPARK-5870) GradientBoostedTrees should cache residuals from partial model

2015-02-17 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-5870: Summary: GradientBoostedTrees should cache residuals from partial model Key: SPARK-5870 URL: https://issues.apache.org/jira/browse/SPARK-5870 Project: Spark

[jira] [Commented] (SPARK-5868) Python UDFs broken by analysis check in HiveContext

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324779#comment-14324779 ] Apache Spark commented on SPARK-5868: - User 'marmbrus' has created a pull request for

[jira] [Commented] (SPARK-5864) support .jar as python package

2015-02-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324787#comment-14324787 ] Patrick Wendell commented on SPARK-5864: I merged davies PR, but per Burak's

[jira] [Updated] (SPARK-5629) Add spark-ec2 action to return info about an existing cluster

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5629: Description: You can launch multiple clusters using spark-ec2. At some point, you might

[jira] [Resolved] (SPARK-5862) Only transformUp the given plan once in HiveMetastoreCatalog

2015-02-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5862. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4651

[jira] [Commented] (SPARK-5872) pyspark shell should start up with SQL/HiveContext

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324837#comment-14324837 ] Apache Spark commented on SPARK-5872: - User 'davies' has created a pull request for

[jira] [Updated] (SPARK-5629) Add spark-ec2 action to return info about an existing cluster

2015-02-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5629: Description: You can launch multiple clusters using spark-ec2. At some point, you might

[jira] [Commented] (SPARK-4544) Spark JVM Metrics doesn't have context.

2015-02-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324842#comment-14324842 ] Apache Spark commented on SPARK-4544: - User 'JoshRosen' has created a pull request for

[jira] [Updated] (SPARK-5203) union with different decimal type report error

2015-02-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-5203: Target Version/s: 1.4.0 union with different decimal type report error
