[jira] [Commented] (SPARK-2546) Configuration object thread safety issue

2014-07-17 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064625#comment-14064625 ] Andrew Ash commented on SPARK-2546: --- On the thread: Me: {quote} Reynold's recent

[jira] [Commented] (SPARK-2521) Broadcast RDD object once per TaskSet (instead of sending it for every task)

2014-07-17 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064626#comment-14064626 ] Andrew Ash commented on SPARK-2521: --- Reynold's PR:

[jira] [Issue Comment Deleted] (SPARK-2521) Broadcast RDD object once per TaskSet (instead of sending it for every task)

2014-07-17 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ash updated SPARK-2521: -- Comment: was deleted (was: Reynold's PR: https://github.com/apache/spark/pull/1452) Broadcast RDD

[jira] [Commented] (SPARK-2492) KafkaReceiver minor changes to align with Kafka 0.8

2014-07-17 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064636#comment-14064636 ] Saisai Shao commented on SPARK-2492: Hi TD, I revisit the Kafka's ConsoleConsumer

[jira] [Created] (SPARK-2548) JavaRecoverableWordCount is missing

2014-07-17 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-2548: Summary: JavaRecoverableWordCount is missing Key: SPARK-2548 URL: https://issues.apache.org/jira/browse/SPARK-2548 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-2549) Functions defined inside of other functions trigger failures

2014-07-17 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-2549: -- Summary: Functions defined inside of other functions trigger failures Key: SPARK-2549 URL: https://issues.apache.org/jira/browse/SPARK-2549 Project: Spark

[jira] [Updated] (SPARK-2549) Functions defined inside of other functions trigger failures

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2549: --- Description: If we have a function declaration inside of another function, it still triggers

[jira] [Updated] (SPARK-2549) Functions defined inside of other functions trigger failures

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2549: --- Description: If we have a function declaration inside of another function, it still triggers

[jira] [Resolved] (SPARK-2412) CoalescedRDD throws exception with certain pref locs

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2412. Resolution: Fixed Fix Version/s: 1.1.0 1.0.2 Issue resolved by

[jira] [Resolved] (SPARK-2526) Simplify make-distribution.sh to just pass through Maven options

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2526. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1445

[jira] [Created] (SPARK-2550) Support regularization and intercept in pyspark's linear methods

2014-07-17 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-2550: Summary: Support regularization and intercept in pyspark's linear methods Key: SPARK-2550 URL: https://issues.apache.org/jira/browse/SPARK-2550 Project: Spark

[jira] [Updated] (SPARK-2551) Cleanup FilteringParquetRowInputFormat

2014-07-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-2551: -- Description: To workaround [PARQUET-16|https://issues.apache.org/jira/browse/PARQUET-16] and fix

[jira] [Created] (SPARK-2551) Cleanup FilteringParquetRowInputFormat

2014-07-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-2551: - Summary: Cleanup FilteringParquetRowInputFormat Key: SPARK-2551 URL: https://issues.apache.org/jira/browse/SPARK-2551 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-2552) Stabilize the computation of logistic function in pyspark

2014-07-17 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-2552: Summary: Stabilize the computation of logistic function in pyspark Key: SPARK-2552 URL: https://issues.apache.org/jira/browse/SPARK-2552 Project: Spark

[jira] [Updated] (SPARK-2552) Stabilize the computation of logistic function in pyspark

2014-07-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-2552: - Description: exp(1000) throws an error in python. For logistic function, we can use either 1 / (
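The overflow mentioned in SPARK-2552 can be illustrated with a minimal sketch. This is not the code from the issue or from PySpark; the function name `stable_logistic` is hypothetical, and the branch-on-sign approach shown here is one common way to avoid evaluating `exp` on a large positive argument, assumed for illustration only.

```python
import math


def stable_logistic(x: float) -> float:
    """Logistic function 1 / (1 + exp(-x)) without overflow.

    Hypothetical helper for illustration; not taken from PySpark.
    math.exp raises OverflowError for large positive arguments,
    so exp is only ever called on a non-positive value here.
    """
    if x >= 0:
        # -x <= 0, so exp(-x) lies in (0, 1] and cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # x < 0, so exp(x) lies in [0, 1) and cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    for value in (-1000.0, 0.0, 1000.0):
        print(value, stable_logistic(value))
```

Both branches compute the same mathematical value; the split only chooses the algebraic form whose exponent is non-positive.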

[jira] [Updated] (SPARK-2423) Clean up SparkSubmit for readability

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2423: --- Assignee: Andrew Or Clean up SparkSubmit for readability

[jira] [Updated] (SPARK-2552) Stabilize the computation of logistic function in pyspark

2014-07-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-2552: - Labels: Starter (was: ) Stabilize the computation of logistic function in pyspark

[jira] [Commented] (SPARK-2119) Reading Parquet InputSplits dominates query execution time when reading off S3

2014-07-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064701#comment-14064701 ] Cheng Lian commented on SPARK-2119: --- Agree. Created SPARK-2551 for removing those

[jira] [Updated] (SPARK-2476) Have sbt-assembly include runtime dependencies in jar

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2476: --- Priority: Minor (was: Major) Have sbt-assembly include runtime dependencies in jar

[jira] [Created] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2553: - Summary: CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key Key: SPARK-2553 URL: https://issues.apache.org/jira/browse/SPARK-2553 Project: Spark Issue

[jira] [Commented] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064709#comment-14064709 ] Sandy Ryza commented on SPARK-2553: --- https://github.com/apache/spark/pull/1461

[jira] [Created] (SPARK-2554) CountDistinct and SumDistinct should do partial aggregation

2014-07-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-2554: - Summary: CountDistinct and SumDistinct should do partial aggregation Key: SPARK-2554 URL: https://issues.apache.org/jira/browse/SPARK-2554 Project: Spark Issue

[jira] [Updated] (SPARK-2551) Cleanup FilteringParquetRowInputFormat

2014-07-17 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-2551: -- Issue Type: Improvement (was: Bug) Cleanup FilteringParquetRowInputFormat

[jira] [Commented] (SPARK-2492) KafkaReceiver minor changes to align with Kafka 0.8

2014-07-17 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064721#comment-14064721 ] Saisai Shao commented on SPARK-2492: Hi TD, Also I did some experiments on the

[jira] [Created] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
Zhihui created SPARK-2555: - Summary: Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode. Key: SPARK-2555 URL: https://issues.apache.org/jira/browse/SPARK-2555 Project: Spark

[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos

[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, configuration spark.scheduler.minRegisteredExecutorsRatio was introduced, but it

[jira] [Commented] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064743#comment-14064743 ] Zhihui commented on SPARK-2555: --- I submit a PR https://github.com/apache/spark/pull/1462

[jira] [Updated] (SPARK-2555) Support configuration spark.scheduler.minRegisteredExecutorsRatio in Mesos mode.

2014-07-17 Thread Zhihui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihui updated SPARK-2555: -- Description: In SPARK-1946, configuration spark.scheduler.minRegisteredExecutorsRatio was introduced, but it

[jira] [Created] (SPARK-2556) Multiple SparkContexts can coexist in one process

2014-07-17 Thread YanTang Zhai (JIRA)
YanTang Zhai created SPARK-2556: --- Summary: Multiple SparkContexts can coexist in one process Key: SPARK-2556 URL: https://issues.apache.org/jira/browse/SPARK-2556 Project: Spark Issue Type:

[jira] [Updated] (SPARK-2491) When an OOM is thrown,the executor does not stop properly.

2014-07-17 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-2491: --- Summary: When an OOM is thrown,the executor does not stop properly. (was: When an OOM is thrown,the

[jira] [Commented] (SPARK-2156) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks

2014-07-17 Thread DjvuLee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064965#comment-14064965 ] DjvuLee commented on SPARK-2156: I see this fixed in the spark branch-0.9 in the github,

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Ken Carlile (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064996#comment-14064996 ] Ken Carlile commented on SPARK-2282: So we've just given this a try with a 32 node

[jira] [Created] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
Ye Xianjin created SPARK-2557: - Summary: createTaskScheduler should be consistent between local and local-n-failures Key: SPARK-2557 URL: https://issues.apache.org/jira/browse/SPARK-2557 Project: Spark

[jira] [Commented] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065001#comment-14065001 ] Ye Xianjin commented on SPARK-2557: --- I will send a pr for this. createTaskScheduler

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065006#comment-14065006 ] Matthew Farrellee commented on SPARK-2494: -- [~davies] will you provide an example

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Ken Carlile (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065025#comment-14065025 ] Ken Carlile commented on SPARK-2282: A little more info: Nodes are running Scientific

[jira] [Commented] (SPARK-2557) createTaskScheduler should be consistent between local and local-n-failures

2014-07-17 Thread Ye Xianjin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065029#comment-14065029 ] Ye Xianjin commented on SPARK-2557: --- Github pr:

[jira] [Updated] (SPARK-2523) Potential Bugs if SerDe is not the identical among partitions and table

2014-07-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-2523: Target Version/s: 1.1.0 Potential Bugs if SerDe is not the identical among partitions and table

[jira] [Commented] (SPARK-2256) pyspark: RDD.take doesn't work ... sometimes ...

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065032#comment-14065032 ] Matthew Farrellee commented on SPARK-2256: -- [~angel2014] i've tried this using a

[jira] [Commented] (SPARK-2523) Potential Bugs if SerDe is not the identical among partitions and table

2014-07-17 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065035#comment-14065035 ] Yin Huai commented on SPARK-2523: - I see. Although we are using the right SerDe to

[jira] [Commented] (SPARK-2021) External hashing in PySpark

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065040#comment-14065040 ] Matthew Farrellee commented on SPARK-2021: -- [~matei][~prashant_] what do you mean

[jira] [Commented] (SPARK-1670) PySpark Fails to Create SparkContext Due To Debugging Options in conf/java-opts

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065047#comment-14065047 ] Matthew Farrellee commented on SPARK-1670: -- SPARK-2313 is the root cause of this.

[jira] [Updated] (SPARK-2256) pyspark: RDD.take doesn't work ... sometimes ...

2014-07-17 Thread Ángel Álvarez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ángel Álvarez updated SPARK-2256: - Attachment: A_test.zip I've tried with different files and sizes ... but I can't figure out the

[jira] [Commented] (SPARK-1662) PySpark fails if python class is used as a data container

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065059#comment-14065059 ] Matthew Farrellee commented on SPARK-1662: -- [~nrchandan] and [~pwendell] - i

[jira] [Commented] (SPARK-2256) pyspark: RDD.take doesn't work ... sometimes ...

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065066#comment-14065066 ] Matthew Farrellee commented on SPARK-2256: -- are you using a local master, mesos,

[jira] [Created] (SPARK-2558) Mention --queue argument in YARN documentation

2014-07-17 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2558: Summary: Mention --queue argument in YARN documentation Key: SPARK-2558 URL: https://issues.apache.org/jira/browse/SPARK-2558 Project: Spark Issue Type:

[jira] [Updated] (SPARK-2558) Mention --queue argument in YARN documentation

2014-07-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2558: - Labels: Starter (was: ) Mention --queue argument in YARN documentation

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065121#comment-14065121 ] Aaron Davidson commented on SPARK-2282: --- This problem does look identical. I think I

[jira] [Commented] (SPARK-2083) Allow local task to retry after failure.

2014-07-17 Thread Bill Havanki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065143#comment-14065143 ] Bill Havanki commented on SPARK-2083: - Pull request available:

[jira] [Created] (SPARK-2559) Add A Link to Download the Application Events Log for Offline Analysis

2014-07-17 Thread Pat McDonough (JIRA)
Pat McDonough created SPARK-2559: Summary: Add A Link to Download the Application Events Log for Offline Analysis Key: SPARK-2559 URL: https://issues.apache.org/jira/browse/SPARK-2559 Project: Spark

[jira] [Commented] (SPARK-2256) pyspark: RDD.take doesn't work ... sometimes ...

2014-07-17 Thread Ángel Álvarez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065230#comment-14065230 ] Ángel Álvarez commented on SPARK-2256: -- I've tried using local and master spark in

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065240#comment-14065240 ] Davies Liu commented on SPARK-2494: --- This bug only happen in cluster mode, so it's can

[jira] [Commented] (SPARK-2256) pyspark: RDD.take doesn't work ... sometimes ...

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065249#comment-14065249 ] Matthew Farrellee commented on SPARK-2256: -- maybe there's an issue in the

[jira] [Commented] (SPARK-2316) StorageStatusListener should avoid O(blocks) operations

2014-07-17 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065256#comment-14065256 ] Shivaram Venkataraman commented on SPARK-2316: -- I'd just like to add that in

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065306#comment-14065306 ] Aaron Davidson commented on SPARK-2282: --- This problem is kinda silly because we're

[jira] [Updated] (SPARK-2447) Add common solution for sending upsert actions to HBase (put, deletes, and increment)

2014-07-17 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2447: - Assignee: Ted Malaska Add common solution for sending upsert actions to HBase (put, deletes,

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065317#comment-14065317 ] Davies Liu commented on SPARK-2494: --- The tip version already handle hash of None, but it

[jira] [Updated] (SPARK-2528) spark-ec2 security group permissions are too open

2014-07-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-2528: Description: {{spark-ec2}} configures EC2 security groups with ports [open to the world |

[jira] [Commented] (SPARK-2501) Handle stage re-submissions properly in the UI

2014-07-17 Thread Masayoshi TSUZUKI (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065353#comment-14065353 ] Masayoshi TSUZUKI commented on SPARK-2501: -- Yes, this ticket covers it. I think

[jira] [Created] (SPARK-2560) Create Spark SQL syntax reference

2014-07-17 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-2560: --- Summary: Create Spark SQL syntax reference Key: SPARK-2560 URL: https://issues.apache.org/jira/browse/SPARK-2560 Project: Spark Issue Type:

[jira] [Updated] (SPARK-2560) Create Spark SQL syntax reference

2014-07-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-2560: Description: Does Spark SQL support {{LEN()}}? How about {{LIMIT}}? And what about {{MY

[jira] [Commented] (SPARK-2542) Exit Code Class should be renamed and placed package properly

2014-07-17 Thread Kousuke Saruta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065528#comment-14065528 ] Kousuke Saruta commented on SPARK-2542: --- PR:

[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

2014-07-17 Thread Ken Carlile (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065549#comment-14065549 ] Ken Carlile commented on SPARK-2282: Awesome. I was afraid we were trying to chase

[jira] [Resolved] (SPARK-1215) Clustering: Index out of bounds error

2014-07-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-1215. -- Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1468

[jira] [Commented] (SPARK-2470) Fix PEP 8 violations

2014-07-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065655#comment-14065655 ] Reynold Xin commented on SPARK-2470: That PR only covers a small fraction of the

[jira] [Commented] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-17 Thread Matthew Farrellee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065671#comment-14065671 ] Matthew Farrellee commented on SPARK-2494: -- thank you. i've confirmed this:

[jira] [Created] (SPARK-2562) Add Date datatype support to Spark SQL

2014-07-17 Thread Zongheng Yang (JIRA)
Zongheng Yang created SPARK-2562: Summary: Add Date datatype support to Spark SQL Key: SPARK-2562 URL: https://issues.apache.org/jira/browse/SPARK-2562 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-1458) Expose sc.version in PySpark

2014-07-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065693#comment-14065693 ] Nicholas Chammas commented on SPARK-1458: - Perhaps that could also be some kind of

[jira] [Updated] (SPARK-2365) Add IndexedRDD, an efficient updatable key-value store

2014-07-17 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-2365: -- Attachment: 2014-07-07-IndexedRDD-design-review.pdf Slides explaining the motivation, design, and

[jira] [Comment Edited] (SPARK-2365) Add IndexedRDD, an efficient updatable key-value store

2014-07-17 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065694#comment-14065694 ] Ankur Dave edited comment on SPARK-2365 at 7/17/14 10:31 PM: -

[jira] [Commented] (SPARK-872) Should revive offer after tasks finish in Mesos fine-grained mode

2014-07-17 Thread Timothy Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065703#comment-14065703 ] Timothy Chen commented on SPARK-872: I'm not quite understanding your statement where

[jira] [Updated] (SPARK-2454) Separate driver spark home from executor spark home

2014-07-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-2454: - Description: The driver may not always share the same directory structure as the executors. It makes

[jira] [Commented] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException

2014-07-17 Thread Timothy Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065706#comment-14065706 ] Timothy Chen commented on SPARK-1702: - The PR is merged and closed already, is this

[jira] [Commented] (SPARK-1764) EOF reached before Python server acknowledged

2014-07-17 Thread Timothy Chen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065709#comment-14065709 ] Timothy Chen commented on SPARK-1764: - I'm not sure how this is related to Mesos, is

[jira] [Created] (SPARK-2563) Make number of connection retries configurable

2014-07-17 Thread Shivaram Venkataraman (JIRA)
Shivaram Venkataraman created SPARK-2563: Summary: Make number of connection retries configurable Key: SPARK-2563 URL: https://issues.apache.org/jira/browse/SPARK-2563 Project: Spark

[jira] [Commented] (SPARK-2563) Make number of connection retries configurable

2014-07-17 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065735#comment-14065735 ] Shivaram Venkataraman commented on SPARK-2563: --

[jira] [Commented] (SPARK-2491) When an OOM is thrown,the executor does not stop properly.

2014-07-17 Thread Kousuke Saruta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065757#comment-14065757 ] Kousuke Saruta commented on SPARK-2491: --- Hi [~gq] I found the issue related to you

[jira] [Resolved] (SPARK-2534) Avoid pulling in the entire RDD or PairRDDFunctions in various operators

2014-07-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-2534. Resolution: Fixed Fix Version/s: 1.0.2 1.1.0 Avoid pulling in the

[jira] [Created] (SPARK-2564) ShuffleReadMetrics.totalBlocksFetched is redundant

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2564: - Summary: ShuffleReadMetrics.totalBlocksFetched is redundant Key: SPARK-2564 URL: https://issues.apache.org/jira/browse/SPARK-2564 Project: Spark Issue Type:

[jira] [Created] (SPARK-2565) Update ShuffleReadMetrics as blocks are fetched

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2565: - Summary: Update ShuffleReadMetrics as blocks are fetched Key: SPARK-2565 URL: https://issues.apache.org/jira/browse/SPARK-2565 Project: Spark Issue Type:

[jira] [Created] (SPARK-2566) Update ShuffleWriteMetrics as data is written

2014-07-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2566: - Summary: Update ShuffleWriteMetrics as data is written Key: SPARK-2566 URL: https://issues.apache.org/jira/browse/SPARK-2566 Project: Spark Issue Type:

[jira] [Commented] (SPARK-2564) ShuffleReadMetrics.totalBlocksFetched is redundant

2014-07-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065826#comment-14065826 ] Sandy Ryza commented on SPARK-2564: --- https://github.com/apache/spark/pull/1474

[jira] [Created] (SPARK-2567) Resubmitted stage sometimes remains as active stage in the web UI

2014-07-17 Thread Masayoshi TSUZUKI (JIRA)
Masayoshi TSUZUKI created SPARK-2567: Summary: Resubmitted stage sometimes remains as active stage in the web UI Key: SPARK-2567 URL: https://issues.apache.org/jira/browse/SPARK-2567 Project:

[jira] [Updated] (SPARK-2567) Resubmitted stage sometimes remains as active stage in the web UI

2014-07-17 Thread Masayoshi TSUZUKI (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masayoshi TSUZUKI updated SPARK-2567: - Attachment: SPARK-2567.png Resubmitted stage sometimes remains as active stage in the

[jira] [Created] (SPARK-2568) RangePartitioner should go through the data only once

2014-07-17 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-2568: -- Summary: RangePartitioner should go through the data only once Key: SPARK-2568 URL: https://issues.apache.org/jira/browse/SPARK-2568 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-2299) Consolidate various stageIdTo* hash maps

2014-07-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-2299. Resolution: Fixed Fix Version/s: 1.1.0 Consolidate various stageIdTo* hash maps

[jira] [Created] (SPARK-2569) Customized UDFs in hive not running with Spark SQL

2014-07-17 Thread jacky hung (JIRA)
jacky hung created SPARK-2569: - Summary: Customized UDFs in hive not running with Spark SQL Key: SPARK-2569 URL: https://issues.apache.org/jira/browse/SPARK-2569 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-2570) ClassCastException from HiveFromSpark(examples)

2014-07-17 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-2570: Summary: ClassCastException from HiveFromSpark(examples) Key: SPARK-2570 URL: https://issues.apache.org/jira/browse/SPARK-2570 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-2570) ClassCastException from HiveFromSpark(examples)

2014-07-17 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065905#comment-14065905 ] Cheng Hao commented on SPARK-2570: -- https://github.com/apache/spark/pull/1475

[jira] [Created] (SPARK-2571) Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies

2014-07-17 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-2571: - Summary: Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies Key: SPARK-2571 URL: https://issues.apache.org/jira/browse/SPARK-2571

[jira] [Updated] (SPARK-2571) Shuffle read bytes are reported incorrectly for stages with multiple shuffle dependencies

2014-07-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-2571: -- Description: In BlockStoreShuffleFetcher, we set the shuffle metrics for a task to include

[jira] [Commented] (SPARK-1458) Expose sc.version in PySpark

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065929#comment-14065929 ] Patrick Wendell commented on SPARK-1458: Isn't it possible to just have the python

[jira] [Updated] (SPARK-2411) Standalone Master - direct users to turn on event logs

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2411: --- Assignee: Andrew Or Standalone Master - direct users to turn on event logs

[jira] [Updated] (SPARK-2411) Standalone Master - direct users to turn on event logs

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2411: --- Fix Version/s: 1.1.0 Standalone Master - direct users to turn on event logs

[jira] [Updated] (SPARK-2543) Resizable serialization buffers for kryo

2014-07-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2543: --- Assignee: Koert Kuipers Resizable serialization buffers for kryo