[jira] [Created] (SPARK-2684) Update ExternalAppendOnlyMap to take an iterator as input

2014-07-25 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2684: Summary: Update ExternalAppendOnlyMap to take an iterator as input Key: SPARK-2684 URL: https://issues.apache.org/jira/browse/SPARK-2684 Project: Spark

[jira] [Created] (SPARK-2685) Update ExternalAppendOnlyMap to avoid buffer.remove()

2014-07-25 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2685: Summary: Update ExternalAppendOnlyMap to avoid buffer.remove() Key: SPARK-2685 URL: https://issues.apache.org/jira/browse/SPARK-2685 Project: Spark Issue

[jira] [Commented] (SPARK-2689) Remove use of println in ActorHelper

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074664#comment-14074664 ] Matei Zaharia commented on SPARK-2689: -- Pull request:

[jira] [Created] (SPARK-2689) Remove use of println in ActorHelper

2014-07-25 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2689: Summary: Remove use of println in ActorHelper Key: SPARK-2689 URL: https://issues.apache.org/jira/browse/SPARK-2689 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-2689) Remove use of println in ActorHelper

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2689. -- Resolution: Fixed Fix Version/s: 1.1.0 Remove use of println in ActorHelper

[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074673#comment-14074673 ] Matei Zaharia commented on SPARK-2620: -- The problem is that case class is compiled

[jira] [Resolved] (SPARK-2683) unidoc failed because org.apache.spark.util.CallSite uses Java keywords as value names

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2683. -- Resolution: Fixed Fix Version/s: 1.1.0 unidoc failed because

[jira] [Resolved] (SPARK-2682) Javadoc generated from Scala source code is not in javadoc's index

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2682. -- Resolution: Fixed Fix Version/s: 1.1.0 Javadoc generated from Scala source code is not

[jira] [Resolved] (SPARK-2125) Add sorting flag to ShuffleManager, and implement it in HashShuffleManager

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2125. -- Resolution: Fixed Fix Version/s: 1.1.0 Add sorting flag to ShuffleManager, and

[jira] [Resolved] (SPARK-1726) Tasks that fail to serialize remain in active stages forever.

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1726. -- Resolution: Fixed Fix Version/s: 1.1.0 Tasks that fail to serialize remain in active

[jira] [Commented] (SPARK-2567) Resubmitted stage sometimes remains as active stage in the web UI

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075035#comment-14075035 ] Matei Zaharia commented on SPARK-2567: -- I've merged this into 1.1 because the patch

[jira] [Resolved] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2661. -- Resolution: Fixed Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2538) External aggregation in Python

2014-07-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2538: - Priority: Critical (was: Major) External aggregation in Python --

[jira] [Resolved] (SPARK-2014) Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default

2014-07-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2014. -- Resolution: Fixed Fix Version/s: 1.1.0 Make PySpark store RDDs in MEMORY_ONLY_SER with

[jira] [Created] (SPARK-2680) Lower spark.shuffle.memoryFraction to 0.2 by default

2014-07-24 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2680: Summary: Lower spark.shuffle.memoryFraction to 0.2 by default Key: SPARK-2680 URL: https://issues.apache.org/jira/browse/SPARK-2680 Project: Spark Issue

[jira] [Resolved] (SPARK-2538) External aggregation in Python

2014-07-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2538. -- Resolution: Fixed Fix Version/s: (was: 1.0.1) (was: 1.0.0)

[jira] [Resolved] (SPARK-2609) Log thread ID when spilling ExternalAppendOnlyMap

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2609. -- Resolution: Fixed Log thread ID when spilling ExternalAppendOnlyMap

[jira] [Updated] (SPARK-2609) Log thread ID when spilling ExternalAppendOnlyMap

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2609: - Assignee: Andrew Or Log thread ID when spilling ExternalAppendOnlyMap

[jira] [Resolved] (SPARK-2640) In local[N], free cores of the only executor should be touched by spark.task.cpus for every finish/start-up of tasks.

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2640. -- Resolution: Fixed Fix Version/s: 1.1.0 In local[N], free cores of the only executor

[jira] [Updated] (SPARK-2640) In local[N], free cores of the only executor should be touched by spark.task.cpus for every finish/start-up of tasks.

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2640: - Priority: Minor (was: Major) In local[N], free cores of the only executor should be touched by

[jira] [Updated] (SPARK-2640) In local[N], free cores of the only executor should be touched by spark.task.cpus for every finish/start-up of tasks.

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2640: - Assignee: woshilaiceshide In local[N], free cores of the only executor should be touched by

[jira] [Updated] (SPARK-2277) Make TaskScheduler track whether there's host on a rack

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2277: - Fix Version/s: 1.1.0 Make TaskScheduler track whether there's host on a rack

[jira] [Updated] (SPARK-2277) Make TaskScheduler track whether there's host on a rack

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2277: - Assignee: Rui Li Make TaskScheduler track whether there's host on a rack

[jira] [Resolved] (SPARK-2277) Make TaskScheduler track whether there's host on a rack

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2277. -- Resolution: Fixed Make TaskScheduler track whether there's host on a rack

[jira] [Created] (SPARK-2657) Use more compact data structures than ArrayBuffer in groupBy and cogroup

2014-07-23 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2657: Summary: Use more compact data structures than ArrayBuffer in groupBy and cogroup Key: SPARK-2657 URL: https://issues.apache.org/jira/browse/SPARK-2657 Project:

[jira] [Assigned] (SPARK-2574) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-2574: Assignee: Matei Zaharia Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

[jira] [Commented] (SPARK-2574) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072605#comment-14072605 ] Matei Zaharia commented on SPARK-2574: -- I implemented this as part of

[jira] [Updated] (SPARK-2574) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2574: - Priority: Trivial (was: Major) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Affects Version/s: 1.0.0 Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Affects Version/s: (was: 1.0.0) Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Assignee: Adrian Wang Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Affects Version/s: (was: 1.0.1) Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Fix Version/s: 1.1.0 Unpersist last RDD in bagel iteration

[jira] [Resolved] (SPARK-2047) Use less memory in AppendOnlyMap.destructiveSortedIterator

2014-07-22 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2047. -- Resolution: Fixed Fix Version/s: 1.1.0 Use less memory in

[jira] [Updated] (SPARK-2047) Use less memory in AppendOnlyMap.destructiveSortedIterator

2014-07-22 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2047: - Assignee: Aaron Davidson Use less memory in AppendOnlyMap.destructiveSortedIterator

[jira] [Updated] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2494: - Priority: Major (was: Blocker) Hash of None is different cross machines in CPython

[jira] [Updated] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2494: - Affects Version/s: 0.9.2 0.9.0 0.9.1 Hash of None

[jira] [Updated] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2494: - Fix Version/s: (was: 1.0.1) (was: 1.0.0) 0.9.3

[jira] [Updated] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2494: - Target Version/s: 1.1.0, 1.0.2, 0.9.3 (was: 1.1.0, 1.0.2) Hash of None is different cross

[jira] [Updated] (SPARK-2494) Hash of None is different cross machines in CPython

2014-07-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2494: - Assignee: Davies Liu Hash of None is different cross machines in CPython

[jira] [Updated] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-18 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2553: - Assignee: Sandy Ryza CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

[jira] [Resolved] (SPARK-2553) CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key

2014-07-18 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2553. -- Resolution: Fixed Target Version/s: 1.1.0 CoGroupedRDD unnecessarily allocates a

[jira] [Assigned] (SPARK-2045) Sort-based shuffle implementation

2014-07-18 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-2045: Assignee: Matei Zaharia Sort-based shuffle implementation

[jira] [Created] (SPARK-2558) Mention --queue argument in YARN documentation

2014-07-17 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2558: Summary: Mention --queue argument in YARN documentation Key: SPARK-2558 URL: https://issues.apache.org/jira/browse/SPARK-2558 Project: Spark Issue Type:

[jira] [Updated] (SPARK-2558) Mention --queue argument in YARN documentation

2014-07-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2558: - Labels: Starter (was: ) Mention --queue argument in YARN documentation

[jira] [Updated] (SPARK-2048) Optimizations to CPU usage of external spilling code

2014-07-16 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2048: - Description: In the external spilling code in ExternalAppendOnlyMap and CoGroupedRDD, there are

[jira] [Commented] (SPARK-2048) Optimizations to CPU usage of external spilling code

2014-07-16 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064602#comment-14064602 ] Matei Zaharia commented on SPARK-2048: -- I added one more issue to this BTW, about

[jira] [Updated] (SPARK-2045) Sort-based shuffle implementation

2014-07-15 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2045: - Attachment: (was: Sort-basedshuffledesign.pdf) Sort-based shuffle implementation

[jira] [Updated] (SPARK-2045) Sort-based shuffle implementation

2014-07-15 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2045: - Attachment: Sort-basedshuffledesign.pdf I've posted a design doc for a simple version of this.

[jira] [Updated] (SPARK-2045) Sort-based shuffle implementation

2014-07-15 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2045: - Attachment: Sort-basedshuffledesign.pdf Oops, attached the wrong file before. Here's the right

[jira] [Commented] (SPARK-2045) Sort-based shuffle implementation

2014-07-15 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063009#comment-14063009 ] Matei Zaharia commented on SPARK-2045: -- Right now I was thinking it would happen

[jira] [Created] (SPARK-2371) Show locally-running tasks (e.g. from take()) in web UI

2014-07-04 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2371: Summary: Show locally-running tasks (e.g. from take()) in web UI Key: SPARK-2371 URL: https://issues.apache.org/jira/browse/SPARK-2371 Project: Spark Issue

[jira] [Updated] (SPARK-1937) Tasks can be submitted before executors are registered

2014-06-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1937: - Assignee: Rui Li Tasks can be submitted before executors are registered

[jira] [Resolved] (SPARK-1937) Tasks can be submitted before executors are registered

2014-06-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1937. -- Resolution: Fixed Fix Version/s: 1.1.0 Target Version/s: 1.1.0 Tasks can be

[jira] [Created] (SPARK-2248) spark.default.parallelism does not apply in local mode

2014-06-23 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2248: Summary: spark.default.parallelism does not apply in local mode Key: SPARK-2248 URL: https://issues.apache.org/jira/browse/SPARK-2248 Project: Spark Issue

[jira] [Resolved] (SPARK-2124) Move aggregation into ShuffleManager implementations

2014-06-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2124. -- Resolution: Fixed Fix Version/s: 1.1.0 Move aggregation into ShuffleManager

[jira] [Updated] (SPARK-2206) Automatically infer the number of classification classes in multiclass classification

2014-06-19 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2206: - Assignee: Manish Amde Automatically infer the number of classification classes in multiclass

[jira] [Updated] (SPARK-2207) Add minimum information gain and minimum instances per node as training parameters for decision tree.

2014-06-19 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2207: - Assignee: Manish Amde Add minimum information gain and minimum instances per node as training

[jira] [Updated] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution

2014-06-18 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1112: - Affects Version/s: 1.0.0 When spark.akka.frameSize > 10, task results bigger than 10MiB block

[jira] [Resolved] (SPARK-1837) NumericRange should be partitioned in the same way as other sequences

2014-06-14 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1837. -- Resolution: Fixed Fix Version/s: 1.1.0 NumericRange should be partitioned in the same

[jira] [Commented] (SPARK-889) Bring back DFS broadcast

2014-06-12 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030080#comment-14030080 ] Matei Zaharia commented on SPARK-889: - This is a really old JIRA and actually I

[jira] [Created] (SPARK-2123) Basic pluggable interface for shuffle

2014-06-11 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2123: Summary: Basic pluggable interface for shuffle Key: SPARK-2123 URL: https://issues.apache.org/jira/browse/SPARK-2123 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-2125) Add sorting flag to ShuffleManager, and implement it in HashShuffleManager

2014-06-11 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2125: Summary: Add sorting flag to ShuffleManager, and implement it in HashShuffleManager Key: SPARK-2125 URL: https://issues.apache.org/jira/browse/SPARK-2125 Project:

[jira] [Created] (SPARK-2124) Move aggregation into ShuffleManager implementations

2014-06-11 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2124: Summary: Move aggregation into ShuffleManager implementations Key: SPARK-2124 URL: https://issues.apache.org/jira/browse/SPARK-2124 Project: Spark Issue

[jira] [Updated] (SPARK-2124) Move aggregation into ShuffleManager implementations

2014-06-11 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2124: - Assignee: Saisai Shao Move aggregation into ShuffleManager implementations

[jira] [Resolved] (SPARK-2123) Basic pluggable interface for shuffle

2014-06-11 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2123. -- Resolution: Fixed Resolved in https://github.com/apache/spark/pull/1009 Basic pluggable

[jira] [Created] (SPARK-2126) Move MapOutputTracker behind ShuffleManager interface

2014-06-11 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2126: Summary: Move MapOutputTracker behind ShuffleManager interface Key: SPARK-2126 URL: https://issues.apache.org/jira/browse/SPARK-2126 Project: Spark Issue

[jira] [Commented] (SPARK-1416) Add support for SequenceFiles and binary Hadoop InputFormats in PySpark

2014-06-10 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026701#comment-14026701 ] Matei Zaharia commented on SPARK-1416: -- That pull request also added generic

[jira] [Commented] (SPARK-2044) Pluggable interface for shuffles

2014-06-09 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025580#comment-14025580 ] Matei Zaharia commented on SPARK-2044: -- Hey Weihua, I'll look into the sorting flag;

[jira] [Resolved] (SPARK-1416) Add support for SequenceFiles in PySpark

2014-06-09 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1416. -- Resolution: Fixed Fix Version/s: 1.1.0 Target Version/s: 1.1.0 Implemented in

[jira] [Updated] (SPARK-1416) Add support for SequenceFiles in PySpark

2014-06-09 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1416: - Assignee: Nick Pentreath Add support for SequenceFiles in PySpark

[jira] [Commented] (SPARK-2044) Pluggable interface for shuffles

2014-06-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021115#comment-14021115 ] Matei Zaharia commented on SPARK-2044: -- Alright so I've posted my code at

[jira] [Commented] (SPARK-2044) Pluggable interface for shuffles

2014-06-06 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020087#comment-14020087 ] Matei Zaharia commented on SPARK-2044: -- {quote} 1. Is it a goal to support more kind

[jira] [Commented] (SPARK-2044) Pluggable interface for shuffles

2014-06-06 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020329#comment-14020329 ] Matei Zaharia commented on SPARK-2044: -- So BTW I think what I'll do is move over the

[jira] [Created] (SPARK-2032) Add an RDD.samplePartitions method for partition-level sampling

2014-06-05 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2032: Summary: Add an RDD.samplePartitions method for partition-level sampling Key: SPARK-2032 URL: https://issues.apache.org/jira/browse/SPARK-2032 Project: Spark

[jira] [Updated] (SPARK-2032) Add an RDD.samplePartitions method for partition-level sampling

2014-06-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2032: - Priority: Minor (was: Major) Add an RDD.samplePartitions method for partition-level sampling

[jira] [Created] (SPARK-2043) ExternalAppendOnlyMap doesn't always find matching keys

2014-06-05 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2043: Summary: ExternalAppendOnlyMap doesn't always find matching keys Key: SPARK-2043 URL: https://issues.apache.org/jira/browse/SPARK-2043 Project: Spark Issue

[jira] [Created] (SPARK-2045) Sort-based shuffle implementation

2014-06-05 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2045: Summary: Sort-based shuffle implementation Key: SPARK-2045 URL: https://issues.apache.org/jira/browse/SPARK-2045 Project: Spark Issue Type: New Feature

[jira] [Created] (SPARK-2047) Use less memory in AppendOnlyMap.destructiveSortedIterator

2014-06-05 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2047: Summary: Use less memory in AppendOnlyMap.destructiveSortedIterator Key: SPARK-2047 URL: https://issues.apache.org/jira/browse/SPARK-2047 Project: Spark

[jira] [Updated] (SPARK-2047) Use less memory in AppendOnlyMap.destructiveSortedIterator

2014-06-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2047: - Priority: Minor (was: Major) Use less memory in AppendOnlyMap.destructiveSortedIterator

[jira] [Updated] (SPARK-2047) Use less memory in AppendOnlyMap.destructiveSortedIterator

2014-06-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2047: - Priority: Major (was: Minor) Use less memory in AppendOnlyMap.destructiveSortedIterator

[jira] [Commented] (SPARK-2043) ExternalAppendOnlyMap doesn't always find matching keys

2014-06-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019482#comment-14019482 ] Matei Zaharia commented on SPARK-2043: -- https://github.com/apache/spark/pull/986

[jira] [Created] (SPARK-2013) Add Python pickleFile to programming guide

2014-06-04 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2013: Summary: Add Python pickleFile to programming guide Key: SPARK-2013 URL: https://issues.apache.org/jira/browse/SPARK-2013 Project: Spark Issue Type:

[jira] [Created] (SPARK-2014) Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default

2014-06-04 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2014: Summary: Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default Key: SPARK-2014 URL: https://issues.apache.org/jira/browse/SPARK-2014 Project: Spark

[jira] [Updated] (SPARK-2013) Add Python pickleFile to programming guide

2014-06-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2013: - Assignee: Kan Zhang Add Python pickleFile to programming guide

[jira] [Updated] (SPARK-1912) Compression memory issue during reduce

2014-06-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1912: - Target Version/s: 0.9.2, 1.0.1, 1.1.0 (was: 0.9.2, 1.0.1) Compression memory issue during

[jira] [Updated] (SPARK-1912) Compression memory issue during reduce

2014-06-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1912: - Target Version/s: 0.9.2, 1.0.1 Compression memory issue during reduce

[jira] [Created] (SPARK-2024) Add saveAsSequenceFile to PySpark

2014-06-04 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2024: Summary: Add saveAsSequenceFile to PySpark Key: SPARK-2024 URL: https://issues.apache.org/jira/browse/SPARK-2024 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-1790) Update EC2 scripts to support r3 instance types

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017001#comment-14017001 ] Matei Zaharia commented on SPARK-1790: -- It's fine to skip the check right now; I

[jira] [Updated] (SPARK-1942) Stop clearing spark.driver.port in unit tests

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1942: - Fix Version/s: 1.1.0 Stop clearing spark.driver.port in unit tests

[jira] [Resolved] (SPARK-1912) Compression memory issue during reduce

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1912. -- Resolution: Fixed Compression memory issue during reduce

[jira] [Updated] (SPARK-1992) Support for Pivotal HD in the Maven build

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1992: - Assignee: Christian Tzolov Support for Pivotal HD in the Maven build

[jira] [Updated] (SPARK-1992) Support for Pivotal HD in the Maven build

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1992: - Fix Version/s: 1.0.1 Support for Pivotal HD in the Maven build

[jira] [Updated] (SPARK-1992) Support for Pivotal HD in the Maven build

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1992: - Issue Type: Improvement (was: Bug) Support for Pivotal HD in the Maven build

[jira] [Updated] (SPARK-1992) Support for Pivotal HD in the Maven build

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1992: - Fix Version/s: 1.1.0 Support for Pivotal HD in the Maven build

[jira] [Resolved] (SPARK-1468) The hash method used by partitionBy in Pyspark doesn't deal with None correctly.

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1468. -- Resolution: Fixed The hash method used by partitionBy in Pyspark doesn't deal with None

[jira] [Resolved] (SPARK-1161) Add saveAsObjectFile and SparkContext.objectFile in Python

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1161. -- Resolution: Fixed Merged this in -- thanks Kan! Add saveAsObjectFile and

[jira] [Updated] (SPARK-1161) Add saveAsObjectFile and SparkContext.objectFile in Python

2014-06-03 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1161: - Fix Version/s: 1.1.0 Add saveAsObjectFile and SparkContext.objectFile in Python

[jira] [Created] (SPARK-1996) Remove use of special Maven repo for Akka

2014-06-02 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-1996: Summary: Remove use of special Maven repo for Akka Key: SPARK-1996 URL: https://issues.apache.org/jira/browse/SPARK-1996 Project: Spark Issue Type:

[jira] [Created] (SPARK-1989) Exit executors faster if they get into a cycle of heavy GC

2014-06-01 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-1989: Summary: Exit executors faster if they get into a cycle of heavy GC Key: SPARK-1989 URL: https://issues.apache.org/jira/browse/SPARK-1989 Project: Spark