Re: Include permalinks in mail footer

2014-08-05 Thread Matei Zaharia
Oh actually sorry, it looks like infra has looked at it but they can't add permalinks. They can only add "here's how to unsubscribe" footers. My bad, I just didn't catch the email update from them. Matei On August 5, 2014 at 2:39:45 PM, Matei Zaharia (matei.zaha...@gmail.com) wrote: Emails sent

[jira] [Updated] (SPARK-2787) Make sort-based shuffle write files directly when there is no sorting / aggregation and # of partitions is small

2014-08-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2787: - Target Version/s: 1.1.0 Make sort-based shuffle write files directly when there is no sorting

[jira] [Assigned] (SPARK-2787) Make sort-based shuffle write files directly when there is no sorting / aggregation and # of partitions is small

2014-08-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-2787: Assignee: Matei Zaharia Make sort-based shuffle write files directly when

[jira] [Assigned] (SPARK-2685) Update ExternalAppendOnlyMap to avoid buffer.remove()

2014-08-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-2685: Assignee: Matei Zaharia Update ExternalAppendOnlyMap to avoid buffer.remove

[jira] [Updated] (SPARK-2685) Update ExternalAppendOnlyMap to avoid buffer.remove()

2014-08-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2685: - Target Version/s: 1.1.0 Update ExternalAppendOnlyMap to avoid buffer.remove

[jira] [Resolved] (SPARK-1811) Support resizable output buffer for kryo serializer

2014-08-04 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1811. -- Resolution: Duplicate Fix Version/s: 1.1.0 Support resizable output buffer for kryo

Re: log overloaded in SparkContext/ Spark 1.0.x

2014-08-04 Thread Matei Zaharia
Hah, weird. log should be protected actually (look at trait Logging). Is your class extending SparkContext or somehow being placed in the org.apache.spark package? Or maybe the Scala compiler looks at it anyway; in that case we can rename it. Please open a JIRA for it if that's the case. On

Re: Spark Training Course?

2014-08-04 Thread Matei Zaharia
This looks pretty comprehensive to me. A few quick suggestions: - On the VM part: we've actually been avoiding this in all the Databricks training efforts because the VM itself can be annoying to install and it makes it harder for people to really use Spark for development (they can learn it,

Re: Create a new object by given classtag

2014-08-04 Thread Matei Zaharia
To get the ClassTag object inside your function with the original syntax you used (T: ClassTag), you can do this: def read[T: ClassTag](): T = {   val ct = classTag[T]   ct.runtimeClass.newInstance().asInstanceOf[T] } Passing the ClassTag with : ClassTag lets you have an implicit parameter that
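A minimal, self-contained sketch of the pattern described in the message above, assuming T has a public no-argument constructor. The names ClassTagDemo and readExplicit are illustrative and not from the original thread, and getDeclaredConstructor().newInstance() is swapped in for Class.newInstance(), which is deprecated on newer JDKs. The second method spells out the implicit parameter that the [T: ClassTag] context bound stands for.

```scala
import scala.reflect.{ClassTag, classTag}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object ClassTagDemo {
  // Context-bound form: the compiler passes an implicit ClassTag[T],
  // which classTag[T] retrieves inside the method body.
  def read[T: ClassTag](): T = {
    val ct = classTag[T]
    // Assumes T has a public no-argument constructor; throws otherwise.
    ct.runtimeClass.getDeclaredConstructor().newInstance().asInstanceOf[T]
  }

  // Same method with the implicit ClassTag parameter written explicitly.
  def readExplicit[T]()(implicit ct: ClassTag[T]): T =
    ct.runtimeClass.getDeclaredConstructor().newInstance().asInstanceOf[T]
}
```

For example, ClassTagDemo.read[java.util.ArrayList[String]]() should yield a new, empty list. Because of erasure, the ClassTag carries only the runtime class (here java.util.ArrayList), not the element type.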

[jira] [Resolved] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2670. -- Resolution: Fixed Fix Version/s: 1.1.0 FetchFailedException should be thrown when

[jira] [Updated] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2670: - Assignee: Kousuke Saruta FetchFailedException should be thrown when local fetch has failed

[jira] [Updated] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2670: - Priority: Major (was: Critical) FetchFailedException should be thrown when local fetch has

[jira] [Resolved] (SPARK-983) Support external sorting for RDD#sortByKey()

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-983. - Resolution: Fixed Fix Version/s: 1.1.0 Support external sorting for RDD#sortByKey

[jira] [Created] (SPARK-2787) Make sort-based shuffle write files directly when there is no sorting / aggregation and # of partitions is small

2014-08-01 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2787: Summary: Make sort-based shuffle write files directly when there is no sorting / aggregation and # of partitions is small Key: SPARK-2787 URL: https://issues.apache.org/jira

[jira] [Resolved] (SPARK-2134) Report metrics before application finishes

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2134. -- Resolution: Fixed Fix Version/s: 1.1.0 Report metrics before application finishes

[jira] [Resolved] (SPARK-695) Exponential recursion in getPreferredLocations

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-695. - Resolution: Fixed Fix Version/s: 1.1.0 Exponential recursion in getPreferredLocations

[jira] [Resolved] (SPARK-2490) StackOverflowError when RDD dependencies are too long

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2490. -- Resolution: Fixed Fix Version/s: 1.1.0 StackOverflowError when RDD dependencies

[jira] [Commented] (SPARK-2532) Fix issues with consolidated shuffle

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082878#comment-14082878 ] Matei Zaharia commented on SPARK-2532: -- I'm going to create a few sub-tasks

[jira] [Created] (SPARK-2791) Fix committing, reverting and state tracking in shuffle file consolidation

2014-08-01 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2791: Summary: Fix committing, reverting and state tracking in shuffle file consolidation Key: SPARK-2791 URL: https://issues.apache.org/jira/browse/SPARK-2791 Project

[jira] [Created] (SPARK-2792) Fix reading too much or too little data from each stream in ExternalMap / Sorter

2014-08-01 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2792: Summary: Fix reading too much or too little data from each stream in ExternalMap / Sorter Key: SPARK-2792 URL: https://issues.apache.org/jira/browse/SPARK-2792

[jira] [Created] (SPARK-2793) Correctly lock directory creation in DiskBlockManager.getFile

2014-08-01 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2793: Summary: Correctly lock directory creation in DiskBlockManager.getFile Key: SPARK-2793 URL: https://issues.apache.org/jira/browse/SPARK-2793 Project: Spark

[jira] [Created] (SPARK-2795) Improve DiskBlockObjectWriter API

2014-08-01 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2795: Summary: Improve DiskBlockObjectWriter API Key: SPARK-2795 URL: https://issues.apache.org/jira/browse/SPARK-2795 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-2794) Use Java 7 isSymlink when available

2014-08-01 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2794: Summary: Use Java 7 isSymlink when available Key: SPARK-2794 URL: https://issues.apache.org/jira/browse/SPARK-2794 Project: Spark Issue Type: Sub-task

[jira] [Resolved] (SPARK-1612) Potential resource leaks in Utils.copyStream and Utils.offsetBytes

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1612. -- Resolution: Fixed Fix Version/s: 1.1.0 Potential resource leaks in Utils.copyStream

[jira] [Updated] (SPARK-2791) Fix committing, reverting and state tracking in shuffle file consolidation

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2791: - Assignee: Mridul Muralidharan Fix committing, reverting and state tracking in shuffle file

[jira] [Updated] (SPARK-2532) Fix issues with consolidated shuffle

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2532: - Fix Version/s: (was: 1.1.0) Fix issues with consolidated shuffle

[jira] [Resolved] (SPARK-2791) Fix committing, reverting and state tracking in shuffle file consolidation

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2791. -- Resolution: Fixed Fix committing, reverting and state tracking in shuffle file consolidation

[jira] [Resolved] (SPARK-2684) Update ExternalAppendOnlyMap to take an iterator as input

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2684. -- Resolution: Fixed Fix Version/s: 1.1.0 Update ExternalAppendOnlyMap to take

[jira] [Updated] (SPARK-2792) Fix reading too much or too little data from each stream in ExternalMap / Sorter

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2792: - Assignee: (was: Matei Zaharia) Fix reading too much or too little data from each stream

[jira] [Updated] (SPARK-2116) Load spark-defaults.conf from directory specified by SPARK_CONF_DIR

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2116: - Assignee: Albert Chu Load spark-defaults.conf from directory specified by SPARK_CONF_DIR

[jira] [Resolved] (SPARK-2116) Load spark-defaults.conf from directory specified by SPARK_CONF_DIR

2014-08-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2116. -- Resolution: Fixed Fix Version/s: 1.1.0 Load spark-defaults.conf from directory

Re: correct upgrade process

2014-08-01 Thread Matei Zaharia
This should be okay, but make sure that your cluster also has the right code deployed. Maybe you have the wrong one. If you built Spark from source multiple times, you may also want to try sbt clean before sbt assembly. Matei On August 1, 2014 at 12:00:07 PM, SK (skrishna...@gmail.com) wrote:

[jira] [Updated] (SPARK-2762) SparkILoop leaks memory in multi-repl configurations

2014-07-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2762: - Assignee: Timothy Hunter SparkILoop leaks memory in multi-repl configurations

[jira] [Resolved] (SPARK-2762) SparkILoop leaks memory in multi-repl configurations

2014-07-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2762. -- Resolution: Fixed Fix Version/s: 1.1.0 SparkILoop leaks memory in multi-repl

[jira] [Commented] (SPARK-2762) SparkILoop leaks memory in multi-repl configurations

2014-07-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081154#comment-14081154 ] Matei Zaharia commented on SPARK-2762: -- PR: https://github.com/apache/spark/pull/1674

[jira] [Resolved] (SPARK-2028) Let users of HadoopRDD access the partition InputSplits

2014-07-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2028. -- Resolution: Fixed Fix Version/s: 1.1.0 Let users of HadoopRDD access the partition

[jira] [Updated] (SPARK-2711) Create a ShuffleMemoryManager that allocates across spilling collections in the same task

2014-07-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2711: - Target Version/s: 1.1.0 Create a ShuffleMemoryManager that allocates across spilling

[jira] [Updated] (SPARK-2711) Create a ShuffleMemoryManager that allocates across spilling collections in the same task

2014-07-30 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2711: - Priority: Critical (was: Major) Create a ShuffleMemoryManager that allocates across spilling

[jira] [Updated] (SPARK-2447) Add common solution for sending upsert actions to HBase (put, deletes, and increment)

2014-07-30 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2447: - Target Version/s: 1.2.0 (was: 1.1.0) Add common solution for sending upsert actions to HBase

[jira] [Commented] (SPARK-983) Support external sorting for RDD#sortByKey()

2014-07-30 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080385#comment-14080385 ] Matei Zaharia commented on SPARK-983: - Now that an ExternalSorter class from SPARK-2045

[jira] [Assigned] (SPARK-983) Support external sorting for RDD#sortByKey()

2014-07-30 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-983: --- Assignee: Matei Zaharia Support external sorting for RDD#sortByKey

Re: Do I need to know Scala to take full advantage of spark?

2014-07-30 Thread Matei Zaharia
Java is very close to Scala across the board; the only thing missing in it right now is GraphX (which is still alpha). Python is missing GraphX, streaming and a few of the ML algorithms, though most of them are there. So it should be fine to start with any of them. See

[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support

2014-07-29 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077473#comment-14077473 ] Matei Zaharia commented on SPARK-1981: -- The EC2 scripts actually fetch a package

[jira] [Resolved] (SPARK-2305) pyspark - depend on py4j 0.8.1

2014-07-29 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2305. -- Resolution: Fixed Fix Version/s: 1.1.0 pyspark - depend on py4j 0.8.1

Re: JIRA content request

2014-07-29 Thread Matei Zaharia
I agree as well. FWIW sometimes I've seen this happen due to language barriers, i.e. contributors whose primary language is not English, but we need more motivation for each change. On July 29, 2014 at 5:12:01 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote: +1 on using JIRA workflows

Re: Job using Spark for Machine Learning

2014-07-29 Thread Matei Zaharia
Hi Martin, Job ads are actually not allowed on the list, but thanks for asking. Just posting this for others' future reference. Matei On July 29, 2014 at 8:34:59 AM, Martin Goodson (mar...@skimlinks.com) wrote: I'm not sure if job adverts are allowed on here - please let me know if not. 

[jira] [Updated] (SPARK-2134) Report metrics before application finishes

2014-07-28 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2134: - Assignee: Rahul Singhal Report metrics before application finishes

[jira] [Updated] (SPARK-1777) Pass cached blocks directly to disk if memory is not large enough

2014-07-27 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1777: - Priority: Critical (was: Major) Pass cached blocks directly to disk if memory is not large

[jira] [Created] (SPARK-2711) Create a ShuffleMemoryManager that allocates across spilling collections in the same task

2014-07-27 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2711: Summary: Create a ShuffleMemoryManager that allocates across spilling collections in the same task Key: SPARK-2711 URL: https://issues.apache.org/jira/browse/SPARK-2711

[jira] [Updated] (SPARK-2711) Create a ShuffleMemoryManager that allocates across spilling collections in the same task

2014-07-27 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2711: - Description: Right now if there are two ExternalAppendOnlyMaps, they don't compete correctly

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Matei Zaharia
+1 Tested this on Mac OS X. Matei On Jul 25, 2014, at 4:08 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.2. This release fixes a number of bugs in Spark 1.0.1. Some of the notable ones are - SPARK-2452:

Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Matei Zaharia
or somesuch, but testing for A will give an incorrect answer, and the code can't be expected to look for everyone's A+X versions. Actually inspecting the code is more robust if a bit messier. On Sun, Jul 27, 2014 at 9:50 PM, Matei Zaharia matei.zaha...@gmail.com wrote: For this particular issue

Re: Spark MLlib vs BIDMach Benchmark

2014-07-27 Thread Matei Zaharia
These numbers are from GPUs and Intel MKL (a closed-source math library for Intel processors), where for CPU-bound algorithms you are going to get faster speeds than MLlib's JBLAS. However, there's in theory nothing preventing the use of these in MLlib (e.g. if you have a faster BLAS locally;

[jira] [Resolved] (SPARK-1458) Expose sc.version in PySpark

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1458. -- Resolution: Fixed Fix Version/s: 1.1.0 Expose sc.version in PySpark

[jira] [Updated] (SPARK-2696) Reduce default spark.serializer.objectStreamReset

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2696: - Assignee: Hossein Falaki Reduce default spark.serializer.objectStreamReset

[jira] [Resolved] (SPARK-2696) Reduce default spark.serializer.objectStreamReset

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2696. -- Resolution: Fixed Fix Version/s: 1.0.3 Target Version/s: 1.0.3 (was: 1.0.0

[jira] [Resolved] (SPARK-2652) Turning default configurations for PySpark

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2652. -- Resolution: Fixed Turning default configurations for PySpark

[jira] [Updated] (SPARK-2279) JavaSparkContext should allow creation of EmptyRDD

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2279: - Assignee: Bob Paulin JavaSparkContext should allow creation of EmptyRDD

[jira] [Resolved] (SPARK-2279) JavaSparkContext should allow creation of EmptyRDD

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2279. -- Resolution: Fixed Fix Version/s: 1.1.0 JavaSparkContext should allow creation

[jira] [Updated] (SPARK-2279) JavaSparkContext should allow creation of EmptyRDD

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2279: - Priority: Minor (was: Major) JavaSparkContext should allow creation of EmptyRDD

[jira] [Resolved] (SPARK-2704) ConnectionManager threads should be named and daemon

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2704. -- Resolution: Fixed Fix Version/s: 1.1.0 ConnectionManager threads should be named

[jira] [Resolved] (SPARK-2601) py4j.Py4JException on sc.pickleFile

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2601. -- Resolution: Fixed Fix Version/s: 1.1.0 py4j.Py4JException on sc.pickleFile

[jira] [Assigned] (SPARK-2684) Update ExternalAppendOnlyMap to take an iterator as input

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-2684: Assignee: Matei Zaharia Update ExternalAppendOnlyMap to take an iterator as input

[jira] [Resolved] (SPARK-2680) Lower spark.shuffle.memoryFraction to 0.2 by default

2014-07-26 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2680. -- Resolution: Fixed Lower spark.shuffle.memoryFraction to 0.2 by default

Re: Spilling in-memory... messages in log even with MEMORY_ONLY

2014-07-26 Thread Matei Zaharia
These messages are actually not about spilling the RDD, they're about spilling intermediate state in a reduceByKey, groupBy or other operation whose state doesn't fit in memory. We have to do that in these cases to avoid going out of memory. You can minimize spilling by having more reduce tasks

Re: Spilling in-memory... messages in log even with MEMORY_ONLY

2014-07-26 Thread Matei Zaharia
Even in local mode, Spark serializes data that would be sent across the network, e.g. in a reduce operation, so that you can catch errors that would happen in distributed mode. You can make serialization much faster by using the Kryo serializer; see

[jira] [Created] (SPARK-2684) Update ExternalAppendOnlyMap to take an iterator as input

2014-07-25 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2684: Summary: Update ExternalAppendOnlyMap to take an iterator as input Key: SPARK-2684 URL: https://issues.apache.org/jira/browse/SPARK-2684 Project: Spark

[jira] [Created] (SPARK-2685) Update ExternalAppendOnlyMap to avoid buffer.remove()

2014-07-25 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2685: Summary: Update ExternalAppendOnlyMap to avoid buffer.remove() Key: SPARK-2685 URL: https://issues.apache.org/jira/browse/SPARK-2685 Project: Spark Issue

[jira] [Commented] (SPARK-2689) Remove use of println in ActorHelper

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074664#comment-14074664 ] Matei Zaharia commented on SPARK-2689: -- Pull request: https://github.com/apache/spark

[jira] [Created] (SPARK-2689) Remove use of println in ActorHelper

2014-07-25 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2689: Summary: Remove use of println in ActorHelper Key: SPARK-2689 URL: https://issues.apache.org/jira/browse/SPARK-2689 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-2689) Remove use of println in ActorHelper

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2689. -- Resolution: Fixed Fix Version/s: 1.1.0 Remove use of println in ActorHelper

[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074673#comment-14074673 ] Matei Zaharia commented on SPARK-2620: -- The problem is that case class is compiled

[jira] [Resolved] (SPARK-2683) unidoc failed because org.apache.spark.util.CallSite uses Java keywords as value names

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2683. -- Resolution: Fixed Fix Version/s: 1.1.0 unidoc failed because

[jira] [Resolved] (SPARK-2682) Javadoc generated from Scala source code is not in javadoc's index

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2682. -- Resolution: Fixed Fix Version/s: 1.1.0 Javadoc generated from Scala source code

[jira] [Resolved] (SPARK-2125) Add sorting flag to ShuffleManager, and implement it in HashShuffleManager

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2125. -- Resolution: Fixed Fix Version/s: 1.1.0 Add sorting flag to ShuffleManager

[jira] [Resolved] (SPARK-1726) Tasks that fail to serialize remain in active stages forever.

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1726. -- Resolution: Fixed Fix Version/s: 1.1.0 Tasks that fail to serialize remain in active

[jira] [Commented] (SPARK-2567) Resubmitted stage sometimes remains as active stage in the web UI

2014-07-25 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075035#comment-14075035 ] Matei Zaharia commented on SPARK-2567: -- I've merged this into 1.1 because the patch

[jira] [Resolved] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2661. -- Resolution: Fixed Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2538) External aggregation in Python

2014-07-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2538: - Priority: Critical (was: Major) External aggregation in Python

[jira] [Resolved] (SPARK-2014) Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default

2014-07-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2014. -- Resolution: Fixed Fix Version/s: 1.1.0 Make PySpark store RDDs in MEMORY_ONLY_SER

[jira] [Created] (SPARK-2680) Lower spark.shuffle.memoryFraction to 0.2 by default

2014-07-24 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2680: Summary: Lower spark.shuffle.memoryFraction to 0.2 by default Key: SPARK-2680 URL: https://issues.apache.org/jira/browse/SPARK-2680 Project: Spark Issue

[jira] [Resolved] (SPARK-2538) External aggregation in Python

2014-07-24 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2538. -- Resolution: Fixed Fix Version/s: (was: 1.0.1) (was: 1.0.0

Re: akka 2.3.x?

2014-07-24 Thread Matei Zaharia
This is being tracked here: https://issues.apache.org/jira/browse/SPARK-1812, since it will also be needed for cross-building with Scala 2.11. Maybe we can do it before that. Probably too late for 1.1, but you should open an issue for 1.2. In that JIRA I linked, there's a pull request from a

Re: mapToPair vs flatMapToPair vs flatMap function usage.

2014-07-24 Thread Matei Zaharia
The Pair ones return a JavaPairRDD, which has additional operations on key-value pairs. Take a look at http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs for details. Matei On Jul 24, 2014, at 3:41 PM, abhiguruvayya sharath.abhis...@gmail.com wrote: Can

[jira] [Resolved] (SPARK-2609) Log thread ID when spilling ExternalAppendOnlyMap

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2609. -- Resolution: Fixed Log thread ID when spilling ExternalAppendOnlyMap

[jira] [Updated] (SPARK-2609) Log thread ID when spilling ExternalAppendOnlyMap

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2609: - Assignee: Andrew Or Log thread ID when spilling ExternalAppendOnlyMap

[jira] [Resolved] (SPARK-2640) In local[N], free cores of the only executor should be touched by spark.task.cpus for every finish/start-up of tasks.

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2640. -- Resolution: Fixed Fix Version/s: 1.1.0 In local[N], free cores of the only executor

[jira] [Updated] (SPARK-2640) In local[N], free cores of the only executor should be touched by spark.task.cpus for every finish/start-up of tasks.

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2640: - Priority: Minor (was: Major) In local[N], free cores of the only executor should be touched

[jira] [Updated] (SPARK-2640) In local[N], free cores of the only executor should be touched by spark.task.cpus for every finish/start-up of tasks.

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2640: - Assignee: woshilaiceshide In local[N], free cores of the only executor should be touched

[jira] [Updated] (SPARK-2277) Make TaskScheduler track whether there's host on a rack

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2277: - Fix Version/s: 1.1.0 Make TaskScheduler track whether there's host on a rack

[jira] [Updated] (SPARK-2277) Make TaskScheduler track whether there's host on a rack

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2277: - Assignee: Rui Li Make TaskScheduler track whether there's host on a rack

[jira] [Resolved] (SPARK-2277) Make TaskScheduler track whether there's host on a rack

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2277. -- Resolution: Fixed Make TaskScheduler track whether there's host on a rack

[jira] [Created] (SPARK-2657) Use more compact data structures than ArrayBuffer in groupBy and cogroup

2014-07-23 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-2657: Summary: Use more compact data structures than ArrayBuffer in groupBy and cogroup Key: SPARK-2657 URL: https://issues.apache.org/jira/browse/SPARK-2657 Project

[jira] [Assigned] (SPARK-2574) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia reassigned SPARK-2574: Assignee: Matei Zaharia Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

[jira] [Commented] (SPARK-2574) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072605#comment-14072605 ] Matei Zaharia commented on SPARK-2574: -- I implemented this as part of https

[jira] [Updated] (SPARK-2574) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2574: - Priority: Trivial (was: Major) Avoid allocating new ArrayBuffer in groupByKey's mergeCombiner

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Affects Version/s: 1.0.0 Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Affects Version/s: (was: 1.0.0) Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Assignee: Adrian Wang Unpersist last RDD in bagel iteration

[jira] [Updated] (SPARK-2661) Unpersist last RDD in bagel iteration

2014-07-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2661: - Affects Version/s: (was: 1.0.1) Unpersist last RDD in bagel iteration
