Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Pei-Lun Lee
Thanks for the DirectOutputCommitter example. However, I found it only works for saveAsHadoopFile. What about saveAsParquetFile? It looks like Spark SQL is using ParquetOutputCommitter, which is a subclass of FileOutputCommitter. On Fri, Feb 27, 2015 at 1:52 AM, Thomas Demoor
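For readers hitting the same issue: the committer used by saveAsHadoopFile can be swapped through the Hadoop job configuration, while Parquet output goes through ParquetOutputCommitter and needs its own setting, which only appeared in later Spark releases. A hedged sketch of the relevant keys (the DirectOutputCommitter class name below is a placeholder, not a class that ships with Spark):

```
# spark-defaults.conf (illustrative; com.example.DirectOutputCommitter is a placeholder)
spark.hadoop.mapred.output.committer.class    com.example.DirectOutputCommitter

# In later Spark releases, Parquet output gained its own committer key:
spark.sql.parquet.output.committer.class      org.apache.spark.sql.parquet.DirectParquetOutputCommitter
```

Check the configuration reference for the Spark version you are running; these key names and the direct Parquet committer were in flux around the 1.3-1.5 timeframe.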

Re: enum-like types in Spark

2015-03-05 Thread Patrick Wendell
Yes - only new or internal APIs. I doubt we'd break any exposed APIs just for the purpose of cleanup. Patrick On Mar 5, 2015 12:16 AM, Mridul Muralidharan mri...@gmail.com wrote: While I don't have any strong opinions about how we handle enums either way in Spark, I assume the discussion is

Re: enum-like types in Spark

2015-03-05 Thread Mridul Muralidharan
While I don't have any strong opinions about how we handle enums either way in Spark, I assume the discussion is targeted at (new) APIs being designed in Spark. Rewiring what we have already exposed would be an incompatible API change (StorageLevel, for example, has been there since 1.0). Regards, Mridul On
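For context, the enum-like pattern under discussion can be expressed in Java without java.lang.Enum by using a final class with a fixed set of static instances, which is roughly the shape of constants like StorageLevel. A minimal sketch (the class and its members are illustrative, not Spark's actual API):

```java
// Enum-like type without java.lang.Enum: a final class exposing a
// fixed set of static instances. All names here are illustrative.
final class Level {
    static final Level INFO = new Level("INFO");
    static final Level WARN = new Level("WARN");

    private final String name;

    private Level(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Look up an instance by its stable name, e.g. after deserialization.
    static Level fromName(String name) {
        if (INFO.name.equals(name)) return INFO;
        if (WARN.name.equals(name)) return WARN;
        throw new IllegalArgumentException("Unknown level: " + name);
    }
}
```

The private constructor keeps the instance set closed, and fromName gives a stable string-based lookup, which sidesteps the cross-JVM hashing concern raised later in this thread.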

RE: Unable to Read/Write Avro RDD on cluster.

2015-03-05 Thread java8964
You can give Spark-Avro a try. It works great for our project. https://github.com/databricks/spark-avro From: deepuj...@gmail.com Date: Thu, 5 Mar 2015 10:27:04 +0530 Subject: Fwd: Unable to Read/Write Avro RDD on cluster. To: dev@spark.apache.org I am trying to read RDD avro, transform
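For anyone reaching this thread later: spark-avro is pulled in as an ordinary library dependency and then used through SQLContext. A hedged sketch of the build entry (the version number is illustrative; check the project's README for the release matching your Spark version):

```
// build.sbt (illustrative version)
libraryDependencies += "com.databricks" %% "spark-avro" % "1.0.0"
```

With the dependency on the classpath, releases from that era exposed an implicit on SQLContext, so an Avro file could be loaded via `import com.databricks.spark.avro._` followed by `sqlContext.avroFile(path)`.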

Re: enum-like types in Spark

2015-03-05 Thread Mridul Muralidharan
I have a strong dislike for Java enums due to the fact that they are not stable across JVMs - if they undergo serde, you can end up with unpredictable results at times [1]. This is one of the reasons why we prevent enums from being keys: though it is highly possible users might depend on it internally
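The instability referred to here is that java.lang.Enum inherits identity-based hashCode, which differs between JVM runs, so anything that depends on enum hash values (hash partitioning, HashMap layout) is not reproducible across processes. A common mitigation, sketched below with a standard-library enum, is to persist the stable name() and restore via valueOf() rather than relying on hashes or object identity:

```java
import java.util.concurrent.TimeUnit;

class EnumSerdeSketch {
    public static void main(String[] args) {
        TimeUnit unit = TimeUnit.SECONDS;

        // name() is stable across JVMs; hashCode() is identity-based and is not.
        String wire = unit.name();

        // Restore on the receiving side by name, never by hash value.
        TimeUnit restored = TimeUnit.valueOf(wire);

        System.out.println(restored == unit); // prints "true" within one JVM
    }
}
```

Within a single JVM, enum constants are singletons, so the round-tripped value is reference-equal to the original; across JVMs the name, not the hash, is the only stable identifier.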