Thanks for the DirectOutputCommitter example.
However, I found that it only works for saveAsHadoopFile. What about
saveAsParquetFile?
It looks like Spark SQL is using ParquetOutputCommitter, which is a
subclass of FileOutputCommitter.
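For saveAsHadoopFile (the old mapred API), the committer can usually be swapped in through configuration rather than code. A minimal sketch, assuming a DirectOutputCommitter implementation is on the classpath (the class name here is illustrative, not a real Spark class):

```properties
# spark-defaults.conf (sketch): spark.hadoop.* keys are copied into the
# Hadoop Configuration, and the old mapred API reads the committer from
# mapred.output.committer.class (defaulting to FileOutputCommitter).
spark.hadoop.mapred.output.committer.class  com.example.DirectOutputCommitter
```

This only affects code paths that go through the mapred JobConf; as noted above, the Parquet path in Spark SQL picks ParquetOutputCommitter itself.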
On Fri, Feb 27, 2015 at 1:52 AM, Thomas Demoor
Yes - only new or internal APIs. I doubt we'd break any exposed APIs for
the purpose of cleanup.
Patrick
On Mar 5, 2015 12:16 AM, Mridul Muralidharan mri...@gmail.com wrote:
While I don't have any strong opinions about how we handle enums
either way in Spark, I assume the discussion is targeted at (new) APIs
being designed in Spark.
Rewiring what we already have exposed will lead to incompatible API
changes (StorageLevel, for example, is in 1.0).
Regards,
Mridul
You can give Spark-Avro a try. It works great for our project.
https://github.com/databricks/spark-avro
From: deepuj...@gmail.com
Date: Thu, 5 Mar 2015 10:27:04 +0530
Subject: Fwd: Unable to Read/Write Avro RDD on cluster.
To: dev@spark.apache.org
I am trying to read an Avro RDD, transform
I have a strong dislike for Java enums due to the fact that they
are not stable across JVMs - if one undergoes serde, you can end up with
unpredictable results at times [1].
That is one of the reasons why we prevent enums from being keys, though it
is quite possible users depend on it internally.
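The point about enums and serde can be made concrete. A minimal sketch in plain Java (names are illustrative): an enum does not override Object.hashCode(), so its hash is the identity hash of the per-JVM singleton. Serialization round-trips by name, and within one JVM deserialization resolves back to the same singleton, but a different JVM process mints its own singleton with its own identity hash, which is why hash-partitioning on an enum key is not stable across JVMs (e.g. across executors).

```java
import java.io.*;

enum Color { RED, GREEN }

public class EnumHashDemo {
    public static void main(String[] args) throws Exception {
        Color c = Color.RED;

        // Enum does not override Object.hashCode(), so the hash is the
        // identity hash of the singleton instance -- a value that can
        // differ from one JVM process to the next.
        System.out.println(c.hashCode() == System.identityHashCode(c)); // true

        // Java serialization writes only the enum's name; within this JVM,
        // deserialization resolves back to the same singleton.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);
        out.writeObject(c);
        out.flush();
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
        Color back = (Color) in.readObject();
        System.out.println(back == c); // true -- same singleton, same hash here

        // A *different* JVM deserializing the same bytes gets its own
        // singleton with its own identity hash, so hashing an enum key
        // gives inconsistent partitions across processes.
    }
}
```

Hashing something stable instead, such as the enum's name(), avoids the cross-JVM inconsistency (with the usual caveats about renaming constants).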