Dear Spark community,
Does the PySpark saveAsSequenceFile(folder) method have the ability to
append the new sequence file to another one, or to overwrite an existing
sequence file? If the folder already exists then I get an error message...
Thank You!
Csaba
:)
Cheers,
Holden :)
On Tuesday, October 28, 2014, Csaba Ragany rag...@gmail.com wrote:
Dear Spark Community,
Is it possible to convert text files (.log or .txt files) into
sequencefiles in Python?
Using PySpark I can create a parallelized collection (an RDD) with
rdd=sc.parallelize([('key1', 1.0)]) and I can save it as a sequence file
with rdd.saveAsSequenceFile(). But how can I put the whole