pySpark saveAsSequenceFile append overwrite

2014-12-02 Thread Csaba Ragany
Dear Spark community, Does the pySpark saveAsSequenceFile(folder) method have the ability to append the new sequencefile to another one, or to overwrite an existing sequencefile? If the folder already exists, I get an error message... Thank You! Csaba
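A possible workaround, offered only as a sketch: the error most likely comes from the underlying Hadoop output format, which refuses to write into a directory that already exists. Overwrite can be emulated by deleting the folder first, and append by writing each batch into its own subfolder. The helper names `save_overwrite` and `save_append` and the `batch-<id>` naming scheme are made up for illustration, and the deletion step assumes the folder is on the local filesystem rather than HDFS.

```python
import os
import shutil
import uuid


def save_overwrite(rdd, folder):
    """Delete an existing local output folder, then save the RDD into it."""
    # Assumption: `folder` is a local path. An HDFS path would need the
    # Hadoop FileSystem API or `hadoop fs -rm -r` instead of shutil.
    if os.path.isdir(folder):
        shutil.rmtree(folder)
    rdd.saveAsSequenceFile(folder)


def save_append(rdd, base_folder):
    """Emulate append by writing each batch into a fresh subfolder."""
    # Hypothetical naming scheme: one uniquely named subfolder per call,
    # so no call ever targets a directory that already exists.
    target = os.path.join(base_folder, "batch-" + uuid.uuid4().hex)
    rdd.saveAsSequenceFile(target)
    return target
```

The batches written by `save_append` could then probably be read back together with a glob such as `sc.sequenceFile(base_folder + "/batch-*")`, assuming the input path accepts glob patterns in your setup.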

Re: pySpark - convert log/txt files into sequenceFile

2014-10-29 Thread Csaba Ragany
:) Cheers, Holden :) On Tuesday, October 28, 2014, Csaba Ragany rag...@gmail.com wrote: Dear Spark Community, Is it possible to convert text files (.log or .txt files) into sequencefiles in Python? Using PySpark I can create a parallelized file with rdd=sc.parallelize([('key1', 1.0

pySpark - convert log/txt files into sequenceFile

2014-10-28 Thread Csaba Ragany
Dear Spark Community, Is it possible to convert text files (.log or .txt files) into sequencefiles in Python? Using PySpark I can create a parallelized file with rdd=sc.parallelize([('key1', 1.0)]) and I can save it as a sequencefile with rdd.saveAsSequenceFile(). But how can I put the whole
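One way the conversion asked about here might be done, as a minimal sketch rather than the answer given in the thread: read the text files into an RDD of key/value pairs and save that RDD with saveAsSequenceFile(). The function names, the choice of keys (file path or line index), and the paths in the usage comment are assumptions made for illustration.

```python
def text_to_sequence_file(sc, input_path, output_path):
    """Save each input file as one (file path, file content) record."""
    # wholeTextFiles yields one (path, full content) pair per input file.
    pairs = sc.wholeTextFiles(input_path)
    pairs.saveAsSequenceFile(output_path)


def lines_to_sequence_file(sc, input_path, output_path):
    """Alternative: save one record per line, keyed by a running index."""
    lines = sc.textFile(input_path)
    # zipWithIndex gives (line, index); swap to (index, line) so the
    # index becomes the key and the line becomes the value.
    indexed = lines.zipWithIndex().map(lambda pair: (pair[1], pair[0]))
    indexed.saveAsSequenceFile(output_path)


# Usage with a real SparkContext (hypothetical paths):
#   from pyspark import SparkContext
#   sc = SparkContext(appName="txt-to-seq")
#   text_to_sequence_file(sc, "logs/*.log", "logs_seq")
```

The first variant suits many small files, since each file becomes a single record; the second keeps individual log lines addressable but loses the file name unless it is added to the key.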