Before calling saveAsParquetFile(), you can call coalesce(N). You will then get N files, and the rows keep the order they had before (with repartition() they will not).
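To show why the two calls behave differently, here is a toy model in plain Python. It is not Spark code and not Spark's actual implementation: the function names just mirror the Spark methods, the partitions are plain lists, and the round-robin dealing is my simplified stand-in for a shuffle. It also assumes coalesce merges neighbouring partitions, which is what I would expect when there is no locality preference to honour.

```python
def coalesce(partitions, n):
    """Toy model: merge adjacent partitions into at most n groups.

    Rows never move between groups, so reading the result partition
    by partition gives the same order as the input.
    """
    count = len(partitions)
    n = min(n, count)
    merged = []
    for i in range(n):
        # Each output partition takes a contiguous range of input partitions.
        start = i * count // n
        end = (i + 1) * count // n
        merged.append([row for part in partitions[start:end] for row in part])
    return merged


def repartition(partitions, n):
    """Toy model: deal rows round-robin into n partitions.

    This is a simplified stand-in for a shuffle; neighbouring rows end up
    in different partitions, so the original order is lost.
    """
    out = [[] for _ in range(n)]
    for part in partitions:
        for i, row in enumerate(part):
            out[i % n].append(row)
    return out


def flatten(partitions):
    """Read partitions back in partition order, as the output files would be read."""
    return [row for part in partitions for row in part]


# Twelve rows in chronological order, spread over four partitions of three rows.
rows = list(range(12))
parts = [rows[i:i + 3] for i in range(0, 12, 3)]

coalesced = coalesce(parts, 2)
shuffled = repartition(parts, 2)

print(flatten(coalesced) == rows)  # order preserved
print(flatten(shuffled) == rows)   # order not preserved
```

In the real job the equivalent step would be calling coalesce(N) on the data right before saveAsParquetFile(), with N chosen to give the file size you want.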
On Mon, Nov 3, 2014 at 1:16 AM, ag007 <agre...@mac.com> wrote:
> Thanks Akhil,
>
> Am I right in saying that the repartition will spread the data randomly so I
> lose chronological order?
>
> I really just want the csv --> parquet format in the same order it came in.
> If I set repartition with 1 will this not be random?
>
> cheers,
> Ag
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Parquet-files-are-only-6-20MB-in-size-tp17935p17941.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org