It is just a comma-separated file, about 10 columns wide, to which we append a unique id and a few additional values.
On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> jamborta:
> Please also describe the format of your csv files.
>
> Cheers
>
> On Fri, Mar 27, 2015 at 6:42 AM, DW @ Gmail <deanwamp...@gmail.com> wrote:
>
>> Show us the code. This shouldn't happen for the simple process you
>> described.
>>
>> Sent from my rotary phone.
>>
>> > On Mar 27, 2015, at 5:47 AM, jamborta <jambo...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > We have a workflow that pulls in data from csv files. The original
>> > setup of the workflow was to parse the data as it comes in (turn it
>> > into an array), then store it. This resulted in out-of-memory errors
>> > with larger files (as a result of increased GC?).
>> >
>> > It turns out that if the data gets stored as a string first, then
>> > parsed, the issue does not occur.
>> >
>> > Why is that?
>> >
>> > Thanks,
>> >
>> > --
>> > View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-tp22255.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
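For what it's worth, a rough sketch of the memory effect being asked about (in Python for illustration; the job itself runs on the JVM under Spark, but the same per-object overhead argument applies there): splitting each incoming CSV line into an array of fields creates one small object per field, so the parsed form carries an object header for every column plus the container itself, while keeping the raw line as a single string carries that overhead only once. The sample line below is made up; it just mimics the ~10-column format described above.

```python
import sys

# One ~10-column CSV line (hypothetical values, matching the format described).
line = "id123,2015-03-27,foo,bar,1.5,2.5,3.5,baz,qux,42"

# Approach 1: store the raw line as a single string and parse it later.
as_string = sys.getsizeof(line)

# Approach 2: parse immediately into an array of field strings.
fields = line.split(",")
as_array = sys.getsizeof(fields) + sum(sys.getsizeof(f) for f in fields)

# The parsed representation is several times larger: one header per field
# plus the list, versus one header for the whole line. At streaming rates,
# that multiplies allocation count and GC pressure accordingly.
print(as_string, as_array)
```

This doesn't explain everything (JVM string interning, Spark's serialization, and batch sizing all matter too), but it shows why many small objects per record can blow up memory where one string per record does not.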