It is just a comma-separated file, about 10 columns wide, to which we append
a unique id and a few additional values.
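
In case it helps make that concrete, here is a rough sketch of the per-line
handling in Scala (the field values and the extra columns are placeholders;
this is not the actual job):

import java.util.UUID

// Rough sketch: split a ~10-column CSV line, then append a unique id and a
// couple of extra values before storing. The extras here are made up.
object CsvLineSketch {
  def enrich(line: String): String = {
    val fields = line.split(",", -1)            // keep empty trailing columns
    val id     = UUID.randomUUID().toString     // the appended unique id
    val extras = Seq("sourceA", System.currentTimeMillis.toString)
    ((fields :+ id) ++ extras).mkString(",")
  }

  def main(args: Array[String]): Unit =
    println(enrich("a,b,c,d,e,f,g,h,i,j"))
}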

On Fri, Mar 27, 2015 at 2:43 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> jamborta :
> Please also describe the format of your csv files.
>
> Cheers
>
> On Fri, Mar 27, 2015 at 6:42 AM, DW @ Gmail <deanwamp...@gmail.com> wrote:
>
>> Show us the code. This shouldn't happen for the simple process you
>> described.
>>
>> Sent from my rotary phone.
>>
>>
>> > On Mar 27, 2015, at 5:47 AM, jamborta <jambo...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > We have a workflow that pulls in data from csv files. The original setup
>> > of the workflow was to parse the data as it came in (turn each line into
>> > an array), then store it. This resulted in out-of-memory errors with
>> > larger files (as a result of increased GC?).
>> >
>> > It turns out that if the data gets stored as a string first and parsed
>> > later, the issue does not occur.
>> >
>> > Why is that?
>> >
>> > Thanks,
>> >
>>
>>
>
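
For reference, a minimal sketch of the two ingestion variants described in the
quoted question above, assuming Spark Streaming reading text files from a
directory. The paths, batch interval, and output formats are assumptions made
only to contrast the two code paths, not the actual workflow:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ParseVariantsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("csv-parse-variants")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Hypothetical directory of incoming CSV files.
    val lines = ssc.textFileStream("hdfs:///ingest/csv")

    // Variant 1 (the original setup): parse each line into an Array[String]
    // as it arrives, then store the parsed records.
    val parsed = lines.map(_.split(",", -1))
    parsed.saveAsObjectFiles("hdfs:///staging/parsed")

    // Variant 2 (the workaround): store each record as a single raw String
    // first and parse it in a later step.
    lines.saveAsTextFiles("hdfs:///staging/raw")

    ssc.start()
    ssc.awaitTermination()
  }
}

One plausible difference between the two is object count: splitting every line
up front turns each record into an array plus roughly ten small Strings instead
of one String, which raises per-batch allocation and GC pressure, though it is
hard to say more without seeing the real code.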
