Re: Loading CSV Files & LOAD large files behavior in local mode

Defenestrator Thu, 19 Aug 2010 23:42:50 -0700

Thanks, Jeff.

A quick follow-up question about loading and storing data: what is the
best practice when dealing with multiple relations with many tuples? Do
people typically STORE intermediate relations to minimize memory usage and
then LOAD the intermediate data again later in the same script? I ask because
I noticed that tuples are written out using the TupleFormat, which wraps the
text in additional parentheses, so a subsequent PigStorage LOAD would pick up
extra parenthesis characters, right?


On Thu, Aug 19, 2010 at 1:50 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> I am afraid you will have to write your own LoadFunc to interpret the text.
> Since Pig 0.7, local mode uses Hadoop's standalone local mode, so it
> won't store all the data in memory; the data will be read as a stream.
> This mode does need more memory, though, because each task is
> executed in another JVM.
>
>
> On Thu, Aug 19, 2010 at 12:48 AM, Defenestrator
> <defenestration...@gmail.com> wrote:
> > What loader should I use on csv files with quoted strings that contain
> > embedded commas?  (i.e. Embedded commas should not be a separator.)
> >
> > And when LOADing large files in local mode, does Pig just store it all
> > in memory?  Or does it have memory management à la the buffer manager
> > in a DBMS?
> >
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
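
On the embedded-comma question quoted above: the rule a custom loader has to
apply is that a separator inside a quoted string is data, not a field
boundary. Below is a minimal sketch of that rule in Python, using the standard
csv module. It is not a Pig LoadFunc, and the helper name split_csv_line and
the sample line are made up for illustration; it only shows the parsing
behavior such a loader would need to reproduce.

```python
import csv
import io


def split_csv_line(line, delimiter=",", quotechar='"'):
    """Split one CSV line, treating delimiters inside quotes as data.

    Hypothetical helper for illustration only; it is not part of Pig.
    """
    reader = csv.reader(io.StringIO(line), delimiter=delimiter, quotechar=quotechar)
    # The reader yields one list of fields per record; we passed a single line.
    return next(reader)


if __name__ == "__main__":
    sample = '1,"Smith, John",42'
    # A naive split breaks the quoted field apart at the embedded comma.
    print(sample.split(","))
    # The quote-aware split keeps "Smith, John" as a single field.
    print(split_csv_line(sample))
```

The naive split returns four pieces for the sample line, while the quote-aware
version returns three fields with the quotes removed. An empty input line is
not handled here and would raise StopIteration.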
