dump is just pigServer.openIterator(alias). The problem is that you must have
a handle to the alias, so if you are reading from another program, you
would need to do a LOAD and then open an iterator on that alias, which
will probably run a map/reduce job.
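To make that concrete, here is a minimal sketch of the round trip with the PigServer API (org.apache.pig). The output path and alias name are made up for illustration, and running this requires a Pig/Hadoop installation:

```java
import java.util.Iterator;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class DumpFromJava {
    public static void main(String[] args) throws Exception {
        // Connect in map/reduce mode (same mode used for the store() call).
        PigServer pigServer = new PigServer(ExecType.MAPREDUCE);

        // Re-load the previously stored output to get a handle on an alias
        // ('results' and the path are hypothetical).
        pigServer.registerQuery("results = LOAD 'output/path' USING PigStorage();");

        // Iterating over the alias is what dump does under the hood;
        // this typically launches a map/reduce job.
        Iterator<Tuple> it = pigServer.openIterator("results");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```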
ben
Vincent Barat wrote:
Thank you for your concern.
I've implemented this method by scanning all the part-* files found in
the directory. It is far from elegant, but at least it works :)
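For what it's worth, that scan can be done in plain Java with no Hadoop dependency, assuming the output landed on the local filesystem as tab-delimited text (the default PigStorage format). The class and method names below are illustrative, not from Pig:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartFileReader {

    // Read every line from all part-* files in a Pig output directory,
    // in part-number order, as if they were one concatenated file.
    // Non-part files (e.g. marker files) are skipped.
    public static List<String> readPartFiles(Path outputDir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Stream<Path> entries = Files.list(outputDir)) {
            List<Path> parts = entries
                .filter(p -> p.getFileName().toString().startsWith("part-"))
                .sorted(Comparator.comparing(p -> p.getFileName().toString()))
                .collect(Collectors.toList());
            for (Path part : parts) {
                lines.addAll(Files.readAllLines(part));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary directory that mimics a Pig output directory.
        Path dir = Files.createTempDirectory("pig-out");
        Files.write(dir.resolve("part-00000"), List.of("a\t1", "b\t2"));
        Files.write(dir.resolve("part-00001"), List.of("c\t3"));
        for (String line : readPartFiles(dir)) {
            System.out.println(line);
        }
    }
}
```

From there each line can be split on '\t' and fed to a JDBC INSERT into the MySQL table.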
Dump cannot be called from Java AFAIK.
I will definitely have a look at the Zebra contrib.
zaki rahaman wrote:
From my understanding, the part-0000 files correspond to each of the final
reduce tasks in a M/R job (whether you're running it from Pig or directly in
Hadoop). The easiest solution is to just cat the part files in the created
directory as you suggested. I'm not sure if there's some other method in the
API to directly read output. I suppose you could call dump and read it in
that way, but that seems even less elegant. Alternatively, if you're looking
to store into table output, take a look at the zebra contrib, although I
myself am pretty clueless as to the details.
On Wed, Oct 28, 2009 at 12:20 PM, Vincent Barat <[email protected]> wrote:
Hello,
I'm using Pig from Java and I store my results using the regular call:
pigServer.store(pigAlias, outputFilePath);
Now, I need to read the file produced (in order to store it to a MySQL
table).
The problem is that Pig (when used in map/reduce mode) creates a
directory containing a set of part files for each stored "file".
I cannot figure out how to read this output: should I concatenate all the part
files? Is there a Pig API that hides this complexity?
Thanks for your help, as it is a blocking issue for me.
Regards,