Re: Loading multiple files, each file as a record

2015-03-10 Thread Ronald Green
Thanks for your suggestion.

I wouldn't want to run a MapReduce job just to get the file into a
single tuple. Also, I can't be sure the lines within each group come back
in the same order they have in the file.

Thanks

On 10 March 2015 at 06:39, Arvind S arvind18...@gmail.com wrote:

 while loading file you can attempt to use
 PigStorage(',','-tagFile')
 then regex on each line of the file .. then group by file name


 https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/builtin/PigStorage.html

 *Cheers !!*
 Arvind

 On Fri, Mar 6, 2015 at 2:26 AM, Daniel Dai da...@hortonworks.com wrote:

  Didn't realize any, but it should be pretty easy to write a customized
  Loader/InputFormat for that.
 
  Daniel
 
  On 3/5/15, 6:18 AM, Ronald Green green.ron...@gmail.com wrote:
 
  Hi,
  
  I'm looking for a loader function that will let me read each file as a
  record on its own so I'll be able to treat each as a single record/field.
  For example:
  
  a = load '/files' USING TheLoader() as (file:chararray);
  b = foreach a GENERATE REGEX_EXTRACT(file,'...');
  
  PigStorage and TextLoader return each line in the file as a record/tuple.
  
  Do you know any other loader that allows to get an entire file as a
  record?
  
  Thanks,
  Ron
 
 



Re: Loading multiple files, each file as a record

2015-03-09 Thread Arvind S
while loading file you can attempt to use
PigStorage(',','-tagFile')
then regex on each line of the file .. then group by file name

https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/builtin/PigStorage.html
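
A minimal sketch of the idea, in case it helps. The alias names, the
'\u0001' delimiter (chosen on the assumption that it never occurs in the
data, so each line stays in one field) and the regex pattern are all
hypothetical placeholders, not something from your setup:

```pig
-- '-tagFile' prepends the source file name as the first field of every record.
-- Assumption: '\u0001' does not appear in the files, so each line is one field.
lines = LOAD '/files' USING PigStorage('\u0001', '-tagFile')
        AS (filename:chararray, line:chararray);

-- Hypothetical pattern; REGEX_EXTRACT returns the capture group at the given index.
matched = FOREACH lines GENERATE filename,
          REGEX_EXTRACT(line, 'ERROR: (.*)', 1) AS hit;

-- One output record per file, holding a bag of that file's extracted values.
by_file = GROUP matched BY filename;
result  = FOREACH by_file GENERATE group AS filename, matched.hit AS hits;
```

Note that bags in Pig are unordered, so this gives one record per file but
does not by itself preserve the original line order inside each group.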

*Cheers !!*
Arvind

On Fri, Mar 6, 2015 at 2:26 AM, Daniel Dai da...@hortonworks.com wrote:

 Didn't realize any, but it should be pretty easy to write a customized
 Loader/InputFormat for that.

 Daniel

 On 3/5/15, 6:18 AM, Ronald Green green.ron...@gmail.com wrote:

 Hi,
 
 I'm looking for a loader function that will let me read each file as a
 record on its own so I'll be able to treat each as a single record/field.
 For example:
 
 a = load '/files' USING TheLoader() as (file:chararray);
 b = foreach a GENERATE REGEX_EXTRACT(file,'...');
 
 PigStorage and TextLoader return each line in the file as a record/tuple.
 
 Do you know any other loader that allows to get an entire file as a
 record?
 
 Thanks,
 Ron