Hi,

Is there any way in Pig for a LoadFunc to retrieve the schema definition
that the user entered in the AS clause?

e.g. A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int);

 

My question comes from the following problem:

 

I'm writing a loader that adds partition fields to the schema, e.g.
daydate, day, year, month, etc.

These partitions are used to filter out entire folders in the storage
location. 

I want to use the FILTER statement to filter by these keys. 
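
To illustrate what I mean by filtering out entire folders, here is a minimal sketch in plain Java. It uses no Pig classes and is not my actual loader code. The Hive-style "key=value" folder names, the /logs paths, and the names PartitionPruneSketch, partitionValue and prune are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; no Pig classes are used here.
// Assumes a hypothetical layout where each folder name is "key=value".
class PartitionPruneSketch {

    // Returns the value of partition `key` encoded in `path` as a
    // "key=value" directory segment, or null if there is no such segment.
    static String partitionValue(String path, String key) {
        String prefix = key + "=";
        for (String segment : path.split("/")) {
            if (segment.startsWith(prefix)) {
                return segment.substring(prefix.length());
            }
        }
        return null;
    }

    // Keeps only the folders whose partition `key` equals `wanted`,
    // so the other folders are never opened or read.
    static List<String> prune(List<String> folders, String key, String wanted) {
        List<String> kept = new ArrayList<>();
        for (String folder : folders) {
            if (wanted.equals(partitionValue(folder, key))) {
                kept.add(folder);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList(
            "/logs/daydate=2010-10-31",
            "/logs/daydate=2010-11-01",
            "/logs/daydate=2010-11-02");
        // prints [/logs/daydate=2010-11-01]
        System.out.println(prune(folders, "daydate", "2010-11-01"));
    }
}
```

In the real loader, the value to match would come from the filter expression that Pig pushes down, rather than from a hard-coded string.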

 

If I create a loader that returns its own schema, the following works and
Pig calls LoadMetadata.setPartitionFilter correctly.

e.g.

-- the loader parses this string and also adds the partition folder daydate
A = LOAD '$INPUT' USING MyLoader('a:int, b:int');

F = FILTER A BY daydate == '2010-11-01';

STORE F INTO '$OUTPUT';

 

 

But if the loader does not return a schema and the schema is instead defined
by the user in the AS clause, Pig never calls LoadMetadata.setPartitionFilter
at all, and the partition filtering never happens.

e.g.

A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int, daydate:chararray);

F = FILTER A BY daydate == '2010-11-01';

STORE F INTO '$OUTPUT';

 

Any suggestions?

 

Thanks,

 Gerrit
