Hi,
Is there any way in Pig for a LoadFunc to retrieve the schema definition
entered by the user in the AS clause?
e.g. A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int);
My question comes from the following problem:
I'm writing a loader that adds partition fields to the schema, e.g.
daydate, day, year, month, etc.
These partitions are used to filter out entire folders in the storage
location.
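To make the folder filtering concrete, here is a minimal sketch in plain Java (no Pig classes). It assumes the folders are named key=value, e.g. daydate=2010-11-01; that layout, the class name and the prune helper are all hypothetical and only for illustration, not the actual loader code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionPruneSketch {

    // Hypothetical helper: keep only the folders that contain a path
    // segment equal to "<key>=<value>" (assumed key=value folder layout).
    public static List<String> prune(List<String> folders, String key, String value) {
        String wanted = key + "=" + value;
        List<String> kept = new ArrayList<>();
        for (String folder : folders) {
            // Check every path segment so nested partition folders also match.
            for (String segment : folder.split("/")) {
                if (segment.equals(wanted)) {
                    kept.add(folder);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList(
                "/data/daydate=2010-10-31",
                "/data/daydate=2010-11-01",
                "/data/daydate=2010-11-02");
        // Only the folder for the requested day is left to read.
        System.out.println(prune(folders, "daydate", "2010-11-01"));
    }
}
```

The point is that the non-matching folders are dropped before any records are read, which is why the filter value has to reach the loader in the first place.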
I want to use the FILTER statement to filter by these keys.
If I create a loader that returns its own schema, the following works and
Pig calls LoadMetadata.setPartitionFilter correctly.
e.g.
-- the loader parses this schema string and also adds the partition field daydate
A = LOAD '$INPUT' USING MyLoader('a:int, b:int');
F = FILTER A BY daydate == '2010-11-01';
STORE F INTO '$OUTPUT';
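As a rough sketch of what "parse this and also add the partition field" means, the snippet below is plain Java with no Pig dependency. The class name, the parse helper and the choice of chararray for the partition keys are assumptions for illustration; a real loader would turn the result into the ResourceSchema it returns from LoadMetadata.getSchema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaStringSketch {

    // Hypothetical helper: parse "name:type, name:type" and append the
    // partition keys as extra chararray fields, preserving field order.
    public static Map<String, String> parse(String spec, String... partitionKeys) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : spec.split(",")) {
            String[] nameAndType = part.trim().split(":");
            if (nameAndType.length != 2) {
                throw new IllegalArgumentException("Bad field definition: " + part);
            }
            fields.put(nameAndType[0].trim(), nameAndType[1].trim());
        }
        for (String key : partitionKeys) {
            // Assumption: partition values are exposed as strings.
            fields.put(key, "chararray");
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {a=int, b=int, daydate=chararray}
        System.out.println(parse("a:int, b:int", "daydate"));
    }
}
```

With this approach the loader knows the full schema itself, including daydate, so it can also report daydate as a partition key.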
But if the loader does not return a schema and the schema is instead defined
by the user in the AS clause, Pig never calls LoadMetadata.setPartitionFilter
at all, and the partition filtering never happens.
e.g.
A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int, daydate:chararray);
F = FILTER A BY daydate == '2010-11-01';
STORE F INTO '$OUTPUT';
Any suggestions?
Thanks,
Gerrit