To answer your direct question: no, there is currently no provision in
the interface for Pig to pass the user-defined schema to the load
function.
But it seems the real solution to your problem is that
LoadMetadata.setPartitionFilter ought to be called regardless of
whether the loader returns a schema. Is there a technical reason we
don't do that?
Alan.
On Nov 5, 2010, at 8:13 AM, Gerrit Jansen van Vuuren wrote:
Hi,
Is there any way in Pig for a LoadFunc to retrieve the schema
definition entered by the user in the AS clause?
For example:
A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int);
My question comes from the following problem.
I'm writing a loader that adds partition fields to the schema, e.g.
daydate, day, year, month, etc.
These partitions are used to filter out entire folders in the storage
location, and I want to use the FILTER statement to filter by these keys.
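To make the folder pruning concrete, here is a minimal plain-Java sketch that does not use any Pig classes. It assumes, hypothetically, a layout where each partition folder is named daydate=YYYY-MM-DD; the class PartitionPruneSketch, the method prune, and the paths are illustrative only. A real loader would apply this kind of check to the filter expression it receives through LoadMetadata.setPartitionFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionPruneSketch {
    // Hypothetical layout: the last path segment of each partition folder
    // is "key=value", e.g. "daydate=2010-11-01".
    // Keeps only the folders whose partition value equals the requested one,
    // so the remaining folders are never read at all.
    public static List<String> prune(List<String> folders, String key, String value) {
        List<String> kept = new ArrayList<>();
        String wanted = key + "=" + value;
        for (String folder : folders) {
            // lastIndexOf returns -1 when there is no '/', so substring(0) keeps the whole name
            String last = folder.substring(folder.lastIndexOf('/') + 1);
            if (last.equals(wanted)) {
                kept.add(folder);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList(
            "/data/logs/daydate=2010-10-31",
            "/data/logs/daydate=2010-11-01",
            "/data/logs/daydate=2010-11-02");
        System.out.println(prune(folders, "daydate", "2010-11-01"));
    }
}
```

Run as is, it prints [/data/logs/daydate=2010-11-01]; only that folder would need to be read.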
If I create a loader that returns its own schema, the following works,
and Pig correctly calls LoadMetadata.setPartitionFilter.
For example:
-- the loader parses 'a:int, b:int' and also adds the partition field daydate
A = LOAD '$INPUT' USING MyLoader('a:int, b:int');
F = FILTER A BY daydate == '2010-11-01';
STORE F INTO '$OUTPUT';
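The schema handling in the comment above (parse 'a:int, b:int', then append daydate) can be sketched in plain Java as follows. This is a hypothetical illustration, not MyLoader's actual code: the class SchemaSketch and method parseWithPartitions are made up, fields are kept as simple "name:type" strings, and partition keys are assumed to be chararray (matching the AS clause in the second example). A real loader would build a Pig ResourceSchema and return it from LoadMetadata.getSchema.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaSketch {
    // Hypothetical helper: parses a schema string such as "a:int, b:int"
    // into "name:type" entries, then appends each partition key as chararray.
    public static List<String> parseWithPartitions(String schema, String... partitionKeys) {
        List<String> fields = new ArrayList<>();
        for (String part : schema.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing commas
            }
            String[] nameType = trimmed.split(":");
            if (nameType.length != 2) {
                throw new IllegalArgumentException("Bad field: " + trimmed);
            }
            fields.add(nameType[0].trim() + ":" + nameType[1].trim());
        }
        for (String key : partitionKeys) {
            fields.add(key + ":chararray");
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseWithPartitions("a:int, b:int", "daydate"));
    }
}
```

Run as is, it prints [a:int, b:int, daydate:chararray], the same field list the user would otherwise have to spell out in the AS clause.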
But if the loader does not return a schema and the schema is instead
defined by the user in the AS clause, Pig never calls
LoadMetadata.setPartitionFilter, so no partition filtering happens.
For example:
A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int, daydate:chararray);
F = FILTER A BY daydate == '2010-11-01';
STORE F INTO '$OUTPUT';
Any suggestions?
Thanks,
Gerrit