To answer your direct question: no, there is currently no provision in the interface for Pig to pass the user-defined schema to the load function.

But it seems like the real solution to your problem is that LoadMetadata.setPartitionFilter ought to be called regardless of whether the loader returns a schema. Is there a technical reason we don't do that?

Alan.

On Nov 5, 2010, at 8:13 AM, Gerrit Jansen van Vuuren wrote:

Hi,

Is there any way in Pig for a LoadFunc to retrieve the schema definition
entered by the user in the AS clause?

e.g. A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int);

My question comes from the following problem I'm facing:

So I'm writing a Loader that adds partition fields to the Schema, e.g.
daydate, day, year, month, etc.

These partitions are used to filter out entire folders in the storage
location.
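
As a rough illustration of the folder pruning described above, here is a plain Java sketch with no Pig dependency. It is not the actual loader: the key=value folder layout, the sample paths, and the names PartitionPruneSketch and prune are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPruneSketch {

    // Keeps only the folders whose path contains the segment "<key>=<value>".
    // The key=value folder naming is an assumption of this sketch.
    public static List<String> prune(List<String> folders, String key, String value) {
        String wanted = key + "=" + value;
        List<String> kept = new ArrayList<>();
        for (String folder : folders) {
            for (String segment : folder.split("/")) {
                if (segment.equals(wanted)) {
                    kept.add(folder);
                    break; // one matching segment is enough for this folder
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical storage layout with one folder per day.
        List<String> folders = List.of(
            "/data/daydate=2010-10-31",
            "/data/daydate=2010-11-01",
            "/data/daydate=2010-11-02");
        // Only the folder for 2010-11-01 is kept; the others are never read.
        System.out.println(prune(folders, "daydate", "2010-11-01"));
    }
}
```

In a real loader, the value to match would presumably come from the filter expression that Pig hands to the loader, rather than from a hard-coded string as it does here.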

I want to use the FILTER statement to filter by these keys.

So if I create a Loader that returns its own Schema, the following works and the LoadMetadata.setPartitionFilter method gets called correctly by Pig.

e.g.

-- the loader will parse this argument and also add the partition field daydate
A = LOAD '$INPUT' USING MyLoader('a:int, b:int');
F = FILTER A BY daydate == '2010-11-01';
STORE F INTO '$OUTPUT';

But if the Loader does not return a Schema and the Schema is defined by the user in the AS clause, Pig never calls LoadMetadata.setPartitionFilter at all
and the partition filtering never happens.

e.g.

A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int, daydate:chararray);
F = FILTER A BY daydate == '2010-11-01';
STORE F INTO '$OUTPUT';

Any suggestions?

Thanks,
Gerrit

