Hi, I think Pig should call setPartitionFilter only when LoadMetadata:getPartitionKeys returns a non-null value. Currently, getPartitionKeys is only called if the Loader returns a schema.
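
A rough, self-contained Java sketch of that control flow follows. It does not use the real Pig classes: `Loader`, `RecordingLoader`, `pushFilterCurrent` and `pushFilterProposed` are hypothetical stand-ins invented for illustration, and the real LoadMetadata methods have different signatures. It only shows the difference between gating on the schema and gating on the partition keys.

```java
public class PartitionFilterSketch {

    /** Simplified, hypothetical stand-in for the loader methods discussed here. */
    public interface Loader {
        String getSchema();            // null when the schema only comes from the AS clause
        String[] getPartitionKeys();   // null when the loader exposes no partition keys
        void setPartitionFilter(String filter);
    }

    /** Hypothetical loader that simply records the filter it is given. */
    public static class RecordingLoader implements Loader {
        private final String schema;
        private final String[] partitionKeys;
        public String receivedFilter;  // stays null if no filter was ever pushed

        public RecordingLoader(String schema, String[] partitionKeys) {
            this.schema = schema;
            this.partitionKeys = partitionKeys;
        }

        @Override public String getSchema() { return schema; }
        @Override public String[] getPartitionKeys() { return partitionKeys; }
        @Override public void setPartitionFilter(String filter) { receivedFilter = filter; }
    }

    /** Behaviour as described in this thread: nothing happens without a loader schema. */
    public static boolean pushFilterCurrent(Loader loader, String filter) {
        if (loader.getSchema() == null) {
            return false;              // getPartitionKeys is never consulted
        }
        return pushFilterProposed(loader, filter);
    }

    /** Proposed behaviour: gate only on getPartitionKeys returning non-null. */
    public static boolean pushFilterProposed(Loader loader, String filter) {
        if (loader.getPartitionKeys() == null) {
            return false;              // loader has no partitions, nothing to push
        }
        loader.setPartitionFilter(filter);
        return true;
    }

    public static void main(String[] args) {
        // A loader with partition keys but no schema of its own (the AS clause case).
        RecordingLoader loader = new RecordingLoader(null, new String[] {"daydate"});
        System.out.println(pushFilterCurrent(loader, "daydate == '2010-11-01'"));
        System.out.println(pushFilterProposed(loader, "daydate == '2010-11-01'"));
    }
}
```

With a loader that returns partition keys but no schema, the first call prints `false` and the second prints `true`, which is the AS clause case from the original question.
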
Should I create a Jira and try proposing a fix for this?

Cheers,
 Gerrit

-----Original Message-----
From: Alan Gates [mailto:[email protected]]
Sent: Wednesday, November 10, 2010 9:56 PM
To: [email protected]
Subject: Re: pig LoadMetaData find schema in AS clause from Loader.

To answer your direct question, no, there is currently no provision in the interface for Pig to provide the user defined schema to the load function. But it seems like the real solution to your problem is that LoadMetaData:setPartitionFilter ought to be called regardless of whether the loader returns a schema. Is there a technical reason we don't do that?

Alan.

On Nov 5, 2010, at 8:13 AM, Gerrit Jansen van Vuuren wrote:

> Hi,
>
> Is there any way in Pig where a LoadFunc can retrieve the Schema
> definition entered by the user in the AS clause?
>
> e.g. A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int);
>
> My question comes from the below problem I'm facing:
>
> So I'm writing a Loader that adds partition fields to the Schema, e.g.
> daydate, day, year, month, etc.
>
> These partitions are used to filter out entire folders in the storage
> location.
>
> I want to use the FILTER statement to filter by these keys.
>
> So if I create a Loader that returns its own Schema, the following
> works and the LoadMetaData:setPartitionFilter method gets called
> correctly by Pig.
>
> e.g.
>
> A = LOAD '$INPUT' USING MyLoader('a:int, b:int'); -- the loader will parse this and also add the partition folder daydate
>
> F = FILTER A BY daydate == '2010-11-01';
>
> STORE F INTO '$OUTPUT';
>
> But if the Loader does not return a Schema and the Schema is defined
> by the user in the AS clause, Pig never calls
> LoadMetaData:setPartitionFilter at all and the partition filtering
> never happens.
>
> e.g.
>
> A = LOAD '$INPUT' AS (a:int, b:int, daydate:chararray);
>
> F = FILTER A BY daydate == '2010-11-01';
>
> STORE F INTO '$OUTPUT';
>
> Any suggestions?
>
> Thanks,
>
> Gerrit
