Hi,

 

I've created a Jira for this issue:
https://issues.apache.org/jira/browse/PIG-1717

 

In my case, getPartitionKeys always returns the keys that are defined as
partitions in the path, so if the user loads from
/log/type1/daydate=2010-11-01, the partition key returned is always
"daydate".
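
To make the behaviour described above concrete, here is a minimal Java sketch of how a loader might derive partition key names from key=value path segments. The class and method names (PathPartitionKeys, fromPath) are hypothetical and not part of Pig; a real loader would presumably return these names from its own getPartitionKeys implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: collects partition key names from
// "key=value" directory segments in a load path.
class PathPartitionKeys {

    static List<String> fromPath(String path) {
        List<String> keys = new ArrayList<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // Only segments shaped like key=value (with a non-empty key)
            // are treated as partitions.
            if (eq > 0) {
                keys.add(segment.substring(0, eq));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [daydate]
        System.out.println(fromPath("/log/type1/daydate=2010-11-01"));
    }
}
```

A path with no key=value segments, such as /log/type1, yields an empty list under this sketch.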

 

Currently the following code does not cause the loader to be notified of the
partition filter:

A = load 'input' using MyLoader() as (q, p, daydate);

F = FILTER A BY daydate == '2010-11-01';

 

If Pig could somehow call getPartitionKeys and so become aware that daydate
is a partition, everything would work well.

 

 

Cheers,

From: Thejas M Nair [mailto:[email protected]] 
Sent: Thursday, November 11, 2010 3:18 PM
To: [email protected]; Gerrit van Vuuren
Subject: Re: pig LoadMetaData find schema in AS clause from Loader.

 

Yes, setPartitionFilter can be called only if Pig knows the partition
columns. Without knowing the partition columns, the partition filter cannot
be extracted.
If a user specifies a schema in the load statement, Pig finds the partition
columns by locating the columns returned by getPartitionKeys() in the
user-defined schema, based on the mapping from the schema returned by
getSchema() to the user-specified schema. That is, Pig assumes that the
columns returned by getPartitionKeys() are columns in the schema returned by
getSchema().
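
As a rough illustration of the positional mapping described above (a sketch under simplifying assumptions, not Pig's actual implementation), suppose both schemas are plain lists of column names. The class PartitionColumnMapping and its method are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Pig code: maps partition keys to columns of
// the user-specified schema by their position in the loader's schema.
class PartitionColumnMapping {

    static List<String> userPartitionColumns(List<String> loaderSchema,
                                             List<String> userSchema,
                                             List<String> partitionKeys) {
        List<String> result = new ArrayList<>();
        for (String key : partitionKeys) {
            // Position of the partition key in the schema the loader reports.
            int pos = loaderSchema.indexOf(key);
            // The user-schema column at the same position is the partition column.
            if (pos >= 0 && pos < userSchema.size()) {
                result.add(userSchema.get(pos));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> user = Arrays.asList("q", "p", "daydate");
        List<String> keys = Arrays.asList("daydate");

        // Loader reports a schema: the key is found at position 2. Prints [daydate]
        System.out.println(userPartitionColumns(
                Arrays.asList("a", "b", "daydate"), user, keys));

        // Loader reports no schema: nothing can be mapped. Prints []
        System.out.println(userPartitionColumns(
                Collections.<String>emptyList(), user, keys));
    }
}
```

Under these assumptions, a loader that reports no schema leaves nothing to map the keys against, which matches the case in this thread where no partition filter is extracted.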

In your case, does getPartitionKeys return columns that are specified in the
user-defined schema?

Yes, please open a Jira, and let's discuss it there. I think at least the
javadoc might need to be updated.

-Thejas

On 11/11/10 1:30 AM, "Gerrit Jansen van Vuuren"
<[email protected]> wrote:

Hi,

I guess it should only call setPartitionFilter when
LoadMetadata:getPartitionKeys returns a non-null value. Currently
getPartitionKeys is only called if the Loader returns a schema.

Should I create a Jira and try proposing a fix for this?

Cheers,
 Gerrit


-----Original Message-----
From: Alan Gates [mailto:[email protected]]
Sent: Wednesday, November 10, 2010 9:56 PM
To: [email protected]
Subject: Re: pig LoadMetaData find schema in AS clause from Loader.

To answer your direct question, no, there is currently no provision in 
the interface for Pig to provide the user defined schema to the load 
function.

But it seems like the real solution to your problem is that
LoadMetadata:setPartitionFilter ought to be called regardless of
whether the loader returns a schema.  Is there a technical reason we
don't do that?

Alan.

On Nov 5, 2010, at 8:13 AM, Gerrit Jansen van Vuuren wrote:

> Hi,
>
> Is there any way in Pig for a LoadFunc to retrieve the Schema definition
> entered by the user in the AS clause?
>
> e.g. A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int);
>
>
>
> My question comes from the problem I'm facing, described below:
>
> I'm writing a Loader that adds partition fields to the Schema, e.g.
> daydate, day, year, month, etc.
>
> These partitions are used to filter out entire folders in the storage
> location.
>
> I want to use the FILTER statement to filter by these keys.
>
>
>
> If I create a Loader that returns its own Schema, the following works and
> Pig calls the LoadMetadata:setPartitionFilter method correctly.
>
> e.g.
>
> -- the loader will parse this and also add the partition folder daydate
> A = LOAD '$INPUT' using MyLoader('a:int, b:int');
>
> F = FILTER A BY daydate == '2010-11-01';
>
> STORE F INTO '$OUTPUT';
>
>
>
>
>
> But if the Loader does not return a Schema and the Schema is instead
> defined by the user in the AS clause, Pig never calls
> LoadMetadata:setPartitionFilter at all and the partition filtering never
> happens.
>
> e.g.
>
> A = LOAD '$INPUT' USING MyLoader() AS (a:int, b:int, daydate:chararray);
>
> F = FILTER A BY daydate == '2010-11-01';
>
> STORE F INTO '$OUTPUT';
>
>
>
> Any suggestions?
>
>
>
> Thanks,
>
> Gerrit
>




 
