[ 
https://issues.apache.org/jira/browse/PIG-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-2340:
----------------------------

    Description: 
A table created by HCAT has the following partitions:

hcat -e "show partitions paritionedtable"

{quote}
grid=AB/dt=2011_07_01
grid=AB/dt=2011_07_02
grid=AB/dt=2011_07_03
grid=XY/dt=2011_07_01
grid=XY/dt=2011_07_02
grid=XY/dt=2011_07_03
grid=XY/dt=2011_07_04
...
{quote}

The total number of partitions in the table is around 3200.

A Pig script of this nature tries to access this data using the partitions in 
its filter.

{code}
A = LOAD 'paritionedtable' USING org.apache.hcatalog.pig.HCatLoader();
B = FILTER A BY grid=='AB' AND dt=='2011_07_04';
C = LIMIT B 10;
store C into 'HCAT' using PigStorage();
{code}


This script fails to run because the job.xml generated by Pig is so large (8 MB) 
that the Hadoop Fred limitation does not allow the job to be submitted.

Debugging showed that the getInputTableInfo function in the HCatTableInfo class 
receives a null filter value: getInputTableInfo(filter=null ..)

I suspect that the "setPartitionFilter" function in Pig does not pass the filter 
correctly to HCatLoader. This happens with both Pig 0.9 and 0.8.
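
For illustration only, the suspected effect can be modelled in a few lines of 
Python. This is a toy model, not HCatalog or Pig code: select_partitions, the 
partition dictionaries, and the synthetic dt labels are all assumptions made 
for this sketch. It shows why a null filter would plausibly cause every 
partition (3200 here) to be carried into the job configuration, while a filter 
that reaches the loader narrows the input to a single partition.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for one table partition, e.g. {"grid": "AB", "dt": "day_0003"}.
Partition = Dict[str, str]


def select_partitions(
    partitions: List[Partition],
    partition_filter: Optional[Callable[[Partition], bool]] = None,
) -> List[Partition]:
    """Return the partitions a job would read (toy model, not the HCatalog API)."""
    if partition_filter is None:
        # No filter was passed down: every partition ends up in the job input.
        return list(partitions)
    # A filter was passed down: only matching partitions are kept.
    return [p for p in partitions if partition_filter(p)]


# Synthetic table with 2 grids x 1600 dates = 3200 partitions (labels are made up).
partitions = [
    {"grid": grid, "dt": f"day_{day:04d}"}
    for grid in ("AB", "XY")
    for day in range(1600)
]


def wanted(p: Partition) -> bool:
    # Mirrors the shape of the FILTER in the Pig script: grid == ... AND dt == ...
    return p["grid"] == "AB" and p["dt"] == "day_0003"


unfiltered = select_partitions(partitions)        # the null-filter case
filtered = select_partitions(partitions, wanted)  # the intended case
```

In this model the unfiltered call returns all 3200 partitions, whereas the 
filtered call returns one, which is the difference the Pig script above relies on.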

Viraj

  was:
A table created by HCAT has the following partitions:

hcat -e "show partitions paritionedtable"

{quote}
grid=AB/dt=2011_07_01
grid=AB/dt=2011_07_02
grid=AB/dt=2011_07_03
grid=XY/dt=2011_07_01
grid=XY/dt=2011_07_02
grid=XY/dt=2011_07_03
grid=XY/dt=2011_07_04
...
{quote}

The total number of partitions in the table is around 3200.

A Pig script of this nature tries to access this data using the partitions in 
its filter.

{script}
A = LOAD 'paritionedtable' USING org.apache.hcatalog.pig.HCatLoader();
B = FILTER A BY grid=='AB' AND dt=='2011_07_04';
C = LIMIT B 10;
store C into 'HCAT' using PigStorage();
{script}


This script fails to run because the job.xml generated by Pig is so large (8 MB) 
that the Hadoop Fred limitation does not allow the job to be submitted.

Debugging showed that the getInputTableInfo function in the HCatTableInfo class 
receives a null filter value: getInputTableInfo(filter=null ..)

I suspect that the "setPartitionFilter" function in Pig does not pass the filter 
correctly to HCatLoader. This happens with both Pig 0.9 and 0.8.

Viraj

    
> HCatLoader loads all the partitions in a partitioned table even though a 
> filter clause on the partitions is specified in the Pig script
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2340
>                 URL: https://issues.apache.org/jira/browse/PIG-2340
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Viraj Bhat
>             Fix For: 0.9.1, 0.8.2
>
>
> A table created by HCAT has the following partitions:
> hcat -e "show partitions paritionedtable"
> {quote}
> grid=AB/dt=2011_07_01
> grid=AB/dt=2011_07_02
> grid=AB/dt=2011_07_03
> grid=XY/dt=2011_07_01
> grid=XY/dt=2011_07_02
> grid=XY/dt=2011_07_03
> grid=XY/dt=2011_07_04
> ...
> {quote}
> The total number of partitions in the table is around 3200.
> A Pig script of this nature tries to access this data using the partitions in 
> its filter.
> {code}
> A = LOAD 'paritionedtable' USING org.apache.hcatalog.pig.HCatLoader();
> B = FILTER A BY grid=='AB' AND dt=='2011_07_04';
> C = LIMIT B 10;
> store C into 'HCAT' using PigStorage();
> {code}
> This script fails to run because the job.xml generated by Pig is so large (8 MB) 
> that the Hadoop Fred limitation does not allow the job to be submitted.
> Debugging showed that the getInputTableInfo function in the HCatTableInfo class 
> receives a null filter value: getInputTableInfo(filter=null ..)
> I suspect that the "setPartitionFilter" function in Pig does not pass the filter 
> correctly to HCatLoader. This happens with both Pig 0.9 and 0.8.
> Viraj

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
