[jira] [Commented] (PIG-2339) HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script

Min Zhou (Commented) (JIRA) Thu, 24 Nov 2011 22:23:12 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156999#comment-13156999
 ]


Min Zhou commented on PIG-2339:
-------------------------------

Did you try a non-equality expression like the one below, where pt is the partition column?
{noformat}
A = LOAD 'partitioned_nation' USING org.apache.hcatalog.pig.HCatLoader();
B = FILTER A BY pt <= '2';
DUMP B;
{noformat}

A TApplicationException is thrown. Checking the metastore service side, we found an internal exception:
{noformat}
11/11/25 13:11:55 ERROR api.ThriftHiveMetastore$Processor: Internal error processing get_partitions_by_filter
java.lang.NullPointerException
        at org.datanucleus.store.mapped.mapping.MappingHelper.getMappingIndices(MappingHelper.java:35)
        at org.datanucleus.store.mapped.expression.StatementText.applyParametersToStatement(StatementText.java:194)
        at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getPreparedStatementForQuery(RDBMSQueryUtils.java:233)
        at org.datanucleus.store.rdbms.query.legacy.SQLEvaluator.evaluate(SQLEvaluator.java:115)
        at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.performExecute(JDOQLQuery.java:288)
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1657)
        at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
        at org.datanucleus.store.query.Query.executeWithMap(Query.java:1526)
        at org.datanucleus.jdo.JDOQuery.executeWithMap(JDOQuery.java:334)
        at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitionsByFilter(ObjectStore.java:1329)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1241)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$40.run(HiveMetaStore.java:2369)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$40.run(HiveMetaStore.java:2366)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:307)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:2366)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.process(ThriftHiveMetastore.java:6099)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor.process(ThriftHiveMetastore.java:4789)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$TLoggingProcessor.process(HiveMetaStore.java:3167)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

This is very likely a bug in DataNucleus 2.0.3.
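For reference, the predicate pt <= '2' should select the partitions whose pt value sorts at or below "2". The minimal Java sketch below evaluates that predicate client-side over a list of pt values. It assumes pt holds string values compared lexicographically; the class RangePartitionFilter, the method selectAtMost and the sample values are hypothetical and are not HCatalog or metastore APIs.

```java
import java.util.ArrayList;
import java.util.List;

class RangePartitionFilter {
    // Hypothetical helper, not part of HCatalog or the Hive metastore API:
    // keeps every partition value v with v <= upperBound. String.compareTo
    // is lexicographic, so "10" sorts before "2".
    static List<String> selectAtMost(List<String> values, String upperBound) {
        List<String> selected = new ArrayList<>();
        for (String v : values) {
            if (v.compareTo(upperBound) <= 0) {
                selected.add(v);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical values for the partition column pt.
        List<String> pts = List.of("1", "2", "3", "10");
        // Client-side equivalent of: FILTER A BY pt <= '2';
        System.out.println(selectAtMost(pts, "2")); // prints [1, 2, 10]
    }
}
```

With lexicographic comparison "10" is selected because it sorts before "2", which is worth keeping in mind when checking which partitions a range filter on a string partition column should return.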


                
> HCatLoader loads all the partitions in a partitioned table even though a filter clause on the partitions is specified in the Pig script
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2339
>                 URL: https://issues.apache.org/jira/browse/PIG-2339
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.10, 0.9.2, 0.11
>
>         Attachments: PIG-2339-1.patch, PIG-2339-2.patch
>
>
> A table created by HCatalog has the following partitions:
> hcat -e "show partitions paritionedtable"
> {quote}
> grid=AB/dt=2011_07_01
> grid=AB/dt=2011_07_02
> grid=AB/dt=2011_07_03
> grid=XY/dt=2011_07_01
> grid=XY/dt=2011_07_02
> grid=XY/dt=2011_07_03
> grid=XY/dt=2011_07_04
> ...
> {quote}
> The total number of partitions in the table is around 3200.
> A Pig script like the following tries to access this data with a filter on 
> the partition columns:
> {script}
> A = LOAD 'paritionedtable' USING org.apache.hcatalog.pig.HCatLoader();
> B = FILTER A BY grid=='AB' AND dt=='2011_07_04';
> C = LIMIT B 10;
> store C into 'HCAT' using PigStorage();
> {script}
> This script fails to run because the job.xml generated by Pig is so large (8MB) 
> that the Hadoop Fred's limitation does not allow the job to be submitted. 
> Debugging showed that in the HCatTableInfo class, the function getInputTableInfo 
> receives a null filter value: getInputTableInfo(filter=null ..)
> I suspect that the setPartitionFilter function in Pig does not pass the filter 
> correctly to HCatLoader. This happens with both Pig 0.9 and 0.8.
> Viraj
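To make the effect of the null filter concrete, here is a minimal Java sketch of partition pruning over the partition paths listed in the report. The names PartitionPruningSketch, parseSpec and prune are hypothetical and do not correspond to HCatalog's actual classes. The sketch only shows that a null filter keeps every partition (presumably what inflates job.xml when the table has about 3200 partitions), while a pushed-down filter keeps only the matching ones. Because grid=AB/dt=2011_07_04 does not appear in the excerpt, the example filters on grid=XY instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class PartitionPruningSketch {
    // Parses a partition path such as "grid=AB/dt=2011_07_01" into key/value pairs.
    static Map<String, String> parseSpec(String path) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String part : path.split("/")) {
            int eq = part.indexOf('=');
            spec.put(part.substring(0, eq), part.substring(eq + 1));
        }
        return spec;
    }

    // A null filter means no pruning: every partition is kept, mirroring the
    // filter=null value reported in getInputTableInfo.
    static List<String> prune(List<String> partitions, Predicate<Map<String, String>> filter) {
        if (filter == null) {
            return new ArrayList<>(partitions);
        }
        List<String> selected = new ArrayList<>();
        for (String p : partitions) {
            if (filter.test(parseSpec(p))) {
                selected.add(p);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // The partitions shown in the report's listing (the full table has about 3200).
        List<String> partitions = List.of(
                "grid=AB/dt=2011_07_01", "grid=AB/dt=2011_07_02", "grid=AB/dt=2011_07_03",
                "grid=XY/dt=2011_07_01", "grid=XY/dt=2011_07_02", "grid=XY/dt=2011_07_03",
                "grid=XY/dt=2011_07_04");
        // Same shape as FILTER A BY grid=='AB' AND dt=='2011_07_04', but with
        // values that occur in the listing.
        Predicate<Map<String, String>> filter =
                spec -> "XY".equals(spec.get("grid")) && "2011_07_04".equals(spec.get("dt"));
        System.out.println(prune(partitions, null).size()); // prints 7
        System.out.println(prune(partitions, filter));      // prints [grid=XY/dt=2011_07_04]
    }
}
```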
