[
https://issues.apache.org/jira/browse/PIG-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157020#comment-13157020
]
Min Zhou commented on PIG-2339:
-------------------------------
I've found the cause. It is indeed a bug in datanucleus 2.0.3, not in Pig
or HCatalog.
The following code reproduces it:
{code:borderStyle=solid}
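import java.util.List;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

// Replace the host and port placeholders with your metastore's Thrift endpoint.
TTransport transport = new TSocket("your_thrift_server_host",
    your_thrift_server_port);
TProtocol protocol = new TBinaryProtocol(transport);
ThriftHiveMetastore.Iface client =
    new ThriftHiveMetastore.Client(protocol);

// Try to connect up to 5 times, one second apart.
boolean open = false;
for (int i = 0; i < 5 && !open; ++i) {
  try {
    transport.open();
    open = true;
  } catch (TTransportException e) {
    System.out.println("failed to connect to MetaStore, re-trying...");
    try {
      Thread.sleep(1000);
    } catch (InterruptedException ie) {
      // Preserve the interrupt and stop retrying.
      Thread.currentThread().interrupt();
      break;
    }
  }
}

// Only query if the connection was actually established.
if (open) {
  try {
    // A max of -1 asks for all matching partitions.
    List<Partition> parts =
        client.get_partitions_by_filter("default", "partitioned_nation",
            "pt < '2'", (short) -1);
    for (Partition part : parts) {
      System.out.println(part.getSd().getLocation());
    }
  } catch (Exception te) {
    te.printStackTrace();
  } finally {
    transport.close();
  }
}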
{code}
The same exception is thrown.
A null JavaTypeMapping is passed into
org.datanucleus.store.mapped.mapping.MappingHelper.(int initialPosition,
JavaTypeMapping mapping), which causes the NPE.
After digging into the datanucleus source, I found that the null value
originates in the constructor of
org.datanucleus.store.mapped.expression.SubstringExpression. See:
{code}
/**
 * Constructs the substring
 * @param str the String Expression
 * @param begin The start position
 * @param end The end position expression
 **/
public SubstringExpression(StringExpression str, NumericExpression begin,
    NumericExpression end)
{
    super(str.getQueryExpression());
    st.append("SUBSTRING(").append(str).append(" FROM ")
        .append(begin.add(new IntegerLiteral(qs, mapping, BigInteger.ONE)))
        .append(" FOR ").append(end.sub(begin)).append(')');
}
{code}
The field mapping has not been initialized at that point, so null is passed
to the IntegerLiteral constructor.
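To illustrate the failure mode in isolation, here is a minimal sketch that does not use datanucleus at all. Every class in it (Mapping, Literal, Expression, BrokenSubstring, FixedSubstring) is a hypothetical stand-in invented for this example, and the "fixed" variant only shows the general idea of assigning the field before using it; it is not the actual datanucleus fix.

```java
import java.math.BigInteger;

public class UninitializedFieldDemo {

    /** Hypothetical stand-in for JavaTypeMapping. */
    static class Mapping {
        String javaType() { return "java.lang.Integer"; }
    }

    /** Hypothetical stand-in for a literal that dereferences its mapping later. */
    static class Literal {
        final Mapping mapping;
        final BigInteger value;

        Literal(Mapping mapping, BigInteger value) {
            this.mapping = mapping;   // a null is silently accepted here
            this.value = value;
        }

        String describe() {
            // Throws NullPointerException if mapping is null.
            return mapping.javaType() + ":" + value;
        }
    }

    /** Hypothetical base class: the field defaults to null until assigned. */
    static class Expression {
        protected Mapping mapping;
    }

    /** Reads the inherited field before anything has assigned it. */
    static class BrokenSubstring extends Expression {
        final Literal one;

        BrokenSubstring() {
            one = new Literal(mapping, BigInteger.ONE); // mapping is still null
        }
    }

    /** Assigns the field first, then uses it. */
    static class FixedSubstring extends Expression {
        final Literal one;

        FixedSubstring(Mapping m) {
            this.mapping = m;
            one = new Literal(mapping, BigInteger.ONE);
        }
    }

    /** Returns true if running r throws a NullPointerException. */
    static boolean throwsNpe(Runnable r) {
        try {
            r.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BrokenSubstring broken = new BrokenSubstring();
        System.out.println("broken mapping is null: " + (broken.one.mapping == null));
        System.out.println("broken use throws NPE: "
            + throwsNpe(() -> broken.one.describe()));

        FixedSubstring fixed = new FixedSubstring(new Mapping());
        System.out.println("fixed: " + fixed.one.describe());
    }
}
```

The point of the sketch is that the null is created in one place (the constructor) but only blows up later, when something dereferences it, which matches the distance between SubstringExpression and MappingHelper in the stack trace.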
How should we deal with an external bug like this?
> HCatLoader loads all the partitions in a partitioned table even though a
> filter clause on the partitions is specified in the Pig script
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-2339
> URL: https://issues.apache.org/jira/browse/PIG-2339
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Viraj Bhat
> Assignee: Daniel Dai
> Fix For: 0.10, 0.9.2, 0.11
>
> Attachments: PIG-2339-1.patch, PIG-2339-2.patch
>
>
> A table created by HCAT has the following partitions;
> hcat -e "show partitions paritionedtable"
> {quote}
> grid=AB/dt=2011_07_01
> grid=AB/dt=2011_07_02
> grid=AB/dt=2011_07_03
> grid=XY/dt=2011_07_01
> grid=XY/dt=2011_07_02
> grid=XY/dt=2011_07_03
> grid=XY/dt=2011_07_04
> ...
> {quote}
> The total number of partitions in the table is around 3200.
> A Pig script of this nature tries to access this data using the partitions in
> its filter.
> {script}
> A = LOAD 'paritionedtable' USING org.apache.hcatalog.pig.HCatLoader();
> B = FILTER A BY grid=='AB' AND dt=='2011_07_04';
> C = LIMIT B 10;
> store C into 'HCAT' using PigStorage();
> {script}
> This script fails to run because the job.xml generated by Pig is so large (8MB)
> that the Hadoop Fred's limitation does not allow the job to be submitted.
> After debugging, it was found that the function in the HCatTableInfo class
> gets a null filter value: getInputTableInfo(filter=null ..)
> I suspect that the "setPartitionFilter" function in Pig does not pass the filter
> correctly to the HCatLoader. This happens with both Pig 0.9 and 0.8.
> Viraj