Re: Custom SerDe -- tracking down stack trace

2012-02-22 Thread Evan Pollan
/tmp it is!  My bad — it was the one obvious place I omitted from my find/grep 
statement.  Thanks!

From: Matthew Byrd mb...@acunu.com
Reply-To: user@hive.apache.org
Date: Wed, 22 Feb 2012 11:54:09 +
To: user@hive.apache.org
Subject: Re: Custom SerDe -- tracking down stack trace

Hi Evan,
Did you look in your hive.log file?
Mine is found in /tmp/$USER/ ... that's usually where stack traces from the hive 
cli show up, if I'm not mistaken.
Have you also tried hooking up a debugger to hive? I'm guessing that's how 
you knew the null pointer was being thrown on deserialize?
What is actually null?
Matt


On Tue, Feb 21, 2012 at 11:01 PM, Evan Pollan 
evan.pol...@bazaarvoice.com wrote:
One more data point:  I can read data from this partition as long as I don't 
reference the partition explicitly…

E.g., my partition column is ArrivalDate, and I have several different 
partitions:  2012-02-01…, and a partition with my test data with 
ArrivalDate=test.

This works:  'select * from table where some constraint such that I only get 
results from the test partition'.

And this works:  'select * from table where ArrivalDate=2012-02-01'

But, this fails:  'select * from table where ArrivalDate=test'

Does this make sense to anybody?



From: Evan Pollan evan.pol...@bazaarvoice.com
Reply-To: user@hive.apache.org
Date: Tue, 21 Feb 2012 20:56:07 +
To: user@hive.apache.org
Subject: Custom SerDe -- tracking down stack trace

I have a custom SerDe that's initializing properly and works on one data set.  
I built it to adapt to a couple of different data formats, though, and it's 
choking on a different data set (different partitions in the same table).

A null pointer exception is being thrown on deserialize, that's being wrapped 
by an IOException somewhere up the stack.  The exception is showing up in the 
hive output (Failed with exception 
java.io.IOException:java.lang.NullPointerException), but I can't find the 
stack trace in any logs.

It's worth noting that I'm running hive via the cli on a machine external to 
the cluster, and the query doesn't get far enough to create any M/R tasks.  I 
looked in all log files in /var/log on the hive client machine, and in all 
userlogs on each cluster instance.  I also looked in derby.log (I'm using the 
embedded metastore) and in /var/lib/hive/metastore on the hive client machine.

I'm sure I'm missing something obvious…  Any ideas?



Re: Custom SerDe -- tracking down stack trace

2012-02-22 Thread Evan Pollan
So, I tracked down the problem.  But, I'm curious as to why I got such 
different behavior when selecting directly from the partition vs. selecting 
from all partitions.

Context:  my custom deserializer was returning null when it encountered an 
unintelligible line (I saw this pattern in the contrib RegexSerDe and reused 
it).  This was apparently causing the LazySimpleSerDe.serialize() operation to 
NPE as the CLI driver was fetching the results when selecting directly from the 
partition with the bad line:

2012-02-21 22:55:14,166 ERROR CliDriver (SessionState.java:printError(365)) - 
Failed with exception java.io.IOException:java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1114)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:516)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.NullPointerException
at java.util.ArrayList.addAll(ArrayList.java:497)
at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldsDataAsList(UnionStructObjectInspector.java:144)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:357)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:142)
... 9 more
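The NPE at ArrayList.addAll in the trace comes from the object inspector being 
handed a null row. A minimal standalone reproduction of that failure mode (no 
Hive dependencies; the class and method names here are invented for 
illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class NullRowDemo {
    // Stand-in for what UnionStructObjectInspector.getStructFieldsDataAsList
    // does with the row object that a SerDe's deserialize() returned.
    static List<Object> collectFields(List<Object> row) {
        List<Object> fields = new ArrayList<>();
        fields.addAll(row); // throws NullPointerException when row is null
        return fields;
    }

    public static void main(String[] args) {
        try {
            collectFields(null); // a deserializer that returns null ends up here
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the 'Failed with exception' output");
        }
    }
}
```

Guarding deserialize() so it returns a placeholder row (or throws a 
SerDeException) instead of null would presumably avoid tripping this path.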

However, when I queried across the entire data set (eliminating the partition 
predicate), the query returned without any errors.  Does the CLI behave 
differently based on the query plan?





Re: query parameters in hive

2012-02-13 Thread Evan Pollan
Sure -- use -hiveconf X=Y, which allows your script to reference ${hiveconf:X}
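For example (the script and variable names here are invented):

```shell
# report.sql would contain something like:
#   SELECT * FROM some_table WHERE dt = '${hiveconf:run_date}';
hive -hiveconf run_date=2012-02-13 -f report.sql
```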

On Feb 13, 2012, at 7:19 AM, Wojciech Langiewicz wlangiew...@gmail.com 
wrote:

 Hello,
 Is it possible (and how) to pass parameters to hive scripts from the command 
 line?
 I would imagine something like:
 hive -f xyz.sql -p date='2012-02-13'
 which will substitute any occurrence of $date string in xyz.sql file.
 
 I've searched the wiki and list archive, but I didn't find any clues about 
 such a feature.
 
 If you know how to pass parameters to hive queries, please let me know.
 
 --
 Wojciech Langiewicz


Re: Delimiters for nested structures

2012-02-09 Thread Evan Pollan
+1. I've had good luck with json and get_json_object.

On Feb 9, 2012, at 7:39 AM, Tucker, Matt 
matt.tuc...@disney.com wrote:

What about creating a view that converts your data into JSON or XML?  You can 
then make use of the get_json_object() 
(https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-getjsonobject)
 or xpath() (https://cwiki.apache.org/Hive/languagemanual-xpathudf.html) 
functions.
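A sketch of that approach (the table, view, and column names are invented, and 
it assumes the raw column already holds text that can be wrapped as JSON):

```sql
-- Expose the awkward nested field as a JSON string via a view...
CREATE VIEW orders_json AS
SELECT order_id,
       concat('{"attrs":', raw_attrs, '}') AS js
FROM orders_raw;

-- ...then pull individual values out with get_json_object
SELECT order_id, get_json_object(js, '$.attrs.color')
FROM orders_json;
```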

Matt Tucker

From: Hao Cheng [mailto:haoc.ch...@yahoo.com]
Sent: Thursday, February 09, 2012 1:15 AM
To: user@hive.apache.org
Subject: Delimiters for nested structures

Hi,

My data have some map-of-map structures with customized delimiters.
Per the Hive documentation, by default '\001' is the field separator, and 
starting from '\002', every two consecutive characters are the delimiters for 
one level of nesting. My data do not follow this rule in terms of delimiters. I 
mostly just need to handle maps of maps. I can't find a way in the CREATE TABLE 
statement to redefine delimiters for structures nested more than one level 
deep. I'd rather not transform the data, as it was produced by another upstream 
process.
Any ideas on how to do that in Hive? Thank you for your help!
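For reference, the stock CREATE TABLE syntax only exposes overrides for the 
first few separators, not for arbitrary nesting levels (a sketch; the table 
name and delimiter characters are invented, and I believe deeper levels fall 
back to Hive's default separator sequence):

```sql
CREATE TABLE nested_example (
  m MAP<STRING, MAP<STRING, STRING>>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';
-- the inner map is not configurable here and keeps the default delimiters
```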

Regards,
Hao