A GenericUDF Function to Extract a Field From an Array of Structs

2013-03-28 Thread Peter Chu
I am trying to write a GenericUDF function to collect all values of specific struct field(s) within an array for each record, and return them in an array as well. I wrote the UDF (as below), and it seems to work, but: 1) It does not work when I run it on an external table; it works fine on

RE: A GenericUDF Function to Extract a Field From an Array of Structs

2013-03-29 Thread Peter Chu
Sorry, the test should be the following (changed extract_shas to extract_product_category): import org.apache.hadoop.hive.ql.metadata.HiveException; import org.apache.hadoop.hive.ql.udf.generic.GenericUDF; import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject; import

AmazonClientException: Unable to execute HTTP request: peer not authenticated

2013-04-29 Thread Peter Chu
Hi, I am trying to do a Hive run. Every time, towards the end of the run, when mapping is over 90% complete (and only mapping), it gives this error. I did the same run before and it worked fine. Does anyone know what this error means? Peter java.io.IOException:

Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Peter Chu
Hi, I am trying to write a Hive query that is the equivalent of NOT IN / NOT EXISTS in SQL. However, Hive does not support this. It does have LEFT SEMI JOIN, which serves as IN, but the negated form is not supported. I am wondering if there is any way to do this without resorting to using left outer join and

RE: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Peter Chu
: michaelma...@yahoo.com Subject: Re: Hive QL - NOT IN, NOT EXIST To: user@hive.apache.org --- On Sun, 5/5/13, Peter Chu pete@outlook.com wrote: I am wondering if there is any way to do this without resorting to using left outer join and finding nulls. I have found

RE: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Peter Chu
. You can see whether hive can manage this or if you write a custom m/r job to do it. 2013/5/5 Peter Chu pete@outlook.com It works but it takes a very long time because the subqueries in NOT IN contains 400 million rows (the message table in the example) and the feed table contains 3

RE: Hive QL - NOT IN, NOT EXIST

2013-05-06 Thread Peter Chu
approach. You would stream the larger table through the smaller one: can you see whether the following helps your perf issue? select /*+ streamtable(message) */ f.uuid from message m right outer join feed f on m.uuid = f.uuid where m.uuid is null; 2013/5/5 Peter Chu pete@outlook.com
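
The outer-join-and-null-check approach discussed in this thread can be tried on a toy data set. The following is only a sketch: it uses Python's built-in sqlite3 module rather than Hive, the rows are made up for illustration, and only the table names (feed, message) and the uuid column come from the thread. It also shows why the null test has to be written with IS NULL: a comparison with = NULL never evaluates to true in SQL.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the thread).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed (uuid TEXT);
    CREATE TABLE message (uuid TEXT);
    INSERT INTO feed VALUES ('a'), ('b'), ('c');
    INSERT INTO message VALUES ('a'), ('c'), ('d');
""")

# Anti-join: feed rows with no matching message row (the NOT IN equivalent).
anti_join = """
    SELECT f.uuid
    FROM feed f
    LEFT OUTER JOIN message m ON m.uuid = f.uuid
    WHERE m.uuid IS NULL
    ORDER BY f.uuid
"""
missing = [row[0] for row in conn.execute(anti_join)]

# Same join, but comparing with '= NULL': the predicate is never true,
# so no rows are returned.
equals_null = """
    SELECT f.uuid
    FROM feed f
    LEFT OUTER JOIN message m ON m.uuid = f.uuid
    WHERE m.uuid = NULL
"""
never_matches = conn.execute(equals_null).fetchall()

print(missing)
print(never_matches)
```

The join is written as feed LEFT OUTER JOIN message, which is equivalent to message RIGHT OUTER JOIN feed and works on older SQLite versions that lack RIGHT OUTER JOIN. Performance characteristics on Hive with hundreds of millions of rows are a separate matter and are not modeled here.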

Max Size of Hive String Type

2013-05-14 Thread Peter Chu
Hi, The results of my Hive jobs are inserted in a MySQL db. One of the tables in MySQL has a field of text type, which corresponds to the String type in Hive's external table. We now have some rows (though a very small number) that exceed the limit of the text type (65k chars). We are thinking

Map Join Problems

2013-05-27 Thread Peter Chu
Using Hive 0.8.1 on an Amazon EMR Hadoop job. Some problems with using mapjoin: 1) Memory exceeded: I got the following errors. I then removed mapjoin from the query and instead set hive.auto.convert.join=true, thinking that this lets Hive decide when mapjoin is suitable. It does run much farther in the