I am trying to write a GenericUDF that collects a specific struct field from
each element of an array column and returns the collected values as an array
for each record. I wrote the UDF (below), and it seems to work, but:
1) It does not work when I run it on an external table; it works
fine on
Sorry, the test should be the following (I changed extract_shas to
extract_product_category):
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import
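Since the original UDF body is cut off above, here is a minimal structural sketch of the kind of GenericUDF being described. The class name, the field name "product_category", and the argument checks are assumptions for illustration; it would need to be compiled against hive-exec and registered with CREATE TEMPORARY FUNCTION.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

// Sketch: pulls the "product_category" field (assumed name) out of each
// struct in an array<struct<...>> column and returns the values as an array.
public class ExtractProductCategory extends GenericUDF {
    private ListObjectInspector listOI;
    private StructObjectInspector structOI;
    private StructField categoryField;

    @Override
    public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1 || !(args[0] instanceof ListObjectInspector)) {
            throw new UDFArgumentException("expects a single array<struct<...>> argument");
        }
        listOI = (ListObjectInspector) args[0];
        structOI = (StructObjectInspector) listOI.getListElementObjectInspector();
        categoryField = structOI.getStructFieldRef("product_category");
        // Return type: an array of whatever type the struct field has.
        return ObjectInspectorFactory.getStandardListObjectInspector(
                categoryField.getFieldObjectInspector());
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        Object list = args[0].get();
        if (list == null) {
            return null;
        }
        int size = listOI.getListLength(list);
        List<Object> result = new ArrayList<Object>(size);
        for (int i = 0; i < size; i++) {
            Object element = listOI.getListElement(list, i);
            result.add(structOI.getStructFieldData(element, categoryField));
        }
        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "extract_product_category(" + children[0] + ")";
    }
}
```

One subtlety with lazily-deserialized tables (which may relate to the external-table failure above): getStructFieldData can return lazy objects, so copying each value to a standard object via ObjectInspectorUtils.copyToStandardObject before adding it to the result is often safer.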
Hi,
I am trying to do a Hive run. Every time, towards the end of the run, when the
map phase is over 90% complete (and only the map phase), it gives this error.
I did the same run before and it worked fine.
Does anyone know what this error means?
Peter
java.io.IOException:
Hi, I am trying to write a Hive query to find the equivalent of SQL's NOT IN /
NOT EXISTS.
However, Hive does not support these. It does have LEFT SEMI JOIN, which serves
as IN, but it does not support the NOT form.
I am wondering if there is any way to do this without resorting to using a left
outer join and
: michaelma...@yahoo.com
Subject: Re: Hive QL - NOT IN, NOT EXIST
To: user@hive.apache.org
--- On Sun, 5/5/13, Peter Chu pete@outlook.com wrote:
I am wondering if there is any way to do this without resorting to
using left outer join and finding nulls.
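For reference, the left-outer-join-plus-null-check pattern being discussed looks roughly like this (table and column names feed, message, and uuid are the ones used later in this thread):

```sql
-- NOT IN emulation: feed rows with no matching row in message.
SELECT f.uuid
FROM feed f
LEFT OUTER JOIN message m ON f.uuid = m.uuid
WHERE m.uuid IS NULL;
```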
I have found
You can see whether Hive can manage this, or you could write a custom M/R job
to do it.
2013/5/5 Peter Chu pete@outlook.com
It works but it takes a very long time, because the subquery in the NOT IN
contains 400 million rows (the message table in the example) and the feed table
contains 3
approach. You would stream the larger table through the
smaller one:
Can you see whether the following helps your perf issue?
select /*+ streamtable(m) */ f.uuid from message m right outer join feed
f on m.uuid = f.uuid where m.uuid is null;
2013/5/5 Peter Chu pete@outlook.com
Hi,
The results of my Hive jobs are inserted into a MySQL db.
One of the tables in MySQL has a field of TEXT type, which corresponds to the
STRING type in Hive's external table. We now have some rows (though a very
small number) that exceed the limit of the TEXT type (65K chars).
We are thinking
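Since the message is cut off, one possible option (an assumption, not what the author necessarily chose) is simply widening the MySQL column: TEXT holds up to 65,535 bytes, while MEDIUMTEXT holds up to 16,777,215 bytes.

```sql
-- Hypothetical table/column names; widen the column past TEXT's 65,535-byte
-- cap. MEDIUMTEXT allows up to ~16 MB per value.
ALTER TABLE results MODIFY COLUMN payload MEDIUMTEXT;
```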
Using Hive 0.8.1 on an Amazon EMR Hadoop job.
Some problems with using MAPJOIN:
1) Exceeded memory; I got the following errors. I then removed the MAPJOIN
hint from the query and instead set hive.auto.convert.join=true, thinking that
would let Hive decide when a map join is suitable. It does run much farther in the
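The automatic-conversion setup being described amounts to the following session settings (the second property and its value are an illustration of the usual tuning knob, not something stated in this message):

```sql
-- Let Hive decide when a map join is suitable instead of hard-coding
-- MAPJOIN hints in each query.
SET hive.auto.convert.join=true;
-- Tables smaller than this many bytes are considered small enough to
-- map-join; 25 MB is the common default, tune to the memory available.
SET hive.mapjoin.smalltable.filesize=25000000;
```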