Re: How to set an empty value to hive.querylog.location to disable the creation of hive history file

2012-12-06 Thread Jithendranath Joijoide
How about setting it to /dev/null? Not sure if that would help in your
case. Just a hack.

Regards.

On Thu, Dec 6, 2012 at 2:14 PM, Bing Li sarah.lib...@gmail.com wrote:

 Hi, all
 According to https://cwiki.apache.org/Hive/adminmanual-configuration.html, if
 I set hive.querylog.location to an empty string, it won't create the
 structured log.

 I created hive-site.xml in HIVE_HOME/conf and added the following setting:

 <property>
   <name>hive.querylog.location</name>
   <value></value>
 </property>

 BUT it didn't work; when launching HIVE_HOME/bin/hive, it created a history
 file in /tmp/user.name, which is the default directory for this property.

 Do you know how to set an EMPTY value in hive-site.xml?


 Thanks,
 - Bing



Re: handling null argument in custom udf

2012-12-06 Thread Søren

Right. Thanks for all the help.
It turned out that it did help to check for null in the code. No mystery.
I did try that earlier but the attempt got lost somehow.

Thanks for the advice on using GenericUDF.

cheers
Søren

On 05/12/2012 11:10, Vivek Mishra wrote:

The way a UDF works is that you need to tell your ObjectInspector about your 
primitive or Java types. So in your case, even if the value is null, you should 
be able to treat it as a String or any other object. The invocation of the 
evaluate() function should then know the type of the Java object.

-Vivek
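
To illustrate, here is a minimal GenericUDF sketch along those lines (the class
name MyGenericFun and the concatenation logic are hypothetical placeholders; a
production version would also convert its arguments through the
ObjectInspectors received in initialize()):

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class MyGenericFun extends GenericUDF {

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentException("myfun expects exactly two arguments");
    }
    // Declare the return type: a writable string.
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object a = arguments[0].get();
    Object b = arguments[1].get();
    // A SQL NULL arrives here as a Java null; propagate it instead of failing.
    if (a == null || b == null) {
      return null;
    }
    return new Text(a.toString() + b.toString());
  }

  @Override
  public String getDisplayString(String[] children) {
    return "myfun(" + children[0] + ", " + children[1] + ")";
  }
}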

From: Vivek Mishra
Sent: 05 December 2012 15:36
To: user@hive.apache.org
Subject: RE: handling null argument in custom udf

Could you please look into and share your task/attempt log for the complete 
error trace or the actual error behind this?

-Vivek

From: Søren [s...@syntonetic.com]
Sent: 04 December 2012 20:28
To: user@hive.apache.org
Subject: Re: handling null argument in custom udf

Thanks. Did you mean I should handle null in my udf or my serde?

I did try to check for null inside the code in my udf, but it fails even before 
it gets called.

This is from when the udf fails:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute 
method public org.apache.hadoop.io.Text 
com.company.hive.myfun.evaluate(java.lang.Object,java.lang.Object)
on object com.company.hive.myfun@1412332 of class com.company.hive.myfun with 
arguments {0:java.lang.Object, null} of size 2

It looks like there is a null, or is this error message misleading?


On 04/12/2012 15:43, Edward Capriolo wrote:
There is no null argument. You should handle the null case in your code.

if (argA == null) { ... }

Or, optionally, you could use a GenericUDF, but a regular one should handle what 
you are doing.
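
A minimal sketch of that null guard in a regular UDF (the class name and the
concatenated return value are placeholders, not the actual myfun logic):

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class MyFun extends UDF {
  public Text evaluate(final Text argA, final Text argB) {
    // Hive hands a SQL NULL to the UDF as a Java null; guard before using it.
    if (argA == null || argB == null) {
      return null; // returning null produces NULL in the query result
    }
    return new Text(argA.toString() + ":" + argB.toString());
  }
}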

On Tuesday, December 4, 2012, Søren s...@syntonetic.com wrote:

Hi Hive community

I have a custom UDF, say myfun, written in Java, which I use like this:

select myfun(col_a, col_b) from mytable where etc

col_b is a string type and sometimes it is null.

When that happens, my query crashes with
---
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
{col_a:val,col_b:null}
...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute 
method public org.apache.hadoop.io.Text
---

public final class myfun extends UDF {
 public Text evaluate(final Text argA, final Text argB) {

I'm unsure how this should be fixed properly. Is the framework looking for an 
overload of evaluate() that would accept the null argument?

I should mention that the table is declared using my own JSON SerDe reading from 
S3. I'm not processing nulls in my SerDe in any special way, because Hive seems 
to handle null correctly when it is not passed to my own UDF.

Is there anyone out there with ideas or experience on this issue?

thanks in advance
Søren

RE: How to set an empty value to hive.querylog.location to disable the creation of hive history file

2012-12-06 Thread Hezhiqiang (Ransom)
It's not supported now.
I think you should raise it in JIRA.

Regards
Ransom
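
Until that is supported, one possible workaround (just a sketch; the directory
path is a placeholder) is to point the query log at a disposable directory and
clean it up out of band, rather than trying to disable it:

<property>
  <name>hive.querylog.location</name>
  <value>/tmp/hive-querylog-discard</value>
</property>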

From: Bing Li [mailto:sarah.lib...@gmail.com]
Sent: Thursday, December 06, 2012 5:06 PM
To: user@hive.apache.org
Subject: Re: How to set an empty value to hive.querylog.location to disable the 
creation of hive history file

It will exit with an error like:

FAILED: Failed to open Query Log: /dev/null/hive_job_log_xxx.txt

and point out that the path is not a directory.



2012/12/6 Jithendranath Joijoide pixelma...@gmail.com

How about setting it to /dev/null? Not sure if that would help in your case. 
Just a hack.

Regards.

On Thu, Dec 6, 2012 at 2:14 PM, Bing Li sarah.lib...@gmail.com wrote:
Hi, all
According to https://cwiki.apache.org/Hive/adminmanual-configuration.html, if I set 
hive.querylog.location to an empty string, it won't create the structured log.

I created hive-site.xml in HIVE_HOME/conf and added the following setting:

<property>
  <name>hive.querylog.location</name>
  <value></value>
</property>

BUT it didn't work; when launching HIVE_HOME/bin/hive, it created a history file 
in /tmp/user.name, which is the default directory for this property.

Do you know how to set an EMPTY value in hive-site.xml?


Thanks,
- Bing




Mapping existing HBase table with many columns to Hive.

2012-12-06 Thread David Koch
Hello,

How can I map an HBase table with the following layout to Hive using the
CREATE EXTERNAL TABLE command from shell (or another programmatic way):

The HBase table's layout is as follows:
Rowkey = 16 bytes: a UUID with the dashes removed and its 32 hex chars
converted into two 8-byte longs.
Columns (qualifiers): timestamps, i.e. the bytes of a long, converted using
Hadoop's Bytes.toBytes(long). There can be many of those in a single row.
Values: the bytes of a Java string.
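
For reference, a small sketch of how such a rowkey could be constructed with
the HBase Bytes utility (the class name and printed output are illustrative
only):

import java.util.UUID;
import org.apache.hadoop.hbase.util.Bytes;

public class RowkeyDemo {
  public static void main(String[] args) {
    UUID id = UUID.randomUUID();
    // Pack the UUID's two 64-bit halves into a 16-byte rowkey,
    // matching the layout described above.
    byte[] rowkey = Bytes.add(Bytes.toBytes(id.getMostSignificantBits()),
                              Bytes.toBytes(id.getLeastSignificantBits()));
    System.out.println(Bytes.toStringBinary(rowkey));
  }
}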

I am unsure of which datatypes to use. I am pretty sure there is no way I
can sensibly map the row key to anything other than binary, but maybe the
columns, which are longs, and the values, which are strings, can be mapped to
their corresponding Hive datatypes.

I include an extract of what a row looks like in HBase shell below:

Thank you,

/David

hbase(main):009:0> scan 'hits'
ROW
  COLUMN+CELL

\x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE\xDC
 column=t:\x00\x00\x01;2\xE6Q\x06, timestamp=1267737987733, value=blahaha
\x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE\xDC
 column=t:\x00\x00\x01;2\xE6\xFB@, timestamp=1354012104967, value=testtest


Re: How is it that every hive release in maven depends on

2012-12-06 Thread Chris Drome
These jars are pulled in by datanucleus, which is a dependency of hive-metastore.

The datanucleus project manages its own repositories for these jars:

http://www.datanucleus.org/downloads/maven2

chris
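
A sketch of how that repository could be declared in a pom.xml so the jdo2-api
2.3-ec artifact resolves (the repository id is arbitrary):

<repositories>
  <repository>
    <id>datanucleus</id>
    <url>http://www.datanucleus.org/downloads/maven2</url>
  </repository>
</repositories>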

From: Edward Capriolo edlinuxg...@gmail.com
Reply-To: user@hive.apache.org
Date: Thursday, December 6, 2012 8:56 AM
To: user@hive.apache.org
Subject: How is it that every hive release in maven depends on

http://mvnrepository.com/artifact/org.apache.hive/hive-metastore/0.9.0

javax.jdo : jdo2-api : 2.3-ec
(http://mvnrepository.com/artifact/javax.jdo/jdo2-api)

2.3-ec is not in Maven Central.

All our poms seem to reference this. What is the deal here?




Re: Mapping existing HBase table with many columns to Hive.

2012-12-06 Thread kulkarni.swar...@gmail.com
Hi David,

First of all, your columns are not longs; they are binary as well.
As Hive currently stands, there is no support for binary qualifiers.
However, I recently submitted a patch for that [1]. Feel free to give it a
shot and let me know if you see any issues. With that patch, you can
give your qualifiers to Hive directly as they appear here
(\x00\x00\x01;2\xE6Q\x06).

Until then, the only option you have is to use a map to collect all your
columns under the column family t. An example of that would be:


CREATE EXTERNAL TABLE hbase_table_1(key int, value map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,t:")
TBLPROPERTIES ("hbase.table.name" = "some_existing_table");
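
Once the table is defined that way, a query sketch like the following (the
aliases are hypothetical, and this assumes a Hive version whose explode()
accepts maps) turns each row's map back into (rowkey, qualifier, cell) tuples:

SELECT key, q, v
FROM hbase_table_1
LATERAL VIEW explode(value) cells AS q, v;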


Also, as far as your key goes, it is a composite key. There is an
existing patch for support of that here [2].


Hope that helps.


[1] https://issues.apache.org/jira/browse/HIVE-3553
[2] https://issues.apache.org/jira/browse/HIVE-2599


On Thu, Dec 6, 2012 at 12:56 PM, David Koch ogd...@googlemail.com wrote:

 Hello,

 How can I map an HBase table with the following layout to Hive using the
 CREATE EXTERNAL TABLE command from shell (or another programmatic way):

 The HBase table's layout is as follows:
 Rowkey = 16 bytes: a UUID with the dashes removed and its 32 hex chars
 converted into two 8-byte longs.
 Columns (qualifiers): timestamps, i.e. the bytes of a long, converted using
 Hadoop's Bytes.toBytes(long). There can be many of those in a single row.
 Values: the bytes of a Java string.

 I am unsure of which datatypes to use. I am pretty sure there is no way I
 can sensibly map the row key to anything other than binary, but maybe the
 columns, which are longs, and the values, which are strings, can be mapped to
 their corresponding Hive datatypes.

 I include an extract of what a row looks like in HBase shell below:

 Thank you,

 /David

 hbase(main):009:0> scan 'hits'
 ROW
 COLUMN+CELL

 \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE\xDC
  column=t:\x00\x00\x01;2\xE6Q\x06, timestamp=1267737987733, value=blahaha
 \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE\xDC
  column=t:\x00\x00\x01;2\xE6\xFB@, timestamp=1354012104967,
 value=testtest




-- 
Swarnim


Re: Mapping existing HBase table with many columns to Hive.

2012-12-06 Thread David Koch
Hello Swarnim,

Thank you for your answer. I will try the options you pointed out.

/David


On Thu, Dec 6, 2012 at 9:10 PM, kulkarni.swar...@gmail.com wrote:


Locking in HIVE : How to use locking/unlocking features using hive java API ?

2012-12-06 Thread Manish Malhotra
Hi,

I'm building/designing a backup and restore tool for Hive data for
disaster recovery scenarios.

I'm trying to understand the locking behavior of Hive, which currently
supports ZooKeeper for locking.

My thought process is like this (early design):

1. Backing up the Hive metadata.
2. Backing up the data for Hive tables on S3, HDFS, or NFS.
3. Restoring table(s):
   a. Only data
   b. Schema and data

So, to achieve the 1st task, this is the flow I'm thinking of:

a. Check whether there is any exclusive lock on the table whose metadata
needs to be backed up.
   If YES: don't do anything; wait and retry with the configured count/frequency.
   If NO: get the metadata of the table and create the DDL statement for
Hive, including table/partitions etc.

For the 2nd task:

a. Check whether the table has any exclusive lock.
   If NOT: take a shared lock, start the copy, and release the shared lock
once done.
   If YES: wait and retry.

For the 3rd task, restoring (a HiveQL sketch of the locking steps follows
below):

a. Only data: check if there is any lock on the table.
   If NO: take the exclusive lock, insert the data into the table, and
release the lock.
   If YES: wait and retry.

b. Schema and data: check if there is any lock on the table/partition.
   If NO: drop and create the table/partitions.
   If YES: wait and retry.
   Once the schema is created: take the exclusive lock, insert the data, and
release the lock.
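
For illustration, the shared-lock step in the 2nd task could be written with
the explicit HiveQL statements documented on the Locking wiki page referenced
below (the table name is a placeholder; this assumes hive.support.concurrency
is enabled with the ZooKeeper lock manager):

SHOW LOCKS backup_me;          -- inspect existing locks first
LOCK TABLE backup_me SHARED;   -- readers allowed, writers blocked during the copy
-- ... copy the table's files to S3/HDFS/NFS here ...
UNLOCK TABLE backup_me;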


Now, I'm going to run this kind of job from my scheduler/WF engine.
I need input on the following questions:

a. Does this overall approach look good?
b. How can I take and release the different locks explicitly using the Hive API?
ref: https://cwiki.apache.org/confluence/display/Hive/Locking

If I understood correctly, as per this, Hive still doesn't support locking
explicitly at the API level.
Is there any plan or patch to get this done?

I saw some classes like ZooKeeperHiveLock etc., but I need to dig further to
see if I can use these classes for the locking features.
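
Until such an API exists, one workaround sketch is to drive the explicit lock
statements through the Hive JDBC driver (the connection URL and table name are
placeholders; this assumes the HiveServer1-era driver):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveLockSketch {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = conn.createStatement();
    stmt.execute("LOCK TABLE backup_me SHARED"); // take the shared lock
    try {
      // ... copy the table's files to the backup location here ...
    } finally {
      stmt.execute("UNLOCK TABLE backup_me"); // always release the lock
      conn.close();
    }
  }
}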

Thanks for your time and effort.

Regards,
Manish