RE: difference between add jar in hive session and hive --auxpath

2013-03-07 Thread Venkatesh Kavuluri
If properly done, "add jar" should work the same as passing the jar 
with --auxpath. Can you run the "list jars;" command from the CLI or Hue and check 
whether you see the jar file?

From: java8...@hotmail.com
To: user@hive.apache.org
Subject: difference between add jar in hive session and hive --auxpath
Date: Thu, 7 Mar 2013 17:47:26 -0500





Hi, 
I have a hive table which uses the jar file provided from the elephant-bird, 
which is a framework integrated between lzo and google protobuf data and 
hadoop/hive.
If I use the hive command like this:
hive --auxpath path_to_jars
it works fine to query my table, but if I use "add jar" after I have started the 
hive session, I get a ClassNotFoundException at query runtime for the classes in 
those jars.
My questions are:
1) What is the difference between hive --auxpath and "add jar" in the hive session?
2) This problem makes it hard to access my table from Hue, as Hue only supports 
"add jar", not the --auxpath option. Any suggestions?

Thanks
Yong
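For reference, a minimal sketch of the two registration styles discussed in this thread (the jar path and script file name are hypothetical):

```shell
# The two ways to put elephant-bird's jars on Hive's classpath.
#
#   1) before startup, for the whole session from the beginning:
#        hive --auxpath /opt/lib/elephant-bird.jar
#   2) inside a running session (the only form Hue supports):
#        hive> add jar /opt/lib/elephant-bird.jar;
#        hive> list jars;
#
# Non-interactively, the session-scoped form can be fed to hive as a script file:
JAR=/opt/lib/elephant-bird.jar
printf 'add jar %s;\nlist jars;\n' "$JAR" > /tmp/register_jars.q
cat /tmp/register_jars.q
# then:  hive -f /tmp/register_jars.q
```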
  

RE: Partition not displaying in the browser

2013-03-07 Thread Venkatesh Kavuluri
The partition info you see from 'show partitions' is fetched from the Hive 
metadata tables. The reason you are not seeing the path you are expecting might 
be that either:
1) the path got deleted after the data load (do a simple select and verify you 
see some data), or
2) you have loaded the data from some other path into this partition.
-Venkatesh

Date: Fri, 8 Mar 2013 00:54:17 +0800
From: saigr...@yahoo.in
Subject: Re: Partition not displaying in the browser
To: user@hive.apache.org

I get this output for:

hive> show partitions dividends;
OK
partition
exchange=NASDAQ/symbol=AAPL
exchange=NASDAQ/symbol=INTC
Time taken: 0.133 seconds

But when I navigate to the folder in my browser, the partition INTC is not 
displayed even after refreshing it a bunch of times. Any suggestions will be 
appreciated:

Contents of directory /home/satish/data/dividends/input/plain-text/NASDAQ:

Name  Type  Modification Time  Permission  Owner   Group
AAPL  dir   2013-03-07 08:46   rwxr-xr-x   satish  supergroup
Any suggestions will be appreciated.
Thanks
Sai
  

RE: Getting Error while executing "show partitions TABLE_NAME"

2013-02-06 Thread Venkatesh Kavuluri
Looks like it's a memory/disk space issue with the database server used to 
store the Hive metadata. Can you check the disk usage of the /tmp directory 
(the data directory of the DB server)?
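Errcode 28 in the message below maps to the OS errno ENOSPC ("No space left on device"). A quick way to confirm the diagnosis on the DB host, sketched here assuming MySQL's tmpdir is the default /tmp:

```shell
# MySQL could not write its temporary sort file (/tmp/MY0TOZFT) while answering
# the metastore query. Check how full the filesystem holding /tmp is:
usage=$(df -P /tmp | awk 'NR==2 {print $5}')
echo "/tmp filesystem usage: ${usage}"
# MySQL's perror utility decodes the code, if installed:
#   perror 28   ->  OS error code 28:  No space left on device
```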

Date: Wed, 6 Feb 2013 18:34:31 +0530
Subject: Getting Error while executing "show partitions TABLE_NAME"
From: chunky.gu...@vizury.com
To: user@hive.apache.org

Hi All,

I ran this :-
hive> show partitions tab_name;   

and got this error :-

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Error executing 
JDOQL query "SELECT `THIS`.`PART_NAME` AS NUCORDER0 FROM `PARTITIONS` `THIS` 
LEFT OUTER JOIN `TBLS` `THIS_TABLE_DATABASE` ON `THIS`.`TBL_ID` = 
`THIS_TABLE_DATABASE`.`TBL_ID` LEFT OUTER JOIN `DBS` 
`THIS_TABLE_DATABASE_DATABASE_NAME` ON `THIS_TABLE_DATABASE`.`DB_ID` = 
`THIS_TABLE_DATABASE_DATABASE_NAME`.`DB_ID` LEFT OUTER JOIN `TBLS` 
`THIS_TABLE_TABLE_NAME` ON `THIS`.`TBL_ID` = `THIS_TABLE_TABLE_NAME`.`TBL_ID` 
WHERE `THIS_TABLE_DATABASE_DATABASE_NAME`.`NAME` = ? AND 
`THIS_TABLE_TABLE_NAME`.`TBL_NAME` = ? ORDER BY NUCORDER0 " : Error writing 
file '/tmp/MY0TOZFT' (Errcode: 28).

NestedThrowables:
java.sql.SQLException: Error writing file '/tmp/MY0TOZFT' (Errcode: 28)
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask


Actually, yesterday we had fewer partitions, and today I added around 3000 
more partitions for my data, which is stored in S3 for Hive. I think this 
caused the above error, but I don't know how to solve it.


Please help me in this.

Thanks,
Chunky.
  

RE: Hive 0.7.1 with MySQL 5.5 as metastore

2012-11-05 Thread Venkatesh Kavuluri
Hi Mark,
I just started to restore the data to a separate MySQL 5.1 schema, will try to 
create a table and post back here.
I copied the error stack trace below.
Nov  5 22:24:02 127.0.0.1/127.0.0.1 local3:[ETLManager] ERROR [pool-2-thread-1] exec.MoveTask - Failed with exception Insert of object "org.apache.hadoop.hive.metastore.model.MStorageDescriptor@1db0454f" using statement "INSERT INTO `SDS` (`SD_ID`,`LOCATION`,`OUTPUT_FORMAT`,`IS_COMPRESSED`,`NUM_BUCKETS`,`INPUT_FORMAT`,`SERDE_ID`) VALUES (?,?,?,?,?,?,?)" failed : Duplicate entry '5152711' for key 'PRIMARY'
javax.jdo.JDODataStoreException: Insert of object "org.apache.hadoop.hive.metastore.model.MStorageDescriptor@1db0454f" using statement "INSERT INTO `SDS` (`SD_ID`,`LOCATION`,`OUTPUT_FORMAT`,`IS_COMPRESSED`,`NUM_BUCKETS`,`INPUT_FORMAT`,`SERDE_ID`) VALUES (?,?,?,?,?,?,?)" failed : Duplicate entry '5152711' for key 'PRIMARY'
    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
    at org.datanucleus.jdo.JDOTransaction.commit(JDOTransaction.java:132)
    at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:315)
    at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:172)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$29.run(HiveMetaStore.java:1687)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$29.run(HiveMetaStore.java:1684)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:307)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table(HiveMetaStore.java:1684)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:166)
    at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:354)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1194)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
Thanks,
Venkatesh
Date: Mon, 5 Nov 2012 17:17:46 -0800
Subject: Re: Hive 0.7.1 with MySQL 5.5 as metastore
From: grover.markgro...@gmail.com
To: user@hive.apache.org

Venkatesh,
What's the exact integrity constraint error you are seeing?
I'd be curious to see whether you still get the error if you restore the data 
from the mysqldump onto a separate schema/db on a MySQL 5.1 server.

Mark

On Mon, Nov 5, 2012 at 3:37 PM, Venkatesh Kavuluri  
wrote:




Sorry for the confusion, the problem is not with the MySQL version upgrade - I 
have indeed performed the upgrade by doing a mysqldump and restoring the data.
The problem is with how Hive 0.7.1 is interacting with the same metastore data 
on a different version of MySQL server.


> Date: Mon, 5 Nov 2012 18:31:37 -0500
> Subject: Re: Hive 0.7.1 with MySQL 5.5 as metastore
> From: edlinuxg...@gmail.com
> To: user@hive.apache.org

> 
> Moving underlying data files around is not the correct way to perform
> an upgrade.
> 
> https://dev.mysql.com/doc/refman/5.5/en/upgrading-from-previous-series.html

> 
> I would do a mysqldump and then re-insert the data for maximum compatibility.
> 
> On Mon, Nov 5, 2012 at 6:21 PM, Venkatesh Kavuluri
>  wrote:

> > I am working on copying existing Hive metadata (Hive 0.7.1 with MySQL 5.1)
> > to a new cluster environment (Hive 0.7.1 with MySQL 5.5). I copied over the
> > metastore tables and modified the data under SDS (sub-directories) table to

> > reflect the new data path. However I am getting MySQL integrity constraint
> > violation against SDS.SD_ID column while trying to create new Hive tables.
> > Is this a problem with the MySQL version I am using? Does Hive 0.7.1
> > support MySQL 5.5 as the metastore?
> >
> > Thanks,
> > Venkatesh
  

  

RE: Hive 0.7.1 with MySQL 5.5 as metastore

2012-11-05 Thread Venkatesh Kavuluri
Sorry for the confusion, the problem is not with the MySQL version upgrade - I 
have indeed performed the upgrade by doing a mysqldump and restoring the data.
The problem is with how Hive 0.7.1 is interacting with the same metastore data 
on a different version of MySQL server.

> Date: Mon, 5 Nov 2012 18:31:37 -0500
> Subject: Re: Hive 0.7.1 with MySQL 5.5 as metastore
> From: edlinuxg...@gmail.com
> To: user@hive.apache.org
> 
> Moving underlying data files around is not the correct way to perform
> an upgrade.
> 
> https://dev.mysql.com/doc/refman/5.5/en/upgrading-from-previous-series.html
> 
> I would do a mysqldump and then re-insert the data for maximum compatibility.
> 
> On Mon, Nov 5, 2012 at 6:21 PM, Venkatesh Kavuluri
>  wrote:
> > I am working on copying existing Hive metadata (Hive 0.7.1 with MySQL 5.1)
> > to a new cluster environment (Hive 0.7.1 with MySQL 5.5). I copied over the
> > metastore tables and modified the data under SDS (sub-directories) table to
> > reflect the new data path. However I am getting MySQL integrity constraint
> > violation against SDS.SD_ID column while trying to create new Hive tables.
> > Is this a problem with the MySQL version I am using? Does Hive 0.7.1
> > support MySQL 5.5 as the metastore?
> >
> > Thanks,
> > Venkatesh
  

Hive 0.7.1 with MySQL 5.5 as metastore

2012-11-05 Thread Venkatesh Kavuluri
I am working on copying existing Hive metadata (Hive 0.7.1 with MySQL 5.1) to a 
new cluster environment (Hive 0.7.1 with MySQL 5.5). I copied over the 
metastore tables and modified the data under the SDS (sub-directories) table to 
reflect the new data path. However, I am getting a MySQL integrity constraint 
violation against the SDS.SD_ID column while trying to create new Hive tables. 
Is this a problem with the MySQL version I am using? Does Hive 0.7.1 support 
MySQL 5.5 as the metastore?
Thanks,
Venkatesh

RE: Question about query result storage

2012-08-09 Thread Venkatesh Kavuluri
You can always do something like:
INSERT OVERWRITE LOCAL DIRECTORY '/path/' SELECT [] FROM []
which saves the result set to the given path.
Check the Hive wiki for more info:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted

> Date: Thu, 9 Aug 2012 17:42:17 -0400
> From: pipeha...@gmail.com
> To: user@hive.apache.org
> Subject: Re: Question about query result storage
> 
> Oh, actually it is
> hive -S -f some_query.q > some_query.log
> 
> On 08/09/2012 05:41 PM, Yue Guan wrote:
> > We always do something like this:
> > hive -f some_query.q > some_query.log
> >
> > If the output is large, just insert overwrite to some table.
> >
> > On 08/09/2012 05:32 PM, Tom Brown wrote:
> >> Team,
> >>
> >> I'm a new Hive user and I've just run my first large query (a few
> >> hours). Unfortunately, I ran it from the CLI, and the output was
> >> longer than my SSH client allowed for (scroll buffer) so I can't see
> >> the first 1/2 of the result. (It also changes tabs to spaces so
> >> properly aligning the columns of the result is difficult as well).
> >>
> >> When a query is run through the CLI, is the result stored anywhere
> >> (even temporarily)? I would love to not have to run the query again,
> >> but will if I have to.
> >>
> >> However, when I run the query again (from the CLI), I'm not sure how
> >> to correctly store the results. Should I just pipe all output of the
> >> CLI to a file and ignore the few hours of status updates? Can anyone
> >> give any suggestions on what the most appropriate way to accomplish
> >> this is?
> >>
> >> Thanks in advance.
> >>
> >> --Tom
> >
> 
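The `hive -S -f some_query.q > some_query.log` trick above works because Hive writes query results to stdout and progress chatter to stderr. A generic sketch of that stream split; Hive itself is not invoked here, and `run_query` is just a stand-in that prints to both streams:

```shell
# -S (silent) additionally suppresses extra messages, but even without it the
# result rows and the status updates travel on different streams, so they can
# be captured separately:
run_query() {
    printf 'col_a\tcol_b\n'                       # result row -> stdout
    echo "Total MapReduce jobs = 1" >&2           # progress   -> stderr
}
run_query > /tmp/results.tsv 2> /tmp/progress.log
cat /tmp/results.tsv
```

With a real session the same shape is `hive -f some_query.q > results.tsv 2> progress.log`, keeping the hours of status updates out of the result file.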
  

RE: Custom UserDefinedFunction in Hive

2012-08-08 Thread Venkatesh Kavuluri
Actually, as the custom UDF yesterday() mentioned below is NOT marked with the 
annotation @UDFType(deterministic = false), partition pruning should work in 
practice. The PartitionPruner has logic around this annotation to check whether 
a generic UDF is deterministic, and it skips partition pruning if it finds any 
non-deterministic function.
http://svn.apache.org/repos/asf/hive/branches/branch-0.9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
I guess this check is implemented to avoid inconsistencies in the result set 
that might arise in scenarios like the one below, where the predicate is: 
part_col = f(time) AND regular_col = f(time).
The expression involving "part_col" is evaluated at compile time, while the 
expression involving "regular_col" is evaluated at run time, so the function 
yesterday() might return different values if the query is executed around 
midnight.
Thanks,
Venkatesh

Date: Wed, 8 Aug 2012 03:49:56 -0700
From: bejoy...@yahoo.com
Subject: Re: Custom UserDefinedFunction in Hive
To: user@hive.apache.org

Hi Raihan
UDFs are evaluated at run time, when the query is executed. But it is the Hive 
parser that, at query parse time, decides the boundary of the data to be used 
for the query, i.e. which partitions have to be processed. Because of this, the 
entire table will be scanned for your query.
Regards,
Bejoy KS
From: Raihan Jamal 
 To: user@hive.apache.org 
Cc: d...@hive.apache.org 
 Sent: Tuesday, August 7, 2012 10:50 PM
 Subject: Re: Custom UserDefinedFunction in Hive
   

Hi Jan,

 

I figured that out; it is working fine for me now. The only question I have is: 
if I am doing it like this

SELECT * FROM REALTIME where dt = yesterdaydate('yyyyMMdd') LIMIT 10;

then the above query will be evaluated as below, right?

SELECT * FROM REALTIME where dt = '20120806' LIMIT 10;

So that means it will look for data only in the corresponding dt partition 
(20120806), as the above table is partitioned on the dt column, and it will 
not scan the whole table, right?


Raihan Jamal





On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár  wrote:


Hi Jamal,
Check that the function really returns what it should and that your data really 
are in yyyyMMdd format. You can do this with a simple query like:

SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;

I don't see anything wrong with the function itself; it works well for me 
(although I tested it in Hive 0.7.1). The only thing I would change would be to 
optimize it by calling 'new' only at construction time and reusing the object 
when the function is called, but that should not affect the functionality at all.
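Jan's sanity check can also be cross-checked from outside Hive. A sketch assuming GNU date is available on the gateway host (the format string and table name come from the thread):

```shell
# Compute, in the shell, the value yesterdaydate('yyyyMMdd') should produce,
# then compare it against the dt values the table actually holds:
expected=$(date -d "yesterday" +%Y%m%d)
echo "expected dt value: ${expected}"
# compare with:  hive> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
```

If the two disagree (for example, the table's dt values carry dashes), the partition predicate will silently match nothing, which would explain the empty result.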




Best regards,

Jan





On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal  wrote:




Problem

I created the below UserDefinedFunction to get yesterday's date in the format I 
want, as I will be passing the format into this method from the query.

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import org.apache.hadoop.hive.ql.exec.UDF;

public final class YesterdayDate extends UDF {

    public String evaluate(final String format) {
        DateFormat dateFormat = new SimpleDateFormat(format);
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1);
        return dateFormat.format(cal.getTime()).toString();
    }
}

 

 

So whenever I try to run a query like the one below, after adding the jar to 
the classpath and creating the temporary function yesterdaydate, I always get 
zero results back:

hive> create temporary function yesterdaydate as 'com.example.hive.udf.YesterdayDate';
OK
Time taken: 0.512 seconds

Below is the query I am running:

hive> SELECT * FROM REALTIME where dt = yesterdaydate('yyyyMMdd') LIMIT 10;
OK

And I always get zero results back, but the data is there in that table for 
Aug 5th.

What am I doing wrong? Any suggestions will be appreciated.

NOTE: As I am working with Hive 0.6, which doesn't support variable 
substitution, I cannot use hiveconf here; the above table is partitioned on 
the dt (date) column.







  

RE: (Get the current date -1) in Hive

2012-08-07 Thread Venkatesh Kavuluri
Since the UDF unix_timestamp() is a non-deterministic function, the Hive query 
planner doesn't run partition pruning based on the 'dt' column value. If your 
table is partitioned by the 'dt' column, the query will end up scanning the 
entire table.
It is best to compute the required date value dynamically in a shell script if 
you are working against partitioned columns. If your version of Hive doesn't 
support variable substitution based on hiveconf, you can achieve the same as 
shown below:
hive -e "SELECT  FROM  where dt=$date_var;"
Thanks,
Venkatesh
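The shell-side substitution described above can be sketched like this (GNU date assumed; the table and column names follow the thread):

```shell
# Compute yesterday's date in the shell, where the value is a plain constant,
# then splice it into the query text so the planner can prune partitions:
date_var=$(date -d "yesterday" +%Y-%m-%d)
query="SELECT * FROM lip_data_quality WHERE dt = '${date_var}';"
echo "$query"
# then:  hive -e "$query"
```

On Hive versions that do support hiveconf substitution, the same idea is usually written as `hive -hiveconf date_var="$date_var" -e '... WHERE dt = "${hiveconf:date_var}"'`.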

From: carla.stae...@nokia.com
To: user@hive.apache.org
Subject: RE: (Get the current date -1) in Hive
Date: Mon, 6 Aug 2012 23:26:46 +







In the case here it literally is taking the UNIX timestamp, formatting it in 
yyyy-MM-dd format, and then subtracting the specified integer (in this case 1).

Sent from my Lumia 900




From: ext Techy Teck
Sent: 8/6/2012 3:37 PM
To: user@hive.apache.org
Subject: Re: (Get the current date -1) in Hive



Thanks, Carla, for the suggestion. I am currently using Hive 0.6, and that Hive 
version doesn't support variable substitution with hiveconf variables, so that 
is the reason I was looking for some other alternative.



So you are saying, basically, that if I add your suggestion to my query like 
below:

select * from lip_data_quality where dt = 
'date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') , 1)';

then the above query will be interpreted as below:

select * from lip_data_quality where dt = '2012-08-05';

Am I right? And what does date_sub do here? I am not familiar with that.

Correct me if I am wrong.


On Mon, Aug 6, 2012 at 12:23 PM,  wrote:


If you are just using it in a query, you can do this:

date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd') , 1)



I generally do my date calculations in a shell script and pass them in with a 
hiveconf variable.



Carla




-Original Message-

From: ext Yue Guan [mailto:pipeha...@gmail.com]

Sent: Monday, August 06, 2012 15:00

To: user@hive.apache.org

Subject: Re: (Get the current date -1) in Hive



Guess you can use date_sub, but you have to get today's date from some outside script.



On 08/06/2012 02:10 PM, Techy Teck wrote:

> Is there any way to get the current date -1 in Hive, meaning yesterday's
> date, always?