Using Hive metastore as general purpose RDBMS

2014-01-22 Thread Petter von Dolwitz (Hem)
Hi,

I have a case where I would like to extend Hive to use information from a
regular RDBMS. To limit the complexity of the installation I thought I
could piggyback on the already existing metastore.

As I understand it, HCatalog is not built for this purpose. Is there
someone out there that has a similar use case or has any input on how this
is done, or on whether it should be avoided?

The use case is to look up which partitions contain certain data.

Thanks,
Petter


HiveServer2 Logging

2014-01-22 Thread Dima Machlin
Hey,
We are considering starting to use HiveServer2, and it seems that its logging 
capabilities are really limited because the server runs on a single port.
I'll explain the problem more precisely:
Today we are using HiveServer, where each user gets a specific port (or a set 
of ports). We then start X hiveservers, one for each port, in the following manner:

nohup hive --service hiveserver -p $2 -v --hiveconf 
hive.root.logger=INFO,console > ~/hive.$2 2>&1 &
For example:
nohup hive --service hiveserver -p 12354 -v --hiveconf 
hive.root.logger=INFO,console > ~/hive.12354 2>&1 &

This way, hiveserver runs in the background; the logging is redirected to the 
console and from there into a file.
So each port has its own log file, and assuming there are no concurrent executions 
on the same port, each file is a sequential execution log.

This is really convenient for debugging executions on the HiveServer, as the full 
stack trace isn't passed through JDBC and mostly we just get a MapredTask error.
The full errors are found in the logs, and debugging is easy.

Now with HiveServer2, I couldn't find any way to separate the logging per 
session.
In fact, I see that there is a JIRA open for this issue:
https://issues.apache.org/jira/browse/HIVE-5924

Is there any workaround for this? I want each session/connection in its own 
log file; higher resolution is also acceptable (like a log per query).
In fact, looking at hive-exec-log4j.properties it seems that logging is per 
query:
hive.log.file=${hive.query.id}.log
But this is a different log, full of counter data / task progress, whereas I'm 
looking for the standard logging as seen in the CLI.
I think that is defined by hive.log.file in hive-log4j.properties, which is 
set to:
hive.log.file=hive.log
Setting it the same way as the previous one doesn't help, as ${hive.query.id} 
resolves to nothing.

Any advice appreciated.

---
Dima Machlin, Big Data Architect
Pursway.com (http://www.pursway.com/)


Re: Formatting hive queries

2014-01-22 Thread John Meagher
I use vim and https://github.com/vim-scripts/SQLUtilities to do it.
It's not hive specific.  Any SQL formatting tool will work.
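As a minimal, self-contained sketch of what such a utility could look like (a hypothetical keyword-only reflow, not a real SQL parser; it will happily mangle keywords inside string literals):

```python
import re

# Clauses that should start a new line. Multi-word clauses are matched
# with flexible internal whitespace.
CLAUSES = ["select", "from", "where", "group by", "order by",
           "having", "join", "limit"]

def format_hql(query):
    """Uppercase each major clause keyword and start it on its own line."""
    out = " ".join(query.split())  # collapse runs of whitespace
    pattern = r"(?i)\b(" + "|".join(
        kw.replace(" ", r"\s+") for kw in CLAUSES) + r")\b"
    out = re.sub(pattern,
                 lambda m: "\n" + " ".join(m.group(1).upper().split()),
                 out)
    return out.strip()

print(format_hql("select a, b from t where a > 1 group by a"))
```

A real tool would need to tokenize strings and comments first, but for quick reformatting of simple HQL this kind of pass goes a long way.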

On Tue, Jan 21, 2014 at 11:23 PM, pandees waran pande...@gmail.com wrote:
 Hi,

 I would like to come up with code that automatically formats your hql
 files.
 Formatting is a tedious task, and I would like to come up
 with a utility for it.
 Please let me know whether any specific utilities already exist for
 formatting hive queries.

 --
 Thanks,
 Pandeeswaran


Re: Using Hive metastore as general purpose RDBMS

2014-01-22 Thread Alan Gates
HCatalog is definitely not designed for this purpose.  Could you explain your 
use case more fully?  Is this indexing for better query planning or faster file 
access?  If so, you might look at some of the work going on in ORC, which is 
storing indices of its data in the format itself for these purposes.  Also, how 
much data do you need to store?  Even index size on Hadoop-scale data can 
quickly overwhelm MySQL or Postgres (which is what most people use for their 
metastores) if you are keeping per-row information.  If you truly want to 
access an RDBMS as if it were an external data store, you could implement a 
HiveStorageHandler for your RDBMS.

Alan.



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


HIVE SUB QUERY:: How to implement this case

2014-01-22 Thread yogesh dhari
Hello all,

I have a case where I need logic like this:

select as_of_dt as as_of_dt, max_feed_key as max_feed_key, min_feed_key as
min_feed_key from feed_key_temp where max_feed_key > ( select
max(feed_key) from summ_table ) group by as_of_dt ;


Here, max_feed_key and min_feed_key are the fields in table  feed_key_temp.


As Hive (version 0.9) does not support subqueries in the WHERE clause, please
suggest a workaround and how to implement it.
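One common rewrite for this situation is to move the subquery into the FROM clause and join against it. A sketch using the names above (hedged: depending on the exact Hive version you may need an explicit join condition or CROSS JOIN, and the non-aggregated columns have to appear in the GROUP BY):

```sql
-- Compute the scalar max in a derived table, join it in, then filter.
-- Assumes max_feed_key > (global max of summ_table) was the intent.
SELECT t.as_of_dt, t.max_feed_key, t.min_feed_key
FROM feed_key_temp t
JOIN (SELECT max(feed_key) AS max_fk FROM summ_table) m
WHERE t.max_feed_key > m.max_fk
GROUP BY t.as_of_dt, t.max_feed_key, t.min_feed_key;
```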


Thanks & Regards

Yogesh Kumar


Re: Using Hive metastore as general purpose RDBMS

2014-01-22 Thread Petter von Dolwitz (Hem)
Hi Alan,

thank you for your reply. The loose idea I had was to store one row in the
RDBMS per Hive partition, so I don't think size will be an issue
(expecting 3000 partitions or so). The end goal was to help decide which
partitions are relevant for a query. Something like adding partition
info to the WHERE clause behind the scenes. The way the data is structured,
we currently need to look up which partitions to use elsewhere.
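A sketch of the idea with a hypothetical partitioned table (`events`, partition column `dt`): the partition list comes from the side RDBMS and is injected into the predicate, so Hive prunes every other partition:

```sql
-- Query as the user writes it:
--   SELECT count(*) FROM events WHERE event_type = 'click';
-- Query after the behind-the-scenes rewrite (IN-list fetched from the RDBMS):
SELECT count(*)
FROM events
WHERE event_type = 'click'
  AND dt IN ('2014-01-20', '2014-01-21');
```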

I'll look into ORC for sure. Currently we do not use any of the provided
file formats but have implemented our own InputFormat that reads gzipped
protobufs. I suspect we should later investigate the possible
performance gain from moving to another file format.

Petter





DESCRIBE EXTENDED show numRows=0

2014-01-22 Thread Haroon Muhammad
Hi,

I have observed that DESCRIBE EXTENDED always shows the number of rows as
zero despite the fact that the table has data. Is it a bug? Is it known?
Has anyone else come across the same?
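(For context: numRows comes from table statistics, and in stock Hive it generally stays 0 until statistics are gathered, either automatically via hive.stats.autogather during inserts or explicitly. A hedged example with a hypothetical table name:)

```sql
-- Compute basic stats so numRows gets populated
ANALYZE TABLE my_table COMPUTE STATISTICS;
-- For a partitioned table, per partition:
ANALYZE TABLE my_table PARTITION (dt='2014-01-22') COMPUTE STATISTICS;
DESCRIBE EXTENDED my_table;
```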


Thanks,   

Hive 0.12.0 mysql metastore exception

2014-01-22 Thread Jander g
Hi, guys

I upgraded Hive 0.7.0 to Hive 0.12.0 recently. We resolved some problems, but
the one below puzzles me.

We have thousands of jobs every day, and about 40 of them hit this error. I
increased the MySQL max connections from 2000 to 4000, and I found the
historical max connection count was less than 2000.

Any suggestions will be greatly appreciated. Thanks in advance.


2014-01-23 06:35:47,724 WARN  bonecp.BoneCPConfig (BoneCPConfig.java:sanitize(1537)) - Max Connections < 1. Setting to 20
2014-01-23 06:35:48,433 ERROR metastore.RetryingRawStore (RetryingRawStore.java:invoke(146)) - JDO datastore error. Retrying metastore command after 1000 ms (attempt 1 of 1)
2014-01-23 06:35:49,467 ERROR exec.DDLTask (DDLTask.java:execute(435)) - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1143)
    at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
    at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
    at org.apache.hadoop.hive.cli.NewCliDriver.processCmd(NewCliDriver.java:166)
    at org.apache.hadoop.hive.cli.NewCliDriver.processLine(NewCliDriver.java:243)
    at org.apache.hadoop.hive.cli.NewCliDriver.main(NewCliDriver.java:427)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:286)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
    ... 27 more
Caused by: java.sql.BatchUpdateException: Duplicate entry 'default' for key 2
    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2020)
    at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1451)
    at com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
    at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
    at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
    at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:596)
    at org.datanucleus.store.rdbms.SQLController$1.transactionFlushed(SQLController.java:683)
    at org.datanucleus.store.connection.AbstractManagedConnection.transactionFlushed(AbstractManagedConnection.java:86)
    at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:454)
    at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:199)
    at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:263)
    at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:98)
    ... 43 more

2014-01-23 06:35:49,468 ERROR ql.Driver (SessionState.java:printError(436)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient


-- 
Thanks,
Jander


Re: Hive 0.12.0 mysql metastore exception

2014-01-22 Thread Jov
2014/1/23 Jander g jande...@gmail.com

 Caused by: java.sql.BatchUpdateException: Duplicate entry 'default' for
 key 2


What HQL did you run? It looks like Hive tried to insert 'default' into a
metastore table, violating a unique key.


Jov
blog: http://amutu.com/blog


Re: Hive 0.12.0 mysql metastore exception

2014-01-22 Thread Jander g
Hi, jov

Thanks for your attention.

We use Hive like this:
hive -e use abc; insert overwrite .
Here, abc is a Hive schema that already exists in the Hive metastore. But
from the log we can see that Hive's DDL database switch failed.
That is to say, does it not read the existing schema?

2014-01-23 06:35:49,467 ERROR exec.DDLTask (DDLTask.java:execute(435)) - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1143)
    at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
    at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)



Best Regards,




-- 
Thanks,
Jander