SAS-->Hive integration

2013-08-28 Thread Sanjay Subramanian
Hi guys

Has anyone tried SAS-->Hive integration successfully?

I tried a simple query from SAS (select col1 from table1 limit 10) and it
opened 3 connections to the Hive server and killed it !!! :-(

I will set up a dev environment for SAS and Hive to test all this.

But I was wondering if you guys had any clues? Any thoughts?

sanjay
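
One way to take SAS out of the picture, assuming the endpoint is HiveServer2
(the thread does not say which server is running), is to issue the same query
over a single, controlled connection with an independent client. This sketch
uses PyHive, which is not mentioned in the thread; host, user, and table names
are placeholders:

  # single-connection sanity check against HiveServer2 (assumed)
  from pyhive import hive

  conn = hive.connect(host='hive-server.example.com', port=10000,
                      username='sanjay', database='default')
  cur = conn.cursor()
  cur.execute('SELECT col1 FROM table1 LIMIT 10')
  for row in cur.fetchall():
      print(row)
  cur.close()
  conn.close()

If this runs cleanly while SAS still kills the server, the problem is more
likely in how the SAS side manages its connections than in Hive itself.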



Re: hcatalog takes minutes talking to mysql metadata

2013-08-28 Thread Eugene Koifman
Perhaps HIVE-4914 is relevant.


On Wed, Aug 28, 2013 at 3:11 AM, Michał Czerwiński  wrote:

> Also worth mentioning: I have tried running the 0.4.0-cdh4.3.0-SNAPSHOT
> jars (from
> https://repository.cloudera.com/content/groups/public/org/apache/hcatalog/hcatalog-core/)
> and hit exactly the same issue. That could indicate that the problem lies
> in the hive-metastore component itself and the way it interacts with the
> backing database. Thoughts?
>
>
> On 27 August 2013 18:41, Michał Czerwiński wrote:
>
>> In Pig I am running a query like this:
>>
>> sdp1 = load 'db1.table1' using org.apache.hcatalog.pig.HCatLoader;
>> sdp = FILTER sdp1 BY key1=='value1' AND key2=='value2';
>> ll = LIMIT sdp 100;
>> dump ll;
>>
>> and HCatalog then spends a few minutes talking to MySQL for metadata;
>> meanwhile, after a few seconds, Pig fails with:
>> org.apache.thrift.transport.TTransportException:
>> java.net.SocketTimeoutException: Read timed out
>>
>> Number of partitions I have:
>> hive -e 'use db1; show partitions table1' |wc -l
>> Time taken: 1.467 seconds
>> 37748
>>
>> When I run the same query in a different environment, where I have only
>> ~1000 partitions, everything works fine.
>>
>> The problem also does not exist on CDH3 with hcatalog-0.4.0.
>>
>> In HCatalog's logs I can see the following (note the timestamps; I ran
>> the query at 17:10:45,216):
>>
>> 2013-08-27 17:10:46,275 INFO  DataNucleus.MetaData
>> (Log4JLogger.java:info(77)) - Listener found initialisation for persistable
>> class org.apache.hadoop.hive.metastore.model.MPartition
>>
>> 2013-08-27 17:14:23,661 DEBUG metastore.ObjectStore
>> (ObjectStore.java:listMPartitionsByFilter(1832)) - Done retrieving all
>> objects for listMPartitionsByFilter
>>
>> 2013-08-27 17:22:32,410 INFO  metastore.ObjectStore
>> (ObjectStore.java:getPartitionsByFilter(1699)) - # parts after pruning =
>> 37748
>>
>> After that, HCatalog continues with:
>> 2013-08-27 17:30:14,631 DEBUG DataNucleus.Transaction
>> (Log4JLogger.java:debug(58)) - Transaction committed in 462221 ms
>>
>> Note that I have DataNucleus logging set to DEBUG, which slows things
>> down significantly; even without it, HCatalog still takes around 7
>> minutes to settle.
>>
>> These are the DataNucleus settings from HCatalog's logs:
>>
>>  datanucleus.autoStartMechanismMode = checked
>>  javax.jdo.option.Multithreaded = true
>>  datanucleus.identifierFactory = datanucleus
>>  datanucleus.transactionIsolation = read
>>  datanucleus.validateTables = false
>>  javax.jdo.option.ConnectionURL = jdbc:mysql://XXX
>>  javax.jdo.option.DetachAllOnCommit = true
>>  javax.jdo.option.NonTransactionalRead = true
>>  datanucleus.validateConstraints = false
>>  javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
>>  javax.jdo.option.ConnectionUserName = hive
>>  datanucleus.validateColumns = false
>>  datanucleus.cache.level2 = false
>>  datanucleus.plugin.pluginRegistryBundleCheck = LOG
>>  datanucleus.cache.level2.type = none
>>  javax.jdo.PersistenceManagerFactoryClass =
>> org.datanucleus.jdo.JDOPersistenceManagerFactory
>>  datanucleus.autoCreateSchema = true
>>  datanucleus.storeManagerType = rdbms
>>  datanucleus.connectionPoolingType = DBCP
>>
>> This runs on CDH 4.3.0.
>> hcatalog version: 0.5.0+9-1.cdh4.3.0.p0.12~precise-cdh4.3.0
>>
>> Ideas?
>>
>
>
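
The log lines above suggest the raw SQL is not the bottleneck: the CLI lists
all 37748 partition names in about 1.5 seconds, while listMPartitionsByFilter
and the commit take minutes, so the time appears to go into DataNucleus
materialising 37748 MPartition objects rather than into MySQL itself. One
hedged way to confirm that split is to time the equivalent count directly
against the metastore database (DBS/TBLS/PARTITIONS are the stock Hive
metastore schema; connection details are placeholders):

  # time the raw partition count straight against the metastore's MySQL,
  # bypassing DataNucleus entirely
  import time
  import MySQLdb  # assumes the MySQL-python client is installed

  db = MySQLdb.connect(host='XXX', user='hive', passwd='secret', db='metastore')
  cur = db.cursor()
  start = time.time()
  cur.execute("""
      SELECT COUNT(*)
      FROM PARTITIONS p
      JOIN TBLS t ON p.TBL_ID = t.TBL_ID
      JOIN DBS  d ON t.DB_ID  = d.DB_ID
      WHERE d.NAME = 'db1' AND t.TBL_NAME = 'table1'
  """)
  print('%d partitions, raw SQL took %.3fs' % (cur.fetchone()[0],
                                               time.time() - start))

If the raw query returns in seconds, raising
hive.metastore.client.socket.timeout on the client may at least stop Pig's
"Read timed out" while a metastore-side fix is sorted out.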



The role of servers in HIVE

2013-08-28 Thread Jay Vyas
It's clear that we can run HiveQL scripts without a running server. I'm
wondering, however: what are the requirements that the server fulfills?

For one, I have found this slide deck, which is useful:

https://cwiki.apache.org/confluence/download/attachments/27362054/Hive-client-server-deployment-modes.pdf?version=1&modificationDate=1360856723000

And I've asked a similar question on Stack Overflow that could use some
comments:

http://stackoverflow.com/questions/18477440/why-does-hive-depend-on-a-server-url-and-hbase
-- 
Jay Vyas
http://jayunit100.blogspot.com
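
A hedged illustration of the split (not taken from the slides themselves):
the Hive CLI embeds the driver, the compiler and, by default, the metastore
in its own JVM, so a script needs no server at all; what a server
(HiveServer or HiveServer2) adds is a network endpoint that remote
JDBC/ODBC/Thrift clients can share. A remote client like the following can
therefore only work when a server process is listening (host and port are
placeholders; PyHive assumes HiveServer2):

  from pyhive import hive

  # with no server running this connect() fails; the CLI, by contrast,
  # would happily run the same statement entirely in-process
  conn = hive.connect(host='hiveserver.example.com', port=10000)
  cur = conn.cursor()
  cur.execute('SHOW TABLES')
  print(cur.fetchall())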


File does not exist in Hive Job

2013-08-28 Thread Punnoose, Roshan
Hi, 

I am using Hive 0.11 with Hadoop 2.1.0-beta1, and when running an INSERT
OVERWRITE (with my own SerDe) I get the exception below in a lot of my
mappers. Any ideas? I even set hive.exec.parallel=false, with no luck.

Caused by: org.apache.hadoop.ipc.RemoteException
(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on
/tmp/hive-hdfs/hive_2013-08-27_22-03-00_305_6489408868004948109/_task_tmp.-ext-10002/_tmp.01_0:
File does not exist. Holder
DFSClient_attempt_1377640952593_0001_m_01_0_-1310282481_1 does not have any
open files

Punnoose, Roshan
rashan.punnr...@merck.com
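
A hedged guess, not confirmed anywhere in this thread: this shape of
LeaseExpiredException (a task holding a lease on a _tmp scratch file that no
longer exists) often means two attempts of the same task raced on the same
scratch path, e.g. under speculative execution, where one attempt's cleanup
removes the file out from under the other. Worth ruling out before
suspecting the SerDe:

  set mapreduce.map.speculative=false;
  set mapreduce.reduce.speculative=false;

(On MR1 the equivalent properties are mapred.map.tasks.speculative.execution
and mapred.reduce.tasks.speculative.execution.)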





