SAS-->Hive integration
Hi guys, has anyone tried SAS-->Hive integration successfully? I tried a simple query from SAS (select col1 from table1 limit 10) and it opened 3 connections to hive-server and killed it! :-( I will set up a dev environment for SAS and Hive to test all this, but I was wondering if you guys had any clues. Any thoughts?

sanjay
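One way to pin down whether SAS really opens multiple connections is to watch established connections to the Hive server port while the query runs (e.g. by filtering `netstat -an` output). A minimal sketch, assuming the default HiveServer Thrift port 10000; the sample netstat lines below are made up for illustration:

```python
def count_established(netstat_output: str, port: int = 10000) -> int:
    """Count ESTABLISHED connections whose local endpoint is the given port."""
    count = 0
    for line in netstat_output.splitlines():
        parts = line.split()
        if len(parts) >= 6 and parts[-1] == "ESTABLISHED":
            local = parts[3]
            # Linux netstat uses host:port; BSD/macOS uses host.port
            if local.endswith(f":{port}") or local.endswith(f".{port}"):
                count += 1
    return count

# Hypothetical netstat -an sample: three clients attached to the Hive port
sample = """\
tcp  0  0 10.0.0.5:10000  10.0.0.9:52144  ESTABLISHED
tcp  0  0 10.0.0.5:10000  10.0.0.9:52145  ESTABLISHED
tcp  0  0 10.0.0.5:10000  10.0.0.9:52146  ESTABLISHED
tcp  0  0 10.0.0.5:22     10.0.0.9:50001  ESTABLISHED"""

print(count_established(sample))  # 3
```

Running this against real netstat output while the SAS query executes would show whether the "3 connections" come from SAS itself or from something else on the client side.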
Re: hcatalog takes minutes talking to mysql metadata
Perhaps HIVE-4914 is relevant.

On Wed, Aug 28, 2013 at 3:11 AM, Michał Czerwiński wrote:
> Also, what is worth mentioning: I have tried running the 0.4.0-cdh4.3.0-SNAPSHOT
> jars (from
> https://repository.cloudera.com/content/groups/public/org/apache/hcatalog/hcatalog-core/)
> with exactly the same issue. That could possibly indicate that the problem
> is related to the actual hive-metastore component and the way it interacts
> with the metastore. Thoughts?
>
> On 27 August 2013 18:41, Michał Czerwiński wrote:
>
>> In Pig I am doing a query like this:
>>
>> sdp1 = load 'db1.table1' using org.apache.hcatalog.pig.HCatLoader;
>> sdp = FILTER sdp1 BY key1=='value1' AND key2=='value2';
>> ll = LIMIT sdp 100;
>> dump ll;
>>
>> HCatalog then spends a few minutes talking to MySQL asking for metadata;
>> in the meantime, after a few seconds, Pig fails with:
>> org.apache.thrift.transport.TTransportException:
>> java.net.SocketTimeoutException: Read timed out
>>
>> Number of partitions I have:
>> hive -e 'use db1; show partitions table1' | wc -l
>> Time taken: 1.467 seconds
>> 37748
>>
>> When I run the same query in a different environment where I have only
>> ~1000 partitions, all works fine.
>>
>> The problem also does not exist on cdh3 with hcatalog-0.4.0.
>>
>> In hcatalog's logs I can see
>> (note the timestamps; I ran the query at 17:10:45,216):
>>
>> 2013-08-27 17:10:46,275 INFO DataNucleus.MetaData
>> (Log4JLogger.java:info(77)) - Listener found initialisation for persistable
>> class org.apache.hadoop.hive.metastore.model.MPartition
>>
>> 2013-08-27 17:14:23,661 DEBUG metastore.ObjectStore
>> (ObjectStore.java:listMPartitionsByFilter(1832)) - Done retrieving all
>> objects for listMPartitionsByFilter
>>
>> 2013-08-27 17:22:32,410 INFO metastore.ObjectStore
>> (ObjectStore.java:getPartitionsByFilter(1699)) - # parts after pruning =
>> 37748
>>
>> After that, hcatalog continues to:
>> 2013-08-27 17:30:14,631 DEBUG DataNucleus.Transaction
>> (Log4JLogger.java:debug(58)) - Transaction committed in 462221 ms
>>
>> Please note that I have DataNucleus logging set to DEBUG and that slows
>> things down significantly; without it, it still takes around 7 minutes for
>> hcatalog to settle.
>>
>> Also, the datanucleus settings from hcatalog's logs:
>>
>> datanucleus.autoStartMechanismMode = checked
>> javax.jdo.option.Multithreaded = true
>> datanucleus.identifierFactory = datanucleus
>> datanucleus.transactionIsolation = read
>> datanucleus.validateTables = false
>> javax.jdo.option.ConnectionURL = jdbc:mysql://XXX
>> javax.jdo.option.DetachAllOnCommit = true
>> javax.jdo.option.NonTransactionalRead = true
>> datanucleus.validateConstraints = false
>> javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
>> javax.jdo.option.ConnectionUserName = hive
>> datanucleus.validateColumns = false
>> datanucleus.cache.level2 = false
>> datanucleus.plugin.pluginRegistryBundleCheck = LOG
>> datanucleus.cache.level2.type = none
>> javax.jdo.PersistenceManagerFactoryClass =
>> org.datanucleus.jdo.JDOPersistenceManagerFactory
>> datanucleus.autoCreateSchema = true
>> datanucleus.storeManagerType = rdbms
>> datanucleus.connectionPoolingType = DBCP
>>
>> This runs on CDH4 4.3.0
>> hcatalog version:
>> 0.5.0+9-1.cdh4.3.0.p0.12~precise-cdh4.3.0
>>
>> Ideas?
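The quoted log timestamps are internally consistent with the reported commit time; here is a quick sanity check (timestamps copied verbatim from the log excerpt above, plain Python, nothing Hive-specific):

```python
from datetime import datetime

# Timestamps copied from the metastore log excerpt above
fmt = "%Y-%m-%d %H:%M:%S,%f"
start = datetime.strptime("2013-08-27 17:22:32,410", fmt)  # getPartitionsByFilter finished
end = datetime.strptime("2013-08-27 17:30:14,631", fmt)    # transaction committed

elapsed_ms = int((end - start).total_seconds() * 1000)
print(elapsed_ms)  # 462221, matching "Transaction committed in 462221 ms"
```

So nearly eight minutes of the wall-clock time sits between the last partition fetch and the transaction commit, on top of the earlier four-minute listMPartitionsByFilter phase.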
The role of servers in HIVE
It's clear that we can run Hive QL scripts without a running server. I'm wondering, however: what are the requirements that the server fulfills?

For one, I have found this slide deck, which is useful:
https://cwiki.apache.org/confluence/download/attachments/27362054/Hive-client-server-deployment-modes.pdf?version=1&modificationDate=1360856723000

And I've asked a similar question on Stack Overflow that could use some comments:
http://stackoverflow.com/questions/18477440/why-does-hive-depend-on-a-server-url-and-hbase

--
Jay Vyas
http://jayunit100.blogspot.com
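Not a full answer, but one concrete "server URL" Hive can depend on in remote deployments is the metastore Thrift URI configured via hive.metastore.uris (9083 is the conventional metastore port; the host name below is made up). The URL encodes nothing more than where the Thrift metastore service listens, as a tiny sketch shows:

```python
from urllib.parse import urlparse

def parse_hive_uri(uri: str):
    """Split a metastore Thrift URI into (host, port)."""
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port

# Hypothetical value of hive.metastore.uris in a remote-metastore setup
print(parse_hive_uri("thrift://metastore-host:9083"))  # ('metastore-host', 9083)
```

When no such URI is configured, the CLI falls back to an embedded metastore in the same JVM, which is why scripts can run without any server at all.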
File does not exist in Hive Job
Hi,

I am using Hive 0.11 with Hadoop 2.1.0-beta1, and on running an INSERT OVERWRITE (with my own SerDe) I get the exception below in a lot of my mappers. Any ideas? I even set hive.exec.parallel=false, with no luck.

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-hdfs/hive_2013-08-27_22-03-00_305_6489408868004948109/_task_tmp.-ext-10002/_tmp.01_0: File does not exist. Holder DFSClient_attempt_1377640952593_0001_m_01_0_-1310282481_1 does not have any open files

Punnoose, Roshan
rashan.punnr...@merck.com
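Not an answer from the thread, but a commonly suggested thing to try for LeaseExpiredException on _task_tmp files is disabling speculative execution, since a killed duplicate attempt can remove the temp file out from under the surviving attempt (this is a guess at the cause, not something confirmed here). With the old MR1 property names, the session settings would look like:

```sql
set mapred.map.tasks.speculative.execution=false;
set mapred.reduce.tasks.speculative.execution=false;
```

On MR2/YARN the equivalent Hadoop property names are mapreduce.map.speculative and mapreduce.reduce.speculative.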
Re: hcatalog takes minutes talking to mysql metadata
Also, what is worth mentioning: I have tried running the 0.4.0-cdh4.3.0-SNAPSHOT jars (from https://repository.cloudera.com/content/groups/public/org/apache/hcatalog/hcatalog-core/) with exactly the same issue. That could possibly indicate that the problem is related to the actual hive-metastore component and the way it interacts with the metastore. Thoughts?

On 27 August 2013 18:41, Michał Czerwiński wrote:
> In Pig I am doing a query like this:
>
> sdp1 = load 'db1.table1' using org.apache.hcatalog.pig.HCatLoader;
> sdp = FILTER sdp1 BY key1=='value1' AND key2=='value2';
> ll = LIMIT sdp 100;
> dump ll;
>
> HCatalog then spends a few minutes talking to MySQL asking for metadata;
> in the meantime, after a few seconds, Pig fails with:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
>
> Number of partitions I have:
> hive -e 'use db1; show partitions table1' | wc -l
> Time taken: 1.467 seconds
> 37748
>
> When I run the same query in a different environment where I have only
> ~1000 partitions, all works fine.
>
> The problem also does not exist on cdh3 with hcatalog-0.4.0.
>
> In hcatalog's logs I can see
> (note the timestamps; I ran the query at 17:10:45,216):
>
> 2013-08-27 17:10:46,275 INFO DataNucleus.MetaData
> (Log4JLogger.java:info(77)) - Listener found initialisation for persistable
> class org.apache.hadoop.hive.metastore.model.MPartition
>
> 2013-08-27 17:14:23,661 DEBUG metastore.ObjectStore
> (ObjectStore.java:listMPartitionsByFilter(1832)) - Done retrieving all
> objects for listMPartitionsByFilter
>
> 2013-08-27 17:22:32,410 INFO metastore.ObjectStore
> (ObjectStore.java:getPartitionsByFilter(1699)) - # parts after pruning =
> 37748
>
> After that, hcatalog continues to:
> 2013-08-27 17:30:14,631 DEBUG DataNucleus.Transaction
> (Log4JLogger.java:debug(58)) - Transaction committed in 462221 ms
>
> Please note that I have DataNucleus logging set to DEBUG and that slows
> things down significantly; without it, it still takes around 7 minutes for
> hcatalog to settle.
>
> Also, the datanucleus settings from hcatalog's logs:
>
> datanucleus.autoStartMechanismMode = checked
> javax.jdo.option.Multithreaded = true
> datanucleus.identifierFactory = datanucleus
> datanucleus.transactionIsolation = read
> datanucleus.validateTables = false
> javax.jdo.option.ConnectionURL = jdbc:mysql://XXX
> javax.jdo.option.DetachAllOnCommit = true
> javax.jdo.option.NonTransactionalRead = true
> datanucleus.validateConstraints = false
> javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
> javax.jdo.option.ConnectionUserName = hive
> datanucleus.validateColumns = false
> datanucleus.cache.level2 = false
> datanucleus.plugin.pluginRegistryBundleCheck = LOG
> datanucleus.cache.level2.type = none
> javax.jdo.PersistenceManagerFactoryClass =
> org.datanucleus.jdo.JDOPersistenceManagerFactory
> datanucleus.autoCreateSchema = true
> datanucleus.storeManagerType = rdbms
> datanucleus.connectionPoolingType = DBCP
>
> This runs on CDH4 4.3.0
> hcatalog version: 0.5.0+9-1.cdh4.3.0.p0.12~precise-cdh4.3.0
>
> Ideas?
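A rough back-of-the-envelope on the figures quoted above: the commit alone took 462221 ms for 37748 partitions, which works out to about 12 ms per partition. That would be consistent with per-partition ORM overhead in the metastore rather than a single slow query (an inference on my part, not something the logs prove):

```python
# Figures copied from the log excerpt above
commit_ms = 462_221   # "Transaction committed in 462221 ms"
partitions = 37_748   # "# parts after pruning = 37748"

per_partition_ms = commit_ms / partitions
print(round(per_partition_ms, 1))  # 12.2
```

It would also explain why an environment with only ~1000 partitions stays well under the client's read timeout while this one does not.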