Re: Performance for hive external to hbase with several terabytes or more of data

2016-05-11 Thread Sathi Chowdhury
Hi Yang,
Did you think of bulk loading option?

http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
This may be a way to go .
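On the Hive side, a rough sketch of what the bulk-load route could look like is below. This assumes a Hive build whose HBase storage handler can emit HFiles via the hive.hbase.generatehfiles and hfile.family.path properties; the table, column family and output path are placeholders, and the generated HFiles would afterwards be loaded with HBase's completebulkload tool rather than written through the region servers.

-- placeholder names: hbase_logs is an HBaseStorageHandler-backed table,
-- cf is its column family, staging_logs is the Hive source table
SET hive.hbase.generatehfiles=true;
SET hfile.family.path=/tmp/hbase_hfiles/cf;

INSERT OVERWRITE TABLE hbase_logs
SELECT rowkey, payload
FROM staging_logs
CLUSTER BY rowkey;   -- HFiles must be written in row-key order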
Thanks
Sathi


On May 11, 2016, at 6:07 PM, Yi Jiang <yi.ji...@ubisoft.com> wrote:

Hi, Guys
Recently we have been debating the use of HBase as the destination for our data 
pipeline jobs.
Basically, we want to save our logs into HBase, and our pipeline generates 2-4 
terabytes of data every day, but our IT department thinks it is not a good idea 
to scan HBase at that scale; it will cause performance and memory issues. They 
have asked us to keep only 15 minutes' worth of data in HBase for real-time analysis.
For now, I am using a Hive external table over HBase, but what I am wondering is: 
for the MapReduce job, what kind of mapper does Hive use to scan the data from 
HBase? Is it TableInputFormatBase? How many mappers will Hive use to scan HBase, 
and is that efficient? Will it cause performance issues once we have a couple of 
terabytes or more of data?
I am also trying to index some of the columns we might query on, but I am not 
sure it is a good idea to keep so much historical data in HBase for querying.
Thank you
Jacky



Re: Running hive queries in different queue

2016-02-26 Thread Sathi Chowdhury
I think in your hive script you can do
set mapreduce.job.queuename=<your queue name>;
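For example (the queue and table names below are placeholders; if the Tez execution engine is used, tez.queue.name is the analogous property):

-- run this query in the 'etl' queue instead of 'default'
set mapreduce.job.queuename=etl;
-- set tez.queue.name=etl;   -- equivalent when hive.execution.engine=tez
select count(*) from my_table;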
Thanks
Sathi

From: Rajit Saha
Reply-To: "user@hive.apache.org"
Date: Friday, February 26, 2016 at 5:34 PM
To: "user@hive.apache.org"
Subject: Running hive queries in different queue

Hi

I want to run a Hive query in a queue other than the "default" queue from the Hive 
client command line. Can anybody please suggest a way to do it?

Regards
Rajit

On Feb 26, 2016, at 07:36, Patrick Duin <patd...@gmail.com> wrote:

Hi Prasanth.

Thanks for the quick reply!

The logs don't show much more of the stack trace, I'm afraid:
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:809)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


The stack trace isn't really the issue, though. The NullPointerException is a 
symptom of not being able to return any stripes; if you look at that line in the 
code, it fails because the 'stripes' field is null, which should never happen. 
This, we think, is caused by failing namenode network traffic. We would see lots 
of I/O warnings in the logs saying blocks cannot be found, e.g.:
16/02/01 13:20:34 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.io.IOException: java.lang.InterruptedException
    at org.apache.hadoop.ipc.Client.call(Client.java:1448)
    at org.apache.hadoop.ipc.Client.call(Client.java:1400)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy32.getServerDefaults(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getServerDefaults(ClientNamenodeProtocolTranslatorPB.java:268)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy33.getServerDefaults(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:1007)
    at org.apache.hadoop.hdfs.DFSClient.shouldEncryptData(DFSClient.java:2062)
    at org.apache.hadoop.hdfs.DFSClient.newDataEncryptionKey(DFSClient.java:2068)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:208)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:159)
    at org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:90)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3123)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:848)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:407)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:311)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:885)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:771)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
    at java.util.concurrent.FutureTask.get(FutureTask.java:187)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1047)
    at org.apache.hadoop.ipc.Client.call(Client.java:1442)
    ... 33 more

Our job doesn't

Re: Force users to specify partition indexes in queries

2015-09-29 Thread Sathi Chowdhury
hive.exec.dynamic.partition.mode=strict seems to do the same thing.


From: Ashutosh Chauhan
Reply-To: "user@hive.apache.org"
Date: Tuesday, September 29, 2015 at 1:48 PM
To: "user@hive.apache.org"
Subject: Re: Force users to specify partition indexes in queries

set hive.mapred.mode=strict;
This will fail a query that doesn't specify a filter on the partitioning column.
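For example, with a table partitioned by dt (table and column names below are placeholders):

set hive.mapred.mode=strict;

-- rejected in strict mode: no predicate on the partition column
-- select count(*) from web_logs;

-- accepted: the partition column dt is filtered
select count(*) from web_logs where dt = '2015-09-28';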

On Tue, Sep 29, 2015 at 10:56 AM, Smit Shah <who...@gmail.com> wrote:
Hey,

We have exposed the Hive console to our org via the command line and through the Hue UI. 
However, we are currently facing issues when a user runs a blanket select or 
insert query on a table without specifying partition indexes.

I wanted to know if it is possible to force users to provide a partition index in 
queries, or fail otherwise (our partition indexes are date and time). Also, if 
that's possible, we would like to restrict the date range to 2 days, so no 
malicious user can affect cluster availability by querying months' worth of data.


If this is not doable in Hive at the moment, I am interested in writing a patch 
for it. I am not familiar with the Hive codebase, so I am not sure how complex this is. 
Any hints or tips would be great, and if I do need to write such a patch, I 
would be happy to contribute it back to the project.


Regards,
Smit



Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

2015-09-16 Thread Sathi Chowdhury
Congrats Ashutosh!

From: Sergey Shelukhin
Reply-To: "user@hive.apache.org"
Date: Wednesday, September 16, 2015 at 2:31 PM
To: "user@hive.apache.org"
Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

Congrats!

From: Alpesh Patel <alpeshrpa...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, September 16, 2015 at 13:24
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

Congratulations Ashutosh

On Wed, Sep 16, 2015 at 1:23 PM, Pengcheng Xiong <pxi...@apache.org> wrote:
Congratulations Ashutosh!

On Wed, Sep 16, 2015 at 1:17 PM, John Pullokkaran <jpullokka...@hortonworks.com> wrote:
Congrats Ashutosh!

From: Vaibhav Gumashta <vgumas...@hortonworks.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, September 16, 2015 at 1:01 PM
To: "user@hive.apache.org" <user@hive.apache.org>, "d...@hive.apache.org" <d...@hive.apache.org>
Cc: Ashutosh Chauhan <hashut...@apache.org>
Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

Congrats Ashutosh!

—Vaibhav

From: Prasanth Jayachandran <pjayachand...@hortonworks.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, September 16, 2015 at 12:50 PM
To: "d...@hive.apache.org" <d...@hive.apache.org>, "user@hive.apache.org" <user@hive.apache.org>
Cc: "d...@hive.apache.org" <d...@hive.apache.org>, Ashutosh Chauhan <hashut...@apache.org>
Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

Congratulations Ashutosh!





On Wed, Sep 16, 2015 at 12:48 PM -0700, "Xuefu Zhang" <xzh...@cloudera.com> wrote:

Congratulations, Ashutosh! Well deserved.

Thanks to Carl also for the hard work in the past few years!

--Xuefu

On Wed, Sep 16, 2015 at 12:39 PM, Carl Steinbach <c...@apache.org> wrote:

> I am very happy to announce that Ashutosh Chauhan is taking over as the
> new VP of the Apache Hive project. Ashutosh has been a longtime contributor
> to Hive and has played a pivotal role in many of the major advances that
> have been made over the past couple of years. Please join me in
> congratulating Ashutosh on his new role!
>




Re: Adding JAR in Hive classpath

2015-09-04 Thread Sathi Chowdhury
I had a similar problem with one of our UDF libs using commons-lang3.
The only way it worked was to create a fat jar with shaded classes and place it 
under HDFS.


How you would shade it, an example below (the relevant maven-shade-plugin configuration):

<configuration>
  <createDependencyReducedPom>false</createDependencyReducedPom>
  <relocations>
    <relocation>
      <pattern>com.google.common</pattern>
      <shadedPattern>org.shaded.google.common</shadedPattern>
    </relocation>
    <relocation>
      <pattern>org.apache.commons</pattern>
      <shadedPattern>org.shaded.apache.commons</shadedPattern>
    </relocation>
  </relocations>
  <artifactSet>
    <includes>
      <!-- x.y.analytics:z-libs is your lib jar -->
      <include>x.y.analytics:z-libs</include>
      <include>com.google.guava:guava</include>
      <include>org.apache.commons:commons-lang3</include>
      <include>commons-io:commons-io</include>
      <include>commons-logging:commons-logging</include>
      <include>commons-codec:commons-codec</include>
      <include>org.apache.commons:commons-math3</include>
      <include>commons-cli:commons-cli</include>
      <include>commons-lang:commons-lang</include>
    </includes>
  </artifactSet>
</configuration>

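Once the shaded jar is built and copied to HDFS, it can be registered in the Hive session before the function is created, roughly like this (the path and class name below are placeholders):

ADD JAR hdfs:///apps/hive/aux-jars/z-libs-shaded.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'x.y.analytics.MyUdf';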
Additional note:
If you are running this with Oozie, place the jar in the Oozie user lib area, 
i.e. the directory where your workflow.xml lives needs a subfolder "lib", and the 
jar should be kept under that lib folder.
HTH
Sathi
From: Akansha Jain
Reply-To: "user@hive.apache.org"
Date: Friday, September 4, 2015 at 2:58 PM
To: "user@hive.apache.org"
Subject: Adding JAR in Hive classpath


Hi All,

I am facing an issue with the Hive classpath. I have written a UDAF which uses 
commons-math 3.3. So, while creating the temporary function, I first add the 
commons-math 3.3 jar, then the UDAF jar, and then create the temporary function.
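For reference, the sequence looks roughly like this (jar paths, function and class names are placeholders):

ADD JAR /home/aj/libs/commons-math3-3.3.jar;
ADD JAR /home/aj/libs/my-udaf.jar;
CREATE TEMPORARY FUNCTION my_udaf AS 'com.example.MyUDAF';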

There is another version, commons-math 3.1, present under the HADOOP_HOME/lib 
directory.

Now the problem is that even after adding commons-math 3.3 to the Hive classpath 
(via ADD JAR ...), Hive picks up the commons-math 3.1 version from the 
HADOOP_HOME/lib folder. How do I remove the 3.1 version from the classpath?

I tried using DELETE JAR ... but it doesn't work.

Is there any way I can force Hive to pick my version and not the one in the 
Hadoop lib? Any help is appreciated.

Thanks

AJ


FW: hive server2 jdbc

2014-11-10 Thread Sathi Chowdhury


Hello Hive users,
I am trying to use a HiveServer2 JDBC connection for the first time.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyHiveJDBCClient {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    private static final Logger LOG = LoggerFactory.getLogger(MyHiveJDBCClient.class);

    public static void main(String[] args) throws SQLException {
        // Load the HiveServer2 JDBC driver.
        try {
            Class.forName(driverName);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }

        // Connect to the 'prod' database on HiveServer2.
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://xyz.ab.com:1/prod", "", "");
        Statement stmt = con.createStatement();

        String sql = "show tables";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }
    }
}

It seems to execute the query successfully, but it only acts on the default 
database and shows tables from the default database.
Why is it not connecting to the prod database? Any clue?
Thanks
Sathi