Re: Hive Generic UDF invoking Hbase

2015-09-30 Thread Jason Dere
Take a look at hive.fetch.task.conversion in https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties and try setting it to "none" or "minimal". From: Ryan Harris Sent: Wednesday, September 30, 2015 9:19 AM To:
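For reference, the change Jason describes, as a minimal session-level sketch:

```
-- Run before the query; "none" fully disables the fetch-task conversion,
-- while "minimal" limits it to simple cases.
set hive.fetch.task.conversion=none;
```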

RE: Hive Generic UDF invoking Hbase

2015-09-30 Thread Yogesh Keshetty
I believe it's not because of the classpath. For a single task / for streaming it's working fine, right? Sent from Outlook On Wed, Sep 30, 2015 at 1:58 PM -0700, "Ryan Harris" wrote: are all tasks failing with the same error message? based on this: Caused by:

RE: CombineHiveInputFormat not working

2015-09-30 Thread Ryan Harris
what are your values for:
mapred.min.split.size
mapred.max.split.size
hive.hadoop.supports.splittable.combineinputformat
From: Pradeep Gollakota [mailto:pradeep...@gmail.com] Sent: Wednesday, September 30, 2015 2:20 PM To: user@hive.apache.org Subject: CombineHiveInputFormat not working Hi all,
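In the Hive CLI, issuing "set" with a property name and no value echoes the current setting, so the values Ryan asks about can be checked like this:

```
-- Each statement prints the property's current value without changing it.
set mapred.min.split.size;
set mapred.max.split.size;
set hive.hadoop.supports.splittable.combineinputformat;
```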

RE: Hive Generic UDF invoking Hbase

2015-09-30 Thread Yogesh Keshetty
Jason and Ryan, thanks for the solutions. It's now launching in MapReduce mode. However, since the UDF is executing in parallel now, we are facing another issue. Inside the generic UDF we are processing the records and storing them in HBase record by record. The job is

CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
Hi all, I have an external table with the following DDL.
```
DROP TABLE IF EXISTS raw_events;
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
  raw_event_string string)
PARTITIONED BY (dc string, community string, dt string)
STORED AS TEXTFILE
LOCATION
```
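For context: Hive only packs small files into combined splits when hive.input.format resolves to CombineHiveInputFormat, so a quick session-level check is a reasonable first step (a sketch, not from the original message):

```
-- Echo the current input format, then set it explicitly for the session if needed.
set hive.input.format;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
```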

RE: CombineHiveInputFormat not working

2015-09-30 Thread Ryan Harris
Also... mapreduce.input.fileinputformat.split.maxsize. And what is the size of your input files? From: Ryan Harris Sent: Wednesday, September 30, 2015 2:37 PM To: 'user@hive.apache.org' Subject: RE: CombineHiveInputFormat not working what are your values for: mapred.min.split.size

RE: Hive Generic UDF invoking Hbase

2015-09-30 Thread Ryan Harris
are all tasks failing with the same error message? based on this: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.HTable at java.net.URLClassLoader$1.run(URLClassLoader.java:366) I'd guess that there may be some classpath issue on your datanodes? I don't have

Re: CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
mapred.min.split.size = mapreduce.input.fileinputformat.split.minsize = 1
mapred.max.split.size = mapreduce.input.fileinputformat.split.maxsize = 134217728
hive.hadoop.supports.splittable.combineinputformat = false
My average file size is pretty small... it's usually between 500K and 20MB. So it

Re: binary column data consistency in hive table copy

2015-09-30 Thread Ujjwal Wadhawan
Great! Thank you all for your inputs. -Ujjwal On Tue, Sep 15, 2015 at 8:08 AM, Gabriel Balan wrote: > Hi > > You see "1w==" when you do a CTAS into a table using text files and > lazysimpleserde > because in that case binary columns are stored as base64. > > That
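Gabriel's explanation can be checked directly; a small illustration (assuming Hive 0.13 or later, so that SELECT without FROM and the base64()/unhex() UDFs are available):

```
-- unhex() yields a BINARY value; LazySimpleSerDe writes BINARY columns to text
-- files as base64, which is why the byte 0xD7 reads back as the text "1w==".
SELECT base64(unhex('D7'));  -- returns '1w=='
```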

Re: Force users to specify partition indexes in queries

2015-09-30 Thread Smit Shah
hive.mapred.mode = strict is what I need, but I want to restrict them to only one particular index, and that too inside a particular max range. I think Hive hooks seem like what I need, but my problem is that anyone can override the config in the session. So if someone discovers that it's enforced using

Getting dot files for DAGs

2015-09-30 Thread James Pirz
I am using Tez 0.7.0 on Hadoop 2.6 to run Hive queries. I am interested in checking DAGs for my queries visually, and I realized that I can do that with graphviz once I can get "dot" files of my DAGs. My issue is that I cannot find those files; they are not in the log directory of Yarn or Hadoop or

RE: Hive Generic UDF invoking Hbase

2015-09-30 Thread Ryan Harris
without seeing the code I really can't help. Have you written other functioning UDFs? Are you aware of the requirements? https://cwiki.apache.org/confluence/display/Hive/HivePlugins From: Yogesh Keshetty [mailto:yogesh.keshe...@outlook.com] Sent: Wednesday, September 30, 2015 3:19 PM To:

RE: CombineHiveInputFormat not working

2015-09-30 Thread Ryan Harris
I would suggest trying: set hive.hadoop.supports.splittable.combineinputformat = true; you might also need to increase mapreduce.input.fileinputformat.split.minsize to something larger, like 32MB set mapreduce.input.fileinputformat.split.minsize = 33554432; Depending on your hadoop distro and
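Ryan's suggestion, collected as a runnable session snippet:

```
-- Allow split combining even for splittable formats, then raise the minimum
-- split size (32MB = 33554432 bytes) so many small files land in one map task.
set hive.hadoop.supports.splittable.combineinputformat = true;
set mapreduce.input.fileinputformat.split.minsize = 33554432;
```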

Re: CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
I'm running with CDH 5.3.3 (Hadoop 2.5.0 + cdh patches)... so hopefully those two issues don't apply. I'll try the two configs suggested and report back. Thanks! On Wed, Sep 30, 2015 at 3:14 PM, Ryan Harris wrote: > I would suggest trying: > > set

Re: Hive Generic UDF invoking Hbase

2015-09-30 Thread Jason Dere
So your custom UDF is using org.apache.hadoop.hbase.client.HTable? How do you resolve your UDF JAR (and this class) on the Hive client - are you doing ADD JAR, or are your UDF JARs and HBase JARs in your Hive classpath? From: Ryan Harris

RE: Hive Generic UDF invoking Hbase

2015-09-30 Thread Yogesh Keshetty
Ryan - Yes, I have written UDFs and generic UDFs before, but this is the first time I wrote a UDF that calls HBase tables. Jason - Yes, in my generic UDF I am using org.apache.hadoop.hbase.client.HTable. On the Hive side we should set some auxiliary jars property to add HBase-related jars.

Re: Getting dot files for DAGs

2015-09-30 Thread Jianfeng (Jeff) Zhang
Hi James, It is under the working directory of the YARN container (it should be the first container, which is the AM). Best Regards, Jeff Zhang From: James Pirz Reply-To: "u...@tez.apache.org"

Re: Hive Generic UDF invoking Hbase

2015-09-30 Thread Jason Dere
Not totally familiar with the aux jars property... does that make sure that the JAR is shipped as part of the MR job? If it does not, you could try adding the necessary jars using ADD JAR to see if that is the issue. From: Yogesh Keshetty
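A sketch of that suggestion; the jar paths are hypothetical placeholders for wherever the UDF and HBase client jars actually live:

```
-- ADD JAR registers a jar for the session and ships it to the tasks via the
-- distributed cache. Paths below are hypothetical.
ADD JAR /path/to/my-udf.jar;
ADD JAR /path/to/hbase-client.jar;
LIST JARS;  -- confirm what is registered for this session
```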

Re: Getting dot files for DAGs

2015-09-30 Thread Hitesh Shah
The .dot file is generated in the Tez Application Master's container log dir. First, you need to figure out the YARN application in which the query/Tez DAG ran. Once you have the applicationId, you can use one of these two approaches: 1) Go to the YARN ResourceManager UI, find the

Re: Getting dot files for DAGs

2015-09-30 Thread James Pirz
Thanks. I could locate them in the proper container's log directory and visualize them. I was on the wrong node, assuming that they would be available on any of the nodes, but they are really dumped on only one of the nodes. On Wed, Sep 30, 2015 at 7:00 PM, Hitesh Shah wrote: >

Re: Getting dot files for DAGs

2015-09-30 Thread Russell Jurney
On Thursday, October 1, 2015, Jörn Franke wrote: > Why not use tez ui? > > On Thu, Oct 1, 2015 at 2:29 AM, James Pirz wrote: > >> I am using Tez 0.7.0 on Hadoop

Re: Getting dot files for DAGs

2015-09-30 Thread Jörn Franke
Why not use tez ui? On Thu, Oct 1, 2015 at 2:29 AM, James Pirz wrote: > I am using Tez 0.7.0 on Hadoop 2.6 to run Hive queries. > I am interested in checking DAGs for my queries visually, and I realized > that I can do that with graphviz once I can get "dot" files of my DAGs.

Re: Force users to specify partition indexes in queries

2015-09-30 Thread Ashutosh Chauhan
For your second use case, there is another config: *hive.limit.query.max.table.partition* - set it to the number of partitions you want to allow in a query. On Wed, Sep 30, 2015 at 5:01 AM, Smit Shah wrote: > hive.mapred.mode = strict is what I need but I want them to restrict to >
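Putting both suggestions from this thread together, a minimal sketch (the limit of 1 is an assumed example value, not from the thread):

```
-- Strict mode rejects queries on partitioned tables that lack a partition filter;
-- the second property caps how many partitions one query may touch.
set hive.mapred.mode = strict;
set hive.limit.query.max.table.partition = 1;
```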

Insert overwrite to directory stored as Avro

2015-09-30 Thread Kiran T
Hi, I am trying to insert overwrite to a directory on S3 in Avro format, which is failing. *Query 1:* insert overwrite directory 's3a://**/' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
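For readers, a hedged reconstruction of the statement shape being attempted; the bucket path, output directory, and source table below are hypothetical placeholders, since the original message elides them as 's3a://**/':

```
-- Hypothetical sketch only: the path and source table are placeholders,
-- mirroring the INPUTFORMAT/OUTPUTFORMAT pairing used in table DDL.
INSERT OVERWRITE DIRECTORY 's3a://example-bucket/avro-out/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
SELECT * FROM source_table;
```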