Re: Hive query optimization

2012-07-23 Thread Igor Tatarinov
Here is my 2 cents. The parameters you are looking at are quite specific. Unless you know what you are doing, it might be hard to set them exactly right, and they shouldn't make that much of a difference (again, unless you know the specifics). What worked for me is using a single "wave" of reducers.
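One way to read "a single wave of reducers" is to pick a reducer count no larger than the cluster's total reduce slots, so every reduce task starts at once instead of some waiting for a second round. The Python sketch below assumes Hadoop 0.20/1.x-style slots (mapred.tasktracker.reduce.tasks.maximum per TaskTracker) and renders the matching Hive SET line; the function names, node count, slot count and headroom are hypothetical inputs you would take from your own cluster.

```python
def single_wave_reducers(num_nodes: int, reduce_slots_per_node: int, headroom: int = 0) -> int:
    """Largest reducer count that still fits in one wave of reduce slots.

    num_nodes: number of TaskTrackers (hypothetical input).
    reduce_slots_per_node: the cluster's mapred.tasktracker.reduce.tasks.maximum.
    headroom: slots deliberately left free, e.g. for other jobs or task retries.
    """
    if num_nodes <= 0 or reduce_slots_per_node <= 0:
        raise ValueError("cluster must have at least one reduce slot")
    total_slots = num_nodes * reduce_slots_per_node
    # Never go below one reducer, even if headroom exceeds the slot count.
    return max(1, total_slots - headroom)


def reducer_setting(num_nodes: int, reduce_slots_per_node: int, headroom: int = 0) -> str:
    """Render the Hive SET statement for a single-wave reducer count."""
    n = single_wave_reducers(num_nodes, reduce_slots_per_node, headroom)
    return f"SET mapred.reduce.tasks={n};"


if __name__ == "__main__":
    # Example: 10 nodes with 2 reduce slots each, keeping 1 slot spare.
    print(reducer_setting(10, 2, headroom=1))  # SET mapred.reduce.tasks=19;
```

The headroom parameter is a design choice, not something from the thread: on a shared cluster, asking for every slot can still spill into a second wave if other jobs hold some of them.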

Re: Performance Issues in Hive with S3 and Partitions

2012-07-23 Thread Igor Tatarinov
Are you using EMR? Have you tried setting hive.optimize.s3.query=true, as mentioned on the EMR Hive version details page (http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-version-details.html)? I haven't tried that option myself, so I am curious whether it helps in your scenario. The above page also me

Performance Issues in Hive with S3 and Partitions

2012-07-23 Thread richin.jain
Hi, sorry, this is an AWS-specific Hive question. I have two external Hive tables for my custom logs: 1. a flat directory structure on AWS S3, no partitions, files in bz2-compressed format (a few big files); 2. three levels of partitions on AWS S3 (lots of small uncompressed files). I noticed that

Re: Structs in Hive

2012-07-23 Thread Edward Capriolo
If you are writing a GenericUDF or SerDe and want to return struct types, there are object inspectors to build structs. The Java type returned is an Object[]. Hive expects that if the struct has 5 fields, the object array will have a length of 5.
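The rule above is positional: element i of the returned array is the i-th declared struct field. The Python sketch below models only that contract; it is not Hive's Java API, the function and field names are hypothetical, and treating a length mismatch as an error is this sketch's reading of "Hive expects".

```python
from typing import Any, Dict, List, Sequence


def make_struct_row(field_names: Sequence[str], values: Sequence[Any]) -> List[Any]:
    """Build a positional struct value: element i holds declared field i.

    Models the expectation that a 5-field struct arrives as an array of
    length 5; a length mismatch is rejected here.
    """
    if len(values) != len(field_names):
        raise ValueError(
            f"struct declares {len(field_names)} fields but got {len(values)} values"
        )
    return list(values)


def read_struct(field_names: Sequence[str], row: Sequence[Any]) -> Dict[str, Any]:
    """Look fields up by name, resolving each name to its position in the array."""
    return {name: row[i] for i, name in enumerate(field_names)}


if __name__ == "__main__":
    fields = ["host", "status", "latency_ms"]  # hypothetical struct fields
    row = make_struct_row(fields, ["web01", 200, 12.5])
    print(read_struct(fields, row))  # {'host': 'web01', 'status': 200, 'latency_ms': 12.5}
```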

Re: Report Generating tools for hive

2012-07-23 Thread shashwat shriparv
Check out these:
- Pentaho: http://www.pentaho.com/hadoop/ (Business Intelligence Player Pentaho Embraces Hadoop)
- Intellicus: Intellicus to support Hadoop framework for Large Data

RE: Report Generating tools for hive

2012-07-23 Thread yogesh.kumar13
Hey Nitin, I can't find a way to connect Pentaho to Hive running as hive --service hiveserver. Please help and suggest. Regards, Yogesh Kumar

Re: Structs in Hive

2012-07-23 Thread kulkarni . swarnim
Cool, thanks :) I was also curious: what do people generally use to write struct data into Hive tables? I see there is a STRUCT function that takes parameters and creates structs from them. Can we use a custom class as well? Thanks again.

Re: Report Generating tools for hive

2012-07-23 Thread Nitin Pawar
Just download it from the Pentaho site and follow the instructions in the README file; it's straightforward.

RE: Report Generating tools for hive

2012-07-23 Thread yogesh.kumar13
Hello Nitin, would you please share how to install Pentaho on Ubuntu and use it with Hive? Thanks & Regards, Yogesh Kumar

Re: Structs in Hive

2012-07-23 Thread Edward Capriolo
In your case, HBase has a custom SerDe. The Deserializer interface is what turns the value from the input format into something Hive can understand. HBase support uses the user-specified table property hbase.columns.mapping to decide what it should parse out of the HBase result.
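To illustrate how the mapping string drives what gets parsed out of an HBase result, here is a Python sketch that reads an hbase.columns.mapping value such as ":key,cf:a,cf:" and pairs each entry with a Hive column by position. It covers only the basic forms (:key for the row key, family:qualifier for a single cell, family: for a whole column family) and is a model for explanation, not the HBase SerDe's code; the "exactly one :key" check is an assumption based on the Hive HBase integration docs.

```python
from typing import List, NamedTuple, Optional


class ColumnMapping(NamedTuple):
    hive_column: str
    family: Optional[str]     # None for the row key
    qualifier: Optional[str]  # None for the row key or a whole-family mapping


def parse_columns_mapping(mapping: str, hive_columns: List[str]) -> List[ColumnMapping]:
    """Pair each Hive column, in order, with one entry of an hbase.columns.mapping string."""
    entries = [e.strip() for e in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError(f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries")
    result = []
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            result.append(ColumnMapping(column, None, None))  # HBase row key
        elif ":" in entry and not entry.startswith(":"):
            family, qualifier = entry.split(":", 1)
            # "cf:" (empty qualifier) maps the whole family, typically to a Hive MAP column.
            result.append(ColumnMapping(column, family, qualifier or None))
        else:
            raise ValueError(f"unrecognized mapping entry {entry!r}")
    # Assumption: exactly one :key entry is required.
    if sum(1 for m in result if m.family is None) != 1:
        raise ValueError("mapping must contain exactly one :key entry")
    return result


if __name__ == "__main__":
    for m in parse_columns_mapping(":key,cf:a,cf:", ["id", "a", "props"]):
        print(m)
```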

Structs in Hive

2012-07-23 Thread kulkarni.swar...@gmail.com
Hello, I have a pretty basic question. I am trying to read structs stored in HBase from Hive. In what format should these structs be written so that they can be read? For instance, if my query has the following struct: s struct How should I be writing my data in HBase so t

Re: Re: [ANNOUNCE] New PMC member - Ashutosh Chauhan

2012-07-23 Thread Aniket Mokashi
Congrats Ashutosh! ~Aniket On Wed, Jul 18, 2012 at 10:25 PM, Ashutosh Chauhan wrote: > Thanks, Andes and Bejoy ! > > Ashutosh > > On Tue, Jul 17, 2012 at 12:52 AM, Bejoy KS wrote: > >> ** >> Well deserved one. Congrats Ashutosh. >> Regards >> Bejoy KS >> >> Sent from handheld, please excuse typ

Re: Report Generating tools for hive

2012-07-23 Thread Nitin Pawar
MicroStrategy comes with a Linux server, but the BI tool itself is limited to Windows; the same goes for Tableau (though I am not sure whether they have added Mac support). I have used Pentaho and it worked well across Linux, Mac and Windows. It also has an open source edition (with fewer features), but that sho

Re: Report Generating tools for hive

2012-07-23 Thread Bejoy KS
Hi Yogesh, I know MicroStrategy and Tableau are used for reporting on top of Hive, but I'm not sure about Mac support for those. Regards, Bejoy KS

Report Generating tools for hive

2012-07-23 Thread yogesh.kumar13
Hi All, I am looking for a report-generating tool that works on top of Apache Hadoop/Hive. Please suggest some tools that are easily compatible and work well. I am using Hadoop 0.20.2 and Hive 0.8.1 on Mac OS X 10.6.8. Thanks & Regards, Yogesh Kumar Dhari

Re: Possibility of defining the Output directory programmatically

2012-07-23 Thread Nitin Pawar
Manisha, SET just assigns a key/value pair; it does not execute any functions. If you want to use it like that, I would recommend writing a shell script that generates everything required into a file and then executing that file with the hive -f option.

Re: Possibility of defining the Output directory programmatically

2012-07-23 Thread Vinod Singh
You misread what I said. I meant that at run time we compile a list of variables, e.g.

SET a.b.c=some_thing;
SET x.y.z=other_thing;

and then concatenate them with the query to come up with a final script, which will look like:

SET a.b.c=some_thing;
SET x.y.z=other_thing;
INSERT OVERWRITE LOCAL DIRE
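The concatenation step described above can be sketched in Python: values computed at run time, outside Hive, become SET commands placed ahead of the query, and the result is written to a file that is then run with hive -f, as Nitin suggests. The function names, the output path and the SELECT query are hypothetical; only the SET-then-query layout comes from the thread.

```python
from typing import Mapping, Sequence


def build_hive_script(variables: Mapping[str, str], queries: Sequence[str]) -> str:
    """Prepend one SET command per variable to the queries, one statement per line."""
    lines = [f"SET {key}={value};" for key, value in variables.items()]
    for query in queries:
        query = query.strip()
        # Make sure every statement is terminated so Hive can split the script.
        lines.append(query if query.endswith(";") else query + ";")
    return "\n".join(lines) + "\n"


def write_hive_script(path: str, variables: Mapping[str, str], queries: Sequence[str]) -> None:
    """Write the final script to disk so it can be run with: hive -f <path>."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_hive_script(variables, queries))


if __name__ == "__main__":
    # Variable names come from the thread; the query is a hypothetical placeholder.
    variables = {"a.b.c": "some_thing", "x.y.z": "other_thing"}
    print(build_hive_script(variables, ["SELECT * FROM logs"]), end="")
```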

Re: Possibility of defining the Output directory programmatically

2012-07-23 Thread Manisha Gayathri
Thanks Vinod. I tried concatenating variables, but as far as I can see that is not possible either:

set pqr = concat(foo,bar);
set file_name= home/user/Desktop

The file_name I get is *NOT* home/user/Desktop/foo_bar; what I get is /home/user/Desktop/concat(foo,bar).
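Why concat(foo,bar) survives as literal text: SET stores the value as a string, and ${hiveconf:name} references are replaced textually before the query runs, so no function is ever evaluated. The Python toy model below illustrates only that behaviour; it is not Hive's implementation, and the path template in the usage line is an assumption, since the exact expression Manisha used is not shown in full.

```python
import re
from typing import Dict


class HiveConfModel:
    """Toy model: SET stores raw text; ${hiveconf:name} is a plain text substitution."""

    _VAR = re.compile(r"\$\{hiveconf:([^}]+)\}")

    def __init__(self) -> None:
        self.vars: Dict[str, str] = {}

    def set(self, statement: str) -> None:
        body = statement.strip().rstrip(";").strip()
        if body[:4].lower() != "set ":
            raise ValueError(f"not a SET statement: {statement!r}")
        key, sep, value = body[4:].partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {statement!r}")
        # The value is kept verbatim: no function such as concat() is evaluated.
        self.vars[key.strip()] = value.strip()

    def substitute(self, text: str) -> str:
        # Unknown variables are left untouched in this model.
        return self._VAR.sub(lambda m: self.vars.get(m.group(1), m.group(0)), text)


if __name__ == "__main__":
    conf = HiveConfModel()
    conf.set("set pqr = concat(foo,bar);")
    print(conf.substitute("/home/user/Desktop/${hiveconf:pqr}"))  # /home/user/Desktop/concat(foo,bar)
```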

Re: Possibility of defining the Output directory programmatically

2012-07-23 Thread Vinod Singh
We generate variables dynamically and then create a final script file by concatenating variables (SET commands) and Hive queries. Then the final script is executed. Perhaps you can adopt a similar approach. Thanks, Vinod

Re: Possibility of defining the Output directory programmatically

2012-07-23 Thread Manisha Gayathri
Thanks again Vinod. I will try to find a way to pass the directory URLs from outside, then. I would be grateful if you could direct me to any guide or documentation that describes how to pass values into Hive from outside. Thanks, Manisha

Re: Possibility of defining the Output directory programmatically

2012-07-23 Thread Vinod Singh
SET commands are handled differently, and UDFs can't be invoked there. IMO you need to pass the directory location value from outside of Hive. That is how we do it. Thanks, Vinod

Re: Possibility of defining the Output directory programmatically

2012-07-23 Thread Manisha Gayathri
Hi Vinod, thanks for the prompt reply. I understand your point, and sorry for not providing the complete code segment earlier. I have the getFilePath function, which should return a URL like this: home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22. The defined function works perfectly if I pu

Re: Possibility of defining the Output directory programmatically

2012-07-23 Thread Vinod Singh
The output path in this query is already parameterized:

INSERT OVERWRITE LOCAL DIRECTORY 'file:///${hiveconf:file_name}'

The UDF is not going to be invoked here, though. Thanks, Vinod
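Passing the value from outside Hive can be sketched like this: compute the path in a wrapper script and hand it to the Hive CLI with -hiveconf, so the ${hiveconf:file_name} reference in the query is filled in at run time. In this Python sketch, log_file_path is a hypothetical stand-in for the getFilePath UDF whose layout only copies the example path from the thread, and export.q is a hypothetical script file containing the INSERT OVERWRITE LOCAL DIRECTORY query.

```python
import shlex
from datetime import date
from typing import List


def log_file_path(base_dir: str, server_id: str, server: str, day: date) -> str:
    """Hypothetical Python stand-in for the getFilePath UDF, evaluated outside Hive.

    The layout copies the example path from the thread:
    home/user/Desktop/logDir/logs/log_0_testServer_2012_07_22
    """
    return f"{base_dir}/log_{server_id}_{server}_{day:%Y_%m_%d}"


def hive_command(script_path: str, file_name: str) -> List[str]:
    """argv that defines hiveconf:file_name for a script using ${hiveconf:file_name}."""
    return ["hive", "-hiveconf", f"file_name={file_name}", "-f", script_path]


if __name__ == "__main__":
    path = log_file_path("home/user/Desktop/logDir/logs", "0", "testServer", date(2012, 7, 22))
    cmd = hive_command("export.q", path)  # export.q is a hypothetical script name
    print(shlex.join(cmd))
    # To actually run it (after importing subprocess): subprocess.run(cmd, check=True)
```

Building the command as an argument list rather than a single shell string avoids quoting problems if a generated path ever contains spaces.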

Possibility of defining the Output directory programmatically

2012-07-23 Thread Manisha Gayathri
Hi, is there any way to define the output directory of a Hive query using a Hive UDF? In my UDF I pass 2 parameters (as follows), and it generates a file-system URL:

getFilePath( "0","testServer" );

Can I use the above getFilePath( "0","testServer" ) value as the Local Directo

SocketTimeoutException when insert into HBase by Hive

2012-07-23 Thread Cdy Chen
Hi all, when I use 447 files of 64 MB each as input to insert into HBase, it throws a SocketTimeoutException, but with a smaller input it works well. I guess it is related to the Hadoop configuration, but how should I configure it? Thank you! Best Regards, Chen