beeline client

2014-07-11 Thread Bogala, Chandra Reddy
Hi, Currently I am submitting multiple Hive jobs using the Hive CLI with hive -f from different scripts. I can see all these jobs in the application tracker, and they are processed in parallel. Now I plan to switch to HiveServer2 and submit jobs using the beeline client from multiple scripts

difference between partition by and distribute by in rank()

2014-07-11 Thread Eric Chu
Does anyone know what *rank() over(distribute by p_mfgr sort by p_name)* does exactly and how it's different from *rank() over(partition by p_mfgr order by p_name)*? Thanks, Eric
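
A minimal sketch of the two forms being compared, assuming a table named part with columns p_mfgr and p_name (the table name is an assumption; it is not given in the thread):

    -- Standard windowing syntax: PARTITION BY / ORDER BY inside OVER
    SELECT p_mfgr, p_name,
           rank() OVER (PARTITION BY p_mfgr ORDER BY p_name) AS r
    FROM part;

    -- The variant asked about: DISTRIBUTE BY / SORT BY inside OVER
    SELECT p_mfgr, p_name,
           rank() OVER (DISTRIBUTE BY p_mfgr SORT BY p_name) AS r
    FROM part;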

Re: difference between partition by and distribute by in rank()

2014-07-11 Thread Nitin Pawar
In general principle, distribute by ensures each of N reducers gets non-overlapping ranges of X, but doesn't sort the output of each reducer. You end up with N or more unsorted files with non-overlapping ranges. So this is more of a horizontal distribution of data. In my view, partition by is more
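
A plain (non-windowing) sketch of that behaviour; the table and column names here are illustrative:

    -- DISTRIBUTE BY sends all rows with the same p_mfgr to the same reducer;
    -- SORT BY then orders rows only within each reducer's output, so the
    -- result is not globally sorted.
    SELECT p_mfgr, p_name
    FROM part
    DISTRIBUTE BY p_mfgr
    SORT BY p_name;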

Re: difference between partition by and distribute by in rank()

2014-07-11 Thread Joshi, Rekha
Hi, The order by and sort by reducer nuances relating to total order in the final output are fairly well known. One could simulate rank() over() functionality by using distribute by/sort by on datasets (or cluster by, if on the same key), as in Edward
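
A short sketch of the cluster by shorthand mentioned here (cluster by on a key is equivalent to distribute by plus sort by on that same key); names are illustrative:

    -- These two queries are equivalent: CLUSTER BY p_mfgr is shorthand for
    -- DISTRIBUTE BY p_mfgr SORT BY p_mfgr.
    SELECT p_mfgr, p_name FROM part DISTRIBUTE BY p_mfgr SORT BY p_mfgr;
    SELECT p_mfgr, p_name FROM part CLUSTER BY p_mfgr;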

Re: beeline client

2014-07-11 Thread Xuefu Zhang
Chandra, The difference you saw between Hive CLI and Beeline might indicate a bug. However, before drawing such a conclusion, could you give an example of your queries? Are the jobs you expect to run in parallel for a single query? Please note that your script file is executed line by line in either
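
On the "parallel for a single query" point: statements in a script passed with -f run one after another, but independent stages of a single statement can run concurrently when hive.exec.parallel is enabled. A sketch with made-up table names:

    -- The two sides of the UNION ALL are independent stages and may run in
    -- parallel when hive.exec.parallel is true; the statements in the script
    -- itself still execute one at a time.
    SET hive.exec.parallel=true;
    SELECT u.key, u.cnt
    FROM (
      SELECT key, count(*) AS cnt FROM t1 GROUP BY key
      UNION ALL
      SELECT key, count(*) AS cnt FROM t2 GROUP BY key
    ) u;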

Re: Hive job scheduling

2014-07-11 Thread Xuefu Zhang
Or you can just run cron tasks in your OS. On Thu, Jul 10, 2014 at 4:55 PM, moon soo Lee leemoon...@gmail.com wrote: for simpler use, Zeppelin (http://zeppelin-project.org) runs Hive queries with a web-based editor, and it has a crontab-style scheduler. Best, moon On Fri, Jul 11, 2014 at

hive plan generation

2014-07-11 Thread AnilKumar B
Hi, I am trying to generate a Hive plan as below, but even after creating the src table I am getting a "Table not found" exception due to a MetaStore issue. Can anyone help me resolve this? private Driver createDriver() { HiveConf conf = new HiveConf(Driver.class);

Re: Hive job scheduling

2014-07-11 Thread Jerome Banks
Cheng, We are working on an exciting new project called Satisfaction to handle next-generation scheduling and workflow for Hive and other Hadoop/Big Data technologies. We plan to open source it sometime in the near future. Stay tuned!!! --- jerome On Fri, Jul 11, 2014 at 7:02 AM, Xuefu Zhang

JOIN query results not printing to cli - HELP please.

2014-07-11 Thread Sarfraz Ramay
Hi, A very strange thing is happening. I am running the TPC-H benchmark and have loaded the tables into HDFS running in pseudo-distributed mode. When I query one table at a time, e.g. select * from customer LIMIT 2; or select * from NATION LIMIT 2;, the results are printed to the CLI, but as soon as I try
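
The failing query is cut off above; purely as an illustration, a join of the two tables mentioned would look roughly like this under the standard TPC-H column names (which may differ from the poster's actual schema):

    -- Illustrative TPC-H style join; whether this matches the query that
    -- produced no output is an assumption.
    SELECT c.c_name, n.n_name
    FROM customer c
    JOIN nation n ON (c.c_nationkey = n.n_nationkey)
    LIMIT 2;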

Re: Issue while running Hive 0.13

2014-07-11 Thread Jason Dere
Looking at that error online, I see http://slf4j.org/faq.html#compatibility. Maybe try to find what version of the slf4j libraries you have installed (in Hadoop? Hive?), and try updating to a later version. On Jul 10, 2014, at 9:57 PM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote:

Re: difference between partition by and distribute by in rank()

2014-07-11 Thread Eric Chu
Thanks for the responses. I understand DISTRIBUTE BY and SORT BY in the normal case (as described in the Hive docs); I just don't understand their behavior in the OVER clause with RANK, which apparently you can do. See ql/src/test/queries/clientpositive/windowing.q for an example. Yes, I saw Edward's

Re: hive plan generation

2014-07-11 Thread Abirami V
Hi Anil, If you use Derby as the metastore, wherever you start the Hive CLI it will create a metastore in that directory. First option: use another RDBMS such as PostgreSQL, MySQL, Oracle ... Second option: conf.set(javax.jdo.option.ConnectionURL,