Hi,
Currently I am submitting multiple Hive jobs using the Hive CLI with hive -f
from different scripts. I can see all of these jobs in the application tracker,
and they get processed in parallel.
Now I plan to switch to HiveServer2 and submit jobs using the Beeline client
from multiple scripts
Does anyone know what
*rank() over(distribute by p_mfgr sort by p_name)*
does exactly and how it's different from
*rank() over(partition by p_mfgr order by p_name)*?
Thanks,
Eric
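
For reference, the partition by / order by form of the question can be sketched in plain Python. This only illustrates standard RANK() semantics (rows are grouped by the partition key, ordered within each group, tied rows share a rank, and a gap follows the tie). It is not Hive's implementation, and it makes no claim about how the distribute by / sort by spelling behaves inside OVER. The helper name rank_over and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def rank_over(rows, partition_key, order_key):
    """Emulate RANK() OVER (PARTITION BY partition_key ORDER BY order_key).

    Hypothetical helper for illustration only, not part of Hive.
    """
    out = []
    # Sort so that rows of one partition are adjacent and ordered inside it.
    rows_sorted = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(rows_sorted, key=itemgetter(partition_key)):
        prev_value = object()  # sentinel that never equals a real value
        rank = 0
        for position, row in enumerate(group, start=1):
            # A new order-key value takes its row position as rank;
            # a tie keeps the previous rank, which leaves a gap afterwards.
            if row[order_key] != prev_value:
                rank = position
                prev_value = row[order_key]
            out.append({**row, "rank": rank})
    return out


# Made-up sample rows using the column names from the question.
parts = [
    {"p_mfgr": "Manufacturer#1", "p_name": "almond"},
    {"p_mfgr": "Manufacturer#1", "p_name": "almond"},
    {"p_mfgr": "Manufacturer#1", "p_name": "burlywood"},
    {"p_mfgr": "Manufacturer#2", "p_name": "azure"},
]
ranked = rank_over(parts, "p_mfgr", "p_name")
```

Ranks restart at 1 for every p_mfgr value, and the two tied "almond" rows share a rank.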
As a general principle,
distribute by ensures each of N reducers gets non-overlapping ranges of X,
but doesn't sort the output of each reducer. You end up with N or more unsorted
files with non-overlapping ranges. So this is more of a horizontal
distribution of data.
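
A rough Python sketch of that behaviour, under stated assumptions: rows are assigned to reducers by hashing the key (zlib.crc32 is only a stand-in here; Hive uses its own hash function), so every row with a given key lands on the same reducer, and each reducer's output stays unsorted unless a separate sort step is applied. The function names distribute_by and sort_by are hypothetical.

```python
import zlib
from collections import defaultdict


def distribute_by(rows, key, num_reducers):
    """Assign each row to a reducer by hashing its key (illustrative only)."""
    buckets = defaultdict(list)
    for row in rows:
        # crc32 is an assumed stand-in for the real partitioner's hash.
        reducer = zlib.crc32(str(row[key]).encode("utf-8")) % num_reducers
        buckets[reducer].append(row)  # arrival order is kept: no sorting
    return dict(buckets)


def sort_by(buckets, key):
    """Sort each reducer's rows independently; no total order across reducers."""
    return {r: sorted(rows, key=lambda row: row[key]) for r, rows in buckets.items()}


rows = [{"x": v} for v in [5, 1, 9, 1, 5, 3, 9, 7]]
buckets = distribute_by(rows, "x", num_reducers=3)
sorted_buckets = sort_by(buckets, "x")
```

Each bucket plays the role of one reducer's output file: no key value appears in two buckets, but the rows inside a bucket are ordered only after the sort_by step.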
In my view,
Partition by is more
Hi,
The reducer-level differences between order by and sort by are well known: they
concern whether the final output is totally ordered.
One could simulate rank() over() functionality by using distribute by / sort by
on the dataset (or cluster by, if the key is the same), as in Edward
Chaudra,
The difference you saw between Hive CLI and Beeline might indicate a bug.
However, before making such a conclusion, could you give an example of your
queries? Are the jobs you expect to run in parallel for a single query? Please
note that your script file is executed line by line in either
Or you can just run cron jobs in your OS.
On Thu, Jul 10, 2014 at 4:55 PM, moon soo Lee leemoon...@gmail.com wrote:
For simpler use, Zeppelin (http://zeppelin-project.org) runs Hive queries
from a web-based editor, and it has a crontab-style scheduler.
Best,
moon
On Fri, Jul 11, 2014 at
Hi,
I am trying to generate a Hive plan as shown below. But even after creating the
src table, I get a table-not-found exception due to a metastore issue.
Can anyone help me resolve this?
private Driver createDriver() {
HiveConf conf = new HiveConf(Driver.class);
Cheng,
We are working on an exciting new project called Satisfaction, to handle
next generation scheduling and workflow for Hive and other Hadoop/BigData
technologies. We plan to open source it sometime in the near future.
Stay tuned !!!
--- jerome
On Fri, Jul 11, 2014 at 7:02 AM, Xuefu Zhang
Hi,
A very strange thing is happening. I am running the TPC-H benchmark. I have
loaded the tables into HDFS running in pseudo-distributed mode. When I query
one table at a time
select * from customer LIMIT 2; OR
select * from NATION LIMIT 2; results are printed to the CLI, but as soon as
I try
Looking up that error online, I see http://slf4j.org/faq.html#compatibility
Maybe try to find out which version of the SLF4J libraries you have installed (in
Hadoop? Hive?), and try updating to a later version.
On Jul 10, 2014, at 9:57 PM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Thanks for the responses. I understand DISTRIBUTE BY and SORT BY in the
normal case (as described in the Hive doc); I just don't understand their
behavior in the OVER clause with RANK, which apparently you can do. See
ql/src/test/queries/clientpositive/windowing.q for example.
Yes I saw Edward's
Hi Anil,
If you use Derby as the metastore, Hive creates a new metastore in whatever
directory you start the Hive CLI from.
First option: use another RDBMS such as PostgreSQL, MySQL, Oracle ...
Second option: conf.set(javax.jdo.option.ConnectionURL,