Re: What to choose, hive 0.11 (marked as stable release) or hive 0.12

2014-02-27 Thread twinkle sachdeva
Any thoughts on which is more stable: Hive 0.11 or Hive 0.12?


On Thu, Feb 27, 2014 at 12:42 PM, twinkle sachdeva 
twinkle.sachd...@gmail.com wrote:

 Hi,

 I am planning to use Hive for my use case, but I am unsure whether to pick
 Hive 0.12 or Hive 0.11.

 Hive 0.12 has been out for a reasonable amount of time, while Hive 0.11 is
 the one marked as the stable release.

 Are there any known critical issues in Hive 0.12 because of which it has
 not been marked stable, or is it simply the release policy that decides
 when a release gets marked stable?

 Please provide some input.

 Thanks and Regards,
 Twinkle



Re: What to choose, hive 0.11 (marked as stable release) or hive 0.12

2014-02-27 Thread Nitin Pawar
Here is a mail from Edward on a different thread:

All "stable" really is is a symlink. Hive is heavily unit and integration
tested, and a release is not made until after some manual testing as well.
Releases have historically been very stable. 0.12 has been out for some time.

You can use 0.12. It has more features compared to 0.11, and I have not seen
anyone complaining much about that release yet.

Thanks,
Nitin



On Thu, Feb 27, 2014 at 4:24 PM, twinkle sachdeva 
twinkle.sachd...@gmail.com wrote:

 Any thoughts on which is more stable: Hive 0.11 or Hive 0.12?


 On Thu, Feb 27, 2014 at 12:42 PM, twinkle sachdeva 
 twinkle.sachd...@gmail.com wrote:

 Hi,

 I am planning to use Hive for my use case, but I am unsure whether to pick
 Hive 0.12 or Hive 0.11.

 Hive 0.12 has been out for a reasonable amount of time, while Hive 0.11 is
 the one marked as the stable release.

 Are there any known critical issues in Hive 0.12 because of which it has
 not been marked stable, or is it simply the release policy that decides
 when a release gets marked stable?

 Please provide some input.

 Thanks and Regards,
 Twinkle





-- 
Nitin Pawar


Hive query parser bug resulting in FAILED: NullPointerException null

2014-02-27 Thread Krishna Rao
Hi all,

We've experienced a bug which seems to be caused by a query constraint
involving partitioned columns. The following query results in FAILED:
NullPointerException null being returned nearly instantly:

EXPLAIN SELECT
  col1
FROM
  tbl1
WHERE
(part_col1 = 2014 AND part_col2 = 2)
OR part_col1  2014;

The exception doesn't happen if any of the conditions are removed. The
table is defined like the following:

CREATE TABLE tbl1 (
  col1    STRING,
  ...
  col12   STRING
)
PARTITIONED BY (part_col1 INT, part_col2 TINYINT, part_col3 TINYINT)
STORED AS SEQUENCEFILE;
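
A rewrite that may sidestep the combined OR, if that is what trips the
planner (a sketch only, untested against this failure; the second comparison
on part_col1 is garbled above, so '<' is an assumption):

EXPLAIN SELECT u.col1 FROM (
  SELECT col1 FROM tbl1 WHERE part_col1 = 2014 AND part_col2 = 2
  UNION ALL
  SELECT col1 FROM tbl1 WHERE part_col1 < 2014  -- '<' assumed
) u;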


Unfortunately I cannot construct a test case to replicate this. Seeing as it
appears to be a query parser bug, I thought the following would replicate it:

CREATE TABLE tbl2 LIKE tbl1;
EXPLAIN SELECT
  col1
FROM
  tbl2
WHERE
(part_col1 = 2014 AND part_col2 = 2)
OR part_col1  2014;

But it does not. Could it somehow be data specific? Does the query parser
use partition information?

Are there any logs I could see to investigate this further? Or is this a
known bug?

We're using hive 0.10.0-cdh4.4.0.


Cheers,

Krishna


Re: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-27 Thread Norbert Burger
Thanks everyone for the feedback.  Just to follow up in case someone else
runs into this: I can confirm that the local client works around the OOMEs,
but it's still very slow.

It does seem like we were hitting some combination of HIVE-4051 and
HIVE-5158.  We'll try reducing partition count first, and then switch to
0.12.0 if that doesn't improve things significantly.
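
For the reduction, roughly this kind of consolidation (a sketch only -- the
table and column names here are made up, ours differ):

-- roll the finest partition level up to cut the partition count
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE events_daily (col1 STRING, col2 STRING)
PARTITIONED BY (dt STRING)
STORED AS SEQUENCEFILE;

INSERT OVERWRITE TABLE events_daily PARTITION (dt)
SELECT col1, col2, dt  -- the finer-grained partition columns are dropped
FROM events_hourly;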

Fwiw - http://www.slideshare.net/oom65/optimize-hivequeriespptx also has
some good rules of thumb.

Norbert


On Sat, Feb 22, 2014 at 1:27 PM, Stephen Sprague sprag...@gmail.com wrote:

 Yeah, that traceback pretty much spells it out - it's metastore related,
 and that's where the partitions are stored.

 I'm with the others on this. HiveServer2 is still a little janky on
 memory management.  I bounce mine once a day at midnight just to play it
 safe (and because I can).

 Again, for me, I use the Hive local client for production jobs and the
 remote client for ad-hoc stuff.

 You may wish to confirm the local Hive client has no problem with your
 query.

 Other than that, you can either increase the heap size on the HS2 process
 and hope for the best, and/or file a bug report.

 Bottom line: HiveServer2 isn't production bulletproof just yet, IMHO.
 Others may disagree.

 Regards,
 Stephen.



 On Sat, Feb 22, 2014 at 9:50 AM, Norbert Burger 
 norbert.bur...@gmail.comwrote:

 Thanks all for the quick feedback.

 I'm a bit surprised to learn 15k is considered too many, but we can work
 around it.  I guess I'm also curious why the query planner needs to know
 about all partitions even in the case of simple select/limit queries, where
 the query might target only a single partition.

 Here's the client-side OOME with HADOOP_HEAPSIZE=2048:


 https://gist.githubusercontent.com/nburger/3286d2052060e2efe161/raw/dc30231086803c1d33b9137b5844d2d0e20e350d/gistfile1.txt

 This was from a CDH4.3.0 client hitting HiveServer2.  Any idea what's
 consuming the heap?

 Norbert


 On Sat, Feb 22, 2014 at 10:32 AM, Edward Capriolo 
 edlinuxg...@gmail.comwrote:

 Don't make tables with that many partitions. It is an anti-pattern. I
 have tables with 2000 partitions a day and that is really too many. Hive
 needs to load that information into memory to plan the query.
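
 For example (table and partition names made up), compare what the planner
 has to consider with and without a partition predicate:

 SHOW PARTITIONS events_hourly;  -- how many partitions the metastore tracks
 EXPLAIN SELECT * FROM events_hourly LIMIT 10;  -- every partition is a candidate
 EXPLAIN SELECT * FROM events_hourly WHERE dt = '2014-02-22' LIMIT 10;  -- pruned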


 On Saturday, February 22, 2014, Terje Marthinussen 
 tmarthinus...@gmail.com wrote:
  The query optimizer in Hive is awful on memory consumption. 15k partitions
 sounds a bit early for it to fail, though.
 
  What is your heap size?
 
  Regards,
  Terje
 
  On 22 Feb 2014, at 12:05, Norbert Burger norbert.bur...@gmail.com
 wrote:
 
  Hi folks,
 
  We are running CDH 4.3.0 Hive (0.10.0+121) with a MySQL metastore.
 
  In Hive, we have an external table backed by HDFS which has a 3-level
 partitioning scheme that currently has 15000+ partitions.
 
  Within the last day or so, queries against this table have started
 failing.  A simple query which shouldn't take very long at all (select *
 from ... limit 10) fails after several minutes with a client OOME.  I get
 the same outcome on count(*) queries (which I thought wouldn't send any
 data back to the client).  Increasing heap on both client and server JVMs
 (via HADOOP_HEAPSIZE) doesn't have any impact.
 
  We were only able to work around the client OOMEs by reducing the
 number of partitions in the table.
 
  Looking at the MySQL querylog, my thought is that the Hive client is
 quite busy making requests for partitions that don't contribute to the
 query.  Has anyone else had a similar experience with tables this size?
 
  Thanks,
  Norbert
 

 --
 Sorry this was sent from mobile. Will do less grammar and spell check
 than usual.






Log Progress of Queries

2014-02-27 Thread Edson Ramiro
Hi all,

I was using Hive 0.11 and I used to get the query status from the log files.

But after changing from 0.11.0 to 0.12.0, even though it's configured, Hive
is no longer generating the logs with the progress of the queries. Has the
query status logging been disabled, or have I misconfigured Hive? These
are my configs:

<property>
  <name>hive.querylog.location</name>
  <value>/tmp/${user.name}</value>
  <description>
    Location of Hive run time structured log file
  </description>
</property>

<property>
  <name>hive.querylog.enable.plan.progress</name>
  <value>true</value>
  <description>
    Whether to log the plan's progress every time a job's progress is checked.
    These logs are written to the location specified by hive.querylog.location
  </description>
</property>

This is the logging I used to get:

Counters
plan={queryId:xxx_20131213115858_3699e7ff-8ff5-4dd7-91df-983b0588682b,queryType:null,queryAttributes:{queryString:
 insert overwrite table q7_volume_shipping_tmp select*  from   (
selectn1.n_name as supp_nation, n2.n_name as cust_nation,
n1.n_nationkey as s_nationkey, n2.n_nationkey as c_nationkey
fromnation n1 join nation n2on  n1.n_name = 'FRANCE' and
n2.n_name = 'GERMANY' UNION ALL selectn1.n_name as supp_nation,
n2.n_name as cust_nation, n1.n_nationkey as s_nationkey,n2.n_nationkey
as c_nationkey fromnation n1 join nation n2on  n2.n_name =
'FRANCE' and n1.n_name = 'GERMANY' )
a},queryCounters:null,stageGraph:{nodeType:STAGE,roots:null,adjacencyList:[{node:Stage-1,children:[Stage-2],adjacencyType:CONJUNCTIVE},{node:Stage-10,children:[Stage-2],adjacencyType:CONJUNCTIVE},{node:Stage-2,children:[Stage-8],adjacencyType:CONJUNCTIVE},{node:Stage-2,children:[Stage-8],adjacencyType:CONJUNCTIVE},{node:Stage-8,children:[Stage-5,Stage-4,Stage-6],adjacencyType:DISJUNCTIVE},{node:Stage-8,children:[Stage-5,Stage-4,Stage-6],adjacencyType:DISJUNCTIVE},{node:Stage-5,children:[Stage-0],adjacencyType:CONJUNCTIVE},{node:Stage-4,children:[Stage-

Thanks in advance,

  Edson Ramiro


RE: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-27 Thread java8964
That is good to know.
We are using Hive 0.9. Right now our biggest table contains two years of
data, partitioned by hour, as the data volume is big.
So right now it has 2*365*24, around 17,500 partitions. So far we haven't
seen too many problems yet, but I do have some concerns about it.
We are using IBM BigInsights, which uses Derby as the Hive metastore, not
MySQL, which is where most of my experience is.
Yong


RE: Hive query parser bug resulting in FAILED: NullPointerException null

2014-02-27 Thread java8964
Can you reproduce with an empty table? I can't reproduce it.
Also, can you paste the stack trace?
Yong


Re: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col2, 2:_col3]

2014-02-27 Thread Kumar V
Hi,
   I wrote a similar simple UDTF and a new table. This simple UDTF does work
on Hive 0.10, but my original one doesn't. I still don't understand why.

Does the fact that the original query works with the setting 'set
hive.optimize.ppd=true' give any clue?
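
For what it's worth, a quick check (a sketch only) is to run the same query
under both settings and compare:

set hive.optimize.ppd=false;
-- run the failing lateral view query here
set hive.optimize.ppd=true;
-- run it again; if it fails under only one of the two settings, the problem
-- is in predicate pushdown over the UDTF rather than in the UDTF itself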

Please let me know.




On Tuesday, February 25, 2014 3:28 PM, java8964 java8...@hotmail.com wrote:
  
Works for me on 0.10.

Yong




Date: Tue, 25 Feb 2014 11:37:32 -0800
From: kumarbuyonl...@yahoo.com
Subject: Re: java.lang.RuntimeException: cannot find field key from [0:_col0, 
1:_col2, 2:_col3]
To: user@hive.apache.org


Hi,
    Thanks for looking into it.
I am also trying this on hive 0.11 to see if it works there.  If you get a 
chance to reproduce this problem on hive 0.10, please let me know.

Thanks.



On Monday, February 24, 2014 10:59 PM, java8964 java8...@hotmail.com wrote:
  
My guess is that your UDTF returns an array of structs.

I don't have Hive 0.10 handy right now, but I wrote a simple UDTF that
returns an array of structs to test on the Hive 0.12 release.

hive> desc test;
OK
id                  int                 None
name                string              None
Time taken: 0.074 seconds, Fetched: 2 row(s)
hive> select * from test;
OK
1    Apples,Bananas,Carrots
Time taken: 0.08 seconds, Fetched: 1 row(s)

The pair UDTF will expand Apples,Bananas,Carrots
to
Apples, Bananas
Apples, Carrots
Bananas, Carrots
i.e. an array of two-element structs.

hive> select id, name, m1, m2 from test lateral view pair(name) p as m1, m2
where m1 is not null;
OK
1    Apples,Bananas,Carrots    Apples     Bananas
1    Apples,Bananas,Carrots    Apples     Carrots
1    Apples,Bananas,Carrots    Bananas    Carrots
Time taken: 7.683 seconds, Fetched: 3 row(s)

hive> select id, name, m1, m2 from test lateral view pair(name) p as m1, m2
where m1 = 'Apples';
OK
1    Apples,Bananas,Carrots    Apples     Bananas
1    Apples,Bananas,Carrots    Apples     Carrots
Time taken: 7.726 seconds, Fetched: 2 row(s)

hive> set hive.optimize.ppd=true;
hive> select id, name, m1, m2 from test lateral view pair(name) p as m1, m2
where m1 is not null;
Total MapReduce jobs = 1
OK
1    Apples,Bananas,Carrots    Apples     Bananas
1    Apples,Bananas,Carrots    Apples     Carrots
1    Apples,Bananas,Carrots    Bananas    Carrots
Time taken: 7.716 seconds, Fetched: 3 row(s)
I cannot reproduce your error in Hive 0.12, as you can see. 

I can test on Hive 0.10 tomorrow when I have time, but can you test your
case in Hive 0.12, or review your UDTF again?

Yong




Date: Mon, 24 Feb 2014 07:09:44 -0800
From: kumarbuyonl...@yahoo.com
Subject: Re: java.lang.RuntimeException: cannot find field key from [0:_col0, 
1:_col2, 2:_col3]
To: user@hive.apache.org; kumarbuyonl...@yahoo.com


As suggested, I changed the query like this:

select x.f1, x.f2, x.f3, x.f4
from (
select e.f1 as f1, e.f2 as f2, e.f3 as f3, e.f4 as f4 from mytable LATERAL VIEW
myfunc(p1,p2,p3,p4) e as f1,f2,f3,f4 where lang=123) x
where x.f3 is not null;

And it still doesn't work. I am getting the same error.  
If anyone has any ideas, please let me know.

Thanks.



On Friday, February 21, 2014 11:27 AM, Kumar V kumarbuyonl...@yahoo.com wrote:
  
Line 316 in my UDTF, where it shows the error, is the line where I call forward().

The whole trace is:

Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col2, 2:_col6, 3:_col7, 4:_col8, 5:_col9]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:346)
    at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:143)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
    at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:55)
    at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:55)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:128)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:128)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:128)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:85)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
    at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
    at 

Re: ORC 'BETWEEN' Error

2014-02-27 Thread Prasanth Jayachandran
Hi Martin

This is a known issue and it's fixed in Hive trunk. It should be available in
the 0.13 release.

https://issues.apache.org/jira/browse/HIVE-5601
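
Until then, if the trace points at ORC predicate pushdown, one possible
workaround (an assumption on my part, not verified against this failure) is
to disable pushdown for the affected query:

SET hive.optimize.index.filter=false;  -- assumed switch for ORC predicate pushdown
INSERT INTO TABLE tbl1 SELECT col1, col2 FROM tbl2 WHERE col1 BETWEEN 2 AND 4;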

Thanks
Prasanth Jayachandran

On Feb 26, 2014, at 8:55 AM, Martin, Nick nimar...@pssd.com wrote:

 Hi all,
  
 (Running Hive 12.0)
  
 I have two tables and both are stored as ORC. I attempted to insert into
 tbl1 via a select from tbl2, using ‘BETWEEN’ in my where clause to narrow
 down some dates. Something like so:
  
 “Insert into tbl1 select col1, col2 from tbl2 where col1 between 2 and 4”
  
 I kept hitting the error pasted below. So, I switched to a different approach 
 to see if it would work:
  
 “Insert into tbl1 select col1,col2 from tbl2 where col1>=2 and col1<=4”
  
 Hit the same error. When I just use “where col1=2” in the where clause the 
 insert will run fine.
  
 Is this expected?
  
  
  
 2014-02-26 11:22:53,755 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 mapreduce.job.end-notification.max.retry.interval;  Ignoring.
 2014-02-26 11:22:53,782 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 mapreduce.job.end-notification.max.attempts;  Ignoring.
 2014-02-26 11:22:53,902 INFO [main] 
 org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 2014-02-26 11:22:53,930 INFO [main] 
 org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
 2014-02-26 11:22:53,975 INFO [main] 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
 at 10 second(s).
 2014-02-26 11:22:53,975 INFO [main] 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system 
 started
 2014-02-26 11:22:53,985 INFO [main] org.apache.hadoop.mapred.YarnChild: 
 Executing with tokens:
 2014-02-26 11:22:53,985 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: 
 mapreduce.job, Service: job_1392147432508_1108, Ident: 
 (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@249c2715)
 2014-02-26 11:22:54,057 INFO [main] org.apache.hadoop.mapred.YarnChild: 
 Sleeping for 0ms before retrying again. Got null now.
 2014-02-26 11:22:54,352 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 mapreduce.job.end-notification.max.retry.interval;  Ignoring.
 2014-02-26 11:22:54,363 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 mapreduce.job.end-notification.max.attempts;  Ignoring.
 2014-02-26 11:22:54,409 INFO [main] org.apache.hadoop.mapred.YarnChild: 
 mapreduce.cluster.local.dir for child: 
 /hdfs/01/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/02/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/03/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/04/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/05/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/06/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/07/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/08/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/09/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/10/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/11/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108,/hdfs/12/hadoop/yarn/local/usercache/myusername/appcache/application_1392147432508_1108
 2014-02-26 11:22:54,481 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 mapreduce.job.end-notification.max.retry.interval;  Ignoring.
 2014-02-26 11:22:54,486 WARN [main] org.apache.hadoop.conf.Configuration: 
 job.xml:an attempt to override final parameter: 
 mapreduce.job.end-notification.max.attempts;  Ignoring.
 2014-02-26 11:22:54,542 INFO [main] 
 org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is 
 deprecated. Instead, use mapreduce.task.attempt.id
 2014-02-26 11:22:54,542 INFO [main] 
 org.apache.hadoop.conf.Configuration.deprecation: mapred.task.is.map is 
 deprecated. Instead, use mapreduce.task.ismap
 2014-02-26 11:22:54,543 INFO [main] 
 org.apache.hadoop.conf.Configuration.deprecation: mapred.local.dir is 
 deprecated. Instead, use mapreduce.cluster.local.dir
 2014-02-26 11:22:54,543 INFO [main] 
 org.apache.hadoop.conf.Configuration.deprecation: mapred.cache.localFiles is 
 deprecated. Instead, use mapreduce.job.cache.local.files
 2014-02-26 11:22:54,543 INFO [main] 
 org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is 
 deprecated. Instead, use mapreduce.job.id
 2014-02-26 11:22:54,544 INFO [main] 
 org.apache.hadoop.conf.Configuration.deprecation: 

move hive tables from one cluster to another cluster

2014-02-27 Thread soniya B
Dear experts,

I want to move my Hive tables from one cluster to another cluster. How can
I do it?

Thanks
Soniya.


Re: move hive tables from one cluster to another cluster

2014-02-27 Thread Krishnan K
1. You can use distcp to copy the files to the new cluster.
2. Rebuild the metadata -- see the sketch below.
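
For step 2, something like this on the destination cluster (a sketch -- the
table name, columns and location are made up):

-- the files were already copied over with distcp; re-create the DDL over them
CREATE EXTERNAL TABLE mytable (col1 STRING, col2 INT)
PARTITIONED BY (dt STRING)
STORED AS SEQUENCEFILE
LOCATION '/user/hive/warehouse/mytable';

-- register the copied partition directories with the new metastore
MSCK REPAIR TABLE mytable;

Alternatively, Hive's EXPORT TABLE / IMPORT TABLE (available since 0.8, if I
remember right) carries the table metadata along with the data.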


On Thu, Feb 27, 2014 at 8:07 PM, soniya B soniya.bigd...@gmail.com wrote:

 Dear experts,

 I want to move my Hive tables from one cluster to another cluster. How can
 I do it?

 Thanks
 Soniya.



Re: move hive tables from one cluster to another cluster

2014-02-27 Thread soniya B
Hi,

I have moved the warehouse files to the other cluster, but I still don't see
the tables there. How do I rebuild the metadata?

Thanks
Soniya

On Fri, Feb 28, 2014 at 9:26 AM, Krishnan K kkrishna...@gmail.com wrote:

 1. You can use distcp to copy the files to the new cluster.
 2. Rebuild the metadata.


 On Thu, Feb 27, 2014 at 8:07 PM, soniya B soniya.bigd...@gmail.comwrote:

 Dear experts,

 I want to move my Hive tables from one cluster to another cluster. How
 can I do it?

 Thanks
 Soniya.