Re: Hive and engine performance tez vs mr

2015-04-06 Thread max scalf
Try setting the below in Hive and see what happens. By the way, what are your
Hive configs, if any?

set mapred.map.tasks = 20;


On Thu, Apr 2, 2015 at 11:01 AM, Erwan MAS er...@mas.nom.fr wrote:

 Hello,

 I have an issue on Hive with the Tez engine. When I try to execute a query
 with the Tez engine, it is 9 times slower than map/reduce.

 The query is a left outer join on two tables using ORC storage.

 With map/reduce I have:
 Job 0 : Map 27  Reduce 256
 Job 1 : Map 27  Reduce 256
 Time taken: 110 sec

 With Tez I have:
 Map 1 : 1/1   Map 4 : 3/3   Reducer 2 : 256/256   Reducer 3 : 256/256
 Time taken: 930 sec

 With my configuration, Tez wants to use only one mapper for some parts.

 How can I increase the number of mappers?
 Which Hive variable must I set to change this behavior?

 My context :
Hive 0.13 on Hortonworks 2.1

 --
 Erwan MAS  mailto:er...@mas.nom.fr



Re: Over-logging by ORC packages

2015-04-06 Thread Owen O'Malley
Sorry for the excessive logging. The pushdown logging should only happen at the
start; is there a particular message that was being repeated per row?

Thanks,
   Owen

On Mon, Apr 6, 2015 at 9:15 AM, Moore, Douglas 
douglas.mo...@thinkbiganalytics.com wrote:

   On a cluster recently upgraded to Hive 0.14 (HDP 2.2) we found that
 gigabytes and millions more INFO-level hive.log entries from the ORC packages
 were being logged.

 I feel these log entries should be at the DEBUG level.

  Is there an existing bug in Hive or ORC?

   Here is one example:

  2015-04-06 15:12:43,212 INFO  orc.OrcInputFormat
 (OrcInputFormat.java:setSearchArgument(298)) - ORC pushdown predicate:
 leaf-0 = (EQUALS company XYZ)

 leaf-1 = (EQUALS site DEF)

 leaf-2 = (EQUALS table ABC)

 expr = (and leaf-0 leaf-1 leaf-2)


  To get an acceptable amount of logging that did not fill /tmp we had to
 add these entries to /etc/hive/conf/hive-log4j.settings:

 log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger=WARN,DRFA

 log4j.logger.org.apache.hadoop.hive.ql.io.orc.ReaderImpl=WARN,DRFA

 log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcInputFormat=WARN,DRFA



  While I'm on the subject, to operationally harden Hive, I think Hive
 should use a more aggressive rolling file appender by default, one that
 can roll hourly or at a max size and compress the rolled logs…
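
 Something along those lines in hive-log4j.properties might look like the
 following (a sketch only: the appender name and size/backup values are
 placeholders, and rolling on both time and size, or compressing the rolled
 logs, would need the log4j-extras rolling policies rather than the stock
 appender):

# hypothetical size-capped appender (illustrative values)
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hive.log.dir}/${hive.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n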


  - Douglas



Re: Over-logging by ORC packages

2015-04-06 Thread Moore, Douglas
Owen, we're seeing millions of those log entries.

There are three messages, one for each package listed below in the revised
hive-log4j.settings; one full example is provided below.
They seem to repeat less often than per row (that would be billions), perhaps
once for each and every partition in a table (1,000s to 10,000s).

To reproduce, create a y/m/d-partitioned table and run: select * from table
where year=2015 and month=4 and day=6 limit 1;
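
A minimal repro sketch (hypothetical table and column names, assuming ORC
storage as in our case):

CREATE TABLE t (c STRING) PARTITIONED BY (year INT, month INT, day INT) STORED AS ORC;
SELECT * FROM t WHERE year = 2015 AND month = 4 AND day = 6 LIMIT 1;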
- Douglas

From: Owen O'Malley omal...@apache.org
Reply-To: user@hive.apache.org
Date: Mon, 6 Apr 2015 11:13:28 -0700
To: user@hive.apache.org
Subject: Re: Over-logging by ORC packages

Sorry for the excessive logging. The pushdown logging should only happen at the
start; is there a particular message that was being repeated per row?

Thanks,
   Owen

On Mon, Apr 6, 2015 at 9:15 AM, Moore, Douglas
douglas.mo...@thinkbiganalytics.com wrote:
On a cluster recently upgraded to Hive 0.14 (HDP 2.2) we found that gigabytes
and millions more INFO-level hive.log entries from the ORC packages were being
logged.
I feel these log entries should be at the DEBUG level.
Is there an existing bug in Hive or ORC?

Here is one example:
2015-04-06 15:12:43,212 INFO  orc.OrcInputFormat 
(OrcInputFormat.java:setSearchArgument(298)) - ORC pushdown predicate: leaf-0 = 
(EQUALS company XYZ)
leaf-1 = (EQUALS site DEF)
leaf-2 = (EQUALS table ABC)
expr = (and leaf-0 leaf-1 leaf-2)

To get an acceptable amount of logging that did not fill /tmp we had to add 
these entries to /etc/hive/conf/hive-log4j.settings:
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.ReaderImpl=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcInputFormat=WARN,DRFA


While I'm on the subject, to operationally harden Hive, I think Hive should use
a more aggressive rolling file appender by default, one that can roll hourly or
at a max size and compress the rolled logs…

- Douglas



Re: Hive and engine performance tez vs mr

2015-04-06 Thread Erwan MAS
On Mon, Apr 06, 2015 at 12:15:05PM -0500, max scalf wrote:
 Try setting the below in Hive and see what happens. By the way, what are your
 Hive configs, if any?
 
 set mapred.map.tasks = 20;
 

Does not change the behavior :(

--
Erwan MAS  mailto:er...@mas.nom.fr


RE: Can WebHCat show non-MapReduce jobs?

2015-04-06 Thread Xiaoyong Zhu
Ping... does anybody have more ideas here?

Xiaoyong

From: Xiaoyong Zhu [mailto:xiaoy...@microsoft.com]
Sent: Thursday, March 26, 2015 9:07 AM
To: user@hive.apache.org
Cc: ekoif...@hortonworks.com
Subject: RE: Can WebHCat show non-MapReduce jobs?

Thanks for the reply, Eugene. However, when I try to list the jobs via WebHCat 
(via /templeton/v1/jobs):

[{"id":"job_1427201295241_0001","detail":null},{"id":"job_1427201295241_0003","detail":null},{"id":"job_1427201295241_0005","detail":null}]
As you can see, there are three jobs there: 0001, 0003 and 0005. However, I
submitted three Hive on Tez jobs.

However, in the YARN UI I can see 6 jobs; 0002, 0004 and 0006 (which do not
exist in WebHCat) are shown there as well.

[inline screenshot of the YARN UI]
Maybe something is wrong with my configurations?

Xiaoyong

From: Eugene Koifman [mailto:ekoif...@hortonworks.com]
Sent: Thursday, March 26, 2015 12:52 AM
To: user@hive.apache.org
Subject: Re: Can WebHCat show non-MapReduce jobs?

https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Jobs should 
produce all jobs (assuming the calling user has permissions to see them).
templeton.Server.showJobList() has detailed JavaDoc.
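
If I read that reference page right, the listing call shape is (host, port and
user below are placeholders; showall is the flag that should include other
users' jobs, permissions allowing):

GET http://<webhcat-host>:50111/templeton/v1/jobs?user.name=<user>&showall=true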

From: Xiaoyong Zhu xiaoy...@microsoft.com
Reply-To: user@hive.apache.org
Date: Wednesday, March 25, 2015 at 5:35 AM
To: user@hive.apache.org
Subject: Can WebHCat show non-MapReduce jobs?

It seems that WebHCat can only show MapReduce jobs - for example, if I submit a
Hive on Tez job via WebHCat, I can only get the TempletonControllerJob ID
(which is a MAPREDUCE job) but I cannot get the Tez job ID (which is launched
by the TempletonControllerJob).

Is this by design? Is there a way to return all types of jobs via WebHCat?

Xiaoyong



Re: How efficient is memory allocation in tez.

2015-04-06 Thread Gopal Vijayaraghavan

 I have a map join in which the smaller tables together are 200 MB, and I am
trying to have one block of the main table be processed by one Tez task.
...
 What am I missing, and is this even the right way of approaching the
problem?

You need to be more specific about the Hive version. Hive-13 needs ~6x the
amount of map-join memory for Tez compared to Hive-14.

Hive-1.0 branch is a bit better at estimating map-join sizes as well,
since it counts the memory overheads of JavaDataModel.

Hive-1.1 got a little worse, which will get fixed when we get to hive-1.2.

But for the 1.x line, the approx size of data that fits within a map-join
is (container Xmx - io.sort.mb)/3.

This plays into the NewRatio settings in JDK7 as well: make sure you have set
the new ratio so the young generation is only 1/8th of the memory instead of
the 1/3rd default (which means 30% of your memory cannot be used by the sort
buffer or the map-join, since they are tenured data).
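
To make that concrete (illustrative numbers, not from the thread; the
young-gen share follows from young = heap / (NewRatio + 1)):

  container Xmx    = 4096 MB
  io.sort.mb       = 1024 MB
  map-join budget  = (4096 - 1024) / 3  =  1024 MB
  -XX:NewRatio=7  ->  young gen = heap / 8  =  1/8th of the heap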

Also running "ANALYZE TABLE tbl compute statistics;" on the small tables
will fill in the uncompressed size fields so that we don't estimate
map-joins based on zlib sizes (which coincidentally are ~3x off).

And if you still keep getting heap errors, I can take a look at it if you
have a .hprof.bz2 file to share, and fix any corner cases we might've missed.

Cheers,
Gopal
PS: The current trunk implements a Grace HashJoin, which is another
approach to the memory-limit problem - a more traditional solution than
fixing mem sizes.




Re: Hive and engine performance tez vs mr

2015-04-06 Thread Carter Shanklin
Erwan,

Faced with a similar situation last week, I found that decreasing

mapred.max.split.size

increased my parallelism by 6x. Yes, mapred, even though it was a Tez job. I
reduced it to 10 MB from 256 MB, which I believe is the default.

The other variables to try are:
tez.grouping.min-size (make it smaller)
tez.grouping.max-size (smaller as well)
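
For example (the numbers are only illustrative starting points, not
recommendations):

-- ~10 MB splits instead of the 256 MB default
set mapred.max.split.size=10000000;
set tez.grouping.min-size=10000000;
set tez.grouping.max-size=67108864;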


Good luck.


On 4/6/15, 2:57 PM, Erwan MAS er...@mas.nom.fr wrote:

On Mon, Apr 06, 2015 at 12:15:05PM -0500, max scalf wrote:
 Try setting the below in Hive and see what happens. By the way, what are your
 Hive configs, if any?
 
 set mapred.map.tasks = 20;
 

Does not change the behavior :(

--
Erwan MAS  mailto:er...@mas.nom.fr



Re: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

2015-04-06 Thread Eugene Koifman
Can you check that the schema in your metastore db has the transaction-related
tables?
You can find the list of tables in hive-txn-schema-0.14.0.mysql.sql, for 
example.
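
For instance, against a MySQL metastore (a sketch; from memory, the 0.14 txn
schema creates tables such as TXNS, TXN_COMPONENTS, COMPLETED_TXN_COMPONENTS,
NEXT_TXN_ID, HIVE_LOCKS, NEXT_LOCK_ID, COMPACTION_QUEUE and
NEXT_COMPACTION_QUEUE_ID):

-- run against the Hive metastore database
SHOW TABLES LIKE '%TXN%';
SHOW TABLES LIKE '%LOCK%';
SHOW TABLES LIKE '%COMPACTION%';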

From: Mich Talebzadeh m...@peridale.co.uk
Reply-To: user@hive.apache.org
Date: Monday, April 6, 2015 at 8:05 AM
To: user@hive.apache.org, @Sanjiv Singh sanjiv.is...@gmail.com
Subject: RE: Concurrency issue, Setting hive.txn.manager to 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

Thanks Sanjiv.

Unfortunately after resetting


hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
and doing


<property>
  <name>hive.in.test</name>
  <value>true</value>
</property>


Still getting the same error message

hive> show databases;
FAILED: LockException [Error 10280]: Error communicating with the metastore


Mich Talebzadeh

http://talebzadehmich.wordpress.com

Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache


From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
Sent: 06 April 2015 15:21
To: user@hive.apache.org
Subject: Re: Concurrency issue, Setting hive.txn.manager to 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

Not sure... it should work.
Try adding the below configuration and then check:

 <property>
   <name>hive.in.test</name>
   <value>true</value>
 </property>


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Mon, Apr 6, 2015 at 7:21 PM, Mich Talebzadeh
m...@peridale.co.uk wrote:

Hi,



I turned on concurrency for hive for DML with settings in hive-site.xml as 
follows:



hive.support.concurrency=true

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

hive.compactor.initiator.on=true

hive.compactor.worker.threads=2

hive.support.concurrency=true

hive.enforce.bucketing=true

hive.exec.dynamic.partition.mode=nonstrict

 Recycled the connection to the metastore and started the hive server. Tried to
query hive as follows:

 hive> use asehadoop;

FAILED: LockException [Error 10280]: Error communicating with the metastore



Went back and set hive.txn.manager to default

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager



and recycled again and all worked!



Sounds like concurrency does not work, or is there something extra I need to do?



Thanks



Mich Talebzadeh



http://talebzadehmich.wordpress.com



Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache










hive 0.14 hive 1.1.0 lost some column info

2015-04-06 Thread r7raul1...@163.com
I found this difference in the logs:
In hive 0.14
DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, 
city_id, landing_page_type_id, landing_track_time, landing_url, 
nav_refer_tracker_id, nav_refer_page_type_id, nav_refer_page_value, 
nav_refer_link_position, nav_tracker_id, nav_page_categ_id, nav_page_type_id, 
nav_page_value, nav_srce_type, internal_keyword, internal_result_sum, pltfm_id, 
app_vers, nav_link_position, nav_button_position, nav_track_time, 
nav_next_tracker_id, sessn_last_time, sessn_pv, detl_tracker_id, 
detl_page_type_id, detl_page_value, detl_pm_id, detl_link_position, 
detl_position_track_id, cart_tracker_id, cart_page_type_id, cart_page_value, 
cart_link_postion, cart_button_position, cart_position_track_id, cart_prod_id, 
ordr_tracker_id, ordr_page_type_id, ordr_code, updt_time, cart_pm_id, 
brand_code, categ_type, os, end_user_id, add_cart_flag, navgation_page_flag, 
nav_page_url, detl_button_position, manul_flag, manul_track_date, 
nav_refer_tpa, nav_refer_tpa_id, nav_refer_tpc, nav_refer_tpi, nav_refer_tcs, 
nav_refer_tcsa, nav_refer_tcdt, nav_refer_tcd, nav_refer_tci, 
nav_refer_postn_type, nav_tpa_id, nav_tpa, nav_tpc, nav_tpi, nav_tcs, nav_tcsa, 
nav_tcdt, nav_tcd, nav_tci, nav_postn_type, detl_tpa_id, detl_tpa, detl_tpc, 
detl_tpi, detl_tcs, detl_tcsa, detl_tcdt, detl_tcd, detl_tci, detl_postn_type, 
cart_tpa_id, cart_tpa, cart_tpc, cart_tpi, cart_tcs, cart_tcsa, cart_tcdt, 
cart_tcd, cart_tci, cart_postn_type] columnTypes=[string, bigint, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, int, string, string, 
string, string, string, string, int, string, string, string, bigint, string, 
string, string, string, string, string, string, string, bigint, string, string, 
string, string, bigint, string, int, string, string, string, int, string, 
string, int, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string] separator=[[B@e50bca4] nullstring=\N 
lastColumnTakesRest=false
 
In hive 0.10
DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, 
city_id, landing_page_type_id, landing_track_time, landing_url, 
nav_refer_tracker_id, nav_refer_page_type_id, nav_refer_page_value, 
nav_refer_link_position, nav_tracker_id, nav_page_categ_id, nav_page_type_id, 
nav_page_value, nav_srce_type, internal_keyword, internal_result_sum, pltfm_id, 
app_vers, nav_link_position, nav_button_position, nav_track_time, 
nav_next_tracker_id, sessn_last_time, sessn_pv, detl_tracker_id, 
detl_page_type_id, detl_page_value, detl_pm_id, detl_link_position, 
detl_position_track_id, cart_tracker_id, cart_page_type_id, cart_page_value, 
cart_link_postion, cart_button_position, cart_position_track_id, cart_prod_id, 
ordr_tracker_id, ordr_page_type_id, ordr_code, updt_time, cart_pm_id, 
brand_code, categ_type, os, end_user_id, add_cart_flag, navgation_page_flag, 
nav_page_url, detl_button_position, manul_flag, manul_track_date, 
nav_refer_tpa, nav_refer_tpa_id, nav_refer_tpc, nav_refer_tpi, nav_refer_tcs, 
nav_refer_tcsa, nav_refer_tcdt, nav_refer_tcd, nav_refer_tci, 
nav_refer_postn_type, nav_tpa_id, nav_tpa, nav_tpc, nav_tpi, nav_tcs, nav_tcsa, 
nav_tcdt, nav_tcd, nav_tci, nav_postn_type, detl_tpa_id, detl_tpa, detl_tpc, 
detl_tpi, detl_tcs, detl_tcsa, detl_tcdt, detl_tcd, detl_tci, detl_postn_type, 
cart_tpa_id, cart_tpa, cart_tpc, cart_tpi, cart_tcs, cart_tcsa, cart_tcdt, 
cart_tcd, cart_tci, cart_postn_type, sessn_chanl_id, gu_sec_flg, 
detl_refer_page_type_id, detl_refer_page_value, detl_event_id, 
nav_refer_intrn_reslt_sum, nav_intrn_reslt_sum, nav_refer_intrn_kw, 
nav_intrn_kw, detl_track_time, cart_track_time] columnTypes=[string, bigint, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, int, string, 
string, string, string, string, string, int, string, string, string, bigint, 
string, string, string, string, string, string, string, string, bigint, string, 
string, string, string, bigint, string, int, string, string, string, int, 
string, string, int, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, bigint, bigint, string, string, string, string, 
string, string, string, 

Re: documentation link wrong?

2015-04-06 Thread Lefty Leverenz
The TaskController class is gone from Hadoop 2.6 (the current stable
release, where the link points) as well as 2.5.2, but I found it in 1.2.1:
http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/TaskController.html
.

For now, I can change the WebHCat doc to that link.  But a WebHCat expert
should determine whether there's something equivalent in later versions of
Hadoop.

-- Lefty

On Tue, Apr 7, 2015 at 12:48 AM, Xiaoyong Zhu xiaoy...@microsoft.com
wrote:

  It seems that the link (Class TaskController
 http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TaskController.html)
 is wrong in this page:



 https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive



 it returns a 404 for me. Not sure what the correct link is…



 Xiaoyong





documentation link wrong?

2015-04-06 Thread Xiaoyong Zhu
It seems that the link (Class TaskController
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TaskController.html)
is wrong in this page:

https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive

it returns a 404 for me. Not sure what the correct link is...

Xiaoyong



Re: documentation link wrong?

2015-04-06 Thread Lefty Leverenz
I've fixed the link to the TaskController class in four places:

   - WebHCat Reference -- Hive Job
   
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive#WebHCatReferenceHive-Results
   - WebHCat Reference -- MapReduce Job
   
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceJar#WebHCatReferenceMapReduceJar-Results
   - WebHCat Reference -- MapReduce Streaming Job
   
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceStream#WebHCatReferenceMapReduceStream-Results
   - WebHCat Reference -- Pig Job
   
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig#WebHCatReferencePig-Results

Thanks for the tip, Xiaoyong.

-- Lefty

On Tue, Apr 7, 2015 at 1:22 AM, Lefty Leverenz leftylever...@gmail.com
wrote:

 The TaskController class is gone from Hadoop 2.6 (the current stable
 release, where the link points) as well as 2.5.2, but I found it in 1.2.1:
 http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/TaskController.html
 .

 For now, I can change the WebHCat doc to that link.  But a WebHCat expert
 should determine whether there's something equivalent in later versions of
 Hadoop.

 -- Lefty

 On Tue, Apr 7, 2015 at 12:48 AM, Xiaoyong Zhu xiaoy...@microsoft.com
 wrote:

  It seems that the link (Class TaskController
 http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TaskController.html)
 is wrong in this page:



 https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive



 it returns a 404 for me. Not sure what the correct link is…



 Xiaoyong







Unable to make Sort Merge Bucket Join work

2015-04-06 Thread Timothy Manuel

Hi,


 
I have two large tables on which I need to perform an equijoin. I have bucketed
and sorted the two tables on the join key. I have then made the following
specifications when running the join SQL:


 
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
SET hive.auto.convert.sortmerge.join=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.sortmerge.join.noconditionaltask=true;
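
The two tables were created along these lines (illustrative DDL - the real
names, types and bucket count differ, but both sides are bucketed and sorted on
the join key into the same number of buckets):

CREATE TABLE t1 (k BIGINT, v STRING)
CLUSTERED BY (k) SORTED BY (k ASC) INTO 32 BUCKETS;

SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;
INSERT OVERWRITE TABLE t1 SELECT k, v FROM staging_t1;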
However I get this error:-


 
FAILED: SemanticException [Error 10135]: Sort merge bucketed join could not be
performed. If you really want to perform the operation, either set
hive.optimize.bucketmapjoin.sortedmerge=false, or set
hive.enforce.sortmergebucketmapjoin=false.


 
What am I doing wrong? The version of Hive is 0.13.0.2.1.2.0-402.


 
Thanks




admin user in hive

2015-04-06 Thread Megha Garg
Hi,

I want to enable authentication and authorization on my Hive server, but I want
only one admin user who can grant other users the admin/public roles. By
default, any user who types 'set role admin' can become admin and grant any
user any permission. How can I avoid this behavior?

Hive version is 0.13


Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

2015-04-06 Thread Mich Talebzadeh
Hi,

 

I turned on concurrency for hive for DML with settings in hive-site.xml as
follows:

 

hive.support.concurrency=true

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

hive.compactor.initiator.on=true

hive.compactor.worker.threads=2

hive.support.concurrency=true

hive.enforce.bucketing=true

hive.exec.dynamic.partition.mode=nonstrict
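
(In hive-site.xml each of these is a property element, e.g. for the first one:)

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>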

 

Recycled the connection to the metastore and started the hive server. Tried to
query hive as follows:

hive> use asehadoop;

FAILED: LockException [Error 10280]: Error communicating with the metastore

 

Went back and set hive.txn.manager to default

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager

 

and recycled again and all worked!

 

Sounds like concurrency does not work, or is there something extra I need to do?

 

Thanks

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 


 

 



HiveServer2 addressing standby namenode

2015-04-06 Thread Daniel Haviv
Hi,
We get a lot of error messages on the standby namenode indicating that Hive
is trying to address the standby namenode.
As all of our jobs function normally, my guess is that Hive is constantly
trying to address both namenodes and only works with the active one.

Is this correct?
Can this be modified so it will only address the active one and still
maintain the HA architecture?

Thanks,
Daniel


Re: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

2015-04-06 Thread @Sanjiv Singh
Not sure... it should work.

Try adding the below configuration and then check:

 <property>
   <name>hive.in.test</name>
   <value>true</value>
 </property>



Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Mon, Apr 6, 2015 at 7:21 PM, Mich Talebzadeh m...@peridale.co.uk wrote:

 Hi,



 I turned on concurrency for hive for DML with settings in hive-site.xml as
 follows:



 hive.support.concurrency=true

 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

 hive.compactor.initiator.on=true

 hive.compactor.worker.threads=2

 hive.support.concurrency=true

 hive.enforce.bucketing=true

 hive.exec.dynamic.partition.mode=nonstrict



 Recycled the connection to the metastore and started the hive server. Tried
 to query hive as follows:

 hive> use asehadoop;

 FAILED: LockException [Error 10280]: Error communicating with the metastore



 Went back and set hive.txn.manager to default

 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager



 and recycled again and all worked!



 Sounds like concurrency does not work, or is there something extra I need to do?



 Thanks



 Mich Talebzadeh



 http://talebzadehmich.wordpress.com



 Publications due shortly:

 Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
 Coherence Cache










RE: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

2015-04-06 Thread Mich Talebzadeh
Thanks Sanjiv.

 

Unfortunately after resetting

 

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

and doing 

 

<property>
  <name>hive.in.test</name>
  <value>true</value>
</property>

 

 

Still getting the same error message

 

hive> show databases;

FAILED: LockException [Error 10280]: Error communicating with the metastore

 

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

 


 

From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] 
Sent: 06 April 2015 15:21
To: user@hive.apache.org
Subject: Re: Concurrency issue, Setting hive.txn.manager to 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

 

Not sure... it should work.

Try adding the below configuration and then check:

 <property>
   <name>hive.in.test</name>
   <value>true</value>
 </property>

 




Regards
Sanjiv Singh
Mob :  +091 9990-447-339

 

On Mon, Apr 6, 2015 at 7:21 PM, Mich Talebzadeh m...@peridale.co.uk wrote:

Hi,

 

I turned on concurrency for hive for DML with settings in hive-site.xml as 
follows:

 

hive.support.concurrency=true

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

hive.compactor.initiator.on=true

hive.compactor.worker.threads=2

hive.support.concurrency=true

hive.enforce.bucketing=true

hive.exec.dynamic.partition.mode=nonstrict

 Recycled the connection to the metastore and started the hive server. Tried to
query hive as follows:

 hive> use asehadoop;

FAILED: LockException [Error 10280]: Error communicating with the metastore

 

Went back and set hive.txn.manager to default

hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager

 

and recycled again and all worked!

 

Sounds like concurrency does not work, or is there something extra I need to do?

 

Thanks

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

 


 

 

 



Over-logging by ORC packages

2015-04-06 Thread Moore, Douglas
On a cluster recently upgraded to Hive 0.14 (HDP 2.2) we found that gigabytes
and millions more INFO-level hive.log entries from the ORC packages were being
logged.
I feel these log entries should be at the DEBUG level.
Is there an existing bug in Hive or ORC?

Here is one example:
2015-04-06 15:12:43,212 INFO  orc.OrcInputFormat 
(OrcInputFormat.java:setSearchArgument(298)) - ORC pushdown predicate: leaf-0 = 
(EQUALS company XYZ)
leaf-1 = (EQUALS site DEF)
leaf-2 = (EQUALS table ABC)
expr = (and leaf-0 leaf-1 leaf-2)

To get an acceptable amount of logging that did not fill /tmp we had to add 
these entries to /etc/hive/conf/hive-log4j.settings:
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.ReaderImpl=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcInputFormat=WARN,DRFA


While I'm on the subject, to operationally harden Hive, I think Hive should use
a more aggressive rolling file appender by default, one that can roll hourly or
at a max size and compress the rolled logs…

- Douglas