Re: Hive and engine performance tez vs mr
Try setting the below in Hive and see what happens. By the way, what are your configs in Hive, if any?

set mapred.map.tasks = 20;

On Thu, Apr 2, 2015 at 11:01 AM, Erwan MAS <er...@mas.nom.fr> wrote:
> Hello,
> I have an issue with Hive on the Tez engine. When I execute a query with Tez, it runs 9 times slower than with map/reduce. The query is a left outer join on two tables using ORC storage.
> With map/reduce I have:
>   Job 0: Map 27, Reduce 256
>   Job 1: Map 27, Reduce 256
>   Time taken: 110 sec
> With Tez I have:
>   Map 1: 1/1
>   Map 4: 3/3
>   Reducer 2: 256/256
>   Reducer 3: 256/256
>   Time taken: 930 sec
> With my configuration, Tez wants to use only one mapper for some stages. How can I increase the number of mappers? Which Hive variable must I set to change this behavior?
> My context: Hive 0.13 on Hortonworks 2.1
> -- Erwan MAS <er...@mas.nom.fr>
Re: Over-logging by ORC packages
Sorry for the excessive logging. The pushdown logging should only happen at the start; is there a particular message that was being repeated per row? Thanks, Owen

On Mon, Apr 6, 2015 at 9:15 AM, Moore, Douglas <douglas.mo...@thinkbiganalytics.com> wrote:
> On a cluster recently upgraded to Hive 0.14 (HDP 2.2) we found gigabytes (millions of lines) of additional INFO-level hive.log entries being logged from the ORC packages. I feel these log entries should be at the DEBUG level. Is there an existing bug in Hive or ORC? Here is one example:
>   2015-04-06 15:12:43,212 INFO orc.OrcInputFormat (OrcInputFormat.java:setSearchArgument(298)) - ORC pushdown predicate: leaf-0 = (EQUALS company XYZ) leaf-1 = (EQUALS site DEF) leaf-2 = (EQUALS table ABC) expr = (and leaf-0 leaf-1 leaf-2)
> To get an acceptable amount of logging that did not fill /tmp we had to add these entries to /etc/hive/conf/hive-log4j.settings:
>   log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger=WARN,DRFA
>   log4j.logger.org.apache.hadoop.hive.ql.io.orc.ReaderImpl=WARN,DRFA
>   log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcInputFormat=WARN,DRFA
> While I'm on the subject: to operationally harden Hive, I think Hive should use a more aggressive rolling file appender by default, one that can roll hourly or at a max size and compress the rolled logs.
> - Douglas
Re: Over-logging by ORC packages
Owen, we're seeing millions of those log entries. There are three distinct messages, one for each package listed in the revised hive-log4j.settings; one full example is quoted below. They seem to repeat less often than per row (that would be billions), perhaps once for each partition in the table (thousands to tens of thousands). To reproduce, create a year/month/day-partitioned table and run:

select * from table where year=2015 and month=4 and day=6 limit 1;

- Douglas

From: Owen O'Malley <omal...@apache.org>
Date: Mon, 6 Apr 2015 11:13:28 -0700
Subject: Re: Over-logging by ORC packages
> Sorry for the excessive logging. The pushdown logging should only happen at the start; is there a particular message that was being repeated per row? Thanks, Owen

On Mon, Apr 6, 2015 at 9:15 AM, Moore, Douglas <douglas.mo...@thinkbiganalytics.com> wrote:
> [...] Here is one example:
>   2015-04-06 15:12:43,212 INFO orc.OrcInputFormat (OrcInputFormat.java:setSearchArgument(298)) - ORC pushdown predicate: leaf-0 = (EQUALS company XYZ) leaf-1 = (EQUALS site DEF) leaf-2 = (EQUALS table ABC) expr = (and leaf-0 leaf-1 leaf-2)
> To get an acceptable amount of logging that did not fill /tmp we had to add these entries to /etc/hive/conf/hive-log4j.settings:
>   log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger=WARN,DRFA
>   log4j.logger.org.apache.hadoop.hive.ql.io.orc.ReaderImpl=WARN,DRFA
>   log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcInputFormat=WARN,DRFA
> [...]
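A minimal sketch of the reproduction steps Douglas describes might look like the following HiveQL; the table and column names here are invented for illustration:

```sql
-- Hypothetical table; with many year/month/day partitions, each ORC split
-- logs its own "ORC pushdown predicate" line at INFO level.
CREATE TABLE events (id BIGINT, payload STRING)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS ORC;

-- Even a LIMIT 1 query can touch many partitions during split generation:
SELECT * FROM events WHERE year = 2015 AND month = 4 AND day = 6 LIMIT 1;
```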
Re: Hive and engine performance tez vs mr
On Mon, Apr 06, 2015 at 12:15:05PM -0500, max scalf wrote:
> Try setting the below in Hive and see what happens. By the way, what are your configs in Hive, if any?
> set mapred.map.tasks = 20;

That does not change the behavior :(
-- Erwan MAS <er...@mas.nom.fr>
RE: Can WebHCat show non-MapReduce jobs?
Ping... not sure if anybody has more ideas here?

Xiaoyong

From: Xiaoyong Zhu [mailto:xiaoy...@microsoft.com]
Sent: Thursday, March 26, 2015 9:07 AM
To: user@hive.apache.org
Cc: ekoif...@hortonworks.com
Subject: RE: Can WebHCat show non-MapReduce jobs?
> Thanks for the reply, Eugene. However, when I list the jobs via WebHCat (via /templeton/v1/jobs) I get:
>   [{id:job_1427201295241_0001,detail:null},{id:job_1427201295241_0003,detail:null},{id:job_1427201295241_0005,detail:null}]
> As you can see, there are three jobs there: 0001, 0003 and 0005, and I submitted three Hive on Tez jobs. However, in the YARN UI I can see 6 jobs, including 0002, 0004 and 0006, which do not exist in WebHCat. [screenshot of the YARN UI omitted] Maybe something is wrong with my configuration?
> Xiaoyong

From: Eugene Koifman [mailto:ekoif...@hortonworks.com]
Sent: Thursday, March 26, 2015 12:52 AM
To: user@hive.apache.org
Subject: Re: Can WebHCat show non-MapReduce jobs?
> https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Jobs should produce all jobs (assuming the calling user has permission to see them). templeton.Server.showJobList() has detailed JavaDoc.

From: Xiaoyong Zhu <xiaoy...@microsoft.com>
Date: Wednesday, March 25, 2015 at 5:35 AM
Subject: Can WebHCat show non-MapReduce jobs?
> It seems that WebHCat can only show the MapReduce jobs. For example, if I submit a Hive on Tez job via WebHCat, I can only get the TempletonControllerJob ID (which is a MAPREDUCE job), but I cannot get the Tez job ID (which is launched by the TempletonControllerJob). Is this by design? Is there a way to return all types of jobs via WebHCat?
> Xiaoyong
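As a side note, the job list WebHCat returns is plain JSON, so comparing it against the YARN UI client-side is straightforward. A sketch (the response body below is the one reported in this message, rewritten as valid JSON; `job_ids` is just an illustrative helper, not a Hive or WebHCat API):

```python
import json

def job_ids(webhcat_response):
    """Extract job IDs from a WebHCat /templeton/v1/jobs response body."""
    return [entry["id"] for entry in json.loads(webhcat_response)]

# Body as reported above: only the odd-numbered (controller) jobs appear,
# while YARN also shows the even-numbered Tez DAG applications.
body = ('[{"id":"job_1427201295241_0001","detail":null},'
        '{"id":"job_1427201295241_0003","detail":null},'
        '{"id":"job_1427201295241_0005","detail":null}]')
print(job_ids(body))
# -> ['job_1427201295241_0001', 'job_1427201295241_0003', 'job_1427201295241_0005']
```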
Re: How efficient is memory allocation in tez.
> I have a map join in which the smaller tables together are 200 MB, and I am trying to have one block of the main table be processed by one Tez task. ... What am I missing, and is this even the right way of approaching the problem?

You need to be more specific about the Hive version. Hive 0.13 needs ~6x the amount of map-join memory for Tez compared to Hive 0.14. The Hive 1.0 branch is a bit better at estimating map-join sizes as well, since it counts the memory overheads of JavaDataModel. Hive 1.1 got a little worse, which will be fixed when we get to Hive 1.2. But for the 1.x line, the approximate size of data that fits within a map-join is (container Xmx - io.sort.mb)/3.

This plays into the NewRatio settings in JDK7 as well: make sure you have set the new ratio to only 1/8th of the memory instead of using the 1/3rd default (which would mean 30% of your memory cannot be used by the sort buffer or the map-join, since they are tenured data).

Also, running "ANALYZE TABLE tbl COMPUTE STATISTICS;" on the small tables will fill in the uncompressed size fields, so that we don't estimate map-joins based on zlib sizes (which coincidentally are ~3x off).

And if you still keep getting heap errors, I can take a look if you have a .hprof.bz2 file to share, and fix any corner cases we might've missed.

Cheers, Gopal

PS: The current trunk implements a Grace HashJoin, which is another approach to the memory-limit problem: a more traditional solution than fixing memory sizes.
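As a concrete illustration of Gopal's rule of thumb, this sketch works the (Xmx - io.sort.mb)/3 budget out in code; the container and sort-buffer sizes are made-up example numbers, not recommendations:

```python
def mapjoin_budget_mb(container_xmx_mb, io_sort_mb):
    """Approximate size of data that fits a map-join on the Hive 1.x line,
    per the (container Xmx - io.sort.mb) / 3 rule of thumb above."""
    return (container_xmx_mb - io_sort_mb) / 3.0

# Hypothetical example: a 4 GB container with a 512 MB sort buffer leaves
# roughly 1.2 GB of small-table data that can fit in the map-join.
print(mapjoin_budget_mb(4096, 512))  # -> about 1194.67
```

So a 200 MB set of small tables should fit comfortably in a container of this size, which is why the version-specific estimation behavior matters more than the raw numbers here.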
Re: Hive and engine performance tez vs mr
Erwan, faced with a similar situation last week, I found that decreasing mapred.max.split.size increased my parallelism by 6x. Yes, mapred, even though it was a Tez job. I reduced it to 10 MB from 256 MB, which I believe is the default. The other variables to try are tez.grouping.min-size (make it smaller) and tez.grouping.max-size (smaller as well). Good luck.

On 4/6/15, 2:57 PM, Erwan MAS <er...@mas.nom.fr> wrote:
> On Mon, Apr 06, 2015 at 12:15:05PM -0500, max scalf wrote:
>> Try setting the below in Hive: set mapred.map.tasks = 20;
> Does not change the behavior :(
> -- Erwan MAS <er...@mas.nom.fr>
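Put together as a session-level sketch (the 10 MB split size is the value mentioned in this reply; the tez.grouping byte values are illustrative, not recommendations):

```sql
-- Shrink the split size so Tez creates more mappers
-- (10 MB, down from the 256 MB default mentioned above):
SET mapred.max.split.size=10485760;

-- Tez split-grouping bounds; smaller values mean more parallel tasks:
SET tez.grouping.min-size=10485760;
SET tez.grouping.max-size=67108864;
```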
Re: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Can you check that the schema in your metastore db has the transaction-related tables? You can find the list of tables in hive-txn-schema-0.14.0.mysql.sql, for example.

From: Mich Talebzadeh <m...@peridale.co.uk>
Date: Monday, April 6, 2015 at 8:05 AM
Subject: RE: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
> Thanks Sanjiv. Unfortunately, after resetting hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager and adding
>   <property>
>     <name>hive.in.test</name>
>     <value>true</value>
>   </property>
> I am still getting the same error message:
>   hive> show databases;
>   FAILED: LockException [Error 10280]: Error communicating with the metastore
> Mich Talebzadeh
> http://talebzadehmich.wordpress.com
> Publications due shortly: Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache
> NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only; if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free; therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
Sent: 06 April 2015 15:21
Subject: Re: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
> Not sure, it should work. Try adding the below configuration and then check:
>   <property>
>     <name>hive.in.test</name>
>     <value>true</value>
>   </property>
> Regards, Sanjiv Singh

On Mon, Apr 6, 2015 at 7:21 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
> Hi, I turned on concurrency in Hive for DML with settings in hive-site.xml as follows:
>   hive.support.concurrency=true
>   hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
>   hive.compactor.initiator.on=true
>   hive.compactor.worker.threads=2
>   hive.enforce.bucketing=true
>   hive.exec.dynamic.partition.mode=nonstrict
> I recycled the connection to the metastore and started the Hive server, then tried to query Hive:
>   hive> use asehadoop;
>   FAILED: LockException [Error 10280]: Error communicating with the metastore
> Going back to the default (hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager) and recycling again made everything work. Sounds like concurrency does not work, or is there something extra I need to do?
> Thanks, Mich
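A quick way to do the suggested check, assuming a MySQL metastore database named `hive` (the database name is a guess; adjust to your installation). The table names come from the hive-txn-schema-0.14.0.mysql.sql script mentioned above:

```sql
USE hive;
-- DbTxnManager needs the transaction tables created by
-- hive-txn-schema-0.14.0.mysql.sql; if these come back empty,
-- the schema was never applied and LockException 10280 is expected.
SHOW TABLES LIKE 'TXNS';
SHOW TABLES LIKE 'TXN_COMPONENTS';
SHOW TABLES LIKE 'COMPLETED_TXN_COMPONENTS';
SHOW TABLES LIKE 'NEXT_TXN_ID';
SHOW TABLES LIKE 'HIVE_LOCKS';
```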
hive 0.14 / hive 1.1.0 lost some column info
I found difference form log: In hive 0.14 DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, city_id, landing_page_type_id, landing_track_time, landing_url, nav_refer_tracker_id, nav_refer_page_type_id, nav_refer_page_value, nav_refer_link_position, nav_tracker_id, nav_page_categ_id, nav_page_type_id, nav_page_value, nav_srce_type, internal_keyword, internal_result_sum, pltfm_id, app_vers, nav_link_position, nav_button_position, nav_track_time, nav_next_tracker_id, sessn_last_time, sessn_pv, detl_tracker_id, detl_page_type_id, detl_page_value, detl_pm_id, detl_link_position, detl_position_track_id, cart_tracker_id, cart_page_type_id, cart_page_value, cart_link_postion, cart_button_position, cart_position_track_id, cart_prod_id, ordr_tracker_id, ordr_page_type_id, ordr_code, updt_time, cart_pm_id, brand_code, categ_type, os, end_user_id, add_cart_flag, navgation_page_flag, nav_page_url, detl_button_position, manul_flag, manul_track_date, nav_refer_tpa, nav_refer_tpa_id, nav_refer_tpc, nav_refer_tpi, nav_refer_tcs, nav_refer_tcsa, nav_refer_tcdt, nav_refer_tcd, nav_refer_tci, nav_refer_postn_type, nav_tpa_id, nav_tpa, nav_tpc, nav_tpi, nav_tcs, nav_tcsa, nav_tcdt, nav_tcd, nav_tci, nav_postn_type, detl_tpa_id, detl_tpa, detl_tpc, detl_tpi, detl_tcs, detl_tcsa, detl_tcdt, detl_tcd, detl_tci, detl_postn_type, cart_tpa_id, cart_tpa, cart_tpc, cart_tpi, cart_tcs, cart_tcsa, cart_tcdt, cart_tcd, cart_tci, cart_postn_type] columnTypes=[string, bigint, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, int, string, string, string, string, string, string, int, string, string, string, bigint, string, string, string, string, string, string, string, string, bigint, string, string, string, string, bigint, string, int, string, string, string, int, string, string, int, string, string, 
string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string] separator=[[B@e50bca4] nullstring=\N lastColumnTakesRest=false In hive 0.10 DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, city_id, landing_page_type_id, landing_track_time, landing_url, nav_refer_tracker_id, nav_refer_page_type_id, nav_refer_page_value, nav_refer_link_position, nav_tracker_id, nav_page_categ_id, nav_page_type_id, nav_page_value, nav_srce_type, internal_keyword, internal_result_sum, pltfm_id, app_vers, nav_link_position, nav_button_position, nav_track_time, nav_next_tracker_id, sessn_last_time, sessn_pv, detl_tracker_id, detl_page_type_id, detl_page_value, detl_pm_id, detl_link_position, detl_position_track_id, cart_tracker_id, cart_page_type_id, cart_page_value, cart_link_postion, cart_button_position, cart_position_track_id, cart_prod_id, ordr_tracker_id, ordr_page_type_id, ordr_code, updt_time, cart_pm_id, brand_code, categ_type, os, end_user_id, add_cart_flag, navgation_page_flag, nav_page_url, detl_button_position, manul_flag, manul_track_date, nav_refer_tpa, nav_refer_tpa_id, nav_refer_tpc, nav_refer_tpi, nav_refer_tcs, nav_refer_tcsa, nav_refer_tcdt, nav_refer_tcd, nav_refer_tci, nav_refer_postn_type, nav_tpa_id, nav_tpa, nav_tpc, nav_tpi, nav_tcs, nav_tcsa, nav_tcdt, nav_tcd, nav_tci, nav_postn_type, detl_tpa_id, detl_tpa, detl_tpc, detl_tpi, detl_tcs, detl_tcsa, detl_tcdt, detl_tcd, detl_tci, detl_postn_type, cart_tpa_id, cart_tpa, cart_tpc, cart_tpi, cart_tcs, cart_tcsa, cart_tcdt, cart_tcd, cart_tci, cart_postn_type, sessn_chanl_id, gu_sec_flg, detl_refer_page_type_id, detl_refer_page_value, detl_event_id, 
nav_refer_intrn_reslt_sum, nav_intrn_reslt_sum, nav_refer_intrn_kw, nav_intrn_kw, detl_track_time, cart_track_time] columnTypes=[string, bigint, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, int, string, string, string, string, string, string, int, string, string, string, bigint, string, string, string, string, string, string, string, string, bigint, string, string, string, string, bigint, string, int, string, string, string, int, string, string, int, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, string, bigint, bigint, string, string, string, string, string, string, string,
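A small script like the following (a hypothetical helper, not part of Hive) makes it easy to see which columns the Hive 0.14 serde init line lost relative to 0.10; the lists here are abbreviated stand-ins for the full columnNames arrays above:

```python
def missing_columns(cols_old, cols_new):
    """Columns present in the old LazySimpleSerDe init line but absent
    from the new one, preserving the original order."""
    new = set(cols_new)
    return [c for c in cols_old if c not in new]

# Abbreviated stand-ins for the two columnNames lists logged above:
hive_0_10 = ["date_id", "chanl_id", "cart_postn_type",
             "sessn_chanl_id", "gu_sec_flg"]
hive_0_14 = ["date_id", "chanl_id", "cart_postn_type"]
print(missing_columns(hive_0_10, hive_0_14))
# -> ['sessn_chanl_id', 'gu_sec_flg']
```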
Re: documentation link wrong?
The TaskController class is gone from Hadoop 2.6 (the current stable release, where the link points) as well as 2.5.2, but I found it in 1.2.1: http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/TaskController.html . For now, I can change the WebHCat doc to that link. But a WebHCat expert should determine whether there's something equivalent in later versions of Hadoop. -- Lefty On Tue, Apr 7, 2015 at 12:48 AM, Xiaoyong Zhu xiaoy...@microsoft.com wrote: It seems that the link (Class TaskController http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TaskController.html) is wrong in this page: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive it returns a 404 for me. Not sure what is the correct link… Xiaoyong
documentation link wrong?
It seems that the link (Class TaskController, http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TaskController.html) is wrong in this page: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive; it returns a 404 for me. Not sure what is the correct link... Xiaoyong
Re: documentation link wrong?
I've fixed the link to the TaskController class in four places:

- WebHCat Reference -- Hive Job https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive#WebHCatReferenceHive-Results
- WebHCat Reference -- MapReduce Job https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceJar#WebHCatReferenceMapReduceJar-Results
- WebHCat Reference -- MapReduce Streaming Job https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceStream#WebHCatReferenceMapReduceStream-Results
- WebHCat Reference -- Pig Job https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig#WebHCatReferencePig-Results

Thanks for the tip, Xiaoyong.
-- Lefty

On Tue, Apr 7, 2015 at 1:22 AM, Lefty Leverenz <leftylever...@gmail.com> wrote:
> The TaskController class is gone from Hadoop 2.6 (the current stable release, where the link points) as well as 2.5.2, but I found it in 1.2.1 [...]

On Tue, Apr 7, 2015 at 12:48 AM, Xiaoyong Zhu <xiaoy...@microsoft.com> wrote:
> It seems that the link (Class TaskController) is wrong in this page: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive [...]
Unable to make Sort Merge Bucket Join work
Hi, I have two large tables on which I need to perform an equijoin. I have bucketed and sorted the two tables on the join key. I then made the following settings when running the join SQL:

SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
SET hive.auto.convert.sortmerge.join=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.sortmerge.join.noconditionaltask=true;

However, I get this error:

FAILED: SemanticException [Error 10135]: Sort merge bucketed join could not be performed. If you really want to perform the operation, either set hive.optimize.bucketmapjoin.sortedmerge=false, or set hive.enforce.sortmergebucketmapjoin=false.

What am I doing wrong? The version of Hive is 0.13.0.2.1.2.0-402. Thanks
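For reference, a minimal sketch of the setup a sort-merge bucket join expects; the table names, columns, and bucket count here are invented. Both sides must be bucketed and sorted on the join key into the same number of buckets, and the data must have been loaded with enforcement enabled (otherwise Hive cannot trust the bucketing metadata, which is a common cause of error 10135):

```sql
-- Enforcement must be on when the tables are POPULATED, not just at join time:
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

CREATE TABLE big_a (k BIGINT, v STRING)
CLUSTERED BY (k) SORTED BY (k) INTO 32 BUCKETS
STORED AS ORC;

CREATE TABLE big_b (k BIGINT, w STRING)
CLUSTERED BY (k) SORTED BY (k) INTO 32 BUCKETS
STORED AS ORC;

-- With the SET statements from the message above in effect:
SELECT a.k, a.v, b.w
FROM big_a a JOIN big_b b ON a.k = b.k;
```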
admin user in hive
Hi, I want to enable authentication and authorization on my Hive server, but I want only one admin user, who can then grant other users the admin or public roles. By default, any user who types 'set role admin' can become admin and grant any user any permission. How can I avoid this behavior? The Hive version is 0.13.
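With SQL standard based authorization (available in Hive 0.13), the admin role can be restricted to named users via hive.users.in.admin.role; users not on that list cannot run "set role admin". A hive-site.xml sketch, where "hiveadmin" is a placeholder user name:

```xml
<!-- Restrict the admin role to a single named user ("hiveadmin" is a
     placeholder). Users not listed here cannot run "set role admin". -->
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.users.in.admin.role</name>
  <value>hiveadmin</value>
</property>
```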
Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Hi, I turned on concurrency in Hive for DML with the following settings in hive-site.xml:

hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=2
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict

I recycled the connection to the metastore and started the Hive server, then tried to query Hive:

hive> use asehadoop;
FAILED: LockException [Error 10280]: Error communicating with the metastore

I went back and set hive.txn.manager to the default (hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager), recycled again, and everything worked. Sounds like concurrency does not work, or is there something extra I need to do?

Thanks
Mich Talebzadeh
http://talebzadehmich.wordpress.com
HiveServer2 addressing standby namenode
Hi, We get a lot of error messages on the standby namenode indicating that Hive is trying to address the standby namenode. As all of our jobs function normally, my guess is that Hive constantly tries to address both namenodes and only works with the active one. Is this correct? Can this be modified so it will only address the active one and still maintain the HA architecture? Thanks, Daniel
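For context, a typical client-side HA configuration looks like the sketch below (the nameservice and NameNode IDs are placeholders). With ConfiguredFailoverProxyProvider, clients discover the active NameNode by trying the configured ones in turn, so "standby" rejection messages in the standby's log are generally expected noise rather than something Hive itself can be configured out of:

```xml
<!-- hdfs-site.xml sketch; "mycluster", "nn1", "nn2" are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```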
Re: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Not sure, it should work. Try adding the below configuration and then check:

<property>
  <name>hive.in.test</name>
  <value>true</value>
</property>

Regards,
Sanjiv Singh
Mob : +091 9990-447-339

On Mon, Apr 6, 2015 at 7:21 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
> Hi, I turned on concurrency in Hive for DML with settings in hive-site.xml (hive.support.concurrency=true, hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager, compactor settings, etc.). After recycling the metastore connection and the Hive server, queries fail with:
>   hive> use asehadoop;
>   FAILED: LockException [Error 10280]: Error communicating with the metastore
> Setting hive.txn.manager back to the default DummyTxnManager and recycling again makes everything work. Sounds like concurrency does not work, or is there something extra I need to do?
> Thanks, Mich [...]
RE: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Thanks Sanjiv. Unfortunately, after resetting hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager and adding

<property>
  <name>hive.in.test</name>
  <value>true</value>
</property>

I am still getting the same error message:

hive> show databases;
FAILED: LockException [Error 10280]: Error communicating with the metastore

Mich Talebzadeh
http://talebzadehmich.wordpress.com

From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
Sent: 06 April 2015 15:21
To: user@hive.apache.org
Subject: Re: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
> Not sure, it should work. Try adding the below configuration and then check:
>   <property><name>hive.in.test</name><value>true</value></property>
> Regards, Sanjiv Singh

On Mon, Apr 6, 2015 at 7:21 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
> Hi, I turned on concurrency in Hive for DML with settings in hive-site.xml, and queries now fail with LockException [Error 10280]: Error communicating with the metastore. Setting hive.txn.manager back to the default DummyTxnManager makes everything work again. [...]
Over-logging by ORC packages
On a cluster recently upgraded to Hive 0.14 (HDP 2.2) we found gigabytes (millions of lines) of additional INFO-level hive.log entries being logged from the ORC packages. I feel these log entries should be at the DEBUG level. Is there an existing bug in Hive or ORC? Here is one example:

2015-04-06 15:12:43,212 INFO orc.OrcInputFormat (OrcInputFormat.java:setSearchArgument(298)) - ORC pushdown predicate: leaf-0 = (EQUALS company XYZ) leaf-1 = (EQUALS site DEF) leaf-2 = (EQUALS table ABC) expr = (and leaf-0 leaf-1 leaf-2)

To get an acceptable amount of logging that did not fill /tmp we had to add these entries to /etc/hive/conf/hive-log4j.settings:

log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.ReaderImpl=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcInputFormat=WARN,DRFA

While I'm on the subject: to operationally harden Hive, I think Hive should use a more aggressive rolling file appender by default, one that can roll hourly or at a max size and compress the rolled logs.

- Douglas
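On the hardening point, a size-capped appender can already be sketched with stock log4j 1.x; the appender name and limits below are illustrative. Note that plain log4j 1.x cannot compress rolled files on its own, so that part of the wish would need an extended appender:

```properties
# Sketch for hive-log4j.properties: replace the daily appender with a
# size-capped rolling one (values are illustrative, not recommendations).
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hive.log.dir}/${hive.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
```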