Hive counters for records read and written
Hi, I can see that two new counters have been added for hive (RECORDS_IN/RECORDS_OUT) in hive 0.14. Prior to this release which counters could be used to get the records read by hive job and records written. Because i noticed that in hive 0.14 for a few hive jobs i see map_input_records but the map_output_records counter is 0 but the job actually writes something to output table and the hive log also gives that count correctly. In this case how else can we get records read and records written in releases before 0.14. Regards Hemanth
Re: collecting lists in non-natural order / SORT BY on columns which are not part of the final result set
Hi Robin, thanks a lot for the hint about the WITH statement. This is comparable to using a subselect. E.g., I could rewrite my query from: SELECT actor, COLLECT_LIST(action) AS actions FROM foobar GROUP BY actor SORT BY actor, ts; to: SELECT actor, COLLECT_LIST(action) AS actions FROM ( SELECT actor, ts, action FROM foobar ORDER BY actor, ts ) foobar_sorted GROUP BY actor; This has several disadvantages: a) we have two MR jobs instead of one b) the global ORDER BY is forcing data to flow through a single reducer, which increases latency c) when reading the sorted results from the first job, data for the same reduce group can come from two different input splits. Correct sorting is not guaranteed for these reduce groups. I created a JIRA for a new UDAF COLLECT_LIST_SORTED: https://issues.apache.org/jira/browse/HIVE-11022 Best regards Michael On 2015-06-13, at 16:52, Robin Verlangen ro...@us2.nl wrote: Hi Michael, You can try using a with statement, pseudo: WITH input AS (SELECT colA, colB FROM table ORDER BY colA ASC) SELECT colB FROM input Best regards, Robin Verlangen Chief Data Architect W http://www.robinverlangen.nl E ro...@us2.nl What is CloudPelican? Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. On Sat, Jun 13, 2015 at 3:50 PM, Michael Häusler mich...@akatose.de wrote: Hi there, imagine you have a table of time-series transactions, e.g.: CREATE TABLE foobar (actor BIGINT, ts BIGINT, action STRING); containing the rows: 1 2000bar 1 1000foo 2 1500foo An interesting query would be to get a denormalized view on all actions of an actor sorted by timestamp: 1 [foo, bar] 2 [foo] This would require collecting the actions in a list in non-natural order. A naive query would look like this: SET hive.map.aggr = false; SELECT actor, COLLECT_LIST(action) AS actions FROM foobar GROUP BY actor SORT BY actor, ts; Unfortunately, as the column ts is not in the final result set, this produces a SemanticException [Error 10004]: Line 9:11 Invalid table alias or column reference 'ts'. I do understand that it is difficult to allow a global ORDER BY on column that is not in the result set. But the SORT BY only needs to ensure order within the reduce group. Is there a way to get this behaviour in Hive? Best regards Michael
Re: Alter table statement for create table like-tables
Probably has to do with the fact that it is an Avro table. I don't have any experience using Avro, but maybe take a look at https://issues.apache.org/jira/browse/HIVE-7446 for some of the issues described there, or maybe look at the test that was added for that Jira. On Jun 16, 2015, at 2:42 AM, Julian Keppel juliankeppel1...@gmail.commailto:juliankeppel1...@gmail.com wrote: *Push* Does no one have an idea or hit similar issues? 2015-06-09 15:46 GMT+02:00 Julian Keppel juliankeppel1...@gmail.commailto:juliankeppel1...@gmail.com: I use Hive Version 1.1.0 in Cloudera CDH 5.4.0. I have created an external table: CREATE EXTERNAL TABLE *tableA* ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'location' TBLPROPERTIES ('avro.schema.url'=schema_location'); Now I wanted to create a managed table with exactly the same columns except of one additional column (a generated hash key). The only idea I had was to create the table with: CREATE TABLE *tableB* LIKE *tableA*; And then add the additional column with: ALTER TABLE *tableB* ADD COLUMNS (new_column INT); The statements run without any errors or exceptions (even in the log files under /var/log/hive) but the new column doesn't appear. What am I doing wrong? Or is this not possible? What other ideas do you have for my use case? Thank you in advance for your help!
error on hive insert query
iam using *hive 1.0.0* and *tez 0.5.2.* when i set *hive.execution.engine* value in hive-site.xml to *tez*select query works well... but in case of *insert* getting error. the query is : *insert into table tablename values(intvalue,'string value');* and the error is : *FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.Tez Task*
Re: Merging small files in partitions
https://github.com/edwardcapriolo/filecrush On Tue, Jun 16, 2015 at 5:05 PM, Chagarlamudi, Prasanth prasanth.chagarlam...@epsilon.com wrote: Hello, I am looking for an optimized way to merge small files in hive partitions into one big file. I came across *Alter Table/Partition Concatenate * https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate. Doc says this only works for RCFiles. I wish there is something similar for TEXT FILE format. Any suggestions? Thanks in advance Prasanth -- This e-mail and files transmitted with it are confidential, and are intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you are not one of the named recipient(s) or otherwise have reason to believe that you received this message in error, please immediately notify sender by e-mail, and destroy the original message. Thank You.
Re: error on hive insert query
Unless you can pinpoint the problem to something Tez specific, hive-specific questions might be better off being asked on user@hive initially as there is a larger group there that understands Hive as compared to the Tez community. FWIW, TezTask error 1 means “something in the Hive layer using Tez or somewhere within Tez itself” had a failure. Without additional logs/stack trace, this is usually impossible to debug/pinpoint where the problem is. thanks — Hitesh On Jun 16, 2015, at 10:19 AM, Sateesh Karuturi sateesh.karutu...@gmail.com wrote: iam using hive 1.0.0 and tez 0.5.2. when i set hive.execution.engine value in hive-site.xml to tezselect query works well... but in case of insert getting error. the query is : insert into table tablename values(intvalue,'string value'); and the error is : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.Tez Task
Merging small files in partitions
Hello, I am looking for an optimized way to merge small files in hive partitions into one big file. I came across Alter Table/Partition Concatenate https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate. Doc says this only works for RCFiles. I wish there is something similar for TEXT FILE format. Any suggestions? Thanks in advance Prasanth This e-mail and files transmitted with it are confidential, and are intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you are not one of the named recipient(s) or otherwise have reason to believe that you received this message in error, please immediately notify sender by e-mail, and destroy the original message. Thank You.
Re: error on hive insert query
You can refer to the following link to figure out whether this is tez problem https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App Best Regard, Jeff Zhang From: Sateesh Karuturi sateesh.karutu...@gmail.commailto:sateesh.karutu...@gmail.com Reply-To: u...@tez.apache.orgmailto:u...@tez.apache.org u...@tez.apache.orgmailto:u...@tez.apache.org Date: Wednesday, June 17, 2015 at 1:19 AM To: user@hive.apache.orgmailto:user@hive.apache.org user@hive.apache.orgmailto:user@hive.apache.org, u...@tez.apache.orgmailto:u...@tez.apache.org u...@tez.apache.orgmailto:u...@tez.apache.org Subject: error on hive insert query iam using hive 1.0.0 and tez 0.5.2. when i set hive.execution.engine value in hive-site.xml to tezselect query works well... but in case of insert getting error. the query is : insert into table tablename values(intvalue,'string value'); and the error is : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.Tez Task
hive cbo calciteplanner
Hi all, I'm reading the source code of Hive cbo (CalcaitePlanner), but I find it hard to follow. Listed below are some of the questions: 1. What's the relationship between HepPlanner and HiveVolcanoPlanner? 2. I don't have a clue about these concepts: clusters, traitDef and collectGarbage(). Thanks for any help. best regards, -zhenhua
Re: Re: delta file compact take no effect
Any help? r7raul1...@163.com From: r7raul1...@163.com Date: 2015-06-11 15:53 To: user Subject: Re: Re: delta file compact take no effect SHOW COMPACTIONS; I can see some info Database Table Partition Type State Worker Start Time default u_data_txn NULL MAJOR initiated NULL 0 Time taken: 0.024 seconds, Fetched: 2 row(s) But after that I still see many delta file. r7raul1...@163.com From: Elliot West Date: 2015-06-11 15:25 To: user@hive.apache.org Subject: Re: delta file compact take no effect What do you see if you issue: SHOW COMPACTIONS; On Thursday, 11 June 2015, r7raul1...@163.com r7raul1...@163.com wrote: I use hive 1.1.0 on hadoop 2.5.0 After I do some update operation on table u_data_txn. My table create many delta file like: drwxr-xr-x - hdfs hive 0 2015-02-06 22:52 /user/hive/warehouse/u_data_txn/delta_001_001 -rw-r--r-- 3 hdfs supergroup 346453 2015-02-06 22:52 /user/hive/warehouse/u_data_txn/delta_001_001/bucket_0 -rw-r--r-- 3 hdfs supergroup 415924 2015-02-06 22:52 /user/hive/warehouse/u_data_txn/delta_001_001/bucket_1 drwxr-xr-x - hdfs hive 0 2015-02-06 22:58 /user/hive/warehouse/u_data_txn/delta_002_002 -rw-r--r-- 3 hdfs supergroup 807 2015-02-06 22:58 /user/hive/warehouse/u_data_txn/delta_002_002/bucket_0 -rw-r--r-- 3 hdfs supergroup 779 2015-02-06 22:58 /user/hive/warehouse/u_data_txn/delta_002_002/bucket_1 drwxr-xr-x - hdfs hive 0 2015-02-06 22:59 /user/hive/warehouse/u_data_txn/delta_003_003 -rw-r--r-- 3 hdfs supergroup 817 2015-02-06 22:59 /user/hive/warehouse/u_data_txn/delta_003_003/bucket_0 -rw-r--r-- 3 hdfs supergroup 767 2015-02-06 22:59 /user/hive/warehouse/u_data_txn/delta_003_003/bucket_1 drwxr-xr-x - hdfs hive 0 2015-02-06 23:01 /user/hive/warehouse/u_data_txn/delta_004_004 -rw-r--r-- 3 hdfs supergroup 817 2015-02-06 23:01 /user/hive/warehouse/u_data_txn/delta_004_004/bucket_0 -rw-r--r-- 3 hdfs supergroup 779 2015-02-06 23:01 /user/hive/warehouse/u_data_txn/delta_004_004/bucket_1 drwxr-xr-x - hdfs hive 0 2015-02-06 23:03 /user/hive/warehouse/u_data_txn/delta_005_005 -rw-r--r-- 3 hdfs supergroup 817 2015-02-06 23:03 /user/hive/warehouse/u_data_txn/delta_005_005/bucket_0 -rw-r--r-- 3 hdfs supergroup 779 2015-02-06 23:03 /user/hive/warehouse/u_data_txn/delta_005_005/bucket_1 drwxr-xr-x - hdfs hive 0 2015-02-10 21:34 /user/hive/warehouse/u_data_txn/delta_006_006 -rw-r--r-- 3 hdfs supergroup 821 2015-02-10 21:34 /user/hive/warehouse/u_data_txn/delta_006_006/bucket_0 drwxr-xr-x - hdfs hive 0 2015-02-10 21:35 /user/hive/warehouse/u_data_txn/delta_007_007 -rw-r--r-- 3 hdfs supergroup 821 2015-02-10 21:35 /user/hive/warehouse/u_data_txn/delta_007_007/bucket_0 drwxr-xr-x - hdfs hive 0 2015-03-24 01:16 /user/hive/warehouse/u_data_txn/delta_008_008 -rw-r--r-- 3 hdfs supergroup 1670 2015-03-24 01:16 /user/hive/warehouse/u_data_txn/delta_008_008/bucket_0 -rw-r--r-- 3 hdfs supergroup 1767 2015-03-24 01:16 /user/hive/warehouse/u_data_txn/delta_008_008/bucket_1 I try ALTER TABLE u_data_txn COMPACT 'MAJOR'; The delta still exist. Then I try ALTER TABLE u_data_txn COMPACT 'MINOR'; The delta still exist. How to merge delta file? My config is: property namehive.support.concurrency/name valuetrue/value /property property namehive.enforce.bucketing/name valuetrue/value /property property namehive.exe.dynamic.partition.mode/name valuenonstrict/value /property property namehive.txn.manager/name valueorg.apache.hadoop.hive.ql.lockmgr.DbTxnManager/value /property property namehive.compactor.initiator.on/name valuetrue/value /property property namehive.compactor.worker.threads/name value4/value /property r7raul1...@163.com
Re: delta file compact take no effect
Is the config you give on your metastore or your client? The worker thread and initiator must be started on the metastore. Alan. r7raul1...@163.com mailto:r7raul1...@163.com June 16, 2015 at 22:38 Any help? r7raul1...@163.com
Left function
Hi every one, how to get leftmost length of characters from the string in hive? In Mysql or sq has specific function LEFT(string,length) Could you please help any other way to achieve this scenario? Regards Ravisnkar
prepareStatement.getMetaData problem
Hi, can anyone to fix this bug? I think it's important https://issues.apache.org/jira/browse/HIVE-6992 is there any other way to getMetaData before run the sql? thanks!
Re: Alter table statement for create table like-tables
*Push* Does no one have an idea or hit similar issues? 2015-06-09 15:46 GMT+02:00 Julian Keppel juliankeppel1...@gmail.com: I use Hive Version 1.1.0 in Cloudera CDH 5.4.0. I have created an external table: CREATE EXTERNAL TABLE *tableA* ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'location' TBLPROPERTIES ('avro.schema.url'=schema_location'); Now I wanted to create a managed table with exactly the same columns except of one additional column (a generated hash key). The only idea I had was to create the table with: CREATE TABLE *tableB* LIKE *tableA*; And then add the additional column with: ALTER TABLE *tableB* ADD COLUMNS (new_column INT); The statements run without any errors or exceptions (even in the log files under /var/log/hive) but the new column doesn't appear. What am I doing wrong? Or is this not possible? What other ideas do you have for my use case? Thank you in advance for your help!
Re: Left function
try using substr function On Tue, Jun 16, 2015 at 3:03 PM, Ravisankar Mani rrav...@gmail.com wrote: Hi every one, how to get leftmost length of characters from the string in hive? In Mysql or sq has specific function LEFT(string,length) Could you please help any other way to achieve this scenario? Regards Ravisnkar -- Nitin Pawar