Hive counters for records read and written

2015-06-16 Thread Hemanth Meka

Hi,


I can see that two new counters have been added for hive 
(RECORDS_IN/RECORDS_OUT) in hive 0.14.


Prior to this release which counters could be used to get the records read by 
hive job and records written.  Because i noticed that in hive 0.14 for a few 
hive jobs i see map_input_records but the map_output_records counter is 0 but 
the job actually writes something to output table and the hive log also gives 
that count correctly.


In this case how else can we get records read and records written  in releases 
before 0.14.


Regards

Hemanth


Re: collecting lists in non-natural order / SORT BY on columns which are not part of the final result set

2015-06-16 Thread Michael Häusler
Hi Robin,

thanks a lot for the hint about the WITH statement. This is comparable to using 
a subselect.
E.g., I could rewrite my query from:
SELECT
actor,
COLLECT_LIST(action) AS actions
FROM
foobar
GROUP BY
actor
SORT BY
actor, ts;

to:
SELECT
actor,
COLLECT_LIST(action) AS actions
FROM
(
SELECT
actor,
ts,
action
FROM
foobar
ORDER BY
actor,
ts
) foobar_sorted
GROUP BY
actor;

This has several disadvantages:
a) we have two MR jobs instead of one
b) the global ORDER BY is forcing data to flow through a single reducer, which 
increases latency
c) when reading the sorted results from the first job, data for the same reduce 
group can come from two different input splits. Correct sorting is not 
guaranteed for these reduce groups.

I created a JIRA for a new UDAF COLLECT_LIST_SORTED:
https://issues.apache.org/jira/browse/HIVE-11022

Best regards
Michael



 On 2015-06-13, at 16:52, Robin Verlangen ro...@us2.nl wrote:
 
 Hi Michael,
 
 You can try using a with statement, pseudo:
 
 WITH input AS (SELECT colA, colB FROM table ORDER BY colA ASC)
 SELECT colB FROM input
 
 Best regards, 
 
 Robin Verlangen
 Chief Data Architect
 
 W http://www.robinverlangen.nl
 E ro...@us2.nl
 
 
 What is CloudPelican?
 
 Disclaimer: The information contained in this message and attachments is 
 intended solely for the attention and use of the named addressee and may be 
 confidential. If you are not the intended recipient, you are reminded that 
 the information remains the property of the sender. You must not use, 
 disclose, distribute, copy, print or rely on this e-mail. If you have 
 received this message in error, please contact the sender immediately and 
 irrevocably delete this message and any copies.
 
 On Sat, Jun 13, 2015 at 3:50 PM, Michael Häusler mich...@akatose.de wrote:
 Hi there,
 
 imagine you have a table of time-series transactions, e.g.:
 CREATE TABLE
 foobar (actor BIGINT, ts BIGINT, action STRING);
 
 containing the rows:
 1   2000bar
 1   1000foo
 2   1500foo
 
 An interesting query would be to get a denormalized view on all actions of an 
 actor sorted by timestamp:
 1   [foo, bar]
 2   [foo]
 
 This would require collecting the actions in a list in non-natural order. A 
 naive query would look like this:
 SET hive.map.aggr = false;
 SELECT
 actor,
 COLLECT_LIST(action) AS actions
 FROM
 foobar
 GROUP BY
 actor
 SORT BY
 actor, ts;
 
 Unfortunately, as the column ts is not in the final result set, this 
 produces a SemanticException [Error 10004]: Line 9:11 Invalid table alias or 
 column reference 'ts'.
 
 
 I do understand that it is difficult to allow a global ORDER BY on column 
 that is not in the result set.
 But the SORT BY only needs to ensure order within the reduce group. Is there 
 a way to get this behaviour in Hive?
 
 Best regards
 Michael
 



Re: Alter table statement for create table like-tables

2015-06-16 Thread Jason Dere
Probably has to do with the fact that it is an Avro table.  I don't have any 
experience using Avro, but maybe take a look at 
https://issues.apache.org/jira/browse/HIVE-7446 for some of the issues 
described there, or maybe look at the test that was added for that Jira.


On Jun 16, 2015, at 2:42 AM, Julian Keppel 
juliankeppel1...@gmail.commailto:juliankeppel1...@gmail.com wrote:

*Push* Does no one have an idea or hit similar issues?

2015-06-09 15:46 GMT+02:00 Julian Keppel 
juliankeppel1...@gmail.commailto:juliankeppel1...@gmail.com:
I use Hive Version 1.1.0 in Cloudera CDH 5.4.0.

I have created an external table:

CREATE EXTERNAL TABLE *tableA*
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'location'
TBLPROPERTIES ('avro.schema.url'=schema_location');

Now I wanted to create a managed table with exactly the same columns except of 
one additional column (a generated hash key). The only idea I had was to create 
the table with: CREATE TABLE *tableB* LIKE *tableA*;

And then add the additional column with: ALTER TABLE *tableB* ADD COLUMNS 
(new_column INT);

The statements run without any errors or exceptions (even in the log files 
under /var/log/hive) but the new column doesn't appear.

What am I doing wrong? Or is this not possible? What other ideas do you have 
for my use case?

Thank you in advance for your help!




error on hive insert query

2015-06-16 Thread Sateesh Karuturi
iam using *hive 1.0.0* and *tez 0.5.2.* when i set
*hive.execution.engine* value
in hive-site.xml to *tez*select query works well... but in case of
*insert* getting
error. the query is :

*insert into table tablename values(intvalue,'string value');*

and the error is :

*FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.tez.Tez Task*


Re: Merging small files in partitions

2015-06-16 Thread Edward Capriolo
https://github.com/edwardcapriolo/filecrush

On Tue, Jun 16, 2015 at 5:05 PM, Chagarlamudi, Prasanth 
prasanth.chagarlam...@epsilon.com wrote:

  Hello,

 I am looking for an optimized way to merge small files in hive partitions
 into one big file.

 I came across *Alter Table/Partition Concatenate *
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate.
 Doc says this only works for RCFiles. I wish there is something similar for
 TEXT FILE format.

 Any suggestions?



 Thanks in advance

 Prasanth





 --

 This e-mail and files transmitted with it are confidential, and are
 intended solely for the use of the individual or entity to whom this e-mail
 is addressed. If you are not the intended recipient, or the employee or
 agent responsible to deliver it to the intended recipient, you are hereby
 notified that any dissemination, distribution or copying of this
 communication is strictly prohibited. If you are not one of the named
 recipient(s) or otherwise have reason to believe that you received this
 message in error, please immediately notify sender by e-mail, and destroy
 the original message. Thank You.



Re: error on hive insert query

2015-06-16 Thread Hitesh Shah
Unless you can pinpoint the problem to something Tez specific, hive-specific 
questions might be better off being asked on user@hive initially as there is a 
larger group there that understands Hive as compared to the Tez community. 

FWIW, TezTask error 1 means “something in the Hive layer using Tez or somewhere 
within Tez itself” had a failure. Without additional logs/stack trace, this is 
usually impossible to debug/pinpoint where the problem is.  

thanks
— Hitesh

On Jun 16, 2015, at 10:19 AM, Sateesh Karuturi sateesh.karutu...@gmail.com 
wrote:

 iam using hive 1.0.0 and tez 0.5.2. when i set hive.execution.engine value in 
 hive-site.xml to tezselect query works well... but in case of insert getting 
 error. the query is :
 
 insert into table tablename values(intvalue,'string value');
 
 and the error is :
 
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.tez.Tez Task
 
 



Merging small files in partitions

2015-06-16 Thread Chagarlamudi, Prasanth
Hello,
I am looking for an optimized way to merge small files in hive partitions into 
one big file.
I came across Alter Table/Partition Concatenate 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate.
 Doc says this only works for RCFiles. I wish there is something similar for 
TEXT FILE format.
Any suggestions?

Thanks in advance
Prasanth





This e-mail and files transmitted with it are confidential, and are intended 
solely for the use of the individual or entity to whom this e-mail is 
addressed. If you are not the intended recipient, or the employee or agent 
responsible to deliver it to the intended recipient, you are hereby notified 
that any dissemination, distribution or copying of this communication is 
strictly prohibited. If you are not one of the named recipient(s) or otherwise 
have reason to believe that you received this message in error, please 
immediately notify sender by e-mail, and destroy the original message. Thank 
You.


Re: error on hive insert query

2015-06-16 Thread Jianfeng (Jeff) Zhang

You can refer to the following link to figure out whether this is tez problem

https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App


Best Regard,
Jeff Zhang


From: Sateesh Karuturi 
sateesh.karutu...@gmail.commailto:sateesh.karutu...@gmail.com
Reply-To: u...@tez.apache.orgmailto:u...@tez.apache.org 
u...@tez.apache.orgmailto:u...@tez.apache.org
Date: Wednesday, June 17, 2015 at 1:19 AM
To: user@hive.apache.orgmailto:user@hive.apache.org 
user@hive.apache.orgmailto:user@hive.apache.org, 
u...@tez.apache.orgmailto:u...@tez.apache.org 
u...@tez.apache.orgmailto:u...@tez.apache.org
Subject: error on hive insert query


iam using hive 1.0.0 and tez 0.5.2. when i set hive.execution.engine value in 
hive-site.xml to tezselect query works well... but in case of insert getting 
error. the query is :

insert into table tablename values(intvalue,'string value');

and the error is :

FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.tez.Tez Task



hive cbo calciteplanner

2015-06-16 Thread wangzhenhua (G)
Hi all,

I'm reading the source code of Hive cbo (CalcaitePlanner), but I find it hard 
to follow.
Listed below are some of the questions:
1. What's the relationship between HepPlanner and HiveVolcanoPlanner?
2. I don't have a clue about these concepts: clusters, traitDef and 
collectGarbage().

Thanks for any help.


best regards,
-zhenhua


Re: Re: delta file compact take no effect

2015-06-16 Thread r7raul1...@163.com
Any help?



r7raul1...@163.com
 
From: r7raul1...@163.com
Date: 2015-06-11 15:53
To: user
Subject: Re: Re: delta file compact take no effect
SHOW COMPACTIONS;
I can see some info

Database Table Partition Type State Worker Start Time 
default u_data_txn NULL MAJOR initiated NULL 0 
Time taken: 0.024 seconds, Fetched: 2 row(s)

But after that I still see many delta file.



r7raul1...@163.com
 
From: Elliot West
Date: 2015-06-11 15:25
To: user@hive.apache.org
Subject: Re: delta file compact take no effect
What do you see if you issue:

SHOW COMPACTIONS;

On Thursday, 11 June 2015, r7raul1...@163.com r7raul1...@163.com wrote:

I use hive 1.1.0 on hadoop 2.5.0
After I do some update operation on table u_data_txn.
My table create many delta file like:
drwxr-xr-x - hdfs hive 0 2015-02-06 22:52 
/user/hive/warehouse/u_data_txn/delta_001_001 
-rw-r--r-- 3 hdfs supergroup 346453 2015-02-06 22:52 
/user/hive/warehouse/u_data_txn/delta_001_001/bucket_0 
-rw-r--r-- 3 hdfs supergroup 415924 2015-02-06 22:52 
/user/hive/warehouse/u_data_txn/delta_001_001/bucket_1 
drwxr-xr-x - hdfs hive 0 2015-02-06 22:58 
/user/hive/warehouse/u_data_txn/delta_002_002 
-rw-r--r-- 3 hdfs supergroup 807 2015-02-06 22:58 
/user/hive/warehouse/u_data_txn/delta_002_002/bucket_0 
-rw-r--r-- 3 hdfs supergroup 779 2015-02-06 22:58 
/user/hive/warehouse/u_data_txn/delta_002_002/bucket_1 
drwxr-xr-x - hdfs hive 0 2015-02-06 22:59 
/user/hive/warehouse/u_data_txn/delta_003_003 
-rw-r--r-- 3 hdfs supergroup 817 2015-02-06 22:59 
/user/hive/warehouse/u_data_txn/delta_003_003/bucket_0 
-rw-r--r-- 3 hdfs supergroup 767 2015-02-06 22:59 
/user/hive/warehouse/u_data_txn/delta_003_003/bucket_1 
drwxr-xr-x - hdfs hive 0 2015-02-06 23:01 
/user/hive/warehouse/u_data_txn/delta_004_004 
-rw-r--r-- 3 hdfs supergroup 817 2015-02-06 23:01 
/user/hive/warehouse/u_data_txn/delta_004_004/bucket_0 
-rw-r--r-- 3 hdfs supergroup 779 2015-02-06 23:01 
/user/hive/warehouse/u_data_txn/delta_004_004/bucket_1 
drwxr-xr-x - hdfs hive 0 2015-02-06 23:03 
/user/hive/warehouse/u_data_txn/delta_005_005 
-rw-r--r-- 3 hdfs supergroup 817 2015-02-06 23:03 
/user/hive/warehouse/u_data_txn/delta_005_005/bucket_0 
-rw-r--r-- 3 hdfs supergroup 779 2015-02-06 23:03 
/user/hive/warehouse/u_data_txn/delta_005_005/bucket_1 
drwxr-xr-x - hdfs hive 0 2015-02-10 21:34 
/user/hive/warehouse/u_data_txn/delta_006_006 
-rw-r--r-- 3 hdfs supergroup 821 2015-02-10 21:34 
/user/hive/warehouse/u_data_txn/delta_006_006/bucket_0 
drwxr-xr-x - hdfs hive 0 2015-02-10 21:35 
/user/hive/warehouse/u_data_txn/delta_007_007 
-rw-r--r-- 3 hdfs supergroup 821 2015-02-10 21:35 
/user/hive/warehouse/u_data_txn/delta_007_007/bucket_0 
drwxr-xr-x - hdfs hive 0 2015-03-24 01:16 
/user/hive/warehouse/u_data_txn/delta_008_008 
-rw-r--r-- 3 hdfs supergroup 1670 2015-03-24 01:16 
/user/hive/warehouse/u_data_txn/delta_008_008/bucket_0 
-rw-r--r-- 3 hdfs supergroup 1767 2015-03-24 01:16 
/user/hive/warehouse/u_data_txn/delta_008_008/bucket_1

I try ALTER TABLE u_data_txn COMPACT 'MAJOR'; 
The delta still exist. 
Then I try ALTER TABLE u_data_txn COMPACT 'MINOR'; 
The delta still exist. 
How to  merge delta file?

My config is:
property 
namehive.support.concurrency/name 
valuetrue/value 
/property 
property 
namehive.enforce.bucketing/name 
valuetrue/value 
/property 
property 
namehive.exe.dynamic.partition.mode/name 
valuenonstrict/value 
/property 
property 
namehive.txn.manager/name 
valueorg.apache.hadoop.hive.ql.lockmgr.DbTxnManager/value 
/property 
property 
namehive.compactor.initiator.on/name 
valuetrue/value 
/property 
property 
namehive.compactor.worker.threads/name 
value4/value 
/property


r7raul1...@163.com


Re: delta file compact take no effect

2015-06-16 Thread Alan Gates
Is the config you give on your metastore or your client?  The worker 
thread and initiator must be started on the metastore.


Alan.


r7raul1...@163.com mailto:r7raul1...@163.com
June 16, 2015 at 22:38
Any help?


r7raul1...@163.com


Left function

2015-06-16 Thread Ravisankar Mani
Hi every one,


how to get  leftmost length of characters from the string  in hive?

In Mysql or sq has specific function

LEFT(string,length) 

Could you please help any other way to achieve this scenario?


Regards
Ravisnkar


prepareStatement.getMetaData problem

2015-06-16 Thread zhangjp
Hi,  can anyone to fix this bug? I think it's important  
https://issues.apache.org/jira/browse/HIVE-6992
  is there any other way to getMetaData before run the sql?
thanks!
  

Re: Alter table statement for create table like-tables

2015-06-16 Thread Julian Keppel
*Push* Does no one have an idea or hit similar issues?

2015-06-09 15:46 GMT+02:00 Julian Keppel juliankeppel1...@gmail.com:

 I use Hive Version 1.1.0 in Cloudera CDH 5.4.0.

 I have created an external table:

 CREATE EXTERNAL TABLE *tableA*
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS
 INPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 LOCATION 'location'
 TBLPROPERTIES ('avro.schema.url'=schema_location');

 Now I wanted to create a managed table with exactly the same columns
 except of one additional column (a generated hash key). The only idea I had
 was to create the table with: CREATE TABLE *tableB* LIKE *tableA*;

 And then add the additional column with: ALTER TABLE *tableB* ADD COLUMNS
 (new_column INT);

 The statements run without any errors or exceptions (even in the log files
 under /var/log/hive) but the new column doesn't appear.

 What am I doing wrong? Or is this not possible? What other ideas do you
 have for my use case?

 Thank you in advance for your help!



Re: Left function

2015-06-16 Thread Nitin Pawar
try using substr function

On Tue, Jun 16, 2015 at 3:03 PM, Ravisankar Mani rrav...@gmail.com wrote:

 Hi every one,


 how to get  leftmost length of characters from the string  in hive?

 In Mysql or sq has specific function

 LEFT(string,length) 

 Could you please help any other way to achieve this scenario?


 Regards
 Ravisnkar




-- 
Nitin Pawar