hive lineage info

2015-05-14 Thread r7raul1...@163.com
I want to print lineage info for SQL queries. I found this JIRA:
https://issues.apache.org/jira/browse/HIVE-1131 . How do I use it?



r7raul1...@163.com


Re: user matching query does not exist

2015-05-14 Thread amit kumar
Yes, it is happening for Hue only. Can you please suggest how I can clean up
Hue sessions from the server?

The query succeeds in the Hive command line.

On Fri, May 15, 2015 at 11:52 AM, Nitin Pawar 
wrote:

> Is this happening for Hue?
>
> If yes, may be you can try cleaning up hue sessions from server. (this may
> clean all users active sessions from hue so be careful while doing it)
>
>
>
> On Fri, May 15, 2015 at 11:31 AM, amit kumar  wrote:
>
>> i am using CDH 5.2.1,
>>
>> Any pointers will be of immense help.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Fri, May 15, 2015 at 9:43 AM, amit kumar  wrote:
>>
>>> Hi,
>>>
>>> After re-create my account in Hue, i receives “User matching query does
>>> not exist” when attempting to perform hive query.
>>>
>>> The query is succeed in hive command line.
>>>
>>> Please suggest on this,
>>>
>>> 
>>> Thanks you
>>> Amit
>>>
>>
>>
>
>
> --
> Nitin Pawar
>


Re: user matching query does not exist

2015-05-14 Thread Nitin Pawar
Is this happening for Hue?

If yes, maybe you can try cleaning up Hue sessions from the server. (This may
clear all users' active sessions from Hue, so be careful while doing it.)



On Fri, May 15, 2015 at 11:31 AM, amit kumar  wrote:

> i am using CDH 5.2.1,
>
> Any pointers will be of immense help.
>
>
>
> Thanks
>
>
>
> On Fri, May 15, 2015 at 9:43 AM, amit kumar  wrote:
>
>> Hi,
>>
>> After re-create my account in Hue, i receives “User matching query does
>> not exist” when attempting to perform hive query.
>>
>> The query is succeed in hive command line.
>>
>> Please suggest on this,
>>
>> 
>> Thanks you
>> Amit
>>
>
>


-- 
Nitin Pawar


Re: user matching query does not exist

2015-05-14 Thread amit kumar
I am using CDH 5.2.1.

Any pointers will be of immense help.



Thanks



On Fri, May 15, 2015 at 9:43 AM, amit kumar  wrote:

> Hi,
>
> After re-create my account in Hue, i receives “User matching query does
> not exist” when attempting to perform hive query.
>
> The query is succeed in hive command line.
>
> Please suggest on this,
>
> 
> Thanks you
> Amit
>


user matching query does not exist

2015-05-14 Thread amit kumar
Hi,

After re-creating my account in Hue, I receive “User matching query does not
exist” when attempting to run a Hive query.

The query succeeds in the Hive command line.

Please advise on this.


Thank you
Amit


Re: Partition Columns

2015-05-14 Thread Appan Thirumaligai
Mungeol,

I did check the # of mappers, and that did not change between the two
queries, but when I ran a count(*) query the total execution time dropped
significantly for Query 1 vs. Query 2. Also, the amount of data the query reads
does change when the where clause changes. I still can't explain why one is
faster than the other.

Thanks,
Appan

On Thu, May 14, 2015 at 4:46 PM, Mungeol Heo  wrote:

> Hi, Appan.
>
> you can just simply check the amount of data your query reads from the
> table. or the number of the mapper for running that query.
> then, you can know whether it filtering or scanning all table.
> Of course, it is a lazy approach. but, you can give a try.
> I think query 1 should work fine. because I am using a lot of that
> kind of queries and it works fine for me.
>
> Thanks,
> mungeol
>
> On Fri, May 15, 2015 at 8:31 AM, Appan Thirumaligai
>  wrote:
> > I agree with you Viral. I see the same behavior as well. We are on Hive
> 0.13
> > for the cluster where I'm testing this.
> >
> > On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria 
> > wrote:
> >>
> >> Hi Appan,
> >>
> >> In my experience I have seen that Query 2 does not use partition pruning
> >> because it's not a straight up filtering and involves using functions
> (aka
> >> UDFs).
> >>
> >> What version of Hive are you using ?
> >>
> >> Thanks,
> >> Viral
> >>
> >>
> >>
> >> On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a question on Hive Optimizer. I have a table with partition
> >>> columns  eg.,Sales partitioned by year, month, day. Assume that I have
> two
> >>> years worth of data on this table. I'm running two queries on this
> table.
> >>>
> >>> Query 1: Select * from Sales where year=2015 and month = 5 and day
> >>> between 1 and 7
> >>>
> >>> Query 2: Select * from Sales where concat_ws('-',cast(year as
> >>> string),lpad(cast(month as string),2,'0'),lpad(cast(day as
> string),2,'0'))
> >>> between '2015-01-01' and '2015-01-07'
> >>>
> >>> When I ran Explain command on the above two queries I get a Filter
> >>> operation for the 2nd Query and there is no Filter Operation for the
> first
> >>> query.
> >>>
> >>> My question is: Do both queries use the partitions or is it used only
> in
> >>> Query 1 and for Query 2 it will be a scan of all the data?
> >>>
> >>> Thanks for your help.
> >>>
> >>> Thanks,
> >>> Appan
> >>
> >>
> >
>


Re: Partition Columns

2015-05-14 Thread Mungeol Heo
Hi, Appan.

You can simply check the amount of data your query reads from the
table, or the number of mappers used to run that query.
Then you can tell whether it is filtering partitions or scanning the whole table.
Of course, it is a lazy approach, but you can give it a try.
I think Query 1 should work fine, because I use a lot of queries of
that kind and they work fine for me.

Thanks,
mungeol
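
A quicker, more direct check than mapper counts (a sketch; the Sales table and
predicates are taken from the question quoted below, so adjust the names and
values as needed): EXPLAIN DEPENDENCY prints the input partitions a query would
actually read.

EXPLAIN DEPENDENCY
SELECT * FROM Sales
WHERE year = 2015 AND month = 5 AND day BETWEEN 1 AND 7;

-- If the next statement's input partitions list every partition of Sales while
-- the previous one lists only a few, pruning works for the plain predicate but
-- not for the UDF-wrapped one.
EXPLAIN DEPENDENCY
SELECT * FROM Sales
WHERE concat_ws('-', cast(year AS string),
                lpad(cast(month AS string), 2, '0'),
                lpad(cast(day AS string), 2, '0'))
      BETWEEN '2015-01-01' AND '2015-01-07';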

On Fri, May 15, 2015 at 8:31 AM, Appan Thirumaligai
 wrote:
> I agree with you Viral. I see the same behavior as well. We are on Hive 0.13
> for the cluster where I'm testing this.
>
> On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria 
> wrote:
>>
>> Hi Appan,
>>
>> In my experience I have seen that Query 2 does not use partition pruning
>> because it's not a straight up filtering and involves using functions (aka
>> UDFs).
>>
>> What version of Hive are you using ?
>>
>> Thanks,
>> Viral
>>
>>
>>
>> On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai
>>  wrote:
>>>
>>> Hi,
>>>
>>> I have a question on Hive Optimizer. I have a table with partition
>>> columns  eg.,Sales partitioned by year, month, day. Assume that I have two
>>> years worth of data on this table. I'm running two queries on this table.
>>>
>>> Query 1: Select * from Sales where year=2015 and month = 5 and day
>>> between 1 and 7
>>>
>>> Query 2: Select * from Sales where concat_ws('-',cast(year as
>>> string),lpad(cast(month as string),2,'0'),lpad(cast(day as string),2,'0'))
>>> between '2015-01-01' and '2015-01-07'
>>>
>>> When I ran Explain command on the above two queries I get a Filter
>>> operation for the 2nd Query and there is no Filter Operation for the first
>>> query.
>>>
>>> My question is: Do both queries use the partitions or is it used only in
>>> Query 1 and for Query 2 it will be a scan of all the data?
>>>
>>> Thanks for your help.
>>>
>>> Thanks,
>>> Appan
>>
>>
>


Re: Partition Columns

2015-05-14 Thread Appan Thirumaligai
I agree with you Viral. I see the same behavior as well. We are on Hive
0.13 for the cluster where I'm testing this.

On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria 
wrote:

> Hi Appan,
>
> In my experience I have seen that Query 2 does not use partition pruning
> because it's not a straight up filtering and involves using functions (aka
> UDFs).
>
> What version of Hive are you using ?
>
> Thanks,
> Viral
>
>
>
> On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai  > wrote:
>
>> Hi,
>>
>> I have a question on Hive Optimizer. I have a table with partition
>> columns  eg.,Sales partitioned by year, month, day. Assume that I have two
>> years worth of data on this table. I'm running two queries on this table.
>>
>> Query 1: Select * from Sales where year=2015 and month = 5 and day
>> between 1 and 7
>>
>> Query 2: Select * from Sales where concat_ws('-',cast(year as
>> string),lpad(cast(month as string),2,'0'),lpad(cast(day as string),2,'0'))
>> between '2015-01-01' and '2015-01-07'
>>
>> When I ran Explain command on the above two queries I get a Filter
>> operation for the 2nd Query and there is no Filter Operation for the first
>> query.
>>
>> My question is: Do both queries use the partitions or is it used only in
>> Query 1 and for Query 2 it will be a scan of all the data?
>>
>> Thanks for your help.
>>
>> Thanks,
>> Appan
>>
>
>


Re: Partition Columns

2015-05-14 Thread Viral Bajaria
Hi Appan,

In my experience, Query 2 does not use partition pruning because it is not a
straightforward filter on the partition columns and involves functions (i.e.,
UDFs).

What version of Hive are you using ?

Thanks,
Viral



On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai 
wrote:

> Hi,
>
> I have a question on Hive Optimizer. I have a table with partition columns
>  eg.,Sales partitioned by year, month, day. Assume that I have two years
> worth of data on this table. I'm running two queries on this table.
>
> Query 1: Select * from Sales where year=2015 and month = 5 and day between
> 1 and 7
>
> Query 2: Select * from Sales where concat_ws('-',cast(year as
> string),lpad(cast(month as string),2,'0'),lpad(cast(day as string),2,'0'))
> between '2015-01-01' and '2015-01-07'
>
> When I ran Explain command on the above two queries I get a Filter
> operation for the 2nd Query and there is no Filter Operation for the first
> query.
>
> My question is: Do both queries use the partitions or is it used only in
> Query 1 and for Query 2 it will be a scan of all the data?
>
> Thanks for your help.
>
> Thanks,
> Appan
>


Re: HCatInputFormat combine splits

2015-05-14 Thread Pradeep Gollakota
Still no effect. Set minsize to 32M and maxsize to 64M

On Thu, May 14, 2015 at 11:07 AM, Ankit Bhatnagar 
wrote:

> try these
> mapred.max.split.size=
> mapred.min.split.size=
>
> mapreduce.input.fileinputformat.split.maxsize=
> mapreduce.input.fileinputformat.split.minsize=
>
>
>
>
>
>   On Thursday, May 14, 2015 11:04 AM, Pradeep Gollakota <
> pradeep...@gmail.com> wrote:
>
>
> The following property has been to no effect.
>
> mapreduce.input.fileinputformat.split.maxsize = 67108864
>
> I'm still getting 1 Mapper per file.
>
> On Thu, May 14, 2015 at 10:27 AM, Ankit Bhatnagar 
> wrote:
>
> you can explicitly set the split size
>
>
>
>   On Wednesday, May 13, 2015 11:37 PM, Pradeep Gollakota <
> pradeep...@gmail.com> wrote:
>
>
> Hi All,
>
> I'm writing an MR job to read data using HCatInputFormat... however, the
> job is generating too many splits. I don't have this problem when running
> queries in Hive since it combines splits by default.
>
> Is there an equivalent in MR so that I'm not generating thousands of
> mappers?
>
> Thanks,
> Pradeep
>
>
>
>
>
>


Partition Columns

2015-05-14 Thread Appan Thirumaligai
Hi,

I have a question on the Hive optimizer. I have a table with partition columns,
e.g., Sales partitioned by year, month, and day. Assume that I have two years'
worth of data in this table. I'm running two queries on this table.

Query 1: Select * from Sales where year=2015 and month = 5 and day between
1 and 7

Query 2: Select * from Sales where concat_ws('-',cast(year as
string),lpad(cast(month as string),2,'0'),lpad(cast(day as string),2,'0'))
between '2015-01-01' and '2015-01-07'

When I run the EXPLAIN command on the above two queries, I get a Filter
operation for the second query and no Filter operation for the first
query.

My question is: do both queries use the partitions, or are they used only in
Query 1, with Query 2 scanning all of the data?

Thanks for your help.

Thanks,
Appan
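
As the replies above suggest, the pruner handles bare predicates on the
partition columns but may give up once those columns are wrapped in UDFs such
as concat_ws/lpad. A sketch of a pruner-friendly rewrite of Query 2, assuming
the date range stays within a single month (the constants are illustrative):

SELECT * FROM Sales
WHERE year = 2015 AND month = 1 AND day BETWEEN 1 AND 7;

-- If the range can cross a month or year boundary, enumerating the partition
-- ranges explicitly should still allow pruning:
SELECT * FROM Sales
WHERE (year = 2014 AND month = 12 AND day >= 28)
   OR (year = 2015 AND month = 1  AND day <= 3);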


Re: Unable to create table in Hive

2015-05-14 Thread kulkarni.swar...@gmail.com
Yeah, Hive 0.13 isn't compatible with HBase 1.0. We haven't made the jump to
HBase 1.0 yet, but Hive 1.1 is built against HBase 0.98. And from what I know,
there aren't many breaking changes from 0.98 to 1.0, so you might give that a
shot and see if it works.

On Thu, May 14, 2015 at 3:30 PM, Ibrar Ahmed  wrote:

> I have also tried
>
> ADD FILE /usr/local/hbase/conf/hbase-site.xml;
> ADD JAR /usr/local/hive/lib/zookeeper-3.4.5.jar;
> ADD JAR /usr/local/hive/lib/hive-hbase-handler-0.13.0.jar;
> ADD JAR /usr/local/hive/lib/guava-11.0.2.jar;
> ADD JAR /usr/local/hbase/lib/hbase-client-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-common-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-protocol-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-server-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-shell-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-thrift-1.0.1.jar;
> ADD JAR /usr/local/hbase/lib/hbase-server-1.0.1.jar;
>
> CREATE TABLE abcd(key int, value string)  STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
> hbase.table.name" = "xyz");
>
>
> But "list jars" also shows nothing.
>
>
>
> On Fri, May 15, 2015 at 1:29 AM, Ibrar Ahmed 
> wrote:
>
>> Hive : 0.13
>> Hbase: 1.0.1
>>
>>
>>
>> On Fri, May 15, 2015 at 1:26 AM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> Hi Ibrar,
>>>
>>> It seems like your hive and hbase versions are incompatible. What
>>> version of hive and hbase are you on?
>>>
>>> On Thu, May 14, 2015 at 3:21 PM, Ibrar Ahmed 
>>> wrote:
>>>
 Hi,

 While creating a table in Hive I am getting this error message.

 CREATE TABLE abcd(key int, value string)  STORED BY
 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
 ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
 hbase.table.name" = "xyz");


 [Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
 Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
 org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V


>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>>
>>


-- 
Swarnim


Re: Unable to create table in Hive

2015-05-14 Thread Ibrar Ahmed
I have also tried

ADD FILE /usr/local/hbase/conf/hbase-site.xml;
ADD JAR /usr/local/hive/lib/zookeeper-3.4.5.jar;
ADD JAR /usr/local/hive/lib/hive-hbase-handler-0.13.0.jar;
ADD JAR /usr/local/hive/lib/guava-11.0.2.jar;
ADD JAR /usr/local/hbase/lib/hbase-client-1.0.1.jar;
ADD JAR /usr/local/hbase/lib/hbase-common-1.0.1.jar;
ADD JAR /usr/local/hbase/lib/hbase-protocol-1.0.1.jar;
ADD JAR /usr/local/hbase/lib/hbase-server-1.0.1.jar;
ADD JAR /usr/local/hbase/lib/hbase-shell-1.0.1.jar;
ADD JAR /usr/local/hbase/lib/hbase-thrift-1.0.1.jar;
ADD JAR /usr/local/hbase/lib/hbase-server-1.0.1.jar;

CREATE TABLE abcd(key int, value string)  STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name"
= "xyz");


But "list jars" also shows nothing.


On Fri, May 15, 2015 at 1:29 AM, Ibrar Ahmed  wrote:

> Hive : 0.13
> Hbase: 1.0.1
>
>
>
> On Fri, May 15, 2015 at 1:26 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Hi Ibrar,
>>
>> It seems like your hive and hbase versions are incompatible. What version
>> of hive and hbase are you on?
>>
>> On Thu, May 14, 2015 at 3:21 PM, Ibrar Ahmed 
>> wrote:
>>
>>> Hi,
>>>
>>> While creating a table in Hive I am getting this error message.
>>>
>>> CREATE TABLE abcd(key int, value string)  STORED BY
>>> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
>>> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
>>> hbase.table.name" = "xyz");
>>>
>>>
>>> [Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
>>> Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
>>> org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>
>
>


Re: Unable to create table in Hive

2015-05-14 Thread Ibrar Ahmed
Hive : 0.13
Hbase: 1.0.1

On Fri, May 15, 2015 at 1:26 AM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Hi Ibrar,
>
> It seems like your hive and hbase versions are incompatible. What version
> of hive and hbase are you on?
>
> On Thu, May 14, 2015 at 3:21 PM, Ibrar Ahmed 
> wrote:
>
>> Hi,
>>
>> While creating a table in Hive I am getting this error message.
>>
>> CREATE TABLE abcd(key int, value string)  STORED BY
>> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
>> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
>> hbase.table.name" = "xyz");
>>
>>
>> [Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
>> Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
>> org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
>>
>>
>
>
> --
> Swarnim
>



-- 
Ibrar Ahmed
EnterpriseDB   http://www.enterprisedb.com


Re: Unable to create table in Hive

2015-05-14 Thread kulkarni.swar...@gmail.com
Hi Ibrar,

It seems like your hive and hbase versions are incompatible. What version
of hive and hbase are you on?

On Thu, May 14, 2015 at 3:21 PM, Ibrar Ahmed  wrote:

> Hi,
>
> While creating a table in Hive I am getting this error message.
>
> CREATE TABLE abcd(key int, value string)  STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("
> hbase.table.name" = "xyz");
>
>
> [Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
> org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
>
>


-- 
Swarnim


Unable to create table in Hive

2015-05-14 Thread Ibrar Ahmed
Hi,

While creating a table in Hive I am getting this error message.

CREATE TABLE abcd(key int, value string)  STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name"
= "xyz");


[Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V


Re: HCatInputFormat combine splits

2015-05-14 Thread Ankit Bhatnagar
try these:

mapred.max.split.size=
mapred.min.split.size=

mapreduce.input.fileinputformat.split.maxsize=
mapreduce.input.fileinputformat.split.minsize=



 On Thursday, May 14, 2015 11:04 AM, Pradeep Gollakota 
 wrote:
   

The following property has had no effect:

mapreduce.input.fileinputformat.split.maxsize = 67108864

I'm still getting 1 mapper per file.
On Thu, May 14, 2015 at 10:27 AM, Ankit Bhatnagar  wrote:

you can explicitly set the split size 


 On Wednesday, May 13, 2015 11:37 PM, Pradeep Gollakota 
 wrote:
   

Hi All,

I'm writing an MR job to read data using HCatInputFormat... however, the job is
generating too many splits. I don't have this problem when running queries in
Hive since it combines splits by default.

Is there an equivalent in MR so that I'm not generating thousands of mappers?

Thanks,
Pradeep

   



  

how to enable serde property hive.serialization.extend.nesting.levels for CTAS statment

2015-05-14 Thread Jie Zhang
Hi, experts,

My application uses a CTAS query to create a result table in Hive; the
source table has a deeply nested struct column (7 levels). The CTAS query fails
with the following exception.

jdbc:hive2://localhost:1/default> CREATE TABLE IF NOT EXISTS
reporting.test1 AS select row_number() over() AS rowid, * from (select data
from store.table1) X;

Error: Error while compiling statement: FAILED: SemanticException
org.apache.hadoop.hive.serde2.SerDeException: Number of levels of nesting
supported for LazySimpleSerde is 7 Unable to work with level 8. Use
hive.serialization.extend.nesting.levels serde property for tables using
LazySimpleSerde. (state=42000,code=4)

Then I followed the suggestion and
added hive.serialization.extend.nesting.levels to hive-site.xml in order to
enable extended nesting levels; however, it has no effect and I still
see the same exception. I noticed that
hive.serialization.extend.nesting.levels is a SerDe property, so is this
the reason that adding it to hive-site.xml does not help? If so, how can I
enable this property for a CTAS query? Thanks very much for the help!

Jessica
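
One workaround to try (a sketch, not verified against a specific Hive version):
the error message indicates hive.serialization.extend.nesting.levels is a
table-level SerDe property, so hive-site.xml is not consulted; attach it to the
CTAS itself via a ROW FORMAT clause, or avoid LazySimpleSerDe entirely by
storing the result as ORC. Table and column names are taken from the question
above.

-- Option 1: keep LazySimpleSerDe but set the SerDe property on the new table.
CREATE TABLE IF NOT EXISTS reporting.test1
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('hive.serialization.extend.nesting.levels' = 'true')
AS SELECT row_number() over() AS rowid, * FROM (SELECT data FROM store.table1) X;

-- Option 2: use a binary storage format, so LazySimpleSerDe's nesting-level
-- limit does not apply at all.
CREATE TABLE IF NOT EXISTS reporting.test1
STORED AS ORC
AS SELECT row_number() over() AS rowid, * FROM (SELECT data FROM store.table1) X;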


Re: HCatInputFormat combine splits

2015-05-14 Thread Pradeep Gollakota
The following property has had no effect.

mapreduce.input.fileinputformat.split.maxsize = 67108864

I'm still getting 1 Mapper per file.

On Thu, May 14, 2015 at 10:27 AM, Ankit Bhatnagar 
wrote:

> you can explicitly set the split size
>
>
>
>   On Wednesday, May 13, 2015 11:37 PM, Pradeep Gollakota <
> pradeep...@gmail.com> wrote:
>
>
> Hi All,
>
> I'm writing an MR job to read data using HCatInputFormat... however, the
> job is generating too many splits. I don't have this problem when running
> queries in Hive since it combines splits by default.
>
> Is there an equivalent in MR so that I'm not generating thousands of
> mappers?
>
> Thanks,
> Pradeep
>
>
>


Re: HCatInputFormat combine splits

2015-05-14 Thread Ankit Bhatnagar
You can explicitly set the split size.


 On Wednesday, May 13, 2015 11:37 PM, Pradeep Gollakota 
 wrote:
   

Hi All,

I'm writing an MR job to read data using HCatInputFormat... however, the job is
generating too many splits. I don't have this problem when running queries in
Hive since it combines splits by default.

Is there an equivalent in MR so that I'm not generating thousands of mappers?

Thanks,
Pradeep

   

Re: ACID ORC file reader issue with uncompacted data

2015-05-14 Thread Alan Gates
Ok, I think I understand now.  I also get why OrcSplit.getPath returns 
just up to the partition keys and not the delta directories.  In most 
cases there will be more than one delta directory, so which one would it 
pick?


It seems you already know the file type you are working on before you 
call this (since you're calling OrcSplit.getPath rather than 
FileSplit.getPath).  The best way forward might be to make a utility 
method in Hive that takes the file type and the result of getPath and 
then returns you the partition keys.  That way you're not left putting 
ORC specific code in Cascading.


Alan.
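
As an aside: while the reader-side handling gets sorted out, one way to make a
delta-only partition readable through the well-behaved base+delta path is to
force a major compaction so a base directory appears. A sketch, using the
example table and partition from Elliot's message below; it assumes the
compactor (initiator and worker threads) is enabled on the metastore:

ALTER TABLE test_table PARTITION (continent = 'Asia', country = 'India')
  COMPACT 'major';

-- Compactions run asynchronously; this shows their state.
SHOW COMPACTIONS;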


Elliot West 
May 1, 2015 at 3:04
Yes and no :-) We're initially using OrcFile.createReader to create a 
Reader so that we can obtain the schema (StructTypeInfo) from the 
file. I don't believe this is possible with OrcInputFormat.getReader(?):


Reader orcReader = OrcFile.createReader(path,
OrcFile.readerOptions(conf));

ObjectInspector inspector = orcReader.getObjectInspector();
StructTypeInfo typeInfo = (StructTypeInfo)
TypeInfoUtils.getTypeInfoFromObjectInspector(inspector);


In the case of transactional datasets we've worked around this by 
generating the StructTypeInfo from schema data retrieved from the meta 
store, as we need to interact with the meta store anyway to correctly 
read the data. Even if OrcFile.createReader were to transparently read 
delta-only datasets, it wouldn't get us much further currently, as the 
delta files lack the correct column names and the Reader would thus 
return an unusable StructTypeInfo.


The org.apache.hadoop.hive.ql.io.orc.OrcSplit.getPath() issue is 
currently our biggest pain point as it requires us to place Orc+Atomic 
specific code in what should be a general framework. To illustrate the 
problem further, somewhere in cascading there is some code that 
extracts partition keys from split paths. It extracts keys by chopping 
off the 'part' leaf and removing the preceding parent:


*Text etc:*
OrcSplit.getPath() returns: 
'warehouse/test_table/continent=Asia/country=India/part-01'

Partition keys derived as: 'continent=Asia/country=India' (CORRECT)

*Orc base+delta:*
OrcSplit.getPath() returns: 
warehouse/test_table/continent=Asia/country=India/base_006'

Partition keys derived as: 'continent=Asia/country=India' (CORRECT)

*Orc delta only etc:*
OrcSplit.getPath() returns: 
warehouse/test_table/continent=Asia/country=India

Partition keys derived as: 'continent=Asia' (INCORRECT)

Cheers - Elliot.





On 30 April 2015 at 17:40, Alan Gates > wrote:


Are you using OrcInputFormat.getReader to get a reader?  If so, it
should take care of these anomalies for you and mask your need to
worry about delta versus base files.

Alan.


Elliot West 
April 29, 2015 at 9:40
Hi,

I'm implementing a tap to read Hive ORC ACID date into Cascading
jobs and I've hit a couple of issues for a particular scenario.
The case I have is when data has been written into a
transactional table and a compaction has not yet occurred. This
can be recreated like so:

CREATE TABLE test_table ( id int, message string )
  PARTITIONED BY ( continent string, country string )
  CLUSTERED BY (id) INTO 1 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional' = 'true');

INSERT INTO TABLE test_table
PARTITION (continent = 'Asia', country = 'India')
VALUES (1, 'x'), (2, 'y'), (3, 'z');


This results in a dataset that contains only a delta file:


warehouse/test_table/continent=Asia/country=India/delta_060_060/bucket_0


I'm assuming that this scenario is valid - a user might insert
new data into a table and want to read it back at a time prior to
the first compaction. I can select the data back from this table
in Hive with no problem. However, for a number of reasons I'm
finding it rather tricky to do so programmatically. At this point
I should mention that reading base files or base+deltas is
trouble free. The issues I've encountered are as follows:

 1. org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(Path,
ReaderOptions) fails if the directory specified by the path
('warehouse/test_table/continent=Asia/country=India' in this
case) contains only a delta. Specifically it attempts to
access 'delta_060_060' as if it were a file and
therefore fails. It appears to function correctly if the
directory also contains a base. We use this method to extract
the typeInfo from the ORCFile and build a mapping between the
user's declared fields.
 2. org.apache.hadoop.hive.ql.io.orc.OrcSplit.getPath() is
seemingly inconsistent in that it returns the path of the
base if present, otherwise the parent. This presents issues
within cascading (

hive job not making progress due to Number of reduce tasks is set to 0 since there's no reduce operator

2015-05-14 Thread Bhagwan S. Soni
Hi Hive Users,

I'm using the Cloudera distribution with Hive 0.13 on my cluster.

I came across a problem where the job does not make any progress after writing
the log line "*Number of reduce tasks is set to 0 since there's no reduce
operator*".

Below is the log. Could you help me figure out what kind of issue this is? It
does not appear to be a code issue, because if I re-run the same job it
completes successfully.

Logging initialized using configuration in
jar:file:/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/hive-common-0.13.1-cdh5.2.1.jar!/hive-log4j.properties
Total jobs = 5
Launching Job 1 out of 5
Launching Job 2 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 10
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
Number of reduce tasks not specified. Defaulting to jobconf value of: 10
  set mapreduce.job.reduces=
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1431159077692_1399, Tracking URL =
http://xyz.com:8088/proxy/application_1431159077692_1399/
Starting Job = job_1431159077692_1398, Tracking URL =
http://xyz.com:8088/proxy/application_1431159077692_1398/
Kill Command =
/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/hadoop/bin/hadoop job
-kill job_1431159077692_1399
Kill Command =
/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/hadoop/bin/hadoop job
-kill job_1431159077692_1398
Hadoop job information for Stage-12: number of mappers: 5; number of
reducers: 10
Hadoop job information for Stage-1: number of mappers: 5; number of
reducers: 10
2015-05-12 19:59:12,298 Stage-1 map = 0%,  reduce = 0%
2015-05-12 19:59:12,298 Stage-12 map = 0%,  reduce = 0%
2015-05-12 19:59:20,832 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 2.5
sec
2015-05-12 19:59:20,832 Stage-12 map = 80%,  reduce = 0%, Cumulative CPU
8.63 sec
2015-05-12 19:59:21,905 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU
7.06 sec
2015-05-12 19:59:22,968 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU
9.34 sec
2015-05-12 19:59:24,031 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
11.46 sec
2015-05-12 19:59:26,265 Stage-12 map = 100%,  reduce = 0%, Cumulative CPU
10.92 sec
2015-05-12 19:59:32,665 Stage-12 map = 100%,  reduce = 30%, Cumulative CPU
24.51 sec
2015-05-12 19:59:33,726 Stage-12 map = 100%,  reduce = 100%, Cumulative CPU
57.61 sec
2015-05-12 19:59:35,021 Stage-1 map = 100%,  reduce = 30%, Cumulative CPU
20.99 sec
MapReduce Total cumulative CPU time: 57 seconds 610 msec
Ended Job = job_1431159077692_1399
2015-05-12 19:59:36,084 Stage-1 map = 100%,  reduce = 80%, Cumulative CPU
39.24 sec
2015-05-12 19:59:37,146 Stage-1 map = 100%,  reduce = 90%, Cumulative CPU
42.37 sec
2015-05-12 19:59:38,203 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
45.97 sec
MapReduce Total cumulative CPU time: 45 seconds 970 msec
Ended Job = job_1431159077692_1398
2015-05-12 19:59:45,180 WARN  [main] conf.Configuration
(Configuration.java:loadProperty(2510)) -
file:/tmp/srv-hdp-mkt-d/hive_2015-05-12_19-58-53_081_2145723752519383568-1/-local-10014/jobconf.xml:an
attempt to override final parameter: hadoop.ssl.require.client.cert;
Ignoring.
2015-05-12 19:59:45,193 WARN  [main] conf.Configuration
(Configuration.java:loadProperty(2510)) -
file:/tmp/srv-hdp-mkt-d/hive_2015-05-12_19-58-53_081_2145723752519383568-1/-local-10014/jobconf.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2015-05-12 19:59:45,196 WARN  [main] conf.Configuration
(Configuration.java:loadProperty(2510)) -
file:/tmp/srv-hdp-mkt-d/hive_2015-05-12_19-58-53_081_2145723752519383568-1/-local-10014/jobconf.xml:an
attempt to override final parameter: hadoop.ssl.client.conf;  Ignoring.
2015-05-12 19:59:45,201 WARN  [main] conf.Configuration
(Configuration.java:loadProperty(2510)) -
file:/tmp/srv-hdp-mkt-d/hive_2015-05-12_19-58-53_081_2145723752519383568-1/-local-10014/jobconf.xml:an
attempt to override final parameter: hadoop.ssl.keystores.factory.class;
Ignoring.
2015-05-12 19:59:45,210 WARN  [main] conf.Configuration
(Configuration.java:loadProperty(2510)) -
file:/tmp/srv-hdp-mkt-d/hive_2015-05-12_19-58-53_081_2145723752519383568-1/-local-10014/jobconf.xml:an
attempt to override final parameter: hadoop.ssl.server.conf;  Ignoring.
2015-05-12 19:59:45,258 WARN  [main] conf.Configuration
(Configuration.java:loadProperty(2510)) -
file:/tmp/srv-hdp-mkt-d/hive_2015-05-12_19-58-53_081_2145723752519383568-1/-local-10014/jobconf.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.attempts;  Ignoring.
2015-05-12 19:59:45,792 WARN  [main] conf.HiveConf
(HiveConf.java:in

Re: Hive/Hbase Integration issue

2015-05-14 Thread Ibrar Ahmed
My HBase is working fine now, but I am still getting the same error:


[127.0.0.1:1] hive> CREATE TABLE hbase_table_1(key int, value string)
  > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
  > TBLPROPERTIES ("hbase.table.name" = "xyz");



[Hive Error]: Query returned non-zero code: 1, cause: FAILED: Execution
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException:
Can't get the locations
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
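
This error usually means the HBase storage handler cannot reach HBase's
ZooKeeper quorum, typically because hbase-site.xml is not on Hive's classpath.
A sketch of pointing the session at the quorum explicitly; the host and port
below are examples only and must match your HBase installation:

SET hbase.zookeeper.quorum=localhost;
SET hbase.zookeeper.property.clientPort=2181;

CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");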

On Thu, May 14, 2015 at 1:18 AM, Ibrar Ahmed  wrote:

> Seems you are right, Sometime I got this error while running hbase shell
> command.
>
>
> ibrar@ibrar-virtual-machine:/usr/local/hbase/bin$ ./hbase shell
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/usr/local/hbase/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2015-05-14 01:14:27,063 WARN  [main] util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2015-05-14 01:14:43,982 ERROR [main] zookeeper.RecoverableZooKeeper:
> ZooKeeper exists failed after 4 attempts
> 2015-05-14 01:14:43,983 WARN  [main] zookeeper.ZKUtil:
> hconnection-0x4d980c0x0, quorum=localhost:2181, baseZNode=/hbase Unable to
> set watcher on znode (/hbase/hbaseid)
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
> at
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
> at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:481)
> at
> org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
> at
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:86)
> at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:833)
> at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.(ConnectionManager.java:623)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:450)
> at
> org.jruby.javasupport.JavaMethod.invokeStaticDirect(JavaMethod.java:362)
> at
> org.jruby.java.invokers.StaticMethodInvoker.call(StaticMethodInvoker.java:58)
> at
> org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:312)
> at
> org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:169)
> at org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
> at org.jruby.ast.InstAsgnNode.interpret(InstAsgnNode.java:95)
> at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
> at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
> at
> org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
> at
> org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:169)
> at
> org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:191)
> at
> org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:302)
> at
> org.jruby.run