Simulating a login trigger in Hive

2015-07-21 Thread Steve Howard
We would like to assign a YARN queue to a user upon login.

Is there any way to do this out of the box?  If not, is anyone aware of
any development effort to do this?

It sounds like it would be pretty simple to extend the Connection class to
look up a queue in a custom table in the metastore for a given user and set
tez.queue.name=foo.  If one isn't found, assign the default queue.
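
As a note, HiveServer2 already exposes a per-connection hook point
(hive.server2.session.hook) that could do this without touching the
Connection class. Below is a minimal sketch; the static map stands in for
the custom metastore table lookup, and the class name, user names, and
queue names are all assumptions:

import java.util.HashMap;
import java.util.Map;

import org.apache.hive.service.cli.HiveSQLException;
import org.apache.hive.service.cli.session.HiveSessionHook;
import org.apache.hive.service.cli.session.HiveSessionHookContext;

public class QueueAssignmentHook implements HiveSessionHook {

  // Stub lookup; a real implementation would query the custom table instead.
  private static final Map<String, String> USER_QUEUES = new HashMap<String, String>();
  static {
    USER_QUEUES.put("etl_user", "etl");
    USER_QUEUES.put("analyst", "adhoc");
  }

  @Override
  public void run(HiveSessionHookContext ctx) throws HiveSQLException {
    // Runs once per HiveServer2 session, right after the user connects.
    String queue = USER_QUEUES.get(ctx.getSessionUser());
    // Fall back to the default queue when no mapping exists.
    ctx.getSessionConf().set("tez.queue.name", queue != null ? queue : "default");
  }
}

This would be registered via hive.server2.session.hook=QueueAssignmentHook
in hive-site.xml; note the hook only fires for HiveServer2 sessions, not
the embedded CLI.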


Re: hbase column without prefix

2015-07-21 Thread Wojciech Indyk
Hello!
I've posted a bug on this issue:
https://issues.apache.org/jira/browse/HIVE-11329
What do you think? I can prepare a patch.

Kind regards
Wojciech Indyk


2015-07-07 9:51 GMT+02:00 Wojciech Indyk :
> Hi!
> I use HBase column regex matching to create a map column in Hive, like:
> "hbase.columns.mapping" = ":key,s:ap_.*"
> Then I have values in the column:
> {"ap_col1":"23","ap_col2":"7"}
> Is it possible to cut the ap_ prefix to get values like below?
> {"col1":"23","col2":"7"}
>
> Kind regards
> Wojciech Indyk
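
For reference, a minimal sketch of the kind of table definition being
described, issued over JDBC (the connection URL, the Hive/HBase table
names, and the "s" column family are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePrefixMappedTable {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // The regex mapping puts every qualifier in family "s" that starts
      // with "ap_" into one MAP column, keyed by the full qualifier
      // (prefix included) - exactly the behavior being asked about.
      stmt.execute(
          "CREATE EXTERNAL TABLE ap_events ("
        + "  rowkey STRING, "
        + "  ap_cols MAP<STRING,STRING>) "
        + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
        + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,s:ap_.*') "
        + "TBLPROPERTIES ('hbase.table.name' = 'events')");
    }
  }
}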


Hive on Tez query failed with “wrong key class”

2015-07-21 Thread Jim Green
Hi Team,

Env: Hive 1.0 on Tez 0.5.3
The query is a simple group-by on top of a sequence file table.

It fails with the below error in Tez mode:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.io.IOException: java.io.IOException: wrong key class:
org.apache.hadoop.io.BytesWritable is not class
org.apache.hadoop.io.NullWritable

It works fine in MR mode.
Has anyone met this issue before?

-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


RE: Hive on Tez query failed with “wrong key class”

2015-07-21 Thread Bikas Saha
A full stack trace would help determine whether this is a Tez issue or a
Hive issue.

From: Jim Green [mailto:openkbi...@gmail.com]
Sent: Tuesday, July 21, 2015 11:12 AM
To: u...@tez.apache.org; user@hive.apache.org
Subject: Hive on Tez query failed with “wrong key class”

Hi Team,

Env: Hive 1.0 on Tez 0.5.3
The query is a simple group-by on top of a sequence file table.

It fails with the below error in Tez mode:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.io.IOException: java.io.IOException: wrong key class:
org.apache.hadoop.io.BytesWritable is not class
org.apache.hadoop.io.NullWritable

It works fine in MR mode.
Has anyone met this issue before?

--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


Re: Hive on Tez query failed with “wrong key class”

2015-07-21 Thread Gopal Vijayaraghavan

> The query is a simple group-by on top of a sequence file table.
...
> java.io.IOException: java.io.IOException: wrong key class:
> org.apache.hadoop.io.BytesWritable is not class
> org.apache.hadoop.io.NullWritable

I have seen this issue when mixing sequence files written by Pig with
sequence files written by Hive - primarily because the data ingestion
wasn't done properly via HCatalog writers.

In the last report, the first sequence file had this header:

M?.io.LongWritable"org.apache.hadoop.io.BytesWritable)org.apache.hadoop.io.
compress.SnappyCodec??


and the second one had

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text)org.apache.h
adoop.io.compress.SnappyCodec?


You can cross-check the exception trace and make sure that the exception
is coming from the RecordReader, as the key-value pairs change types
between files.
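
For that cross-check, here is a small diagnostic sketch (the class name
and the directory-path argument are just for illustration) that prints the
key/value classes recorded in each SequenceFile header under a table
directory, so mismatched files stand out:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class SeqFileSchemaCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // List every file in the table/partition directory passed as args[0].
    for (FileStatus st : fs.listStatus(new Path(args[0]))) {
      if (!st.isFile()) continue;
      SequenceFile.Reader reader = new SequenceFile.Reader(
          conf, SequenceFile.Reader.file(st.getPath()));
      try {
        // The header names the key/value classes the file was written with.
        System.out.println(st.getPath() + "\t"
            + reader.getKeyClassName() + "\t" + reader.getValueClassName());
      } finally {
        reader.close();
      }
    }
  }
}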

This primarily doesn't happen in Hive-MR at small scale, but it can happen
for both MR and Tez.

To hit this via CombineInputFormat, you need a file that has been split up
between machines, and two such files to generate a combined split of
mismatched schemas.

Tez is more aggressive at splitting, since it relies on the file format
splits, not HDFS locations.

If you confirm that this is indeed the cause of the issue, I might have an
idea how to fix it.

Cheers,
Gopal 




Re: Hive on Tez query failed with “wrong key class”

2015-07-21 Thread Jim Green
Sample stack trace is:
[Error: Failure while running task:java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException:
java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is
not class org.apache.hadoop.io.NullWritable
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.io.IOException: java.io.IOException: wrong key class:
org.apache.hadoop.io.BytesWritable is not class
org.apache.hadoop.io.NullWritable
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
        ... 13 more
Caused by: java.io.IOException: java.io.IOException: wrong key class:
org.apache.hadoop.io.BytesWritable is not class
org.apache.hadoop.io.NullWritable
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:363)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:126)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
        ... 15 more
Caused by: java.io.IOException: wrong key class:
org.apache.hadoop.io.BytesWritable is not class
org.apache.hadoop.io.NullWritable
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2495)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:358)
        ... 21 more
],



On Tue, Jul 21, 2015 at 11:26 AM, Bikas Saha  wrote:

> A full stack trace would help determine whether this is a Tez issue or a
> Hive issue.
>
> From: Jim Green [mailto:openkbi...@gmail.com]
> Sent: Tuesday, July 21, 2015 11:12 AM
> To: u...@tez.apache.org; user@hive.apache.org
> Subject: Hive on Tez query failed with “wrong key class”
>
> Hi Team,
>
> Env: Hive 1.0 on Tez 0.5.3
> The query is a simple group-by on top of a sequence file table.
>
> It fails with the below error in Tez mode:
> java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException:
> java.io.IOException: java.io.IOException: wrong key class:
> org.apache.hadoop.io.BytesWritable is not class
> org.apache.hadoop.io.NullWritable
>
> It works fine in MR mode.
> Has anyone met this issue before?
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)



-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


limit clause + fetch optimization

2015-07-21 Thread Adam Silberstein
Hi,
I've been experimenting with 'select *' and 'select * limit X' in beeline
and watching the hive-server2 log to understand when an M/R job is
triggered and when not.  It seems like whenever I set a limit, the job is
avoided, but with no limit, it is run.

I found this param:
hive.limit.optimize.fetch.max

It defaults to 50,000, and as I understand it, whenever I set the limit
above that number, a job should be triggered.  But I can set the limit to
something very high (e.g. 10M) and no job runs.

If anyone has some insight into how this param is used, or into the
expected behavior of the fetch optimization, I'd appreciate it.

This is on Hive 1.1 inside CDH5.4.

Thanks,
Adam


Re: limit clause + fetch optimization

2015-07-21 Thread Gopal Vijayaraghavan

> I've been experimenting with 'select *' and 'select * limit X' in
> beeline and watching the hive-server2 log to understand when an M/R job
> is triggered and when not.  It seems like whenever I set a limit, the
> job is avoided, but with no limit, it is run.

https://issues.apache.org/jira/browse/HIVE-10156


It's sitting on my back-burner (I know the fix, but I'm working on the
LLAP branch).

> hive.limit.optimize.fetch.max
>
> It defaults to 50,000, and as I understand it, whenever I set the limit
> above that number, a job should be triggered.  But I can set the limit
> to something very high (e.g. 10M) and no job runs.

That config belongs to a different optimization - the global limit case,
which works as follows.

Run the query with a 50k-row sample of the input; if that doesn't produce
enough rows, re-run the query with the full input data-set.

You will notice errors on your JDBC connections with that optimization
turned on (like HIVE-9382) and will get the following log line "Retry
query with a different approach…" in the HS2 logs.

So I suggest not turning on the global limit optimization if you're on
JDBC/ODBC.
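
(For anyone following that advice from JDBC, a minimal sketch - the
connection URL and class name are assumptions; hive.limit.optimize.enable
is the knob that governs the global limit optimization:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DisableGlobalLimit {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Session-scoped: explicitly keeps the global limit optimization off.
      stmt.execute("SET hive.limit.optimize.enable=false");
      // ... run queries on this connection as usual ...
    }
  }
}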

Cheers,
Gopal
 




Re: limit clause + fetch optimization

2015-07-21 Thread Adam Silberstein
Thanks for the quick answer, Gopal, and also for the details on that param.
I indeed use JDBC in production, so I'll stay away from it.

Just want to make sure I understand the behavior once that bug is fixed... a
'select *' with no limit will run without an M/R job and instead stream.  Is
that correct?

That may incidentally solve another bug I'm seeing: when you use JDBC
templates to set the limit (setMaxRows in Spring in my setup), it does not
avoid the M/R job (and no limit clause appears in the hive-server2 log).
Instead, the M/R job gets launched... I'm not sure if the JDBC framework
subsequently applies a limit once the job finishes.  I haven't spotted this
issue in JIRA; I'd be happy to file it if that's useful to you.
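
For reference, a minimal sketch of the two code paths being compared (the
connection URL and table name are assumptions); setMaxRows caps rows on the
driver side after planning, while a LIMIT in the query text is what the
planner actually sees:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LimitVsMaxRows {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {

      // Path 1: driver-side cap only. No LIMIT reaches the planner, so
      // (per the behavior described above) an M/R job is still launched.
      stmt.setMaxRows(100);
      try (ResultSet rs = stmt.executeQuery("SELECT * FROM t")) {
        while (rs.next()) { /* consume rows */ }
      }

      // Path 2: LIMIT in the query text, visible to the planner and
      // eligible for the fetch-task shortcut.
      stmt.setMaxRows(0); // 0 = no driver-side cap
      try (ResultSet rs = stmt.executeQuery("SELECT * FROM t LIMIT 100")) {
        while (rs.next()) { /* consume rows */ }
      }
    }
  }
}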

Thanks!
Adam

On Tue, Jul 21, 2015 at 7:20 PM, Gopal Vijayaraghavan  wrote:

> > I've been experimenting with 'select *' and 'select * limit X' in
> > beeline and watching the hive-server2 log to understand when an M/R
> > job is triggered and when not.  It seems like whenever I set a limit,
> > the job is avoided, but with no limit, it is run.
>
> https://issues.apache.org/jira/browse/HIVE-10156
>
> It's sitting on my back-burner (I know the fix, but I'm working on the
> LLAP branch).
>
> > hive.limit.optimize.fetch.max
> >
> > It defaults to 50,000, and as I understand it, whenever I set the
> > limit above that number, a job should be triggered.  But I can set
> > the limit to something very high (e.g. 10M) and no job runs.
>
> That config belongs to a different optimization - the global limit case,
> which works as follows.
>
> Run the query with a 50k-row sample of the input; if that doesn't
> produce enough rows, re-run the query with the full input data-set.
>
> You will notice errors on your JDBC connections with that optimization
> turned on (like HIVE-9382) and will get the following log line "Retry
> query with a different approach…" in the HS2 logs.
>
> So I suggest not turning on the global limit optimization if you're on
> JDBC/ODBC.
>
> Cheers,
> Gopal


Re: limit clause + fetch optimization

2015-07-21 Thread Gopal Vijayaraghavan

> Just want to make sure I understand the behavior once that bug is
> fixed... a 'select *' with no limit will run without an M/R job and
> instead stream.  Is that correct?

Yes, that's the intended behaviour. I can help you get a fix in, if you
have some time to test out my WIP patches.

> That may incidentally solve another bug I'm seeing: when you use JDBC
> templates to set the limit (setMaxRows in Spring in my setup), it does
> not avoid the M/R job (and no limit clause appears in the hive-server2
> log).  Instead, the M/R job gets launched... I'm not sure if the JDBC
> framework subsequently applies a limit once the job finishes.  I haven't
> spotted this issue in JIRA; I'd be happy to file it if that's useful to
> you.

File a JIRA; it would be very useful for me.

There's a lot of low-hanging fruit in the JDBC + prepared-statement
codepath, so going over the issues and filing your findings would help me
pick them up and knock them off one by one when I'm back.

Prasanth's GitHub has some automated benchmarking tools for JDBC, which I
use heavily - https://github.com/prasanthj/jmeter-hiveserver2/tree/llap


There are some known issues which cause a 2-3x perf degradation for the
simple query patterns you're running, like
https://issues.apache.org/jira/browse/HIVE-10982

Cheers,
Gopal