Fwd: Hive LLAP with Parquet format

2017-05-04 Thread Nar Kumar Chhantyal
Hi everyone,

I posted a question on SO
(http://stackoverflow.com/questions/43771050/hive-llap-doesnt-work-with-parquet-format),
but it didn't get any love, so I am posting here.

Basically, I have a large amount of IoT data stored in Parquet format. I want
to enable faster access to this data. I started Azure HDInsight with LLAP
enabled. After trying different settings, it doesn't seem to work. I now
suspect it's because of the underlying file format.

Does Hive LLAP work with Parquet format as well?

-- 
Nar-Kumar Chhantyal




Re: User is not allowed to impersonate

2017-05-04 Thread Markovich
I'm still unable to resolve this...

 INFO  [Thread-17]: thrift.ThriftCLIService (ThriftHttpCLIService.java:run(152)) - Started ThriftHttpCLIService in http mode on port 10001 path=/cliservice/* with 5...500 worker threads
2017-05-04 13:40:14,195 INFO  [HiveServer2-HttpHandler-Pool: Thread-60]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(145)) - Could not validate cookie sent, will try to generate a new cookie
2017-05-04 13:40:14,198 INFO  [HiveServer2-HttpHandler-Pool: Thread-60]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(398)) - Failed to authenticate with http/_HOST kerberos principal, trying with hive/_HOST kerberos principal
2017-05-04 13:40:14,199 ERROR [HiveServer2-HttpHandler-Pool: Thread-60]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(406)) - Failed to authenticate with hive/_HOST kerberos principal
2017-05-04 13:40:14,199 ERROR [HiveServer2-HttpHandler-Pool: Thread-60]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(209)) - Error:
org.apache.hive.service.auth.HttpAuthenticationException: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:407)
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:159)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:349)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:449)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:925)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:404)
    ... 23 more
Caused by: org.apache.hive.service.auth.HttpAuthenticationException: Authorization header received from the client is empty.
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet.getAuthHeader(ThriftHttpServlet.java:548)
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet.access$100(ThriftHttpServlet.java:74)
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:449)
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:412)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    ... 24 more
2017-05-04 13:40:14,211 INFO  [HiveServer2-HttpHandler-Pool: Thread-60]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(145)) - Could not validate cookie sent, will try to generate a new cookie
2017-05-04 13:40:14,219 INFO  [HiveServer2-HttpHandler-Pool: Thread-60]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(204)) - Cookie added for clientUserName hue
2017-05-04 13:40:14,229 INFO  [HiveServer2-HttpHandler-Pool: Thread-60]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(313)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V7
2017-05-04 13:40:14,244 WARN  [HiveServer2-HttpHandler-Pool: Thread-60]: thrift.ThriftC

Re: Hive LLAP with Parquet format

2017-05-04 Thread Gopal Vijayaraghavan
Hi,


 > Does Hive LLAP work with Parquet format as well?

 

LLAP does work with the Parquet format, but it does not work very fast, because 
the Java Parquet reader is slow.

https://issues.apache.org/jira/browse/PARQUET-131
+

https://issues.apache.org/jira/browse/HIVE-14826

Regarding your question in particular, Parquet's columnar data reads haven't 
been optimized for Azure/S3/GCS.

There was a comparison of ORC vs Parquet for NYC taxi data and it found that 
for simple queries Parquet read ~4x more data over the network - your problem 
might be bandwidth related.
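
To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch. The data size and link speed are made-up illustrative numbers, not measurements from the comparison above; the only figure taken from it is the ~4x ratio.

```python
def transfer_seconds(bytes_read: int, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on scan time when a query is purely network-bound."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return bytes_read / bandwidth_bytes_per_s


# Hypothetical numbers, for illustration only.
orc_bytes = 10 * 1024**3        # assume the ORC scan pulls 10 GiB
parquet_bytes = 4 * orc_bytes   # ~4x more data, as in the comparison
bandwidth = 100 * 1024**2       # assume 100 MiB/s to the blob store

orc_time = transfer_seconds(orc_bytes, bandwidth)
parquet_time = transfer_seconds(parquet_bytes, bandwidth)
print(f"ORC: {orc_time:.1f}s, Parquet: {parquet_time:.1f}s")
```

If the link is the bottleneck, the extra bytes translate directly into extra wall-clock time, regardless of how fast the reader itself is.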

You might want to convert a small amount to ORC and see whether the BYTES_READ 
drops or not.
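
A minimal sketch of that experiment, written as a small Python helper that only builds the HiveQL strings (it does not connect to Hive). The table names, the probe column, and the row limit are hypothetical placeholders; the idea is to copy the same slice of rows into an ORC table and a Parquet table, run an identical query against each, and compare the BYTES_READ counter reported for the two runs.

```python
import re
from typing import List

# Plain or db-qualified identifiers only, so nothing unexpected ends up in the SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def _check(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def orc_comparison_statements(
    source: str, sample_prefix: str, probe_column: str, limit: int
) -> List[str]:
    """Build HiveQL that copies a slice to ORC and Parquet and probes both."""
    source = _check(source)
    orc = _check(f"{sample_prefix}_orc")
    parquet = _check(f"{sample_prefix}_parquet")
    probe_column = _check(probe_column)
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [
        # Take a small slice of the existing Parquet data and store it as ORC.
        f"CREATE TABLE {orc} STORED AS ORC AS SELECT * FROM {source} LIMIT {limit}",
        # Store exactly the same rows as Parquet so the comparison is fair.
        f"CREATE TABLE {parquet} STORED AS PARQUET AS SELECT * FROM {orc}",
        # Same single-column query against each copy; compare BYTES_READ afterwards.
        f"SELECT COUNT(DISTINCT {probe_column}) FROM {parquet}",
        f"SELECT COUNT(DISTINCT {probe_column}) FROM {orc}",
    ]


# Hypothetical names: iot_events, iot_sample, device_id.
statements = orc_comparison_statements("iot_events", "iot_sample", "device_id", 1000000)
for stmt in statements:
    print(stmt + ";")
```

A single-column aggregate is used as the probe because that is where a columnar reader should be able to skip most of the file.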

In my tests with a recent LLAP, Text data was faster on LLAP on S3 & Azure than 
Parquet, because Text has a vectorized reader & cache support.


Cheers,

Gopal



Re: Hive LLAP with Parquet format

2017-05-04 Thread Edward Capriolo
The Parquet/ORC thing has to be the biggest detractor. You're forced to choose
between a format good for Impala or good for Hive.
