Re: Problem creating Hive Avro table based off schema from web server

2020-06-28 Thread ravi kanth
Hi Jagat,

Yes, curling or using wget on that URL retrieves the contents of the Avro
schema. To be specific, it returns the schema contents directly, not an
avsc file.

Thanks,
Ravi


On Sat, Jun 27, 2020 at 2:06 AM Jagat Singh  wrote:

> Hello Ravi,
>
> When you wget this url
>
> wget 'http://:9091/schema?name=ed&store=parquet&isMutated=true&table=ed&secbypass=testing'
>
> Do you get the avsc file?
>
> Regards,
>
> Jagat Singh
>
> On Sat, 27 Jun 2020, 7:01 am ravi kanth,  wrote:
>
>> Just wanted to follow up on the email below.
>>
>> Thanks,
>> Ravi
>>
>>
>> On Thu, Jun 25, 2020 at 5:37 PM ravi kanth  wrote:
>>
>>> Hi Community,
>>>
>>> Hive Version: 3.1.2
>>>
>>> We are working on building a Hive Avro table over a few Avro files. I am
>>> able to successfully create the table and query it with no issues when the
>>> Avro schema definition (avsc) file is on HDFS.
>>>
>>> However, when trying to load the same schema from a REST API (as
>>> described in
>>> https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-CreatingAvro-backedHivetables),
>>> Hive throws an exception and fails to create the table.
>>>
>>> *Sample table:*
>>> CREATE EXTERNAL TABLE ed_avro_1
>>> STORED AS AVRO
>>> LOCATION '/tmp/sample/mmdd=20200206'
>>> TBLPROPERTIES ('avro.schema.url'='http://:9091/schema?name=ed&store=parquet&isMutated=true&table=ed&secbypass=testing');
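>>>
>>> For reference, the same DDL works fine when the avsc file is on HDFS
>>> (the schema path here is illustrative):
>>>
>>> CREATE EXTERNAL TABLE ed_avro_hdfs
>>> STORED AS AVRO
>>> LOCATION '/tmp/sample/mmdd=20200206'
>>> TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/schemas/ed.avsc');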
>>>
>>> When Hive is launched in INFO mode, the trace below shows the problem:
>>> Hive appears to interpret the URL as a file name and throws a
>>> FileNotFoundException.
>>>
>>> I have also tried using avro.schema.literal instead of avro.schema.url;
>>> however, Hive then interprets the URL string as the schema itself and
>>> throws a Jackson parsing error.
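>>>
>>> From the AvroSerDe wiki, avro.schema.literal expects the schema JSON
>>> itself inline rather than a URL, so the Jackson error is presumably Hive
>>> failing to parse the URL string as JSON. For illustration, with a made-up
>>> two-field schema the literal form would look like:
>>>
>>> CREATE EXTERNAL TABLE ed_avro_literal
>>> STORED AS AVRO
>>> LOCATION '/tmp/sample/mmdd=20200206'
>>> TBLPROPERTIES ('avro.schema.literal'='{
>>>   "type": "record", "name": "ed",
>>>   "fields": [
>>>     {"name": "id", "type": "long"},
>>>     {"name": "payload", "type": ["null", "string"], "default": null}
>>>   ]
>>> }');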
>>>
>>> Can anyone help look into this? Is this a bug in Hive 3.1.2? Any details
>>> would be of great help.
>>>
>>> Thanks,
>>> Ravi
>>>
>>>
>>> Stack trace:
>>>
2020-06-26T00:06:03,283 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 646da35b-84b0-43aa-9b68-5d668ebbfc36

2020-06-26T00:06:03,283 INFO [main] org.apache.hadoop.hive.ql.session.SessionState - Updating thread name to 646da35b-84b0-43aa-9b68-5d668ebbfc36 main

2020-06-26T00:06:03,286 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] ql.Driver: Compiling command(queryId=hdfs_20200626000603_0992e79f-6e1c-4383-be62-a6466c4c1cf2):
CREATE EXTERNAL TABLE ed_avro_1
STORED AS AVRO
LOCATION '/tmp/event_detail/mmdd=20200206'
TBLPROPERTIES ('avro.schema.url'='http://:9091/schema?name=ed&store=parquet&isMutated=true&table=ed&secbypass=testing')

2020-06-26T00:06:03,630 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] ql.Driver: Concurrency mode is disabled, not creating a lock manager

2020-06-26T00:06:03,638 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] parse.CalcitePlanner: Starting Semantic Analysis

2020-06-26T00:06:03,669 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=646da35b-84b0-43aa-9b68-5d668ebbfc36, clientType=HIVECLI]

2020-06-26T00:06:03,673 WARN [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] org.apache.hadoop.hive.ql.session.SessionState - METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.

2020-06-26T00:06:03,673 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] metastore.HiveMetaStoreClient: Mestastore configuration metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook

2020-06-26T00:06:03,675 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] metastore.HiveMetaStore: 0: Cleaning up thread local RawStore...

2020-06-26T00:06:03,675 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] HiveMetaStore.audit: ugi=hdfs ip=unknown-ip-addr cmd=Cleaning up thread local RawStore...

2020-06-26T00:06:03,676 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] metastore.HiveMetaStore: 0: Done cleaning up thread local RawStore

2020-06-26T00:06:03,676 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] HiveMetaStore.audit: ugi=hdfs ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore

2020-06-26T00:06:03,680 INFO [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore

2020-06-26T00:06:03,680 WARN [646da35b-84b0-43aa-9b68-5d668ebbfc36 main] metastore.ObjectStore: datanucleus.autoStartMechanismMode is set to unsupported value 

RE: LLAP can't read ORC ZLIB files from S3

2020-06-28 Thread Aaron Grubb
Hi Owen, the problem disappeared when I stopped using 
orc.write.variable.length.blocks. Out of curiosity, how would I go about 
turning off direct byte buffers on read? I can’t find an obvious setting 
that controls this.
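
The closest thing I did find is the LLAP cache allocator property, though 
I'm only guessing that it's the knob you mean; presumably it would go in 
the LLAP daemons' hive-site.xml and need a daemon restart rather than a 
session-level SET:

<!-- Guess: switch the LLAP IO cache from direct to heap buffers -->
<property>
  <name>hive.llap.io.allocator.direct</name>
  <value>false</value>
</property>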

Thanks,
Aaron

From: Owen O'Malley 
Sent: Thursday, June 25, 2020 1:21 PM
To: user@hive.apache.org
Subject: Re: LLAP can't read ORC ZLIB files from S3

Actually, it looks like LLAP is trying to get the backing array from a 
direct ByteBuffer, which direct buffers don't expose (hence the 
UnsupportedOperationException). Turning off direct byte buffers on read 
should fix the problem.

.. Owen

On Thu, Jun 25, 2020 at 7:27 AM Aaron Grubb <aaron.gr...@clearpier.com> wrote:
This appears to have been caused by orc.write.variable.length.blocks=true, 
which I had set for HDFS-based tables. Setting this to false and inserting 
data into the S3 table appears to have fixed the problem.
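
For anyone else who hits this, the fix was along these lines (the table 
names here are made up, and I'm assuming the ORC writer picks the property 
up from the session conf):

-- Disable variable-length HDFS blocks for the ORC writer,
-- then rewrite the S3-backed table's data.
SET orc.write.variable.length.blocks=false;
INSERT OVERWRITE TABLE s3_orc_table SELECT * FROM hdfs_staging_table;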

From: Aaron Grubb <aaron.gr...@clearpier.com>
Sent: Wednesday, June 24, 2020 4:04 PM
To: user@hive.apache.org
Subject: LLAP can't read ORC ZLIB files from S3

Hello everyone,

I’m encountering an error that I can’t find any information on. I’ve 
inserted data into a table stored in S3 in ORC ZLIB format. I can query 
this data directly without issues, but running a query that goes through 
LLAP causes the following error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.UnsupportedOperationException
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.UnsupportedOperationException
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
    ... 15 more
Caused by: java.io.IOException: java.io.IOException: java.lang.UnsupportedOperationException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    ... 17 more
Caused by: java.io.IOException: java.lang.UnsupportedOperationException
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readIndexStreams(EncodedReaderImpl.java:1954)
    at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:384)
    at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:263)
    at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncod