Cannot get column metadata through thrift

2019-01-29 Thread CJay Zhang (czhang3)
Hi everyone,

I get an error when fetching column metadata through the Thrift API; the
Hive version is 3.1.1.
I created the table using the following SQL:

CREATE EXTERNAL TABLE `my_table` (a string, b bigint)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

When I call the Thrift API GetColumns(TGetColumnsResp& _return, const
TGetColumnsReq& req), it returns an error. The error message is:

MetaException(message:java.lang.UnsupportedOperationException: 
Storage schema reading not supported)

The server-side stack trace returned through Thrift is:

org.apache.hive.service.cli.HiveSQLException: MetaException(message:java.lang.UnsupportedOperationException: Storage schema reading not supported)
        at org.apache.hive.service.cli.operation.GetColumnsOperation.runInternal(GetColumnsOperation.java:213)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247)
        at org.apache.hive.service.cli.session.HiveSessionImpl.getColumns(HiveSessionImpl.java:695)
        at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
        at com.sun.proxy.$Proxy43.getColumns(Unknown Source)
        at org.apache.hive.service.cli.CLIService.getColumns(CLIService.java:387)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.GetColumns(ThriftCLIService.java:654)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetColumns.getResult(TCLIService.java:1677)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetColumns.getResult(TCLIService.java:1662)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
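
For what it's worth, this UnsupportedOperationException is what the Hive
3.x metastore's default storage schema reader throws for tables backed by
a non-native SerDe such as JsonSerDe. A commonly reported workaround,
assuming a stock Hive 3.1.1 metastore (worth verifying against your
install), is to switch to the SerDe-based schema reader in hive-site.xml
and restart the metastore:

  <property>
    <!-- the default, DefaultStorageSchemaReader, cannot read schemas
         for non-native SerDes and raises this exception -->
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
  </property>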

Re: Comparing Google Cloud Platform BigQuery with Hive

2019-01-29 Thread Mich Talebzadeh
Hi Furcy,

Thanks.

Apologies for being late on this. You are absolutely correct: I tried it,
and BQ can read compressed ORC files.

Still referring to my original thread, BQ's handling of Doubles and Dates
is problematic. I tend to create these fields as String and do the ETL in
BQ by converting them into the desired type.
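
A minimal sketch of that conversion step, assuming hypothetical staging
columns amount_str and event_date_str loaded as STRING (BigQuery Standard
SQL):

  SELECT
    -- SAFE_CAST yields NULL instead of failing on malformed values
    SAFE_CAST(amount_str AS FLOAT64) AS amount,
    PARSE_DATE('%Y-%m-%d', event_date_str) AS event_date
  FROM staging.my_table;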

I am not much concerned about what Hive itself does. I run Hive on the
Spark execution engine on prem and use Spark for anything on prem that
interacts with Hive. On BQ one can achieve the same, although my Spark
code (written in Scala) has to be modified. In general I have found that
using Spark both on prem and in GCP, against Hive and BQ respectively,
makes things easier. Also, as far as my tests go, Spark's analytical
functions are identical on prem and in Dataproc.

HTH,

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 14 Jan 2019 at 09:18, Furcy Pin wrote:

> Hi Mich,
>
> Contrary to what you said, I can confirm that BQ is able to read ORC
> files compressed with Snappy.
> However, BQ requires you to perform a load operation that reads the ORC
> file from Google Storage and converts it into a BQ table.
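>
> As a quick illustration with hypothetical bucket and dataset names, the
> load can be done with the bq CLI; for ORC the schema is read from the
> files themselves:
>
>   bq load --source_format=ORC mydataset.mytable gs://mybucket/path/part-*.orc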
>
> The main advantages I see with BQ are the guaranteed very high
> scalability and low query latency without having to manage a Hadoop
> cluster yourself.
>
> I would not say, however, that you can simply plug your existing HQL
> queries into BQ. All the useful analytics functions are indeed there,
> but in many cases they have a different name.
> For instance, the equivalent of Hive's UDF trunc in BQ is date_trunc.
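>
> For example, truncating a date to the first day of its month
> (illustrative table and column names):
>
>   -- Hive
>   SELECT trunc(order_date, 'MM') FROM sales;
>   -- BigQuery
>   SELECT DATE_TRUNC(order_date, MONTH) FROM sales;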
>
> In my use case I use PySpark for complex transformations and use BQ as
> a warehouse into which I plug Power BI.
> So for a fair comparison, I think you should compare BQ with Vertica,
> Presto, Impala or Hive LLAP rather than just Hive.
>
> Regards,
>
> Furcy
>
>
>
>
> On Fri, 11 Jan 2019 at 11:18, Mich Talebzadeh wrote:
>
>> Hi,
>>
>> Has anyone got some benchmarks comparing Hive with Google Cloud
>> Platform (GCP) BigQuery (BQ)?
>>
>> From my experience, BQ supports both Avro and ORC file types, but there
>> is no support for compressed ORC or Avro. So if you want to load a Hive
>> table into BQ, you will need to create a table with no compression. In
>> short, you need to perform ETL to move a Hive table to BQ.
>>
>> On the other hand, BQ seems to support all the analytical functions
>> available in Hive, so your queries should run without any modification
>> in BQ.
>>
>> Also, the Dataproc tool in GCP supports Hive (though I have not tried
>> it myself). So the question is: are there any advantages to taking a
>> Hive table into BQ itself?
>>
>> Thanks,
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> Disclaimer: Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>


Re: hqlsql on hive partition table

2019-01-29 Thread Dmitry Tolpeko
Can you please create a Jira ticket and put a code sample there, not a
screenshot?

Thanks,
Dmitry

On Tue, Jan 29, 2019 at 9:08 AM 陈贵龙 wrote:

> Hi,
> How are you? I have a question about how to use hqlsql on a Hive
> partitioned table.
> When I use hqlsql on a Hive partitioned table, why do I get the "key
> word to function" issue? Thanks.


Re: JOIN on map value results to HiveException: Unexpected column vector type MAP

2019-01-29 Thread Jan Adona
Hi,

After tweaking the configs, I found out that the
"hive.vectorized.execution.enabled" and "hive.auto.convert.join" settings
are the culprits.

I think vectorization of the map column data type is not supported in my
current Hive version, and map join also has problems with the map data
type.

So, after setting at least one of these configurations to "false", the
cross-product query runs successfully. Of course, there may be some
performance loss, since we're turning off vectorization and map join.
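
For reference, a minimal way to apply this per session (per the above,
disabling either setting is enough):

  SET hive.vectorized.execution.enabled=false;
  -- or alternatively:
  SET hive.auto.convert.join=false;

  SELECT a.*, b.* FROM test_table0 a, test_table1 b;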


Regards,
Jan Charles


On Thu, Jan 17, 2019 at 1:57 PM Jan Adona wrote:

> Hi,
>
> Just a follow-up: I think the JOIN is not the problem here, since this
> error also occurs when I query two tables without any join, as long as
> the map column is included in the select statement.
>
> I'm going to rewrite the schema and queries that I sent before, because
> I mistakenly formatted the body, which is why it has random asterisks.
>
> Schema:
>
> CREATE TABLE test_table0 (userid BIGINT, mapCol map<string, bigint>)
> COMMENT 'Test table 0'
> STORED AS SEQUENCEFILE;
>
> CREATE TABLE test_table1 (userid BIGINT, col1 STRING, col2 STRING)
> COMMENT 'Test table 1'
> STORED AS SEQUENCEFILE;
>
> Rows:
>
> INSERT INTO TABLE test_table0 VALUES (1, map('a', 1, 'b', 2));
> INSERT INTO TABLE test_table0 VALUES (2, map('c', 3, 'd', 4));
> INSERT INTO TABLE test_table1 VALUES (1, 'mycol1', 'mycol2');
>
> Query with a JOIN (fails):
>
> SELECT a.*, b.* FROM test_table0 a INNER JOIN test_table1 b
> ON a.mapCol['a'] = b.userid;
>
> Query without a JOIN (fails):
>
> SELECT a.*, b.* FROM test_table0 a, test_table1 b;
>
> Query without a JOIN, not including the map column (succeeds):
>
> SELECT a.userid, b.* FROM test_table0 a, test_table1 b;
>
> Error message of the failed queries:
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Map 1,
> vertexId=vertex_1546408189013_0179_7_01, diagnostics=[Task failed,
> taskId=task_1546408189013_0179_7_01_00, diagnostics=[TaskAttempt 0
> failed, info=[Error: Error while running task ( failure ) :
> attempt_1546408189013_0179_7_01_00_0:java.lang.RuntimeException:
> java.lang.RuntimeException: Map operator initialization failed
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected
> column vector type MAP
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)
> at
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)
> at
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
> at
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
> at
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)
> ... 17 more
>
> Also, I'm running this query on an HDP 3.0.1 cluster with Apache Hive
> 3.1.0.
>
>