ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

2018-07-25 Thread Dmitry Goldenberg
Hi,

I apologize for the wide distribution and if this is not the right mailing
list for this.

We write Avro records out as Parquet files and load them into HDFS so they can
be accessed via an EXTERNAL Hive table. These records have two timestamp
fields, expressed in the Avro schema as type=long with
logicalType=timestamp-millis.

When trying to do a SELECT * FROM the table, we get the error included below.
Basically, the long values cannot be converted to timestamps. This appears
similar to https://issues.apache.org/jira/browse/HIVE-13534.

Could someone suggest a workaround? Would we have to make the timestamp
fields strings? We were hoping to avoid that; a sketch of the alternative we
are weighing is below, after my signature. Thanks

-Dmitry
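
P.S. To make the question concrete, here is a sketch of the alternative we are
weighing instead of strings (table and column names below are hypothetical, and
the numeric-to-timestamp cast semantics may differ across Hive versions):
declare the columns as BIGINT in the external table and expose timestamps
through a view.

  -- hypothetical names, for illustration only
  CREATE EXTERNAL TABLE events_raw (
    id            STRING,
    created_at_ms BIGINT,   -- Avro long, logicalType=timestamp-millis
    updated_at_ms BIGINT
  )
  STORED AS PARQUET
  LOCATION '/data/events';

  -- assumes CAST(double AS TIMESTAMP) treats the value as seconds since epoch
  CREATE VIEW events AS
  SELECT
    id,
    CAST(created_at_ms / 1000.0 AS TIMESTAMP) AS created_at,
    CAST(updated_at_ms / 1000.0 AS TIMESTAMP) AS updated_at
  FROM events_raw;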

Bad status for request TFetchResultsReq(
fetchType=0, operationHandle=TOperationHandle(hasResultSet=True,
modifiedRowCount=None, operationType=0,
operationId=THandleIdentifier(secret='\x94yB\xb2\xf47K\x98\xaa\xce\\\xab\xdc_\xcdH',
guid='~\xf9\xd5x\x1e\xe1I*\x91\xddu\x92\xa7\xec\xc6\xda')),
orientation=4, maxRows=100):

TFetchResultsResp(status=TStatus(errorCode=0,
errorMessage='java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to
org.apache.hadoop.hive.serde2.io.TimestampWritable',
sqlState=None, infoMessages=[
'*org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable:14:13',
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:463',
'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:294',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:769',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:462',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:694',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor:process:HadoopThriftAuthBridge.java:747',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624',
'java.lang.Thread:run:Thread.java:748',
'*java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable:16:2',
'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:154',
'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2069',
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:458',
'*org.apache.hadoop.hive.ql.metadata.HiveException:java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable:23:7',
'org.apache.hadoop.hive.ql.exec.ListSinkOperator:processOp:ListSinkOperator.java:90',
'org.apache.hadoop.hive.ql.exec.Operator:forward:Operator.java:815',
'org.apache.hadoop.hive.ql.exec.SelectOperator:processOp:SelectOperator.java:84',
'org.apache.hadoop.hive.ql.exec.Operator:forward:Operator.java:815',
'org.apache.hadoop.hive.ql.exec.TableScanOperator:processOp:TableScanOperator.java:98',
'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:425',
'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:417',
'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:140',
'*java.lang.ClassCastException:org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable:28:5',
'org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector:getPrimitiveJavaObject:WritableTimestampObjectInspector.java:39',
'org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector:getPrimitiveJavaObject:WritableTimestampObjectInspector.java:25',
'org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils:copyToStandardObject:ObjectInspectorUtils.java:336',
'org.apache.hadoop.hive.serde2.SerDeUtils:toThriftPayload:SerDeUtils.java:167',
'org.apache.hadoop.hive.ql.exec.FetchFormatter$ThriftFormatter:convert:FetchFormatter.java:61',
'org.apache.hadoop.hive.ql.exec.ListSinkOperator:processOp:ListSinkOperator.java:87'],
statusCode=3), results=None, hasMoreRows=None)


Re: Clustering and Large-scale analysis of Hive Queries

2018-07-25 Thread Johannes Alberti
Did you guys already look at Dr Elephant?

https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark

Not sure if there is anything you might find useful, but I would be interested
in hearing about the good and the bad of Dr Elephant with Hive.

Sent from my iPhone

> On Jul 25, 2018, at 12:13 PM, Zheng Shao  wrote:
> 
> Hi,
> 
> I am interested in working on a project that takes a large number of Hive 
> queries (as well as their metadata, like the amount of resources used) and 
> finds common sub-queries, expensive query groups, etc.
> 
> Is there any existing work in this domain? Happy to collaborate as well if 
> there are shared interests.
> 
> Zheng
> 


Clustering and Large-scale analysis of Hive Queries

2018-07-25 Thread Zheng Shao
Hi,

I am interested in working on a project that takes a large number of Hive
queries (as well as their metadata, like the amount of resources used) and
finds common sub-queries, expensive query groups, etc.

Is there any existing work in this domain? Happy to collaborate as well
if there are shared interests.

Zheng


Re: Total length of orc clustered table is always 2^31 in TezSplitGrouper

2018-07-25 Thread 何宝宁
Thank you Gopal for pointing out the root cause. After running the command alter table 
xxx compact ‘major’ to force a major compaction, the total length is correct!

Is there any way to run a compaction immediately after inserting values?
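
For reference, what we do manually today and the per-table knob we are
wondering about (a sketch only; the compactorthreshold.* property names should
be double-checked against our Hive version):

  -- manual trigger, as above, plus checking progress
  ALTER TABLE xxx COMPACT 'major';
  SHOW COMPACTIONS;

  -- possible per-table override so the background compactor kicks in after
  -- fewer delta files (assumes an ACID table with the compactor enabled)
  ALTER TABLE xxx SET TBLPROPERTIES (
    'compactorthreshold.hive.compactor.delta.num.threshold' = '2'
  );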

Bob He
Thanks

On 25 Jul 2018, at 1:45 PM, Gopal Vijayaraghavan  wrote:

> Search ’Total length’ in log sys_dag_xxx, it is 2147483648.

This is the INT_MAX “placeholder” value for uncompacted ACID tables.

This is because with ACIDv1 there is no way to generate splits against 
uncompacted files, so the split grouper gets an “empty bucket + unknown number 
of inserts and updates” placeholder value.

Cheers,
Gopal



Re: UDFJson cannot Make Progress and Looks Like Deadlock

2018-07-25 Thread Peter Vary
Happy to help!  :)

Proust (Feng Guizhou) [FDS Payment and Marketing]  wrote
(on Tue, 24 Jul 2018, 12:17):

> Just FYI, I was able to make a custom UDF that applies the thread-safety code
> changes.
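>
> For context, the way we wire it in is along these lines (jar path, class name,
> and function name below are placeholders, not our real ones):
>
>   ADD JAR hdfs:///libs/custom-udfs.jar;
>   CREATE TEMPORARY FUNCTION get_json_object_safe
>     AS 'com.example.hive.udf.ThreadSafeUDFJson';
>
>   -- queries then call get_json_object_safe(...) instead of get_json_object(...)
>   SELECT get_json_object_safe(payload, '$.orderId') FROM events LIMIT 10;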
>
> Thanks a lot for your help
>
>
> Guizhou
> --
> *From:* Proust (Feng Guizhou) [FDS Payment and Marketing] <
> pf...@coupang.com>
> *Sent:* Tuesday, July 24, 2018 4:34:49 PM
> *To:* user@hive.apache.org
> *Subject:* Re: UDFJson cannot Make Progress and Looks Like Deadlock
>
>
> Thanks a lot for pointing this out, it makes the problem clear.
>
>
> As a quick, low-cost workaround that avoids upgrading, I'm considering
> reimplementing the UDF get_json_object under a new name to avoid the problem.
>
>
>
> Thanks
> Guizhou
> --
> *From:* Peter Vary 
> *Sent:* Tuesday, July 24, 2018 4:24:12 PM
> *To:* user@hive.apache.org
> *Subject:* Re: UDFJson cannot Make Progress and Looks Like Deadlock
>
> Hi Guizhou,
>
> I would guess that this is caused by:
>
>- HIVE-16196: UDFJson having thread-safety issues
>
>
> Try to upgrade to a CDH version where this patch is already included
> (5.12.0 or later).
>
> Regards,
> Peter
>
>
> On Jul 24, 2018, at 10:15, Proust (Feng Guizhou) [FDS Payment and
> Marketing]  wrote:
>
> Hi, Hive Community
>
> We are running Hive on Spark on a CDH cluster: Apache Hive (version
> 1.1.0-cdh5.10.1).
> Sometimes (with high frequency) a Hive query hangs and does not make progress
> within *UDFJson.evaluate*.
> *An example executor thread dump is below.*
> *3 threads hang within*
>
> java.util.HashMap$TreeNode.balanceInsertion(HashMap.java:2229)
>
>
> *1 thread hangs within*
>
> java.util.HashMap$TreeNode.find(HashMap.java:1873)
>
>
> I cannot find any existing Jira issues related to this problem, so I'm
> looking for a workaround or a solution from anyone who has already
> encountered and solved it.
>
> Personally, I suspect it is a concurrency issue with HashMap.
>
> *Detail Thread Dump:*
>
> java.util.HashMap$TreeNode.balanceInsertion(HashMap.java:2229)
> java.util.HashMap$TreeNode.treeify(HashMap.java:1938)
> java.util.HashMap$TreeNode.split(HashMap.java:2161)
> java.util.HashMap.resize(HashMap.java:713)
> java.util.HashMap.putVal(HashMap.java:662)
> java.util.HashMap.put(HashMap.java:611)
> org.apache.hadoop.hive.ql.udf.UDFJson.evaluate(UDFJson.java:151)
> sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:498)
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:965)
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:89)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:748)
>
>
> java.util.HashMap$TreeNode.find(HashMap.java:1873)
> java.util.HashMap$TreeNode.getTreeNode(HashMap.java:1881)
> java.util.HashMap.getNode(HashMap.java:575)
> 

Re: Using snappy compresscodec in hive

2018-07-25 Thread Zhefu Peng
Hi Gopal,


Thanks for your reply! One more question: is the effect of using the pure-Java
version the same as that of using the Hadoop SnappyCodec? In other words, is
there any difference between the two methods in terms of the compression
results?
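
For concreteness, the two setups I am comparing are roughly the following (my
own sketch; please correct me if the property names are off):

  -- ORC's built-in compression (the pure-Java Snappy path you mentioned):
  CREATE TABLE logs_orc (msg STRING)
  STORED AS ORC
  TBLPROPERTIES ("orc.compress"="Snappy");

  -- Hadoop SnappyCodec path (needs native libsnappy on every node),
  -- e.g. for compressing text/sequence file output:
  SET hive.exec.compress.output=true;
  SET mapreduce.output.fileoutputformat.compress=true;
  SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;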


Looking forward to your reply and help.


Best,
Zhefu Peng




------------------ Original message ------------------
From: "Gopal Vijayaraghavan";
Date: 24 Jul 2018 (Tue), 10:53
To: "user@hive.apache.org";
Subject: Re: Using snappy compresscodec in hive




> "TBLPROPERTIES ("orc.compress"="Snappy"); " 

That doesn't use the Hadoop SnappyCodec; it uses a pure-Java version (which is 
slower, but always works).

The Hadoop SnappyCodec needs libsnappy installed on all hosts.

Cheers,
Gopal