RE: Error running SQL query through Hive JDBC

2016-08-05 Thread Markovitz, Dudu
1.
SELECT TBL_CODE FROM DB.CODE_MAP WHERE SYSTEM_NAME='TDS' AND TABLE_NAME=TRIM('XYZ')

This does not make sense.
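
If the remark refers to the TRIM call: TRIM over a string literal is a no-op,
so the predicate should be equivalent to simply

SELECT TBL_CODE FROM DB.CODE_MAP WHERE SYSTEM_NAME='TDS' AND TABLE_NAME='XYZ'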

2.
Can you please also share the DDL and maybe a small set of data?

Thanks

Dudu

From: Amit Bajpai [mailto:amit.baj...@flextronics.com]
Sent: Friday, August 05, 2016 11:08 PM
To: user@hive.apache.org
Subject: RE: Error running SQL query through Hive JDBC

Below is the code snippet with the SQL query which I am running. The same query 
is running fine through Hive CLI.

// requires: java.sql.Connection, java.sql.DriverManager, java.sql.ResultSet,
// java.sql.SQLException and org.apache.hive.jdbc.HiveStatement
String sql = " SELECT TBL_CODE FROM DB.CODE_MAP WHERE SYSTEM_NAME='TDS' AND TABLE_NAME=TRIM('XYZ')";

System.out.println("New SQL: " + sql);

String driverName = "org.apache.hive.jdbc.HiveDriver";
try {
    Class.forName(driverName);
    Connection con = DriverManager.getConnection(
            "jdbc:hive2://hiveservername:1/default", "username", "");
    HiveStatement stmt = (HiveStatement) con.createStatement();
    ResultSet res = stmt.executeQuery(sql);

    while (res.next()) {
        Object ret_obj = res.getObject(1);
        System.out.println(res.getString(1));
    }

    stmt.close();
    con.close();
} catch (ClassNotFoundException e) {
    e.printStackTrace();
} catch (SQLException e) {
    e.printStackTrace();
}

From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Friday, August 05, 2016 3:04 PM
To: user@hive.apache.org
Subject: RE: Error running SQL query through Hive JDBC

Can you please share the query?

From: Amit Bajpai [mailto:amit.baj...@flextronics.com]
Sent: Friday, August 05, 2016 10:40 PM
To: user@hive.apache.org
Subject: Error running SQL query through Hive JDBC

Hi,

I am getting the below error when running the SQL query through Hive JDBC. Can
you suggest how to fix it?

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: 
FAILED: SemanticException UDF = is not allowed
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
at 
org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
at 
org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at 
org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
at com.flex.hdp.logs.test.main(test.java:84)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling 
statement: FAILED: SemanticException UDF = is not allowed
at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:314)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:111)
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:180)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:256)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:376)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hive.shims.Hado

Re: parquet decoding exceptions - hue sample data view works fine though

2016-08-05 Thread Sumit Khanna
Well anyways, even from Hue, if I try loading partition-wise data, it throws
the same error. I am really perplexed as to what this bug really is.
Although, if I try viewing the data in general, it displays the column names
/ values / analysis etc., just not partition-wise.

Thanks,
Sumit

On Sat, Aug 6, 2016 at 10:33 AM, Sumit Khanna  wrote:

> Hey,
>
> I have a parquet dir and a table mounted on it. The table shows the sample
> view via Hue fine, but a simple query like select * from tablename
> gives this error:
>
>
>- Bad status for request TFetchResultsReq(fetchType=0, operationHandle=
>TOperationHandle(hasResultSet=True, modifiedRowCount=None,
>operationType=0, operationId=THandleIdentifier(
>secret='\xf7\xe7\x90\x0e\x85\x91E{\x99\xd1\xdf>v\xf7\x8c`',
>guid='\xcc\xd6$^\xac{M\xaf\x9c{\xc2\xcf\xf3\xc6\xe7/')),
>orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0,
>errorMessage='java.io.IOException: parquet.io.ParquetDecodingException:
>Can not read value at 0 in block -1 in file hdfs://askmehadoop/parquet1_
>mpdm_mpdm_store/partitioned_on_seller_mailer_flag=1/part-
>r-0-a77c308f-c088-4f41-ab07-0c8e0557dbe1.gz.parquet',
>sqlState=None, infoMessages=['*org.apache.hive.service.cli.
>HiveSQLException:java.io.IOException: parquet.io.ParquetDecodingException:
>Can not read value at 0 in block -1 in file hdfs://askmehadoop/parquet1_
>mpdm_mpdm_store/partitioned_on_seller_mailer_flag=1/part-
>r-0-a77c308f-c088-4f41-ab07-0c8e0557dbe1.gz.parquet:25:24',
>'org.apache.hive.service.cli.operation.SQLOperation:
>getNextRowSet:SQLOperation.java:352', 'org.apache.hive.service.cli.
>
> operation.OperationManager:getOperationNextRowSet:OperationManager.java:220',
>'org.apache.hive.service.cli.session.HiveSessionImpl:
>fetchResults:HiveSessionImpl.java:685', 'sun.reflect.
>GeneratedMethodAccessor63:invoke::-1', 'sun.reflect.
>DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
>'java.lang.reflect.Method:invoke:Method.java:498',
>'org.apache.hive.service.cli.session.HiveSessionProxy:
>invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.
>session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
>
> 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
>'java.security.AccessController:doPrivileged:AccessController.java:-2',
>'javax.security.auth.Subject:doAs:Subject.java:422',
>'org.apache.hadoop.security.UserGroupInformation:doAs:
>UserGroupInformation.java:1657', 'org.apache.hive.service.cli.
>session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
>'com.sun.proxy.$Proxy22:fetchResults::-1',
>'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454',
>'org.apache.hive.service.cli.thrift.ThriftCLIService:
>FetchResults:ThriftCLIService.java:672', 'org.apache.hive.service.cli.
>thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
>'org.apache.hive.service.cli.thrift.TCLIService$Processor$
>FetchResults:getResult:TCLIService.java:1538', 'org.apache.thrift.
>ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.
>TBaseProcessor:process:TBaseProcessor.java:39',
>'org.apache.hive.service.auth.TSetIpAddressProcessor:process:
>TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.
>TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
>
> 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142',
>
> 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617',
>'java.lang.Thread:run:Thread.java:745', 
> '*java.io.IOException:parquet.io.ParquetDecodingException:
>Can not read value at 0 in block -1 in file hdfs://askmehadoop/parquet1_
>mpdm_mpdm_store/partitioned_on_seller_mailer_flag=1/part-
>r-0-a77c308f-c088-4f41-ab07-0c8e0557dbe1.gz.parquet:29:4',
>
> 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:507',
>
> 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:414',
>'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:140',
>'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:1670',
>'org.apache.hive.service.cli.operation.SQLOperation:
>getNextRowSet:SQLOperation.java:347', 
> '*parquet.io.ParquetDecodingException:Can
>not read value at 0 in block -1 in file hdfs://askmehadoop/parquet1_
>mpdm_mpdm_store/partitioned_on_seller_mailer_flag=1/part-
>r-0-a77c308f-c088-4f41-ab07-0c8e0557dbe1.gz.parquet:36:7',
>'parquet.hadoop.InternalParquetRecordReader:nextKeyValue:
>InternalParquetRecordReader.java:228', 'parquet.hadoop.
>ParquetRecordReader:nextKeyValue:ParquetRecordReader.java:201', '
>org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper:<
>init>:ParquetRecordReaderWrappe

parquet decoding exceptions - hue sample data view works fine though

2016-08-05 Thread Sumit Khanna
Hey,

I have a parquet dir and a table mounted on it. The table shows the sample
view via Hue fine, but a simple query like select * from tablename
gives this error:


   - Bad status for request TFetchResultsReq(fetchType=0,
   operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None,
   operationType=0,
   
operationId=THandleIdentifier(secret='\xf7\xe7\x90\x0e\x85\x91E{\x99\xd1\xdf>v\xf7\x8c`',
   guid='\xcc\xd6$^\xac{M\xaf\x9c{\xc2\xcf\xf3\xc6\xe7/')), orientation=4,
   maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0,
   errorMessage='java.io.IOException: parquet.io.ParquetDecodingException: Can
   not read value at 0 in block -1 in file
   
hdfs://askmehadoop/parquet1_mpdm_mpdm_store/partitioned_on_seller_mailer_flag=1/part-r-0-a77c308f-c088-4f41-ab07-0c8e0557dbe1.gz.parquet',
   sqlState=None,
   
infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException:
   parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in
   file
   
hdfs://askmehadoop/parquet1_mpdm_mpdm_store/partitioned_on_seller_mailer_flag=1/part-r-0-a77c308f-c088-4f41-ab07-0c8e0557dbe1.gz.parquet:25:24',
   
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:352',
   
'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:220',
   
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:685',
   'sun.reflect.GeneratedMethodAccessor63:invoke::-1',
   
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
   'java.lang.reflect.Method:invoke:Method.java:498',
   
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
   
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
   
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
   'java.security.AccessController:doPrivileged:AccessController.java:-2',
   'javax.security.auth.Subject:doAs:Subject.java:422',
   
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1657',
   
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
   'com.sun.proxy.$Proxy22:fetchResults::-1',
   'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454',
   
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:672',
   
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
   
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
   'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
   'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
   
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
   
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
   
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142',
   
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617',
   'java.lang.Thread:run:Thread.java:745',
   '*java.io.IOException:parquet.io.ParquetDecodingException: Can not read
   value at 0 in block -1 in file
   
hdfs://askmehadoop/parquet1_mpdm_mpdm_store/partitioned_on_seller_mailer_flag=1/part-r-0-a77c308f-c088-4f41-ab07-0c8e0557dbe1.gz.parquet:29:4',
   
'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:507',
   
'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:414',
   'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:140',
   'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:1670',
   
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:347',
   '*parquet.io.ParquetDecodingException:Can not read value at 0 in block -1
   in file
   
hdfs://askmehadoop/parquet1_mpdm_mpdm_store/partitioned_on_seller_mailer_flag=1/part-r-0-a77c308f-c088-4f41-ab07-0c8e0557dbe1.gz.parquet:36:7',
   
'parquet.hadoop.InternalParquetRecordReader:nextKeyValue:InternalParquetRecordReader.java:228',
   
'parquet.hadoop.ParquetRecordReader:nextKeyValue:ParquetRecordReader.java:201',
   
'org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper::ParquetRecordReaderWrapper.java:122',
   
'org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper::ParquetRecordReaderWrapper.java:85',
   
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat:getRecordReader:MapredParquetInputFormat.java:72',
   
'org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit:getRecordReader:FetchOperator.java:673',
   
'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:323',
   
'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:445',
   
'*java.lang.

Re: hive concurrency not working

2016-08-05 Thread Gopal Vijayaraghavan

> Depends on how you configured scheduling in yarn ...
...

>> you won't have this problem if you use Spark as the execution engine?
>>That handles concurrency OK

If I read this right, it is unlikely to be related to YARN configs.

The Hue issue is directly related to how many Tez/Spark sessions are
supported per-connection-handle.

hive.server2.parallel.ops.in.session
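
For reference, this is a HiveServer2 setting in hive-site.xml; a sketch, with
an illustrative value:

<property>
  <name>hive.server2.parallel.ops.in.session</name>
  <value>true</value>
  <description>Whether to allow several parallel operations (such as SQL
  statements) on one session.</description>
</property>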

I would guess that this is queuing up in the
getSessionManager().submitBackgroundOperation() call talking to
SparkSessionManagerImpl/TezSessionPoolManager.


MR has no equivalent of a "session" in relation to the cluster, because as
a non-DAG engine it has to have the "parallel job queue" built over it to
support parallel stages.


Cheers,
Gopal




Re: Hive compaction didn't launch

2016-08-05 Thread Eugene Koifman
Support for transactions in Hive is not just for Storm.  You can run
transactional SQL statements.  So the system must support cases where all
actions within a transaction are not known at the start of the transaction.
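
For illustration, transactional DML in Hive looks roughly like this (a sketch;
the table is made up, and it must be a bucketed ORC table marked transactional):

CREATE TABLE events (id INT, msg STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC TBLPROPERTIES ('transactional'='true');

UPDATE events SET msg = 'updated' WHERE id = 1;
DELETE FROM events WHERE id = 2;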

From: Igor Kuzmenko <f1she...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, July 29, 2016 at 4:43 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive compaction didn't launch

Here's how storm works right now:

After receiving a new message, Storm determines which partition it should be
written to. Then it checks whether there is an open connection to that
HiveEndPoint; if not, it creates one and fetches a new transaction batch. Here
I assume that these transactions can only be used to write data to one
HiveEndPoint, because when we fetch a transaction batch we pass a RecordWriter
to the fetch method (StreamingConnection::fetchTransactionBatch). So I don't
see a case of "After they write to A they may choose to write to B and then
commit". It seems the Streaming API doesn't support this (see the sketch
below).
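
A minimal sketch of that write path with the hcatalog Streaming API (the
metastore URI, table, and partition values below are hypothetical):

// requires java.util.Arrays and org.apache.hive.hcatalog.streaming.*
HiveEndPoint endPoint = new HiveEndPoint(
        "thrift://metastore-host:9083", "default", "alerts",
        Arrays.asList("2016-08-05"));
StreamingConnection conn = endPoint.newConnection(true);
// the RecordWriter is bound when the batch is fetched, so the whole
// batch writes to this one endpoint
DelimitedInputWriter writer =
        new DelimitedInputWriter(new String[] { "id", "msg" }, ",", endPoint);
TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
txnBatch.beginNextTransaction();
txnBatch.write("1,hello".getBytes());
txnBatch.commit();
txnBatch.close();
conn.close();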

Storm keeps receiving messages, and when its message buffer is full or after a
fixed period of time it flushes all the messages (performing a commit in terms
of Hive streaming).
And here's the interesting part: if there's nothing to flush, Storm will do
nothing (HiveWriter):


public void flush(boolean rollToNext)
        throws CommitFailure, TxnBatchFailure, TxnFailure, InterruptedException {
    // if there are no records do not call flush
    if (totalRecords <= 0) return;
    try {
        synchronized (txnBatchLock) {
            commitTxn();
            nextTxn(rollToNext);
            totalRecords = 0;
            lastUsed = System.currentTimeMillis();
        }
    } catch (StreamingException e) {
        throw new TxnFailure(txnBatch, e);
    }
}

At the same time Storm maintains all fetched transactions in a separate thread
by sending heartbeats (HiveBolt):

private void setupHeartBeatTimer() {
    if (options.getHeartBeatInterval() > 0) {
        heartBeatTimer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    if (sendHeartBeat.get()) {
                        LOG.debug("Start sending heartbeat on all writers");
                        sendHeartBeatOnAllWriters();
                        setupHeartBeatTimer();
                    }
                } catch (Exception e) {
                    LOG.warn("Failed to heartbeat on HiveWriter ", e);
                }
            }
        }, options.getHeartBeatInterval() * 1000);
    }
}

The only way an idle connection will be closed is the excess-connections limit,
which is a configurable parameter, but I can't control this event explicitly.
Making the transaction batch smaller doesn't help either. Even if the batch
size is 1, after flushing the data Storm will get another transaction batch and
will wait for new messages, which may not come for a long time.

I don't see any way to fix this problem with configuration alone; I need to
make changes in Hive or Storm code. The question is where that is more
appropriate.




On Fri, Jul 29, 2016 at 8:15 AM, Eugene Koifman
<ekoif...@hortonworks.com> wrote:
I think Storm has some timeout parameter that will close the transaction
if there are no events for a certain amount of time.
How many transactions do you have per transaction batch?  Perhaps making the
batches smaller will make them close sooner.

Eugene


On 7/28/16, 3:59 PM, "Alan Gates"
<alanfga...@gmail.com> wrote:

>But until those transactions are closed you don't know that they won't
>write to partition B.  After they write to A they may choose to write to
>B and then commit.  The compactor can not make any assumptions about what
>sessions with open transactions will do in the future.
>
>Alan.
>
>> On Jul 28, 2016, at 09:19, Igor Kuzmenko
>> <f1she...@gmail.com> wrote:
>>
>> But this minOpenTxn value isn't from the delta I want to compact.
>> minOpenTxn can point to a transaction in partition A while in partition B
>> there are deltas ready for compaction. If minOpenTxn is less than the txnIds
>> in partition B deltas, compaction won't happen. So an open transaction in
>> partition A blocks compaction in partition B. That seems wrong to me.
>>
>> On Thu, Jul 28, 2016 at 7:06 PM, Alan Gates
>> <alanfga...@gmail.com> wrote:
>> Hive is doing the right thing there, a

Re: Malformed orc file

2016-08-05 Thread Prasanth Jayachandran
If you are using one of the latest Hive releases, then orcfiledump has an
option for recovering such files. It backtracks through the file looking for
intermediate footers.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
// Hive version 1.3.0 and later:
hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex ] [--recover] [--skip-dump] [--backup-path ] 
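
For example, a recovery run could look roughly like this (the file path is
made up):

hive --orcfiledump --recover --skip-dump /apps/hive/warehouse/t/delta_0000001_0000100/bucket_00000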

Thanks
Prasanth

On Aug 5, 2016, at 1:36 PM, Owen O'Malley
<omal...@apache.org> wrote:

The file has trailing data. If you want to recover the data, you can use:

% strings -3 -t d ~/Downloads/bucket_0 | grep ORC

will print the offsets where ORC occurs within the file:

0 ORC
4559 ORC

That means that there is one intermediate footer within the file. If you slice 
the file at the right point (ORC offset + 4), you can get the data back:

% dd bs=1 count=4563 < ~/Downloads/bucket_0 > recover.orc

and

% orc-metadata recover.orc

{ "name": "recover.orc",
  "type": 
"struct>",
  "rows": 115,
  "stripe count": 1,
  "format": "0.12", "writer version": "HIVE-8732",
  "compression": "zlib", "compression block": 16384,
  "file length": 4563,
  "content": 3454, "stripe stats": 339, "footer": 744, "postscript": 25,
  "row index stride": 1,
  "user metadata": {
"hive.acid.key.index": "71698156,0,114;",
"hive.acid.stats": "115,0,0"
  },
  "stripes": [
{ "stripe": 0, "rows": 115,
  "offset": 3, "length": 3451,
  "index": 825, "data": 2353, "footer": 273
}
  ]
}

.. Owen

On Fri, Aug 5, 2016 at 2:47 AM, Igor Kuzmenko
<f1she...@gmail.com> wrote:
Unfortunately, I can't provide more information; I got this file from our
tester and he has already dropped the table.

On Thu, Aug 4, 2016 at 9:16 PM, Prasanth Jayachandran
<pjayachand...@hortonworks.com> wrote:
Hi

In case of streaming, when a transaction is open the ORC file is not closed and
hence may not be flushed completely. Did the transaction commit successfully?
Or was there any exception thrown during writes/commit?

Thanks
Prasanth

On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko
<f1she...@gmail.com> wrote:

Hello, I've got a malformed ORC file in my Hive table. The file was created by
the Hive Streaming API and I have no idea under what circumstances it became
corrupted.

File on google drive: 
link

Exception message when trying to perform select from table:
ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1468498236400_1106_6_00, diagnostics=[Task failed, 
taskId=task_1468498236400_1106_6_00_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: 
org.apache.hadoop.hive.ql.io.FileFormatException:
 Malformed ORC file 
hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_0.
 Invalid postscript length 0
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at 
javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: 
org.apache.hadoop.hive.ql.io.FileFormatException:
 Malformed ORC file 
hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_0.
 Invalid postscript length 0
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsIn

Re: Malformed orc file

2016-08-05 Thread Owen O'Malley
The file has trailing data. If you want to recover the data, you can use:

% strings -3 -t d ~/Downloads/bucket_0 | grep ORC

will print the offsets where ORC occurs within the file:

0 ORC
4559 ORC

That means that there is one intermediate footer within the file. If you
slice the file at the right point (ORC offset + 4), you can get the data
back:

% dd bs=1 count=4563 < ~/Downloads/bucket_0 > recover.orc

and

% orc-metadata recover.orc

{ "name": "recover.orc",
  "type":
"struct>",
  "rows": 115,
  "stripe count": 1,
  "format": "0.12", "writer version": "HIVE-8732",
  "compression": "zlib", "compression block": 16384,
  "file length": 4563,
  "content": 3454, "stripe stats": 339, "footer": 744, "postscript": 25,
  "row index stride": 1,
  "user metadata": {
"hive.acid.key.index": "71698156,0,114;",
"hive.acid.stats": "115,0,0"
  },
  "stripes": [
{ "stripe": 0, "rows": 115,
  "offset": 3, "length": 3451,
  "index": 825, "data": 2353, "footer": 273
}
  ]
}

.. Owen

On Fri, Aug 5, 2016 at 2:47 AM, Igor Kuzmenko  wrote:

> Unfortunately, I can't provide more information; I got this file from our
> tester and he has already dropped the table.
>
> On Thu, Aug 4, 2016 at 9:16 PM, Prasanth Jayachandran <
> pjayachand...@hortonworks.com> wrote:
>
>> Hi
>>
>> In case of streaming, when a transaction is open the ORC file is not closed
>> and hence may not be flushed completely. Did the transaction commit
>> successfully? Or was there any exception thrown during writes/commit?
>>
>> Thanks
>> Prasanth
>>
>> On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko  wrote:
>>
>> Hello, I've got a malformed ORC file in my Hive table. The file was created
>> by the Hive Streaming API and I have no idea under what circumstances it
>> became corrupted.
>>
>> File on google drive: link
>> 
>>
>> Exception message when trying to perform select from table:
>>
>> ERROR : Vertex failed, vertexName=Map 1, 
>> vertexId=vertex_1468498236400_1106_6_00,
>> diagnostics=[Task failed, taskId=task_1468498236400_1106_6_00_00,
>> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running
>> task:java.lang.RuntimeException: java.lang.RuntimeException:
>> java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException:
>> Malformed ORC file hdfs://sorm-master01.msk.mts.r
>> u:8020/apps/hive/warehouse/pstn_connections/dt=20160711/dire
>> ctory_number_last_digit=5/delta_71700156_71700255/bucket_0. Invalid
>> postscript length 0
>> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAn
>> dRunProcessor(TezProcessor.java:173)
>> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProce
>> ssor.java:139)
>> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(Log
>> icalIOProcessorRuntimeTask.java:344)
>> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable
>> $1.run(TezTaskRunner.java:181)
>> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable
>> $1.run(TezTaskRunner.java:172)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> upInformation.java:1657)
>> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable
>> .callInternal(TezTaskRunner.java:172)
>> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable
>> .callInternal(TezTaskRunner.java:168)
>> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.RuntimeException: java.io.IOException:
>> org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file
>> hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pst
>> n_connections/dt=20160711/directory_number_last_digit=5/delt
>> a_71700156_71700255/bucket_0. Invalid postscript length 0
>> at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$T
>> ezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedS
>> plitsInputFormat.java:196)
>> at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$T
>> ezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
>> at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMap
>> red.java:113)
>> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecor
>> d(MapRecordSource.java:61)
>> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(
>> MapRecordProcessor.java:326)
>> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAn
>> dRunProcessor(TezProcessor.java:150)
>> ... 14 more
>> Caused by: java.io.IOException: 
>> org.apache.hadoop.hive.ql.io.FileFormatException:
>> Malformed ORC file hdfs://sorm-maste

RE: Error running SQL query through Hive JDBC

2016-08-05 Thread Amit Bajpai
Below is the code snippet with the SQL query which I am running. The same query 
is running fine through Hive CLI.

// requires: java.sql.Connection, java.sql.DriverManager, java.sql.ResultSet,
// java.sql.SQLException and org.apache.hive.jdbc.HiveStatement
String sql = " SELECT TBL_CODE FROM DB.CODE_MAP WHERE SYSTEM_NAME='TDS' AND TABLE_NAME=TRIM('XYZ')";

System.out.println("New SQL: " + sql);

String driverName = "org.apache.hive.jdbc.HiveDriver";
try {
    Class.forName(driverName);
    Connection con = DriverManager.getConnection(
            "jdbc:hive2://hiveservername:1/default", "username", "");
    HiveStatement stmt = (HiveStatement) con.createStatement();
    ResultSet res = stmt.executeQuery(sql);

    while (res.next()) {
        Object ret_obj = res.getObject(1);
        System.out.println(res.getString(1));
    }

    stmt.close();
    con.close();
} catch (ClassNotFoundException e) {
    e.printStackTrace();
} catch (SQLException e) {
    e.printStackTrace();
}

From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Friday, August 05, 2016 3:04 PM
To: user@hive.apache.org
Subject: RE: Error running SQL query through Hive JDBC

Can you please share the query?

From: Amit Bajpai [mailto:amit.baj...@flextronics.com]
Sent: Friday, August 05, 2016 10:40 PM
To: user@hive.apache.org
Subject: Error running SQL query through Hive JDBC

Hi,

I am getting the below error when running the SQL query through Hive JDBC. Can
you suggest how to fix it?

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: 
FAILED: SemanticException UDF = is not allowed
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
at 
org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
at 
org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at 
org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
at com.flex.hdp.logs.test.main(test.java:84)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling 
statement: FAILED: SemanticException UDF = is not allowed
at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:314)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:111)
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:180)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:256)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:376)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
at com.sun.proxy.$Proxy32.executeStatementAsync(Unknown Source)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
at 
org.apache.hive.service.cli.thrift.ThriftCLISer

RE: Error running SQL query through Hive JDBC

2016-08-05 Thread Markovitz, Dudu
Can you please share the query?

From: Amit Bajpai [mailto:amit.baj...@flextronics.com]
Sent: Friday, August 05, 2016 10:40 PM
To: user@hive.apache.org
Subject: Error running SQL query through Hive JDBC

Hi,

I am getting the below error when running the SQL query through Hive JDBC. Can
you suggest how to fix it?

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: 
FAILED: SemanticException UDF = is not allowed
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
at 
org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
at 
org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at 
org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
at com.flex.hdp.logs.test.main(test.java:84)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling 
statement: FAILED: SemanticException UDF = is not allowed
at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:314)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:111)
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:180)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:256)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:376)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
at com.sun.proxy.$Proxy32.executeStatementAsync(Unknown Source)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:401)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at 
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.parse.SemanticException:UDF = is not allowed
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:677)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:810)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1152)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraph

Error running SQL query through Hive JDBC

2016-08-05 Thread Amit Bajpai
Hi,

I am getting the below error when running the SQL query through Hive JDBC. Can
you suggest how to fix it?

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: 
FAILED: SemanticException UDF = is not allowed
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
at 
org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
at 
org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at 
org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
at com.flex.hdp.logs.test.main(test.java:84)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling 
statement: FAILED: SemanticException UDF = is not allowed
at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:314)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:111)
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:180)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:256)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:376)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
at com.sun.proxy.$Proxy32.executeStatementAsync(Unknown Source)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:401)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at 
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.parse.SemanticException:UDF = is not allowed
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:677)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:810)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1152)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:189)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactor

Re: hive not showing up default database

2016-08-05 Thread Mich Talebzadeh
Most probably a configuration issue. You don't create the default database; it
is there for you.

How are you connecting to Hive, Hive CLI or Hive Thrift server?

Can you log in to hive and do

show databases;

Also try to create another database

create database mytest;

and send the output.

HTH

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 5 August 2016 at 11:20, Sumit Khanna  wrote:

> Hello,
>
> hive is not showing the default database, but when I try to create database
> default, it says the default database already exists and throws an error.
>
> Also, FYI, I have configured the metastore db as MySQL.
>
> Please help.
>
> Thanks,
> Sumit
>


hive not showing up default database

2016-08-05 Thread Sumit Khanna
Hello,

hive is not showing the default database, but when I try to create database
default, it says the default database already exists and throws an error.

Also, FYI, I have configured the metastore db as MySQL.

Please help.

Thanks,
Sumit


Re: Malformed orc file

2016-08-05 Thread Igor Kuzmenko
Unfortunately, I can't provide more information; I got this file from our
tester and he has already dropped the table.

On Thu, Aug 4, 2016 at 9:16 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> Hi
>
> In case of streaming, when a transaction is open the ORC file is not closed
> and hence may not be flushed completely. Did the transaction commit
> successfully? Or was there any exception thrown during writes/commit?
>
> Thanks
> Prasanth
>
> On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko  wrote:
>
> Hello, I've got a malformed ORC file in my Hive table. The file was created by
> the Hive Streaming API and I have no idea under what circumstances it
> became corrupted.
>
> File on google drive: link
> 
>
> Exception message when trying to perform select from table:
>
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1468498236400_1106_6_00,
> diagnostics=[Task failed, taskId=task_1468498236400_1106_6_00_00,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running
> task:java.lang.RuntimeException: java.lang.RuntimeException:
> java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException:
> Malformed ORC file hdfs://sorm-master01.msk.mts.
> ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/
> directory_number_last_digit=5/delta_71700156_71700255/bucket_0.
> Invalid postscript length 0
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.
> initializeAndRunProcessor(TezProcessor.java:173)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(
> TezProcessor.java:139)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(
> LogicalIOProcessorRuntimeTask.java:344)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(
> TezTaskRunner.java:181)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(
> TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1657)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.
> callInternal(TezTaskRunner.java:172)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.
> callInternal(TezTaskRunner.java:168)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.IOException:
> org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file
> hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/
> pstn_connections/dt=20160711/directory_number_last_digit=5/
> delta_71700156_71700255/bucket_0. Invalid postscript length 0
> at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$
> TezGroupedSplitsRecordReader.initNextRecordReader(
> TezGroupedSplitsInputFormat.java:196)
> at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$
> TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at org.apache.tez.mapreduce.lib.MRReaderMapred.next(
> MRReaderMapred.java:113)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.
> pushRecord(MapRecordSource.java:61)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.
> run(MapRecordProcessor.java:326)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.
> initializeAndRunProcessor(TezProcessor.java:150)
> ... 14 more
> Caused by: java.io.IOException: 
> org.apache.hadoop.hive.ql.io.FileFormatException:
> Malformed ORC file hdfs://sorm-master01.msk.mts.
> ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/
> directory_number_last_digit=5/delta_71700156_71700255/bucket_0.
> Invalid postscript length 0
> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.
> handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.
> handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(
> HiveInputFormat.java:251)
> at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$
> TezGroupedSplitsRecordReader.initNextRecordReader(
> TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed
> ORC file hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/
> pstn_connections/dt=20160711/directory_number_last_digit=5/
> delta_71700156_71700255/bucket_0. Invalid postscript length 0
> at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.ensureOrcFooter(ReaderImpl.
> java:236)
> at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(
> ReaderImpl.java:3

Re: Hive LIKE predicate. '_' wildcard decreases performance

2016-08-05 Thread Igor Kuzmenko
Thanks for the reply, Gopal. Very helpful.

On Thu, Aug 4, 2016 at 10:15 PM, Gopal Vijayaraghavan 
wrote:

> > where res_url like '%mts.ru%'
> ...
> > where res_url like '%mts_ru%'
> ...
> > Why does the '_' wildcard decrease performance?
>
> Because it misses the fast path by just one "_".
>
> ORC vectorized reader has a zero-copy check for 3 patterns - prefix,
> suffix and middle.
>
> That means "https://%", "%.html", "%mts.ru%" will hit the fast path -
> which uses StringExpr::equal() which JITs into the following.
>
> https://issues.apache.org/jira/secure/attachment/12748720/string-intrinsic-sse.png
>
>
> In Hive-2.0, you can mix these up too to get "https:%mts%.html" in a
> ChainedChecker.
>
>
> Anything other than these 3 cases becomes a Regex and takes the slow path.
>
> The pattern you mentioned gets rewritten into ".*mts.ru.*" and the inner
> loop has a new String() as the input to the matcher + matcher.matches() in
> it.
>
> I've put in some patches recently which rewrite it Lazy regexes like
> ".?*mts.ru.?*", so the regex DFA will be smaller (HIVE-13196).
>
> That improves the case where the pattern is found, but does nothing to
> improve the performance of the new String() GC garbage.
>
> Cheers,
> Gopal
>
>
>
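
To restate Gopal's point as a sketch (table and column names are made up):

-- fast path in the vectorized reader: prefix, suffix and middle patterns
SELECT count(*) FROM logs WHERE res_url LIKE 'https://%';  -- prefix
SELECT count(*) FROM logs WHERE res_url LIKE '%.html';     -- suffix
SELECT count(*) FROM logs WHERE res_url LIKE '%mts.ru%';   -- middle
-- any other shape, e.g. one with a '_' wildcard, falls back to the regex slow path
SELECT count(*) FROM logs WHERE res_url LIKE '%mts_ru%';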


Re: hive concurrency not working

2016-08-05 Thread Mich Talebzadeh
Great. In that case they can try it, and I am pretty sure that if they get
stuck they can come and ask you for expert advice, since Hortonworks does not
support Hive on Spark, and I know that.

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 5 August 2016 at 09:01, Jörn Franke  wrote:

> That is not correct; the option is there to install it.
>
> On 05 Aug 2016, at 08:41, Mich Talebzadeh 
> wrote:
>
> You won't have this problem if you use Spark as the execution engine! This
> set up handles concurrency but Hive with Spark is not part of the HW distro.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 5 August 2016 at 07:39, Mich Talebzadeh 
> wrote:
>
>> you won't have this problem if you use Spark as the execution engine?
>> That handles concurrency OK
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 5 August 2016 at 06:23, Raj hadoop  wrote:
>>
>>> Thanks everyone..
>>>
>>> we are raising a case with Hortonworks
>>>
>>> On Wed, Aug 3, 2016 at 6:44 PM, Raj hadoop  wrote:
>>>
 Dear All,

 In need of your help,

 we have a Hortonworks 4-node cluster, and the problem is that Hive allows
 only one user at a time,

 if a second user needs to log in, Hive does not work,

 could someone please help me with this

 Thanks,
 Rajesh

>>>
>>>
>>
>


Re: hive concurrency not working

2016-08-05 Thread Jörn Franke
Depends on how you configured scheduling in yarn ...
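
For context, how many queries actually run side by side is also bounded by the
YARN queue configuration; a hedged capacity-scheduler.xml sketch of knobs that
typically matter (queue name and values are illustrative):

<property>
  <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
  <value>2</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>100</value>
</property>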

> On 05 Aug 2016, at 08:39, Mich Talebzadeh  wrote:
> 
> you won't have this problem if you use Spark as the execution engine? That 
> handles concurrency OK
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
>> On 5 August 2016 at 06:23, Raj hadoop  wrote:
>> Thanks everyone..
>> 
>> we are raising a case with Hortonworks
>> 
>>> On Wed, Aug 3, 2016 at 6:44 PM, Raj hadoop  wrote:
>>> Dear All,
>>> 
>>> In need of your help,
>>>
>>> we have a Hortonworks 4-node cluster, and the problem is that Hive allows
>>> only one user at a time,
>>>
>>> if a second user needs to log in, Hive does not work,
>>>
>>> could someone please help me with this
>>> 
>>> Thanks,
>>> Rajesh
> 


Re: hive concurrency not working

2016-08-05 Thread Jörn Franke
That is not correct; the option is there to install it.

> On 05 Aug 2016, at 08:41, Mich Talebzadeh  wrote:
> 
> You won't have this problem if you use Spark as the execution engine! This 
> set up handles concurrency but Hive with Spark is not part of the HW distro.
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
>> On 5 August 2016 at 07:39, Mich Talebzadeh  wrote:
>> you won't have this problem if you use Spark as the execution engine? That 
>> handles concurrency OK
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> http://talebzadehmich.wordpress.com
>> 
>> Disclaimer: Use it at your own risk. Any and all responsibility for any 
>> loss, damage or destruction of data or any other property which may arise 
>> from relying on this email's technical content is explicitly disclaimed. The 
>> author will in no case be liable for any monetary damages arising from such 
>> loss, damage or destruction.
>>  
>> 
>>> On 5 August 2016 at 06:23, Raj hadoop  wrote:
>>> Thanks everyone..
>>> 
>>> we are raising a case with Hortonworks
>>> 
 On Wed, Aug 3, 2016 at 6:44 PM, Raj hadoop  wrote:
 Dear All,
 
 In need of your help,

 we have a Hortonworks 4-node cluster, and the problem is that Hive allows
 only one user at a time,

 if a second user needs to log in, Hive does not work,

 could someone please help me with this
 
 Thanks,
 Rajesh
>