Re: Hive Metastore Hook to fire only on success

2018-10-06 Thread Daniel Haviv
Thanks.
I was using MetaStoreEventListener and wasn't aware there's another type of
HMS hook.

TY.
Daniel

On Fri, Oct 5, 2018 at 10:54 PM Alan Gates  wrote:

> Which version of Hive are you on and which hook are you seeing fire?
> Based on looking at the master code you should only see the
> commitCreateTable hook call if the creation succeeds.
>
> Alan.
>
> On Thu, Oct 4, 2018 at 12:36 AM Daniel Haviv 
> wrote:
>
>> Hi,
>> I'm writing an HMS hook and I noticed that the hook fires whether or not the
>> operation succeeded.
>> For example, if a user creates an already existing table, the operation
>> will fail but the hook will fire regardless.
>>
>> Is there a way to either validate that the operation succeeded or fire
>> only upon success?
>>
>>
>> TY.
>> Daniel
>>
>


Hive Metastore Hook to fire only on success

2018-10-04 Thread Daniel Haviv
Hi,
I'm writing an HMS hook and I noticed that the hook fires whether or not the
operation succeeded.
For example, if a user creates an already existing table, the operation
will fail but the hook will fire regardless.

Is there a way to either validate that the operation succeeded or fire only
upon success?


TY.
Daniel


Specifying orc.stripe.size in Spark

2016-12-18 Thread Daniel Haviv
Hi,
When writing a dataframe using:
df.write.orc("/path/to/orc")

How can I specify orc parameters like orc.stripe.size ?

Thank you,
Daniel


Column names in ORC file

2016-12-15 Thread Daniel Haviv
Hi,
When I'm generating ORC files using Spark, the column names are written into
the ORC file, but when they are generated using Hive I get the following column names:

_col107, _col33, _col23, _col102


Is it possible to somehow configure Hive to properly store the column
names, as Spark does?


Thank you,

Daniel


Re: IntWritable cannot be cast to LongWritable

2016-12-14 Thread Daniel Haviv
I'm using 1.1.0.
I always thought these issues were resolved way back at 0.13-0.14.
So rewriting the data is the only way to handle this?
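If it does come to rewriting, a rough sketch of the usual pattern is to copy the data
into a table that already carries the bigint column, casting explicitly on the way.
Column and partition names below are placeholders, not the real schema, and depending
on where the type mismatch bites, the source column may first have to be switched back
to int so the old files read cleanly:

CREATE TABLE mytable_fixed LIKE mytable;   -- new table, column already declared bigint

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE mytable_fixed PARTITION (dt)
SELECT key_col,
       CAST(changed_col AS BIGINT) AS changed_col,
       dt
FROM mytable;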

Thank you,
Daniel



On Wed, Dec 14, 2016 at 8:42 PM, Owen O'Malley  wrote:

> Which version of Hive are you on? Hive 2.1 should automatically handle the
> type conversions from the file to the table.
>
> .. Owen
>
> On Wed, Dec 14, 2016 at 9:36 AM, Daniel Haviv <
> daniel.ha...@veracity-group.com> wrote:
>
>> Hi,
>> I have an ORC table where one of the fields was an int and is now a
>> bigint.
>> Whenever I query a partition before the schema change I encounter the
>> following error:
>> Error: java.io.IOException: java.io.IOException:
>> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
>> cast to org.apache.hadoop.io.LongWritable
>> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handle
>> RecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleR
>> ecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>> at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRe
>> cordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226)
>> at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRe
>> cordReader.next(HadoopShimsSecure.java:136)
>> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToN
>> ext(MapTask.java:199)
>> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(
>> MapTask.java:185)
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:
>> 453)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> upInformation.java:1698)
>>
>> I tried to manually go through the old partitions and set that column to
>> int but I'm still getting the same exceptions.
>> I expected that promoting an int to a bigint wouldn't cause any problems.
>>
>> Am I doing something wrong?
>>
>> Thank you,
>> Daniel
>>
>
>


IntWritable cannot be cast to LongWritable

2016-12-14 Thread Daniel Haviv
Hi,
I have an ORC table where one of the fields was an int and is now a bigint.
Whenever I query a partition before the schema change I encounter the
following error:
Error: java.io.IOException: java.io.IOException:
java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
cast to org.apache.hadoop.io.LongWritable
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136)
at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)

I tried to manually go through the old partitions and set that column to
int but I'm still getting the same exceptions.
I expected that promoting an int to a bigint wouldn't cause any problems.

Am I doing something wrong?

Thank you,
Daniel


Re: How to setup Hive JDBC client to connect remote Hiveserver

2016-04-03 Thread Daniel Haviv
You can see which pid is listening on port 10001 by running "netstat -pan | 
grep 10001"

The logs are usually under /var/log/hive or under a log dir inside Hive's dir

Thank you.
Daniel

> On 4 Apr 2016, at 08:11, brajmohan saxena  wrote:
> 
> My Hiveserver2 is up and running.
> But I think it's not able to listen on port 10001.
> What should I do now?
> 
> Also I am using apache-hive-1.2.1-bin copied to my home directory and running 
> the Hiveserver2 from bin.
> But I do not find any hive.log file anywhere. Could you please suggest the 
> exact location?
> 
> Thanks
> Braj
> 
>> On Mon, Apr 4, 2016 at 10:25 AM, Daniel Haviv 
>>  wrote:
>> It seems your hive server is not up (or not listening on port 10001).
>> hiveserver's logs might shed some light (usually at /var/log/hive)
>> 
>> Thank you.
>> Daniel
>> 
>>> On 4 Apr 2016, at 07:00, brajmohan saxena  wrote:
>>> 
>>> Hi Shumin,
>>> 
>>> I did telnet 
>>> 
>>> braj-laptop:bin brajmohan$ telnet 192.168.1.103
>>> 
>>> Trying 192.168.1.103...
>>> 
>>> telnet: connect to address 192.168.1.103: Connection refused
>>> 
>>> telnet: Unable to connect to remote host
>>> 
>>> Thanks
>>> 
>>> Braj
>>> 
>>> 
>>>> On Mon, Apr 4, 2016 at 8:41 AM, Shumin Guo  wrote:
>>>> Can you telnet to that port? 
>>>> 
>>>> $ telnet 192.168.1.103 10001
>>>> 
>>>>> On Sun, Apr 3, 2016 at 9:43 PM, brajmohan saxena 
>>>>>  wrote:
>>>>> Hi,
>>>>> 
>>>>> Could you please tell me how to connect a simple JDBC program to remote 
>>>>> Hiveserver2 with default Derby database.
>>>>> 
>>>>> I have Hiveserver2 running on remote machine and i am trying to run 
>>>>> simple JDBC program from client machine ( 
>>>>> DriverManager.getConnection("jdbc:hive2://192.168.1.103:10001/default", 
>>>>> "", ""); )
>>>>> 
>>>>> but getting the following error.
>>>>> Error: Could not open client transport with JDBC Uri: 
>>>>> jdbc:hive2://192.168.1.103:10001: java.net.ConnectException: Connection 
>>>>> refused (state=08S01,code=0)
>>>>> 
>>>>> Do I need to change hive-site.xml file at server side.
>>>>> 
>>>>> Thanks in advance
>>>>> 
>>>>> Regards
>>>>> 
>>>>> Braj
>>>>> 
> 


Re: How to setup Hive JDBC client to connect remote Hiveserver

2016-04-03 Thread Daniel Haviv
It seems your hive server is not up (or not listening on port 10001).
hiveserver's logs might shed some light (usually at /var/log/hive)

Thank you.
Daniel

> On 4 Apr 2016, at 07:00, brajmohan saxena  wrote:
> 
> Hi Shumin,
> 
> I did telnet 
> 
> braj-laptop:bin brajmohan$ telnet 192.168.1.103
> 
> Trying 192.168.1.103...
> 
> telnet: connect to address 192.168.1.103: Connection refused
> 
> telnet: Unable to connect to remote host
> 
> Thanks
> 
> Braj
> 
> 
>> On Mon, Apr 4, 2016 at 8:41 AM, Shumin Guo  wrote:
>> Can you telnet to that port? 
>> 
>> $ telnet 192.168.1.103 10001
>> 
>>> On Sun, Apr 3, 2016 at 9:43 PM, brajmohan saxena  
>>> wrote:
>>> Hi,
>>> 
>>> Could you please tell me how to connect a simple JDBC program to remote 
>>> Hiveserver2 with default Derby database.
>>> 
>>> I have Hiveserver2 running on remote machine and i am trying to run simple 
>>> JDBC program from client machine ( 
>>> DriverManager.getConnection("jdbc:hive2://192.168.1.103:10001/default", "", 
>>> ""); )
>>> 
>>> but getting the following error.
>>> Error: Could not open client transport with JDBC Uri: 
>>> jdbc:hive2://192.168.1.103:10001: java.net.ConnectException: Connection 
>>> refused (state=08S01,code=0)
>>> 
>>> Do I need to change hive-site.xml file at server side.
>>> 
>>> Thanks in advance
>>> 
>>> Regards
>>> 
>>> Braj
>>> 
> 


Re: Hive_CSV

2016-03-09 Thread Daniel Haviv
Hi Ajay,
Use the CSV serde to read your file, map all three columns but only select the 
relevant ones when you insert:

Create table csvtab (
irrelevant string,
sportName string,
sportType string) ...

Insert into loaded_table select sportName, sportType from csvtab;
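For completeness, a fuller sketch of the same idea with the bundled CSV SerDe
(OpenCSVSerde, shipped with Hive 0.14 and later; the HDFS path and the layout of
loaded_table are made up here):

CREATE EXTERNAL TABLE csvtab (
  irrelevant string,
  sportName  string,
  sportType  string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",")
STORED AS TEXTFILE
LOCATION '/path/to/csv/files';

INSERT INTO TABLE loaded_table
SELECT sportName, sportType FROM csvtab;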

Daniel

> On 9 Mar 2016, at 19:43, Ajay Chander  wrote:
> 
> Hi Everyone,
> 
> I am looking for a way, to ignore the first occurrence of the delimiter while 
> loading the data from csv file to hive external table.
> 
> Csv file: 
> 
> Xyz, baseball, outdoor
> 
> Hive table has two columns sport_name & sport_type and fields are separated 
> by ','
> 
> Now I want to load by data into table such that while loading it has to 
> ignore the first delimiter that ignore xyz and load the data from second 
> delimiter.
> 
> In the end my hive table should have the following data,
> 
> Baseball, outdoor .
> 
> Any inputs are appreciated. Thank you for your time.


Partition level inputformat

2016-01-27 Thread Daniel Haviv
Hi,
I'm trying to add external partitions to a table with different
inputformat and row-delimiter properties, but I keep failing and I can't
find any documentation that explains the correct syntax.
This is the DDL I'm running:

hive> alter table test_tbl_parquet add partition (year=2016,month=01,day=27)
>  ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\u0001'
> STORED AS INPUTFORMAT
>   'com.mycopmany.hive.WhaleAvroGenericInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>  location '/mycopmany/data/test_tbl/year=2016/month=01/day=27';
FAILED: ParseException line 1:90 missing EOF at 'ROW' near ')'


Thank you.
Daniel
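For what it's worth, the ADD PARTITION grammar does not take ROW FORMAT / STORED AS
clauses (which is what the ParseException is pointing at), so the partition usually
has to be added first and its format set with separate ALTER statements. A sketch
along those lines, reusing the names from the statement above; ROW FORMAT DELIMITED
corresponds to LazySimpleSerDe, hence the SERDE line, and the exact SET FILEFORMAT
clauses accepted (in particular whether SERDE is required there) vary by Hive version:

ALTER TABLE test_tbl_parquet ADD PARTITION (year=2016, month=01, day=27)
  LOCATION '/mycopmany/data/test_tbl/year=2016/month=01/day=27';

ALTER TABLE test_tbl_parquet PARTITION (year=2016, month=01, day=27)
  SET FILEFORMAT
    INPUTFORMAT  'com.mycopmany.hive.WhaleAvroGenericInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    SERDE        'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe';

ALTER TABLE test_tbl_parquet PARTITION (year=2016, month=01, day=27)
  SET SERDEPROPERTIES ('field.delim' = '\u0001');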


Re: chmod: changing permissions of '/datadir/000056_0': Permission denied. user=danielh is not the owner of inode=000056_0

2016-01-25 Thread Daniel Haviv
Hi,
Any thoughts on this issue ?

Thank you.
Daniel

On Wed, Jan 20, 2016 at 12:28 PM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Hi,
> We have a table in which the files are created by different users (under
> the same group).
> When a user inserts into the table it will finish successfully but after
> moving the files the user will receive the following error(s):
> chmod: changing permissions of '/datadir/000056_0': Permission denied.
> user=danielh is not the owner of inode=000056_0
>
> and that's because Hive is trying to chmod a file that the specific user did
> not create.
>
> Is there a way to prevent this behavior ?
>
> Thank you.
> Daniel
>


chmod: changing permissions of '/datadir/000056_0': Permission denied. user=danielh is not the owner of inode=000056_0

2016-01-20 Thread Daniel Haviv
Hi,
We have a table in which the files are created by different users (under
the same group).
When a user inserts into the table it will finish successfully but after
moving the files the user will receive the following error(s):
chmod: changing permissions of '/datadir/000056_0': Permission denied.
user=danielh is not the owner of inode=000056_0

and that's because Hive is trying to chmod a file that the specific user did
not create.

Is there a way to prevent this behavior ?

Thank you.
Daniel


Fwd: Conversion

2016-01-17 Thread Daniel Haviv
Hi,

We have a string column that represents an array of doubles that looks like
this:

f7 ad 3b 38 89 b7 e5 3f a1 c1 1a 74 db



To parse it we use unhex(translate(signalvalues,' ','')) which returns a
BINARY value.

How can we convert it to ARRAY<DOUBLE>?



Thank you.

Daniel


simple usage of stack UDTF causes a cast exception

2016-01-10 Thread Daniel Haviv
Hi,
I'm trying to break a row into two rows based on two different columns by
using the following query:

SELECT mystack.alias1
FROM cdrtable
LATERAL VIEW stack(2,  caller_IMEI, recipient_IMEI) mystack  AS alias1;


The exception I'm hitting is:

java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString
cannot be cast to org.apache.hadoop.io.Text


this is the table's DDL:
CREATE TABLE `cdrtable`(
  `ts` string,
  `caller_msisdn` string,
  `caller_imei` string,
  `caller_imsi` string,
  `caller_cell` string,
  `recipient_msisdn` string,
  `recipient_imei` string,
  `recipient_imsi` string,
  `recipient_cell` string,
  `call_type` string,
  `call_duration` string,
  `call_length` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'


The whole stack trace:
16/01/10 14:20:34 [main]: ERROR CliDriver: Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString
cannot be cast to org.apache.hadoop.io.Text
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString
cannot be cast to org.apache.hadoop.io.Text
at
org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString
cannot be cast to org.apache.hadoop.io.Text
at
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:125)
at
org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45)
at
org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:107)
at
org.apache.hadoop.hive.ql.udf.generic.GenericUDTFStack.process(GenericUDTFStack.java:123)
at
org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:108)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:424)
at
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:416)
at
org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
... 13 more
Caused by: java.lang.ClassCastException:
org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to
org.apache.hadoop.io.Text
at
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
at
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:220)
at
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:306)
at

Re: UPDATE RE: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes (beeline - hive server 2)

2015-11-30 Thread Daniel Haviv
Hi,
I remember encountering a similar problem that was caused by an old mysql
client driver.
You can try and upgrade your mysql connector.

Daniel

On Mon, Nov 30, 2015 at 8:12 PM, Timothy Garza <
timothy.ga...@collinsongroup.com> wrote:

> We’ve been playing with the MySQL Global Settings: (Hive metastore)
>
>
>
> *mysql*> set global innodb_large_prefix = ON;  (<- this was set to OFF
> previously)
>
>
>
> …and now the ERROR is thus:
>
> Specified key was too long; max key length is 3072 bytes
>
>
>
> So it’s still ‘failing’ (but the HDFS operation itself succeeds). This
> must be the problem area as the message has changed from:
>
>
>
> Specified key was too long; max key length is 767 bytes
>
> to
>
> Specified key was too long; max key length is 3072 bytes
>
>
>
> …simply by altering the MySQL Global settings. So is hiveserver2 trying to
> use a key larger than MySQL supports (v5.5.2, file format Antelope)?
>
>
>
> NB. This only occurs when executing beeline INSERT, not CREATE nor SELECT
> statements on a Hive Table (in this case a Sequence File).
>
>
>
> My colleague thinks this is SSL related (because of the use of the word
> ‘key’ in the error), is HiveServer2 connecting to the Metastore using SSL?
>
>
> --
>
> Weirdly I’m experiencing exactly the same issue when trying to populate a
> Hive Table using INSERT OVERWRITE TABLE. We recently upgraded from Hive
> 0.13 to 1.2.1. NB. The Hive Table populates but the map-reduce returns an
> error code. I have run the hive Schema Tool:   schematool -dbType mysql
> -upgradeSchemaFrom 0.13
>
>
>
> The only table I can see with a 767-size column is “PART_COL_STATS”,
> implemented in one of the metastore upgrade scripts. Column Name:
> PARTITION_NAME | varchar(767). <- I changed this column to varchar(1000)
> but get the same message afterwards:
>
>
>
> *ERROR jdbc.JDBCStatsPublisher:* Error during JDBC initialization.
>
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key
> was too long; max key length is 767 bytes
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
>
> at com.mysql.jdbc.Util.getInstance(Util.java:384)
>
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
>
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4232)
>
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4164)
>
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
>
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
>
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2832)
>
> at
> com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1755)
>
> at
> com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1679)
>
> at
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.init(JDBCStatsPublisher.java:292)
>
> at
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:411)
>
> at
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
>
> at
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
>
> at
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
>
> at
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>
> at
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
>
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor

Re: Hive On Spark - Using custom SerDe

2015-11-16 Thread Daniel Haviv
Hi,
How should I set it? Just a normal set in Hive, or should I add it via the safety
valve to the Hive or Spark configuration?
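For reference, the plain in-session route looks roughly like the sketch below. The
jar path is purely illustrative, and both properties can equally be put in
hive-site.xml; hive.aux.jars.path in particular is often only picked up at service
startup rather than from a session-level set, and spark.* settings generally have to
be in place before the session's first Spark job starts:

SET hive.aux.jars.path=hdfs:///path/to/whale-serde.jar;
SET spark.kryo.classesToRegister=com.mycompany.hive.WhaleAvroGenericInputFormat;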

Thank you.
Daniel

On Mon, Nov 16, 2015 at 5:46 PM, Jimmy Xiang  wrote:

> Have you added your class to "spark.kryo.classesToRegister"? You also need
> to make sure your jar is in "hive.aux.jars.path".
>
> Thanks,
> Jimmy
>
> On Mon, Nov 16, 2015 at 1:44 AM, Daniel Haviv <
> daniel.ha...@veracity-group.com> wrote:
>
>> Hi,
>> We have a custom SerDe we would like to use with Hive on Spark but I'm
>> not sure how to.
>> The error messages are pretty clear about the fact that it can't find my
>> SerDE's class:
>>
>> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable 
>> to find class: com.mycompany.hive.WhaleAvroGenericInputFormat
>>
>>
>>
>>
>> Thank you.
>>
>> Daniel
>>
>>
>


Hive On Spark - Using custom SerDe

2015-11-16 Thread Daniel Haviv
Hi,
We have a custom SerDe we would like to use with Hive on Spark but I'm not
sure how to.
The error messages are pretty clear about the fact that it can't find my
SerDE's class:

Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException:
Unable to find class: com.mycompany.hive.WhaleAvroGenericInputFormat




Thank you.

Daniel


Re: Disabling local mode optimization

2015-11-02 Thread Daniel Haviv
Hi,
I'm trying to set hive.exec.mode.local.auto.inputbytes.max &
hive.exec.mode.local.auto.tasks.max to 1 or 0, but local mode is still being
used instead of M/R.

Any ideas?

Thank you.
Daniel

On Thu, Sep 3, 2015 at 8:02 AM, sreebalineni . 
wrote:

> Hi,
>
> Isn't it that you should set it to true? By default it is disabled, i.e. set to
> false.
>
> Hive analyzes the size of each map-reduce job in a query and may run it
> locally if the following thresholds are satisfied:
>
>- The total input size of the job is lower than:
>hive.exec.mode.local.auto.inputbytes.max (128MB by default)
>- The total number of map-tasks is less than:
>hive.exec.mode.local.auto.tasks.max (4 by default)
>- The total number of reduce tasks required is 1 or 0.
>
> So for queries over small data sets, or for queries with multiple
> map-reduce jobs where the input to subsequent jobs is substantially smaller
> (because of reduction/filtering in the prior job), jobs may be run locally.
>
> So we may need to check the size of your input. Which version of Hive are
> you using? It can work only from Hive 0.7 onwards.
>
> On Wed, Sep 2, 2015 at 4:46 PM, Daniel Haviv <
> daniel.ha...@veracity-group.com> wrote:
>
>> Hi,
>> I would like to disable the optimization where a query that just selects
>> data is running without mapreduce (local mode).
>>
>> hive.exec.mode.local.auto is set to false but hive still runs in local mode 
>> for some queries.
>>
>>
>> How can I disable local mode completely?
>>
>>
>> Thank you.
>>
>> Daniel
>>
>>
>


Re: Merging small files

2015-10-17 Thread Daniel Haviv
Changed it to sort by.
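The reworked statement presumably reads something like the sketch below. The
DISTRIBUTE BY on the partition keys is an extra tweak, not something from this
thread; it keeps each partition's rows on a single reducer so fewer files come out:

INSERT INTO upstreamparam_org PARTITION (day_ts, cmtsid)
SELECT *
FROM upstreamparam_20151013
DISTRIBUTE BY day_ts, cmtsid
SORT BY datats, macaddress;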


On Sat, Oct 17, 2015 at 6:05 PM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Thanks for the tip Gopal.
> I tried what you suggested (on Tez) but I'm getting a middle stage with 1
> reducer (which is awful for performance).
>
> This is my query:
> insert into upstreamparam_org partition(day_ts, cmtsid) select * from
> upstreamparam_20151013 order by datats,macaddress;
>
> I've attached the query plan in case it might help understand why.
>
> Thank you.
> Daniel.
>
>
>
>
> On Fri, Oct 16, 2015 at 7:19 PM, Gopal Vijayaraghavan 
> wrote:
>
>>
>> > Is there a more efficient way to have Hive merge small files on the
>> >files without running with two passes?
>>
>> Not entirely an efficient way, but adding a shuffle stage usually works
>> much better as it gives you the ability to lay out the files for better
>> vectorization.
>>
>> Like for TPC-H, doing ETL with
>>
>> create table lineitem as select * from lineitem sort by l_shipdate,
>> l_suppkey;
>>
>> will produce fewer files (exactly as many as your reducer #) & compresses
>> harder due to the natural order of transactions (saves ~20Gb or so at 1000
>> scale).
>>
>> Caveat: that is not more efficient in MRv2, only in Tez/Spark which can
>> run MRR pipelines as-is.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>


Re: Merging small files

2015-10-17 Thread Daniel Haviv
Thanks for the tip Gopal.
I tried what you suggested (on Tez) but I'm getting a middle stage with 1
reducer (which is awful for performance).

This is my query:
insert into upstreamparam_org partition(day_ts, cmtsid) select * from
upstreamparam_20151013 order by datats,macaddress;

I've attached the query plan in case it might help understand why.

Thank you.
Daniel.




On Fri, Oct 16, 2015 at 7:19 PM, Gopal Vijayaraghavan 
wrote:

>
> > Is there a more efficient way to have Hive merge small files on the
> >files without running with two passes?
>
> Not entirely an efficient way, but adding a shuffle stage usually works
> much better as it gives you the ability to lay out the files for better
> vectorization.
>
> Like for TPC-H, doing ETL with
>
> create table lineitem as select * from lineitem sort by l_shipdate,
> l_suppkey;
>
> will produce fewer files (exactly as many as your reducer #) & compresses
> harder due to the natural order of transactions (saves ~20Gb or so at 1000
> scale).
>
> Caveat: that is not more efficient in MRv2, only in Tez/Spark which can
> run MRR pipelines as-is.
>
> Cheers,
> Gopal
>
>
>
Plan not optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-3
   Stats-Aggr Operator
  Stage-0
 Move Operator
partition:{}
table:{"name:":"default.upstreamparam_org","input 
format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
Stage-2
   Dependency Collection{}
  Stage-1
 Reducer 2
 File Output Operator [FS_5]
compressed:false
Statistics:Num rows: 8707462208 Data size: 
1767614828224 Basic stats: COMPLETE Column stats: NONE
table:{"name:":"default.upstreamparam_org","input 
format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output 
format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
Select Operator [SEL_3]
|  
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20"]
|  Statistics:Num rows: 8707462208 Data size: 
1767614828224 Basic stats: COMPLETE Column stats: NONE
|<-Map 1 [SIMPLE_EDGE]
   Reduce Output Operator [RS_7]
  key expressions:_col1 (type: bigint), _col0 
(type: bigint)
  sort order:++
  Statistics:Num rows: 8707462208 Data size: 
1767614828224 Basic stats: COMPLETE Column stats: NONE
  value expressions:_col2 (type: bigint), _col3 
(type: int), _col4 (type: int), _col5 (type: bigint), _col6 (type: float), 
_col7 (type: float), _col8 (type: float), _col9 (type: float), _col10 (type: 
float), _col11 (type: float), _col12 (type: float), _col13 (type: float), 
_col14 (type: float), _col15 (type: float), _col16 (type: bigint), _col17 
(type: bigint), _col18 (type: bigint), _col19 (type: bigint), _col20 (type: 
string)
  Select Operator [OP_6]
 
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20"]
 Statistics:Num rows: 8707462208 Data size: 
1767614828224 Basic stats: COMPLETE Column stats: NONE
 TableScan [TS_0]
alias:upstreamparam_20151013
Statistics:Num rows: 8707462208 Data size: 
1767614828224 Basic stats: COMPLETE Column stats: NONE



Merging small files

2015-10-16 Thread Daniel Haviv
Hi,
We are using Hive to merge small files by setting
hive.merge.smallfiles.avgsize to 12000 and doing an insert-as-select into
a table.
The problem is that this takes two passes over the data: first to insert the
data and then to merge it.

Is there a more efficient way to have Hive merge the small files
without running two passes?


Thank you.
Daniel


Re: Hive SerDe regex error

2015-10-01 Thread Daniel Haviv
Hi,
You didn't escape the ^ character at the end.
Try using this string instead: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^
\[]*)\[([^ ]*)\]: \(([^ ]*)\) ([\^]*)

Daniel

On Thu, Oct 1, 2015 at 3:17 PM, IT CTO  wrote:

> Hi,
> I am trying to create a table with Regex SerDe but failing with no good
> reason:
> CREATE EXTERNAL TABLE syslog (
>   month STRING,
>   day STRING,
>   time STRING,
>   source STRING,
>   process STRING,
>   pid STRING,
>   uname STRING,
>   message STRING)
> COMMENT 'This is the syslog sample table'
> ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.RegexSerDe"
> WITH SERDEPROPERTIES (
>   "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \[]*)\[([^ ]*)\]:
> \(([^ ]*)\) ([^]*)"
> )
> STORED AS TEXTFILE
> LOCATION 'dfs://localhost:8020/data/flumeTest/flume-test-spoolDir';
>
> The regex iself works on regex tester so I don't understand why I am
> getting:
>
>  FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask.
> java.util.regex.PatternSyntaxException: Unclosed character class near index
> 66
> ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ []*)[([^ ]*)]: (([^ ]*)) ([^]*)
>
> Any help?
> --
> Eran | "You don't need eyes to see, you need vision" (Faithless)
>


Re: Error: java.lang.IllegalArgumentE:Column has wrong number of index entries found - when trying to insert from JSON external table to ORC table

2015-09-11 Thread Daniel Haviv
Hi Prasanth,
Can you elaborate on what the hive.merge.orcfile.stripe.level parameter
affects?

Thank you for your help.
Daniel

Sent from my iPhone

> On 8 Sep 2015, at 17:48, Prasanth Jayachandran 
>  wrote:
> 
> hive.merge.orcfile.stripe.level


Permission denied error when starting HiveServer2

2015-09-07 Thread Daniel Haviv
Hi,
I'm getting this error when starting HiveServer2:
2015-09-07 08:09:50,356 WARN org.apache.hive.service.server.HiveServer2:
Error starting HiveServer2 on attempt 1, will retry in 60 seconds
java.lang.RuntimeException: java.lang.RuntimeException:
java.io.IOException: Permission denied
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
at
org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:124)
at org.apache.hive.service.cli.CLIService.init(CLIService.java:111)
at
org.apache.hive.service.CompositeService.init(CompositeService.java:59)
at
org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:92)
at
org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:309)
at
org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:68)
at
org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:523)
at
org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:396)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: java.io.IOException: Permission
denied
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:465)
... 14 more
Caused by: java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:2024)
at
org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:740)
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:463)
... 14 more

The hive user has write permissions on the scratch dir; is there another
path I should take care of?

Thank you.
Daniel


Re: Disabling local mode optimization

2015-09-02 Thread Daniel Haviv
Exactly the info I needed.
Thanks

Daniel

> On 3 Sep 2015, at 09:02, sreebalineni .  wrote:
> 
> Hi,
> 
> Isn't it that you should set it to true? By default it is disabled, i.e. set to 
> false.
> Hive analyzes the size of each map-reduce job in a query and may run it 
> locally if the following thresholds are satisfied:
> The total input size of the job is lower than: 
> hive.exec.mode.local.auto.inputbytes.max (128MB by default)
> The total number of map-tasks is less than: 
> hive.exec.mode.local.auto.tasks.max (4 by default)
> The total number of reduce tasks required is 1 or 0.
> So for queries over small data sets, or for queries with multiple map-reduce 
> jobs where the input to subsequent jobs is substantially smaller (because of 
> reduction/filtering in the prior job), jobs may be run locally.
> So we may need to check the size of your input. Which version of Hive are you 
> using? It can work only from Hive 0.7 onwards.
> 
>> On Wed, Sep 2, 2015 at 4:46 PM, Daniel Haviv 
>>  wrote:
>> Hi,
>> I would like to disable the optimization where a query that just selects 
>> data is running without mapreduce (local mode).
>> hive.exec.mode.local.auto is set to false but hive still runs in local mode 
>> for some queries.
>> 
>> How can I disable local mode completely?
>> 
>> Thank you.
>> Daniel
> 


Disabling local mode optimization

2015-09-02 Thread Daniel Haviv
Hi,
I would like to disable the optimization where a query that just selects
data is running without mapreduce (local mode).

hive.exec.mode.local.auto is set to false but hive still runs in local
mode for some queries.


How can I disable local mode completely?


Thank you.

Daniel


Re: Data presentation to consumer layer

2015-08-25 Thread Daniel Haviv
Hi,
There is a myriad of solutions, among them:
Impala
Presto
Drill
Kylin
Tajo


On Tue, Aug 25, 2015 at 10:44 AM, Mich Talebzadeh 
wrote:

> Hi,
>
>
>
> My question concerns the means of presenting data to consumer layer from
> Hive.
>
>
>
> Obviously Hive is very suitable for batch analysis. However, the MapReduce
> nature of extracting data makes it unlikely to serve as a direct access tool for
> the consumer layer.
>
>
>
> So my question is: what products are there that can be used effectively to
> get the data from Hive to visualisation tools like Tableau?
>
>
>
> I thought of using Oracle TimesTen in-memory database to get the data out
> of Hive/Hadoop and keep the most frequently used data in memory. What are
> other alternatives around?
>
>
>
> Thanks,
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> *Sybase ASE 15 Gold Medal Award 2008*
>
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>
>
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>
> Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE
> 15", ISBN 978-0-9563693-0-7*.
>
> co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN
> 978-0-9759693-0-4*
>
> *Publications due shortly:*
>
> *Complex Event Processing in Heterogeneous Environments*, ISBN:
> 978-0-9563693-3-8
>
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume
> one out shortly
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
>
>


Re: Loading multiple file format in hive

2015-08-24 Thread Daniel Haviv
Hi,
You can set a different file format per partition.
You can't mix files in the same directory (You could theoretically write
some kind of custom SerDe).
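A rough illustration of the per-partition route (table, columns, and paths are made
up; the JSON SerDe shown is the hive-hcatalog one, whose jar has to be on the aux
path):

CREATE EXTERNAL TABLE events (id bigint, name string)
PARTITIONED BY (dt string)
STORED AS PARQUET;

-- a partition still holding the raw JSON files: switch it to text input + JSON SerDe
ALTER TABLE events ADD PARTITION (dt='2015-08-24')
  LOCATION '/data/events/json/dt=2015-08-24';
ALTER TABLE events PARTITION (dt='2015-08-24')
  SET FILEFORMAT
    INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    SERDE        'org.apache.hive.hcatalog.data.JsonSerDe';

-- partitions already converted to Parquet just keep the table default
ALTER TABLE events ADD PARTITION (dt='2015-08-23')
  LOCATION '/data/events/parquet/dt=2015-08-23';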

Daniel.



On Mon, Aug 24, 2015 at 6:15 PM, Jeetendra G 
wrote:

> Can anyone put some light on this please?
>
> On Mon, Aug 24, 2015 at 12:32 PM, Jeetendra G 
> wrote:
>
>> HI All,
>>
>> I have a directory where I have JSON-formatted and Parquet files in the same
>> folder. Can Hive load these?
>>
>> I am getting JSON data and storing it in HDFS. Later I am running a job to
>> convert JSON to Parquet (every 15 mins), so we will have 15 minutes of JSON data.
>>
>> Can i provide multiple serde in hive?
>>
>> regards
>> Jeetendra
>>
>
>


Re: Improve performance in displaying content of data frame

2015-07-01 Thread Daniel Haviv
Hi Vinod,
A better place to ask this would be at Spark's mailing list.

Your select isn't executed until you're running the foreach on it, so you get 
the impression that the select ran fast.

Daniel

> On 1 Jul 2015, at 12:56, Vinod Kuamr  wrote:
> 
> Hi Everyone,
> 
> I am using following sqlContext
> 
> var df=sqlContext.sql("SELECT fullname,SUM(CAST(contactid AS decimal(38,6))) 
> FROM adventurepersoncontacts GROUP BY fullname ORDER BY fullname ASC");
> 
> It executes fine, but when I display the content of the data frame using the 
> println method it takes much more time to return the result.
> 
> df.foreach(println)
> 
> Can you please let me know how to get the content of the data frame in an 
> optimized way?
> 
> My Environment is:
> Spark 1.3.1
> Windows 8
> Sample Data with  15000 records
> 
> Thank you,
> Vinod


Re: Understanding ORC file format compression

2015-06-21 Thread Daniel Haviv
Hi Sreejesh,
The data in an ORC file is divided into stripes and in these stripes columns 
are divided into column groups.
The compression is at the column group level, so to answer your question ORC 
files are splittable no matter the codec used.

Daniel

> On 21 Jun 2015, at 10:56, sreejesh s  wrote:
> 
> Hi,
> 
> As per my understanding, the available codecs for ORC file format Hive table 
> compression are either Zlib or Snappy.
> Both compression techniques are non-splittable. Does that mean that any 
> queries on a Hive table stored as ORC and compressed will not run multiple maps 
> in parallel?
> 
> I know that is not correct; please help me understand what I am missing 
> here...
> 
> Thanks


Re: Output of Hive

2015-05-16 Thread Daniel Haviv
It seems like your query returns no results; try using COUNT to confirm.
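For example, something along these lines, dropping the predicates and re-adding them
one at a time to see which one filters everything out (the WHERE clause shown is only
the quality part of the original query):

SELECT COUNT(*) FROM records;

SELECT COUNT(*) FROM records
WHERE quality IN (0, 1, 4, 5, 9);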

Daniel

> On 16 May 2015, at 14:40, Anand Murali  wrote:
> 
> Dear All:
> 
> I am new to Hive, so pardon my ignorance. I have the following query but do 
> not see any output. I wondered if it might be in HDFS and checked there, but do not 
> find it there. Can somebody advise?
> 
> hive> select year, MAX(Temperature) from records where temperature <>  
> and (quality = 0 or quality = 1 or quality = 4 or quality = 5 or quality = 9)
> > group by year
> > ;
> Query ID = anand_vihar_20150516170505_9b23d8ba-19d7-4fa7-b972-4f199e3bf56a
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Job running in-process (local Hadoop)
> 2015-05-16 17:05:11,504 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_local927727978_0003
> MapReduce Jobs Launched: 
> Stage-Stage-1:  HDFS Read: 5329140 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 1.258 seconds
> 
> Thanks
>  
> Anand Murali  
> 


Re: Hive Alter Partition Location Issue

2015-04-30 Thread Daniel Haviv
I think you have an extra '/' in the HDFS URI

Daniel

> On 30 Apr 2015, at 16:46, Harsha N  wrote:
> 
> Thanks for your reply,
> 
> analyze table table1 partition (dt=201501) compute statistics;--returns the 
> same error
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> java.io.IOException: cannot find dir = hdfs:///data/dt =201501/1430201400/
> in pathToPartitionInfo: [hdfs:/data/dt=201501/1430201400/]
>   at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:347)
> 
> To Add On 
> I am working on External tables in Hive 0.13.1-cdh5.3.2
> 
> -Harsha
> 
>> On Thu, Apr 30, 2015 at 12:37 AM, Mich Talebzadeh  
>> wrote:
>> Hi Harsha,
>> 
>>  
>> 
>> Have you updated stats on table1 after partition adding? In other words it 
>> is possible that the optimiser is not aware of that partition yet?
>> 
>>  
>> 
>> analyze table table1 partition (dt=201501) compute statistics;
>> 
>>  
>> 
>> HTH
>> 
>>  
>> 
>> Mich Talebzadeh
>> 
>>  
>> 
>> http://talebzadehmich.wordpress.com
>> 
>>  
>> 
>> Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
>> ISBN 978-0-9563693-0-7.
>> 
>> co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
>> 978-0-9759693-0-4
>> 
>> Publications due shortly:
>> 
>> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
>> Coherence Cache
>> 
>> Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume 
>> one out shortly
>> 
>>  
>> 
>> NOTE: The information in this email is proprietary and confidential. This 
>> message is for the designated recipient only, if you are not the intended 
>> recipient, you should destroy it immediately. Any information in this 
>> message shall not be understood as given or endorsed by Peridale Ltd, its 
>> subsidiaries or their employees, unless expressly so stated. It is the 
>> responsibility of the recipient to ensure that this email is virus free, 
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept 
>> any responsibility.
>> 
>>  
>> 
>> From: Harsha N [mailto:harsha.hadoo...@gmail.com] 
>> Sent: 30 April 2015 07:24
>> To: user@hive.apache.org
>> Subject: Hive Alter Partition Location Issue
>> 
>>  
>> 
>> Hi All,
>> 
>> Can experts share your view on Hive behaviour in below scenario. 
>> 
>>  
>> 
>> I am facing below issue on using alter partition locations in hive.
>> 
>>  
>> 
>> select count(*) from table1 where dt = 201501;
>> 
>>  
>> 
>> Total jobs = 1
>> 
>> Launching Job 1 out of 1
>> 
>> Number of reduce tasks determined at compile time: 1
>> 
>> In order to change the average load for a reducer (in bytes):
>> 
>>   set hive.exec.reducers.bytes.per.reducer=
>> 
>> In order to limit the maximum number of reducers:
>> 
>>   set hive.exec.reducers.max=
>> 
>> In order to set a constant number of reducers:
>> 
>>   set mapreduce.job.reduces=
>> 
>> java.io.IOException: cannot find dir = hdfs:///data/dt =201501/1430201400/
>> 
>> in pathToPartitionInfo: [hdfs:/data/dt=201501/1430201400/]
>> 
>>  
>> 
>> Below are the steps I have followed.
>> 
>> I have altered a partition location in hive using below command.
>> 
>> ALTER TABLE table1 PARTITION (dt=201501) SET LOCATION 
>> 'hdfs:///data/dt=201501/1430201400/';
>> 
>>  
>> 
>> I have inserted new data into this new location.
>> 
>>  
>> 
>> INSERT INTO TABLE table1
>> 
>> SELECT * FROM table2 where dt=201501
>> 
>>  
>> 
>> select count(*) from table1 where dt = 201501; doesn't work but 
>> 
>> select * from table1 where dt = 201501 works good.
>> 
>>  
>> 
>> Please let me know if you need more information.
>> 
>>  
>> 
>> Thanks
>> 
>> Harsha
>> 
> 


Re: creating parquet table using avro schame

2015-04-29 Thread Daniel Haviv
Sorry, I misunderstood.
AFAIK you can't do that.

Daniel

> On 29 Apr 2015, at 18:49, Yosi Botzer  wrote:
> 
> Hi,
> 
> I have parquet files that are the product of map-reduce job.
> 
> I have used AvroParquetOutputFormat in order to produce them, so I have an 
> avro schema file describing the structure of the data.
> 
> When I want to create an Avro-based table in Hive I can use:
> TBLPROPERTIES 
> ('avro.schema.url'='hdfs:///schema/report/dashboard_report.avsc');
> 
> So I do not need to specify every field in the create statement.
> 
> Is there a way to use the avro schema file to create the parquet table as 
> well?
> 
> 
> 
> Yosi


Re: creating parquet table using avro schame

2015-04-29 Thread Daniel Haviv
You should be able to get the schema out using parquet tools:
http://blog.cloudera.com/blog/2015/03/converting-apache-avro-data-to-parquet-format-in-apache-hadoop/

Daniel

> On 29 Apr 2015, at 18:49, Yosi Botzer  wrote:
> 
> Hi,
> 
> I have parquet files that are the product of map-reduce job.
> 
> I have used AvroParquetOutputFormat in order to produce them, so I have an 
> avro schema file describing the structure of the data.
> 
> When I want to create an Avro-based table in Hive I can use:
> TBLPROPERTIES 
> ('avro.schema.url'='hdfs:///schema/report/dashboard_report.avsc');
> 
> So I do not need to specify every field in the create statement.
> 
> Is there a way to use the avro schema file to create the parquet table as 
> well?
> 
> 
> 
> Yosi


Re: Extremely Slow Data Loading with 40k+ Partitions

2015-04-16 Thread Daniel Haviv
Is this a test environment?
If so, can you try and disable concurrency?


Daniel

> On 16 Apr 2015, at 19:44, Tianqi Tong  wrote:
> 
> Hi Daniel,
> Actually the mapreduce job was just fine, but the process got stuck on the data 
> loading after that.
> The output stopped at:
> Loading data to table default.parquet_table_with_40k_partitions partition 
> (yearmonth=null, prefix=null)
>  
> When I look at the size of hdfs files of table, I can see the size is 
> growing, but it's kind of slow.
> For mapreduce job, I had 400+ mappers and 100+ reducers.
>  
> Thanks
> Tianqi
>  
> From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] 
> Sent: Wednesday, April 15, 2015 9:23 PM
> To: user@hive.apache.org
> Subject: Re: Extremely Slow Data Loading with 40k+ Partitions
>  
> How many reducers are you using?
> 
> Daniel
> 
> On 16 Apr 2015, at 00:55, Tianqi Tong  wrote:
> 
> Hi,
> I'm loading data to a Parquet table with dynamic partitions. I have 40k+ 
> partitions, and I have skipped the partition stats computation step.
> Somehow it's still extremely slow loading data into partitions (800MB/h).
> Do you have any hints on the possible reason and solution?
>  
> Thank you
> Tianqi Tong
>  


Re: Extremely Slow Data Loading with 40k+ Partitions

2015-04-15 Thread Daniel Haviv
How many reducers are you using?

Daniel

> On 16 Apr 2015, at 00:55, Tianqi Tong  wrote:
> 
> Hi,
> I'm loading data to a Parquet table with dynamic partitions. I have 40k+ 
> partitions, and I have skipped the partition stats computation step.
> Somehow it's still extremely slow loading data into partitions (800MB/h).
> Do you have any hints on the possible reason and solution?
>  
> Thank you
> Tianqi Tong
>  


Re: A simple insert stuck in hive

2015-04-08 Thread Daniel Haviv
I would guess it has something to do with container allocation

Daniel

> On 8 Apr 2015, at 20:26, Alan Gates  wrote:
> 
> If you're seeing it list progress (or attempted progress) as here, this isn't 
> a locking issue.  All locks are obtained before the job is submitted to 
> Hadoop.
> 
> Alan.
> 
>> Mich Talebzadeh April 7, 2015 at 14:09
>> Hi,
>>  
>> Today I have noticed the following issue.
>>  
>> A simple insert into a table is sitting there throwing the following
>>  
>> hive> insert into table mytest values(1,'test');
>> Query ID = hduser_20150407215959_bc030fac-258f-4996-b50f-3d2d49371cca
>> Total jobs = 3
>> Launching Job 1 out of 3
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_1428439695331_0002, Tracking URL = 
>> http://rhes564:8088/proxy/application_1428439695331_0002/
>> Kill Command = /home/hduser/hadoop/hadoop-2.6.0/bin/hadoop job  -kill 
>> job_1428439695331_0002
>> Hadoop job information for Stage-1: number of mappers: 1; number of 
>> reducers: 0
>> 2015-04-07 21:59:35,068 Stage-1 map = 0%,  reduce = 0%
>> 2015-04-07 22:00:35,545 Stage-1 map = 0%,  reduce = 0%
>> 2015-04-07 22:01:35,832 Stage-1 map = 0%,  reduce = 0%
>> 2015-04-07 22:02:36,058 Stage-1 map = 0%,  reduce = 0%
>> 2015-04-07 22:03:36,279 Stage-1 map = 0%,  reduce = 0%
>> 2015-04-07 22:04:36,486 Stage-1 map = 0%,  reduce = 0%
>>  
>> I have been messing around with concurrency for hive. That did not work. My 
>> metastore is built in Oracle. So I drooped that schema and recreated from 
>> scratch. Got rid of concurrency parameters. First I was getting “container 
>> is running beyond virtual memory limits” for the task. I changed the 
>> following parameters in yarn-site.xml
>>  
>>  
>> 
>>   yarn.nodemanager.resource.memory-mb
>>   2048
>>   Amount of physical memory, in MB, that can be allocated for 
>> containers.
>> 
>> 
>>   yarn.scheduler.minimum-allocation-mb
>>   1024
>> 
>>  
>> and mapred-site.xml
>>  
>> 
>> mapreduce.map.memory.mb
>> 4096
>> 
>> 
>> mapreduce.reduce.memory.mb
>> 4096
>> 
>> 
>> mapreduce.map.java.opts
>> -Xmx3072m
>> 
>> 
>> mapreduce.recduce.java.opts
>> -Xmx6144m
>> 
>> 
>> yarn.app.mapreduce.am.resource.mb
>> 400
>> 
>>  
>> However, nothing has helped except that virtual memory error has gone. Any 
>> ideas appreciated.
>>  
>> Thanks
>>  
>> Mich Talebzadeh
>>  
>> http://talebzadehmich.wordpress.com
>>  
>> Publications due shortly:
>> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
>> Coherence Cache
>>  
>> NOTE: The information in this email is proprietary and confidential. This 
>> message is for the designated recipient only, if you are not the intended 
>> recipient, you should destroy it immediately. Any information in this 
>> message shall not be understood as given or endorsed by Peridale Ltd, its 
>> subsidiaries or their employees, unless expressly so stated. It is the 
>> responsibility of the recipient to ensure that this email is virus free, 
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept 
>> any responsibility.


HiveServer2 addressing standby namenode

2015-04-06 Thread Daniel Haviv
Hi,
We get a lot of error messages on the standby namenode indicating that Hive
is trying to address the standby namenode.
As all of our jobs function normally, my guess is that Hive is constantly
trying to address both namenodes and only works with the active one.

Is this correct?
Can this be modified so it will only address the active one and still
maintain the HA architecture?

Thanks,
Daniel


Re: hive 0.14 return some not NULL value as NULL

2015-03-31 Thread Daniel Haviv
Can you also supply the table's DDL and a few lines of your raw data?

Daniel

> On 1 Apr 2015, at 09:16, "r7raul1...@163.com"  wrote:
> 
> 
> 
> 
> I use  hive 0.14 the result is 
> 87FQEZT1UEDXJHJQPFFX7G7ET8S2DVPM2357378283356   9150119100048 
>   7326356 NULL
> 87FQEZT1UEDXJHJQPFFX7G7ET8S2DVPM2357378283356   121501191035580028
>   7326356 NULL
> UBDTK8D9XUZ9GRZU8NZNXDEG73D4PCZG2362223711289   161501191549050061
>   14837289  NULL
> Y49EY895ACABHS95DRQEE8DVFEB8JSE12360853052224   111501191426280023
>   115883224   NULL
> 
> I use  hive 0.10 the result is 
> 87FQEZT1UEDXJHJQPFFX7G7ET8S2DVPM2357378283356   9150119100048 
>   73263562015-01-19 10:44:44
> 87FQEZT1UEDXJHJQPFFX7G7ET8S2DVPM2357378283356   121501191035580028
>   73263562015-01-19 10:35:58
> UBDTK8D9XUZ9GRZU8NZNXDEG73D4PCZG2362223711289   161501191549050061
>   14837289 2015-01-19 15:49:05
> Y49EY895ACABHS95DRQEE8DVFEB8JSE12360853052224   111501191426280023
>   115883224   2015-01-19 14:26:28
> 
> Why ? I attach my log. Also in my log I found 2015-04-01 09:55:38,409 WARN 
> [main] org.apache.hadoop.hive.serde2.lazy.LazyStruct: Extra bytes detected at 
> the end of the row! Ignoring similar problems.
> 
> r7raul1...@163.com
> 


Re: Understanding Hive's execution plan

2015-03-27 Thread Daniel Haviv
lan was produced!
>  
> Thanks
>  
>  
> Mich Talebzadeh
>  
> http://talebzadehmich.wordpress.com
>  
> Publications due shortly:
> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
> Coherence Cache
>  
> NOTE: The information in this email is proprietary and confidential. This 
> message is for the designated recipient only, if you are not the intended 
> recipient, you should destroy it immediately. Any information in this message 
> shall not be understood as given or endorsed by Peridale Ltd, its 
> subsidiaries or their employees, unless expressly so stated. It is the 
> responsibility of the recipient to ensure that this email is virus free, 
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept 
> any responsibility.
>  
> From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com] 
> Sent: 26 March 2015 17:27
> To: user@hive.apache.org
> Subject: Understanding Hive's execution plan
>  
> Hi,
> Can anyone direct me to a good explanation on understanding Hive's execution 
> plan?
>  
> Thanks,
> Daniel


Re: 0.14 parse exception, row format question

2015-03-26 Thread Daniel Haviv
Your quotation marks seem to be wrong: the statement uses curly "smart" quotes
around the directory and the delimiter instead of plain single quotes.
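With plain single quotes the statement itself should parse (assuming Hive 0.11 or
later, where ROW FORMAT is accepted on INSERT OVERWRITE DIRECTORY):

INSERT OVERWRITE DIRECTORY '/user/admin/mydirectory'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT * FROM my_table_that_exists;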

Daniel

> On 26 Mar 2015, at 22:10, bitsofinfo  wrote:
> 
> Hi,
> 
> What is wrong with this query? I am reading the docs and it appears that
> this should work no?
> 
> INSERT OVERWRITE DIRECTORY “/user/admin/mydirectory”
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\t’
> select * from my_table_that_exists;
> 
> Error occurred executing hive query: Error while compiling statement:
> FAILED: ParseException line 2:0 cannot recognize input near ‘ROW’
> ‘FORMAT’ ‘DELIMITED’ in statement
> 
> Version of Hue/Hive etc I am running:
> ———-
> 
> Hue
> 2.6.1-2041
> 
> HDP
> 2.2.0
> 
> Hadoop
> 2.6.0
> 
> Pig
> 0.14.0
> 
> Hive-Hcatalog
> 0.14.0
> 
> Oozie
> 4.1.0
> 
> Ambari
> 1.7-169
> 
> HBase
> 0.98.4
> 
> Knox
> 0.5.0
> 
> Storm
> 0.9.3
> 
> Falcon
> 0.6.0


Understanding Hive's execution plan

2015-03-26 Thread Daniel Haviv
Hi,
Can anyone direct me to a good explanation on understanding Hive's execution 
plan?

Thanks,
Daniel

Re: how to set column level privileges

2015-03-26 Thread Daniel Haviv
Create a view with the permitted columns and handle the privileges for it
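A bare-bones sketch under SQL standards based authorization (table, view, and role
names are made up):

CREATE VIEW customer_public AS
SELECT customer_id, customer_name   -- only the columns the role is allowed to see
FROM customers;

GRANT SELECT ON TABLE customer_public TO ROLE analysts;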

Daniel

> On 26 במרץ 2015, at 12:40, Allen  wrote:
> 
> hi,
> 
>   We use SQL standards based authorization for authorization in Hive 
> 0.14.  But it  has not support for column level privileges.
> 
>   So, I want to know Is there anyway to set column level privileges?
> 
>   
> 
>   Thanks!
> 
> 
> 
> 


Re: How to clean up a table for which the underlying hdfs file no longer exists

2015-03-21 Thread Daniel Haviv
You can also use
ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE')
And then drop it
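With the table from the error message below, that would be roughly:

ALTER TABLE db.mytable SET TBLPROPERTIES('EXTERNAL'='TRUE');
DROP TABLE db.mytable;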


Daniel

> On 22 במרץ 2015, at 04:15, Stephen Boesch  wrote:
> 
> 
> There is a hive table for which the metadata points to a non-existing hdfs 
> file.  Simply calling
> 
> drop table 
> 
> results in:
> 
> Failed to load metadata for table: db.mytable
> Caused by TAbleLoadingException: Failed to load metadata for table: db.mytable
> File does not exist:  hdfs://
> Caused by FileNotFoundException: File does not exist: hdfs:// ..
>  
> So:  the file does not exist in hdfs , and it is not possible to remove the 
> metadata for it directly. Is the next step going to be: "run some sql 
> commands against the metastore to manually delete the associated rows"?  If 
> so,  what are those delete commands?
> 
> thanks


Re: How to clean up a table for which the underlying hdfs file no longer exists

2015-03-21 Thread Daniel Haviv
You can (as a workaround) just create its directory and then drop it
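Roughly like this (use whatever path the FileNotFoundException reports as missing; the one below is only a placeholder):

hadoop fs -mkdir -p /user/hive/warehouse/db.db/mytable

and then in Hive:

DROP TABLE db.mytable;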

Daniel

> On 22 במרץ 2015, at 04:15, Stephen Boesch  wrote:
> 
> 
> There is a hive table for which the metadata points to a non-existing hdfs 
> file.  Simply calling
> 
> drop table 
> 
> results in:
> 
> Failed to load metadata for table: db.mytable
> Caused by TAbleLoadingException: Failed to load metadata for table: db.mytable
> File does not exist:  hdfs://
> Caused by FileNotFoundException: File does not exist: hdfs:// ..
>  
> So:  the file does not exist in hdfs , and it is not possible to remove the 
> metadata for it directly. Is the next step going to be: "run some sql 
> commands against the metastore to manually delete the associated rows"?  If 
> so,  what are those delete commands?
> 
> thanks


Re: Which SerDe for Custom Binary Data.

2015-03-13 Thread Daniel Haviv
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HowtoWriteYourOwnSerDe
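In short, the guide boils down to extending AbstractSerDe. A deserialize-only skeleton (the class name, the two hard-coded columns and the byte layout below are made up, just to show the shape) would look roughly like this:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.AbstractSerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Writable;

public class CustomBinarySerDe extends AbstractSerDe {

  private ObjectInspector inspector;
  private final ArrayList<Object> row = new ArrayList<Object>(2);

  @Override
  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    // A real SerDe would read the column names/types from the table properties.
    List<String> names = Arrays.asList("id", "name");
    List<ObjectInspector> ois = Arrays.<ObjectInspector>asList(
        PrimitiveObjectInspectorFactory.javaIntObjectInspector,
        PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    inspector = ObjectInspectorFactory.getStandardStructObjectInspector(names, ois);
  }

  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    BytesWritable bw = (BytesWritable) blob;
    byte[] bytes = bw.getBytes();
    // Decode the proprietary C layout here; a 4-byte int followed by a string
    // is assumed purely as an illustration.
    row.clear();
    row.add(ByteBuffer.wrap(bytes, 0, 4).getInt());
    row.add(new String(bytes, 4, bw.getLength() - 4));
    return row;
  }

  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return inspector;
  }

  @Override
  public Class<? extends Writable> getSerializedClass() {
    return BytesWritable.class;
  }

  @Override
  public Writable serialize(Object obj, ObjectInspector oi) throws SerDeException {
    throw new SerDeException("write path not implemented");
  }

  @Override
  public SerDeStats getSerDeStats() {
    return null;
  }
}

You will also need an InputFormat that turns your binary files into one record per row (or marks them non-splittable) and hands each record to the SerDe as a BytesWritable.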


Daniel

> On 13 במרץ 2015, at 17:56, karthik maddala  wrote:
> 
>  
>  
> I want to set up a DW based on Hive. However, my data does not come as handy 
> csv files but  as binary files in a proprietary format.
>  
> The binary file  consists of  serialized data using C language.
>  
>  
> Could you please suggest which input format to be used and how to write a 
> custom SerDe for the above mentioned data.
>  
>  
> Thanks,
> Karthik Maddala
>  
>  


Re: insert table error

2015-03-13 Thread Daniel Haviv
What is the error you get?

Daniel

> On 13 במרץ 2015, at 13:13, zhangjp  wrote:
> 
> case fail 
> CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
>   CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
> INSERT INTO TABLE students
>   VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);


Bucket pruning

2015-03-12 Thread Daniel Haviv
Hi,
We created a bucketed table and when we select in the following way:
select *
from testtble
where bucket_col ='X';

We observe that the entire table is being read and not just the
specific bucket.

Does Hive support such a feature ?


Thanks,
Daniel


Re: Simple way to export data from a Hive table in to Avro?

2015-02-02 Thread Daniel Haviv
I might be missing something here but you could use:
Create table newtable stored as avro as select * from oldtable

On Mon, Feb 2, 2015 at 3:09 PM, Michael Segel 
wrote:

> Currently using Hive 13.x
>
> Would like to select from a table that exists and output to an external
> file(s) in avro via hive.
>
> Is there a simple way to do this?
>
> From what I’ve seen online, the docs tend to imply you need to know the
> avro schema when you specify the table.
> Could you copy from an existing table, or do I need to dump the current
> schema and write some code to generate an avro schema?
>
> Thx
>
> -Mike
>
>


Trying to improve compression ratio for an ORC table

2015-01-18 Thread Daniel Haviv
Hi guys,
I'm experiencing something very odd:
I have an ORC table with the "orc.compress"="SNAPPY" property that weighs
4.9 GB and is composed of 253 files..
I then do a CTAS into a new table where I added this
property "orc.compress.size"="2485760" to improve the compression ratio.

The new table weighs 5.2 GB over 18 files so not only did the compression
ratio not improve, it got worse.

How can this be ?

Thanks,
Daniel


Re: Adding new columns to parquet based Hive table

2015-01-14 Thread Daniel Haviv
Hi Kumar,
Altering the table just updates Hive's metadata without updating Parquet's 
schema.
I believe that if you insert into the table (after adding the column), you'll 
be able to select all 3 columns later on.
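Roughly this sequence (the source query is just a placeholder):

alter table t add columns (f3 string);
insert into table t select f1, f2, 'n/a' as f3 from t_staging;
select f1, f2, f3 from t;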

Daniel

> On 14 בינו׳ 2015, at 21:34, Kumar V  wrote:
> 
> Hi,
> 
> Any ideas on how to go about this ? Any insights you have would be 
> helpful. I am kinda stuck here.
> 
> Here are the steps I followed on hive 0.13
> 
> 1) create table t (f1 String, f2 string) stored as Parquet;
> 2) upload parquet files with 2 fields
> 3) select * from t; < Works fine.
> 4) alter table t add columns (f3 string);
> 5) Select * from t; <- ERROR  "Caused by: 
> java.lang.IllegalStateException: Column f3 at index 2 does not exist 
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:79)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65)
> 
> 
> 
> 
> 
> On Wednesday, January 7, 2015 2:55 PM, Kumar V  
> wrote:
> 
> 
> Hi,
> I have a Parquet format Hive table with a few columns.  I have loaded a 
> lot of data to this table already and it seems to work.
> I have to add a few new columns to this table.  If I add new columns, queries 
> don't work anymore since I have not reloaded the old data.
> Is there a way to add new fields to the table and not reload the old Parquet 
> files and make the query work ?
> 
> I tried this in Hive 0.10 and also on hive 0.13.  Getting an error in both 
> versions.
> 
> Please let me know how to handle this.
> 
> Regards,
> Kumar. 
> 
> 


Re: Monitoring Hive Thread Usage

2015-01-06 Thread Daniel Haviv
Found a solution (aside from JMX):
ps -eLf | grep [HiveServer2 PID]
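For a plain thread count, something like this should also work (looking the PID up with pgrep is an assumption about how the process shows up in ps):

HS2_PID=$(pgrep -f HiveServer2 | head -1)
ps -o nlwp= -p "$HS2_PID"    # NLWP = number of threads in the process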



On Tue, Jan 6, 2015 at 11:03 AM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Hi,
> I suspect we have a problem with clients opening connections and not
> closing them.
> To verify that I'd like to monitor the Hive's number of threads but I
> can't seem to find a way to do so.
>
> Anyone has ever tried or has any ideas?
>
> Thanks,
> Daniel
>


Monitoring Hive Thread Usage

2015-01-06 Thread Daniel Haviv
Hi,
I suspect we have a problem with clients opening connections and not
closing them.
To verify that, I'd like to monitor Hive's number of threads, but I can't
seem to find a way to do so.

Anyone has ever tried or has any ideas?

Thanks,
Daniel


Re: How to pass information to hive udf except as arguments

2014-12-19 Thread Daniel Haviv
First result in google:
http://stackoverflow.com/questions/12464636/how-to-set-variables-in-hive-scripts
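One pattern that avoids passing the URL as a call argument is to set an ordinary property in the script and read it from the job configuration inside the UDF. A sketch follows (the property name, class and return value are made up, and it relies on GenericUDF.configure() being invoked, which only happens when the query actually launches a job):

In the script:

set geo.lookup.url=http://myservice.example.com/lookup;

In the UDF:

import java.util.Arrays;

import org.apache.hadoop.hive.ql.exec.MapredContext;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class GeoLookupUDF extends GenericUDF {

  private String serviceUrl = "http://default.example.com/lookup";

  @Override
  public void configure(MapredContext context) {
    // Properties set with "set ..." in the session end up in the JobConf.
    String url = context.getJobConf().get("geo.lookup.url");
    if (url != null) {
      serviceUrl = url;
    }
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // A real implementation would call serviceUrl with the geo point arguments.
    return new Text(serviceUrl);
  }

  @Override
  public String getDisplayString(String[] children) {
    return "geo_lookup(" + Arrays.toString(children) + ")";
  }
}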

Daniel

> On 19 בדצמ׳ 2014, at 10:54, Dilip Agarwal  wrote:
> 
> 
> Hi, I have created a udf which accepts geo location points as arguments and 
> return the name of location fetching from a url. I have to set this URL 
> dynamically at the time of hive script run.
> 
> I don't like to pass this url as separate argument tot the udf evaluate 
> method. Is there a way to set this url in hive script and get from hive udf, 
> or set this in user environment and then fetch. Please tell me the full 
> procedure to do this.
> 
> 
> Thanks & Regards
> Dilip Agarwal
> +91 8287857554


Re: Case inside select statement in hive

2014-12-16 Thread Daniel Haviv
Hi,
Please RTFM before asking questions.
Taken from
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF:
Conditional Functions

Return Type | Name(Signature)                                             | Description
T           | if(boolean testCondition, T valueTrue, T valueFalseOrNull)  | Returns valueTrue when testCondition is true, returns valueFalseOrNull otherwise.
T           | COALESCE(T v1, T v2, ...)                                   | Returns the first v that is not NULL, or NULL if all v's are NULL.
T           | CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END          | When a = b, returns c; when a = d, returns e; else returns f.
T           | CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END            | When a = true, returns b; when c = true, returns d; else returns e.
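A minimal Hive example with made-up column names:

SELECT order_id,
       CASE WHEN amount > 0 THEN 'debit' ELSE 'credit' END AS direction
FROM orders;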
BR,
Daniel

On Tue, Dec 16, 2014 at 6:37 PM, Gayathri Swaroop 
wrote:
>
> Hi,
>
> I have oracle query which i want to hit against a hive table.
> The oracle query has a case if exists select
> would this work in hive?
> This is my oracle query that needs to be converted to hive.
>
> select distinct CONTR.BMF_PARTNER_ID AS BMF_BMF_PARTNER_ID,
> CONTR.BUSINESS_PARTNER AS BMF_BUS_PRTNR_ID,
> CONTR.CONTRACT_ACCOUNT AS BMF_CONTR_ACCT_ID,
> CONTR.CONTRACT_NBR AS BMF_CONTR_ID,
> CONTR.ESIID AS BMF_ESI_ID,
> CONTR.INSTALLATION_ID AS BMF_INSTALLATION_ID,
> CONTR.SEGMENT_TYPE_CD AS BMF_SEGMENT_TYPE_CD,
> CONTR.PARTNER_TYPE AS BMF_PARTNER_TYPE,
> CONTR.ACTUAL_MOVEIN_DATE AS BMF_ACTUAL_MVI_DT,
> CONTR.ACTUAL_MOVEOUT_DATE AS BMF_ACTUAL_MVO_DT,
> CONTR.ENRL_RATE_CATEGORY AS BMF_ENRL_RATE_CATEGORY,
> CONTR.CAMPAIGN_CD AS BMF_CAMPAIGN_CD,
> CONTR.OFFER_CD AS BMF_OFFER_CD,
>case when exists (select * from KSS_ACTIVITY_STG_CURR_STAT C_ID
> where c_id.esiid = contr.esiid
> and c_id.contract_nbr = contr.contract_nbr
> and c_id.BMF_PARTNER_ID <> contr.BMF_PARTNER_ID
> and c_id.partner_type=2
> and c_id.actual_movein_date <
> to_date('09/30/2014','mm/dd/')
> and c_id.actual_moveout_date
> >=to_date('09/30/2014','mm/dd/'))
> then 'YES' else NULL end
> as IS_DUPLICATE_BMF
>  FROM   KSS_ESIID_LIST ESID INNER JOIN
>  KSS_ACTIVITY_STG_CURR_STAT CONTR ON
> ESID.BMF_PARTNER_ID = CONTR.BMF_PARTNER_ID
>  WHERE  contr.partner_type=2
> and CONTR.actual_movein_date <
> to_date('09/30/2014','mm/dd/')
> and CONTR.actual_moveout_date
> >=to_date('09/30/2014','mm/dd/');
>
>
> Thanks,
> Gayathri
>


Re: Concatenating ORC files

2014-12-12 Thread Daniel Haviv
Thanks a lot
I'll try it out

Daniel

> On 12 בדצמ׳ 2014, at 03:45, Prasanth Jayachandran 
>  wrote:
> 
> Thanks Daniel for filing the jira and the test case. I have put up a patch in 
> HIVE-9067 jira that should fix this issue. 
> 
> - Prasanth
> 
> 
>> On Thu, Dec 11, 2014 at 3:29 AM, Daniel Haviv 
>>  wrote:
>> Hi,
>> I've created a JIRA with a test case:
>> https://issues.apache.org/jira/browse/HIVE-9080
>> 
>> Thanks!
>> Daniel
>> 
>>> On Thu, Dec 11, 2014 at 12:49 AM, Prasanth Jayachandran 
>>>  wrote:
>>> I am unable to reproduce the case that causes exception that you are 
>>> seeing. Will be great if you can provide a repro.
>>> 
>>> - Prasanth
>>> 
>>> 
>>>> On Wed, Dec 10, 2014 at 1:43 PM, Prasanth Jayachandran 
>>>>  wrote:
>>>> I can see a bug for the case 2 where orc index is disabled. I have created 
>>>> a jira to track that issue.
>>>> https://issues.apache.org/jira/browse/HIVE-9067
>>>> 
>>>> I am not sure why does it fail in case 1 though. Can you create a jira 
>>>> with a reproducible case? I can take a look at it.
>>>> 
>>>> - Prasanth
>>>> 
>>>> 
>>>>> On Wed, Dec 10, 2014 at 10:37 AM, Daniel Haviv 
>>>>>  wrote:
>>>>> I've made a little experiment and recreated the table with 
>>>>> 'orc.create.index'='FALSE' and now it fails on something else:
>>>>> Error: java.io.IOException: 
>>>>> org.apache.hadoop.hive.ql.metadata.HiveException: 
>>>>> java.lang.ClassCastException: 
>>>>> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl
>>>>>  cannot be cast to 
>>>>> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$BooleanStatisticsImpl
>>>>> at 
>>>>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.map(MergeFileMapper.java:115)
>>>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>>>>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at 
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>> 
>>>>> It seems that the concatenation feature needs more work..
>>>>> 
>>>>> Daniel
>>>>> 
>>>>>> On Wed, Dec 10, 2014 at 4:54 PM, Daniel Haviv 
>>>>>>  wrote:
>>>>>> Hi,
>>>>>> I'm trying to use the new concatenate command merge small ORC files and 
>>>>>> file right away:
>>>>>> 
>>>>>>  alter table requests partition(day_ts=1418083200, hour_ts=1418151600) 
>>>>>> concatenate;
>>>>>> 
>>>>>> Diagnostic Messages for this Task:
>>>>>> Error: java.lang.IllegalArgumentException: Column has wrong number of 
>>>>>> index entries found: 0 expected: 1
>>>>>> at 
>>>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726)
>>>>>> at 
>>>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614)
>>>>>> at 
>>>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996)
>>>>>> at 
>>>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288)
>>>>>> at 
>>>>>> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215)
>>>>>> at 
>>>>>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
>>>>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>>>>>> at 
>>>>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>>>>>> at org.apache

Re: Concatenating ORC files

2014-12-11 Thread Daniel Haviv
Hi,
I've created a JIRA with a test case:
https://issues.apache.org/jira/browse/HIVE-9080

Thanks!
Daniel

On Thu, Dec 11, 2014 at 12:49 AM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> I am unable to reproduce the case that causes exception that you are
> seeing. Will be great if you can provide a repro.
>
> - Prasanth
>
>
> On Wed, Dec 10, 2014 at 1:43 PM, Prasanth Jayachandran <
> pjayachand...@hortonworks.com> wrote:
>
>> I can see a bug for the case 2 where orc index is disabled. I have
>> created a jira to track that issue.
>> https://issues.apache.org/jira/browse/HIVE-9067
>>
>> I am not sure why does it fail in case 1 though. Can you create a jira
>> with a reproducible case? I can take a look at it.
>>
>> - Prasanth
>>
>>
>> On Wed, Dec 10, 2014 at 10:37 AM, Daniel Haviv <
>> daniel.ha...@veracity-group.com> wrote:
>>
>>> I've made a little experiment and recreated the table
>>> with 'orc.create.index'='FALSE' and now it fails on something else:
>>> Error: java.io.IOException:
>>> org.apache.hadoop.hive.ql.metadata.HiveException:
>>> java.lang.ClassCastException:
>>> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl
>>> cannot be cast to
>>> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$BooleanStatisticsImpl
>>> at
>>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.map(MergeFileMapper.java:115)
>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>>> at
>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>
>>> It seems that the concatenation feature needs more work..
>>>
>>> Daniel
>>>
>>> On Wed, Dec 10, 2014 at 4:54 PM, Daniel Haviv <
>>> daniel.ha...@veracity-group.com> wrote:
>>>
>>>> Hi,
>>>> I'm trying to use the new concatenate command merge small ORC files and
>>>> file right away:
>>>>
>>>>  alter table requests partition(day_ts=1418083200, hour_ts=1418151600)
>>>> concatenate;
>>>>
>>>>  Diagnostic Messages for this Task:
>>>> Error: java.lang.IllegalArgumentException: Column has wrong number of
>>>> index entries found: 0 expected: 1
>>>> at
>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726)
>>>> at
>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614)
>>>> at
>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996)
>>>> at
>>>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288)
>>>> at
>>>> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215)
>>>> at
>>>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
>>>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>>>> at
>>>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>
>>>>
>>>>  Is there some property I need to set for ORC to be able to support
>>>> concatenation?
>>>>
>>>> Thanks,
>>>> Daniel
>>>>
>>>>
>>>>
>>>
>>
>


Re: Concatenating ORC files

2014-12-10 Thread Daniel Haviv
HI Prasanth,
The first attempt had ("orc.compress"="Snappy") and all the files under it
were created the same way so I'm assuming they all should have indexes
created.
In the second attempt I used ("orc.create.index"="false",
"orc.compress"="Snappy").

Thanks,
Daniel


On Wed, Dec 10, 2014 at 9:04 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> Hi Daniel
>
> In you first run, are there some files with “orc.create.index”=“false”?
> What are the table properties used to create ORC files in both cases?
>
> - Prasanth
>
>
> On Wed, Dec 10, 2014 at 7:55 AM, Daniel Haviv <
> daniel.ha...@veracity-group.com> wrote:
>
>> Hi,
>> I'm trying to use the new concatenate command merge small ORC files and
>> file right away:
>>
>>  alter table requests partition(day_ts=1418083200, hour_ts=1418151600)
>> concatenate;
>>
>>  Diagnostic Messages for this Task:
>> Error: java.lang.IllegalArgumentException: Column has wrong number of
>> index entries found: 0 expected: 1
>> at
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726)
>> at
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614)
>> at
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996)
>> at
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288)
>> at
>> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215)
>> at
>> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>
>>
>>  Is there some property I need to set for ORC to be able to support
>> concatenation?
>>
>> Thanks,
>> Daniel
>>
>>
>>
>


Re: Concatenating ORC files

2014-12-10 Thread Daniel Haviv
I've made a little experiment and recreated the table
with 'orc.create.index'='FALSE' and now it fails on something else:
Error: java.io.IOException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException:
org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl
cannot be cast to
org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$BooleanStatisticsImpl
at
org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.map(MergeFileMapper.java:115)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

It seems that the concatenation feature needs more work..

Daniel

On Wed, Dec 10, 2014 at 4:54 PM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Hi,
> I'm trying to use the new concatenate command merge small ORC files and
> file right away:
>
>  alter table requests partition(day_ts=1418083200, hour_ts=1418151600)
> concatenate;
>
> Diagnostic Messages for this Task:
> Error: java.lang.IllegalArgumentException: Column has wrong number of
> index entries found: 0 expected: 1
> at
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726)
> at
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614)
> at
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996)
> at
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288)
> at
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215)
> at
> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
>
> Is there some property I need to set for ORC to be able to support
> concatenation?
>
> Thanks,
> Daniel
>
>
>


Concatenating ORC files

2014-12-10 Thread Daniel Haviv
Hi,
I'm trying to use the new concatenate command to merge small ORC files and
it fails right away:

 alter table requests partition(day_ts=1418083200, hour_ts=1418151600)
concatenate;

Diagnostic Messages for this Task:
Error: java.lang.IllegalArgumentException: Column has wrong number of index
entries found: 0 expected: 1
at
org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:726)
at
org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1614)
at
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996)
at
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288)
at
org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:215)
at
org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


Is there some property I need to set for ORC to be able to support
concatenation?

Thanks,
Daniel


Re: Insert into dynamic partitions performance

2014-12-06 Thread Daniel Haviv
I see.
Thanks a lot that's very helpful!
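For the archive, the suggestion boils down to something like this sketch (table, column and partition names are made up):

set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table target partition (day_ts, hour_ts)
select col1, col2, day_ts, hour_ts from source;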

Daniel

> On 7 בדצמ׳ 2014, at 09:10, Gopal V  wrote:
> 
>> On 12/6/14, 10:11 PM, Daniel Haviv wrote:
>> 
>> Isn't there a way to make hive allocate more than one reducer for the whole 
>> job? Maybe one
>> per partition.
> 
> Yes.
> 
> hive.optimize.sort.dynamic.partition=true; does nearly that.
> 
> It raises the net number of useful reducers to total-num-of-partitions x 
> total-num-buckets.
> 
> If you have say, data being written into six hundred partitions with 1 bucket 
> each, it can use anywhere between 1 and 600 reducers (hashcode collisions 
> causing skews, of course).
> 
> It's turned off by default, because it really slows down the 1 partition 
> without buckets insert speed.
> 
> Cheers,
> Gopal
> 
>>>> On 7 בדצמ׳ 2014, at 06:06, Gopal V  wrote:
>>>> 
>>>> On 12/6/14, 6:27 AM, Daniel Haviv wrote:
>>>> Hi,
>>>> I'm executing an insert statement that goes over 1TB of data.
>>>> The map phase goes well but the reduce stage only used one reducer which 
>>>> becomes
>> a great bottleneck.
>>> 
>>> Are you inserting into a bucketed or sorted table?
>>> 
>>> If the destination table is bucketed + partitioned, you can use the dynamic 
>>> partition
>> sort optimization to get beyond the single reducer.
>>> 
>>> Cheers,
>>> Gopal
> 


Re: Insert into dynamic partitions performance

2014-12-06 Thread Daniel Haviv
Thanks Gopal,
I don't want to divide my data any further.

Isn't there a way to make hive allocate more than one reducer for the whole 
job? Maybe one per partition.

Daniel

> On 7 בדצמ׳ 2014, at 06:06, Gopal V  wrote:
> 
>> On 12/6/14, 6:27 AM, Daniel Haviv wrote:
>> Hi,
>> I'm executing an insert statement that goes over 1TB of data.
>> The map phase goes well but the reduce stage only used one reducer which 
>> becomes a great bottleneck.
> 
> Are you inserting into a bucketed or sorted table?
> 
> If the destination table is bucketed + partitioned, you can use the dynamic 
> partition sort optimization to get beyond the single reducer.
> 
> Cheers,
> Gopal


Insert into dynamic partitions performance

2014-12-06 Thread Daniel Haviv
Hi,
I'm executing an insert statement that goes over 1TB of data.
The map phase goes well but the reduce stage uses only one reducer, which 
becomes a great bottleneck.

 I've tried to set the number of reducers to four and added a distribute by 
clause to the statement but I'm still using just one reducer.

How can I increase the reducer's parallelism?

Thanks,
Daniel

Re: Start hiveserver2 as a daemon

2014-12-05 Thread Daniel Haviv
Try using screen
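For example:

screen -dmS hs2 hive --service hiveserver2

starts it in a detached session, and "screen -r hs2" reattaches to it later.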

Daniel

> On 5 בדצמ׳ 2014, at 19:08, peterm_second  wrote:
> 
> yes, 
> I've tried nohup , & even sh -c . 
> & works but after the first call get's executed in the background I get the 
> message you can see when a hadoop job is submitted to the cluster and then 
> the terminal get's frozen. I think the problem is in the 
> ext/hiveserver2.sh:hiveserver2 function. it's says something along the lines 
> of 
> exec $HADOOP jar $JAR $CLASS $HIVE_OPTS "$@" 
> I am not 100% how the exec command works , but somehow it re-owns the 
> launching terminal. My problem is aggravated by the fact that I am launching 
> hive using sshpass
> 
> Peter
> 
>> On 5.12.2014 18:55, Jörn Franke wrote:
>> Have you tried nohup ?
>> 
>> Le 5 déc. 2014 15:25, "peterm_second"  a écrit :
>>> Hi Guys,
>>> How can I launch the Hiveserver2 as a daemon.
>>> I am launching the hiverserv2 using sshpass and I can't detach   
>>> hiveserver2 from my terminal. Is there a way to deamonise the hiveserver2  ?
>>> 
>>> I've also tried using & but it's not working either, any thoughts ?
>>> 
>>> Regards,
>>> Peter
> 


Running hive inside a bash script

2014-12-02 Thread Daniel Haviv
Hi,
I have a bash script that runs a hive query and I would like it to do
something if the query succeeds and something else if it fails.
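The pattern I'm after is roughly (placeholder query):

hive -e "insert overwrite table target select * from source"
if [ $? -eq 0 ]; then
  echo "query ok"
else
  echo "query failed" >&2
  exit 1
fi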
My testing shows that a query failure does not change Hive's exit code;
what's the right way to achieve this?

Thanks,
Daniel


Re: Container launch failed Error

2014-11-24 Thread Daniel Haviv
Good luck
Share your results with us

Daniel

> On 24 בנוב׳ 2014, at 19:36, Amit Behera  wrote:
> 
> Hi Daniel,
> 
> Thanks a lot,
> 
> 
> I will do that and rerun the query. :)
> 
>> On Mon, Nov 24, 2014 at 10:59 PM, Daniel Haviv 
>>  wrote:
>> It is a problem as the application master needs to contact the other nodes
>> 
>> Try updating the hosts file on all the machines and try again.
>> 
>> Daniel
>> 
>>> On 24 בנוב׳ 2014, at 19:26, Amit Behera  wrote:
>>> 
>>> I did not modify in all the slaves. except slave 
>>> 
>>> will it be a problem ?
>>> 
>>> But for small data (up to 20 GB table) it is running and for 300GB table 
>>> only count(*) running sometimes and sometimes failed  
>>> 
>>> Thanks
>>> Amit  
>>> 
>>>> On Mon, Nov 24, 2014 at 10:37 PM, Daniel Haviv 
>>>>  wrote:
>>>> did you copy the hosts file to all the nodes?
>>>> 
>>>> Daniel
>>>> 
>>>>> On 24 בנוב׳ 2014, at 19:04, Amit Behera  wrote:
>>>>> 
>>>>> hi Daniel,
>>>>> 
>>>>> 
>>>>> this stacktrace same for other query .
>>>>> for different run I am getting slave7 sometime slave8... 
>>>>> 
>>>>> And also I registered all machine IPs in /etc/hosts 
>>>>> 
>>>>> Regards
>>>>> Amit
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mon, Nov 24, 2014 at 10:22 PM, Daniel Haviv 
>>>>>>  wrote:
>>>>>> It seems that the application master can't resolve slave6's name to an IP
>>>>>> 
>>>>>> Daniel
>>>>>> 
>>>>>>> On 24 בנוב׳ 2014, at 18:49, Amit Behera  wrote:
>>>>>>> 
>>>>>>> Hi Users,
>>>>>>> 
>>>>>>> my cluster(1+8) configuration:
>>>>>>> 
>>>>>>> RAM  : 32 GB each
>>>>>>> HDFS : 1.5 TB SSD
>>>>>>> CPU   : 8 core each
>>>>>>> 
>>>>>>> ---
>>>>>>> 
>>>>>>> I am trying to query on 300GB of table but I am able to run only select 
>>>>>>> query.
>>>>>>> 
>>>>>>> Except select query , for all other query I am getting following 
>>>>>>> exception.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Total jobs = 1
>>>>>>> Stage-1 is selected by condition resolver.
>>>>>>> Launching Job 1 out of 1
>>>>>>> Number of reduce tasks not specified. Estimated
>>>>>>> from input data size: 183
>>>>>>> In order to change the average load for a
>>>>>>> reducer (in bytes):
>>>>>>>   set
>>>>>>> hive.exec.reducers.bytes.per.reducer=
>>>>>>> In order to limit the maximum number of
>>>>>>> reducers:
>>>>>>>   set hive.exec.reducers.max=
>>>>>>> In order to set a constant number of reducers:
>>>>>>>   set mapreduce.job.reduces=
>>>>>>> Starting Job = job_1416831990090_0005, Tracking
>>>>>>> URL = http://master:8088/proxy/application_1416831990090_0005/
>>>>>>> Kill Command = /root/hadoop/bin/hadoop job 
>>>>>>> -kill job_1416831990090_0005
>>>>>>> Hadoop job information for Stage-1: number of
>>>>>>> mappers: 679; number of reducers: 183
>>>>>>> 2014-11-24 19:43:01,523 Stage-1 map = 0%, 
>>>>>>> reduce = 0%
>>>>>>> 2014-11-24 19:43:22,730 Stage-1 map = 53%, 
>>>>>>> reduce = 0%, Cumulative CPU 625.19 sec
>>>>>>> 2014-11-24 19:43:23,778 Stage-1 map = 100%, 
>>>>>>> reduce = 100%
>>>>>>> MapReduce Total cumulative CPU time: 10 minutes
>>>>>>> 25 seconds 190 msec
>>>>>>> Ended Job = job_1416831990090_0005 with errors
>>>>>>> Error during job, obtaining debugging
>>>>>>> information...
>>>>>>> Examining task ID:
>>>>>>> task

Re: Container launch failed Error

2014-11-24 Thread Daniel Haviv
It is a problem as the application master needs to contact the other nodes

Try updating the hosts file on all the machines and try again.
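i.e. every node's /etc/hosts should carry entries along these lines (the addresses are placeholders):

192.168.1.10  master
192.168.1.16  slave6
192.168.1.17  slave7
192.168.1.18  slave8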

Daniel

> On 24 בנוב׳ 2014, at 19:26, Amit Behera  wrote:
> 
> I did not modify in all the slaves. except slave 
> 
> will it be a problem ?
> 
> But for small data (up to 20 GB table) it is running and for 300GB table only 
> count(*) running sometimes and sometimes failed  
> 
> Thanks
> Amit  
> 
>> On Mon, Nov 24, 2014 at 10:37 PM, Daniel Haviv 
>>  wrote:
>> did you copy the hosts file to all the nodes?
>> 
>> Daniel
>> 
>>> On 24 בנוב׳ 2014, at 19:04, Amit Behera  wrote:
>>> 
>>> hi Daniel,
>>> 
>>> 
>>> this stacktrace same for other query .
>>> for different run I am getting slave7 sometime slave8... 
>>> 
>>> And also I registered all machine IPs in /etc/hosts 
>>> 
>>> Regards
>>> Amit
>>> 
>>> 
>>> 
>>>> On Mon, Nov 24, 2014 at 10:22 PM, Daniel Haviv 
>>>>  wrote:
>>>> It seems that the application master can't resolve slave6's name to an IP
>>>> 
>>>> Daniel
>>>> 
>>>>> On 24 בנוב׳ 2014, at 18:49, Amit Behera  wrote:
>>>>> 
>>>>> Hi Users,
>>>>> 
>>>>> my cluster(1+8) configuration:
>>>>> 
>>>>> RAM  : 32 GB each
>>>>> HDFS : 1.5 TB SSD
>>>>> CPU   : 8 core each
>>>>> 
>>>>> ---
>>>>> 
>>>>> I am trying to query on 300GB of table but I am able to run only select 
>>>>> query.
>>>>> 
>>>>> Except select query , for all other query I am getting following 
>>>>> exception.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Total jobs = 1
>>>>> Stage-1 is selected by condition resolver.
>>>>> Launching Job 1 out of 1
>>>>> Number of reduce tasks not specified. Estimated
>>>>> from input data size: 183
>>>>> In order to change the average load for a
>>>>> reducer (in bytes):
>>>>>   set
>>>>> hive.exec.reducers.bytes.per.reducer=
>>>>> In order to limit the maximum number of
>>>>> reducers:
>>>>>   set hive.exec.reducers.max=
>>>>> In order to set a constant number of reducers:
>>>>>   set mapreduce.job.reduces=
>>>>> Starting Job = job_1416831990090_0005, Tracking
>>>>> URL = http://master:8088/proxy/application_1416831990090_0005/
>>>>> Kill Command = /root/hadoop/bin/hadoop job 
>>>>> -kill job_1416831990090_0005
>>>>> Hadoop job information for Stage-1: number of
>>>>> mappers: 679; number of reducers: 183
>>>>> 2014-11-24 19:43:01,523 Stage-1 map = 0%, 
>>>>> reduce = 0%
>>>>> 2014-11-24 19:43:22,730 Stage-1 map = 53%, 
>>>>> reduce = 0%, Cumulative CPU 625.19 sec
>>>>> 2014-11-24 19:43:23,778 Stage-1 map = 100%, 
>>>>> reduce = 100%
>>>>> MapReduce Total cumulative CPU time: 10 minutes
>>>>> 25 seconds 190 msec
>>>>> Ended Job = job_1416831990090_0005 with errors
>>>>> Error during job, obtaining debugging
>>>>> information...
>>>>> Examining task ID:
>>>>> task_1416831990090_0005_m_05 (and more) from job
>>>>> job_1416831990090_0005
>>>>> Examining task ID:
>>>>> task_1416831990090_0005_m_42 (and more) from job
>>>>> job_1416831990090_0005
>>>>> Examining task ID:
>>>>> task_1416831990090_0005_m_35 (and more) from job
>>>>> job_1416831990090_0005
>>>>> Examining task ID:
>>>>> task_1416831990090_0005_m_65 (and more) from job
>>>>> job_1416831990090_0005
>>>>> Examining task ID:
>>>>> task_1416831990090_0005_m_02 (and more) from job
>>>>> job_1416831990090_0005
>>>>> Examining task ID:
>>>>> task_1416831990090_0005_m_07 (and more) from job
>>>>> job_1416831990090_0005
>>>>> Examining task ID:
>>>>> task_1416831990090_0005_m_58 (and more) from job
>>>>> job_1416831990090_0005
>>>>> Examini

Re: Container launch failed Error

2014-11-24 Thread Daniel Haviv
did you copy the hosts file to all the nodes?

Daniel

> On 24 בנוב׳ 2014, at 19:04, Amit Behera  wrote:
> 
> hi Daniel,
> 
> 
> this stacktrace same for other query .
> for different run I am getting slave7 sometime slave8... 
> 
> And also I registered all machine IPs in /etc/hosts 
> 
> Regards
> Amit
> 
> 
> 
>> On Mon, Nov 24, 2014 at 10:22 PM, Daniel Haviv 
>>  wrote:
>> It seems that the application master can't resolve slave6's name to an IP
>> 
>> Daniel
>> 
>>> On 24 בנוב׳ 2014, at 18:49, Amit Behera  wrote:
>>> 
>>> Hi Users,
>>> 
>>> my cluster(1+8) configuration:
>>> 
>>> RAM  : 32 GB each
>>> HDFS : 1.5 TB SSD
>>> CPU   : 8 core each
>>> 
>>> ---
>>> 
>>> I am trying to query on 300GB of table but I am able to run only select 
>>> query.
>>> 
>>> Except select query , for all other query I am getting following exception.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Total jobs = 1
>>> Stage-1 is selected by condition resolver.
>>> Launching Job 1 out of 1
>>> Number of reduce tasks not specified. Estimated
>>> from input data size: 183
>>> In order to change the average load for a
>>> reducer (in bytes):
>>>   set
>>> hive.exec.reducers.bytes.per.reducer=
>>> In order to limit the maximum number of
>>> reducers:
>>>   set hive.exec.reducers.max=
>>> In order to set a constant number of reducers:
>>>   set mapreduce.job.reduces=
>>> Starting Job = job_1416831990090_0005, Tracking
>>> URL = http://master:8088/proxy/application_1416831990090_0005/
>>> Kill Command = /root/hadoop/bin/hadoop job 
>>> -kill job_1416831990090_0005
>>> Hadoop job information for Stage-1: number of
>>> mappers: 679; number of reducers: 183
>>> 2014-11-24 19:43:01,523 Stage-1 map = 0%, 
>>> reduce = 0%
>>> 2014-11-24 19:43:22,730 Stage-1 map = 53%, 
>>> reduce = 0%, Cumulative CPU 625.19 sec
>>> 2014-11-24 19:43:23,778 Stage-1 map = 100%, 
>>> reduce = 100%
>>> MapReduce Total cumulative CPU time: 10 minutes
>>> 25 seconds 190 msec
>>> Ended Job = job_1416831990090_0005 with errors
>>> Error during job, obtaining debugging
>>> information...
>>> Examining task ID:
>>> task_1416831990090_0005_m_05 (and more) from job
>>> job_1416831990090_0005
>>> Examining task ID:
>>> task_1416831990090_0005_m_42 (and more) from job
>>> job_1416831990090_0005
>>> Examining task ID:
>>> task_1416831990090_0005_m_35 (and more) from job
>>> job_1416831990090_0005
>>> Examining task ID:
>>> task_1416831990090_0005_m_65 (and more) from job
>>> job_1416831990090_0005
>>> Examining task ID:
>>> task_1416831990090_0005_m_02 (and more) from job
>>> job_1416831990090_0005
>>> Examining task ID:
>>> task_1416831990090_0005_m_07 (and more) from job
>>> job_1416831990090_0005
>>> Examining task ID:
>>> task_1416831990090_0005_m_58 (and more) from job
>>> job_1416831990090_0005
>>> Examining task ID:
>>> task_1416831990090_0005_m_43 (and more) from job
>>> job_1416831990090_0005
>>> 
>>> 
>>> Task with the most failures(4): 
>>> -
>>> Task ID:
>>>   task_1416831990090_0005_m_05
>>> 
>>> 
>>> URL:
>>>  
>>> http://master:8088/taskdetails.jsp?jobid=job_1416831990090_0005&tipid=task_1416831990090_0005_m_05
>>> -
>>> Diagnostic Messages for this Task:
>>> Container launch failed for
>>> container_1416831990090_0005_01_000112 :
>>> java.lang.IllegalArgumentException: java.net.UnknownHostException:
>>> slave6
>>> at
>>> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>>> at
>>> org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:397)
>>> at
>>> org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:233)
>>> at
>>> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:211)
>>> at
>>> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtoco

Re: Container launch failed Error

2014-11-24 Thread Daniel Haviv
It seems that the application master can't resolve slave6's name to an IP

Daniel

> On 24 בנוב׳ 2014, at 18:49, Amit Behera  wrote:
> 
> Hi Users,
> 
> my cluster(1+8) configuration:
> 
> RAM  : 32 GB each
> HDFS : 1.5 TB SSD
> CPU   : 8 core each
> 
> ---
> 
> I am trying to query on 300GB of table but I am able to run only select query.
> 
> Except select query , for all other query I am getting following exception.
> 
> 
> 
> 
> 
> Total jobs = 1
> Stage-1 is selected by condition resolver.
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated
> from input data size: 183
> In order to change the average load for a
> reducer (in bytes):
>   set
> hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of
> reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Starting Job = job_1416831990090_0005, Tracking
> URL = http://master:8088/proxy/application_1416831990090_0005/
> Kill Command = /root/hadoop/bin/hadoop job 
> -kill job_1416831990090_0005
> Hadoop job information for Stage-1: number of
> mappers: 679; number of reducers: 183
> 2014-11-24 19:43:01,523 Stage-1 map = 0%, 
> reduce = 0%
> 2014-11-24 19:43:22,730 Stage-1 map = 53%, 
> reduce = 0%, Cumulative CPU 625.19 sec
> 2014-11-24 19:43:23,778 Stage-1 map = 100%, 
> reduce = 100%
> MapReduce Total cumulative CPU time: 10 minutes
> 25 seconds 190 msec
> Ended Job = job_1416831990090_0005 with errors
> Error during job, obtaining debugging
> information...
> Examining task ID:
> task_1416831990090_0005_m_05 (and more) from job
> job_1416831990090_0005
> Examining task ID:
> task_1416831990090_0005_m_42 (and more) from job
> job_1416831990090_0005
> Examining task ID:
> task_1416831990090_0005_m_35 (and more) from job
> job_1416831990090_0005
> Examining task ID:
> task_1416831990090_0005_m_65 (and more) from job
> job_1416831990090_0005
> Examining task ID:
> task_1416831990090_0005_m_02 (and more) from job
> job_1416831990090_0005
> Examining task ID:
> task_1416831990090_0005_m_07 (and more) from job
> job_1416831990090_0005
> Examining task ID:
> task_1416831990090_0005_m_58 (and more) from job
> job_1416831990090_0005
> Examining task ID:
> task_1416831990090_0005_m_43 (and more) from job
> job_1416831990090_0005
> 
> 
> Task with the most failures(4): 
> -
> Task ID:
>   task_1416831990090_0005_m_05
> 
> 
> URL:
>  
> http://master:8088/taskdetails.jsp?jobid=job_1416831990090_0005&tipid=task_1416831990090_0005_m_05
> -
> Diagnostic Messages for this Task:
> Container launch failed for
> container_1416831990090_0005_01_000112 :
> java.lang.IllegalArgumentException: java.net.UnknownHostException:
> slave6
>   at
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>   at
> org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:397)
>   at
> org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:233)
>   at
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:211)
>   at
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:189)
>   at
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:110)
>   at
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
>   at
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>   at
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
>   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.UnknownHostException: slave6
>   ... 12 more
> 
> 
> 
> 
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched: 
> Job 0: Map: 679  Reduce: 183   Cumulative CPU:
> 625.19 sec   HDFS Read: 0 HDFS Write: 0 FAIL
> Total MapReduce CPU Time Spent: 10 minutes 25
> seconds 190 mse
>
> 
> 
> Please help me to fix the issue.
> 
> Thanks
> Amit


Problem after upgrading to hive 0.14

2014-11-22 Thread Daniel Haviv
Hi,
After upgrading to hive 0.14 any query I run I hit the following message:
. . . . . . . . . . . . . . . .> ;
INFO  : Tez session hasn't been created yet. Opening session
Error: Error while processing statement: FAILED: Execution Error, return
code -101 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvBasedOnMRAMEnv(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V
(state=08S01,code=-101)
0: jdbc:hive2://localhost:1> Closing: 0: jdbc:hive2://localhost:1

when I look into the HiveServer2 logs these are the errors I get:
1.
2014-11-22 10:22:24,812 ERROR [HiveServer2-Background-Pool: Thread-123]:
operation.Operation (SQLOperation.java:run(199)) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing
statement: FAILED: Execution Error, return code -101 from
org.apache.hadoop.hive.ql.exec.tez.TezTask.
org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvBasedOnMRAMEnv(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V
at
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:314)
at
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:146)
at
org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
at
org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
at
org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NoSuchMethodError:
org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvBasedOnMRAMEnv(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V
at
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:169)
at
org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:234)
at
org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:999)
at
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
... 12 more

2.
2014-11-22 10:22:33,015 ERROR [HiveServer2-Handler-Pool: Thread-35]:
server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred
during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.thrift.transport.TTransportException
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
at
org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more


Any ideas what can cause this ?

Thanks,
Daniel