Re: struggling with Hive ACID - need help

2015-01-15 Thread Devopam Mittra
Any ideas as to where I am going wrong here? In the meantime, I observed
that by changing the Hive txn manager, some other things were breaking in
my existing setup, so I have reverted to the old configuration for now to
keep the cluster in a workable state.


regards
Devopam


On Wed, Jan 14, 2015 at 4:25 PM, Devopam Mittra  wrote:

> Env: Hive 0.14 , HDP2.2, Centos 6.6
>
> I created a simple table (ref syntax below)
> DROP TABLE IF EXISTS  twitter_user_info;
>
> CREATE TABLE twitter_user_info
> ( userid BIGINT COMMENT 'User Id',
>   username STRING COMMENT 'User Name',
>   is_latest BOOLEAN COMMENT 'Currently active',
>   last_modified_ts TIMESTAMP COMMENT 'Last modified timestamp'
> )
> COMMENT 'Twitter User Info Table'
> CLUSTERED BY (userid,is_latest) SORTED BY (userid) INTO 30 BUCKETS
> STORED AS ORC
> TBLPROPERTIES('transactional'='true');
>
>
> I set the following parameters in the Hive conf via Ambari:
>
> set hive.support.concurrency=true;
> set hive.enforce.bucketing=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.compactor.initiator.on=true;
> set hive.compactor.worker.threads=100;
>
> I inserted a few rows via the following query (not directly relevant, but pasting it for reference):
> INSERT INTO TABLE twitter_user_info
> SELECT created_by_id, created_by_name,'Y',MAX(created_at)
> FROM twitter_data B
> WHERE b.created_by_id IS NOT NULL
> AND b.created_by_id NOT IN (
>  SELECT created_by_id FROM
> (SELECT created_by_id,COUNT(DISTINCT
> created_by_name)
>  FROM twitter_data
>  WHERE created_by_id IS NOT NULL
>  GROUP BY created_by_id
>  HAVING COUNT(DISTINCT created_by_name) >1) A
> )
> GROUP BY b.created_by_id,b.created_by_name;
>
> For each operation below I am hitting the same error, and am clueless at
> this point.
>
> hive> UPDATE twitter_user_info
> > SET is_latest = false
> > WHERE userid = 298779403
> > AND last_modified_ts = '2015-01-13 14:25:14';
> *FAILED: NoMatchingMethodException No matching method for class
> org.apache.hadoop.hive.ql.udf.UDFToLong with
> (struct). Possible choices:
> _FUNC_(bigint)  _FUNC_(boolean)  _FUNC_(decimal(38,18))  _FUNC_(double)
> _FUNC_(float)  _FUNC_(int)  _FUNC_(smallint)  _FUNC_(string)
> _FUNC_(timestamp)  _FUNC_(tinyint)  _FUNC_(void)  *
> hive> DELETE FROM twitter_user_info WHERE userid=100;
>
> *FAILED: NoMatchingMethodException No matching method for class
> org.apache.hadoop.hive.ql.udf.UDFToLong with
> (struct). Possible choices:
> _FUNC_(bigint)  _FUNC_(boolean)  _FUNC_(decimal(38,18))  _FUNC_(double)
> _FUNC_(float)  _FUNC_(int)  _FUNC_(smallint)  _FUNC_(string)
> _FUNC_(timestamp)  _FUNC_(tinyint)  _FUNC_(void)  *
>
> Any pointers are welcome
>
> --
> Devopam Mittra
> Life and Relations are not binary
>



-- 
Devopam Mittra
Life and Relations are not binary
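
One experiment worth trying against the DDL above (a sketch only, not a
confirmed fix): the error complains about casting a struct of the two
clustering keys to bigint, so rebuilding the table bucketed on userid alone,
without the SORTED BY clause, may isolate whether the composite CLUSTERED BY
is what trips up UPDATE/DELETE. Table and column names follow the original
post.

```sql
-- Hypothetical variant of the original DDL for testing: bucket on
-- userid alone instead of (userid, is_latest), keeping ORC + ACID.
DROP TABLE IF EXISTS twitter_user_info;

CREATE TABLE twitter_user_info
( userid BIGINT COMMENT 'User Id',
  username STRING COMMENT 'User Name',
  is_latest BOOLEAN COMMENT 'Currently active',
  last_modified_ts TIMESTAMP COMMENT 'Last modified timestamp'
)
COMMENT 'Twitter User Info Table'
CLUSTERED BY (userid) INTO 30 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Then retry the failing statement:
UPDATE twitter_user_info
SET is_latest = false
WHERE userid = 298779403
  AND last_modified_ts = '2015-01-13 14:25:14';
```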


full query submitted to hiveserver2

2015-01-15 Thread Vikas Parashar
Hi,

I have written an application that connects to my HiveServer2. I want to
check how many queries have been submitted to Hive, along with the full
details of each query. But when I grep for this in my HiveServer2 logs, I
get only the information below:

[abc@xyz ~]# cat /var/log/hive/hiveserver2.log | grep -ir --color starting\
command
2015-01-15 07:08:43,695 INFO  [pool-5-thread-16]: ql.Driver
(Driver.java:execute(1197)) - Starting command: use default
2015-01-15 07:08:43,885 INFO  [pool-5-thread-16]: ql.Driver
(Driver.java:execute(1197)) - Starting command: use default
2015-01-15 07:08:44,085 INFO  [pool-4-thread-162]: ql.Driver
(Driver.java:execute(1197)) - *Starting command: select /*+ STREAMTABLE(a)
*/*


When I tried "yarn application -list", the name field gave me only partial
info:

[abc@xyz~]# yarn application -list
15/01/15 08:38:08 INFO impl.TimelineClientImpl: Timeline service address:
http://xyz:8188/ws/v1/timeline/
15/01/15 08:38:09 INFO client.RMProxy: Connecting to ResourceManager at
xyz:8050
Total number of applications (application-types: [] and states: [SUBMITTED,
ACCEPTED, RUNNING]):9
Application-IdApplication-NameApplication-Type
 User Queue State   Final-State   Progress
  Tracking-URL
application_1419417277982_0675 *Select /*+ STREAMTABL...*
lds.allpointoptin(Stage-10)   MAPREDUCE  hive   default
  RUNNING UNDEFINED 30.45%   http://xyz:38273



I am not getting the full query that I submitted from my application. Can
anybody please shed some light on this? Where (which log location, or which
configuration have I missed) can I get the full query?

Thanks in Advance!

Rg:
Vicky
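
Long statements are usually wrapped across several lines in hiveserver2.log,
so a plain grep for "Starting command" cuts them off; printing a few lines of
trailing context often recovers the full text. (Hive 0.14 also has per-query
operation logs when hive.server2.logging.operation.enabled is set, which may
be worth checking.) A sketch, using a fabricated log snippet so the example
is self-contained:

```shell
#!/bin/sh
# Build a small sample log that wraps one query across lines, the way
# hiveserver2.log does for long statements.
cat > /tmp/hs2-sample.log <<'EOF'
2015-01-15 07:08:43,695 INFO  [pool-5-thread-16]: ql.Driver
(Driver.java:execute(1197)) - Starting command: select /*+ STREAMTABLE(a)
*/ a.id, b.name
from t1 a join t2 b on a.id = b.id
2015-01-15 07:08:44,085 INFO  [pool-4-thread-162]: ql.Driver
(Driver.java:execute(1197)) - Starting command: use default
EOF

# -A 3: also print 3 lines after each match so wrapped SQL is not cut off.
grep -A 3 'Starting command' /tmp/hs2-sample.log
```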


Load columns changed name

2015-01-15 Thread Patcharee Thongtra

Hi,

I have a Hive table with a column whose name was changed. Pig is not able 
to load data from this column; it comes back all empty.


Any ideas how to fix it?

BR,
Patcharee
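
Hard to say without more detail, but one possible culprit (an assumption,
not a confirmed diagnosis) is that formats such as ORC resolve columns
against the schema recorded at write time, so files written before the
rename may not line up with the new name. A sketch of checking what the
metastore reports, with a hypothetical table and column names:

```sql
-- Hive rename syntax (the type must be restated):
ALTER TABLE my_table CHANGE old_col new_col STRING;

-- Verify what the metastore now reports for the column:
DESCRIBE my_table;

-- In Pig, loading via HCatLoader picks up the metastore schema:
-- A = LOAD 'my_table' USING org.apache.hive.hcatalog.pig.HCatLoader();
```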


Re: Hive being slow

2015-01-15 Thread Abhishek kumar
Hi,

Thanks for the reply.

I tried that, but no luck. The map-reduce job seems to be stuck (taking a lot
of time, just for 65 lakh (6.5 million) HBase rows). I am attaching the log
file (or http://pastebin.com/BUYDUiEu)

My only question is why the filter push-down for the row-key (*startKey* and
*stopKey* for the *Scanner*) is not happening in HBase. If the push-down
happens, then HBase will resolve this Scanner very fast and, regardless of
whether an MR job runs, the query resolution will be very fast.

--
Abhishek

On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan 
wrote:

> Can you run your query with following config:
>
> hive> set hive.fetch.task.conversion=none;
>
> and run your two queries with this. Lets see if this makes a difference.
> My expectation is this will result in MR job getting launched and thus
> runtimes might be different.
>
> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar 
> wrote:
>
>> First I tried running the query: select * from table1 where id = 'value';
>> It was very fast, as expected since Hbase replied the results very fast.
>> In this case, I observed no map/reduce task getting spawned.
>>
>> Now, for the query, select * from table1 where id > 'zzz', I expected
>> the filter push down to happen (at least the 0.14 code says). And since,
>> there were no results found, so Hbase will again reply very fast and thus
>> hive should output the query's result very fast. But, this is not
>> happening, and from the logs of datanode, it looks like a lot of reads are
>> happening (close to full table scan of 10GBs of data). I expected the
>> response time to be very close to the above query's time.
>>
>> I will check about the number of task getting launched.
>>
>> My questions are:
>> * Why there was no any filter pushdown (id > 'zzz') happening for this
>> very simple query.
>> * Since this query can only be resolved from HBase, will Hive launch map
>> tasks (last time, I guess I observed no map task getting launched)
>>
>> --
>> Abhishek
>>
>> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan 
>> wrote:
>>
>>> Hi Abhishek,
>>>
>>> How are you determining its resulting in full table scan? One way to
>>> ascertain that filter got pushed down is to see how many tasks were
>>> launched for your query, with and without filter. One would expect lower #
>>> of splits (and thus tasks) for query having filter.
>>>
>>> Thanks,
>>> Ashutosh
>>>
>>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <
>>> abhishekiit...@gmail.com> wrote:
>>>
 Hi,

 I am using hive 0.14 which runs over hbase (having ~10 GB of data). I
 am facing issues in terms of slowness when querying over Hbase. My query
 looks like following:

 select * from table1 where id > '';  (id is the row-key)

 As per the hive-code, id > 'zzz', is getting pushed to Hbase scanner as
 'startKey'. Now given there are no such rows-keys (id) which satisfies this
 criteria, this query should be extremely fast. But hive is taking a lot of
 time, looks like full hbase table scan.
 Can someone let me know where am I wrong in understanding the whole
 thing?

 --
 Abhishek

>>>
>>>
>>
>
0: jdbc:hive2://localhost:1> select * from events where id = 'some_id';
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
WARN  : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_local1981153761_0018
INFO  : The url to track the job: http://localhost:8080/
INFO  : Job running in-process (local Hadoop)
INFO  : Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
INFO  : 2015-01-15 08:22:02,398 Stage-1 map = 0%,  reduce = 0%
+--------------------------------------------------------------+-----------+
| events.values                                                | events.id |
+--------------------------------------------------------------+-----------+
| {"eventName":"value","eventTs":"1417258870867","key2":"..."} | some_id   |
+--------------------------------------------------------------+-----------+
1 row selected (15.882 seconds)
INFO  : 2015-01-15 08:22:03,522 Stage-1 map = 100%,  reduce = 0%
INFO  : Ended Job = job_local1981153761_0018
0: jdbc:hive2://localhost:1> 
0: jdbc:hive2://localhost:1> 
0: jdbc:hive2://localhost:1> select * from events where id > 'zzz' AND id < '' limit 1;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
WARN  : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with Too
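
One way to see whether the row-key predicate actually reaches the HBase
scanner (a sketch; the table and column names follow the thread) is to
compare plans with EXPLAIN and look at the TableScan operator:

```sql
-- Force an MR plan so fetch-task conversion does not hide the scan:
set hive.fetch.task.conversion=none;

-- If the predicate is pushed down, the TableScan for table1 should carry a
-- filterExpr such as (id > 'zzz'); if the condition appears only in a
-- separate Filter Operator, HBase is being scanned in full and the rows
-- are filtered on the Hive side.
EXPLAIN select * from table1 where id > 'zzz';
```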

NegativeArraySizeException in table join

2015-01-15 Thread Guodong Wang
Hi,

I am using hive 0.13.1 and currently I am blocked by a bug when joining 2
tables. Here is the sample query.

INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name, type)
SELECT COALESCE(b.resource_id, a.id) AS id,
   a.timstamp,
   a.payload,
   a.name,
   a.type
FROM test_data a LEFT OUTER JOIN id_mapping b on a.id = b.id
WHERE a.date='2015-01-17'
AND a.name IN ('a', 'b', 'c')
AND a.type <= 14;

It turns out that when there are more than 25,000 joined rows for a specific
id, the Hive MR job fails, throwing NegativeArraySizeException.

Here is the stack trace

2015-01-15 14:38:42,693 ERROR
org.apache.hadoop.hive.ql.exec.persistence.RowContainer:
java.lang.NegativeArraySizeException
at 
org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at 
org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
at 
org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2015-01-15 14:38:42,707 FATAL ExecReducer:
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.NegativeArraySizeException
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.NegativeArraySizeException
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
... 11 more
Caused by: java.lang.NegativeArraySizeException
at 
org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at 
org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
at 
org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
at 
org.apache.hadoop.mapred.SequenceFileRecordRe

Tez Hive Job failing during initialization

2015-01-15 Thread peterm_second

Hi Guys,
I am running a Docker-based Hadoop cluster, with Hive 0.14 and Tez 0.6.
I had successfully managed to make my setup work before; I've increased
the input data to my jobs and now I get this exception:


2015-01-15 14:56:32,872 ERROR [HistoryEventHandlingThread] 
yarn.YarnUncaughtExceptionHandler: Thread 
Thread[HistoryEventHandlingThread,5,main] threw an Exception.
java.util.ConcurrentModificationException
at java.util.Hashtable$Enumerator.next(Hashtable.java:1367)
at 
org.apache.hadoop.conf.Configuration.iterator(Configuration.java:2144)
at 
org.apache.tez.dag.history.utils.DAGUtils.convertConfigurationToATSMap(DAGUtils.java:365)
at 
org.apache.tez.dag.history.logging.impl.HistoryEventJsonConversion.convertAppLaunchedEvent(HistoryEventJsonConversion.java:139)
at 
org.apache.tez.dag.history.logging.impl.HistoryEventJsonConversion.convertToJson(HistoryEventJsonConversion.java:64)
at 
org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService.handleEvent(SimpleHistoryLoggingService.java:154)
at 
org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService.access$300(SimpleHistoryLoggingService.java:38)
at 
org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService$1.run(SimpleHistoryLoggingService.java:108)
at java.lang.Thread.run(Thread.java:745)


This frustrates me to no end. What could possibly be happening that 
would cause a ConcurrentModificationException?


The sql I am running is

select  pdepartment.PdDepartment , incidents.DayOfWeek , category.Category , 
count(incidents.incidntNum) NumberOfIncidents
from  incidents
JOIN category ON ( category.key = incidents.Category_Id )
JOIN pdepartment ON ( pdepartment.key = incidents.PdDistrict_Id )
group  by  pdepartment.PdDepartment, incidents.DayOfWeek, category.Category 
with cube


As I said, I am certain that my setup works; I've changed nothing, and I 
get only this one exception in the logs, and that's in the YARN log. The 
MR job logs are clean. The Tez Hadoop app isn't failing; it just hangs.


Have you guys seen anything like that before? Any ideas what the problem 
might be? I'd love any suggestions, because I am out of ideas.


Thanks,
Peter
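
For what it's worth, the trace shows the ATS history thread iterating a
Configuration (backed by a java.util.Properties, i.e. a Hashtable) while
something else mutates it. A minimal standalone reproduction of that failure
mode (not Tez code, just an illustration of the fail-fast behaviour, with
made-up config keys):

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Map;

public class CmeDemo {
    // Iterates a Hashtable while structurally modifying it, which trips
    // the fail-fast iterator -- the same exception class seen in the
    // Tez HistoryEventHandlingThread stack trace.
    static String iterateWhileMutating() {
        Hashtable<String, String> conf = new Hashtable<>();
        conf.put("tez.am.resource.memory.mb", "1024");
        conf.put("tez.queue.name", "default");
        try {
            for (Map.Entry<String, String> e : conf.entrySet()) {
                conf.put("injected.key", "x"); // structural modification mid-iteration
            }
            return "no exception";
        } catch (ConcurrentModificationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileMutating());
    }
}
```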


Re: Hive being slow

2015-01-15 Thread Ashutosh Chauhan
Which Hive version are you using?

On Thu, Jan 15, 2015 at 12:44 AM, Abhishek kumar 
wrote:

> Hi,
>
> Thanks for the reply.
>
> I tried that, but no luck. The map-reduce seems to be stuck (taking a lot
> of time, just for 65 lakhs of Hbase rows). I am attaching the log file (or
> http://pastebin.com/BUYDUiEu)
>
> My only question is why the filter push-down for row-key (*startKey* and
> *stopKey* for the *Scanner*) is not happening to Hbase. If the push-down
> happens, then Hbase will resolve this Scanner very fast and no matter MR
> job runs or not, the query resolution will be very fast.
>
> --
> Abhishek
>
> On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan 
> wrote:
>
>> Can you run your query with following config:
>>
>> hive> set hive.fetch.task.conversion=none;
>>
>> and run your two queries with this. Lets see if this makes a difference.
>> My expectation is this will result in MR job getting launched and thus
>> runtimes might be different.
>>
>> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar > > wrote:
>>
>>> First I tried running the query: select * from table1 where id =
>>> 'value';
>>> It was very fast, as expected since Hbase replied the results very fast.
>>> In this case, I observed no map/reduce task getting spawned.
>>>
>>> Now, for the query, select * from table1 where id > 'zzz', I expected
>>> the filter push down to happen (at least the 0.14 code says). And since,
>>> there were no results found, so Hbase will again reply very fast and thus
>>> hive should output the query's result very fast. But, this is not
>>> happening, and from the logs of datanode, it looks like a lot of reads are
>>> happening (close to full table scan of 10GBs of data). I expected the
>>> response time to be very close to the above query's time.
>>>
>>> I will check about the number of task getting launched.
>>>
>>> My questions are:
>>> * Why there was no any filter pushdown (id > 'zzz') happening for this
>>> very simple query.
>>> * Since this query can only be resolved from HBase, will Hive launch map
>>> tasks (last time, I guess I observed no map task getting launched)
>>>
>>> --
>>> Abhishek
>>>
>>> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan 
>>> wrote:
>>>
 Hi Abhishek,

 How are you determining its resulting in full table scan? One way to
 ascertain that filter got pushed down is to see how many tasks were
 launched for your query, with and without filter. One would expect lower #
 of splits (and thus tasks) for query having filter.

 Thanks,
 Ashutosh

 On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <
 abhishekiit...@gmail.com> wrote:

> Hi,
>
> I am using hive 0.14 which runs over hbase (having ~10 GB of data). I
> am facing issues in terms of slowness when querying over Hbase. My query
> looks like following:
>
> select * from table1 where id > '';  (id is the row-key)
>
> As per the hive-code, id > 'zzz', is getting pushed to Hbase scanner
> as 'startKey'. Now given there are no such rows-keys (id) which satisfies
> this criteria, this query should be extremely fast. But hive is taking a
> lot of time, looks like full hbase table scan.
> Can someone let me know where am I wrong in understanding the whole
> thing?
>
> --
> Abhishek
>


>>>
>>
>


Re: Hive being slow

2015-01-15 Thread Abhishek kumar
0.14.0

--
Abhishek

On Thu, Jan 15, 2015 at 10:43 PM, Ashutosh Chauhan 
wrote:

> which hive version you are using ?
>
> On Thu, Jan 15, 2015 at 12:44 AM, Abhishek kumar  > wrote:
>
>> Hi,
>>
>> Thanks for the reply.
>>
>> I tried that, but no luck. The map-reduce seems to be stuck (taking a lot
>> of time, just for 65 lakhs of Hbase rows). I am attaching the log file (or
>> http://pastebin.com/BUYDUiEu)
>>
>> My only question is why the filter push-down for row-key (*startKey* and
>> *stopKey* for the *Scanner*) is not happening to Hbase. If the push-down
>> happens, then Hbase will resolve this Scanner very fast and no matter MR
>> job runs or not, the query resolution will be very fast.
>>
>> --
>> Abhishek
>>
>> On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan 
>> wrote:
>>
>>> Can you run your query with following config:
>>>
>>> hive> set hive.fetch.task.conversion=none;
>>>
>>> and run your two queries with this. Lets see if this makes a difference.
>>> My expectation is this will result in MR job getting launched and thus
>>> runtimes might be different.
>>>
>>> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar <
>>> abhishekiit...@gmail.com> wrote:
>>>
 First I tried running the query: select * from table1 where id =
 'value';
 It was very fast, as expected since Hbase replied the results very
 fast. In this case, I observed no map/reduce task getting spawned.

 Now, for the query, select * from table1 where id > 'zzz', I expected
 the filter push down to happen (at least the 0.14 code says). And since,
 there were no results found, so Hbase will again reply very fast and thus
 hive should output the query's result very fast. But, this is not
 happening, and from the logs of datanode, it looks like a lot of reads are
 happening (close to full table scan of 10GBs of data). I expected the
 response time to be very close to the above query's time.

 I will check about the number of task getting launched.

 My questions are:
 * Why there was no any filter pushdown (id > 'zzz') happening for this
 very simple query.
 * Since this query can only be resolved from HBase, will Hive launch
 map tasks (last time, I guess I observed no map task getting launched)

 --
 Abhishek

 On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan >>> > wrote:

> Hi Abhishek,
>
> How are you determining its resulting in full table scan? One way to
> ascertain that filter got pushed down is to see how many tasks were
> launched for your query, with and without filter. One would expect lower #
> of splits (and thus tasks) for query having filter.
>
> Thanks,
> Ashutosh
>
> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <
> abhishekiit...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using hive 0.14 which runs over hbase (having ~10 GB of data). I
>> am facing issues in terms of slowness when querying over Hbase. My query
>> looks like following:
>>
>> select * from table1 where id > '';  (id is the row-key)
>>
>> As per the hive-code, id > 'zzz', is getting pushed to Hbase scanner
>> as 'startKey'. Now given there are no such rows-keys (id) which satisfies
>> this criteria, this query should be extremely fast. But hive is taking a
>> lot of time, looks like full hbase table scan.
>> Can someone let me know where am I wrong in understanding the whole
>> thing?
>>
>> --
>> Abhishek
>>
>
>

>>>
>>
>


Re: NegativeArraySizeException in table join

2015-01-15 Thread Ashutosh Chauhan
Seems like you are running into:
https://issues.apache.org/jira/browse/HIVE-9324

On Thu, Jan 15, 2015 at 1:53 AM, Guodong Wang  wrote:

> Hi,
>
> I am using hive 0.13.1 and currently I am blocked by a bug when joining 2
> tables. Here is the sample query.
>
> INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name,
> type)
> SELECT COALESCE(b.resource_id, a.id) AS id,
>a.timstamp,
>a.payload,
>a.name,
>a.type
> FROM test_data a LEFT OUTER JOIN id_mapping b on a.id = b.id
> WHERE a.date='2015-01-17'
> AND a.name IN ('a', 'b', 'c')
> AND a.type <= 14;
>
> It turns out that when there are more than 25000 joins rows on a specific
> id, hive MR job fails, throwing NegativeArraySizeException.
>
> Here is the stack trace
>
> 2015-01-15 14:38:42,693 ERROR 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer:
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>   at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>   at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>   at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>   at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 2015-01-15 14:38:42,707 FATAL ExecReducer: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>   ... 11 more
> Caused by: java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>   at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>   at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>   at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>   at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerializati

Re: NegativeArraySizeException in table join

2015-01-15 Thread Guodong Wang
Hi Ashutosh,

Thanks for your reply.

Not sure if HIVE-9324 is the same issue we hit.
We found it is a bug in CDH when using MR1 with Hive 0.13.1. The bug does
not occur when using YARN with 0.13.1.



Guodong

On Fri, Jan 16, 2015 at 1:21 AM, Ashutosh Chauhan 
wrote:

> Seems like you are hitting into :
> https://issues.apache.org/jira/browse/HIVE-9324
>
> On Thu, Jan 15, 2015 at 1:53 AM, Guodong Wang  wrote:
>
>> Hi,
>>
>> I am using hive 0.13.1 and currently I am blocked by a bug when joining 2
>> tables. Here is the sample query.
>>
>> INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name,
>> type)
>> SELECT COALESCE(b.resource_id, a.id) AS id,
>>a.timstamp,
>>a.payload,
>>a.name,
>>a.type
>> FROM test_data a LEFT OUTER JOIN id_mapping b on a.id = b.id
>> WHERE a.date='2015-01-17'
>> AND a.name IN ('a', 'b', 'c')
>> AND a.type <= 14;
>>
>> It turns out that when there are more than 25000 joins rows on a specific
>> id, hive MR job fails, throwing NegativeArraySizeException.
>>
>> Here is the stack trace
>>
>> 2015-01-15 14:38:42,693 ERROR 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer:
>> java.lang.NegativeArraySizeException
>>  at 
>> org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>>  at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>>  at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>>  at 
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>>  at 
>> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>>  at 
>> org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>>  at 
>> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>>  at 
>> org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>  at 
>> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>  at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>>  at 
>> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>>  at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>>  at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>>  at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>>  at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>>  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>  at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>  at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> 2015-01-15 14:38:42,707 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>>  at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>>  at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>>  at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>>  at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>>  at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>>  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>  at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>  at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>>  at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
>>  at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>>  ... 11 more
>> Caused by: java.lang.NegativeArraySizeException
>>  at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>>  at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:12

Re: NegativeArraySizeException in table join

2015-01-15 Thread Ashutosh Chauhan
Interesting. You mean to say there is no bug in Hive, but in some other
component (YARN / MR1)? The stack trace seems to indicate there is something
going on on the Hive side as well. Granted, the stack trace in HIVE-9324 is not
identical to yours, but both point to a problem in a similar area.

On Thu, Jan 15, 2015 at 7:53 PM, Guodong Wang  wrote:

> Hi Ashutosh,
>
> Thanks for your reply.
>
> Not sure if HIVE-9324 is the same issue we hit.
> We found it is a bug in CDH when using MR1 with Hive 0.13.1. The bug does
> not occur when using YARN with 0.13.1.
>
>
>
> Guodong
>
> On Fri, Jan 16, 2015 at 1:21 AM, Ashutosh Chauhan 
> wrote:
>
>> Seems like you are hitting:
>> https://issues.apache.org/jira/browse/HIVE-9324
>>
>> On Thu, Jan 15, 2015 at 1:53 AM, Guodong Wang  wrote:
>>
>>> Hi,
>>>
>>> I am using Hive 0.13.1 and am currently blocked by a bug when joining
>>> two tables. Here is a sample query.
>>>
>>> INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name,
>>> type)
>>> SELECT COALESCE(b.resource_id, a.id) AS id,
>>>a.timstamp,
>>>a.payload,
>>>a.name,
>>>a.type
>>> FROM test_data a LEFT OUTER JOIN id_mapping b on a.id = b.id
>>> WHERE a.date='2015-01-17'
>>> AND a.name IN ('a', 'b', 'c')
>>> AND a.type <= 14;
>>>
>>> It turns out that when there are more than 25,000 joined rows for a
>>> specific id, the Hive MR job fails, throwing a NegativeArraySizeException.
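
A quick way to check for such hot join keys is a sketch like the following (it assumes the `id_mapping` table and the 25,000-row threshold from the report above):

```sql
-- Sketch: count rows per join key in id_mapping to surface ids that
-- would produce more than 25,000 joined rows for a single key.
SELECT id, COUNT(*) AS n
FROM id_mapping
GROUP BY id
HAVING COUNT(*) > 25000;
```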
>>>
>>> Here is the stack trace
>>>
>>> 2015-01-15 14:38:42,693 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: java.lang.NegativeArraySizeException
>>> at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>>> at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>>> at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>>> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>>> at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>>> at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>>> at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>> at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>>> at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>>> at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>>> at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>>> at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>>> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>>> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>> at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> 2015-01-15 14:38:42,707 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>>> at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>>> at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>>> at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>>> at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>>> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>>> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>> at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>>> at org.apache.h

date_add statement problem

2015-01-15 Thread Yashwanth M
Hi,

I've been facing some issues using date_add().

I have a column 'day_key' that is a short integer (SMALLINT) and stores day
offsets relative to 1985/01/01.

I need to derive the actual date from this column.

date_add('1985/01/01', day_key)
However, I get an error saying the second argument needs to be an integer.

I also tried
date_add('1985/01/01', cast(day_key as int))
which resulted in an error saying the column is short.

Please help me with this.
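
A sketch of a possible workaround (not verified against a live cluster; it assumes Hive's built-in date_add(), which expects a 'yyyy-MM-dd' formatted date string rather than 'yyyy/MM/dd', and an INT day count — the table name below is a placeholder):

```sql
-- Sketch only: use a 'yyyy-MM-dd' date literal, and if an explicit
-- CAST(day_key AS INT) is still rejected for a SMALLINT column, forcing
-- integer promotion through arithmetic may help, since
-- smallint + int yields int in Hive's type rules.
SELECT date_add('1985-01-01', day_key + 0) AS actual_date
FROM some_table;  -- 'some_table' is a placeholder for your table
```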

Thanks
Regards,
Yashwanth M | +91 8884 177 884 | www.mu-sigma.com |

Disclaimer: http://www.mu-sigma.com/disclaimer.html