[jira] [Created] (HIVE-27099) Iceberg: select count(*) from table queries all data

2023-02-23 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27099:
---

 Summary: Iceberg: select count(*) from table queries all data
 Key: HIVE-27099
 URL: https://issues.apache.org/jira/browse/HIVE-27099
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


select count(*) is scanning all data. Even though the table has complete basic 
stats, it launched a Tez job which wasn't needed. The second issue is that it 
ended up scanning the ENTIRE 148 GB dataset, which is completely unnecessary: at 
a minimum the counts should have come from the Parquet file footers, and ideally 
the total record count should come from the Iceberg manifests themselves.

Data is stored in Parquet format in external tables. This may be broken 
specifically for Parquet; for ORC, Hive is able to read far less data (footer 
info only).

1. Consider fixing count(*) for Parquet (a quick way to check the current 
behaviour is sketched below).
2. After #1, check whether the stats can be read from the Iceberg manifests 
instead.
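
One way to verify whether the count is answered from stats rather than a scan 
(a sketch only; it assumes hive.compute.query.using.stats is honoured for 
Iceberg tables, which is part of what is in question here):

{noformat}
-- Sketch: with complete basic stats and the stats optimizer enabled, a native
-- table answers count(*) from the metastore with a Fetch-only plan (no Tez DAG).
SET hive.compute.query.using.stats=true;

EXPLAIN SELECT COUNT(*) FROM store_sales;
-- If the plan still shows a Map/Reduce TableScan (as in the plan below),
-- the stats path is not being used for this Iceberg table.
{noformat}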


{noformat}

explain select count(*) from store_sales;

Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: hive_20230223031934_2abeb3b9-8c18-4ff7-a8f9-df7368010189:5
  Edges:
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
  DagName: hive_20230223031934_2abeb3b9-8c18-4ff7-a8f9-df7368010189:5
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: store_sales
  Statistics: Num rows: 2879966589 Data size: 195666988943 
Basic stats: COMPLETE Column stats: COMPLETE
  Select Operator
Statistics: Num rows: 2879966589 Data size: 195666988943 
Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
  aggregations: count()
  minReductionHashAggr: 0.5
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Execution mode: vectorized
Reducer 2
Execution mode: vectorized
Reduce Operator Tree:
  Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
File Output Operator
  compressed: false
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
  table:
  input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink

53 rows selected (1.454 seconds)

0: jdbc:hive2://ve0:218> select count(*) from store_sales;
INFO  : Query ID = hive_20230223031940_9ff5d61d-1fe2-4476-a561-7820e4a3a5f8
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: 
hive_20230223031940_9ff5d61d-1fe2-4476-a561-7820e4a3a5f8
INFO  : Session is already open
INFO  : Dag name: select count(*) from store_sales (Stage-1)
INFO  : Status: Running (Executing on YARN cluster with App id 
application_1676286357243_0061)

--
VERTICES      MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--
Map 1 .......... container     SUCCEEDED    767        767        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 54.94 s
--
INFO  : Status: DAG finished successfully in 54.85 seconds
INFO  :
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATIONDURATION
INFO  : 
--
INFO  : C

Re: select count(*) from table;

2016-03-31 Thread Amey Barve
Hi All,

Can custom storage handlers get information from Hive directly for queries
like count, max, min, etc., so that for each such query the RecordReader need
not fetch all the records?

Regards,
Amey
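
For reference, a rough HiveQL sketch of the statistics route that avoids reading
records at all (table name t and column c are placeholders; it assumes
hive.compute.query.using.stats plus up-to-date statistics):

{noformat}
-- Placeholder table t and column c, for illustration only.
-- Gather basic table stats and column stats.
ANALYZE TABLE t COMPUTE STATISTICS;
ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS;

-- Let the optimizer answer simple aggregates from the metastore.
SET hive.compute.query.using.stats=true;

-- With accurate stats, count(*)/min/max can be served from stored statistics,
-- so the storage handler's RecordReader is not invoked for this query.
SELECT COUNT(*), MIN(c), MAX(c) FROM t;
{noformat}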

On Tue, Mar 22, 2016 at 1:46 PM, Amey Barve <ameybarv...@gmail.com> wrote:

> Thanks Nitin, Mich,
>
> if its just plain vanilla text file format, it needs to run a job to get
> the count so the longest of all
> --> Hive must be translating some operator like fetch (for count) into a
> map-reduce job and getting the result?
> Can a custom storage handler get information about the operator/s for
> count(*) and then use it to retrieve the results.
>
> I want to know whether custom storage handler can get information about
> operators that hive constructs for queries like count, max, min etc. so
> that storage handler can map these to internal storage functions?
>
> Regards,
> Amey
>
> On Tue, Mar 22, 2016 at 1:32 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> ORC file has the following stats levels for storage indexes
>>
>>
>>1. ORC File itself
>>2. Multiple stripes (chunks) within the ORC file
>>3. Multiple row groups (row batches) within each stripe
>>
>> Assuming that the underlying table has stats updated, count will be
>> stored for each column
>>
>> So when we do something like below:
>>
>> select count(1) from orctest
>>
>> you can see stats collected if you do
>>
>> show create table orctest;
>>
>>  TBLPROPERTIES (  |
>> |   'COLUMN_STATS_ACCURATE'='true',|
>> |   'numFiles'='31',   |
>> |   *'numRows'='25'*,|
>>
>>
>> File statistics, Stripe statistics and row group statistics are kept. So
>> ORC table will rely on those if needed
>>
>>
>> HTH
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 22 March 2016 at 07:14, Amey Barve <ameybarv...@gmail.com> wrote:
>>
>>> select count(*) from table;
>>>
>>> How does hive evaluate count(*) on a table?
>>>
>>> Does it return count by actually querying table, or directly return
>>> count by consulting some statistics locally.
>>>
>>> For Hive's Text format it takes few seconds while Hive's Orc format
>>> takes fraction of seconds.
>>>
>>> Regards,
>>> Amey
>>>
>>
>>
>


Re: select count(*) from table;

2016-03-22 Thread Amey Barve
Thanks Nitin, Mich,

if its just plain vanilla text file format, it needs to run a job to get
the count so the longest of all
--> So Hive must be translating some operator (like a fetch, for count) into a
map-reduce job and getting the result?
Can a custom storage handler get information about the operator(s) for
count(*) and then use it to retrieve the results?

I want to know whether a custom storage handler can get information about the
operators that Hive constructs for queries like count, max, min, etc., so that
the storage handler can map these to its internal storage functions.

Regards,
Amey

On Tue, Mar 22, 2016 at 1:32 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> ORC file has the following stats levels for storage indexes
>
>
>1. ORC File itself
>2. Multiple stripes (chunks) within the ORC file
>3. Multiple row groups (row batches) within each stripe
>
> Assuming that the underlying table has stats updated, count will be stored
> for each column
>
> So when we do something like below:
>
> select count(1) from orctest
>
> you can see stats collected if you do
>
> show create table orctest;
>
>  TBLPROPERTIES (  |
> |   'COLUMN_STATS_ACCURATE'='true',|
> |   'numFiles'='31',   |
> |   *'numRows'='25'*,|
>
>
> File statistics, Stripe statistics and row group statistics are kept. So
> ORC table will rely on those if needed
>
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 22 March 2016 at 07:14, Amey Barve <ameybarv...@gmail.com> wrote:
>
>> select count(*) from table;
>>
>> How does hive evaluate count(*) on a table?
>>
>> Does it return count by actually querying table, or directly return count
>> by consulting some statistics locally.
>>
>> For Hive's Text format it takes few seconds while Hive's Orc format takes
>> fraction of seconds.
>>
>> Regards,
>> Amey
>>
>
>


Re: select count(*) from table;

2016-03-22 Thread Nitin Pawar
If you have enabled the statistics-based optimization, the count will come
from the table statistics.
If the underlying file format keeps in-file statistics (like ORC), it will
come from there.
If it is just a plain vanilla text file format, Hive needs to run a job to get
the count, so that is the slowest of all.
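
A quick way to see which of these paths applies to a given table (using the
orctest table from Mich's example; numRows and COLUMN_STATS_ACCURATE in the
table parameters indicate whether the statistics path is available):

{noformat}
-- Check whether basic statistics are populated for the table.
DESCRIBE FORMATTED orctest;
-- Look for 'numRows' and 'COLUMN_STATS_ACCURATE' in the Table Parameters
-- section; if they are present and accurate, count(*) needs no job at all.

-- Recompute them if they are missing or stale.
ANALYZE TABLE orctest COMPUTE STATISTICS;
{noformat}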

On Tue, Mar 22, 2016 at 12:44 PM, Amey Barve <ameybarv...@gmail.com> wrote:

> select count(*) from table;
>
> How does hive evaluate count(*) on a table?
>
> Does it return count by actually querying table, or directly return count
> by consulting some statistics locally.
>
> For Hive's Text format it takes few seconds while Hive's Orc format takes
> fraction of seconds.
>
> Regards,
> Amey
>



-- 
Nitin Pawar


select count(*) from table;

2016-03-22 Thread Amey Barve
select count(*) from table;

How does Hive evaluate count(*) on a table?

Does it return the count by actually querying the table, or does it return the
count directly by consulting some statistics locally?

For Hive's Text format it takes a few seconds, while Hive's ORC format takes a
fraction of a second.

Regards,
Amey


[jira] [Resolved] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2014-05-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4515.


Resolution: Invalid

Resolving this as invalid, per [~swarnim]'s previous comment. Feel free to 
reopen if you are able to repro this on trunk.

 select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration 
 throws exceptions
 -

 Key: HIVE-4515
 URL: https://issues.apache.org/jira/browse/HIVE-4515
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.10.0, 0.11.0
 Environment: hive-0.10.0, hive-0.11.0
 hbase-0.94.7, hbase-0.94.6.1
 zookeeper-3.4.3
 hadoop-1.0.4
 centos-5.7
Reporter: Yanhui Ma
Assignee: Swarnim Kulkarni
Priority: Critical

 After integration hive-0.10.0+hbase-0.94.7, these commands could be executed 
 successfully:
 {noformat}
 create table
 insert overwrite table
 select * from table
 However, when execute select count(*) from table, throws exception:
 hive select count(*) from test; 
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201305061042_0028, Tracking URL = 
 http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
 Kill Command = /opt/modules/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job  
 -kill job_201305061042_0028
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
 1
 2013-05-07 18:41:42,649 Stage-1 map = 0%,  reduce = 0%
 2013-05-07 18:42:14,789 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201305061042_0028 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
 Examining task ID: task_201305061042_0028_m_02 (and more) from job 
 job_201305061042_0028
 Task with the most failures(4): 
 -
 Task ID:
   task_201305061042_0028_m_00
 URL:
   
 http://master0:50030/taskdetails.jsp?jobid=job_201305061042_0028tipid=task_201305061042_0028_m_00
 -
 Diagnostic Messages for this Task:
 java.lang.NegativeArraySizeException: -1
   at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
   at 
 org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
   at 
 org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
   at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 MapReduce Jobs Launched: 
 Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
 Total MapReduce CPU Time Spent: 0 msec
 ==
 The log of tasktracker:
 stderr logs
 13/05/07 18:43:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
 13/05/07 18:43:20 INFO mapred.TaskRunner: Creating symlink: 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/distcache/107328478296390_-1298160740_2123690974/master0/tmp/hive-hadoop/hive_2013-05-07_18-41-30_290_832140779606816147/-mr-10003/fd22448b-e923-498c-bc00-2164ca68447d
  - 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/HIVE_PLANfd22448b-e923-498c-bc00-2164ca68447d
 13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
 symlink: 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javolution
  - 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028

[jira] [Commented] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2014-02-27 Thread sridhar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914655#comment-13914655
 ] 

sridhar commented on HIVE-4515:
---

Hi,

We are using HBase Avro tables and would like to access the data using Hive, so 
I modified the code in HBaseStorageHandler, LazyHBaseRow, and LazyHBaseCellMap 
to add support for Avro schema parsing. Everything works and I am able to see 
the data with a basic query like select *. But when I query the Hive table with 
any filter, or select only some columns, I see the same error that was reported 
above.
The exact same error also shows up when we access the data in Hive using the 
original HBaseStorageHandler, so I do not think this is something introduced by 
my changes to the code.
I wanted to check whether there is any workaround or fix available for this. We 
are using CDH 4.4.
Any suggestions?

[jira] [Commented] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2014-02-27 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914673#comment-13914673
 ] 

Swarnim Kulkarni commented on HIVE-4515:


I think this is more specific to the CDH Hive release, probably due to some 
incompatible dependencies. I wasn't able to reproduce this with the Apache stack.

As a side note, for your Avro support in HBase you may also want to look into 
the patch on HIVE-6147. It attempts to solve the same problem.


[jira] [Commented] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2014-02-27 Thread sridhar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914738#comment-13914738
 ] 

sridhar commented on HIVE-4515:
---

Thank you Swarnim. I will follow up with Cloudera and see if we have any 
resolution.


[jira] [Updated] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2014-01-06 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4515:
---

Description: 
After integration hive-0.10.0+hbase-0.94.7, these commands could be executed 
successfully:
{noformat}
create table
insert overwrite table
select * from table

However, when executing select count(*) from table, it throws an exception:
hive> select count(*) from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapred.reduce.tasks=number
Starting Job = job_201305061042_0028, Tracking URL = 
http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Kill Command = /opt/modules/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job  
-kill job_201305061042_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-07 18:41:42,649 Stage-1 map = 0%,  reduce = 0%
2013-05-07 18:42:14,789 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201305061042_0028 with errors
Error during job, obtaining debugging information...
Job Tracking URL: 
http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Examining task ID: task_201305061042_0028_m_02 (and more) from job 
job_201305061042_0028

Task with the most failures(4): 
-
Task ID:
  task_201305061042_0028_m_00

URL:
  
http://master0:50030/taskdetails.jsp?jobid=job_201305061042_0028&tipid=task_201305061042_0028_m_00
-
Diagnostic Messages for this Task:
java.lang.NegativeArraySizeException: -1
at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
at 
org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
at 
org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)


FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

==
The log of tasktracker:

stderr logs

13/05/07 18:43:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/05/07 18:43:20 INFO mapred.TaskRunner: Creating symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/distcache/107328478296390_-1298160740_2123690974/master0/tmp/hive-hadoop/hive_2013-05-07_18-41-30_290_832140779606816147/-mr-10003/fd22448b-e923-498c-bc00-2164ca68447d
 - 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/HIVE_PLANfd22448b-e923-498c-bc00-2164ca68447d
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javolution
 - 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/javolution
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/org
 - 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/org
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/hive-exec-log4j.properties
 - 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/hive-exec-log4j.properties
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache

[jira] [Assigned] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2014-01-03 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni reassigned HIVE-4515:
--

Assignee: Swarnim Kulkarni


[jira] [Commented] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2013-10-21 Thread Yash Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800541#comment-13800541
 ] 

Yash Sharma commented on HIVE-4515:
---

Is there any timeline decided for this issue, or any other workaround for it? 
I have been stuck on it for a while.



[jira] [Updated] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2013-07-28 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-4515:
---

Environment: 
hive-0.10.0, hive-0.11.0
hbase-0.94.7, hbase-0.94.6.1
zookeeper-3.4.3
hadoop-1.0.4

centos-5.7

  was:
hive-0.10.0
hbase-0.94.7
zookeeper-3.4.3
hadoop-1.0.4

centos-5.7



[jira] [Updated] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2013-07-28 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-4515:
---

Affects Version/s: 0.11.0


[jira] [Created] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2013-05-07 Thread Yanhui Ma (JIRA)
Yanhui Ma created HIVE-4515:
---

 Summary: select count(*) from table query on hive-0.10.0, 
hbase-0.94.7 integration throws exceptions
 Key: HIVE-4515
 URL: https://issues.apache.org/jira/browse/HIVE-4515
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.10.0
 Environment: hive-0.10.0
hbase-0.94.7
zookeeper-3.4.3
hadoop-1.0.4

centos-5.7
Reporter: Yanhui Ma


After integrating hive-0.10.0 + hbase-0.94.7, these commands could be executed 
successfully:
create table
insert overwrite table
select * from table

However, when executing select count(*) from table, it throws an exception:
hive> select count(*) from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapred.reduce.tasks=number
Starting Job = job_201305061042_0028, Tracking URL = 
http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Kill Command = /opt/modules/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job  
-kill job_201305061042_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-07 18:41:42,649 Stage-1 map = 0%,  reduce = 0%
2013-05-07 18:42:14,789 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201305061042_0028 with errors
Error during job, obtaining debugging information...
Job Tracking URL: 
http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Examining task ID: task_201305061042_0028_m_02 (and more) from job 
job_201305061042_0028

Task with the most failures(4): 
-
Task ID:
  task_201305061042_0028_m_00

URL:
  
http://master0:50030/taskdetails.jsp?jobid=job_201305061042_0028&tipid=task_201305061042_0028_m_00
-
Diagnostic Messages for this Task:
java.lang.NegativeArraySizeException: -1
at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
at 
org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
at 
org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)


FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

==
The log of tasktracker:

stderr logs

13/05/07 18:43:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/05/07 18:43:20 INFO mapred.TaskRunner: Creating symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/distcache/107328478296390_-1298160740_2123690974/master0/tmp/hive-hadoop/hive_2013-05-07_18-41-30_290_832140779606816147/-mr-10003/fd22448b-e923-498c-bc00-2164ca68447d
 - 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/HIVE_PLANfd22448b-e923-498c-bc00-2164ca68447d
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javolution
 - 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/javolution
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/org
 - 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/org
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/hive-exec-log4j.properties

[jira] [Created] (HIVE-4520) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2013-05-07 Thread Yanhui Ma (JIRA)
Yanhui Ma created HIVE-4520:
---

 Summary: select count(*) from table query on hive-0.10.0, 
hbase-0.94.7 integration throws exceptions
 Key: HIVE-4520
 URL: https://issues.apache.org/jira/browse/HIVE-4520
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.11.0
 Environment: hive-0.11.0
hbase-0.94.6.1
zookeeper-3.4.3
hadoop-1.0.4
centos-5.7
Reporter: Yanhui Ma
Priority: Critical


After integrating hive-0.10.0+hbase-0.94.7, these commands could be executed successfully (a minimal illustrative setup is sketched after this list):
create table
insert overwrite table
select * from table
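
For reference, a minimal HiveQL sketch of that kind of setup follows. Every name in it (hbase_test, key, val, the column family cf, and the source table src) is an illustrative assumption, not taken from this report:

{noformat}
-- Illustrative HBase-backed Hive table; all names are placeholders.
CREATE TABLE hbase_test (key STRING, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_test");

-- Populate from some existing Hive table (src is a placeholder).
INSERT OVERWRITE TABLE hbase_test SELECT key, value FROM src;

-- A plain select is answered by a fetch task, so it works without MapReduce.
SELECT * FROM hbase_test;

-- count(*) requires a MapReduce job over the HBase table splits, which is
-- where the failure below appears.
SELECT count(*) FROM hbase_test;
{noformat}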

However, executing select count(*) from table throws an exception:
hive> select count(*) from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapred.reduce.tasks=number
Starting Job = job_201305061042_0028, Tracking URL = 
http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Kill Command = /opt/modules/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job  
-kill job_201305061042_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-07 18:41:42,649 Stage-1 map = 0%,  reduce = 0%
2013-05-07 18:42:14,789 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201305061042_0028 with errors
Error during job, obtaining debugging information...
Job Tracking URL: 
http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Examining task ID: task_201305061042_0028_m_02 (and more) from job 
job_201305061042_0028

Task with the most failures(4): 
-
Task ID:
  task_201305061042_0028_m_00

URL:
  
http://master0:50030/taskdetails.jsp?jobid=job_201305061042_0028&tipid=task_201305061042_0028_m_00
-
Diagnostic Messages for this Task:
java.lang.NegativeArraySizeException: -1
at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
at 
org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
at 
org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)


FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
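
The NegativeArraySizeException: -1 in the diagnostics above is raised while the map task deserializes the HBase table split (MapTask.getSplitDetails), i.e. before the mapper processes any rows; note the job summary reports HDFS Read: 0. The sketch below is only an illustrative reconstruction of that step, not the actual org.apache.hadoop.hbase.util.Bytes source: readByteArray reads a vint length first and rejects a negative value, and a length of -1 is what one would expect if the split bytes were written by HBase classes that do not match the ones reading them (for example, mismatched hbase jars between the Hive client and the MapReduce tasks; this is an assumption, not something confirmed in this thread).

{noformat}
// Illustrative reconstruction only; not the actual HBase 0.94 source. It mimics the
// readByteArray step that fails in the stack trace above when the serialized length
// decodes as -1.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.WritableUtils;

public class SplitDeserializeDemo {

    // Roughly what Bytes.readByteArray does: read a vint length, then that many bytes.
    static byte[] readByteArray(DataInput in) throws IOException {
        int len = WritableUtils.readVInt(in);
        if (len < 0) {
            // Matches "java.lang.NegativeArraySizeException: -1" in the task diagnostics.
            throw new NegativeArraySizeException(Integer.toString(len));
        }
        byte[] result = new byte[len];
        in.readFully(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for split bytes written in a format the reader does not expect:
        // the first vint decodes to -1 instead of a valid byte-array length.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        WritableUtils.writeVInt(new DataOutputStream(bos), -1);
        readByteArray(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
    }
}
{noformat}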

==
The log of tasktracker:

stderr logs

13/05/07 18:43:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/05/07 18:43:20 INFO mapred.TaskRunner: Creating symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/distcache/107328478296390_-1298160740_2123690974/master0/tmp/hive-hadoop/hive_2013-05-07_18-41-30_290_832140779606816147/-mr-10003/fd22448b-e923-498c-bc00-2164ca68447d
 <- 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/HIVE_PLANfd22448b-e923-498c-bc00-2164ca68447d
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javolution
 <- 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/javolution
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/org
 <- 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/org
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
symlink: 
/tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028

[jira] [Updated] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

2013-05-07 Thread Yanhui Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanhui Ma updated HIVE-4515:


Priority: Critical  (was: Major)

 select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration 
 throws exceptions
 -

 Key: HIVE-4515
 URL: https://issues.apache.org/jira/browse/HIVE-4515
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.10.0
 Environment: hive-0.10.0
 hbase-0.94.7
 zookeeper-3.4.3
 hadoop-1.0.4
 centos-5.7
Reporter: Yanhui Ma
Priority: Critical

 After integrating hive-0.10.0+hbase-0.94.7, these commands could be executed successfully:
 create table
 insert overwrite table
 select * from table
 However, executing select count(*) from table throws an exception:
 hive> select count(*) from test;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201305061042_0028, Tracking URL = 
 http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
 Kill Command = /opt/modules/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job  
 -kill job_201305061042_0028
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
 1
 2013-05-07 18:41:42,649 Stage-1 map = 0%,  reduce = 0%
 2013-05-07 18:42:14,789 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201305061042_0028 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
 Examining task ID: task_201305061042_0028_m_02 (and more) from job 
 job_201305061042_0028
 Task with the most failures(4): 
 -
 Task ID:
   task_201305061042_0028_m_00
 URL:
   
 http://master0:50030/taskdetails.jsp?jobid=job_201305061042_0028&tipid=task_201305061042_0028_m_00
 -
 Diagnostic Messages for this Task:
 java.lang.NegativeArraySizeException: -1
   at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
   at 
 org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
   at 
 org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
   at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 MapReduce Jobs Launched: 
 Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
 Total MapReduce CPU Time Spent: 0 msec
 ==
 The log of tasktracker:
 stderr logs
 13/05/07 18:43:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
 13/05/07 18:43:20 INFO mapred.TaskRunner: Creating symlink: 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/distcache/107328478296390_-1298160740_2123690974/master0/tmp/hive-hadoop/hive_2013-05-07_18-41-30_290_832140779606816147/-mr-10003/fd22448b-e923-498c-bc00-2164ca68447d
  <- 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/HIVE_PLANfd22448b-e923-498c-bc00-2164ca68447d
 13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
 symlink: 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javolution
  <- 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/javolution
 13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating 
 symlink: 
 /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/org
  <- 
 /tmp