[jira] [Created] (HIVE-27099) Iceberg: select count(*) from table queries all data
Rajesh Balamohan created HIVE-27099:
---
Summary: Iceberg: select count(*) from table queries all data
Key: HIVE-27099
URL: https://issues.apache.org/jira/browse/HIVE-27099
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan

select count(*) is scanning all data. Even though the table has complete basic stats, Hive launched a Tez job that was not needed. The second issue is that the query ended up scanning the ENTIRE 148 GB dataset, which is completely unnecessary: it should have been able to get the counts from the Parquet files themselves. The ideal situation is getting the total record count from the manifests alone. Data is stored in Parquet format in external tables. This may be broken specifically for Parquet, since for ORC Hive is able to read less data (footer info).

1. Consider fixing count(*) for Parquet.
2. Check whether it is possible to read the stats from the Iceberg manifests, after #1.

{noformat}
explain select count(*) from store_sales;
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: hive_20230223031934_2abeb3b9-8c18-4ff7-a8f9-df7368010189:5
      Edges:
        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
      DagName: hive_20230223031934_2abeb3b9-8c18-4ff7-a8f9-df7368010189:5
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 2879966589 Data size: 195666988943 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    Statistics: Num rows: 2879966589 Data size: 195666988943 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      aggregations: count()
                      minReductionHashAggr: 0.5
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        null sort order:
                        sort order:
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col0 (type: bigint)
            Execution mode: vectorized
        Reducer 2
            Execution mode: vectorized
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

53 rows selected (1.454 seconds)

0: jdbc:hive2://ve0:218> select count(*) from store_sales;
INFO  : Query ID = hive_20230223031940_9ff5d61d-1fe2-4476-a561-7820e4a3a5f8
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20230223031940_9ff5d61d-1fe2-4476-a561-7820e4a3a5f8
INFO  : Session is already open
INFO  : Dag name: select count(*) from store_sales (Stage-1)
INFO  : Status: Running (Executing on YARN cluster with App id application_1676286357243_0061)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  container     SUCCEEDED    767        767        0        0       0       0
Reducer 2 ......  container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 54.94 s
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 54.85 seconds
INFO  :
INFO  : Query Execution Summary
INFO  : --
INFO  : OPERATION    DURATION
INFO  : --
INFO  : C
{noformat}
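A rough illustration of the point above (a plain-Python sketch, not Hive or Iceberg code; the `DataFileEntry` class and its field names are invented for this example): each data file tracked by an Iceberg manifest carries stats such as a per-file record count, so `select count(*)` could in principle be answered by summing manifest entries instead of scanning 148 GB of Parquet data.

```python
# Hypothetical sketch: an Iceberg manifest records per-data-file stats,
# including a record count, so count(*) can be answered from metadata alone.
from dataclasses import dataclass

@dataclass
class DataFileEntry:
    path: str
    record_count: int      # stat kept in the manifest, per data file
    file_size_bytes: int

def count_star_from_manifest(entries):
    """Answer SELECT COUNT(*) by summing manifest stats, no data scan."""
    return sum(e.record_count for e in entries)

manifest = [
    DataFileEntry("s3://warehouse/store_sales/part-00000.parquet", 1_500, 4_096),
    DataFileEntry("s3://warehouse/store_sales/part-00001.parquet", 2_500, 8_192),
]
print(count_star_from_manifest(manifest))  # 4000
```

Parquet footers carry the same per-file row counts, which is why even a fallback that reads only footers (item #1 above) would avoid the full scan.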
Re: select count(*) from table;
Hi All,

Can custom storage handlers get information from Hive directly for queries like count, max, min, etc., so that for each such query the RecordReader need not fetch all the records?

Regards,
Amey

On Tue, Mar 22, 2016 at 1:46 PM, Amey Barve <ameybarv...@gmail.com> wrote:
> Thanks Nitin, Mich,
>
> "if its just plain vanilla text file format, it needs to run a job to get the count so the longest of all"
> --> Hive must be translating some operator like fetch (for count) into a map-reduce job and getting the result?
> Can a custom storage handler get information about the operator(s) for count(*) and then use it to retrieve the results?
>
> I want to know whether a custom storage handler can get information about the operators that Hive constructs for queries like count, max, min, etc., so that the storage handler can map these to internal storage functions.
>
> Regards,
> Amey
>
> On Tue, Mar 22, 2016 at 1:32 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> ORC file has the following stats levels for storage indexes:
>>
>> 1. The ORC file itself
>> 2. Multiple stripes (chunks) within the ORC file
>> 3. Multiple row groups (row batches) within each stripe
>>
>> Assuming that the underlying table has stats updated, the count will be stored for each column.
>>
>> So when we do something like below:
>>
>> select count(1) from orctest
>>
>> you can see the stats collected if you do:
>>
>> show create table orctest;
>>
>> TBLPROPERTIES (
>>   'COLUMN_STATS_ACCURATE'='true',
>>   'numFiles'='31',
>>   'numRows'='25',
>>
>> File statistics, stripe statistics and row group statistics are kept, so an ORC table will rely on those if needed.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> On 22 March 2016 at 07:14, Amey Barve <ameybarv...@gmail.com> wrote:
>>> select count(*) from table;
>>>
>>> How does hive evaluate count(*) on a table?
>>>
>>> Does it return the count by actually querying the table, or directly return the count by consulting some statistics locally?
>>>
>>> For Hive's Text format it takes a few seconds while Hive's ORC format takes a fraction of a second.
>>>
>>> Regards,
>>> Amey
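The kind of aggregate pushdown being asked about here can be sketched roughly as follows (illustrative Python only; `NativeStore` and the pushdown table are hypothetical, not the actual Hive storage-handler API): if the storage handler knows which aggregate operator the query needs, it can map count/max/min to internal storage functions instead of having the RecordReader fetch every record.

```python
# Hypothetical sketch of aggregate pushdown into a storage handler.
class NativeStore:
    """Stand-in for a storage engine that can answer simple aggregates natively."""
    def __init__(self, rows):
        self.rows = rows
    def native_count(self):
        return len(self.rows)
    def native_max(self, col):
        return max(r[col] for r in self.rows)
    def native_min(self, col):
        return min(r[col] for r in self.rows)

# Map of aggregate operators the storage layer can answer without a scan.
AGGREGATE_PUSHDOWN = {
    "count": lambda store, col=None: store.native_count(),
    "max":   lambda store, col: store.native_max(col),
    "min":   lambda store, col: store.native_min(col),
}

def evaluate(store, op, col=None):
    """Push the aggregate down if possible; otherwise fall back to a full scan."""
    handler = AGGREGATE_PUSHDOWN.get(op)
    if handler is not None:
        return handler(store, col)
    return list(store.rows)  # full scan: feed every record back to the engine

store = NativeStore([{"qty": 3}, {"qty": 9}, {"qty": 5}])
print(evaluate(store, "count"), evaluate(store, "max", "qty"))  # 3 9
```

In real Hive, deciding whether the operator can be pushed down would be the planner's job; the sketch only shows the dispatch idea Amey is describing.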
Re: select count(*) from table;
Thanks Nitin, Mich,

"if its just plain vanilla text file format, it needs to run a job to get the count so the longest of all"
--> Hive must be translating some operator like fetch (for count) into a map-reduce job and getting the result?
Can a custom storage handler get information about the operator(s) for count(*) and then use it to retrieve the results?

I want to know whether a custom storage handler can get information about the operators that Hive constructs for queries like count, max, min, etc., so that the storage handler can map these to internal storage functions.

Regards,
Amey

On Tue, Mar 22, 2016 at 1:32 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> ORC file has the following stats levels for storage indexes:
>
> 1. The ORC file itself
> 2. Multiple stripes (chunks) within the ORC file
> 3. Multiple row groups (row batches) within each stripe
>
> Assuming that the underlying table has stats updated, the count will be stored for each column.
>
> So when we do something like below:
>
> select count(1) from orctest
>
> you can see the stats collected if you do:
>
> show create table orctest;
>
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='true',
>   'numFiles'='31',
>   'numRows'='25',
>
> File statistics, stripe statistics and row group statistics are kept, so an ORC table will rely on those if needed.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 22 March 2016 at 07:14, Amey Barve <ameybarv...@gmail.com> wrote:
>> select count(*) from table;
>>
>> How does hive evaluate count(*) on a table?
>>
>> Does it return the count by actually querying the table, or directly return the count by consulting some statistics locally?
>>
>> For Hive's Text format it takes a few seconds while Hive's ORC format takes a fraction of a second.
>>
>> Regards,
>> Amey
Re: select count(*) from table;
If you have enabled performance optimization by enabling statistics, the count will come from there. If the underlying file format supports in-file statistics (like ORC), it will come from there. If it is just a plain vanilla text file format, Hive needs to run a job to get the count, so that is the slowest of all.

On Tue, Mar 22, 2016 at 12:44 PM, Amey Barve <ameybarv...@gmail.com> wrote:
> select count(*) from table;
>
> How does hive evaluate count(*) on a table?
>
> Does it return the count by actually querying the table, or directly return the count by consulting some statistics locally?
>
> For Hive's Text format it takes a few seconds while Hive's ORC format takes a fraction of a second.
>
> Regards,
> Amey

--
Nitin Pawar
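The three cases above can be sketched as a simple decision cascade (illustrative Python with invented table-metadata fields; this is not Hive's actual planner logic):

```python
# Hypothetical sketch of the three-way decision for answering count(*):
# 1. trusted metastore statistics -> no job at all
# 2. file-format metadata (ORC/Parquet footers) -> cheap metadata-only job
# 3. plain text -> full scan job, the slowest option
def answer_count(table):
    if table.get("column_stats_accurate"):           # metastore stats are trusted
        return table["num_rows"], "metastore stats (no job)"
    if table["format"] in {"orc", "parquet"}:        # footers carry row counts
        return sum(table["footer_row_counts"]), "file footers (metadata-only)"
    return None, "full scan job (slowest)"

stats_table   = {"column_stats_accurate": True,  "num_rows": 2879966589,
                 "format": "parquet", "footer_row_counts": []}
parquet_table = {"column_stats_accurate": False, "num_rows": None,
                 "format": "parquet", "footer_row_counts": [1500, 2500]}
text_table    = {"column_stats_accurate": False, "num_rows": None,
                 "format": "text", "footer_row_counts": []}
print(answer_count(stats_table)[1], answer_count(text_table)[1])
```

This matches the observed timings in the thread: stats and footer answers return in well under a second, while the text-format table pays for a full job.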
select count(*) from table;
select count(*) from table;

How does hive evaluate count(*) on a table? Does it return the count by actually querying the table, or directly return the count by consulting some statistics locally?

For Hive's Text format it takes a few seconds, while Hive's ORC format takes a fraction of a second.

Regards,
Amey
[jira] [Resolved] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-4515.
---
Resolution: Invalid

Resolving this as invalid, per [~swarnim]'s previous comment. Feel free to reopen if you are able to repro this on trunk.

select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
-
Key: HIVE-4515
URL: https://issues.apache.org/jira/browse/HIVE-4515
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 0.10.0, 0.11.0
Environment: hive-0.10.0, hive-0.11.0
hbase-0.94.7, hbase-0.94.6.1
zookeeper-3.4.3
hadoop-1.0.4
centos-5.7
Reporter: Yanhui Ma
Assignee: Swarnim Kulkarni
Priority: Critical

After integrating hive-0.10.0 + hbase-0.94.7, these commands could be executed successfully:

{noformat}
create table
insert overwrite table
select * from table
{noformat}

However, when executing select count(*) from table, it throws an exception:

{noformat}
hive> select count(*) from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201305061042_0028, Tracking URL = http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Kill Command = /opt/modules/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201305061042_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-07 18:41:42,649 Stage-1 map = 0%, reduce = 0%
2013-05-07 18:42:14,789 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201305061042_0028 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Examining task ID: task_201305061042_0028_m_02 (and more) from job job_201305061042_0028
Task with the most failures(4):
-
Task ID: task_201305061042_0028_m_00
URL: http://master0:50030/taskdetails.jsp?jobid=job_201305061042_0028&tipid=task_201305061042_0028_m_00
-
Diagnostic Messages for this Task:
java.lang.NegativeArraySizeException: -1
  at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
  at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
  at org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
  at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
  at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
==
The log of tasktracker: stderr logs
13/05/07 18:43:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/05/07 18:43:20 INFO mapred.TaskRunner: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/distcache/107328478296390_-1298160740_2123690974/master0/tmp/hive-hadoop/hive_2013-05-07_18-41-30_290_832140779606816147/-mr-10003/fd22448b-e923-498c-bc00-2164ca68447d -> /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/HIVE_PLANfd22448b-e923-498c-bc00-2164ca68447d
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javolution -> /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028
{noformat}
[jira] [Commented] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914655#comment-13914655 ]

sridhar commented on HIVE-4515:
---
Hi,
We are using HBase Avro tables and would like to access the data using Hive. So I modified the code in HBaseStorageHandler, LazyHBaseRow, and LazyHBaseCellMap to provide support for Avro schema parsing. All works perfectly, and I am able to see the data with a basic query like select *. But when I query the Hive table with any filter, or select only some columns, I see the same error that was reported above. The exact same error is also seen when we access the data in Hive using the original HBaseStorageHandler, so I do not think this is something introduced by my changes to the code. I wanted to check whether there is any workaround or fix available for this. We are using CDH 4.4. Any suggestions?
[jira] [Commented] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914673#comment-13914673 ]

Swarnim Kulkarni commented on HIVE-4515:
---
I think this is more specific to the CDH Hive release, probably due to some incompatible dependencies. I wasn't able to reproduce this with the Apache stack. As a side note, for your Avro support in HBase you might as well look into the patch on HIVE-6147. It attempts to solve the same problem.
[jira] [Commented] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914738#comment-13914738 ]

sridhar commented on HIVE-4515:
---
Thank you Swarnim. I will follow up with Cloudera and see if we have any resolution.
[jira] [Updated] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-4515:
---
Description: After integrating hive-0.10.0 + hbase-0.94.7, these commands could be executed successfully: create table, insert overwrite table, select * from table. However, executing select count(*) from table throws an exception.
[jira] [Assigned] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni reassigned HIVE-4515:
--------------------------------------
Assignee: Swarnim Kulkarni

select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
-------------------------------------------------------------------------------------------
Key: HIVE-4515
URL: https://issues.apache.org/jira/browse/HIVE-4515
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 0.10.0, 0.11.0
Environment: hive-0.10.0, hive-0.11.0; hbase-0.94.7, hbase-0.94.6.1; zookeeper-3.4.3; hadoop-1.0.4; centos-5.7
Reporter: Yanhui Ma
Assignee: Swarnim Kulkarni
Priority: Critical

After integrating hive-0.10.0 with hbase-0.94.7, these commands execute successfully:

create table
insert overwrite table
select * from table

However, executing select count(*) from table throws an exception:

hive> select count(*) from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201305061042_0028, Tracking URL = http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Kill Command = /opt/modules/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201305061042_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-07 18:41:42,649 Stage-1 map = 0%, reduce = 0%
2013-05-07 18:42:14,789 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201305061042_0028 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
Examining task ID: task_201305061042_0028_m_02 (and more) from job job_201305061042_0028

Task with the most failures(4):
Task ID: task_201305061042_0028_m_00
URL: http://master0:50030/taskdetails.jsp?jobid=job_201305061042_0028&tipid=task_201305061042_0028_m_00

Diagnostic Messages for this Task:
java.lang.NegativeArraySizeException: -1
    at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
    at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
    at org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  HDFS Read: 0  HDFS Write: 0  FAIL
Total MapReduce CPU Time Spent: 0 msec

== The log of tasktracker: stderr logs ==
13/05/07 18:43:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/05/07 18:43:20 INFO mapred.TaskRunner: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/distcache/107328478296390_-1298160740_2123690974/master0/tmp/hive-hadoop/hive_2013-05-07_18-41-30_290_832140779606816147/-mr-10003/fd22448b-e923-498c-bc00-2164ca68447d <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/HIVE_PLANfd22448b-e923-498c-bc00-2164ca68447d
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javolution <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_00_0/work/javolution
13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop
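The stack trace above dies during split deserialization: Bytes.readByteArray reads a length prefix of -1 from the serialized TableSplit and then tries to allocate a byte array of that size. A minimal, self-contained Java sketch of that failure mode follows; it is illustrative only (the class and method here are not the real HBase code), showing how a reader that misinterprets the writer's wire format, for example because of mismatched HBase versions on the two sides, ends up allocating new byte[-1]:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative sketch only -- not the actual org.apache.hadoop.hbase.util.Bytes.
public class SplitDeserializeSketch {

    // Mimics the shape of Bytes.readByteArray: length prefix, then payload.
    static byte[] readByteArray(DataInput in) throws IOException {
        int len = in.readInt();     // a mismatched layout can yield -1 here
        byte[] buf = new byte[len]; // new byte[-1] -> NegativeArraySizeException
        in.readFully(buf);
        return buf;
    }

    // Serializes a "split" in a format the reader does not expect,
    // then reports which exception the read side throws.
    static String describeFailure() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeInt(-1); // writer emits a bare -1 marker
            DataInput in = new DataInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            readByteArray(in);
            return "ok";
        } catch (NegativeArraySizeException e) {
            return e.getClass().getSimpleName();
        } catch (IOException e) {
            return "io-error";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure()); // prints NegativeArraySizeException
    }
}
```

This matches the symptom in the trace (NegativeArraySizeException: -1 inside readByteArray); the thread itself does not pin down the cause, but a reader and writer disagreeing on the split's serialized layout is consistent with what is logged.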
[jira] [Commented] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800541#comment-13800541 ]

Yash Sharma commented on HIVE-4515:
-----------------------------------
Is there any timeline decided for this issue, or any workaround for it? I have been stuck on it for a while.
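The thread does not offer a workaround at this point. For readFields failures of this kind, version skew between the HBase jar on Hive's classpath and the HBase version running on the cluster is a common first thing to rule out. A hypothetical helper (names and jar files are illustrative, not from this thread) that compares the version baked into the jar names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical skew check, not part of Hive or HBase.
public class HBaseJarSkewCheck {
    private static final Pattern HBASE_JAR = Pattern.compile("hbase-(.+)\\.jar");

    // Extracts "0.94.7" from "hbase-0.94.7.jar"; "unknown" if the name
    // does not follow that convention.
    static String versionOf(String jarName) {
        Matcher m = HBASE_JAR.matcher(jarName);
        return m.matches() ? m.group(1) : "unknown";
    }

    static String check(String hiveSideJar, String clusterSideJar) {
        String a = versionOf(hiveSideJar);
        String b = versionOf(clusterSideJar);
        return a.equals(b) ? "versions match (" + a + ")"
                           : "version skew: " + a + " vs " + b;
    }

    public static void main(String[] args) {
        // Illustrative jar names taken from the environments in this thread.
        System.out.println(check("hbase-0.94.6.1.jar", "hbase-0.94.7.jar"));
        // prints: version skew: 0.94.6.1 vs 0.94.7
    }
}
```

If the versions differ, aligning them (and making sure the matching jar is the one Hive ships to the MapReduce job) is the obvious first experiment before digging further.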
[jira] [Updated] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-4515:
-----------------------------------
Environment:
    hive-0.10.0, hive-0.11.0
    hbase-0.94.7, hbase-0.94.6.1
    zookeeper-3.4.3
    hadoop-1.0.4
    centos-5.7
(was:
    hive-0.10.0
    hbase-0.94.7
    zookeeper-3.4.3
    hadoop-1.0.4
    centos-5.7)
[jira] [Updated] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-4515:
-----------------------------------
Affects Version/s: 0.11.0
[jira] [Created] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
Yanhui Ma created HIVE-4515:
----------------------------
Summary: select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
Key: HIVE-4515
URL: https://issues.apache.org/jira/browse/HIVE-4515
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 0.10.0
Environment: hive-0.10.0; hbase-0.94.7; zookeeper-3.4.3; hadoop-1.0.4; centos-5.7
Reporter: Yanhui Ma
[jira] [Created] (HIVE-4520) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
Yanhui Ma created HIVE-4520:
----------------------------
Summary: select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
Key: HIVE-4520
URL: https://issues.apache.org/jira/browse/HIVE-4520
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 0.11.0
Environment: hive-0.11.0; hbase-0.94.6.1; zookeeper-3.4.3; hadoop-1.0.4; centos-5.7
Reporter: Yanhui Ma
Priority: Critical
[jira] [Updated] (HIVE-4515) select count(*) from table query on hive-0.10.0, hbase-0.94.7 integration throws exceptions
[ https://issues.apache.org/jira/browse/HIVE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanhui Ma updated HIVE-4515:
----------------------------
Priority: Critical (was: Major)