[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165443#comment-14165443 ]
Eugene Koifman commented on HIVE-8368:
--------------------------------------

Adding for completeness

Before the patch:
{noformat}
hive> explain delete from concur_orc_tab where age >= 20 and age < 30;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: concur_orc_tab
            Filter Operator
              predicate: ((age >= 20) and (age < 30)) (type: boolean)
              Select Operator
                expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                outputColumnNames: _col0
                Reduce Output Operator
                  key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                  sort order: -
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
          outputColumnNames: _col0
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              sort order: 
              Map-reduce partition columns: UDFToInteger(_col0) (type: int)
              value expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
      Reduce Operator Tree:
        Extract
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.concur_orc_tab

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.concur_orc_tab

  Stage: Stage-3
    Stats-Aggr Operator

Time taken: 0.697 seconds, Fetched: 62 row(s)
{noformat}

After the patch:
{noformat}
hive> explain delete from concur_orc_tab where age >= 20 and age < 30;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: concur_orc_tab
            Filter Operator
              predicate: ((age >= 20) and (age < 30)) (type: boolean)
              Select Operator
                expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                outputColumnNames: _col0
                Reduce Output Operator
                  key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                  sort order: +
                  Map-reduce partition columns: UDFToInteger(_col0) (type: int)
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
          outputColumnNames: _col0
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.concur_orc_tab

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.concur_orc_tab

  Stage: Stage-2
    Stats-Aggr Operator

Time taken: 0.538 seconds, Fetched: 45 row(s)
{noformat}

> compactor is improperly writing delete records in base file
> -----------------------------------------------------------
>
>                 Key: HIVE-8368
>                 URL: https://issues.apache.org/jira/browse/HIVE-8368
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 0.14.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8368.2.patch, HIVE-8368.patch
>
>
> When the compactor reads records from the base and deltas, it is not properly
> dropping delete records.  This leads to oversized base files, and possibly to
> wrong query results.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)