[jira] [Commented] (HIVE-8368) compactor is improperly writing delete records in base file

Eugene Koifman (JIRA) Thu, 09 Oct 2014 11:02:43 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165443#comment-14165443
 ]


Eugene Koifman commented on HIVE-8368:
--------------------------------------

Adding the EXPLAIN output for completeness.
Before the patch:
{noformat}
hive> explain delete from concur_orc_tab where age >= 20 and age < 30;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: concur_orc_tab
            Filter Operator
              predicate: ((age >= 20) and (age < 30)) (type: boolean)
              Select Operator
                expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                outputColumnNames: _col0
                Reduce Output Operator
                  key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                  sort order: -
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
          outputColumnNames: _col0
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              sort order: 
              Map-reduce partition columns: UDFToInteger(_col0) (type: int)
              value expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
      Reduce Operator Tree:
        Extract
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.concur_orc_tab

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.concur_orc_tab

  Stage: Stage-3
    Stats-Aggr Operator

Time taken: 0.697 seconds, Fetched: 62 row(s)

{noformat}
After the patch:
{noformat}
hive> explain delete from concur_orc_tab where age >= 20 and age < 30;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: concur_orc_tab
            Filter Operator
              predicate: ((age >= 20) and (age < 30)) (type: boolean)
              Select Operator
                expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                outputColumnNames: _col0
                Reduce Output Operator
                  key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
                  sort order: +
                  Map-reduce partition columns: UDFToInteger(_col0) (type: int)
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
          outputColumnNames: _col0
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.concur_orc_tab

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.concur_orc_tab

  Stage: Stage-2
    Stats-Aggr Operator

Time taken: 0.538 seconds, Fetched: 45 row(s)
{noformat}

> compactor is improperly writing delete records in base file
> -----------------------------------------------------------
>
>                 Key: HIVE-8368
>                 URL: https://issues.apache.org/jira/browse/HIVE-8368
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 0.14.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8368.2.patch, HIVE-8368.patch
>
>
> When the compactor reads records from the base and deltas, it is not properly 
> dropping delete records.  This leads to oversized base files, and possibly to 
> wrong query results.
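
For illustration only, the behaviour the description asks for can be sketched as below. This is not Hive's compactor code: the Event type, its fields, and major_compact are hypothetical names, and the sketch assumes that the newest event per ROW__ID wins and that a row whose newest event is a delete must not be written to the new base.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Hypothetical row key modelled on the ROW__ID struct shown in the plans:
# (transactionid, bucketid, rowid).
RowId = Tuple[int, int, int]


class Event(NamedTuple):
    op: str               # "insert", "update" or "delete" (assumed labels)
    row_id: RowId         # identifies the row the event applies to
    txn: int              # transaction that wrote this event
    row: Optional[dict]   # row payload; None for a delete


def major_compact(base: Iterable[Event], deltas: Iterable[Event]) -> List[Event]:
    """Merge base and delta events into the contents of a new base.

    Keeps only the newest event per row and drops rows whose newest
    event is a delete, so no delete records reach the new base.
    """
    latest: Dict[RowId, Event] = {}
    for ev in list(base) + list(deltas):
        cur = latest.get(ev.row_id)
        if cur is None or ev.txn >= cur.txn:
            latest[ev.row_id] = ev
    # Emit in ROW__ID order, skipping deleted rows entirely.
    return [latest[k] for k in sorted(latest) if latest[k].op != "delete"]
```

Under these assumptions, the reported bug corresponds to omitting the final filter: delete events would be carried into the base, which makes it larger than necessary and lets deleted rows leak into later reads.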



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
