[ https://issues.apache.org/jira/browse/IMPALA-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766430#comment-16766430 ]
ASF subversion and git services commented on IMPALA-5861: --------------------------------------------------------- Commit a154b2d6e775a508df4fd2c8d51a18d5c1d1f933 in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=a154b2d ] IMPALA-5861: fix RowsRead for zero-slot table scan Testing: Added regression test based on JIRA and a targeted test for all HDFS file formats. Change-Id: I7a927c6a4f0b8055608cb7a5e2b550a1610cef89 Reviewed-on: http://gerrit.cloudera.org:8080/12332 Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> > HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts > ------------------------------------------------------------------------------ > > Key: IMPALA-5861 > URL: https://issues.apache.org/jira/browse/IMPALA-5861 > Project: IMPALA > Issue Type: Bug > Components: Backend > Affects Versions: Impala 2.10.0 > Reporter: Dan Hecht > Assignee: Tim Armstrong > Priority: Major > > It appears that this code is double counting into {{rows_read_counter()}}, > since {{row_group_rows_read_}} is already accumulating: > {code:title=HdfsParquetScanner::GetNextInternal()} > } else if (scan_node_->IsZeroSlotTableScan()) { > // There are no materialized slots and we are not optimizing count(*), > e.g. > // "select 1 from alltypes". We can serve this query from just the file > metadata. > // We don't need to read the column data. > if (row_group_rows_read_ == file_metadata_.num_rows) { > eos_ = true; > return Status::OK(); > } > assemble_rows_timer_.Start(); > DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows); > int64_t rows_remaining = file_metadata_.num_rows - row_group_rows_read_; > int max_tuples = min<int64_t>(row_batch->capacity(), rows_remaining); > TupleRow* current_row = row_batch->GetRow(row_batch->AddRow()); > int num_to_commit = WriteTemplateTuples(current_row, max_tuples); > Status status = CommitRows(row_batch, num_to_commit); > assemble_rows_timer_.Stop(); > RETURN_IF_ERROR(status); > row_group_rows_read_ += num_to_commit; > COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_); > <====== > return Status::OK(); > } > {code} > Repro in impala-shell: > {noformat} > set batch_size=16; set num_nodes=1; select count(*) from > functional.alltypesmixedformat; profile > .... > - RowsRead: 3.94K (3936) > - RowsReturned: 1.20K (1200) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org