[ 
https://issues.apache.org/jira/browse/KUDU-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869715#comment-16869715
 ] 

Todd Lipcon commented on KUDU-2854:
-----------------------------------

bq. But currently we don't have a quick way to judge if there is any delta for 
the whole column (cfile) or the whole data block (part of cfile). 

I think we could change DeltaTracker::WrapIterator and 
DeltaTracker::NewDeltaIterator so that, if there are no relevant DeltaFiles 
and the only relevant DMS is empty, we avoid wrapping the base iterator. 
This is the special case of "no deltas at all", which is a little different 
from "no deltas for a specific column". Still, it's a useful optimization, 
since having no deltas is common. KUDU-2855 would also make this easier to 
implement.
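
As a rough illustration of the "no deltas at all" case, here is a minimal 
sketch. Every type in it (RowIterator, DeltaApplierSketch, DeltaTrackerSketch) 
and both counters are hypothetical stand-ins, not Kudu's real classes or 
signatures, and it ignores concurrency and snapshot handling entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for a base-data iterator (not Kudu's real interface).
struct RowIterator {
  virtual ~RowIterator() = default;
  virtual bool is_delta_applier() const { return false; }
};

// Hypothetical wrapper that would apply deltas on top of the base iterator.
struct DeltaApplierSketch : RowIterator {
  explicit DeltaApplierSketch(std::unique_ptr<RowIterator> base)
      : base_(std::move(base)) {
    assert(base_ != nullptr);
  }
  bool is_delta_applier() const override { return true; }
  std::unique_ptr<RowIterator> base_;
};

// Hypothetical, heavily simplified tracker: only the two facts the
// short-circuit needs are modelled.
struct DeltaTrackerSketch {
  std::size_t num_relevant_delta_files = 0;  // delta files relevant to the scan
  std::size_t dms_entry_count = 0;           // entries in the in-memory store

  bool HasNoRelevantDeltas() const {
    return num_relevant_delta_files == 0 && dms_entry_count == 0;
  }

  // Return the base iterator untouched when there is nothing to apply;
  // otherwise wrap it so deltas get applied.
  std::unique_ptr<RowIterator> WrapIterator(
      std::unique_ptr<RowIterator> base) const {
    if (HasNoRelevantDeltas()) {
      return base;
    }
    return std::make_unique<DeltaApplierSketch>(std::move(base));
  }
};
```

With the wrapper gone, callers see the base iterator directly, so anything 
that is currently disabled in the presence of deltas (such as decoder-level 
predicate evaluation) would stay available without a per-block check.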

bq. Is there any way we can easily judge if a column contains deltas or if a 
data block contains deltas?

After DeltaIterator::PrepareBatch is called, we can use MayHaveDeltas() to see 
on a per-block basis whether there were any deltas. We could extend this method 
to take a column index, i.e. MayHaveDeltas(col_idx). Note that we already use 
MayHaveDeltas() to determine whether we can push down predicates into the 
block decoder, here in DeltaApplier::MaterializeColumn:

{code}
  // Data with updates cannot be evaluated at the decoder-level.
  if (delta_iter_->MayHaveDeltas()) {
    ctx->SetDecoderEvalNotSupported();
    RETURN_NOT_OK(base_iter_->MaterializeColumn(ctx));
    RETURN_NOT_OK(delta_iter_->ApplyUpdates(ctx->col_idx(), ctx->block(),
                                            *ctx->sel()));
  } else {
    RETURN_NOT_OK(base_iter_->MaterializeColumn(ctx));
  }
{code}
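
One way the per-column variant could look is sketched below. This is only an 
illustration: DeltaSketch and DeltaIteratorSketch are hypothetical names, the 
PrepareBatch signature is invented for the example, and treating a row delete 
as affecting every column is a conservative assumption of the sketch rather 
than a statement about how Kudu handles deletes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical delta record: either a row-level delete or an update to a
// single column.
struct DeltaSketch {
  bool is_delete;
  std::size_t col_idx;  // ignored when is_delete is true
};

class DeltaIteratorSketch {
 public:
  explicit DeltaIteratorSketch(std::size_t num_cols)
      : col_has_update_(num_cols, false) {}

  // Stand-in for PrepareBatch: remember which columns the deltas in the
  // prepared batch touch.
  void PrepareBatch(const std::vector<DeltaSketch>& deltas_in_batch) {
    col_has_update_.assign(col_has_update_.size(), false);
    has_delete_ = false;
    for (const DeltaSketch& d : deltas_in_batch) {
      if (d.is_delete) {
        has_delete_ = true;
      } else {
        assert(d.col_idx < col_has_update_.size());
        col_has_update_[d.col_idx] = true;
      }
    }
  }

  // Existing-style check: does the prepared batch have any delta at all?
  bool MayHaveDeltas() const {
    if (has_delete_) return true;
    for (bool touched : col_has_update_) {
      if (touched) return true;
    }
    return false;
  }

  // Proposed per-column check. Deletes are conservatively assumed to
  // affect every column.
  bool MayHaveDeltas(std::size_t col_idx) const {
    assert(col_idx < col_has_update_.size());
    return has_delete_ || col_has_update_[col_idx];
  }

 private:
  std::vector<bool> col_has_update_;
  bool has_delete_ = false;
};
```

A caller like MaterializeColumn could then pass ctx->col_idx() and keep 
decoder-level evaluation enabled for columns that the batch's deltas do not 
touch.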

> Short circuit predicates on dictionary-coded columns
> ----------------------------------------------------
>
>                 Key: KUDU-2854
>                 URL: https://issues.apache.org/jira/browse/KUDU-2854
>             Project: Kudu
>          Issue Type: Improvement
>          Components: cfile, perf, tserver
>            Reporter: Todd Lipcon
>            Priority: Major
>
> In the common case that a column has no updates in a given DRS, if we see 
> that no entries in the dictionary match the predicate, we can short circuit 
> at a few layers:
> - we can store a flag in the cfile footer that indicates that all blocks are 
> dict-coded (i.e. there are no fallbacks). In that case, we can skip the whole 
> rowset
> - if a cfile is partially dict-encoded, we can skip any dict-coded blocks 
> without decoding the dictionary words
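
The two layers described in the issue could be combined into a single 
decision, sketched here under stated assumptions: CFileSketch, ScanDecision 
and DecideScan are hypothetical names, and the all_blocks_dict_coded field 
models the footer flag the issue proposes, not an existing cfile field.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// What the scanner may skip for one column of one rowset.
enum class ScanDecision { kSkipRowSet, kSkipDictBlocksOnly, kScanAll };

// Hypothetical view of a cfile: its dictionary plus the proposed footer flag.
struct CFileSketch {
  std::vector<std::string> dictionary;
  bool all_blocks_dict_coded;  // proposed flag: no fallback-encoded blocks
};

// Decide how much of the column can be skipped for a given predicate.
ScanDecision DecideScan(
    const CFileSketch& cfile, bool column_may_have_deltas,
    const std::function<bool(const std::string&)>& predicate) {
  // Updated cells may hold values that are not in the dictionary, so
  // nothing can be skipped.
  if (column_may_have_deltas) {
    return ScanDecision::kScanAll;
  }
  const bool any_match = std::any_of(cfile.dictionary.begin(),
                                     cfile.dictionary.end(), predicate);
  if (any_match) {
    return ScanDecision::kScanAll;
  }
  // No dictionary word matches: dict-coded blocks cannot contain a match.
  return cfile.all_blocks_dict_coded ? ScanDecision::kSkipRowSet
                                     : ScanDecision::kSkipDictBlocksOnly;
}
```

The deltas check comes first because the dictionary only describes base data; 
it says nothing about values introduced by updates.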



