Alexander Petrossian (PAF) created ORC-1554:
-----------------------------------------------

             Summary: Filtering by columns, nested in LISTs
                 Key: ORC-1554
                 URL: https://issues.apache.org/jira/browse/ORC-1554
             Project: ORC
          Issue Type: Improvement
    Affects Versions: 1.9.2
            Reporter: Alexander Petrossian (PAF)


Currently searchArgument supports fields nested inside arrays, and that works.
We use even deeply nested columns, and row groups are included correctly:
{noformat}
data.request.eventItem._elem.UsageEventItem.usage.CustomerFacingServiceUsage.relatedParty._elem.resource._elem.value
{noformat}

Alas, the [allowSARGToFilter mechanism|ORC-743] does not handle values inside 
arrays.
There are two show-stoppers here.

The small one:
https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/OrcFilterContext.java#L80:
 
{code:java}
  static boolean isNull(ColumnVector[] vectorBranch, int idx) throws IllegalArgumentException {
    for (ColumnVector v : vectorBranch) {
      if (v instanceof ListColumnVector || v instanceof MapColumnVector) {
        throw new IllegalArgumentException(String.format(
          "Found vector: %s in branch. List and Map vectors are not supported in isNull "
          + "determination", v));
      }
{code}

The big one:
https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/impl/filter/LeafFilter.java#L70
{code:java}
    ColumnVector[] branch = fc.findColumnVector(colName);
    ColumnVector v = branch[branch.length - 1];
...
        if (!OrcFilterContext.isNull(branch, rowIdx) &&
            allowWithNegation(v, rowIdx)) {
{code}

Here the code indexes *v* with *rowIdx*, which is wrong if *v* is nested 
inside a LIST (or MAP).

*rowIdx* iterates over records.
But *v* holds the column's values, and there may be fewer or more of them than 
there are table records.
The two are indexed differently: a row index is not a valid index into a 
nested child vector.
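
To illustrate the mismatch, here is a minimal sketch in plain Java. It is not ORC code and not a proposed patch. It assumes the usual LIST layout, where row i owns the child elements from offsets[i] up to offsets[i] + lengths[i] - 1 (ListColumnVector exposes offsets and lengths arrays of this kind). The class ListColumnSketch, its method names, and the "a row matches if any element matches" semantics are hypothetical, chosen only for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a LIST column; not an ORC class. */
final class ListColumnSketch {
    final long[] offsets;  // offsets[row]: index of the row's first child element
    final long[] lengths;  // lengths[row]: number of child elements in the row
    final String[] child;  // flattened element values of all rows

    ListColumnSketch(long[] offsets, long[] lengths, String[] child) {
        this.offsets = offsets;
        this.lengths = lengths;
        this.child = child;
    }

    /** Mistaken approach: uses the row index directly as a child index. */
    static List<Integer> rowsIndexedNaively(ListColumnSketch col, String wanted) {
        List<Integer> selected = new ArrayList<>();
        for (int row = 0; row < col.offsets.length; row++) {
            if (row < col.child.length && wanted.equals(col.child[row])) {
                selected.add(row);
            }
        }
        return selected;
    }

    /** Assumed semantics: keep a row if any of its elements equals wanted. */
    static List<Integer> rowsWithAnyMatch(ListColumnSketch col, String wanted) {
        List<Integer> selected = new ArrayList<>();
        for (int row = 0; row < col.offsets.length; row++) {
            int start = (int) col.offsets[row];
            int end = start + (int) col.lengths[row];
            for (int i = start; i < end; i++) {
                if (wanted.equals(col.child[i])) {
                    selected.add(row);
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Three records: ["a", "b", "c"], [], ["d"]; four child values in total.
        ListColumnSketch col = new ListColumnSketch(
            new long[] {0, 3, 3},
            new long[] {3, 0, 1},
            new String[] {"a", "b", "c", "d"});

        System.out.println(rowsIndexedNaively(col, "d")); // prints []
        System.out.println(rowsWithAnyMatch(col, "d"));   // prints [2]
    }
}
```

With these three records the naive lookup never reaches "d", because the child vector has four values while the row index only runs from 0 to 2. Searching for "b" goes wrong in the other direction: the naive version selects row 1, which is an empty list.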




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
