Alexander Petrossian (PAF) created ORC-1554:
-----------------------------------------------

             Summary: Filtering by columns, nested in LISTs
                 Key: ORC-1554
                 URL: https://issues.apache.org/jira/browse/ORC-1554
             Project: ORC
          Issue Type: Improvement
    Affects Versions: 1.9.2
            Reporter: Alexander Petrossian (PAF)


Currently searchArgument supports fields inside arrays, and that works.
We even use deeply nested columns and it works fine; row groups get properly included:
{noformat}
data.request.eventItem._elem.UsageEventItem.usage.CustomerFacingServiceUsage.relatedParty._elem.resource._elem.value
{noformat}

Alas, the [allowSARGToFilter mechanism|ORC-743] does not handle values inside arrays.

Two show-stoppers here.

Small: https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/OrcFilterContext.java#L80
{code:java}
  static boolean isNull(ColumnVector[] vectorBranch, int idx) throws IllegalArgumentException {
    for (ColumnVector v : vectorBranch) {
      if (v instanceof ListColumnVector || v instanceof MapColumnVector) {
        throw new IllegalArgumentException(String.format(
          "Found vector: %s in branch. List and Map vectors are not supported in isNull "
          + "determination", v));
      }
{code}

Big: https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/impl/filter/LeafFilter.java#L70
{code:java}
    ColumnVector[] branch = fc.findColumnVector(colName);
    ColumnVector v = branch[branch.length - 1];
...
        if (!OrcFilterContext.isNull(branch, rowIdx) && allowWithNegation(v, rowIdx)) {
{code}
Here the code indexes *v* with *rowIdx*, which is wrong whenever v is nested inside a LIST (or MAP).
The row index iterates over records, but v holds the column's values, of which there may be fewer or more than there are records.
The two are indexed in different ways.
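
To make the indexing mismatch concrete, here is a minimal sketch. It assumes the LIST is laid out as per-row offsets and lengths over a flat array of child values, which is the layout ListColumnVector exposes through its offsets, lengths and child fields. Plain arrays stand in for the real vectors. The class ListIndexingSketch and the helper anyElementEquals are hypothetical, and "a row passes if any element matches" is only one possible semantics for such a filter; none of this is taken from the ORC code base.

```java
// Hypothetical illustration; plain arrays stand in for ORC column vectors.
final class ListIndexingSketch {

    // Assumed semantics: a row passes if ANY element of its list equals `wanted`.
    // offsets[r] is where row r's elements start in `child`;
    // lengths[r] is how many elements row r has.
    static boolean anyElementEquals(long[] offsets, long[] lengths, long[] child,
                                    int rowIdx, long wanted) {
        int start = (int) offsets[rowIdx];
        int end = start + (int) lengths[rowIdx];
        for (int i = start; i < end; i++) {
            if (child[i] == wanted) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two rows: row 0 holds [10, 20, 30], row 1 holds [40].
        long[] offsets = {0, 3};
        long[] lengths = {3, 1};
        long[] child = {10, 20, 30, 40};
        int rowIdx = 1;

        // Indexing the child values with the row index reads 20, which belongs to row 0.
        boolean naive = child[rowIdx] == 40;
        // Walking row 1's own slice of the child values finds 40.
        boolean perRow = anyElementEquals(offsets, lengths, child, rowIdx, 40);

        System.out.println("naive=" + naive + " perRow=" + perRow);
    }
}
```

With this data the row-indexed check misses the match in row 1, while the per-row slice finds it. When the lists are mostly empty the child array is shorter than the row count, so the row-indexed access can also run past the end of the array.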