[ https://issues.apache.org/jira/browse/ARROW-15658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508660#comment-17508660 ]
Vibhatha Lakmal Abeykoon commented on ARROW-15658:
--------------------------------------------------

[~westonpace] [~lidavidm] I have made a draft PR with a simple fix, but I am not entirely sure I have covered all the bases with respect to `FieldPath` usage, or whether nested refs are taken into account. I would appreciate your feedback on this.

> [C++] Parquet pushdown filtering fails if the filter expression uses numeric field references
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-15658
>                 URL: https://issues.apache.org/jira/browse/ARROW-15658
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 7.0.0
>            Reporter: Weston Pace
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We can refer to a field by name (e.g. {{compute::field_ref("foo")}}) or by
> index (e.g. {{compute::field_ref(0)}}).
> The latter is not supported when doing Parquet projection.
> A test demonstrating this can be found here:
> https://github.com/westonpace/arrow/commit/2f92ed0764cf2e1388dac053aeb4e1b923c6872e
> Copied here for posterity (this would go in the dataset fixture mixin):
> {code}
> void TestScanWithFieldPathFilter() {
>   auto i32 = field("i32", int32());
>   auto i64 = field("i64", int64());
>   this->opts_->dataset_schema = schema({i32, i64});
>   this->Project({"i64"});
>   // field_ref(0) refers to the first column of the dataset schema, i32
>   this->SetFilter(equal(field_ref(0), literal(0)));
>   auto expected_schema = schema({i64});
>   // Use this-> consistently: opts_ may live in a dependent base class
>   auto reader = this->GetRecordBatchReader(this->opts_->dataset_schema);
>   auto source = this->GetFileSource(reader.get());
>   auto fragment = this->MakeFragment(*source);
>   int64_t row_count = 0;
>   for (auto maybe_batch : PhysicalBatches(fragment)) {
>     ASSERT_OK_AND_ASSIGN(auto batch, maybe_batch);
>     row_count += batch->num_rows();
>     AssertSchemaEqual(*batch->schema(), *expected_schema,
>                       /*check_metadata=*/false);
>   }
>   ASSERT_EQ(row_count, expected_rows());
> }
> {code}
> I would expect this to work. Instead I get the error:
> {noformat}
> /home/pace/dev/arrow/cpp/src/arrow/dataset/test_util.h:840: Failure
> Failed
> '_error_or_value83.status()' failed with NotImplemented: Inferring column projection from FieldRef FieldRef.FieldPath(0)
> /home/pace/dev/arrow/cpp/src/arrow/dataset/file_parquet.cc:262  ResolveOneFieldRef(manifest, ref, field_lookup, duplicate_fields, &columns_selection)
> /home/pace/dev/arrow/cpp/src/arrow/dataset/file_parquet.cc:437  InferColumnProjection(*reader, *options)
> /home/pace/dev/arrow/cpp/src/arrow/util/iterator.h:152  value_.status()
> {noformat}
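The error shows that the Parquet column-projection step (`ResolveOneFieldRef` called from `InferColumnProjection`) rejects a positional `FieldRef.FieldPath(0)`. One possible way to handle top-level numeric refs is to normalize them to column names against the dataset schema before the column lookup. The sketch below illustrates that idea in plain, dependency-free C++. All types and functions in it (`SimpleSchema`, `SimpleFieldRef`, `FieldPathRef`, `ResolveColumnName`, `InferColumns`) are hypothetical stand-ins, not Arrow's API, and nested paths are deliberately left unresolved, which is exactly the open question about nested refs raised in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-ins for Arrow's Schema and FieldRef.
// A reference is either a column name or a path of child indices.
using FieldPathRef = std::vector<std::size_t>;
using SimpleFieldRef = std::variant<std::string, FieldPathRef>;

struct SimpleSchema {
  std::vector<std::string> field_names;  // top-level columns only
};

// Normalize a reference to a top-level column name.
// Returns std::nullopt for an unknown name, an out-of-range index,
// or a nested (multi-index) path, which this sketch does not handle.
std::optional<std::string> ResolveColumnName(const SimpleSchema& schema,
                                             const SimpleFieldRef& ref) {
  if (const auto* name = std::get_if<std::string>(&ref)) {
    for (const auto& field : schema.field_names) {
      if (field == *name) return field;
    }
    return std::nullopt;
  }
  const auto& path = std::get<FieldPathRef>(ref);
  // Only single-index paths map directly to a top-level column;
  // nested paths would require walking into struct children.
  if (path.size() != 1 || path[0] >= schema.field_names.size()) {
    return std::nullopt;
  }
  return schema.field_names[path[0]];
}

// Collect the set of columns a scan must read: every column referenced
// by the projection or the filter, after normalizing refs to names.
// Returns std::nullopt if any reference cannot be resolved.
std::optional<std::set<std::string>> InferColumns(
    const SimpleSchema& schema, const std::vector<SimpleFieldRef>& refs) {
  std::set<std::string> columns;
  for (const auto& ref : refs) {
    auto name = ResolveColumnName(schema, ref);
    if (!name) return std::nullopt;
    columns.insert(*name);
  }
  return columns;
}
```

With the schema from the test, `SimpleSchema{{"i32", "i64"}}`, the projection ref `std::string("i64")` and the filter ref `FieldPathRef{0}` (standing in for `field_ref(0)`) resolve to the column set `{i32, i64}`. This only illustrates the normalization idea; it is not necessarily the approach taken in the draft PR.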