Pandas886 opened a new issue, #5476: URL: https://github.com/apache/paimon/issues/5476
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

1.2

### Compute Engine

Java API

### Minimal reproduce step

Table schema:

```json
{
  "version" : 2,
  "id" : 0,
  "fields" : [
    { "id" : 0, "name" : "user_id", "type" : "BIGINT NOT NULL" },
    { "id" : 1, "name" : "item_id", "type" : "BIGINT" },
    { "id" : 2, "name" : "behavior", "type" : "STRING" },
    { "id" : 3, "name" : "dt", "type" : "STRING" },
    { "id" : 4, "name" : "hh", "type" : "STRING" }
  ],
  "highestFieldId" : 4,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "user_id" ],
  "options" : {
    "bucket" : "4",
    "write-only" : "true",
    "manifest.format" : "orc",
    "snapshot.time-retained" : "5min",
    "snapshot.num-retained.min" : "5",
    "snapshot.num-retained.max" : "20",
    "orc.sarg.to.filter" : "true",
    "orc.filter.use.selected" : "true"
  },
  "comment" : "",
  "timeMillis" : 1744621631633
}
```

Error:

```
Exception in thread "main" java.lang.IllegalArgumentException: Field f0 not found in struct<user_id:bigint,item_id:bigint>
    at org.apache.paimon.shade.org.apache.orc.impl.ParserUtils.findColumn(ParserUtils.java:398)
    at org.apache.paimon.shade.org.apache.orc.impl.ParserUtils.findColumn(ParserUtils.java:337)
    at org.apache.paimon.shade.org.apache.orc.impl.ParserUtils.findSubtype(ParserUtils.java:285)
    at org.apache.paimon.shade.org.apache.orc.TypeDescription.findSubtype(TypeDescription.java:826)
    at org.apache.paimon.shade.org.apache.orc.impl.filter.leaf.LeafFilterFactory.createLeafVectorFilter(LeafFilterFactory.java:231)
    at org.apache.paimon.shade.org.apache.orc.impl.filter.FilterFactory.createSArgFilter(FilterFactory.java:148)
    at org.apache.paimon.shade.org.apache.orc.impl.filter.FilterFactory.createBatchFilter(FilterFactory.java:88)
    at org.apache.paimon.shade.org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:335)
    at org.apache.paimon.format.orc.OrcReaderFactory$1.rows(OrcReaderFactory.java:365)
    at org.apache.paimon.format.orc.OrcReaderFactory.createRecordReader(OrcReaderFactory.java:304)
    at org.apache.paimon.format.orc.OrcReaderFactory.createReader(OrcReaderFactory.java:107)
    at org.apache.paimon.format.orc.OrcReaderFactory.createReader(OrcReaderFactory.java:64)
    at org.apache.paimon.io.DataFileRecordReader.<init>(DataFileRecordReader.java:53)
    at org.apache.paimon.operation.RawFileSplitRead.createFileReader(RawFileSplitRead.java:242)
    at org.apache.paimon.operation.RawFileSplitRead.lambda$createReader$3(RawFileSplitRead.java:181)
    at org.apache.paimon.mergetree.compact.ConcatRecordReader.create(ConcatRecordReader.java:53)
    at org.apache.paimon.operation.RawFileSplitRead.createReader(RawFileSplitRead.java:189)
    at org.apache.paimon.operation.RawFileSplitRead.createReader(RawFileSplitRead.java:141)
    at org.apache.paimon.table.source.KeyValueTableRead.reader(KeyValueTableRead.java:123)
    at org.apache.paimon.table.source.AbstractDataTableRead.createReader(AbstractDataTableRead.java:92)
    at org.apache.paimon.table.source.TableRead.lambda$createReader$0(TableRead.java:50)
    at org.apache.paimon.mergetree.compact.ConcatRecordReader.create(ConcatRecordReader.java:53)
    at org.apache.paimon.table.source.TableRead.createReader(TableRead.java:52)
    at org.example.BatchReadTable.main(BatchReadTable.java:68)
```

Code:

```java
CatalogContext context = CatalogContext.create(warehousePath);
Catalog catalog = CatalogFactory.createCatalog(context);
Table table = catalog.getTable(IDENTIFIER);

PredicateBuilder builder =
        new PredicateBuilder(RowType.of(DataTypes.BIGINT(), DataTypes.BIGINT()));
Predicate notNull = builder.isNotNull(0);
Predicate greaterOrEqual = builder.equal(0, 99L);

int[] projection = new int[] {0, 1};
ReadBuilder readBuilder = table.newReadBuilder()
        .withProjection(projection)
        .withFilter(Lists.newArrayList(greaterOrEqual));

// 2. Plan splits in 'Coordinator' (or named 'Driver')
List<Split> splits = readBuilder.newScan().plan().splits();

// 3. Distribute these splits to different tasks
// 4. Read a split in task
TableRead read = readBuilder.newRead();
RecordReader<InternalRow> reader = read.createReader(splits);
List<DataType> fieldTypes = table.rowType().getFieldTypes();
reader.forEachRemaining(new Consumer<InternalRow>() {
    @Override
    public void accept(InternalRow internalRow) {
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < internalRow.getFieldCount(); i++) {
            DataType dataType = fieldTypes.get(i);
            Object value = null;
            if (!internalRow.isNullAt(i)) {
                if (dataType.getTypeRoot().equals(DataTypeRoot.INTEGER)) {
                    value = internalRow.getInt(i);
                } else if (dataType.getTypeRoot().equals(DataTypeRoot.VARCHAR)) {
                    value = internalRow.getString(i).toString();
                } else if (dataType.getTypeRoot().equals(DataTypeRoot.BIGINT)) {
                    value = internalRow.getLong(i);
                } else if (dataType.getTypeRoot().equals(DataTypeRoot.TIMESTAMP_WITHOUT_TIME_ZONE)) {
                    value = internalRow.getTimestamp(i, 0).toLocalDateTime();
                } else if (dataType.getTypeRoot().equals(DataTypeRoot.DATE)) {
                    value = internalRow.getTimestamp(i, 0);
                }
            }
            row.add(value);
        }
        System.out.println(new Gson().toJson(row));
    }
});
```

### What doesn't meet your expectations?

The field name pushed down to the ORC filter should be the actual column name. Instead, the filter is built with the anonymous name `f0`, which does not exist in the ORC schema `struct<user_id:bigint,item_id:bigint>`, so the read fails.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
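A note on a possible workaround, based only on what the stack trace shows: `RowType.of(DataTypes.BIGINT(), DataTypes.BIGINT())` generates anonymous field names (`f0`, `f1`), and with `orc.sarg.to.filter` enabled those names appear to be handed to the ORC SearchArgument filter, where they do not match the file schema. If that reading is right, building the `PredicateBuilder` from the table's real row type (so the predicate carries `user_id` rather than `f0`) may avoid the error. This is an untested sketch against the Paimon Java API, not a confirmed fix; the class name `FilterByRealRowType` and method `openReader` are made up for illustration.

```java
import java.util.List;

import org.apache.paimon.data.InternalRow;
import org.apache.paimon.predicate.Predicate;
import org.apache.paimon.predicate.PredicateBuilder;
import org.apache.paimon.reader.RecordReader;
import org.apache.paimon.table.Table;
import org.apache.paimon.table.source.ReadBuilder;
import org.apache.paimon.table.source.Split;

public class FilterByRealRowType {

    static RecordReader<InternalRow> openReader(Table table) throws Exception {
        // Build the predicate from the table's own row type so that field
        // index 0 resolves to the real column name "user_id", not "f0".
        PredicateBuilder builder = new PredicateBuilder(table.rowType());
        Predicate userIdEquals = builder.equal(0, 99L);

        ReadBuilder readBuilder = table.newReadBuilder()
                .withProjection(new int[] {0, 1})
                .withFilter(userIdEquals);

        List<Split> splits = readBuilder.newScan().plan().splits();
        return readBuilder.newRead().createReader(splits);
    }
}
```

If the error persists even with matching names, disabling `orc.sarg.to.filter` (leaving predicate evaluation to Paimon's own filter path) would be another way to sidestep the shaded-ORC filter construction until the pushdown mapping is fixed.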
