Pandas886 opened a new issue, #5476: URL: https://github.com/apache/paimon/issues/5476
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

1.2

### Compute Engine

Java API

### Minimal reproduce step

Table schema:

```json
{
  "version" : 2,
  "id" : 0,
  "fields" : [
    { "id" : 0, "name" : "user_id", "type" : "BIGINT NOT NULL" },
    { "id" : 1, "name" : "item_id", "type" : "BIGINT" },
    { "id" : 2, "name" : "behavior", "type" : "STRING" },
    { "id" : 3, "name" : "dt", "type" : "STRING" },
    { "id" : 4, "name" : "hh", "type" : "STRING" }
  ],
  "highestFieldId" : 4,
  "partitionKeys" : [ ],
  "primaryKeys" : [ "user_id" ],
  "options" : {
    "bucket" : "4",
    "write-only" : "true",
    "manifest.format" : "orc",
    "snapshot.time-retained" : "5min",
    "snapshot.num-retained.min" : "5",
    "snapshot.num-retained.max" : "20",
    "orc.sarg.to.filter" : "true",
    "orc.filter.use.selected" : "true"
  },
  "comment" : "",
  "timeMillis" : 1744621631633
}
```

Error:

```
Exception in thread "main" java.lang.IllegalArgumentException: Field f0 not found in struct<user_id:bigint,item_id:bigint>
    at org.apache.paimon.shade.org.apache.orc.impl.ParserUtils.findColumn(ParserUtils.java:398)
    at org.apache.paimon.shade.org.apache.orc.impl.ParserUtils.findColumn(ParserUtils.java:337)
    at org.apache.paimon.shade.org.apache.orc.impl.ParserUtils.findSubtype(ParserUtils.java:285)
    at org.apache.paimon.shade.org.apache.orc.TypeDescription.findSubtype(TypeDescription.java:826)
    at org.apache.paimon.shade.org.apache.orc.impl.filter.leaf.LeafFilterFactory.createLeafVectorFilter(LeafFilterFactory.java:231)
    at org.apache.paimon.shade.org.apache.orc.impl.filter.FilterFactory.createSArgFilter(FilterFactory.java:148)
    at org.apache.paimon.shade.org.apache.orc.impl.filter.FilterFactory.createBatchFilter(FilterFactory.java:88)
    at org.apache.paimon.shade.org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:335)
    at org.apache.paimon.format.orc.OrcReaderFactory$1.rows(OrcReaderFactory.java:365)
    at org.apache.paimon.format.orc.OrcReaderFactory.createRecordReader(OrcReaderFactory.java:304)
    at org.apache.paimon.format.orc.OrcReaderFactory.createReader(OrcReaderFactory.java:107)
    at org.apache.paimon.format.orc.OrcReaderFactory.createReader(OrcReaderFactory.java:64)
    at org.apache.paimon.io.DataFileRecordReader.<init>(DataFileRecordReader.java:53)
    at org.apache.paimon.operation.RawFileSplitRead.createFileReader(RawFileSplitRead.java:242)
    at org.apache.paimon.operation.RawFileSplitRead.lambda$createReader$3(RawFileSplitRead.java:181)
    at org.apache.paimon.mergetree.compact.ConcatRecordReader.create(ConcatRecordReader.java:53)
    at org.apache.paimon.operation.RawFileSplitRead.createReader(RawFileSplitRead.java:189)
    at org.apache.paimon.operation.RawFileSplitRead.createReader(RawFileSplitRead.java:141)
    at org.apache.paimon.table.source.KeyValueTableRead.reader(KeyValueTableRead.java:123)
    at org.apache.paimon.table.source.AbstractDataTableRead.createReader(AbstractDataTableRead.java:92)
    at org.apache.paimon.table.source.TableRead.lambda$createReader$0(TableRead.java:50)
    at org.apache.paimon.mergetree.compact.ConcatRecordReader.create(ConcatRecordReader.java:53)
    at org.apache.paimon.table.source.TableRead.createReader(TableRead.java:52)
    at org.example.BatchReadTable.main(BatchReadTable.java:68)
```

Code:

```java
CatalogContext context = CatalogContext.create(warehousePath);
Catalog catalog = CatalogFactory.createCatalog(context);
Table table = catalog.getTable(IDENTIFIER);

PredicateBuilder builder =
        new PredicateBuilder(RowType.of(DataTypes.BIGINT(), DataTypes.BIGINT()));
Predicate notNull = builder.isNotNull(0);
Predicate greaterOrEqual = builder.equal(0, 99L);

int[] projection = new int[] {0, 1};
ReadBuilder readBuilder = table.newReadBuilder()
        .withProjection(projection)
        .withFilter(Lists.newArrayList(greaterOrEqual));

// 2. Plan splits in 'Coordinator' (or named 'Driver')
List<Split> splits = readBuilder.newScan().plan().splits();

// 3. Distribute these splits to different tasks
// 4. Read a split in task
TableRead read = readBuilder.newRead();
RecordReader<InternalRow> reader = read.createReader(splits);
List<DataType> fieldTypes = table.rowType().getFieldTypes();
reader.forEachRemaining(new Consumer<InternalRow>() {
    @Override
    public void accept(InternalRow internalRow) {
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < internalRow.getFieldCount(); i++) {
            DataType dataType = fieldTypes.get(i);
            Object value = null;
            if (!internalRow.isNullAt(i)) {
                if (dataType.getTypeRoot().equals(DataTypeRoot.INTEGER)) {
                    value = internalRow.getInt(i);
                } else if (dataType.getTypeRoot().equals(DataTypeRoot.VARCHAR)) {
                    value = internalRow.getString(i).toString();
                } else if (dataType.getTypeRoot().equals(DataTypeRoot.BIGINT)) {
                    value = internalRow.getLong(i);
                } else if (dataType.getTypeRoot().equals(DataTypeRoot.TIMESTAMP_WITHOUT_TIME_ZONE)) {
                    value = internalRow.getTimestamp(i, 0).toLocalDateTime();
                } else if (dataType.getTypeRoot().equals(DataTypeRoot.DATE)) {
                    value = internalRow.getTimestamp(i, 0);
                }
            }
            row.add(value);
        }
        System.out.println(new Gson().toJson(row));
    }
});
```

### What doesn't meet your expectations?

The field name pushed down to the ORC filter should be the actual column name. Instead, the filter is built with the anonymous name `f0`, which does not exist in the ORC schema `struct<user_id:bigint,item_id:bigint>`, so the read fails.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
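A note on a possible workaround, based only on what the stack trace shows: `RowType.of(DataTypes.BIGINT(), DataTypes.BIGINT())` generates anonymous field names (`f0`, `f1`), and with `orc.sarg.to.filter` enabled those names appear to be handed to the ORC SearchArgument filter, where they do not match the file schema. If that reading is right, building the `PredicateBuilder` from the table's real row type (so the predicate carries `user_id` rather than `f0`) may avoid the error. This is an untested sketch against the Paimon Java API, not a confirmed fix; the class name `FilterByRealRowType` and method `openReader` are made up for illustration.

```java
import java.util.List;

import org.apache.paimon.data.InternalRow;
import org.apache.paimon.predicate.Predicate;
import org.apache.paimon.predicate.PredicateBuilder;
import org.apache.paimon.reader.RecordReader;
import org.apache.paimon.table.Table;
import org.apache.paimon.table.source.ReadBuilder;
import org.apache.paimon.table.source.Split;

public class FilterByRealRowType {

    static RecordReader<InternalRow> openReader(Table table) throws Exception {
        // Build the predicate from the table's own row type so that field
        // index 0 resolves to the real column name "user_id", not "f0".
        PredicateBuilder builder = new PredicateBuilder(table.rowType());
        Predicate userIdEquals = builder.equal(0, 99L);

        ReadBuilder readBuilder = table.newReadBuilder()
                .withProjection(new int[] {0, 1})
                .withFilter(userIdEquals);

        List<Split> splits = readBuilder.newScan().plan().splits();
        return readBuilder.newRead().createReader(splits);
    }
}
```

If the error persists even with matching names, disabling `orc.sarg.to.filter` (leaving predicate evaluation to Paimon's own filter path) would be another way to sidestep the shaded-ORC filter construction until the pushdown mapping is fixed.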
