[I] How to query table by partition field? [iceberg]

via GitHub Wed, 16 Oct 2024 01:00:35 -0700


jiyis opened a new issue, #11329:
URL: https://github.com/apache/iceberg/issues/11329


   ### Query engine
   
   Iceberg API 1.6.1
   
   ### Question
   
   ### Schema
   ```java
   TableIdentifier tableIdentifier = TableIdentifier.of("default", "example_table");
   Schema schema = new Schema(
           Types.NestedField.optional(1, "event_id", Types.StringType.get()),
           Types.NestedField.optional(2, "username", Types.StringType.get()),
           Types.NestedField.optional(3, "userid", Types.IntegerType.get()),
           Types.NestedField.optional(4, "api_version", Types.StringType.get()),
           Types.NestedField.optional(5, "command", Types.StringType.get())
   );
   
   PartitionSpec spec = PartitionSpec.builderFor(schema)
           .bucket("event_id", 10)
           .build();
   ```
   ### Insert data
   ```java
   TableIdentifier name = TableIdentifier.of("default", "example_table");
   Table table = catalog.loadTable(name);
   Schema schema = table.schema();
   GenericAppenderFactory appenderFactory = new GenericAppenderFactory(schema);
   
   int partitionId = 1, taskId = 1;
   OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId)
           .format(FileFormat.AVRO).build();
   final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());
   PartitionedFanoutWriter<Record> partitionedFanoutWriter = new PartitionedFanoutWriter<>(
           table.spec(),
           FileFormat.AVRO, appenderFactory, outputFileFactory,
           table.io(), 10 * 1024 * 1024) {
       @Override
       protected PartitionKey partition(Record record) {
           partitionKey.partition(record);
           return partitionKey;
       }
   };
   
   GenericRecord genericRecord = GenericRecord.create(table.schema());
   List<String> levels = Arrays.asList("info", "debug", "error", "warn");
   Random random = new Random();
   for (int i = 0; i < 10000; i++) {
       GenericRecord record = genericRecord.copy();
       String eventId = UUID.randomUUID().toString();
       record.setField("event_id", eventId);
       record.setField("username", levels.get(random.nextInt(levels.size())));
       record.setField("userid", random.nextInt(10000000));
       record.setField("api_version", "1.0");
       record.setField("command", eventId);
       partitionedFanoutWriter.write(record);
   }
   
   AppendFiles appendFiles = table.newAppend();
   Arrays.stream(partitionedFanoutWriter.dataFiles()).forEach(appendFiles::appendFile);
   Snapshot newSnapshot = appendFiles.apply();
   appendFiles.commit();
   ```
   ### Query
   I'd like to filter data by the bucket partition, but no data is returned.
   I have confirmed that the data exists, and I can retrieve it by filtering
   on other fields.
   ```java
   // Filter on the partition source column: empty result
   CloseableIterable<Record> byEventId = IcebergGenerics.read(tbl)
           .where(Expressions.equal("event_id", "9c83f47c-9a07-4a6b-949c-3bedc31852fe"))
           .build();

   // Filter on the bucket transform directly: empty result
   CloseableIterable<Record> byBucket = IcebergGenerics.read(tbl)
           .where(Expressions.equal(Expressions.bucket("event_id", 10), 1))
           .build();

   // Filter on a non-partition column: has result
   // Record(9c83f47c-9a07-4a6b-949c-3bedc31852fe, info, 2377306, 1.0, 9c83f47c-9a07-4a6b-949c-3bedc31852fe)
   CloseableIterable<Record> byCommand = IcebergGenerics.read(tbl)
           .where(Expressions.equal("command", "9c83f47c-9a07-4a6b-949c-3bedc31852fe"))
           .build();
   
   ```
   
   How should I query data by the partition (bucket) field?
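
For reference when checking which bucket a given `event_id` should land in: the Iceberg table spec defines the bucket transform as `(murmur3_x86_32_hash(x) & Integer.MAX_VALUE) % N`, with strings hashed over their UTF-8 bytes and a seed of 0. The following is a minimal sketch of that formula using only the JDK. `BucketSketch`, `murmur3_32`, and `bucket` are hypothetical names chosen for illustration and are not part of the Iceberg API. The sketch only computes the expected bucket number and does not explain why the queries above return nothing.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reproduces the spec's bucket formula for string values. */
final class BucketSketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    /** 32-bit Murmur3 (x86 variant) over a byte array. */
    static int murmur3_32(byte[] data, int seed) {
        int h1 = seed;
        int len = data.length;
        int nblocks = len / 4;

        // Body: mix each little-endian 4-byte block into the hash state.
        for (int i = 0; i < nblocks; i++) {
            int o = i * 4;
            int k1 = (data[o] & 0xff)
                    | ((data[o + 1] & 0xff) << 8)
                    | ((data[o + 2] & 0xff) << 16)
                    | ((data[o + 3] & 0xff) << 24);
            k1 *= C1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= C2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 1 to 3 bytes, if any.
        int tail = nblocks * 4;
        int rem = len & 3;
        int k1 = 0;
        if (rem == 3) k1 ^= (data[tail + 2] & 0xff) << 16;
        if (rem >= 2) k1 ^= (data[tail + 1] & 0xff) << 8;
        if (rem >= 1) {
            k1 ^= (data[tail] & 0xff);
            k1 *= C1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= C2;
            h1 ^= k1;
        }

        // Finalization: fold in the length and avalanche the bits.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Bucket number in [0, numBuckets) for a string value, per the spec formula. */
    static int bucket(String value, int numBuckets) {
        int hash = murmur3_32(value.getBytes(StandardCharsets.UTF_8), 0);
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        String eventId = "9c83f47c-9a07-4a6b-949c-3bedc31852fe";
        System.out.println("bucket(event_id, 10) = " + bucket(eventId, 10));
    }
}
```

The spec's appendix lists 1210000089 as the 32-bit hash of the string `iceberg`, which can serve as a sanity check for `murmur3_32`. Comparing the printed bucket number with the literal `1` used in the second query shows whether that predicate could match this particular `event_id` at all.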
   


