jiyis opened a new issue, #11329:
URL: https://github.com/apache/iceberg/issues/11329
### Query engine
Iceberg API 1.6.1
### Question
### Schema
```java
TableIdentifier tableIdentifier = TableIdentifier.of("default", "example_table");
Schema schema = new Schema(
    Types.NestedField.optional(1, "event_id", Types.StringType.get()),
    Types.NestedField.optional(2, "username", Types.StringType.get()),
    Types.NestedField.optional(3, "userid", Types.IntegerType.get()),
    Types.NestedField.optional(4, "api_version", Types.StringType.get()),
    Types.NestedField.optional(5, "command", Types.StringType.get())
);
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .bucket("event_id", 10)
    .build();
```
### Insert data
```java
TableIdentifier name = TableIdentifier.of("default", "example_table");
Table table = catalog.loadTable(name);
Schema schema = table.schema();
GenericAppenderFactory appenderFactory = new GenericAppenderFactory(schema);
int partitionId = 1, taskId = 1;
OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId)
    .format(FileFormat.AVRO)
    .build();
final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());
PartitionedFanoutWriter<Record> partitionedFanoutWriter = new PartitionedFanoutWriter<>(
        table.spec(),
        FileFormat.AVRO, appenderFactory, outputFileFactory,
        table.io(), 10 * 1024 * 1024) {
    @Override
    protected PartitionKey partition(Record record) {
        // Reuse one key instance and recompute it for every record
        partitionKey.partition(record);
        return partitionKey;
    }
};
GenericRecord genericRecord = GenericRecord.create(table.schema());
List<String> levels = Arrays.asList("info", "debug", "error", "warn");
Random random = new Random();
for (int i = 0; i < 10000; i++) {
    GenericRecord record = genericRecord.copy();
    String eventId = UUID.randomUUID().toString();
    record.setField("event_id", eventId);
    record.setField("username", levels.get(random.nextInt(levels.size())));
    record.setField("userid", random.nextInt(10000000));
    record.setField("api_version", "1.0");
    // command holds the same value as event_id
    record.setField("command", eventId);
    partitionedFanoutWriter.write(record);
}
AppendFiles appendFiles = table.newAppend();
Arrays.stream(partitionedFanoutWriter.dataFiles()).forEach(appendFiles::appendFile);
Snapshot newSnapshot = appendFiles.apply();
appendFiles.commit();
```
### Query
I'd like to filter data by the bucket partition, but no data is returned.
I have confirmed that the data exists, and I can retrieve it by filtering on
other fields.
```java
// empty result
CloseableIterable<Record> byEventId = IcebergGenerics.read(tbl)
    .where(Expressions.equal("event_id", "9c83f47c-9a07-4a6b-949c-3bedc31852fe"))
    .build();
// empty result
CloseableIterable<Record> byBucket = IcebergGenerics.read(tbl)
    .where(Expressions.equal(Expressions.bucket("event_id", 10), 1))
    .build();
// has result:
// Record(9c83f47c-9a07-4a6b-949c-3bedc31852fe, info, 2377306, 1.0, 9c83f47c-9a07-4a6b-949c-3bedc31852fe)
CloseableIterable<Record> byCommand = IcebergGenerics.read(tbl)
    .where(Expressions.equal("command", "9c83f47c-9a07-4a6b-949c-3bedc31852fe"))
    .build();
```
How should I query data by the partition (bucket) field?
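
For debugging, it may help to compute which bucket a given `event_id` is expected to fall into. The sketch below is a standalone approximation of the bucket transform as described in the Iceberg table spec: a 32-bit Murmur3 hash (seed 0) over the UTF-8 bytes, with the sign bit masked off, modulo the bucket count. `BucketSketch`, `murmur3`, and `bucket` are hypothetical names, not Iceberg APIs, and the only check applied here is the spec's `iceberg` test vector, so treat the output as a hint rather than as what the library necessarily computes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for debugging; not part of the Iceberg API.
class BucketSketch {

    // 32-bit Murmur3 (x86 variant) with seed 0.
    static int murmur3(byte[] data) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = 0;
        int blocks = data.length / 4;
        // Mix each complete 4-byte little-endian block.
        for (int i = 0; i < blocks; i++) {
            int o = i * 4;
            int k = (data[o] & 0xff)
                | ((data[o + 1] & 0xff) << 8)
                | ((data[o + 2] & 0xff) << 16)
                | ((data[o + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        // Mix the remaining 1 to 3 bytes, if any.
        int tail = blocks * 4;
        int k = 0;
        switch (data.length & 3) {
            case 3: k ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1:
                k ^= (data[tail] & 0xff);
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h ^= k;
        }
        // Finalization: fold in the length and avalanche the bits.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumed bucket formula: (hash & Integer.MAX_VALUE) % numBuckets.
    static int bucket(String value, int numBuckets) {
        int hash = murmur3(value.getBytes(StandardCharsets.UTF_8));
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Expected bucket for the event_id used in the queries above.
        System.out.println(bucket("9c83f47c-9a07-4a6b-949c-3bedc31852fe", 10));
    }
}
```

Comparing the printed value with the partition values recorded for the table's data files might show whether the rows were written to the buckets the query expects.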