robinsinghstudios opened a new issue, #9653:
URL: https://github.com/apache/iceberg/issues/9653
### Query engine
Iceberg Java API 1.4.3
### Question
For context, I am new to Java and might be missing something simple, but after
being stuck on this issue for a long while I decided to post my question here.
I am using Iceberg 1.4.3 and Java 20.
Using Iceberg's PartitionedFanoutWriter, I can dynamically generate partitions
and write files into the respective partitions successfully. However, when I
read the data back, the partition column values are always null.
My write code looks like this:
```java
org.apache.iceberg.Table table;
PartitionSpec spec;
SortOrder srt;

if (catalog.tableExists(name)) {
    table = catalog.loadTable(name);
} else {
    spec = PartitionSpec.builderFor(schema)
        .identity("temp")
        .build();
    srt = SortOrder.builderFor(schema)
        .asc(keyColumn)
        .build();
    Map<String, String> tblProps = new HashMap<>();
    tblProps.put("write.parquet.compression-codec", "uncompressed");
    tblProps.put("write.distribution-mode", "range");
    tblProps.put("format-version", "2");
    table = catalog.buildTable(name, schema)
        .withPartitionSpec(spec)
        .withProperties(tblProps)
        .withSortOrder(srt)
        .create();
}

GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema());
appenderFactory.setAll(table.properties());

int partitionId = 1, taskId = 1;
OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId)
    .format(FileFormat.PARQUET)
    .build();

final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());

// partitionedFanoutWriter will automatically partition each record and create
// the corresponding partitioned writer
PartitionedFanoutWriter<Record> partitionedFanoutWriter =
    new PartitionedFanoutWriter<>(table.spec(), FileFormat.PARQUET, appenderFactory,
        outputFileFactory, table.io(), TARGET_FILE_SIZE_IN_BYTES) {
      @Override
      protected PartitionKey partition(Record record) {
        partitionKey.partition(record);
        return partitionKey;
      }
    };

GenericRecord genericRecord = GenericRecord.create(table.schema());

value.forEach(val -> {
    try {
        GenericRecord record = genericRecord.copy();
        val.toMap().forEach(record::setField);
        partitionedFanoutWriter.write(record);
        LOGGER.info(val.toString());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
});

// submit the data files to the table
AppendFiles appendFiles = table.newAppend();
try {
    Arrays.stream(partitionedFanoutWriter.dataFiles()).forEach(appendFiles::appendFile);
} catch (IOException e) {
    throw new RuntimeException(e);
}

// commit the snapshot
appendFiles.apply();
appendFiles.commit();
```
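For anyone skimming, here is my mental model of the fanout pattern (it may be wrong, and the names below are mine, not Iceberg's): the writer keeps one open file writer per distinct partition key and routes each record to it, which is why records can arrive in any partition order. A plain-Java sketch of just that routing step:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only (not Iceberg code): a fanout writer keeps one open
// "writer" per partition key (here, a list buffer) and routes each record
// to the buffer for its key.
public class FanoutSketch {
    // Stand-in for the overridden partition(record): derive the key
    // from the record's "temp" field.
    static String partitionKey(Map<String, String> record) {
        return record.get("temp");
    }

    // Route records to per-partition buffers, creating a buffer the first
    // time a new key is seen.
    static Map<String, List<Map<String, String>>> fanout(List<Map<String, String>> records) {
        Map<String, List<Map<String, String>>> writers = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            writers.computeIfAbsent(partitionKey(record), k -> new ArrayList<>()).add(record);
        }
        return writers;
    }
}
```

This part does seem to work for me: the files land under the right partition directories.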
And I am reading the table like this:
```java
table.refresh();
IcebergGenerics.ScanBuilder scanBuilder = IcebergGenerics.read(table);
CloseableIterable<Record> result = scanBuilder.build();
for (Record r : result) {
    LOGGER.info(r.toString());
}
```
The table is being partitioned on the "temp" column as I want, but when I read
the rows back, this is the result:
```
Record(1022, first_name, last_name, [email protected], null, false)
```
I see null on the partitioned column.
It would be great if anyone could help out with this.
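One guess at the mechanism, in case it helps anyone diagnose this (this is my speculation, not confirmed Iceberg behavior): identity-partition columns may be resolved at read time from each data file's partition tuple rather than from the column data, so if the files were committed without partition metadata the column would read back as null even though the directory layout looks right. A toy illustration of that kind of lookup:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only (not Iceberg API): resolving an identity-partition
// column from a data file's partition tuple. An empty tuple yields null.
public class PartitionTupleDemo {
    static Object resolve(Map<String, Object> partitionTuple, String column) {
        return partitionTuple.get(column);
    }

    public static void main(String[] args) {
        Map<String, Object> withTuple = new HashMap<>();
        withTuple.put("temp", 1022);
        System.out.println(resolve(withTuple, "temp"));       // 1022
        System.out.println(resolve(new HashMap<>(), "temp")); // null
    }
}
```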
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]