robinsinghstudios opened a new issue, #9653:
URL: https://github.com/apache/iceberg/issues/9653
### Query engine
Iceberg Java API 1.4.3
### Question
For context, I am new to Java and might be missing something simple, but after
being stuck on this issue for a long while I decided to post my question here.
I am using Iceberg 1.4.3 and Java 20.
Using Iceberg's PartitionedFanoutWriter, I can dynamically generate partitions
and write files into the respective partitions successfully. However, when I
read the data back, the partition column values are always null.
My write code looks like this:
```java
org.apache.iceberg.Table table;
PartitionSpec spec;
SortOrder srt;

if (catalog.tableExists(name)) {
    table = catalog.loadTable(name);
} else {
    spec = PartitionSpec.builderFor(schema)
        .identity("temp")
        .build();
    srt = SortOrder.builderFor(schema)
        .asc(keyColumn)
        .build();
    Map<String, String> tblProps = new HashMap<>();
    tblProps.put("write.parquet.compression-codec", "uncompressed");
    tblProps.put("write.distribution-mode", "range");
    tblProps.put("format-version", "2");
    table = catalog.buildTable(name, schema)
        .withPartitionSpec(spec)
        .withProperties(tblProps)
        .withSortOrder(srt)
        .create();
}

GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema());
appenderFactory.setAll(table.properties());

int partitionId = 1, taskId = 1;
OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId)
    .format(FileFormat.PARQUET)
    .build();

final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());

// partitionedFanoutWriter will automatically partition each record and create
// the corresponding partitioned writer
PartitionedFanoutWriter<Record> partitionedFanoutWriter =
    new PartitionedFanoutWriter<>(table.spec(), FileFormat.PARQUET, appenderFactory,
        outputFileFactory, table.io(), TARGET_FILE_SIZE_IN_BYTES) {
      @Override
      protected PartitionKey partition(Record record) {
        partitionKey.partition(record);
        return partitionKey;
      }
    };

GenericRecord genericRecord = GenericRecord.create(table.schema());

value.forEach(val -> {
    try {
        GenericRecord record = genericRecord.copy();
        val.toMap().forEach(record::setField);
        partitionedFanoutWriter.write(record);
        LOGGER.info(val.toString());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
});

// submit the data files to the table
AppendFiles appendFiles = table.newAppend();
try {
    Arrays.stream(partitionedFanoutWriter.dataFiles()).forEach(appendFiles::appendFile);
} catch (IOException e) {
    throw new RuntimeException(e);
}

// commit the snapshot
appendFiles.apply();
appendFiles.commit();
```
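For anyone skimming, here is my mental model of the fanout pattern (it may be wrong, and the names below are mine, not Iceberg's): the writer keeps one open file writer per distinct partition key and routes each record to it, which is why records can arrive in any partition order. A plain-Java sketch of just that routing step:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only (not Iceberg code): a fanout writer keeps one open
// "writer" per partition key (here, a list buffer) and routes each record
// to the buffer for its key.
public class FanoutSketch {
    // Stand-in for the overridden partition(record): derive the key
    // from the record's "temp" field.
    static String partitionKey(Map<String, String> record) {
        return record.get("temp");
    }

    // Route records to per-partition buffers, creating a buffer the first
    // time a new key is seen.
    static Map<String, List<Map<String, String>>> fanout(List<Map<String, String>> records) {
        Map<String, List<Map<String, String>>> writers = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            writers.computeIfAbsent(partitionKey(record), k -> new ArrayList<>()).add(record);
        }
        return writers;
    }
}
```

This part does seem to work for me: the files land under the right partition directories.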
And I am reading the table like this:
```java
table.refresh();
IcebergGenerics.ScanBuilder scanBuilder = IcebergGenerics.read(table);
CloseableIterable<Record> result = scanBuilder.build();
for (Record r : result) {
    LOGGER.info(r.toString());
}
```
The table is being partitioned on the "temp" column as I want, but when I read
the rows back, this is the result:
```
Record(1022, first_name, last_name, [email protected], null, false)
```
I see null on the partitioned column.
It would be great if anyone could help out with this.
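One guess at the mechanism, in case it helps anyone diagnose this (this is my speculation, not confirmed Iceberg behavior): identity-partition columns may be resolved at read time from each data file's partition tuple rather than from the column data, so if the files were committed without partition metadata the column would read back as null even though the directory layout looks right. A toy illustration of that kind of lookup:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only (not Iceberg API): resolving an identity-partition
// column from a data file's partition tuple. An empty tuple yields null.
public class PartitionTupleDemo {
    static Object resolve(Map<String, Object> partitionTuple, String column) {
        return partitionTuple.get(column);
    }

    public static void main(String[] args) {
        Map<String, Object> withTuple = new HashMap<>();
        withTuple.put("temp", 1022);
        System.out.println(resolve(withTuple, "temp"));       // 1022
        System.out.println(resolve(new HashMap<>(), "temp")); // null
    }
}
```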
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]