Hi Junjie,

The problem is that your writer doesn't have the same schema as the records you're passing to it, because addedFiles doesn't project all columns. Writers assume that the write schema and the record schema match, and will throw an exception like this one if they don't.
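To make the mismatch concrete, here is a toy sketch in plain Java. It is not Iceberg or Avro code: the class name, the four-column schema, and the helper methods are all made up for illustration. It only shows how a positional writer that expects every column fails when handed a record that was read with a narrower projection.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProjectionMismatchSketch {
    // Hypothetical full write schema: the Java type expected at each position.
    static final List<Class<?>> FULL_SCHEMA =
        Arrays.asList(String.class, Long.class, Long.class, Map.class);

    // A positional "writer": it assumes record[i] matches writeSchema[i].
    // Class.cast throws ClassCastException when a non-null value has the wrong type.
    static void write(List<Class<?>> writeSchema, Object[] record) {
        for (int i = 0; i < writeSchema.size(); i++) {
            Object value = i < record.length ? record[i] : null;
            writeSchema.get(i).cast(value);
        }
    }

    // Returns true if writing the record with the full schema fails with a cast error.
    static boolean failsToWrite(Object[] record) {
        try {
            write(FULL_SCHEMA, record);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Long> stats = Collections.unmodifiableMap(new HashMap<Integer, Long>());

        // Record read with all four columns projected: positions line up.
        Object[] fullRecord = {"data-file", 10L, 1024L, stats};

        // Record read with column 2 left out: the map shifts into the Long slot.
        Object[] projectedRecord = {"data-file", 10L, stats};

        System.out.println("full record fails: " + failsToWrite(fullRecord));
        System.out.println("projected record fails: " + failsToWrite(projectedRecord));
    }
}
```

In the sketch, the projected record puts an unmodifiable map where the writer expects a Long, which is the same shape of failure as the ClassCastException in your stack trace.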
The projection in addedFiles also came up on the PR to add a cherry-pick operation, because that operation uses addedFiles and appends the files it returns. The fix in that PR is to always project the entire schema when returning added files. You could make that change in a separate PR to fix this as well.

On Mon, Dec 23, 2019 at 12:57 AM Junjie Chen <[email protected]> wrote:

> Hi community
>
> I tried to add data files from an existing iceberg table to a target
> iceberg table with following code (unit test):
>
> Iterator<DataFile> datafiles =
>     sourceTable.currentSnapshot().addedFiles().iterator();
>
> while (datafiles.hasNext()) {
>     targetTable.newAppend().appendFile(datafiles.next()).commit();
> }
>
> it throws exception below (this can be reproduced in unit test as
> well, I tried in testRewrites, it throws NPE):
>
> org.apache.avro.file.DataFileWriter$AppendWriteException:
> java.lang.ClassCastException: java.util.Collections$UnmodifiableMap
> cannot be cast to java.lang.Long
>     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
>     at org.apache.iceberg.avro.AvroFileAppender.add(AvroFileAppender.java:52)
>     at org.apache.iceberg.ManifestWriter.addEntry(ManifestWriter.java:133)
>     at org.apache.iceberg.ManifestWriter.add(ManifestWriter.java:147)
>     at org.apache.iceberg.ManifestWriter.add(ManifestWriter.java:36)
>     at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:32)
>     at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:37)
>     ...
>
> After debugging I found that the GenericDataFile read from existing
> table has a defined fromProjectionPos array (0->0, ...4->4, 5->9,
> 6->10, 7->11, 8->12...), while the GenericAvroWriter is initialized
> without such projection so that when writing the object it throws
> CastException/NPE.
>
> My question is how to solve this? Or do we have other methods to add
> data files from an existing table?
>
> Thanks
>

--
Ryan Blue
Software Engineer
Netflix
