rdblue commented on a change in pull request #3273:
URL: https://github.com/apache/iceberg/pull/3273#discussion_r732191990
##########
File path:
spark/v3.0/spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java
##########
@@ -106,6 +119,68 @@ public void addDataUnpartitionedOrc() {
sql("SELECT * FROM %s ORDER BY id", tableName));
}
+
+  @Test
+  public void addAvroFile() throws Exception {
+    // The Spark session catalog cannot load metadata tables; it fails with:
+    // "The namespace in session catalog must have exactly one name part"
+    Assume.assumeFalse(catalogName.equals("spark_catalog"));
+
+    Schema schema = new Schema(
+        Types.NestedField.required(1, "id", Types.LongType.get()),
+        Types.NestedField.optional(2, "data", Types.StringType.get()));
+
+    GenericRecord baseRecord = GenericRecord.create(schema);
+
+    ImmutableList.Builder<Record> builder = ImmutableList.builder();
+    builder.add(baseRecord.copy(ImmutableMap.of("id", 1L, "data", "a")));
+    builder.add(baseRecord.copy(ImmutableMap.of("id", 2L, "data", "b")));
+    List<Record> records = builder.build();
+
+    OutputFile file = Files.localOutput(temp.newFile());
+
+    DataWriter<Record> dataWriter = Avro.writeData(file)
+        .schema(schema)
+        .createWriterFunc(org.apache.iceberg.data.avro.DataWriter::create)
+        .overwrite()
+        .withSpec(PartitionSpec.unpartitioned())
+        .build();
+
+    try {
+      for (Record record : records) {
+        dataWriter.add(record);
+      }
+    } finally {
+      dataWriter.close();
+    }
+
+    String path = dataWriter.toDataFile().path().toString();
+
+    String createIceberg =
+        "CREATE TABLE %s (id Long, data String) USING iceberg";
+    sql(createIceberg, tableName);
+
+    Object result = scalarSql("CALL %s.system.add_files('%s', '`avro`.`%s`')",
+        catalogName, tableName, path);
Review comment:
This is actually dangerous, and we probably want to disallow it. We
should only import data files that do not have field IDs; otherwise the
field IDs may not match the table's and you could get strange behavior.
I'd prefer that the test use an Avro file written without Iceberg
support, to ensure it doesn't carry field IDs. Not a huge problem, but
eventually I think we should detect that the imported file has field
IDs and fail if they don't match the table schema's IDs.
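
For reference, a minimal sketch of writing such a file with the plain
Avro API, with no Iceberg involvement so the schema carries no field
IDs. It reuses the test's temp folder; the record and variable names
here are illustrative, not from this PR:

    import java.io.File;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;

    // Build a plain Avro schema: no Iceberg "field-id" properties attached.
    org.apache.avro.Schema avroSchema = SchemaBuilder.record("test_record")
        .fields()
        .requiredLong("id")
        .optionalString("data")
        .endRecord();

    // Write the same two rows with the stock Avro file writer.
    File avroFile = temp.newFile("unpartitioned.avro");
    DataFileWriter<GenericData.Record> writer =
        new DataFileWriter<>(new GenericDatumWriter<>(avroSchema));
    try {
      writer.create(avroSchema, avroFile);
      for (long id = 1; id <= 2; id++) {
        GenericData.Record record = new GenericData.Record(avroSchema);
        record.put("id", id);
        record.put("data", id == 1 ? "a" : "b");
        writer.append(record);
      }
    } finally {
      writer.close();
    }

Since Iceberg records its field IDs as a "field-id" property on each
Avro field, the eventual check could read that property from the
imported file's schema and fail when IDs are present but don't line up
with the table schema.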
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]