pvary commented on a change in pull request #1481:
URL: https://github.com/apache/iceberg/pull/1481#discussion_r494216160



##########
File path: mr/src/main/java/org/apache/iceberg/mr/Catalogs.java
##########
@@ -77,6 +102,77 @@ private static Table loadTable(Configuration conf, String tableIdentifier, Strin
     return new HadoopTables(conf).load(tableLocation);
   }
 
+  /**
+   * Creates an Iceberg table using the catalog specified by the configuration.
+   * The properties should contain the following values:
+   * <p><ul>
+   * <li>Table identifier ({@link Catalogs#NAME}) or table path ({@link Catalogs#LOCATION}) is required
+   * <li>Table schema ({@link InputFormatConfig#TABLE_SCHEMA}) is required
+   * <li>Partition specification ({@link InputFormatConfig#PARTITION_SPEC}) is optional. Table will be unpartitioned if
+   *  not provided
+   * </ul><p>
+   * Other properties will be handed over to the Table creation. The controlling properties above will not be
+   * propagated.
+   * @param conf a Hadoop conf
+   * @param props the controlling properties
+   * @return the created Iceberg table
+   */
+  public static Table createTable(Configuration conf, Properties props) {
+    String schemaString = props.getProperty(InputFormatConfig.TABLE_SCHEMA);
+    Preconditions.checkNotNull(schemaString, "Table schema not set");
+    Schema schema = SchemaParser.fromJson(props.getProperty(InputFormatConfig.TABLE_SCHEMA));

Review comment:
   I think we should keep the serialized schema in the Catalogs interface.
Other systems, such as Impala and Presto, might want to use it as well.
   I would like to tackle the Hive schema DDL in a separate PR. The schema is
available in HiveIcebergSerDe.initialize, though in a somewhat convoluted form.
I would like to read it there and convert it to an Iceberg Schema string, so
that only the Iceberg-related properties are pushed further down from that point.
   What do you think?
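
   To make the property handling described in the Javadoc concrete, here is a
minimal sketch of the split between controlling properties and pass-through
table properties. It uses only java.util, so it does not call Iceberg itself.
The class name, the method names, and the literal key strings are assumptions
for illustration; they stand in for Catalogs.NAME, Catalogs.LOCATION,
InputFormatConfig.TABLE_SCHEMA and InputFormatConfig.PARTITION_SPEC, and may
not match the real constant values or the code in this PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

class ControllingPropsSketch {
  // Hypothetical key strings; the real values live in Catalogs and InputFormatConfig.
  static final String NAME = "name";
  static final String LOCATION = "location";
  static final String TABLE_SCHEMA = "iceberg.mr.table.schema";
  static final String PARTITION_SPEC = "iceberg.mr.table.partition.spec";

  // Keys that steer table creation and are therefore not stored on the table.
  static final Set<String> CONTROLLING =
      new HashSet<>(Arrays.asList(NAME, LOCATION, TABLE_SCHEMA, PARTITION_SPEC));

  // Reads the serialized schema once and fails fast if it is missing.
  // A NullPointerException is thrown here, which is also what a
  // checkNotNull-style precondition would raise.
  static String requiredSchemaJson(Properties props) {
    String schemaJson = props.getProperty(TABLE_SCHEMA);
    if (schemaJson == null) {
      throw new NullPointerException("Table schema not set");
    }
    return schemaJson;
  }

  // Returns every property except the controlling ones, i.e. the map that
  // would be handed over to the table creation.
  static Map<String, String> tableProperties(Properties props) {
    Map<String, String> result = new HashMap<>();
    for (String key : props.stringPropertyNames()) {
      if (!CONTROLLING.contains(key)) {
        result.put(key, props.getProperty(key));
      }
    }
    return result;
  }
}
```

   With this shape, any caller that can produce the serialized schema string
can fill the Properties object, which is the reason for keeping the serialized
schema at this level rather than a Hive-specific representation.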
