ms1111 opened a new issue, #10180:
URL: https://github.com/apache/iceberg/issues/10180
### Feature Request / Improvement
If the hadoop-common library is not present, trying to write a Parquet file:
```java
DataWriter<Record> dataWriter =
    Parquet.writeData(file)
        .schema(schema)
        .createWriterFunc(GenericParquetWriter::buildWriter)
        .overwrite()
        .withSpec(partitionSpec)
        .build();
```
... will fail with:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    at org.apache.iceberg.parquet.Parquet$WriteBuilder.<init>(Parquet.java:164)
    at org.apache.iceberg.parquet.Parquet$WriteBuilder.<init>(Parquet.java:143)
    at org.apache.iceberg.parquet.Parquet.write(Parquet.java:129)
    at org.apache.iceberg.parquet.Parquet$DataWriteBuilder.<init>(Parquet.java:646)
    at org.apache.iceberg.parquet.Parquet$DataWriteBuilder.<init>(Parquet.java:637)
    at org.apache.iceberg.parquet.Parquet.writeData(Parquet.java:623)
```
In org.apache.iceberg.parquet.Parquet, an empty Configuration is created:
```java
private WriteBuilder(OutputFile file) {
  this.file = file;
  if (file instanceof HadoopOutputFile) {
    this.conf = new Configuration(((HadoopOutputFile) file).getConf());
  } else {
    this.conf = new Configuration();
  }
}
```
ParquetWriter eventually passes this to ParquetIO.file(), which ignores it
if the file is not a HadoopOutputFile.
hadoop-common is a heavy dependency with many transitive dependencies, so it
would be nice to avoid requiring it when the output file is not a HadoopOutputFile.
Similar to Iceberg Flink issues - #3117 / #4183
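One possible direction, shown only as a rough sketch: defer creating the Configuration until something actually asks for it. The types below (`StubConf`, `StubOutputFile`, `StubHadoopOutputFile`, `StubLocalOutputFile`) are hypothetical stand-ins so the example runs without Iceberg or Hadoop on the classpath; they are not the real classes. Unlike the constructor above, the sketch reuses the file's configuration rather than copying it. Whether lazy creation alone would avoid the `NoClassDefFoundError` in the real `Parquet` class is an assumption this sketch does not verify.

```java
import java.util.function.Supplier;

class LazyConfSketch {
    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
    // The counter records how many instances were constructed.
    static final class StubConf {
        static int created = 0;
        StubConf() { created++; }
    }

    // Hypothetical stand-ins for OutputFile and HadoopOutputFile.
    interface StubOutputFile {}

    static final class StubHadoopOutputFile implements StubOutputFile {
        private final StubConf conf = new StubConf();
        StubConf getConf() { return conf; }
    }

    static final class StubLocalOutputFile implements StubOutputFile {}

    // Sketch of a builder that remembers how to obtain a configuration
    // instead of constructing one eagerly in its constructor.
    static final class WriteBuilder {
        private final Supplier<StubConf> confSupplier;
        private StubConf conf; // stays null until conf() is first called

        WriteBuilder(StubOutputFile file) {
            if (file instanceof StubHadoopOutputFile) {
                StubConf base = ((StubHadoopOutputFile) file).getConf();
                this.confSupplier = () -> base;
            } else {
                // Nothing is constructed here; creation is deferred.
                this.confSupplier = StubConf::new;
            }
        }

        StubConf conf() {
            if (conf == null) {
                conf = confSupplier.get();
            }
            return conf;
        }
    }

    public static void main(String[] args) {
        // Building a writer for a non-Hadoop file creates no configuration.
        new WriteBuilder(new StubLocalOutputFile());
        System.out.println("configs created: " + StubConf.created);
    }
}
```

In this sketch a non-Hadoop file never triggers construction of the configuration unless `conf()` is called, which mirrors the observation that the value is ignored for such files.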
### Query engine
None