matthijseikelenboom opened a new issue, #9148:
URL: https://github.com/apache/iceberg/issues/9148
### Query engine
Spark
### Question
Hi, I have a question about using Apache Spark together with Iceberg.
I'm trying to write to Iceberg concurrently, but during testing I see
`Unclosed input stream` warnings, see below
```
2023-11-23T16:22:03.407+0100 WARN [Thread="Finalizer"]
[o.a.i.h.HadoopStreams :138] Unclosed input stream created by:
org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.<init>(HadoopStreams.java:91)
org.apache.iceberg.hadoop.HadoopStreams.wrap(HadoopStreams.java:55)
org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:100)
org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:76)
org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:188)
org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:187)
org.apache.iceberg.io.CloseableIterable.lambda$filter$1(CloseableIterable.java:136)
org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:72)
org.apache.iceberg.io.CloseableIterable.lambda$filter$1(CloseableIterable.java:136)
org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:72)
org.apache.iceberg.io.CloseableIterable.lambda$filter$1(CloseableIterable.java:136)
org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:72)
org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:188)
org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:187)
org.apache.iceberg.ManifestGroup$1.iterator(ManifestGroup.java:333)
org.apache.iceberg.ManifestGroup$1.iterator(ManifestGroup.java:291)
org.apache.iceberg.util.ParallelIterable$ParallelIterator.lambda$new$1(ParallelIterable.java:69)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)
```
I first saw these warnings when using Iceberg with Hadoop. I then tried it
with Hive, hoping that would fix it, but that does not seem to be the case.
What am I missing here? Do I need to change some configuration to enable
proper concurrent writing?
Here are the versions of the tools I'm using, in case they come in handy:
Spark version: 3.2.1
Iceberg version: 1.4.2
Hive version: 4.0.0-alpha-2 (The only version we got to work)
Hadoop version: 3.2.2
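For context, here is roughly what I have been experimenting with on the table-property side. As I understand it, Iceberg resolves concurrent writers through optimistic concurrency and retries conflicting commits, and the retry behavior can be tuned via table properties such as `commit.retry.num-retries`. A minimal sketch of that (the table name `db.events` is just a placeholder), though I'm not sure it relates to the unclosed-stream warnings:

```sql
-- Sketch: tune optimistic-concurrency commit retries on an Iceberg table.
-- `db.events` is a placeholder table name; the properties below are
-- standard Iceberg table properties for commit retry behavior.
ALTER TABLE db.events SET TBLPROPERTIES (
  'commit.retry.num-retries' = '10',   -- retries before a commit fails on conflict
  'commit.retry.min-wait-ms' = '100'   -- initial backoff between retry attempts
);
```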
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]