matthijseikelenboom opened a new issue, #9148:
URL: https://github.com/apache/iceberg/issues/9148
### Query engine
Spark
### Question
Hi, I have a question about using Apache Spark together with Iceberg.
I'm trying to write to Iceberg concurrently, but during testing I see
`Unclosed input stream` warnings, see below
```
2023-11-23T16:22:03.407+0100 WARN [Thread="Finalizer"]
[o.a.i.h.HadoopStreams :138] Unclosed input stream created by:
org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.<init>(HadoopStreams.java:91)
org.apache.iceberg.hadoop.HadoopStreams.wrap(HadoopStreams.java:55)
org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:100)
org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:76)
org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:188)
org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:187)
org.apache.iceberg.io.CloseableIterable.lambda$filter$1(CloseableIterable.java:136)
org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:72)
org.apache.iceberg.io.CloseableIterable.lambda$filter$1(CloseableIterable.java:136)
org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:72)
org.apache.iceberg.io.CloseableIterable.lambda$filter$1(CloseableIterable.java:136)
org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:72)
org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:188)
org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:187)
org.apache.iceberg.ManifestGroup$1.iterator(ManifestGroup.java:333)
org.apache.iceberg.ManifestGroup$1.iterator(ManifestGroup.java:291)
org.apache.iceberg.util.ParallelIterable$ParallelIterator.lambda$new$1(ParallelIterable.java:69)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)
```
I first saw these warnings when using Iceberg with Hadoop. I then tried it
with Hive, hoping that would fix it, but that does not seem to be the case.
What am I missing here? Do I need to change some configuration to enable
proper concurrent writing?
Here are the versions of the tools I'm using, in case they come in handy:
Spark version: 3.2.1
Iceberg version: 1.4.2
Hive version: 4.0.0-alpha-2 (The only version we got to work)
Hadoop version: 3.2.2
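For context, here is roughly what I have been experimenting with on the table-property side. As I understand it, Iceberg resolves concurrent writers through optimistic concurrency and retries conflicting commits, and the retry behavior can be tuned via table properties such as `commit.retry.num-retries`. A minimal sketch of that (the table name `db.events` is just a placeholder), though I'm not sure it relates to the unclosed-stream warnings:

```sql
-- Sketch: tune optimistic-concurrency commit retries on an Iceberg table.
-- `db.events` is a placeholder table name; the properties below are
-- standard Iceberg table properties for commit retry behavior.
ALTER TABLE db.events SET TBLPROPERTIES (
  'commit.retry.num-retries' = '10',   -- retries before a commit fails on conflict
  'commit.retry.min-wait-ms' = '100'   -- initial backoff between retry attempts
);
```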
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]