[ https://issues.apache.org/jira/browse/SPARK-28981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923682#comment-16923682 ]
Sean Owen commented on SPARK-28981:
-----------------------------------

(Really we could say it's a duplicate of https://issues.apache.org/jira/browse/SPARK-26995 )

> Missing library for reading/writing Snappy-compressed files
> -----------------------------------------------------------
>
>                 Key: SPARK-28981
>                 URL: https://issues.apache.org/jira/browse/SPARK-28981
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.4.4
>            Reporter: Paul Schweigert
>            Priority: Minor
>
> The current Dockerfile for Spark on Kubernetes is missing the
> "ld-linux-x86-64.so.2" library needed to read / write Snappy-compressed
> files.
>
> Sample error message when trying to read a parquet file compressed with
> snappy:
>
> {code:java}
> 19/09/02 05:33:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 172.30.189.77, executor 2): org.apache.spark.SparkException: Task failed while writing rows.
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:121)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.7-04145e2f-cc82-4217-99b8-641cdd755a87-libsnappyjava.so: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /tmp/snappy-1.1.7-04145e2f-cc82-4217-99b8-641cdd755a87-libsnappyjava.so)
>     at java.lang.ClassLoader$NativeLibrary.load(Native Method)
>     at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
>     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
>     at java.lang.Runtime.load0(Runtime.java:809)
>     at java.lang.System.load(System.java:1086)
>     at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:179)
>     at org.xerial.snappy.SnappyLoader.loadSnappyApi(SnappyLoader.java:154)
>     at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
>     at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
>     at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
>     at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
>     at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:165)
>     at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:95)
>     at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
>     at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
>     at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
>     at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
>     at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
>     at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:57)
>     at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:74)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
>     at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
>     ... 10 more
> {code}
> The relevant library is in the Alpine Linux "gcompat" package
> ([https://pkgs.alpinelinux.org/package/edge/community/x86/gcompat]). Adding
> this library to the Dockerfile enables the reading/writing of
> Snappy-compressed files.
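
For anyone reading this from the archive: the change the reporter describes (adding the Alpine "gcompat" package to the image) might look roughly like the fragment below. This is only a sketch. The base image shown is an assumption for illustration and is not taken from the issue, and it assumes "gcompat" is available in the package repositories enabled for the image's Alpine release.

```dockerfile
# Illustrative base image (assumption); use whatever Alpine-based
# image the Spark Dockerfile in your checkout actually starts from.
FROM openjdk:8-alpine

# Per the issue, gcompat supplies the ld-linux-x86-64.so.2 loader that
# the native libsnappyjava.so fails to find in the stack trace above.
RUN apk add --no-cache gcompat
```

After rebuilding the image, re-running the failing Parquet job should show whether the UnsatisfiedLinkError is gone.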