[jira] [Commented] (SPARK-31813) Cannot write snappy-compressed text files

2020-05-29 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119584#comment-17119584
 ] 

Hyukjin Kwon commented on SPARK-31813:
--

Well, ORC and Parquet have their own snappy logic IIRC; they might have a fallback. 
At least I know they use different implementations.
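
A rough way to see that difference from spark-shell is the sketch below. It is only a sketch: it assumes the Hadoop classes bundled with the pip-installed distribution are on the classpath, that {{spark}} is the session created by the shell, and that the {{tmp/...}} paths are illustrative. Parquet compresses through the bundled pure-Java snappy-java jar, while the CSV/JSON writers go through Hadoop's SnappyCodec, which needs a native libhadoop built with snappy; the check below only shows whether that native library is visible.

{code:scala}
// Sketch only: run inside spark-shell; assumes `spark` is the session the shell creates.
import org.apache.hadoop.util.NativeCodeLoader

// Reports whether the native libhadoop library was found and loaded.
println(s"libhadoop loaded: ${NativeCodeLoader.isNativeCodeLoaded}")
if (NativeCodeLoader.isNativeCodeLoaded) {
  // Only safe to call when libhadoop itself is loaded.
  println(s"libhadoop built with snappy: ${NativeCodeLoader.buildSupportsSnappy()}")
}

// Parquet compresses through the bundled snappy-java jar, so this succeeds
// even without native libhadoop:
spark.sql("select 1").write.option("compression", "snappy").mode("overwrite").parquet("tmp/parquet_snappy")

// The CSV/JSON writers go through Hadoop's SnappyCodec, which is what fails here:
// spark.sql("select 1").write.option("compression", "snappy").mode("overwrite").csv("tmp/csv_snappy")
{code}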

> Cannot write snappy-compressed text files
> -
>
> Key: SPARK-31813
> URL: https://issues.apache.org/jira/browse/SPARK-31813
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.5
>Reporter: Ondrej Kokes
>Priority: Minor
>
> After installing pyspark (pip install pyspark) on both macOS and Ubuntu (a 
> clean Docker image with default-jre), Spark fails to write text-based files 
> (CSV and JSON) with snappy compression. It can snappy-compress parquet and 
> orc, and gzipping CSVs also works.
> This is a clean PySpark installation, and the snappy jars are in place:
> {{$ ls -1 /usr/local/lib/python3.7/site-packages/pyspark/jars/ | grep snappy}}
> {{snappy-0.2.jar}}
> {{snappy-java-1.1.7.3.jar}}
> Repro 1 (Scala):
> $ spark-shell
> {{spark.sql("select 1").write.option("compression", "snappy").mode("overwrite").parquet("tmp/foo")}}
> {{spark.sql("select 1").write.option("compression", "snappy").mode("overwrite").csv("tmp/foo")}}
> The first (parquet) will work, the second one won't.
> Repro 2 (PySpark):
> {{from pyspark.sql import SparkSession}}
> {{if __name__ == '__main__':}}
> {{  spark = SparkSession.builder.appName('snappy_testing').getOrCreate()}}
> {{  spark.sql('select 1').write.option('compression', 'snappy').mode('overwrite').parquet('tmp/works_fine')}}
> {{  spark.sql('select 1').write.option('compression', 'gzip').mode('overwrite').csv('tmp/also_works')}}
> {{  spark.sql('select 1').write.option('compression', 'snappy').mode('overwrite').csv('tmp/snappy_not_found')}}
> In either case I get the following traceback:
> java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
>  at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
>  at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
>  at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
>  at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
>  at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:100)
>  at org.apache.spark.sql.execution.datasources.CodecStreams$$anonfun$createOutputStream$1.apply(CodecStreams.scala:84)
>  at org.apache.spark.sql.execution.datasources.CodecStreams$$anonfun$createOutputStream$1.apply(CodecStreams.scala:84)
>  at scala.Option.map(Option.scala:146)
>  at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:84)
>  at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStreamWriter(CodecStreams.scala:92)
>  at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVFileFormat.scala:177)
>  at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anon$1.newInstance(CSVFileFormat.scala:85)
>  at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
>  at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:108)
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:236)
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>  at org.apache.spark.scheduler.Task.run(Task.scala:123)
>  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)




[jira] [Commented] (SPARK-31813) Cannot write snappy-compressed text files

2020-05-29 Thread Ondrej Kokes (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119574#comment-17119574
 ] 

Ondrej Kokes commented on SPARK-31813:
--

[~hyukjin.kwon] I'm not disputing the lack of functionality, and I'm fine with 
installing an additional library. I'm reporting an *inconsistency*: I can 
write snappy-compressed parquet/orc, but not JSON/CSV. I should be able to do 
both of these or neither.



[jira] [Commented] (SPARK-31813) Cannot write snappy-compressed text files

2020-05-29 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119568#comment-17119568
 ] 

Hyukjin Kwon commented on SPARK-31813:
--

Seems like you need the native libraries, as the error says. You should 
manually install the native snappy library and let Hadoop know about it. It's not 
a Spark issue.
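
For anyone hitting this with the pip-installed distribution, a sketch of what "letting Hadoop know" can involve is below. It is only a sketch: {{/opt/hadoop/lib/native}} is an assumed, illustrative install location, not something pyspark ships.

{code:scala}
// Sketch only: the JVM's native-library search path has to include the directory
// where the snappy-enabled libhadoop/libsnappy were installed (assumed location below).

// Inside spark-shell, inspect what the driver currently searches:
println(sys.props("java.library.path"))

// The path is normally extended at launch time, for example:
//   spark-shell --driver-library-path /opt/hadoop/lib/native
// or by setting spark.driver.extraLibraryPath / spark.executor.extraLibraryPath
// in spark-defaults.conf, so that Hadoop's SnappyCodec can load the native codec.
{code}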


[jira] [Commented] (SPARK-31813) Cannot write snappy-compressed text files

2020-05-27 Thread Ondrej Kokes (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117440#comment-17117440
 ] 

Ondrej Kokes commented on SPARK-31813:
--

Tried a different Docker image (openjdk:8): downloaded Spark 2.4.5, unpacked it, 
launched spark-shell, and reproduced the issue there. I can't think of a 
cleaner way to reproduce it.


[jira] [Commented] (SPARK-31813) Cannot write snappy-compressed text files

2020-05-26 Thread ZhangShuai (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117231#comment-17117231
 ] 

ZhangShuai commented on SPARK-31813:


In my environment, it works fine.
