[jira] [Resolved] (SPARK-28721) Failing to stop SparkSession in K8S cluster mode PySpark leaks Driver and Executors

2019-08-14 Thread Patrick Clay (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Clay resolved SPARK-28721.
--
Resolution: Duplicate

Ah, sorry, I didn't search carefully enough for a duplicate.







[jira] [Commented] (SPARK-28721) Failing to stop SparkSession in K8S cluster mode PySpark leaks Driver and Executors

2019-08-14 Thread Patrick Clay (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907522#comment-16907522
 ] 

Patrick Clay commented on SPARK-28721:
--

I confirmed this affects 2.4.1, and re-confirmed that it does not affect 2.4.0.







[jira] [Updated] (SPARK-28721) Failing to stop SparkSession in K8S cluster mode PySpark leaks Driver and Executors

2019-08-14 Thread Patrick Clay (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Clay updated SPARK-28721:
-
Affects Version/s: 2.4.1







[jira] [Created] (SPARK-28721) Failing to stop SparkSession in K8S cluster mode PySpark leaks Driver and Executors

2019-08-13 Thread Patrick Clay (JIRA)
Patrick Clay created SPARK-28721:


 Summary: Failing to stop SparkSession in K8S cluster mode PySpark 
leaks Driver and Executors
 Key: SPARK-28721
 URL: https://issues.apache.org/jira/browse/SPARK-28721
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, PySpark
Affects Versions: 2.4.3
Reporter: Patrick Clay


This does not seem to affect 2.4.0.

To repro:
 # Download a pristine Spark 2.4.3 binary
 # Edit pi.py so that it does not call spark.stop() (see the sketch below)
 # ./bin/docker-image-tool.sh -r MY_IMAGE -t MY_TAG build push
 # spark-submit --master k8s://IP --deploy-mode cluster --conf 
spark.kubernetes.driver.pod.name=spark-driver --conf 
spark.kubernetes.container.image=MY_IMAGE:MY_TAG 
file:/opt/spark/examples/src/main/python/pi.py

The driver runs successfully and Python exits, but the driver and executor JVMs and their pods remain up.

 

I realize that explicitly calling spark.stop() is always best practice, but since this does not repro on 2.4.0, it seems like a regression.
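
Below is a minimal sketch of the step-2 script: a trimmed-down pi.py-style job that deliberately never calls spark.stop(). It is illustrative only, not the exact upstream example; the partition count and names are arbitrary.

{code:python}
# Trimmed-down pi.py-style job for step 2 above; illustrative, not the stock example.
from operator import add
from random import random

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PythonPi").getOrCreate()

def inside(_):
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0

n = 2 * 100000
count = spark.sparkContext.parallelize(range(n), 2).map(inside).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))

# spark.stop() is deliberately omitted here; on 2.4.1/2.4.3 in K8s cluster mode
# the driver and executor pods reportedly stay up after the Python process exits.
{code}

An explicit spark.stop() (or a try/finally that stops the session) is the obvious workaround, per the best-practice note above.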






[jira] [Updated] (SPARK-26597) Support using images with different entrypoints on Kubernetes

2019-01-10 Thread Patrick Clay (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Clay updated SPARK-26597:
-
External issue ID: https://github.com/jupyter/docker-stacks/issues/797







[jira] [Created] (SPARK-26597) Support using images with different entrypoints on Kubernetes

2019-01-10 Thread Patrick Clay (JIRA)
Patrick Clay created SPARK-26597:


 Summary: Support using images with different entrypoints on 
Kubernetes
 Key: SPARK-26597
 URL: https://issues.apache.org/jira/browse/SPARK-26597
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Patrick Clay


I wish to use arbitrary pre-existing Docker images that contain Spark when running on Kubernetes.

Specifically, I wish to use 
[jupyter/all-spark-notebook|https://hub.docker.com/r/jupyter/all-spark-notebook] in in-cluster client mode, ideally without modifying the image (I think using images maintained by others is a key advantage of Docker).

It ships the full Spark 2.4 binary tarball, including Spark's entrypoint.sh, but I have to create a child image that sets entrypoint.sh as the entrypoint, because Spark does not let the user specify a Kubernetes command. I also needed separate images for the kernel/driver and the executor, because the kernel must use Jupyter's entrypoint. Building such an executor image and pointing Spark at it works, but it is obnoxious to build an image just to set the entrypoint.

The crux of this feature request is to add a property for the executor (and driver) command so that it can point to entrypoint.sh.

I personally don't see why entrypoint.sh exists at all, rather than making the command _spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend ..._, which seems a lot more portable (albeit reliant on PATH).

Speaking of reliance on PATH, the image also broke because it does not set JAVA_HOME and installs tini from Conda, which puts it on a different path. These are smaller issues; I'll file an issue for them and try to work them out between here and there.

In general, shouldn't Spark on K8s be less coupled to the layout of the image?
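
For concreteness, here is a rough sketch (not part of the original request) of what the current workaround looks like from a notebook kernel: in-cluster client mode with a child executor image whose entrypoint has been set to Spark's entrypoint.sh. The image name, driver address, and instance count are placeholders.

{code:python}
# Rough sketch of in-cluster client mode from a Jupyter kernel. Assumes a child
# image of jupyter/all-spark-notebook (or the stock Spark image) whose entrypoint
# is Spark's entrypoint.sh. All values below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc")
    .config("spark.kubernetes.container.image", "MY_REPO/MY_EXECUTOR_IMAGE:MY_TAG")
    .config("spark.executor.instances", "2")
    # In client mode the executors must be able to reach the driver pod directly.
    .config("spark.driver.host", "MY_DRIVER_POD_IP")
    .getOrCreate()
)
{code}

The feature request above would remove the need to build that child image just to set the command.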






[jira] [Commented] (SPARK-24018) Spark-without-hadoop package fails to create or read parquet files with snappy compression

2018-07-09 Thread Patrick Clay (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537551#comment-16537551
 ] 

Patrick Clay commented on SPARK-24018:
--

I believe we are both partially correct: a fix (with Spark 2.3.0) does require snappy-java 1.1.2, and the issue was caused by SPARK-18646. The native library loader in snappy-java 1.0.4 [uses a self-described hack|https://github.com/xerial/snappy-java/blob/snappy-java-1.0.4/src/main/java/org/xerial/snappy/SnappyLoader.java#L175] to inject the loader onto the root class loader. The hack was [later removed|https://github.com/xerial/snappy-java/commit/06f007a08#diff-a1c8fc77f8] in 1.1.2, which allows the non-inheriting class loader to pick it up.
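
As a rough diagnostic sketch (not from the original comment), one way to see which jar a snappy-java class is actually loaded from is to ask the JVM via PySpark; {{spark._jvm}} is an internal Py4J handle and the session name is assumed.

{code:python}
# Ask the driver JVM where org.xerial.snappy.Snappy was loaded from, to check
# whether Hadoop's snappy-java 1.0.4 shadows a newer copy. Diagnostic sketch only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
snappy_cls = spark._jvm.java.lang.Class.forName("org.xerial.snappy.Snappy")
code_source = snappy_cls.getProtectionDomain().getCodeSource()
print(code_source.getLocation().toString() if code_source is not None else "bootstrap class path")
{code}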

 

I believe this only affects spark-shell, because neither PySpark (the REPL and spark-submit) nor
{code:java}
./bin/spark-submit --class org.apache.spark.examples.sql.SQLDataSourceExample examples/jars/spark-examples_2.11-2.3.0.jar{code}
has this issue. What repro did you have without spark-shell?
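
For reference, a sketch (not part of the original repro) of the equivalent Parquet-with-Snappy write through PySpark, which per the above did not hit the error in this setup:

{code:python}
# PySpark equivalent of the Parquet + Snappy write from the issue description;
# per the comment above, this path did not raise UnsatisfiedLinkError here.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(i,) for i in [1, 2, 3, 4]], ["value"])
(df.write
   .format("parquet")
   .option("compression", "snappy")
   .mode("overwrite")
   .save("test.parquet"))
{code}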

 

I don't believe this is related to Parquet versioning, because the following also throws:
{code:java}
scala> import org.xerial.snappy.Snappy 
import org.xerial.snappy.Snappy 

scala> sc.parallelize(Seq("foo")).map(Snappy.compress).collect 
2018-07-09 13:44:14 ERROR Executor:91 - Exception in task 11.0 in stage 0.0 
(TID 11) 
java.lang.UnsatisfiedLinkError: 
org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
...{code}
In answer to your last question, I did not pass any arguments to spark-shell. All I did to repro was:
{code:java}
export SPARK_DIST_CLASSPATH=$(~/Downloads/hadoop-2.8.3/bin/hadoop classpath)
~/Downloads/spark-2.3.0-bin-without-hadoop/bin/spark-shell{code}
 

 



[jira] [Commented] (SPARK-24018) Spark-without-hadoop package fails to create or read parquet files with snappy compression

2018-07-06 Thread Patrick Clay (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535425#comment-16535425
 ] 

Patrick Clay commented on SPARK-24018:
--

I believe this is limited to spark-shell and was caused by SPARK-18646; reverting that change seems to fix the issue for me.

I don't know whether there is a simple solution that fixes both this and the user-classpath issue that change was addressing.

> Spark-without-hadoop package fails to create or read parquet files with 
> snappy compression
> --
>
> Key: SPARK-24018
> URL: https://issues.apache.org/jira/browse/SPARK-24018
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.0
>Reporter: Jean-Francis Roy
>Priority: Minor
>
> On a brand-new installation of Spark 2.3.0 with a user-provided hadoop-2.8.3, 
> Spark fails to read or write dataframes in parquet format with snappy 
> compression.
> This is due to an incompatibility between the snappy-java version that is 
> required by parquet (parquet is provided in Spark jars but snappy isn't) and 
> the version that is available from hadoop-2.8.3.
>  
> Steps to reproduce:
>  * Download and extract hadoop-2.8.3
>  * Download and extract spark-2.3.0-without-hadoop
>  * export JAVA_HOME, HADOOP_HOME, SPARK_HOME, PATH
>  * Following instructions from 
> [https://spark.apache.org/docs/latest/hadoop-provided.html], set 
> SPARK_DIST_CLASSPATH=$(hadoop classpath) in spark-env.sh
>  * Start a spark-shell, enter the following:
>  
> {code:java}
> import spark.implicits._
> val df = List(1, 2, 3, 4).toDF
> df.write
>   .format("parquet")
>   .option("compression", "snappy")
>   .mode("overwrite")
>   .save("test.parquet")
> {code}
>  
>  
> This fails with the following:
> {noformat}
> java.lang.UnsatisfiedLinkError: 
> org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
> at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
> at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
> at 
> org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
> at 
> org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
> at 
> org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
> at 
> org.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)
> at 
> org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:93)
> at 
> org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:150)
> at 
> org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:238)
> at 
> org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:121)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:167)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:109)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:163)
> at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:405)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:396)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
> at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at 
> org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
>