Hi Pralabh,

The Dockerfile defines an ARG for the JDK version:
https://github.com/apache/spark/blob/861df43e8d022f51727e0a12a7cca5e119e3c4cc/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile#L17
That means you can use --build-arg to override it when building the
image. See
https://docs.docker.com/engine/reference/commandline/build/#set-build-time-variables---build-arg
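
For example, a minimal sketch (on that branch the ARG is named
java_image_tag; double-check the exact name on the branch you build from):

  # from an unpacked Spark distribution; the helper script forwards
  # docker build args with -b
  ./bin/docker-image-tool.sh -r <repo> -t v3.2.1-java8 \
    -b java_image_tag=8-jre-slim build

Plain "docker build --build-arg java_image_tag=8-jre-slim ..." against the
Dockerfile works the same way.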

On Tue, Jun 14, 2022 at 7:31 AM Pralabh Kumar <pralabhku...@gmail.com>
wrote:

> Hi Steve / Dev team,
>
> Thanks for the help. A quick question: how can we fix the above error on
> Hadoop 3.1?
>
>    - The Spark Dockerfile has Java 11:
>      https://github.com/apache/spark/blob/branch-3.2/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
>
>    - So if we build Spark 3.2, the Spark image will have Java 11. If we
>      run on a Hadoop version older than 3.2, it throws an exception.
>
>
>
>    - Should there be a separate Dockerfile for Spark 3.2 with Java 8 for
>      Hadoop versions < 3.2? Spark 3.0.1 has Java 8 in its Dockerfile, which
>      works fine in our environment (with Hadoop 3.1).
>
>
> Regards
> Pralabh Kumar
>
>
>
> On Mon, Jun 13, 2022 at 3:25 PM Steve Loughran <ste...@cloudera.com>
> wrote:
>
>>
>>
>> On Mon, 13 Jun 2022 at 08:52, Pralabh Kumar <pralabhku...@gmail.com>
>> wrote:
>>
>>> Hi Dev team
>>>
>>> I have a Spark 3.2 image with Java 11 (running Spark on K8s). While
>>> reading a huge Parquet file via spark.read.parquet(""), I get the
>>> following error. The same error is mentioned in the Spark docs at
>>> https://spark.apache.org/docs/latest/#downloading, but with respect to
>>> Apache Arrow.
>>>
>>>
>>>    - IMHO, I think the error is coming from Parquet 1.12.1, which is
>>>      built against Hadoop 2.10, which is not Java 11 compatible.
>>>
>>>
>> Correct, see https://issues.apache.org/jira/browse/HADOOP-12760
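>>
>> (A quick way to double-check which Hadoop version an image actually
>> bundles is to list its jars; the image name here is illustrative:
>>
>>   docker run --rm --entrypoint /bin/sh spark:v3.2.1 \
>>     -c 'ls /opt/spark/jars | grep hadoop'
>>
>> anything below 3.2.0 in those jar names hits HADOOP-12760 on Java 9+.)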
>>
>>
>>> Please let me know if this understanding is correct, and whether there is
>>> a way to fix it.
>>>
>>
>>
>>
>> Upgrade to a version of Hadoop with the fix: any version >= Hadoop 3.2.0,
>> which has been shipping since 2018.
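>>
>> For example, a sketch of building a distribution against the newer
>> Hadoop line (profile names per the Spark 3.x build docs):
>>
>>   # Hadoop 3.2+ plus Kubernetes support
>>   ./dev/make-distribution.sh --name hadoop3.2 --tgz \
>>     -Phadoop-3.2 -Pkubernetes
>>
>> or start from the "Pre-built for Apache Hadoop 3.3 and later" binary on
>> the downloads page and build your image from that.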
>>
>>>
>>>
>>> java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'
>>>     at org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:41)
>>>     at org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:687)
>>>     at org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:320)
>>>     at java.base/java.io.FilterInputStream.close(Unknown Source)
>>>     at org.apache.parquet.hadoop.util.H2SeekableInputStream.close(H2SeekableInputStream.java:50)
>>>     at org.apache.parquet.hadoop.ParquetFileReader.close(ParquetFileReader.java:1299)
>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:54)
>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:44)
>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:467)
>>>     at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
>>>     at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>>>     at scala.util.Success.$anonfun$map$1(Try.scala:255)
>>>     at scala.util.Success.map(Try.scala:213)
>>>     at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>>>     at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>>>     at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>>>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>>>     at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
>>>
>>
