Thanks for the reply @Steve Loughran <ste...@cloudera.com> @Martin. It helps.
However, just a minor suggestion:


   - Should we update the documentation at
   https://spark.apache.org/docs/latest/#downloading, which talks about
   java.nio.DirectByteBuffer? We could add another case where a user gets the
   same error for Spark 3.2 on K8s running against a Hadoop version < 3.2
   (since the default Java version in the Spark 3.2 Dockerfile is Java 11).

Please let me know if this makes sense to you.

Regards
Pralabh Kumar

On Tue, Jun 14, 2022 at 4:21 PM Steve Loughran <ste...@cloudera.com> wrote:

> hadoop 3.2.x is the oldest of the hadoop branch 3 branches which gets
> active security patches, as was done last month. I would strongly recommend
> using it unless there are other compatibility issues (hive?)
>
> On Tue, 14 Jun 2022 at 05:31, Pralabh Kumar <pralabhku...@gmail.com>
> wrote:
>
>> Hi Steve / Dev team
>>
>> Thanks for the help. A quick question: how can we fix the above error on
>> Hadoop 3.1?
>>
>>    - The Spark Dockerfile (branch-3.2) has Java 11:
>>    
>> https://github.com/apache/spark/blob/branch-3.2/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
>>
>>    - So if we build Spark 3.2, the Spark image will have Java 11. If we run
>>    it on a Hadoop version less than 3.2, it will throw an exception.
>>
>>
>>
>>    - Should there be a separate Dockerfile for Spark 3.2 with Java 8 for
>>    Hadoop versions < 3.2? Spark 3.0.1 has Java 8 in its Dockerfile, which
>>    works fine in our environment (with Hadoop 3.1). A possible alternative
>>    is sketched below.
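>>
>>    I have not verified this end to end, but instead of maintaining a
>>    separate Dockerfile, the branch-3.2 Dockerfile exposes its base image
>>    through the java_image_tag build arg, so a Java 8 based image could
>>    probably be built with something like:
>>
>>        ./bin/docker-image-tool.sh -r <repo> -t spark-3.2-java8 \
>>            -b java_image_tag=8-jre-slim build
>>
>>    Here <repo> and the spark-3.2-java8 tag are placeholders to fill in.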
>>
>>
>> Regards
>> Pralabh Kumar
>>
>>
>>
>> On Mon, Jun 13, 2022 at 3:25 PM Steve Loughran <ste...@cloudera.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, 13 Jun 2022 at 08:52, Pralabh Kumar <pralabhku...@gmail.com>
>>> wrote:
>>>
>>>> Hi Dev team
>>>>
>>>> I have a Spark 3.2 image with Java 11 (running Spark on K8s). While
>>>> reading a huge Parquet file via spark.read.parquet(""), I am getting the
>>>> following error. The same error is mentioned in the Spark docs at
>>>> https://spark.apache.org/docs/latest/#downloading, but w.r.t. Apache
>>>> Arrow.
>>>>
>>>>
>>>>    - IMHO, I think the error is coming from Parquet 1.12.1, which is
>>>>    based on Hadoop 2.10, which is not Java 11 compatible. A quick check is
>>>>    sketched below.
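>>>>
>>>>    One way to double-check which Hadoop client jars the image actually
>>>>    carries (assuming the default /opt/spark layout from the Dockerfile) is
>>>>    to list the bundled jars inside the container:
>>>>
>>>>        ls /opt/spark/jars | grep hadoop-
>>>>
>>>>    The hadoop-* jar file names include the Hadoop version the Spark build
>>>>    ships with.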
>>>>
>>>>
>>> correct. see https://issues.apache.org/jira/browse/HADOOP-12760
>>>
>>>
>>>> Please let me know if this understanding is correct, and whether there is
>>>> a way to fix it.
>>>>
>>>
>>>
>>>
>>> upgrade to a version of hadoop with the fix. That's any version >=
>>> hadoop 3.2.0 which shipped since 2018
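>>>
>>> Roughly, and subject to the exact artifact and profile names for the
>>> release in use, that means building the K8s image from a Spark distribution
>>> that bundles Hadoop 3.2+ client jars, for example:
>>>
>>>     # use a prebuilt spark-3.2.x-bin-hadoop3.2 package from the downloads
>>>     # page, or build a distribution with the hadoop-3.2 profile:
>>>     ./dev/make-distribution.sh --name hadoop3 --tgz -Pkubernetes -Phadoop-3.2
>>>
>>> and then rebuild the image with docker-image-tool.sh from that
>>> distribution.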
>>>
>>>>
>>>>
>>>> java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'
>>>>     at org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:41)
>>>>     at org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:687)
>>>>     at org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:320)
>>>>     at java.base/java.io.FilterInputStream.close(Unknown Source)
>>>>     at org.apache.parquet.hadoop.util.H2SeekableInputStream.close(H2SeekableInputStream.java:50)
>>>>     at org.apache.parquet.hadoop.ParquetFileReader.close(ParquetFileReader.java:1299)
>>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:54)
>>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:44)
>>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:467)
>>>>     at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
>>>>     at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>>>>     at scala.util.Success.$anonfun$map$1(Try.scala:255)
>>>>     at scala.util.Success.map(Try.scala:213)
>>>>     at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>>>>     at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>>>>     at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>>>>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>>>>     at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
>>>>     at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
>>>>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
>>>>     at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
>>>>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
>>>>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
>>>>
>>>
