Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-13 Thread Steve Loughran
On Mon, 13 Jun 2022 at 08:52, Pralabh Kumar  wrote:

> Hi Dev team
>
> I have a Spark 3.2 image with Java 11 (running Spark on K8s). While
> reading a huge parquet file via spark.read.parquet(""), I am getting
> the following error. The same error is mentioned in the Spark docs
> (https://spark.apache.org/docs/latest/#downloading), but with respect
> to Apache Arrow.
>
>    - IMHO, the error is coming from Parquet 1.12.1, which is built
>      against Hadoop 2.10, which is not Java 11 compatible.
>
>
Correct. See https://issues.apache.org/jira/browse/HADOOP-12760
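
Background on why this only bites on newer JDKs: on Java 8,
sun.nio.ch.DirectBuffer.cleaner() returns sun.misc.Cleaner, while on Java
9+ it returns jdk.internal.ref.Cleaner, so Hadoop 2.x bytecode compiled
against the old signature fails at link time with exactly this
NoSuchMethodError. The fix removes the compile-time dependency on the
Cleaner type. A rough sketch of that style of fix (illustrative only, not
Hadoop's actual code):

    import java.lang.reflect.Field;
    import java.lang.reflect.Method;
    import java.nio.ByteBuffer;

    public final class DirectBufferCleaner {

        private DirectBufferCleaner() {}

        // Best-effort release of a direct buffer's off-heap memory, with
        // no compile-time reference to any Cleaner class.
        public static void free(ByteBuffer buffer) {
            if (buffer == null || !buffer.isDirect()) {
                return; // heap buffers are reclaimed by the GC as usual
            }
            try {
                // Java 9+: sun.misc.Unsafe.invokeCleaner(ByteBuffer) frees
                // any direct buffer.
                Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
                Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
                theUnsafe.setAccessible(true);
                Object unsafe = theUnsafe.get(null);
                Method invokeCleaner =
                    unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
                invokeCleaner.invoke(unsafe, buffer);
            } catch (ReflectiveOperationException e) {
                try {
                    // Java 8 fallback: DirectBuffer.cleaner() returns
                    // sun.misc.Cleaner; invoke clean() reflectively so this
                    // class still links on newer JVMs.
                    Method cleanerMethod = buffer.getClass().getMethod("cleaner");
                    cleanerMethod.setAccessible(true);
                    Object cleaner = cleanerMethod.invoke(buffer);
                    if (cleaner != null) {
                        cleaner.getClass().getMethod("clean").invoke(cleaner);
                    }
                } catch (ReflectiveOperationException ignored) {
                    // give up; memory is freed when the GC collects the buffer
                }
            }
        }
    }

On Java 9+ the invokeCleaner path is taken; on Java 8 the reflective
fallback does what Hadoop 2.x did via a direct (and now broken) call.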


> Please let me know if this understanding is correct, and is there a way
> to fix it?
>



Upgrade to a version of Hadoop with the fix. That's any release >= Hadoop
3.2.0, which has been shipping since 2018.
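
If you build Spark yourself, you can also choose which Hadoop to bundle at
build time, per Spark's build documentation. A sketch (the kubernetes
profile and the 3.2.3 version number are placeholders for your own setup):

    ./build/mvn -Pkubernetes -Dhadoop.version=3.2.3 -DskipTests clean package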

> java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'
>     at org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:41)
>     at org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:687)
>     at org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:320)
>     at java.base/java.io.FilterInputStream.close(Unknown Source)
>     at org.apache.parquet.hadoop.util.H2SeekableInputStream.close(H2SeekableInputStream.java:50)
>     at org.apache.parquet.hadoop.ParquetFileReader.close(ParquetFileReader.java:1299)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:54)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:44)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:467)
>     at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
>     at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>     at scala.util.Success.$anonfun$map$1(Try.scala:255)
>     at scala.util.Success.map(Try.scala:213)
>     at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>     at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>     at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>     at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
>     at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
>     at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)


Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-13 Thread Pralabh Kumar
Hi Steve,

Thx for the help. We are on Hadoop 3.2; however, we are building Hadoop 3.2
with Java 8.

Do you suggest building Hadoop with Java 11?

Regards
Pralabh Kumar



Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-13 Thread Pralabh Kumar
Steve, thx for your help. Please ignore my last comment.

Regards
Pralabh Kumar



Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-13 Thread Pralabh Kumar
Hi Steve / Dev team,

Thx for the help. A quick question: how can we fix the above error on
Hadoop 3.1?

   - The Spark Dockerfile uses Java 11:
   https://github.com/apache/spark/blob/branch-3.2/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile

   - So if we build Spark 3.2, the Spark image will have Java 11. If we
   run on a Hadoop version less than 3.2, it will throw this exception.

   - Should there be a separate Dockerfile for Spark 3.2 with Java 8 for
   Hadoop versions < 3.2? The Spark 3.0.1 Dockerfile uses Java 8, which
   works fine in our environment (with Hadoop 3.1).


Regards
Pralabh Kumar





Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-14 Thread Martin Grigorov
Hi Pralabh,

The Dockerfile defines an ARG for the JDK version:
https://github.com/apache/spark/blob/861df43e8d022f51727e0a12a7cca5e119e3c4cc/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile#L17
That means you can use --build-arg to override it when building the
image. See
https://docs.docker.com/engine/reference/commandline/build/#set-build-time-variables---build-arg
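
For example, something like this (a sketch: it assumes the build arg is
named java_image_tag, as it is on branch-3.2, and that an 8-jre-slim base
image suits your environment; "myrepo" and the tags are placeholders):

    # Via Spark's helper script, which forwards -b values as --build-arg:
    ./bin/docker-image-tool.sh -r myrepo -t 3.2.0-java8 \
        -b java_image_tag=8-jre-slim build

    # Or with docker directly, from an unpacked Spark distribution:
    docker build --build-arg java_image_tag=8-jre-slim \
        -t myrepo/spark:3.2.0-java8 \
        -f kubernetes/dockerfiles/spark/Dockerfile .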



Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-14 Thread Steve Loughran
Hadoop 3.2.x is the oldest of the Hadoop branch-3 release lines that still
gets active security patches, as was done last month. I would strongly
recommend using it unless there are other compatibility issues (Hive?).



Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-14 Thread Pralabh Kumar
Thx for the reply @Steve Loughran @Martin. It helps. However, just a minor
suggestion:

   - Should we update the documentation at
   https://spark.apache.org/docs/latest/#downloading, which talks about
   java.nio.DirectByteBuffer? We could add another case where the user
   gets the same error for Spark 3.2 on K8s running against a Hadoop
   version < 3.2 (since the default Java version in the Spark 3.2
   Dockerfile is Java 11).

Please let me know if this makes sense to you.

Regards
Pralabh Kumar
