Re: cannot access class sun.nio.ch.DirectBuffer

2022-04-13 Thread Sean Owen
It is not officially supported, yes. Try Spark 3.3 from the branch if you
want to try Java 17.

On Wed, Apr 13, 2022, 9:36 PM Arunachalam Sibisakkaravarthi <
arunacha...@mcruncher.com> wrote:

> Thanks everyone for giving your feedback.
> Jvm option "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" resolved the
> issue "cannot access class sun.nio.ch.DirectBuffer"
> But still Spark throws some other exception
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage
> 0.0 (TID 0) (ldap executor driver): java.io.InvalidObjectException:
> ReflectiveOperationException during deserialization
> at java.base/java.lang.invoke.SerializedLambda.readResolve(Unknown Source)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown
> Source)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
> Source)
>
> Caused by: java.lang.reflect.InvocationTargetException: null
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown
> Source)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
> Source)
> at java.base/java.lang.reflect.Method.invoke(Unknown Source)
> ... 86 common frames omitted
>
> Caused by: java.lang.IllegalArgumentException: too many arguments
> at java.base/java.lang.invoke.LambdaMetafactory.altMetafactory(Unknown
> Source)
> at
> scala.runtime.LambdaDeserializer$.makeCallSite$1(LambdaDeserializer.scala:105)
> at
> scala.runtime.LambdaDeserializer$.deserializeLambda(LambdaDeserializer.scala:114)
> at
> scala.runtime.LambdaDeserialize.deserializeLambda(LambdaDeserialize.java:38)
>
> Maybe we need to change the subject to say "spark-sql_2.12 doesn't work
> with jdk17 " or should I open another discussion?
>
> *Thanks And Regards*
> *Sibi.Arunachalam*
> *mCruncher*
>
>
> On Wed, Apr 13, 2022 at 10:16 PM Sean Owen  wrote:
>
>> Yes I think that's a change that has caused difficulties, but, these
>> internal APIs were always discouraged. Hey, one is even called 'unsafe'.
>> There is an escape hatch, the JVM arg below.
>>
>> On Wed, Apr 13, 2022, 9:09 AM Andrew Melo  wrote:
>>
>>> Gotcha. Seeing as there's a lot of large projects who used the unsafe
>>> API either directly or indirectly (via netty, etc..) it's a bit surprising
>>> that it was so thoroughly closed off without an escape hatch, but I'm sure
>>> there was a lively discussion around it...
>>>
>>> Cheers
>>> Andrew
>>>
>>> On Wed, Apr 13, 2022 at 09:07 Sean Owen  wrote:
>>>
 It is intentionally closed by the JVM going forward, as direct access
 is discouraged. But it's still necessary for Spark. In some cases, like
 direct mem access, there is a new API but it's in Java 17 I think, and we
 can't assume Java 17 any time soon.

 On Wed, Apr 13, 2022 at 9:05 AM Andrew Melo 
 wrote:

> Hi Sean,
>
> Out of curiosity, will Java 11+ always require special flags to access
> the unsafe direct memory interfaces, or is this something that will either
> be addressed by the spec (by making an "approved" interface) or by
> libraries (with some other workaround)?
>
> Thanks
> Andrew
>
> On Tue, Apr 12, 2022 at 08:45 Sean Owen  wrote:
>
>> In Java 11+, you will need to tell the JVM to allow access to
>> internal packages in some cases, for any JVM application. You will need
>> flags like "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", which you
>> can see in the pom.xml file for the project.
>>
>> Spark 3.2 does not necessarily work with Java 17 (3.3 should have
>> support), but it may well work after you address those flags.
>>
>> On Tue, Apr 12, 2022 at 7:05 AM Arunachalam Sibisakkaravarthi <
>> arunacha...@mcruncher.com> wrote:
>>
>>> Hi guys,
>>>
>>> spark-sql_2.12:3.2.1 is used in our application.
>>>
>>> It throws following exceptions when the app runs using JRE17
>>>
>>> java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x451f1bd4) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x451f1bd4
>>>   at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
>>>   at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
>>>   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
>>>   at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
>>>   at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
>>>   ...

Re: cannot access class sun.nio.ch.DirectBuffer

2022-04-13 Thread Arunachalam Sibisakkaravarthi
Thanks everyone for giving your feedback.
The JVM option "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" resolved the
"cannot access class sun.nio.ch.DirectBuffer" issue, but Spark now throws
another exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage
0.0 (TID 0) (ldap executor driver): java.io.InvalidObjectException:
ReflectiveOperationException during deserialization
at java.base/java.lang.invoke.SerializedLambda.readResolve(Unknown Source)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown
Source)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
Source)

Caused by: java.lang.reflect.InvocationTargetException: null
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown
Source)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
... 86 common frames omitted

Caused by: java.lang.IllegalArgumentException: too many arguments
at java.base/java.lang.invoke.LambdaMetafactory.altMetafactory(Unknown
Source)
at
scala.runtime.LambdaDeserializer$.makeCallSite$1(LambdaDeserializer.scala:105)
at
scala.runtime.LambdaDeserializer$.deserializeLambda(LambdaDeserializer.scala:114)
at
scala.runtime.LambdaDeserialize.deserializeLambda(LambdaDeserialize.java:38)

Maybe we need to change the subject to say "spark-sql_2.12 doesn't work
with JDK 17", or should I open another discussion?
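
For anyone else hitting this on Java 17: the --add-opens flag has to be on the
JVM that actually launches the Spark driver (for an embedded application, the
application's own JVM), and on the executors via spark.executor.extraJavaOptions
when running on a cluster. A minimal sketch, assuming an sbt-built application
(a hypothetical build file, not part of this thread); only the sun.nio.ch flag
discussed above is shown, and the full list of packages Spark opens for newer
JDKs is in the project's pom.xml:

  // build.sbt (sketch): fork the application JVM and open the needed module
  fork := true
  javaOptions ++= Seq(
    "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
  )

With spark-submit, the same string can be passed through --driver-java-options
and spark.executor.extraJavaOptions instead.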

*Thanks And Regards*
*Sibi.Arunachalam*
*mCruncher*


On Wed, Apr 13, 2022 at 10:16 PM Sean Owen  wrote:

> Yes I think that's a change that has caused difficulties, but, these
> internal APIs were always discouraged. Hey, one is even called 'unsafe'.
> There is an escape hatch, the JVM arg below.
>
> On Wed, Apr 13, 2022, 9:09 AM Andrew Melo  wrote:
>
>> Gotcha. Seeing as there's a lot of large projects who used the unsafe API
>> either directly or indirectly (via netty, etc..) it's a bit surprising that
>> it was so thoroughly closed off without an escape hatch, but I'm sure there
>> was a lively discussion around it...
>>
>> Cheers
>> Andrew
>>
>> On Wed, Apr 13, 2022 at 09:07 Sean Owen  wrote:
>>
>>> It is intentionally closed by the JVM going forward, as direct access is
>>> discouraged. But it's still necessary for Spark. In some cases, like direct
>>> mem access, there is a new API but it's in Java 17 I think, and we can't
>>> assume Java 17 any time soon.
>>>
>>> On Wed, Apr 13, 2022 at 9:05 AM Andrew Melo 
>>> wrote:
>>>
 Hi Sean,

 Out of curiosity, will Java 11+ always require special flags to access
 the unsafe direct memory interfaces, or is this something that will either
 be addressed by the spec (by making an "approved" interface) or by
 libraries (with some other workaround)?

 Thanks
 Andrew

 On Tue, Apr 12, 2022 at 08:45 Sean Owen  wrote:

> In Java 11+, you will need to tell the JVM to allow access to internal
> packages in some cases, for any JVM application. You will need flags like
> "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", which you can see in
> the pom.xml file for the project.
>
> Spark 3.2 does not necessarily work with Java 17 (3.3 should have
> support), but it may well work after you address those flags.
>
> On Tue, Apr 12, 2022 at 7:05 AM Arunachalam Sibisakkaravarthi <
> arunacha...@mcruncher.com> wrote:
>
>> Hi guys,
>>
>> spark-sql_2.12:3.2.1 is used in our application.
>>
>> It throws following exceptions when the app runs using JRE17
>>
>> java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x451f1bd4) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x451f1bd4
>>   at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
>>   at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
>>   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
>>   at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
>>   at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
>>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
>>   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
>>   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
>>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
>>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
>>   ...

Problems with DataFrameReader in Structured Streaming

2022-04-13 Thread Artemis User
We have a single file directory that's being used by both the file
generator/publisher and the Spark job consumer. When using micro-batch
files in Structured Streaming, we encountered the following problems:


1. We would like to have a Spark streaming job consume only data files
   after a predefined date/time. You can specify the "modifiedAfter"
   option in DataFrameReader, but that option isn't available when
   reading files in Structured Streaming. Is there any specific reason
   why this option isn't applicable to Structured Streaming, since we
   are using the same reader API? Is there any way to work around this
   problem?
2. One common problem with Structured Streaming is that if a single
   file directory is used by both the file producer and the Spark
   consumer, the Spark consumer will consume a file immediately, even
   before the file generation is completed (i.e. before the EOF marker
   is produced by the file producer), and won't re-read the file again
   after it's completed. So you end up with incomplete data content in
   your data frame. We solved this problem by creating a separate
   consumer directory along with a customized script that moves files
   one at a time from the producer directory to the consumer directory
   after each file generation is completed (see the sketch after this
   list). But in a real production environment, this type of
   customization may not be possible, as the operations folks usually
   don't like changing any system configuration. Are there any options
   in DataFrameReader, or any other easy way, to solve this problem?
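
A minimal sketch of that move-based workaround from point 2, assuming the
producer is (or can call into) JVM code and that the staging and consumer
directories are on the same filesystem; the directory names are placeholders.
The file only becomes visible to the streaming file source once it is complete:

  import java.nio.file.{Files, Paths, StandardCopyOption}

  val stagingDir  = Paths.get("/data/staging")   // producer writes here first
  val consumerDir = Paths.get("/data/incoming")  // Spark's file source watches this

  def publish(fileName: String, contents: Array[Byte]): Unit = {
    val tmp = stagingDir.resolve(fileName)
    Files.write(tmp, contents)                   // invisible to Spark while being written
    Files.move(tmp, consumerDir.resolve(fileName),
      StandardCopyOption.ATOMIC_MOVE)            // appears only when fully written
  }

Where an atomic rename isn't available (e.g. object stores), the usual variant
is to write under a temporary name or extension that the source's path glob
doesn't match and rename when the file is finished.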

Thanks for your help in advance!


Re: cannot access class sun.nio.ch.DirectBuffer

2022-04-13 Thread Sean Owen
Yes I think that's a change that has caused difficulties, but, these
internal APIs were always discouraged. Hey, one is even called 'unsafe'.
There is an escape hatch, the JVM arg below.

On Wed, Apr 13, 2022, 9:09 AM Andrew Melo  wrote:

> Gotcha. Seeing as there's a lot of large projects who used the unsafe API
> either directly or indirectly (via netty, etc..) it's a bit surprising that
> it was so thoroughly closed off without an escape hatch, but I'm sure there
> was a lively discussion around it...
>
> Cheers
> Andrew
>
> On Wed, Apr 13, 2022 at 09:07 Sean Owen  wrote:
>
>> It is intentionally closed by the JVM going forward, as direct access is
>> discouraged. But it's still necessary for Spark. In some cases, like direct
>> mem access, there is a new API but it's in Java 17 I think, and we can't
>> assume Java 17 any time soon.
>>
>> On Wed, Apr 13, 2022 at 9:05 AM Andrew Melo 
>> wrote:
>>
>>> Hi Sean,
>>>
>>> Out of curiosity, will Java 11+ always require special flags to access
>>> the unsafe direct memory interfaces, or is this something that will either
>>> be addressed by the spec (by making an "approved" interface) or by
>>> libraries (with some other workaround)?
>>>
>>> Thanks
>>> Andrew
>>>
>>> On Tue, Apr 12, 2022 at 08:45 Sean Owen  wrote:
>>>
 In Java 11+, you will need to tell the JVM to allow access to internal
 packages in some cases, for any JVM application. You will need flags like
 "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", which you can see in
 the pom.xml file for the project.

 Spark 3.2 does not necessarily work with Java 17 (3.3 should have
 support), but it may well work after you address those flags.

 On Tue, Apr 12, 2022 at 7:05 AM Arunachalam Sibisakkaravarthi <
 arunacha...@mcruncher.com> wrote:

> Hi guys,
>
> spark-sql_2.12:3.2.1 is used in our application.
>
> It throws following exceptions when the app runs using JRE17
>
> java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x451f1bd4) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x451f1bd4
>   at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
>   at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
>   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
>   at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
>   at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
>   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
>   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
>   at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
>
> How do we fix this?
>
>
>
>
> *Thanks And Regards*
> *Sibi.Arunachalam*
> *mCruncher*
>
 --
>>> It's dark in this basement.
>>>
>> --
> It's dark in this basement.
>


Re: Grabbing the current MemoryManager in a plugin

2022-04-13 Thread Andrew Melo
Hello,

Any wisdom on the question below?

Thanks
Andrew

On Fri, Apr 8, 2022 at 16:04 Andrew Melo  wrote:

> Hello,
>
> I've implemented support for my DSv2 plugin to back its storage with
> ArrowColumnVectors, which necessarily means using off-heap memory. Is
> it possible to somehow grab a reference to the current
> MemoryManager so that the off-heap memory usage is properly accounted
> for and we don't inadvertently OOM the system?
>
> Thanks
> Andrew
>
-- 
It's dark in this basement.


Re: cannot access class sun.nio.ch.DirectBuffer

2022-04-13 Thread Andrew Melo
Gotcha. Seeing as there are a lot of large projects that used the unsafe API
either directly or indirectly (via netty, etc.), it's a bit surprising that it
was so thoroughly closed off without an escape hatch, but I'm sure there was a
lively discussion around it...

Cheers
Andrew

On Wed, Apr 13, 2022 at 09:07 Sean Owen  wrote:

> It is intentionally closed by the JVM going forward, as direct access is
> discouraged. But it's still necessary for Spark. In some cases, like direct
> mem access, there is a new API but it's in Java 17 I think, and we can't
> assume Java 17 any time soon.
>
> On Wed, Apr 13, 2022 at 9:05 AM Andrew Melo  wrote:
>
>> Hi Sean,
>>
>> Out of curiosity, will Java 11+ always require special flags to access
>> the unsafe direct memory interfaces, or is this something that will either
>> be addressed by the spec (by making an "approved" interface) or by
>> libraries (with some other workaround)?
>>
>> Thanks
>> Andrew
>>
>> On Tue, Apr 12, 2022 at 08:45 Sean Owen  wrote:
>>
>>> In Java 11+, you will need to tell the JVM to allow access to internal
>>> packages in some cases, for any JVM application. You will need flags like
>>> "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", which you can see in
>>> the pom.xml file for the project.
>>>
>>> Spark 3.2 does not necessarily work with Java 17 (3.3 should have
>>> support), but it may well work after you address those flags.
>>>
>>> On Tue, Apr 12, 2022 at 7:05 AM Arunachalam Sibisakkaravarthi <
>>> arunacha...@mcruncher.com> wrote:
>>>
 Hi guys,

 spark-sql_2.12:3.2.1 is used in our application.

 It throws following exceptions when the app runs using JRE17

 java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x451f1bd4) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x451f1bd4
   at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
   at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
   at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
   at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
   at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
   at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
   at scala.Option.getOrElse(Option.scala:189)
   at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)

 How do we fix this?




 *Thanks And Regards*
 *Sibi.Arunachalam*
 *mCruncher*

>>> --
>> It's dark in this basement.
>>
> --
It's dark in this basement.


Re: cannot access class sun.nio.ch.DirectBuffer

2022-04-13 Thread Sean Owen
It is intentionally closed by the JVM going forward, as direct access is
discouraged. But it's still necessary for Spark. In some cases, like direct
mem access, there is a new API but it's in Java 17 I think, and we can't
assume Java 17 any time soon.

On Wed, Apr 13, 2022 at 9:05 AM Andrew Melo  wrote:

> Hi Sean,
>
> Out of curiosity, will Java 11+ always require special flags to access the
> unsafe direct memory interfaces, or is this something that will either be
> addressed by the spec (by making an "approved" interface) or by libraries
> (with some other workaround)?
>
> Thanks
> Andrew
>
> On Tue, Apr 12, 2022 at 08:45 Sean Owen  wrote:
>
>> In Java 11+, you will need to tell the JVM to allow access to internal
>> packages in some cases, for any JVM application. You will need flags like
>> "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", which you can see in the
>> pom.xml file for the project.
>>
>> Spark 3.2 does not necessarily work with Java 17 (3.3 should have
>> support), but it may well work after you address those flags.
>>
>> On Tue, Apr 12, 2022 at 7:05 AM Arunachalam Sibisakkaravarthi <
>> arunacha...@mcruncher.com> wrote:
>>
>>> Hi guys,
>>>
>>> spark-sql_2.12:3.2.1 is used in our application.
>>>
>>> It throws following exceptions when the app runs using JRE17
>>>
>>> java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x451f1bd4) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x451f1bd4
>>>   at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
>>>   at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
>>>   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
>>>   at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
>>>   at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
>>>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
>>>   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
>>>   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
>>>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
>>>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
>>>   at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
>>>   at scala.Option.getOrElse(Option.scala:189)
>>>   at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
>>>
>>> How do we fix this?
>>>
>>>
>>>
>>>
>>> *Thanks And Regards*
>>> *Sibi.Arunachalam*
>>> *mCruncher*
>>>
>> --
> It's dark in this basement.
>


Re: cannot access class sun.nio.ch.DirectBuffer

2022-04-13 Thread Andrew Melo
Hi Sean,

Out of curiosity, will Java 11+ always require special flags to access the
unsafe direct memory interfaces, or is this something that will either be
addressed by the spec (by making an "approved" interface) or by libraries
(with some other workaround)?

Thanks
Andrew

On Tue, Apr 12, 2022 at 08:45 Sean Owen  wrote:

> In Java 11+, you will need to tell the JVM to allow access to internal
> packages in some cases, for any JVM application. You will need flags like
> "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", which you can see in the
> pom.xml file for the project.
>
> Spark 3.2 does not necessarily work with Java 17 (3.3 should have
> support), but it may well work after you address those flags.
>
> On Tue, Apr 12, 2022 at 7:05 AM Arunachalam Sibisakkaravarthi <
> arunacha...@mcruncher.com> wrote:
>
>> Hi guys,
>>
>> spark-sql_2.12:3.2.1 is used in our application.
>>
>> It throws following exceptions when the app runs using JRE17
>>
>> java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x451f1bd4) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x451f1bd4
>>   at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
>>   at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
>>   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
>>   at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
>>   at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
>>   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
>>   at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
>>   at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
>>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
>>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
>>   at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
>>   at scala.Option.getOrElse(Option.scala:189)
>>   at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
>>
>> How do we fix this?
>>
>>
>>
>>
>> *Thanks And Regards*
>> *Sibi.Arunachalam*
>> *mCruncher*
>>
> --
It's dark in this basement.


[Spark Streaming]: Why planInputPartitions is called multiple times for each micro-batch in Spark 3?

2022-04-13 Thread Hussain, Saghir
Hi All

While upgrading our custom streaming data source from Spark 2.4.5 to Spark
3.2.1, we observed that the planInputPartitions() method in MicroBatchStream is
being called multiple times (4 in our case) for each micro-batch in Spark 3.

The Apache Spark documentation also says:
"The method planInputPartitions will be called multiple times, to launch one
Spark job for each micro-batch in this data stream."

What is the reason for this?
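
Not an explanation of why Spark plans more than once, but a sketch of how a
custom source can tolerate it, assuming the expensive part is the planning
itself. Everything except the MicroBatchStream, Offset and InputPartition
interfaces is a made-up placeholder, and the cache only helps if the source's
Offset type implements equals():

  import org.apache.spark.sql.connector.read.InputPartition
  import org.apache.spark.sql.connector.read.streaming.{MicroBatchStream, Offset}

  abstract class CachedPlanningStream extends MicroBatchStream {
    // The real planning work for one offset range, provided by the concrete source.
    protected def doPlan(start: Offset, end: Offset): Array[InputPartition]

    // Remember the last planned range so repeated calls within a micro-batch are cheap.
    @volatile private var cached: Option[((Offset, Offset), Array[InputPartition])] = None

    override def planInputPartitions(start: Offset, end: Offset): Array[InputPartition] = {
      val key = (start, end)
      cached match {
        case Some((cachedKey, parts)) if cachedKey == key => parts
        case _ =>
          val parts = doPlan(start, end)
          cached = Some((key, parts))
          parts
      }
    }
  }

Keeping planInputPartitions free of side effects (no committing offsets, no
consuming data) is the other half, since Spark may call it again before the
batch actually runs.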

Thanks & Regards,
Saghir Hussain


Streaming partition-by data locality for state lookupon executor

2022-04-13 Thread Sandip Khanzode
Hello,

If I have a Kinesis stream split into multiple shards (say 10), can I have,
say, 3 executors subscribed to those shards? I assume that automatic
re-balancing will happen as we add/remove executors for scale up/down, or
simply on failures …

If so, can I specify a partition key? If I specify a partition key on the
Kinesis producer, it will always send (Key=A) to, say, Shard 4 and (Key=B) to
Shard 8, and I assume this will stay consistent as long as the executors are up
and no rebalancing occurs.

How can I map the payloads in the first Spark stage/task that receives the
payload from Kinesis? What I ultimately want to achieve is that the
flatMapGroupsWithState() call later in the pipeline uses the same (partition)
key internally for key lookups in the (RocksDB) state, so that data locality
can be achieved.

Is this redundant, implicit, not possible, or am I missing something? Your
response would be greatly appreciated. If there is some documentation or
examples around this, that would be good too!
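
For what it's worth, a minimal sketch of how the state keying works in the
stateful operator (Event, DeviceState and Summary are made-up placeholder
types): whatever is passed to groupByKey becomes the state-store key, and each
micro-batch is shuffled by that key before flatMapGroupsWithState runs, so the
state lookup happens on the task that owns that key's state partition,
regardless of which Kinesis shard the record arrived on.

  import org.apache.spark.sql.{Dataset, SparkSession}
  import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

  case class Event(deviceId: String, value: Double)
  case class DeviceState(count: Long)
  case class Summary(deviceId: String, count: Long)

  // Counts events per deviceId; the state is partitioned by deviceId, not by shard.
  def summarize(events: Dataset[Event], spark: SparkSession): Dataset[Summary] = {
    import spark.implicits._
    events
      .groupByKey(_.deviceId) // this key decides which state partition is consulted
      .flatMapGroupsWithState(OutputMode.Update(), GroupStateTimeout.NoTimeout()) {
        (deviceId: String, batch: Iterator[Event], state: GroupState[DeviceState]) =>
          val prev = state.getOption.getOrElse(DeviceState(0L))
          val next = DeviceState(prev.count + batch.size)
          state.update(next)
          Iterator(Summary(deviceId, next.count))
      }
  }

So a stable Kinesis partition key mainly helps on the producer/shard side; state
locality inside flatMapGroupsWithState is governed by the shuffle on the
groupByKey key rather than by the shard-to-executor assignment.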

Thanks,
Sandip Khanzode