Steven Aerts created SPARK-53790:
------------------------------------
Summary: arrow encoder is unable to handle Beans with instantiated generics
Key: SPARK-53790
URL: https://issues.apache.org/jira/browse/SPARK-53790
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.1.0
Reporter: Steven Aerts
We found a bug in the Arrow encoder/decoder for Spark Connect: it is unable
to handle beans with instantiated generics, for example:
```
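import scala.beans.BeanProperty
import org.apache.spark.sql.catalyst.JavaTypeInference

// Generic bean: after type erasure, getValue() returns Object.
class JavaBeanWithGenerics[T] {
  @BeanProperty var value: T = _
}
// Wrapper bean that instantiates the generic with String.
class JavaBeanWithGenericsWrapper {
  @BeanProperty var value: JavaBeanWithGenerics[String] = _
}
...
val encoder = JavaTypeInference.encoderFor(classOf[JavaBeanWithGenericsWrapper])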
```
Using the above encoder in Spark Connect results in the following error:
```
java.lang.NoSuchMethodException: no such method: JavaBeanWithGenerics.getValue()String/invokeVirtual
    at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:915)
    at java.base/java.lang.invoke.MemberName$Factory.resolveOrFail(MemberName.java:994)
    at java.base/java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:3750)
    at java.base/java.lang.invoke.MethodHandles$Lookup.findVirtual(MethodHandles.java:2767)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.$anonfun$serializerFor$19(ArrowSerializer.scala:495)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.$anonfun$serializerFor$19$adapted(ArrowSerializer.scala:491)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.$anonfun$structSerializerFor$1(ArrowSerializer.scala:549)
    at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75)
    at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.structSerializerFor(ArrowSerializer.scala:547)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.serializerFor(ArrowSerializer.scala:491)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.$anonfun$structSerializerFor$1(ArrowSerializer.scala:548)
    at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75)
    at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.structSerializerFor(ArrowSerializer.scala:547)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.serializerFor(ArrowSerializer.scala:491)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.serializerFor(ArrowSerializer.scala:237)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer.<init>(ArrowSerializer.scala:62)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$$anon$1.<init>(ArrowSerializer.scala:161)
    at org.apache.spark.sql.connect.client.arrow.ArrowSerializer$.serialize(ArrowSerializer.scala:160)
```
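The trace shows `MethodHandles.Lookup.findVirtual` failing while looking for a getter with the signature `getValue()String`. Because of type erasure, the getter of the generic bean is compiled with return type `Object`, so a lookup that uses the instantiated type cannot match. The sketch below reproduces the same kind of failure without Spark. The object name `ErasureLookupDemo` and its fields are hypothetical, and the sketch only illustrates the lookup mismatch; it is not the code in `ArrowSerializer` and not the proposed patch.

```scala
import java.lang.invoke.{MethodHandles, MethodType}
import scala.beans.BeanProperty
import scala.util.Try

// Same shape as the generic bean in the report.
class JavaBeanWithGenerics[T] {
  @BeanProperty var value: T = _
}

// Hypothetical demo object, not part of Spark.
object ErasureLookupDemo {
  private val lookup = MethodHandles.lookup()
  private val beanClass = classOf[JavaBeanWithGenerics[_]]

  // Lookup using the instantiated type (String) as the return type.
  // The compiled getter returns Object, so this lookup fails.
  val withInstantiatedType = Try(
    lookup.findVirtual(beanClass, "getValue", MethodType.methodType(classOf[String])))

  // The return type the JVM actually sees, obtained through reflection.
  val erasedReturnType: Class[_] = beanClass.getMethod("getValue").getReturnType

  // Lookup using the erased return type succeeds.
  val withErasedType = Try(
    lookup.findVirtual(beanClass, "getValue", MethodType.methodType(erasedReturnType)))

  def main(args: Array[String]): Unit = {
    println(withInstantiatedType) // a Failure wrapping NoSuchMethodException
    println(erasedReturnType)     // the erased type of T
    println(withErasedType.isSuccess)
  }
}
```

In this sketch the first lookup fails and the second succeeds, which is consistent with the getter being resolved against the instantiated type rather than the erased one.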
We have a possible patch for this issue.