[ https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788906#comment-17788906 ]
Marc Le Bihan edited comment on SPARK-45311 at 11/23/23 6:07 AM:
-----------------------------------------------------------------

I'll give 3.5.0 a try. I believe I was forced to change from _RowEncoder.apply(schema)_ (3.3.x and 3.4.x) to _RowEncoder.encoderFor(schema)_ by the 3.5.0 version, because _RowEncoder.apply_ doesn't exist anymore. I didn't find an {{Encoders.row(...)}} method; a {{new Encoder(schema)}} was possible, but I found it difficult to extract a {{ClassInfo<Row> getCls()}}. I've found and changed to the code below, and it seems to reach the workaround you wrote about in 3.5.x (I'll continue to check later):

{code:java}
ExpressionEncoder<Row> encoder = ExpressionEncoder.apply(schema);

cible = cible.mapPartitions((MapPartitionsFunction<Row, Row>) it -> {
   List<Row> rows = new LinkedList<>();

   while (it.hasNext()) {
      [...]
      rows.add(RowFactory.create(valeurs.toArray()));
   }

   return rows.iterator();
}, encoder);
{code}

----

I've tried the latest 3.4.2-SNAPSHOT version available (I refreshed my fork 19 hours ago) to check the problems related to java.util.NoSuchElementException: None.get and the generic types. It improves the execution greatly.

From 22 (or so) failing tests with 3.4.0 or 3.4.1, the 3.4.2-SNAPSHOT faces only 4 failures, but they look different than before:

{code:java}
-------------------------------------------------------------------------------
Test set: fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT
-------------------------------------------------------------------------------
Tests run: 6, Failures: 1, Errors: 3, Skipped: 0, Time elapsed: 8.709 s <<< FAILURE! - in fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT
catalogueJeuxDeDonneesEtRessources  Time elapsed: 1.472 s  <<< ERROR!
java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and [Ljava.lang.reflect.TypeVariable; are in module java.base of loader 'bootstrap')
	at fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.catalogueJeuxDeDonneesEtRessources(CatalogueDatagouvIT.java:161)

catalogueDatasetsObjetsMetiersPagines  Time elapsed: 1.03 s  <<< ERROR!
java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and [Ljava.lang.reflect.TypeVariable; are in module java.base of loader 'bootstrap')
	at fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.lambda$catalogueDatasetsObjetsMetiersPagines$0(CatalogueDatagouvIT.java:105)
	at fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.catalogueDatasetsObjetsMetiersPagines(CatalogueDatagouvIT.java:105)

catalogueJeuxDeDonnees  Time elapsed: 0.043 s  <<< ERROR!
java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and [Ljava.lang.reflect.TypeVariable; are in module java.base of loader 'bootstrap')
	at fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.catalogueJeuxDeDonnees(CatalogueDatagouvIT.java:143)

catalogueDatasetsObjetsMetiersPaginesStreames  Time elapsed: 0.534 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: Unexpected exception thrown: java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and [Ljava.lang.reflect.TypeVariable; are in module java.base of loader 'bootstrap')
	at fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.catalogueDatasetsObjetsMetiersPaginesStreames(CatalogueDatagouvIT.java:126)
Caused by: java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class [Ljava.lang.reflect.TypeVariable; ([Ljava.lang.Object; and [Ljava.lang.reflect.TypeVariable; are in module java.base of loader 'bootstrap')
	at fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.lambda$catalogueDatasetsObjetsMetiersPaginesStreames$3(CatalogueDatagouvIT.java:127)
	at fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.lambda$catalogueDatasetsObjetsMetiersPaginesStreames$4(CatalogueDatagouvIT.java:127)
	at fr.ecoemploi.adapters.outbound.spark.dataset.datagouv.CatalogueDatagouvIT.catalogueDatasetsObjetsMetiersPaginesStreames(CatalogueDatagouvIT.java:126)
{code}

was (Author: mlebihan):
I'll give 3.5.0 a try. I believe I was forced to change from _RowEncoder.apply(schema)_ (3.3.x and 3.4.x) to _RowEncoder.encoderFor(schema)_ by the 3.5.0 version, because _RowEncoder.apply_ doesn't exist anymore. Does _RowEncoder.apply(schema)_ have the same behavior as _Encoders.row(schema)_, by the way? I'll check a second time and make a few attempts. Thanks a lot.

----

I've tried the latest 3.4.2-SNAPSHOT version available (I refreshed my fork 19 hours ago) to check the problems related to java.util.NoSuchElementException: None.get and the generic types. It improves the execution greatly.
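As an aside on the ClassCastException above: the message {{class [Ljava.lang.Object; cannot be cast to class [Ljava.lang.reflect.TypeVariable;}} is an ordinary JVM array-cast failure, reproducible without Spark. A minimal, purely illustrative sketch (not Spark's actual {{JavaTypeInference}} code) that triggers the same exception:

```java
import java.lang.reflect.TypeVariable;

public class ArrayCastDemo {
    public static void main(String[] args) {
        // An array whose runtime class is [Ljava.lang.Object; — the elements
        // don't matter: an array cast checks the array's own class, not its contents.
        Object[] raw = new Object[] { "RS", "OMI_ID" };
        try {
            // Downcasting the array reference itself fails at runtime with the
            // same "cannot be cast" message the failing tests report.
            TypeVariable<?>[] vars = (TypeVariable<?>[]) raw;
            System.out.println("cast succeeded: " + vars.length);
        } catch (ClassCastException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So somewhere in the 3.4.2-SNAPSHOT code path an {{Object[]}} is being handed to code that expects a {{TypeVariable[]}}; the cast itself, not the data, is what blows up.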
From 22 (or so) failing tests with 3.4.0 or 3.4.1, the 3.4.2-SNAPSHOT faces only 4 failures, but they look different than before (same test output as above).

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search
> for an encoder for a generic type, and since 3.5.x isn't "an expression
> encoder"
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-45311
>                 URL: https://issues.apache.org/jira/browse/SPARK-45311
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0, 3.4.1, 3.5.0
>         Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>            Reporter: Marc Le Bihan
>            Priority: Major
>
> If you find it convenient, you might clone the [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (that does many operations around cities, local authorities and accounting with open data), where I've extracted from my work what's necessary to make a set of 35 tests that run correctly with Spark 3.3.x and show the troubles encountered with 3.4.x and 3.5.x.
>
> It works well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 3.4.x*, where the encoder seems to have deeply changed, the encoder fails with two problems:
>
> *1)* It throws *java.util.NoSuchElementException: None.get* messages everywhere. Asking over the Internet, I found I wasn't alone facing this problem. Reading the question below, you'll see that I've attempted a debug, but my Scala skills are low.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions should forward a parameter as soon as the name of the field being handled is known, and keep it along the whole process, so that wherever the encoder has to throw an exception, it knows which field it is handling and can send a message like:
> _java.util.NoSuchElementException: None.get when encoding [the method or field it was targeting]_
>
> *2)* *Not found an encoder of the type RS to Spark SQL internal representation.* Consider to change the input type to one of supported at (...)
> Or: Not found an encoder of the type *OMI_ID* to Spark SQL internal representation (...)
> where *RS* and *OMI_ID* are generic types. This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, and another one adds itself to the list: "*Only expression encoders are supported for now*" on what was accepted and working before.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
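A closing editorial note on problem *2)* above: when a bean's getter is declared to return a type variable such as RS, plain Java reflection reports a {{TypeVariable}} rather than a concrete {{Class}}, which is exactly the kind of input a reflection-driven encoder cannot map to a Spark SQL type. A small Spark-free sketch ({{GenericBean}} is a hypothetical stand-in, not code from the reported project):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// Hypothetical bean whose getter's declared return type is the type variable RS.
class GenericBean<RS> {
    private RS result;
    public RS getResult() { return result; }
    public void setResult(RS result) { this.result = result; }
}

public class GenericReflectionDemo {
    public static void main(String[] args) throws Exception {
        Method getter = GenericBean.class.getMethod("getResult");
        Type returned = getter.getGenericReturnType();

        // The generic return type is the unresolved variable "RS", not a Class:
        System.out.println(returned instanceof TypeVariable); // prints: true
        System.out.println(returned.getTypeName());           // prints: RS
    }
}
```

Any encoder that walks getters this way, without substituting the actual type argument from the concrete subclass or field declaration, ends up trying to encode the bare name "RS", which matches the error messages quoted in the report.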