[ https://issues.apache.org/jira/browse/SPARK-40074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anuj Gargava updated SPARK-40074:
---------------------------------
    Affects Version/s: 3.2.2

> Error while creating dataset in Java spark-3.x using Encoders bean with Dense
> Vector. (Issue arises when updating spark from 2.4 to 3.x)
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40074
>                 URL: https://issues.apache.org/jira/browse/SPARK-40074
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API, ML, SQL
>    Affects Versions: 3.1.2, 3.2.2
>         Environment: Scala 2.12
>                      Spark 3.x
>            Reporter: Anuj Gargava
>            Priority: Major
>
> I encountered a compatibility issue while upgrading Spark from 2.4 to 3.x
> (Scala was upgraded from 2.11 to 2.12 at the same time).
> The Java code below worked with Spark 2.4, but after migrating to 3.x it
> fails with the error shown below. I have done my own research but couldn't
> find a solution or any related information.
>
> {code:java|title=Code.java|borderStyle=solid}
> public void test() {
>     final SparkSession spark = SparkSession.builder()
>         .appName("Test")
>         .getOrCreate();
>     DenseClass denseFactor1 = new DenseClass(new DenseVector(new double[]{0.13, 0.24}));
>     DenseClass denseFactor2 = new DenseClass(new DenseVector(new double[]{0.24, 0.32}));
>     final List<DenseClass> inputsNew = Arrays.asList(denseFactor1, denseFactor2);
>     final Dataset<DenseClass> denseVectorDf = spark.createDataset(inputsNew,
>         Encoders.bean(DenseClass.class));
>     denseVectorDf.printSchema();
> }
>
> public static class DenseClass implements Serializable {
>     private org.apache.spark.ml.linalg.DenseVector denseVector;
> }
> {code}
> The error occurs while creating the dataset *denseVectorDf*.
>
> Error:
> {noformat}
> org.apache.spark.sql.AnalysisException: Cannot up cast `denseVector` from struct<> to struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
> The type path of the target object is:
> - field (class: "org.apache.spark.ml.linalg.DenseVector", name: "denseVector")
> You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object
> {noformat}
> Using a _double_ field instead of the DenseVector works fine; the failure
> occurs only when the bean holds a DenseVector and is encoded with
> Encoders.bean.
>
> StackOverflow link for the issue:
> [https://stackoverflow.com/questions/73313660/error-while-creating-dataset-in-java-spark-3-x-using-encoders-bean-with-dense-ve]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)