[ https://issues.apache.org/jira/browse/HUDI-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313858#comment-17313858 ]
sivabalan narayanan commented on HUDI-1453:
-------------------------------------------

Double to int is not a backwards-compatible schema evolution. If the schema compatibility check is enabled, the write fails (see the Avro compatibility sketch after the stack trace below):

{code:scala}
scala> dfFromData5.write.format("hudi").
     |   options(getQuickstartWriteConfigs).
     |   option(PRECOMBINE_FIELD_OPT_KEY, "preComb").
     |   option(RECORDKEY_FIELD_OPT_KEY, "rowId").
     |   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionId").
     |   option("hoodie.index.type","SIMPLE").
     |   option(TABLE_NAME, tableName).
     |   option("hoodie.avro.schema.validate","true").
     |   mode(Append).
     |   save(basePath)
org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check.
  at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:629)
  at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:152)
  at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:186)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
  ... 72 elided
Caused by: org.apache.hudi.exception.HoodieException: Failed schema compatibility check for writerSchema :{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"rowId","type":["string","null"]},{"name":"partitionId","type":["string","null"]},{"name":"preComb","type":["long","null"]},{"name":"name","type":["string","null"]},{"name":"versionId","type":["string","null"]},{"name":"doubleToInt","type":["int","null"]}]}, table schema :{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"rowId","type":["string","null"]},{"name":"partitionId","type":["string","null"]},{"name":"preComb","type":["long","null"]},{"name":"name","type":["string","null"]},{"name":"versionId","type":["string","null"]},{"name":"doubleToInt","type":["double","null"]}]}, base path :file:/tmp/hudi_trips_cow
  at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:621)
  at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:627)
  ... 97 more
{code}
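For context, a minimal standalone Avro sketch of the promotion rule behind this failure. This is not Hudi's validation code; the record and field names below are illustrative, and the intent is only to show that a double reader schema can consume int data while an int reader schema cannot consume double data:

{code:scala}
import org.apache.avro.{Schema, SchemaBuilder, SchemaCompatibility}

// Existing table field type: double (names are illustrative only)
val tableSchema: Schema = SchemaBuilder.record("rec").fields()
  .requiredDouble("doubleToInt")
  .endRecord()

// Incoming writer field type: int
val writerSchema: Schema = SchemaBuilder.record("rec").fields()
  .requiredInt("doubleToInt")
  .endRecord()

// Reading int data with a double reader schema: int promotes to double.
println(SchemaCompatibility.checkReaderWriterCompatibility(tableSchema, writerSchema).getType)
// -> COMPATIBLE

// Reading double data with an int reader schema: no narrowing promotion exists,
// which is why evolving an existing double column to int is rejected.
println(SchemaCompatibility.checkReaderWriterCompatibility(writerSchema, tableSchema).getType)
// -> INCOMPATIBLE
{code}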
> Throw Exception when input data schema is not equal to the hoodie table schema
> -------------------------------------------------------------------------------
>
>                 Key: HUDI-1453
>                 URL: https://issues.apache.org/jira/browse/HUDI-1453
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Writer Core
>    Affects Versions: 0.9.0
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Major
>              Labels: pull-request-available, sev:high, user-support-issues
>             Fix For: 0.9.0
>
> The hoodie table *h0's* schema is:
> {code:java}
> (id long, price double){code}
> When I write the *dataframe* to *h0* with the following schema:
> {code:java}
> (id long, price int){code}
> an exception is thrown as follows:
> {code:java}
> at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
> at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
> at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
> at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:102)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ... 4 more
> Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.avro.AvroConverters$FieldIntegerConverter
> at org.apache.parquet.io.api.PrimitiveConverter.addDouble(PrimitiveConverter.java:84)
> at org.apache.parquet.column.impl.ColumnReaderImpl$2$2.writeValue(ColumnReaderImpl.java:228)
> at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
> at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
> at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
> ... 11 more
> {code}
> I have enabled *AVRO_SCHEMA_VALIDATE*, and the write *passes the schema validation in HoodieTable#validateUpsertSchema*, so it should be valid to write the "int" data to the "double" field in hoodie.
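A rough workaround sketch, assuming the quickstart-style setup from the comment above (dfFromData5, getQuickstartWriteConfigs, tableName, basePath) and an evolving column named doubleToInt: cast the incoming column to the table's existing type before the upsert so the writer and table schemas agree. The cast target and option set here are illustrative, not prescriptive:

{code:scala}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Cast the incoming int column back to double so it matches the table schema.
// Column name "doubleToInt" comes from the example above; adjust for the real table.
val dfAligned = dfFromData5.withColumn("doubleToInt", col("doubleToInt").cast(DoubleType))

dfAligned.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "preComb").
  option(RECORDKEY_FIELD_OPT_KEY, "rowId").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionId").
  option(TABLE_NAME, tableName).
  mode(Append).
  save(basePath)
{code}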