Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
danny0405 closed issue #9902: [SUPPORT] HoodieCompaction with schema parse NullPointerException URL: https://github.com/apache/hudi/issues/9902 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
danny0405 commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1809448828 Fixed via: https://github.com/apache/hudi/pull/9984
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
watermelon12138 commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1797709598 @zyclove @ad1happy2go I tried to reproduce this. I can only explain how candidateCommitFile can be null; I can't explain why fileSchema is empty.
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
ad1happy2go commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1786764223 @zyclove

> In addition, when submitting a task with spark-submit, in addition to adding configuration in the code or specifying a configuration file, can the configuration be added dynamically when submitting the task?

If you mean passing configuration with spark-submit: yes, you can use `--conf` to pass additional configuration to the Hudi job dynamically. To triage the `Cannot parse schema` issue, can you help us reproduce it with a sample script/data?
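For illustration, a sketch of the `--conf` approach mentioned above (the jar name is a placeholder; note that Spark itself only forwards `spark.*` keys on the command line, so `hoodie.*` options are typically set inside the job, as writer options, or via SQL `SET` statements as shown elsewhere in this thread):

```
spark-submit \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
  your_hudi_job.jar
```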
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1784029959 A newly created Hudi table also fails with the following exception:

```
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2610)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2559)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2558)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2558)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1200)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1200)
  at scala.Option.foreach(Option.scala:407)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1200)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2798)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2740)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2729)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:978)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2215)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2280)
  at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
  at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
  at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
  at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
  at org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:177)
  at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:113)
  ... 85 more
Caused by: org.apache.avro.SchemaParseException: Cannot parse schema
  at org.apache.avro.Schema.parse(Schema.java:1633)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1430)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
  at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:220)
  at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:226)
  at org.apache.hudi.table.action.commit.HoodieMergeHelper.composeSchemaEvolutionTransformer(HoodieMergeHelper.java:177)
  at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:94)
  at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
  at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
  at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
  at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
  at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
  at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
  at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
  at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
  at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
  at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
  at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
  at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
  at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
  ...
```
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1780588181 This issue could be fixed by adding the following fallback:

```java
// set latest schema
if (StringUtils.isNullOrEmpty(avroSchema)) {
  avroSchema = latestHistorySchema;
}
```

```
Caused by: org.apache.avro.SchemaParseException: Cannot parse schema
  at org.apache.avro.Schema.parse(Schema.java:1633)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1430)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
  at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:220)
  at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:226)
  at org.apache.hudi.table.action.commit.HoodieMergeHelper.composeSchemaEvolutionTransformer(HoodieMergeHelper.java:177)
  at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:94)
```

```java
public static InternalSchema getInternalSchemaByVersionId(long versionId, String tablePath, Configuration hadoopConf, String validCommits) {
  String avroSchema = "";
  Set<String> commitSet = Arrays.stream(validCommits.split(",")).collect(Collectors.toSet());
  List<String> validateCommitList = commitSet.stream().map(HoodieInstant::extractTimestamp).collect(Collectors.toList());
  FileSystem fs = FSUtils.getFs(tablePath, hadoopConf);
  Path hoodieMetaPath = new Path(tablePath, HoodieTableMetaClient.METAFOLDER_NAME);
  // step1:
  Path candidateCommitFile = commitSet.stream().filter(fileName -> HoodieInstant.extractTimestamp(fileName).equals(versionId + ""))
      .findFirst().map(f -> new Path(hoodieMetaPath, f)).orElse(null);
  if (candidateCommitFile != null) {
    try {
      byte[] data;
      try (FSDataInputStream is = fs.open(candidateCommitFile)) {
        data = FileIOUtils.readAsByteArray(is);
      } catch (IOException e) {
        throw e;
      }
      HoodieCommitMetadata metadata = HoodieCommitMetadata.fromBytes(data, HoodieCommitMetadata.class);
      String latestInternalSchemaStr = metadata.getMetadata(SerDeHelper.LATEST_SCHEMA);
      avroSchema = metadata.getMetadata(HoodieCommitMetadata.SCHEMA_KEY);
      if (latestInternalSchemaStr != null) {
        return SerDeHelper.fromJson(latestInternalSchemaStr).orElse(null);
      }
    } catch (Exception e1) {
      // swallow this exception.
      LOG.warn(String.format("Cannot find internal schema from commit file %s. Falling back to parsing historical internal schema", candidateCommitFile.toString()));
    }
  }
  // step2:
  FileBasedInternalSchemaStorageManager fileBasedInternalSchemaStorageManager = new FileBasedInternalSchemaStorageManager(hadoopConf, new Path(tablePath));
  String latestHistorySchema = fileBasedInternalSchemaStorageManager.getHistorySchemaStrByGivenValidCommits(validateCommitList);
  if (latestHistorySchema.isEmpty()) {
    return InternalSchema.getEmptyInternalSchema();
  }
  InternalSchema fileSchema = InternalSchemaUtils.searchSchema(versionId, SerDeHelper.parseSchemas(latestHistorySchema));
  // set latest schema  <-- proposed addition
  if (StringUtils.isNullOrEmpty(avroSchema)) {
    avroSchema = latestHistorySchema;
  }
  // step3:
  return fileSchema.isEmptySchema()
      ? AvroInternalSchemaConverter.convert(HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(avroSchema)))
      : fileSchema;
}
```
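As a standalone illustration of the guard proposed above (the class `SchemaFallback` and its methods are hypothetical, written for this thread, and not part of Hudi):

```java
// Hypothetical standalone sketch of the proposed fallback: if the avro schema
// read from the candidate commit file is missing or empty, fall back to the
// latest history schema so Schema.Parser never receives an empty string.
public class SchemaFallback {

    // Mirrors StringUtils.isNullOrEmpty used in the snippet above.
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    /** Returns commitSchema when present, otherwise latestHistorySchema. */
    public static String resolveAvroSchema(String commitSchema, String latestHistorySchema) {
        return isNullOrEmpty(commitSchema) ? latestHistorySchema : commitSchema;
    }
}
```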
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1780347759 Also, when submitting a task with spark-submit, besides adding configuration in the code or specifying a configuration file, can configuration be passed dynamically at submit time? @ad1happy2go
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1780345033 [hoodie.avro.schema.external.transformation](https://hudi.apache.org/docs/configurations#hoodieavroschemaexternaltransformation) From the Hudi code, can this configuration be set to true?

```java
public static final ConfigProperty<String> AVRO_EXTERNAL_SCHEMA_TRANSFORMATION_ENABLE = ConfigProperty
    .key(AVRO_SCHEMA_STRING.key() + ".external.transformation")
    .defaultValue("false")
    .withAlternatives(AVRO_SCHEMA_STRING.key() + ".externalTransformation")
    .markAdvanced()
    .withDocumentation("When enabled, records in older schema are rewritten into newer schema during upsert,delete and background"
        + " compaction,clustering operations.");
```
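For reference, a minimal way to flip this flag in a Spark SQL session (mirroring the `SET` style used elsewhere in this thread; whether it actually helps here is untested):

```
-- Rewrites records in the older schema into the newer schema during
-- upsert/delete and background compaction/clustering (default: false).
SET hoodie.avro.schema.external.transformation=true;
```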
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1780338605 @ad1happy2go In another task, field-incompatibility errors were reported after upgrading to version 0.14. Can it be recovered without rebuilding the data table, for example through the schema evolution feature?

```
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroRuntimeException: cannot support rewrite value for schema type: "long" since the old schema type is: "string"
  at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149)
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:387)
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:369)
  at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:79)
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
  ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroRuntimeException: cannot support rewrite value for schema type: "long" since the old schema type is: "string"
  at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:75)
  at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:147)
  ... 32 more
Caused by: org.apache.avro.AvroRuntimeException: cannot support rewrite value for schema type: "long" since the old schema type is: "string"
  at org.apache.hudi.avro.HoodieAvroUtils.rewritePrimaryTypeWithDiffSchemaType(HoodieAvroUtils.java:1083)
  at org.apache.hudi.avro.HoodieAvroUtils.rewritePrimaryType(HoodieAvroUtils.java:1001)
  at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:946)
  at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
  at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:944)
  at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
  at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:902)
  at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
  at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:843)
  at org.apache.hudi.common.model.HoodieAvroIndexedRecord.rewriteRecordWithNewSchema(HoodieAvroIndexedRecord.java:123)
  at org.apache.hudi.table.action.commit.HoodieMergeHelper.lambda$composeSchemaEvolutionTransformer$2(HoodieMergeHelper.java:209)
  at org.apache.hudi.table.action.commit.HoodieMergeHelper.lambda$runMerge$0(HoodieMergeHelper.java:134)
  at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:68)
```
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1779022892 @ad1happy2go In this case the specified schema file does not exist. What could be the cause? Can I delete `ods_smart_device_relation_batch/.hoodie/.schema/` under the data table and have it restored automatically? The schema has not been obtained; is there any way to manually generate or specify the schema?

```
Caused by: org.apache.hudi.exception.HoodieCompactionException: Could not compact s3://big-data-us//hudi/bi/bi_ods/ods_smart_device_relation_batch
  at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:129)
  at org.apache.hudi.table.HoodieSparkMergeOnReadTable.compact(HoodieSparkMergeOnReadTable.java:155)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.compact(BaseHoodieTableServiceClient.java:297)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.lambda$runAnyPendingCompactions$5(BaseHoodieTableServiceClient.java:250)
  at java.util.ArrayList.forEach(ArrayList.java:1259)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.runAnyPendingCompactions(BaseHoodieTableServiceClient.java:248)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.inlineCompaction(BaseHoodieTableServiceClient.java:187)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.runTableServicesInline(BaseHoodieTableServiceClient.java:534)
  at org.apache.hudi.client.BaseHoodieWriteClient.runTableServicesInline(BaseHoodieWriteClient.java:584)
  at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:252)
  at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:104)
  at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1059)
  at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:441)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:115)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
  at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:112)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:108)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  ...
```
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
ad1happy2go commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776503543 Yeah, the table version gets upgraded automatically when you write using the new release; 0.14.0 uses table version 6, so that behaviour is expected. Not sure why it failed, though. I will also create a table using 0.12.3, try to upgrade it, and see if I get any issues. Do you use Slack? If yes, you can join the Hudi community Slack and we can sync up there.
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776497904 Is there a WeChat group or other channel where we can communicate with each other? The community group I joined before felt very inactive, and no one discussed the issues. @ad1happy2go
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776496036 @ad1happy2go This issue is the same as https://github.com/apache/hudi/issues/9016. It was caused by the upgrade to version 0.14: after the upgrade, the problem appeared suddenly after running for a few days. After working on it all morning yesterday with no luck, I cleaned up the historical data and re-ran the job, and it has been normal since. Is it still caused by a version-compatibility issue? We were on 0.12.3 before; after directly swapping in the 0.14 bundle package, the version in the table's hoodie.properties file changed from 5 to 6. Does that mean the version was upgraded normally? There was no manual table-upgrade operation via commands.
Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
ad1happy2go commented on issue #9902: URL: https://github.com/apache/hudi/issues/9902#issuecomment-1775072273 @zyclove Thanks for raising this. Looks like compaction is throwing this exception with those schema configurations. I will try to triage it. Can you help us with some sample data or a sample script that reproduces the issue? I tried to reproduce it with the code below and compaction ran fine:

```
SET hoodie.schema.on.read.enable=true;
SET hoodie.datasource.write.reconcile.schema=true;
SET hoodie.avro.schema.validate=true;
SET hoodie.datasource.write.new.columns.nullable=true;

CREATE TABLE hudi_table (
  ts BIGINT,
  uuid STRING,
  rider STRING,
  driver STRING,
  fare DECIMAL(10,4),
  city STRING
) USING HUDI
tblproperties (
  type = 'mor',
  primaryKey = 'uuid',
  preCombineField = 'ts',
  hoodie.datasource.write.new.columns.nullable = 'true',
  hoodie.avro.schema.validate = 'true',
  hoodie.schema.on.read.enable = 'true',
  hoodie.datasource.write.reconcile.schema = 'true'
)
PARTITIONED BY (city);

-- Tried multiple insert commands with multiple values and confirmed compaction is happening fine.
INSERT INTO hudi_table VALUES
  (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',11.0001,'san_francisco'),
  (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',11.0001,'san_francisco');
```
[I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]
zyclove opened a new issue, #9902: URL: https://github.com/apache/hudi/issues/9902

**Describe the problem you faced**

Hudi schema error.

**To Reproduce**

Steps to reproduce the behavior:

1. A spark-sql Hudi task fails suddenly during operation.

**Expected behavior**

A clear and concise description of what you expected to happen.

**Environment Description**

* Hudi version : 0.14.0
* Spark version : 3.2.1
* Hive version : 3.1.3
* Hadoop version : 3.2.2
* Storage (HDFS/S3/GCS..) : s3
* Running on Docker? (yes/no) : no

**Additional context**

Adding the following config does not work:

```
#hoodie.datasource.write.new.columns.nullable=true
#hoodie.avro.schema.validate=true
#hoodie.schema.on.read.enable=true
#hoodie.datasource.write.reconcile.schema=true
```

**Stacktrace**

```
org.apache.hudi.exception.HoodieCompactionException: Could not compact s3://big-data-us/hudi/tables/bi_dw/dwd_device_dealer_relation_rt
  at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:129)
  at org.apache.hudi.table.HoodieSparkMergeOnReadTable.compact(HoodieSparkMergeOnReadTable.java:155)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.compact(BaseHoodieTableServiceClient.java:297)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.lambda$runAnyPendingCompactions$5(BaseHoodieTableServiceClient.java:250)
  at java.util.ArrayList.forEach(ArrayList.java:1257)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.runAnyPendingCompactions(BaseHoodieTableServiceClient.java:248)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.inlineCompaction(BaseHoodieTableServiceClient.java:187)
  at org.apache.hudi.client.BaseHoodieTableServiceClient.runTableServicesInline(BaseHoodieTableServiceClient.java:534)
  at org.apache.hudi.client.BaseHoodieWriteClient.runTableServicesInline(BaseHoodieWriteClient.java:584)
  at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:252)
  at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:104)
  at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1059)
  at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:441)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
  at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:108)
  at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:61)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:230)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3751)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
  at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3749)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:230)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
  ...
```