Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-11-13 Thread via GitHub


danny0405 closed issue #9902: [SUPPORT] HoodieCompaction with schema parse 
NullPointerException
URL: https://github.com/apache/hudi/issues/9902





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-11-13 Thread via GitHub


danny0405 commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1809448828

   Fixed via: https://github.com/apache/hudi/pull/9984





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-11-06 Thread via GitHub


watermelon12138 commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1797709598

   @zyclove @ad1happy2go I tried to reproduce this. I can only explain why candidateCommitFile can be null, but I can't explain why fileSchema is empty.





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-31 Thread via GitHub


ad1happy2go commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1786764223

   @zyclove 

   > In addition, when submitting a task with spark-submit, besides adding configuration in the code or specifying a configuration file, can the configuration be added dynamically when submitting the task?

   If you mean passing configuration with spark-submit, then yes: you can use `--conf` to dynamically pass additional configuration to the Hudi job, as sketched below.

   To triage this `Cannot parse  schema` issue, can you help us reproduce it with a sample script/data?
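   A minimal sketch of that (the application class and jar name here are placeholders, not from this issue; the two `spark.*` keys shown are the usual ones a Hudi Spark job needs):

   ```
   spark-submit \
     --class com.example.MyHudiJob \
     --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
     --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension" \
     my-hudi-job.jar
   ```

   For `hoodie.*` keys in a spark-sql session, `SET hoodie.xxx=...` statements (as in the reproduction script later in this thread) serve the same purpose.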





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-29 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1784029959

   
   The new Hudi table also fails, with the following exception.
   
   ```
   Driver stacktrace:
     at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2610)
     at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2559)
     at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2558)
     at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
     at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2558)
     at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1200)
     at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1200)
     at scala.Option.foreach(Option.scala:407)
     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1200)
     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2798)
     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2740)
     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2729)
     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:978)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2215)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255)
     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2280)
     at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
     at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
     at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
     at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
     at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
     at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
     at org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:177)
     at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:113)
     ... 85 more
   Caused by: org.apache.avro.SchemaParseException: Cannot parse  schema
     at org.apache.avro.Schema.parse(Schema.java:1633)
     at org.apache.avro.Schema$Parser.parse(Schema.java:1430)
     at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
     at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:220)
     at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:226)
     at org.apache.hudi.table.action.commit.HoodieMergeHelper.composeSchemaEvolutionTransformer(HoodieMergeHelper.java:177)
     at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:94)
     at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
     at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
     at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
     at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
     at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
     at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
     at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
     at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
     at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
     at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
     at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
     at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
     at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
     at 
   ```

Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-26 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1780588181

   This issue should be fixed by adding the following. When the commit-file lookup in step 1 fails, `avroSchema` stays empty, so step 3 ends up parsing an empty schema string, which throws `Cannot parse  schema`; falling back to the latest history schema avoids that:

   ```java
   // set latest schema
   if (StringUtils.isNullOrEmpty(avroSchema)) {
     avroSchema = latestHistorySchema;
   }
   ```
   
   ```
   Caused by: org.apache.avro.SchemaParseException: Cannot parse  schema
     at org.apache.avro.Schema.parse(Schema.java:1633)
     at org.apache.avro.Schema$Parser.parse(Schema.java:1430)
     at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
     at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:220)
     at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:226)
     at org.apache.hudi.table.action.commit.HoodieMergeHelper.composeSchemaEvolutionTransformer(HoodieMergeHelper.java:177)
     at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:94)
   ```
   
   ```java
   public static InternalSchema getInternalSchemaByVersionId(long versionId, String tablePath, Configuration hadoopConf, String validCommits) {
     String avroSchema = "";
     Set<String> commitSet = Arrays.stream(validCommits.split(",")).collect(Collectors.toSet());
     List<String> validateCommitList = commitSet.stream().map(HoodieInstant::extractTimestamp).collect(Collectors.toList());

     FileSystem fs = FSUtils.getFs(tablePath, hadoopConf);
     Path hoodieMetaPath = new Path(tablePath, HoodieTableMetaClient.METAFOLDER_NAME);
     // step1: try to read the schema from the commit file that matches versionId
     Path candidateCommitFile = commitSet.stream().filter(fileName -> HoodieInstant.extractTimestamp(fileName).equals(versionId + ""))
         .findFirst().map(f -> new Path(hoodieMetaPath, f)).orElse(null);
     if (candidateCommitFile != null) {
       try {
         byte[] data;
         try (FSDataInputStream is = fs.open(candidateCommitFile)) {
           data = FileIOUtils.readAsByteArray(is);
         } catch (IOException e) {
           throw e;
         }
         HoodieCommitMetadata metadata = HoodieCommitMetadata.fromBytes(data, HoodieCommitMetadata.class);
         String latestInternalSchemaStr = metadata.getMetadata(SerDeHelper.LATEST_SCHEMA);
         avroSchema = metadata.getMetadata(HoodieCommitMetadata.SCHEMA_KEY);
         if (latestInternalSchemaStr != null) {
           return SerDeHelper.fromJson(latestInternalSchemaStr).orElse(null);
         }
       } catch (Exception e1) {
         // swallow this exception.
         LOG.warn(String.format("Cannot find internal schema from commit file %s. Falling back to parsing historical internal schema", candidateCommitFile.toString()));
       }
     }
     // step2: fall back to the history schema file
     FileBasedInternalSchemaStorageManager fileBasedInternalSchemaStorageManager = new FileBasedInternalSchemaStorageManager(hadoopConf, new Path(tablePath));
     String latestHistorySchema = fileBasedInternalSchemaStorageManager.getHistorySchemaStrByGivenValidCommits(validateCommitList);
     if (latestHistorySchema.isEmpty()) {
       return InternalSchema.getEmptyInternalSchema();
     }
     InternalSchema fileSchema = InternalSchemaUtils.searchSchema(versionId, SerDeHelper.parseSchemas(latestHistorySchema));

     // proposed fix: set latest schema so avroSchema is never empty when step3 parses it
     if (StringUtils.isNullOrEmpty(avroSchema)) {
       avroSchema = latestHistorySchema;
     }
     // step3:
     return fileSchema.isEmptySchema() ? AvroInternalSchemaConverter.convert(HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(avroSchema))) : fileSchema;
   }
   ```





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-25 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1780347759

   In addition, when submitting a task with spark-submit, besides adding configuration in the code or specifying a configuration file, can the configuration be added dynamically when submitting the task?
   @ad1happy2go 





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-25 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1780345033

   
[hoodie.avro.schema.external.transformation](https://hudi.apache.org/docs/configurations#hoodieavroschemaexternaltransformation)
   
   Check the Hudi code to see whether this configuration can be set to true:
   
   ```java
   public static final ConfigProperty<String> AVRO_EXTERNAL_SCHEMA_TRANSFORMATION_ENABLE = ConfigProperty
       .key(AVRO_SCHEMA_STRING.key() + ".external.transformation")
       .defaultValue("false")
       .withAlternatives(AVRO_SCHEMA_STRING.key() + ".externalTransformation")
       .markAdvanced()
       .withDocumentation("When enabled, records in older schema are rewritten into newer schema during upsert,delete and background"
           + " compaction,clustering operations.");
   ```
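   Since `AVRO_SCHEMA_STRING.key()` is `hoodie.avro.schema`, the key above resolves to `hoodie.avro.schema.external.transformation`, matching the docs link. As a sketch (untested against this table), enabling it in a spark-sql session would be:

   ```
   SET hoodie.avro.schema.external.transformation=true;
   ```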
   





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-25 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1780338605

   @ad1happy2go 
   In another task, after upgrading to version 0.14, field-incompatibility errors were reported.
   Can the table be restored without rebuilding it?
   For example, through the schema evolution feature (see the sketch after the stacktrace below)?
   
   ```
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroRuntimeException: cannot support rewrite value for schema type: "long" since the old schema type is: "string"
     at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149)
     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:387)
     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:369)
     at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:79)
     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
     ... 28 more
   Caused by: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroRuntimeException: cannot support rewrite value for schema type: "long" since the old schema type is: "string"
     at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:75)
     at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:147)
     ... 32 more
   Caused by: org.apache.avro.AvroRuntimeException: cannot support rewrite value for schema type: "long" since the old schema type is: "string"
     at org.apache.hudi.avro.HoodieAvroUtils.rewritePrimaryTypeWithDiffSchemaType(HoodieAvroUtils.java:1083)
     at org.apache.hudi.avro.HoodieAvroUtils.rewritePrimaryType(HoodieAvroUtils.java:1001)
     at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:946)
     at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
     at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:944)
     at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
     at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchemaInternal(HoodieAvroUtils.java:902)
     at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:873)
     at org.apache.hudi.avro.HoodieAvroUtils.rewriteRecordWithNewSchema(HoodieAvroUtils.java:843)
     at org.apache.hudi.common.model.HoodieAvroIndexedRecord.rewriteRecordWithNewSchema(HoodieAvroIndexedRecord.java:123)
     at org.apache.hudi.table.action.commit.HoodieMergeHelper.lambda$composeSchemaEvolutionTransformer$2(HoodieMergeHelper.java:209)
     at org.apache.hudi.table.action.commit.HoodieMergeHelper.lambda$runMerge$0(HoodieMergeHelper.java:134)
     at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:68)
   ```
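   For reference, when `hoodie.schema.on.read.enable=true`, Hudi's schema evolution can change a column's declared type through Spark SQL, but only in supported directions (for example long to string; string to long is not a supported rewrite, which is what the exception above complains about). A sketch with a hypothetical table and column name:

   ```
   SET hoodie.schema.on.read.enable=true;
   -- hypothetical names; the target type must be a supported evolution of the old type
   ALTER TABLE my_hudi_table ALTER COLUMN device_id TYPE string;
   ```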





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-25 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1779022892

   @ad1happy2go 
   In this case, the specified schema file does not exist. What could be the cause?
   Can I delete ods_smart_device_relation_batch/.hoodie/.schema/ under the data table and have it restored automatically?
   Since the schema could not be obtained, is there any way to manually generate or specify it?
   
   ```
   Caused by: org.apache.hudi.exception.HoodieCompactionException: Could not compact s3://big-data-us//hudi/bi/bi_ods/ods_smart_device_relation_batch
     at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:129)
     at org.apache.hudi.table.HoodieSparkMergeOnReadTable.compact(HoodieSparkMergeOnReadTable.java:155)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.compact(BaseHoodieTableServiceClient.java:297)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.lambda$runAnyPendingCompactions$5(BaseHoodieTableServiceClient.java:250)
     at java.util.ArrayList.forEach(ArrayList.java:1259)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.runAnyPendingCompactions(BaseHoodieTableServiceClient.java:248)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.inlineCompaction(BaseHoodieTableServiceClient.java:187)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.runTableServicesInline(BaseHoodieTableServiceClient.java:534)
     at org.apache.hudi.client.BaseHoodieWriteClient.runTableServicesInline(BaseHoodieWriteClient.java:584)
     at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:252)
     at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:104)
     at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1059)
     at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:441)
     at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
     at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:115)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
     at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
     at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
     at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:112)
     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:108)
     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
     at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
     at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
     at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
     at 
   ```

Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776503543

   Yeah, the table version gets automatically upgraded when you write using the new release; 0.14.0 uses table version 6, so that behaviour is expected. Not sure why it failed, though. I will also create a table using 0.12.3, try the upgrade, and see whether I hit any issues.
   
   Do you use Slack? If yes, you can join the Hudi community Slack and we can sync up there.





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776497904

   Is there a WeChat group or another channel where we can communicate with each other? The community group I joined before felt very inactive, and no one discussed the issues.
   @ad1happy2go 





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776496036

   @ad1happy2go This issue is the same as https://github.com/apache/hudi/issues/9016 .
   The problem was caused by the upgrade to version 0.14: it appeared suddenly after the upgraded job had been running for a few days.
   
   After working on it all morning yesterday, there was really nothing I could do, so I cleaned up the historical data and ran the job again; after that everything was back to normal.
   
   Is it still caused by version-compatibility issues?
   
   The table was on 0.12.3 before. After directly upgrading to the 0.14 bundle package, I found that the version in the table's hoodie.properties file changed from 5 to 6. Does this mean the table was upgraded normally? No manual table-upgrade command was run.
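   For reference, the recorded table version can be checked directly from the table's metadata (the bucket and path below are placeholders):
   
   ```
   # hoodie.properties stores the version Hudi recorded for the table; 0.14.x writes table version 6
   aws s3 cp s3://bucket/table_path/.hoodie/hoodie.properties - | grep hoodie.table.version
   ```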





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1775072273

   @zyclove Thanks for raising this. It looks like compaction throws this exception with those schema configurations; I will try to triage it. Can you help us with some sample data or a sample script that reproduces this issue?
   
   I tried to reproduce it using the code below, and compaction completed fine:
   ```
   SET hoodie.schema.on.read.enable=true;
   SET hoodie.datasource.write.reconcile.schema=true;
   SET hoodie.avro.schema.validate=true;
   SET hoodie.datasource.write.new.columns.nullable=true;
   
   CREATE TABLE hudi_table (
     ts BIGINT,
     uuid STRING,
     rider STRING,
     driver STRING,
     fare DECIMAL(10,4),
     city STRING
   ) USING HUDI
   tblproperties (
     type = 'mor', primaryKey = 'uuid', preCombineField = 'ts',
     hoodie.datasource.write.new.columns.nullable = 'true',
     hoodie.avro.schema.validate = 'true',
     hoodie.schema.on.read.enable = 'true',
     hoodie.datasource.write.reconcile.schema = 'true'
   )
   PARTITIONED BY (city);
   
   -- Tried multiple insert commands with multiple values and confirmed compaction is happening fine.
   INSERT INTO hudi_table VALUES
     (1695159649087, '334e26e9-8355-45cc-97c6-c31daf0df330', 'rider-A', 'driver-K', 11.0001, 'san_francisco'),
     (1695091554788, 'e96c4396-3fad-413a-a942-4cb36106d721', 'rider-C', 'driver-M', 11.0001, 'san_francisco');
   ```
   





[I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


zyclove opened a new issue, #9902:
URL: https://github.com/apache/hudi/issues/9902

   
   **Describe the problem you faced**
   
   Hudi schema parse error during compaction.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. The error appears suddenly while a spark-sql Hudi task is running.
   
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version :3.2.1
   
   * Hive version :3.1.3
   
   * Hadoop version :3.2.2
   
   * Storage (HDFS/S3/GCS..) :s3
   
   * Running on Docker? (yes/no) :
   no
   
   **Additional context**
   
   Adding the following config does not work:
   
   ```
   #hoodie.datasource.write.new.columns.nullable=true
   #hoodie.avro.schema.validate=true
   #hoodie.schema.on.read.enable=true
   #hoodie.datasource.write.reconcile.schema=true
   ``` 
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```
   org.apache.hudi.exception.HoodieCompactionException: Could not compact s3://big-data-us/hudi/tables/bi_dw/dwd_device_dealer_relation_rt
     at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:129)
     at org.apache.hudi.table.HoodieSparkMergeOnReadTable.compact(HoodieSparkMergeOnReadTable.java:155)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.compact(BaseHoodieTableServiceClient.java:297)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.lambda$runAnyPendingCompactions$5(BaseHoodieTableServiceClient.java:250)
     at java.util.ArrayList.forEach(ArrayList.java:1257)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.runAnyPendingCompactions(BaseHoodieTableServiceClient.java:248)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.inlineCompaction(BaseHoodieTableServiceClient.java:187)
     at org.apache.hudi.client.BaseHoodieTableServiceClient.runTableServicesInline(BaseHoodieTableServiceClient.java:534)
     at org.apache.hudi.client.BaseHoodieWriteClient.runTableServicesInline(BaseHoodieWriteClient.java:584)
     at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:252)
     at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:104)
     at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1059)
     at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:441)
     at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
     at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:108)
     at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:61)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
     at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:230)
     at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3751)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
     at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
     at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
     at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
     at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3749)
     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:230)
     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
     at
   ```