cdmikechen edited a comment on issue #2705:
URL: https://github.com/apache/hudi/issues/2705#issuecomment-804636641


   I've found the problem:
   There is a new configuration named 
`hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable`, and it 
is `true` by default. If I use my custom transformer and leave the `target schema` 
null, hudi will not work because of the null schema.
   For testing, I set the `target schema` to the same value as the `source schema`; 
Spark then fails and reports the errors above. If I set 
`hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable` to 
`false`, hudi successfully processes the Kafka messages and writes them to HDFS. 
A minimal sketch of this as DeltaStreamer properties is shown below.
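
   ```properties
   # Sketch of the workaround described above.
   # With the post processor enabled (the default) and the target schema set
   # equal to the source schema, Spark fails with the errors reported above.
   # Disabling it lets the custom transformer run without a target schema:
   hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable=false
   ```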
   
   However, when synchronizing to hive, I encountered the same problem as 
https://github.com/apache/hudi/issues/1751#issuecomment-648460431 (with 
`hoodie.datasource.hive_sync.use_jdbc` set to `false`). When I set 
`hoodie.datasource.hive_sync.use_jdbc` to `true`, hive-sync works. I think 
hudi is still missing some hive3-related packages. A sketch of the hive-sync 
properties that worked for me follows.
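
   ```properties
   # Hive sync settings that worked in this environment; the JDBC URL,
   # database, and table names below are placeholders, not my real setup:
   hoodie.datasource.hive_sync.enable=true
   hoodie.datasource.hive_sync.use_jdbc=true
   hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver2-host:10000
   hoodie.datasource.hive_sync.database=default
   hoodie.datasource.hive_sync.table=my_table
   ```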

