Hi all, I am new to Hudi, so please forgive me if these questions are naive.
I was following the guides at https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html and https://hudi.incubator.apache.org/docs/quick-start-guide/. My goal is to load the original Parquet files (written from Kafka to S3 by a Spark application) into Hudi, delete some rows, and then save the modified Parquet files back to a different path in S3. Other downstream applications consume the original Parquet files for further processing.

My question: *Is there any format difference between the original Parquet files and the Hudi-modified Parquet files?* Are the Hudi-modified Parquet files compatible with the original ones? In other words, will the downstream applications that currently consume the original Parquet files be able to consume the modified files (i.e. the Hudi dataset) without any code change?

In the docs I have seen the phrase "Hudi dataset", which, as I understand it, is simply Parquet files accompanied by Hudi metadata. I have also read the migration guide (https://hudi.incubator.apache.org/docs/migration_guide/). My understanding is that we can migrate from plain Parquet files to a Hudi dataset (Hudi-modified Parquet files). *Can we go the other way, i.e. use (or migrate) a Hudi dataset back to plain Parquet files that other downstream applications can consume?*

Thank you very much!
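P.S. To make the compatibility question concrete: as far as I can tell, Hudi adds record-level metadata columns (`_hoodie_commit_time`, `_hoodie_commit_seqno`, `_hoodie_record_key`, `_hoodie_partition_path`, `_hoodie_file_name`) to the Parquet files it writes, so a downstream reader would see extra columns. Below is a minimal Python sketch of the kind of per-consumer change I would like to avoid. It assumes each row reaches the consumer as a column-name to value dict, and `strip_hudi_meta` is just my own hypothetical helper, not a Hudi API:

```python
# Hudi's record-level metadata columns (assumed present in Hudi-written
# Parquet files; please verify against the Hudi version you use).
HUDI_META_COLUMNS = (
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
)


def strip_hudi_meta(record: dict) -> dict:
    """Return a copy of `record` without Hudi's metadata columns.

    `record` is a hypothetical row from a Hudi-written Parquet file,
    represented as a column-name -> value dict. The input is not modified.
    """
    return {k: v for k, v in record.items() if k not in HUDI_META_COLUMNS}


if __name__ == "__main__":
    row = {
        "_hoodie_commit_time": "20200101000000",
        "_hoodie_record_key": "id:1",
        "id": 1,
        "value": "a",
    }
    print(strip_hudi_meta(row))  # {'id': 1, 'value': 'a'}
```

This only covers the extra columns; I am not sure whether the `.hoodie` metadata folder, or older file versions that Hudi may keep under the same path, would also confuse a plain Parquet reader.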
