bvaradar commented on issue #2338: URL: https://github.com/apache/hudi/issues/2338#issuecomment-757125096
@so-lazy: When you query through the Spark datasource (the whole table, not just a single file), do you see unique records?

val df = spark.read.format("hudi").load("hdfs://hadoop01:9000/hudi/cars/carsdata/inf_car_bin/*") ....

Also, are you passing the config spark.sql.hive.convertMetastoreParquet=false when you launch Spark? See https://hudi.apache.org/docs/querying_data.html#spark-sql. I also see a space around the "=" sign (set spark.sql.hive.convertMetastoreParquet = false;). Try removing it.

Please also enable INFO logging, run the select group by query, and attach the logs.
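
For reference, the datasource check asked for above could be sketched roughly as follows. This is only an illustration, not the exact steps from the issue: the object name and app name are hypothetical, the Kryo serializer setting is the one Hudi's Spark guides generally recommend, and grouping on `_hoodie_record_key` assumes the table carries Hudi's standard metadata columns. The load path is the one quoted in the comment. Note that spark.sql.hive.convertMetastoreParquet matters for Hive-registered tables queried through Spark SQL; it is set here only to show one way of passing it at launch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count}

// Hypothetical helper for checking whether the datasource read returns duplicate keys.
object HudiDuplicateCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-duplicate-check") // hypothetical app name
      // Assumption: Kryo serializer, as commonly recommended for Hudi on Spark.
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Passed at launch, with no spaces around the key or the value.
      .config("spark.sql.hive.convertMetastoreParquet", "false")
      .getOrCreate()

    // Read the whole table through the Hudi datasource (path taken from the comment).
    val df = spark.read
      .format("hudi")
      .load("hdfs://hadoop01:9000/hudi/cars/carsdata/inf_car_bin/*")

    // Count rows per Hudi record key; any key with a count above 1 is a duplicate.
    val duplicates = df
      .groupBy(col("_hoodie_record_key"))
      .agg(count("*").as("cnt"))
      .filter(col("cnt") > 1)

    duplicates.show(20, truncate = false)

    spark.stop()
  }
}
```

If this prints no rows, the datasource path is returning one record per key, which would point toward the Hive/Spark SQL read path as the source of the duplicates rather than the data itself.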