[GitHub] [hudi] bvaradar commented on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

GitBox Sat, 09 Jan 2021 20:25:07 -0800


bvaradar commented on issue #2338:
URL: https://github.com/apache/hudi/issues/2338#issuecomment-757125096



   @so-lazy:
   
   When you query through the Spark datasource (the whole table, not just a 
single file), do you see unique records?
   
   val df = spark.read.format("hudi").load("hdfs://hadoop01:9000/hudi/cars/carsdata/inf_car_bin/*")
   ....
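   
   One way to check for duplicates on that DataFrame is to group by Hudi's 
record-key metadata column and keep the keys that occur more than once. This 
is only a sketch: it assumes a spark-shell session with the Hudi bundle on the 
classpath, and it assumes record keys are meant to be unique across the whole 
table (if keys are only unique per partition, group by _hoodie_partition_path 
as well).

```scala
// Sketch for spark-shell, where `spark` is the predefined SparkSession.
import org.apache.spark.sql.functions.col

val df = spark.read.format("hudi")
  .load("hdfs://hadoop01:9000/hudi/cars/carsdata/inf_car_bin/*")

// _hoodie_record_key is a metadata column Hudi writes on every record.
// Assumption: keys should be unique table-wide, so any count > 1 is a duplicate.
val dupes = df.groupBy("_hoodie_record_key")
  .count()
  .filter(col("count") > 1)

dupes.show(20, false)
println(s"record keys with more than one row: ${dupes.count()}")
```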
   
   Also, are you passing the config 
(spark.sql.hive.convertMetastoreParquet=false) when you launch Spark? See 
https://hudi.apache.org/docs/querying_data.html#spark-sql.
   
   Also, I see you have spaces around the "=" sign (set 
spark.sql.hive.convertMetastoreParquet = false;). Try removing them. Please 
also enable INFO logging, run the select group-by query, and attach the logs.
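   
   If the job runs as a standalone application rather than from a shell, the 
same setting and log level can be applied when the session is built. This is a 
minimal sketch, not necessarily how your job is set up: the application name 
is a placeholder, and Hive support is assumed to be available on the 
classpath.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hudi-duplicate-check") // hypothetical name, for illustration only
  // Same key and value as above, passed with no spaces around either.
  .config("spark.sql.hive.convertMetastoreParquet", "false")
  .enableHiveSupport()
  .getOrCreate()

// Raise driver-side logging to INFO so the query logs can be attached.
spark.sparkContext.setLogLevel("INFO")

// Print the value the session actually sees.
println(spark.conf.get("spark.sql.hive.convertMetastoreParquet"))
```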


