[GitHub] [hudi] so-lazy commented on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

GitBox Tue, 12 Jan 2021 05:31:50 -0800


so-lazy commented on issue #2338:
URL: https://github.com/apache/hudi/issues/2338#issuecomment-758656180



   > @so-lazy :
   
   > Also, I see you have space around "=" sign (set 
spark.sql.hive.convertMetastoreParquet = false;) Try removing it. Please also 
enable INFO logging and run the select group by query and attach them if the 
problem persists.
   
   @bvaradar Sorry for the late reply, I have been very busy these days. 
Today I followed your suggestions and am attaching screenshots.
   
   > val df = 
spark.read.format("hudi").load("hdfs://hadoop01:9000/hudi/cars/carsdata/inf_car_bin/*")
   Running this, I get unique records:
   
![query_on_spark_hudi](https://user-images.githubusercontent.com/56884416/104311556-54793780-5510-11eb-8877-a8e72a6275b9.png)
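   
   For reference, a minimal sketch of how such a duplicate check could be 
written against the same path. It assumes the standard Hudi metadata column 
`_hoodie_record_key` is present in the table; the app name is only a 
placeholder. Note that with a non-global index the same key may legitimately 
appear in more than one partition.
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.col

   // In spark-shell a session already exists; getOrCreate() simply returns it.
   val spark = SparkSession.builder()
     .appName("hudi-duplicate-check") // placeholder name
     .getOrCreate()

   // Same read as in the comment above.
   val df = spark.read.format("hudi")
     .load("hdfs://hadoop01:9000/hudi/cars/carsdata/inf_car_bin/*")

   // Count rows per Hudi record key and keep only keys seen more than once.
   val duplicates = df
     .groupBy("_hoodie_record_key")
     .count()
     .filter(col("count") > 1)

   // An empty result means every record key is unique in this snapshot.
   duplicates.show(20, false)
   ```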
   
   > Running SQL on the Hive RO table:
   
![query_on_hive_rotable](https://user-images.githubusercontent.com/56884416/104311551-53480a80-5510-11eb-82d3-ee92a1fa5229.png)
   
   > Also, Are you passing the config 
(spark.sql.hive.convertMetastoreParquet=false) when you are launching spark ? 
https://hudi.apache.org/docs/querying_data.html#spark-sql.
   
   Yes, after passing this config I can see unique records. Thanks so much. 
Now I am going to try other index types. With global_bloom the processing is 
very slow, so maybe there are some problems in my program. Hudi is good 
software, thanks for your time and effort.
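   
   A minimal sketch of passing the config at session creation, for anyone 
hitting the same duplicates. The table name `inf_car_bin_ro` is hypothetical 
(use whatever name the Hive sync registered), and the query assumes the 
standard Hudi metadata column `_hoodie_record_key`.
   
   ```scala
   import org.apache.spark.sql.SparkSession

   // Disable Spark's built-in Parquet conversion for Hive tables so the
   // table is read through its Hive-registered input format.
   // Note: no spaces around "=" when passing this via --conf or SET.
   val spark = SparkSession.builder()
     .appName("hudi-hive-query") // placeholder name
     .config("spark.sql.hive.convertMetastoreParquet", "false")
     .enableHiveSupport()
     .getOrCreate()

   // Hypothetical table name: list any record keys that occur more than once.
   spark.sql(
     """SELECT _hoodie_record_key, COUNT(*) AS cnt
       |FROM inf_car_bin_ro
       |GROUP BY _hoodie_record_key
       |HAVING COUNT(*) > 1""".stripMargin
   ).show(20, false)
   ```
   
   The same setting can be given on the command line as 
`--conf spark.sql.hive.convertMetastoreParquet=false` when launching 
spark-shell or spark-submit.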
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
