KnightChess commented on issue #10511: URL: https://github.com/apache/hudi/issues/10511#issuecomment-1905205987
Many factors can cause differences in query time: I/O, CPU, disk load, and so on. On the Spark side there are factors like parallelism and executor startup time. On the Hudi side, a snapshot query should theoretically be slower than a read-optimized query, since they use different readers for different files (RO reads only the base files, while RT reads base files plus log files). There is also another question: is reading a Parquet file with a bloom filter always faster than reading one without? I don't think that is certain; you need to look at the actual effect in your production workload. For a Spark query, a 2s difference is not enough to conclude there is a slowness problem. What do you think? This is just my shallow understanding; others may have better opinions.
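To illustrate why a bloom filter is not a guaranteed win: it can only say "definitely absent" (letting the reader skip data) or "maybe present" (the data must still be read and checked). If most lookup keys actually exist in the file, the filter adds overhead without skipping anything. Below is a minimal toy sketch of this behavior, not Parquet's or Hudi's actual split-block bloom filter implementation; the class name and sizes are made up for illustration.

```python
import hashlib

class ToyBloomFilter:
    """Toy bloom filter: no false negatives, possible false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive k bit positions from the key (illustrative hashing scheme).
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):
        # False  -> key is definitely absent: the reader can skip this file.
        # True   -> key *may* be present: the file must still be read,
        #           so the filter only pays off when lookups often miss.
        return all(self.bits[p] for p in self._positions(key))

bf = ToyBloomFilter()
for key in ("uuid-1", "uuid-2", "uuid-3"):
    bf.add(key)

print(bf.might_contain("uuid-1"))    # True (added keys are never missed)
print(bf.might_contain("uuid-999"))  # usually False, so the read is skipped
```

The takeaway matches the comment above: whether the filter helps depends on how often queries probe for keys that are absent, which is why the real effect has to be measured on production data.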