rubenssoto commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766821182
@vinothchandar Thank you so much for your answer. When do you plan to release this version? I will try to work around it until then. Is this configuration right?

```json
{
  "conf": {
    "spark.jars.packages": "org.apache.spark:spark-avro_2.12:2.4.4",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.jars": "s3://dl/lib/hudi-spark-bundle_2.12-0.8.0-SNAPSHOT.jar",
    "spark.sql.hive.convertMetastoreParquet": "false",
    "spark.hadoop.hoodie.metadata.enable": "true"
  }
}
```

I ran these two queries:

```python
spark.read.format('hudi').load('s3://ze-data-lake/temp/order_test').count()
```

```
%%sql
select count('*') from raw_courier_api.order_test
```

For the PySpark query, Spark creates a job with 143 tasks; after about 10 seconds of file listing, the count finishes quickly. For the Spark SQL query, however, Spark creates a job with 2000 tasks and it is very slow. Is this a Hudi issue or a Spark issue? Thank you so much!
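As a possible interim workaround (a sketch only, not a confirmed fix): instead of setting the flag session-wide via `spark.hadoop.hoodie.metadata.enable`, the same option can be passed on an individual DataFrame read. This assumes a running `SparkSession` named `spark` with the Hudi bundle on the classpath, and reuses the S3 path from the comment above:

```python
# Sketch: enable the Hudi metadata table for a single read via a
# per-read option rather than a session-wide spark.hadoop.* setting.
# Assumes `spark` is an existing SparkSession with the Hudi bundle loaded.
df = (
    spark.read.format("hudi")
    .option("hoodie.metadata.enable", "true")
    .load("s3://ze-data-lake/temp/order_test")
)
print(df.count())
```

Note that a per-read option only affects DataFrame reads; the Spark SQL query against the Hive-registered table would still rely on the session-level configuration.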