lewyh commented on issue #7130: URL: https://github.com/apache/hudi/issues/7130#issuecomment-1302738763
Thanks for the quick response. I've tried setting the following when initializing the Spark session:

```python
conf = (
    SparkConf()
    .setAppName(app_name)
    .set("spark.files.overwrite", spark_file_overwrite_str)
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.sql.hive.convertMetastoreParquet", False)
    .set("spark.rpc.message.maxSize", "512")
    .set("spark.kryoserializer.buffer.max", "512")
    .set("spark.sql.legacy.avro.datetimeRebaseModeInWrite", "CORRECTED")
    .set("spark.hadoop.fs.s3a.connection.maximum", "1000")
)
sc = SparkContext(conf=conf)
spark_session = SparkSession.builder.config(conf=conf).getOrCreate()
```

However, I see the same behaviour. When the write is initiated, the Spark error logs show:

- `Creating View Manager with storage type :REMOTE_FIRST`
- Hangs for roughly 3 minutes before returning a 500 server error after sending a request using `RemoteHoodieTableFileSystemView.java:executeRequest`
- Opens several `.rollback` files in the `.hoodie/` dir
- Opens several log files in the `.hoodie/metadata/files/` dir
- Hangs for a few minutes before resulting in the `Timeout waiting for connection from pool` error

I'm not sure if this is relevant, but a number of other tables are written using the same code and behave differently: instead of `:REMOTE_FIRST`, their logs show `Creating View Manager with storage type :MEMORY` before their writes succeed.
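For what it's worth, I believe the storage type in that log line is driven by Hudi's `hoodie.filesystem.view.type` write option (and the embedded timeline server toggle). I haven't verified that pinning it changes the hang above; this is just a sketch of the options I would try, with placeholder values:

```python
# Hypothetical sketch only: force the in-memory file-system view to mirror
# the tables that log ":MEMORY". Option names are taken from Hudi's
# FileSystemViewStorageConfig; "my_table" and the commented write call are
# placeholders, not from my actual job.
hudi_options = {
    "hoodie.table.name": "my_table",          # placeholder table name
    "hoodie.filesystem.view.type": "MEMORY",  # instead of REMOTE_FIRST
    "hoodie.embed.timeline.server": "false",  # avoid the remote timeline server path
}

# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```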