lewyh commented on issue #7130:
URL: https://github.com/apache/hudi/issues/7130#issuecomment-1302738763

   Thanks for the quick response. I've tried setting the following when initializing the Spark session:
   ```
   from pyspark import SparkConf
   from pyspark.sql import SparkSession

   conf = (
       SparkConf()
       .setAppName(app_name)
       .set("spark.files.overwrite", spark_file_overwrite_str)
       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       # Spark config values are strings; passing the Python bool False is a bug
       .set("spark.sql.hive.convertMetastoreParquet", "false")
       .set("spark.rpc.message.maxSize", "512")
       .set("spark.kryoserializer.buffer.max", "512")
       .set("spark.sql.legacy.avro.datetimeRebaseModeInWrite", "CORRECTED")
       .set("spark.hadoop.fs.s3a.connection.maximum", "1000")
   )
   # getOrCreate() builds the SparkContext itself; a separate
   # SparkContext(conf=conf) beforehand is redundant
   spark_session = SparkSession.builder.config(conf=conf).getOrCreate()
   ```
   
   However, I see the same behaviour. When the write is initiated, the Spark error logs show:
   
   - Creating View Manager with storage type :REMOTE_FIRST
   - Hangs for roughly 3 minutes before returning a 500 Server error after 
sending a request using `RemoteHoodieTableFileSystemView.java:executeRequest`
   - Opens several `.rollback` files in the `.hoodie/` dir
   - Opens several log files in `.hoodie/metadata/files/` dir
   - Hangs for a few minutes before resulting in the `Timeout waiting for 
connection from pool` error
   
   I'm not sure if this is relevant, but a number of other tables are written using the same code and behave differently: instead of `:REMOTE_FIRST`, the log shows `Creating View Manager with storage type :MEMORY` before their writes succeed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
