[GitHub] [hudi] bvaradar commented on issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

GitBox Sun, 19 Jul 2020 23:54:03 -0700


bvaradar commented on issue #1830:
URL: https://github.com/apache/hudi/issues/1830#issuecomment-660840191



   We spent time over the weekend setting up a local test bed with Kafka and 
Spark Structured Streaming to reproduce this behavior. Here are the steps I 
followed, with code: https://gist.github.com/bvaradar/d892c6c6a69664463f8601d09c187271 
   
   I ran the setup overnight for many hours with both MOR and COW tables but 
was not able to reproduce the gradual increase in processing time. I did see 
variance in processing time depending on the incoming workload, because of 
index lookup and Parquet writing, but there was no upward trend over time. 
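   One way to separate workload-driven variance from a genuine gradual 
increase is to fit a least-squares trend line over per-batch processing times 
and check how far it rises across the run. Below is a minimal sketch in 
Python, assuming the per-batch durations have already been collected (for 
example from `durationMs["triggerExecution"]` in each entry of a PySpark 
streaming query's `recentProgress`). The function names and the 20% drift 
threshold are hypothetical choices for illustration, not part of Hudi or Spark.

```python
from statistics import mean


def trend_per_batch(durations_ms):
    """Least-squares slope of processing time vs. batch index (ms per batch)."""
    n = len(durations_ms)
    if n < 2:
        raise ValueError("need at least two batches to fit a trend")
    x_bar = (n - 1) / 2              # mean of batch indices 0..n-1
    y_bar = mean(durations_ms)       # mean processing time
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(durations_ms))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def looks_like_gradual_increase(durations_ms, max_relative_drift=0.2):
    """Flag a run whose fitted trend rises by more than max_relative_drift
    (a hypothetical threshold) of the mean batch time from first to last batch.
    Noisy but flat runs stay below the threshold."""
    drift = trend_per_batch(durations_ms) * (len(durations_ms) - 1)
    return drift / mean(durations_ms) > max_relative_drift


# Noisy but flat (variance only) vs. steadily growing batch times, in ms.
flat = [900, 1400, 1100, 800, 1300, 1000, 1200, 900]
rising = [1000, 1100, 1250, 1300, 1450, 1500, 1650, 1700]
print(looks_like_gradual_increase(flat), looks_like_gradual_increase(rising))
```

   A trend fit is used instead of comparing first and last batches so that a 
single slow batch (for example one with a large index lookup) does not look 
like a regression on its own.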
   
   We should try running this in an S3 environment, because we suspect the 
behavior is seen only on S3. If possible, would you be willing to take the 
gist above and run it in your setup?
   

