[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-543620437

> > @PingHao - Both issues (the stuck executor and the core dump) might be due to the same cause. I will debug and fix it.
> > Do you have some code snippets that can help me reproduce the problem?
>
> My running code is difficult to isolate or share here, so here is a new test case (based on the existing test case "maintenance") in your RocksDbStateStoreSuite.scala. It tries to simulate parallel Spark tasks operating on each partition's state store while the maintenance thread tries to destoryDB. See the code here:
>
> https://gist.github.com/PingHao/c20846542adda742f27ff00459fafe29#file-rocksdbstatestoresuite-scala-L384
>
> I can produce a core dump on my developer machine, but I am not sure whether the logic is legitimate anyway.
> You can change N (the number of partitions) and LOOPS. I recommend N = number of CPU cores.

@PingHao - I will fix the issue and post a message in your repo (https://github.com/PingHao/spark-statestore-rocksdb).

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
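The scenario in the quoted test (parallel tasks operating on a partition's state store while a maintenance thread destroys it) is a classic use-after-close race. One common way to guard against it is a read-write lock: tasks share the read lock, and the destroy path takes the write lock. The sketch below is only an illustration in plain Java, not the PR's code. `GuardedStore` and `GuardedStoreDemo` are hypothetical names, and an in-memory map stands in for the native RocksDB handle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper: a concurrent map stands in for a native store handle. */
class GuardedStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private boolean destroyed = false; // guarded by lock

    /** Task-side write under the shared lock; returns false once the store is destroyed. */
    boolean put(String key, String value) {
        lock.readLock().lock();
        try {
            if (destroyed) {
                return false; // never touch a released handle
            }
            data.put(key, value);
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Maintenance-side destroy under the exclusive lock; waits for in-flight writes. */
    void destroy() {
        lock.writeLock().lock();
        try {
            destroyed = true;
            data.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of entries currently held (0 after destroy). */
    int size() {
        lock.readLock().lock();
        try {
            return data.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}

class GuardedStoreDemo {
    /** Runs nTasks writers against one store while another thread destroys it.
     *  Returns the number of writes rejected because the store was already destroyed. */
    static int run(int nTasks, int loops) {
        GuardedStore store = new GuardedStore();
        AtomicInteger rejected = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(nTasks + 1);
        for (int t = 0; t < nTasks; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < loops; i++) {
                    if (!store.put("k" + id + "-" + i, "v")) {
                        rejected.incrementAndGet();
                    }
                }
            });
        }
        pool.execute(store::destroy); // plays the role of the maintenance thread
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected.get();
    }

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors(); // N = number of CPU cores
        System.out.println("rejected writes: " + run(n, 10_000));
    }
}
```

The count of rejected writes varies from run to run because it depends on when the destroy call is scheduled. The point of the sketch is only that no write can reach the store after it has been destroyed.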
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-543619888

@marmbrus / @gatorsmile - Got your point on making it an external package. I will close the PR and the corresponding JIRA, and will update this thread once I have submitted it to https://spark-packages.org/.

Thanks, @gaborgsomogyi @HeartSaVioR @dongjoon-hyun @PingHao, for looking into the PR and helping me make significant improvements.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-540070575

@PingHao - Both issues (the stuck executor and the core dump) might be due to the same cause. I will debug and fix it.

Do you have some code snippets that can help me reproduce the problem?
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-539838772

> 1. We are using flatMapGroupsWithState, which causes it to fail at the beginning.

I will update the PR with the fix.

> 2. Creating the RocksDB checkpoint had quite a high time cost, sometimes > 20 secs, .. then I changed all of them to an ext4 partition, and the result is much better: it is now < 10 ms in most cases, but can still sometimes be > 100 ms.

For isolation and data consistency, we checkpoint the RocksDB state to local disk. As you have suggested, a good file system and SSD-based instance storage should be used to get the best performance.

> 3. All Spark executors get stuck when one of the executors tries to load a snapshot file from the Spark checkpoint.

Great catch. Let me look at it and make the appropriate changes.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-533396055

> @itsvikramagr are you planning to resolve the remaining comments or waiting on second opinion? I think the config is not yet resolved.

I was waiting for more comments. I will fix the config changes and any other pending changes over the weekend.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-531092730

ping @gaborgsomogyi @dongjoon-hyun @HeartSaVioR

I have a comment from @gaborgsomogyi to resolve. Is there anything else I should be doing to get this patch into the 3.0 release?
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-525823628

@gaborgsomogyi - I have addressed some of your comments and replied to the remaining ones. Thanks again for reviewing the PR.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-521134445

> cool. is the issue here [#24922 (comment)](https://github.com/apache/spark/pull/24922#issuecomment-510327508) resolved?

Yes, this is now resolved. I made the following changes to resolve it:

- There was a typo in setting dataBlockSize. Instead of KBs, I was setting it in bytes.
- For range scans, I was creating a RocksIterator that was not closed cleanly.
- I fixed some of the RocksDB configs, with help from [here](https://github.com/facebook/rocksdb/issues/4112#issuecomment-470269235). In particular: set max open files, strictly limit caches (no pinning of metadata in cache), and disable FillCache for range scan operations.

Overall, the memory usage is now contained, at around 1-3 GB depending on the data.
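The first two fixes in the list above (a block size passed in the wrong unit, and a range-scan iterator that was never closed) follow general patterns that can be sketched without the RocksDB dependency. The code below is a minimal illustration using only the Java standard library, not the PR's code. `ScanHandle` is a hypothetical stand-in for a native iterator such as RocksIterator, and `StoreTuningSketch`, `blockSizeBytes`, and `countPrefix` are names invented for this example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a native iterator that must be released explicitly. */
class ScanHandle implements AutoCloseable, Iterator<String> {
    /** Number of handles opened but not yet closed; a leak shows up as a non-zero value. */
    static final AtomicInteger OPEN = new AtomicInteger();

    private final Iterator<String> inner;
    private boolean closed = false;

    ScanHandle(List<String> keys) {
        this.inner = keys.iterator();
        OPEN.incrementAndGet();
    }

    @Override
    public boolean hasNext() {
        return !closed && inner.hasNext();
    }

    @Override
    public String next() {
        return inner.next();
    }

    @Override
    public void close() {
        if (!closed) { // idempotent: closing twice releases only once
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

class StoreTuningSketch {
    /** Converts a block size configured in KB into the byte count a native option expects. */
    static long blockSizeBytes(long configuredKb) {
        return Math.multiplyExact(configuredKb, 1024L);
    }

    /** Counts keys with the given prefix; try-with-resources closes the handle
     *  even if the loop body throws. */
    static int countPrefix(List<String> keys, String prefix) {
        int matches = 0;
        try (ScanHandle it = new ScanHandle(keys)) {
            while (it.hasNext()) {
                if (it.next().startsWith(prefix)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(blockSizeBytes(32));                                 // prints 32768
        System.out.println(countPrefix(Arrays.asList("a1", "a2", "b1"), "a"));  // prints 2
        System.out.println(ScanHandle.OPEN.get());                              // prints 0
    }
}
```

Passing a value meant as KB directly as bytes would produce blocks 1024 times smaller than intended, which is the kind of mismatch the first bullet describes. The open-handle counter is only a debugging aid for the sketch; it shows that the scan leaves nothing open.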
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-519130078

Longevity run results

**Setup**

Same as above

Executor Instance type = C5d.2xlarge
cores per executor = 8
ratePerSec = 20k

| State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Number of Micro-batches | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| RocksDB | Append | ~1.5 hrs | 104.3 million | 10.5 million | 114 | |

https://user-images.githubusercontent.com/5220941/62632239-c9296900-b94f-11e9-95e7-d7f6cd9fa8a0.png
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-517556064

> I agree keeping state in memory is not scalable, and the result looks promising. It might be better to have another kind of benchmark here, like stress test, to see the performance on stateful operations and let end users guide whether they're mostly encouraged to use this implementation, or use this selectively.
>
> What I did for my patch was following:
> https://issues.apache.org/jira/browse/SPARK-21271
> [#21733 (comment)](https://github.com/apache/spark/pull/21733#issuecomment-411207042)

I have created the following [repo](https://github.com/itsvikramagr/spark-benchmark) along similar lines to what @HeartSaVioR has done for his patch.

**Setup**

- Used Qubole's distribution of Apache Spark 2.4.0 for my tests.
- Master Instance Type = i3.xlarge
- Driver Memory = 2g
- num-executors = 1
- max-executors = 1
- spark.sql.shuffle.partitions = 8
- Run time = 30 mins
- Source = Rate Source
- executor Memory = 7g
- spark.executor.memoryOverhead=3g
- Processing Time = 30 sec

Executor Instance type = i3.xlarge
cores per executor = 4
ratePerSec = 20k

| State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Comments |
| --- | --- | --- | --- | --- | --- |
| HDFS | Append | ~7 mins | 8.6 million | 2 million | Application failed before 30 mins |
| RocksDB | Append | ~30 minutes | 34.6 million | 7 million | |

Executor Instance type = C5d.2xlarge
cores per executor = 8
ratePerSec = 30k

| State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Comments |
| --- | --- | --- | --- | --- | --- |
| HDFS | Append | 8 mins | 12.6 million | 3.1 million | Application was stuck because of GC |
| RocksDB | Complete | ~30 minutes | 47.34 million | 12.5 million | |

Executor info when HDFS state storage is used:

https://user-images.githubusercontent.com/5220941/62346639-79443f80-b514-11e9-82ff-c41bdd2d5a91.png
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-516792658

@HeartSaVioR, @gaborgsomogyi - I was able to fix the memory leaks and address a lot of your comments. I also collected performance numbers for append mode, and the results were very encouraging. The best performance is seen on compute-optimized machines such as the AWS C5 series. I will soon publish the performance numbers in this PR.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-510327508

@HeartSaVioR, @gaborgsomogyi - thanks again for such a thorough review. I will soon resolve all your comments.

Meanwhile, I was doing performance analysis for various use cases. In complete output mode, I was able to ingest 2x-3x more data than with the memory-based backend in a setup with 1 executor and 5G of container memory, but I was not seeing encouraging results when the outputMode was "Append". I found that there is a memory leak (similar to the one reported [here](https://github.com/facebook/rocksdb/issues/3216)), and I am digging deeper to find the root cause. The fix might be some RocksDB config tuning or falling back to an older version of RocksDB. (As @skonto commented earlier, the code path for deleting older state might need refactoring to get the best out of RocksDB.)
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-508068035

@gaborgsomogyi - Thanks for your review. Let me get back to you after working on your suggestions and feedback. I will improve the styling and formatting as well.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-507108250

Thanks, @HeartSaVioR, for the review. Let me work on your comments. I am also looking into generating performance numbers for various scenarios and will get back with those soon.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-505325772

I am looking at the unit test failures. They are related to the rocksDbPath folder name. I will make it configurable and update the PR.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-505292778

Thanks, @HeartSaVioR. I understand it is a very big change. As suggested, let me create a stress test suite and post some benchmark numbers.
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-505062488

Thanks @gaborgsomogyi

- I will fix the style problem as soon as possible and update the PR.
- In my test setup, I was able to scale to more than 250 million keys using just 2 i3.xlarge executor nodes by running a group-by aggregation query on a campaign data source generated using the rate source. I stopped my experiment after 5 hours. GC time was about 1.5% of the total task time (see attached). In the same setup, the default implementation crashed after creating 35 million new state keys.
- I ran my experiments with varying load and under different stress conditions. Please recommend more scenarios that you think I should be testing.

https://user-images.githubusercontent.com/5220941/60031825-0baa2580-96c3-11e9-83aa-8e01311f5530.png

https://user-images.githubusercontent.com/5220941/60032007-59269280-96c3-11e9-97ed-65dcc3323870.png
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-504650735

ping @arunmahadevan
[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-504649951

ping @HeartSaVioR @tdas @jose-torres