[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21469 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #91509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91509/testReport)** for PR 21469 at commit [`7ec3242`](https://github.com/apache/spark/commit/7ec32427cf0fda82d2f936fa0aab62e274a8c034). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #91509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91509/testReport)** for PR 21469 at commit [`7ec3242`](https://github.com/apache/spark/commit/7ec32427cf0fda82d2f936fa0aab62e274a8c034).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21469 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91503/ Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21469 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #91503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91503/testReport)** for PR 21469 at commit [`af57f26`](https://github.com/apache/spark/commit/af57f26239ef635cb9837e9ebcf37fbdaa215480). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21469 Nice, LGTM.
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469

@arunmahadevan I also added the state store's custom metrics to the streaming query status. In the output below, `providerLoadedMapSize` appears under `stateOperators.customMetrics`.

I had to exclude `providerLoadedMapCountOfVersions` from the list, because the average metric is implemented in a somewhat tricky way and does not look easy to aggregate for the streaming query status. We may want to reimplement SQLMetric and its subclasses so that everything works correctly without such tricks, but that does not look trivial to address, and I think it is out of scope for this PR.

```
18/06/06 22:51:23 INFO MicroBatchExecution: Streaming query made progress: {
  "id" : "7564a0b7-e3b2-4d53-b246-b774ab04e586",
  "runId" : "8dd34784-080c-4f86-afaf-ac089902252d",
  "name" : null,
  "timestamp" : "2018-06-06T13:51:15.467Z",
  "batchId" : 4,
  "numInputRows" : 547,
  "inputRowsPerSecond" : 67.15776550030694,
  "processedRowsPerSecond" : 65.94333936106088,
  "durationMs" : {
    "addBatch" : 7944,
    "getBatch" : 1,
    "getEndOffset" : 0,
    "queryPlanning" : 61,
    "setOffsetRange" : 5,
    "triggerExecution" : 8295,
    "walCommit" : 158
  },
  "eventTime" : {
    "avg" : "2018-06-06T13:51:10.313Z",
    "max" : "2018-06-06T13:51:14.250Z",
    "min" : "2018-06-06T13:51:07.098Z",
    "watermark" : "2018-06-06T13:50:36.676Z"
  },
  "stateOperators" : [ {
    "numRowsTotal" : 20,
    "numRowsUpdated" : 16,
    "memoryUsedBytes" : 26679,
    "customMetrics" : {
      "providerLoadedMapSize" : 181911
    }
  } ],
  "sources" : [ {
    "description" : "KafkaV2[Subscribe[apachelogs-v2]]",
    "startOffset" : {
      "apachelogs-v2" : {
        "2" : 489056,
        "4" : 489053,
        "1" : 489055,
        "3" : 489051,
        "0" : 489053
      }
    },
    "endOffset" : {
      "apachelogs-v2" : {
        "2" : 489056,
        "4" : 489053,
        "1" : 489055,
        "3" : 489051,
        "0" : 489053
      }
    },
    "numInputRows" : 547,
    "inputRowsPerSecond" : 67.15776550030694,
    "processedRowsPerSecond" : 65.94333936106088
  } ],
  "sink" : {
    "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@60999714"
  }
}
```
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #91503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91503/testReport)** for PR 21469 at commit [`af57f26`](https://github.com/apache/spark/commit/af57f26239ef635cb9837e9ebcf37fbdaa215480).
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21469 @jose-torres is it good to go?
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #91486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91486/testReport)** for PR 21469 at commit [`345397d`](https://github.com/apache/spark/commit/345397dd530898697ef338ec55b81ad11ece4dc0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class StateStoreCustomAverageMetric(name: String, desc: String) extends StateStoreCustomMetric`
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21469 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91486/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21469 Merged build finished. Test PASSed.
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21469

> I didn't add the metric to StateOperatorProgress cause this behavior is specific to HDFSBackedStateStoreProvider

Maybe this can be reported as a custom metric and kept optional, so that it's not tied to any specific implementation.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #91486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91486/testReport)** for PR 21469 at commit [`345397d`](https://github.com/apache/spark/commit/345397dd530898697ef338ec55b81ad11ece4dc0).
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 Also added a custom metric for the count of versions stored in loadedMaps. This is a new screenshot: https://user-images.githubusercontent.com/1317309/40978481-b46ad324-690e-11e8-9b0f-e80528612a62.png
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 It looks like SizeEstimator.estimate() adds the size only once for the same identity, so SizeEstimator.estimate() is working correctly in this case. There might be other valid cases, but I'm not sure.
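To make the "added only once for the same identity" point concrete, here is a toy Python analogue. It is not Spark's SizeEstimator; `estimate_size`, the 1 KB rows, and the 100 versions are all made up for the illustration. The estimator keeps a set of visited object identities, so rows shared across version maps contribute to the total once.

```python
import sys


def estimate_size(obj, seen=None):
    """Toy deep-size estimate that counts each object once, keyed by identity."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted through another reference
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            total += estimate_size(key, seen) + estimate_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            total += estimate_size(item, seen)
    return total


# Two 1 KB "rows" (hypothetical stand-ins for state rows).
rows = {f"key{i}": bytearray(1024) for i in range(2)}

# 100 version maps that all reference the same row objects...
shared = {v: {k: r for k, r in rows.items()} for v in range(100)}
# ...versus 100 version maps that each hold their own copies of the rows.
copied = {v: {k: bytearray(r) for k, r in rows.items()} for v in range(100)}

shared_size = estimate_size(shared)
copied_size = estimate_size(copied)
print(shared_size, copied_size)
```

With identity tracking, the shared layout costs roughly the 100 small maps plus two rows, while the copied layout pays for 200 rows. An estimator without the visited set would report about the same figure for both layouts.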
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 @arunmahadevan I didn't add the metric to StateOperatorProgress because this behavior is specific to HDFSBackedStateStoreProvider (though it is the only implementation available in Apache Spark), so I'm not sure this metric can be treated as a general one. (@tdas what do you think about this?) Btw, the cache gets cleaned up when the maintenance operation runs, so there could be more than 100 versions in the map. I'm not sure why it shows 150x, and I couldn't find a missing spot in the patch. Maybe the issue comes from SizeEstimator.estimate()? One thing we need to check is how SizeEstimator.estimate() calculates memory usage when UnsafeRow objects are shared across versions. If SizeEstimator adds the size of an object every time it is referenced, it will report much higher memory usage than the actual value.
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21469 @HeartSaVioR, maybe this should then be reported in "memoryUsedBytes" in StateOperatorProgress (the value reported in StreamingQueryProgress), because the usage currently reported does not reflect the memory used for the cache. Question: in the screenshot, "Estimated size of states cache in provider total" is 3.3 MB, whereas "memory used by state total" is 20.6 KB with "total number of state rows" = 2. Is this 150x difference expected with just 2 rows in the state? Were there 100 versions of the map in the sample output you posted?
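For reference, the ratio in the question can be checked with a few lines of Python. The snippet assumes the UI figures use binary units (1 MB = 1024 KB); with decimal units the result is a little lower. Either way, "150x" is a rounded figure.

```python
KB = 1024        # assumption: the UI reports binary units
MB = 1024 * KB

cache_total = 3.3 * MB    # "Estimated size of states cache in provider total"
state_total = 20.6 * KB   # "memory used by state total"

ratio = cache_total / state_total
print(f"cache / state = {ratio:.0f}x")
```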