[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #91509 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91509/testReport)**
 for PR 21469 at commit 
[`7ec3242`](https://github.com/apache/spark/commit/7ec32427cf0fda82d2f936fa0aab62e274a8c034).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #91509 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91509/testReport)**
 for PR 21469 at commit 
[`7ec3242`](https://github.com/apache/spark/commit/7ec32427cf0fda82d2f936fa0aab62e274a8c034).


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91503/
Test FAILed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #91503 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91503/testReport)**
 for PR 21469 at commit 
[`af57f26`](https://github.com/apache/spark/commit/af57f26239ef635cb9837e9ebcf37fbdaa215480).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21469
  
Nice, LGTM.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21469
  
@arunmahadevan 
I also added the state store custom metrics to the streaming query status. You 
can see that `providerLoadedMapSize` now appears under 
`stateOperators.customMetrics` in the output below.

I had to exclude `providerLoadedMapCountOfVersions` from the list, because the 
average metric is implemented in a somewhat tricky way and does not look easy 
to aggregate for the streaming query status. 
We may want to reimplement SQLMetric and its subclasses so that everything 
works correctly without any tricky approach, but that does not look trivial 
to address, and I think it is out of scope for this PR.

```
18/06/06 22:51:23 INFO MicroBatchExecution: Streaming query made progress: {
  "id" : "7564a0b7-e3b2-4d53-b246-b774ab04e586",
  "runId" : "8dd34784-080c-4f86-afaf-ac089902252d",
  "name" : null,
  "timestamp" : "2018-06-06T13:51:15.467Z",
  "batchId" : 4,
  "numInputRows" : 547,
  "inputRowsPerSecond" : 67.15776550030694,
  "processedRowsPerSecond" : 65.94333936106088,
  "durationMs" : {
    "addBatch" : 7944,
    "getBatch" : 1,
    "getEndOffset" : 0,
    "queryPlanning" : 61,
    "setOffsetRange" : 5,
    "triggerExecution" : 8295,
    "walCommit" : 158
  },
  "eventTime" : {
    "avg" : "2018-06-06T13:51:10.313Z",
    "max" : "2018-06-06T13:51:14.250Z",
    "min" : "2018-06-06T13:51:07.098Z",
    "watermark" : "2018-06-06T13:50:36.676Z"
  },
  "stateOperators" : [ {
    "numRowsTotal" : 20,
    "numRowsUpdated" : 16,
    "memoryUsedBytes" : 26679,
    "customMetrics" : {
      "providerLoadedMapSize" : 181911
    }
  } ],
  "sources" : [ {
    "description" : "KafkaV2[Subscribe[apachelogs-v2]]",
    "startOffset" : {
      "apachelogs-v2" : {
        "2" : 489056,
        "4" : 489053,
        "1" : 489055,
        "3" : 489051,
        "0" : 489053
      }
    },
    "endOffset" : {
      "apachelogs-v2" : {
        "2" : 489056,
        "4" : 489053,
        "1" : 489055,
        "3" : 489051,
        "0" : 489053
      }
    },
    "numInputRows" : 547,
    "inputRowsPerSecond" : 67.15776550030694,
    "processedRowsPerSecond" : 65.94333936106088
  } ],
  "sink" : {
    "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@60999714"
  }
}
```
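
For readers who want to pull the new field out of a progress event, here is a minimal sketch in plain Python. It is not part of the patch: the trimmed JSON copies only the `stateOperators` values from the output above, and `cache_overhead` is a hypothetical helper name. It assumes `customMetrics` may be absent, since the metric is provider-specific.

```python
import json

# Trimmed copy of the progress event above; only the fields used here are kept.
progress_json = """
{
  "batchId": 4,
  "stateOperators": [
    {
      "numRowsTotal": 20,
      "numRowsUpdated": 16,
      "memoryUsedBytes": 26679,
      "customMetrics": {"providerLoadedMapSize": 181911}
    }
  ]
}
"""


def cache_overhead(progress):
    """Return (memoryUsedBytes, providerLoadedMapSize, ratio) per state operator.

    Hypothetical helper for illustration; ratio is None when the custom
    metric is missing or memoryUsedBytes is zero.
    """
    rows = []
    for op in progress.get("stateOperators", []):
        used = op["memoryUsedBytes"]
        # customMetrics is provider-specific, so treat it as optional.
        cached = op.get("customMetrics", {}).get("providerLoadedMapSize")
        ratio = cached / used if cached is not None and used else None
        rows.append((used, cached, ratio))
    return rows


report = cache_overhead(json.loads(progress_json))
for used, cached, ratio in report:
    shown = f"{ratio:.2f}x" if ratio is not None else "n/a"
    print(f"state: {used} B, provider cache: {cached} B, ratio: {shown}")
```

In PySpark the same dictionary shape should be available from `query.lastProgress`, so the helper could be applied to a running query instead of a pasted log line.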


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #91503 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91503/testReport)**
 for PR 21469 at commit 
[`af57f26`](https://github.com/apache/spark/commit/af57f26239ef635cb9837e9ebcf37fbdaa215480).


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21469
  
@jose-torres is it good to go?


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #91486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91486/testReport)**
 for PR 21469 at commit 
[`345397d`](https://github.com/apache/spark/commit/345397dd530898697ef338ec55b81ad11ece4dc0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class StateStoreCustomAverageMetric(name: String, desc: String) 
extends StateStoreCustomMetric`


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91486/
Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21469
  
> I didn't add the metric to StateOperatorProgress cause this behavior is 
> specific to HDFSBackedStateStoreProvider

Maybe this can be reported as a custom metric and kept optional, so that it 
is not tied to any specific implementation.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #91486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91486/testReport)**
 for PR 21469 at commit 
[`345397d`](https://github.com/apache/spark/commit/345397dd530898697ef338ec55b81ad11ece4dc0).


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21469
  
Also added a custom metric for the count of versions stored in loadedMaps.

This is a new screenshot:
https://user-images.githubusercontent.com/1317309/40978481-b46ad324-690e-11e8-9b0f-e80528612a62.png



---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21469
  
It looks like SizeEstimator.estimate() adds the size of an object only once 
per identity, so SizeEstimator.estimate() is working correctly in 
this case. There might be other valid cases, but I'm not sure.
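
To illustrate what "added only once per identity" means for versions that share row objects, here is a toy estimator in Python. It is a sketch under stated assumptions, not Spark's `SizeEstimator`: `estimate` is a hypothetical function that walks dicts, lists and tuples, sums `sys.getsizeof`, and optionally skips objects whose `id()` it has already visited. The 100 versions and 2 rows mirror the numbers discussed in this thread.

```python
import sys


def estimate(root, dedupe=True):
    """Toy deep-size estimate (hypothetical, not Spark's SizeEstimator).

    With dedupe=True each object is counted once per identity; with
    dedupe=False it is counted every time it is referenced.
    """
    seen = set()
    total = 0
    stack = [root]
    while stack:
        obj = stack.pop()
        if dedupe:
            if id(obj) in seen:
                continue
            seen.add(id(obj))
        total += sys.getsizeof(obj)
        # Only plain containers are traversed in this sketch.
        if isinstance(obj, dict):
            stack.extend(obj.keys())
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple)):
            stack.extend(obj)
    return total


# Two "rows" shared by 100 versions of the map, as in the case discussed here.
rows = [bytearray(1024) for _ in range(2)]
versions = {v: {i: rows[i] for i in range(len(rows))} for v in range(100)}

shared = estimate(versions)                # each row counted once
naive = estimate(versions, dedupe=False)   # each row counted per reference
print(f"identity-deduplicated: {shared} B, per-reference: {naive} B")
```

The per-reference total grows with the number of versions even though only two row objects exist, which is the over-reporting the earlier comment was worried about. The deduplicated total avoids it.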


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-04 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21469
  
@arunmahadevan 
I didn't add the metric to StateOperatorProgress because this behavior is 
specific to HDFSBackedStateStoreProvider (though it is the only 
implementation available in Apache Spark), so I'm not sure this metric can be 
treated as a general one. (@tdas what do you think about this?)

Btw, the cache is cleaned up when the maintenance operation is in 
progress, so there could be more than 100 versions in the map. I'm not sure why 
it shows 150x, and I couldn't find anything missing in the patch. Maybe the 
issue comes from SizeEstimator.estimate()?

One thing we need to check is how SizeEstimator.estimate() calculates the 
memory usage when UnsafeRow objects are shared across versions. If 
SizeEstimator adds the size of an object every time it is referenced, it will 
report much higher memory usage than the actual usage.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-04 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21469
  
@HeartSaVioR, maybe this should then be reported in "memoryUsedBytes" 
in the StateOperatorProgress (the value reported in StreamingQueryProgress), 
because currently the reported usage does not reflect the memory used for the 
cache.

Question: in the screenshot, "Estimated size of states cache in provider 
total" is 3.3 MB whereas "memory used by state total" is 20.6 KB with 
"total number of state rows" = 2. Is this 150x difference expected with just 2 
rows in the state? Were there 100 versions of the map in the sample output you 
posted?
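
As a rough check of the "150x" figure, the ratio can be recomputed from the two values quoted above. This is only arithmetic on the screenshot numbers. Whether the UI uses decimal or binary units is an assumption here, so both are shown.

```python
# Figures quoted from the screenshot discussed above.
cache_mb = 3.3    # "Estimated size of states cache in provider total"
state_kb = 20.6   # "memory used by state total"

# Assumption: the UI may use either decimal (1000) or binary (1024) units.
ratio_decimal = (cache_mb * 1000**2) / (state_kb * 1000)
ratio_binary = (cache_mb * 1024**2) / (state_kb * 1024)

print(f"decimal units: {ratio_decimal:.0f}x, binary units: {ratio_binary:.0f}x")
```

Either way the ratio lands a little above 150x, so the question stands regardless of the unit convention.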


---



