No. of active states?

2020-05-07 Thread Something Something
Is there a way to get the total no. of active states in memory at any given
point in a Stateful Spark Structured Streaming job? We are thinking of
using this metric for 'Auto Scaling' our Spark cluster.


Re: No. of active states?

2020-05-07 Thread Jungtaek Lim
If you're referring to the total "entries" in all states in the SS job, it's
provided via StreamingQueryListener.

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries

Hope this helps.
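As a sketch, the listener's progress payload (the same JSON you get from
`query.lastProgress`) can be reduced to a single state-row count; the
`sample_progress` below is made up to match the JSON shape shown in that guide,
not real output:

```python
import json

def total_state_rows(progress_json: str) -> int:
    """Sum numRowsTotal across all stateful operators in one
    StreamingQueryProgress payload."""
    progress = json.loads(progress_json)
    return sum(op.get("numRowsTotal", 0)
               for op in progress.get("stateOperators", []))

# Made-up payload shaped like the progress JSON in the monitoring guide.
sample_progress = json.dumps({
    "stateOperators": [
        {"numRowsTotal": 12000, "numRowsUpdated": 350},
        {"numRowsTotal": 4500, "numRowsUpdated": 120},
    ],
})

print(total_state_rows(sample_progress))  # 16500
```

A number like this could be emitted from `onQueryProgress` to whatever system
drives the auto-scaling decision.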

On Fri, May 8, 2020 at 3:26 AM Something Something 
wrote:

> Is there a way to get the total no. of active states in memory at any
> given point in a Stateful Spark Structured Streaming job? We are thinking
> of using this metric for 'Auto Scaling' our Spark cluster.
>


Re: No. of active states?

2020-05-07 Thread Something Something
No. We are already capturing these metrics (e.g. numInputRows,
inputRowsPerSecond).

I am talking about "No. of States" in the memory at any given time.

On Thu, May 7, 2020 at 4:31 PM Jungtaek Lim 
wrote:

> If you're referring to the total "entries" in all states in the SS job, it's
> provided via StreamingQueryListener.
>
>
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
>
> Hope this helps.
>
> On Fri, May 8, 2020 at 3:26 AM Something Something <
> mailinglist...@gmail.com> wrote:
>
>> Is there a way to get the total no. of active states in memory at any
>> given point in a Stateful Spark Structured Streaming job? We are thinking
>> of using this metric for 'Auto Scaling' our Spark cluster.
>>
>


Re: No. of active states?

2020-05-07 Thread Jungtaek Lim
Have you looked through the metrics for state operators?

They provide the "total rows" of state, and starting from Spark 2.4 there are
also additional metrics specific to HDFSBackedStateStoreProvider, including
overall estimated memory usage.

https://github.com/apache/spark/blob/24fac1e0c70a783b4d240607639ff20d7dd24191/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L168-L179
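To illustrate, a rough sketch of collecting those per-operator numbers from one
progress payload; `memoryUsedBytes` is a standard StateOperatorProgress field,
while the `customMetrics` keys come from the provider linked above and may
differ across Spark versions (the sample payload below is made up):

```python
import json

def state_store_metrics(progress_json: str) -> dict:
    """Aggregate state-store metrics across all stateful operators in one
    progress payload. customMetrics keys are provider-specific."""
    progress = json.loads(progress_json)
    out = {"memoryUsedBytes": 0, "custom": {}}
    for op in progress.get("stateOperators", []):
        out["memoryUsedBytes"] += op.get("memoryUsedBytes", 0)
        for key, value in op.get("customMetrics", {}).items():
            out["custom"][key] = out["custom"].get(key, 0) + value
    return out

# Made-up payload; key names follow the Spark 2.4-era HDFS-backed provider.
sample = json.dumps({
    "stateOperators": [{
        "numRowsTotal": 12000,
        "memoryUsedBytes": 65536,
        "customMetrics": {
            "loadedMapCacheHitCount": 800,
            "stateOnCurrentVersionSizeBytes": 32768,
        },
    }],
})

print(state_store_metrics(sample))
```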


On Fri, May 8, 2020 at 11:30 AM Something Something <
mailinglist...@gmail.com> wrote:

> No. We are already capturing these metrics (e.g. numInputRows,
> inputRowsPerSecond).
>
> I am talking about "No. of States" in the memory at any given time.
>
> On Thu, May 7, 2020 at 4:31 PM Jungtaek Lim 
> wrote:
>
>> If you're referring to the total "entries" in all states in the SS job, it's
>> provided via StreamingQueryListener.
>>
>>
>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
>>
>> Hope this helps.
>>
>> On Fri, May 8, 2020 at 3:26 AM Something Something <
>> mailinglist...@gmail.com> wrote:
>>
>>> Is there a way to get the total no. of active states in memory at any
>>> given point in a Stateful Spark Structured Streaming job? We are thinking
>>> of using this metric for 'Auto Scaling' our Spark cluster.
>>>
>>


Re: No. of active states?

2020-05-08 Thread Edgardo Szrajber
This should open a new world of real-time metrics for you:

How to get Spark Metrics as JSON using Spark REST API in YARN Cluster mode
Anbu Cheeralan

Spark provides the metrics in UI. You can access the UI using either port 4040
(Standalone) or using a proxy thr...

Bentzi

On Friday, May 8, 2020, 05:30:56 AM GMT+3, Something Something wrote:

> No. We are already capturing these metrics (e.g. numInputRows,
> inputRowsPerSecond).
>
> I am talking about "No. of States" in the memory at any given time.
>
> On Thu, May 7, 2020 at 4:31 PM Jungtaek Lim wrote:
>
>> If you're referring to the total "entries" in all states in the SS job, it's
>> provided via StreamingQueryListener.
>>
>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
>>
>> Hope this helps.
>>
>> On Fri, May 8, 2020 at 3:26 AM Something Something wrote:
>>
>>> Is there a way to get the total no. of active states in memory at any given
>>> point in a Stateful Spark Structured Streaming job? We are thinking of using
>>> this metric for 'Auto Scaling' our Spark cluster.
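For instance, a rough sketch of where those REST metrics live; the host, port,
and application id below are placeholders (in YARN cluster mode you would go
through the resource manager's proxy URL rather than port 4040 directly):

```python
import json
from urllib.request import urlopen

# Placeholder; replace with the YARN proxy URL in cluster mode.
BASE = "http://driver-host:4040"

def jobs_url(app_id: str) -> str:
    """Monitoring REST API endpoint listing jobs for one application."""
    return f"{BASE}/api/v1/applications/{app_id}/jobs"

def fetch_json(url: str):
    """Fetch and decode one REST API response (requires a live driver UI)."""
    with urlopen(url) as resp:
        return json.load(resp)

print(jobs_url("application_1588900000000_0001"))
# http://driver-host:4040/api/v1/applications/application_1588900000000_0001/jobs
```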