Re: [prometheus-users] Use Case Assessment

2022-10-17 Thread Rishabh Arora
Thank you for this perspective. We're currently looking at other systems 
for our more granular, functional monitoring needs, and are also considering 
building one that caters to our requirements.




-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/c93af5f4-4335-4a71-9bf6-d5e7032a0074n%40googlegroups.com.


Re: [prometheus-users] Use Case Assessment

2022-10-17 Thread sayf.eddi...@gmail.com
Hello,
Monitoring the health of the system with Prometheus is fine, but I think 
you are trying to include it as a functional building block of the 
application, which I am not very keen on. In my opinion, the monitoring 
system should not be coupled with the functioning of your system (that is, 
your system should continue to work fine if Prometheus is down, for example).
You need something else, such as issuing events and alerting on them (there 
you are free to focus on the paymentID info).
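
As a rough illustration of the "issue events and alert on them" idea, here 
is a minimal Python sketch that runs independently of Prometheus. It scans 
orders and produces one event per order that has stayed PENDING too long, 
grouped by paymentEngine, with the order ID in the event payload. The 
created_at field, the 15-minute threshold, the sample data and the function 
name are all assumptions made for this example; they are not from the 
thread or from any library.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample data. Field names follow the thread's description,
# except created_at, which is assumed here so that order age can be computed.
NOW = datetime(2022, 10, 17, 12, 0, tzinfo=timezone.utc)
ORDERS = [
    {"id": "o-1", "payment": "PENDING", "paymentEngine": "paypal",
     "created_at": NOW - timedelta(minutes=40)},
    {"id": "o-2", "payment": "PENDING", "paymentEngine": "paypal",
     "created_at": NOW - timedelta(minutes=5)},
    {"id": "o-3", "payment": "COMPLETED", "paymentEngine": "cash",
     "created_at": NOW - timedelta(hours=2)},
    {"id": "o-4", "payment": "PENDING", "paymentEngine": "cash",
     "created_at": NOW - timedelta(minutes=20)},
]


def stale_pending_events(orders, now, max_age=timedelta(minutes=15)):
    """Group events for orders PENDING longer than max_age by payment engine.

    The returned dict maps paymentEngine -> list of event payloads. Order IDs
    live in these events, not in any metric.
    """
    events = {}
    for order in orders:
        if order["payment"] != "PENDING":
            continue
        age = now - order["created_at"]
        if age > max_age:
            events.setdefault(order["paymentEngine"], []).append(
                {"order_id": order["id"],
                 "pending_for_s": int(age.total_seconds())}
            )
    return events


if __name__ == "__main__":
    for engine, items in stale_pending_events(ORDERS, NOW).items():
        print(engine, items)
```

Because the events are grouped per engine, whatever consumes them could 
suppress notifications for an engine that is known to be unstable without 
involving the monitoring stack.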




Re: [prometheus-users] Use Case Assessment

2022-10-17 Thread Rishabh Arora
Thank you for the clarification, Stuart.




Re: [prometheus-users] Use Case Assessment

2022-10-17 Thread Stuart Clark

What you are asking for isn't really the job of Prometheus.

Having a metric detailing the number of pending orders and alerting on 
it is completely within the normal scope of Prometheus and Alertmanager: 
observing the system and alerting if there are issues that need 
investigation. However, the next step of dealing with the individual 
events/orders is a job for a different system. If paymentEngine can only 
take a small number of values (e.g. PayPal, Swipe, Cash), then it would be 
reasonable to have it as a label on the pending orders metric (which 
would then allow you to alert if one method stops working), but order ID 
isn't something you should ever put in the metrics. Instead, once you 
have been alerted about a potential issue, you might query your order 
database directly or look at log files to dig into the detail and figure 
out what is happening.
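
As a minimal sketch of the labelled metric described above: the Python 
below counts PENDING orders per payment engine and renders the result in 
the Prometheus text exposition format. It uses only the standard library 
and is not the official client. The metric name orders_pending, the label 
name payment_engine, the sample orders and both function names are 
assumptions made for this example.

```python
from collections import Counter

# Hypothetical order records; field names follow the thread's description.
ORDERS = [
    {"id": "o-1", "payment": "PENDING", "paymentEngine": "paypal"},
    {"id": "o-2", "payment": "PENDING", "paymentEngine": "paypal"},
    {"id": "o-3", "payment": "COMPLETED", "paymentEngine": "cash"},
    {"id": "o-4", "payment": "PENDING", "paymentEngine": "cash"},
    {"id": "o-5", "payment": "FAILED", "paymentEngine": "paypal"},
]


def pending_by_engine(orders):
    """Count PENDING orders per payment engine.

    Order IDs are deliberately dropped: only the low-cardinality
    paymentEngine value is kept, so the number of series stays small.
    """
    return Counter(
        o["paymentEngine"] for o in orders if o["payment"] == "PENDING"
    )


def render_exposition(counts, metric="orders_pending"):
    """Render the counts as a gauge in the Prometheus text exposition format.

    Assumes engine names are simple strings that need no label escaping.
    """
    lines = [
        f"# HELP {metric} Number of orders with payment state PENDING.",
        f"# TYPE {metric} gauge",
    ]
    for engine in sorted(counts):
        lines.append(f'{metric}{{payment_engine="{engine}"}} {counts[engine]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_exposition(pending_by_engine(ORDERS)), end="")
```

With one series per engine, an alert can fire when a single engine's 
pending count grows, and the order IDs are then looked up in the order 
database or the logs.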


--
Stuart Clark



[prometheus-users] Use Case Assessment

2022-10-17 Thread Rishabh Arora
Hello!

I'm currently in the process of implementing Prometheus along with 
Alertmanager as our de facto solution for node health monitoring. We have a 
Kubernetes, Kafka and MQTT setup, and for monitoring our infrastructure 
Prometheus is an obvious fit.

We have an application / business case where I'm wondering whether 
Prometheus may be a reasonable solution. Our application needs to meet 
certain SLAs. If those SLAs are not being met, alerts need to fire. 
For example, consider the following case, which closely resembles our 
real business case:

An *Order* schema in our system has a *payment* field which can be one of 
['COMPLETED','FAILED','PENDING']. In our HA real-time system, we need to 
fire alerts for Orders which are in a PENDING state. Rows in our *Orders* 
collection will potentially number in the millions. An order also has a 
*paymentEngine* field, which represents the entity responsible for 
processing the payment for the order.

Now, with Prometheus, the total count of PENDING Orders would be a 
simple metric, but we're also interested in the Order IDs. For 
instance, is there a way I could capture the PENDING order IDs in the 
"metadata"(???) or "payload" of the metric? Downstream in Alertmanager, 
I'd also like to group by *paymentEngine* so I could potentially inhibit 
alerts for an unstable engine.

Can anyone please help me out? Apologies in advance for my naivety :)

Best,
