[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2023-01-26 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17681038#comment-17681038
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 1/26/23 4:49 PM:


[~e.dimitrova] would this be something you would consider reviewing? I am 
picking people from the watchers ... 


was (Author: smiklosovic):
[~e.dimitrova] would this be something you consider to review? I am picking 
people from reviewers ... 

> Diag. Events: Add local persistency
> ---
>
> Key: CASSANDRA-13460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13460
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Legacy/Observability
>Reporter: Stefan Podkowinski
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.x
>
> Attachments: 0001-Add-persistency-for-events-to-system-keyspace.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Some generated events will be rather less frequent but very useful for 
> retroactive troubleshooting. E.g. all events related to bootstrapping and 
> gossip would probably be worth saving, as they might provide valuable 
> insights and will consume very little resources in low quantities. Imagine if 
> we could e.g. in case of CASSANDRA-13348 just ask the user to -run a tool 
> like {{./bin/diagdump BootstrapEvent}} on each host, to get us a detailed log 
> of all relevant events-  provide a dump of all events as described in the 
> [documentation|https://github.com/spodkowinski/cassandra/blob/WIP-13460/doc/source/operating/diag_events.rst].
>  
> This could be done by saving events white-listed in cassandra.yaml to a local 
> table. Maybe using a TTL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2023-01-23 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679758#comment-17679758
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 1/23/23 11:01 AM:
-

I have yet again rebased against the current trunk and I improved the PR. The 
previous version was changing the API in such a way that implementations of 
loggers had a configuration object instead of hash map of properties. This is 
now changed and it is backward compatible with what is there currently. I 
simplified the PR as well.

It would be really great to look into this so we have a possibility to put 
diagnostic events into bin log for further inspection.

The refactoring of the code also makes the whole (bin) logger stuff more 
aligned with everything else, and it seems to be way more robust.

The current PR below also introduces three nodetool commands (enable / disable 
/ get diagnostics log options). As there is a general trend of not 
introducing new commands, I am happy to omit these commands to make the PR even 
simpler; it would then be configurable only via JMX.

[~marcuse] do you think this is something which might appear in 5.0? We missed 
4.1 release ... unfortunately. 

https://github.com/instaclustr/cassandra/commits/CASSANDRA-13460

https://app.circleci.com/pipelines/github/instaclustr/cassandra/1761/workflows/ef1bde51-30b4-474a-86c5-bb7c98249cb2

It was run with the multiplexer too; the failures are not relevant to this PR.

I think we would need to improve tests a bit but an initial review round would 
be great to have. 


was (Author: smiklosovic):
I have yet again rebased against the current trunk and I improved the PR. The 
previous version was changing the API in such a way that implementations of 
loggers had a configuration object instead of hash map of properties. This is 
now changed and it is backward compatible with what is there currently. I 
simplified the PR as well.

It would be really great to look into this so we have a possibility to put 
diagnostic events into bin log for further inspection.

The refactoring of the code is also making whole (bin) logger stuff more 
aligned with everything else and it seems to be way more robust.

The current PR below also introduces three nodetool commands (enabled / disable 
/ get diagnostics log options). As there is in general the trend of not 
introducing new commands, I am happy to omit these commands to make the PR even 
simpler and it would be configurable only via JMX.

[~marcuse] do you think this is something which might appear in 5.0? We missed 
4.1 release ... unfortunately. 

https://github.com/instaclustr/cassandra/commits/CASSANDRA-13460

https://app.circleci.com/pipelines/github/instaclustr/cassandra/1761/workflows/ef1bde51-30b4-474a-86c5-bb7c98249cb2

it was run with multiplexer too, failure as not relevant to this PR.

I think we would need to improve tests a bit but an initial review round would 
be great to have. 




[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2023-01-23 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679758#comment-17679758
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 1/23/23 11:00 AM:
-

I have yet again rebased against the current trunk and I improved the PR. The 
previous version was changing the API in such a way that implementations of 
loggers had a configuration object instead of hash map of properties. This is 
now changed and it is backward compatible with what is there currently. I 
simplified the PR as well.

It would be really great to look into this so we have a possibility to put 
diagnostic events into bin log for further inspection.

The refactoring of the code also makes the whole (bin) logger stuff more 
aligned with everything else, and it seems to be way more robust.

The current PR below also introduces three nodetool commands (enable / disable 
/ get diagnostics log options). As there is a general trend of not 
introducing new commands, I am happy to omit these commands to make the PR even 
simpler; it would then be configurable only via JMX.

[~marcuse] do you think this is something which might appear in 5.0? We missed 
4.1 release ... unfortunately. 

https://github.com/instaclustr/cassandra/commits/CASSANDRA-13460

https://app.circleci.com/pipelines/github/instaclustr/cassandra/1761/workflows/ef1bde51-30b4-474a-86c5-bb7c98249cb2

It was run with the multiplexer too; the failures are not relevant to this PR.

I think we would need to improve tests a bit but an initial review round would 
be great to have. 


was (Author: smiklosovic):
I have yet again rebased against the current trunk and I improved the PR. The 
previous version was changing the API in such a way that implementations of 
loggers had a configuration object instead of hash map of properties. This is 
now changed and it is backward compatible with what is there currently. I 
simplified the PR as well.

It would be really great to look into this so we have a possibility to put 
diagnostic events into bin log for further inspection.

The refactoring of the code is also making whole (bin) logger stuff more 
aligned with everything else and it seems to be way more robust.

The current PR below also introduces three nodetool commands (enabled / disable 
/ get diagnostics log options). As there is in general the trend of not 
introducing new commands, I am happy to omit these commands to make the PR even 
simpler and it would be configurable only via JMX.

[~marcuse] do you think this is something which might appear in 5.0? We missed 
4.1 release ... unfortunately. 

https://github.com/instaclustr/cassandra/commits/CASSANDRA-13460




[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2023-01-23 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679758#comment-17679758
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 1/23/23 10:57 AM:
-

I have yet again rebased against the current trunk and I improved the PR. The 
previous version was changing the API in such a way that implementations of 
loggers had a configuration object instead of hash map of properties. This is 
now changed and it is backward compatible with what is there currently. I 
simplified the PR as well.

It would be really great to look into this so we have a possibility to put 
diagnostic events into bin log for further inspection.

The refactoring of the code also makes the whole (bin) logger stuff more 
aligned with everything else, and it seems to be way more robust.

The current PR below also introduces three nodetool commands (enable / disable 
/ get diagnostics log options). As there is a general trend of not 
introducing new commands, I am happy to omit these commands to make the PR even 
simpler; it would then be configurable only via JMX.

[~marcuse] do you think this is something which might appear in 5.0? We missed 
4.1 release ... unfortunately. 

https://github.com/instaclustr/cassandra/commits/CASSANDRA-13460


was (Author: smiklosovic):
I have yet again rebased againt the current trunk and I improved the PR. The 
previous version was changing the API in such a way that implementations of 
loggers had a configuration object instead of hash map of properties. This is 
now changed and it is backward compatible with what is there currently. I 
simplified the PR as well.

It would be really great to look into this so we have a possibility to put 
diagnostic events into bin log for further inspection.

The refactoring of the code is also making whole (bin) logger stuff more 
aligned with everything else and it seems to be way more robust.

The current PR below also introduces three nodetool commands (enabled / disable 
/ get diagnostics log options). As there is in general the trend of not 
introducing new commands, I am happy to omit these commands to make the PR even 
simpler and it would be configurable only via JMX.

[~marcuse] do you think this is something which might appear in 5.0? We missed 
4.1 release ... unfortunately. 

https://github.com/instaclustr/cassandra/commits/CASSANDRA-13460




[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2022-06-15 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505903#comment-17505903
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 6/15/22 1:05 PM:


https://github.com/apache/cassandra/pull/1685


was (Author: smiklosovic):
https://github.com/apache/cassandra/pull/1497




[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2022-03-13 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505905#comment-17505905
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 3/13/22, 7:54 PM:
-

Hi [~marcuse], would you give this a shot, please?  I think you did a lot of 
stuff around audit logs and I would like to know your opinion on the code 
restructuring related to that. The explanation above of what I did should be 
handy to go through to get into it more easily.


was (Author: smiklosovic):
Hi [~marcuse], would you give this a shot, please?  I think you did a lot of 
stuff around audit logs and I would like to know your opinion on the code 
restructuring related to that.




[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2021-09-17 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416781#comment-17416781
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 9/17/21, 4:34 PM:
-

Hi [~mck], I have implemented diagnostic events logging into Chronicle queues 
in this branch (1). It is quite a big patch and it is not fully finished yet, 
but I think this is enough for a first evaluation and to discuss this early 
to avoid any communication and expectation issues.

The main "work" is done in DiagnosticEventService and 
DiagnosticEventPersistence. DiagnosticEventPersistence is based on "consumers" 
which are used for subscription. Implementation-wise, before this patch, there 
was already a consumer which was putting everything into memory. I implemented 
the diagnostic event logger on Chronicle queues in such a way that it is just 
another consumer, but by consuming these events we are putting them into 
a Chronicle queue instead of into some in-memory structures. Upon disabling 
this diagnostic logger, this consumer is just unsubscribed.

From the user's point of view, the diagnostic events functionality has to be 
enabled in order to be able to enable diagnostic logging. Logging into 
Chronicle queues is not possible if the diagnostic framework is disabled. On 
the other hand, diagnostic logging into Chronicle queues might be enabled and 
disabled on demand, similarly to how it is done for audit. However, regardless 
of diagnostic logging into Chronicle queues being enabled or disabled, events 
are always put into memory as before. There is a JMX method via which a user 
may read these events on demand, but they cannot be read on demand from an 
arbitrary position in the Chronicle queue if they are written to disk. Hence 
the user can still inspect these events on the fly from the in-memory buffer, 
as before, but they are all persisted to disk if the user chooses so.
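
As a rough illustration of the consumer-based design described above, here is 
a minimal Java sketch. It is not the code from the branch: every class name 
below (SketchEvent, SketchEventBus, InMemoryConsumer, PersistingConsumer) is a 
hypothetical stand-in, and the "persisting" consumer appends to a list instead 
of a Chronicle queue. It only shows the shape of the idea: the in-memory 
consumer stays subscribed, and enabling or disabling persistent logging is a 
matter of subscribing or unsubscribing one more consumer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical stand-in for a diagnostic event (not Cassandra's class). */
final class SketchEvent {
    final String type;
    final String payload;

    SketchEvent(String type, String payload) {
        this.type = type;
        this.payload = payload;
    }
}

/** Hypothetical publisher: hands every published event to all subscribers. */
final class SketchEventBus {
    private final List<Consumer<SketchEvent>> consumers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<SketchEvent> consumer) { consumers.add(consumer); }

    void unsubscribe(Consumer<SketchEvent> consumer) { consumers.remove(consumer); }

    void publish(SketchEvent event) {
        for (Consumer<SketchEvent> consumer : consumers) {
            consumer.accept(event);
        }
    }
}

/** Keeps only the most recent events in a bounded in-memory buffer. */
final class InMemoryConsumer implements Consumer<SketchEvent> {
    private final Deque<SketchEvent> buffer = new ArrayDeque<>();
    private final int capacity;

    InMemoryConsumer(int capacity) { this.capacity = capacity; }

    @Override
    public synchronized void accept(SketchEvent event) {
        if (buffer.size() == capacity) {
            buffer.removeFirst(); // drop the oldest event
        }
        buffer.addLast(event);
    }

    synchronized List<SketchEvent> snapshot() { return new ArrayList<>(buffer); }
}

/** Stand-in for the persistent logger; a real one would write to disk. */
final class PersistingConsumer implements Consumer<SketchEvent> {
    final List<String> written = new ArrayList<>();

    @Override
    public synchronized void accept(SketchEvent event) {
        written.add(event.type + ":" + event.payload);
    }
}

final class ConsumerSketchDemo {
    public static void main(String[] args) {
        SketchEventBus bus = new SketchEventBus();
        InMemoryConsumer memory = new InMemoryConsumer(2);
        PersistingConsumer disk = new PersistingConsumer();

        bus.subscribe(memory);                    // always present
        bus.publish(new SketchEvent("A", "1"));
        bus.subscribe(disk);                      // "enable" persistent logging
        bus.publish(new SketchEvent("B", "2"));
        bus.unsubscribe(disk);                    // "disable" persistent logging
        bus.publish(new SketchEvent("C", "3"));

        System.out.println(memory.snapshot().size());
        System.out.println(disk.written);
    }
}
```

In this sketch the persisting consumer only sees the events published while it 
was subscribed, while the in-memory buffer keeps receiving events throughout.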

I have also extracted the common parts of BinLogger into a separate abstract 
class and created the org.apache.cassandra.log package where it is located. 
Audit logging and diagnostic logging are very similar, and I found myself 
repeating a lot of code all over again in order to implement this, so I 
simplified it a lot. I have also extracted the common stuff for options (audit 
and diagnostic). Options for audit and diagnostics extend BinLogOptions, and 
BinLogOptions has its own builder. I wanted to do some simplification in 
composite data, but it seems to be more complicated than I expected, so I left 
it be.
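
To make the options layout described above more concrete, here is a small Java 
sketch of a shared options base class with its own builder, extended by audit 
and diagnostic options. This is an assumption about the general shape only: 
the class names are prefixed with "Sketch" to mark them as hypothetical, and 
the fields (rollCycle, block, maxLogSize, includedCategories, logDir) and 
their defaults are illustrative rather than taken from the branch.

```java
/** Hypothetical shared options; field names and defaults are illustrative. */
class SketchBinLogOptions {
    final String rollCycle;
    final boolean block;
    final long maxLogSize;

    protected SketchBinLogOptions(Builder<?> builder) {
        this.rollCycle = builder.rollCycle;
        this.block = builder.block;
        this.maxLogSize = builder.maxLogSize;
    }

    /** Self-typed builder so subclasses can chain shared and own setters. */
    abstract static class Builder<T extends Builder<T>> {
        String rollCycle = "HOURLY";
        boolean block = true;
        long maxLogSize = 16L * 1024 * 1024;

        protected abstract T self();

        T rollCycle(String value) { this.rollCycle = value; return self(); }

        T block(boolean value) { this.block = value; return self(); }

        T maxLogSize(long value) { this.maxLogSize = value; return self(); }
    }
}

/** Hypothetical audit options: shared settings plus one audit-only field. */
final class SketchAuditOptions extends SketchBinLogOptions {
    final String includedCategories;

    private SketchAuditOptions(Builder builder) {
        super(builder);
        this.includedCategories = builder.includedCategories;
    }

    static final class Builder extends SketchBinLogOptions.Builder<SketchAuditOptions.Builder> {
        String includedCategories = "";

        @Override
        protected Builder self() { return this; }

        Builder includedCategories(String value) { this.includedCategories = value; return this; }

        SketchAuditOptions build() { return new SketchAuditOptions(this); }
    }
}

/** Hypothetical diagnostic options: shared settings plus a log directory. */
final class SketchDiagnosticOptions extends SketchBinLogOptions {
    final String logDir;

    private SketchDiagnosticOptions(Builder builder) {
        super(builder);
        this.logDir = builder.logDir;
    }

    static final class Builder extends SketchBinLogOptions.Builder<SketchDiagnosticOptions.Builder> {
        String logDir = "diagnostics";

        @Override
        protected Builder self() { return this; }

        Builder logDir(String value) { this.logDir = value; return this; }

        SketchDiagnosticOptions build() { return new SketchDiagnosticOptions(this); }
    }
}

final class OptionsSketchDemo {
    public static void main(String[] args) {
        SketchDiagnosticOptions options = new SketchDiagnosticOptions.Builder()
                .rollCycle("DAILY")      // shared setter from the base builder
                .logDir("/tmp/diag")     // diagnostic-only setter
                .build();
        System.out.println(options.rollCycle + " " + options.logDir);
    }
}
```

The self-typed builder is one possible way to let shared setters return the 
concrete builder type; a real implementation may well choose a simpler scheme.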

I have also implemented a diagnosticlogviewer tool, similar to auditlogviewer. 
My question here is whether we also want to make some "generic" tool which the 
audit and diagnostic viewers would extend, because right now it is basically 
the same stuff except for a few changes which are mostly cosmetic. Hence I 
would like to know if you think it makes sense to try to extract common parts.

I have also implemented nodetool commands for disabling and enabling 
diagnostic logging and for its status, similar to audit log.

I would love to hear your feedback here, especially about the overall 
high-level implementation, so I am not doing something which might 
eventually be rejected because of different expectations.

(1) [https://github.com/instaclustr/cassandra/tree/CASSANDRA-13460-2]


was (Author: stefan.miklosovic):
Hi [~mck], I have implemented diagnostic events logging into Chronicle queues 
in this branch (1), it is quite a big patch and it is not finished yet fully 
but I think this is enough for the first evaluation and to discuss this earlier 
to avoid any communication and expectation issues.

The main "work" is done in DiagnosticEventService and 
DiagnosticEventPersistence.. DiagnosticEventPersistence is based on "consumers" 
which are used for subscription. Implementation-wise, before this patch, there 
was already a consumer which was putting everything into memory. I implement 
diagnostic event logger on Chronicle queues in such a way that it is just 
another consumer but by consuming these events we are putting them into 
Chronicle queue instead to some in-memory structures. Upon disabling this 
diagnostic logger, this consumer is just unsubscribed.

From user's point of view, diagnostic events functionality has to be enabled 
in order to be able to enable diagnostic logging. Logging into Chronicle 
queues is not possible if diagnostic framework is disabled. On the other hand, 
diagnostic logging into Chronicle queues might be enabled and disabled on 
demand, similarly as it is done for audit. However, regardless of diagnostic 
logging into Chronicle queues being enabled or disabled, they are always put 
into the memory as it was before. There is a JMX method via which a user may 
read these events on demand but they can not be read on demand  from arbitrary 
position from Chronicle queue if they are written to 

[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2021-09-17 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416781#comment-17416781
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 9/17/21, 4:30 PM:
-

Hi [~mck], I have implemented diagnostic events logging into Chronicle queues 
in this branch (1). It is quite a big patch and it is not fully finished yet, 
but I think this is enough for a first evaluation and to discuss this early 
to avoid any communication and expectation issues.

The main "work" is done in DiagnosticEventService and 
DiagnosticEventPersistence. DiagnosticEventPersistence is based on "consumers" 
which are used for subscription. Implementation-wise, before this patch, there 
was already a consumer which was putting everything into memory. I implemented 
the diagnostic event logger on Chronicle queues in such a way that it is just 
another consumer, but by consuming these events we are putting them into 
a Chronicle queue instead of into some in-memory structures. Upon disabling 
this diagnostic logger, this consumer is just unsubscribed.

From the user's point of view, the diagnostic events functionality has to be 
enabled in order to be able to enable diagnostic logging. Logging into 
Chronicle queues is not possible if the diagnostic framework is disabled. On 
the other hand, diagnostic logging into Chronicle queues might be enabled and 
disabled on demand, similarly to how it is done for audit. However, regardless 
of diagnostic logging into Chronicle queues being enabled or disabled, events 
are always put into memory as before. There is a JMX method via which a user 
may read these events on demand, but they cannot be read on demand from an 
arbitrary position in the Chronicle queue if they are written to disk. Hence 
the user can still inspect these events on the fly from the in-memory buffer, 
as before, but they are all persisted to disk if the user chooses so.

I have also extracted the common parts of BinLogger into a separate abstract 
class and created the org.apache.cassandra.log package where it is located. 
Audit logging and diagnostic logging are very similar, and I found myself 
repeating a lot of code all over again in order to implement this, so I 
simplified it a lot. -I have also extracted common stuff for options too.- I 
wanted to do that, but it turned out to be more complicated than it seemed, 
especially in connection with composite data for JMX, so I left it be.

 

I have also implemented a diagnosticlogviewer tool, similar to auditlogviewer. 
My question here is whether we also want to make some "generic" tool which the 
audit and diagnostic viewers would extend, because right now it is basically 
the same stuff except for a few changes which are mostly cosmetic. Hence I 
would like to know if you think it makes sense to try to extract common parts.

I have also implemented nodetool commands for disabling and enabling 
diagnostic logging and for its status, similar to audit log.

I would love to hear your feedback here, especially about the overall 
high-level implementation, so I am not doing something which might 
eventually be rejected because of different expectations.

(1) [https://github.com/instaclustr/cassandra/tree/CASSANDRA-13460-2]


was (Author: stefan.miklosovic):
Hi [~mck], I have implemented diagnostic events logging into Chronicle queues 
in this branch (1), it is quite a big patch and it is not finished yet fully 
but I think this is enough for the first evaluation and to discuss this earlier 
to avoid any communication and expectation issues.

The main "work" is done in DiagnosticEventService and 
DiagnosticEventPersistence.. DiagnosticEventPersistence is based on "consumers" 
which are used for subscription. Implementation-wise, before this patch, there 
was already a consumer which was putting everything into memory. I implement 
diagnostic event logger on Chronicle queues in such a way that it is just 
another consumer but by consuming these events we are putting them into 
Chronicle queue instead to some in-memory structures. Upon disabling this 
diagnostic logger, this consumer is just unsubscribed.

From user's point of view, diagnostic events functionality has to be enabled 
in order to be able to enable diagnostic logging. Logging into Chronicle 
queues is not possible if diagnostic framework is disabled. On the other hand, 
diagnostic logging into Chronicle queues might be enabled and disabled on 
demand, similarly as it is done for audit. However, regardless of diagnostic 
logging into Chronicle queues being enabled or disabled, they are always put 
into the memory as it was before. There is a JMX method via which a user may 
read these events on demand but they can not be read on demand  from arbitrary 
position from Chronicle queue if they are written to disk. Hence user can 
still inspect these events on the fly from in-memory buffer, as it was before, 

[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2021-09-17 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416781#comment-17416781
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 9/17/21, 4:07 PM:
-

Hi [~mck], I have implemented diagnostic events logging into Chronicle queues 
in this branch (1). It is quite a big patch and it is not fully finished yet, 
but I think this is enough for a first evaluation and to discuss this early 
to avoid any communication and expectation issues.

The main "work" is done in DiagnosticEventService and 
DiagnosticEventPersistence. DiagnosticEventPersistence is based on "consumers" 
which are used for subscription. Implementation-wise, before this patch, there 
was already a consumer which was putting everything into memory. I implemented 
the diagnostic event logger on Chronicle queues in such a way that it is just 
another consumer, but by consuming these events we are putting them into 
a Chronicle queue instead of into some in-memory structures. Upon disabling 
this diagnostic logger, this consumer is just unsubscribed.

From the user's point of view, the diagnostic events functionality has to be 
enabled in order to be able to enable diagnostic logging. Logging into 
Chronicle queues is not possible if the diagnostic framework is disabled. On 
the other hand, diagnostic logging into Chronicle queues might be enabled and 
disabled on demand, similarly to how it is done for audit. However, regardless 
of diagnostic logging into Chronicle queues being enabled or disabled, events 
are always put into memory as before. There is a JMX method via which a user 
may read these events on demand, but they cannot be read on demand from an 
arbitrary position in the Chronicle queue if they are written to disk. Hence 
the user can still inspect these events on the fly from the in-memory buffer, 
as before, but they are all persisted to disk if the user chooses so.

I have also extracted the common parts of BinLogger into a separate abstract 
class and created the org.apache.cassandra.log package where it is located. 
Audit logging and diagnostic logging are very similar, and I found myself 
repeating a lot of code all over again in order to implement this, so I 
simplified it a lot. I have also extracted the common stuff for options too.

I have also implemented a diagnosticlogviewer tool, similar to auditlogviewer. 
My question here is whether we also want to make some "generic" tool which the 
audit and diagnostic viewers would extend, because right now it is basically 
the same stuff except for a few changes which are mostly cosmetic. Hence I 
would like to know if you think it makes sense to try to extract common parts.

I have also implemented nodetool commands for disabling and enabling 
diagnostic logging and for its status, similar to audit log.

I would love to hear your feedback here, especially about the overall 
high-level implementation, so I am not doing something which might 
eventually be rejected because of different expectations.

(1) [https://github.com/instaclustr/cassandra/tree/CASSANDRA-13460-2]


was (Author: stefan.miklosovic):
Hi [~mck], I have implemented diagnostic events logging into Chronicle queues 
in this branch (1), it is quite a big patch and it is not finished yet fully 
but I think this is enough for the first evaluation and to discuss this earlier 
to avoid any communication and expectation issues.

The main "work" is done in DiagnosticEventService and 
DiagnosticEventPersistence.. DiagnosticEventPersistence is based on "consumers" 
which are used for subscription. Implementation-wise, before this patch, there 
was already a consumer which was putting everything into memory. I implement 
diagnostic event logger on Chronicle queues in such a way that it is just 
another consumer but by consuming these events we are putting them into 
Chronicle queue instead to some in-memory structures. Upon disabling this 
diagnostic logger, this consumer is just unsubscribed.

From user's point of view, diagnostic events functionality has to be enabled 
in order to be able to enable diagnostic logging. Logging into Chronicle 
queues is not possible if diagnostic framework is disabled. On the other hand, 
diagnostic logging into Chronicle queues might be enabled and disabled on 
demand, similarly as it is done for audit. However, regardless of diagnostic 
logging into Chronicle queues being enabled on disabled, they are always put 
into the memory as it was before. There is a JMX method via which a user may 
read these events on demand but they can not be read on demand  from arbitrary 
position from Chronicle queue if they are written to disk. Hence user can 
still inspect these events on the fly from in-memory buffer, as it was before, 
but they are all persisted to disk if he choose so.

I have also extracted the common parts of BinLogger into separate abstract 
class and I 

[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

2021-07-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382901#comment-17382901
 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 7/18/21, 9:09 PM:
-

Hi [~spod],

yet another store to be implemented might be built on virtual tables. Even if 
it would not survive restarts, I think this still has its value, even more so 
if it can be queried via CQL. However, if you prefer to have it in Chronicle 
queues solely, that is something else ...

BTW, maybe the best thing would be to make this pluggable. I was checking the 
implementation and right now there is just in-memory persistence, and it is 
hard coded. This might be configurable via cassandra.yaml where a user would 
supply their own implementation.
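
A minimal sketch of what "pluggable" could mean here, assuming the store is 
selected by a class name coming from some configuration option. The interface, 
the factory and the idea of passing a class name are all hypothetical; no such 
option exists in cassandra.yaml today.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; the names are illustrative only.
interface SketchEventStore
{
    void store(String event);
    List<String> scan();
}

// Default store, mirroring the current hard-coded in-memory behaviour.
final class InMemorySketchStore implements SketchEventStore
{
    private final List<String> events = new ArrayList<>();

    @Override public void store(String event) { events.add(event); }
    @Override public List<String> scan() { return new ArrayList<>(events); }
}

final class SketchStoreFactory
{
    // className would come from a (hypothetical) configuration option.
    // The class must implement SketchEventStore and have a no-arg constructor.
    static SketchEventStore create(String className)
    {
        try
        {
            Class<?> clazz = Class.forName(className);
            return clazz.asSubclass(SketchEventStore.class)
                        .getDeclaredConstructor()
                        .newInstance();
        }
        catch (ReflectiveOperationException | ClassCastException e)
        {
            throw new IllegalArgumentException("Cannot load event store " + className, e);
        }
    }
}
```

A user-supplied store would then only need to implement the interface and be 
on the classpath; a class name that cannot be loaded, or that does not 
implement the interface, is rejected with an IllegalArgumentException.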

One particular advantage I see in CQL over Chronicle queues is that one can 
query it. In the case of Chronicle queues, there would have to be some tooling 
in order to "replay" the events. Putting them into virtual tables gives us this 
querying capability "for free".


was (Author: stefan.miklosovic):
Hi [~spod],

yet another store to be implemented might be built on virtual tables. Even it 
would not survive restarts, I think this still has its value. Even more so if 
it can be queried via CQL. However, if you prefer to have it in chronicle 
queues solely, that is something else ...

BTW, maybe the best thing would be to make this pluggable. I was checking the 
implementation and right now there is just memory persistence, just hard coded. 
This might be configurable via cassandra.yml where a user would supply his own 
impl.

> Diag. Events: Add local persistency
> ---
>
> Key: CASSANDRA-13460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13460
> Project: Cassandra
> Issue Type: Sub-task
> Components: Legacy/Observability
> Reporter: Stefan Podkowinski
> Assignee: Stefan Podkowinski
> Priority: Normal
> Attachments: 0001-Add-persistency-for-events-to-system-keyspace.patch
>
>
> Some generated events will be rather less frequent but very useful for 
> retroactive troubleshooting. E.g. all events related to bootstrapping and 
> gossip would probably be worth saving, as they might provide valuable 
> insights and will consume very little resources in low quantities. Imagine if 
> we could e.g. in case of CASSANDRA-13348 just ask the user to -run a tool 
> like {{./bin/diagdump BootstrapEvent}} on each host, to get us a detailed log 
> of all relevant events-  provide a dump of all events as described in the 
> [documentation|https://github.com/spodkowinski/cassandra/blob/WIP-13460/doc/source/operating/diag_events.rst].
>  
> This could be done by saving events white-listed in cassandra.yaml to a local 
> table. Maybe using a TTL.


