Hi,

Can we make some schema adjustments to the system.compaction_history table to meet our needs? For example, if no task is executed, we could simply record 0 bytes_in and 0 bytes_out for that compaction history entry. Other status details could be placed in compaction_properties, or we could add a column to represent the status. I am actually more inclined to add a column.
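To make the idea a bit more concrete, here is a rough sketch of what such an entry could carry. It is only an illustration: the `Status` enum, its values, the `HistoryRow` class, and the `noOp` helper are hypothetical names and are not part of the existing system.compaction_history schema or of the Cassandra code base. Only bytes_in, bytes_out, and compaction_properties come from the table as discussed here.

```java
import java.util.Collections;
import java.util.Map;
import java.util.UUID;

// Hypothetical status values; not an existing Cassandra enum.
enum Status { SUCCESSFUL, NOTHING_TO_COMPACT, FAILED }

// Hypothetical in-memory shape of one history entry with an extra status field.
// bytesIn, bytesOut and compactionProperties mirror the bytes_in, bytes_out and
// compaction_properties columns; everything else is an assumption.
final class HistoryRow {
    final UUID id;
    final long bytesIn;
    final long bytesOut;
    final Map<String, String> compactionProperties;
    final Status status;

    HistoryRow(UUID id, long bytesIn, long bytesOut,
               Map<String, String> compactionProperties, Status status) {
        this.id = id;
        this.bytesIn = bytesIn;
        this.bytesOut = bytesOut;
        this.compactionProperties = compactionProperties;
        this.status = status;
    }

    // An operation that ran but executed no task: zero bytes in and out,
    // with the outcome carried by the explicit status field.
    static HistoryRow noOp(UUID id, Status status) {
        return new HistoryRow(id, 0L, 0L, Collections.<String, String>emptyMap(), status);
    }
}
```

With a dedicated status field, a query can filter on the outcome directly instead of digging it out of the compaction_properties map.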
I think there may be many situations similar to the special case of compaction execution that you described, such as view building with no data, view building that ends in a non-successful state, or similar cases in other modules. I don't think it is feasible to implement a separate solution for each particular situation.

On Fri, Sep 27, 2024 at 23:54, Štefan Miklošovič <smikloso...@apache.org> wrote:

> Thank you Maxim for summarizing it like this for better visibility of what
> you are suggesting to contribute.
>
> I am OK with that in general but:
>
> 1) The table should be limited in its size (I think the original patch you
> had was already done with space-limitation in mind). Since we want to put
> this on the heap, I think we should definitely cap its size.
> 2) If both of them are limited in their size, can it not happen that there
> would be an entry e.g. in compaction_operations_linked_tasks which would not
> have its counterpart in compaction_operations_status (or vice versa)? Could
> it not be done in such a way that if an entry in
> compaction_operations_status is dropped (as new entries come in, old
> ones are discarded), the dropping of that entry would automatically drop
> all related rows in compaction_operations_linked_tasks?
>
> For the record, we were also investigating whether the existing
> system.compaction_history could somehow be used for this, but that
> appears to be problematic as we would probably need to change the schema /
> add new columns etc., and this is not so simple as it is a system table. We
> might use the compaction_properties column, which is a map, and put all the
> details there; however, that starts to be quite uncomfortable to query,
> and even if we somehow did it, it still does not solve all the issues the
> two-table approach seems to address.
>
> On Fri, Sep 27, 2024 at 4:21 PM Maxim Muzafarov <mmu...@apache.org> wrote:
>
>> Hello everyone,
>>
>> I still need a few more eyes on [1][2], but this time I'm going to try
>> and do some marketing for the feature I'm talking about, so...
>>
>>
>> We are trying to bridge the gap between the API that is called and the
>> compaction process that MAY or MAY NOT be called as a result, and make
>> users aware of what is happening inside the cluster with their running
>> commands. Currently, this can only be viewed by reading logs, which is
>> not a convenient way for both operators and audit subsystems of the
>> node internals.
>>
>> What we want to do is store the history of running operations for the
>> compaction manager in a small collection in the java heap and fill
>> this gap with virtual tables on top of this data collection, namely:
>>
>> - compaction_operations_status - has (operation_type, operation_id)
>> primary key and exposes the status of the cleanup command as a whole.
>> It may or may not trigger the compaction process and the compaction
>> may or may not appear in the sstable_tasks virtual table (active
>> compactions);
>> - compaction_operations_linked_tasks - has (operation_type,
>> operation_id, compaction_id) as its primary key and shows the
>> relationship between the user-triggered operation and the compaction
>> process invoked as a result;
>>
>> The CASSANDRA-19760 [1] issue covers only the cleanup command and
>> demonstrates the approach; all other commands, which can be identified
>> by the OperationType class, could be implemented in follow-up issues.
>>
>>
>> Examples:
>>
>> - The definition of these new virtual tables looks like:
>> https://gist.github.com/Mmuzaf/2d3006f5b654d54e7cabc343cd73a2a3
>>
>> - The output when we run the cleanup command, but it doesn't trigger
>> the compaction:
>>
>> https://gist.github.com/user-attachments/assets/9089d5c1-70d4-475f-9cf7-cc16dff48699
>>
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-19760
>> [2] https://github.com/apache/cassandra/pull/3412/files
>>
>> On Mon, 15 Jul 2024 at 21:06, Maxim Muzafarov <mmu...@apache.org> wrote:
>> >
>> > Hello everyone,
>> >
>> > I would like to gently ask for help in reviewing the following issue
>> > that we've been facing for a while:
>> > https://issues.apache.org/jira/browse/CASSANDRA-19760
>> >
>> > When a cleanup command is called, the compaction process under the
>> > hood is triggered accordingly. However, if there is nothing to compact
>> > or the cleanup command returns with a status other than SUCCESSFUL,
>> > there is no way to get the execution results of the command that was
>> > run. This is especially true when using any kind of
>> > automation/scripting on top of JMX or as a nodetool wrapper.
>> >
>> > I propose to keep these history results in memory for some time and
>> > expose them via a virtual table so that a user can query it to check
>> > the status.
>> >
>> > Any suggestions are welcome. I believe other commands like verify,
>> > scrub, etc. can be exposed in the same way.
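For reference, the size cap and the cascading drop discussed in the quoted thread could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the pull request: `OperationHistory`, its methods, and the string-based key are hypothetical. The key stands in for the (operation_type, operation_id) primary key, and the UUID list stands in for the compaction_id values of the linked tasks. It relies on the standard `LinkedHashMap.removeEldestEntry` hook to evict the oldest status entry once the cap is exceeded and removes the linked tasks in the same step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical capped in-memory store; not the code from the pull request.
final class OperationHistory {
    // Insertion-ordered map of operation key -> status, capped at maxOperations entries.
    private final LinkedHashMap<String, String> statusByOperation;
    // Operation key -> compaction ids started by that operation.
    private final Map<String, List<UUID>> linkedTasks = new HashMap<>();

    OperationHistory(final int maxOperations) {
        this.statusByOperation = new LinkedHashMap<String, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                boolean evict = size() > maxOperations;
                if (evict) {
                    // Cascade: drop the linked tasks together with the evicted status entry.
                    linkedTasks.remove(eldest.getKey());
                }
                return evict;
            }
        };
    }

    // Stand-in for the (operation_type, operation_id) primary key.
    private static String key(String operationType, UUID operationId) {
        return operationType + ':' + operationId;
    }

    synchronized void recordStatus(String operationType, UUID operationId, String status) {
        statusByOperation.put(key(operationType, operationId), status);
    }

    // Links a compaction to an operation only while the operation still has a status entry,
    // so a linked task can never exist without its counterpart. Returns false otherwise.
    synchronized boolean linkTask(String operationType, UUID operationId, UUID compactionId) {
        String k = key(operationType, operationId);
        if (!statusByOperation.containsKey(k)) {
            return false;
        }
        linkedTasks.computeIfAbsent(k, ignored -> new ArrayList<UUID>()).add(compactionId);
        return true;
    }

    // Returns the recorded status, or null if the operation is unknown or was evicted.
    synchronized String status(String operationType, UUID operationId) {
        return statusByOperation.get(key(operationType, operationId));
    }

    // Returns a copy of the linked compaction ids (empty if none or evicted).
    synchronized List<UUID> tasks(String operationType, UUID operationId) {
        List<UUID> tasks = linkedTasks.get(key(operationType, operationId));
        return tasks == null ? new ArrayList<UUID>() : new ArrayList<UUID>(tasks);
    }
}
```

Because only the status map is capped and the linked tasks are removed as part of the same eviction, the two views cannot drift apart. The trade-off is that the number of linked tasks per operation is not bounded separately in this sketch.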