Hi,

Can we make some schema adjustments to the system.compaction_history table to
meet our needs?
For example: if no task is executed, what about just recording 0 bytes_in and
0 bytes_out for this compaction history entry? Other status details could be
placed in compaction_properties, or we could even add a column to represent
the status. I am actually more inclined to add a column.
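
To make the idea concrete, here is a minimal sketch in Java of what such a row
could carry. The names HistoryRowSketch, Row, Status and noTaskExecuted are
hypothetical and exist only for this illustration; they are not part of
Cassandra, and the set of status values is an assumption.

```java
import java.util.Map;

// Hypothetical sketch: none of these names exist in Cassandra.
final class HistoryRowSketch {
    // Assumed status values; the real set would need to be agreed on.
    enum Status { SUCCESSFUL, NO_TASK_EXECUTED, FAILED }

    // One row of a compaction-history-like table with an extra status column.
    record Row(String id, long bytesIn, long bytesOut, Status status,
               Map<String, String> compactionProperties) {}

    // An operation that triggered no compaction task: zero bytes in and out.
    static Row noTaskExecuted(String id) {
        return new Row(id, 0L, 0L, Status.NO_TASK_EXECUTED, Map.of());
    }

    public static void main(String[] args) {
        System.out.println(noTaskExecuted("cleanup-1"));
    }
}
```

With a dedicated status field, a query can filter on the status directly
instead of digging through the compaction_properties map.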

I think many other situations resemble the special case of compaction
execution that you described, such as a view build with no data, a view
build that ends in a non-successful state, or similar cases in other
modules.
I do not think it is feasible to implement a separate solution for each
particular situation.

Štefan Miklošovič <smikloso...@apache.org> wrote on Fri, Sep 27, 2024 at 23:54:

> Thank you Maxim for summarizing it like this for better visibility of what
> you are suggesting to contribute.
>
> I am OK with that in general but:
>
> 1) The table should be limited in its size (I think the original patch you
> had was already done with space-limitation in mind). Since we want to put
> this on a heap, I think we should definitely make it capped on size.
> 2) If both of them are limited in their size, can it not happen that there
> would be an entry e.g. in compaction_operations_linked_tasks which would not
> have its counterpart in compaction_operations_status (or vice versa)? Could
> it not be done in such a way that if an entry in
> compaction_operations_status is dropped (as new entries would come in, old
> would be discarded), the dropping of that entry would automatically drop
> all related rows in compaction_operations_linked_tasks?
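
One way to avoid orphaned rows, sketched under assumptions: keep the linked
compaction ids inside the status entry itself, so that evicting the oldest
entry from a size-capped map drops its linked tasks with it.
BoundedOperationHistory and its methods are hypothetical names, not the code
from the patch, and the (operation_type, operation_id) key is simplified to a
single string here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; class and method names are illustrative, not Cassandra APIs.
final class BoundedOperationHistory {
    // Linked compaction ids live inside the status entry, so evicting the
    // entry also discards its linked tasks and no orphan rows remain.
    static final class Operation {
        final String status;
        final List<String> linkedCompactionIds = new ArrayList<>();
        Operation(String status) { this.status = status; }
    }

    private final LinkedHashMap<String, Operation> operations;

    BoundedOperationHistory(int maxOperations) {
        // Insertion-ordered map that discards the oldest entry once the cap is exceeded.
        this.operations = new LinkedHashMap<>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Operation> eldest) {
                return size() > maxOperations;
            }
        };
    }

    synchronized void recordOperation(String operationId, String status) {
        operations.put(operationId, new Operation(status));
    }

    // Links a compaction to an operation; ignored if the operation was already evicted.
    synchronized void linkCompaction(String operationId, String compactionId) {
        Operation op = operations.get(operationId);
        if (op != null) {
            op.linkedCompactionIds.add(compactionId);
        }
    }

    synchronized boolean contains(String operationId) {
        return operations.containsKey(operationId);
    }

    // Returns a copy of the linked compaction ids, or an empty list if unknown.
    synchronized List<String> linkedTasks(String operationId) {
        Operation op = operations.get(operationId);
        return op == null ? List.of() : List.copyOf(op.linkedCompactionIds);
    }
}
```

Both virtual tables could then be rendered as views over this one collection,
so they can never disagree about which operations still exist.
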
>
> For the record, we were also investigating whether the existing
> system.compaction_history could somehow be used for this, but that
> appears to be problematic as we would probably need to change the schema /
> add new columns etc. and this is not so simple as it is a system table. We
> might use the compaction_properties column, which is a map, and put all the
> details there; however, that becomes quite awkward to query,
> and even if we somehow did it, it still would not solve all the issues the
> two-table approach seems to address.
>
> On Fri, Sep 27, 2024 at 4:21 PM Maxim Muzafarov <mmu...@apache.org> wrote:
>
>> Hello everyone,
>>
>> I still need a few more eyes on [1][2], but this time I'm going to try
>> and do some marketing for the feature I'm talking about, so...
>>
>>
>> We are trying to bridge the gap between the API that is called and the
>> compaction process that MAY or MAY NOT be called as a result, and make
>> users aware of what is happening inside the cluster with their running
>> commands. Currently, this can only be viewed by reading logs, which is
>> not convenient for either operators or subsystems that audit the
>> node internals.
>>
>> What we want to do is store the history of running operations for the
>> compaction manager in a small collection in the java heap and fill
>> this gap with virtual tables on top of this data collection, namely:
>>
>> - compaction_operations_status - has (operation_type, operation_id)
>> primary key and exposes the status of the cleanup command as a whole.
>> It may or may not trigger the compaction process and the compaction
>> may or may not appear in the sstable_tasks virtual table (active
>> compactions);
>> - compaction_operations_linked_tasks - has (operation_type,
>> operation_id, compaction_id) as its primary key and shows the
>> relationship between the user-triggered operation and the compaction
>> process invoked as a result;
>>
>> The CASSANDRA-19760 [1] issue covers only the cleanup command and
>> demonstrates the approach; all other commands, which can be identified
>> by the OperationType class, could be implemented in follow-up issues.
>>
>>
>> Examples:
>>
>> - The definition of these new virtual tables looks like:
>> https://gist.github.com/Mmuzaf/2d3006f5b654d54e7cabc343cd73a2a3
>>
>> - The output when we run the cleanup command, but it doesn't trigger
>> the compaction:
>>
>> https://gist.github.com/user-attachments/assets/9089d5c1-70d4-475f-9cf7-cc16dff48699
>>
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-19760
>> [2] https://github.com/apache/cassandra/pull/3412/files
>>
>> On Mon, 15 Jul 2024 at 21:06, Maxim Muzafarov <mmu...@apache.org> wrote:
>> >
>> > Hello everyone,
>> >
>> > I would like to gently ask for help in reviewing the following issue
>> > that we've been facing for a while:
>> > https://issues.apache.org/jira/browse/CASSANDRA-19760
>> >
>> > When a cleanup command is called, the compaction process under the
>> > hood is triggered accordingly. However, if there is nothing to compact
>> > or the cleanup command returns with a status other than SUCCESSFUL,
>> > there is no way to get the execution results of the command that was
>> > run. This is especially true when using any kind of
>> > automation/scripting on top of JMX or as a nodetool wrapper.
>> >
>> > I propose to keep these history results in memory for some time and
>> > expose them via a virtual table so that a user can query it to check
>> > the status.
>> >
>> > Any suggestions are welcome. I believe other commands like verify,
>> > scrub, etc. can be exposed in the same way.
>>
>
