[ https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832992#comment-17832992 ]
Maxwell Guo edited comment on CASSANDRA-19448 at 4/2/24 6:25 AM:
-----------------------------------------------------------------

[~brandon.williams] [~tiagomlalves] Thank you very much for your replies. Regarding the granularity of RIP (restore point in time), my main point is that it depends on the needs of the user. My patch lets the user specify seconds, milliseconds, or microseconds. All I need to ensure is that it is consistent with Cassandra's own timestamps.

bq. What I wonder is, in which scenarios would microsecond-level PIT restore would be useful?

[~brandon.williams] may have already described it clearly, but to restate it: if a batch of data is deleted by mistake, microsecond granularity is required to ensure that all data up to the moment just before the deletion can be restored in C*. Millisecond or second granularity is not enough, and data may be lost.

bq. couldn't we detect automatically the granularity of PIT restore based on the value

I will update the PR again. [~brandon.williams] [~tiagomlalves], if you are willing, could you be the reviewers? :) I will prepare PRs for the other branches once the fix for trunk is accepted.

Update:

bq. Regarding the code changes, couldn't we detect automatically the granularity of PIT restore based on the value an user specifies? We could make our parsing more lenient by just doing `DateTimeFormatter.ofPattern("yyyy:MM:dd HH:mm:ss[.[SSSSSS][SSS]]")` allowing to parse all seconds, milliseconds, and microseconds.

See the [description for rip|https://github.com/Maxwell-Guo/cassandra/blob/CASSANDRA-19448/conf/commitlog_archiving.properties#L43]: the user can set the RIP at any of these precisions. I do not require the user to state explicitly whether the value is in seconds, milliseconds, or microseconds; that is, the code automatically detects the granularity of the PIT restore from the value the user specifies.

As for

bq. Also, if we could leverage on `Instant` datatype instead of `long` in `CommitLogArchiver` and postpone conversions to `CommitLogRestorer`

I made some optimizations to the code based on your suggestions, but I find that changing to Instant is not fundamentally different from the original long type, because [restorePointInTime|https://github.com/Maxwell-Guo/cassandra/blob/CASSANDRA-19448/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L123] is only used for comparisons between long values. As we have said, restorePointInTime should be at microsecond level, so I changed it to microseconds [here|https://github.com/Maxwell-Guo/cassandra/blob/CASSANDRA-19448/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L207].
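To illustrate the lenient-parsing suggestion quoted above, here is a minimal sketch, assuming a hypothetical helper class `RestorePointParser` (it is not the code from the PR) and assuming the value is interpreted as UTC/GMT. It parses a mandatory `yyyy:MM:dd HH:mm:ss` prefix plus an optional fraction of up to 6 digits, returns the restore point as microseconds since the epoch (the unit Cassandra uses for write timestamps by default), and infers the granularity from the number of fractional digits the user typed.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration only; not the code from the PR.
final class RestorePointParser {
    // "yyyy:MM:dd HH:mm:ss" is mandatory; a fraction of 1 to 6 digits (".6" up to ".623392") is optional.
    private static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .appendPattern("yyyy:MM:dd HH:mm:ss")
            .optionalStart()
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 6, true) // 'true' also consumes the leading '.'
            .optionalEnd()
            .toFormatter();

    private RestorePointParser() {}

    // Parses the value as UTC (an assumption) and returns microseconds since 1970-01-01T00:00:00Z.
    // Throws DateTimeParseException for malformed input, e.g. more than 6 fractional digits.
    static long toMicros(String value) {
        Instant instant = LocalDateTime.parse(value.trim(), FORMAT).toInstant(ZoneOffset.UTC);
        return Math.addExact(Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
                             instant.getNano() / 1_000L);
    }

    // Infers the precision the user typed from the number of fractional digits.
    static ChronoUnit granularityOf(String value) {
        String v = value.trim();
        toMicros(v); // reject malformed values before inspecting them
        int dot = v.indexOf('.');
        int digits = dot < 0 ? 0 : v.length() - dot - 1;
        if (digits == 0) {
            return ChronoUnit.SECONDS;
        }
        return digits <= 3 ? ChronoUnit.MILLIS : ChronoUnit.MICROS;
    }
}
```

For example, `toMicros("2024:01:18 17:01:01.623392")` and `toMicros("2024:01:18 17:01:01")` differ by exactly 623392 microseconds, and `granularityOf` reports `MICROS` and `SECONDS` for them respectively. The sketch uses a single `appendFraction(NANO_OF_SECOND, 1, 6, true)` instead of the `[.[SSSSSS][SSS]]` optional sections so that any fraction length from 1 to 6 digits is accepted, not only exactly 3 or 6; keeping the result as a `long` in microseconds matches the point above that the restore point is only used for `long` comparisons.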
> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19448
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jeremy Hanna
>            Assignee: Maxwell Guo
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to back up commitlog files for the purpose of
> doing point-in-time restores. The [configuration file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
> gives an example with granularity down to seconds, but then asks whether the
> timestamps are microseconds or milliseconds, defaulting to microseconds.
> Because the [CommitLogArchiver uses a second-based date format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
> if a user specifies a restore point at a finer granularity, like milliseconds
> or microseconds, it will truncate everything after the second and restore to
> that second. So say you specify a restore_point_in_time like this:
>
> restore_point_in_time=2024:01:18 17:01:01.623392
>
> It will silently truncate everything after the 01 seconds, so, effectively,
> the user is missing updates between 01 and 01.623392.
>
> This appears to be a bug in the intent. We should allow users to specify
> down to the millisecond or even microsecond level. If we allow them to
> specify down to microseconds for the restore point in time, then it may
> internally need to change from a long.
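To make the truncation in the example above concrete, here is a small sketch, assuming a simplified replay rule (a mutation is applied only if its microsecond timestamp is at or before the restore point). `TruncationDemo` and its methods are hypothetical and only model the behavior the issue describes, not the actual `CommitLogArchiver` code. `secondsOnlyMicros` uses a seconds-only pattern through `DateTimeFormatter.parse(CharSequence, ParsePosition)`, which does not require the whole input to be consumed, so the `.623392` suffix is dropped without any error.

```java
import java.text.ParsePosition;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical model of the reported behavior; not Cassandra's implementation.
final class TruncationDemo {
    private static final DateTimeFormatter SECONDS_ONLY = DateTimeFormatter.ofPattern("yyyy:MM:dd HH:mm:ss");
    private static final DateTimeFormatter WITH_FRACTION = new DateTimeFormatterBuilder()
            .appendPattern("yyyy:MM:dd HH:mm:ss")
            .optionalStart()
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 6, true)
            .optionalEnd()
            .toFormatter();

    // Parsing stops after "ss"; any trailing fraction is silently ignored.
    static long secondsOnlyMicros(String value) {
        TemporalAccessor parsed = SECONDS_ONLY.parse(value, new ParsePosition(0));
        return toMicros(LocalDateTime.from(parsed));
    }

    // Keeps up to 6 fractional digits.
    static long fullPrecisionMicros(String value) {
        return toMicros(LocalDateTime.parse(value, WITH_FRACTION));
    }

    // Microseconds since the epoch, treating the local date-time as UTC (an assumption).
    static long toMicros(LocalDateTime t) {
        Instant i = t.toInstant(ZoneOffset.UTC);
        return i.getEpochSecond() * 1_000_000L + i.getNano() / 1_000L;
    }

    // Simplified replay rule: apply a mutation only if it is at or before the restore point.
    static boolean isReplayed(long mutationMicros, long restorePointMicros) {
        return mutationMicros <= restorePointMicros;
    }

    public static void main(String[] args) {
        String rip = "2024:01:18 17:01:01.623392";
        long mutation = fullPrecisionMicros("2024:01:18 17:01:01.500000"); // written before the RIP
        System.out.println("truncated RIP replays it: " + isReplayed(mutation, secondsOnlyMicros(rip)));
        System.out.println("full-precision RIP replays it: " + isReplayed(mutation, fullPrecisionMicros(rip)));
    }
}
```

Running `main` prints `false` for the truncated restore point and `true` for the full-precision one: a write at 17:01:01.500000, which precedes the requested restore point, is skipped once the fraction is discarded.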
--
This message was sent by Atlassian Jira
(v8.20.10#820010)