[jira] [Commented] (CASSANDRA-17048) Replace sequential sstable generation identifier with ULID

Benedict Elliott Smith (Jira) Wed, 20 Oct 2021 02:36:41 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431096#comment-17431096
 ]


Benedict Elliott Smith commented on CASSANDRA-17048:
----------------------------------------------------

bq. UUID v1 is not lexicographically sortable by default as you mentioned, so 
we would have to be forced to use a custom representation anyway

A custom parser only, which can delegate to {{Long.toString}} and 
{{Long.parseLong}} for the same density of representation (i.e. base32, though 
a different form), so we're not talking about much effort here. We could 
also implement Crockford's scheme rather trivially.
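
As a rough illustration of the delegation described above, here is a minimal sketch, not code from the Cassandra codebase. The class name {{Base32Id}} and the fixed-width zero padding are assumptions made for this example. It uses the unsigned JDK variants {{Long.toUnsignedString}} and {{Long.parseUnsignedLong}} with radix 32, so each 64-bit half always occupies 13 characters and the whole identifier 26.

```java
import java.util.UUID;

// Hypothetical helper, for illustration only: encodes the two 64-bit halves
// of a UUID as fixed-width base-32 text using the JDK's own radix support.
public final class Base32Id {
    private static final int RADIX = 32;
    // An unsigned 64-bit value needs at most ceil(64 / 5) = 13 base-32 digits.
    private static final int DIGITS_PER_LONG = 13;

    private Base32Id() {}

    /** Returns a 26-character string: 13 digits for each half, zero-padded. */
    public static String toString(UUID id) {
        return pad(Long.toUnsignedString(id.getMostSignificantBits(), RADIX))
             + pad(Long.toUnsignedString(id.getLeastSignificantBits(), RADIX));
    }

    /** Parses a string produced by {@link #toString(UUID)}. */
    public static UUID fromString(String s) {
        if (s.length() != 2 * DIGITS_PER_LONG)
            throw new IllegalArgumentException("Expected " + (2 * DIGITS_PER_LONG) + " characters: " + s);
        long msb = Long.parseUnsignedLong(s.substring(0, DIGITS_PER_LONG), RADIX);
        long lsb = Long.parseUnsignedLong(s.substring(DIGITS_PER_LONG), RADIX);
        return new UUID(msb, lsb);
    }

    // Left-pads with '0' so that every encoded half has the same width.
    private static String pad(String digits) {
        StringBuilder sb = new StringBuilder(DIGITS_PER_LONG);
        for (int i = digits.length(); i < DIGITS_PER_LONG; i++)
            sb.append('0');
        return sb.append(digits).toString();
    }
}
```

Because the width is fixed and the JDK's digits (0-9, then a-v) are in ASCII order, these strings sort in the unsigned numeric order of the two halves. The sketch does not rearrange the timestamp fields of a v1 UUID, so making the result time-sortable would still need the custom bit layout discussed above.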

bq. AFAIK no UUID is guaranteed to generate monotonically growing identifiers 
when two are generated almost at the same time - this would probably never be a 
problem in this context though

We do guarantee this within Cassandra for all TimeUUID we generate, for reasons 
of correctness.
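
One common way to obtain such a guarantee is sketched below. This is an illustrative assumption, not Cassandra's actual TimeUUID generator: the class {{MonotonicTimestamps}} and its {{next}} method are hypothetical names. The idea is to remember the last timestamp handed out and never return a value that is not strictly greater.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: hands out strictly increasing timestamps even when
// the clock reading repeats or goes backwards between calls.
public final class MonotonicTimestamps {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    /**
     * Returns the supplied clock reading, or the previous result plus one
     * if the clock has not advanced past the previous result.
     */
    public long next(long nowMicros) {
        return last.updateAndGet(prev -> Math.max(prev + 1, nowMicros));
    }
}
```

Two calls with the same clock reading therefore yield two distinct, ordered values, which is the property the timestamp component of a time-based identifier needs within a single process.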

bq. All in all, if we are forced to wrap any UUID representation with some 
utilities to make its representation satisfy our needs, then I don't really 
know why not to use ULID as its implementation provides us everything we need

Because having multiple ways of doing the same thing in the codebase leads to 
inconsistency and confusion, and unnecessarily bloats our dependencies. We 
should not introduce new ways of doing things that may be provided by _simple_ 
modifications to existing mechanisms. 

The only relevant difference as far as I can tell is the {{toString}} and 
{{fromString}} methods, which can be delegated to {{Long.toString}}. This also 
provides us greater flexibility: If the motive is shorter strings, we can make 
them even shorter for case sensitive file systems, where we could use base64 
encoding, which would shrink the path to 16 chars (we might prefer to use a 
modified form that e.g. swaps / for , however), or a base60 encoding that has 
no special characters and would achieve 18 char lengths.
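
For illustration, a base64 variant could be sketched as follows. This is an assumption-laden example rather than a proposed implementation: the class name {{Base64Id}} is hypothetical, and it uses the JDK's URL-safe alphabet, which replaces {{+}} and {{/}} with {{-}} and {{_}} so the result contains no path separator. The resulting length depends on how many bits are encoded; for a full 128-bit value, unpadded base64 comes to 22 characters.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper, for illustration only: URL-safe, unpadded base64
// text form of a 128-bit identifier.
public final class Base64Id {
    private Base64Id() {}

    /** Encodes the 16 bytes of the UUID (big-endian) without '=' padding. */
    public static String toString(UUID id) {
        byte[] bytes = ByteBuffer.allocate(16)
                                 .putLong(id.getMostSignificantBits())
                                 .putLong(id.getLeastSignificantBits())
                                 .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Parses a string produced by {@link #toString(UUID)}. */
    public static UUID fromString(String s) {
        byte[] bytes = Base64.getUrlDecoder().decode(s);
        if (bytes.length != 16)
            throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

Note that the base64 alphabet is not in ASCII order, so unlike a base32 form built on {{Long.toString}}, these strings do not sort in the same order as the underlying values. They are also only safe on case-sensitive file systems, as mentioned above.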

Additionally, if we want to reduce path sizes we can apply these improvements 
to our existing path specifications for directories that themselves use UUIDs, 
though this would obviously need to be handled with care.

> Replace sequential sstable generation identifier with ULID
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-17048
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17048
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/SSTable
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 4.1
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Replace the current sequential sstable generation identifier with a ULID-based one.
> ULID is better because we do not need to scan the existing files to pick the 
> starting number, and we can generate globally unique identifiers. 



