There are many different ways to avoid or minimise the chance of schema disagreements. The easiest is to always send DDL queries to the same node in the cluster. This is very easy to implement and avoids schema disagreements, at the cost of creating a single point of failure for DDL queries. More sophisticated methods also exist, such as locking and centralised schema modification, and you should consider which one is more suitable for your use case. Ignoring the schema disagreement problem is not recommended. This is not a tested state for the cluster, and you are likely to run into known and unknown (and possibly severe) issues later.
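
As a minimal sketch of the "always use the same node" approach, assuming every client is configured with the same list of contact points given as IP addresses: each client can derive the same DDL node on its own by taking the numerically lowest address. The function name `pick_ddl_node` and the addresses are made up for this example. How you then pin the DDL session to that host depends on your driver (the DataStax Python driver, for instance, has `WhiteListRoundRobinPolicy`).

```python
import ipaddress


def pick_ddl_node(contact_points):
    """Return the one node that every client should use for DDL.

    Picking the numerically lowest IP makes the choice deterministic, so
    all clients sharing the same contact point list agree on the node.
    Hypothetical helper, not part of any Cassandra driver.
    """
    if not contact_points:
        raise ValueError("at least one contact point is required")
    return min(contact_points, key=lambda host: int(ipaddress.ip_address(host)))


# Hypothetical addresses; the order of the list does not matter.
nodes = ["10.0.0.12", "10.0.0.3", "10.0.0.7"]
print(pick_ddl_node(nodes))
```

If the chosen node is down, DDL queries fail until it is back, which is the single point of failure mentioned above.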

The system_schema.columns table will almost certainly have more tombstones created than the number of tables deleted, unless each deleted table had only one column. I doubt creating and deleting 8 tables per day will be a problem, but I would recommend you find a way to test it before doing that on a production system, because I don't know of anyone else using Cassandra in this way.

On the surface, it does sound like TWCS with the date in the partition key may fit your use case better than creating and deleting tables every day.
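
To illustrate, here is a rough sketch of what a single long-lived table could look like in place of a new table every 6 hours. The keyspace, table and column names, the 6-hour compaction window and the 1-day TTL are all assumptions made up for this example, not something taken from your setup, so adjust them to your data.

```python
from datetime import datetime, timezone

# Hypothetical schema: one long-lived table with the day in the partition
# key. Old data expires through the TTL instead of through DROP TABLE.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS my_ks.events (
    day text,
    source_id uuid,
    created_at timestamp,
    payload text,
    PRIMARY KEY ((day, source_id), created_at)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '6'
} AND default_time_to_live = 86400;
"""


def day_bucket(ts: datetime) -> str:
    """Return the UTC date (YYYY-MM-DD) to store in the `day` column.

    `ts` should be timezone-aware; a naive datetime would be interpreted
    as local time by astimezone().
    """
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


print(day_bucket(datetime(2023, 12, 6, 8, 26, tzinfo=timezone.utc)))
```

With a layout like this, no schema changes happen at runtime, so neither the schema disagreements nor the tombstones in the system tables come into play.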


On 06/12/2023 08:26, Sébastien Rebecchi wrote:
Hello Jeff, Bowen

Thanks for your answer.
Now I understand that there is a bug in Cassandra that means it cannot handle concurrent schema modifications. I was not aware of how severe it is; I thought that temporary schema mismatches were eventually resolved smartly, by a kind of "merge" mechanism. For my use cases, keyspaces and tables are created "on demand": when an insert fails with an exception for an invalid KS or table, the KS and table are created and the insert is retried. I cannot afford to centralize schema modifications in a bottleneck, but I can afford the data inconsistencies while waiting for the fix in Cassandra. I'm more worried about tombstones in system tables. I assume that 8 tombstones per day (or even more, but no more than a few dozen) is reasonable; can you confirm (or invalidate) that, please?

Sébastien.

On Wed, 6 Dec 2023 at 03:00, Bowen Song via user <user@cassandra.apache.org> wrote:

    The same table name with two different CF IDs is not just
    "temporary schema disagreements", it's much worse than that. This
    breaks the eventual consistency guarantee, and leads to silent
    data corruption. It's silently happening in the background, and
    you don't realise it until you suddenly do, and then everything
    seems to blow up at the same time. You need to sort this out ASAP.


    On 05/12/2023 19:57, Sébastien Rebecchi wrote:
    Hi Bowen,

    Thanks for your answer.

    I was thinking of extreme use cases, but as far as I am concerned
    I can deal with creation and deletion of 2 tables every 6 hours
    for a keyspace. That leaves around 8 folders of deleted tables per
    day - sometimes more, because I sometimes see 2 folders created
    for the same table name, with 2 different IDs, caused by temporary
    schema disagreements, I guess.
    Basically it means 20 years before the KS folder has 65K
    subfolders, so I would say I have time to think about redesigning
    the data model ^^
    Nevertheless, does that sound like too many tombstones in the
    system tables (with the default GC grace period of 10 days)?

    Sébastien.

    On Tue, 5 Dec 2023, 12:19, Bowen Song via user
    <user@cassandra.apache.org> wrote:

        Please rethink your use case. Creating and deleting tables
        concurrently often leads to schema disagreement. Even doing so
        on a single node sequentially will lead to a large number of
        tombstones in the system tables.

        On 04/12/2023 19:55, Sébastien Rebecchi wrote:
        Thank you Dipan.

        Do you know if there is a good reason for Cassandra to leave
        table folders behind even when there is no snapshot?

        I'm thinking of use cases where there is a need to create
        and delete small tables at a high rate. You could quickly
        end up with more than 65K (limit of ext4) subdirectories in
        the KS directory, while 99.9.. % of them are residue of
        deleted tables.

        It looks quite dirty for Cassandra not to clean up its own
        "garbage" by itself, and quite dangerous for the end user to
        have to do it alone, don't you think?

        Thanks,

        Sébastien.

        On Mon, 4 Dec 2023, 11:28, Dipan Shah
        <dipan....@hotmail.com> wrote:

            Hello Sebastien,

            There are no inbuilt tools that will automatically
            remove folders of deleted tables.

            Thanks,

            Dipan Shah

            
------------------------------------------------------------------------
            *From:* Sébastien Rebecchi <srebec...@kameleoon.com>
            *Sent:* 04 December 2023 13:54
            *To:* user@cassandra.apache.org <user@cassandra.apache.org>
            *Subject:* Remove folders of deleted tables
            Hello,

            When we delete a table with Cassandra, it leaves the
            folder of that table on the file system, even if there is
            no snapshot (auto snapshots disabled).
            So we end up with the empty folder {data folder}/{keyspace
            name}/{table name-table id} containing only 1
            subfolder, backups, which is itself empty.
            Is there a way to automatically remove folders of
            deleted tables?

            Sébastien.
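
Since there is no inbuilt tool, the leftover folders have to be found by an external script. The following is only a sketch under stated assumptions: table folders are named `{table name}-{table id}` with the id written as 32 hex characters, and the caller supplies the ids of tables that still exist (for example read from `system_schema.tables`, with dashes removed). The function name `find_leftover_table_dirs` is made up for this example. It only lists candidates and deletes nothing, because an existing but still empty table has an empty folder too.

```python
import os
import re
import tempfile

# Table directories are named "<table name>-<table id>", where the id is
# assumed to be 32 hex characters (the table UUID without dashes).
TABLE_DIR_RE = re.compile(r"^(?P<name>.+)-(?P<id>[0-9a-f]{32})$")


def find_leftover_table_dirs(keyspace_dir, live_table_ids):
    """List table directories that look like leftovers of dropped tables.

    A directory qualifies only if its id is NOT in live_table_ids and it
    contains no files at all (only empty subfolders such as "backups").
    Nothing is deleted here; the caller decides what to do with the list.
    """
    leftovers = []
    for entry in sorted(os.listdir(keyspace_dir)):
        path = os.path.join(keyspace_dir, entry)
        match = TABLE_DIR_RE.match(entry)
        if not match or not os.path.isdir(path):
            continue
        if match.group("id") in live_table_ids:
            continue  # table still exists in the schema, never touch it
        has_files = any(files for _, _, files in os.walk(path))
        if not has_files:
            leftovers.append(path)
    return leftovers


# Demo on a throwaway directory with made-up table names and ids.
with tempfile.TemporaryDirectory() as ks_dir:
    dropped_id, live_id = "a" * 32, "b" * 32
    os.makedirs(os.path.join(ks_dir, f"old_table-{dropped_id}", "backups"))
    os.makedirs(os.path.join(ks_dir, f"new_table-{live_id}", "backups"))
    for path in find_leftover_table_dirs(ks_dir, {live_id}):
        print(os.path.basename(path))
```

Review the reported list before removing anything, and run it on each node, since every node keeps its own data folders.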
