[ https://issues.apache.org/jira/browse/CASSANDRA-16961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johnny Miller updated CASSANDRA-16961:
--------------------------------------
    Description:
When compaction encounters a large partition, it outputs a warning in the logs, e.g. (apologies, some information had to be redacted):

WARN [CompactionExecutor:343] 2021-09-16 09:28:43,539 BigTableWriter.java:211 - Writing large partition XXX/XXXX:sourceid:{color:#de350b}*2021-09-16 05\:00Z*{color} (1.381GiB) to sstable /mnt/var/lib/cassandra/data/segment/message-336c5ff04db211ebbffc2980407d44d6/md-58982-big-Data.db

i.e. [https://github.com/apache/cassandra/blob/cassandra-3.11.5/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java#L211]

*Example Table/insert*

CREATE TABLE myks.mytable (
    sourceid text,
    {color:#de350b}*messagehour timestamp,*{color}
    messagetime timestamp,
    messageid text,
    PRIMARY KEY ((sourceid, messagehour), messagetime, messageid)
) ;

insert into myks.mytable (sourceid, messagehour, messagetime, messageid)
values ('sourceid', '{color:#de350b}*2021-09-16 05:00Z*{color}', '2021-09-16 05:00:31Z', '123ABC');

If I then need to work out which nodes in the cluster hold the replica data for this partition (from the logs), I get the token via CQL, e.g.:

select distinct token(sourceid,messagehour) from myks.mytable where sourceid='sourceid' and messagehour='{color:#de350b}*2021-09-16 05:00Z*{color}';

 system.token(sourceid, messagehour)
-------------------------------------
 {color:#de350b}*7663675819538124697*{color}

I then run nodetool to get the endpoints for this token/ks/table, e.g.:

nodetool getendpoints myks mytable {color:#de350b}*7663675819538124697*{color}
172.31.10.187
172.31.12.193
172.31.13.91

*The list of endpoints is not correct.* I suspect the timestamp value printed in the warning log entry is missing information/precision, so it yields the wrong token and hence the wrong endpoints.
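The suspicion above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the stored messagehour value carries seconds and milliseconds and that the log renders it at minute precision; the format string and the stored value below are hypothetical and are not taken from Cassandra's code. A CQL timestamp is a millisecond-precision value, so any dropped component changes the key bytes that are hashed into the token.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed format, chosen only to match the value seen in the log line.
MINUTE_FORMAT = "%Y-%m-%d %H:%MZ"


def to_minute_string(ts: datetime) -> str:
    """Render a timestamp in UTC at minute precision (drops seconds and millis)."""
    return ts.astimezone(timezone.utc).strftime(MINUTE_FORMAT)


def parse_minute_string(text: str) -> datetime:
    """Parse the minute-precision string back into a UTC timestamp."""
    return datetime.strptime(text, MINUTE_FORMAT).replace(tzinfo=timezone.utc)


def epoch_millis(ts: datetime) -> int:
    """Milliseconds since the Unix epoch, using integer arithmetic only."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


# Hypothetical stored partition key component with sub-minute precision.
stored = datetime(2021, 9, 16, 5, 0, 12, 345000, tzinfo=timezone.utc)

logged = to_minute_string(stored)
recovered = parse_minute_string(logged)

print("logged string:   ", logged)
print("stored millis:   ", epoch_millis(stored))
print("recovered millis:", epoch_millis(recovered))
print("lost millis:     ", epoch_millis(stored) - epoch_millis(recovered))
```

If the two millisecond values differ, a token computed from the logged string belongs to a different partition key than the one that triggered the warning, which would explain endpoints that do not match.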

Possibly this warning log statement should also output the actual partition key token, to avoid confusion, and the string representation of the timestamp should be correct.

  was:
When compaction encounters a large partition, it outputs a warning in the logs, e.g. (apologies, some information had to be redacted):

WARN [CompactionExecutor:343] 2021-09-16 09:28:43,539 BigTableWriter.java:211 - Writing large partition XXX/XXXX:PROsVuVbHju33:{color:#de350b}*2021-09-16 05\:00Z*{color} (1.381GiB) to sstable /mnt/var/lib/cassandra/data/segment/message-336c5ff04db211ebbffc2980407d44d6/md-58982-big-Data.db

i.e. [https://github.com/apache/cassandra/blob/cassandra-3.11.5/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java#L211]

*Example Table/insert*

CREATE TABLE myks.mytable (
    sourceid text,
    {color:#de350b}*messagehour timestamp,*{color}
    messagetime timestamp,
    messageid text,
    PRIMARY KEY ((sourceid, messagehour), messagetime, messageid)
) ;

insert into myks.mytable (sourceid, messagehour, messagetime, messageid)
values ('PROsVuVbHju33', '{color:#de350b}*2021-09-16 05:00Z*{color}', '2021-09-16 05:00:31Z', '123ABC');

If I then need to work out which nodes in the cluster hold the replica data for this partition (from the logs), I get the token via CQL, e.g.:

select distinct token(sourceid,messagehour) from myks.mytable where sourceid='PROsVuVbHju33' and messagehour='{color:#de350b}*2021-09-16 05:00Z*{color}';

 system.token(sourceid, messagehour)
-------------------------------------
 {color:#de350b}*7663675819538124697*{color}

I then run nodetool to get the endpoints for this token/ks/table, e.g.:

nodetool getendpoints myks mytable {color:#de350b}*7663675819538124697*{color}
172.31.10.187
172.31.12.193
172.31.13.91

*The list of endpoints is not correct.* I suspect the timestamp value printed in the warning log entry is missing information/precision, so it yields the wrong token and hence the wrong
endpoints.

Possibly this warning log statement should also output the actual partition key token, to avoid confusion, and the string representation of the timestamp should be correct.


> Timestamp String displayed for partition compaction warnings is not correct
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16961
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Johnny Miller
>            Priority: Normal
>
> When compaction encounters a large partition, it outputs a warning in the
> logs, e.g. (apologies, some information had to be redacted):
>
> WARN [CompactionExecutor:343] 2021-09-16 09:28:43,539 BigTableWriter.java:211 - Writing large partition XXX/XXXX:sourceid:{color:#de350b}*2021-09-16 05\:00Z*{color} (1.381GiB) to sstable /mnt/var/lib/cassandra/data/segment/message-336c5ff04db211ebbffc2980407d44d6/md-58982-big-Data.db
>
> i.e.
> [https://github.com/apache/cassandra/blob/cassandra-3.11.5/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java#L211]
>
> *Example Table/insert*
>
> CREATE TABLE myks.mytable (
>     sourceid text,
>     {color:#de350b}*messagehour timestamp,*{color}
>     messagetime timestamp,
>     messageid text,
>     PRIMARY KEY ((sourceid, messagehour), messagetime, messageid)
> ) ;
>
> insert into myks.mytable (sourceid, messagehour, messagetime, messageid)
> values ('sourceid', '{color:#de350b}*2021-09-16 05:00Z*{color}', '2021-09-16 05:00:31Z', '123ABC');
>
> If I then need to work out which nodes in the cluster hold the replica data
> for this partition (from the logs), I get the token via CQL, e.g.:
>
> select distinct token(sourceid,messagehour) from myks.mytable where sourceid='sourceid' and messagehour='{color:#de350b}*2021-09-16 05:00Z*{color}';
>
>  system.token(sourceid, messagehour)
> -------------------------------------
>  {color:#de350b}*7663675819538124697*{color}
>
> I then run nodetool to get the endpoints for this token/ks/table, e.g.:
>
> nodetool getendpoints myks mytable {color:#de350b}*7663675819538124697*{color}
> 172.31.10.187
> 172.31.12.193
> 172.31.13.91
>
> *The list of endpoints is not correct.* I suspect the timestamp value
> printed in the warning log entry is missing information/precision, so it
> yields the wrong token and hence the wrong endpoints.
>
> Possibly this warning log statement should also output the actual partition
> key token, to avoid confusion, and the string representation of the
> timestamp should be correct.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org