[ https://issues.apache.org/jira/browse/CASSANDRA-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei Deng updated CASSANDRA-10306:
---------------------------------
    Labels: dtcs  (was: )

> Splitting SSTables in time, deleting and archiving SSTables
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-10306
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10306
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Antti Nissinen
>              Labels: dtcs
>
> This document is a continuation of [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] and describes the need to be able to split SSTables time-wise, as also discussed in [CASSANDRA-8361|https://issues.apache.org/jira/browse/CASSANDRA-8361]. The data model is explained briefly, followed by the practical issues of running Cassandra with time series data and the resulting need for splitting capabilities.
>
> Data model (snippet from [CASSANDRA-9644|https://issues.apache.org/jira/browse/CASSANDRA-9644]):
>
> The data is time series data. It is saved so that one row contains a certain time span of data for a given metric (20 days in this case). The row key contains information about the start time of the time span and the metric name. The column name gives the offset from the beginning of the time span. The column timestamp is set to the actual timestamp of the data point, obtained by adding the offset to the timestamp from the row key. The data model is analogous to the KairosDB implementation.
>
> In the practical application, data is added in real time to the column family. When converting from a legacy system, old data is pre-loaded in chronological order by faking the column timestamps before starting real-time data collection. However, there is intermittently a need to insert older data into the database as well, either because it has not been available in real time or because additional time series are fed in afterwards due to unforeseen needs.
>
> Adding old data simultaneously with real-time data leads to SSTables that contain data from a time period exceeding the length of the compaction window (TWCS and DTCS). Therefore the SSTables do not behave in a predictable manner in the compaction process.
>
> Tombstones mask the data from queries, but releasing disk space requires that the SSTables containing the tombstones are compacted together with the SSTables holding the original data. When using TWCS or DTCS and writing tombstones with timestamps corresponding to the current time, the SSTables containing the original data will never be compacted with the SSTables holding the tombstones. Even when writing tombstones with faked timestamps, the resulting SSTable should be written separately from the ongoing real-time data; otherwise the SSTables have to be split (see below).
>
> TTL is a working method for deleting data from a column family and releasing disk space in a predictable manner. However, setting the correct TTL is not a trivial task. The required TTL might change, e.g. due to legislation, or because the customer would like a longer lifetime for the data.
>
> The other factor affecting disk space consumption is the variability of the rate at which data is fed to the column family. In certain troubleshooting cases the sample rate can be increased tenfold for a large portion of the collected time series. This leads to rapid consumption of disk space, and old data has to be deleted or archived in such a manner that disk space is released quickly and predictably.
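>
> To make the data model and the timestamp handling above concrete, here is a minimal sketch using the DataStax Python driver. The keyspace, table, and column names are invented for illustration (the actual model is KairosDB-like, as noted above, and is not defined in this ticket). It shows backfilling old data with faked cell timestamps and writing a tombstone whose timestamp sits just above the original write's, so both fall into the same DTCS/TWCS time window.
>
> {code:python}
> from datetime import datetime
> from cassandra.cluster import Cluster  # DataStax Python driver
>
> session = Cluster(['127.0.0.1']).connect()
> session.execute("""
>     CREATE KEYSPACE IF NOT EXISTS ts WITH replication =
>         {'class': 'SimpleStrategy', 'replication_factor': 1}""")
>
> # Hypothetical KairosDB-style layout: one partition holds a 20-day span
> # of one metric; the clustering key is the point's offset into that span.
> session.execute("""
>     CREATE TABLE IF NOT EXISTS ts.datapoints (
>         metric     text,
>         span_start timestamp,   -- start of the 20-day row span
>         offset_ms  bigint,      -- offset of the point within the span
>         value      double,
>         PRIMARY KEY ((metric, span_start), offset_ms))
>     WITH compaction = {'class': 'DateTieredCompactionStrategy'}""")
>
> def write_point(metric, span_start, point_time, value):
>     """Backfill an old point, faking the cell timestamp so the write
>     lands in the window of the data's own time, not the wall clock."""
>     offset_ms = int((point_time - span_start).total_seconds() * 1000)
>     write_ts = int(point_time.timestamp() * 1000000)  # microseconds
>     session.execute(
>         "INSERT INTO ts.datapoints (metric, span_start, offset_ms, value) "
>         "VALUES (%s, %s, %s, %s) USING TIMESTAMP %s",
>         (metric, span_start, offset_ms, value, write_ts))
>
> def delete_point(metric, span_start, offset_ms, original_write_ts):
>     """Tombstone with a faked timestamp just above the original write's,
>     so it belongs to the same time window as the data it shadows."""
>     session.execute(
>         "DELETE FROM ts.datapoints USING TIMESTAMP %s "
>         "WHERE metric = %s AND span_start = %s AND offset_ms = %s",
>         (original_write_ts + 1, metric, span_start, offset_ms))
>
> # Example: backfill a point from 2015-06-01 into its 20-day span.
> write_point('pressure.sensor42', datetime(2015, 5, 20),
>             datetime(2015, 6, 1, 12, 30), 101.3)
> {code}
>
> Note that, as described above, even a correctly timestamped tombstone only helps if the SSTable it ends up in can be compacted with the window holding the original data, which is exactly what the splitting capability proposed below is meant to enable.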
> Losing one or more nodes from the cluster without spare hardware will also lead to a situation where the data from the lost node has to be replicated again across the remaining nodes. This increases disk space consumption per node and probably requires cleaning some older data out of the active column family.
>
> All of the above issues could of course be handled simply by adding more disk space or more nodes to the cluster. In a cloud environment that would be a feasible option. For an application running on real hardware in an isolated environment it is not, for practical reasons or due to costs. Getting new hardware on site might take a long time, e.g. due to customs regulations.
>
> In the application domain (time series data collection) the data is not modified after insertion into the column family. There are only read operations and the deletion / archiving of old data based on TTL or operator actions.
>
> The above reasoning leads to the following conclusions and proposals:
> * TWCS and DTCS (with certain modifications) lead to well-structured SSTable sets where the tables are organized time-wise, giving opportunities to manage the available disk capacity on the nodes. Recovering from repairs also works (compacting the flood of small SSTables with larger ones).
> * Being able to effectively split SSTables along a given timeline would lead to SSTable sets on all nodes that allow deleting or archiving whole SSTables. What would be the mechanism to inactivate SSTables during deletion / archiving so that the nodes don’t start streaming the “missing” data between nodes (repairs)?
> * Being able to split existing SSTables along multiple timelines determined by TWCS would allow insertion of older data into the column family that would eventually be compacted in the desired manner in the correct time window. The original SSTable would be split into several SSTables according to the time windows, and in the end empty SSTables would be discarded (see the sketch after this list).
> * The splitting action would be a tool executed through the nodetool command when needed.
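>
> No such split exists in nodetool today; as a sketch only, with invented function names, the bucketing rule the proposal implies could look like the following. It mirrors the floor-to-window grouping that TWCS uses: each cell is assigned to the window containing its write timestamp, one output SSTable would be produced per non-empty window, and empty ones discarded.
>
> {code:python}
> from collections import defaultdict
>
> # One compaction window of one day, in microseconds (the resolution of
> # Cassandra cell write timestamps). The window size is an assumption.
> WINDOW_US = 24 * 60 * 60 * 1000000
>
> def window_start(write_ts_us):
>     """Floor a cell's write timestamp to the start of its window."""
>     return (write_ts_us // WINDOW_US) * WINDOW_US
>
> def split_by_window(cells):
>     """Group (write_ts_us, cell) pairs by compaction window.
>
>     A real implementation would rewrite one input SSTable into one
>     output SSTable per non-empty window, so that backfilled data and
>     fake-timestamped tombstones sit next to the data they belong to."""
>     buckets = defaultdict(list)
>     for write_ts_us, cell in cells:
>         buckets[window_start(write_ts_us)].append(cell)
>     return buckets
> {code}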