nickva opened a new pull request, #5603:
URL: https://github.com/apache/couchdb/pull/5603

   This started as an experiment, wondering whether we could have a simple data 
structure that tracks rough mappings from time intervals to db sequences. The 
goal is to let users get an idea of what changes happened in a rough 
timeframe, on the order of days, months, or years. Nothing too exact. For 
example, it would be nice to be able to say `_changes?since=$lastweek`.
   
   The data structure itself, called `time-seq` below, is a fixed-size list of 
up to 50 integer key-value pairs mapping time bins to db sequences. 
The structure is small enough (~500B) when serialized that it can fit well 
under the 4KB header size, yet it can represent exponentially decaying time 
intervals over two decades. This is a trade-off of having a small, fixed size: 
the further back in time we go, the lower the accuracy. However, this decaying 
behavior matches how most people think about time: when we talk about yesterday, 
we may refer to individual hours; when we talk about last month, we may only 
talk about individual days; when talking about years, we may care about months 
or quarters only.
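   
   To make the idea concrete, here is a minimal Python sketch of a fixed-size 
time-to-sequence map. It is not the code from this PR (which is Erlang); the 
class name `TimeSeq`, its methods, and the compaction heuristic are all 
assumptions made for illustration. It assumes each bin stores the db sequence 
observed when that hour bin was first opened, and it thins out older bins more 
aggressively than recent ones so the list never exceeds its cap.

```python
from bisect import bisect_right

HOUR = 3600      # bins are whole hours, as described in the PR
MAX_BINS = 50    # size cap mentioned in the PR text


class TimeSeq:
    """Toy fixed-size map from hour bins to db sequences (illustrative only)."""

    def __init__(self, max_bins=MAX_BINS):
        if max_bins < 3:
            raise ValueError("need at least 3 bins to compact")
        self.max_bins = max_bins
        self.bins = []  # (bin_start_seconds, seq) pairs, oldest first

    def update(self, unix_time, seq):
        """Open a new hour bin for unix_time unless one already covers it."""
        start = unix_time - unix_time % HOUR
        if self.bins and start <= self.bins[-1][0]:
            return  # same bin, or the clock went backward: keep what we have
        self.bins.append((start, seq))
        if len(self.bins) > self.max_bins:
            self._drop_one()

    def _drop_one(self):
        """Hypothetical heuristic: remove the interior bin whose removal
        leaves the smallest gap relative to its age, so old bins get sparse."""
        newest = self.bins[-1][0]

        def cost(i):
            gap = self.bins[i + 1][0] - self.bins[i - 1][0]
            age = newest - self.bins[i][0] + HOUR
            return gap / age

        victim = min(range(1, len(self.bins) - 1), key=cost)
        del self.bins[victim]

    def since(self, unix_time):
        """Sequence of the latest bin starting at or before unix_time.
        Returns 0 (start of the db) when unix_time predates every bin."""
        starts = [start for start, _ in self.bins]
        i = bisect_right(starts, unix_time)
        return self.bins[i - 1][1] if i else 0
```

   Because a dropped bin is absorbed by its older neighbor, `since` in this 
sketch can return an earlier sequence than strictly necessary, but never a 
later one, which is the safe direction for a changes feed.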
   
   The implementation itself and the tests, including property tests written by 
@iilyak (thank you!), are in the first commit. The commit message has more 
implementation details.
   
   Another unexpected benefit of using a small data structure that fits inside 
the header, plus a bit of luck, is that we can implement this feature so that 
it is downgrade safe. This is accomplished by reusing a long-unused db header 
field. If a user upgrades and then downgrades, the older code doesn't look at 
or use that field, so any new data stored there will be ignored. With this 
trick we avoid having to issue a new intermediate downgrade-target release. 
The addition of the time-seq data structure to the header is in the second 
commit. That second commit also implements how the structure is updated: that 
happens in couch_db_updater, only on commit.
   
   Since we're dealing with OS-defined time values, this is not a perfect 
solution. On some systems time could jump backward after a boot, or it may 
misbehave in other ways. There are a few ways to mitigate that:
     * Do not accept updates that appear to go back in time. Those can be 
safely ignored, and we'll resume updating the time bins once the clock finally 
catches up.
     * Do not accept time values lower than some configurable minimum. Users 
who know what their embedded system may do after boot (for example, jump back 
to 1970 until NTP kicks in) may set this minimum threshold to, say, 1971. By 
default we simply set it to a recent time, from when this feature was 
enabled.
     * Always allow a user to inspect and reset any time-seq structure without 
having to recreate dbs or lose data. This is accomplished by adding two 
db-level helper APIs: `GET $db/_time_seq` and `DELETE $db/_time_seq`. Users can 
call them if they detect that something unexpected happened with the time 
sync (say the year jumped to 2100 for a while).
     * Time bins are rounded to whole hours. We do not need precise 
second-level or even minute-level accuracy there. Even if the accuracy is off 
by days and the user knows that (say, by inspecting their couch logs, which 
also emit timestamps based on the same OS timer), they may choose to only use 
since values that are longer than whole days.
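    
   The clock-related guards in the list above can be sketched as a single 
filter. This is an illustration only, not the PR's code: the function name 
`usable_bin`, its parameters, and the `MIN_TIME` value are hypothetical, and 
the real minimum is a configurable setting whose name and default are not 
given here.

```python
from datetime import datetime, timezone

HOUR = 3600

# Hypothetical floor for illustration; the real default is "a recent time".
MIN_TIME = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())


def usable_bin(unix_time, last_bin=None, min_time=MIN_TIME):
    """Return the hour-aligned bin start for unix_time, or None to skip it."""
    if unix_time < min_time:
        return None  # e.g. a clock stuck at 1970 until NTP kicks in
    bin_start = unix_time - unix_time % HOUR  # round down to a whole hour
    if last_bin is not None and bin_start < last_bin:
        return None  # time appears to have gone backward: ignore
    return bin_start
```

   A caller would pass the start of its most recent bin as `last_bin` and 
simply skip the update whenever the result is `None`.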
    
   The third commit implements the new `$db/_time_seq` calls and the general 
fabric-level integration of the new feature.
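   
   As a rough usage sketch, the two calls could be prepared from Python like 
this. Only the `GET` and `DELETE` methods and the `$db/_time_seq` path come 
from this description; the helper name `time_seq_request`, the base URL, and 
the db name `mydb` are assumptions. The response format and any required 
authentication are not covered here, so the example only builds the requests 
and does not send them.

```python
from urllib.parse import quote
from urllib.request import Request


def time_seq_request(base_url, db, method="GET"):
    """Build (but do not send) a request for the db's _time_seq endpoint."""
    if method not in ("GET", "DELETE"):
        raise ValueError("only GET (inspect) and DELETE (reset) are described")
    url = f"{base_url.rstrip('/')}/{quote(db, safe='')}/_time_seq"
    return Request(url, method=method)


# Assumed local node and db name, for illustration only.
inspect_req = time_seq_request("http://localhost:5984", "mydb")
reset_req = time_seq_request("http://localhost:5984", "mydb", method="DELETE")
```

   Passing either object to `urllib.request.urlopen` would perform the call 
against a node that has this feature.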
   
   Finally, after all that, the `_changes?since=YYYY-MM-DDTHH:MM:SSZ` streaming 
is implemented in the last commit. Thanks to all the preparatory steps, the 
last commit is pretty simple. We essentially handle it like the `now` value 
for descending changes feeds.
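
   For example, the "last week" query mentioned at the top could be built as 
follows. This is a sketch under assumptions: the helper name 
`changes_since_url`, the base URL, and the db name are hypothetical; only the 
`since=YYYY-MM-DDTHH:MM:SSZ` format comes from this description.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote, urlencode


def changes_since_url(base_url, db, when):
    """Build a _changes URL whose since value is a UTC timestamp.

    `when` should be a timezone-aware datetime; it is converted to UTC.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode({"since": stamp})  # percent-encodes the colons
    return f"{base_url.rstrip('/')}/{quote(db, safe='')}/_changes?{query}"


# Assumed local node and db name, for illustration only.
last_week = datetime.now(timezone.utc) - timedelta(days=7)
url = changes_since_url("http://localhost:5984", "mydb", last_week)
```

   Since bins are rounded to whole hours and get coarser with age, the feed 
returned for such a URL should be expected to start somewhat earlier than the 
exact timestamp requested.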

