nickva commented on PR #5603:
URL: https://github.com/apache/couchdb/pull/5603#issuecomment-3125870889
* I updated the algorithm used for merging/rollup. It's now a bit simpler:
the new algorithm simply merges bins together, first the shortest intervals
(multiple hours), then longer ones (multiple days), etc.
* Removed a few more mentions of ISO 8601 vs RFC 3339 and focused more on "here
is the accepted format", as we don't really accept all the possible formats
anyway.
* Added docs for the new config values and the http APIs.
* Added more test coverage.
* Ran a downgrade test. Updated a db with the PR branch, switched to main,
then verified it was possible to read and write the same dbs without any issue.
* Ran the quick and dirty built-in fabric_bench test with q=8 and small docs.
Didn't notice any significant difference between main and the PR branch:
* main
_bulk_get rate (hz): 29000, 27000, 26000, 26000, 29000, 30000
single doc update (hz): 320, 330, 350, 320, 330
* PR
_bulk_get rate (hz): 30000, 30000, 30000, 29000, 26000, 27000
single doc update (hz): 340, 310, 310, 330, 340, 330
* To get a feel for how the rollup works, I ran a test which updated the
time-seq data structure once per hour for 1 million hours:
```
3000-01-01T00:00:00Z -> 82176
3009-05-18T00:00:00Z -> 83712
3018-12-05T00:00:00Z -> 85584
3028-09-09T00:00:00Z -> 82416
3038-02-03T00:00:00Z -> 85488
3047-11-05T00:00:00Z -> 82704
3057-04-12T00:00:00Z -> 82944
3066-09-28T00:00:00Z -> 85872
3076-07-15T00:00:00Z -> 83520
3086-01-24T00:00:00Z -> 41472
3090-10-18T00:00:00Z -> 41472
3095-07-12T00:00:00Z -> 41760
3100-04-17T00:00:00Z -> 41472
3105-01-09T00:00:00Z -> 20736
3107-05-23T00:00:00Z -> 20736
3109-10-03T00:00:00Z -> 10368
3110-12-09T00:00:00Z -> 10368
3112-02-14T00:00:00Z -> 5184
3112-09-17T00:00:00Z -> 3456
3113-02-08T00:00:00Z -> 864
3113-03-16T00:00:00Z -> 864
3113-04-21T00:00:00Z -> 864
3113-05-27T00:00:00Z -> 864
3113-07-02T00:00:00Z -> 864
3113-08-07T00:00:00Z -> 288
3113-08-19T00:00:00Z -> 288
3113-08-31T00:00:00Z -> 288
3113-09-12T00:00:00Z -> 288
3113-09-24T00:00:00Z -> 288
3113-10-06T00:00:00Z -> 288
3113-10-18T00:00:00Z -> 288
3113-10-30T00:00:00Z -> 288
3113-11-11T00:00:00Z -> 288
3113-11-23T00:00:00Z -> 288
3113-12-05T00:00:00Z -> 288
3113-12-17T00:00:00Z -> 288
3113-12-29T00:00:00Z -> 96
3114-01-02T00:00:00Z -> 48
3114-01-04T00:00:00Z -> 48
3114-01-06T00:00:00Z -> 48
3114-01-08T00:00:00Z -> 48
3114-01-10T00:00:00Z -> 48
3114-01-12T00:00:00Z -> 48
3114-01-14T00:00:00Z -> 48
3114-01-16T00:00:00Z -> 48
3114-01-18T00:00:00Z -> 48
3114-01-20T00:00:00Z -> 24
3114-01-21T00:00:00Z -> 24
3114-01-22T00:00:00Z -> 24
3114-01-23T00:00:00Z -> 24
3114-01-24T00:00:00Z -> 24
3114-01-25T00:00:00Z -> 24
3114-01-26T00:00:00Z -> 24
3114-01-27T00:00:00Z -> 24
3114-01-28T00:00:00Z -> 24
3114-01-29T00:00:00Z -> 24
3114-01-30T00:00:00Z -> 6
3114-01-30T06:00:00Z -> 6
3114-01-30T12:00:00Z -> 3
3114-01-30T15:00:00Z -> 1
```
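A quick sanity check of the listing above, as a minimal Python sketch rather than anything taken from the PR. It assumes the number after `->` is both the count of hourly updates in a bin and the bin's width in hours, which should hold here only because the test updated once per hour. `COUNTS`, `FIRST_BIN` and `bin_starts` are names made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Per-bin counts copied from the listing above, oldest bin first.
COUNTS = (
    [82176, 83712, 85584, 82416, 85488, 82704, 82944, 85872, 83520]
    + [41472, 41472, 41760, 41472]
    + [20736, 20736, 10368, 10368, 5184, 3456]
    + [864] * 5 + [288] * 12 + [96] + [48] * 9 + [24] * 10
    + [6, 6, 3, 1]
)

# Start of the oldest bin in the listing.
FIRST_BIN = datetime(3000, 1, 1, tzinfo=timezone.utc)


def bin_starts(first, counts):
    """Rebuild each bin's start time, assuming one update per hour."""
    starts, current = [], first
    for count in counts:
        starts.append(current)
        current += timedelta(hours=count)
    return starts


starts = bin_starts(FIRST_BIN, COUNTS)
print(len(COUNTS), sum(COUNTS))   # 60 1000000
print(starts[9].isoformat())      # 3086-01-24T00:00:00+00:00
print(starts[-1].isoformat())     # 3114-01-30T15:00:00+00:00
```

The counts add up to exactly 1 million across 60 bins, and walking forward from the first bin reproduces the start times shown in the listing.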
Noticed a few things:
* During the last day there are 4 individual intervals. So we could
determine which changes occurred about 3 to 6 hours apart.
* There are 11 individual days, then days are combined into pairs, so if
we ask for changes ``since=3114-01-09T00:00:00Z`` we may also get changes from
``3114-01-08T00:00:00Z``.
* Most of the bins are devoted to keeping track of the sequences in the
current year. That's exactly what we'd expect: we can efficiently get the
changes since a recent point in time.
* Even after 100 years we can still target intervals less than 10 years
apart.
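
To illustrate the ``since=3114-01-09T00:00:00Z`` observation, here is a rough Python sketch of how a timestamp might snap back to the start of the bin that contains it. This is not the PR's implementation; `ts`, `BIN_STARTS` and `effective_since` are hypothetical names, and the sketch assumes a lookup simply picks the latest bin starting at or before the requested time.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def ts(text):
    """Parse a timestamp written like the ones in the listing above."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Bin start times copied from the listing above (2-day bins, then 1-day bins).
BIN_STARTS = [ts(s) for s in (
    "3114-01-06T00:00:00Z",
    "3114-01-08T00:00:00Z",
    "3114-01-10T00:00:00Z",
    "3114-01-12T00:00:00Z",
    "3114-01-14T00:00:00Z",
    "3114-01-16T00:00:00Z",
    "3114-01-18T00:00:00Z",
    "3114-01-20T00:00:00Z",
    "3114-01-21T00:00:00Z",
)]


def effective_since(bin_starts, since):
    """Return the start of the bin containing `since`, or None if it predates every bin."""
    index = bisect_right(bin_starts, since) - 1
    return bin_starts[index] if index >= 0 else None


print(effective_since(BIN_STARTS, ts("3114-01-09T00:00:00Z")))  # 3114-01-08 00:00:00+00:00
print(effective_since(BIN_STARTS, ts("3114-01-20T12:00:00Z")))  # 3114-01-20 00:00:00+00:00
```

Under that assumption, a request inside a 2-day bin is widened to the bin's start, which is why changes from ``3114-01-08T00:00:00Z`` may show up, while a request inside a 1-day bin is widened by less than a day.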