nickva commented on PR #5603:
URL: https://github.com/apache/couchdb/pull/5603#issuecomment-3125870889

   * I updated the algorithm used for merging/rollup. It's now a bit simpler: 
the new algorithm simply merges bins together, first the shortest intervals 
(multiple hours), then longer ones (multiple days), etc.
   
   * Removed a few more mentions of ISO 8601 vs. RFC 3339 and focused instead on 
"here is the accepted format", since we don't accept all possible formats 
anyway.
   
   * Added docs for the new config values and the HTTP APIs.
   
   * Added more test coverage.
   
   * Ran a downgrade test: updated a db with the PR branch, switched to main, 
then verified that the same dbs could be read and written without any issues.
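   For illustration only, here is a minimal Python sketch of a "merge the shortest intervals first" rollup. It is not the PR's code (CouchDB is written in Erlang). The `Bin` record, the `update`/`merge_once` helpers, the `max_bins` cap, and the fallback of merging the two oldest bins when no neighbours share a width are all hypothetical choices made for this example. It assumes hourly updates arrive in order with no gaps.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    start: int  # first hour covered (hours since an arbitrary epoch)
    width: int  # number of hours covered
    seq: int    # latest update sequence recorded in this bin


def merge_once(bins: list) -> None:
    """Merge one adjacent pair, preferring the narrowest equal-width pair.

    Ties go to the oldest pair. If no two neighbours share a width, fall back
    (hypothetically) to merging the two oldest bins, coarsening old history.
    """
    best = None
    for i in range(len(bins) - 1):
        if bins[i].width == bins[i + 1].width:
            if best is None or bins[i].width < bins[best].width:
                best = i
    if best is None:
        best = 0
    a, b = bins[best], bins[best + 1]
    bins[best:best + 2] = [Bin(a.start, a.width + b.width, max(a.seq, b.seq))]


def update(bins: list, hour: int, seq: int, max_bins: int) -> None:
    """Record that update sequence `seq` happened during `hour`.

    Assumes hours arrive in order with no gaps, and max_bins >= 1.
    """
    if bins and hour < bins[-1].start + bins[-1].width:
        bins[-1].seq = seq  # still inside the newest bin
        return
    bins.append(Bin(hour, 1, seq))
    while len(bins) > max_bins:
        merge_once(bins)


if __name__ == "__main__":
    history = []
    for h in range(10_000):
        update(history, h, seq=h + 1, max_bins=20)
    print([b.width for b in history])
```

   Merging equal-width neighbours, smallest first, behaves like a binary counter. The newest bin is always 1 or 2 hours wide, and older bins keep doubling. The widths in the rollup output further down are not all powers of two, so the real bin boundaries are clearly chosen differently.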
   
   * Ran the quick-and-dirty built-in fabric_bench test with q=8 and small docs. 
I didn't notice any significant difference between main and the PR branch:
   
      * main
        * _bulk_get rate (Hz): 29000, 27000, 26000, 26000, 29000, 30000
        * single doc update (Hz): 320, 330, 350, 320, 330
   
      * PR
        * _bulk_get rate (Hz): 30000, 30000, 30000, 29000, 26000, 27000
        * single doc update (Hz): 340, 310, 310, 330, 340, 330
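   The snippet below puts "no significant difference" in numbers. It compares the mean of each series against its run-to-run spread. With only five or six samples per series, this is a sanity check rather than a statistical test.

```python
from statistics import mean, stdev

# Rates (Hz) copied from the fabric_bench runs above (q=8, small docs).
runs = {
    "main": {
        "_bulk_get": [29000, 27000, 26000, 26000, 29000, 30000],
        "single doc update": [320, 330, 350, 320, 330],
    },
    "PR": {
        "_bulk_get": [30000, 30000, 30000, 29000, 26000, 27000],
        "single doc update": [340, 310, 310, 330, 340, 330],
    },
}


def summarize(samples):
    """Return (mean, sample standard deviation) for one series."""
    return mean(samples), stdev(samples)


for metric in ("_bulk_get", "single doc update"):
    m_mean, m_sd = summarize(runs["main"][metric])
    p_mean, p_sd = summarize(runs["PR"][metric])
    change = (p_mean - m_mean) / m_mean * 100
    print(f"{metric}: main {m_mean:.0f} ± {m_sd:.0f} Hz, "
          f"PR {p_mean:.0f} ± {p_sd:.0f} Hz ({change:+.1f}%)")
```

   The `_bulk_get` means differ by about 3% (roughly 27,833 vs. 28,667 Hz). That gap is less than one standard deviation of the main runs (about 1,722 Hz). The single-doc update means (330 vs. about 327 Hz) are within about 1% of each other.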
   
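   * To get a feel for how the rollup works, I ran a test that updates the 
time-seq data structure once per hour for 1 million hours (about 114 years). 
Each line below is a bin's start time followed by the number of updates it 
holds; at one update per hour, that is also the bin's width in hours: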
   
   ```
      3000-01-01T00:00:00Z -> 82176
      3009-05-18T00:00:00Z -> 83712
      3018-12-05T00:00:00Z -> 85584
      3028-09-09T00:00:00Z -> 82416
      3038-02-03T00:00:00Z -> 85488
      3047-11-05T00:00:00Z -> 82704
      3057-04-12T00:00:00Z -> 82944
      3066-09-28T00:00:00Z -> 85872
      3076-07-15T00:00:00Z -> 83520
      3086-01-24T00:00:00Z -> 41472
      3090-10-18T00:00:00Z -> 41472
      3095-07-12T00:00:00Z -> 41760
      3100-04-17T00:00:00Z -> 41472
      3105-01-09T00:00:00Z -> 20736
      3107-05-23T00:00:00Z -> 20736
      3109-10-03T00:00:00Z -> 10368
      3110-12-09T00:00:00Z -> 10368
      3112-02-14T00:00:00Z -> 5184
      3112-09-17T00:00:00Z -> 3456
      3113-02-08T00:00:00Z -> 864
      3113-03-16T00:00:00Z -> 864
      3113-04-21T00:00:00Z -> 864
      3113-05-27T00:00:00Z -> 864
      3113-07-02T00:00:00Z -> 864
      3113-08-07T00:00:00Z -> 288
      3113-08-19T00:00:00Z -> 288
      3113-08-31T00:00:00Z -> 288
      3113-09-12T00:00:00Z -> 288
      3113-09-24T00:00:00Z -> 288
      3113-10-06T00:00:00Z -> 288
      3113-10-18T00:00:00Z -> 288
      3113-10-30T00:00:00Z -> 288
      3113-11-11T00:00:00Z -> 288
      3113-11-23T00:00:00Z -> 288
      3113-12-05T00:00:00Z -> 288
      3113-12-17T00:00:00Z -> 288
      3113-12-29T00:00:00Z -> 96
      3114-01-02T00:00:00Z -> 48
      3114-01-04T00:00:00Z -> 48
      3114-01-06T00:00:00Z -> 48
      3114-01-08T00:00:00Z -> 48
      3114-01-10T00:00:00Z -> 48
      3114-01-12T00:00:00Z -> 48
      3114-01-14T00:00:00Z -> 48
      3114-01-16T00:00:00Z -> 48
      3114-01-18T00:00:00Z -> 48
      3114-01-20T00:00:00Z -> 24
      3114-01-21T00:00:00Z -> 24
      3114-01-22T00:00:00Z -> 24
      3114-01-23T00:00:00Z -> 24
      3114-01-24T00:00:00Z -> 24
      3114-01-25T00:00:00Z -> 24
      3114-01-26T00:00:00Z -> 24
      3114-01-27T00:00:00Z -> 24
      3114-01-28T00:00:00Z -> 24
      3114-01-29T00:00:00Z -> 24
      3114-01-30T00:00:00Z -> 6
      3114-01-30T06:00:00Z -> 6
      3114-01-30T12:00:00Z -> 3
      3114-01-30T15:00:00Z -> 1
   ```
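   As a consistency check on the output, the snippet below takes the bin sizes and rebuilds each bin's start time from 3000-01-01T00:00:00Z. It then confirms that the bins tile the whole run. The widths are transcribed by hand from the listing, and nothing here calls CouchDB.

```python
from datetime import datetime, timedelta, timezone

# Bin sizes in hours, oldest first, transcribed from the rollup output above.
# With one update per hour, each bin's count equals its width in hours.
WIDTHS = (
    [82176, 83712, 85584, 82416, 85488, 82704, 82944, 85872, 83520]
    + [41472, 41472, 41760, 41472]
    + [20736, 20736, 10368, 10368, 5184, 3456]
    + [864] * 5 + [288] * 12 + [96] + [48] * 9 + [24] * 10
    + [6, 6, 3, 1]
)
FIRST_START = datetime(3000, 1, 1, tzinfo=timezone.utc)
HOURS_PER_YEAR = 365.2425 * 24  # mean Gregorian year


def bin_starts(first_start, widths):
    """Rebuild each bin's start time by accumulating the widths."""
    starts, t = [], first_start
    for w in widths:
        starts.append(t)
        t += timedelta(hours=w)
    return starts


starts = bin_starts(FIRST_START, WIDTHS)
newest = starts[-1]
# Bins whose start falls within 365 days of the newest bin's start.
recent = sum(1 for s in starts if newest - s <= timedelta(days=365))

print(len(WIDTHS), "bins covering", sum(WIDTHS), "hours")
print("newest bin starts at", newest.strftime("%Y-%m-%dT%H:%M:%SZ"))
print(recent, "bins start within the last 365 days")
print(f"widest bin: {max(WIDTHS) / HOURS_PER_YEAR:.1f} years")
```

   The 60 bins add up to exactly 1,000,000 hours, and the newest bin starts at 3114-01-30T15:00:00Z as listed. Of the 60 bins, 41 start within 365 days of the newest one. The widest bin (85872 hours) spans just under 9.8 years.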
   
   Noticed a few things:
     * During the last day there are 4 individual intervals, so we can tell 
apart changes that occurred about 3 to 6 hours apart.
     * There are 11 individual days; before that, days are combined into pairs, 
so if we ask for changes ``since=3114-01-09T00:00:00Z`` we may also get changes 
from ``3114-01-08T00:00:00Z``.
     * Most of the bins are devoted to keeping track of the sequences in the 
current year. That's exactly what we'd expect, since it lets us efficiently get 
the changes since a recent point in time.
     * Even after 100 years we can still target intervals less than 10 years 
apart.
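   The `since` rounding in the second observation can be illustrated with a small lookup over the newest bins from the output. `rollback_since` is a hypothetical helper. It assumes a request is served from the start of whichever bin contains the `since` timestamp; the PR's actual lookup may differ in detail.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

# Newest bins from the rollup output (3114-01-02 onward), oldest first.
TAIL_START = datetime(3114, 1, 2, tzinfo=timezone.utc)
TAIL_WIDTHS = [48] * 9 + [24] * 10 + [6, 6, 3, 1]

starts = []
t = TAIL_START
for w in TAIL_WIDTHS:
    starts.append(t)
    t += timedelta(hours=w)


def rollback_since(since):
    """Hypothetical lookup: snap `since` back to the start of its bin.

    Serving changes from that bin's start may include some changes from
    before `since`, but, under this assumption, none after it are skipped.
    """
    i = bisect_right(starts, since) - 1  # last bin starting at or before since
    if i < 0:
        raise ValueError("since is older than the bins modelled here")
    return starts[i]


for s in (datetime(3114, 1, 9, tzinfo=timezone.utc),
          datetime(3114, 1, 30, 14, tzinfo=timezone.utc)):
    print(s.isoformat(), "->", rollback_since(s).isoformat())
```

   Snapping back to the bin start trades a few extra, already-seen changes for the guarantee that a client resuming from `since` does not miss anything newer.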
   

