Anton,
> >> The solution which Anton suggested does not look easy because it will
> >> most likely significantly hurt performance
> Mostly agree here, but what drop do we expect? What price do we ready to
> pay?
> Not sure, but seems some vendors ready to pay, for example, 5% drop for
> this.

5% may be a big drop for some use cases, so I think we should look at how to improve performance, not how to make it worse.

> >> it is hard to maintain a data structure to choose "page from free-list
> >> with enough space closest to the beginning of the file".
> We can just split each free-list bucket to the couple and use first for
> pages in the first half of the file and the second for the last.
> Only two buckets required here since, during the file shrink, first
> bucket's window will be shrank too.
> Seems, this give us the same price on put, just use the first bucket in
> case it's not empty.
> Remove price (with merge) will be increased, of course.
>
> The compromise solution is to have priority put (to the first path of the
> file), with keeping removal as is, and schedulable per-page migration for
> the rest of the data during the low activity period.

Free lists are large and slow by themselves: it is expensive to checkpoint them and to read them on start, so as a long-term solution I would look into removing them. Moreover, I am not sure that adding yet another background process will improve the reliability and simplicity of the codebase.

If we want to go the hard path, I would look at a free page tracking bitmap: a special bitmask page in which each page of an adjacent block is marked 0 (free) if its free space exceeds a certain configurable threshold (say, 80%) and 1 (full) otherwise. Some vendors have successfully implemented this approach, which looks much more promising but is harder to implement.

--AG
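
P.S. To make the bitmap idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a proposal for the actual page layout: the class name `FreePageBitmap`, the parameters `page_size` and `free_threshold`, and the in-memory `bytearray` are all hypothetical. It keeps one bit per data page of a block (0 = free, 1 = full) and scans from the lowest index.

```python
class FreePageBitmap:
    """One bit per data page in a block: 0 = free, 1 = full (hypothetical sketch)."""

    def __init__(self, page_count, page_size=4096, free_threshold=0.8):
        self.page_count = page_count
        # A page counts as "free" only if its free space exceeds this many bytes.
        self.min_free_bytes = int(page_size * free_threshold)
        # Empty pages are free, so every bit starts as 0.
        self.bits = bytearray((page_count + 7) // 8)

    def _check(self, page_idx):
        if not 0 <= page_idx < self.page_count:
            raise IndexError("page index out of range: %d" % page_idx)

    def update(self, page_idx, free_bytes):
        """Record the current free space of a page after a put or a remove."""
        self._check(page_idx)
        byte_idx, mask = page_idx >> 3, 1 << (page_idx & 7)
        if free_bytes > self.min_free_bytes:
            self.bits[byte_idx] &= ~mask & 0xFF  # clear the bit: free
        else:
            self.bits[byte_idx] |= mask          # set the bit: full

    def is_free(self, page_idx):
        self._check(page_idx)
        return not self.bits[page_idx >> 3] & (1 << (page_idx & 7))

    def first_free(self):
        """Return the lowest index of a free page, or None if all are full."""
        for byte_idx, b in enumerate(self.bits):
            if b == 0xFF:
                continue  # all eight pages in this byte are full
            for bit in range(8):
                idx = byte_idx * 8 + bit
                if idx >= self.page_count:
                    return None  # only padding bits remain
                if not b & (1 << bit):
                    return idx
        return None
```

Since the scan starts at bit 0, `first_free()` returns the free page closest to the beginning of the block, with no extra ordering structure to maintain. How such a bitmask page would be stored, checkpointed, and kept consistent is deliberately left out of this sketch.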
