On Tue, 10 Dec 2019 at 09:51, Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> I several times, most recently for the record format in the undo
> patchset, wished for a fast variable width integer implementation for
> postgres. Using very narrow integers, for space efficiency, solves the
> space usage problem, but leads to extensibility / generality problems.
>

Yes. I've wanted flexible but efficiently packed integers quite a bit too,
especially when working with wire protocols.

Am I stabbing completely in the dark when wondering if this might be a step
towards a way to lift the size limit on VARLENA Datums like bytea? There are
obvious practical concerns with doing so, given that our protocol offers no
handle-based lazy fetching for big VARLENA values, but that too needs a way
to represent sizes sensibly and flexibly.

> Even with those caveats, I think that's a pretty good result. Other
> encodings were more expensive. And I think there's definitely some room
> for optimization left.

I don't feel at all qualified to question your analysis of the appropriate
representation. But your explanation certainly makes a lot of sense to
someone approaching the topic mostly fresh; I've done a bit with BCD but
not much else.

I assume we'd be paying a price in padding and alignment in most cases, and
probably more memory copying, but these representations would likely appear
mostly in places where other costs are overwhelmingly greater, like network
or disk I/O.

> If data lengths longer than that are required for a use case

If we're baking a new variant integer format now, I think limiting it to
64 bits is probably a mistake given how long-lived PostgreSQL is, and how
hard it can be to change things in the protocol, on disk, etc.

> it
> probably is better to either a) use the max-representable 8 byte integer
> as an indicator that the length is stored or b) sacrifice another bit to
> represent whether the integer is the data itself or the length.
>

I'd be inclined to suspect that (b) is likely worth doing.
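To make option (b) concrete, here is a minimal C sketch under stated assumptions. It is NOT the encoding from the patch under discussion (that isn't quoted here); it's a generic LEB128-style varint (7 payload bits per byte, high bit meaning "more bytes follow", as in protobuf) whose lowest bit is the (b) flag: 0 means the remaining bits are the integer itself, 1 means they are the byte length of a raw little-endian payload that follows. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of option (b): an unsigned LEB128 varint whose low
 * bit is a tag. Tag 0: the other bits are the value. Tag 1: the other bits
 * are the length in bytes of a little-endian payload that follows.
 * Output buffers need room for at least 10 bytes.
 */

/* Unsigned LEB128: 7 bits per byte, high bit set when more bytes follow. */
static size_t leb128_put(uint64_t v, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = v & 0x7F;
        v >>= 7;
        if (v != 0)
            byte |= 0x80;
        out[n++] = byte;
    } while (v != 0);
    return n;
}

static size_t leb128_get(const uint8_t *in, uint64_t *v)
{
    uint64_t result = 0;
    size_t n = 0;
    int shift = 0;
    uint8_t byte;
    do {
        assert(shift < 64);  /* well-formed 64-bit input uses at most 10 bytes */
        byte = in[n++];
        result |= (uint64_t) (byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    *v = result;
    return n;
}

/* Encode v; returns the number of bytes written to out. */
static size_t tagged_encode_u64(uint64_t v, uint8_t *out)
{
    if ((v >> 63) == 0)                 /* fits in 63 bits: store inline */
        return leb128_put(v << 1, out); /* tag bit 0 */

    /* Top bit set: tag 1, length 8, then the raw bytes little-endian. */
    size_t n = leb128_put(((uint64_t) 8 << 1) | 1, out);
    for (int i = 0; i < 8; i++)
        out[n++] = (uint8_t) (v >> (8 * i));
    return n;
}

/* Decode; returns bytes consumed, or 0 if the payload is wider than 64 bits. */
static size_t tagged_decode_u64(const uint8_t *in, uint64_t *v)
{
    uint64_t head;
    size_t n = leb128_get(in, &head);

    if ((head & 1) == 0) {              /* tag 0: the value itself */
        *v = head >> 1;
        return n;
    }

    uint64_t len = head >> 1;           /* tag 1: payload length in bytes */
    if (len > 8)
        return 0;                       /* > 2^64: caller needs a bignum path */

    uint64_t result = 0;
    for (uint64_t i = 0; i < len; i++)
        result |= (uint64_t) in[n + i] << (8 * i);
    *v = result;
    return n + (size_t) len;
}
```

The price of the flag bit shows up at size-class boundaries: plain LEB128 stores 0..127 in one byte, but here only 0..63 fit in one byte and 64..127 take two. In exchange, values with the top bit set (and, with a wider decoder, payloads longer than 8 bytes) have a defined representation instead of being unrepresentable.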
If nothing else, (b) is worth it because not being able to represent the
full range of a 64-bit integer in the variant type is potentially going to
be a seriously annoying hassle at points where we're interacting with
places that could use the full width. We'd then have the potential for
variant integers of > 2^64, but at least that's wholly under our control.

I also routinely underestimate how truly huge a 64-bit integer really is.
But even now 8 exabytes isn't as inconceivable as it used to be....

It mostly depends on how often you expect you'd be coming up on the
boundaries where the extra bit would push you up a variant size.

> Do others see use in this?

Yes. Very, very much yes. I'd be quick to want to expose it to SQL too.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise