Tom Lane wrote:
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
You're cavalierly waving away a whole boatload of problems that will
arise as soon as you start trying to make the index AMs play along
with this :-(.

It doesn't seem very hard.

The problem is that the index AMs are no longer in control of what goes
where within their indexes, which has always been their prerogative to
determine.  The fact that you think you can kluge btree to still work
doesn't mean that it will work for other AMs.

Well, it does work with all the existing AMs AFAICS. I do agree with the general point; it'd certainly be cleaner, more modular and more flexible if the AMs didn't need to know about the existence of the maps.

The idea that's becoming attractive to me while contemplating the
multiple-maps problem is that we should adopt something similar to
the old Mac OS idea of multiple "forks" in a relation.

Hmm. You also need to teach at least xlog.c and xlogutils.c about the map forks, for full page images and the invalid page tracking.

Well, you'd have to teach them something anyway, for any incarnation
of maps that they might need to update.

Umm, the WAL code doesn't care where the pages it operates on came from. Sure, we'll need rmgr-specific code that knows what to do with the maps, but the full page image code would work without changes with the multiple RelFileNode approach.

The essential change with the map fork idea is that a RelFileNode no longer uniquely identifies a file on disk (ignoring the segmentation which is handled in smgr for now). Anything that operates on RelFileNodes, without any higher level information of what it is, needs to be modified to use RelFileNode+forkid instead. That includes at least the buffer manager, smgr, and the full page image code in xlog.c.

It's probably a pretty mechanical change, even though it affects a lot of code. We'd probably want to have a new struct, let's call it PhysFileId for now, for RelFileNode+forkid, and basically replace all occurrences of RelFileNode with PhysFileId in smgr, bufmgr and xlog code.
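
To make that concrete, here is a minimal sketch of what such a struct could look like. The RelFileNode below is a simplified stand-in for the real one, and the ForkId enum, its values, and the two helper functions are assumptions for illustration only; nothing like them exists in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Simplified stand-in for PostgreSQL's RelFileNode. */
typedef struct RelFileNode
{
    Oid         spcNode;        /* tablespace */
    Oid         dbNode;         /* database */
    Oid         relNode;        /* relation */
} RelFileNode;

/* Hypothetical fork numbers; names and values are assumptions. */
typedef enum ForkId
{
    FORK_MAIN = 0,              /* the main data portion */
    FORK_FSM = 1,               /* an auxiliary map, e.g. a free space map */
    FORK_COUNT                  /* number of forks known to this sketch */
} ForkId;

/* RelFileNode + forkid: what would identify one file on disk. */
typedef struct PhysFileId
{
    RelFileNode rnode;
    ForkId      forkid;
} PhysFileId;

/* Build a PhysFileId, rejecting fork numbers this sketch does not know. */
static PhysFileId
MakePhysFileId(RelFileNode rnode, ForkId forkid)
{
    PhysFileId  id;

    assert(forkid < FORK_COUNT);
    id.rnode = rnode;
    id.forkid = forkid;
    return id;
}

/* Two ids name the same file only if relfilenode and fork both match. */
static bool
PhysFileIdEquals(const PhysFileId *a, const PhysFileId *b)
{
    return a->rnode.spcNode == b->rnode.spcNode &&
        a->rnode.dbNode == b->rnode.dbNode &&
        a->rnode.relNode == b->rnode.relNode &&
        a->forkid == b->forkid;
}
```

With something like this, the main fork and the map fork of one relation share a RelFileNode but compare as different files, which is the property smgr, bufmgr and the full page image code would rely on.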

I also wonder what the performance impact of extending BufferTag is.

That's a fair objection, and obviously something we'd need to check.
But I don't recall seeing hash_any so high on any profile that I think
it'd be a big problem.

I do remember seeing hash_any in some oprofile runs. But that's fairly easy to test: we don't need to actually implement any of the stuff, other than add a field to BufferTag, and run pgbench.
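
As a rough sketch of that experiment, assuming the current tag is just a RelFileNode plus a block number: the widened tag below adds one 4-byte field, so the hash function sees more key bytes per lookup. ForkBufferTag, forkNum and sketch_hash are made-up names, and the FNV-1a loop is only a stand-in to show that the cost scales with key length; it is not hash_any.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;

/* Simplified stand-in for PostgreSQL's RelFileNode. */
typedef struct RelFileNode
{
    Oid         spcNode;
    Oid         dbNode;
    Oid         relNode;
} RelFileNode;

/* Assumed shape of the existing tag: relation plus block number. */
typedef struct BufferTag
{
    RelFileNode rnode;
    BlockNumber blockNum;
} BufferTag;

/* Hypothetical widened tag with a fork number added. */
typedef struct ForkBufferTag
{
    RelFileNode rnode;
    uint32_t    forkNum;        /* assumption: 4-byte fork number */
    BlockNumber blockNum;
} ForkBufferTag;

/*
 * Stand-in byte-wise hash (32-bit FNV-1a), NOT hash_any.  The loop runs
 * once per key byte, so a longer tag means more work per buffer lookup.
 */
static uint32_t
sketch_hash(const void *key, size_t len)
{
    const unsigned char *p = (const unsigned char *) key;
    uint32_t    h = 2166136261u;
    size_t      i;

    assert(key != NULL);
    for (i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}
```

On platforms with 4-byte fields and no padding the key grows from 16 to 20 bytes. Whether that shows up in a profile is exactly what the pgbench run would have to tell us.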

My original thought was to have a separate RelFileNode for each of the maps. That would require no smgr or xlog changes, and not very many changes in the buffer manager, though I guess you'd need more catalog changes. You had doubts about that on the previous thread (http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but the "map forks" idea certainly seems much more invasive than that.

The main problems with that are (a) the need to expose every type of map
in pg_class and (b) the need to pass all those relfilenode numbers down
to pretty low levels of the system.

(a) is certainly a valid point. Regarding (b), I don't think the low level stuff (I assume you mean smgr, bufmgr, bgwriter, xlog by that) would need to be passed any additional relfilenode numbers. Or rather, they already work with relfilenodes, and they don't need to know whether the relfilenode is for an index, a heap, or an FSM attached to something else. The relfilenodes would be in RelationData, and we already have that around whenever we do anything that needs to differentiate between those.

Another consideration is which approach is easiest to debug. The "map fork" approach seems better on that front, as you can immediately see from the PhysFileId if a page is coming from an auxiliary map or the main data portion. That might turn out to be handy in the buffer manager or bgwriter as well; they don't currently have any knowledge of what a page contains.

The nice thing about the fork idea
is that you don't need any added info to uniquely identify what relation
you're working on.  The fork numbers would be hard-wired into whatever
code needed to know about particular forks.  (Of course, these same
advantages apply to using special space in an existing file.  I'm
just suggesting that we can keep these advantages without buying into
the restrictions that special space would have.)

I don't see that advantage. All the higher-level code that cares which relation you're working on already has a Relation around. All the lower-level stuff doesn't care.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
