Re: [HACKERS] Rewriting Free Space Map

2008-03-20 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 More precisely, on CVS HEAD it takes between 7.1-7.2%. After extending 
 BufferTag with one uint32, it takes 7.4-7.5%. So the effect is 
 measurable if you try hard enough, but not anything to get worried about.

And if we adopt the allegedly-faster hash_any that's in the patch
queue, the difference should get smaller yet.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting Free Space Map

2008-03-20 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

I also wonder what the performance impact of extending BufferTag is.


That's a fair objection, and obviously something we'd need to check.
But I don't recall seeing hash_any so high on any profile that I think
it'd be a big problem.


I do remember seeing hash_any in some oprofile runs. But that's fairly 
easy to test: we don't need to actually implement any of the stuff, 
other than add a field to BufferTag, and run pgbench.


I tried that. hash_any wasn't significant on pgbench, but I was able to 
construct a test case where it is. It goes like this:


BEGIN;
CREATE TEMPORARY TABLE foo (id int4);
INSERT INTO foo SELECT a FROM generate_series(1, 1) a;
INSERT INTO foo SELECT * FROM foo;
INSERT INTO foo SELECT * FROM foo;
INSERT INTO foo SELECT * FROM foo;
... (repeat multiple times)

oprofile says that hash_any consumes ~7 % of CPU time on that test on my 
laptop.


More precisely, on CVS HEAD it takes between 7.1-7.2%. After extending 
BufferTag with one uint32, it takes 7.4-7.5%. So the effect is 
measurable if you try hard enough, but not anything to get worried about.
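
For readers following along, here is a minimal sketch of what "extending BufferTag with one uint32" does to the key that gets hashed. It is not the patch that was measured: the struct and field names are simplified stand-ins, forkNum is a hypothetical field name, and sketch_hash is plain FNV-1a used only as a placeholder for hash_any.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for RelFileNode: tablespace, database, relation. */
typedef struct
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} SketchRelFileNode;

/* Lookup key modeled on the existing BufferTag: four uint32 fields. */
typedef struct
{
    SketchRelFileNode rnode;
    uint32_t blockNum;
} SketchBufferTag;

/* Same key with one extra uint32; "forkNum" is a hypothetical name. */
typedef struct
{
    SketchRelFileNode rnode;
    uint32_t forkNum;
    uint32_t blockNum;
} SketchForkBufferTag;

/* FNV-1a over the raw key bytes; a placeholder, NOT hash_any's algorithm. */
static uint32_t
sketch_hash(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;
    size_t i;

    assert(key != NULL);
    for (i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}
```

Since every field is a uint32, the key grows from 16 to 20 bytes, so any hash that walks the key bytes has a quarter more input per buffer lookup. That is the only cost this sketch models.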


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Rewriting Free Space Map

2008-03-18 Thread tomas

On Mon, Mar 17, 2008 at 01:23:46PM -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  The idea that's becoming attractive to me while contemplating the
  multiple-maps problem is that we should adopt something similar to
  the old Mac OS idea of multiple forks in a relation.
 
  Can we call them maps or metadata maps? forks sounds weird.

Actually, I do like forks, but to add a little bit of diversity:

facets? aspects?

FWIW, the idea of mapping a relation to a directory is quite compelling.

Regards
-- tomás



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Hannu Krosing

On Sun, 2008-03-16 at 21:33 -0300, Alvaro Herrera wrote:
 Tom Lane wrote:
 
  The idea that's becoming attractive to me while contemplating the
  multiple-maps problem is that we should adopt something similar to
  the old Mac OS idea of multiple forks in a relation.  In addition
  to the main data fork which contains the same info as now, there could
  be one or more map forks which are separate files in the filesystem.

Aren't we in a way doing this for indexes?

 I think something similar could be used to store tuple visibility bits
 separately from heap tuple data itself, so +1 to this idea.

Not just bits, but whole visibility info (xmin,xmax,tmin,tmax, plus
bits) should be stored separately.

A separate fork for visibility should be organized as a b-tree index
(as we already have well-honed mechanisms for dealing with those
effectively), but stored in a compressed form: ranges of all-visible or
all-deleted tuples are stored as just their two endpoints, and the tree
is reorganized when possible, similar to what we currently do for HOT
updates.

This will keep the visibility index really small for tables with few
updates, most likely one or two pages regardless of table size.
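
To make the "two endpoints only" idea concrete, here is a rough sketch under stated assumptions. It uses a sorted array with binary search as a stand-in for the b-tree, and a plain 64-bit tuple position instead of a real tuple identifier. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    VIS_ALL_VISIBLE,
    VIS_ALL_DELETED,
    VIS_UNKNOWN            /* not covered: needs per-tuple visibility info */
} SketchVisState;

/* One run of consecutive tuples sharing a state, kept as two endpoints. */
typedef struct
{
    uint64_t first;        /* first tuple position in the run, inclusive */
    uint64_t last;         /* last tuple position in the run, inclusive */
    SketchVisState state;
} SketchVisRange;

/*
 * Binary search over ranges that are sorted by 'first' and do not overlap.
 * Returns VIS_UNKNOWN when pos falls into no stored range.
 */
static SketchVisState
sketch_vis_lookup(const SketchVisRange *ranges, size_t n, uint64_t pos)
{
    size_t lo = 0;
    size_t hi = n;

    assert(ranges != NULL || n == 0);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (pos < ranges[mid].first)
            hi = mid;
        else if (pos > ranges[mid].last)
            lo = mid + 1;
        else
            return ranges[mid].state;
    }
    return VIS_UNKNOWN;
}
```

With this layout a table of a million all-visible tuples needs a single entry, which is the "one or two pages regardless of table size" effect. Each update that splits a run adds entries until the runs can be merged again.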

One important difference from indexes is that visibility info should be
stored first, before writing data to heap and creating ordinary index
entries.

 (The rough idea in my head was that you can do an indexscan and look
 up visibility bits without having to pull the whole heap along; and
 visibility updates are also cheaper, whether they come from indexscans
 or heap scans.  Of course, the implicit cost is that a seqscan needs to
 fetch the visibility pages, too; and the locking is more complex.)

Another cost is that under heavy inserting/updating there will probably be
more lock contention, as visibility info for new tuples will more often
land on the same visibility pages due to visibility info being generally
smaller.

Of course, with visibility info in a separate fork, very narrow tables
will have the ratios reversed: for a one-byte-wide table, visibility info
will be a few times bigger than the actual data, at least initially, before
compression has kicked in.


Hannu













Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 On Sun, 2008-03-16 at 21:33 -0300, Alvaro Herrera wrote:
 Tom Lane wrote:
 The idea that's becoming attractive to me while contemplating the
 multiple-maps problem is that we should adopt something similar to
 the old Mac OS idea of multiple forks in a relation.

 Aren't we in a way doing this for indexes?

Not really --- indexes are closer to being independent entities, since
they have their own relfilenode values, own pg_class entries, etc.  What
I'm imagining here is something that's so tightly tied to the core heap
that there's no value in managing it as a distinct entity, thus the idea
of same relfilenode with a different extension.  The existence of
multiple forks in a relation wouldn't be exposed at all at the SQL
level.

 I think something similar could be used to store tuple visibility bits
 separately from heap tuple data itself, so +1 to this idea.

 Not just bits, but whole visibility info (xmin,xmax,tmin,tmax, plus
 bits) should be stored separately.

I'm entirely un-sold on this idea, but yeah it would be something that
would be possible to experiment with once we have a multi-fork
infrastructure.

regards, tom lane



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 You're cavalierly waving away a whole boatload of problems that will
 arise as soon as you start trying to make the index AMs play along
 with this :-(.  

 It doesn't seem very hard.

The problem is that the index AMs are no longer in control of what goes
where within their indexes, which has always been their prerogative to
determine.  The fact that you think you can kluge btree to still work
doesn't mean that it will work for other AMs.

 Also, I don't think that using the special space will scale to handle
 other kinds of maps such as the proposed dead space map.  (This is
 exactly why I said the other day that we need a design roadmap for all
 these ideas.)

 It works for anything that scales linearly with the relation itself. The 
 proposed FSM and visibility map both fall into that category.

It can work only with prearrangement among all the maps that are trying
to coexist in the same special space.  Every time a new one comes along,
we'd have to reconsider the space allocation, re-optimize tradeoffs,
and force an initdb that could not possibly be implemented in-place.

If we had a short list of maps in mind with no real prospect of
changing, then I think this would be acceptable; but that doesn't seem
to be the case.  It's certainly foolish to start detail design on
something like this when we don't even have the roadmap I asked
for about what sorts of maps we are agreed we want to have.

 The idea that's becoming attractive to me while contemplating the
 multiple-maps problem is that we should adopt something similar to
 the old Mac OS idea of multiple forks in a relation.

 Hmm. You also need to teach at least xlog.c and xlogutils.c about the 
 map forks, for full page images and the invalid page tracking.

Well, you'd have to teach them something anyway, for any incarnation
of maps that they might need to update.

 I also wonder what the performance impact of extending BufferTag is.

That's a fair objection, and obviously something we'd need to check.
But I don't recall seeing hash_any so high on any profile that I think
it'd be a big problem.

 My original thought was to have a separate RelFileNode for each of the 
 maps. That would require no smgr or xlog changes, and not very many 
 changes in the buffer manager, though I guess you'd need more catalog
 changes. You had doubts about that on the previous thread 
 (http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but 
 the map forks idea certainly seems much more invasive than that.

The main problems with that are (a) the need to expose every type of map
in pg_class and (b) the need to pass all those relfilenode numbers down
to pretty low levels of the system.  The nice thing about the fork idea
is that you don't need any added info to uniquely identify what relation
you're working on.  The fork numbers would be hard-wired into whatever
code needed to know about particular forks.  (Of course, these same
advantages apply to using special space in an existing file.  I'm
just suggesting that we can keep these advantages without buying into
the restrictions that special space would have.)

regards, tom lane



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Lars-Erik Bjørk

On Mon, 2008-03-17 at 09:29 -0400, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  On Sun, 2008-03-16 at 21:33 -0300, Alvaro Herrera wrote:
  Tom Lane wrote:
  The idea that's becoming attractive to me while contemplating the
  multiple-maps problem is that we should adopt something similar to
  the old Mac OS idea of multiple forks in a relation.
 
  Aren't we in a way doing this for indexes?
 
 Not really --- indexes are closer to being independent entities, since
 they have their own relfilenode values, own pg_class entries, etc.  What
 I'm imagining here is something that's so tightly tied to the core heap
 that there's no value in managing it as a distinct entity, thus the idea
 of same relfilenode with a different extension.  The existence of
 multiple forks in a relation wouldn't be exposed at all at the SQL
 level.
 
  I think something similar could be used to store tuple visibility bits
  separately from heap tuple data itself, so +1 to this idea.
 
  Not just bits, but whole visibility info (xmin,xmax,tmin,tmax, plus
  bits) should be stored separately.
 
 I'm entirely un-sold on this idea, but yeah it would be something that
 would be possible to experiment with once we have a multi-fork
 infrastructure.
 
   regards, tom lane
 




Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
I've started working on revamping Free Space Map, using the approach 
where we store a map of heap pages on every nth heap page. What we need 
now is discussion on the details of how exactly it should work.


You're cavalierly waving away a whole boatload of problems that will
arise as soon as you start trying to make the index AMs play along
with this :-(.  


It doesn't seem very hard. An indexam wanting to use FSM needs a little 
bit of code where the relation is extended, to let the FSM initialize 
FSM pages. And then there's the B-tree metapage issue I mentioned. But 
that's all, AFAICS.



Hash for instance has very narrow-minded ideas about
page allocation within its indexes.


Hash doesn't use FSM at all.


Also, I don't think that using the special space will scale to handle
other kinds of maps such as the proposed dead space map.  (This is
exactly why I said the other day that we need a design roadmap for all
these ideas.)


It works for anything that scales linearly with the relation itself. The 
proposed FSM and visibility map both fall into that category.


A separate file is certainly more flexible. I was leaning towards that 
option originally 
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00142.php) for 
that reason.



The idea that's becoming attractive to me while contemplating the
multiple-maps problem is that we should adopt something similar to
the old Mac OS idea of multiple forks in a relation.  In addition
to the main data fork which contains the same info as now, there could
be one or more map forks which are separate files in the filesystem.
They are named by relfilenode plus an extension, for instance a relation
with relfilenode NNN would have a data fork in file NNN (plus perhaps
NNN.1, NNN.2, etc) and a map fork named something like NNN.map (plus
NNN.map.1 etc as needed).  We'd have to add one more field to buffer
lookup keys (BufferTag) to disambiguate which fork the referenced page
is in.  Having bitten that bullet, though, the idea trivially scales to
any number of map forks with potentially different space requirements
and different locking and WAL-logging requirements.
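
As an illustration of the naming scheme in the paragraph above (NNN, NNN.1, NNN.map, NNN.map.1), a small sketch follows. The function name is hypothetical, "map" is just the example extension from the text, and nothing here reflects how smgr actually builds paths.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the file name for one segment of one fork.
 * fork_ext is "" for the main data fork, or an extension such as "map".
 * Segment 0 gets no numeric suffix, matching NNN, NNN.1, NNN.2 in the text.
 * Returns what snprintf returns: the length needed, or negative on error.
 */
static int
sketch_fork_path(char *buf, size_t buflen, unsigned relfilenode,
                 const char *fork_ext, unsigned segno)
{
    const char *dot;

    assert(buf != NULL && fork_ext != NULL);
    dot = (strlen(fork_ext) > 0) ? "." : "";

    if (segno == 0)
        return snprintf(buf, buflen, "%u%s%s", relfilenode, dot, fork_ext);

    return snprintf(buf, buflen, "%u%s%s.%u",
                    relfilenode, dot, fork_ext, segno);
}
```

The relfilenode alone still names the relation. The extension picks the fork and the trailing number picks the segment, so adding a new kind of fork only means agreeing on one more extension.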


Hmm. You also need to teach at least xlog.c and xlogutils.c about the 
map forks, for full page images and the invalid page tracking. I also 
wonder what the performance impact of extending BufferTag is.


My original thought was to have a separate RelFileNode for each of the 
maps. That would require no smgr or xlog changes, and not very many 
changes in the buffer manager, though I guess you'd need more catalog
changes. You had doubts about that on the previous thread 
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but 
the map forks idea certainly seems much more invasive than that.


I like the map forks idea; it groups the maps nicely at the filesystem 
level, and I can see it being useful for all kinds of things in the 
future. The question is, is it really worth the extra code churn? If you 
think it is, I can try that approach.



Another possible advantage is that a new map fork could be added to an
existing table without much trouble.  Which is certainly something we'd
need if we ever hope to get update-in-place working.


Yep.


The main disadvantage I can see is that for very small tables, the
percentage overhead from multiple map forks of one page apiece is
annoyingly high.  However, most of the point of a map disappears if
the table is small, so we might finesse that by not creating any maps
until the table has reached some minimum size.


Yeah, the map fork idea is actually better than the every nth heap 
page approach from that point of view.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Simon Riggs
On Sun, 2008-03-16 at 21:33 -0300, Alvaro Herrera wrote:
 Tom Lane wrote:
 
  The idea that's becoming attractive to me while contemplating the
  multiple-maps problem is that we should adopt something similar to
  the old Mac OS idea of multiple forks in a relation.  In addition
  to the main data fork which contains the same info as now, there could
  be one or more map forks which are separate files in the filesystem.
 
 I think something similar could be used to store tuple visibility bits
 separately from heap tuple data itself, so +1 to this idea.
 
 (The rough idea in my head was that you can do an indexscan and look
 up visibility bits without having to pull the whole heap along; and
 visibility updates are also cheaper, whether they come from indexscans
 or heap scans.  Of course, the implicit cost is that a seqscan needs to
 fetch the visibility pages, too; and the locking is more complex.)

I very much like the idea of a generic method for including additional
bulk metadata for a relation (heap or index). That neatly provides a
general infrastructure for lots of clever things such as dead space,
visibility or other properties, while at the same time maintaining
modularity. 

Can we call them maps or metadata maps? forks sounds weird.

We don't need to assume anything about the maps themselves at this
stage, so we might imagine tightly coupled maps that are always updated
as a relation changes, or loosely coupled maps that are lazily updated
by background processes. Autovacuum then becomes the vehicle by which we
execute map maintenance procedures, defined according to which AMs are
installed and what relation options are set. So we have a completely
generalised data/metadata storage infrastructure.

Sensibly arranged this could provide an entry point for powerful new
features within existing and future index AMs. It also sounds like it
might avoid a whole class of bugs and special cases that I regrettably
foresee would be unavoidable in Heikki's proposal.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Heikki Linnakangas [EMAIL PROTECTED] writes:
 My original thought was to have a separate RelFileNode for each of the 
 maps. That would require no smgr or xlog changes, and not very many 
 changes in the buffer manager, though I guess you'd need more catalog
 changes. You had doubts about that on the previous thread 
 (http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but 
 the map forks idea certainly seems much more invasive than that.

 The main problems with that are (a) the need to expose every type of map
 in pg_class and (b) the need to pass all those relfilenode numbers down
 to pretty low levels of the system.  The nice thing about the fork idea
 is that you don't need any added info to uniquely identify what relation
 you're working on.  The fork numbers would be hard-wired into whatever
 code needed to know about particular forks.  (Of course, these same
 advantages apply to using special space in an existing file.  I'm
 just suggesting that we can keep these advantages without buying into
 the restrictions that special space would have.)

One advantage of using separate relfilenodes would be that if we need to
regenerate a map we could do it in a new relfilenode and swap it in like we do
with heap rewrites.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Andrew Dunstan



Gregory Stark wrote:

Tom Lane [EMAIL PROTECTED] writes:

  

Heikki Linnakangas [EMAIL PROTECTED] writes:

My original thought was to have a separate RelFileNode for each of the 
maps. That would require no smgr or xlog changes, and not very many 
changes in the buffer manager, though I guess you'd need more catalog
changes. You had doubts about that on the previous thread 
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but 
the map forks idea certainly seems much more invasive than that.
  

The main problems with that are (a) the need to expose every type of map
in pg_class and (b) the need to pass all those relfilenode numbers down
to pretty low levels of the system.  The nice thing about the fork idea
is that you don't need any added info to uniquely identify what relation
you're working on.  The fork numbers would be hard-wired into whatever
code needed to know about particular forks.  (Of course, these same
advantages apply to using special space in an existing file.  I'm
just suggesting that we can keep these advantages without buying into
the restrictions that special space would have.)



One advantage of using separate relfilenodes would be that if we need to
regenerate a map we could do it in a new relfilenode and swap it in like we do
with heap rewrites.

  


Why can't you just do that with a different extension and file rename? 
You'd need an exclusive lock while swapping in the new map, but you need 
that anyway, IIRC, and this way you don't even need a catalog change.
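
A bare-bones sketch of the rename approach, assuming POSIX rename() semantics (the target is replaced atomically). The function name and file names are made up for illustration. Locking, fsync and WAL are deliberately left out, and the caller is assumed to hold the exclusive lock mentioned above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Replace the live map file with a freshly rebuilt one.
 * Assumes the caller already holds the exclusive lock protecting the map.
 * Returns 0 on success, -1 on failure.
 */
static int
sketch_swap_in_map(const char *rebuilt_path, const char *live_path)
{
    assert(rebuilt_path != NULL && live_path != NULL);

    /* On POSIX, rename() atomically replaces an existing target file. */
    return (rename(rebuilt_path, live_path) == 0) ? 0 : -1;
}
```

A caller would build the new map under a temporary name next to the live file and then call, for example, sketch_swap_in_map("16384.map.new", "16384.map"). Platforms where rename() refuses to overwrite an existing file would need a different sequence.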


cheers

andrew



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 One advantage of using separate relfilenodes would be that if we need to
 regenerate a map we could do it in a new relfilenode and swap it in like we do
 with heap rewrites.

You could probably do that using a temporary fork number, if the
situation ever came up.

regards, tom lane



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The idea that's becoming attractive to me while contemplating the
 multiple-maps problem is that we should adopt something similar to
 the old Mac OS idea of multiple forks in a relation.

 Can we call them maps or metadata maps? forks sounds weird.

I'm not wedded to forks, that's just the name that was used in the
only previous example I've seen.  Classic Mac had a resource fork
and a data fork within each file.

Don't think I like maps though, as (a) that prejudges what the
alternate forks might be used for, and (b) the name fails to be
inclusive of the data fork.  Other suggestions anyone?

BTW, thinking about the Mac precedent a little more, I believe
the way they grafted that Classic concept onto BSD was that
applications (which the user thinks of as single objects) are
now directories with multiple files inside them.  Probably it'd be
overkill to think of turning each relation into a subdirectory, but
then again maybe not?

regards, tom lane



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Simon Riggs
On Mon, 2008-03-17 at 13:23 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  The idea that's becoming attractive to me while contemplating the
  multiple-maps problem is that we should adopt something similar to
  the old Mac OS idea of multiple forks in a relation.
 
  Can we call them maps or metadata maps? forks sounds weird.
 
 I'm not wedded to forks, that's just the name that was used in the
 only previous example I've seen.  Classic Mac had a resource fork
 and a data fork within each file.

Layer? Slab? Sheet? Strata/um? Overlay?

Layer makes sense to me because of the way GIS and CAD systems work.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread David Fetter
On Mon, Mar 17, 2008 at 01:23:46PM -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  The idea that's becoming attractive to me while contemplating
  the multiple-maps problem is that we should adopt something
  similar to the old Mac OS idea of multiple forks in a
  relation.
 
  Can we call them maps or metadata maps? forks sounds weird.
 
 I'm not wedded to forks, that's just the name that was used in the
 only previous example I've seen.  Classic Mac had a resource fork
 and a data fork within each file.
 
 Don't think I like maps though, as (a) that prejudges what the
 alternate forks might be used for, and (b) the name fails to be
 inclusive of the data fork.  Other suggestions anyone?

Segment?  Section?  Module?

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Jochem van Dieten
On Mon, Mar 17, 2008 at 6:23 PM, Tom Lane wrote:
 Simon Riggs writes:

 Can we call them maps or metadata maps? forks sounds weird.

  I'm not wedded to forks, that's just the name that was used in the
  only previous example I've seen.  Classic Mac had a resource fork
  and a data fork within each file.

Microsoft / NTFS calls them Data Streams:
http://www.wikistc.org/wiki/Alternate_data_streams

Jochem



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Mark Cave-Ayland

On Mon, 2008-03-17 at 13:23 -0400, Tom Lane wrote:

 I'm not wedded to forks, that's just the name that was used in the
 only previous example I've seen.  Classic Mac had a resource fork
 and a data fork within each file.

 Don't think I like maps though, as (a) that prejudges what the
 alternate forks might be used for, and (b) the name fails to be
 inclusive of the data fork.  Other suggestions anyone?

I believe that in the world of NTFS the concept is called streams:
http://support.microsoft.com/kb/105763.


HTH,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063




Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 I'm not wedded to forks, that's just the name that was used in the
 only previous example I've seen.  Classic Mac had a resource fork
 and a data fork within each file.

FWIW, forks are not unique to Mac OS; cf.:
http://en.wikipedia.org/wiki/Fork_%28filesystem%29

However I'm not sure reusing any of these terms is such a hot idea. All it's
going to do is confuse someone into thinking we're actually talking about HFS
forks or NTFS data streams or whatever. Better to pick a term that isn't
already being used for such things so people don't get misled.

 BTW, thinking about the Mac precedent a little more, I believe
 the way they grafted that Classic concept onto BSD was that
 applications (which the user thinks of as single objects) are
 now directories with multiple files inside them.  Probably it'd be
 overkill to think of turning each relation into a subdirectory, but
 then again maybe not?

Well there are upsides and downsides. Many OSes have difficulties when you
have many files in a single directory. This would tend to reduce that. On the
other hand it would drastically increase the number of directory files the OS
has to keep track of and the total number of inodes being referenced.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Tom Lane wrote:

You're cavalierly waving away a whole boatload of problems that will
arise as soon as you start trying to make the index AMs play along
with this :-(.  



It doesn't seem very hard.


The problem is that the index AMs are no longer in control of what goes
where within their indexes, which has always been their prerogative to
determine.  The fact that you think you can kluge btree to still work
doesn't mean that it will work for other AMs.


Well, it does work with all the existing AMs AFAICS. I do agree with the 
general point; it'd certainly be cleaner, more modular and more flexible 
if the AMs didn't need to know about the existence of the maps.



The idea that's becoming attractive to me while contemplating the
multiple-maps problem is that we should adopt something similar to
the old Mac OS idea of multiple forks in a relation.


Hmm. You also need to teach at least xlog.c and xlogutils.c about the 
map forks, for full page images and the invalid page tracking.


Well, you'd have to teach them something anyway, for any incarnation
of maps that they might need to update.


Umm, the WAL code doesn't care where the pages it operates on came from. 
Sure, we'll need rmgr-specific code that knows what to do with the maps,
but the full page image code would work without changes with the 
multiple RelFileNode approach.


The essential change with the map fork idea is that a RelFileNode no 
longer uniquely identifies a file on disk (ignoring the segmentation 
which is handled in smgr for now). Anything that operates on 
RelFileNodes, without any higher level information of what it is, needs 
to be modified to use RelFileNode+forkid instead. That includes at least 
the buffer manager, smgr, and the full page image code in xlog.c.


It's probably a pretty mechanical change, even though it affects a lot 
of code. We'd probably want to have a new struct, let's call it 
PhysFileId for now, for RelFileNode+forkid, and basically replace all 
occurrences of RelFileNode with PhysFileId in smgr, bufmgr and xlog code.



I also wonder what the performance impact of extending BufferTag is.


That's a fair objection, and obviously something we'd need to check.
But I don't recall seeing hash_any so high on any profile that I think
it'd be a big problem.


I do remember seeing hash_any in some oprofile runs. But that's fairly 
easy to test: we don't need to actually implement any of the stuff, 
other than add a field to BufferTag, and run pgbench.


My original thought was to have a separate RelFileNode for each of the 
maps. That would require no smgr or xlog changes, and not very many 
changes in the buffer manager, though I guess you'd need more catalog
changes. You had doubts about that on the previous thread 
(http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but 
the map forks idea certainly seems much more invasive than that.


The main problems with that are (a) the need to expose every type of map
in pg_class and (b) the need to pass all those relfilenode numbers down
to pretty low levels of the system. 


(a) is certainly a valid point. Regarding (b), I don't think the low 
level stuff (I assume you mean smgr, bufmgr, bgwriter, xlog by that) 
would need to be passed any additional relfilenode numbers. Or rather, 
they already work with relfilenodes, and they don't need to know whether 
the relfilenode is for an index, a heap, or an FSM attached to something 
else. The relfilenodes would be in RelationData, and we already have 
that around whenever we do anything that needs to differentiate between 
those.


Another consideration is which approach is easiest to debug. The map 
fork approach seems better on that front, as you can immediately see 
from the PhysFileId if a page is coming from an auxiliary map or the 
main data portion. That might turn out to be handy in the buffer manager 
or bgwriter as well; they don't currently have any knowledge of what a 
page contains.



The nice thing about the fork idea
is that you don't need any added info to uniquely identify what relation
you're working on.  The fork numbers would be hard-wired into whatever
code needed to know about particular forks.  (Of course, these same
advantages apply to using special space in an existing file.  I'm
just suggesting that we can keep these advantages without buying into
the restrictions that special space would have.)


I don't see that advantage. All the higher-level code that cares which 
relation you're working on already has a Relation around. All the 
lower-level stuff doesn't care.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Dawid Kuroczko
On Mon, Mar 17, 2008 at 6:23 PM, Tom Lane [EMAIL PROTECTED] wrote:
  I'm not wedded to forks, that's just the name that was used in the
  only previous example I've seen.  Classic Mac had a resource fork
  and a data fork within each file.

  Don't think I like maps though, as (a) that prejudges what the
  alternate forks might be used for, and (b) the name fails to be
  inclusive of the data fork.  Other suggestions anyone?

Shadow?  As each err, fork trails each relfilenode? (Or perhaps shade).

Hints?  As something more generic than map?

   Regards,
 Dawid



Re: [HACKERS] Rewriting Free Space Map

2008-03-16 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 I've started working on revamping Free Space Map, using the approach 
 where we store a map of heap pages on every nth heap page. What we need 
 now is discussion on the details of how exactly it should work.

You're cavalierly waving away a whole boatload of problems that will
arise as soon as you start trying to make the index AMs play along
with this :-(.  Hash for instance has very narrow-minded ideas about
page allocation within its indexes.

Also, I don't think that "use the special space" will scale to handle
other kinds of maps such as the proposed dead space map.  (This is
exactly why I said the other day that we need a design roadmap for all
these ideas.)

The idea that's becoming attractive to me while contemplating the
multiple-maps problem is that we should adopt something similar to
the old Mac OS idea of multiple forks in a relation.  In addition
to the main data fork which contains the same info as now, there could
be one or more map forks which are separate files in the filesystem.
They are named by relfilenode plus an extension, for instance a relation
with relfilenode NNN would have a data fork in file NNN (plus perhaps
NNN.1, NNN.2, etc) and a map fork named something like NNN.map (plus
NNN.map.1 etc as needed).  We'd have to add one more field to buffer
lookup keys (BufferTag) to disambiguate which fork the referenced page
is in.  Having bitten that bullet, though, the idea trivially scales to
any number of map forks with potentially different space requirements
and different locking and WAL-logging requirements.
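
The naming scheme above can be sketched as a small helper. This is only
an illustration under stated assumptions: fork_segment_name is a
hypothetical function, the "map" extension is just the example name from
the paragraph above, and the directory part of the path is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the file name for one segment of one fork of a relation.
 *
 * forkext: NULL or "" for the main data fork, otherwise the fork's
 *          extension, e.g. "map" (hypothetical).
 * segno:   0 for the first segment, 1, 2, ... for later segments.
 *
 * Returns 0 on success, -1 if the name does not fit in buf.
 */
static int
fork_segment_name(char *buf, size_t buflen, unsigned relfilenode,
                  const char *forkext, unsigned segno)
{
    int n;
    int has_ext = (forkext != NULL && forkext[0] != '\0');

    if (has_ext && segno > 0)
        n = snprintf(buf, buflen, "%u.%s.%u", relfilenode, forkext, segno);
    else if (has_ext)
        n = snprintf(buf, buflen, "%u.%s", relfilenode, forkext);
    else if (segno > 0)
        n = snprintf(buf, buflen, "%u.%u", relfilenode, segno);
    else
        n = snprintf(buf, buflen, "%u", relfilenode);

    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

For relfilenode 16384 this yields 16384 and 16384.1 for the data fork,
and 16384.map and 16384.map.1 for the map fork. A second kind of map
would only need a different extension, which is the sense in which the
scheme scales to any number of forks.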

Another possible advantage is that a new map fork could be added to an
existing table without much trouble.  Which is certainly something we'd
need if we ever hope to get update-in-place working.

The main disadvantage I can see is that for very small tables, the
percentage overhead from multiple map forks of one page apiece is
annoyingly high.  However, most of the point of a map disappears if
the table is small, so we might finesse that by not creating any maps
until the table has reached some minimum size.

regards, tom lane



Re: [HACKERS] Rewriting Free Space Map

2008-03-16 Thread Alvaro Herrera
Tom Lane wrote:

 The idea that's becoming attractive to me while contemplating the
 multiple-maps problem is that we should adopt something similar to
 the old Mac OS idea of multiple forks in a relation.  In addition
 to the main data fork which contains the same info as now, there could
 be one or more map forks which are separate files in the filesystem.

I think something similar could be used to store tuple visibility bits
separately from heap tuple data itself, so +1 to this idea.

(The rough idea in my head was that you can do an indexscan and look
up visibility bits without having to pull the whole heap along; and
visibility updates are also cheaper, whether they come from indexscans
or heap scans.  Of course, the implicit cost is that a seqscan needs to
fetch the visibility pages, too; and the locking is more complex.)

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
