subject:"Re\: \[HACKERS\] Another idea for dealing with cmin\/cmax"

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-10-05 Thread Jim Nasby


On Oct 3, 2006, at 2:23 PM, Gregory Stark wrote:
If the space set aside for these transaction ids is full when  
you're inserting
i suppose you could just go back to the FSM for another page. But I  
don't see
any way out when you're deleting. You have to mark xmax one way or  
another and
if there's no space left in the footer and you only have 4 bits in  
the tuple

what are you going to do?

As an aside doing vacuum freeze more aggressively might reduce the  
pressure on

these ITL slots.

But I don't see any way to guarantee a slot is available for xmax when
deleting. We would need some sort of scheme where the space for  
transaction
ids is able to grow but we're already growing from both ends of the  
page. We
would either have to interleave transaction ids with line pointers  
or store

them on another special page somewhere.


Well, worst-case you could just re-do the whole page if you need to  
expand the list of transaction slots; I don't think that's a huge  
deal. What did have me baffled was how to deal with xmax though,  
since (as you mentioned), you can end up in a situation where you  
can't delete a tuple because there's no more room on the page for  
another xmax.


But I just thought of a way around that which might be better than a  
separate store for transaction info: allow for moving a tuple off the  
current page by placing a link to it's new location, similar to how  
ctid works. We probably wouldn't want to try and cram that into the  
item list, but I think we should be able to create a special version  
of a tuple header (AddressForwardingHeader) that simply states the  
tuple has moved to this new ctid; go there.


Of course, anytime you have to follow that link you're going to pay a  
penalty, but I think this should only be needed when trying to delete  
a tuple on a page that's basically full. Theoretically, there  
shouldn't be too many people trying to hit that deleted tuple, but to  
further reduce the number of people hitting it, we could include the  
visibility info (or a pointer to it) in the AddressForwardingHeader.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-10-03 Thread Gregory Stark


ITAGAKI Takahiro [EMAIL PROTECTED] writes:

 ITL-like approach is more efficient than per-tuple XIDs
 unless all tuples in a page are locked at the same time.
 However, MAXTRANS and PCTFREE issues may bother us.

I'm not sure how Oracle gets away with MAXTRANS. Somehow it seems to never
arise as a practical problem yet it seems like there must be instances where
it would cause a serious problems.

It's worse for Postgres since as I understand it Oracle only needs to track
transaction ids that are actively locking the record. Postgres needs to track
transaction ids for the inserting and deleting transaction even when there's
no lock (or no lock remaining).

I can imagine having a scheme where there's a short list of transaction ids in
the footer and then each tuple just stores an index into that list. If you had
space for 16 transaction ids set aside then you could store xmin and xmax in a
single byte for example.

If the space set aside for these transaction ids is full when you're inserting
i suppose you could just go back to the FSM for another page. But I don't see
any way out when you're deleting. You have to mark xmax one way or another and
if there's no space left in the footer and you only have 4 bits in the tuple
what are you going to do?

As an aside doing vacuum freeze more aggressively might reduce the pressure on
these ITL slots. 

But I don't see any way to guarantee a slot is available for xmax when
deleting. We would need some sort of scheme where the space for transaction
ids is able to grow but we're already growing from both ends of the page. We
would either have to interleave transaction ids with line pointers or store
them on another special page somewhere.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-10-02 Thread Hannu Krosing

Ühel kenal päeval, E, 2006-10-02 kell 01:30, kirjutas Tom Lane:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  ... place a limit on the number of transactions that can be live in a table
  at once.
 
 Urk, well maybe, but ...
 
  you could shrink all the visibility info to 1 byte if you
  wanted to.
 
 ... 256 of 'em is surely not an acceptable limit.

I have been thinking about this, and it seems that especially for OLAP
loads it would be much better to keep tuple visibility info in a
separate file, lets call it Tuple Visibility Map (TVM)

TVM would have the following benefits:

1) TVM could be uses for index-only lookups as well as heap-only
lookups, also other index lookups could be filtered against it fast
before going to heap.

2) TVM could be heavily compressed, especially for bulk loads something
like a single (xmin, xmax,cmin,cmax) tuple plus RLE-encoded list of
pointers to it will do.

3) In case TVM space is needed in in any page, it would be easy to just
throw away cmin/cmax from tuples from committed/aborted transactions.

4) First pass of VACUUM would be much faster, as it has to scan only
TVM. Pages with no expired tuples would not need to be touched. 


If we can come up with a good design for TVM, it may also be an overall
win for many kinds of OLTP queries, as it may result in less writes to
disk and almost the same amount of writing to WAL.

Maybe bitmap or btree index would be something to use as a starting
point when designing TVM.

Another idea to consider would be to merge FSM and TVM and then use them
also for keeping data in cluster order.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-10-02 Thread Greg Stark


Tom Lane [EMAIL PROTECTED] writes:

 Jim C. Nasby [EMAIL PROTECTED] writes:
  ... place a limit on the number of transactions that can be live in a table
  at once.
 
 Urk, well maybe, but ...
 
  you could shrink all the visibility info to 1 byte if you
  wanted to.
 
 ... 256 of 'em is surely not an acceptable limit.

The plan Gavin Sherry and I were discussing at the Code Sprint was to store a
single most common xmin xmin in the per-page special area. Then have a bit
on each tuple indicating that the xmin isn't present in the tuple and instead
to use the xmin from the page footer. Any tuples with other values of xmin
would have to store that xmin normally.

The use case here is primarily tables loaded in large batch jobs that have
large swaths of the table, possibly the entire table, loaded in the same
transaction.

My thinking was that most common xmin could be very approximate. In fact my
my plan was to just store the first xmin the page sees there. This catches the
common use cases of pg_restore or COPY loading many records and even catches
most cases of large inserts into existing tables whenever they extend the
table.

A further refinement could be to have vacuum look for the actual most common
xmin and rewrite the page using it. If there's enough free space it may be
worthwhile storing InvalidTransactionId and allowing the next insert to set
the most common xmin.

However I'm not convinced of the importance of this refinement. The thing to
remember is that shaving bits off tuples is not a goal in itself. It's a means
to an end, namely packing more tuples on a page. If we shave off 4 bytes off
every tuple when we're loading thousands of tuples then we get to put more of
them on a page. If we save space on tuples when we're running vacuum that just
gives us more free space in the free space map and we only see a benefit down
the line if we end up eventually filling up that space.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-10-01 Thread ITAGAKI Takahiro

Tom Lane [EMAIL PROTECTED] wrote:

 Jim C. Nasby [EMAIL PROTECTED] writes:
  Dumb question... wouldn't getting down to 20 bytes buy us something?
 
 BTW, the apparently useless byte after the 27- or 23-byte header
 actually has some good use: in a table of up to 8 columns, you can
 fit a null bitmap there for free.  In a scheme that took us down
 to 20 rather than 19 bytes, even a narrow table would pay the full
 maxalign price for having a null.

We may use free bytes for other purposes. For example, if we increase
the size of XIDs from 4 to 6 bytes, we can get rid of transaction
wraparound problem. There may be some other uses.


 I'm in favor of combining cmin/cmax/xvac to get us down to 23 bytes,
 but I think anything beyond that is going to face a serious problem
 of greatly increased cost for diminishing returns.

If we really want to reduce the size of headers to 16 bytes,
we maybe need to do with HeapTupleHeaderData.t_ctid .

Under the assumption that tuples are offen updated in the same page,
we only need offsets in the page to link an old tuple to new one.
We can reduce the size of t_ctid from 6 to 2 bytes. When tuples
are updated to other pages, we probably need phantom ctid.

In another assumption that tuples are offen updated after frozen,
we can overlap ctid and xmin to a physical field using union.
But phantom xids are needed to update unfrozen tuples.

However, I think both assumptions have less probability than
the one assumed when we introduce phantom cids.
The above ideas probably do not work well.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-10-01 Thread Tom Lane

Jim C. Nasby [EMAIL PROTECTED] writes:
 ... place a limit on the number of transactions that can be live in a table
 at once.

Urk, well maybe, but ...

 you could shrink all the visibility info to 1 byte if you
 wanted to.

... 256 of 'em is surely not an acceptable limit.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-29 Thread Heikki Linnakangas


ITAGAKI Takahiro wrote:
However, I think our next goal to shrink the headers is 16 bytes. The 
headers
become 23 bytes using phantom cids and we are limited by alignments, 
so we will

have no more advantages unless we delete extra 7 bytes in the headers.
...and it seems to be very difficult.


Yeah, I thought about that too earlier.

If we get rid of VACUUM FULL, or replace it with something that doesn't 
need xvac, and keep cmin and cmax in backend-private storage, we could 
get rid of the overlayed t_field4, which is 4 bytes. Then we're down to 
19 bytes.


We could get rid of t_hoff, because we should always be able to 
calculate the header size. Then we're down to 18 bytes.


There's currently  15 bits in use in the infomask. After we remove the 
HEAP_MOVED_* fields that we don't need without VACUUM FULL, that's down 
to 13 bits. t_natts only needs 11 bits, because MaxHeapAttributeNumber 
is 1600. We could move 5 of the bits in infomask to the high 5 bits of 
t_natts, and save one byte.


We're now down to 17 bytes. That's as far as I got.

So it seems we could shave off some bytes, but we still can't get down 
to 16. And the changes needed in total would be quite massive.


--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-29 Thread Martijn van Oosterhout

On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote:
 We could get rid of t_hoff, because we should always be able to 
 calculate the header size. Then we're down to 18 bytes.

Without t_hoff, how do you know the size of the null bitmap? You could
probably do it only if you assume the null bitmap, if present, always
covers all the columns...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-29 Thread Heikki Linnakangas


Martijn van Oosterhout wrote:

On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote:

We could get rid of t_hoff, because we should always be able to
calculate the header size. Then we're down to 18 bytes.


Without t_hoff, how do you know the size of the null bitmap? You could
probably do it only if you assume the null bitmap, if present, always
covers all the columns...


I think we assume that already. heap_form_tuple reserves space for the 
bitmap like this:


   if (hasnull)
   len += BITMAPLEN(numberOfAttributes);

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-29 Thread Martijn van Oosterhout

On Fri, Sep 29, 2006 at 10:59:13AM +0100, Heikki Linnakangas wrote:
 Martijn van Oosterhout wrote:
 On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote:
 We could get rid of t_hoff, because we should always be able to
 calculate the header size. Then we're down to 18 bytes.
 
 Without t_hoff, how do you know the size of the null bitmap? You could
 probably do it only if you assume the null bitmap, if present, always
 covers all the columns...
 
 I think we assume that already. heap_form_tuple reserves space for the 
 bitmap like this:
 
if (hasnull)
len += BITMAPLEN(numberOfAttributes);

Ok, now we do an ALTER TABLE blah ADD COLUMN ..., and we have to expand
the bitmaps for the entire table?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-29 Thread Heikki Linnakangas


Martijn van Oosterhout wrote:

On Fri, Sep 29, 2006 at 10:59:13AM +0100, Heikki Linnakangas wrote:

Martijn van Oosterhout wrote:

On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote:

We could get rid of t_hoff, because we should always be able to
calculate the header size. Then we're down to 18 bytes.

Without t_hoff, how do you know the size of the null bitmap? You could
probably do it only if you assume the null bitmap, if present, always
covers all the columns...

I think we assume that already. heap_form_tuple reserves space for the
bitmap like this:

if (hasnull)
len += BITMAPLEN(numberOfAttributes);


Ok, now we do an ALTER TABLE blah ADD COLUMN ..., and we have to expand
the bitmaps for the entire table?

No, you'd still have the the number of attributes (t_natts) in the header.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-29 Thread Jim C. Nasby

On Fri, Sep 29, 2006 at 01:15:06PM +0900, ITAGAKI Takahiro wrote:
 
 Jim C. Nasby [EMAIL PROTECTED] wrote:
 
  The reason I thought of this is because once the transaction commits, we
  have no use for the cid info. So we could do something like have
  bgwriter look for tuples that belong to committed transactions before it
  writes a page, and strip the cid out of them.
 
 Your concept is just like as the experimental method that I suggested before
 in http://archives.postgresql.org/pgsql-hackers/2005-08/msg01193.php
 We can remove cmin and cmax from commited tuples and xmin from frozen tuples
 and we might save some bytes in tuple headers.
 
 However, I think our next goal to shrink the headers is 16 bytes. The headers
 become 23 bytes using phantom cids and we are limited by alignments, so we 
 will
 have no more advantages unless we delete extra 7 bytes in the headers.
 ...and it seems to be very difficult.

Dumb question... wouldn't getting down to 20 bytes buy us something?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-29 Thread Tom Lane

Jim C. Nasby [EMAIL PROTECTED] writes:
 Dumb question... wouldn't getting down to 20 bytes buy us something?

Only on 32-bit machines, which are getting less interesting as database
servers every day.  (Just last night I was reading somebody opining that
the transition to 64-bit hardware would be effectively complete by 2008
... and he was talking about desktop PCs, not serious iron.)

BTW, the apparently useless byte after the 27- or 23-byte header
actually has some good use: in a table of up to 8 columns, you can
fit a null bitmap there for free.  In a scheme that took us down
to 20 rather than 19 bytes, even a narrow table would pay the full
maxalign price for having a null.

I'm in favor of combining cmin/cmax/xvac to get us down to 23 bytes,
but I think anything beyond that is going to face a serious problem
of greatly increased cost for diminishing returns.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-28 Thread Heikki Linnakangas


Jim C. Nasby wrote:

In addition to/instead of abstracting cmin/cmax to a phantom ID, what
about allowing for two versions of the tuple header, one with cid info
and one without? That would allow for cid info to be stripped out when
pages were written to disk.
  


How exactly would that help? You can't just strip out cid info when 
writing to disk, if you don't want to lose the information.


And it's certainly a lot more complicated than the phantom id thing.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-28 Thread Jim C. Nasby

On Thu, Sep 28, 2006 at 05:13:11PM +0100, Heikki Linnakangas wrote:
 Jim C. Nasby wrote:
 In addition to/instead of abstracting cmin/cmax to a phantom ID, what
 about allowing for two versions of the tuple header, one with cid info
 and one without? That would allow for cid info to be stripped out when
 pages were written to disk.
   
 
 How exactly would that help? You can't just strip out cid info when 
 writing to disk, if you don't want to lose the information.

Erm, sorry, brainfart... yeah, we'd need to be able to write the info
out to disk.

The reason I thought of this is because once the transaction commits, we
have no use for the cid info. So we could do something like have
bgwriter look for tuples that belong to committed transactions before it
writes a page, and strip the cid out of them.

The problem with *that* is that (AFAIK) you'll need cid info again once
you go to update or delete that tuple. And that might obviously need to
spill to disk before the transaction commits.

Back to the drawing board...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Another idea for dealing with cmin/cmax

2006-09-28 Thread ITAGAKI Takahiro


Jim C. Nasby [EMAIL PROTECTED] wrote:

 The reason I thought of this is because once the transaction commits, we
 have no use for the cid info. So we could do something like have
 bgwriter look for tuples that belong to committed transactions before it
 writes a page, and strip the cid out of them.

Your concept is just like as the experimental method that I suggested before
in http://archives.postgresql.org/pgsql-hackers/2005-08/msg01193.php
We can remove cmin and cmax from commited tuples and xmin from frozen tuples
and we might save some bytes in tuple headers.

However, I think our next goal to shrink the headers is 16 bytes. The headers
become 23 bytes using phantom cids and we are limited by alignments, so we will
have no more advantages unless we delete extra 7 bytes in the headers.
...and it seems to be very difficult.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

Re: [HACKERS] Another idea for dealing with cmin/cmax

16 matches

Site Navigation

Mail list logo

Footer information