Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-14 Thread Jack Orenstein

On Sep 13, 2008, at 4:39 AM, Tomasz Ostrowski wrote:


On 2008-09-12 15:52, Jack Orenstein wrote:

Sorry, I misspoke. I have an index, but preferred doing a scan  
without the index in this case.


Why?

The only reason I can think of is that you'd like to avoid disk  
seeking. But you get at most 1 row in 30 seconds, so disk latency  
(only several milliseconds) can be ignored.




Because other parts of our application consume results from the same  
query at normal speed.


Jack

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-13 Thread Tomasz Ostrowski

On 2008-09-12 15:52, Jack Orenstein wrote:

Sorry, I misspoke. I have an index, but preferred doing a scan without 
the index in this case.


Why?

The only reason I can think of is that you'd like to avoid disk seeking. 
But you get at most 1 row in 30 seconds, so disk latency (only several 
milliseconds) can be ignored.


Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-12 Thread Jack Orenstein

Tomasz Ostrowski wrote:

On 2008-09-11 17:21, Jack Orenstein wrote:

The id > last_id trick doesn't work for me -- I don't have an index that would 
support it efficiently.


You do not have a primary key? If you do then you have an index as it is
automatically created.


Sorry, I misspoke. I have an index, but preferred doing a scan without the index 
in this case.


Jack

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-12 Thread Jack Orenstein

Tomasz Ostrowski wrote:

On 2008-09-11 18:03, Jack Orenstein wrote:


When you do:
result = query("select something from sometable")
then all rows of a result will be cached by a client program. 

I am very sure this is not happening. Maybe some rows are being
cached (specifying fetch size), but certainly not all of them. It
used to, with older drivers, (7.4?) but I've been using 8.1 drivers
(at least) for a long time. Maybe some result set options you're
using cause such memory usage?


Wanna bet?

http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
| There a number of restrictions which will make the driver silently
| fall back to fetching the whole ResultSet at once. (...) The
| Connection must not be in autocommit mode. The backend closes cursors
| at the end of transactions, so in autocommit mode the backend will
| have closed the cursor before anything can be fetched from it.

So, when you turn on autocommit then it is caching it all. Fetch size is
ignored.


Well that explains what I've been seeing (autocommit on scan producing behavior 
that looks like SERIALIZABLE). Not the behavior I would prefer, but I understand 
it now.


Jack

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-12 Thread Tomasz Ostrowski
On 2008-09-11 17:21, Jack Orenstein wrote:

> The id > last_id trick doesn't work for me -- I don't have an index that 
> would 
> support it efficiently.

You do not have a primary key? If you do then you have an index as it is
automatically created.

Watch this:

test=> create temporary table test ( id int primary key, data text );
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE

test=> insert into test (select i, 'this is a row number '||i::text from
(select generate_series(1,100) as i) as q);
INSERT 0 100

test=> explain analyze select * from test where id>50
   order by id limit 1;

 QUERY PLAN

 Limit  (cost=0.00..0.07 rows=1 width=36)
 (actual time=0.150..0.151 rows=1 loops=1)
   ->  Index Scan using test_pkey on test
   (cost=0.00..23769.63 rows=322248 width=36)
   (actual time=0.148..0.148 rows=1 loops=1)
 Index Cond: (id > 50)
 Total runtime: 0.191 ms
(4 rows)

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-11 Thread Tomasz Ostrowski
On 2008-09-11 18:03, Jack Orenstein wrote:

>> When you do:
>> result = query("select something from sometable")
>> then all rows of a result will be cached by a client program. 
> 
> I am very sure this is not happening. Maybe some rows are being
> cached (specifying fetch size), but certainly not all of them. It
> used to, with older drivers, (7.4?) but I've been using 8.1 drivers
> (at least) for a long time. Maybe some result set options you're
> using cause such memory usage?

Wanna bet?

http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
| There a number of restrictions which will make the driver silently
| fall back to fetching the whole ResultSet at once. (...) The
| Connection must not be in autocommit mode. The backend closes cursors
| at the end of transactions, so in autocommit mode the backend will
| have closed the cursor before anything can be fetched from it.

So, when you turn on autocommit then it is caching it all. Fetch size is
ignored.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-11 Thread Jack Orenstein

Tomasz Ostrowski wrote:

On 2008-09-11 17:21, Jack Orenstein wrote:


Then do the processing in separate transactions like this (in pseudocode):
The id > last_id trick doesn't work for me -- I don't have an index that would 
support it efficiently.


Turning on autocommit seems to work, I'm just not clear on the reason why.


Not knowing would bite you some time.

Please provide some (pseudo-)code on what you do. Do you mark rows as
processed? Do you save output of processing to a database? IMHO without
it it is hard to solve a mystery but I'll try below.


No, it's really as simple as what I said in earlier email. The scan just
walks through BIG very slowly. On another connection, we're inserting/updating
the same table, and in each transaction also updating TINY.




I played around with a JDBC test program, and so far cannot see how
the autocommit mode causes variations in what is seen by the
scan. The behavior I've observed is consistent with the SERIALIZABLE
isolation level, but 1) I thought the default was READ COMMITTED


When you do:
result = query("select something from sometable")
then all rows of a result will be cached by a client program. 


I am very sure this is not happening. Maybe some rows are being cached 
(specifying fetch size), but certainly not all of them. It used to, with older 
drivers, (7.4?) but I've been using 8.1 drivers (at least) for a long time. 
Maybe some result set options you're using cause such memory usage?


Jack

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-11 Thread Tomasz Ostrowski
On 2008-09-11 17:21, Jack Orenstein wrote:

>> Then do the processing in separate transactions like this (in pseudocode):
> The id > last_id trick doesn't work for me -- I don't have an index that 
> would 
> support it efficiently.
> 
> Turning on autocommit seems to work, I'm just not clear on the reason why.

Not knowing would bite you some time.

Please provide some (pseudo-)code on what you do. Do you mark rows as
processed? Do you save output of processing to a database? IMHO without
it it is hard to solve a mystery but I'll try below.

> I played around with a JDBC test program, and so far cannot see how
> the autocommit mode causes variations in what is seen by the
> scan. The behavior I've observed is consistent with the SERIALIZABLE
> isolation level, but 1) I thought the default was READ COMMITTED

When you do:
result = query("select something from sometable")
then all rows of a result will be cached by a client program. To see
effects of serialization modes you have to issue another query in the
same transaction or use a cursor.

Check memory usage of your client program - you'll see that it needs a
lot of memory for query results.

> 2) why does the 
> accumulation of row versions have anything to do with autocommit mode (as 
> opposed to isolation level) on a connection used for the scan?

I think after caching a result of a query you start processing your
rows. When you finish processing your first row you update your database
to save results. In autocommit mode a transaction in which you do this
update is automatically commited and ended. When autocommit is turned
off a transaction is not ended so from now on vacuum is not working
until you finish processing all rows.

Regards
Tometzky

PS. Please keep a CC to the list.
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-11 Thread Tomasz Ostrowski
On 2008-09-10 16:46, Jack Orenstein wrote:

> Application requirement. We need to do something for each row retrieved from 
> BIG 
> and the something is expensive. We do the scan slowly (30 second sleep inside 
> the loop) to amortize the cost.

Then do the processing in separate transactions like this (in pseudocode):

$last_id = -1;
do {
begin transaction;
$result = select * from bigtable
where id>$last_id
and processed=false
order by id limit 1;
if ( empty($result) ) {
rollback;
break;
}
do_something_expensive_with($result[0]);
update bigtable set processed=true where id=$result[0][id];
commit;
sleep 30;
} while (true);

Always avoid long running transactions. This is recommended for any
transactional database.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
  Winnie the Pooh

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-10 Thread Jack Orenstein

Martijn van Oosterhout wrote:

On Wed, Sep 10, 2008 at 09:45:04AM -0400, Jack Orenstein wrote:
Am I on the right track -- does autocommit = false for the BIG scan force 
versions of TINY to accumulate? I played around with a JDBC test program, 
and so far cannot see how the autocommit mode causes variations in what is 
seen by the scan. The behavior I've observed is consistent with the 
SERIALIZABLE isolation level, but 1) I thought the default was READ 
COMMITTED, and 2) why does the accumulation of row versions have anything 
to do with autocommit mode (as opposed to isolation level) on a connection 
used for the scan?


Vacuum can only clean up stuff older than the oldest open transaction.
So if you have a transaction which is open for hours then stuff made
since then it can't be vacuumed. The solution is: don't do that.

What I don't understand from your description is why your scan is slow


Application requirement. We need to do something for each row retrieved from BIG 
and the something is expensive. We do the scan slowly (30 second sleep inside 
the loop) to amortize the cost.



and how the autocommit relates to this. Postgresql only cares about
when you start and commit transactions, and I can't get from your
description when exactly that happens.


If the slow scan is done with autocommit = true, then the transactions updating 
BIG and TINY run with no degradation in performance (as long as TINY is vacuumed 
frequently).


If the slow scan is done with autocommit = false, then the transactions updating 
BIG and TINY get slower and slower and the TINY table's file bloats.


I guess the question is this: What are the transaction boundaries for a scan 
done with autocommit = false? (The connection has autcommit false, and the 
connection is used for nothing but the scan.)


Jack

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-10 Thread Alvaro Herrera
Martijn van Oosterhout wrote:

> Vacuum can only clean up stuff older than the oldest open transaction.
> So if you have a transaction which is open for hours then stuff made
> since then it can't be vacuumed. The solution is: don't do that.

Actually it's worse than that: older than the oldest transaction that
was active at the time when the current oldest transaction created its
snapshot.

As for autocommit, my guess is that the driver is doing "COMMIT; BEGIN".
This should not cause much of a problem in 8.3 compared to previous
releases, because the transaction gets its Xid at the time the first
command write command is run (previously it was grabbed when the
transaction started).  Also, I thought recent versions of the JDBC
driver did not issue the BEGIN right after COMMIT, so I'm surprised that
there's any visible difference at all.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-10 Thread Martijn van Oosterhout
On Wed, Sep 10, 2008 at 09:45:04AM -0400, Jack Orenstein wrote:
> Am I on the right track -- does autocommit = false for the BIG scan force 
> versions of TINY to accumulate? I played around with a JDBC test program, 
> and so far cannot see how the autocommit mode causes variations in what is 
> seen by the scan. The behavior I've observed is consistent with the 
> SERIALIZABLE isolation level, but 1) I thought the default was READ 
> COMMITTED, and 2) why does the accumulation of row versions have anything 
> to do with autocommit mode (as opposed to isolation level) on a connection 
> used for the scan?

Vacuum can only clean up stuff older than the oldest open transaction.
So if you have a transaction which is open for hours then stuff made
since then it can't be vacuumed. The solution is: don't do that.

What I don't understand from your description is why your scan is slow
and how the autocommit relates to this. Postgresql only cares about
when you start and commit transactions, and I can't get from your
description when exactly that happens.

Rule of thumb: don't hold transaction open unnessarily long.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


[GENERAL] Autocommit, isolation level, and vacuum behavior

2008-09-10 Thread Jack Orenstein
I'm trying to understand the effect of autocommit on vacuum behavior (postgres 
8.3, if it matters). Let's suppose you have two tables, BIG and TINY in a 
database accessed through JDBC.  BIG has lots of rows. There are inserts, 
updates, and every so often there is a scan of the entire table. The scan is 
slow, e.g. one row every 30 seconds. TINY has one row, which contains summary 
information from BIG. It is updated every time that BIG is inserted or updated. 
BIG is vacuumed weekly, and TINY is vacuumed every 1000 updates.


What I'm observing is that as my test program runs, transactions (insert/update 
BIG; update TINY) gets slower and slower, and the file storing the TINY table 
gets very big. I'm guessing that the long-running scan of BIG forces versions of 
the one row in TINY to accumulate, (just in case the TINY table is viewed, the 
connection has to have the correct view). As these accumulate, each update to 
TINY takes more and more time, and everything slows down.


I wrote a little JDBC test program to test this theory.  Long scans (with the 30 
second sleep) and with autocommit = false produces the problem described. 
Shorter scans (e.g. no sleep between rows of the BIG scan) produce better 
results. Also, if the scan is done on a connection with autocommit = true, 
everything works fine -- no slowdown, and no bloat of the TINY file.


Am I on the right track -- does autocommit = false for the BIG scan force 
versions of TINY to accumulate? I played around with a JDBC test program, and so 
far cannot see how the autocommit mode causes variations in what is seen by the 
scan. The behavior I've observed is consistent with the SERIALIZABLE isolation 
level, but 1) I thought the default was READ COMMITTED, and 2) why does the 
accumulation of row versions have anything to do with autocommit mode (as 
opposed to isolation level) on a connection used for the scan?


Jack Orenstein

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general