Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)

2005-03-18 Thread Thomas F. O'Connell
You might unknowingly be asking a FAQ. See the end of this section:
http://www.postgresql.org/docs/faqs.FAQ.html#4.6
Depending on the version of postgres you're running and the data types  
of the symbol and source columns, you might need to quote (or cast)  
your constant data for symbol and source, e.g.:

SELECT *
FROM article
WHERE symbol='12646'
AND source = '19'
ORDER BY time DESC
LIMIT 1000;
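
If the planner still refuses the (symbol, source, time) index even with the
constants quoted, spelling the ORDER BY out over all three index columns (as
discussed elsewhere in this thread) lets a backward index scan satisfy the
query. A sketch, assuming that composite index already exists:

SELECT *
FROM article
WHERE symbol = '12646'
  AND source = '19'
ORDER BY symbol DESC, source DESC, time DESC
LIMIT 1000;
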
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Mar 18, 2005, at 3:56 AM, Alex Stapleton wrote:
Woops sorry we have indexes on (symbol, source, time) and there is no  
date
column :/

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC
LIMIT 1000;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=321163.29..321165.79 rows=1000 width=466) (actual time=33243.924..33246.021 rows=1000 loops=1)
   ->  Sort  (cost=321163.29..321402.31 rows=95609 width=466) (actual time=33243.917..33244.626 rows=1000 loops=1)
         Sort Key: "time"
         ->  Seq Scan on article  (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.022..32979.685 rows=42959 loops=1)
               Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33258.706 ms
(6 rows)

explain analyze SELECT * FROM article WHERE symbol=12646 AND source = 19;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on article  (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.021..33275.433 rows=42959 loops=1)
   Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33320.920 ms
(3 rows)

We can't use CLUSTER because we need the DB up all the time.
To me, the explain analyze suggests that it's the I/O that is taking most of the time.
-Original Message-
From: Alban Hertroys [mailto:[EMAIL PROTECTED]
Sent: 18 March 2005 09:48
To: Alex Stapleton
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition
(and caches)
Alex Stapleton wrote:
SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time
DESC
LIMIT 1000;
To get the latest 1000 rows for that symbol and source.

My (not yet implemented) solution to this problem is to add a  
SEQUENCE and
index it so that by adding a WHERE id > [max_id]-1000 and ordering by  
time
DESC will reduce the I/O quite a lot. Am I right here? It would be  
nice if
there was a way to get PostgreSQL to try and precache the tables  
pages as
well, is there anyway I could achieve something like that? I have  
toyed
with
creating a ramdisk to store a lookup table of sorts on (we only care  
about
a
few columns initially) to speed this up a bit but its a right pain in  
the
arse to do by the looks of things.
First question that always gets asked here: What's the output of  
explain
analyse? Without that, people here can't see where the slowdown is.

I expect though, that the problem is the ordering by time. I imagine
that you could create an index on time, maybe truncated to months or
something similar (You can create indices based on functions). That
index alone should speed up the ordering already.
It could also be used to cluster the table, which should speed up  
things
some more, I suppose.
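
A hedged sketch of both ideas, assuming "time" is a plain timestamp column
(date_trunc() is only usable in an index for timestamp without time zone,
since it is not immutable for timestamptz); the index name is made up:

-- functional index truncating time to months, as suggested above
CREATE INDEX article_month_idx ON article (date_trunc('month', "time"));

-- an index can also drive CLUSTER, which rewrites the table in index order;
-- note that CLUSTER locks the table while it runs
CLUSTER article_month_idx ON article;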

--
Alban Hertroys
MAG Productions
T: +31(0)53 4346874
F: +31(0)53 4346876
E: [EMAIL PROTECTED]
W: http://www.magproductions.nl


[GENERAL] ANN: Bricolage 1.8.5 Released

2005-03-18 Thread David Wheeler
The Bricolage development team is pleased to announce the release of
Bricolage 1.8.5. This maintenance release addresses a number of issues in
Bricolage 1.8.3 and adds a number of improvements (there was no announcement
for the short-lived 1.8.4 release). The SOAP server in particular sees
improvements in this release, with improved character set support; better
support for related stories and media using URIs in addition to IDs; and
support for top-level element relations. Issues with the ordering of story
elements have also been corrected, as well as errors when attempting to
revert a story or media document or template.

Here are the other highlights of this release:

 Improvements
  * Added Linux startup script contrib/start_scripts/linux. [David]

  * Related story and media elements managed through the SOAP server
    can now use a combination of URI and site ID to identify related
    assets in addition to the existing approach of using story and
    media IDs. [David]

  * A list of subelements is now less likely to mysteriously become
    out of order and thus lead to strange action-at-a-distance errors.
    And even if they do become out of order, the error message will be
    more appropriate ("Warning! State inconsistent" instead of "Can't
    call method 'get_name' on an undefined value"). Reported by Curtis
    Poe. [David]

  * The SOAP media interface now supports creating relationships
    between the media document's elements and other story and media
    documents, just like the SOAP story interface does. [David]

  * The SOAP interface now supports Related stories and media on story
    type and media type elements just as in the UI. This involved the
    somewhat hackish necessity of including the "related_story_id" and
    "related_media_id" (or "related_story_uri" and "related_media_uri")
    attributes in the "elements" XML element, but it does the trick.
    [David]

 Bug Fixes
  * Calls to publish documents via SOAP will no longer fail if the
    published_version attribute is not specified and the document to
    be published has never been published before. [David]

  * The Bricolage virtual FTP server will no longer fail to start if
    Template Toolkit is installed but its version number is less than
    2.14. Reported by Adam Rinehart. [David]

  * Stories and Media created or updated via the SOAP interface will
    now associate contributors of the appropriate type, instead of
    "All Contributors". [Scott & David]

  * Deleting an element that has a template no longer causes an error.
    Thanks to Susan for the spot! [David]

  * Eliminated encoding errors when using the SOAP interface to output
    stories, media, or templates with wide characters. Reported by
    Scott Lanning. [David]

  * Reverting (stories, media, templates) no longer gives an error.
    Reported by Simon Wilcox, Rachel Murray, and others. [David]

  * Publishing a published version of a document that has a later
    version in workflow will no longer cause that later version to be
    mysteriously removed from workflow. This could be caused by passing
    a document looked up using published_version to list() to
    $burner->publish_another in a template. [David]

  * The SOAP server story and media interfaces now support elements
    that contain both related stories and media, rather than one or
    the other. [David]

  * Attempting to preview a story or media document currently checked
    out to another user no longer causes an error. Reported by Paul
    Orrock. [David]

  * Custom fields with default values now have their values included
    when they are added to stories and media. Thanks to Clare Parkinson
    for the spot! [David]

  * The bric_queued script now requires a username and password and
    will authenticate the user. This user will then be used for logging
    events. All events logged when a job is run via the UI are now also
    logged by bric_queued. [Mark and David]

  * Preview redirections now use the protocol setting of the preview
    output channel if it's available, and fall back on using "http://"
    when it's not, instead of using the hard-coded "http://". Thanks to
    Martin Bacovsky for the spot! [David]

  * The has_keyword() method in the Business class (from which the
    story and media classes inherit) now works. Thanks to Clare
    Parkinson for the spot! [David]

  * Clicking a link in the left-side navigation after the session has
    expired now causes the whole window to show the login form, rather
    than it showing inside the nav frame, which was useless. [Marshall]

  * The JavaScript that validates form contents once again works with
    htmlArea, provided htmlArea itself is patched. See
    http://sour

Re: [GENERAL] Query performance problem

2005-03-18 Thread Phil Daintree
I can also do the same thing without sub-queries - I messed about some more 
since I was keen to ensure backward compatibility with prior versions of 
mysql that have left/right joins but no subqueries ... quite a bit quicker 
still!

Query took 0.0037 sec - 1/10th of the sub-query time.

SELECT chartmaster.accountcode, periods.periodno
FROM chartmaster INNER JOIN periods ON True
LEFT JOIN chartdetails ON chartmaster.accountcode = chartdetails.accountcode 
AND periods.periodno = chartdetails.period
WHERE periods.periodno >=1 AND periods.periodno <=63 AND 
chartdetails.accountcode IS NULL  LIMIT 0 , 30


In postgres:

SQL executed.

Total runtime: 12.241 ms

Still this is a third of the time of the sub-query route but 4 times longer 
than mysql - this must be an install issue?


Thanks again for this idea Paul

phil


On Fri, 18 Mar 2005 14:07, you wrote:
> See the syntax for INSERT ... SELECT shown here:
> http://www.postgresql.org/docs/8.0/static/sql-insert.html
>
> Instead of doing a nested loop to INSERT new records, do it like this:
>
> For ($period = start; $period < end; $period++)
> {
> INSERT INTO chartdetails (accountcode, period)
>   SELECT accountcode, $period FROM chartdetails WHERE
> (accountcode, $period) NOT IN (
>   SELECT accountcode, period FROM chartdetails WHERE period =
> $period
>   );
> }
>
> Or if you have some table that has 1 row for each period (call it
> "periods") then you could simply do:
>
> INSERT INTO chartdetails (accountcode, period)
>  SELECT accountcode, period FROM accountcode, period
>  WHERE (period BETWEEN $start AND $end) AND (accountcode,
> period) NOT IN (
>  SELECT accountcode, period FROM chartdetails WHERE period
> BETWEEN $start AND $end
>   );
>
> Note to others: see the legitimate use of an unconstrained CROSS JOIN?
>
> --
>
> Postgres's SELECT count(*) is slow if many records meet the WHERE clause
> being used.  It looks like you're only using testing for 0 or >0 in your
> query, so you could use:
>
> SELECT EXISTS (SELECT 1 FROM chartdetails WHERE ...);
>
> This will be much faster since with EXISTS, postgres only runs the query
> long enough to find out whether even one row would be returned--if so,
> it stops.
>
> Regards,
> Paul Tillotson
>
> Phil Daintree wrote:
> >Dear psqlers,
> >
> >I need your help!
> >
> >I administer/develop an open source PHP accounting software project
> > (webERP) [snip]
> >
> >
> > $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
> >
> > While ($AccountRow = DB_fetch_array($ChartAccounts)){
> >
> > for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
> >
> > echo '' . _('Period Number') . ' ' . $PeriodNo . 
> > '';
> >
> > // Check if there is an chart details record set up
> > $sql = 'SELECT count(*) FROM chartdetails
> > WHERE 
> > accountcode='.$AccountRow['accountcode'].'
> > AND period=' . $PeriodNo;
> > $InsChartDetails = DB_query($sql,$db,'','','',false);
> > $CountRows = DB_fetch_row($InsChartDetails);
> > $AccountExistsAlready = $CountRows[0];
> > DB_free_result($InsChartDetails);
> > if(! $AccountExistsAlready) {
> > $sql = 'INSERT INTO chartdetails (accountcode,
> > period)
> > VALUES (' . $AccountRow['accountcode'] 
> > . ',
> > ' . $PeriodNo . ')';
> > $InsChartDetails = DB_query($sql,$db);
> > DB_free_result($InsChartDetails);
> > }
> >
> > }
> >
> > /*Now run through each of the new chartdetail records created for each
> >account and update them with the B/Fwd and B/Fwd budget no updates would
> > be required where there were previously no chart details set up ie
> >FirstPeriodPostedTo > 0 */
> >
> > for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
> >
> > $sql = 'SELECT accountcode,
> > period,
> > actual + bfwd AS cfwd,
> > budget + bfwdbudget AS cfwdbudget
> > FROM chartdetails WHERE period =' . ($PeriodNo  
> > - 1);
> > $ChartDetailsCFwd = DB_query($sql,$db);
> >
> > while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
> >
> > $sql = 'UPDATE chartdetails SET bfwd =' . 
> > $myrow['cfwd'] . ',
> > bfwdbudget =' . 
> > $myrow['cfwdbudget'] . '
> >

Re: [GENERAL] Help with transactions

2005-03-18 Thread Michael Fuhr
On Fri, Mar 18, 2005 at 09:22:52AM -0500, Stephen Howie wrote:

> I have a Java program that accepts print streams and inserts them into a
> spool table as a bytea.  This fires a pl/pgsql trigger that passes the
> bytea to a pl/perl function, which processes the bytea and spits the
> results back as an array.  It then proceeds to insert the data into
> multiple tables.  The problem is that, of the two tables the data is
> inserted into inside this transaction, one has a foreign key to the
> other.  As you can guess I get a foreign key violation because the
> transaction is not committed

A transaction doesn't need to be committed for operations to see
the effects of previous operations in the same transaction, but
there could be visibility problems related to what happens when.

Could you post the simplest self-contained example that demonstrates
the problem?  It'll be easier to understand the interactions if we
can see the exact code.  In simple tests I successfully did what
you describe, so apparently my experiment didn't duplicate what
you're doing.

What version of PostgreSQL are you using?

> and as far as I understand PostgreSQL does not support dirty reads or 
> nested transactions.

You probably need neither.  PostgreSQL 8.0 supports nested transactions
in the form of savepoints.

> 1) Is there another way to handle this transaction that would
> resolve this violation without using dirty reads, and

Let's see what you're doing before thinking dirty reads would be a
solution.  Or perhaps you're thinking about them in a slightly
different sense than transaction literature usually describes.

> 2) It looks like running the trigger after insert on a table does
> not run as a separate transaction.

Why would it?

> Is the insert to that table supposed to fail if the trigger fails?
> To me that defeats the purpose of having a trigger after insert.

An after trigger is about visibility.  Here's an excerpt from the
"Triggers" chapter in the documentation:

  Row after triggers are most sensibly used to propagate the updates
  to other tables, or make consistency checks against other tables.
  The reason for this division of labor is that an after trigger
  can be certain it is seeing the final value of the row, while a
  before trigger cannot; there might be other before triggers firing
  after it.

An operation ain't over 'til it's over: if an after trigger doesn't
like what it sees, it can still abort the operation by raising an
exception.  That doesn't defeat the purpose -- it's part of the
purpose.
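
A minimal, hypothetical sketch of that visibility, plus a DEFERRABLE
constraint, which is one way out if the inserts cannot be reordered (the
table names here are made up):

BEGIN;
CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child  (parent_id integer REFERENCES parent (id)
                                       DEFERRABLE INITIALLY DEFERRED);
-- with a deferred constraint the child row may even go in first;
-- the check only runs at COMMIT, when both rows exist
INSERT INTO child  VALUES (1);
INSERT INTO parent VALUES (1);
COMMIT;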

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [GENERAL] plpython function problem workaround

2005-03-18 Thread Michael Fuhr
On Fri, Mar 18, 2005 at 10:33:01AM -0500, Tom Lane wrote:
> 
> I think it would be reasonable to back-patch a small fix to convert CRLF.
> The sort of rewrite Marco is considering, I wouldn't back-patch.

I just submitted a small patch to convert CRLF => LF, CR => LF.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [GENERAL] Query performance problem

2005-03-18 Thread Phil Daintree
The first time I ran it, it took 5127.243 ms ... then I did a full vacuum.

then ...

SQL executed.

Total runtime: 33.707 ms


I am keen to just have the one lot of code all in the scripts ... so  I was 
pleased when the identical sql also worked on mysql!!!

Your SQL-query has been executed successfully (Query took 0.0350 sec)

SQL-query : [Edit] [Explain SQL] [Create PHP Code]

SELECT chartmaster.accountcode, periods.periodno
FROM chartmaster, periods
WHERE (
periods.periodno
BETWEEN 1 AND 12
) AND (
chartmaster.accountcode, periods.periodno
) NOT IN 
(SELECT accountcode, period
FROM chartdetails
WHERE period
BETWEEN 1 AND 12
) LIMIT 0 , 30

You'll notice the discrepancy on the timings though!

Whilst pg is not performing the way mysql does with innodb - it is at least 
usable this way. I am guessing there is some gremlin with my install - I'll 
try an upgrade to v 8.

Phil


On Fri, 18 Mar 2005 14:07, you wrote:
> See the syntax for INSERT ... SELECT shown here:
> http://www.postgresql.org/docs/8.0/static/sql-insert.html
>
> Instead of doing a nested loop to INSERT new records, do it like this:
>
> For ($period = start; $period < end; $period++)
> {
> INSERT INTO chartdetails (accountcode, period)
>   SELECT accountcode, $period FROM chartdetails WHERE
> (accountcode, $period) NOT IN (
>   SELECT accountcode, period FROM chartdetails WHERE period =
> $period
>   );
> }
>
> Or if you have some table that has 1 row for each period (call it
> "periods") then you could simply do:
>
> INSERT INTO chartdetails (accountcode, period)
>  SELECT accountcode, period FROM accountcode, period
>  WHERE (period BETWEEN $start AND $end) AND (accountcode,
> period) NOT IN (
>  SELECT accountcode, period FROM chartdetails WHERE period
> BETWEEN $start AND $end
>   );
>
> Note to others: see the legitimate use of an unconstrained CROSS JOIN?
>
> --
>
> Postgres's SELECT count(*) is slow if many records meet the WHERE clause
> being used.  It looks like you're only using testing for 0 or >0 in your
> query, so you could use:
>
> SELECT EXISTS (SELECT 1 FROM chartdetails WHERE ...);
>
> This will be much faster since with EXISTS, postgres only runs the query
> long enough to find out whether even one row would be returned--if so,
> it stops.
>
> Regards,
> Paul Tillotson
>
> Phil Daintree wrote:
> >Dear psqlers,
> >
> >I need your help!
> >
> >I administer/develop an open source PHP accounting software project
> > (webERP) [snip]
> >
> >
> > $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
> >
> > While ($AccountRow = DB_fetch_array($ChartAccounts)){
> >
> > for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
> >
> > echo '' . _('Period Number') . ' ' . $PeriodNo . 
> > '';
> >
> > // Check if there is an chart details record set up
> > $sql = 'SELECT count(*) FROM chartdetails
> > WHERE 
> > accountcode='.$AccountRow['accountcode'].'
> > AND period=' . $PeriodNo;
> > $InsChartDetails = DB_query($sql,$db,'','','',false);
> > $CountRows = DB_fetch_row($InsChartDetails);
> > $AccountExistsAlready = $CountRows[0];
> > DB_free_result($InsChartDetails);
> > if(! $AccountExistsAlready) {
> > $sql = 'INSERT INTO chartdetails (accountcode,
> > period)
> > VALUES (' . $AccountRow['accountcode'] 
> > . ',
> > ' . $PeriodNo . ')';
> > $InsChartDetails = DB_query($sql,$db);
> > DB_free_result($InsChartDetails);
> > }
> >
> > }
> >
> > /*Now run through each of the new chartdetail records created for each
> >account and update them with the B/Fwd and B/Fwd budget no updates would
> > be required where there were previously no chart details set up ie
> >FirstPeriodPostedTo > 0 */
> >
> > for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
> >
> > $sql = 'SELECT accountcode,
> > period,
> > actual + bfwd AS cfwd,
> > budget + bfwdbudget AS cfwdbudget
> > FROM chartdetails WHERE period =' . ($PeriodNo  
> > - 1);
> > $ChartDetailsCFwd = DB_query($sql,$db);
> >
> > while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
> >
> > $sql = 'UPDATE chartdetails SET bfwd =' . 
> > $myrow['cfwd'] . ',
> > 

Re: [GENERAL] SMP scaling

2005-03-18 Thread Tom Lane
Mark Rae <[EMAIL PROTECTED]> writes:
> Even with the numa support, which makes sure any memory allocated by 
> malloc or the stack ends up local to the processor which originally 
> called it, and then continues to schedule the process on that CPU,
> there is still the problem that all table accesses* go through 
> the shared buffer cache which resides in one location.
> [* is this true in all cases?]

Temp tables are handled in backend-local memory, but all normal tables
have to be accessed through shared buffers.  The implications of not
doing that are bad enough that it's hard to believe it could be a win
to change.  (In short: the hardware may not like syncing across
processors, but it can still do it faster than we could hope to do in
software.)

> it looks like SGI already have a solution in the form of 
> symmetric data objects. 
> In particular, the symmetric heap, an area of shared memory 
> which is replicated across all memory domains with the
> coherency being handled in hardware.

Hmm, do they support spinlock objects in this memory?  If so it could be
just the right thing.

> So it looks like all that might be needed is to replace the
> shmget calls in src/backend/port with the equivalent SGI functions.

Send a patch ;-)

regards, tom lane



Re: [GENERAL] Query performance problem

2005-03-18 Thread Paul Tillotson
Phil Daintree wrote:
Appreciated you help Paul - many thanks for taking the time.
 

I view this as merely passing on all the pearls of wisdom I have gleaned 
from this list.  : )

Advice:
Are you running this inside a transaction?  Do so, because if you don't,
then each UPDATE or INSERT or SELECT runs inside its own transaction,
and committing each transaction has overhead associated with it.
   


It looks like just putting this code inside a transaction has dramatically 
reduced the problem. Of course I knew this but it needed me to be embarassed 
to actually do it :-)

 

Glad to hear that this helped.  In case you are interested, the reason 
this makes such a dramatic difference is that each transaction's commit 
record must be logged to the commit log, which is a sequentially written 
file.  (Thus, you can only commit one transaction per revolution of the 
disk, and so if you have a 7200 rpm disk, you can't get more than 120 
transactions / second on a safely configured system unless your drive 
has a battery-backed write cache.)
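
In other words (a sketch with made-up values): issue one BEGIN before the
whole run and one COMMIT after it, instead of letting every statement commit
on its own.

BEGIN;
INSERT INTO chartdetails (accountcode, period) VALUES (1000, 1);
UPDATE chartdetails SET bfwd = 0, bfwdbudget = 0
 WHERE accountcode = 1000 AND period >= 1;
-- ... all the remaining INSERTs and UPDATEs for the run ...
COMMIT;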

This block of code is INSIDE a while loop that loops once for each row
in chartmaster:
for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
$sql = 'SELECT accountcode,
period,
actual + bfwd AS cfwd,
budget + bfwdbudget AS cfwdbudget
FROM chartdetails WHERE period =' . ($PeriodNo  
- 1);
$ChartDetailsCFwd = DB_query($sql,$db);
while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
$sql = 'UPDATE chartdetails SET bfwd =' . 
$myrow['cfwd'] . ',
bfwdbudget =' . 
$myrow['cfwdbudget'] . '
WHERE accountcode = ' . 
$myrow['accountcode'] . '
AND period >=' . $PeriodNo;
$UpdChartDetails = DB_query($sql,$db, '', '', 
'', false);
DB_free_result($UpdChartDetails);
}
DB_free_result($ChartDetailsCFwd);
}
It looks like you're updating the same row in chartmaster multiple
times.  
   

chartmaster is not being updated ... 

Sorry--I mean chartdetails.
One tip to remember:  If you have a statement like this:
UPDATE footable SET foocol = 'value' WHERE fooid = 'id';
And it is the case that foocol might already equal value, then write 
this instead:
UPDATE footable SET foocol = 'value' WHERE fooid = 'id' AND foocol <> 
'value';

This will help because no disk write will actually happen if foocol 
happens to already equal value.

chartdetails is - the chartdetails 
relating to each chartmaster record for all periods >= $PeriodNo I have to 
update all the following periods as the balance b/fwd for all successive 
periods has now increased if we post transaction back in time normally there 
might only be a few chartdetails records for the chartmaster account under 
review with chartdetails records with a period later than the one being 
posted.

 

Am I correct in thinking that bfwd is basically a running tally of 
actual, and bfwdbudget is a running tally of budget, as one might 
normally find in a spreadsheet?

If so, you could use this view to calculate the correct value for every 
location in the table:

CREATE VIEW newtotals AS SELECT
   thismonth.accountcode,
   thismonth.periodno,
   (SELECT SUM(actual) FROM chartdetails AS q1
    WHERE q1.accountcode = thismonth.accountcode
      AND q1.periodno < thismonth.periodno) as cfwd,
   (SELECT SUM(budget) FROM chartdetails AS q1
    WHERE q1.accountcode = thismonth.accountcode
      AND q1.periodno < thismonth.periodno) as cfwdbudget
FROM chartdetails AS thismonth;

And then you could use an update statement:
UPDATE chartdetails
SET bfwd = cfwd, bfwdbudget = cfwdbudget
FROM newtotals
WHERE chartdetails.accountcode = newtotals.accountcode AND 
chartdetails.periodno = newtotals.periodno -- JOIN condition
   AND period BETWEEN $CreateTo AND $CreateFrom
   AND (bfwd <> cfwd OR bfwdbudget <> cfwdbudget); -- AVOID needless 
updates.

Since I don't have your tables to work with, this might need minor 
syntax tweaking, but I'm pretty sure it will work.
I think MySQL doesn't support views yet, but you could replace 
"newtotals" in the above update with a big subselect (which I think they 
are supposed to support in the latest 4.x version.)

Also: if you don't already have one, that UPDATE statement would 
probably use a compound index on (accountcode, periodno).
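
For example (a sketch; the index name is made up, and in the application's
own schema the column is called period rather than periodno):

CREATE INDEX chartdetails_acct_period ON chartdetails (accountcode, period);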

Now I begin to comprehend why CompiereERP doesn't support MySQL.  ; )
a row in chartdetails will be updated every time there is a gltrans posting to 
the period and account of the chartdetails ie quite often.
 

If it gets updated often it will need vacuuming often as well.
M

Re: [GENERAL] Vaccum analyze.

2005-03-18 Thread Lonni J Friedman
On Fri, 18 Mar 2005 20:28:50 -0300, Fernando Lujan <[EMAIL PROTECTED]> wrote:
> Hi folks,
> 
> I wanna know from you, how often must I run vaccum analyze on my db?
> 
> Once per day, twice... One per hour and so on...
> 
> I didn't find a especific document about this question.

That's because it depends on how you are using the database. In
general, running it once/day is a minimum safe assumption.  However,
if the data is changing really quickly, then you should run it more
frequently.   The only way to judge is to run it and see how much work
is being done each time.
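
One common baseline (hedged: the right frequency depends entirely on your
write rate) is a nightly database-wide pass scheduled from cron, for example
via "vacuumdb --all --analyze", which simply runs the following in each
database:

VACUUM ANALYZE;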


-- 
~
L. Friedman[EMAIL PROTECTED]
LlamaLand   http://netllama.linux-sxs.org



Re: [GENERAL] Vaccum analyze.

2005-03-18 Thread Dann Corbit
Search for "pg_autovacuum"

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Fernando Lujan
Sent: Friday, March 18, 2005 3:29 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Vaccum analyze.

Hi folks,

I wanna know from you, how often must I run vaccum analyze on my db?

Once per day, twice... One per hour and so on...

I didn't find a especific document about this question.

Thanks in advance.

Fernando Lujan



[GENERAL] Vaccum analyze.

2005-03-18 Thread Fernando Lujan
Hi folks,
I wanna know from you, how often must I run vacuum analyze on my db?
Once per day, twice... once per hour, and so on...
I didn't find a specific document about this question.
Thanks in advance.
Fernando Lujan


Re: [GENERAL] Statistics with PostgreSQL

2005-03-18 Thread Dann Corbit
/* On the test stub: */
[snip]
double   data[30];
int main(void)
{
size_t  i;
size_t  size = sizeof(data) / sizeof(data[0]);
for (i = 0; i < size; i++) {
data[i] = rand();
}


for (i = 0; i < size; i++) {
cout << data[i] << endl;
}

cout << "1st item is " << RandomSelect(data, 0, size - 1, 0) <<
endl;
cout << "2nd item is " << RandomSelect(data, 0, size - 1, 1) <<
endl;
cout << "3rd item is " << RandomSelect(data, 0, size - 1, 2) <<
endl;
for (i = 4; i < size; i++)
cout  << i << "th item is " << RandomSelect(data, 0, size - 1,
i) << endl;
return 0;
}

/*
The positions are cardinal, so the first three queries above should be:
cout << "1st item is " << RandomSelect(data, 0, size - 1, 1) <<
endl;
cout << "2nd item is " << RandomSelect(data, 0, size - 1, 2) <<
endl;
cout << "3rd item is " << RandomSelect(data, 0, size - 1, 3) <<
endl;
*/




Re: [GENERAL] SMP scaling

2005-03-18 Thread Mark Rae
On Fri, Mar 18, 2005 at 01:31:51PM -0500, Tom Lane wrote:
> BTW, although I know next to nothing about NUMA, I do know that it is
> configurable to some extent (eg, via numactl).  What was the
> configuration here exactly, and did you try alternatives?  Also,
> what was the OS exactly?  (I've heard that RHEL4 is a whole lot better
> than RHEL3 in managing NUMA, for example.  This may be generic to 2.6 vs
> 2.4 Linux kernels, or maybe Red Hat did some extra hacking.)

The Altix uses a 2.4.21 kernel with SGI's own modifications
to support up to 256 CPUs and their NUMALink hardware.
(Some of which has become the NUMA code in the 2.6 kernel)

Even with the numa support, which makes sure any memory allocated by 
malloc or the stack ends up local to the processor which originally 
called it, and then continues to schedule the process on that CPU,
there is still the problem that all table accesses* go through 
the shared buffer cache which resides in one location.
[* is this true in all cases?]

I was about to write a long explanation about how the only way 
to scale out to this size would be to have separate buffer caches in
each memory domain, and this would then require some kind of cache
coherency mechanism. But after reading a few bits of documentation, 
it looks like SGI already have a solution in the form of 
symmetric data objects. 

In particular, the symmetric heap, an area of shared memory 
which is replicated across all memory domains with the
coherency being handled in hardware.

So it looks like all that might be needed is to replace the
shmget calls in src/backend/port with the equivalent SGI functions.

-Mark



Re: [GENERAL] pg_dump large-file support > 16GB

2005-03-18 Thread Tom Lane
Rafael Martinez Guerrero <[EMAIL PROTECTED]> writes:
> We are trying to dump a 30GB+ database using pg_dump with the --file
> option. In the beginning everything works fine, pg_dump runs and we get
> a dumpfile. But when this file becomes 16GB it disappears from the
> filesystem,

FWIW, I tried and failed to duplicate this problem on a Fedora Core 3
machine using an ext3 filesystem.  I set up a dummy database that would
produce an approximately 18GB text dump and did
pg_dump big --file spare/big.dump
Seemed to work fine.

regards, tom lane



Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)

2005-03-18 Thread Martijn van Oosterhout
On Fri, Mar 18, 2005 at 10:23:14AM -, Alex Stapleton wrote:
> If I SET ENABLE_SEQSCAN TO OFF it uses the Index, but it does an Index scan
> backward if I do
> 
> SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY
> symbol desc, source DESC, time DESC LIMIT 1000;
> 
> Which is better but still quite slow.

What's the EXPLAIN ANALYZE of this query? Have you done VACUUM FULL
and/or ANALYZE recently?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.




Re: [GENERAL] Install error at rpm package + Fedora Core3

2005-03-18 Thread Dianne Yumul
Sorry, I did not see your email off-list until after I sent the other 
one.

On Mar 18, 2005, at 11:56 AM, Devrim GUNDUZ wrote:
Hi,
On Fri, 18 Mar 2005, Dianne Yumul wrote:
Why don't you try yum, it will install the dependencies for you ; )
http://linux.duke.edu/projects/yum/
As we talked off-list before, currently PGDG RPMs cannot be installed 
via yum, AFAIK.

Cheers,
- --
Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, 
devrim.gunduz~linux.org.tr
http://www.tdmsoft.com http://www.gunduz.org


Re: [GENERAL] Install error at rpm package + Fedora Core3

2005-03-18 Thread Dianne Yumul
Yup, I didn't see your email off-list till after I re-sent the other 
email. But thanks for the info.

On Mar 18, 2005, at 11:56 AM, Devrim GUNDUZ wrote:
Hi,
On Fri, 18 Mar 2005, Dianne Yumul wrote:
Why don't you try yum, it will install the dependencies for you ; )
http://linux.duke.edu/projects/yum/
As we talked off-list before, currently PGDG RPMs cannot be installed 
via yum, AFAIK.

Cheers,
- --
Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, 
devrim.gunduz~linux.org.tr
http://www.tdmsoft.com http://www.gunduz.org


Re: [GENERAL] Install error at rpm package + Fedora Core3

2005-03-18 Thread Devrim GUNDUZ
Hi,
On Fri, 18 Mar 2005, Dianne Yumul wrote:
Why don't you try yum, it will install the dependencies for you ; )
http://linux.duke.edu/projects/yum/
As we talked off-list before, currently PGDG RPMs cannot be installed via 
yum, AFAIK.

Cheers,
- --
Devrim GUNDUZ 
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.tdmsoft.com http://www.gunduz.org


Re: [GENERAL] Install error at rpm package + Fedora Core3

2005-03-18 Thread Dianne Yumul
Why don't you try yum, it will install the dependencies for you ; )
http://linux.duke.edu/projects/yum/
On Mar 18, 2005, at 11:32 AM, Devrim GUNDUZ wrote:
Hi,
On Fri, 18 Mar 2005, Edgar Amorim wrote:
I'm a kind new bye with linux & related stuffs. I'm trying to install 
PG on a FC3 using "postgresql-8.0.1-2PGDG.i686.rpm", but after to 
issue "rpm -ivh (the rpm file)" I've got a message telling "error: 
Failed dependencies: libpq.so.3 is need". I take a look at the 
documentation and it seems the libpq is the PG API, so shouldn't be 
built at the install time? How can I solve that or where should I 
look for in order to find out an answer?
You have to install postgresql-libs rpm first.
Regards,
- --
Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, 
devrim.gunduz~linux.org.tr
http://www.tdmsoft.com http://www.gunduz.org


Re: [GENERAL] Install error at rpm package + Fedora Core3

2005-03-18 Thread Edgar Amorim
Folks,
 
I'm kind of a newbie with Linux & related stuff. I'm trying to install PG on FC3 using "postgresql-8.0.1-2PGDG.i686.rpm", but after issuing "rpm -ivh (the rpm file)" I get a message saying "error: Failed dependencies: libpq.so.3 is needed". I took a look at the documentation and it seems libpq is the PG API, so shouldn't it be built at install time? How can I solve this, or where should I look in order to find an answer?
 
Thank you so much.
 
Edgar Amorim
Network Consultant
 

Re: [GENERAL] Install error at rpm package + Fedora Core3

2005-03-18 Thread Devrim GUNDUZ
Hi,
On Fri, 18 Mar 2005, Edgar Amorim wrote:
I'm a kind new bye with linux & related stuffs. I'm trying to install 
PG on a FC3 using "postgresql-8.0.1-2PGDG.i686.rpm", but after to issue 
"rpm -ivh (the rpm file)" I've got a message telling "error: Failed 
dependencies: libpq.so.3 is need". I take a look at the documentation 
and it seems the libpq is the PG API, so shouldn't be built at the 
install time? How can I solve that or where should I look for in order 
to find out an answer?
You have to install postgresql-libs rpm first.
Regards,
- --
Devrim GUNDUZ 
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.tdmsoft.com http://www.gunduz.org


[GENERAL] Help with transactions

2005-03-18 Thread Stephen Howie
Hello all,
I have a Java program that accepts print streams and inserts them into a
spool table as a bytea.  This fires a pl/pgsql trigger that passes the
bytea to a pl/perl function, which processes the bytea and spits the
results back as an array.  It then proceeds to insert the data into
multiple tables.  The problem is that, of the two tables the data is
inserted into inside this transaction, one has a foreign key to the
other.  As you can guess, I get a foreign key violation because the
transaction is not committed, and as far as I understand PostgreSQL
does not support dirty reads or nested transactions.  I have two
questions.  1) Is there another way to handle this transaction that
would resolve this violation without using dirty reads?  2) It looks
like running the trigger after insert on a table does not run as a
separate transaction.  Is the insert to that table supposed to fail if
the trigger fails?  To me that defeats the purpose of having a trigger
after insert.

Thanks for any help
--
-
Stephen Howie



Re: [GENERAL] Statistics with PostgreSQL

2005-03-18 Thread Dann Corbit
Mean is just sum(col)/count(col)
Mode can be calculated with having, max, count
Median can be computed by sorting, using a cursor, and going to the
middle.
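
A hedged sketch against the kind of table described in the original question;
the table and column names below are assumed, not taken from a real schema:

-- mean per PsetID
SELECT psetid, avg(intensityvalue) AS mean
FROM intensities
GROUP BY psetid;

-- mode: the most frequent value for one PsetID
SELECT intensityvalue
FROM intensities
WHERE psetid = 42
GROUP BY intensityvalue
ORDER BY count(*) DESC
LIMIT 1;

-- median: sort and step to the middle row; count the rows first and supply
-- the midpoint as a literal (or walk there with a cursor, as noted above)
SELECT intensityvalue
FROM intensities
WHERE psetid = 42
ORDER BY intensityvalue
LIMIT 1 OFFSET 499;   -- i.e. OFFSET (row count / 2) for a 1000-row set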

There are more efficient and better (more accurate) ways to do it, but
those have to be implemented at a low level.  Of course, since you have
libpq, anything is possible.

If you want to implement these things at a low level to get better
answers, Kahan (or compensated) summation is a good idea, and do the
summation into a larger type to prevent loss of precision.

Here are some statistical templates I wrote that are free for any
purpose you like:
ftp://cap.connx.com/pub/tournament_software/Kahan.Hpp
ftp://cap.connx.com/pub/tournament_software/STATS.HPP

The Cephes collection by Moshier has good extended precision types, if
you need to carefully avoid any PLOSS.

For median, the QuickSelect algorithm is very good.  Here is an
implementation I wrote in C++:

#include <iostream>
#include <cstdlib>
#include <cstddef>

using namespace std;

/*
**
** In the following code, every reference to CLR means:
**
**"Introduction to Algorithms"
**By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**ISBN 0-07-013143-0
*/


/*
** CLR, page 187
*/
template < class Etype >
Etype   
RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
size_t  q,
k;
if (p == r)
return A[p];
q = RandomPartition(A, p, r);
k = q - p + 1;

if (i <= k)
return RandomSelect(A, p, q, i);
else
return RandomSelect(A, q + 1, r, i - k);
}

size_t  RandRange(size_t a, size_t b)
{
size_t  c = (size_t) ((double) rand() / ((double) RAND_MAX +
1) * (b - a));
return c + a;
}

/*
** CLR, page 162
*/
template < class Etype >
size_t  
RandomPartition(Etype A[], size_t p, size_t r)
{
size_t  i = RandRange(p, r);
Etype   Temp;
Temp = A[p];
A[p] = A[i];
A[i] = Temp;
return Partition(A, p, r);
}

/*
** CLR, page 154
*/
template < class Etype >
size_t  
Partition(Etype A[], size_t p, size_t r)
{
Etype   x,
temp;
size_t  i,
j;

x = A[p];
i = p - 1;
j = r + 1;

for (;;) {
do {
j--;
} while (!(A[j] <= x));
do {
i++;
} while (!(A[i] >= x));
if (i < j) {
temp = A[i];
A[i] = A[j];
A[j] = temp;
} else
return j;
}
}

double   data[30];
int main(void)
{
size_t  i;
size_t  size = sizeof(data) / sizeof(data[0]);
for (i = 0; i < size; i++) {
data[i] = rand();
}


for (i = 0; i < size; i++) {
cout << data[i] << endl;
}

cout << "1st item is " << RandomSelect(data, 0, size - 1, 0) <<
endl;
cout << "2nd item is " << RandomSelect(data, 0, size - 1, 1) <<
endl;
cout << "3rd item is " << RandomSelect(data, 0, size - 1, 2) <<
endl;
for (i = 4; i < size; i++)
cout  << i << "th item is " << RandomSelect(data, 0, size - 1,
i) << endl;
return 0;
}

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Hrishikesh
Deshmukh
Sent: Friday, March 18, 2005 10:37 AM
To: Postgresql-General
Subject: [GENERAL] Statistics with PostgreSQL

Hi All,

Is there a way to simple statistics like mean/median/mode in PostgreSQL.
I have tables like PsetID | IntensityValue. I want to find out mean
(intensityValue) of some PsetID(s)?!
Any urls/pointers/books would be a big help.

Thanks,
Hrishi



Re: [GENERAL] Statistics with PostgreSQL

2005-03-18 Thread David Fetter
On Fri, Mar 18, 2005 at 01:37:10PM -0500, Hrishikesh Deshmukh wrote:
> Hi All,
> 
> Is there a way to simple statistics like mean/median/mode in
> PostgreSQL.  I have tables like PsetID | IntensityValue. I want to
> find out mean (intensityValue) of some PsetID(s)?!

> Any urls/pointers/books would be a big help.

Hrishi,

For statistics beyond avg() and stddev(), check out PL/R
http://www.joeconway.com/plr/

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!



[GENERAL] Statistics with PostgreSQL

2005-03-18 Thread Hrishikesh Deshmukh
Hi All,

Is there a way to do simple statistics like mean/median/mode in PostgreSQL?
I have tables like PsetID | IntensityValue. I want to find out the mean
(intensityValue) of some PsetID(s)?!
Any urls/pointers/books would be a big help.

Thanks,
Hrishi



Re: [GENERAL] SMP scaling

2005-03-18 Thread Tom Lane
Mark Rae <[EMAIL PROTECTED]> writes:
> The altix still only scales up to 10x rather than 16x, but that probably 
> is the NUMA configuration taking effect now. 

BTW, although I know next to nothing about NUMA, I do know that it is
configurable to some extent (eg, via numactl).  What was the
configuration here exactly, and did you try alternatives?  Also,
what was the OS exactly?  (I've heard that RHEL4 is a whole lot better
than RHEL3 in managing NUMA, for example.  This may be generic to 2.6 vs
2.4 Linux kernels, or maybe Red Hat did some extra hacking.)

regards, tom lane



Re: [GENERAL] SMP scaling

2005-03-18 Thread Tom Lane
Bruce Momjian  writes:
> So it seems our entire SMP problem was that global lock.  Nice.

Yeah, I was kind of expecting to see the LockMgrLock up next, but
it seems we're still a ways away from having a problem there.  I guess
that's because we only tend to touch locks once per query, whereas
we're grabbing and releasing buffers much more.

From the relatively small absolute value of Mark's queries/sec numbers,
I suppose he is testing some fairly heavyweight queries (big enough
to not emphasize per-query overhead).  I wonder what the numbers would
look like with very small, simple queries.  It'd move the stress around
for sure ...

regards, tom lane



Re: [GENERAL] SMP scaling

2005-03-18 Thread Bruce Momjian
Mark Rae wrote:
> On Fri, Mar 18, 2005 at 10:38:24AM -0500, Tom Lane wrote:
> > Hey, that looks pretty sweet.  One thing this obscures though is whether
> > there is any change in the single-client throughput rate --- ie, is "1.00"
> > better or worse for CVS tip vs 8.0.1?
> 
> Here are the figures in queries per second.
> 
> Clients             1      2      3      4      6      8     12     16     32     64
> --------------------------------------------------------------------------------------
> AMD64 pg-8.0.1    6.80  12.71  18.82  22.73  18.58  17.48  17.56  17.81
> AMD64 pg-20050316 6.80  13.23  19.32  25.09  24.56  24.93  25.20  25.09
> IA64  pg-8.0.1    3.72   7.32  10.81  14.21  10.81  10.85  10.92  11.09
> IA64  pg-20050316 3.99   7.92  11.78  15.46  15.17  15.09  15.41  15.58
> Altix pg-8.0.1    3.66   7.37  10.89  14.53  21.47  26.47  27.47  20.28  17.12  18.66
> Altix pg-20050316 3.83   7.55  10.98  14.10  20.27  26.47  34.50  37.88  38.45  38.12
> 
> So, it didn't make any difference for the Opteron, but the two 
> Itanium machines were 5% and 7% faster respectively.

So it seems our entire SMP problem was that global lock.  Nice.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [GENERAL] plpython function problem workaround

2005-03-18 Thread Tom Lane
Marco Colombo <[EMAIL PROTECTED]> writes:
> Right now I'm parsing the string first, changing the resulting
> parse tree adding missing nodes (def, INDENT, DEINDENT) and
> then compiling it.

Hmmm ... is this really going to be simpler or more robust than lexing
the string carefully enough to insert tabs at the right places?  The
impression I had so far was that you'd only need to understand about
Python's string-literal conventions to get that right ... and that's
something that's not likely to change.  I'm not so sure that parse
trees can be regarded as an immutable API.

regards, tom lane



Re: [GENERAL] plpython function problem workaround

2005-03-18 Thread Marco Colombo
On Fri, 18 Mar 2005, Tom Lane wrote:
Michael Fuhr <[EMAIL PROTECTED]> writes:
On Fri, Mar 18, 2005 at 12:35:07AM -0500, Tom Lane wrote:
So that part of it can be solved fairly easily.

Should I submit a patch?  It should be only a few additional lines
in PLy_procedure_munge_source().  Would you apply it only to HEAD,
or would it be considered a bug fix that REL8_0_STABLE could get
as well?  It might be nice to have it in 8.0.2, whenever that comes
out.
I think it would be reasonable to back-patch a small fix to convert CRLF.
The sort of rewrite Marco is considering, I wouldn't back-patch.
			regards, tom lane
Short update: it isn't possible to get a callable object directly from
the source provided by the user, since 'return' is not valid outside
function definitions in Python.
Right now I'm parsing the string first, changing the resulting
parse tree adding missing nodes (def, INDENT, DEINDENT) and
then compiling it. Grammar definitions for a python function is:
funcdef: [decorators] 'def' NAME parameters ':' suite
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
What we get from the users is stmt+ (a sequence of stmt). The INDENT
and DEDENT tokens are generated by the parser only when indentation
level _changes_.
My plan is to generate two parse trees, one from this code
(with the right fname of course):
def fname():
pass
and one from the function definition. Then, we "attach" the root
of the second tree where the "pass" node is in the first tree.
We should get a parse tree ready for compilation.
I wish I could "push" the right tokens in the right places,
but it seems it's not possible.
Stay tuned.
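
For reference, this is the shape of the problem: the body the user writes
contains only bare statements (including 'return'), which the PL has to wrap
in a def and indent before compiling. A minimal sketch, using the args-list
style that should work on older releases:

CREATE FUNCTION pyadd(integer, integer) RETURNS integer AS '
return args[0] + args[1]
' LANGUAGE plpythonu;
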
.TM.
--
  /  /   /
 /  /   /   Marco Colombo
___/  ___  /   /  Technical Manager
   /  /   /  ESI s.r.l.
 _/ _/  _/ [EMAIL PROTECTED]


[GENERAL] Japanese characters problem

2005-03-18 Thread Grzegorz Przeździecki
Welcome,
I made a database and have a problem with Japanese characters, maybe with
other characters too. Below is an example.

The Linux system is Fedora (RH).
The locale for PostgreSQL is pl_PL.utf8.
The database is in Unicode.

Thanks for the help.

==

[EMAIL PROTECTED] 1.0.4]# rpm -q postgresql
postgresql-7.4.7-3.FC3.1
[EMAIL PROTECTED] 1.0.4]# rpm -q postgresql-server
postgresql-server-7.4.7-3.FC3.1



crm=# select title,category from contentinfo ;
 title  | category
+--
 h264-charlie-900-hdd.avi   | æç
 h264-liloo-900-hdd.avi - drugi | éç
(2 rows)

crm=# select title,category from contentinfo where category='éæ';
 title  | category
+--
 h264-charlie-900-hdd.avi   | æç
 h264-liloo-900-hdd.avi - drugi | éç
(2 rows)

crm=# select title,category from contentinfo where category='çæ';
 title  | category
+--
 h264-charlie-900-hdd.avi   | æç
 h264-liloo-900-hdd.avi - drugi | éç
(2 rows)

crm=# select title,category from contentinfo where category='æ';
 title | category
---+--
(0 rows)

crm=# select title,category from contentinfo where category='çç';
 title  | category
+--
 h264-charlie-900-hdd.avi   | æç
 h264-liloo-900-hdd.avi - drugi | éç
(2 rows)

crm=# select title,category from contentinfo where category='ççç';
 title | category
---+--
(0 rows)

crm=# select title,category from contentinfo where category ilike 'çç';
 title | category
---+--
(0 rows)

crm=# select title,category from contentinfo where category ilike 'æç';
  title   | category
--+--
 h264-charlie-900-hdd.avi | æç
(1 row)

crm=# select title,category from contentinfo where category ilike 'éç';
 title  | category
+--
 h264-liloo-900-hdd.avi - drugi | éç
(1 row)

crm=# select title,category from contentinfo where category ilike 'éç';
 title  | category
+--
 h264-liloo-900-hdd.avi - drugi | éç
(1 row)

crm=# \l
   List of databases
   Name| Owner | Encoding
---+---+---
 crm   | sv| UNICODE
 template0 | sv| SQL_ASCII
 template1 | sv| SQL_ASCII

(3 rows)

crm=# select title,category from contentinfo where category ilike '%é
ç%';
 title  | category
+--
 h264-liloo-900-hdd.avi - drugi | éç
(1 row)

crm=# select title,category from contentinfo where category like 'éç';
 title  | category
+--
 h264-liloo-900-hdd.avi - drugi | éç
(1 row)


-- 
Grzegorz Przeździecki
www.PolskieSklepy.pl
kom. +48.606.822.506
gg:3701851 skype: grzegorz.przezdziecki



[GENERAL] performance: pg vs pg!

2005-03-18 Thread Rick Schumeyer








At the suggestion of several people, I have increased the default settings
in postgresql.conf before continuing my postgresql vs mysql performance
tests.

To date, I have only been loading a (roughly) million-row file, creating
indexes during the load, running a vacuum analyze, and a couple of simple
queries.

I do intend on performing more complicated tests, but I did not want to do
that until people stopped telling me my methodology for simple tests
was...flawed.

I ran a thorough series of tests, varying shared_buffers from 1000 to 9000,
work_mem from 1 to 9 and maintenance_work_mem from 1 to 9.  The complete
results are long (I will provide them if anyone is interested) so I am only
including a small selection.

Before I do that, I will confess that increasing memory made more of a
difference than I thought it would.  I know many of you are thinking
"no kidding" but I thought it would only be important for big complicated
queries, or a server with multiple concurrent requests.  No, it makes a big
difference for "merely" loading a million rows and indexing them.

                                        Time in seconds
shared_buffers  work_mem  m_work_mem     COPY   VACUUM
          1000         1           1  186.154    9.814
          3000         1           1   64.404    4.526
          5000         5           5   65.036    3.435
          9000         9           9   63.664    2.218

-- The relevant commands

create table data (
  fid   integer,
  rid   integer,
  range real,
  x     real,
  y     real,
  z     real,
  bs    real,
  snr   real,
  rvel  real,
  cfar  smallint);

create index fidrid_data on data (fid,rid);
create index fidx on data (x);

-- COPY a table with 934500 rows
COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z)
FROM '/home/rick/bed/data/data.dat';

-- VACUUM
vacuum analyze data;

 

 








Re: [GENERAL] pg_dump large-file support > 16GB

2005-03-18 Thread Rafael Martinez
On Fri, 2005-03-18 at 09:58 -0500, Tom Lane wrote:
> Rafael Martinez <[EMAIL PROTECTED]> writes:
> > On Thu, 2005-03-17 at 10:17 -0500, Tom Lane wrote:
> >> Is that a plain text, tar, or custom dump (-Ft or -Fc)?  Is the behavior
> >> different if you just write to stdout instead of using --file?
> 
> > - In this example, it is a plain text (--format=3Dp).
> > - If I write to stdout and redirect to a file, the dump finnish without
> > problems and I get a dump-text-file over 16GB without problems.
> 
> In that case, you have a glibc or filesystem bug and you should be
> reporting it to Red Hat.  The *only* difference between writing to
> stdout and writing to a --file option is that in one case we use
> the preopened "stdout" FILE* and in the other case we do
> fopen(filename, "w").  Your report therefore is stating that there
> is something broken about fopen'd files.
> 

Thanks for the information. I will contact RH.

-- 
Rafael Martinez, <[EMAIL PROTECTED]> 
Center for Information Technology Services 
University of Oslo, Norway   

PGP Public Key: http://folk.uio.no/rafael/




Re: [GENERAL] SMP scaling

2005-03-18 Thread Mark Rae
On Fri, Mar 18, 2005 at 10:38:24AM -0500, Tom Lane wrote:
> Hey, that looks pretty sweet.  One thing this obscures though is whether
> there is any change in the single-client throughput rate --- ie, is "1.00"
> better or worse for CVS tip vs 8.0.1?

Here are the figures in queries per second.

Clients             1      2      3      4      6      8     12     16     32     64
--------------------------------------------------------------------------------------
AMD64 pg-8.0.1    6.80  12.71  18.82  22.73  18.58  17.48  17.56  17.81
AMD64 pg-20050316 6.80  13.23  19.32  25.09  24.56  24.93  25.20  25.09
IA64  pg-8.0.1    3.72   7.32  10.81  14.21  10.81  10.85  10.92  11.09
IA64  pg-20050316 3.99   7.92  11.78  15.46  15.17  15.09  15.41  15.58
Altix pg-8.0.1    3.66   7.37  10.89  14.53  21.47  26.47  27.47  20.28  17.12  18.66
Altix pg-20050316 3.83   7.55  10.98  14.10  20.27  26.47  34.50  37.88  38.45  38.12

So, it didn't make any difference for the Opteron, but the two 
Itanium machines were 5% and 7% faster respectively.

-Mark

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] SMP scaling

2005-03-18 Thread Tom Lane
Mark Rae <[EMAIL PROTECTED]> writes:
> Ok, I've done the tests comparing 8.0.1 against a snapshot from the 16th
> and the results are impressive.

> Clients1  2  3  4  6  8 12 16 
> 32 64
> ---
> Altix pg-8.0.1  1.00   2.02   2.98   3.97   5.87   7.23   7.51   5.54   
> 4.68   5.10
> Altix pg-20050316   1.00   1.97   2.86   3.68   5.29   6.90   9.00   9.88  
> 10.03   9.94
> AMD64 pg-8.0.1  1.00   1.87   2.77   3.34   2.73   2.57   2.58   2.62
> AMD64 pg-20050316   1.00   1.95   2.84   3.69   3.61   3.66   3.70   3.69
> IA64  pg-8.0.1  1.00   1.97   2.91   3.82   2.91   2.92   2.94   2.98
> IA64  pg-20050316   1.00   1.98   2.95   3.87   3.80   3.78   3.86   3.90


Hey, that looks pretty sweet.  One thing this obscures though is whether
there is any change in the single-client throughput rate --- ie, is "1.00"
better or worse for CVS tip vs 8.0.1?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] plpython function problem workaround

2005-03-18 Thread Tom Lane
Michael Fuhr <[EMAIL PROTECTED]> writes:
> On Fri, Mar 18, 2005 at 12:35:07AM -0500, Tom Lane wrote:
>> So that part of it can be solved fairly easily.

> Should I submit a patch?  It should be only a few additional lines
> in PLy_procedure_munge_source().  Would you apply it only to HEAD,
> or would it be considered a bug fix that REL8_0_STABLE could get
> as well?  It might be nice to have it in 8.0.2, whenever that comes
> out.

I think it would be reasonable to back-patch a small fix to convert CRLF.
The sort of rewrite Marco is considering, I wouldn't back-patch.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] pg_atributes index space question

2005-03-18 Thread Tom Lane
Joe Maldonado <[EMAIL PROTECTED]> writes:
> db=# vacuum verbose analyze pg_catalog.pg_attribute;
> INFO:  vacuuming "pg_catalog.pg_attribute"
> INFO:  index "pg_attribute_relid_attnam_index" now contains 9965 row
> versions in 181557 pages

REINDEX is probably the only realistic way to fix that.  It shouldn't
take very long, fortunately, so the exclusive lock shouldn't be an
enormous problem.
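
A sketch of what that would look like (not part of the original reply; it
assumes rebuilding all of pg_attribute's indexes at once is acceptable):

-- hedged sketch: rebuilds every index on pg_attribute under an exclusive lock
REINDEX TABLE pg_catalog.pg_attribute;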

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] plpython function problem workaround

2005-03-18 Thread Michael Fuhr
On Fri, Mar 18, 2005 at 11:34:46AM +0100, Marco Colombo wrote:

> Just let me suggest not to mimic its behaviour, but to use the
> Real Thing if we manage to. That is, directly use the Universal Line
> Support code provided by python itself, so that we don't even have
> to think about being compatible.

Sounds good if the Python API provides the hooks for doing so.  I
had started looking into that but didn't spent much time on it.

> I'm experimenting a bit, trying to write a PLy_procedure_compile()
> that does not require source munging. I'm aiming at removing the
> need for extra indentation.

Sounds good too, if that'll work.  Looking forward to seeing what
you find out.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] pg_dump large-file support > 16GB

2005-03-18 Thread Tom Lane
Rafael Martinez <[EMAIL PROTECTED]> writes:
> On Thu, 2005-03-17 at 10:17 -0500, Tom Lane wrote:
>> Is that a plain text, tar, or custom dump (-Ft or -Fc)?  Is the behavior
>> different if you just write to stdout instead of using --file?

> - In this example, it is a plain text (--format=p).
> - If I write to stdout and redirect to a file, the dump finishes without
> problems and I get a dump-text-file over 16GB.

In that case, you have a glibc or filesystem bug and you should be
reporting it to Red Hat.  The *only* difference between writing to
stdout and writing to a --file option is that in one case we use
the preopened "stdout" FILE* and in the other case we do
fopen(filename, "w").  Your report therefore is stating that there
is something broken about fopen'd files.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[GENERAL] pg_atributes index space question

2005-03-18 Thread Joe Maldonado
Hello,
After a VACUUM FULL I saw that the pg_attribute table's indexes haven't
been shrunk, as reported by a subsequent vacuum analyze, even though the
pages for the table itself have been reduced to 196 from about 181557.
Are all system tables affected by this? How can we reclaim this space
without shutting down the postmaster? Is this fixed in any new release?
db=# select relpages, reltuples from pg_catalog.pg_class where
relname = 'pg_attribute';
 relpages | reltuples
----------+-----------
      196 |      9965
(1 row)
db=# vacuum verbose analyze pg_catalog.pg_attribute;
INFO:  vacuuming "pg_catalog.pg_attribute"
INFO:  index "pg_attribute_relid_attnam_index" now contains 9965 row
versions in 181557 pages
DETAIL:  1518 index row versions were removed.
181263 index pages have been deleted, 2 are currently reusable.
thanks


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


[GENERAL] SMP scaling

2005-03-18 Thread Mark Rae
On Tue, Mar 15, 2005 at 07:00:25PM -0500, Bruce Momjian wrote:
> Oh, you have to try CVS HEAD or a nightly snapshot.  Tom made a major
> change that allows scaling in SMP environments.

Ok, I've done the tests comparing 8.0.1 against a snapshot from the 16th
and the results are impressive.
As well as the 16CPU Altix, I've done comparisons on two other 4CPU
machines which previously didn't scale as well as expected.


Clients                1      2      3      4      6      8     12     16     32     64
-----------------------------------------------------------------------------------------
Altix pg-8.0.1      1.00   2.02   2.98   3.97   5.87   7.23   7.51   5.54   4.68   5.10
Altix pg-20050316   1.00   1.97   2.86   3.68   5.29   6.90   9.00   9.88  10.03   9.94
AMD64 pg-8.0.1      1.00   1.87   2.77   3.34   2.73   2.57   2.58   2.62
AMD64 pg-20050316   1.00   1.95   2.84   3.69   3.61   3.66   3.70   3.69
IA64  pg-8.0.1      1.00   1.97   2.91   3.82   2.91   2.92   2.94   2.98
IA64  pg-20050316   1.00   1.98   2.95   3.87   3.80   3.78   3.86   3.90

Altix == 16x 1.6GHz Itanium2    192GB memory
AMD64 ==  4x 2.2GHz Opteron 848   8GB memory
IA64  ==  4x 1.5GHz Itanium2     16GB memory


The altix still only scales up to 10x rather than 16x, but that probably 
is the NUMA configuration taking effect now. 
Also this machine isn't set up to run databases, so only has 1 FC I/O card, 
which means a CPU can end up being 4 hops away from the memory and disk. 

As the database is so small (8GB), relative to the machine, the data will 
be on average 2 hops away. This gives an average of 72% of the speed of
local memory, based on previous measurements of speed vs hops. 

So getting 63% of the theoretical maximum database throughput is pretty good.


-Mark

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] plpython function problem workaround

2005-03-18 Thread Marco Colombo
On Thu, 17 Mar 2005, Michael Fuhr wrote:
On Thu, Mar 17, 2005 at 09:48:51PM -0500, Tom Lane wrote:
Michael Fuhr <[EMAIL PROTECTED]> writes:
Line-ending CRs stripped, even inside quotes; mid-line CRs converted
to LF.  Tests done with Python 2.4 on FreeBSD 4.11-STABLE; I wonder
what Python on Windows would do.
Unfortunately, I don't think that proves anything, because according
to earlier discussion Python will do newline-munging when it reads
a file (including a script file).  The question that we have to deal
with is what are the rules for a string fed to PyRun_String ... and it
seems those rules are not the same.
Marco, you've stated that you're against munging the code because
"it's not our job to 'fix' data coming from the client."  But I'm
suggesting that we think about the code in a different way than the
current implementation does: not as a literal that we pass untouched
to the Python interpreter, but rather as code that Python would
munge anyway if it had read that code from a file.  We could still
store the code exactly as received and have the language handler
munge it on the fly, as we've discovered it's already doing.
Comments?  Have I overlooked anything?  Could munging CRs have
effects that a Python programmer wouldn't expect if the same code
had been read from a file?  Since it mimics Python's own behavior
with code read from a file, can anybody justify not doing it?
If you put it that way, I'm 100% with you.
Just let me suggest not to mimic its behaviour, but to use the
Real Thing if we manage to. That is, directly use the Universal Line
Support code provided by python itself, so that we don't even have
to think about being compatible.
Unfortunately, I'm new to python embedding. I think I'll ask on the
python lists about the function definition code. Actually, we are
kind of a corner case: we ask the user to provide the function
body w/o the initial def declaration. We're dealing with partial, incomplete
python code and not a well-formed program, so we have to munge it
anyway. I have no idea if and how the python C API lets you
control such low-level details. I think what we really want is to
create a callable (anonymous) object from the source of its "body".

I'm experimenting a bit, trying to write a PLy_procedure_compile()
that does not require source munging. I'm aiming at removing the
need for extra indentation. The \r\n thing is another beast, and I'm
not sure it belongs to the same place in our code.
.TM.
--
  /  /   /
 /  /   /   Marco Colombo
___/  ___  /   /  Technical Manager
   /  /   /  ESI s.r.l.
 _/ _/  _/ [EMAIL PROTECTED]
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)

2005-03-18 Thread Alex Stapleton
I should point out that there's no guarantee our data is inserted in anything
like the order we want (time desc), but there is a high correlation. Most of
the time it is almost in order.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Alex Stapleton
Sent: 18 March 2005 09:29
To: pgsql-general@postgresql.org
Subject: [GENERAL] Select Last n Rows Matching an Index Condition (and
caches)


We have a ~10million row table but are expecting it to get larger, possibly
by a factor of 10 or more. The rows are quite long and fixed length (just
over 500 bytes.)

We have an index of (symbol, source, date) on this table and doing queries
like this

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC
LIMIT 1000;

To get the latest 1000 rows for that symbol and source.

However this takes quite a while at the best of times (1-10 seconds). The
query without the order by and the limit tends to return about 7 rows,
which adds up to about 30MB of data. Once the pages are in the cache they
take around 100ms, but this is to be expected. Unfortunately the initial
query required to cache it is unacceptably long for a web application like
ours.

My (not yet implemented) solution to this problem is to add a SEQUENCE and
index it so that adding a WHERE id > [max_id]-1000 and ordering by time
DESC will reduce the I/O quite a lot. Am I right here? It would be nice if
there was a way to get PostgreSQL to try and precache the table's pages as
well; is there any way I could achieve something like that? I have toyed with
creating a ramdisk to store a lookup table of sorts on (we only care about a
few columns initially) to speed this up a bit, but it's a right pain in the
arse to do by the looks of things.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)

2005-03-18 Thread Alex Stapleton
If I SET ENABLE_SEQSCAN TO OFF it uses the Index, but it does an Index scan
backward if I do

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY
symbol desc, source DESC, time DESC LIMIT 1000;

Which is better but still quite slow.
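
A sketch of keeping that planner override local to a single query (not from
the original message), so it does not affect the rest of the session:

BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
  SELECT * FROM article WHERE symbol = 12646 AND source = 19
  ORDER BY symbol DESC, source DESC, time DESC LIMIT 1000;
COMMIT;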

-Original Message-
From: Hegyvari Krisztian [mailto:[EMAIL PROTECTED]
Sent: 18 March 2005 10:25
To: Alex Stapleton; pgsql-general@postgresql.org
Subject: RE: [GENERAL] Select Last n Rows Matching an Index Condition
(and caches)


Doesn't it look like the index you are actually using is on article and
then PG has to filter for symbol and source?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alex Stapleton
Sent: Friday, March 18, 2005 10:57 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition
(and caches)

Woops sorry we have indexes on (symbol, source, time) and there is no
date
column :/

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time
DESC
LIMIT 1000;
 QUERY PLAN



 Limit  (cost=321163.29..321165.79 rows=1000 width=466) (actual
time=33243.924..33246.021 rows=1000 loops=1)
   ->  Sort  (cost=321163.29..321402.31 rows=95609 width=466) (actual
time=33243.917..33244.626 rows=1000 loops=1)
 Sort Key: "time"
 ->  Seq Scan on article  (cost=0.00..301724.00 rows=95609
width=466) (actual time=0.022..32979.685 rows=42959 loops=1)
   Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33258.706 ms
(6 rows)


explain analyze SELECT * FROM article WHERE symbol=12646 AND source =
19;
 QUERY PLAN



 Seq Scan on article  (cost=0.00..301724.00 rows=95609 width=466)
(actual
time=0.021..33275.433 rows=42959 loops=1)
   Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33320.920 ms
(3 rows)

We can't use CLUSTER because we need the DB up all the time.

The analyze suggests that it's the I/O taking most of the time to me.


-Original Message-
From: Alban Hertroys [mailto:[EMAIL PROTECTED]
Sent: 18 March 2005 09:48
To: Alex Stapleton
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition
(and caches)


Alex Stapleton wrote:
> SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time
DESC
> LIMIT 1000;
>
> To get the latest 1000 rows for that symbol and source.

> My (not yet implemented) solution to this problem is to add a SEQUENCE
and
> index it so that by adding a WHERE id > [max_id]-1000 and ordering by
time
> DESC will reduce the I/O quite a lot. Am I right here? It would be
nice if
> there was a way to get PostgreSQL to try and precache the tables pages
as
> well, is there anyway I could achieve something like that? I have
toyed
with
> creating a ramdisk to store a lookup table of sorts on (we only care
about
a
> few columns initially) to speed this up a bit but its a right pain in
the
> arse to do by the looks of things.

First question that always gets asked here: What's the output of explain
analyse? Without that, people here can't see where the slowdown is.

I expect though, that the problem is the ordering by time. I imagine
that you could create an index on time, maybe truncated to months or
something similar (You can create indices based on functions). That
index alone should speed up the ordering already.
It could also be used to cluster the table, which should speed up things
some more, I suppose.

--
Alban Hertroys
MAG Productions

T: +31(0)53 4346874
F: +31(0)53 4346876
E: [EMAIL PROTECTED]
W: http://www.magproductions.nl


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)

2005-03-18 Thread Hegyvari Krisztian
Doesn't it look like the index you are actually using is on article and
then PG has to filter for symbol and source?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alex Stapleton
Sent: Friday, March 18, 2005 10:57 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition
(and caches)

Woops sorry we have indexes on (symbol, source, time) and there is no
date
column :/

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time
DESC
LIMIT 1000;
 QUERY PLAN



 Limit  (cost=321163.29..321165.79 rows=1000 width=466) (actual
time=33243.924..33246.021 rows=1000 loops=1)
   ->  Sort  (cost=321163.29..321402.31 rows=95609 width=466) (actual
time=33243.917..33244.626 rows=1000 loops=1)
 Sort Key: "time"
 ->  Seq Scan on article  (cost=0.00..301724.00 rows=95609
width=466) (actual time=0.022..32979.685 rows=42959 loops=1)
   Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33258.706 ms
(6 rows)


explain analyze SELECT * FROM article WHERE symbol=12646 AND source =
19;
 QUERY PLAN



 Seq Scan on article  (cost=0.00..301724.00 rows=95609 width=466)
(actual
time=0.021..33275.433 rows=42959 loops=1)
   Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33320.920 ms
(3 rows)

We can't use CLUSTER because we need the DB up all the time.

The analyze suggests that it's the I/O taking most of the time to me.


-Original Message-
From: Alban Hertroys [mailto:[EMAIL PROTECTED]
Sent: 18 March 2005 09:48
To: Alex Stapleton
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition
(and caches)


Alex Stapleton wrote:
> SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time
DESC
> LIMIT 1000;
>
> To get the latest 1000 rows for that symbol and source.

> My (not yet implemented) solution to this problem is to add a SEQUENCE
and
> index it so that by adding a WHERE id > [max_id]-1000 and ordering by
time
> DESC will reduce the I/O quite a lot. Am I right here? It would be
nice if
> there was a way to get PostgreSQL to try and precache the tables pages
as
> well, is there anyway I could achieve something like that? I have
toyed
with
> creating a ramdisk to store a lookup table of sorts on (we only care
about
a
> few columns initially) to speed this up a bit but its a right pain in
the
> arse to do by the looks of things.

First question that always gets asked here: What's the output of explain
analyse? Without that, people here can't see where the slowdown is.

I expect though, that the problem is the ordering by time. I imagine
that you could create an index on time, maybe truncated to months or
something similar (You can create indices based on functions). That
index alone should speed up the ordering already.
It could also be used to cluster the table, which should speed up things
some more, I suppose.

--
Alban Hertroys
MAG Productions

T: +31(0)53 4346874
F: +31(0)53 4346876
E: [EMAIL PROTECTED]
W: http://www.magproductions.nl


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] plpython function problem workaround

2005-03-18 Thread Marco Colombo
On Thu, 17 Mar 2005, Tom Lane wrote:
Martijn van Oosterhout  writes:
On Thu, Mar 17, 2005 at 01:03:36PM +0100, Marco Colombo wrote:
OMG! It's indenting the function body. I think you can't do that
w/o being syntax-aware. I'm not familiar with the code; why is it
adding a 'def' in front of it at all? I understand that once you do
it you'll have to shift the code by an indentation level.

Presumably because it wants to create a function, which can later be
called. Since python is sensitive to whitespace it has to indent the
code to make it work.
Seems like we have to upgrade that thing to have a complete
understanding of Python lexical rules --- at least enough to know where
the line boundaries are.  Which is pretty much exactly the same as
knowing which CRs to strip out.  So I guess we have a candidate place
for a solution.
Anyone want to code it up?  I don't know enough Python to do it ...
I'm no expert but I'll look into it. Unless someone else has already
tried it, I want to investigate first if it's possible to create
a callable object w/o using 'def', which alters the name space and
(the thing we're interested in) needs an extra indentation level.
At first sight, what we do now (at function creation time) is:
1) execute a function definition;
2) compile a function call, and save the resulting code object for later use.
I'm wondering if we can save one step and use a python callable object.
.TM.
--
  /  /   /
 /  /   /   Marco Colombo
___/  ___  /   /  Technical Manager
   /  /   /  ESI s.r.l.
 _/ _/  _/ [EMAIL PROTECTED]
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)

2005-03-18 Thread Martijn van Oosterhout
On Fri, Mar 18, 2005 at 09:29:06AM -, Alex Stapleton wrote:
> We have a ~10million row table but are expecting it to get larger, possibly
> by a factor of 10 or more. The rows are quite long and fixed length (just
> over 500 bytes.)
> 
> We have an index of (symbol, source, date) on this table and doing queries
> like this
> 
> SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC
> LIMIT 1000;
> 
> To get the latest 1000 rows for that symbol and source.
> 
> However this takes quite a while at the best of times, (1-10 seconds.) The
> query without the order by and the limit tends to return about 7 rows
> which adds up to about 30MB of data. Once the pages are in the cache they
> take around 100ms but this is to be expected. Unfortunately the initial
> query required to cache it is unnacceptably long for web application like
> ours.

I think the normal approach for this is an index on
(symbol,source,time). You may need to change the query to:

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY
symbol desc, source DESC, time DESC LIMIT 1000;

The EXPLAIN ANALYZE output would also be very helpful...

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpZLcFIL16fX.pgp
Description: PGP signature


Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)

2005-03-18 Thread Alex Stapleton
Woops sorry we have indexes on (symbol, source, time) and there is no date
column :/

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC
LIMIT 1000;
 QUERY PLAN


 Limit  (cost=321163.29..321165.79 rows=1000 width=466) (actual
time=33243.924..33246.021 rows=1000 loops=1)
   ->  Sort  (cost=321163.29..321402.31 rows=95609 width=466) (actual
time=33243.917..33244.626 rows=1000 loops=1)
 Sort Key: "time"
 ->  Seq Scan on article  (cost=0.00..301724.00 rows=95609
width=466) (actual time=0.022..32979.685 rows=42959 loops=1)
   Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33258.706 ms
(6 rows)


explain analyze SELECT * FROM article WHERE symbol=12646 AND source = 19;
 QUERY PLAN


 Seq Scan on article  (cost=0.00..301724.00 rows=95609 width=466) (actual
time=0.021..33275.433 rows=42959 loops=1)
   Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33320.920 ms
(3 rows)

We can't use CLUSTER because we need the DB up all the time.

The analyze suggests that it's the I/O taking most of the time to me.


-Original Message-
From: Alban Hertroys [mailto:[EMAIL PROTECTED]
Sent: 18 March 2005 09:48
To: Alex Stapleton
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition
(and caches)


Alex Stapleton wrote:
> SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time
DESC
> LIMIT 1000;
>
> To get the latest 1000 rows for that symbol and source.

> My (not yet implemented) solution to this problem is to add a SEQUENCE and
> index it so that by adding a WHERE id > [max_id]-1000 and ordering by time
> DESC will reduce the I/O quite a lot. Am I right here? It would be nice if
> there was a way to get PostgreSQL to try and precache the tables pages as
> well, is there anyway I could achieve something like that? I have toyed
with
> creating a ramdisk to store a lookup table of sorts on (we only care about
a
> few columns initially) to speed this up a bit but its a right pain in the
> arse to do by the looks of things.

First question that always gets asked here: What's the output of explain
analyse? Without that, people here can't see where the slowdown is.

I expect though, that the problem is the ordering by time. I imagine
that you could create an index on time, maybe truncated to months or
something similar (You can create indices based on functions). That
index alone should speed up the ordering already.
It could also be used to cluster the table, which should speed up things
some more, I suppose.

--
Alban Hertroys
MAG Productions

T: +31(0)53 4346874
F: +31(0)53 4346876
E: [EMAIL PROTECTED]
W: http://www.magproductions.nl


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Select Last n Rows Matching an Index Condition (and

2005-03-18 Thread Alban Hertroys
Alex Stapleton wrote:
SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC
LIMIT 1000;
To get the latest 1000 rows for that symbol and source.

My (not yet implemented) solution to this problem is to add a SEQUENCE and
index it so that by adding a WHERE id > [max_id]-1000 and ordering by time
DESC will reduce the I/O quite a lot. Am I right here? It would be nice if
there was a way to get PostgreSQL to try and precache the tables pages as
well, is there anyway I could achieve something like that? I have toyed with
creating a ramdisk to store a lookup table of sorts on (we only care about a
few columns initially) to speed this up a bit but its a right pain in the
arse to do by the looks of things.
First question that always gets asked here: What's the output of explain 
analyse? Without that, people here can't see where the slowdown is.

I expect though, that the problem is the ordering by time. I imagine 
that you could create an index on time, maybe truncated to months or 
something similar (You can create indices based on functions). That 
index alone should speed up the ordering already.
It could also be used to cluster the table, which should speed up things 
some more, I suppose.
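
A sketch of that kind of expression index (not from the original message;
it assumes "time" is a plain timestamp column, since an indexed expression
must be immutable):

CREATE INDEX article_time_month_idx
  ON article (date_trunc('month', "time"));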

--
Alban Hertroys
MAG Productions
T: +31(0)53 4346874
F: +31(0)53 4346876
E: [EMAIL PROTECTED]
W: http://www.magproductions.nl
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[GENERAL] Select Last n Rows Matching an Index Condition (and caches)

2005-03-18 Thread Alex Stapleton
We have a ~10million row table but are expecting it to get larger, possibly
by a factor of 10 or more. The rows are quite long and fixed length (just
over 500 bytes.)

We have an index of (symbol, source, date) on this table and doing queries
like this

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC
LIMIT 1000;

To get the latest 1000 rows for that symbol and source.

However this takes quite a while at the best of times (1-10 seconds). The
query without the order by and the limit tends to return about 7 rows,
which adds up to about 30MB of data. Once the pages are in the cache they
take around 100ms, but this is to be expected. Unfortunately the initial
query required to cache it is unacceptably long for a web application like
ours.

My (not yet implemented) solution to this problem is to add a SEQUENCE and
index it so that adding a WHERE id > [max_id]-1000 and ordering by time
DESC will reduce the I/O quite a lot. Am I right here? It would be nice if
there was a way to get PostgreSQL to try and precache the table's pages as
well; is there any way I could achieve something like that? I have toyed with
creating a ramdisk to store a lookup table of sorts on (we only care about a
few columns initially) to speed this up a bit, but it's a right pain in the
arse to do by the looks of things.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] pg_dump large-file support > 16GB

2005-03-18 Thread Rafael Martinez
On Thu, 2005-03-17 at 10:41 -0700, Aly Dharshi wrote:

Hello

> Would it help to use a different filesystem like SGI's XFS ?

I do not see the connection between this problem and using another
filesystem. I think we would have this problem with all the programs in
the system if we had a problem with the filesystem we are using.

>  Would it be 
> possible to even implement that at you site at this stage ?
> 

We cannot do this if we want support from our "operating system
department"; they do not support XFS at present.

-- 
Rafael Martinez, <[EMAIL PROTECTED]> 
Center for Information Technology Services 
University of Oslo, Norway   

PGP Public Key: http://folk.uio.no/rafael/


signature.asc
Description: This is a digitally signed message part


Re: [GENERAL] pg_dump large-file support > 16GB

2005-03-18 Thread Rafael Martinez
On Thu, 2005-03-17 at 10:17 -0500, Tom Lane wrote:
> Rafael Martinez Guerrero <[EMAIL PROTECTED]> writes:
> > We are trying to dump a 30GB+ database using pg_dump with the --file
> > option. In the beginning everything works fine, pg_dump runs and we get
> > a dumpfile. But when this file becomes 16GB it disappears from the
> > filesystem, pg_dump continues working without giving an error until it
> > finishes (even when the file does not exist) (The filesystem has free
> > space).
> 
> Is that a plain text, tar, or custom dump (-Ft or -Fc)?  Is the behavior
> different if you just write to stdout instead of using --file?
> 
>   regards, tom lane

- In this example, it is a plain text (--format=p).
- If I write to stdout and redirect to a file, the dump finishes without
problems and I get a dump-text-file over 16GB.


-- 
Rafael Martinez, <[EMAIL PROTECTED]> 
Center for Information Technology Services 
University of Oslo, Norway   

PGP Public Key: http://folk.uio.no/rafael/


signature.asc
Description: This is a digitally signed message part