Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)
You might unknowingly be asking a FAQ. See the end of this section: http://www.postgresql.org/docs/faqs.FAQ.html#4.6

Depending on the version of postgres you're running and the data types of the symbol and source columns, you might need to quote (or cast) your constant data for symbol and source, e.g.:

SELECT * FROM article WHERE symbol='12646' AND source = '19' ORDER BY time DESC LIMIT 1000;

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Mar 18, 2005, at 3:56 AM, Alex Stapleton wrote:

Whoops, sorry, we have indexes on (symbol, source, time) and there is no date column :/

SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC LIMIT 1000;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=321163.29..321165.79 rows=1000 width=466) (actual time=33243.924..33246.021 rows=1000 loops=1)
   ->  Sort  (cost=321163.29..321402.31 rows=95609 width=466) (actual time=33243.917..33244.626 rows=1000 loops=1)
         Sort Key: "time"
         ->  Seq Scan on article  (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.022..32979.685 rows=42959 loops=1)
               Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33258.706 ms
(6 rows)

explain analyze SELECT * FROM article WHERE symbol=12646 AND source = 19;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on article  (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.021..33275.433 rows=42959 loops=1)
   Filter: ((symbol = 12646) AND (source = 19))
 Total runtime: 33320.920 ms
(3 rows)

We can't use CLUSTER because we need the DB up all the time. The analyze suggests to me that it's the I/O taking most of the time.

-----Original Message-----
From: Alban Hertroys [mailto:[EMAIL PROTECTED]
Sent: 18 March 2005 09:48
To: Alex Stapleton
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)

Alex Stapleton wrote:
> SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC LIMIT 1000;
>
> To get the latest 1000 rows for that symbol and source. My (not yet implemented) solution to this problem is to add a SEQUENCE and index it, so that adding a WHERE id > [max_id]-1000 and ordering by time DESC will reduce the I/O quite a lot. Am I right here?
>
> It would be nice if there were a way to get PostgreSQL to try to precache the table's pages as well; is there any way I could achieve something like that? I have toyed with creating a ramdisk to store a lookup table of sorts on (we only care about a few columns initially) to speed this up a bit, but it's a right pain in the arse to do by the looks of things.

First question that always gets asked here: What's the output of explain analyse? Without that, people here can't see where the slowdown is.

I expect, though, that the problem is the ordering by time. I imagine that you could create an index on time, maybe truncated to months or something similar (you can create indices based on functions). That index alone should speed up the ordering already. It could also be used to cluster the table, which should speed up things some more, I suppose.

--
Alban Hertroys
MAG Productions
T: +31(0)53 4346874
F: +31(0)53 4346876
E: [EMAIL PROTECTED]
W: http://www.magproductions.nl

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
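Putting Thomas's and Alban's suggestions together, the usual fix for this kind of "latest N rows" query is to make the ORDER BY match the compound index column for column, so the planner can walk the index backwards and stop after 1000 rows instead of sorting all 42959 matches. This is only a sketch: Alex says the (symbol, source, time) index already exists, and the index name below is made up.

CREATE INDEX article_symbol_source_time_idx ON article (symbol, source, "time");

SELECT *
  FROM article
 WHERE symbol = 12646
   AND source = 19
 ORDER BY symbol DESC, source DESC, "time" DESC   -- matches the index order, so an
 LIMIT 1000;                                      -- index scan backward can satisfy it

As a follow-up under the same subject shows, the planner will consider an index scan backward for this form (there it only did so with enable_seqscan off), so a fresh ANALYZE and possibly some cost tuning may still be needed before it is chosen on its own.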
[GENERAL] ANN: Bricolage 1.8.5 Released
The Bricolage development team is pleased to announce the release of Bricolage 1.8.5. This maintenance release addresses a number of issues in Bricolage 1.8.3 and adds a number of improvements (there was no announcement for the short-lived 1.8.4 release). The SOAP server in particular sees improvements in this release, with improved character set support; better support for related stories and media using URIs in addition to IDs; and as support for top-level element relations. Issues with the ordering of story elements have also been corrected, as well as errors when attempting to revert a story or media document or template. Here are the other highlights of this release: Improvements * Added Linux startup script contrib/start_scripts/linux. [David] * Related story and media elements managed through the SOAP server can now use a combination of URI and site ID to identify related assets in addition to the existing approach of using story and media IDs. [David] * A list of subelements is now less likely to mysteriously become out of order and thus lead to strange action-at-a-distance errors. And even if they do become out of order, the error message will be more appropriate ("Warning! State inconsistent" instead of "Can't call method 'get_name' on an undefined value"). Reported by Curtis Poe. [David] * The SOAP media interface now supports creating relationships between the media documents elements and other story and media documents, just like the SOAP story interface does. [David] * The SOAP interface now supports Related stories and media on story type and media type elements just as in the UI. This involved the somewhat hackish necessity for including the "related_story_id" and "related_media_id" (or "related_story_uri" and "related_media_uri") attributes in the "elements" XML element, but it does the trick. [David] Bug Fixes * Calls to publish documents via SOAP will no longer fail if the published_version attribute is not specified and the document to be published has never been published before. [David] * The Bricolage virtual FTP server will no longer fail to start if Template Toolkit is installed but its version number is less than 2.14. Reported by Adam Rinehart. [David] * Stories and Media created or updated via the SOAP interface will now associate contributors of the appropriate type, instead of "All Contributors". [Scott & David] * Deleting an element that has a template no longer causes an error. Thanks to Susan for the spot! [David] * Eliminated encoding errors when using the SOAP interface to output stories, media, or templates with wide characters. Reported by Scott Lanning. [David] * Reverting (stories, media, templates) no longer gives an error. Reported by Simon Wilcox, Rachel Murray, and others. [David] * Publishing a published version of a document that has a later version in workflow will no longer cause that later version to be mysteriously removed from workflow. This could be caused by passing a document looked up using the published_version to list() to $burner->publish_another in a template. [David] * The SOAP server story and media interfaces now support elements that contain both related stories and media, rather than one or the other. [David] * Attempting to preview a story or media document currently checked out to another user no longer causes an error. Reported by Paul Orrock. [David] * Custom fields with default values now have their values included when they are added to stories and media. Thanks to Clare Parkinson for the spot! 
[David] * The bric_queued script now requires a username and password and will authenticate the user. This user will then be used for logging events. All events logged when a job is run via the UI are now also logged by bric_queued. [Mark and David] * Preview redirections now use the protocol setting of the preview output channel if it's available, and falls back on using "http://"; when it's not, instead of using the hard-coded "http://";. Thanks to Martin Bacovsky for the spot! [David] * The has_keyword() method in the Business class (from which the story and media classes inherit) now works. Thanks to Clare Parkinson for the spot! [David] * Clicking a link in the left-side navigation after the session has expired now causes the whole window to show the login form, rather than it showing inside the nav frame, which was useless. [Marshall] * The JavaScript that validates form contents once again works with htmlArea, provided htmlArea itself is patched. See http://sour
Re: [GENERAL] Query performance problem
I can also do the same thing without sub-queries - I messed about some more since I was keen to ensure backward compatibility with prior versions of mysql that have left/right joins but no subqueries ... quite a bit quicker still! Query took 0.0037 sec - 1/10th of the sub-query time. SELECT chartmaster.accountcode, periods.periodno FROM chartmaster INNER JOIN periods ON True LEFT JOIN chartdetails ON chartmaster.accountcode = chartdetails.accountcode AND periods.periodno = chartdetails.period WHERE periods.periodno >=1 AND periods.periodno <=63 AND chartdetails.accountcode IS NULL LIMIT 0 , 30 In postgres: SQL executed. Total runtime: 12.241 ms Still this is a third of the time of the sub-query route but 4 times longer than mysql - this must be an install issue? Thanks again for this idea Paul phil On Fri, 18 Mar 2005 14:07, you wrote: > See the syntax for INSERT ... SELECT shown here: > http://www.postgresql.org/docs/8.0/static/sql-insert.html > > Instead of doing a nested loop to INSERT new records, do it like this: > > For ($period = start; $period < end; $period++) > { > INSERT INTO chartdetails (accountcode, period) > SELECT accountcode, $period FROM chartdetails WHERE > (accountcode, $period) NOT IN ( > SELECT accountcode, period FROM chardetails WHERE period = > $period > ); > } > > Or if you have some table that has 1 row for each period (call it > "periods") then you could simply do: > > INSERT INTO chartdetails (accountcode, period) > SELECT accountcode, period FROM accountcode, period > WHERE (period BETWEEN $start AND $end) AND (accountcode, > period) NOT IN ( > SELECT accountcode, period FROM chartdetails WHERE period > BETWEEN $start AND $end > ); > > Note to others: see the legitimate use of an unconstrained CROSS JOIN? > > -- > > Postgres's SELECT count(*) is slow if many records meet the WHERE clause > being used. It looks like you're only using testing for 0 or >0 in your > query, so you could use: > > SELECT EXISTS (SELECT 1 FROM chartdetails WHERE ); > > This will be much faster since with EXISTS, postgres only runs the query > long enough to find out whether even one row would be returned--if so, > it stops. > > Regards, > Paul Tillotson > > Phil Daintree wrote: > >Dear psqlers, > > > >I need your help! > > > >I administer/develop an open source PHP accounting software project > > (webERP) [snip] > > > > > > $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db); > > > > While ($AccountRow = DB_fetch_array($ChartAccounts)){ > > > > for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) { > > > > echo '' . _('Period Number') . ' ' . $PeriodNo . > > ''; > > > > // Check if there is an chart details record set up > > $sql = 'SELECT count(*) FROM chartdetails > > WHERE > > accountcode='.$AccountRow['accountcode'].' > > AND period=' . $PeriodNo; > > $InsChartDetails = DB_query($sql,$db,'','','',false); > > $CountRows = DB_fetch_row($InsChartDetails); > > $AccountExistsAlready = $CountRows[0]; > > DB_free_result($InsChartDetails); > > if(! $AccountExistsAlready) { > > $sql = 'INSERT INTO chartdetails (accountcode, > > period) > > VALUES (' . $AccountRow['accountcode'] > > . ', > > ' . $PeriodNo . 
')'; > > $InsChartDetails = DB_query($sql,$db); > > DB_free_result($InsChartDetails); > > } > > > > } > > > > /*Now run through each of the new chartdetail records created for each > >account and update them with the B/Fwd and B/Fwd budget no updates would > > be required where there were previously no chart details set up ie > >FirstPeriodPostedTo > 0 */ > > > > for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) { > > > > $sql = 'SELECT accountcode, > > period, > > actual + bfwd AS cfwd, > > budget + bfwdbudget AS cfwdbudget > > FROM chartdetails WHERE period =' . ($PeriodNo > > - 1); > > $ChartDetailsCFwd = DB_query($sql,$db); > > > > while ($myrow = DB_fetch_array($ChartDetailsCFwd)){ > > > > $sql = 'UPDATE chartdetails SET bfwd =' . > > $myrow['cfwd'] . ', > > bfwdbudget =' . > > $myrow['cfwdbudget'] . ' > >
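The two halves of this thread can be combined: Paul's INSERT ... SELECT done with the LEFT JOIN / IS NULL test instead of NOT IN. A sketch only, using the table and column names quoted above; the cross join is the "legitimate unconstrained CROSS JOIN" Paul mentions:

INSERT INTO chartdetails (accountcode, period)
SELECT chartmaster.accountcode, periods.periodno
  FROM chartmaster
 CROSS JOIN periods
  LEFT JOIN chartdetails
         ON chartmaster.accountcode = chartdetails.accountcode
        AND periods.periodno = chartdetails.period
 WHERE periods.periodno BETWEEN 1 AND 63
   AND chartdetails.accountcode IS NULL;   -- keep only the (account, period) pairs not yet present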
Re: [GENERAL] Help with transactions
On Fri, Mar 18, 2005 at 09:22:52AM -0500, Stephen Howie wrote: > I have a java program that excepts print streams and inserts in into a > spool table as a bytea. This fires a pl/pgsql trigger that passes the > bytea to a pl/perl function to process the bytea and spits the results > as an array back. It then proceeds to insert the data into multiple > tables. Problem is that two of the tables data is inserted into inside > this transaction, one has a foreign key to the other. As you can guess > I get a foreign key violation because the transaction is not committed A transaction doesn't need to be committed for operations to see the effects of previous operations in the same transaction, but there could be visibility problems related to what happens when. Could you post the simplest self-contained example that demonstrates the problem? It'll be easier to understand the interactions if we can see the exact code. In simple tests I successfully did what you describe, so apparently my experiment didn't duplicate what you're doing. What version of PostgreSQL are you using? > and as far as I understand PostgreSQL does not support dirty reads or > nested transactions. You probably need neither. PostgreSQL 8.0 supports nested transactions in the form of savepoints. > 1) what is there another way to handle this transaction that would > resolve this violation without using dirty reads and Let's see what you're doing before thinking dirty reads would be a solution. Or perhaps you're thinking about them in a slightly different sense than transaction literature usually describes. > 2) It looks like running the trigger after insert on a table does > not run as a separate transaction. Why would it? > Is the insert to that table suppose to fail if the trigger fails? > To me that defeats the purpose of having a trigger after insert. An after trigger is about visibility. Here's an excerpt from the "Triggers" chapter in the documentation: Row after triggers are most sensibly used to propagate the updates to other tables, or make consistency checks against other tables. The reason for this division of labor is that an after trigger can be certain it is seeing the final value of the row, while a before trigger cannot; there might be other before triggers firing after it. An operation ain't over 'til it's over: if an after trigger doesn't like what it sees, it can still abort the operation by raising an exception. That doesn't defeat the purpose -- it's part of the purpose. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
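The self-contained test Michael is asking for might look like the following; the table names are invented for illustration and are not the original poster's schema. It shows that a child row referencing a parent row inserted earlier in the same, still-uncommitted transaction does not violate the foreign key:

CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child  (id integer PRIMARY KEY,
                     parent_id integer NOT NULL REFERENCES parent (id));

BEGIN;
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (1, 1);   -- sees the uncommitted parent row; no FK violation
COMMIT;

If the inserts really must happen in the "wrong" order, one option (not discussed above) is to declare the constraint DEFERRABLE INITIALLY DEFERRED so the check is postponed until commit.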
Re: [GENERAL] plpython function problem workaround
On Fri, Mar 18, 2005 at 10:33:01AM -0500, Tom Lane wrote: > > I think it would be reasonable to back-patch a small fix to convert CRLF. > The sort of rewrite Marco is considering, I wouldn't back-patch. I just submitted a small patch to convert CRLF => LF, CR => LF. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Query performance problem
First time I ran it it took 5127.243 ms .. then I did a full vacuum. then ... SQL executed. Total runtime: 33.707 ms I am keen to just have the one lot of code all in the scripts ... so I was pleased when the identical sql also worked on mysql!!! Your SQL-query has been executed successfully (Query took 0.0350 sec) SQL-query : [Edit] [Explain SQL] [Create PHP Code] SELECT chartmaster.accountcode, periods.periodno FROM chartmaster, periods WHERE ( periods.periodno BETWEEN 1 AND 12 ) AND ( chartmaster.accountcode, periods.periodno ) NOT IN (SELECT accountcode, period FROM chartdetails WHERE period BETWEEN 1 AND 12 ) LIMIT 0 , 30 You'll notice the discrepancy on the timings though! Whilst pg is not performing the way mysql does with innodb - it is at least usable this way. I am guessing there is some gremlin with my install - I'll try an upgrade to v 8. Phil On Fri, 18 Mar 2005 14:07, you wrote: > See the syntax for INSERT ... SELECT shown here: > http://www.postgresql.org/docs/8.0/static/sql-insert.html > > Instead of doing a nested loop to INSERT new records, do it like this: > > For ($period = start; $period < end; $period++) > { > INSERT INTO chartdetails (accountcode, period) > SELECT accountcode, $period FROM chartdetails WHERE > (accountcode, $period) NOT IN ( > SELECT accountcode, period FROM chardetails WHERE period = > $period > ); > } > > Or if you have some table that has 1 row for each period (call it > "periods") then you could simply do: > > INSERT INTO chartdetails (accountcode, period) > SELECT accountcode, period FROM accountcode, period > WHERE (period BETWEEN $start AND $end) AND (accountcode, > period) NOT IN ( > SELECT accountcode, period FROM chartdetails WHERE period > BETWEEN $start AND $end > ); > > Note to others: see the legitimate use of an unconstrained CROSS JOIN? > > -- > > Postgres's SELECT count(*) is slow if many records meet the WHERE clause > being used. It looks like you're only using testing for 0 or >0 in your > query, so you could use: > > SELECT EXISTS (SELECT 1 FROM chartdetails WHERE ); > > This will be much faster since with EXISTS, postgres only runs the query > long enough to find out whether even one row would be returned--if so, > it stops. > > Regards, > Paul Tillotson > > Phil Daintree wrote: > >Dear psqlers, > > > >I need your help! > > > >I administer/develop an open source PHP accounting software project > > (webERP) [snip] > > > > > > $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db); > > > > While ($AccountRow = DB_fetch_array($ChartAccounts)){ > > > > for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) { > > > > echo '' . _('Period Number') . ' ' . $PeriodNo . > > ''; > > > > // Check if there is an chart details record set up > > $sql = 'SELECT count(*) FROM chartdetails > > WHERE > > accountcode='.$AccountRow['accountcode'].' > > AND period=' . $PeriodNo; > > $InsChartDetails = DB_query($sql,$db,'','','',false); > > $CountRows = DB_fetch_row($InsChartDetails); > > $AccountExistsAlready = $CountRows[0]; > > DB_free_result($InsChartDetails); > > if(! $AccountExistsAlready) { > > $sql = 'INSERT INTO chartdetails (accountcode, > > period) > > VALUES (' . $AccountRow['accountcode'] > > . ', > > ' . $PeriodNo . 
')'; > > $InsChartDetails = DB_query($sql,$db); > > DB_free_result($InsChartDetails); > > } > > > > } > > > > /*Now run through each of the new chartdetail records created for each > >account and update them with the B/Fwd and B/Fwd budget no updates would > > be required where there were previously no chart details set up ie > >FirstPeriodPostedTo > 0 */ > > > > for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) { > > > > $sql = 'SELECT accountcode, > > period, > > actual + bfwd AS cfwd, > > budget + bfwdbudget AS cfwdbudget > > FROM chartdetails WHERE period =' . ($PeriodNo > > - 1); > > $ChartDetailsCFwd = DB_query($sql,$db); > > > > while ($myrow = DB_fetch_array($ChartDetailsCFwd)){ > > > > $sql = 'UPDATE chartdetails SET bfwd =' . > > $myrow['cfwd'] . ', > >
Re: [GENERAL] SMP scaling
Mark Rae <[EMAIL PROTECTED]> writes: > Even with the numa support, which makes sure any memory allocated by > malloc or the stack ends up local to the processor which originally > called it, and then continues to schedule the process on that CPU, > there is still the problem that all table accesses* go through > the shared buffer cache which resides in one location. > [* is this true in all cases?] Temp tables are handled in backend-local memory, but all normal tables have to be accessed through shared buffers. The implications of not doing that are bad enough that it's hard to believe it could be a win to change. (In short: the hardware may not like syncing across processors, but it can still do it faster than we could hope to do in software.) > it looks like SGI already have a solution in the form of > symmetric data objects. > In particular, the symmetric heap, an area of shared memory > which is replicated across all memory domains with the > coherency being handled in hardware. Hmm, do they support spinlock objects in this memory? If so it could be just the right thing. > So it looks like all that might be needed is to replace the > shmget calls in src/backend/port with the equivalent SGI functions. Send a patch ;-) regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Query performance problem
Phil Daintree wrote: Appreciated you help Paul - many thanks for taking the time. I view this as merely passing on all the pearls of wisdom I have gleaned from this list. : ) Advice: Are you running this inside a transaction? Do so, because if you don't, then each UPDATE or INSERT or SELECT runs inside its own transaction, and committing each transaction has overhead associated with it. It looks like just putting this code inside a transaction has dramatically reduced the problem. Of course I knew this but it needed me to be embarassed to actually do it :-) Glad to hear that this helped. In case you are interested, the reason this makes such a dramatic difference is that each transaction's commit record must be logged to the commit log, which is a sequentially written file. (Thus, you can only commit one transaction per revolution of the disk, and so if you have a 7200 rpm disk, you can't get more than 120 transactions / second on a safely configured system unless your drive has a battery-backed write cache.) This block of code is INSIDE a while loop that loops once for each row in chartmaster: for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) { $sql = 'SELECT accountcode, period, actual + bfwd AS cfwd, budget + bfwdbudget AS cfwdbudget FROM chartdetails WHERE period =' . ($PeriodNo - 1); $ChartDetailsCFwd = DB_query($sql,$db); while ($myrow = DB_fetch_array($ChartDetailsCFwd)){ $sql = 'UPDATE chartdetails SET bfwd =' . $myrow['cfwd'] . ', bfwdbudget =' . $myrow['cfwdbudget'] . ' WHERE accountcode = ' . $myrow['accountcode'] . ' AND period >=' . $PeriodNo; $UpdChartDetails = DB_query($sql,$db, '', '', '', false); DB_free_result($UpdChartDetails); } DB_free_result($ChartDetailsCFwd); } It looks like you're updating the same row in chartmaster multiple times. chartmaster is not being updated ... Sorry--I mean chartdetails. One tip to remember: If you have a statement like this: UPDATE footable SET foocol = 'value' WHERE fooid = 'id'; And it is the case that foocol might already equal value, then write this instead: UPDATE footable SET foocol = 'value' WHERE fooid = 'id' AND foocol <> 'value'; This will help because no disk write will actually happen if foocol happens to already equal value. chartdetails is - the chartdetails relating to each chartmaster record for all periods >= $PeriodNo I have to update all the following periods as the balance b/fwd for all successive periods has now increased if we post transaction back in time normally there might only be a few chartdetails records for the chartmaster account under review with chartdetails records with a period later than the one being posted. Am I correct in thinking that bfwd is basically a running tally of actual, and bfwdbudget is a running tally of budget, as one might normally find in a spreadsheet? 
If so, you could use this view to calculate the correct value for every location in the table:

CREATE VIEW newtotals AS
    SELECT thismonth.accountcode, thismonth.periodno,
        (SELECT SUM(actual) FROM chartdetails AS q1
          WHERE q1.accountcode = thismonth.accountcode
            AND q1.periodno < thismonth.periodno) AS cfwd,
        (SELECT SUM(budget) FROM chartdetails AS q1
          WHERE q1.accountcode = thismonth.accountcode
            AND q1.periodno < thismonth.periodno) AS cfwdbudget
    FROM chartdetails AS thismonth;

And then you could use an update statement:

UPDATE chartdetails
   SET bfwd = cfwd, bfwdbudget = cfwdbudget
  FROM newtotals
 WHERE chartdetails.accountcode = newtotals.accountcode
   AND chartdetails.periodno = newtotals.periodno        -- JOIN condition
   AND period BETWEEN $CreateFrom AND $CreateTo
   AND (bfwd <> cfwd OR bfwdbudget <> cfwdbudget);       -- AVOID needless updates.

Since I don't have your tables to work with, this might need minor syntax tweaking, but I'm pretty sure it will work. I think MySQL doesn't support views yet, but you could replace "newtotals" in the above update with a big subselect (which I think they are supposed to support in the latest 4.x version). Also: if you don't already have one, that UPDATE statement would probably use a compound index on (accountcode, periodno). Now I begin to comprehend why CompiereERP doesn't support MySQL. ; )

a row in chartdetails will be updated every time there is a gltrans posting to the period and account of the chartdetails ie quite often.

If it gets updated often it will need vacuuming often as well.

M
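A sketch of the two supporting pieces mentioned in this thread, using the column names from the PHP snippets (the index name and the constants are made up); wrapping the statements in one transaction is the change Phil says already made the big difference:

CREATE INDEX chartdetails_acct_period_idx ON chartdetails (accountcode, period);

BEGIN;
UPDATE chartdetails
   SET bfwd = 100.0                -- illustrative value only
 WHERE accountcode = 7000          -- illustrative account
   AND period >= 12
   AND bfwd <> 100.0;              -- skip rows that already hold the value: no needless disk write
COMMIT;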
Re: [GENERAL] Vacuum analyze.
On Fri, 18 Mar 2005 20:28:50 -0300, Fernando Lujan <[EMAIL PROTECTED]> wrote: > Hi folks, > > I wanna know from you, how often must I run vaccum analyze on my db? > > Once per day, twice... One per hour and so on... > > I didn't find a especific document about this question. That's because it depends on how you are using the database. In general, running it once/day is a minimum safe assumption. However, if the data is changing really quickly, then you should run it more frequently. The only way to judge is to run it and see how much work is being done each time. -- ~ L. Friedman[EMAIL PROTECTED] LlamaLand http://netllama.linux-sxs.org ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
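In practice "run it and see how much work is being done" means something like this (the table name is only an example); the VERBOSE output reports how many dead row versions each run removes, which tells you whether the interval is too long:

VACUUM VERBOSE ANALYZE mytable;
-- If the "removed ... row versions" numbers are large shortly after the previous
-- run, vacuum that table more often, or look at pg_autovacuum as suggested
-- elsewhere in this thread.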
Re: [GENERAL] Vacuum analyze.
Search for "pg_autovacuum" -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fernando Lujan Sent: Friday, March 18, 2005 3:29 PM To: pgsql-general@postgresql.org Subject: [GENERAL] Vaccum analyze. Hi folks, I wanna know from you, how often must I run vaccum analyze on my db? Once per day, twice... One per hour and so on... I didn't find a especific document about this question. Thanks in advance. Fernando Lujan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Vacuum analyze.
Hi folks, I wanna know from you, how often must I run vacuum analyze on my db? Once per day, twice... once per hour and so on... I didn't find a specific document about this question. Thanks in advance. Fernando Lujan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Statistics with PostgreSQL
/* On the test stub: */ [snip] double data[30]; int main(void) { size_t i; size_t size = sizeof(data) / sizeof(data[0]); for (i = 0; i < size; i++) { data[i] = rand(); } for (i = 0; i < size; i++) { cout << data[i] << endl; } cout << "1st item is " << RandomSelect(data, 0, size - 1, 0) << endl; cout << "2nd item is " << RandomSelect(data, 0, size - 1, 1) << endl; cout << "3rd item is " << RandomSelect(data, 0, size - 1, 2) << endl; for (i = 4; i < size; i++) cout << i << "th item is " << RandomSelect(data, 0, size - 1, i) << endl; return 0; } /* The positions are cardinal, so the first three queries above should be: cout << "1st item is " << RandomSelect(data, 0, size - 1, 1) << endl; cout << "2nd item is " << RandomSelect(data, 0, size - 1, 2) << endl; cout << "3rd item is " << RandomSelect(data, 0, size - 1, 3) << endl; */ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] SMP scaling
On Fri, Mar 18, 2005 at 01:31:51PM -0500, Tom Lane wrote:
> BTW, although I know next to nothing about NUMA, I do know that it is
> configurable to some extent (eg, via numactl). What was the
> configuration here exactly, and did you try alternatives? Also,
> what was the OS exactly? (I've heard that RHEL4 is a whole lot better
> than RHEL3 in managing NUMA, for example. This may be generic to 2.6 vs
> 2.4 Linux kernels, or maybe Red Hat did some extra hacking.)

The Altix uses a 2.4.21 kernel with SGI's own modifications to support up to 256 CPUs and their NUMALink hardware. (Some of which has become the NUMA code in the 2.6 kernel.)

Even with the NUMA support, which makes sure any memory allocated by malloc or the stack ends up local to the processor which originally called it, and then continues to schedule the process on that CPU, there is still the problem that all table accesses* go through the shared buffer cache which resides in one location. [* is this true in all cases?]

I was about to write a long explanation about how the only way to scale out to this size would be to have separate buffer caches in each memory domain, and this would then require some kind of cache coherency mechanism. But after reading a few bits of documentation, it looks like SGI already have a solution in the form of symmetric data objects. In particular, the symmetric heap, an area of shared memory which is replicated across all memory domains with the coherency being handled in hardware.

So it looks like all that might be needed is to replace the shmget calls in src/backend/port with the equivalent SGI functions.

-Mark

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] pg_dump large-file support > 16GB
Rafael Martinez Guerrero <[EMAIL PROTECTED]> writes: > We are trying to dump a 30GB+ database using pg_dump with the --file > option. In the beginning everything works fine, pg_dump runs and we get > a dumpfile. But when this file becomes 16GB it disappears from the > filesystem, FWIW, I tried and failed to duplicate this problem on a Fedora Core 3 machine using an ext3 filesystem. I set up a dummy database that would produce an approximately 18GB text dump and did pg_dump big --file spare/big.dump Seemed to work fine. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)
On Fri, Mar 18, 2005 at 10:23:14AM -, Alex Stapleton wrote: > If I SET ENABLE_SEQSCAN TO OFF it uses the Index, but it does an Index scan > backward if I do > > SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY > symbol desc, source DESC, time DESC LIMIT 1000; > > Which is better but still quite slow. What's the EXPLAIN ANALYZE of this query? Have you done VACUUM FULL and/or ANALYZE recently? Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
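The numbers Martijn is asking for would come from something like this, a sketch using the same table and constants as the earlier messages, not output from the original poster:

VACUUM ANALYZE article;

EXPLAIN ANALYZE
SELECT *
  FROM article
 WHERE symbol = 12646 AND source = 19
 ORDER BY symbol DESC, source DESC, "time" DESC
 LIMIT 1000;
-- compare the planner's row estimates against the actual rows and times in the output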
Re: [GENERAL] Install error at rpm package + Fedora Core3
Sorry, I did not see your email off-list until after I sent the other one. On Mar 18, 2005, at 11:56 AM, Devrim GUNDUZ wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, On Fri, 18 Mar 2005, Dianne Yumul wrote: Why don't you try yum, it will install the dependencies for you ; ) http://linux.duke.edu/projects/yum/ As we talked off-list before, currently PGDG RPMs cannot be installed via yum, AFAIK. Cheers, - -- Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.tdmsoft.com http://www.gunduz.org -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFCOzJetl86P3SPfQ4RAvzdAKCSWTx1RZpwG6NNwrFZZMa+oTNARwCgp7FS hYHeXYjZv9QdyJNa4OrAwsI= =zUI8 -END PGP SIGNATURE- ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Install error at rpm package + Fedora Core3
Yup, I didn't see your email off-list till after I re-sent the other email. But thanks for the info. On Mar 18, 2005, at 11:56 AM, Devrim GUNDUZ wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, On Fri, 18 Mar 2005, Dianne Yumul wrote: Why don't you try yum, it will install the dependencies for you ; ) http://linux.duke.edu/projects/yum/ As we talked off-list before, currently PGDG RPMs cannot be installed via yum, AFAIK. Cheers, - -- Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.tdmsoft.com http://www.gunduz.org -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFCOzJetl86P3SPfQ4RAvzdAKCSWTx1RZpwG6NNwrFZZMa+oTNARwCgp7FS hYHeXYjZv9QdyJNa4OrAwsI= =zUI8 -END PGP SIGNATURE- ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Install error at rpm package + Fedora Core3
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, On Fri, 18 Mar 2005, Dianne Yumul wrote: Why don't you try yum, it will install the dependencies for you ; ) http://linux.duke.edu/projects/yum/ As we talked off-list before, currently PGDG RPMs cannot be installed via yum, AFAIK. Cheers, - -- Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.tdmsoft.com http://www.gunduz.org -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFCOzJetl86P3SPfQ4RAvzdAKCSWTx1RZpwG6NNwrFZZMa+oTNARwCgp7FS hYHeXYjZv9QdyJNa4OrAwsI= =zUI8 -END PGP SIGNATURE- ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Install error at rpm package + Fedora Core3
Why don't you try yum, it will install the dependencies for you ; ) http://linux.duke.edu/projects/yum/ On Mar 18, 2005, at 11:32 AM, Devrim GUNDUZ wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, On Fri, 18 Mar 2005, Edgar Amorim wrote: I'm a kind new bye with linux & related stuffs. I'm trying to install PG on a FC3 using "postgresql-8.0.1-2PGDG.i686.rpm", but after to issue "rpm -ivh (the rpm file)" I've got a message telling "error: Failed dependencies: libpq.so.3 is need". I take a look at the documentation and it seems the libpq is the PG API, so shouldn't be built at the install time? How can I solve that or where should I look for in order to find out an answer? You have to install postgresql-libs rpm first. Regards, - -- Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.tdmsoft.com http://www.gunduz.org -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFCOyzHtl86P3SPfQ4RAptpAJ9kNwuK6ttJoKv7FWRKhjgXKN5yRwCgylUo Gnbj9xk74YioID+uOxlSFkY= =sQIj -END PGP SIGNATURE- ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Install error at rpm package + Fedora Core3
Folks, I'm kind of a newbie with Linux & related stuff. I'm trying to install PG on FC3 using "postgresql-8.0.1-2PGDG.i686.rpm", but after issuing "rpm -ivh (the rpm file)" I get a message saying "error: Failed dependencies: libpq.so.3 is needed". I took a look at the documentation and it seems libpq is the PG API, so shouldn't it be built at install time? How can I solve this, or where should I look to find an answer? Thank you so much. Edgar Amorim Network Consultant
Re: [GENERAL] Install error at rpm package + Fedora Core3
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, On Fri, 18 Mar 2005, Edgar Amorim wrote: I'm a kind new bye with linux & related stuffs. I'm trying to install PG on a FC3 using "postgresql-8.0.1-2PGDG.i686.rpm", but after to issue "rpm -ivh (the rpm file)" I've got a message telling "error: Failed dependencies: libpq.so.3 is need". I take a look at the documentation and it seems the libpq is the PG API, so shouldn't be built at the install time? How can I solve that or where should I look for in order to find out an answer? You have to install postgresql-libs rpm first. Regards, - -- Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.tdmsoft.com http://www.gunduz.org -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFCOyzHtl86P3SPfQ4RAptpAJ9kNwuK6ttJoKv7FWRKhjgXKN5yRwCgylUo Gnbj9xk74YioID+uOxlSFkY= =sQIj -END PGP SIGNATURE- ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[GENERAL] Help with transactions
Hello all, I have a Java program that accepts print streams and inserts them into a spool table as a bytea. This fires a pl/pgsql trigger that passes the bytea to a pl/perl function, which processes the bytea and spits the results back as an array. It then proceeds to insert the data into multiple tables. The problem is that, of the two tables the data is inserted into inside this transaction, one has a foreign key to the other. As you can guess, I get a foreign key violation because the transaction is not committed, and as far as I understand PostgreSQL does not support dirty reads or nested transactions. I have two questions. 1) Is there another way to handle this transaction that would resolve this violation without using dirty reads? and 2) It looks like running the trigger after insert on a table does not run it as a separate transaction. Is the insert to that table supposed to fail if the trigger fails? To me that defeats the purpose of having a trigger after insert. Thanks for any help -- - Stephen Howie ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Statistics with PostgreSQL
Mean is just sum(col)/count(col).
Mode can be calculated with having, max, count.
Median can be computed by sorting, using a cursor, and going to the middle.

There are more efficient and better (more accurate) ways to do it, but those have to be implemented at a low level. Of course, since you have libpq, anything is possible. If you want to implement these things at a low level to get better answers, Kahan (or compensated) summation is a good idea, and do the summation into a larger type to prevent loss of precision. Here are some statistical templates I wrote that are free for any purpose you like:
ftp://cap.connx.com/pub/tournament_software/Kahan.Hpp
ftp://cap.connx.com/pub/tournament_software/STATS.HPP
The Cephes collection by Moshier has good extended precision types, if you need to carefully avoid any PLOSS.

For median, the QuickSelect algorithm is very good. Here is an implementation I wrote in C++:

#include <cstdlib>
#include <cstddef>
#include <iostream>
using namespace std;

/*
** In the following code, every reference to CLR means:
**
**    "Introduction to Algorithms"
**    By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**    ISBN 0-07-013143-0
*/

/* RandomSelect below calls RandomPartition, which calls Partition. */
template < class Etype >
size_t RandomPartition(Etype A[], size_t p, size_t r);
template < class Etype >
size_t Partition(Etype A[], size_t p, size_t r);

/*
** CLR, page 187
*/
template < class Etype >
Etype RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
    size_t q, k;
    if (p == r)
        return A[p];
    q = RandomPartition(A, p, r);
    k = q - p + 1;
    if (i <= k)
        return RandomSelect(A, p, q, i);
    else
        return RandomSelect(A, q + 1, r, i - k);
}

size_t RandRange(size_t a, size_t b)
{
    size_t c = (size_t) ((double) rand() / ((double) RAND_MAX + 1) * (b - a));
    return c + a;
}

/*
** CLR, page 162
*/
template < class Etype >
size_t RandomPartition(Etype A[], size_t p, size_t r)
{
    size_t i = RandRange(p, r);
    Etype Temp;
    Temp = A[p];
    A[p] = A[i];
    A[i] = Temp;
    return Partition(A, p, r);
}

/*
** CLR, page 154
*/
template < class Etype >
size_t Partition(Etype A[], size_t p, size_t r)
{
    Etype x, temp;
    size_t i, j;
    x = A[p];
    i = p - 1;
    j = r + 1;
    for (;;) {
        do {
            j--;
        } while (!(A[j] <= x));
        do {
            i++;
        } while (!(A[i] >= x));
        if (i < j) {
            temp = A[i];
            A[i] = A[j];
            A[j] = temp;
        }
        else
            return j;
    }
}

double data[30];

int main(void)
{
    size_t i;
    size_t size = sizeof(data) / sizeof(data[0]);
    for (i = 0; i < size; i++) {
        data[i] = rand();
    }
    for (i = 0; i < size; i++) {
        cout << data[i] << endl;
    }
    cout << "1st item is " << RandomSelect(data, 0, size - 1, 0) << endl;
    cout << "2nd item is " << RandomSelect(data, 0, size - 1, 1) << endl;
    cout << "3rd item is " << RandomSelect(data, 0, size - 1, 2) << endl;
    for (i = 4; i < size; i++)
        cout << i << "th item is " << RandomSelect(data, 0, size - 1, i) << endl;
    return 0;
}

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hrishikesh Deshmukh
Sent: Friday, March 18, 2005 10:37 AM
To: Postgresql-General
Subject: [GENERAL] Statistics with PostgreSQL

Hi All, Is there a way to simple statistics like mean/median/mode in PostgreSQL. I have tables like PsetID | IntensityValue. I want to find out mean (intensityValue) of some PsetID(s)?! Any urls/pointers/books would be a big help. Thanks, Hrishi

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
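For anyone who only needs the plain SQL versions of the three statistics described at the top of this message, a sketch follows. The table and column names (intensities, psetid, intensityvalue) are assumed from the original question, and the constants are illustrative:

-- Mean:
SELECT psetid, avg(intensityvalue) AS mean
  FROM intensities
 GROUP BY psetid;

-- Mode (most frequent value; ties broken arbitrarily):
SELECT intensityvalue AS mode
  FROM intensities
 WHERE psetid = 42
 GROUP BY intensityvalue
 ORDER BY count(*) DESC
 LIMIT 1;

-- Median ("sorting and going to the middle"): get the row count first, then skip half of it.
SELECT count(*) FROM intensities WHERE psetid = 42;   -- suppose this returns 1001
SELECT intensityvalue AS median
  FROM intensities
 WHERE psetid = 42
 ORDER BY intensityvalue
OFFSET 500 LIMIT 1;                                   -- half of the count from the previous query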
Re: [GENERAL] Statistics with PostgreSQL
On Fri, Mar 18, 2005 at 01:37:10PM -0500, Hrishikesh Deshmukh wrote: > Hi All, > > Is there a way to simple statistics like mean/median/mode in > PostgreSQL. I have tables like PsetID | IntensityValue. I want to > find out mean (intensityValue) of some PsetID(s)?! > Any urls/pointers/books would be a big help. Hrishi, For statistics beyond avg() and stddev(), check out PL/R http://www.joeconway.com/plr/ Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote! ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[GENERAL] Statistics with PostgreSQL
Hi All, Is there a way to simple statistics like mean/median/mode in PostgreSQL. I have tables like PsetID | IntensityValue. I want to find out mean (intensityValue) of some PsetID(s)?! Any urls/pointers/books would be a big help. Thanks, Hrishi ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] SMP scaling
Mark Rae <[EMAIL PROTECTED]> writes: > The altix still only scales up to 10x rather than 16x, but that probably > is the NUMA configuration taking effect now. BTW, although I know next to nothing about NUMA, I do know that it is configurable to some extent (eg, via numactl). What was the configuration here exactly, and did you try alternatives? Also, what was the OS exactly? (I've heard that RHEL4 is a whole lot better than RHEL3 in managing NUMA, for example. This may be generic to 2.6 vs 2.4 Linux kernels, or maybe Red Hat did some extra hacking.) regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] SMP scaling
Bruce Momjian writes: > So it seems our entire SMP problem was that global lock. Nice. Yeah, I was kind of expecting to see the LockMgrLock up next, but it seems we're still a ways away from having a problem there. I guess that's because we only tend to touch locks once per query, whereas we're grabbing and releasing buffers much more. >From the relatively small absolute value of Mark's queries/sec numbers, I suppose he is testing some fairly heavyweight queries (big enough to not emphasize per-query overhead). I wonder what the numbers would look like with very small, simple queries. It'd move the stress around for sure ... regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] SMP scaling
Mark Rae wrote: > On Fri, Mar 18, 2005 at 10:38:24AM -0500, Tom Lane wrote: > > Hey, that looks pretty sweet. One thing this obscures though is whether > > there is any change in the single-client throughput rate --- ie, is "1.00" > > better or worse for CVS tip vs 8.0.1? > > Here are the figures in queries per second. > > Clients1 2 3 4 6 8 12 16 > 32 64 > --- > AMD64 pg-8.0.1 6.80 12.71 18.82 22.73 18.58 17.48 17.56 17.81 > AMD64 pg-20050316 6.80 13.23 19.32 25.09 24.56 24.93 25.20 25.09 > IA64 pg-8.0.1 3.72 7.32 10.81 14.21 10.81 10.85 10.92 11.09 > IA64 pg-20050316 3.99 7.92 11.78 15.46 15.17 15.09 15.41 15.58 > Altix pg-8.0.1 3.66 7.37 10.89 14.53 21.47 26.47 27.47 20.28 > 17.12 18.66 > Altix pg-20050316 3.83 7.55 10.98 14.10 20.27 26.47 34.50 37.88 > 38.45 38.12 > > So, it didn't make any difference for the Opteron, but the two > Itanium machines were 5% and 7% faster respectively. So it seems our entire SMP problem was that global lock. Nice. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] plpython function problem workaround
Marco Colombo <[EMAIL PROTECTED]> writes: > Right now I'm parsing the string first, changing the resulting > parse tree adding missing nodes (def, INDENT, DEINDENT) and > then compiling it. Hmmm ... is this really going to be simpler or more robust than lexing the string carefully enough to insert tabs at the right places? The impression I had so far was that you'd only need to understand about Python's string-literal conventions to get that right ... and that's something that's not likely to change. I'm not so sure that parse trees can be regarded as an immutable API. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] plpython function problem workaround
On Fri, 18 Mar 2005, Tom Lane wrote: Michael Fuhr <[EMAIL PROTECTED]> writes: On Fri, Mar 18, 2005 at 12:35:07AM -0500, Tom Lane wrote: So that part of it can be solved fairly easily. Should I submit a patch? It should be only a few additional lines in PLy_procedure_munge_source(). Would you apply it only to HEAD, or would it be considered a bug fix that REL8_0_STABLE could get as well? It might be nice to have it in 8.0.2, whenever that comes out. I think it would be reasonable to back-patch a small fix to convert CRLF. The sort of rewrite Marco is considering, I wouldn't back-patch. regards, tom lane Short update: it isn't possible to get a callable object directly from the source provided by the user, since 'return' is not valid outside function definitions in Python. Right now I'm parsing the string first, changing the resulting parse tree adding missing nodes (def, INDENT, DEINDENT) and then compiling it. Grammar definitions for a python function is: funcdef: [decorators] 'def' NAME parameters ':' suite suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT What we get from the users is stmt+ (a sequence of stmt). The INDENT and DEDENT tokens are generated by the parser only when indentation level _changes_. My plan is to generate two parse trees, one from this code (with the right fname of course): def fname(): pass and one from the function definition. Then, we "attach" the root of the second tree where the "pass" node is in the first tree. We should get a parse tree ready for compilation. I wish I could "push" the right tokens in the right places, but it seems it's not possible. Stay tuned. .TM. -- / / / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _/ _/ _/ [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[GENERAL] Japanese characters problem
Welcome Make some database and have problem with japanese characters maybe with other too Below some example Linux system it is fedora (RH) Locale for postgresql i pl_PL.utf8 Database is in uniqode Thanks for help. == [EMAIL PROTECTED] 1.0.4]# rpm -q postgresql postgresql-7.4.7-3.FC3.1 [EMAIL PROTECTED] 1.0.4]# rpm -q postgresql-server postgresql-server-7.4.7-3.FC3.1 crm=# select title,category from contentinfo ; title | category +-- h264-charlie-900-hdd.avi | æç h264-liloo-900-hdd.avi - drugi | éç (2 rows) crm=# select title,category from contentinfo where category='éæ'; title | category +-- h264-charlie-900-hdd.avi | æç h264-liloo-900-hdd.avi - drugi | éç (2 rows) crm=# select title,category from contentinfo where category='çæ'; title | category +-- h264-charlie-900-hdd.avi | æç h264-liloo-900-hdd.avi - drugi | éç (2 rows) crm=# select title,category from contentinfo where category='æ'; title | category ---+-- (0 rows) crm=# select title,category from contentinfo where category='çç'; title | category +-- h264-charlie-900-hdd.avi | æç h264-liloo-900-hdd.avi - drugi | éç (2 rows) crm=# select title,category from contentinfo where category='ççç'; title | category ---+-- (0 rows) crm=# select title,category from contentinfo where category ilike 'çç'; title | category ---+-- (0 rows) crm=# select title,category from contentinfo where category ilike 'æç'; title | category --+-- h264-charlie-900-hdd.avi | æç (1 row) crm=# select title,category from contentinfo where category ilike 'éç'; title | category +-- h264-liloo-900-hdd.avi - drugi | éç (1 row) crm=# select title,category from contentinfo where category ilike 'éç'; title | category +-- h264-liloo-900-hdd.avi - drugi | éç (1 row) crm=# \l List of databases Name| Owner | Encoding ---+---+--- crm | sv| UNICODE template0 | sv| SQL_ASCII template1 | sv| SQL_ASCII (3 rows) crm=# select title,category from contentinfo where category ilike '%é ç%'; title | category +-- h264-liloo-900-hdd.avi - drugi | éç (1 row) crm=# select title,category from contentinfo where category like 'éç'; title | category +-- h264-liloo-900-hdd.avi - drugi | éç (1 row) -- Grzegorz PrzeÅdziecki www.PolskieSklepy.pl kom. +48.606.822.506 gg:3701851 skype: grzegorz.przezdziecki ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[GENERAL] performance: pg vs pg!
At the suggestion of several people, I have increased the default settings in postgresql.conf before continuing my postgresql vs mysql performance tests. To date, I have only been loading a (roughly) million-row file, creating indexes during the load, running a vacuum analyze, and a couple of simple queries. I do intend on performing more complicated tests, but I did not want to do that until people stopped telling me my methodology for simple tests was...flawed.

I ran a thorough series of tests, varying shared_buffers from 1000 to 9000, work_mem from 1 to 9 and maintenance_work_mem from 1 to 9. The complete results are long (I will provide them if anyone is interested) so I am only including a small selection. Before I do that, I will confess that increasing memory made more of a difference than I thought it would. I know many of you are thinking "no kidding" but I thought it would only be important for big complicated queries, or a server with multiple concurrent requests. No, it makes a big difference for "merely" loading a million rows and indexing them.

Time in seconds

shared_buffers  work_mem  m_work_mem     COPY    VACUUM
1000            1         1            186.154    9.814
3000            1         1             64.404    4.526
5000            5         5             65.036    3.435
9000            9         9             63.664    2.218

-- The relevant commands

create table data (
    fid integer,
    rid integer,
    range real,
    x real,
    y real,
    z real,
    bs real,
    snr real,
    rvel real,
    cfar smallint);

create index fidrid_data on data (fid,rid);
create index fidx on data (x);

-- COPY a table with 934500 rows
COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z) FROM '/home/rick/bed/data/data.dat';

-- VACUUM
vacuum analyze data;
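Two of the three settings being varied here can also be raised per session, which makes this kind of experiment easier to script; shared_buffers itself still has to be set in postgresql.conf and needs a postmaster restart. A sketch only, with illustrative values (in kilobytes on 8.0), not the poster's actual settings:

SET maintenance_work_mem = 65536;   -- used by CREATE INDEX and VACUUM
SET work_mem = 16384;               -- used by sorts and hashes
COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z) FROM '/home/rick/bed/data/data.dat';
VACUUM ANALYZE data;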
Re: [GENERAL] pg_dump large-file support > 16GB
On Fri, 2005-03-18 at 09:58 -0500, Tom Lane wrote: > Rafael Martinez <[EMAIL PROTECTED]> writes: > > On Thu, 2005-03-17 at 10:17 -0500, Tom Lane wrote: > >> Is that a plain text, tar, or custom dump (-Ft or -Fc)? Is the behavior > >> different if you just write to stdout instead of using --file? > > > - In this example, it is a plain text (--format=3Dp). > > - If I write to stdout and redirect to a file, the dump finnish without > > problems and I get a dump-text-file over 16GB without problems. > > In that case, you have a glibc or filesystem bug and you should be > reporting it to Red Hat. The *only* difference between writing to > stdout and writing to a --file option is that in one case we use > the preopened "stdout" FILE* and in the other case we do > fopen(filename, "w"). Your report therefore is stating that there > is something broken about fopen'd files. > Thanks for the information. I will contact RH. -- Rafael Martinez, <[EMAIL PROTECTED]> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ signature.asc Description: This is a digitally signed message part
Re: [GENERAL] SMP scaling
On Fri, Mar 18, 2005 at 10:38:24AM -0500, Tom Lane wrote:
> Hey, that looks pretty sweet. One thing this obscures though is whether
> there is any change in the single-client throughput rate --- ie, is "1.00"
> better or worse for CVS tip vs 8.0.1?

Here are the figures in queries per second.

Clients               1      2      3      4      6      8     12     16     32     64
---------------------------------------------------------------------------------------
AMD64 pg-8.0.1     6.80  12.71  18.82  22.73  18.58  17.48  17.56  17.81
AMD64 pg-20050316  6.80  13.23  19.32  25.09  24.56  24.93  25.20  25.09
IA64  pg-8.0.1     3.72   7.32  10.81  14.21  10.81  10.85  10.92  11.09
IA64  pg-20050316  3.99   7.92  11.78  15.46  15.17  15.09  15.41  15.58
Altix pg-8.0.1     3.66   7.37  10.89  14.53  21.47  26.47  27.47  20.28  17.12  18.66
Altix pg-20050316  3.83   7.55  10.98  14.10  20.27  26.47  34.50  37.88  38.45  38.12

So, it didn't make any difference for the Opteron, but the two Itanium machines were 5% and 7% faster respectively.

-Mark

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [GENERAL] SMP scaling
Mark Rae <[EMAIL PROTECTED]> writes: > Ok, I've done the tests comparing 8.0.1 against a snapshot from the 16th > and the results are impressive. > Clients1 2 3 4 6 8 12 16 > 32 64 > --- > Altix pg-8.0.1 1.00 2.02 2.98 3.97 5.87 7.23 7.51 5.54 > 4.68 5.10 > Altix pg-20050316 1.00 1.97 2.86 3.68 5.29 6.90 9.00 9.88 > 10.03 9.94 > AMD64 pg-8.0.1 1.00 1.87 2.77 3.34 2.73 2.57 2.58 2.62 > AMD64 pg-20050316 1.00 1.95 2.84 3.69 3.61 3.66 3.70 3.69 > IA64 pg-8.0.1 1.00 1.97 2.91 3.82 2.91 2.92 2.94 2.98 > IA64 pg-20050316 1.00 1.98 2.95 3.87 3.80 3.78 3.86 3.90 Hey, that looks pretty sweet. One thing this obscures though is whether there is any change in the single-client throughput rate --- ie, is "1.00" better or worse for CVS tip vs 8.0.1? regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] plpython function problem workaround
Michael Fuhr <[EMAIL PROTECTED]> writes: > On Fri, Mar 18, 2005 at 12:35:07AM -0500, Tom Lane wrote: >> So that part of it can be solved fairly easily. > Should I submit a patch? It should be only a few additional lines > in PLy_procedure_munge_source(). Would you apply it only to HEAD, > or would it be considered a bug fix that REL8_0_STABLE could get > as well? It might be nice to have it in 8.0.2, whenever that comes > out. I think it would be reasonable to back-patch a small fix to convert CRLF. The sort of rewrite Marco is considering, I wouldn't back-patch. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] pg_atributes index space question
Joe Maldonado <[EMAIL PROTECTED]> writes: > db=# vacuum verbose analyze pg_catalog.pg_attribute; > INFO: vacuuming "pg_catalog.pg_attribute" > INFO: index "pg_attribute_relid_attnam_index" now contains 9965 row > versions in 181557 pages REINDEX is probably the only realistic way to fix that. It shouldn't take very long, fortunately, so the exclusive lock shouldn't be an enormous problem. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
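Concretely, what Tom is suggesting would be something like this, run as a superuser; as noted above, the exclusive lock lasts only as long as the rebuild:

REINDEX TABLE pg_catalog.pg_attribute;

-- afterwards the index size reported here should be back down to something sane:
VACUUM VERBOSE ANALYZE pg_catalog.pg_attribute;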
Re: [GENERAL] plpython function problem workaround
On Fri, Mar 18, 2005 at 11:34:46AM +0100, Marco Colombo wrote: > Just let me suggest not to mimic its behaviour, but to use the > Real Thing if we manage to. That is, directly use the Universal Line > Support code provided by python itself, so that we don't even have > to think about being compatible. Sounds good if the Python API provides the hooks for doing so. I had started looking into that but didn't spent much time on it. > I'm experimenting a bit, trying to write a PLy_procedure_compile() > that does not require source munging. I'm aiming at removing the > need for extra indentation. Sounds good too, if that'll work. Looking forward to seeing what you find out. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] pg_dump large-file support > 16GB
Rafael Martinez <[EMAIL PROTECTED]> writes: > On Thu, 2005-03-17 at 10:17 -0500, Tom Lane wrote: >> Is that a plain text, tar, or custom dump (-Ft or -Fc)? Is the behavior >> different if you just write to stdout instead of using --file? > - In this example, it is a plain text dump (--format=p). > - If I write to stdout and redirect to a file, the dump finishes without > problems and I get a text dump file over 16GB without problems. In that case, you have a glibc or filesystem bug and you should be reporting it to Red Hat. The *only* difference between writing to stdout and writing to a --file option is that in one case we use the preopened "stdout" FILE* and in the other case we do fopen(filename, "w"). Your report therefore is stating that there is something broken about fopen'd files. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] pg_attribute index space question
Hello, After a VACUUM FULL I saw that the indexes on pg_attribute haven't shrunk, as reported by a subsequent vacuum analyze, but the table itself has been reduced to 196 pages from about 181557 pages. Are all system tables affected by this? How can we reclaim this space without shutting down the postmaster? Is this fixed in any new release?

db=# select relpages, reltuples from pg_catalog.pg_class where relname = 'pg_attribute';
 relpages | reltuples
----------+-----------
      196 |      9965
(1 row)

db=# vacuum verbose analyze pg_catalog.pg_attribute;
INFO:  vacuuming "pg_catalog.pg_attribute"
INFO:  index "pg_attribute_relid_attnam_index" now contains 9965 row versions in 181557 pages
DETAIL:  1518 index row versions were removed.
181263 index pages have been deleted, 2 are currently reusable.

thanks ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
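To see where the space went without running a verbose vacuum, a small catalog query along these lines works, since pg_class records page counts for indexes as well as tables; it should show the table at 196 pages and its indexes still in six figures.

    SELECT c.relname, c.relpages, c.reltuples
    FROM pg_catalog.pg_class c
    WHERE c.oid = 'pg_catalog.pg_attribute'::regclass
       OR c.oid IN (SELECT indexrelid FROM pg_catalog.pg_index
                    WHERE indrelid = 'pg_catalog.pg_attribute'::regclass);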
[GENERAL] SMP scaling
On Tue, Mar 15, 2005 at 07:00:25PM -0500, Bruce Momjian wrote: > Oh, you have to try CVS HEAD or a nightly snapshot. Tom made a major > change that allows scaling in SMP environments. Ok, I've done the tests comparing 8.0.1 against a snapshot from the 16th and the results are impressive. As well as the 16CPU Altix, I've done comparisons on two other 4CPU machines which previously didn't scale as well as expected.

Clients              1     2     3     4     6     8    12    16    32    64
-----------------------------------------------------------------------------
Altix pg-8.0.1     1.00  2.02  2.98  3.97  5.87  7.23  7.51  5.54  4.68  5.10
Altix pg-20050316  1.00  1.97  2.86  3.68  5.29  6.90  9.00  9.88 10.03  9.94
AMD64 pg-8.0.1     1.00  1.87  2.77  3.34  2.73  2.57  2.58  2.62
AMD64 pg-20050316  1.00  1.95  2.84  3.69  3.61  3.66  3.70  3.69
IA64  pg-8.0.1     1.00  1.97  2.91  3.82  2.91  2.92  2.94  2.98
IA64  pg-20050316  1.00  1.98  2.95  3.87  3.80  3.78  3.86  3.90

Altix == 16x 1.6GHz Itanium2   192GB memory
AMD64 ==  4x 2.2GHz Opteron 848  8GB memory
IA64  ==  4x 1.5GHz Itanium2    16GB memory

The Altix still only scales up to 10x rather than 16x, but that probably is the NUMA configuration taking effect now. Also this machine isn't set up to run databases, so it only has 1 FC I/O card, which means a CPU can end up being 4 hops away from the memory and disk. As the database is so small (8GB) relative to the machine, the data will be on average 2 hops away. This gives an average of 72% of the speed of local memory, based on previous measurements of speed vs hops. So getting 63% of the theoretical maximum database throughput is pretty good. -Mark ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] plpython function problem workaround
On Thu, 17 Mar 2005, Michael Fuhr wrote: On Thu, Mar 17, 2005 at 09:48:51PM -0500, Tom Lane wrote: Michael Fuhr <[EMAIL PROTECTED]> writes: Line-ending CRs stripped, even inside quotes; mid-line CRs converted to LF. Tests done with Python 2.4 on FreeBSD 4.11-STABLE; I wonder what Python on Windows would do. Unfortunately, I don't think that proves anything, because according to earlier discussion Python will do newline-munging when it reads a file (including a script file). The question that we have to deal with is what are the rules for a string fed to PyRun_String ... and it seems those rules are not the same. Marco, you've stated that you're against munging the code because "it's not our job to 'fix' data coming from the client." But I'm suggesting that we think about the code in a different way than the current implementation does: not as a literal that we pass untouched to the Python interpreter, but rather as code that Python would munge anyway if it had read that code from a file. We could still store the code exactly as received and have the language handler munge it on the fly, as we've discovered it's already doing. Comments? Have I overlooked anything? Could munging CRs have effects that a Python programmer wouldn't expect if the same code had been read from a file? Since it mimics Python's own behavior with code read from a file, can anybody justify not doing it? If you put it that way, I'm 100% with you. Just let me suggest not to mimic its behaviour, but to use the Real Thing if we manage to. That is, directly use the Universal Line Support code provided by python itself, so that we don't even have to think about being compatible. Unfortunately, I'm new to python embedding. I think I'll ask on python lists about the function definition code. Actually, we are kind of a corner case: we ask the user to provide the function body w/o the initial def declaration. We're treating partial, incomplete python code and not a well-formed program, so we have to munge it anyway. I have no idea if and how the python C API lets you control such low level details. I think what we really want is to create a callable (anonymous) object from the source of its "body". I'm experimenting a bit, trying to write a PLy_procedure_compile() that does not require source munging. I'm aiming at removing the need for extra indentation. The \r\n thing is another beast, and I'm not sure it belongs to the same place in our code. .TM. -- / / / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _/ _/ _/ [EMAIL PROTECTED] ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
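For readers who haven't looked at the handler, a minimal illustration of what it has to cope with (this assumes the plpythonu language is installed; the function name is made up). The user supplies only the body, so the handler wraps it in a generated "def" and indents every line, which is exactly where the line-ending and indentation trouble comes from.

    CREATE FUNCTION py_add(integer, integer) RETURNS integer AS $$
    # Only the body is supplied here; the language handler turns it into
    # something like "def __plpython_procedure_...(args):" with this body
    # shifted one indentation level to the right.
    return args[0] + args[1]
    $$ LANGUAGE plpythonu;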
Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)
I should point out that theres no garuntee our data is inserted in anything like the order we want (time desc) but there is a high correlation. Most of the time it is almost in order. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Alex Stapleton Sent: 18 March 2005 09:29 To: pgsql-general@postgresql.org Subject: [GENERAL] Select Last n Rows Matching an Index Condition (and caches) We have a ~10million row table but are expecting it to get larger, possibly by a factor of 10 or more. The rows are quite long and fixed length (just over 500 bytes.) We have an index of (symbol, source, date) on this table and doing queries like this SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC LIMIT 1000; To get the latest 1000 rows for that symbol and source. However this takes quite a while at the best of times, (1-10 seconds.) The query without the order by and the limit tends to return about 7 rows which adds up to about 30MB of data. Once the pages are in the cache they take around 100ms but this is to be expected. Unfortunately the initial query required to cache it is unnacceptably long for web application like ours. My (not yet implemented) solution to this problem is to add a SEQUENCE and index it so that by adding a WHERE id > [max_id]-1000 and ordering by time DESC will reduce the I/O quite a lot. Am I right here? It would be nice if there was a way to get PostgreSQL to try and precache the tables pages as well, is there anyway I could achieve something like that? I have toyed with creating a ramdisk to store a lookup table of sorts on (we only care about a few columns initially) to speed this up a bit but its a right pain in the arse to do by the looks of things. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)
If I SET ENABLE_SEQSCAN TO OFF it uses the Index, but it does an Index scan backward if I do SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY symbol desc, source DESC, time DESC LIMIT 1000; Which is better but still quite slow. -Original Message- From: Hegyvari Krisztian [mailto:[EMAIL PROTECTED] Sent: 18 March 2005 10:25 To: Alex Stapleton; pgsql-general@postgresql.org Subject: RE: [GENERAL] Select Last n Rows Matching an Index Condition (and caches) Does not it look like the index you are actually using is on article and then PG has to filter for symbol and source? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Stapleton Sent: Friday, March 18, 2005 10:57 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches) Woops sorry we have indexes on (symbol, source, time) and there is no date column :/ SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC LIMIT 1000; QUERY PLAN Limit (cost=321163.29..321165.79 rows=1000 width=466) (actual time=33243.924..33246.021 rows=1000 loops=1) -> Sort (cost=321163.29..321402.31 rows=95609 width=466) (actual time=33243.917..33244.626 rows=1000 loops=1) Sort Key: "time" -> Seq Scan on article (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.022..32979.685 rows=42959 loops=1) Filter: ((symbol = 12646) AND (source = 19)) Total runtime: 33258.706 ms (6 rows) explain analyze SELECT * FROM article WHERE symbol=12646 AND source = 19; QUERY PLAN Seq Scan on article (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.021..33275.433 rows=42959 loops=1) Filter: ((symbol = 12646) AND (source = 19)) Total runtime: 33320.920 ms (3 rows) We can't use CLUSTER because we need the DB up all the time. The analyze suggests that it's the I/O taking most of the time to me. -Original Message- From: Alban Hertroys [mailto:[EMAIL PROTECTED] Sent: 18 March 2005 09:48 To: Alex Stapleton Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches) Alex Stapleton wrote: > SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC > LIMIT 1000; > > To get the latest 1000 rows for that symbol and source. > My (not yet implemented) solution to this problem is to add a SEQUENCE and > index it so that by adding a WHERE id > [max_id]-1000 and ordering by time > DESC will reduce the I/O quite a lot. Am I right here? It would be nice if > there was a way to get PostgreSQL to try and precache the tables pages as > well, is there anyway I could achieve something like that? I have toyed with > creating a ramdisk to store a lookup table of sorts on (we only care about a > few columns initially) to speed this up a bit but its a right pain in the > arse to do by the looks of things. First question that always gets asked here: What's the output of explain analyse? Without that, people here can't see where the slowdown is. I expect though, that the problem is the ordering by time. I imagine that you could create an index on time, maybe truncated to months or something similar (You can create indices based on functions). That index alone should speed up the ordering already. It could also be used to cluster the table, which should speed up things some more, I suppose. 
-- Alban Hertroys MAG Productions T: +31(0)53 4346874 F: +31(0)53 4346876 E: [EMAIL PROTECTED] W: http://www.magproductions.nl ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
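A sketch of a session-scoped form of the enable_seqscan workaround mentioned at the top of that message: enable_seqscan is an ordinary planner setting, and SET LOCAL confines the change to one transaction so the rest of the application keeps the default plans.

    BEGIN;
    SET LOCAL enable_seqscan = off;
    SELECT * FROM article
    WHERE symbol = 12646 AND source = 19
    ORDER BY symbol DESC, source DESC, "time" DESC
    LIMIT 1000;
    COMMIT;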
Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)
Does not it look like the index you are actually using is on article and then PG has to filter for symbol and source? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Stapleton Sent: Friday, March 18, 2005 10:57 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches) Woops sorry we have indexes on (symbol, source, time) and there is no date column :/ SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC LIMIT 1000; QUERY PLAN Limit (cost=321163.29..321165.79 rows=1000 width=466) (actual time=33243.924..33246.021 rows=1000 loops=1) -> Sort (cost=321163.29..321402.31 rows=95609 width=466) (actual time=33243.917..33244.626 rows=1000 loops=1) Sort Key: "time" -> Seq Scan on article (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.022..32979.685 rows=42959 loops=1) Filter: ((symbol = 12646) AND (source = 19)) Total runtime: 33258.706 ms (6 rows) explain analyze SELECT * FROM article WHERE symbol=12646 AND source = 19; QUERY PLAN Seq Scan on article (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.021..33275.433 rows=42959 loops=1) Filter: ((symbol = 12646) AND (source = 19)) Total runtime: 33320.920 ms (3 rows) We can't use CLUSTER because we need the DB up all the time. The analyze suggests that it's the I/O taking most of the time to me. -Original Message- From: Alban Hertroys [mailto:[EMAIL PROTECTED] Sent: 18 March 2005 09:48 To: Alex Stapleton Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches) Alex Stapleton wrote: > SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC > LIMIT 1000; > > To get the latest 1000 rows for that symbol and source. > My (not yet implemented) solution to this problem is to add a SEQUENCE and > index it so that by adding a WHERE id > [max_id]-1000 and ordering by time > DESC will reduce the I/O quite a lot. Am I right here? It would be nice if > there was a way to get PostgreSQL to try and precache the tables pages as > well, is there anyway I could achieve something like that? I have toyed with > creating a ramdisk to store a lookup table of sorts on (we only care about a > few columns initially) to speed this up a bit but its a right pain in the > arse to do by the looks of things. First question that always gets asked here: What's the output of explain analyse? Without that, people here can't see where the slowdown is. I expect though, that the problem is the ordering by time. I imagine that you could create an index on time, maybe truncated to months or something similar (You can create indices based on functions). That index alone should speed up the ordering already. It could also be used to cluster the table, which should speed up things some more, I suppose. -- Alban Hertroys MAG Productions T: +31(0)53 4346874 F: +31(0)53 4346876 E: [EMAIL PROTECTED] W: http://www.magproductions.nl ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] plpython function problem workaround
On Thu, 17 Mar 2005, Tom Lane wrote: Martijn van Oosterhout writes: On Thu, Mar 17, 2005 at 01:03:36PM +0100, Marco Colombo wrote: OMG! It's indenting the funtion body. I think you can't do that w/o being syntax-aware. I'm not familiar with the code, why is it adding a 'def' in front of it at all? I undestand that once you do it you'll have to shift the code by an indentation level. Presumbly because it wants to create a function, which can later be called. Since python is sensetive to whitespace it has to indent the code to make it work. Seems like we have to upgrade that thing to have a complete understanding of Python lexical rules --- at least enough to know where the line boundaries are. Which is pretty much exactly the same as knowing which CRs to strip out. So I guess we have a candidate place for a solution. Anyone want to code it up? I don't know enough Python to do it ... I'm no expert but I'll look into it. Unless someone else already tried it, I want to investigate first if it's possible to create a callable object w/o using 'def', which alters the name space and (the thing we're interested to) needs an extra identation level. At first sight, what we do now (at function creation time) is: 1) execute a function definition; 2) compile a function call, and save the resulting code object for later use. I'm wondering if we can save one step, and use a python callable object. .TM. -- / / / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _/ _/ _/ [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)
On Fri, Mar 18, 2005 at 09:29:06AM -, Alex Stapleton wrote: > We have a ~10million row table but are expecting it to get larger, possibly > by a factor of 10 or more. The rows are quite long and fixed length (just > over 500 bytes.) > > We have an index of (symbol, source, date) on this table and doing queries > like this > > SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC > LIMIT 1000; > > To get the latest 1000 rows for that symbol and source. > > However this takes quite a while at the best of times, (1-10 seconds.) The > query without the order by and the limit tends to return about 7 rows > which adds up to about 30MB of data. Once the pages are in the cache they > take around 100ms but this is to be expected. Unfortunately the initial > query required to cache it is unnacceptably long for web application like > ours. I think the normal approach for this is an index on (symbol,source,time). You may need to change the query to: SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY symbol desc, source DESC, time DESC LIMIT 1000; The EXPLAIN ANALYZE output would also be very helpful... Hope this helps, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpZLcFIL16fX.pgp Description: PGP signature
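A sketch of that suggestion using the table and column names from the thread (the index name is invented, and the poster already has an equivalent index; "time" is quoted because it is also a type name). With all three key columns in the ORDER BY, the planner can walk the index backwards to satisfy the LIMIT instead of sorting every matching row.

    CREATE INDEX article_symbol_source_time_idx
        ON article (symbol, source, "time");

    SELECT * FROM article
    WHERE symbol = 12646 AND source = 19
    ORDER BY symbol DESC, source DESC, "time" DESC
    LIMIT 1000;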
Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches)
Woops sorry we have indexes on (symbol, source, time) and there is no date column :/ SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC LIMIT 1000; QUERY PLAN Limit (cost=321163.29..321165.79 rows=1000 width=466) (actual time=33243.924..33246.021 rows=1000 loops=1) -> Sort (cost=321163.29..321402.31 rows=95609 width=466) (actual time=33243.917..33244.626 rows=1000 loops=1) Sort Key: "time" -> Seq Scan on article (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.022..32979.685 rows=42959 loops=1) Filter: ((symbol = 12646) AND (source = 19)) Total runtime: 33258.706 ms (6 rows) explain analyze SELECT * FROM article WHERE symbol=12646 AND source = 19; QUERY PLAN Seq Scan on article (cost=0.00..301724.00 rows=95609 width=466) (actual time=0.021..33275.433 rows=42959 loops=1) Filter: ((symbol = 12646) AND (source = 19)) Total runtime: 33320.920 ms (3 rows) We can't use CLUSTER because we need the DB up all the time. The analyze suggests that it's the I/O taking most of the time to me. -Original Message- From: Alban Hertroys [mailto:[EMAIL PROTECTED] Sent: 18 March 2005 09:48 To: Alex Stapleton Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Select Last n Rows Matching an Index Condition (and caches) Alex Stapleton wrote: > SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC > LIMIT 1000; > > To get the latest 1000 rows for that symbol and source. > My (not yet implemented) solution to this problem is to add a SEQUENCE and > index it so that by adding a WHERE id > [max_id]-1000 and ordering by time > DESC will reduce the I/O quite a lot. Am I right here? It would be nice if > there was a way to get PostgreSQL to try and precache the tables pages as > well, is there anyway I could achieve something like that? I have toyed with > creating a ramdisk to store a lookup table of sorts on (we only care about a > few columns initially) to speed this up a bit but its a right pain in the > arse to do by the looks of things. First question that always gets asked here: What's the output of explain analyse? Without that, people here can't see where the slowdown is. I expect though, that the problem is the ordering by time. I imagine that you could create an index on time, maybe truncated to months or something similar (You can create indices based on functions). That index alone should speed up the ordering already. It could also be used to cluster the table, which should speed up things some more, I suppose. -- Alban Hertroys MAG Productions T: +31(0)53 4346874 F: +31(0)53 4346876 E: [EMAIL PROTECTED] W: http://www.magproductions.nl ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Select Last n Rows Matching an Index Condition (and
Alex Stapleton wrote: SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC LIMIT 1000; To get the latest 1000 rows for that symbol and source. My (not yet implemented) solution to this problem is to add a SEQUENCE and index it so that by adding a WHERE id > [max_id]-1000 and ordering by time DESC will reduce the I/O quite a lot. Am I right here? It would be nice if there was a way to get PostgreSQL to try and precache the tables pages as well, is there anyway I could achieve something like that? I have toyed with creating a ramdisk to store a lookup table of sorts on (we only care about a few columns initially) to speed this up a bit but its a right pain in the arse to do by the looks of things. First question that always gets asked here: What's the output of explain analyse? Without that, people here can't see where the slowdown is. I expect though, that the problem is the ordering by time. I imagine that you could create an index on time, maybe truncated to months or something similar (You can create indices based on functions). That index alone should speed up the ordering already. It could also be used to cluster the table, which should speed up things some more, I suppose. -- Alban Hertroys MAG Productions T: +31(0)53 4346874 F: +31(0)53 4346876 E: [EMAIL PROTECTED] W: http://www.magproductions.nl ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
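A hedged sketch of the truncated-time index idea above. It assumes "time" is a plain timestamp (date_trunc over timestamp with time zone is not immutable, so it could not be used in an index expression) and the index name is invented; CLUSTER is shown only as a comment because the poster rules it out for locking the table.

    CREATE INDEX article_time_month_idx
        ON article (date_trunc('month', "time"));

    -- The CLUSTER suggestion, ruled out upthread because the table must stay online:
    -- CLUSTER article_time_month_idx ON article;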
[GENERAL] Select Last n Rows Matching an Index Condition (and caches)
We have a ~10 million row table but are expecting it to get larger, possibly by a factor of 10 or more. The rows are quite long and fixed length (just over 500 bytes). We have an index of (symbol, source, date) on this table and are doing queries like this SELECT * FROM article WHERE symbol=12646 AND source = 19 ORDER BY time DESC LIMIT 1000; to get the latest 1000 rows for that symbol and source. However this takes quite a while at the best of times (1-10 seconds). The query without the order by and the limit tends to return about 7 rows which adds up to about 30MB of data. Once the pages are in the cache they take around 100ms, but this is to be expected. Unfortunately the initial query required to cache it is unacceptably long for a web application like ours. My (not yet implemented) solution to this problem is to add a SEQUENCE and index it, so that adding a WHERE id > [max_id]-1000 and ordering by time DESC will reduce the I/O quite a lot. Am I right here? It would be nice if there was a way to get PostgreSQL to try and precache the table's pages as well; is there any way I could achieve something like that? I have toyed with creating a ramdisk to store a lookup table of sorts on (we only care about a few columns initially) to speed this up a bit, but it's a right pain in the arse to do by the looks of things. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
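For what it's worth, a sketch of the sequence idea proposed above (column, sequence, and index names are invented). Two caveats: existing rows would still need their id backfilled in time order, and the cut-off would probably have to be much larger than 1000, since only a fraction of the newest rows match any one symbol/source pair.

    CREATE SEQUENCE article_id_seq;
    ALTER TABLE article ADD COLUMN id bigint;
    ALTER TABLE article ALTER COLUMN id SET DEFAULT nextval('article_id_seq');
    CREATE INDEX article_id_idx ON article (id);

    SELECT * FROM article
    WHERE symbol = 12646 AND source = 19
      AND id > (SELECT last_value FROM article_id_seq) - 1000
    ORDER BY "time" DESC
    LIMIT 1000;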
Re: [GENERAL] pg_dump large-file support > 16GB
On Thu, 2005-03-17 at 10:41 -0700, Aly Dharshi wrote: Hello > Would it help to use a different filesystem like SGI's XFS ? I do not see the connection between this problem and using another filesystem. I think we would have this problem with all the programs in the system if we had a problem with the filesystem we are using. > Would it be > possible to even implement that at your site at this stage ? We can not do this if we want support from our "operating system department"; they do not support XFS at present. -- Rafael Martinez, <[EMAIL PROTECTED]> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
Re: [GENERAL] pg_dump large-file support > 16GB
On Thu, 2005-03-17 at 10:17 -0500, Tom Lane wrote: > Rafael Martinez Guerrero <[EMAIL PROTECTED]> writes: > > We are trying to dump a 30GB+ database using pg_dump with the --file > > option. In the beginning everything works fine, pg_dump runs and we get > > a dumpfile. But when this file becomes 16GB it disappears from the > > filesystem, pg_dump continues working without giving an error until it > > finishes (even when the file does not exist) (the filesystem has free > > space). > > Is that a plain text, tar, or custom dump (-Ft or -Fc)? Is the behavior > different if you just write to stdout instead of using --file? > > regards, tom lane - In this example, it is a plain text dump (--format=p). - If I write to stdout and redirect to a file, the dump finishes without problems and I get a text dump file over 16GB without problems. -- Rafael Martinez, <[EMAIL PROTECTED]> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/