Re: [HACKERS] On columnar storage
On Sun, Jun 14, 2015 at 10:30 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote:

Are you looking to avoid all hardware-based limits, or would using a 64 bit row pointer be possible? That would give you 2^64 or 1.8E19 unique rows over whatever granularity/uniqueness you use (per table, per database, etc.) -- Mike Nolan

I don't think the number of tuples is the main problem here, it's the number of pages a single relation can have. Looking at the number of rows as a direct function of TID size is misleading, because the TID is split into two fixed parts - page number (32 bits) and tuple number (16 bits).

For the record, 2^48 is 281,474,976,710,656, which ought to be enough for anybody, but we waste a large part of that because we assume there might be up to 2^16 tuples per page, although the actual limit is way lower (~290 for 8kB pages, and ~1200 for 32kB pages). So we can only have ~4 billion pages, which is where the 32TB limit comes from (with 32kB pages it's 128TB).

Longer TIDs are one straightforward way to work around this limit, assuming you add the bits to the 'page number' field. Adding 16 bits (thus using 64-bit pointers) would increase the limit 2^16 times, to about 2048 petabytes (with 8kB pages). But that of course comes with a cost, because you have to keep those larger TIDs in indexes etc.

Another option might be to split the 48 bits differently, by moving 5 bits to the page number part of the TID (so that we expect ~2048 tuples per page at most). That'd increase the limit to 1PB (4PB with 32kB pages).

The column store approach is somewhat orthogonal to this, because it splits the table vertically into multiple pieces, each stored in a separate relfilenode and thus using a separate sequence of page numbers. And of course, the usual 'horizontal' partitioning has a very similar effect (separate filenodes).

regards

-- Tomas Vondra
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

Thanks for the reply.
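The limits discussed above follow directly from how the 48 TID bits are split; here is a small sanity-check sketch (mine, not from the thread), using the page sizes and bit splits quoted in the discussion:

```python
# Sketch (not from the thread): maximum heap size implied by splitting
# the 48-bit TID into a page-number part and a tuple-number part.
def max_table_bytes(page_number_bits, page_size_bytes):
    return (2 ** page_number_bits) * page_size_bytes

TB = 1024 ** 4

# Current split: 32-bit page numbers.
assert max_table_bytes(32, 8 * 1024) == 32 * TB       # the 32TB limit
assert max_table_bytes(32, 32 * 1024) == 128 * TB     # 128TB with 32kB pages

# Moving 5 bits from tuple number to page number (~2048 tuples/page max).
assert max_table_bytes(37, 8 * 1024) == 1024 * TB     # 1PB
assert max_table_bytes(37, 32 * 1024) == 4096 * TB    # 4PB with 32kB pages
```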
It's been a while since my last data structures course (1971), but I do remember a few things.

I have never personally needed more than 1500 columns in a table, but can see how some might. Likewise, the 32TB limit hasn't affected me yet, either. I doubt either ever will.

Solving either or both of those seems like it may at some point require a larger bit space for (at least some) TIDs, which is why I was wondering if a goal here is to eliminate all (practical) limits. It probably doesn't make sense to force all users to use that larger bit space (with the associated space and performance penalties). If there's a way to do this, then you are all truly wizards.

(This all reminds me of how the IPv4 bit space was parcelled up into Class A, B, C and D addresses, at a time when people thought 32 bits would last us forever. Maybe 128 bits actually will.)

-- Mike Nolan
Re: [HACKERS] On columnar storage
On Thu, Jun 11, 2015 at 7:03 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: We hope to have a chance to discuss this during the upcoming developer unconference in Ottawa. Here are some preliminary ideas to shed some light on what we're trying to do. I've been trying to figure out a plan to enable native column stores (CS or colstore) for Postgres. Motivations: * avoid the 32 TB limit for tables * avoid the 1600 column limit for tables * increased performance Are you looking to avoid all hardware-based limits, or would using a 64 bit row pointer be possible? That would give you 2^64 or 1.8 E19 unique rows over whatever granularity/uniqueness you use (per table, per database, etc.) -- Mike Nolan.
Re: [HACKERS] pg_xlog - pg_xjournal?
Why not take a simpler approach and create a zero length file in directories that should not be fiddled with by non-experts using a file name something like DO.NOT.DELETE.THESE.FILES? No, it won't prevent the incredibly stupid from doing incredibly stupid things, nothing will. -- Mike Nolan
[HACKERS] 9.3 RC1 psql encoding reporting inconsistently?
This is 9.3 RC1 on a Fedora 7 system. Why does \l report the encoding as SQL_ASCII and \set report it as UTF8?

psql (9.3rc1)
Type "help" for help.

postgres=# \l
                             List of databases
   Name    |  Owner   | Encoding  | Collate | Ctype |   Access privileges
-----------+----------+-----------+---------+-------+-----------------------
 postgres  | postgres | SQL_ASCII | C       | C     |
 template0 | postgres | SQL_ASCII | C       | C     | =c/postgres          +
           |          |           |         |       | postgres=CTc/postgres
 template1 | postgres | SQL_ASCII | C       | C     | =c/postgres          +
           |          |           |         |       | postgres=CTc/postgres
(3 rows)

postgres=# \set
AUTOCOMMIT = 'on'
ECHO = 'queries'
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = ' '
VERBOSITY = 'default'
VERSION = 'PostgreSQL 9.3rc1 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27), 32-bit'
DBNAME = 'postgres'
USER = 'postgres'
PORT = '5432'
ENCODING = 'UTF8'
postgres=#

-- Mike Nolan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deprecating RULES
On 10/12/12, Josh Berkus j...@agliodbs.com wrote:

I realize you weren't around when we removed row OIDs, but I was *still* getting flack from that in 2008. And we lost entire OSS projects to other databases because of removing row OIDs. And those were marked deprecated for 3 years before we removed them.

FWIW, the documentation for 9.2 still mentions OIDs and the 'default_with_oids' parameter. In what release was it announced they would be removed, and in what release were they removed?

-- Mike Nolan
Re: [HACKERS] Ability to listen on two unix sockets
On Mon, Jun 11, 2012 at 4:47 PM, Peter Eisentraut pete...@gmx.net wrote:

On sön, 2012-06-10 at 17:24 -0400, Robert Haas wrote: and also affects the naming of any UNIX sockets created. Why would that matter? If you configure M ports and N Unix socket locations, you get M*N actual sockets created.

...I *seriously* doubt that this is the behavior anyone wants. Creating M sockets per directory seems patently silly.

How else would it work? If I say, syntax aside, listen on ports 5432 and 5433, and use socket directories /tmp and /var/run/postgresql, then a libpq-using client would expect to be able to connect using

-h /tmp -p 5432
-h /tmp -p 5433
-h /var/run/postgresql -p 5432
-h /var/run/postgresql -p 5433

So you do need to create M*N sockets. I don't really see a problem with that.

What about entries in pg_hba.conf? Will they need to be able to specify both the directory and the port number?

-- Mike Nolan
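The M*N combinations being debated can be pictured with a toy sketch (my illustration, not Postgres code); libpq's Unix socket files are named `.s.PGSQL.<port>` inside each directory:

```python
# Illustration only: with M socket directories and N ports, the server
# would create one Unix socket file per (directory, port) pair.
from itertools import product

ports = [5432, 5433]
dirs = ["/tmp", "/var/run/postgresql"]

socket_paths = [f"{d}/.s.PGSQL.{p}" for d, p in product(dirs, ports)]

assert len(socket_paths) == len(dirs) * len(ports)   # M*N sockets
assert "/tmp/.s.PGSQL.5433" in socket_paths
```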
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
On 6/2/12, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes:

On the other hand, if we simply say "PostgreSQL computes the replication delay by subtracting the time at which the WAL was generated, as recorded on the master, from the time at which it is replayed by the slave" then, hey, we still have a wart, but it's pretty clear what the wart is and how to fix it, and we can easily document that. Again, if we could get rid of the failure modes and make this really water-tight, I think I'd be in favor of that, but it seems to me that we are in the process of expending a lot of energy and an even larger amount of calendar time to create a system that will misbehave in numerous subtle ways instead of one straightforward one. I don't see that as a good trade.

Well, okay, but let's document "if you use this feature, it's incumbent on you to make sure the master and slave clocks are synced. We recommend running NTP." or words to that effect.

What if the two servers are in different time zones?

-- Mike Nolan
Re: [HACKERS] problem/bug in drop tablespace?
On 5/11/12, Albe Laurenz laurenz.a...@wien.gv.at wrote: Michael Nolan wrote: I see one potential difference between your results and mine. When I rebuild the tablespace, I wind up with the same filename/OID as before, I'm not sure you do. Right. That's strange. Usually OIDs get incremented, so you shouldn't end up with the same OID for the new tablespace. Can you provide a complete testcase? I thought I had, until you were unable to reproduce it. :-) -- Mike Nolan
Re: [HACKERS] problem/bug in drop tablespace?
On 5/11/12, Michael Nolan htf...@gmail.com wrote: On 5/11/12, Albe Laurenz laurenz.a...@wien.gv.at wrote: Michael Nolan wrote: I see one potential difference between your results and mine. When I rebuild the tablespace, I wind up with the same filename/OID as before, I'm not sure you do. Right. That's strange. Usually OIDs get incremented, so you shouldn't end up with the same OID for the new tablespace. Can you provide a complete testcase? I thought I had, until you were unable to reproduce it. :-) -- Mike Nolan My plan at this point is to wait until beta 1 of 9.2 is out, then see if I can reproduce the problem there. -- Mike Nolan
Re: [HACKERS] problem/bug in drop tablespace?
On Fri, May 11, 2012 at 10:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Well, the question to me is exactly how much good it will do to stop deletion of the pg_tablespace entry, if the underlying files are gone. I'm having a hard time getting excited about expending cycles on that.

There could be multiple reasons why the underlying files are not there, such as a filesystem that isn't currently mounted for some reason. It seems prudent to throw an error on drop tablespace if there are references to that tablespace in the catalog, or perhaps require a 'force' clause to override any errors, but it probably isn't something most DBAs would run into very often.

Thanks for figuring it out, Tom.

-- Mike Nolan
Re: [HACKERS] problem/bug in drop tablespace?
On 5/9/12, Albe Laurenz laurenz.a...@wien.gv.at wrote: I cannot reproduce this on 9.1.3:

Odd, I've tried it another two times, with similar results to my initial post. Here's what I get, starting with the point where I deleted the files in the tablespace:

mytest=# select * from mytable;
ERROR: could not open file "pg_tblspc/289477766/PG_9.1_201105231/289477763/289477785": No such file or directory

mytest=# \d mytable
      Table "public.mytable"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 val    | text    |
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id), tablespace "mytblspc"

mytest=# drop tablespace mytblspc;
WARNING: could not open directory "pg_tblspc/289477766/PG_9.1_201105231": No such file or directory
DROP TABLESPACE
Time: 16.460 ms

mytest=# \d mytable
      Table "public.mytable"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 val    | text    |
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id)

mytest=# create tablespace mytblspc location '/home/postgres/mytb';
CREATE TABLESPACE
Time: 42.396 ms

mytest=# \d mytable
      Table "public.mytable"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 val    | text    |
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id)

mytest=# reindex table mytable;
REINDEX
Time: 112.981 ms

mytest=# \d mytable
      Table "public.mytable"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 val    | text    |
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id)

Here's what's in the mytb directory now:

[postgres@romaine PG_9.1_201105231]$ ls -lR
.:
total 4
drwx------. 2 postgres postgres 4096 May  9 13:22 289477763

./289477763:
total 16
-rw-------. 1 postgres postgres 16384 May  9 13:22 289477790

It appears that the index has been rebuilt in the mytblspc tablespace, though \d mytable does not show that. I get the same results whether I rebuild the specific index as you did or reindex the table, as I did. I'm running on 9.1.3 built from the source code, not a distribution.
-- Mike Nolan
Re: [HACKERS] problem/bug in drop tablespace?
I see one potential difference between your results and mine. When I rebuild the tablespace, I wind up with the same filename/OID as before, I'm not sure you do. -- Mike Nolan
Re: [HACKERS] problem/bug in drop tablespace?
The last portion of my original post got edited out by mistake. The tests I ran were on version 9.1.3, running Fedora 14, kernel 2.6.35.14-106.fc14-i686.

It seems to me that DROP TABLESPACE should check to see if there are references in the system catalog to the tablespace before dropping it, not just that the tablespace itself is empty. That way it would have thrown an error when I tried to drop the tablespace.

A somewhat separate issue is what to do when a tablespace is inaccessible, such as due to a disk failure. The thread on -general that prompted my tests was a relatively easy one to suggest how to repair, because the lost tablespace only had indexes in it. But that's not exactly a -hackers issue, more of a question of better backup protocols.

-- Mike Nolan
[HACKERS] problem/bug in drop tablespace?
While researching a problem reported on the -general list by a user who lost a disk containing his index tablespace, I ran into something, but I'm not sure if it is a serious bug or just an inconsistency in how \d shows tables. Here are the steps I took:

1. Create a new database 'MYDB' and connect to it.
2. Create a new tablespace 'MYTBLSP'.
3. Create a table 'MYTABLE' and populate it.
4. Create an index 'MYIND' on that table, with the index in the new tablespace, MYTBLSP.

Now, exit psql and delete the files in the tablespace directory created in step 2, simulating the problem the user had. Trying to execute an SQL command on the table MYTABLE will, as expected, generate an error.

Now, drop tablespace MYTBLSP. This will produce an error, but it will delete the tablespace according to \db. Recreate tablespace MYTBLSP. Regenerate the index on MYTABLE. Queries will work on this table again, as expected.

Now, here's the problem I ran into: the index will be rebuilt in tablespace MYTBLSP, but \d on table MYTABLE will not show the index as being in that tablespace.

-- Mike Nolan
Re: [HACKERS] Temporary tables under hot standby
What is the use case for temporary tables on a hot standby server? Perhaps this is a noobie question, but it seems to me that a hot standby server's use by *applications* or *users* should be limited to transactions that don't alter the database in any form.

However, I can see where temporary tables might be needed at the system level (if not already available) in order to prepare more efficient plans for some complex read-only queries.

-- Mike Nolan
Re: [HACKERS] Slow temporary tables when using sync rep
On Mon, Apr 16, 2012 at 6:27 PM, Thom Brown t...@linux.com wrote: Hi, I've noticed that when using synchronous replication (on 9.2devel at least), temporary tables become really slow:

Since temporary tables are only present until the session ends (or possibly only until a commit), why are they replicated at all?

BTW, should we have an entry in the index for 'temporary tables'?

-- Mike Nolan
Re: [HACKERS] [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?
On 4/11/12, 乔志强 qiaozhiqi...@leadcoretech.com wrote:

Yes, increase wal_keep_segments. Even if you set wal_keep_segments to 64, the amount of disk space for WAL files is only 1GB, so there is no need to worry so much, I think. No?

But when a transaction larger than 1GB...

Then you may need WAL space larger than 1GB as well. For replication to work, it seems likely that you may need to have sufficient WAL space to handle a row, possibly the entire transaction. But since a single statement can update thousands or millions of rows, do you always need enough WAL space to hold the entire transaction?

So in sync streaming replication, if master delete WAL before sent to the only standby, all transaction will fail forever, the master tries to avoid a PANIC error rather than termination of replication. but in sync replication, termination of replication is THE bigger PANIC error.

That's somewhat debatable. Would I rather have a master that PANICed or a slave that lost replication? I would choose the latter. A third option, which may not even be feasible, would be to have the master fail the transaction if synchronous replication cannot be achieved, although that might have negative consequences as well.

Another question: Does master send WAL to standby before the transaction commit?

That's another question for the core team, I suspect. A related question is what happens if there is a rollback?

-- Mike Nolan
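For reference, the 1GB figure quoted in this exchange follows from the default 16MB WAL segment size; a quick sketch of that arithmetic (mine, assuming the default segment size):

```python
# Sketch of the quoted sizing: retained WAL space is bounded by
# wal_keep_segments times the (default) 16MB segment size.
WAL_SEGMENT_BYTES = 16 * 1024 ** 2   # default WAL segment size

def retained_wal_bytes(wal_keep_segments):
    return wal_keep_segments * WAL_SEGMENT_BYTES

assert retained_wal_bytes(64) == 1024 ** 3   # 64 segments -> 1GB
```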
Re: [HACKERS] [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?
On 4/11/12, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Michael Nolan htf...@gmail.com wrote: On 4/11/12, 乔志强 qiaozhiqi...@leadcoretech.com wrote:

But when a transaction larger than 1GB... Then you may need WAL space larger than 1GB as well. For replication to work, it seems likely that you may need to have sufficient WAL space to handle a row, possibly the entire transaction. But since a single statement can update thousands or millions of rows, do you always need enough WAL space to hold the entire transaction?

No.

Does master send WAL to standby before the transaction commit?

Yes.

A related question is what happens if there is a rollback?

PostgreSQL doesn't use a rollback log; WAL files can be reclaimed as soon as the work they represent has been persisted to the database by a CHECKPOINT, even if it is not committed. Because there can be multiple versions of each row in the base table, each with its own xmin (telling which transaction committed it) and xmax (telling which transaction expired it), visibility checking can handle the commits and rollbacks correctly. It also uses a commit log (CLOG), hint bits, and other structures to help resolve visibility. It is a complex topic, but it does work.

Thanks, Kevin. That does lead to a question about the problem that started this thread, though. How does one determine how big the WAL space needs to be to not cause streaming replication to fail? Or maybe this is a bug after all?

-- Mike Nolan
Re: [HACKERS] [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?
On 4/11/12, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Apr 11, 2012 at 3:31 PM, 乔志强 qiaozhiqi...@leadcoretech.com wrote:

So in sync streaming replication, if master delete WAL before sent to the only standby, all transaction will fail forever, the master tries to avoid a PANIC error rather than termination of replication. but in sync replication, termination of replication is THE bigger PANIC error.

I see your point. When there are backends waiting for replication, the WAL files which the standby might not have received yet must not be removed. If they are removed, replication keeps failing forever because required WAL files don't exist in the master, and then waiting backends will never be released unless replication mode is changed to async. This should be avoided. To fix this issue, we should prevent the master from deleting the WAL files including the minimum waiting LSN or bigger ones. I'll think more and implement the patch.

With asynchronous replication, does the master even know if a slave fails because of a WAL problem? And does/should it care?

Isn't there a separate issue with synchronous replication? If it fails, what's the appropriate action to take on the master? PANICing it seems to be a bad idea, but having transactions never complete because they never hear back from the synchronous slave (for whatever reason) seems bad too.

-- Mike Nolan
Re: [HACKERS] What Would You Like To Do?
The lists all seem to be focusing on the things that the developers would like to add to PostgreSQL; what about some things that users or ISPs might like to have, and thus perhaps something that companies might actually see as worth funding? For example:

A fully integrated ability to query across multiple databases, possibly on multiple servers, something Oracle has had for nearly two decades.

Complete isolation at the user level, allowing an ISP to support multiple independent customers on a server without having to fiddle with multiple back ends each running on a separate port, a feature that MySQL has had for as far back as I can recall, and one of the reasons ISPs are more likely to offer MySQL than PostgreSQL.

The ability to restore a table from a backup file to a different table name in the same database and schema.

A built-in report writer, capable of things like column totals. (SQL*Plus has this, even though it isn't very pretty.)

-- Mike Nolan
Re: [HACKERS] What Would You Like To Do?
On Tue, Sep 13, 2011 at 12:26 PM, Joshua D. Drake j...@commandprompt.com wrote: On 09/13/2011 10:13 AM, Michael Nolan wrote:

The lists all seem to be focusing on the things that the developers would like to add to PostgreSQL, what about some things that users or ISPs might like to have, and thus perhaps something that companies might actually see as worth funding?

Well just my own two cents ... but it all depends on who is doing the funding. At this point 80% of the work CMD codes for Pg (or tertiary projects and modules) is funded by companies. So let's not assume that companies aren't funding things. They are.

But perhaps if a few 'commercial' features were on the wish list there would be more companies willing to fund development? The developers get a bit of what they want to work on, the production users get a bit of what they need, everybody's happy.

For example: A fully integrated ability to query across multiple databases, possibly on multiple servers, something Oracle has had for nearly two decades.

That isn't the approach to take. The fact that Oracle has it is not a guarantee that it is useful or good. If you need to query across databases (assuming within the same cluster) then you designed your database wrong and should have used our SCHEMA support (what Oracle calls Namespaces) instead.

This is the difference between developers and real world users. Real world users may not have the ability, time or resources to redesign their databases just because that's the 'best' way to do something. Will it be the most efficient way to do it? Almost certainly not. I've been involved in a few corporate mergers, and there was a short term need to do queries on the combined databases while the tiger team handling the IT restructuring figured out how (or whether) to merge the databases together.
(One of these happened to be an Oracle/Oracle situation; it was a piece of cake even though the two data centers were 750 miles apart and the table structures had almost nothing in common. Another was a two week headache, and the third was even worse!)

In a perfect world, it would be nice if one could do combined queries linking a PostgreSQL database with an Oracle one, or a MySQL one, too. Because sometimes, that's what you gotta do. Even something that is several hundred times slower is going to be faster than merging the databases together. When I do this today, I have to write a program (in perl or php) that accesses both databases and merges it by hand.

The ability to restore a table from a backup file to a different table name in the same database and schema.

This can be done but agreed it is not intuitive.

Can you elaborate on that a bit, please? The only way I've been able to do it is to edit the dump file to change the table name. That's not very practical with a several gigabyte dump file, even less so with one that is much larger. If this capability already exists, is it documented?

A built-in report writer, capable of things like column totals. (SQL*Plus has this, even though it isn't very pretty.)

There are a billion and one tools that do this without us having to reinvent the wheel. Why would we support that?

There are other databases out there, too; why reinvent the wheel by working on PostgreSQL? :-) The question should be, would this be USEFUL?

-- Mike Nolan
Re: [HACKERS] What Would You Like To Do?
On Tue, Sep 13, 2011 at 2:55 PM, Joshua D. Drake j...@commandprompt.com wrote: On 09/13/2011 11:51 AM, Michael Nolan wrote:

The ability to restore a table from a backup file to a different table name in the same database and schema. This can be done but agreed it is not intuitive. Can you elaborate on that a bit, please? The only way I've been able to do it is to edit the dump file to change the table name. That's not very practical with a several gigabyte dump file, even less so with one that is much larger. If this capability already exists, is it documented?

You use the -Fc method, extract the TOC and edit just the TOC (so you don't have to edit a multi-gig file).

That is, at best, a bit obscure. I've wondered at times if the -Ft (tar) option would have any benefits here, though it appears to have significant downsides. A downside of either method may be that I can't predict in advance when I will want to do a restore of a single table from a backup file, so I'd have to always use that method of generating the file.

I did propose an extension to pg_restore a couple of months ago to add an option to rename a table as it is restored, but that seemed to have generated no interest. Maybe an external tool that reads a pg_dump file looking for a specific table and writes that portion of the dump file to a separate file, changing the table name, would be easier? It'd probably have to handle most of or all of the different pg_dump formats, but that doesn't sound like an unachievable goal.

-- Mike Nolan
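As a rough sketch of the TOC-editing workflow mentioned above: `pg_restore -l` emits a text listing of a custom-format dump's TOC, which can be trimmed and fed back via `pg_restore -L`. The filter below is a hypothetical helper of mine, not an existing tool, and the sample TOC line format in it is an assumption based on typical `pg_restore -l` output:

```python
# Hypothetical helper: keep only the TOC entries mentioning one table
# (plus the leading comment lines), producing a listing usable with
# pg_restore -L. Renaming the table itself still needs further editing.
def filter_toc(toc_text, table):
    kept = []
    for line in toc_text.splitlines():
        if line.startswith(";") or f" {table} " in line:
            kept.append(line)
    return "\n".join(kept)

# Assumed example of what pg_restore -l might print for a small dump.
sample_toc = """\
; Archive created at 2011-09-13
123; 1259 16384 TABLE public mytable postgres
124; 1259 16390 TABLE public other postgres
125; 0 16384 TABLE DATA public mytable postgres"""

trimmed = filter_toc(sample_toc, "mytable")
assert "other" not in trimmed
assert trimmed.count("mytable") == 2   # schema entry + data entry kept
```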
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
On Fri, Jul 8, 2011 at 10:27 PM, Robert Haas robertmh...@gmail.com wrote: But if that's what you want, just don't put your data in different databases in the first place. That's what schemas are for. Sadly, DBAs don't always have the ability to put all their data in one database, even if that is what schemas are for. The ability to do cross-database (most likely cross-server as well) queries would address a lot of real-world problems. - Mike Nolan no...@tssi.com
Re: [HACKERS] generate_series() Interpretation
On Mon, Jun 27, 2011 at 1:38 PM, David E. Wheeler da...@kineticode.com wrote:

Yeah, which is why I said it was subject to interpretation. Of course there's no way to tell generate_series() which to use, which is what I figured.

generate_series() is doing exactly what it was designed to do; the imprecision regarding adding '1 month' to something that may or may not have been intended to be 'last day of the month' is a limitation in the interval code.

One way to change this would be to implement another interval type such as 'full_month', which would take a date that is known to be the last day of the month and make it the last day of the appropriate month. If the starting date is NOT the last day of a month, the existing logic would suffice. Or you can do as I have done and create your own last_day() function that takes any date and makes it the last day of that month, and apply it to the output of generate_series().

-- Mike Nolan no...@tssi.com
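For what it's worth, here is a Python analogue of the last_day() idea described above (my sketch; the SQL function itself isn't shown in the thread):

```python
# Sketch: snap any date to the last day of its month, the behavior the
# last_day() function described above would provide.
import calendar
import datetime

def last_day(d):
    # monthrange returns (weekday of first day, number of days in month)
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

assert last_day(datetime.date(2011, 2, 15)) == datetime.date(2011, 2, 28)
assert last_day(datetime.date(2012, 2, 1)) == datetime.date(2012, 2, 29)
assert last_day(datetime.date(2011, 6, 30)) == datetime.date(2011, 6, 30)
```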
Re: [HACKERS] New/Revised TODO? Gathering actual read performance data for use by planner
On Wed, May 25, 2011 at 11:18 AM, Robert Haas robertmh...@gmail.com wrote: I basically agree. There have been several recent discussions of this topic on both -hackers and -performance; it is likely that the TODO needs to be updated with some more recent links. Anything to help the NKOTB to get up to speed would be appreciated, though I still think it is not just a 'caching' issue. The question I hesitated to ask in Ottawa was: So, what information would you like and what would you do with it? -- Mike Nolan
[HACKERS] New/Revised TODO? Gathering actual read performance data for use by planner
In the TODO list is this item: *Modify the planner to better estimate caching effects*

Tom mentioned this in his presentation at PGCon, and I also chatted with Tom about it briefly afterwards.

Based on last year's discussion of this TODO item, it seems thoughts have been focused on estimating how much data is being satisfied from PG's shared buffers. However, I think that's only part of the problem. Specifically, read performance is going to be affected by:

1. Reads fulfilled from shared buffers.
2. Reads fulfilled from system cache.
3. Reads fulfilled from disk controller cache.
4. Reads from physical media.

#4 is further complicated by the type of physical media for that specific block. For example, reads that can be fulfilled from an SSD are going to be much faster than ones that access hard drives (or even slower types of media). System load is going to impact all of these as well.

Therefore, I suggest that an alternative to the above TODO may be to gather performance data without knowing (or, more importantly, without needing to know) which of the above sources fulfilled the read. This data would probably need to be kept separately for each table or index, as some tables or indexes may be mostly or fully in cache or on faster physical media than others, although in the absence of other data about a specific table or index, data about other relations in the same tablespace might be of some use.

Tom mentioned that the cost of doing multiple system time-of-day calls for each block read might be prohibitive; it may also be that the data is too coarse on some systems to be truly useful (e.g., the epoch time in seconds).

If this data were available, that could mean that successive plans for the same query could have significantly different plans (and thus actual performance), based on what has happened recently, so these statistics would have to be relatively short term and updated frequently, but without becoming computational bottlenecks.
The problem is one I'm interested in working on. -- Mike Nolan
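One way to picture the proposal above — a short-term, per-relation moving average of observed block-read times, updated cheaply on each read and agnostic about which layer served it — is this toy sketch (all names here are hypothetical, not proposed code):

```python
# Toy model: exponentially weighted moving average of block-read times
# per relation, without knowing which cache layer satisfied each read.
class ReadTimeStats:
    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest observation
        self.avg_ms = {}     # relation name -> smoothed read time (ms)

    def observe(self, relation, millis):
        prev = self.avg_ms.get(relation, millis)
        self.avg_ms[relation] = (1 - self.alpha) * prev + self.alpha * millis

stats = ReadTimeStats()
for ms in (0.05, 0.06, 8.0):     # two cache hits, then a physical read
    stats.observe("big_table", ms)

# The recent slow read pulls the estimate up, and the estimate decays
# back down if subsequent reads are fast again — i.e. it stays short-term.
assert 0.06 < stats.avg_ms["big_table"] < 8.0
```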
Re: [HACKERS] Coding style question
I think Tom stated it pretty well: "When the variable is going to be set anyway in straight-line code at the top of the function, then it's mostly a matter of taste whether you set it with an initializer or an assignment."

The key phrase is: "set anyway in straight-line code at the top of the function".

(I don't go so far as to introduce artificial scopes just for the sake of nesting variable declarations.)

I don't introduce artificial scopes either. However, I do try to declare variables in the most-tightly-enclosing scope. For example, if a variable is only used in one branch of an if statement, declare the variable inside that block, not in the enclosing scope.

Good...

This may not inform the current conversation at all, but a while back I went on a cross-compiler compatibility binge for all of my active projects, and I found that some compilers (*cough* Borland *cough*) had some very strange compiler/run time errors unless all variables were declared at the top of the function, before any other code gets executed. For better or for worse, I started strictly declaring all variables in this manner, with initialization happening afterward, and the behavior has stuck with me. I don't know whether any compilers used for postgres builds still have this issue - it's been a few years.

I also find that if you're declaring a lot of variables in a single block, that's usually a sign that the block is too large and should be refactored (e.g. by moving some code into separate functions). If you keep your functions manageably small (which is not always the case in the Postgres code, unfortunately), the declarations are usually pretty clearly visible.

I couldn't agree more. Insert emphatic agreement here. Refactoring into smaller functions or doing a bit of object orientation almost always solves that readability problem for me.

-- Nolan Cafferky
Software Developer
IT Department
RBS Interactive
[EMAIL PROTECTED]
Re: [HACKERS] Syntax bug? Group by?
Mark Woodward wrote: select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15; I still assert that there will always only be one row to this query. This is an aggregate query, so all the rows with ycis_id = 15 will be aggregated. Since ycis_id is the identifying part of the query, it should not need to be grouped. My question is: is it a syntactic technicality that PostgreSQL asks for a group by, or a bug in the parser? I think your point is that every non-aggregate column in the results of the query also appears in the where clause and is given a single value there, so conceivably, an all-knowing, all-powerful postgres could recognize this and do the implied GROUP BY on these columns. I'm not in a position to give a definitive answer on this, but I suspect that adjusting the query parser/planner to allow an implied GROUP BY either gets prohibitively complicated, or fits too much of a special case to be worth implementing. select ycis_id, some_other_id, min(tindex), avg(tindex) from y where ycis_id = 15 group by some_other_id; Here, postgres would have to use the group by you specified, and also recognize the single-valued constant assigned to ycis_id. Maybe not too bad, but: select ycis_id, some_other_id, min(tindex), avg(tindex) from y where ycis_id = some_single_valued_constant(foo, bar) group by some_other_id; In this case, postgres doesn't know whether some_single_valued_constant() will really return the same single value for every tuple. Ultimately, as more complex queries are introduced, it would become a lot simpler for the query writer to just specify the group by columns instead of trying to guess them from the where clause. Final note: I could also see situations where an implied group by would silently allow a poorly written query to execute, instead of throwing an error that suggests to the query writer that they did something wrong. 
-- Nolan Cafferky Software Developer IT Department RBS Interactive [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] status of dbf2pg
I received the following note from the original author of dbf2pg: Date: Tue, 05 Aug 2003 18:43:22 +0400 From: Maarten Boekhold [EMAIL PROTECTED] Subject: Re: status of dbf2pg To: [EMAIL PROTECTED] On 08/03/2003 06:55:01 AM nolan wrote: What is the status of dbf2pg. There do not appear to have been any updates to it in quite some time. No status. I created this around 1995/96 and haven't looked at it since. There is a dbf2pg package on debian that appears to have a higher version number, but I don't know what to do with a .deb file extension and I'm getting ready to go out of town so I won't have time to look into it until mid-August at the earliest. -- Mike Nolan
[HACKERS] Who is maintaining the dbf2pg contrib utility?
Is anyone actively maintaining the dbase/dbf2pg utility in the contrib library? I have a couple of patches to fix bugs and one feature enhancement to suggest, but I'm not sure of the proper procedure for a contrib package. I've tried contacting the author at the address given in the package ([EMAIL PROTECTED]) with no response, but maybe that's an old address. -- Mike Nolan
Re: [HACKERS] Make clean fails
Yep, that solved both the make clean and the coredump problems. Is that piece of information in the developers FAQ anywhere? It's in the how to use CVS instructions ... So it is. I probably read that before I got CVS working here, and it isn't mentioned (or that section of the docs referenced) in the CVS section in the Developer's FAQ. I found both sections insufficient for me to get CVS working here, and since I am thinking about using it for another project I picked up a copy of 'ESSENTIAL CVS' to fill in some of the gaps in my knowledge. Just part of the baptism of fire for a newbie, I guess. :-) -- Mike Nolan
[HACKERS] make clean fails (more info)
Sorry if I didn't mention it before, but this was on FreeBSD. -- Mike Nolan
[HACKERS] make check createdb fails
More problems with the latest CVS on FreeBSD: make[3]: Leaving directory `/home/nolan/beta/pgsql/contrib/spi' /bin/sh ./pg_regress --temp-install --top-builddir=../../.. --schedule=./parallel_schedule --multibyte=SQL_ASCII == removing existing temp installation == == creating temporary installation == == initializing database system == == starting postmaster == Segmentation fault (core dumped) Segmentation fault (core dumped) Segmentation fault (core dumped) -- Mike Nolan
Re: [HACKERS] Make clean fails
[EMAIL PROTECTED] writes: Make clean fails on an up-to-date CVS (as of 2:40 PM CDT on Sunday). make: *** tsearch2: No such file or directory. Stop. I suspect you forgot -d in your cvs update commands. You really need both -d and -P to make cvs update behave reasonably ... I have no idea why they are not the default behavior. Yep, that solved both the make clean and the coredump problems. Is that piece of information in the developers FAQ anywhere? -- Mike Nolan
Re: [HACKERS] did you read my mails ?
This allows functions to read/write operating system files. Is this of interest to anyone? Is this something that requires untrusted status, as it would if I wrote a function in perl to do the same thing? -- Mike Nolan
Re: [HACKERS] initcap incompatibility issue
It sounds like Oracle is simply regexing for anything that ISN'T a letter to initcap right after it. If that's the case, you could just regex too. Or more likely, use the appropriate ctype.h function (isalpha, probably). Having tested it, Oracle capitalizes after all non-alphanumeric characters, so !isalnum() is the appropriate function. (That makes it a one-line patch on 7.3.3, which I've already tested.) AFAIK, our specification for this function is to be like Oracle, so it's a bug fix and fair game for 7.4. Of course, the sooner you get it in the more likely we'll see it that way ;-). Later in beta, only critical bugfixes will be accepted, and this one surely ain't very critical. Now if I can just get CVS working on Redhat 8 and remember how to build a patch, even a one-liner. :-) -- Mike Nolan
[HACKERS] initcap incompatibility issue
The initcap function is not completely consistent with Oracle's initcap function: SELECT initcap('alex hyde-whyte'); In Oracle 9.2i this will return 'Alex Hyde-White', in PostgreSQL 7.3.3 it returns 'Alex Hyde-white'. It looks like a relatively simple change to oracle_compat.c in backend/utils/adt, but is this a bugfix that can be made during the pre-beta period for 7.4 or does it need to wait? -- Mike Nolan
Re: [HACKERS] initcap incompatibility issue
The initcap function is not completely consistent with Oracle's initcap function: SELECT initcap('alex hyde-whyte'); In Oracle 9.2i this will return 'Alex Hyde-White', in PostgreSQL 7.3.3 it returns 'Alex Hyde-white'. No, it doesn't change the 'y' to an 'i', that's a typo in my earlier note. As far as I can tell, not capitalizing the first letter after a dash is the only inconsistency with Oracle's implementation of this function. If a patch is in order at this time, I will try to produce it; I need to learn to set up CVS and how to do that anyway. :-) -- Mike Nolan
Re: [HACKERS] initcap incompatibility issue
As far as I can tell, not capitalizing the first letter after a dash is the only inconsistency with Oracle's implementation of this function. Wrong again. Oracle also capitalizes the first letter after a comma, semicolon, colon, period, and both a single and double quote. (And that's all I've tested so far.) So, I guess I need to write a program to test all possible combinations to see how incompatible the function is. Making this change will be a larger patch than I had initially anticipated. That also brings into question whether this is really a bugfix or a specification change, a question which is relevant since we're in the feature freeze for 7.4. -- Mike Nolan
Re: [HACKERS] [GENERAL] Physical Database Configuration
Well, the correct solution is to implement tablespaces, on which objects like databases, tables and indexes can be put. I've not looked at the SQL standard, but it seems to me like the order should be: Databases Tablespaces Schemas Objects (tables, indexes, functions, etc.) And it really isn't hierarchical. As I understand them (based on my Oracle background), tablespaces, unlike schemas, do NOT create a layer of data abstraction. That is to say, while the same table name can exist in multiple schemas, only one instance of a given table name within a given schema can exist, regardless of what tablespace it is in. That makes the tablespace a property of an object. Whether or not two databases can share tablespaces isn't clear to me; as a DBA I can think of good reasons why they probably shouldn't, though I'm not sure that is an absolute. I have no idea what the status of that effort is right now. You can search the archives, or I hope this kicks off a fresh discussion. :-) I'm game, though I'm also not ready to lead such a project, probably not even the discussion on it. -- Mike Nolan
Re: Tablespaces (was Re: [HACKERS] [GENERAL] Physical Database
I thought Tablespaces were already implemented. Are they not? Apparently not. A group has been formed to work on it, though. -- Mike Nolan
Re: [HACKERS] Updating psql for features of new FE/BE protocol
This is irrelevant to what I'm doing, in any case, and it's not an itch I feel personally. Work on it yourself if you want it ... OK, I figured it out. :-) It's a fairly short patch in 7.3.3; what do I need to do to submit it for 7.4? I also made a minor functional change that may need to be turned into an additional variant on echo: if ECHO is set to 'queries' and output is being sent other than to stdout (\o), it echoes the query to the output file or pipe. If that's too much of a change and this needs to be a separate option, I'd suggest something like setting ECHO to 'queryout'. To be complete, would it need to have a command line equivalent, too? BTW, I figured out how to direct the output to a separate file each time, I put the following in .psqlrc: \o |tee `echo psql_${$}.txt` -- Mike Nolan
Re: [HACKERS] [GENERAL] Physical Database Configuration
That should be Tablespaces databases schemas objects with each of them implemented as a directory and data files under it. If we could get a quota check propagated in both directions, that would be pretty good, maybe with a warning when things start getting close to the limit. I disagree. Just as you can have multiple schemas within one database you can have multiple tablespaces within one database. And the tablespace is irrelevant as far as specifying an object is concerned. A fully qualified object would be: database.schema.object, not tablespace.database.schema.object or database.tablespace.schema.object. -- Mike Nolan
Re: [HACKERS] [GENERAL] Physical Database Configuration
Well, with the above proposal, drop database should be as simple. It's just that it would be more than one `rm -rf` rather than just one. Right, there would be potentially one per tablespace. The key point here is that the tablespace definitions are known cluster-wide, so a DROP DATABASE x command running in database y would still be able to figure out which subdirectories it needs to zap. It sounds like you envision tablespaces (or at least the default tablespace) as being above databases in some contexts. Reducing other tablespaces to mere subdirectories under the 'base' directory for a database sounds like a practical implementation measure. I presume that the 'global' directory is for stuff that is not specific to any one database within a database cluster. If so, there is an inconsistency in the current directory structure in that SOME global information is in the main /usr/local/pgsql/data directory (or equivalent) while other global information is in the global subdirectory. Being able to zap a database with one or more 'rm -rf' commands assumes that there will be files from just ONE database permitted in any given tablespace, and ONLY files from that database. The former is probably a good thing to require and enforce; the latter is unenforceable but probably still advisable. Is this doable within the time frame for the 7.4 feature freeze? -- Mike Nolan
Re: [HACKERS] [GENERAL] Physical Database Configuration
Shridhar Daithankar [EMAIL PROTECTED] writes: On Thursday 26 June 2003 21:56, [EMAIL PROTECTED] wrote: Is this doable within the time frame for the 7.4 feature freeze? Good question. ROTFL... the answer is no. Feature freeze is Tuesday, people. In practice, the time to start coding new stuff is already long past. Especially major new stuff. If you start now you might have something done for 7.5. Forgive us, Tom, we are still learning how the cycle works. (Or at least I am.) I am also probably grossly underestimating the pervasiveness of implementing tablespaces. -- Mike Nolan
[HACKERS] A portable code question
In the little fix I came up with for psql last night, I need to be able to ensure that something sent to a pipe (and then to stdout) completes before issuing the prompt directly to stdout. I did this with system('sleep 1');, but I'm fairly sure that is not portable, nor does it ENSURE completion. What's the proper way to do this? And what's a good book on writing portable code? -- Mike Nolan
Re: [HACKERS] A portable code question
Assuming you're using file streams to write to the pipe, fflush() will do the trick. The problem is that the pipe (from \o |tee ) is intermingling writes to stdout by tee with direct writes to stdout from within psql. I do issue a fflush, because that's necessary to make the pipe do its thing, but the next line of code also does a write to stdout and the pipe generally doesn't have time to complete that write to stdout, resulting in intermingled output. (fflush makes sure the pipe GETS the stream, it doesn't wait around to make sure it's DONE with it, probably because there's no way for whatever the pipe calls to report back when it is done.) This is a bit of a hack, but adding an option to the \o code so that it writes simultaneously to the pipe and to stdout instead of using tee looks like a lot more work, especially since the code appears to have a couple of other places where intermingling to stdout is possible, especially if readline is used. Throwing in system('sleep 1'); was the way I resolved the timing question here, but that may not be portable enough for inclusion into the code base. -- Mike Nolan
Re: [HACKERS] [GENERAL] Physical Database Configuration
DB2 looks good. I have horrid, horrid memories of wrestling with the Oracle extent madness. I think Oracle's extents came from their fixed-size data file legacy; in 9i the extent limits appear to be completely overridable and sometimes even ignored, such as the next extent size. I agree that the 128 extent limit was a pain, and the default for each new extent to be larger than the previous one created many problems. Oracle also took physical abstraction one level beyond 'tablespaces'. I think if each tablespace pointed to a specific directory, that'd be sufficient for me. And since I envision the tablespace as an attribute of the table, that should take care of the 1GB file rollover issue, as the rollover would occur in the same directory as the first file. Without having delved into the code yet, setting up entries for user default tablespaces and system information is probably at least as much work as getting a tablespace to point to a specific directory for the purposes of opening or creating files for an object. My personal preference would be to have four tablespaces predefined as part of a new database, though initially they could all point to the same place: SYSTEM USER TEMP INDEXES What about the concepts of a 'read-only' tablespace, or taking tablespaces offline? -- Mike Nolan
Re: [HACKERS] Updating psql for features of new FE/BE protocol
Is it too late to suggest that there be a way to have output displayed on screen AND output to a file? I've got my Oracle systems set up so that all sqlplus sessions do this, complete with using the process or session number as part of the output file name so each is unique. This gives me a running record of what I did when, which saves me a LOT of time if I want to view the results of some query I ran last week. I can delete or zip up files if I get short on disk space. -- Mike Nolan
Re: [HACKERS] Updating psql for features of new FE/BE protocol
[EMAIL PROTECTED] writes: Is it too late to suggest that there be a way to have output displayed on screen AND output to a file? tee perhaps? Tee ALMOST does it. Try doing a \d while tee'ing the output, for example. I don't quite get everything back before it asks for the next input line, sometimes all that is missing is the prompt itself. I haven't set up a 7.4 test system yet, but I've been looking into it in 7.3.3. It gives me something fairly harmless to work on as I learn more C. I think tee may write straight to stdout, so it is probably intermingling with the writes from within psql. I'm not sure why sometimes it is only missing a line or two and other times it is missing several lines. There doesn't appear to be a way to set the popen on the \o command to non-buffer mode or to force a flush on a pipe. (The equivalent of fflush.) I have also noticed that if I have timing on, the timing stats do not get sent to the output file, just to the screen. (That doesn't concern me at this point, it was just a side comment on screen vs file output.) This is irrelevant to what I'm doing, in any case, and it's not an itch I feel personally. Work on it yourself if you want it ... I'm trying to, now I really feel like a rookie! :-) -- Mike Nolan