Re: [GENERAL] moving from MySQL to pgsql
On 10/10/2012 10:47, Vineet Deodhar wrote:
> Hi !
> 3) Can I simulate MySQL's TINYINT data-type (using maybe the custom data type or something else)

Another way, and a good practice too, is to use SQL DOMAINs, which have been part of ISO SQL since 1992 and which MySQL doesn't have...

CREATE DOMAIN TINYINT AS SMALLINT CHECK (VALUE BETWEEN 0 AND 255)

Most modelling tools are able to use DOMAINs in their modelling process, like PowerDesigner, Mega, ERwin, Rational...

A +

--
Frédéric BROUARD - SQL and RDBMS expert - MVP SQL Server - 06 11 86 40 66
The site about the SQL language and RDBMS: http://sqlpro.developpez.com
Teacher at Arts & Métiers PACA, ISEN Toulon and CESI/EXIA Aix en Provence
Audit, consulting, expertise, training, modelling, tuning, optimisation
*** http://www.sqlspot.com *

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
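As a sketch of how the DOMAIN approach above behaves in practice (assuming a PostgreSQL session; the table and column names here are purely illustrative):

```sql
-- Create the domain once per database.
CREATE DOMAIN tinyint AS smallint
    CHECK (VALUE BETWEEN 0 AND 255);

-- Then use it like any built-in type.
CREATE TABLE product (
    id    serial PRIMARY KEY,
    stock tinyint NOT NULL
);

INSERT INTO product (stock) VALUES (200);   -- OK
INSERT INTO product (stock) VALUES (300);   -- rejected by the domain CHECK
```

Note the domain is still stored as a smallint (2 bytes), so this gives TINYINT's range restriction but not its 1-byte storage.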
[GENERAL] Storing large files in multiple schemas: BLOB or BYTEA
> Yeah, a pg_dump mode that dumped everything but large objects would be nice.

There is the -b option for pg_dump, which controls whether large objects are dumped or not. The problem is that with -b it dumps all large objects regardless of which schema you asked it to dump with -n. Otherwise it works fine.

> I'm now wondering about the idea of implementing a pg_dump option that
> dumped large objects into a directory tree like
>     lobs/[loid]/[lob_md5]
> and wrote out a restore script that loaded them using `lo_import`.
>
> During dumping, temporary copies could be written to something like
> lobs/[loid]/.tmp, with the md5 being calculated on the fly as the
> byte stream is read. If the dumped file had the same md5 as the existing
> one it'd just delete the tempfile; otherwise the tempfile would be
> renamed to the calculated md5.
>
> That way incremental backup systems could manage the dumped LOB tree
> without quite the same horrible degree of duplication as is currently
> faced when using lo in the database with pg_dump.
>
> A last_modified timestamp on `pg_largeobject_metadata` would be even
> better, allowing the cost of reading and discarding rarely-changed large
> objects to be avoided.

Definitely an interesting idea with incremental backups.
Re: [GENERAL] moving from MySQL to pgsql
On Thu, Oct 11, 2012 at 5:26 AM, Ondrej Ivanič wrote:
> Hi,
>
> On 10 October 2012 19:47, Vineet Deodhar wrote:
>> 3) Can I simulate MySQL's TINYINT data-type (using maybe the custom data
>> type or something else)
>
> What do you exactly mean? Do you care about storage requirements or
> constraints? The smallest numeric type in postgres is smallint: the range
> is +/- 32K and you need two bytes. You can use a check constraint to
> restrict the range (postgres doesn't have signed / unsigned types):
>
> create table T (
>     tint_signed smallint check ( tint_signed >= -128 and tint_signed <= 127 ),
>     tint_unsigned smallint check ( tint_unsigned >= 0 and tint_unsigned <= 255 )
> )

Yes. Considering the storage requirements, I am looking for a TINYINT kind of data type.

> if you care about storage then "char" (yes, with quotes) might be the
> right type for you.
>
> --
> Ondrej Ivanic
> (ondrej.iva...@gmail.com)
> (http://www.linkedin.com/in/ondrejivanic)

If I use "char" for a numeric field, would it be possible to do numeric operations and comparisons such as max(tint_unsigned)?

---
Vineet
Re: [GENERAL] Storing large files in multiple schemas: BLOB or BYTEA
On 10/11/2012 01:35 PM, tigran2-postg...@riatest.com wrote:
> Using files stored outside the database creates all sorts of problems.
> For starters you lose ACID guarantees. I would prefer to keep them in the
> database. We did a lot of experiments with Large Objects and they really
> worked fine (stored hundreds of LOBs ranging from a few MB up to 1GB).
> Postgres does a really good job with Large Objects. If it was not for the
> pg_dump problem I would not hesitate to use LOBs.

Yeah, a pg_dump mode that dumped everything but large objects would be nice. Right now I find storing large objects in the DB such a pain from a backup management point of view that I avoid it where possible.

I'm now wondering about the idea of implementing a pg_dump option that dumped large objects into a directory tree like

    lobs/[loid]/[lob_md5]

and wrote out a restore script that loaded them using `lo_import`.

During dumping, temporary copies could be written to something like lobs/[loid]/.tmp, with the md5 being calculated on the fly as the byte stream is read. If the dumped file had the same md5 as the existing one it'd just delete the tempfile; otherwise the tempfile would be renamed to the calculated md5.

That way incremental backup systems could manage the dumped LOB tree without quite the same horrible degree of duplication as is currently faced when using lo in the database with pg_dump.

A last_modified timestamp on `pg_largeobject_metadata` would be even better, allowing the cost of reading and discarding rarely-changed large objects to be avoided.

--
Craig Ringer
Re: [GENERAL] Storing large files in multiple schemas: BLOB or BYTEA
> I believe the general consensus around here is to not do that, if you can
> avoid it. File systems are much better equipped to handle files of that
> magnitude, especially when it comes to retrieving them, scanning through
> their contents, or really, any access pattern aside from simple storage.
>
> You're better off storing the blob on disk somewhere and storing a row
> that refers to its location. Either key pieces for a naming scheme or the
> full path.
>
> This is especially true if you mean to later access that data with PHP.
>
> --
> Shaun Thomas

Using files stored outside the database creates all sorts of problems. For starters you lose ACID guarantees. I would prefer to keep them in the database. We did a lot of experiments with Large Objects and they really worked fine (stored hundreds of LOBs ranging from a few MB up to 1GB). Postgres does a really good job with Large Objects. If it was not for the pg_dump problem I would not hesitate to use LOBs.
Re: [GENERAL] Storing large files in multiple schemas: BLOB or BYTEA
> Large Objects and bytea are the only ways.
>
> If you want to pg_dump only certain large objects, that won't work as far
> as I know (maybe using permissions and a non-superuser can help).
>
> You absolutely need to pg_dump parts of the database regularly?
>
> Yours,
> Laurenz Albe

It is not an absolute requirement but would be really nice to have. We have a multi-tenant database with each tenant's data stored in a separate schema. Using pg_dump seems to be the ideal way to migrate tenant data from one database to another when we need to do it to balance the load.
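A sketch of that per-tenant migration with pg_dump's schema filter (host, database, and schema names are illustrative):

```shell
# Dump only the tenant_a schema from the source database...
pg_dump -h src-host -U postgres -n tenant_a -F c -f tenant_a.dump appdb

# ...and restore it into the target database.
pg_restore -h dst-host -U postgres -d appdb tenant_a.dump
```

The catch, as discussed in this thread, is that a schema-filtered dump (-n) does not include large objects, so any LOBs belonging to the tenant have to be migrated separately.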
Re: [GENERAL] Shorthand syntax for triggers
On 10/11/2012 12:22 AM, Joe Van Dyk wrote:
> 3. Triggers can access a special CHANGED value that's either NEW for
> insert or updates, or OLD for deletes.

I'm not a big fan of the prior comments about small syntax changes, but this would simplify quite a bit of code. I'd *really* like a way to refer to "NEW for INSERT or UPDATE, OLD for DELETE" as an implicit automatic variable.

I do see the appeal of making trigger functions anonymous, but I really doubt it's worth the hassle.

> 4. Default for 'after insert' triggers is to return null, as I believe it
> doesn't matter what you return here.

This is a trivial convenience, but not one I'd be against.

> 5. Way less repetitive typing.

If you're repeating the same triggers over and over you may want to look at writing them to be re-usable. See eg:

http://wiki.postgresql.org/wiki/Audit_trigger_91plus

--
Craig Ringer
Re: [GENERAL] moving from MySQL to pgsql
On 10/10/2012 04:47 PM, Vineet Deodhar wrote:
> 2) I run MySQL from a USB stick. There is no installation required (on
> WinXP.). (not tried on Ubuntu) Is it the same for pgsql?

On Windows PostgreSQL is usually installed as a system service with its own user account (pre-9.2) or running in the network service account (9.2+). This isn't strictly required, though. You can keep the .zip binary releases on a USB key and use pg_ctl to start/stop them from your own scripts. If you're bundling Pg in your application this may be the best choice. See:

http://www.enterprisedb.com/products-services-training/pgbindownload

You *really* shouldn't keep the database itself on a USB key. Performance is likely to be terrible, and many USB keys have quite short write lifetimes, so a database on a USB key can wear some of them out in a real hurry.

Think about your backup process too. With PostgreSQL you have a couple of options, including log archiving, periodic dumps, and warm standby. Please read the backup chapter of the manual in detail.

--
Craig Ringer
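A minimal sketch of the pg_ctl approach described above (the paths are illustrative; it assumes the .zip binaries were unpacked to E:\pgsql on the USB key, with the cluster kept on the local disk as recommended):

```shell
rem One-time: create a cluster in a data directory on the local disk,
rem not on the USB key itself.
E:\pgsql\bin\initdb -D C:\pgdata -U postgres -E UTF8

rem Start and stop the server from your own scripts; no service install needed.
E:\pgsql\bin\pg_ctl -D C:\pgdata -l C:\pgdata\server.log start
E:\pgsql\bin\pg_ctl -D C:\pgdata stop
```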
Re: [GENERAL] Planner chooses multi-column index in 9.2 when maybe it should not
On Wed, Oct 10, 2012 at 09:24:32PM -0400, Tom Lane wrote:
> I'd not been thinking of that change as something we'd risk
> back-patching, but maybe we should consider putting it into 9.2. It
> seems like the index-only scan support has put a new level of premium on
> the quality of the planner's rowcount estimates.

Yes, please do! It's that or we globally disable index-only scans, which I'd prefer not to do. Let me know if you'd like me to test a patch; I can apply it and see if it fixes our issue at hand.

> Meanwhile, that range condition in itself looks a tad, er, klugy.
> Do you really need that, or is this a crummy way of stating
> foobar.id = m.id?

No, it's really needed. That's merely the tip of the kluginess; don't get me started! The dangers of an organically grown schema. :) This thing has been growing since Postgres v6. (Looking back at how far Postgres has come from 6.x to 9.2 is truly awe-inspiring.)

--
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
Re: [GENERAL] moving from MySQL to pgsql
On 10/10/2012 02:18 AM, Sim Zacks wrote:
>> 2) I run MySQL from a USB stick. There is no installation required (on
>> WinXP.). (not tried on Ubuntu) Is it the same for pgsql?
>
> To use postgres on a USB stick, see
> http://www.postgresonline.com/journal/archives/172-Starting-PostgreSQL-in-windows-without-install.html
>
>> 3) Can I simulate MySQL's TINYINT data-type (using maybe the custom data
>> type or something else)
>
> You can either use bool or smallint with a constraint.
>
> Sim

Or he could create a custom type easily enough.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
Re: [GENERAL] 'full_page_writes=off' , VACUUM and crashing streaming slaves...
>> Oct 5 15:00:25 db01 postgres[76648]: [5944-1] javafail@dbcluster 76648 0:
>> FATAL: pipe() failed: Too many open files in system
>
> This message must be coming from initSelfPipe(), and after poking around
> a bit I think the failure must be occurring while a new backend is
> attempting to do "OwnLatch(&MyProc->procLatch)" in InitProcess. The
> reason the postmaster treats this as a crash is that the new backend
> just armed the dead-man switch (MarkPostmasterChildActive) but it exits
> without doing ProcKill which would disarm it. So this is just an
> order-of-operations bug in InitProcess: we're assuming that it can't
> fail before reaching "on_shmem_exit(ProcKill, 0)", and the latch
> additions broke that. (Though looking at it, assuming that the
> PGSemaphoreReset call cannot fail seems a tad risky too.)
>
> So that explains the crashes, but it doesn't (directly) explain why you
> had data corruption.

I've since been able to repeat this with full_page_writes=on and have had identical corruption, so I don't think this is full_page_writes related any more.

I can also confirm that there was just one crash by the master database, because it required manual intervention to bring back up (a backup label was sitting stale in the data dir[1]). The slaves died during the post-crash VACUUM just as earlier. I'm completing the VACUUM now and am seeing multiple warnings from VACUUM.
WARNING: relation "tbl_a" page 2115352 is uninitialized --- fixing
WARNING: relation "tbl_a" page 2115353 is uninitialized --- fixing
WARNING: relation "tbl_a" page 2115354 is uninitialized --- fixing
WARNING: relation "tbl_a" page 2115355 is uninitialized --- fixing
WARNING: relation "tbl_a" page 2115356 is uninitialized --- fixing
WARNING: relation "tbl_a" page 2115357 is uninitialized --- fixing
WARNING: relation "tbl_a" page 2115358 is uninitialized --- fixing
WARNING: relation "tbl_a" page 2115359 is uninitialized --- fixing
WARNING: relation "tbl_a" page 2115360 is uninitialized --- fixing

On one of the slaves:

Oct 11 00:17:36 db02 postgres[66904]: [21-1] @ 66904 0: WARNING: page 120547 of relation base/16387/20196 is uninitialized
Oct 11 00:17:36 db02 postgres[66904]: [21-2] @ 66904 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 125016, lastBlockVacuumed 0
Oct 11 00:17:36 db02 postgres[66904]: [22-1] @ 66904 0: PANIC: WAL contains references to invalid pages
Oct 11 00:17:36 db02 postgres[66904]: [22-2] @ 66904 0: CONTEXT: xlog redo vacuum: rel 1663/16387/20196; blk 125016, lastBlockVacuumed 0
Oct 11 00:17:36 db02 postgres[66897]: [10-1] @ 66897 0: LOG: startup process (PID 66904) was terminated by signal 6: Abort trap
Oct 11 00:17:36 db02 postgres[66897]: [11-1] @ 66897 0: LOG: terminating any other active server processes

Hopefully this is helpful information. -sc

[1] The backup label was residual from the initial sync and wasn't cleaned up for some reason during a pg_stop_backup(). For now I'm chalking this up as a bug in repmgr even though repmgr completed cloning the slave successfully (supposedly).
START WAL LOCATION: 9F/3620 (file 0001009F0036)
CHECKPOINT LOCATION: 9F/37E9D6D8
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-10-06 17:48:10 UTC
LABEL: repmgr_standby_clone_1349545601

--
Sean Chittenden
s...@chittenden.org
Re: [GENERAL] Can two “SELECT FOR UPDATE” statements on the same table cause a deadlock?
Steve A writes:
> In a nutshell, I'm curious about the order in which PG will lock rows during
> a SELECT FOR UPDATE. If two simultaneous SELECT FOR UPDATE statements select
> intersecting rows from the same table, can PG be relied upon to lock the rows
> in a consistent manner that always avoids deadlock (e.g. in order of
> ascending primary key)?

Only if you use ORDER BY in each such query to constrain the rows to be locked in the same order. You probably want to spend some time perusing the fine manual very carefully:

http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-FOR-UPDATE-SHARE

regards, tom lane
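A sketch of the ORDER BY discipline Tom describes (the table and column names are illustrative): if every transaction acquires its row locks in the same order, two overlapping SELECT ... FOR UPDATE calls queue behind each other instead of deadlocking.

```sql
-- Every session that locks rows in this table must use the same ORDER BY,
-- so locks are always taken in ascending primary-key order.
BEGIN;
SELECT *
  FROM accounts
 WHERE id IN (7, 3, 12)
 ORDER BY id        -- deterministic lock order: 3, then 7, then 12
   FOR UPDATE;
-- ... update the locked rows ...
COMMIT;
```

Without the ORDER BY, one session might lock 3 then 12 while another locks 12 then 3, and each would end up waiting on the other.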
Re: [GENERAL] Planner chooses multi-column index in 9.2 when maybe it should not
Greg Sabino Mullane writes:
> -> Bitmap Heap Scan on foobar o (C=30389..835271 R=8980 W=8) (AT=0.06..0.07 R=1 L=1)
>      Recheck Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.99')))
>      Filter: (((status) <> ALL ('{panda,penguin}'[])) \
>          AND ((id) ~ (('^' || (m.id)) || '(\.[0-9.]+)?$')))
>      -> Bitmap Index Scan on foobar_pkey (C=0..30386 R=1888670 W=0) (AT=0.02..0.02 R=1 L=1)
>           Index Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.99')))

Actually, looking closer, I think the problem is not with the estimation of the index-only scan on the other index; the planner is estimating that as pretty expensive, which it is. The problem is that it thinks the above bitmap scan is pretty expensive, when it isn't. And the reason evidently is that it's totally off in the weeds about the selectivity of the range condition on foobar.id. Anytime you've got 1888670 estimated rows and 1 actual row, you've got a problem.

This is related to the problem I was on about a couple weeks ago:

http://archives.postgresql.org/message-id/17655.1348874...@sss.pgh.pa.us

namely that the planner fails to recognize pairs of clauses as a range constraint if they're join clauses. If it had recognized that, you'd have gotten an estimate that would still be far more than "1 row", but would be more than an order of magnitude less than this one, which would be enough to fix this problem.

I'd not been thinking of that change as something we'd risk back-patching, but maybe we should consider putting it into 9.2. It seems like the index-only scan support has put a new level of premium on the quality of the planner's rowcount estimates.

Meanwhile, that range condition in itself looks a tad, er, klugy. Do you really need that, or is this a crummy way of stating foobar.id = m.id?

regards, tom lane
Re: [GENERAL] Index only scan
On 11/10/12 12:41, Tom Lane wrote:
> Gavin Flower writes:
>> On 11/10/12 01:03, Lars Helge Øverland wrote:
>>> My question is: Would it be feasible and/or possible to implement index
>>> only scans in a way that it could take advantage of several,
>>> single-column indexes? For example, a query spanning columns a, b, c
>>> could take advantage of 3 single-column indexes put on columns a, b, c.
>>
>> Index only scans do use multiple indexes of single fields where
>> appropriate. Here the planner determined it only needed to scan 2 of the
>> 3 relevant single field indexes.
>
> But your example isn't an index-only scan ... it's a plain old bitmap
> scan, and so it does touch the heap.
>
> The difficulty with what Lars proposes is that there's no way to scan the
> different indexes "in sync" --- each one will be ordered according to its
> own column order. In principle I guess we could read out the index data
> and do a join using the ctid's, but it's far from clear that such a thing
> would be worth the trouble.
>
> regards, tom lane

Thanks for the correction!

Cheers,
Gavin
Re: [GENERAL] moving from MySQL to pgsql
I noticed something here that none of the other replies addressed.

Vineet Deodhar wrote:
> 3) Can I simulate MySQL's TINYINT data-type (using maybe the custom data
> type or something else)

The answer to your question depends on what you were using the TINYINT for. For example, many people use TINYINT in MySQL to store BOOLEAN (true/false) data because MySQL doesn't have a real BOOLEAN type; MySQL has the BOOLEAN keyword, but if you choose it then what you actually get is a TINYINT. And so, if you are actually using the field just for true/false or 1/0 values, then Postgres' BOOLEAN type (which is not a number) is the best thing for you to use. If you were storing actual numbers outside that range, then use SMALLINT or something like that.

Any reverse engineering of the MySQL schema will never know you used the BOOLEAN keyword and will just say you have a TINYINT, so you may have to study your schema and its uses more to know what kind of data/type you actually have.

--
Darren Duncan
Re: [GENERAL] Planner chooses multi-column index in 9.2 when maybe it should not
Found a good demonstration of the problem. Here's explain analyze of a query on 9.2 with enable_indexonlyscan = off; this produces the exact same plan as 8.3. The tables in question have been analyzed. Changing random_page_cost has no effect. The main foobar table has 17M rows. I did multiple runs of both to eliminate any caching effects.

foobar.id is VARCHAR(16)
foobar.status is VARCHAR(32)

Indexes:
    "foobar_pkey" PRIMARY KEY, btree (id) CLUSTER
    "foobar_status" UNIQUE, btree (status, id)

(8.3 and up, plus 9.2 with index-only scan disabled)

GroupAggregate (C=30389..1754503 R=1 W=22) (AT=0.3..0.3 R=1 L=1)
  -> Nested Loop Left Join (C=30389..1754147 R=23751 W=22) (AT=0.1..0.2 R=7 L=1)
       -> Nested Loop Left Join (C=30389..835374 R=8980 W=16) (AT=0.1..0.1 R=1 L=1)
            -> Index Scan using foobar_pkey on foobar m (C=0..13 R=1 W=8) (AT=0.03..0.03 R=1 L=1)
                 Index Cond: ((id) = '17464097')
                 Filter: ((id) !~~ '%.%')
            -> Bitmap Heap Scan on foobar o (C=30389..835271 R=8980 W=8) (AT=0.06..0.07 R=1 L=1)
                 Recheck Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.99')))
                 Filter: (((status) <> ALL ('{panda,penguin}'[])) \
                     AND ((id) ~ (('^' || (m.id)) || '(\.[0-9.]+)?$')))
                 -> Bitmap Index Scan on foobar_pkey (C=0..30386 R=1888670 W=0) (AT=0.02..0.02 R=1 L=1)
                      Index Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.99')))
       -> Index Scan using baz_id on baz (C=0..101 R=40 W=22) (AT=0.03..0.06 R=7 L=1)
            Index Cond: ((o.id) = (id))
            Filter: (price <> 0::numeric)
            Rows Removed by Filter: 3
Total runtime: 0.459 ms

Now, if we turn on index only scans, we get a terrible runtime:

GroupAggregate (C=0.00..1314945 R=1 W=22) (AT=34502..34502 R=1 L=1)
  -> Nested Loop Left Join (C=0.00..1314589 R=23751 W=22) (AT=31934..34502 R=7 L=1)
       -> Nested Loop Left Join (C=0.00..395816 R=8980 W=16) (AT=31934..34502 R=1 L=1)
            -> Index Only Scan using foobar_pkey on foobar m (C=0.00..13.81 R=1 W=8) (AT=0.029..0.034 R=1 L=1)
                 Index Cond: (id = '17464097')
                 Filter: ((id) !~~ '%.%')
                 Heap Fetches: 0
            -> Index Only Scan using foobar_status on foobar o (C=0.00..395713 R=8980 W=8) (AT=31934..34502 R=1 L=1)
                 Index Cond: ((id >= (m.id)) AND (id <= ((m.id) || '.99')))
                 Filter: (((status) <> ALL ('{panda,penguin}'[])) \
                     AND ((id) ~ (('^' || (m.id)) || '(\.[0-9.]+)?$')))
                 Heap Fetches: 0
       -> Index Scan using baz_id on baz (C=0..101 R=40 W=22) (AT=0.1..0.2 R=7 L=1)
            Index Cond: ((o.id) = (id))
            Filter: (price <> 0::numeric)
            Rows Removed by Filter: 3
Total runtime: 34502.670 ms

Yeah... 34 seconds versus near-instant. The first index-only scan does great, but that second one - ouch - even with no heap fetches at all!

--
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
Re: [GENERAL] moving from MySQL to pgsql
Hi,

On 10 October 2012 19:47, Vineet Deodhar wrote:
> 3) Can I simulate MySQL's TINYINT data-type (using maybe the custom data
> type or something else)

What do you exactly mean? Do you care about storage requirements or constraints? The smallest numeric type in postgres is smallint: the range is +/- 32K and you need two bytes. You can use a check constraint to restrict the range (postgres doesn't have signed / unsigned types):

create table T (
    tint_signed smallint check ( tint_signed >= -128 and tint_signed <= 127 ),
    tint_unsigned smallint check ( tint_unsigned >= 0 and tint_unsigned <= 255 )
)

if you care about storage then "char" (yes, with quotes) might be the right type for you.

--
Ondrej Ivanic
(ondrej.iva...@gmail.com)
(http://www.linkedin.com/in/ondrejivanic)
Re: [GENERAL] Planner chooses multi-column index in 9.2 when maybe it should not
Greg Sabino Mullane writes:
> Found a good demonstration of the problem. Here's explain analyze of a
> query on 9.2 with enable_indexonlyscan = off; this produces the exact same
> plan as 8.3. The tables in question have been analyzed. Changing
> random_page_cost has no effect. The main foobar table has 17M rows.

Can you provide a self-contained test case for this (ie, sample data)?

regards, tom lane
Re: [GENERAL] Index only scan
Gavin Flower writes:
> On 11/10/12 01:03, Lars Helge Øverland wrote:
>> My question is: Would it be feasible and/or possible to implement
>> index only scans in a way that it could take advantage of several,
>> single-column indexes? For example, a query spanning columns a, b, c
>> could take advantage of 3 single-column indexes put on columns a, b, c.
>
> Index only scans do use multiple indexes of single fields where
> appropriate. Here the planner determined it only needed to scan 2 of
> the 3 relevant single field indexes.

But your example isn't an index-only scan ... it's a plain old bitmap scan, and so it does touch the heap.

The difficulty with what Lars proposes is that there's no way to scan the different indexes "in sync" --- each one will be ordered according to its own column order. In principle I guess we could read out the index data and do a join using the ctid's, but it's far from clear that such a thing would be worth the trouble.

regards, tom lane
Re: [GENERAL] Index only scan
Hi,

On 10 October 2012 23:03, Lars Helge Øverland wrote:
> We are now in the process of designing a new component for analytics
> and this feature got me thinking we could utilize postgres over other
> alternatives like column-oriented databases. Basically we will have a
> wide, denormalized table with 20+ columns with relatively low
> cardinality. Typically we will have queries which sum a fact column
> based on where/group by clauses on several dimension columns (standard
> data warehouse stuff). An example would be "select a, b, c, sum(d)
> from analytics where a=1 and b=2 group by a,b,c";
>
> Finally, is there anyone else who is using postgres for this purpose
> and has some good tips to share in order to achieve good performance,
> including index strategies, beyond the standard config best practices?

Yes, we had a fact table which had around 250 columns and 250mil rows. The question is whether you can partition your data set, for example into monthly partitions. This keeps indexes small, but all queries must be constrained by the same column as is used for partitioning (ie. monthly partitions -> every query should have "datetime between ... and ...").

From my experience postgres is not good with large group by queries. For example, your query:

select a, b, c, sum(d) from analytics where a=1 and b=2 group by a,b,c

could be executed over multiple connections:

insert into t select a, b, c, sum(d) as d from analytics where c >= val1 and c < val2 and a=1 and b=2 group by a,b,c
insert into t select a, b, c, sum(d) as d from analytics where c >= val2 and c < val3 and a=1 and b=2 group by a,b,c
...
insert into t select a, b, c, sum(d) as d from analytics where c >= valN-1 and c < valN and a=1 and b=2 group by a,b,c

and then get the final result:

select a, b, c, sum(d) from t group by a,b,c

You can use pgpool-II's parallel query feature instead of manual slicing.
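The slicing scheme above works because per-slice partial sums can be merged into the same totals as a single big GROUP BY. A small Python model of that merge step (purely illustrative; the tuples stand in for rows of the analytics table):

```python
from collections import defaultdict


def group_sum(rows):
    """select a, b, c, sum(d) ... group by a, b, c over (a, b, c, d) tuples."""
    acc = defaultdict(int)
    for a, b, c, d in rows:
        acc[(a, b, c)] += d
    return dict(acc)


def sliced_group_sum(rows, boundaries):
    """Aggregate each c-range slice independently (one 'connection' per
    slice, like the insert-into-t queries), then merge the partial sums
    (the final 'select ... from t group by a,b,c')."""
    partials = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        slice_rows = [r for r in rows if lo <= r[2] < hi]
        partials.append(group_sum(slice_rows))

    merged = defaultdict(int)
    for part in partials:
        for key, d in part.items():
            merged[key] += d
    return dict(merged)
```

Because sum() is associative, the sliced result is identical to the unsliced one; the win is only that the slices can run concurrently.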
--
Ondrej Ivanic
(ondrej.iva...@gmail.com)
(http://www.linkedin.com/in/ondrejivanic)
Re: [GENERAL] Index only scan
On 11/10/12 01:03, Lars Helge Øverland wrote:
> Hi all,
>
> first of all thanks for the great new "index only scan" feature in 9.2.
> We have managed to adapt our app (dhis2.org) to take advantage of it and
> it really speeds up several queries significantly.
>
> We are now in the process of designing a new component for analytics and
> this feature got me thinking we could utilize postgres over other
> alternatives like column-oriented databases. Basically we will have a
> wide, denormalized table with 20+ columns with relatively low
> cardinality. Typically we will have queries which sum a fact column based
> on where/group by clauses on several dimension columns (standard data
> warehouse stuff). An example would be "select a, b, c, sum(d) from
> analytics where a=1 and b=2 group by a,b,c";
>
> Now my initial idea was to simply put a single index on all of those
> columns, in the hope that "index only scans" would kick in. It seems this
> is not the case, as strictly one (single or multi-column) index is
> required to enable index only scans for a query.
>
> My question is: Would it be feasible and/or possible to implement index
> only scans in a way that it could take advantage of several,
> single-column indexes? For example, a query spanning columns a, b, c
> could take advantage of 3 single-column indexes put on columns a, b, c.
>
> Finally, is there anyone else who is using postgres for this purpose and
> has some good tips to share in order to achieve good performance,
> including index strategies, beyond the standard config best practices?
>
> best regards,
> Lars Helge Øverland

Index only scans do use multiple indexes of single fields where appropriate. Here the planner determined it only needed to scan 2 of the 3 relevant single field indexes.
Cheers,
Gavin

-- index_only_scan_001.sql

DROP TABLE IF EXISTS iostab;

CREATE TABLE iostab (
    id int PRIMARY KEY,
    a  int,
    b  int,
    c  int,
    d  int,
    z  text
);

INSERT INTO iostab (id, a, b, c, d, z)
VALUES (generate_series(1, 1000000),
        1000 * random(),
        1000 * random(),
        1000 * random(),
        1000 * random(),
        'qq' || random());

CREATE INDEX ON iostab (a);
CREATE INDEX ON iostab (b);
CREATE INDEX ON iostab (c);
CREATE INDEX ON iostab (d);

ANALYZE VERBOSE iostab;

EXPLAIN
SELECT i.*
  FROM iostab i
 WHERE i.a = 2
   AND i.b = 7
   AND i.c = 4;

-- Output:

psql:index_only_scan_001.sql:14: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "iostab_pkey" for table "iostab"
CREATE TABLE
INSERT 0 1000000
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
psql:index_only_scan_001.sql:30: INFO: analyzing "public.iostab"
psql:index_only_scan_001.sql:30: INFO: "iostab": scanned 15385 of 15385 pages, containing 1000000 live rows and 0 dead rows; 30000 rows in sample, 1000000 estimated total rows
ANALYZE

                                  QUERY PLAN
Bitmap Heap Scan on iostab i (cost=41.37..45.39 rows=1 width=90)
  Recheck Cond: ((b = 7) AND (a = 2))
  Filter: (c = 4)
  -> BitmapAnd (cost=41.37..41.37 rows=1 width=0)
       -> Bitmap Index Scan on iostab_b_idx (cost=0.00..20.55 rows=960 width=0)
            Index Cond: (b = 7)
       -> Bitmap Index Scan on iostab_a_idx (cost=0.00..20.57 rows=963 width=0)
            Index Cond: (a = 2)
(8 rows)

gavin=>
Re: [GENERAL] Shorthand syntax for triggers
On Wed, Oct 10, 2012 at 9:22 AM, Joe Van Dyk wrote:
> I wish I could do:
>
> create trigger some_trigger after insert on products
> execute procedure do $$ begin
>     insert into audits values (CHANGED.value);
> end $$ language plpgsql;

IF TG_OP = 'DELETE' THEN
    RENAME OLD TO myrow;
ELSE
    RENAME NEW TO myrow;
END IF;

and then use

insert into audits values (myrow.value);

> Changes/improvements:
>
> 1. Triggers default to 'for each row'
>
> 2. Triggers can use anonymous functions
>
> 3. Triggers can access a special CHANGED value that's either NEW for insert
> or updates, or OLD for deletes.
>
> 4. Default for 'after insert' triggers is to return null, as I believe it
> doesn't matter what you return here.
>
> 5. Way less repetitive typing.
>
> Thoughts? Is this a terrible idea?

--
Sergey Konoplev
a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com
Skype: gray-hemp
Phone: +14158679984
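A conventional way to get the "NEW for INSERT/UPDATE, OLD for DELETE" behaviour today is to branch on TG_OP into a record variable inside the trigger function. A sketch (the audits table and value column are illustrative):

```sql
CREATE OR REPLACE FUNCTION audit_row() RETURNS trigger AS $$
DECLARE
    myrow record;
BEGIN
    -- Pick whichever row image exists for this operation.
    IF TG_OP = 'DELETE' THEN
        myrow := OLD;
    ELSE
        myrow := NEW;
    END IF;

    INSERT INTO audits VALUES (myrow.value);

    -- The return value of an AFTER trigger is ignored, so NULL is fine.
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_audit
    AFTER INSERT OR UPDATE OR DELETE ON products
    FOR EACH ROW EXECUTE PROCEDURE audit_row();
```

One generic function like this can be attached to many tables, which addresses the "repetitive typing" complaint without any syntax change.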
Re: [GENERAL] Planner chooses multi-column index in 9.2 when maybe it should not
On Wed, Oct 10, 2012 at 01:31:29PM -0400, Tom Lane wrote: > The above doesn't seem like a regression to me. You told it not to use > a seqscan, and it didn't. (The reason it now considers the index is > that an index-only scan is possible; before 9.2 there was no reason to > consider an indexscan at all given this query, so you got the seqscan > despite the attempted disable.) Ah...index-only scans. Now it makes sense. ... > It's not obvious that this is a worse plan than a seqscan --- the > index-only scans will only have to read the index not the heap, at least > if the heap is all-visible. If it's coming out slower, then that's a > question of whether the cost estimates match reality. I'd wonder how > many heap fetches occur anyway, and also whether you've tweaked the > planner cost parameters. We've lowered random_page_cost, but raising it back to the default does not help. > You should be able to force it back to the seqscan based plan by turning > off enable_indexscan or enable_indexonlyscan. It would be useful to > see EXPLAIN ANALYZE (not just EXPLAIN) results for both this plan and > the seqscan plan in 9.2. Thanks, I will play around with both a better test case and getting some explain analyzes (they were taking too long to run; thought I should get the email out first in case it was something obvious). -- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8 pgpn2mbk5zwEr.pgp Description: PGP signature
Re: [GENERAL] pymssql Connection to the database failed for an unknown reason
On 10 Oct 2012, at 3:17, thomson...@sourcephotonics.com wrote: > On Saturday, January 29, 2011 9:08:22 PM UTC+8, orgilhp wrote: >> Hello >> I am using pymssql to connect to MSSQL2008 database. But an error >> occurs: >> -- > import pymssql > conn = pymssql.connect(host='orgilhpnb\mssql2008', user='erp', > password='123', database='eoffice_clone') >> Traceback (most recent call last): >> File "", line 1, in >> File "/usr/lib/pymodules/python2.6/pymssql.py", line 609, in connect >>raise InterfaceError, e[0] >> pymssql.InterfaceError: Connection to the database failed for an >> unknown reason. >> -- >> >> The host name, user, password and database name are all correct. >> Why I get an error? Please help me! >> >> Any suggestion would be highly appreciated! >> >> Best regards, >> Orgil > > Do you solve this issue? I encounted the same issue > would you like to reply the issue? It looks like you're accidentally using an MS-SQL interface to connect to a Postgres database? Or you're asking the wrong mailing list ;) Alban Hertroys -- Religion is: Volunteering your children for brainwashing. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_upgrade not detecting version properly
On Wed, Oct 10, 2012 at 10:35:06AM -0600, Chris Ernst wrote:
> On 10/10/2012 09:56 AM, Bruce Momjian wrote:
> > Can you show me what is in the PG_VERSION file in the old cluster?  It
> > should be "9.1".
>
> Hi Bruce,
>
> Thank you for the reply.  Indeed it is "9.1":
>
> # cat /postgresql/9.1/main/PG_VERSION
> 9.1
>
> And just for good measure:
>
> cat /postgresql/9.2/main/PG_VERSION
> 9.2
>
> And there are no other PostgreSQL versions on this machine.
>
> Hmm... I was just about to send this when something else occurred to me.
> I had initially tried to run pg_upgrade as root and it said it couldn't
> be run as root.  So I've been running it as my own user (which is in the
> postgres group).  However, everything in /postgresql/9.1/main is owned
> by postgres with 700 permissions.
>
> I switched to the postgres user and now pg_upgrade is running.  Perhaps
> just a more informative error message is in order.
>
> Thank you for the shove in the right direction =)

Oops, that code was returning zero if it couldn't open the file.  The
attached patch, applied to head and 9.2, issues a proper error message.
Seems this "zero return" has been in the code since the beginning.  :-(

--
  Bruce Momjian  http://momjian.us        EnterpriseDB  http://enterprisedb.com
  + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 11e7e75..a9f9d85
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** get_major_server_version(ClusterInfo *cluster)
*** 149,155 ****
  	snprintf(ver_filename, sizeof(ver_filename), "%s/PG_VERSION",
  			 cluster->pgdata);
  	if ((version_fd = fopen(ver_filename, "r")) == NULL)
! 		return 0;
  
  	if (fscanf(version_fd, "%63s", cluster->major_version_str) == 0 ||
  		sscanf(cluster->major_version_str, "%d.%d", &integer_version,
--- 149,155 ----
  	snprintf(ver_filename, sizeof(ver_filename), "%s/PG_VERSION",
  			 cluster->pgdata);
  	if ((version_fd = fopen(ver_filename, "r")) == NULL)
! 		pg_log(PG_FATAL, "could not open version file: %s\n", ver_filename);
  
  	if (fscanf(version_fd, "%63s", cluster->major_version_str) == 0 ||
  		sscanf(cluster->major_version_str, "%d.%d", &integer_version,

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Planner chooses multi-column index in 9.2 when maybe it should not
Greg Sabino Mullane writes: > We are seeing a performance regression when moving to 9.2. There is a > complex query that is doing a self-join, but long story short, it > is choosing to use a multi-column index when it really ought not to be. > I was not able to develop a standalone test case without resorting > to changing enable_seqscan, but this does show the difference: > CREATE TABLE gregtest (a TEXT PRIMARY KEY, b TEXT); > CREATE INDEX gregtest_i ON gregtest(b,a); > SET enable_seqscan = off; > EXPLAIN SELECT 1 FROM gregtest WHERE a <> 'panda' AND b <> 'yak'; > On versions 8.2, 8.3, 8.4, 9.0, and 9.1, this gives: > Seq Scan on gregtest (cost=100.00..122.90 rows=855 width=0) >Filter: ((a <> 'panda'::text) AND (b <> 'yak'::text)) > Which makes sense - I would imagine that b = 'yak' would use the index, > but the negation means the index is not very useful? > However, on 9.2, this gives: > Bitmap Heap Scan on gregtest (cost=8.76..31.66 rows=855 width=0) >Filter: ((a <> 'panda'::text) AND (b <> 'yak'::text)) >-> Bitmap Index Scan on gregtest_i (cost=0.00..8.55 rows=860 width=0) The above doesn't seem like a regression to me. You told it not to use a seqscan, and it didn't. (The reason it now considers the index is that an index-only scan is possible; before 9.2 there was no reason to consider an indexscan at all given this query, so you got the seqscan despite the attempted disable.) > The above was tested on stock versions of Postgres, with no changes > made to postgresql.conf. 
In the actual query, the result is something like > this on 9.2 (columns explained below): > Nested Loop (cost=0.00..6050226723847.12 rows=282638194054762 width=8) >Join Filter: ((o.foo)::text <> ((m.foo)::text || 'panda'::text)) >-> Index Only Scan using index_i on foobar m (cost=0.00..889187.83 > rows=16998032 width=8) >-> Materialize (cost=0.00..1079773.42 rows=16627702 width=8) > -> Index Only Scan using index_i on foobar o (cost=0.00..931682.91 > rows=16627702 width=8) >Filter: ((status)::text <> 'split'::text) It's not obvious that this is a worse plan than a seqscan --- the index-only scans will only have to read the index not the heap, at least if the heap is all-visible. If it's coming out slower, then that's a question of whether the cost estimates match reality. I'd wonder how many heap fetches occur anyway, and also whether you've tweaked the planner cost parameters. You should be able to force it back to the seqscan based plan by turning off enable_indexscan or enable_indexonlyscan. It would be useful to see EXPLAIN ANALYZE (not just EXPLAIN) results for both this plan and the seqscan plan in 9.2. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Planner chooses multi-column index in 9.2 when maybe it should not
We are seeing a performance regression when moving to 9.2. There is a complex query that is doing a self-join, but long story short, it is choosing to use a multi-column index when it really ought not to be. I was not able to develop a standalone test case without resorting to changing enable_seqscan, but this does show the difference: CREATE TABLE gregtest (a TEXT PRIMARY KEY, b TEXT); CREATE INDEX gregtest_i ON gregtest(b,a); SET enable_seqscan = off; EXPLAIN SELECT 1 FROM gregtest WHERE a <> 'panda' AND b <> 'yak'; On versions 8.2, 8.3, 8.4, 9.0, and 9.1, this gives: Seq Scan on gregtest (cost=100.00..122.90 rows=855 width=0) Filter: ((a <> 'panda'::text) AND (b <> 'yak'::text)) Which makes sense - I would imagine that b = 'yak' would use the index, but the negation means the index is not very useful? However, on 9.2, this gives: Bitmap Heap Scan on gregtest (cost=8.76..31.66 rows=855 width=0) Filter: ((a <> 'panda'::text) AND (b <> 'yak'::text)) -> Bitmap Index Scan on gregtest_i (cost=0.00..8.55 rows=860 width=0) The above was tested on stock versions of Postgres, with no changes made to postgresql.conf. 
In the actual query, the result is something like this on 9.2 (columns explained below): Nested Loop (cost=0.00..6050226723847.12 rows=282638194054762 width=8) Join Filter: ((o.foo)::text <> ((m.foo)::text || 'panda'::text)) -> Index Only Scan using index_i on foobar m (cost=0.00..889187.83 rows=16998032 width=8) -> Materialize (cost=0.00..1079773.42 rows=16627702 width=8) -> Index Only Scan using index_i on foobar o (cost=0.00..931682.91 rows=16627702 width=8) Filter: ((status)::text <> 'split'::text) But like this on 8.3: Nested Loop (cost=1003294.60..8207409555713.15 rows=283931552087940 width=8) Join Filter: ((o.foo)::text <> ((m.foo)::text || 'panda'::text)) -> Seq Scan on foobar o (cost=0.00..962314.95 rows=16672204 width=8) Filter: ((status)::text <> 'split'::text) -> Materialize (cost=1003294.60..1240121.96 rows=17030236 width=8) -> Seq Scan on foobar m (cost=0.00..919739.36 rows=17030236 width=8) In the above, foobar has a primary key on foo, and an index named index_i on foobar(status, foo). In another variation of the query, 8.3 uses foobar_pkey as well, rather than index_i, and filters that. Matter of fact, index_i is never used. At any rate, the change causes the original query to run much, much slower. Problem on 9.2? Something wrong with our system and/or query? More information needed from me? -- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8 pgpELPYaKfY4x.pgp Description: PGP signature
Re: [GENERAL] pg_upgrade not detecting version properly
On 10/10/2012 09:56 AM, Bruce Momjian wrote: > Can you show me what is in the PG_VERSION file in the old cluster? It > should be "9.1". Hi Bruce, Thank you for the reply. Indeed it is "9.1": # cat /postgresql/9.1/main/PG_VERSION 9.1 And just for good measure: cat /postgresql/9.2/main/PG_VERSION 9.2 And there are no other PostgreSQL versions on this machine. Hmm... I was just about to send this when something else occurred to me. I had initially tried to run pg_upgrade as root and it said it couldn't be run as root. So I've been running it as my own user (which is in the postgres group). However, everything in /postgresql/9.1/main is owned by postgres with 700 permissions. I switched to the postgres user and now pg_upgrade is running. Perhaps just a more informative error message is in order. Thank you for the shove in the right direction =) Cheers! - Chris -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] FTS for a controlled vocab
Hello,

I am trying to identify how best to handle the situation where a controlled vocabulary needs to be searched using full text search. I have a list of brand names that contain what FTS deems blank characters, e.g. (+)people, D&G, 100% Design. These particular combinations are proving difficult, so I would like to do a replacement at index and query time:

(+)people, +people -> pluspeople
100% Design -> 100percent Design
D&G, D & G, DG -> DandG

Running these through the default parser means I get much-reduced lexemes that won't be exact enough:

   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | d     | {english_stem} | english_stem | {d}
 blank     | Space symbols   | &     | {}             |              |
 asciiword | Word, all ASCII | g     | {english_stem} | english_stem | {g}
(3 rows)

   alias   |   description   | token  |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+--------+----------------+--------------+---------
 blank     | Space symbols   | (      | {}             |              |
 blank     | Space symbols   | +)     | {}             |              |
 asciiword | Word, all ASCII | people | {english_stem} | english_stem | {peopl}

Can I achieve this with FTS and dictionaries, or would I need a custom parser? Any other ideas on how a search like this could work?

I have considered using the actual text column in the query to try and match the exact term using ilike and pg_trgm. So for a user query of 'D&G dresses' the select could be:

select * from test where text_val @@ plainto_tsquery('english','d&g dresses') and lex ilike '%d&g%';

but there would be some horrible query mangling to find all words that have the blank tokens and use them in multiple ilike comparisons.

Thanks
Sumit
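One possible workaround, sketched here without testing against the poster's data: fold the known substitutions into an IMMUTABLE wrapper function applied at both index and query time, so the awkward tokens never reach the parser. The function and table names are invented for the example, and the bare `DG` alternation is deliberately naive (it would also match inside longer words):

```sql
-- Hypothetical normalizer for the controlled vocabulary; declared
-- IMMUTABLE so it can be used in an expression index.  One nested
-- regexp_replace per brand-name pattern.
CREATE OR REPLACE FUNCTION normalize_brands(t text) RETURNS text AS $$
    SELECT regexp_replace(
             regexp_replace(
               regexp_replace($1, '\(\+\)people', 'pluspeople', 'gi'),
               'D\s*&\s*G', 'DandG', 'gi'),
             '100%', '100percent', 'gi');
$$ LANGUAGE sql IMMUTABLE;

-- Index time: the expression index sees only the normalized text.
CREATE INDEX test_fts_idx ON test
    USING gin (to_tsvector('english', normalize_brands(text_val)));

-- Query time: run the user's query text through the same function.
SELECT *
  FROM test
 WHERE to_tsvector('english', normalize_brands(text_val))
       @@ plainto_tsquery('english', normalize_brands('D&G dresses'));
```

Because the same function wraps both sides, the 'DandG' lexeme matches exactly without any ilike mangling; adding a brand means adding one more regexp and reindexing.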
[GENERAL] Shorthand syntax for triggers
Instead of this: create function some_trigger() returns trigger as $$ begin if TG_OP = 'DELETE' then insert into audits values (OLD.value); else insert into audits values (NEW.value); end if; return NULL; end $$ language plpgsql; create trigger some_trigger after insert on products for each row execute procedure some_trigger(); I wish I could do: create trigger some_trigger after insert on products execute procedure do $$ begin insert into audits values (CHANGED.value); end $$ language plpgsql; Changes/improvements: 1. Triggers default to 'for each row' 2. Triggers can use anonymous functions 3. Triggers can access a special CHANGED value that's either NEW for insert or updates, or OLD for deletes. 4. Default for 'after insert' triggers is to return null, as I believe it doesn't matter what you return here. 5. Way less repetitive typing. Thoughts? Is this a terrible idea?
Re: [GENERAL] pg_upgrade not detecting version properly
On Tue, Oct 9, 2012 at 09:50:22PM -0600, Chris Ernst wrote:
> Hi all,
>
> I'm trying to test using pg_upgrade to go from 9.1.6 to 9.2.1 on Ubuntu
> server 10.04.  But when I run pg_upgrade, it tells me I can only run it
> on 8.3 or later.
>
> Old:
> postgres=# SELECT version();
> PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real
> (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3, 64-bit
> (1 row)
>
> New:
> postgres=# SELECT version();
> PostgreSQL 9.2.1 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real
> (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3, 64-bit
> (1 row)
>
> Yet when I try to run pg_upgrade:
>
> $ /usr/lib/postgresql/9.2/bin/pg_upgrade -b /usr/lib/postgresql/9.1/bin/
> -d /postgresql/9.1/main -B /usr/lib/postgresql/9.2/bin/ -D
> /postgresql/9.2/main -k -c -v
> Running in verbose mode
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories                 ok
> Checking cluster versions
> This utility can only upgrade from PostgreSQL version 8.3 and later.
> Failure, exiting
>
> Any idea what could be going on here?
>
> Thank you in advance for your help.

That is certainly odd.
It is using this C code:

	if (GET_MAJOR_VERSION(old_cluster.major_version) < 803)
		pg_log(PG_FATAL, "This utility can only upgrade from PostgreSQL version 8.3 and later.\n");

which is assigned from this function:

get_major_server_version(ClusterInfo *cluster)
{
	FILE	   *version_fd;
	char		ver_filename[MAXPGPATH];
	int			integer_version = 0;
	int			fractional_version = 0;

	snprintf(ver_filename, sizeof(ver_filename), "%s/PG_VERSION",
			 cluster->pgdata);
	if ((version_fd = fopen(ver_filename, "r")) == NULL)
		return 0;

	if (fscanf(version_fd, "%63s", cluster->major_version_str) == 0 ||
		sscanf(cluster->major_version_str, "%d.%d", &integer_version,
			   &fractional_version) != 2)
		pg_log(PG_FATAL, "could not get version from %s\n", cluster->pgdata);

	fclose(version_fd);

	return (100 * integer_version + fractional_version) * 100;
}

Can you show me what is in the PG_VERSION file in the old cluster?  It should be "9.1".

--
  Bruce Momjian  http://momjian.us        EnterpriseDB  http://enterprisedb.com
  + It's impossible for everything to be true. +

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Re: Suggestions for the best strategy to emulate returning multiple sets of results
create a ramdrive On Wed, 2012-10-10 at 14:27 +0100, Seref Arikan wrote: > Hi Merlin, > Thanks for the response. At the moment, the main function is creating > two temp tables that drops on commit, and python functions fills > these. Not too bad, but I'd like to push these temp tables to ram, > which is a bit tricky due to not having a direct method of doing this > with postgresql. (a topic that has been discussed in the past in this > mail group) > > The global variable idea is interesting though. I have not encountered > this before, is it the global dictionary SD/GD mentioned here: > http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ? > It may help perform the expensive transformations once and reuse the > results. > > Kind regards > Seref > > On Wed, Oct 10, 2012 at 2:06 PM, Merlin Moncure > wrote: > On Mon, Oct 8, 2012 at 4:59 PM, Seref Arikan > wrote: > > Thanks Merlin, > > I've tried arrays but plpython does not support returning > arrays of custom > > db types (which is what I'd need to do) > > > > hm -- yeah. can your custom types be broken down into plain > SQL types > (that is, composite types?). maybe stash the results in > global > variable and return it in two calls, or insert into into a > tempt > table that drops on commit? > > merlin > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pymssql Connection to the database failed for an unknown reason
On Saturday, January 29, 2011 9:08:22 PM UTC+8, orgilhp wrote:
> Hello
> I am using pymssql to connect to MSSQL2008 database. But an error
> occurs:
> --
> >>> import pymssql
> >>> conn = pymssql.connect(host='orgilhpnb\mssql2008', user='erp',
> >>> password='123', database='eoffice_clone')
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/usr/lib/pymodules/python2.6/pymssql.py", line 609, in connect
>     raise InterfaceError, e[0]
> pymssql.InterfaceError: Connection to the database failed for an
> unknown reason.
> --
>
> The host name, user, password and database name are all correct.
> Why do I get an error? Please help me!
>
> Any suggestion would be highly appreciated!
>
> Best regards,
> Orgil

Did you solve this issue? I have encountered the same one; could you share how you resolved it?

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] libpq-how to add a schema to search path
Hi Laurenz, Thanks for quick reply. I hope it will help. Will check and revert. Best Regards, dpsmails --- On Wed, 10/10/12, Albe Laurenz wrote: From: Albe Laurenz Subject: RE: [GENERAL] libpq-how to add a schema to search path To: "Divakar Singh *EXTERN*" , pgsql-general@postgresql.org Date: Wednesday, October 10, 2012, 4:01 PM Divakar Singh wrote: > While making connection to PGSQL using libpq, is there any option to mention the schema name? > Something similar exists in java, however not sure about libpq. Use the "options" connection parameter, like this: psql "dbname=test user=laurenz port=5432 options='-c search_path=schema1,schema2'" Yours, Laurenz Albe
Re: [GENERAL] Re: Suggestions for the best strategy to emulate returning multiple sets of results
Thanks Bret,
I'm concerned about what happens when my functions, under high load, fill the ramdrive with the temporary tables I'm using. Telling postgres to use RAM with an option to fall back to disk would be significantly better in terms of uptime. However, I was thinking about some mechanism in the middle tier that watches the free space on the ram drive and redirects queries to functions that create temp tables on disk if the ram drive is close to full. That may help me accomplish what I'm trying to do.

Regards
Seref

On Wed, Oct 10, 2012 at 3:58 PM, Bret Stern <bret_st...@machinemanagement.com> wrote:
> create a ramdrive
> On Wed, 2012-10-10 at 14:27 +0100, Seref Arikan wrote:
> > Hi Merlin,
> > Thanks for the response. At the moment, the main function is creating
> > two temp tables that drops on commit, and python functions fills
> > these. Not too bad, but I'd like to push these temp tables to ram,
> > which is a bit tricky due to not having a direct method of doing this
> > with postgresql. (a topic that has been discussed in the past in this
> > mail group)
> >
> > The global variable idea is interesting though. I have not encountered
> > this before, is it the global dictionary SD/GD mentioned here:
> > http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ?
> > It may help perform the expensive transformations once and reuse the
> > results.
> >
> > Kind regards
> > Seref
> >
> > On Wed, Oct 10, 2012 at 2:06 PM, Merlin Moncure wrote:
> > > On Mon, Oct 8, 2012 at 4:59 PM, Seref Arikan wrote:
> > > > Thanks Merlin,
> > > > I've tried arrays but plpython does not support returning arrays
> > > > of custom db types (which is what I'd need to do)
> > >
> > > hm -- yeah. can your custom types be broken down into plain SQL types
> > > (that is, composite types?). maybe stash the results in global
> > > variable and return it in two calls, or insert into into a tempt
> > > table that drops on commit?
> > >
> > > merlin
Re: [GENERAL] Compression
Where do I find more information about the PG fork you mentioned?

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Re: Suggestions for the best strategy to emulate returning multiple sets of results
Comments inline (sorry, did not cc the group in the other mail) On Wed, Oct 10, 2012 at 2:55 PM, Merlin Moncure wrote: > On Wed, Oct 10, 2012 at 8:27 AM, Seref Arikan > wrote: > > Hi Merlin, > > Thanks for the response. At the moment, the main function is creating two > > temp tables that drops on commit, and python functions fills these. Not > too > > bad, but I'd like to push these temp tables to ram, which is a bit tricky > > due to not having a direct method of doing this with postgresql. (a topic > > that has been discussed in the past in this mail group) > > > > The global variable idea is interesting though. I have not encountered > this > > before, is it the global dictionary SD/GD mentioned here: > > http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ? > > It may help perform the expensive transformations once and reuse the > > results. > > yeah. maybe though you might find that the overhead of temp tables is > already pretty good -- they are mostly ram based in typical usage as > they aren't synced. I find actually the greatest overhead in terms of > using them is creation and dropping -- so for very low latency > transactions I use a unlogged permanent table with value returned by > txid_current() as the leading field in the key. > This is very interesting. The reason I've tried to avoid a shared temp table is that I'd have to have a session id for calls, which led to severe performance issues with the entity attribute value approach I'm using in the temp table. Your approach sounds to have been designed to overcome my problem, but I have no idea what an unlogged table does, and your use of txid_current. Could you explain a bit? Regards Seref > > merlin >
[GENERAL] Re: Suggestions for the best strategy to emulate returning multiple sets of results
On Wed, Oct 10, 2012 at 8:27 AM, Seref Arikan wrote:
> Hi Merlin,
> Thanks for the response. At the moment, the main function is creating two
> temp tables that drops on commit, and python functions fills these. Not too
> bad, but I'd like to push these temp tables to ram, which is a bit tricky
> due to not having a direct method of doing this with postgresql. (a topic
> that has been discussed in the past in this mail group)
>
> The global variable idea is interesting though. I have not encountered this
> before, is it the global dictionary SD/GD mentioned here:
> http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ?
> It may help perform the expensive transformations once and reuse the
> results.

yeah.  Maybe, though you might find that the overhead of temp tables is already pretty good -- they are mostly ram-based in typical usage as they aren't synced. I find the greatest overhead in using them is actually creation and dropping -- so for very low latency transactions I use an unlogged permanent table with the value returned by txid_current() as the leading field in the key.

merlin

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
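A sketch of the unlogged-table pattern described above, with invented table and column names. The key point is that the leading key column is the current transaction id, so concurrent sessions never collide and each transaction can find (and clean up) only its own rows:

```sql
-- Shared scratch space: unlogged, so writes skip WAL, and permanent,
-- so there is no per-call CREATE/DROP overhead.
CREATE UNLOGGED TABLE scratch (
    txid    bigint NOT NULL,   -- txid_current() of the writing transaction
    seq     int    NOT NULL,
    payload text,
    PRIMARY KEY (txid, seq)
);

-- Inside the function(s), stash intermediate results keyed by our txid:
INSERT INTO scratch VALUES (txid_current(), 1, 'some intermediate result');

-- Later in the same transaction, read back only our own rows:
SELECT payload FROM scratch WHERE txid = txid_current();

-- And clear them before committing (or via a periodic sweep):
DELETE FROM scratch WHERE txid = txid_current();
```

The trade-off versus a true temp table is durability: unlogged tables are truncated after a crash, which is fine for scratch data, and rows must be deleted explicitly since nothing drops on commit.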
Re: [GENERAL] moving from MySQL to pgsql
On Wed, Oct 10, 2012 at 3:47 AM, Vineet Deodhar wrote:
> Hi !
> At present, I am using MySQL as backend for my work.
> Because of the licensing implications, I am considering to shift from
> MySQL to pgsql.
> Typically, my apps are multi-user, web based or LAN based.
>
> 1) Read over the internet that ---
> Postgres is not threaded, but every connection gets it's own process. The
> OS will distribute the processes across the processors. Basically a single
> connection will not be any faster with SMP, but multiple connections will
> be.
>
> MySQL is multi-threaded server so it can use many processors. A separate
> thread is created for each connection.
> source:

PostgreSQL is multi-threaded in that it has multiple execution threads. The only difference is that each thread has its own process, whereas in MySQL every thread runs in the same process. Each approach has various pros and cons that ultimately don't matter for most applications.

> In what way it might affect my app performance?

Basically, it doesn't -- at least not very much. There are many other things that are going to make a much bigger difference.

> 2) I run MySQL from a USB stick.
> There is no installation required (on WinXP.). (not tried on Ubuntu)
> Is it the same for pgsql?

Not sure what you mean there. MySQL has a windows installer, as does postgres. It's possible to bootstrap postgres without an installer if you know what you're doing, but generally a silent-mode install is the way to go. You've omitted some very important details, like the specific security model of the windows environments you'll install to.

> 3) Can I simulate MySQL's TINYINT data-type (using maybe the custom data
> type or something else)

You have a couple of options: PostgreSQL's explicitly double-quoted "char" type, which is a single byte. Another option is to use smallint plus a check constraint.
merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
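The smallint route is usually wrapped in a domain so the range check lives in one place; a quick sketch (note that smallint still occupies two bytes on disk, unlike MySQL's one-byte TINYINT):

```sql
-- An unsigned-TINYINT lookalike as a domain over smallint.
CREATE DOMAIN tinyint AS smallint
    CHECK (VALUE BETWEEN 0 AND 255);

CREATE TABLE demo (flags tinyint);

INSERT INTO demo VALUES (255);  -- accepted
INSERT INTO demo VALUES (256);  -- rejected by the domain's check constraint
```

Every column declared as tinyint then enforces the same range, and widening it later is a single ALTER DOMAIN rather than a per-table change.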
Re: [GENERAL] Storing large files in multiple schemas: BLOB or BYTEA
tigran2-postg...@riatest.com wrote: > Is there any other way to store large data in Postgres that allows streaming and correctly works with > multiple schemas per database? Large Objects and bytea are the only ways. If you want to pg_dump only certain large objects, that won't work as far as I know (maybe using permissions and a non-superuser can help). You absolutely need to pg_dump parts of the database regularly? Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Re: Suggestions for the best strategy to emulate returning multiple sets of results
Hi Merlin, Thanks for the response. At the moment, the main function is creating two temp tables that drops on commit, and python functions fills these. Not too bad, but I'd like to push these temp tables to ram, which is a bit tricky due to not having a direct method of doing this with postgresql. (a topic that has been discussed in the past in this mail group) The global variable idea is interesting though. I have not encountered this before, is it the global dictionary SD/GD mentioned here: http://www.postgresql.org/docs/9.0/static/plpython-sharing.html ? It may help perform the expensive transformations once and reuse the results. Kind regards Seref On Wed, Oct 10, 2012 at 2:06 PM, Merlin Moncure wrote: > On Mon, Oct 8, 2012 at 4:59 PM, Seref Arikan > wrote: > > Thanks Merlin, > > I've tried arrays but plpython does not support returning arrays of > custom > > db types (which is what I'd need to do) > > > hm -- yeah. can your custom types be broken down into plain SQL types > (that is, composite types?). maybe stash the results in global > variable and return it in two calls, or insert into into a tempt > table that drops on commit? > > merlin >
[GENERAL] Re: Suggestions for the best strategy to emulate returning multiple sets of results
On Mon, Oct 8, 2012 at 4:59 PM, Seref Arikan wrote:
> Thanks Merlin,
> I've tried arrays but plpython does not support returning arrays of custom
> db types (which is what I'd need to do)

hm -- yeah.  Can your custom types be broken down into plain SQL types (that is, composite types)?  Maybe stash the results in a global variable and return them in two calls, or insert into a temp table that drops on commit?

merlin

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Storing large files in multiple schemas: BLOB or BYTEA
On 10/10/2012 05:16 AM, tigran2-postg...@riatest.com wrote: I need to store large files (from several MB to 1GB) in Postgres database. The database has multiple schemas. It looks like Postgres has 2 options to store large objects: LOB and BYTEA. However we seem to hit problems with each of these options. I believe the general consensus around here is to not do that, if you can avoid it. File systems are much better equipped to handle files of that magnitude, especially when it comes to retrieving them, scanning through their contents, or really, any access pattern aside from simple storage. You're better off storing the blob on disk somewhere and storing a row that refers to its location. Either key pieces for a naming scheme or the full path. This is especially true if you mean to later access that data with PHP. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
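If you take the "store a reference, not the bytes" route suggested above, the metadata table can be quite small. A sketch with invented column names — the on-disk path would be built and managed by the application, and the checksum is optional but useful for spotting files that changed or vanished behind the database's back:

```sql
-- One row per stored file; the bytes themselves live on the filesystem.
CREATE TABLE attachments (
    id          serial PRIMARY KEY,
    file_name   text   NOT NULL,        -- original name, for downloads
    media_type  text,                   -- e.g. 'application/pdf'
    byte_size   bigint,
    sha1        text,                   -- checksum to detect drift/corruption
    stored_path text   NOT NULL UNIQUE, -- app-defined location on disk
    created_at  timestamptz NOT NULL DEFAULT now()
);
```

Since the table lives inside each schema like any other, pg_dump -n dumps it cleanly, and the file tree can be backed up incrementally by ordinary filesystem tools.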
[GENERAL] Index only scan
Hi all, first of all thanks for the great new "index only scan" feature in 9.2. We have managed to adapt our app (dhis2.org) to take advantage of it and it really speeds up several queries significantly. We are now in the process of designing a new component for analytics and this feature got me thinking we could utilize postgres over other alternatives like column-oriented databases. Basically we will have a wide, denormalized table with 20+ columns with relatively low cardinality. Typically we will have queries which sums a fact column based on where/group by clauses on several dimension columns (standard data warehouse stuff). An example would be "select a, b, c, sum(d) from analytics where a=1 and b=2 group by a,b,c"; Now my initial idea was to simply put a single index on all of those columns, in the hope that "index only scans" would kick in. It seems this is not the case, as strictly one (single or multi-column) index is required to enable index only scans for a query. My question is: Would it be feasible and/or possible to implement index only scans in a way that it could take advantage of several, single-column indexes? For example, a query spanning columns a, b, c could take advantage of 3 single-column indexes put on columns a, b, c. Finally, is there anyone else who are using postgres for this purpose and have some good tips to share in order to achieve good performance, including index strategies, beyond the standard config best practices? best regards, Lars Helge Øverland -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
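To make the constraint concrete: with the example table above, 9.2 only considers an index-only scan when a single index covers every column the query touches; separate single-column indexes can be BitmapAnd-ed together, but that always visits the heap. A sketch (plan shapes are indicative, not captured output):

```sql
CREATE TABLE analytics (a int, b int, c int, d int);

-- Several single-column indexes: these can feed bitmap scans, but an
-- index-only scan cannot be assembled from them.
CREATE INDEX ON analytics (a);
CREATE INDEX ON analytics (b);
CREATE INDEX ON analytics (c);
CREATE INDEX ON analytics (d);

-- One covering index: an index-only scan becomes possible once the
-- table is mostly all-visible (e.g. after VACUUM).
CREATE INDEX analytics_abcd ON analytics (a, b, c, d);

EXPLAIN SELECT a, b, c, sum(d)
  FROM analytics
 WHERE a = 1 AND b = 2
 GROUP BY a, b, c;
```

Putting the most selective filter columns first in the covering index keeps it useful for ordinary index scans as well.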
Re: [GENERAL] libpq-how to add a schema to search path
Divakar Singh wrote:
> While making a connection to PGSQL using libpq, is there any option to
> mention the schema name? Something similar exists in Java, however I am
> not sure about libpq.

Use the "options" connection parameter, like this:

psql "dbname=test user=laurenz port=5432 options='-c search_path=schema1,schema2'"

Yours,
Laurenz Albe
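The same effect can also be achieved server-side, without touching the connection string at all (a sketch assuming schemas schema1 and schema2 exist; the role name is from the example above):

```sql
-- For the current session only:
SET search_path TO schema1, schema2;

-- Persistently for a role; takes effect on new connections:
ALTER ROLE laurenz SET search_path TO schema1, schema2;

-- Or for every connection to one database:
ALTER DATABASE test SET search_path TO schema1, schema2;
```

The per-role and per-database settings are convenient when the application code cannot be changed to pass libpq options.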
[GENERAL] Storing large files in multiple schemas: BLOB or BYTEA
Hi,

I need to store large files (from several MB to 1GB) in a Postgres database. The database has multiple schemas. It looks like Postgres has 2 options to store large objects: LOB and BYTEA. However, we seem to hit problems with each of these options.

1. LOB. This works almost ideally: it can store up to 2GB and allows streaming, so that we do not hit memory limits in our PHP backend when reading the LOB. However, all blobs are stored in pg_catalog and are not part of any schema. This leads to a big problem when you try to use pg_dump with options -n and -b to dump just one schema with its blobs. It dumps the schema data correctly, but it then includes ALL blobs in the database, not just the blobs that belong to the particular schema. Is there a way to dump a single schema with its blobs using pg_dump or some other utility?

2. BYTEA. These are correctly stored per schema, so pg_dump -n works correctly, but I cannot seem to find a way to stream the data. This means that there is no way to access the data from PHP if it is larger than the memory limit.

Is there any other way to store large data in Postgres that allows streaming and works correctly with multiple schemas per database?

Thanks.

(Sorry if this double-posts on pgsql-php; I did not know which is the best list for this question.)
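One common workaround for the BYTEA streaming limitation (my suggestion, not from the thread) is to fetch the column in slices server-side, so the client never holds the whole value in memory. A sketch assuming a hypothetical table files(id, data bytea):

```sql
-- Fetch a bytea value in 1 MB slices; the client loops, advancing the
-- offset, until the returned slice is shorter than the chunk size.
SELECT substring(data FROM 1 FOR 1048576)      -- bytes 1 .. 1048576
FROM files
WHERE id = 42;

-- Next iteration: SELECT substring(data FROM 1048577 FOR 1048576) ...
```

This keeps per-query memory bounded on the PHP side, though each slice still rereads the toasted value, so it is slower than true large-object streaming.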
Re: [GENERAL] moving from MySQL to pgsql
On Wed, Oct 10, 2012 at 2:20 AM, Vineet Deodhar wrote:
> Thanks Chris. I didn't understand what you mean by "be prepared to
> rethink your indexing strategies."
>
> In MySQL, I have created indexes, unique indexes, complex or multi-field
> indexes, etc. In what way should I reconsider the indexing?

In InnoDB, your tables are basically primary key indexes with the rest of the row data attached. For this reason a sequential scan is *slow*, since it cannot traverse the table in physical order. In PostgreSQL, tables are paged heaps with separate indexes, and there is essentially no difference between a UNIQUE index on not-null columns and a primary key.

What this means is that in MySQL/InnoDB, more indexes are almost always better, because a sequential scan is always very slow. In PostgreSQL, sequential scans are pretty fast, but primary key lookups are a little slower. Consequently, on PostgreSQL you may want to reduce the number of non-unique indexes at first and add them back as necessary.
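When deciding which indexes to add back, one common approach (my suggestion, not from the thread) is to let the statistics collector tell you which indexes the planner never uses:

```sql
-- Indexes never scanned since the statistics were last reset;
-- candidates for dropping. Check rarely-run jobs and replicas
-- before actually removing anything.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname;
```

Running this after the application has seen representative load gives a data-driven starting point instead of porting the MySQL index set wholesale.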
Re: [GENERAL] moving from MySQL to pgsql
On Wed, Oct 10, 2012 at 2:38 PM, Chris Travers wrote:
> PostgreSQL has an excellent optimizer and the on-disk layout is
> completely different. This will dwarf any changes due to threads vs
> queries.
>
> However be prepared to rethink your indexing strategies.
>
> Best Wishes,
> Chris Travers

Thanks Chris. I didn't understand what you mean by "be prepared to rethink your indexing strategies."

In MySQL, I have created indexes, unique indexes, complex or multi-field indexes, etc. In what way should I reconsider the indexing?

Thanks,
Vineet
Re: [GENERAL] moving from MySQL to pgsql
On 10/10/2012 10:47 AM, Vineet Deodhar wrote:
> Hi!
> At present, I am using MySQL as the backend for my work. Because of the
> licensing implications, I am considering shifting from MySQL to pgsql.
> Typically, my apps are multi-user, web based or LAN based.
>
> 1) I read over the internet that Postgres is not threaded, but every
> connection gets its own process. The OS will distribute the processes
> across the processors. Basically a single connection will not be any
> faster with SMP, but multiple connections will be. MySQL is a
> multi-threaded server, so it can use many processors; a separate thread
> is created for each connection.
> source: http://dcdbappl1.cern.ch:8080/dcdb/archive/ttraczyk/db_compare/db_compare.html#Comparison+of+Oracle%2C+MySQL+and+Postgres+DBMS
> In what way might it affect my app performance?

Performance will not be affected negatively. MySQL only has one thread per connection, so a single query will never use multiple threads (scary concept to think about).

> 2) I run MySQL from a USB stick. There is no installation required (on
> WinXP; not tried on Ubuntu). Is it the same for pgsql?

To use Postgres on a USB stick, see http://www.postgresonline.com/journal/archives/172-Starting-PostgreSQL-in-windows-without-install.html

> 3) Can I simulate MySQL's TINYINT data type (using maybe a custom data
> type or something else)?

You can use either bool, or smallint with a constraint.

Sim
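A minimal sketch of the "smallint with a constraint" approach, combined with the DOMAIN idea raised elsewhere in the thread (the domain, table and column names are my own illustration):

```sql
-- Emulate MySQL's unsigned TINYINT (0..255) once, reuse everywhere:
CREATE DOMAIN tinyint_u AS smallint
    CHECK (VALUE BETWEEN 0 AND 255);

CREATE TABLE settings (
    level tinyint_u NOT NULL,
    -- or inline, per column, without a domain:
    prio  smallint  NOT NULL CHECK (prio BETWEEN 0 AND 255)
);
```

Note the storage is still 2 bytes per value; the constraint only enforces the TINYINT range, it does not shrink the column.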
Re: [GENERAL] moving from MySQL to pgsql
On Wed, Oct 10, 2012 at 1:47 AM, Vineet Deodhar wrote:
> Hi!
> At present, I am using MySQL as the backend for my work. Because of the
> licensing implications, I am considering shifting from MySQL to pgsql.
> Typically, my apps are multi-user, web based or LAN based.
>
> 1) I read over the internet that Postgres is not threaded, but every
> connection gets its own process. The OS will distribute the processes
> across the processors. Basically a single connection will not be any
> faster with SMP, but multiple connections will be. MySQL is a
> multi-threaded server, so it can use many processors; a separate thread
> is created for each connection.
> source: http://dcdbappl1.cern.ch:8080/dcdb/archive/ttraczyk/db_compare/db_compare.html#Comparison+of+Oracle%2C+MySQL+and+Postgres+DBMS
> In what way might it affect my app performance?

Not much. MySQL doesn't support intraquery parallelism to my knowledge. You will get extra robustness due to process isolation, however. There might be some slight costs due to shared memory management overhead, but these are probably insignificant compared to other factors.

PostgreSQL has an excellent optimizer and the on-disk layout is completely different. This will dwarf any changes due to threads vs queries. However, be prepared to rethink your indexing strategies.

> 2) I run MySQL from a USB stick. There is no installation required (on
> WinXP; not tried on Ubuntu). Is it the same for pgsql?

You would want a user account created, because PostgreSQL won't run as an administrator, but otherwise, yes.

> 3) Can I simulate MySQL's TINYINT data type (using maybe a custom data
> type or something else)?

I would use smallint (2-byte int), but if you have to have a single byte, "char" (with quotes) should give you a one-byte field. I don't know about the UTF-8 rules on it, though.

Best Wishes,
Chris Travers
Re: [GENERAL] something better than pgtrgm?
Thanks, but no, we do need the performance. And we have admins (not users) enter the names and codes, but we can't make that overly complicated for them. I thought you meant that they see to it that the names end up in the database under the correct encoding (which is a logical thing to do).

Thanks anyway :)!

WBL

On Tue, Oct 9, 2012 at 5:16 PM, Andrew Sullivan wrote:
> On Tue, Oct 09, 2012 at 03:54:35PM +0200, Willy-Bas Loos wrote:
> > > If so, I can almost imagine a way this could work
> >
> > Great! How?
>
> Well, it involves very large tables. But basically, you work out a
> "variant" table for any language you like, and then query across it
> with subsets of the trigrams you were just working with. It probably
> sucks in performance, but at least you're likely to get valid
> sequences this way.
>
> For inspiration on this (and why I have so much depressing news on the
> subject of internationalization in a multi-script and multi-lingual
> environment), see RFC 3743 and RFC 4290. These are related (among
> other things) to how to make "variants" of different DNS labels
> somehow hang together. The problem is not directly related to what
> you're working on, but it's a similar sort of problem: people have
> rough ideas of what they're entering, and they need an exact match.
> You have the good fortune of being able to provide them with a hint!
> I wish I were in your shoes.
>
> A
>
> --
> Andrew Sullivan
> a...@crankycanuck.ca

--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth
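For readers landing on this thread: the pg_trgm machinery under discussion looks roughly like this (a sketch of standard pg_trgm usage, not code from the thread; the names table and the query string are assumptions):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A trigram GIN index makes similarity searches indexable:
CREATE INDEX names_trgm_idx ON names USING gin (name gin_trgm_ops);

-- Fuzzy match: % uses the pg_trgm similarity threshold, and the
-- similarity() score lets us rank candidates as hints for the admins.
SELECT name, similarity(name, 'Smith') AS sim
FROM names
WHERE name % 'Smith'
ORDER BY sim DESC;
```

The "variant table" idea above would sit on top of this, mapping equivalent spellings across scripts before the trigram comparison runs.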