Re: [GENERAL] pg_dump excluding tables content but not table schema
On Mon, 28 Dec 2009 19:39:36, Raymond O'Donnell wrote:

> On 28/12/2009 18:41, Ivan Sergio Borgonovo wrote:
> > I'd like to just dump the table schema without dumping the table
> > content.
>
> pg_dump -s -t

My fault, I was not clear enough. I'd like to make a "mostly" full
backup, excluding from the backup just the data of some tables while
still backing up the schema of those tables.

Mmm, let me try:

pg_dump -Fc -Z9 -s -t '*.cache*' mydb > schema_only.bak
pg_dump -Fc -Z9 -T '*.cache*' mydb > nearly_full.bak
pg_restore -1 -d mydb nearly_full.bak
pg_restore -1 -d mydb schema_only.bak

It seems to be working... I'll test whether everything is there.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] pg_dump excluding tables content but not table schema
I have some tables that are just caches. I'd like to dump the table
schema without dumping the table content. I think I could do it in
two steps, but I'd like to avoid that. Is there a way?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] flagging first row inserted for each "group" of key
On Thu, 17 Dec 2009 10:38:32 +0100, "A. Kretschmer" wrote:

> In response to Ivan Sergio Borgonovo:
> > I've a web application.
> >
> > I'm logging data related to the landing page and subsequent
> > selected hits.
> >
> > create table track_hit (
> >   hitid serial not null,
> >   /* pk? I don't mind if serial wraps around;
> >      pk could be (hitid, tracking_time) */
> >   esid varchar(32), -- related to session
> >   tracking_time timestamp not null default now()
> >   -- some other stats
> >   -- first_hit boolean not null default false -- ???
> > );
> >
> > I'd like to be sure I just count one "first hit" in a session
> > (same esid) in an efficient way, which means I'd like to mark
> > them with a flag.
>
> select distinct on (esid) esid, tracking_time from track_hit
> order by esid, tracking_time;
>
> returns only one record for each esid, ordered by tracking_time.
> Should work with 8.x, maybe since 7.x (I'm not sure).

I think I've tried to solve a concurrency problem in the wrong
place... Still, what is the difference between:

select min(hitid) as h from track_hit group by esid;

and

select distinct on (esid) hitid from track_hit
order by esid, tracking_time;

I haven't enough test data to see if they perform differently. The
second form seems to perform a little faster; I'd have expected the
opposite, with the first performing better.

I think I'll add an index on tracking_time for reporting and maybe
make the primary key (hitid, tracking_time). I don't want to be
bothered by hitid wraparound, so I don't want to make it a pk alone,
but I may still need a pk.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
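[Editorial note: the two queries are not guaranteed to pick the same
row. min(hitid) takes the lowest id per esid, while DISTINCT ON ...
ORDER BY esid, tracking_time takes the earliest timestamp; they only
coincide when hitids are assigned in time order. A small Python
sketch of both semantics, over made-up rows:]

```python
# Simulate the two queries over in-memory rows: (hitid, esid, tracking_time).
rows = [
    (1, "s1", "2009-12-17 10:00"),
    (2, "s2", "2009-12-17 10:01"),
    (3, "s1", "2009-12-17 10:02"),
    (4, "s2", "2009-12-17 09:59"),  # earlier time but higher hitid
]

# select min(hitid) from track_hit group by esid;
min_hitid = {}
for hitid, esid, _ in rows:
    min_hitid[esid] = min(min_hitid.get(esid, hitid), hitid)

# select distinct on (esid) hitid from track_hit
# order by esid, tracking_time;
first_by_time = {}
for hitid, esid, ts in sorted(rows, key=lambda r: (r[1], r[2])):
    first_by_time.setdefault(esid, hitid)

print(min_hitid)      # {'s1': 1, 's2': 2}
print(first_by_time)  # {'s1': 1, 's2': 4} -- differs for s2
```

The hit for session s2 with the earliest timestamp arrived with a
higher hitid, so only one of the two queries calls it the "first
hit".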
[GENERAL] flagging first row inserted for each "group" of key
I've a web application. I'm logging data related to the landing page
and subsequent selected hits.

create table track_hit (
  hitid serial not null,
  /* pk? I don't mind if serial wraps around;
     pk could be (hitid, tracking_time) */
  esid varchar(32), -- related to session
  tracking_time timestamp not null default now()
  -- some other stats
  -- first_hit boolean not null default false -- ???
);

I'd like to be sure I just count one "first hit" in a session (same
esid) in an efficient way; that means I'd like to mark them with a
flag and avoid a group by/min subquery cycle when I'm doing
reporting.

I can't trust the browser, and I just want one first hit for each
esid, no matter if they have the same tracking_time. Of course the
problem is concurrency, but maybe I missed some pg trick that could
help me.

I'm on 8.3 with no plan to move to 8.4 shortly, so no window
functions that would make reporting easier/faster.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [Bacula-users] [GENERAL] Catastrophic changes to PostgreSQL 8.4
On Thu, 3 Dec 2009 12:22:50 +0100 (CET), "Kern Sibbald" wrote:

> Yes, that is my experience too. I understand Craig's comments, but
> I would much prefer that Bacula just back up and restore and leave
> the checking of filename consistencies to other programs. At least
> for the moment, that seems to work quite well. Obviously if users
> mix character sets, sometimes display of filenames in Bacula will
> be weird, but nevertheless Bacula will back up and restore them so
> that what was on the system before the backup is what is restored.

I expect backup software to have predictable, reversible behaviour
and to warn me if I'm shooting myself in the foot. It should be the
responsibility of the admin to restore files in a proper place,
knowing that locales may be a problem. I think Bacula is taking the
right approach.

Still, I'd surely appreciate as a feature a "tool" that would help me
restore files on a system with a different locale than the original
one, or warn me if the locale is different or it can't be sure it is
the same. That's exactly what PostgreSQL is doing: at least warning
you. So PostgreSQL is taking the right approach too. An additional
"guessed original locale" field and a tool/option to convert/restore
with a selected locale could be an interesting feature.

What is Bacula going to do with xattr on different systems?

PostgreSQL seems to offer a good choice of tools to convert between
encodings and deal with bytea. Formally I'd prefer bytea, but in real
use it may just be an additional pain, and other DBs may not offer
the same tools for encoding/bytea conversions.

Is it possible to search for a file in a backup set? What is going to
happen if I'm searching from a system that has a different locale
from the one the backup was made on? Can I use regexps? Can accents
be ignored during searches?
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] duplicating a schema
On Tue, 1 Dec 2009 09:53:12 +0100, Ivan Sergio Borgonovo wrote:

> On Tue, 1 Dec 2009 11:39:06 +0900, Schwaighofer Clemens wrote:
> > On Sun, Nov 29, 2009 at 21:21, Ivan Sergio Borgonovo wrote:
> > > I need to create a new schema with all the content in an
> > > existing one, just with a new name.
> > > The way I've found is:
> > > - make a backup
> > > - load it in a dev box
> > > - rename the schema
> > > - make a backup of the new schema
> > > - restore the new schema on the original DB.
> > > Is there a more efficient approach?
> > Sadly no. With smaller DBs I do a sed on the dump ... wished
> > there would be a restore with not only a target DB but also a
> > target schema.
> I thought about sed, but I think postgresql parses SQL better than
> me and sed together.

Semi-tested solution:

pg_dump -Fp --schema=XXX mydb > mydb.bak

(echo 'begin;';
 echo 'alter schema XXX rename to YYY;';
 echo 'create schema XXX authorization AAA;';
 cat mydb.bak;
 echo 'commit;') | psql -d mydb

And some further notes and a script here:
http://www.webthatworks.it/d1/node/page/howto_duplicating_schema_postgresql

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] duplicating a schema
On Tue, 1 Dec 2009 11:39:06 +0900, Schwaighofer Clemens wrote:

> On Sun, Nov 29, 2009 at 21:21, Ivan Sergio Borgonovo wrote:
> > I need to create a new schema with all the content in an
> > existing one, just with a new name.
> > The way I've found is:
> > - make a backup
> > - load it in a dev box
> > - rename the schema
> > - make a backup of the new schema
> > - restore the new schema on the original DB.
> > Is there a more efficient approach?
> Sadly no. With smaller DBs I do a sed on the dump ... wished there
> would be a restore with not only a target DB but also a target
> schema.

I thought about sed, but I think postgresql parses SQL better than me
and sed together. Why do you prefer sed over backup/restore on
smaller DBs?

I didn't test this... but I think it could be even better, if I
didn't prefer to have a full backup before such an operation:

pg_dump --schema=XXX -Fp mydb > XXX.bak

begin;
alter schema XXX rename to YYY;
create schema XXX;
\i XXX.bak
commit;

This could be obtained with a pipe... but in case something goes
wrong I'd prefer to have the "backup" of the schema somewhere rather
than needing to recreate it.

Renaming a schema seems pretty fast, so in case the transaction
aborts I don't think it would make any big difference compared to
changing the schema name in another DB. Still, being able to have a
schema as a restore target would make things cleaner, faster and
safer.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] duplicating a schema
I need to create a new schema with all the content in an existing
one, just with a new name. The way I've found is:

- make a backup
- load it in a dev box
- rename the schema
- make a backup of the new schema
- restore the new schema on the original DB.

Is there a more efficient approach?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] advocating pgsql was:passing parameters to multiple statements
On Wed, 18 Nov 2009 14:39:05 +0100, Pavel Stehule wrote:

> Standard is good. And I prefer standard everywhere, where it is
[snip]
> Any dogmatism is wrong - yes. But minimally me - and probably
> David - have very bad experience with design a la "all sql code
> for all databases". And I have good experience with a different
> strategy - early decomposition and separation of application and
> database (engine specific) layers. Nothing less, nothing more.
[snip]

I'm not contesting the technical side of the issue; your position is
well respected. I'd like to put the accent on the "social" part of it
and on how it falls back on our beloved DB.

I think there are far more people knowing just one DB, and badly,
than companies with such strict (insane) policies, and... well,
PostgreSQL is very standards compliant. I'd beat another horse ;)

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] passing parameters to multiple statements
On Wed, 18 Nov 2009 11:38:46 +0100, Pavel Stehule wrote:

> 2009/11/18 Ivan Sergio Borgonovo:
> > On Tue, 17 Nov 2009 20:16:36 -0800, David Fetter wrote:
> > > On Tue, Nov 17, 2009 at 09:33:05AM -0700, Konstantin Izmailov
> > > wrote:
> > > > Some companies have policy to stay DB agnostic, i.e. use
> > > > standard SQL only.
> > > That's called shooting yourself in the head.
> > [snip]
> Sorry, but David is right. I understand - so management is happy.

I didn't say he was wrong. As usual it is a matter of knowledge and
trade-offs. How can you say what's better if:

- you don't know what is standard
- you don't know the performance impact of writing something in a
  dialect of SQL rather than in a standard way

One thing is accepting the need to break compatibility for some DB;
another is saying that pursuing standard code is reckless since it
makes all projects too complex and bloated. Ignorance and dogmatism
are close relatives, but I'd say the former is the root of the
latter.
In fact what I generally observe is:

- we just know [this] (ignorance)
- this *looks* like it will run faster/be easier to write if we
  write it this way
- we tried it on another DB and it performed really badly/was
  really complicated to rewrite
- everything other than [this] is bad, why should we care
  (dogmatism)

Depending on the domain of the application, the DB may not be such a
critical part of the overall system, and still many things may easily
be written in a way that is independent of the DB. In my experience
you may end up with 90% of the code being easily written in a
standard way with no appreciable difference in cost
(performance/coding). Writing code in a way that makes porting
cheaper may even protect you from changes in the DB you chose as a
target.

A policy that mandates portable SQL for every part of every
application a company writes is as insane as a policy that mandates
writing all code in python in such a way that it could be
automatically translated into any language whose name starts with
p ;) But I think that kind of policy is rarer than programmers who
know more than a couple of SQL dialects.

I don't think companies with such a high level of dogmatism can
survive long enough to get involved in anything non-trivial, while it
is far more frequent to see applications that don't have such high
coupling with the DB and are still dependent on it, just for lack of
knowledge of SQL.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] passing parameters to multiple statements
On Tue, 17 Nov 2009 20:16:36 -0800, David Fetter wrote:

> On Tue, Nov 17, 2009 at 09:33:05AM -0700, Konstantin Izmailov
> wrote:
> > Some companies have policy to stay DB agnostic, i.e. use
> > standard SQL only.
> That's called shooting yourself in the head.

I'm a small fish. I use just Free software, and still I think that
departing from agnosticism has its cost even if you don't have to pay
license costs. Especially if you did it without knowing it or with no
reason. Many times departing from agnostic code is caused by:

- lack of knowledge of standards/more than one DB
- early optimization

It's just a matter of where you're going to compromise and why, but
you have to do it consciously.

e.g. a lot of code could run on mysql and postgresql equally well at
no cost, but many people just ignore that there is anything other
than mysql. That's shooting yourself in the head without even knowing
the reason.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] impersonating a user/ownership problems
I've several schemas, each one owned by a user. Then there are
websites sharing the same code base, using different schemas and
connecting with different users.

When I have to refactor, I generally have to make the same changes in
all the schemas. I log in as a user that has enough rights on all
schemas and apply the changes. Since I have to make all the changes
to all schemas, I can't apply changes to each schema with different
connections.

But this ends up in writing a lot of code for altering ownership of
objects; e.g. turning an int into a serial becomes particularly
painful:

create sequence ...;
alter table ... owner to ...;
alter sequence ... owned by ...;
alter table ... alter column ... set default ...;

Is there a less painful approach?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] knowing which table/schema is going to be chosen
On Thu, 12 Nov 2009 10:38:27, Richard Huxton wrote:

> Ivan Sergio Borgonovo wrote:
> > I have a search_path that may not just consist of $user, public,
> > e.g.
> > $user, public, test, import
> >
> > I'd like to know which table is going to be chosen if I do a
> > select * from mytable;
> >
> > Is there a way to ask postgresql the schema of the table that
> > will be chosen?
>
> Hmm - I don't know of a specific function. You could do something
> like this though:
>
> SELECT nspname FROM pg_namespace
> WHERE oid = (
>   SELECT relnamespace FROM pg_class
>   WHERE oid = 'mytable'::regclass::oid
> );

This surely meets my needs, and I'm going to place it in my toolbox.
Still... is there a way that could use information_schema?

My need was caused by a compromise between two immature APIs... so
I'm not surprised that the solution looks like a hack, but I was
wondering whether, in other cases, knowing in advance which table
postgresql is going to pick could be a legitimate interest.

BTW I think I've spotted an error in the docs:
http://www.postgresql.org/docs/8.3/interactive/ddl-schemas.html
http://www.postgresql.org/docs/8.4/interactive/ddl-schemas.html

5.7.2. The Public Schema

"In the previous sections we created tables without specifying any
schema names. By default, such tables (and other objects) are
automatically put into a schema named "public". Every new database
contains such a schema. Thus, the following are equivalent:

CREATE TABLE products ( ... );

and:

CREATE TABLE public.products ( ... );"

I think they are not equivalent if the search_path contains the name
of an existing schema before public. Is there anything equivalent to
search_path in the SQL standard?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
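[Editorial note: the first-match-wins rule the server applies when
resolving an unqualified table name can be sketched in a few lines of
Python; the schema contents here are hypothetical.]

```python
# First-match-wins resolution of an unqualified table name against a
# search_path, mimicking (in simplified form) how PostgreSQL picks a
# schema. `schemas` maps schema name -> set of tables it contains.
schemas = {
    "test": {"mytable"},
    "import": {"mytable", "staging"},
    "public": {"users"},
}

def resolve(table, search_path):
    """Return the first schema in search_path that contains `table`."""
    for schema in search_path:
        if table in schemas.get(schema, set()):
            return schema
    return None

print(resolve("mytable", ["public", "test", "import"]))  # test
print(resolve("users", ["public", "test", "import"]))    # public
```

The same rule explains the docs issue above: an unqualified CREATE
goes into the first existing schema on the path, which need not be
public.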
[GENERAL] knowing which table/schema is going to be chosen
I may have several tables with the same name in different schemas:
pina.mytable, test.mytable, import.mytable

I have a search_path that may not just consist of $user, public,
e.g.
$user, public, test, import

I'd like to know which table is going to be chosen if I do a

select * from mytable;

In this case test.mytable will be chosen. Is there a way to ask
postgresql the schema of the table that will be chosen?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] xml import/export tools and performance
I need to import/export through xml files. I was looking for
tools/examples/HOWTOs for postgresql. Right now I still don't know
the details of the requirements. I know I won't need a GUI.

I know one of the exporting parties will be a MS SQL 2005 server, so
it would be nice if there were an easy way to import xml generated
with FOR XML AUTO, XMLSCHEMA ('forpg'). I'd like to have a tool that
can write XSDs from queries automatically.

I may need to strictly specify one of the xml output formats, since
one of the importing parties claims to support xml with something
that is more like a tagged csv.

Currently I'm mostly looking around to see where it would be
convenient to move the boundaries of the problem. I'd be curious
about the performance impact these tools may have compared to COPY.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
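[Editorial note: as a rough illustration of the kind of row-to-XML
mapping under discussion - one element per row with columns as
attributes, loosely in the style of FOR XML AUTO - here is a Python
sketch; the table and column names are made up.]

```python
import xml.etree.ElementTree as ET

# Map rows to XML: one element per row, columns become attributes.
def rows_to_xml(table, columns, rows):
    root = ET.Element("resultset")
    for row in rows:
        el = ET.SubElement(root, table)
        for col, val in zip(columns, row):
            el.set(col, str(val))
    return ET.tostring(root, encoding="unicode")

doc = rows_to_xml("item", ["id", "name"], [(1, "foo"), (2, "bar")])
print(doc)  # e.g. <resultset><item id="1" name="foo" /> ... </resultset>
```

A real exchange format would of course also need an XSD and an agreed
encoding, which is where the "tagged csv" implementations fall short.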
Re: [GENERAL] pk vs unique not null differences between 8.3.4 and 8.3.8
On Mon, 02 Nov 2009 09:53:06 -0500, Tom Lane wrote:

> Ivan Sergio Borgonovo writes:
> > This statement succeeded in 8.3.8
> > alter table shop_commerce_gift drop constraint
> > shop_commerce_gift_pkey;
> > but failed in 8.3.4.
>
> Uh, that doesn't seem to match the name of the index?

Yep... that was my surprise. I don't have a time machine to be
absolutely sure about what I did, but I should have created a pk on
both machines with:

create table shop_commerce_gift (
  giftcode varchar primary key
  ...
);

In the newer pg I actually find the pk constraint... in the older I
find a not null + a unique index with a different name from the pk
name of the former.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] pk vs unique not null differences between 8.3.4 and 8.3.8
I prepared a script to upgrade the schema on an 8.3.8 dev box, and
now I was trying to replicate the change on a staging 8.3.4 box...
and it failed. I don't have a recent backup of my dev box, so I can't
check if I'm day dreaming.

This statement succeeded in 8.3.8:

alter table shop_commerce_gift drop constraint
shop_commerce_gift_pkey;

but failed in 8.3.4. I checked the table def in 8.3.4 and it reports:

 giftcode | character varying(16) | not null
...
Indexes:
    "shop_commerce_gift_giftcode_key" UNIQUE, btree (giftcode)

but then when I try to drop the index... pg says that the index is
needed for the shop_commerce_gift_giftcode_key constraint.

Was it something that was fixed between the releases, or did I just
not take note of what I really did on the staging box?

btw I was using the same psql version from my desktop to \d the
tables.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] still on joining array/inline values was and is: design, ref integrity and performance
On Wed, 28 Oct 2009 10:12:19 -0500, Peter Hunsberger wrote:

> > The first approach requires a distinct/group by that may be
> > expensive.
> > The second one requires I keep all the emails in memory while
> > the first statement runs.
>
> Unless you're dealing with 100,000's of these things I think
> you're engaging in a process of "premature optimization". Group by
> can work efficiently over millions of rows.

We may occasionally get into the range of half that number, but not
feeding emails directly from an HTTP request. Still, the number of
passwords generated in one run may be in the range of 50K. But well,
I could calmly wait 2 or 3 seconds. Making some very rough tests on a
box similar to the one I'll have to use, it takes a few milliseconds
on a non-indexed table.

> Do the simplest thing possible. Get it working, then see if you
> have any new problems you need to solve. Every issue you've
> described so far is database design 101 and should present no real
> problem. I think you're agonizing over nothing...

That's always good advice. Sometimes you're out just for moral
support.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] still on joining array/inline values was and is: design, ref integrity and performance
On Tue, 27 Oct 2009 10:54:06, Richard Huxton wrote:

> > Association between email and password is just meant to build up
> > a queue for mailing, and there is no uniqueness constraint on
> > the (password, email) pair.
> >
> > create table pw_email(
> >   password varchar(16),
> >   email varchar(64)
> > );
> >
> > create table pw_resource(
> >   res int references ...,
> >   password varchar(16)
> > );
> >
> > But I've to generate password/email couples first, before
> > filling pw_resource.
>
> The simplest thing would be to do them the other way around, but
> assuming you can't...

Trying to understand why I can't do it the other way around, I made
the constraints clearer to myself:

- There is no reason to have a password without an associated
  recipient.
- There is no reason to have a password without an associated
  resource.
- I don't want the same password for more than one resource.

So to satisfy the first two requirements... it looks to me the tables
should be:

create table resources(
  res int primary key
  /* other characteristics of the resource... */
);

create table pw(
  res int references resources(res) on delete cascade,
  password varchar(16) not null,
  email varchar(64) not null
  /* Should I choose some unique constraint on the couples? Which? */
);

To enforce the third constraint I'd have a table:

create table pw_res(
  password varchar(16) primary key,
  res int references resources(res) on delete cascade
);

This comes in handy for two reasons:

- it helps me enforce the third constraint
- it makes it easier to find which resource is associated with a
  password, which will be a common search

But this introduces one more problem if I decide to delete a
password. This could happen for "shared" and non-shared passwords. I
don't think it is something that may happen frequently... but it may
happen... and once you start to introduce mess in your data it is
hard to come back. So a fk on pw.password may help... but... once
I've that fk I can't insert (res, password, email) without the
password being in pw_res.
If I do it the other way around, inserting password(s) in pw_res
first, I may end up with passwords with no destination associated.
Surely I could wrap everything in a transaction so that if I can't
manage to insert the email/pw records everything is rolled back, but
I can't see how to make it cleaner.

I can get emails and associate them with a resource and a password in
one sql statement if I can defer the constraint check on password.
The next statement would fill pw_res with distinct values from pw.

If I do it the other way around, in case I need different passwords
for different emails I'll have to:

- count the emails and fill pw_res with as many passwords as needed
  (I think I'll need generate_series)
- find a way to associate these passwords with emails

I don't know how to do the latter.

Still, supposing I knew how to write the sql statement that will take
values ('email1', 'email2', ...) as suggested by Merlin Moncure in a
different thread, I can see some pros and cons of these approaches.
The first approach requires a distinct/group by that may be
expensive. The second one requires I keep all the emails in memory
while the first statement runs.

I'd think that if memory consumption starts to be an issue, that
means the number of emails is getting too large to be loaded from an
HTTP request, so I could have them in a file and load them into a
temp table and so on...

Still, before I make any test, I need to find how to associate emails
with passwords, considering that I'd like to pass the emails "inline"
in the SQL statement and I'll have the passwords in pw_res.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
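[Editorial note: the "count the emails and generate as many
passwords" step for individual passwords can be sketched outside the
database; everything here - names, password alphabet, length - is
hypothetical.]

```python
import secrets
import string

# Generate n unique random passwords, retrying on (rare) collisions.
def generate_passwords(n, length=16,
                       alphabet=string.ascii_lowercase + string.digits):
    out = set()
    while len(out) < n:
        out.add("".join(secrets.choice(alphabet) for _ in range(length)))
    return sorted(out)

emails = ["1...@example.com", "2...@example.com", "5...@example.com"]
passwords = generate_passwords(len(emails))

# pw_email rows: one distinct password per email.
pw_email = list(zip(passwords, emails))
assert len({pw for pw, _ in pw_email}) == len(emails)
```

The pairing itself is just a zip of two equally long lists, which is
the part that is awkward to express in pure SQL without a VALUES list
or a temp table.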
Re: [GENERAL] design, ref integrity and performance
On Tue, 27 Oct 2009 09:17:59, Richard Huxton wrote:

> Ivan Sergio Borgonovo wrote:
> > Hi,
> > I've to generate unique passwords and associate them with
> > emails. The association with emails is just to mail the
> > password; email + password aren't "the password", just the
> > password is. So a bunch of emails may be associated with the
> > same password. So there are 2 kinds of passwords:
> > - shared, multiple use
> > - individual (they could be multiple or single use)
>
> So are you saying I log in using *just* a password, not using my
> email as a user-name?

yes

> How do you know who is logging in, and what does it mean for the
> password to be shared?

I don't care who "logged in". Shared just means several people could
use a resource by providing the same password. They are a sort of
"promotional code". Some of them can be shared; I don't mind if
people give them to others, provided they make our service known to
others.

Then there are unique passwords, for which we may or may not check
whether they are used by multiple users of the site... but that's
another problem.

The association between email and password is just meant to build up
a queue for mailing, and there is no uniqueness constraint on the
(password, email) pair.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] design, ref integrity and performance
Hi,

I've to generate unique passwords and associate them with emails. The
association with emails is just to mail the password; email +
password aren't "the password", just the password is. So a bunch of
emails may be associated with the same password. So there are 2 kinds
of passwords:

- shared, multiple use
- individual (they could be multiple or single use)

I've as input a list of emails, and according to the kind of
passwords I've to generate I fill a table:

create table pw_email(
  password varchar(16),
  email varchar(64)
);

So the data inside may look like:

/* bunch of shared passwords */
abcdefg, 1...@example.com
abcdefg, 2...@example.com
abcdefg, 3...@example.com
abcdefg, 4...@example.com
/* bunch of individual passwords */
abcdefg1, 1...@example.com
abcdefg2, 2...@example.com
abcdefg3, 5...@example.com
abcdefg4, 6...@example.com

Now each password is linked to the access of one or more resources. I
could just add the resource id to the pw_email table, but:

- it is not normalized
- considering many passwords will be duplicated, I'll have to check
  a larger table to see which password gives access to what
- I'd like to associate different passwords with different
  resources, so that one password doesn't grant access to more than
  one resource:

create table pw_resource(
  res int references ...,
  password varchar(16)
);

Of course if a password is in pw_email but not in pw_resource I have
a problem. But I've to generate password/email couples first, before
filling pw_resource.

I could make the constraint deferrable and add an on delete cascade
to pw_email.password, but:

- I'm not sure it is a good design
- I'm still not sure it can work as expected

Before testing whether it can work, I'd like to hear some comments on
the design.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] joining an array with a table or...?
On Mon, 26 Oct 2009 14:56:26 -0400, Merlin Moncure wrote:

> On Mon, Oct 26, 2009 at 11:05 AM, Ivan Sergio Borgonovo wrote:
> > To make it more concrete I came up with:
> >
> > select coalesce(u.mail, j.mail) from (
> >   select (array['m...@example1.com','m...@example2.com'])[i] as mail
> >   from generate_series(1,2) i) j
> > left join users u on upper(u.mail)=upper(j.mail);
>
> how about this:
>
> select coalesce(u.mail, j.mail) from
> (
>   values ('m...@example1.com'), ('m...@example2.com')
> ) j(mail)
> left join users u on upper(u.mail)=upper(j.mail);

Yours is between 4 and 10 times faster, excluding the time spent on
the client side escaping the strings. I'll play a bit with client
code to see if the advantage holds. It looks nicer too.

Currently I'm testing with very few matches between the input array
and the users table. Will that have a different impact on the 2
methods?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
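[Editorial note: the semantics of either query - a case-insensitive
left join that falls back to the input address when no user matches -
can be simulated in plain Python; the data here is made up.]

```python
# Simulate: select coalesce(u.mail, j.mail) from (values ...) j(mail)
#           left join users u on upper(u.mail) = upper(j.mail);
users = ["Someone@Example1.com"]  # hypothetical users table (mail column)
by_upper = {u.upper(): u for u in users}

incoming = ["someone@example1.com", "nobody@example2.com"]

# Keep the stored spelling when a user matches, otherwise the incoming
# spelling: that is the COALESCE over the left join.
result = [by_upper.get(m.upper(), m) for m in incoming]
print(result)  # ['Someone@Example1.com', 'nobody@example2.com']
```

With few matches, most rows take the fallback branch, which is why
both SQL forms spend their time mostly on scanning the driving list
rather than on the join itself.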
Re: [GENERAL] joining an array with a table or...?
On Mon, 26 Oct 2009 14:15:26 +0100 Ivan Sergio Borgonovo wrote: > I've a list of emails and a list of users (with emails). > > If the list of emails was already inside a table > > create table mails ( > mail varchar(64) > ); > > create table users ( > name varchar(127), > mail varchar(64) > ); > > I'd do: > select coalesce(u.mail, m.mail) from mails left join users on > u.mail=m.mail; > > Now mails are into a php array and they may be in the range of 5000 > but generally less. > > The final query will be something like > insert into mailqueue (qid, uid, mail, ...) select ... > > and since some fields are pseudo random sequences computed from a > serial, it would be "clean" to do it just in one query. > Any clean technique? To make it more concrete I came up with: select coalesce(u.mail,j.mail) from ( select (array['m...@example1.com','m...@example2.com'])[i] as mail from generate_series(1,2) i) j left join users u on upper(u.mail)=upper(j.mail); but I sincerely dislike it. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] joining an array with a table or...?
I've a list of emails and a list of users (with emails). If the list of emails was already inside a table create table mails ( mail varchar(64) ); create table users ( name varchar(127), mail varchar(64) ); I'd do: select coalesce(u.mail, m.mail) from mails m left join users u on u.mail=m.mail; Now the mails are in a PHP array; they may be in the range of 5000 but generally fewer. The final query will be something like insert into mailqueue (qid, uid, mail, ...) select ... and since some fields are pseudo random sequences computed from a serial, it would be "clean" to do it just in one query. Any clean technique? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] setweight detailed doc was: Ranking search results using multiple fields in PostgreSQL fulltext search
On Mon, 12 Oct 2009 20:02:16 +0530 Gaini Rajeshwar wrote: > > inputquery := setweight(cfg, inputtitle, 'A', '&'); > > inputquery := inputquery && setweight(cfg, inputsummary, 'B', > > '&'); > I didn't understand why did u use '&' operator in setweight > function. is that going to help in any way? I don't understand it either... I just copied and pasted it from a working function I wrote long ago. select setweight('pg_catalog.english', 'java', 'A', '&'); I can't remember what the meaning of that '&' was, and I can't find the docs. Could someone point me to a more detailed doc that explains setweight in more detail? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Ranking search results using multiple fields in PostgreSQL fulltext search
On Mon, 12 Oct 2009 19:26:55 +0530 Gaini Rajeshwar wrote: > Ivan, > If i create a tsvector as you mentioned with concatenation > operator, my search query will search in any of these fields which > are concatenated in my tsvector. > For example, if i create tsvector like this, > UPDATE document_table SET search_col = > setweight(to_tsvector(coalesce(title,'')), 'A') || > setweight(to_tsvector(coalesce(summary,'')), 'B')); > > and do a query like this > select title, ts_rank(search_col, to_tsquery('this is my text > search') AS rank > FROM search_col @@ to_tsvector('this & is & my & text & search') > ORDER BY rank DESC > the above query will search in title and summary and will give me > the results. But i dont want in that way.When a user wants to > search in title, it should just search in title but the results > should be ranked based on * title* and *summary* field. Search *just* in title by specifying the weight in the input query, and rank on title and summary. /* -- somewhere else in your code... search_col := setweight(cfg, title, 'A', '&'); search_col := search_col && setweight(cfg, summary, 'B', '&'); */ select rank(search_col, to_tsquery(inputtitle)) as rank -- rank on both if search_col just contains title and summary ... where search_col @@ setweight(cfg, inputtitle, 'A', '&') -- return just matching title order by ts_rank(...) Is it what you need? This is just one of the possible ways to rank something... otherwise: really understand how rank is computed, keep columns/ts_vectors separated, compute the rank for each column and pass the result to some magic function that will compute a "cumulative" ranking... Or you could write your own ts_rank... but I tend to trust Oleg and common practice with pg rather than inventing my own ranking function. Right now the ts_rank* functions are black boxes for me. I envisioned I might enjoy some finer tuning on ranking... but currently they really do a good job. 
-- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Ranking search results using multiple fields in PostgreSQL fulltext search
On Mon, 12 Oct 2009 19:11:01 +0530 Gaini Rajeshwar wrote: > I think you misunderstood my question. let me give clear idea > about what i need. > > I am using PostgreSQL fulltext search (tsearch2) feature to > implement searching on database. From readings i came to know that > we can give weights to different fields in database something like > this: > > *setweight(to_tsvector(title),'A')* > > Where 'A' is weight given to field title. i can give weights to > other fields in the same way. Where the weights 'A', 'B', 'C', 'D' > are in will be in the following order *A > B > C > D* according to > defalut fulltext search configuration. > > We can rank the search results using ts_rank function something > like this, > > *ts_rank(tsv_title,ts_query('this is my search text'))* > ** > But, i want to rank these reults not only based on just title, but > also using other fields like summary etc. > Is there a way around to do this? If you concatenate your fields with different weights in the *same* ts_vector, ranking will take your weights into account... Someone more knowledgeable than me chose how to use weights to give a reasonable ranking. Of course if you've fields a, b and c and you want to search in a and b only, you'll have to concatenate just a and b. If you need different assortments of field groups... you'll have to add some extra redundancy if you plan to store precomputed ts_vectors for each record. If you need to search "separately" in different fields (eg. title ~ 'gino' AND summary ~ 'pino') you just need to weight the input query as well inputquery := setweight(cfg, inputtitle, 'A', '&'); inputquery := inputquery && setweight(cfg, inputsummary, 'B', '&'); ... -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Ranking search results using multiple fields in PostgreSQL fulltext search
On Mon, 12 Oct 2009 18:46:02 +0530 Gaini Rajeshwar wrote: > Hi, > is there a way to rank the search results based on multiple fields > in postgreSQL? > For example, > i have *title*, *abstract*, *summary*, *body* as fields/columns in > my database. When user searches on *title*, i want to rank the > results based on *title* field as well as *summary* field, where > importance(summary) > importance(title). But the results should be > exactly matching the terms in "title" rather than "title" OR > "summary" http://www.postgresql.org/docs/current/interactive/textsearch-controls.html Basically, as you can read in the docs: - you create a ts_vector by concatenating the various fields, giving each a weight. - then you compare your ts_vector with plainto_tsquery(config, yourinput) @@ yourpreviouslycomputedts_vector and order by ts_rank(yourpreviouslycomputedts_vector, yourinput) (or ts_rank_cd) -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Temp table or normal table for performance?
On 20 Aug 2009 13:43:10 GMT Jasen Betts wrote: > On 2009-08-19, Stephen Cook wrote: > > > Let's say I have a function that needs to collect some data from > > various tables and process and sort them to be returned to the > > user. > > plpgsql functions don't play well with temp tables IME. Why? Do you mean that since you generally use temp tables for computation, looping several times over the table... a more expressive language would be better suited? -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] "ownership" of sequences, pseudo random unique id
On Thu, 20 Aug 2009 14:31:02 -0400 Alvaro Herrera wrote: > Ivan Sergio Borgonovo wrote: > > I've > > > > create table pr( > > code varchar(16) primary key, > > ... > > ); > > create sequence pr_code_seq owned by pr.code; -- uh! > > actually stuff like: > > alter table pr drop column code; > > or just > > drop table pr > > > > seems to work as expected (they drop the sequence too). > > Should I be concerned of anything since it looks like a hack? > You need to ensure you have a retry loop in your insertion code, > because if the generated code conflicts with a manually inserted > code, it will cause an error. Other than that, seems like it > should work ... I was mainly concerned about assigning ownership of a sequence to a column that is not an int. This looks like a hack ;) but it seems to work as expected: dropping the column or the table drops the sequence. Assigning ownership just saves me from remembering that if I drop the column I don't need the sequence. So owned by just means "drop if", regardless of type or anything else. I could even define the table as create table pr( code varchar(16) primary key, ... ); create sequence pr_code_seq owned by pr.code; alter table pr alter column code set default to_hex(feistel_encrypt(nextval('pr_code_seq')::int)); That will make the relationship between the sequence and the column more explicit. I think I can avoid conflicts between auto and manually generated codes by imposing a different format on input in client code. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
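Alvaro's retry-loop advice can be sketched outside the database. A Python stand-in (hypothetical helper names; the `set` plays the role of the unique column, and the membership test stands in for catching a `unique_violation` on INSERT):

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits

def gen_code(n=16):
    # random candidate code; may collide with a manually inserted one
    return "".join(secrets.choice(ALPHABET) for _ in range(n))

def insert_code(existing, max_tries=5):
    """Retry generation until the code is unique, the way one would
    retry an INSERT that raises a unique-violation error."""
    for _ in range(max_tries):
        code = gen_code()
        if code not in existing:  # stands in for the unique constraint
            existing.add(code)
            return code
    raise RuntimeError("gave up after repeated collisions")
```

With a 16-character code the retry almost never fires, but the loop keeps the design correct rather than merely probable.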
[GENERAL] bayesian classification over tsvector
I was wondering if there is some app, technique, or tool I could use for inspiration, or just use directly, to classify rows according to a weighted tsvector. Each row contains a tsvector obtained by concatenating 4 fields with different weights. I've a corpus for each group. I'd like to classify the rows that haven't been assigned to a group already. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Generating random unique alphanumeric IDs
On Thu, 20 Aug 2009 13:34:51 +0100 Thom Brown wrote: Correcting myself. a) it is a bad idea to pad a hex with a hex... so I should still find a quick way to change the representation to [g-z] for the padding characters... or just pad with a constant string. select lpad(to_hex(feistel_encrypt(10)), 8, 'mjkitlh'); b) this is from int (signed) to int (signed). begin; create or replace function feistel_encrypt(value int) returns int as $$ declare l1 int; l2 int; r1 int; r2 int; i int:=0; begin l1:= (value >> 16) & 65535; r1:= value & 65535; while i<3 loop l2:=r1; r2:=l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int; l1:=l2; r1:=r2; i:=i+1; end loop; return ((l1 << 16) | r1); end; $$ language plpgsql strict immutable; create or replace function feistel_decrypt(value int) returns int as $$ declare l1 int; l2 int; r1 int; r2 int; i int:=0; begin l2:= (value >> 16) & 65535; r2:= value & 65535; while i<3 loop r1:=l2; l1:=r2 # ((((1366.0 * l2 + 150889) % 714025) / 714025.0) * 32767)::int; l2:=l1; r2:=r1; i:=i+1; end loop; return ((l2 << 16) | r2); end; $$ language plpgsql strict immutable; commit; select * from feistel_decrypt(feistel_encrypt((2^31-1)::int)) union select * from feistel_decrypt(feistel_encrypt((-2^31)::int)) union select * from feistel_decrypt(feistel_encrypt((0)::int)) union select * from feistel_decrypt(feistel_encrypt((-1)::int)) ; > This appears a lot more tricky than I had originally anticipated! > I may be misunderstanding your example, but by alphanumeric, I > mean beyond hex (i.e. a-z and possibly uppcase too). me too... but to_hex was there and it was a quick trick to shorten the string and get rid of a sign. > I've looked into LFSR, but I'm afraid it goes over my head. But There is too much dust on my copy of "Concrete Mathematics"; still, by popular culture (read: wikipedia) it is said that LFSRs are not cryptographically safe, while, making 4 loops and choosing a suitable F, a Feistel cipher is. 
Then it is just a problem of "shrinking the string" or representing it in another base... and that may result in some "waste". 5 bits are 32 chars... you actually have more chars available even just considering a subset of ASCII. Picking 5 bits from the LFSR algo isn't that different from converting the Feistel cipher to hex, as I see it. The main advantage of hex over ASCII is that ints map very well to hex (no waste) and that to_hex has a good chance of performing better than any plpgsql function. Since I'm generating "gift codes" it wouldn't look nice if I presented the user with A as a gift code... And that's going to happen as soon as I'll have generated 232798491 gift codes. (/me wondering which is the smallest number with a corresponding one digit hex(feistel()) transform.)[1]. So just to make gift codes look nicer I thought about padding them with some further random noise... but the way initially described is not going to work. Variants could be to concat with something [^a-f0-9] (eg '-') and then pad with hex random noise A -> -A -> (random noise)-A I don't know if it is worth it since it is another round of lpad. Even if I'm currently not overly concerned by performance, I'm working with plpgsql, and while I think that writing something to change the base representation of an int can be done... it will be slow and ugly. If I was working with plperl (?) I'd just google for some perl recipe. Given the premises I'll just embellish the hex with some padding. But if you really need to use letters and be compact and such... I think you're just looking at changing the base of your whatever-pseudo-random algorithm. That's a common problem; you may just have to adapt it to plpgsql. [1] select s.i, feistel_decrypt(s.i) from generate_series(0,16) as s(i) order by feistel_decrypt(s.i) did it in a hurry... 
didn't check -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
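The plpgsql functions above translate almost mechanically to other languages. A Python rendition of the same 3-round, 16-bit Feistel network (same round function; shown only to illustrate that decrypt inverts encrypt for 32-bit non-negative inputs — it is not a drop-in replacement for the SQL version):

```python
def F(x):
    # same round function as the plpgsql version
    return int((((1366.0 * x + 150889) % 714025) / 714025.0) * 32767)

def feistel_encrypt(value):
    # split into two 16-bit halves, run 3 Feistel rounds
    l, r = (value >> 16) & 65535, value & 65535
    for _ in range(3):
        l, r = r, l ^ F(r)
    return (l << 16) | r

def feistel_decrypt(value):
    # same rounds applied in reverse order
    l, r = (value >> 16) & 65535, value & 65535
    for _ in range(3):
        l, r = r ^ F(l), l
    return (l << 16) | r
```

Because each round only swaps halves and XORs one half with F of the other, the structure is invertible no matter what F is — which is why the garbled-looking round function still yields a bijection.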
[GENERAL] "ownership" of sequences, pseudo random unique id
I've create table pr( code varchar(16) primary key, ... ); create sequence pr_code_seq owned by pr.code; -- uh! pr.code will *mostly* be obtained as to_hex(feistel_encrypt(nextval('pr_code_seq')::int)) and sometimes by 'manually' inserting unique codes. actually stuff like: alter table pr drop column code; or just drop table pr seems to work as expected (they drop the sequence too). Should I be concerned about anything, since it looks like a hack? -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Generating random unique alphanumeric IDs
On Mon, 17 Aug 2009 12:37:33 +0200 "Daniel Verite" wrote: > http://archives.postgresql.org/pgsql-general/2009-07/msg00194.php As an exercise I wrote the decrypt version create or replace function feistel_encrypt(value int) returns int as $$ declare l1 int; l2 int; r1 int; r2 int; i int:=0; begin l1:= (value >> 16) & 65535; r1:= value & 65535; while i<3 loop l2:=r1; r2:=l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int; l1:=l2; r1:=r2; i:=i+1; end loop; return ((l1::bigint<<16) + r1); end; $$ language plpgsql strict immutable; create or replace function feistel_decrypt(value int) returns int as $$ declare l1 int; l2 int; r1 int; r2 int; i int:=0; begin l2:= (value >> 16) & 65535; r2:= value & 65535; while i<3 loop r1:=l2; l1:=r2 # ((((1366.0 * l2 + 150889) % 714025) / 714025.0) * 32767)::int; l2:=l1; r2:=r1; i:=i+1; end loop; return ((l2::bigint<<16) + r2); end; $$ language plpgsql strict immutable; so that 10 = feistel_decrypt(feistel_encrypt(10)) Since I'm then converting with to_hex to shorten the string, I was thinking of adding some more bits of randomness, since eg. to_hex(10) = 'a' Along the lines of select lpad(to_hex(feistel_encrypt(10)), 7, to_hex((random()*2^31)::int)); I was wondering if there is any better way to get an alphanumeric random string quickly. Since uniqueness is assured by passing a sequence to feistel_encrypt, I just need to turn it into alphanumeric quickly. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] design, plpgsql and sql injection in dynamically generated sql
On Tue, 18 Aug 2009 12:38:49 +0200 Pavel Stehule wrote: > some unsafe function: I suspected something similar. I think many would appreciate it if you put these examples here http://www.okbob.blogspot.com/2008/06/execute-using-feature-in-postgresql-84.html and substituted the int example there with the text one. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] design, plpgsql and sql injection in dynamically generated sql
On Mon, 17 Aug 2009 12:48:21 +0200 Pavel Stehule wrote: > Hello > > I am not sure, if it's possible for you. PostgreSQL 8.4 has EXECUTE > USING clause, it is 100% safe. Sorry I don't get it. How can I use USING safely when the substitution involves a table name? The examples I've seen just involve column values. Where is the corresponding fine manual page? Still I don't get how USING could make plpgsql functions safer... well... I'm going to check some prejudices I have on pg functions first... I thought that if you passed eg. text to create or replace function typetest(a int) returns text as $$ begin raise notice 'is this an int? %', a; -- don't do anything else with a and calling select * from typetest('tonno'); was going to raise an error anyway. So somehow I find the example here http://okbob.blogspot.com/2008/06/execute-using-feature-in-postgresql-84.html not really helpful in understanding what's going on. Maybe an example with text comparing a version using quote_literal and one using USING could be clearer... or am I completely missing the point? Far from being an optimal solution, I've built a "client side" array of permitted table/key pairs to dynamically build the query on the client side. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] design, plpgsql and sql injection in dynamically generated sql
I've several lists of items that have to be rendered in a web app in the same way. The structure is: create table items ( itemid int primary key, /* some fields */ ); create table headert1 ( ht1 int primary key, /* several other fields that vary in nature */ ); create table itemlistt1 ( ht1 int references headert1 (ht1), itemid int references items (itemid) ); The query always matches this pattern: select i.fieldA, i.fieldB, ..., from itemlistt1 il join items i on i.itemid=il.itemid where il.ht1=[somevalue]; The nature of the lists and their usage patterns are very different, so unless someone comes up with a better design I still would like to keep the item lists in different tables. I'd like to build up a function that takes the name of the table and the key to dynamically build up the query... but I don't know what I should use to sanitize them. create or replace function getitemlist(listtable text, listkey text, keyvalue int, , out ) returns setof record as $$ declare statement text; begin statement:='select i.fieldA, i.fieldB, ..., from ' || escapefunc1(listtable) || ' il join items i on i.itemid=il.itemid ' || ' where il.' || escapefunc2(listkey) || '=' || keyvalue; return query execute statement; -- can I? Is quote_ident the right candidate for escapefuncN? But this is still at risk of misuse... (eg. passing ('items', 'fieldA', 1) may return something that was not meant to be seen.) One way would be to build up a table of permitted (table, key) pairs and then just pass the table and the key value. What would be the equivalent of quote_ident in PHP? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
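The permitted-(table, key) idea in the last paragraph can be sketched client-side. A Python stand-in (hypothetical table names; a PHP version would be structurally identical) that refuses anything outside the whitelist and binds the value as a parameter instead of splicing it into the SQL:

```python
# whitelist of (table, key) pairs allowed in dynamically built queries
PERMITTED = {
    ("itemlistt1", "ht1"),
    ("itemlistt2", "ht2"),
}

def build_item_query(listtable, listkey, keyvalue):
    """Build the list query only for whitelisted identifiers; the key
    value itself travels as a bound parameter, never as SQL text."""
    if (listtable, listkey) not in PERMITTED:
        raise ValueError("table/key pair not permitted")
    sql = (
        "SELECT i.fieldA, i.fieldB FROM " + listtable + " il "
        "JOIN items i ON i.itemid = il.itemid "
        "WHERE il." + listkey + " = %s"
    )
    return sql, (keyvalue,)
```

Because identifiers can only come from the fixed set, no quoting function is needed for them, which sidesteps the ('items', 'fieldA', 1) misuse entirely.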
Re: [GENERAL] Generating random unique alphanumeric IDs
On Sun, 16 Aug 2009 12:48:39 +0100 Sam Mason wrote: > On Sun, Aug 16, 2009 at 12:07:27PM +0100, Thom Brown wrote: > > Does anyone know a way to generate a random and unique lowercase > > alphanumeric ID > > If you want it to be unique then it's not going to be random. The > easiest way to keep it from producing duplicates is to have some > monotonically increasing component. If you're OK with code/people > retrying the occasional duplicate then you're going to be relying > on statistical guarantees and you should look at "birthday > attacks" to see how often this is going to happen. > > > Notice that I don't mean hexadecimal values either. This would > > preferrably not resort to trying to generate the ID, then > > checking for a clash, and if there is one, do it again, although > > that could do as I can't think of how the ideal solution of a ID > > hashing algorithm would be possible. Sometimes ago Daniel Verite posted an implementation of a fiestel cipher in plpgsql. I'm happily using it to generate pseudo-random hex strings. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
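The "birthday attack" odds Sam mentions are easy to estimate: drawing n random IDs from a space of N possible values, the chance of at least one duplicate is approximately 1 - exp(-n(n-1)/2N). A small sketch of that bound:

```python
import math

def collision_probability(n, space):
    """Approximate probability that n IDs drawn uniformly from `space`
    possible values contain at least one duplicate (birthday bound)."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

# e.g. 65536 random IDs in a 32-bit space already collide
# with probability of roughly 0.39
```

This is why a monotonic component (like the sequence fed to the Feistel cipher) beats pure randomness: it turns a statistical guarantee into an absolute one.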
[GENERAL] acl_admin by Afilias CA (bug/feature report)
Hi, I can no longer find the place where I downloaded acl_admin. There is a very small problem with chown_all. Since a sequence may be owned by a table... and the function may try to change the ownership of the seq first, the function may abort early. A quick hack that may work in most situations would be to order the select looping through relations, since generally sequences have "longer" names than their related tables. I haven't seen any change in that tool and it seems it dates back to 2004. Are there any more modern/complete tools for mass changes of ownership/grants around? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] parameters in functions and overlap with names of columns
On Tue, 4 Aug 2009 16:01:58 +0200 Pavel Stehule wrote: > 2009/8/4 Ivan Sergio Borgonovo : > > I've > > > > create or replace function(... > > > > declare > > col1 varchar(32); > > ... > > > > create table pippo( > > col1 varchar(32), > > ... > > > > Unfortunately I can't schema specify the column to avoid name > > overlap. > > > > Is there another way other than just simply rename the variable? > > yes - the most common is an using of prefix '_' for local plpgsql > variables. Other possibility is using qualified names. Just to be sure... by qualified names you mean schema qualified name or table qualified names in case of columns... right... For a second I had the hope there was another way other than having a col1, a _col1 and a __col1 too ;) -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] parameters in functions and overlap with names of columns
I've create or replace function(... declare col1 varchar(32); ... create table pippo( col1 varchar(32), ... Unfortunately I can't schema-qualify the column to avoid the name overlap. Is there another way other than simply renaming the variable? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] MSSQL to PostgreSQL
On Fri, 31 Jul 2009 13:47:39 +0100 Thom Brown wrote: > Hi all, > We're migrating the contents of an old MSSQL server to PostgreSQL > 8.3.7, so a full conversion is required. Does anyone know of any > guides which highlight common gotchas and other userful > information? http://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_PostgreSQL_Migration_by_Ian_Harding http://edoceo.com/creo/ms2pg These were the places I read when I had to migrate some MS SQL DBs to PostgreSQL. I admit I mostly did it by exporting csv from MS SQL[1], refactoring the schema and rewriting the functions from scratch... but those were the places where I read the info I needed. [1] suddenly MS SQL became unable to export proper csv, so I remember I wrote a rudimentary odbc2csv tool as well. Since it was inspired by an even more rudimentary work of a colleague, I'll ask if he can release it under the GPL if you find yourself in need of such a beast. You may also find useful FreeTDS http://www.freetds.org/ http://www.webthatworks.it/d1/page/odbc,_freetds_and_microsoft_sql_(and_php) and dblink http://www.postgresonline.com/journal/index.php?/archives/44-Using-DbLink-to-access-other-PostgreSQL-Databases-and-Servers.html http://www.postgresql.org/docs/8.3/static/dblink.html -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Constraint between 2 tables and taking a coherent snapshot of both
Suppose I've create table m1 ( a int primary key, cm1 int, cm2 int, ... ); create table r1 ( a int references m1 (a), cr1 int, cr2 int, ... ); and cm1's possible values depend on some function of cr1 for a given a. I actually have a plpgsql function that returns the possible choices for cm1 for each set of cr1. a) r1 gets filled with some values. b) The user is presented with the possible choices of cm1. c) I've to take a snapshot of m1 and r1. Since the user may change cr1 while I've already started c), cm1 may not be compatible with the new set of cr1. The user shouldn't (if I coded it right) be able to change cm1 in a way that is not compatible with cr1. create or replace function t2c(_a int) returns void as $$ insert into m1c select * from m1 where a=_a; insert into r1c select * from m1 where a=_a; t2c should see a snapshot of m1 *and* r1 at a given time. Of course I'd like to use a system that is as rollback/lock free as possible. One way would be to put the function in a serializable transaction... but that has its drawback (rollback and retry). I could even create a create table m1pr1 ( a int, cm1 int, cm2 int, cr1 int, cr2 int ); insert into m1pr1 select m1.a, m1.cm1, m1.cm2, r1.cr1, r1.cr2 from m1 join r1 on m1.a=r1.a; but I'm not sure what's going to happen, and this solution has its own drawbacks too (denormalized data). Actually a serializable transaction doesn't even seem a too bad solution... but I just would like to understand better how to manage this situation, so that I can make it as simple as possible AND reduce as much as possible the chances that the transaction will have to be rolled back. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
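The "rollback and retry" drawback of the serializable approach is usually handled by wrapping the whole transaction body in a retry loop. A Python sketch (the exception class here is a stand-in for whatever serialization-failure error the database driver raises):

```python
class SerializationFailure(Exception):
    """Stand-in for the driver error raised on serialization conflicts."""

def run_serializable(txn, max_retries=3):
    """Run `txn` (the whole transaction body, including the snapshot
    copy) and re-run it from scratch if the snapshot turns out stale."""
    for attempt in range(max_retries):
        try:
            return txn()
        except SerializationFailure:
            continue  # conflicting concurrent change: redo everything
    raise RuntimeError("transaction kept failing; giving up")
```

The key point is that `txn` must be safe to re-execute in full; keeping it short (just the two inserts of t2c) also keeps the conflict window, and hence the retry rate, small.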
[GENERAL] checking for temp tables information_schema vs. EXCEPTION
I didn't find a definitive answer on how to check for the existence of a temporary table. I did an unscientific test to see if I could see a temp table from another connection in the information_schema... and I can't. The schema system is more direct (so cleaner) but it seems to rely on some "behind the scenes trick" I don't fully understand. I could do a list of insane things like: begin; create or replace function tt_test() returns void as $$ declare sch varchar(128); begin create temp table pippo (i int); select into sch table_schema from information_schema.tables where table_name='pippo' and table_type='LOCAL TEMPORARY'; execute 'create table ' || sch || '.pippo (i int);'; -- FAILS execute 'create schema ' || sch || ';'; -- NOT TESTED create table pippo (i int); -- SUCCEEDS create temp table zzz as select * from information_schema.tables where table_name='pippo'; return; end; $$ language plpgsql; select * from tt_test(); commit; select * from zzz limit 10; It looks like an invisible search path is added. How are temp schema names obtained? Is there any place in the manual that says that pg_temp_.* is a "reserved schema pattern"? I didn't test, but the EXCEPTION method may miss the difference between the temp table and the permanent table. And schema qualifying the temp table requires a further extra step. So the EXCEPTION method doesn't look safe. Does EXCEPTION have some other hidden cost? Just for curiosity. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Documentation Improvement suggestions
On Mon, 20 Jul 2009 09:34:00 -0400 Robert James wrote: > Two small suggestions that might make it easier for newcomers to > take advantage of the wonderful database: > 1. Googling Postgres docs produces links for all different > versions. This is because incoming links are to different > versions. Besides being confusing, it pushes the pages lower in > Google, and makes it harder to find them. > Could the website offer a link to the 'current' version, whichever > it is. Eg instead of just : > http://www.postgresql.org/docs/8.1/static/creating-cluster.html > Have: > http://www.postgresql.org/docs/current/static/creating-cluster.html > which would keep all incoming links pointed to the current page. Did you actually try the link you proposed? There is a page behind it... and it actually links to the 8.4 docs. People still need the old docs, so you can't just make docs for previous versions unavailable... and once they are available, they will be indexed. Unfortunately current version doesn't mean most used. Actually what I'd enjoy is a link to the "interactive" version, and maybe aggregating all the comments from previous versions in the new version (indicating which versions they come from). -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] temp tables and replication/connection sharing/pooling
I was wondering: if I were going to use any of the available replication solutions (I'm thinking especially of pgpool, but I'm also interested in any other tool such as Slony, PGCluster...), should I take into account any side effects on temp tables?

I'm currently using PHP and pg_connect for a web application. I'm not planning to use pg_pconnect.

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] PostgreSQL and Poker
On Wed, 8 Jul 2009 13:22:14 -0600 Scott Marlowe wrote: > On Wed, Jul 8, 2009 at 12:27 PM, Ivan Sergio > Borgonovo wrote: > > On Wed, 8 Jul 2009 19:39:16 +0200 > > "Massa, Harald Armin" wrote: > > > >> a quite interesting read. > >> > >> http://www.codingthewheel.com/archives/stranger-than-fiction-story-online-poker-tracker-postgresql > > > > There are a couple of comments comment that maybe someone could > > correct: > > > > "The popularity of PostgreSQL as DBMS for handhistories is by no > > means just a matter of some alleged technological superiority > > over MySQL. Let's not forget that Pokertracker, Holdem Manager > > etc is proprietary software, so they really don't have any other > > choice but to bundle with postgreSQL. If they were to ship their > > products with MySQL, they would either have to open-source their > > products according to the GPL, or pay hefty commercial license > > fees." > > > > or > > > > "Bogdan's comment is right on the money. There are licensing > > issues with MySQL. MySQL commercial licenses are contracts with > > Sun. Not cheap. It had to be PostgreSQL." > > > > I understand the license differences (and for my taste I prefer > > GPL over BSD) but the above affirmations seems to imply pg > > couldn't stand up just on its technical merits. > > > > I don't think this is the case. > Exactly, it could have been interbase / firebird, sqllite, > berkelydb, and a couple other choices that are free. MySQL's > licensing just took them out of the running right at the start. You can actually build up closed source software with MySQL as a server, it depends on how you do it. Aren't there any DB with LGPL library license? Still the above statement sounds too much as: pg wasn't chosen for it's technical merits but for the license. I don't think their only option was pg for licensing reasons. Or am I misunderstanding what you wrote? or... am I plainly wrong? 
-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] PostgreSQL and Poker
On Wed, 8 Jul 2009 19:39:16 +0200 "Massa, Harald Armin" wrote:

> a quite interesting read.
>
> http://www.codingthewheel.com/archives/stranger-than-fiction-story-online-poker-tracker-postgresql

There are a couple of comments that maybe someone could correct:

"The popularity of PostgreSQL as DBMS for handhistories is by no means just a matter of some alleged technological superiority over MySQL. Let's not forget that Pokertracker, Holdem Manager etc is proprietary software, so they really don't have any other choice but to bundle with postgreSQL. If they were to ship their products with MySQL, they would either have to open-source their products according to the GPL, or pay hefty commercial license fees."

or

"Bogdan's comment is right on the money. There are licensing issues with MySQL. MySQL commercial licenses are contracts with Sun. Not cheap. It had to be PostgreSQL."

I understand the license differences (and for my taste I prefer GPL over BSD) but the above statements seem to imply pg couldn't stand up on its technical merits alone. I don't think this is the case.

Someone more knowledgeable about licenses, and with better English than mine, should correct the comments.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] Feistel cipher, shorter string and hex to int
On Tue, 07 Jul 2009 12:07:48 +0200 "Daniel Verite" wrote:

> Ivan Sergio Borgonovo wrote:
>
> > r2 := l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
> > -- but what about this? where does it come from?
>
> This function:
> (1366.0*r1+150889)%714025
> implements a known method to get random numbers. I think it comes
> from "Numerical recipes" by William Press.
> Note that the algorithm is not tied to that function, it could be
> replaced by something else (especially one that involves a private
> key), but it has to be carefully chosen or the end result won't
> look so random.

I don't get the 1366.0 and the 714025.0. Isn't writing 1366.0 going to use float arithmetic? Is it there just to avoid an overflow? I'm going to see if using bigint makes any difference in speed.

Finally... if I were (and I'm not) interested in using 30 bits, should I turn that *32767 into a *16383? For shift and bit mask it looks more obvious. Do you remember the name of this particular F?

Since I don't see anything other than to_hex that could "shorten" an int to a string easily and quickly... it seems that returning a signed integer is OK. Everything else seems to need more processing at no real added value. Turning the int into base 32 [0-9A-N] with plpgsql looks expensive just to shorten the string to 4 chars.

Thanks.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
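The round function quoted in this thread can be sketched outside SQL to see what the constants do. A minimal illustration in Python (the function name and test values are mine, not from the thread): the linear-congruential step produces an integer below 714025; dividing by 714025.0 maps it into [0, 1), and multiplying by 32767 rescales it to a 15-bit value.

```python
def F(r1: int) -> int:
    # LCG step: an integer in [0, 714025)
    lcg = (1366 * r1 + 150889) % 714025
    # real division maps it into [0, 1); scaling yields a 15-bit value
    return int(lcg / 714025.0 * 32767)

# The 1366.0 in the plpgsql version mainly forces float arithmetic so
# the intermediate product cannot overflow a 32-bit int; Python ints
# are arbitrary precision, so a plain int literal is enough here.
for r1 in (0, 1, 65535, 714024):
    assert 0 <= F(r1) <= 32767
```

This also hints at the answer to the 30-bit question: shrinking the halves to 15 bits means rescaling F's output to the smaller range, not changing the LCG itself.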
Re: [GENERAL] combine multiple row values in to one row
On Tue, 7 Jul 2009 01:59:35 +0430 Lee Harr wrote:

> Hi;
>
> I'm looking for a way to do this:
> # select idn, magic() as codes FROM tbl;
>  idn | codes
> -----+----------
>    1 | A
>    2 | B, C
>    3 | A, C, E
> (3 rows)
>
> Right now, I use plpgsql functions, but each time I do it
> I have to rewrite the function to customize it.
>
> Is there a generic way to do this? An aggregate maybe?

array_accum http://www.postgresql.org/docs/8.2/static/xaggr.html ?

-- Ivan Sergio Borgonovo http://www.webthatworks.it
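The linked docs page defines array_accum as a user-defined aggregate built from array_append over an empty-array initial state. What that aggregate computes can be sketched in Python for illustration (the rows and names below are made up, mirroring the sample output in the question):

```python
# Group rows by idn and accumulate codes per group, the way an
# array-accumulating aggregate would, then join for display.
rows = [(1, 'A'), (2, 'B'), (2, 'C'), (3, 'A'), (3, 'C'), (3, 'E')]

def accumulate(rows):
    out = {}
    for idn, code in rows:
        out.setdefault(idn, []).append(code)  # like array_append per group
    return {idn: ', '.join(codes) for idn, codes in out.items()}

print(accumulate(rows))  # {1: 'A', 2: 'B, C', 3: 'A, C, E'}
```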
[GENERAL] Feistel cipher, shorter string and hex to int
On Sat, 02 May 2009 11:26:28 +0200 "Daniel Verite" wrote:

> Note that it returns a bigint because we don't have unsigned
> integers in PG. If you're OK with getting negative values, the
> return type can be changed to int.
> Otherwise if you need a positive result that fits in 32 bits, it's
> possible to tweak the code to use 15 bits blocks instead of 16,
> but then the input will have to be less than 2^30.

I need shorter values (because they should be easier to type). To be sure I modify the function in a sensible way I really would appreciate some pointers.

To further shrink the length of the result I was planning to use to_hex (actually it would be nice to have a fast to_35base [0-9a-z])... but I wasn't able to find a way to convert a hex string back to an int. x'fff' seems to work just for literals.

CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) returns bigint AS $$
DECLARE
  l1 int;
  l2 int;
  r1 int;
  r2 int;
  i int := 0;
BEGIN
  l1 := (value >> 16) & 65535; -- modifying here seems trivial
  r1 := value & 65535;
  -- l1 := (value >> 15) & B'111'::int;
  -- r1 := value & B'111'::int;
  WHILE i < 3 LOOP
    l2 := r1;
    -- but what about this? where does it come from?
    r2 := l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
    /* r2 := l1 # ((((1366.0*r1+150889)%714025)/714025.0)*B'11'::int)::int; */ -- ??
    l1 := l2;
    r1 := r2;
    i := i + 1;
  END LOOP;
  return ((l1::bigint << 16) + r1); -- modifying here seems trivial
END;
$$ LANGUAGE plpgsql strict immutable;

Anything else to suggest or copy from?

-- Ivan Sergio Borgonovo http://www.webthatworks.it
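A side note on the string-shortening idea discussed above: re-encoding the integer in a larger base is what makes the string shorter. A sketch in Python for illustration (the alphabet and function names are my assumptions, not something from the thread):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # 32 symbols -> base 32

def encode(n: int) -> str:
    """Encode a non-negative int as a base-32 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 32)
        digits.append(ALPHABET[rem])
    return ''.join(reversed(digits))

def decode(s: str) -> int:
    """Decode a base-32 string back to an int."""
    n = 0
    for ch in s:
        n = n * 32 + ALPHABET.index(ch)
    return n

# Parsing a hex string back to an int, the step that x'fff'
# (a literal-only construct in SQL) could not provide:
assert int('fff', 16) == 4095
assert decode(encode(123456789)) == 123456789
```

Base 32 uses 5 bits per character, so a 32-bit value never needs more than 7 characters versus 8 for hex.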
Re: [GENERAL] Replication and coding good practices
On Mon, 29 Jun 2009 19:11:43 +0800 Craig Ringer wrote: > On Sun, 2009-06-28 at 09:01 -0700, David Fetter wrote: > > > Are there any rules of thumb to consider for making an > > > application easier to work with a "general" replication > > > solution? > > > > > > The applications I mostly deal with are e-commerce sites. > > > > It really depends on what replication solution you choose, along > > with the environment you're deploying into. > > ... and why you need replication. Reliability/Availability? Data > storage redundancy? Performance? And if performance, read-mostly > performance or write-heavy performance? 1) performance, read-mostly 2) reliability I'm making large use of plpgsql mainly for: - encapsulation - single point of truth - implicit transaction Most of the write operations don't have to be aware of a multi user environment. Still in some part of the code things have to be aware of transactions, multi user environment (or better multiple connections from the same user) etc... Not only these parts are rare, they are seldom executed too. So my main concern about the parts that may be problematic in a replicated context is to keep maintenance low and development easy. eg. I've started to use temp tables but I guess they may cause some problems in conjunction with connection pooling systems. > > That said, I've noticed that the things that are generally good > > practice help you even more when you're doing replication. > > > > Practices I've seen help directly: > > > > * Separate read users and code from write users and code. > > > > * Separate DDL from both of the above. > > > > * Make DDL changes part of your deployment process and only > > allow them in files which track in your SCM system. > Version your schema, storing the schema version in a 1-row table > or even as a stable function. 
> This makes it much easier for
> deployment tools or staff to easily see what needs to be done to
> get the schema and app to the latest version - there's no "what
> the hell is the current state of this thing, anyway?" to worry
> about.

This is another area where I'd like to learn more about available techniques for managing development. But currently I'm more interested in coding techniques that avoid maintenance/porting problems once I have to support a replication solution.

At the moment schema changes are saved in a file together with the web application code. I was thinking of automating the application of schema changes with a hook in svn, but right now it doesn't look like a good investment. Still, I'm very interested in techniques to version schema changes, bring them together with code changes, and be able to diff them.

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
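Craig's "version your schema" suggestion can be sketched as a tiny migration runner. This is an illustration only, not a recommendation of a specific tool: the table names and the execute callback are made up, and a real runner would apply each DDL statement and bump the 1-row schema-version table inside one transaction.

```python
# Ordered list of (version, ddl) migrations kept in the repository
# alongside the application code; illustrative DDL only.
MIGRATIONS = [
    (1, "create table item (iid serial primary key)"),
    (2, "alter table item add column rating int"),
]

def pending(current_version: int):
    """Migrations newer than the version recorded in the database."""
    return [(v, ddl) for v, ddl in MIGRATIONS if v > current_version]

def upgrade(current_version: int, execute):
    """Apply pending migrations via `execute`; return the new version."""
    for version, ddl in pending(current_version):
        execute(ddl)               # real life: run DDL + update the
        current_version = version  # schema_version row in one transaction
    return current_version

applied = []
assert upgrade(1, applied.append) == 2
assert applied == ["alter table item add column rating int"]
```

The point of the exercise is the one Craig makes: deployment becomes "run upgrade()" rather than guessing the current state of the schema.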
[GENERAL] Replication and coding good practices
I'm starting to see the need for a replication solution on my horizon. I've learned that some replication solutions require code modifications, and some make different assumptions about how connections behave...

Are there any rules of thumb to consider for making an application easier to work with a "general" replication solution?

The applications I mostly deal with are e-commerce sites.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] drawback of array vs join
On Tue, 23 Jun 2009 17:39:46 -0300 Emanuel Calvo Franco wrote: > 2009/6/23 Ivan Sergio Borgonovo : > > I've a temporary table where I'd like to resume data coming from > > several other tables (normalised the usual way). > > eg. > > a list of items that may belong to several group (eg. item, > > group, itemgroup table) will end up in something similar to: > > > > create temp table itemlisttemp( > > lid int, > > iid int, > > qty int, > > famid int, > > rating int, > > itemgroups int[], > > ); > > > > This because I've to loop over the same table several times and > > because it *should* make easier to write dynamic queries [built > > from a web interface] like > > > > select sum(qty) from where > > famid=3 > > and rating>0 > > and 10 = any itemgroups > > ... > > > > itemlisttemp tables will be very small (30 as an upper limit) and > > arrays as well will contain very few elements (4 as an upper > > limit). > Which improvement do you want to obtain with the array way? > If the table is very small, the improvement in terms of performance > will be negligible. I'll have to compute different kind of aggregates over itemlisttemp and update it as well, but while itemlisttemp will be small the tables from which it is derived aren't. Keeping the join will make appear as many row as "groups", that will make harder to understand and build the queries that will be used to build up the aggregates. Furthermore if I skip the condition on group, I'll have to skip the corresponding join otherwise I'll have a row for each group the item belongs to. 
select sum(l.qty)
  from list l
  join groupitem gi on l.iid=gi.iid
  where l.famid=3 and rating>0 and gi.gid=10;

select sum(l.qty)
  from list l
  join groupitem gi on l.iid=gi.iid
  where l.famid=3 and rating>0; -- oops this is wrong

compared to:

select sum(qty) from itemlisttemp
  where famid=3 and rating>0 and 10 = any(itemgroups);

select sum(qty) from itemlisttemp
  where famid=3 and rating>0;

I'm still collecting what I really need in itemlisttemp, but I'm expecting no more than 3-4 arrays in itemlisttemp for a very long time. Since I have to fill the arrays just one time... and compute aggregates with different conditions and update itemlisttemp, I'm doing it mostly for readability and for ease of building the statements dynamically.

I was wondering if I could run into any drawback other than not doing it the halal relational way.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] drawback of array vs join
I've a temporary table where I'd like to resume data coming from several other tables (normalised the usual way). E.g. a list of items that may belong to several groups (eg. item, group, itemgroup tables) will end up in something similar to:

create temp table itemlisttemp(
  lid int,
  iid int,
  qty int,
  famid int,
  rating int,
  itemgroups int[]
);

This is because I have to loop over the same table several times, and because it *should* make it easier to write dynamic queries [built from a web interface] like:

select sum(qty) from itemlisttemp
  where famid=3
  and rating>0
  and 10 = any(itemgroups)
  ...

itemlisttemp tables will be very small (30 rows as an upper limit) and the arrays as well will contain very few elements (4 as an upper limit).

I'm aware of the drawbacks of de-normalization. Should I be aware of anything else when using arrays this way?

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
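The double-counting pitfall discussed upthread can be sketched with plain Python data (illustrative values, not from the thread): summing qty over the normalized join without a group filter counts an item once per group it belongs to, while the denormalized array column keeps one row per item.

```python
items = [  # (iid, qty, groups) -- denormalized, like itemlisttemp
    (1, 5, [10, 20]),
    (2, 3, [10]),
]
# normalized shape: one (iid, gid) row per membership
itemgroup = [(iid, g) for iid, qty, gs in items for g in gs]

# with a filter on group 10, the join and the array agree
join_sum = sum(qty for iid, qty, _ in items
               for jid, g in itemgroup if jid == iid and g == 10)
array_sum = sum(qty for _, qty, gs in items if 10 in gs)
assert join_sum == array_sum == 8

# drop the group condition but keep the join: item 1 is counted
# once per group membership, inflating the total
assert sum(qty for iid, qty, _ in items
           for jid, g in itemgroup if jid == iid) == 13  # not 8
```

This is exactly why the reply notes that skipping the condition on group also requires skipping the corresponding join.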
Re: [GENERAL] temp tables, sessions, pgpool and disk
On Mon, 22 Jun 2009 11:40:08 +0200 Martijn van Oosterhout wrote: > On Mon, Jun 22, 2009 at 09:53:59AM +0200, Ivan Sergio Borgonovo > wrote: > > > The OS knows much less about what anonymous memory (memory not > > > backed by a file) "means" to a program and can't be as clever > > > with it. Swapping tends to be _much_ more CPU expensive than > > > writing > > > > But issuing a write to disk Postgresql doesn't actually say > > anything more about what it is placing on the disk and how it is > > going to access it... and it is actually adding overhead to move > > it back and forward, no matter if this overhead happens on RAM > > or disk. Actually since temp table are private to the connection > > they should (?) be private to a postgresql process, so the OS > > should be able to do a good job. > > I don't see any atomicity constraint, so... if something fail > > while writing to RAM, as you said you shouldn't need a WAL. > > For the record, temp tables are in fact handled differently, in > particular they are not stored in the shared_buffers, but instead > are in backend local (private) buffers, whose size is controlled by > temp_buffers. They are indeed not WAL archived, nor written to disk > unless needed. > So yes, small temp tables will likely stay in memory, but large > temp tables may spill to disk. There's no flushing or syncing so > quite likely they'll end up in the OS disk cache for a while. Once > the temp table is deleted, the file is deleted and the OS throws > that data away. So temp tables most likely won't use any disk I/O, > but they *can* if the need arises. Just to make it extra-clear to people unaware of pg internals... since the second paragraph may seems to contradict the first one... could be "nor written to disk unless needed" rephrased as: even repeated UPDATE/INSERT won't issue writes (no matter if they end up on disk or not, it won't issue writes to the OS) if the table fit the buffer? 
I see the default is somewhat "large" (8MB) and it is not preallocated. Looks nice.

> Have a nice day,

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] temp tables, sessions, pgpool and disk
On Mon, 22 Jun 2009 07:26:56 +0800 Craig Ringer wrote: > > http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html > > "If specified, the table is created as a temporary table. > > Temporary tables are automatically dropped at the end of a > > session" > > > > I'd interpret it as a connection. > > Correctly. > > > I don't even know if it is possible to send more than one command > > over a single connection and wait for the results asynchronously. > > Any clarification? > > To an extent cursors provide that ability. The "result" is returned > quite promptly, but it's a placeholder that allows you to retrieve > the real results progressively as you need them. Whether the > database generates the results immediately and stores them to > return later, or whether it generates them on demand, isn't > something you can easily tell it's up to the database. http://www.postgresql.org/docs/8.3/interactive/libpq-async.html "PQsendQuery cannot be called again (on the same connection) until PQgetResult has returned a null pointer, indicating that the command is done." Asynchronous calls can't be made to parallelize postgres queries on the same "session", but just to parallelize client and server work. So a temp table seems as private as I need it. I wonder what will happen if I put something like pgpool between postgresql and a web app. > > So postgresql actually issues writes to disk and delegate to the > > OS management of the cache/actual write on disk. > Yes. > > I thought it could just try to hold them in RAM and still > > delegate to the OS to save them on disk in swap if the system is > > short on RAM. > For a variety of reasons, you REALLY don't want it to work that > way. mmm... first sorry for the noise... Interpret the following as reality checks. I'm perfectly aware building up a DB is not easy, and I'm not pretending I know how to write one. ;) > OS memory managers tend to be _much_ better and faster at managing > pages that're backed by a file. 
They'll write dirty data out > pre-emptively so that execution doesn't stall when memory runs > low; they write data to the file in order for best disk > performance; they efficiently buffer and read-ahead when pulling > the data back in, etc. > The OS knows much less about what anonymous memory (memory not > backed by a file) "means" to a program and can't be as clever with > it. Swapping tends to be _much_ more CPU expensive than writing But issuing a write to disk Postgresql doesn't actually say anything more about what it is placing on the disk and how it is going to access it... and it is actually adding overhead to move it back and forward, no matter if this overhead happens on RAM or disk. Actually since temp table are private to the connection they should (?) be private to a postgresql process, so the OS should be able to do a good job. I don't see any atomicity constraint, so... if something fail while writing to RAM, as you said you shouldn't need a WAL. > dirty buffers to a file. It's a lot more expensive to retrieve > from disk, too, and usually involves lots of seeks for quite > scattered pages instead of nice block readahead. Once you're running out of memory I see no guaranty your file will end up in a fast easily accessible area of your disk... and you're going to add the overhead associated with a file system (journalling, permissions/ownership, locks) swap is volatile... and it should offers the guaranty you "need" for a temp table. > The OS knows much better than PostgreSQL does when the table will > fit in RAM and when it needs to spill to disk, and it's much > better at managing that than Pg can ever be. It's great that Pg > just uses the OS's hardware knowledge, system-wide awareness, and > highly optimised memory manager + disk IO management to take care > of the problem. 
The same should be true for virtual memory, not just file management and postgresql has a configuration file that should give a clue to the DB about the expected workload and hardware. Surely postgresql can't forecast how many and how large the temp tables for a single connection will be... but substantially I got the idea that a connection is somehow serial in its execution and that storage could be garbage collected or just released early (drop table, on commit drop). This looks as it is taking temp tables very far from the standard. And yeah... once you want to do memory management/resource management inside SQL you've opened the doors of Hell. But well For what I could see about SQL99 the definition of temp table is very terse... and a bit confusing (at least for me) about global and local. I gave a quick look at what's available on MS SQL... and they have an sort of "in memory temp table" but you can't modify its schema. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] cursors, temp tables, dynamic sql technique (was and is: looping over a small record set over and over in a function)
On Sun, 21 Jun 2009 21:43:16 +0800 Craig Ringer wrote: > On Sun, 2009-06-21 at 14:57 +0200, Ivan Sergio Borgonovo wrote: > > > I think everything could be summed up as: > > > > select into t myaggregate1(field) from dataset where condition1; > > if(t>10) then > > update dataset set field=myfunc1(a,b,c) where condition1; > > end if; > > > > select into t myaggregate2(field) from dataset where condition2; > > if(t>44) then > > update dataset set field=myfunc2(a,b,c) where condition2; > > end if; > > That's really too simplified to see what you're actually doing. I'm still checking if real case go far beyond to the above or I've some more general case. > I've found that in the _vast_ majority of non-trigger cases where > I've considered using PL/PgSQL, a bit more thought and proper > consideration of the use of generate_series, subqueries in FROM, > join types, etc has allowed me to find a way to do it in SQL. It's > almost always faster, cleaner, and nicer if I do find a way to > express it in SQL, too. The stuff is really serial in its nature and can't be made parallel. > > I think I really don't have a clear picture of how temp tables > > really work. > > They can be seen by concurrent transactions in the same session. > Eh? In this context, what do you mean by "session"? Did you mean > consecutive rather than concurrent, ie: http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html "If specified, the table is created as a temporary table. Temporary tables are automatically dropped at the end of a session" I'd interpret it as a connection. I don't even know if it is possible to send more than one command over a single connection and wait for the results asynchronously. Any clarification? If a session is a connection and it's not possible to send more than one command and wait for the results asyncronously... or the server doesn't run in parallel several command sent across the same connection... than a temp table looks like what I need. 
> BEGIN; > CREATE TEMPORARY TABLE x (...); > -- do other work > COMMIT; > BEGIN; > -- Can see `x' from here > COMMIT; > > ? I was planning to use ON COMMIT DROP > Normally, you couldn't see another sessions temp tables, even after > commit, unless you explicitly schema-qualified their names - eg > 'pg_temp_4.x' . If even then; I haven't actually checked. > Two concurrent sessions that issue 'CREATE TEMPORARY TABLE x(...)' > get two DIFFERENT tables, both named `x', in different pg_temp > schema, eg 'pg_temp_2.x' and 'pg_temp_3.x' . That's perfect, and what I really need. > > But if the transaction in which a temp table is created is not > > committed yet, other transactions won't see it. > Of course, since PostgreSQL doesn't support the READ UNCOMMITTED > isolation level. > > So it may actually look more as a temporary, private storage > > that doesn't have to be aware of concurrency. > > So temp tables should require less disk IO. Is it? > The big thing is that they're private to a session, so different > sessions can concurrently be doing things with temp tables by the > same name without treading on each others' feet. > Because they're limited to the lifetime of the session, though, > PostgreSQL doesn't need to care about ensuring that they're > consistent in the case of a backend crash, unexpected server > reset, etc. Tom Lane recently pointed out that as a result writes > don't need to go through the WAL, so my understanding is that > you're avoiding the doubled-up disk I/O from that. They also don't > need to be written with O_SYNC or fsync()ed since we don't care if > writes make it to the table in order. > As a result I'm pretty sure temp tables don't ever have to hit the > disk. If the OS has enough write-cache space it can just store > them in RAM from creation to deletion. So postgresql actually issues writes to disk and delegate to the OS management of the cache/actual write on disk. 
I thought it could just try to hold them in RAM and still delegate to the OS to save them on disk in swap if the system is short on RAM. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] cursors, temp tables, dynamic sql technique (was and is: looping over a small record set over and over in a function)
On Sun, 21 Jun 2009 10:57:51 +0800 Craig Ringer wrote:

> On Fri, 2009-06-19 at 20:23 +0200, Ivan Sergio Borgonovo wrote:
>
> > If I could easily load all the dataset into an array, loop
> > through it and then just update the computed field it would be
> > nice... but how?
>
> Are you sure you _really_ need a PL/PgSQL function for this?

Not really sure. I'm investigating. But well, I noticed that even working with a fully fledged procedural language I'd have to loop twice to replace/add the computed field to the "original" one, so it's surely not a big loss if I "loop" twice with SQL.

I think everything could be summed up as:

select into t myaggregate1(field) from dataset where condition1;
if (t>10) then
  update dataset set field=myfunc1(a,b,c) where condition1;
end if;

select into t myaggregate2(field) from dataset where condition2;
if (t>44) then
  update dataset set field=myfunc2(a,b,c) where condition2;
end if;

Actually there is a trick I could use to "skip" the update loop and pass the result to the next loop, but it only looks convenient in the more general case where the aggregate has to be computed on the "updated" value. I'm still not sure if the functions I'm working with are always of the above class, where myaggregate works on the "original" field. Of course myaggregate could compute myfunc... but then I'd have to compute myfunc twice.

In a loop I could:

create temp table t on commit drop as
  select *, field as field1, field as field2, field as result
  from t1 join t2 on ... join t3 on ...
  where ...;

flipflop = 'field1';
flopflip = 'field2';

foreach (row) {
  if (condition on row) {
    flopflip = myfunc(row[flipflop]);
  } else {
    flopflip = row[flipflop];
  }
  agg += flopflip;
}
if (condition on agg) { switch flipflop }

foreach (row) {
  if (condition on row) {
    flopflip = myfunc(row[flipflop]);
  } else {
    flopflip = row[flipflop];
  }
  agg += flopflip;
}
if (condition on agg) { switch flipflop }

execute 'update t set result=' || flipflop;

I think this could be obtained using cursors.
This is going to be much more verbose, since to use the "flipflop" technique I'll have to use dynamic statements and EXECUTE:

EXECUTE 'UPDATE t SET ' || flipflop || '=' || newfield || ' WHERE CURRENT OF cursor';

Even "condition on row" would have to be expressed as a dynamic statement if the condition involves the computed field.

Compared to the select + update solution it is going to avoid looping and checking the condition twice, but it is going to work on a larger dataset and run the update even for unchanged rows (maybe the else branch could be omitted??). I'm not sure the actual set of functions has the update condition equal to the select condition anyway. But I think wasted updates have a larger cost if they require disk IO.

I think things would look more "convenient" if there were a simple and cheap way to load a record set into an array, do stuff with the procedural language, and substitute the old record set with the computed one.

Currently I'll get most of the speed-up from looping through a smaller data set, avoiding a join over a very large table. Considering the very small data set I'm expecting (no more than 20 records) I don't think dynamic statements, lack of indexes etc. are going to have an impact. As soon as clients increase, optimizing the loops may become necessary.

I'll have to see if all the cases I'm dealing with can be solved by the SELECT + conditional UPDATE technique, since it looks much easier to maintain. Furthermore I suspect that in my real case the conditional check will be made on the original value, so that myfunction will be computed at most once and I can delay an aggregate computation on the computed field until after the last loop has been executed.
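The flipflop technique described above can be sketched over an in-memory row set. This is an illustrative Python sketch only: the field names, myfunc, and the row/aggregate conditions are placeholders. Each pass writes into the destination field, and the source/destination roles swap only when the aggregate condition holds, so the next pass reads the previous pass's output without copying the table.

```python
rows = [{"field1": v, "field2": v} for v in (4, 7, 2)]

def myfunc(x):
    # placeholder for the real per-row computation
    return x * 2

def one_pass(rows, src, dst, row_cond, agg_cond):
    agg = 0
    for row in rows:
        row[dst] = myfunc(row[src]) if row_cond(row[src]) else row[src]
        agg += row[dst]
    # "switch flipflop" (keep the new values as source) only if the
    # aggregate condition holds; otherwise the next pass re-reads src
    return (dst, src) if agg_cond(agg) else (src, dst)

src, dst = "field1", "field2"
for _ in range(2):  # two passes, as in the pseudocode upthread
    src, dst = one_pass(rows, src, dst,
                        row_cond=lambda v: v > 3,
                        agg_cond=lambda a: a > 10)

result = [row[src] for row in rows]
```

The final `execute 'update t set result=' || flipflop` step corresponds here to reading `row[src]` after the last swap.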
The remaining costs will be:
- looping twice on the same table (getting the aggregate + updating the table)
- updating, if it causes disk IO

I even suspect that the test will mostly fail, so updates will be rare, but I still would like to understand how this could work in the most general case and how the temp table + cursor + flipflop technique is going to work internally.

I think I really don't have a clear picture of how temp tables really work. They can be seen by concurrent transactions in the same session. But if the transaction in which a temp table is created is not committed yet, other transactions won't see it. If the transaction where the temp table is created is aborted... no other transaction will ever know about the existence of the temp table. If the temp table is defined as ON COMMIT DROP... somehow there is no need to make the changes happening on the temp table land on disk. So it may actually look more like a temporary, private storage that
Re: [GENERAL] looping over a small record set over and over in a function
On Sat, 20 Jun 2009 12:30:42 +0200 Alban Hertroys wrote:

> You could add a column to your query as a placeholder for the
> computed value. For example, SELECT *, 0 AS computed_value FROM
> table.
> If you use a scrollable cursor (possible in PL/pgSQL these days,
> although it still has some limitations) you could just loop
> through its results and rewind the cursor until you're done.
> It does look like you're implementing some kind of aggregate
> function though. Maybe you can implement it as an actual aggregate
> function, maybe in combination with the windowing stuff in 8.4?
> Then you could just add the aggregate to your query instead of
> using your query results in a function. That's about the inverse
> of what you're attempting now.
> You seem to be implementing something that would look like
> SELECT sum(SELECT * FROM table), while you may be better off
> aiming for SELECT SUM(value) FROM table.
>
> Considering you want to loop over the results multiple times that
> may not be possible; only you can tell.
>
> > Is it really worth it to load the whole record set into an
> > array, loop over it etc... instead of eg. building a temp table
> > with the same structure of the input record set + 1 field and
> > looping over the table etc... what about all the UPDATEs
> > involved to change field N+1 of the temp table? Will they be
> > expensive?
>
> Neither of those look like a convenient approach. With the array
> you lose all the meta-data the record set provided (types, column
> names, etc), while the temp table approach looks like it will be
> kind of inefficient.
> Is the result of that function volatile or can you maybe store it
> after computing it once and fire a trigger to update the computed
> value if the underlying data changes?

The function is volatile. What I was thinking about was to fire a trigger that wipes the temp table if the table on which the computation is made changes. Considering I can't make the loop run in parallel, aggregates may be a way to go... 
but somehow they look like they are making the implementation a bit hard to manage. Anyway, it still has to be seen whether I could actually implement the loops with aggregates, since every loop has two "side effects": compute some aggregates for the whole record set and compute an extra field for each row.

> I wonder why you need to re-iterate over your result set multiple
> times? It's kind of rare that once isn't enough.
> And lastly, if your current approach really is the only way to
> compute what you're after, then maybe PL/pgSQL isn't the right
> match for the problem; it looks like you'd be better served by a

Yeah. I had a look at python but I don't want to add one more language to the mix. I enjoy the strict type checking of plpgsql, even if a bit of syntactic sugar would help make it more pleasant, and I think it is the most lightweight of the bunch. Still, I don't know how easy it is with eg. python to load an array with a result set, change it and put it back into the table it came from.

> language that can work with arrays of typed structures. As I'm not
> familiar with the other PL languages I can't tell whether they
> would be suitable in that respect, but I suspect Python or Java
> would be able to handle this better.

Your suggestion about cursors could be the way... but I don't know enough about cursor internals to tell whether updating a field through a cursor will cause disk writes.

Currently my main concern is making these things readable and extensible. I'm not planning to optimise yet. The result set on which the computations are made is pretty small; it is just taken out of a large one. But I don't want to cut off my path to optimisation.

So one way could be: select the record set into a temp table, and build a trigger that wipes the temp table if the tables on which the record set is built change. This may not even be necessary, since everything related to that record set is going to happen in one session. 
loop several times over the temp table.

Since every loop should actually correspond to a function... I wonder if I could build up the cursor and pass it along to several functions. Every function will MOVE FIRST, update each row of the record set and return some computed fields that will be used by the next function, etc... If this is possible, this approach has the advantage of being extensible. I wonder if it is efficient, since I don't know whether an UPDATE ... WHERE CURRENT OF cursor on a temp table is going to incur any disk writes.

thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
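A minimal sketch of the temp table + cursor idea described above (table and column names invented for illustration; whether the WHERE CURRENT OF update on a temp table touches disk is exactly the open question):

```sql
-- Session-private working set with room for the computed field.
CREATE TEMP TABLE work_set ON COMMIT DROP AS
SELECT *, NULL::numeric AS computed
FROM big_table
WHERE some_filter;

-- Fragment of a plpgsql function body: one pass over the working set.
DECLARE
    curs CURSOR FOR SELECT * FROM work_set;
    r    record;
BEGIN
    OPEN curs;
    LOOP
        FETCH curs INTO r;
        EXIT WHEN NOT FOUND;
        -- fill the extra field for the row the cursor is on
        UPDATE work_set SET computed = r.c1 + r.c2
            WHERE CURRENT OF curs;
    END LOOP;
    CLOSE curs;   -- re-OPEN to start the next pass from the top
END;
```

Closing and re-opening the cursor between passes avoids depending on rewind support in PL/pgSQL.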
[GENERAL] looping over a small record set over and over in a function
I've a record set on which I have to loop several times. The function should return the same record set + one more computed field. Something that could look like:

foreach(row) {
    // compute stuff
    if(...) { }
    // place stuff in field[N+1] of the row
}
if(some condition) {
    // ...
}
foreach(row) {
    // compute stuff
    if(...) { }
    // place stuff in field[N+1] of the row
}
if(some condition) {
    // ...
}
...
// actually return row + computed field

in plpgsql, where each loop depends on the result of the previous one. The dataset is very small. If I could easily load the whole dataset into an array, loop through it and then just update the computed field it would be nice... but how? Is it really worth it to load the whole record set into an array, loop over it etc... instead of eg. building a temp table with the same structure of the input record set + 1 field and looping over the table etc... what about all the UPDATEs involved to change field N+1 of the temp table? Will they be expensive? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
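One hedged way to get the "record set + one computed field" shape without per-row bookkeeping, assuming a hypothetical source query and functions f1/f2 standing in for the per-loop computations:

```sql
-- Copy the input record set into a session-private table,
-- adding a slot for the value each loop will fill in.
CREATE TEMP TABLE loop_set AS
SELECT src.*, NULL::numeric AS extra
FROM (SELECT * FROM some_source) AS src;

-- Each "foreach(row)" becomes one set-wide UPDATE;
-- later passes may read what earlier passes wrote.
UPDATE loop_set SET extra = f1(c1, c2);
UPDATE loop_set SET extra = f2(extra);

-- Finally return the record set plus the computed field.
SELECT * FROM loop_set;
```

The conditional parts of each loop map to WHERE clauses on the UPDATEs.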
[GENERAL] simulating high load for vacuum full
I'm trying to diagnose a problem that happened during VACUUM FULL. It is a programming problem triggered by some lock, delay or whatever happens during vacuum. Making large updates to a bunch of tables is a PITA just to obtain a slow VACUUM FULL, and restoring a "fragmented" DB doesn't look like a working strategy: the restore shouldn't be fragmented. What are the "side effects" of a VACUUM FULL? Is there any cheaper way to cause a heavy VACUUM FULL, or just its side effects? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
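Since VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table it works on, one cheap way to reproduce just that side effect (a sketch; the table name is hypothetical) is to hold the same lock for a while:

```sql
-- Blocks all readers and writers of the table, just like a running
-- VACUUM FULL would, without doing any of the actual work.
BEGIN;
LOCK TABLE mytable IN ACCESS EXCLUSIVE MODE;
SELECT pg_sleep(60);   -- keep the lock for a minute
COMMIT;
```

This should trigger the same waits and lock conflicts in the application as a genuinely slow VACUUM FULL.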
Re: [GENERAL] why dropping a trigger may cause a deadlock
On Fri, 05 Jun 2009 17:35:19 -0400 Tom Lane wrote:

> Ivan Sergio Borgonovo writes:
> > I don't get it.
> > Why dropping the triggers would cause a deadlock anyway?
> > I bet it is due to my naïve view of the problem but I think a
> > trigger is a "function". Unless there is concurrent access to the
> > table where the function is defined... I can't see why dropping
> > the "functions" serially should cause a lock.
>
> They're not just functions, they are part of the DDL for a table.
> Adding or removing a trigger on a table requires exclusive lock
> on that table, otherwise you can't be sure what will happen in
> concurrent transactions that might (or might not) be supposed to
> fire the trigger.

I'm still wondering why there was anything else requiring a lock on those tables. Referring to the previous example:

create table b(
    c3id int primary key,
    c3 text
);
create table a(
    pid int primary key,
    fti tsvector,
    c1 text,
    c2 text,
    c3id int references b(c3id),
    c4 int -- not used to build up fti
);

there is a very small chance that, while I was dropping the triggers, something like

update a set c4=37 where pid=12;

was running. But I can't see how this should require a lock... and still, well... the chances that the update statement happened during the trigger drop are negligible. And... still, I'm quite surprised that even an update happening while I was dropping the trigger would result in a deadlock. Everything happening on tables a and b that involves writes happens in the same transaction that drops the triggers, or is read only. Should I look into anything else to get a clue about what happened and try to avoid it? Thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
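One place to look for a clue while reproducing the problem is the pg_locks view, which shows who holds and who waits for which lock (a sketch; join it against pg_stat_activity to see the queries involved):

```sql
-- Ungranted entries are the waiters; granted ones are the blockers.
SELECT locktype, relation::regclass AS rel, mode, granted, pid
FROM pg_locks
WHERE relation IS NOT NULL
ORDER BY relation, granted DESC;
```

Run it from a third session while the two suspect sessions are stuck.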
Re: [GENERAL] why dropping a trigger may cause a deadlock
On Fri, 05 Jun 2009 10:46:11 -0400 Tom Lane wrote:

> Ivan Sergio Borgonovo writes:
> > I've encountered this error for the first time
> > psql:./import_stock_scratch.sql:9: ERROR: deadlock detected
> > DETAIL: Process 11095 waits for AccessExclusiveLock on relation
> > 250545 of database 248569; blocked by process 11099. Process
> > 11099 waits for AccessShareLock on relation 250510 of database
> > 248569; blocked by process 11095.
> > CONTEXT: SQL statement "drop trigger if exists
> > FT1IDX_catalog_items_update_trigger on catalog_items" PL/pgSQL
> > function "ft1idx_trigger_drop" line 3 at SQL statement
> > The function just drops 2 triggers that update a tsvector that
> > is gist indexed.
>
> Are the triggers on two different tables? It looks like you're

You're right. I forgot the exact schema. I've a table that contains most of the fields that end up in the tsvector, plus a reference to another table that contains one more column that ends up in the tsvector.

create table b(
    c3id int primary key,
    c3 text
);
create table a(
    fti tsvector,
    c1 text,
    c2 text,
    c3id int references b(c3id)
);

One trigger is fired when c1, c2 or c3id are modified. The other is fired when c3 is modified. Both triggers rebuild the tsvector, which is obtained by concatenating c1, c2 and c3.

> probably trying to acquire exclusive lock on two tables, and
> deadlocking against some other process that gets a lesser lock
> on the same tables but in the other order.

I don't get it. Why would dropping the triggers cause a deadlock anyway? I bet it is due to my naïve view of the problem, but I think a trigger is a "function". Unless there is concurrent access to the table where the function is defined... I can't see why dropping the "functions" serially should cause a lock. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
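One conventional workaround for the opposite-order scenario Tom describes, sketched with the example tables (trigger names are hypothetical): take the exclusive locks yourself, in the same fixed order everywhere, before dropping the triggers, so two sessions can never grab them in opposite order:

```sql
BEGIN;
-- A single LOCK statement acquires the locks in the listed order;
-- every job touching both tables should use the same order.
LOCK TABLE a, b IN ACCESS EXCLUSIVE MODE;
DROP TRIGGER IF EXISTS trigger_on_a ON a;
DROP TRIGGER IF EXISTS trigger_on_b ON b;
-- ... bulk work, trigger re-creation ...
COMMIT;
```

This serializes the competing jobs instead of letting them deadlock halfway through.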
[GENERAL] why dropping a trigger may cause a deadlock
I've encountered this error for the first time:

psql:./import_stock_scratch.sql:9: ERROR: deadlock detected
DETAIL: Process 11095 waits for AccessExclusiveLock on relation 250545 of database 248569; blocked by process 11099. Process 11099 waits for AccessShareLock on relation 250510 of database 248569; blocked by process 11095.
CONTEXT: SQL statement "drop trigger if exists FT1IDX_catalog_items_update_trigger on catalog_items" PL/pgSQL function "ft1idx_trigger_drop" line 3 at SQL statement

The function just drops 2 triggers that update a tsvector that is gist indexed. Before running import_stock_scratch.sql I'm making an update to the columns that are then "aggregated" into the tsvector. All scripts are wrapped in transactions and are run serially. What's happening? How can I prevent it? I'd expect that previous scripts don't interfere with the deadlocked one, and at that moment the write activity on the table on which the triggers act is minimal if not absent. But I suspect my understanding of how these things work is very naive... so some general clues would be appreciated as well. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] optimize/cleanup SQL
On Fri, 29 May 2009 08:13:32 -0500 (CDT) Brandon Metcalf wrote:

> For some reason this doesn't give me satisfaction that it's written
> optimally, but I haven't found another way.
>
> SELECT round(CAST ((EXTRACT(EPOCH FROM clockout)
>     -EXTRACT(EPOCH FROM clockin))/3600 AS NUMERIC),2) AS hours
> FROM timeclock;

satisfying?

template1=# select extract(days from ('2009-01-01'::timestamp - '2008-01-01'::timestamp))+5::int;
 ?column?
----------
      371
(1 row)

-- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
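In the same spirit, the original hours query can be shortened by subtracting the timestamps first and extracting the epoch from the resulting interval (a sketch against the schema posted above):

```sql
-- One EXTRACT instead of two; the numeric cast is still needed
-- because round(double precision, int) doesn't exist.
SELECT round((EXTRACT(EPOCH FROM clockout - clockin) / 3600)::numeric, 2) AS hours
FROM timeclock;
```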
Re: [GENERAL] Excel and postgresql functions
On Tue, 26 May 2009 11:41:50 +0200 Ivan Sergio Borgonovo wrote:

> I've built some functions just for encapsulation reasons and to
> avoid passing the same parameter over and over.
> I'd like to access the result from Excel but it seems (and I'm
> not sure it is the definitive answer) Excel can only pull data
> coming from views and tables.
> - can someone confirm Excel/Access 2003 can't return result sets
> coming from functions?
> - if Excel/Access can't return results from a function is there a
> way to masquerade the function (accepting parameters) behind a
> table/view.

For all the people forced to fight with software that has to be tricked rather than instructed... the fault is Excel's.

- Data -> Import External Data -> New Database Query
- Choose the odbc connection you created for Postgresql
- Don't add any query.
- Edit the "SQL" directly
- Save an easily "greppable" query (eg. select 'ermenegildo';)
- Ignore the complaint.
- Close Microsoft Query.
- Say OK to "Import data" (just select a good "top left corner" for your table)
- Tools -> Macro -> Visual Basic Script Editor
- Search for your query and replace it with any valid SQL; add ? if you need parameters taken from cells
- Save
- You'll be asked which cell contains the parameter (I didn't try to use more than one parameter)

Now you can have an arbitrary query returned in an Excel sheet. So yes... PostgreSQL can happily work with Excel. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Excel and postgresql functions
I've built some functions just for encapsulation reasons and to avoid passing the same parameter over and over. I'd like to access the results from Excel but it seems (and I'm not sure it is the definitive answer) Excel can only pull data coming from views and tables.

- can someone confirm Excel/Access 2003 can't return result sets coming from functions?
- if Excel/Access can't return results from a function, is there a way to masquerade the function (accepting parameters) behind a table/view?

-- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] psql Windows and variables
I need to loop from DOS [sic] over a set of files to get them imported into a table. I noticed that psql is wrapped up in a .bat, so I wrote another bat (import.bat) that substantially does:

rem import.bat
FOR %%f in (pattern) do "longpathtopsql.bat" --variable csvfile=%%f -f some.sql

-- some.sql
\copy import.transaction from :csvfile

but I get:

some.sql:3: :csvfile: No such file or directory

I tried -v, --set etc... Same result. How can I pass variables to a sql script? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
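If the variable passing keeps failing, one workaround (a sketch, untested; the path and table name are placeholders from the batch file above) is to skip the script variable entirely and build the \copy command inline for each file with psql's -c option:

```
rem Hypothetical alternative: no some.sql, no :csvfile variable.
FOR %%f in (pattern) do "longpathtopsql.bat" -d mydb -c "\copy import.transaction from '%%f'"
```

Since the file name is expanded by the batch FOR loop before psql ever sees it, no psql-side interpolation is needed.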
Re: [GENERAL] Aggregate Function to return most common value for a column
On Fri, 22 May 2009 17:48:44 +1000 "Michael Harris" wrote: > Hi Experts, > > I want to use an aggregate function that will return the most > commonly occurring value in a column. > > The column consists of VARCHAR(32) values. > > Is it possible to construct such an aggregate using PL/PgSql ? > > If I was trying to do something like this in Perl I would use a > hash table to store the values and the number of times each was > seen as the table was iterated, but PL/PgSql does not seem to have > an appropriate data type for that? > > I don't want to use PL/Perl to avoid the overhead of starting a > perl interpreter for that. > > Do I have to write the function in C maybe? Isn't it a job for group by? select count(*), myvalue from table group by myvalue order by count(*) desc limit 1; -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] package for debugger/wizard was: Feedback on PG?
On Wed, 20 May 2009 07:46:02 -0400 Dave Page wrote: > > Beside the fact I can only thank for all the great work around > > postgresql, is there a reason we can't have something similar on > > eg. Debian that will let us have a tuning wizard and a debugger > > that "just works" with an aptitude install? > No, other than resources. We maintain close to 100 installers now, > just for the EnterpriseDB supplied packages. Producing > platform-specific builds of them as well as the one-click > installers would be a mammoth task. My universe is Debian bound... so I even don't know if there is a *nix version of the tuning wizard. I'd consider it a quite useful tool even for "marketing" purposes on Linux too. Bad performance without tuning is a common thread here. I really didn't have time to investigate about the debugger, I'd expect that on Windows it "just works". While many things on *nix just work, debugging pg functions on Linux is not one of those. As you may have guessed my definition of "just works" in not that different from "aptitude install". I still have to find an howto for installing edb in Debian. Could it be packaged for Debian if there were resources? > It's pitched as a ''get you started" tool. We still expect you to Exactly. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [Windows] Feedback on PG?
On Wed, 20 May 2009 06:59:28 -0400 Dave Page wrote: > On Wed, May 20, 2009 at 6:49 AM, Ivan Sergio Borgonovo > wrote: > ] > > I was surprised how polished the installation was. > > Thanks :-) Beside the fact I can only thank for all the great work around postgresql, is there a reason we can't have something similar on eg. Debian that will let us have a tuning wizard and a debugger that "just works" with an aptitude install? oh and yeah... I know a "Tuning Wizard" is evil and will hide all the true unleashed hidden wonderful power you can really squeeze out of Postgresql and corrupt your soul... but still ;) -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [Windows] Feedback on PG?
On Wed, 20 May 2009 11:59:34 +0100 Raymond O'Donnell wrote:

> On 20/05/2009 11:49, Ivan Sergio Borgonovo wrote:
> > I think you could connect from Delphi through ODBC, maybe even
> > with some .NET driver. I think ODBC should be more mature... but
> > don't trust me.
>
> You can use ODBC and .NET (depending on which version of Delphi you

What is the status of the .NET driver? Are my assumptions about ODBC vs. .NET still valid, or were they just a remnant of the past/some nightmare I just had? Any appreciable difference between the Windows .NET driver and the *nix one? -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [Windows] Feedback on PG?
On Wed, 20 May 2009 09:32:13 +0200 Gilles wrote:

> Hello
> I'm looking for an open-source DBMS that runs on Linux and
> Windows. In order to make an informed choice, I would like some
> feedback on the Windows port of PostgreSQL, as compared to MySQL
> and Firebird (if you know of yet other solutions, I'm also
> interested.)

It's been a while since I "seriously" touched Windows. Recently I had to install PostgreSQL on XP. This is a very superficial first impression. I was surprised how polished the installation was. On Windows the installer even comes with a couple of extra goodies: namely a tuning wizard and some debugging components. Nothing that really makes you miss the *nix environment... but still quite a nice, impressive experience. I was even surprised to see the installer offer a chance to install Drupal.

> Is the Windows port on par with the *nix version, and with those
> other alternatives? Apart from the fact that, unlike MySQL,
> PostgreSQL doesn't require buying a license when developping
> commercial applications, are there technical reasons why I should
> choose PostgreSQL instead of MySQL or Firebird?

From a very naive POV I'd say MySQL is still an SQL interface to the filesystem, while PostgreSQL is a mature RDBMS. On small projects where the data isn't that valuable and the application is nearly totally read-only, or where you're willing to reinvent the wheel of concurrent access to your data, I'd go with MySQL. Just as C++ or python may be "boring/annoying" compared to VB or PHP, PostgreSQL may be "boring/annoying" compared to MySQL. But as soon as you reach some grade of complexity/value of your data, you're really going to appreciate what PostgreSQL can offer. From a pure programming point of view, PostgreSQL is really much more fun to deal with. It's not sloppy, accepting everything you throw at it... 
and regretting it later; it is much more standards compliant, it warns you earlier about problems and tries to give you good hints on how to solve them, it lets you use some more complex SQL features (and more to come), it has support for full text search and spatial/geographic data/indexes on a transactional engine, and it has a plethora of extension modules. Functions, triggers and rules are much more mature than on MySQL. Schemas may really come in handy, simply as namespaces or as a tool to manage grants.

I think you could connect from Delphi through ODBC, maybe even with some .NET driver. I think ODBC should be more mature... but don't trust me.

As for "upgrading"... PostgreSQL is still not able to do "hot upgrades" across major versions: you have to backup and restore. On Debian... this happens auto-magically... I've no idea what the Windows installer can do when upgrading. If you have availability constraints this could be a bit annoying. If you have HA constraints you're going to have some kind of replication solution that will somehow let you do "hot upgrades" anyway. This shouldn't be your concern if you're dealing with a SOHO.

Coming from a web background, but one where transactions and some "advanced" features are essential to survival, I got the impression that even for small SOHO applications, even when server and client are on the same box, pg may be a very good choice. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Excel and pg
On Mon, 18 May 2009 09:14:41 +0800 Craig Ringer wrote:

> Ivan Sergio Borgonovo wrote:
> > I'd like to know if:
> > - it is possible to "load" in an Excel sheet a table (view,
> > query result) coming from postgresql and to use those data to do
> > further computation/presentation work on Excel?
>
> Certainly. You can do it through the ODBC interface via VB, and I
> think Excel also has some kind of "data browser" that lets the
> user pull data from ODBC-accessed databases interactively.

This is working. I had a bit of fun trying to find the right menu in the Chinese version. For the uninitiated/those who don't have fresh memories of working in a MS environment:

- Install Postgresql and the ODBC drivers and create a DB/user etc...
- Create a system-wide DSN connection. In XP it is a bit more hidden than in my memories of W2K (Control Panel -> Performance and Maintenance -> Administrative Tools -> Data Sources)
- Open Excel; there should be a Data menu... I can't go further since this Excel was localised in Chinese. There are some menus that follow, but we weren't able to read them, maybe due to the mismatch of OS and Excel localisation.

You can import tables and views too, and it seems you can apply an SQL filter on them. Dates seem to be imported correctly and I think localised according to the settings of the OS (I'd have to ask; I think in mainland China dates should follow the European format and not the UK/US format).

> Beware, though. Excel has funny ideas about dates and has some
> baked-in bugs in some of its functions. It doesn't know about or
> respect the foreign key relationships and constraints in the
> database, either.

Fortunately I won't delegate anything related to data integrity to Excel. I'll keep an eye on dates. Having had some experience with MS SQL, I had some difficulties at times converting dates in and out of Postgresql. I know it shouldn't be the responsibility of the DB to convert between formats... but for some dirty work it can really come in handy. 
pg -> Excel worked perfectly. I hope Excel -> csv -> pg will be equally painless. Considering the limited toolset I can rely on, I'm a bit worried I may have some localisation problems when people try to save Excel -> csv. COPY may not support all the flexibility I need if the Chinese localised Excel/OS outputs some strange stuff in the csv.

I chose to pass through pg just because I have to help write down some business logic for reporting, and I bet I'll have to get back to what I wrote in 1-6 months. I tried to implement some of the reporting logic in Excel but:

- Some things really fit SQL
- I don't want to relearn how an IF works, especially if I have to switch back and forth to Polish notation
- I've found something really weird. People say SQL is hard (yeah, it may be...) but that S really shines once you compare it with the way a spreadsheet operates ;)
- Office SQL is a PITA. I gave up once I saw they don't implement COALESCE.

If people would like to elaborate further on data coming from pg using Excel functions, they will be on their own.

> If you really must pull data into Excel, consider giving users an
> account in PostgreSQL that _ONLY_ has access to read-only views of
> the data. Those views should denormalize the data significantly and
> otherwise make it as Excel-friendly as possible. Pull the data in
> using a Visual Basic script that "protects" the data as soon as
> it's been placed on the sheets, so the user can't accidentally
> change it, just reference it.

I've to import data into Postgresql... and that data comes in other Excel files. I can't rely on a complete programming environment. I was thinking about opening another ODBC connection and using dblink to import data from Excel to pg... but somehow it doesn't look like it is going to really improve on the procedure of importing data from a csv. I mean... if someone doesn't do it properly (eg. some kind of lock/whatever on the Excel file), people won't be able to understand the problem. 
Saving the Excel file to csv seems something that could be handled more easily by the people who will do the job. I think that somehow "refreshing" data imported by Excel is going to run the associated query... so if I write a function that runs \copy and place a "select * from importdata()" in Excel... all people should have to do is save the Excel file as csv somewhere.

> > - can postgresql load data from an Excel sheet? Or Excel write
> > data to postgresql from an excel sheet? dblink?
>
> The easiest way is via CSV. You could probably also do it with some
> Visual Basic running in Excel that pushes the data via ODBC.
> If you're going to even vaguely consider putting data from a
> user-modifiable spreadsheet back
[GENERAL] Excel and pg
I've to deal with a volunteer pet project and I wouldn't like to go crazy fighting the idiosyncrasies of Access, but still I've no time to build an interface that is user friendly enough to overcome the resistance to something new. So I thought to just use Excel 2003 as the front-end to postgresql, everything on Windows XP. I'm not any more (if I have ever been) comfortable with MS Office stuff and Windows. I know using postgresql is like using an elephant to fight a microbe here, but still, since I'll have to write the business logic and I don't want to spend hours understanding why I can't make a join or what the equivalent of coalesce is etc... I'd like to know if I'm getting into more trouble just to set things up.

I'd like to know:

- is it possible to "load" into an Excel sheet a table (view, query result) coming from postgresql and to use those data to do further computation/presentation work in Excel? I think the rough path should be to use ODBC (OleDB?). Do I have to install anything more other than postgresql?
- can postgresql load data from an Excel sheet? Or can Excel write data to postgresql from an Excel sheet? dblink?
- am I going to incur any localisation problems if the Windows stuff is localised in Chinese? I see I can choose the "language to be used during installation". I'd prefer localisation to be in English but still let the people that will use the front-end use Chinese. What about the encoding (client/server)?
- are there tools to make backup/restore very easy even for "point&click" kinds of users?
- anything that a non "desktop" oriented guy like me has to realise before promising to put up something that will have to be used by "desktop/GUI" people? I can't think of anything other than backup they will have to deal with, once they see their data in Excel and they can backup/restore easily.

thanks

BTW I saw a lot of nice things in the pg package for Windows... 
especially for debugging, that I'm not sure I can find for Linux or can be as easily installed as with Stack Builder:

- debugger
- Tuning Wizard
- replication solution (I wonder if it is easier to get a working solution with aptitude or with Stack Builder)

-- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PGSQL-to-MYSQL Migration: Error in a 'simple' inner join query
On Mon, 4 May 2009 09:27:30 -0700 (PDT) DaNieL wrote: [snip] > Every id in every table is a PRIMARY KEY, UNIQUE, NOT NULL and > serial type.. > The query that i have problem with is: > --- > SELECT > orders.code, > customer.name, > SUM(order_item.price) > FROM > orders > INNER JOIN customer ON (customer.id = orders.id_customer) > INNER JOIN order_item ON (order_item.id_order = orders.id) > GROUP BY orders.id > Sorry, i know that this maybe is a basically problem, but i come > from mysql.. and in mysql that query works... You have to add explicitly all the columns not in aggregates. Postgresql is a bit stricter than mysql but it generally saves you a lot of debugging later. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
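Concretely, with the query posted above, that means listing every non-aggregated output column in GROUP BY (keeping orders.id in the list preserves the original per-order grouping):

```sql
SELECT
    orders.code,
    customer.name,
    SUM(order_item.price)
FROM orders
INNER JOIN customer ON customer.id = orders.id_customer
INNER JOIN order_item ON order_item.id_order = orders.id
GROUP BY orders.id, orders.code, customer.name;
```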
Re: [GENERAL] Selling an 8.1 to 8.3 upgrade
On Fri, 24 Apr 2009 13:03:06 -0600 Scott Marlowe wrote: > On Fri, Apr 24, 2009 at 11:01 AM, Josh Trutwin > wrote: > > I've been asked to put together a list of reasons to upgrade a db > > from 8.1 to 8.3 and I've looked over the changelog, but they > > want a bullet list of 4-5 top things. I'm curious what others > > would say the most 5 important updates from 8.1 to 8.3 are. > > There were a lot of improvements from 8.1 to 8.3. For the system > at work, the compelling ones were: > HOT updates. > General improvements in the query planner. This has been true for > nearly every new major release. > Improved vacuuming, including multi-threaded vacuum capability. > Concurrent index creation. > Improved handling of joins > Fillfactor setting > Much improved checkpointing > Changes to the buffering methods so large seq scans don't evict > more popular data from the buffers. This is much more important > than it sounds. For me the reason to switch was the inclusion of the great work by Oleg and Teodor: Full-text search into core. It looks like 8.4 is going to be a great release too and I think it will be an enough cool reason to switch not only DB release but OS release as soon as 8.4 end up in the backport of Lenny. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance
On Thu, 23 Apr 2009 12:00:30 +0200 Karsten Hilbert wrote: > On Thu, Apr 23, 2009 at 11:33:34AM +0200, Ivan Sergio Borgonovo > wrote: > > > Karsten Hilbert wrote: > > > > > On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote: > > > > > > > I have a set of dynamically composed objects represented in > > > > Java, with string values for various attributes, which have > > > > variable length. In case you have suggestions for a better > > > > type for this case, it would be my pleasure to hear about > > > > them. > > > > > > Seref, he's suggesting you use TEXT instead of > > > VARCHAR(something). In PG it's actually usually *less* > > > overhead to use the unbounded text datatype (no length check > > > required). > > > > > > Length checks mandated by business logic can be added by > > > more dynamic means -- check constraints, triggers, etc which > > > allow for less invasive change if needed. > > > > Could you point us to some example of a constraint/trigger > > (etc...) that is going to provide the same checking of varchar > > and explain (if the code/example... doesn't make it clear) why > > it should be faster or less invasive? > > check constraint based: > > create table foo ( > txt text > check (char_length(txt) < 1001) > ); > > trigger based: > > create function trf_check_length_1000() ... returns > trigger ... $$...$$; > > create table foo ( > txt text > ); > > create trigger check_txt_length before INSERT or > UPDATE ... execute trf_check_length_1000(); > > faster: > > - TEXT is (judging by previous comments on this list) > marginally faster than VARCHAR(1000) because a) it runs > the same code but b) doesn't have to check for the 1000 > length > > - other options (VARCHAR, constraint, trigger) incur > additional overhead and are thus slower > > less invasive: > > Well, poor wording on my part, perhaps. 
What I meant is that > changing a check constraint or trigger appears to be a less > costly operation on a table than changing the datatype of a > column (although I seem to remember there being some > optimizations in place for the case of changing the *length* > of a varchar). I'll try to rephrase, to check if I understood and for reference. varchar is slower than text since it has to do a length check. text is faster, but if you add a check... it gets slower (slower than varchar? faster?). Constraints and triggers should have the advantage that in case of refactoring you're not going to touch the table definition, which *may* end up being faster. But... if in the postgresql implementation varchar is just text with a check... how can a change in type be faster? If it was a char(N) maybe there would be some kind of optimization, since the length of the data is known in advance... so shrinking/enlarging a char(N) may have a different cost than shrinking a varchar(N), which in pg *should* have the same implementation as text. On the other hand... you're preferring text just because the two have the same implementation (modulo the check) in Postgresql... but it may not be so in other DBs. So *maybe* other DBs do some optimization on varchar vs. text. Somehow I like the idea of considering a varchar a text with a check, but I think I prefer the "more traditional" approach since it is somehow the "most expected". Nothing can handle strings of infinite length, and long before reaching infinity I'll get in trouble. People read varchar(N) in different ways. Most people read: 1) we expect a length around N. Fewer people read: 2) there is something not working if we get something larger than N. But it may also mean: 3) if we get something larger than N, something is going to explode. I think the same "ambiguity" is carried by check(). Anyway, for a sufficiently large N, 2) and 3) can be valid. 
Supposing the cost of losing an insert for an unpredicted large value of N is high, I'd be tempted to set N to at least protect me from 3), but I bet people may interpret it as 1). In my experience anyway varchar is a good early warning for troubles, and the risk of being misunderstood or getting caught by an implementation-dependent gotcha when writing varchar(N) where N means 3) largely outweighs the risk of losing an insert you didn't have to lose. Maybe I've spotted a potential advantage of check over varchar. If you use some kind of convention to name checks you can remove/re-apply them more easily than spotting varchar()s. The name of the constraint may contain metadata to help you. The name of the constraint may also suggest to your colleagues why it's there. But this works only if your implementation performs similarly for text and varchar(). -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
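A sketch of the naming idea above (the convention and names are made up for illustration): a check constraint that carries its limit in its name can be dropped and re-created without touching the column type.

```sql
-- Hypothetical naming convention: ck_<table>_<column>_maxlen_<N>.
CREATE TABLE foo (
    txt text,
    CONSTRAINT ck_foo_txt_maxlen_1000 CHECK (char_length(txt) <= 1000)
);

-- Relaxing the limit later only touches the constraint, not the column:
ALTER TABLE foo DROP CONSTRAINT ck_foo_txt_maxlen_1000;
ALTER TABLE foo ADD CONSTRAINT ck_foo_txt_maxlen_4000
    CHECK (char_length(txt) <= 4000);
```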
[GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance
On Thu, 23 Apr 2009 11:00:59 +0200 Karsten Hilbert wrote: > On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote: > > > I have a set of dynamically composed objects represented in > > Java, with string values for various attributes, which have > > variable length. In case you have suggestions for a better type > > for this case, it would be my pleasure to hear about them. > > Seref, he's suggesting you use TEXT instead of > VARCHAR(something). In PG it's actually usually *less* > overhead to use the unbounded text datatype (no length check > required). > > Length checks mandated by business logic can be added by > more dynamic means -- check constraints, triggers, etc which > allow for less invasive change if needed. Could you point us to some example of a constraint/trigger (etc...) that is going to provide the same checking as varchar, and explain (if the code/example... doesn't make it clear) why it should be faster or less invasive? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] backup getting larger and larger
On Tue, 14 Apr 2009 13:26:24 -0700 Steve Crawford wrote: > Ivan Sergio Borgonovo wrote: > > I still have to investigate if the tables are getting really > > larger... but at a first guess there shouldn't be any good > > reason to see tables getting so large so fast... so I was > > wondering if anything could contribute to make a backup much > > larger than it was other than table containing more records? > > > > The only thing that should have been really changed is the > > number of concurrent connections during a backup. > > > Can we assume that by backup you mean pg_dump/pg_dumpall? If so, > then the change is likely due to increasing data in the database. > I have a daily report that emails me a crude but useful estimate > of table utilization based on this query: > > select > relname as table, > to_char(8*relpages, '999,999,999') as "size (kB)", > (100.0*relpages/(select sum(relpages) from pg_class where > relkind='r'))::numeric(4,1) as percent > from > pg_class > where > relkind = 'r' > order by > relpages desc > limit 20; Thanks, very useful. May I place it on my site as a reference, giving credit of course? Still puzzled... The first and second largest tables make up 70% of the overall DB size (1st 53%, 2nd 16.1%). The second one has very few, small fields but ~2x the number of records of the first. Comparatively, a row of the first one is at least 10x larger than a row in the second one. The first has 1M records. All the other tables larger than 1% have roughly the same number of records as the first one. The number of records in the first table grew by less than 10%, but the backup grew by roughly 80% in size. Maybe it is due to compression: the table that grew the most can't be compressed too well. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
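A hedged variant of the quoted query using the size functions available in that era's PostgreSQL: `pg_total_relation_size()` also counts TOAST and index pages, which can explain a dump growing much faster than the plain heap page count suggests (TOAST data is stored compressed, but dumps out uncompressed).

```sql
-- pg_total_relation_size() includes indexes and TOAST data.
SELECT relname AS table,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 20;
```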
[GENERAL] backup getting larger and larger
I still have to investigate if the tables are getting really larger... but at a first guess there shouldn't be any good reason to see tables getting so large so fast... so I was wondering if anything could contribute to make a backup much larger than it was, other than tables containing more records? The only thing that should have really changed is the number of concurrent connections during a backup. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] "PostgreSQL in the cloud" now up on media.postgresql.org
On Mon, 13 Apr 2009 14:36:49 -0700 Steve Crawford wrote: > Scott Marlowe wrote: > >>> http://media.postgresql.org/sfpug/sfpug-cloud-20090408.mov > >>> Thanks to Josh Berkus for his organizational talents, and Dirk > >>> Jagdmann for running the live streaming during the event. > >> wget says it is a 1Gb file. > >> Is there anything smaller? > > If you'd like I can dl and rencode it quite a bit smaller I'm > > sure. It'll take several hours to dl, process, then upload, but > > that's mostly machine time, right? :) I was just thinking about a public "official" place for everybody. I'm going to sleep so I'm not going to miss the bandwidth required for downloading the full stuff ;) but still, it will then have to find its way back to a public place. > The people doing the actual work will correct me if I'm wrong, but > I believe it will be up on vimeo.com fairly soon as last month's > is. Can Vimeo movies be downloaded? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] "PostgreSQL in the cloud" now up on media.postgresql.org
On Mon, 13 Apr 2009 13:11:23 -0700 Christophe wrote: > http://media.postgresql.org/sfpug/sfpug-cloud-20090408.mov > > Thanks to Josh Berkus for his organizational talents, and Dirk > Jagdmann for running the live streaming during the event. wget says it is a 1Gb file. Is there anything smaller? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] database corruption
On Wed, 8 Apr 2009 22:14:38 -0400 "Jeff Brenton" wrote: > > There are no filesystem level content size restrictions that I am > aware of on this system. The user pgsql should have full access > to the filesystems indicated except for the root filesystem. Run out of inodes? A lot of small files (even empty) somewhere? It happened to me when I was running a spider using curl and the spiders were not exiting properly when another process was killing them... -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
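A quick way to check the inode theory (the directory path is just an example):

```shell
# A filesystem can have free blocks but no free inodes; "df -i" shows
# inode usage per filesystem.
df -i
# Count entries in a suspect directory without stat-ing them all:
ls -f /tmp | wc -l
```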
Re: [GENERAL] recovery after segmentation fault
On Wed, 8 Apr 2009 23:59:43 +0200 Martijn van Oosterhout wrote: > What might be easier is turning on core dumps (ulimit -S -c > unlimited) and then start postgres and see if it drops a core thanks. > > Is there a way to just kill recovery for one DB? Just don't > > start it at all? > Unfortunately, the XLOG is shared between all databases in one > cluster. bwaaa. That's a bit of a pain. I'm trying to understand this a bit better... I think nothing terrible really happened since: a) the DB that has the higher write load was actually the one that caused the problem, and I restored it from a backup. b) the other DBs have some writes too... but the software using them doesn't have any idea about transactions, so it is built with atomic statements in mind... No operation I can think of was writing to more than one table, and I'd think most (all?) of the operations were atomic at the statement level. So if I lost some writes in the logs for the other DBs... that shouldn't be a problem, right? I just lost some data... not coherency? right? > > This is the same DB having problem with recreation of gin index > > BTW... and I've the feeling that the problem is related to that > > index once more... I was vacuuming full, I aborted... > > I think the DB is trying to recreate the index but due to some > > problem (can I say bug or is it too early?) it segfaults. > Interesting, hope you can get a good backtrace. I backed up the whole data dir. I'm currently transferring it to my dev box. I already have the same DB there... but it is on lenny. And it never gave me a problem. Versions are slightly different anyway: Version: 8.3.6-1 (working) Version: 8.3.4-1~bpo40+1 (sometimes problematic[1]) 8.4 is at the door... and the only choices I have to fix the problem on that box are: - upgrade to lenny - build postgresql from source, which is going to be a maintenance pain. Could anything related to vacuum and/or the gin index have been fixed between 8.3.4 and 8.3.6? 
I think that if I stick to some rituals I can live with it: avoid vacuum full when there is load, and restart the server before doing it. [1] slow vacuum full and gin index update -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Are there performance advantages in storing bulky field in separate table?
On Wed, 8 Apr 2009 17:39:02 +0100 Sam Mason wrote: > On Wed, Apr 08, 2009 at 04:56:35PM +0100, Ian Mayo wrote: > > One more thing: hey, did you hear? I just got some advice from > > Tom Lane! > > Statistically speaking; he's the person most likely to answer you > by quite a long way. Out of the ~24k emails going back to Oct > 2007 I've got from pgsql-general the most common people who wrote > them are:
> who                     num mails   % of total
> Tom Lane                    1,935         8.0%
> Scott Marlowe               1,077         4.5%
> Alvaro Herrera                521         2.2%
> Joshua Drake                  468         1.9%
> Richard Huxton                432         1.8%
> Craig Ringer                  338         1.4%
> Ivan Sergio Borgonovo         314         1.3%
I just wrote privately to Tom that I'm ashamed I ask so much and answer so few. But well, I'm an exception ;) I'm the top of non-contributors. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] recovery after segmentation fault
On Wed, 08 Apr 2009 10:59:54 -0400 Tom Lane wrote: > Ivan Sergio Borgonovo writes: > > 2009-04-08 16:36:53 CEST LOG: startup process (PID 3176) was > > terminated by signal 11: Segmentation fault 2009-04-08 16:36:53 > > CEST LOG: aborting startup due to startup process failure > > Hmm, what Postgres version is this? Can you get a stack trace from > the startup process crash? How on Debian? Debian does all its automagic stuff in init. I never learned how to start pg manually. > The only simple way out of this is to delete the presumably-corrupt > WAL log by running pg_resetxlog. That will destroy the evidence I couldn't find it... mmm what a strange place for an executable: /usr/lib/postgresql/8.3/bin/pg_resetxlog > about what went wrong, though, so if you'd like to contribute to > preventing such problems in future you need to save a copy of > everything beforehand (eg, tar up all of $PGDATA). Also you might > have a corrupt database afterwards :-( What if I just don't care about recovering *one* DB (which is maybe the culprit), let the server restart, and then restore from a VERY recent backup? Is there a way to just kill recovery for one DB? Just not start it at all? This is the same DB that had problems with recreation of a gin index, BTW... and I have the feeling that the problem is related to that index once more... I was vacuuming full, I aborted... I think the DB is trying to recreate the index but due to some problem (can I say bug or is it too early?) it segfaults. 
I think this could be of some help: 2009-04-08 16:47:13 CEST LOG: database system was not properly shut down; automatic recovery in progress 2009-04-08 16:47:13 CEST LOG: redo starts at 72/9200EBC8 BTW: Linux amd64, debian stock kernel Debian etch/backport: Version: 8.3.4-1~bpo40+1 Now let's learn how to use pg_resetxlog thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] recovery after segmentation fault
postgresql suddenly died... during recovery 2009-04-08 16:35:34 CEST FATAL: the database system is starting up ^^^ several 2009-04-08 16:35:34 CEST LOG: incomplete startup packet 2009-04-08 16:36:53 CEST FATAL: the database system is starting up 2009-04-08 16:36:53 CEST LOG: startup process (PID 3176) was terminated by signal 11: Segmentation fault 2009-04-08 16:36:53 CEST LOG: aborting startup due to startup process failure It could be something wrong with the recovery process in an aborted transaction that is causing the segfault... How can I resurrect the server and load a backup? It was serving more than one DB and I assume that only one is causing problems. Can I skip just that one from recovery and start from backup? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] media server (mp3 player) on postgresql
This is a fancy request... but maybe among so many postgresql aficionados there is someone who knows if there is any "media server/player" that supports postgresql. Once upon a time amarok supported postgres to store and search metadata on songs. It did some pretty tricks with id3 data, postgresql and the file system. That was pretty comfortable if you shared your media files across your home and you also wanted to share "scores" among your family. Now amarok has dropped all DB support with the exception of mysql, and most of my lan services are based on postgresql. KDE4.X.X just landed in sid, so the old amarok that supported postgresql is going to be wiped out by the new one... I was wondering if there is something around that could be divided into 3 functional units: - a DB and file server used to search and "serve" media - a client (even web) to search and select media and decide on which box to play it - several "servers" that will actually play the media thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] reducing IO and memory usage: sending the content of a table to multiple files
On Fri, 3 Apr 2009 02:05:19 +0100 Sam Mason wrote: > On Thu, Apr 02, 2009 at 09:48:33PM +0200, Ivan Sergio Borgonovo > wrote: > > I didn't find any elegant example of cursor use in PHP... OK PHP > > is not the most elegant language around... but still any good > > example someone could point me at? > You can obviously increase the "FETCH" upwards and if you're > feeling fancy you could even run the FETCH async from the code > that processes the results. Maybe something like: What kind of advantage should I get running asynchronously? Oh, I didn't mean you were suggesting any advantage... just wondering. It could be an option if, once everything is up, I want to keep under control the resources sucked by this process (?). > > So I think the largest cost of the operation will be IO. > > \copy should be optimised for "raw" data output, but maybe all > > its advantages get lost once I've to use pipes and adding > > complexity to filtering. > Streaming IO is pretty fast, I think you'll be hard pushed to keep > up with it from PHP and you'll end up CPU bound in no time. Be > interesting to find out though. Filtering is currently very simple... I'm building very simple xml, just queueing constant strings and what comes out of the DB. But if I had to parse a CSV (split), or just assign names to columns (and I expect this stuff is going to be adjusted frequently), or use regexps... I was expecting to waste more human cycles or CPU cycles than I'd save by relying on the optimised IO of \copy (if any). Most of the operations end up being: $output=SOMEHEADER; gzwrite($f1); gzwrite($f2); gzwrite($f3); gzwrite($f4); while(...) { $record1='.SOMECONST1.$row['col4'].'.$row['col3'].''; $record2='.SOMECONST2.$row['col4'].'.$row['col3'].''; gzwrite($f1); gzwrite($f2); gzwrite($f3); gzwrite($f4); } $output=SOMEFOOTER; gzwrite($f1); gzwrite($f2); gzwrite($f3); gzwrite($f4); I have the largest table of my DB to be sliced into multiple xml files that have to be written to disk. 
So actually it is 1) reading the whole table and returning 30% of its fields, 2) writing all this data multiple times. This is by far the largest write load the server is going to incur in a day. But it may well be the largest CPU load it is going to incur in a day too, considering I have to gzip all the files. Still, I think I've read on this list that compression was going to be more of a bottleneck than IO. I just did a preliminary test, and xml-ing and gzipping 80K records out of 1M takes less than 2sec. So maybe I was overly concerned. Anyway I'd like to understand a bit better how IO and memory consumption are managed once you have cursors vs. plain selects and client drivers in the middle. > > I was reading about all the php documents and trying to > > understand how buffers and memory usage works, so I gave a look > > to MySQL documents too... > > Not sure about PG, but the C api pretty much always buffers > everything in memory first. There was mention of getting control > of this, but I've got no idea where it got. buffer *everything*? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] reducing IO and memory usage: sending the content of a table to multiple files
On Thu, 2 Apr 2009 17:27:55 +0100 Sam Mason wrote: > On Thu, Apr 02, 2009 at 11:20:02AM +0200, Ivan Sergio Borgonovo > wrote: > > This is the work-flow I've in mind: > > > > 1a) take out *all* data from a table in chunks (M record for each > > file, one big file?) (\copy??, from inside a scripting language?) > > What about using cursors here? The only way I've seen for using cursors with php involves defining a function... It looks a bit messy for a 10-line script to have to define a function just as a shell for a sql statement. I'd even write it in python if the project didn't start to look like a small Frankenstein... and sooner or later I bet I'll have to include some php files to recycle some functions. I didn't find any elegant example of cursor use in PHP... OK, PHP is not the most elegant language around... but still, any good example someone could point me at? If you could point me to some clean way to use cursors in php I'd be happy to learn. I was thinking about using another embedded language that better suits data processing (an unsafe version?) so I could directly output to files from within a postgresql function... > > 2a) process each file with awk to produce N files very similar > > each other (substantially turn them into very simple xml) > > 3a) gzip them > GZIP uses significant CPU time; there are various lighter weight > schemes available that may be better depending on where this data > is going. That's a requirement. > > 2b) use any scripting language to process and gzip them avoiding > > a bit of disk IO > What disk IO are you trying to save and why? Because this is going to be the largest write operation the whole system will have to handle during the day. I'm not interested in fast complicated queries, planning, transactions, caching... I just need to get a whole table, pass it through a filter and output several filtered "versions" of the same table. So I think the largest cost of the operation will be IO. 
\copy should be optimised for "raw" data output, but maybe all its advantages get lost once I have to use pipes, adding complexity to filtering. > > Does PostgreSQL offer me any contrib, module, technique... to > > save some IO (and maybe disk space for temporary results?). > > > > Are there any memory usage implication if I'm doing a: > > pg_query("select a,b,c from verylargetable; --no where clause"); > > vs. > > the \copy equivalent > > any way to avoid them? > > As far as I understand it will get all the data from the database > into memory first and then your code gets a chance. For large > datasets this obviously doesn't work well. CURSORs are your friend > here. I was reading all the php documents and trying to understand how buffers and memory usage work, so I gave a look at the MySQL documents too... MySQL has mysql_unbuffered_query. So I was wondering how memory is managed on the server and on clients. What's going to happen when I do a $result=pg_query("select * from t1;"); while($row=pg_fetch_array($result)) { } vs. using cursors... vs. asynchronous queries (they just look like non-blocking queries with no relationship to memory usage) Where are the buffers etc... thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
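Whatever the client language, the server side of the cursor approach suggested in the quoted reply looks the same; a sketch using the t1 table from the example above:

```sql
BEGIN;
-- The cursor keeps the result set on the server; the client pulls it
-- in chunks instead of buffering the whole table in memory at once.
DECLARE t1_cur NO SCROLL CURSOR FOR SELECT * FROM t1;
FETCH FORWARD 1000 FROM t1_cur;  -- repeat until no rows come back
CLOSE t1_cur;
COMMIT;
```

From PHP, each chunk would just be a pg_query("FETCH FORWARD 1000 FROM t1_cur") inside a loop over pg_fetch_array(); no function definition is needed.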
[GENERAL] reducing IO and memory usage: sending the content of a table to multiple files
This is the work-flow I've in mind: 1a) take out *all* data from a table in chunks (M records per file, one big file?) (\copy??, from inside a scripting language?) 2a) process each file with awk to produce N files very similar to each other (substantially turn them into very simple xml) 3a) gzip them 2b) use any scripting language to process and gzip them, avoiding a bit of disk IO Does PostgreSQL offer me any contrib, module, technique... to save some IO (and maybe disk space for temporary results?). Are there any memory usage implications if I'm doing a: pg_query("select a,b,c from verylargetable; --no where clause"); vs. the \copy equivalent any way to avoid them? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Question about hosting and server grade
On Wed, 25 Mar 2009 13:19:12 -0600 Scott Marlowe wrote: > Spend your money on more RAM, (32G isn't much more than 16G and > I've seen it make a world of difference on our servers). Spend it > on disks. Number of disks is often more important than RPM etc. > Spend it on fast RAID controllers with battery backed cache. > Then, consider upgrading your CPUs. We have 8 opteron cores in > our servers, and 12 Disk RAID-10s under a very fast RAID > controller, and we are still I/O not CPU bound. [snip] > But all of this depends on the type of workload your db has to > do. If you're running memory hungry select queries, focus on more > memory. If you're running lots and lots of little queries with a > mix of update, insert, delete and select, focus on the drives / > controller. If you're running queries that require a lot of CPU, > then focus more on that. Could IO load show up as apparent CPU load? I mean, I have a pretty small DB. It should fit nearly all in RAM... or at least... after 1 day of load I can see the box may use 50K of swap. Anyway, when I update the main table (~1M rows and a gin index) I can see the CPU reaching its limit. The most frequent updates involve 5K-20K changed records. Under normal workload the most intensive queries run in 200ms with few exceptions, and the BIG table is mostly read-only. It would be nice if the updates were a bit faster, since I'm still forced to do them during working hours... because people on the other side are still convinced it is not worth cleaning rubbish at the source, so sometimes updates fail on inconsistent data. Unfortunately... I can add RAM and disks, but all the CPU sockets are used. The box has 2 old Xeon HT at 3.2GHz. It's on RAID5 (not my choice) on a decent controller and has 4GB of RAM. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] text column constraint, newbie question
On Mon, 23 Mar 2009 14:11:28 -0700 (PDT) RebeccaJ wrote: > now. Before, I was planning to have CHECK constraints in all of my > text or char fields, to keep out all semicolons, single quotes, and > anything else that looked dangerous. Now I'm thinking that I'll be > using htmlentities(), pg_escape_string() and pg_query_params() as CHECK constraints, htmlentities(), pg_escape_string() and pg_query_params() really don't belong to the same family of "functions" and serve very different purposes. Simplifying very much: - CHECK constraints are used to control the quality of the data that gets stored in the db - htmlentities() is about formatting for web output - pg_escape_string() prepares input for sql, avoiding sql injection - pg_query_params() is a relative of pg_escape_string() but is used somewhat differently -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] text column constraint, newbie question
On Mon, 23 Mar 2009 03:30:09 -0600 Scott Marlowe wrote: > > I think pg_query_params should make a difference between floats > > and integers and signal an error if you pass float where > > integers are expected... but I'm not sure. > > Not really a security concern, but an early warning for some > > mistake. > So, what are the performance implications? Do both methods get > planned / perform the same on the db side? I don't think there is any appreciable advantage. Maybe all the printf-style stuff performs better in C than building up a string by concatenating and escaping in php. Still, I wouldn't consider it a primary source of slowdown. To make a difference in plan management you have to use another family of functions, pg_prepare/pg_execute. I'm not an expert, but caching plans is not always what you want to do. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
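A sketch of the prepared-statement variant mentioned above (the connection string, statement name and table are made up for illustration):

```php
<?php
// pg_prepare()/pg_execute() plan the statement once under a name and
// reuse that plan; pg_query_params() also sends parameters separately
// from the SQL, but plans on each call.
$db = pg_connect('dbname=test');
pg_prepare($db, 'get_item', 'SELECT * FROM items WHERE id = $1');
// Reuse the named plan with different parameters:
$res1 = pg_execute($db, 'get_item', array(1));
$res2 = pg_execute($db, 'get_item', array(2));
?>
```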
Re: [GENERAL] text column constraint, newbie question
On Mon, 23 Mar 2009 01:07:18 -0600 Scott Marlowe wrote: > On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook > wrote: > > You should use pg_query_params() rather than build a SQL > > statement in your code, to prevent SQL injection attacks. Also, > > if you are going to read this data back out and show it on a web > > page you probably should make sure there is no rogue HTML or > > JavaScript or anything in there with htmlentities() or somesuch. > > Are you saying pg_query_params is MORE effective than > pg_escape_string at deflecting SQL injection attacks? I didn't follow the thread from the beginning, but I'd say yes. It should avoid queueing multiple statements, and it is a more "general" method that lets you pass parameters in one shot instead of building the string a bit at a time for every parameter you insert (string, float, integer...). Of course if you correctly escape/cast/whatever everything, injecting 2 statements shouldn't be possible... but if you don't, you give more freedom to the attacker. $sql='select * from '.$table.' where a=$1 and $b=$2'; //oops I made a mistake. $result=db_query_params($sql,array(1,'bonzo')); If $table is external input and an attacker passes existingtable; delete from othertable; -- the attack may just result in a DOS if existingtable is there, but your othertable shouldn't be wiped. Untested... but I recall pg_query and pg_query_params use different C calls, PQexec vs. PQexecParams, and the latter: "Unlike PQexec, PQexecParams allows at most one SQL command in the given string." http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html I think pg_query_params should make a difference between floats and integers and signal an error if you pass a float where integers are expected... but I'm not sure. Not really a security concern, but an early warning for some mistake. 
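Since $1 placeholders only work for values, not identifiers, the $table part of the example above has to be handled separately; a hedged sketch (table names made up) that whitelists the identifier and leaves the values to pg_query_params():

```php
<?php
// Identifiers can't be parameterized, so whitelist the table name;
// the values still go through pg_query_params() as $1/$2.
$allowed = array('existingtable', 'othertable');
if (!in_array($table, $allowed, true)) {
    die('unknown table');
}
$sql = 'SELECT * FROM ' . $table . ' WHERE a = $1 AND b = $2';
$result = pg_query_params($db, $sql, array(1, 'bonzo'));
?>
```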
-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] bash & postgres
On Mon, 23 Mar 2009 15:03:15 +1100 Greenhorn wrote:
> Hi,
>
> I'm trying to pass variables on a bash script embedded with psql
> commands.
>
> cat header.txt
>
> "to1","from1","subject1"
> "to2","from2","subject2"
> "to3","from3","subject3"
> "to4","from4","subject4"
>
> cat b.sh
>
> #!/bin/bash
> two="2"
>
> psql -h localhost -U postgres -d mobile -c "create temp table
> header (

I enjoy another technique that's not exactly embedding, but it makes the SQL file easily executable from other shells too, and easier to maintain (e.g. you don't lose syntax highlighting, you don't need to make wide use of the x bit, you can concatenate files...).

echo "select :a;" | psql --set a=3 test

 ?column?
----------
        3
(1 row)

Of course, instead of piping your SQL, you could put it in a file.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
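[A minimal sketch of that file-based variant — the file name and the "mobile" database invocation are illustrative, and the psql call itself is shown commented out since it needs a running server:]

```shell
# keep the SQL in its own file; shell values enter through psql
# variables instead of string interpolation into the script
cat > demo.sql <<'SQL'
-- :a is supplied from the command line with --set
select :a as answer;
SQL

# the call would then be (needs a reachable server):
# psql -h localhost -U postgres -d mobile --set a=3 -f demo.sql
cat demo.sql
```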
Re: [GENERAL] Is PGSQL enough safe for internet?
On Wed, 18 Mar 2009 09:32:56 +0100 durumdara wrote:
> Possibly he can install another pgsql service that can be
> opened to the net and that is usable for ONLY THIS PROJECT. But I
> don't know whether that is possible or not, and how to upgrade
> later, if needed... :-(

If you can't, or don't want to, use another DB (this depends on how much separation/security you really think is needed...), you can configure PostgreSQL to respond only to certain IPs, over SSL, for access to just certain databases, even on a different port.

http://developer.postgresql.org/pgdocs/postgres/auth-pg-hba-conf.html
http://www.postgresql.org/docs/8.3/interactive/client-authentication.html

> Can I use some functions in PGSQL to make some protecting rules?
> Like in Mail Servers: if the IP is same and pwd is wrong in X
> times, the IP blocked for N hours...

I wouldn't recommend this approach: someone may just lock you out of your own server. Unless you're accessing the server from a static IP and you can use whitelisting.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
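[For illustration, a hypothetical pair of pg_hba.conf entries in that spirit — the address, database, and role names are made up: the project database is reachable from one address only, over SSL only, and everything else from the network is rejected.]

```
# hypothetical pg_hba.conf entries (order matters: first match wins)
# TYPE     DATABASE   USER         ADDRESS           METHOD
hostssl    projectdb  projectuser  203.0.113.10/32   md5
host       all        all          0.0.0.0/0         reject
```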
[GENERAL] schema, view and role
I've this view:

create or replace view catalog_promosimple_v as
  select p.PromoSimpleID, p.IsPromo, p.Percent, p.OnListPrice,
         p.Vendible, p.OnStock, p.Dist, p.PromoStart, p.PromoEnd,
         pi.ItemID, pi.Discount
    from catalog_promosimple p
    join catalog_promosimpleitem pi
      on pi.PromoSimpleID = p.PromoSimpleID
   where p.PromoStart <= now() and p.PromoEnd >= now()
     and p.IsPromo = true;

and I have 3 roles: one is just a group, and the other two are actual users, each with a corresponding schema. The tables mentioned in the view actually belong to the users' schemas; they don't exist in public. The owner of everything is the group role.

group_role (no schema)
user1_role (member of group_role) -> user1_role schema
user2_role (member of group_role) -> user2_role schema

Tables in the user1_role schema contain some records. Tables in the user2_role schema contain no records.

I create the above view connected as group_role. I get no error message. (?)

If I select * from catalog_promosimple_v I get no error and no records, no matter which user I'm logged in as. If I create the view as one of the userN_role, I get the expected result.

Why doesn't creating the view as group_role report any error? After all, group_role shouldn't have the userN_role schemas in its search path... and in fact it just has:

show search_path;
 search_path
-------------
 "$user",public

If the view is just an alias for the SQL as "text" and it is not interpreted at creation time, then once I log in as one of the userN_role I should see the correct result. If it gets interpreted when I create the view, it should report an error, since those tables don't exist in the public or group_role schema.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
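[For reference, a sketch of the general rule, with made-up table and schema names: a view's table references are resolved against the creator's search_path once, at CREATE time, and stored by internal identity — they are not re-resolved per caller, so whichever tables the creating session could see are the ones the view reads from then on.]

```sql
-- sketch: the view binds to concrete tables at creation time
set search_path = user1_role, public;
create table user1_role.t (x int);
create view public.v as select * from t;  -- bound to user1_role.t

set search_path = user2_role, public;
select * from v;  -- still reads user1_role.t, not user2_role.t
```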