Re: [GENERAL] pg_dump excluding tables content but not table schema

2009-12-28 Thread Ivan Sergio Borgonovo
On Mon, 28 Dec 2009 19:39:36 +
Raymond O'Donnell  wrote:

> On 28/12/2009 18:41, Ivan Sergio Borgonovo wrote:
> 
> > I'd like to just dump the table schema without dumping the table
> > content.

> pg_dump -s -t  

My fault. I was not clear enough.
I'd like to make a "mostly" full backup, excluding just the data of
some tables while still backing up their schema.

Hmm, let me try whether this works:

pg_dump -Fc -Z9 -s -t *.cache* -d mydb > schema_only.bak
pg_dump -Fc -Z9 -T *.cache* -d mydb > nearly_full.bak

cat nearly_full.bak schema_only.bak | pg_restore -1 -d mydb

It seems to be working... I'll check whether everything is there.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_dump excluding tables content but not table schema

2009-12-28 Thread Ivan Sergio Borgonovo
I've some tables that are just cache.

I'd like to just dump the table schema without dumping the table
content.

I think I could do it in 2 steps but I'd like to avoid it.

Is there a way?

thanks


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] flagging first row inserted for each "group" of key

2009-12-17 Thread Ivan Sergio Borgonovo
On Thu, 17 Dec 2009 10:38:32 +0100
"A. Kretschmer"  wrote:

> In response to Ivan Sergio Borgonovo :
> > I've a web application.
> > 
> > I'm logging data related to the landing page and subsequent
> > selected hits.
> > 
> > create table track_hit (
> >   hitid serial not null,
> >   /* pk? I don't mind if serial wrap around
> >   pk could be (hitid, tracking_time) */
> >   esid varchar(32), -- related to session
> >   tracking_time timestamp not null default now(),
> >   -- some other stats
> >   -- first_hit boolean not null default false, -- ???
> > );
> >   
> > I'd like to be sure I just count one "first hit" in a session
> > (same esid) in an efficient way that means I'd like to mark them
> > with a
> 
> select distinct on (esid) esid, tracking_time from track_hit order
> by esid, tracking_time;
> 
> returns only one record for each esid, ordered by tracking_time.
> Should work with 8.x, maybe since 7.x (I'm not sure)

I think I've been trying to solve a concurrency problem in the wrong
place... still, what is the difference between:

select min(hitid) as h from track_hit group by esid;

and

select distinct on (esid) hitid from track_hit order by esid,
tracking_time;

I don't have enough test data to see whether they perform
differently. The second form seems to be a little faster; I'd have
expected the opposite, with the first performing better.
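For what it's worth, once there's enough data I'll compare the plans
with something like this (nothing fancy, just the two candidates):

EXPLAIN ANALYZE
  SELECT min(hitid) AS h FROM track_hit GROUP BY esid;

EXPLAIN ANALYZE
  SELECT DISTINCT ON (esid) hitid FROM track_hit
  ORDER BY esid, tracking_time;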

I think I'll add an index on tracking_time for reporting and maybe
make the primary key (hitid, tracking_time).
I don't want to be bothered by hitid wrap-around, so I don't want to
make it a pk on its own, but I may still need a pk.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] flagging first row inserted for each "group" of key

2009-12-17 Thread Ivan Sergio Borgonovo
I've a web application.

I'm logging data related to the landing page and subsequent selected
hits.

create table track_hit (
  hitid serial not null,
  /* pk? I don't mind if serial wrap around
  pk could be (hitid, tracking_time) */
  esid varchar(32), -- related to session
  tracking_time timestamp not null default now(),
  -- some other stats
  -- first_hit boolean not null default false, -- ???
);
  
I'd like to be sure I count just one "first hit" per session (same
esid) in an efficient way; that is, I'd like to mark those rows with
a flag and avoid a group by / min subquery pass when I'm doing
reporting.
I can't trust the browser, and I want exactly one first hit for each
esid, even if rows share the same tracking_time.

Of course the problem is concurrency, but maybe I've missed some pg
trick that could help me.

I'm on 8.3 with no plan to move to 8.4 shortly, so no window
functions to make reporting easier/faster.
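If I do add the commented-out first_hit column, this is the kind of
thing I'm considering (untested sketch):

-- one flagged row per esid, enforced by a partial unique index
create unique index track_hit_first_hit_idx
  on track_hit (esid) where first_hit;

-- flag the hit as "first" only if no flagged row exists yet;
-- under concurrency two sessions may both compute true, the index
-- then rejects the second insert and the application retries it
-- with first_hit = false
insert into track_hit (esid, first_hit)
select 'some-session-id',
       not exists (select 1 from track_hit
                   where esid = 'some-session-id' and first_hit);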

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [Bacula-users] [GENERAL] Catastrophic changes to PostgreSQL 8.4

2009-12-03 Thread Ivan Sergio Borgonovo
On Thu, 3 Dec 2009 12:22:50 +0100 (CET)
"Kern Sibbald"  wrote:

> Yes, that is my experience too.  I understand Craig's comments,
> but I would much prefer that Bacula just backup and restore and
> leave the checking of filename consistencies to other programs.
> At least for the moment, that seems to work quite well.  Obviously
> if users mix character sets, sometime display of filenames in
> Bacula will be weird, but nevertheless Bacula will backup and
> restore them so that what was on the system before the backup is
> what is restored.

I expect backup software to have predictable, reversible behaviour
and to warn me if I'm shooting myself in the foot.

It should be the admin's responsibility to restore files to a proper
place, knowing that locales may be a problem.

I think Bacula is taking the right approach.

Still, I'd surely appreciate, as a feature, a "tool" that would help
me restore files on a system with a different locale than the
original one, or warn me if the locale is different or it can't be
sure it is the same.
That's exactly what PostgreSQL does: at least it warns you.
PostgreSQL is taking the right approach too.

An additional "guessed original locale" field and a tool/option to
convert/restore with selected locale could be an interesting feature.

What is Bacula going to do with xattr on different systems?

PostgreSQL seems to offer a good choice of tools to convert between
encodings and to deal with bytea.
Formally I'd prefer bytea, but in real use it may just be an
additional pain, and other DBs may not offer the same tools for
encoding/bytea conversions.
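For example (just a sketch of the built-ins I have in mind, nothing
Bacula-specific):

-- text in the database encoding -> bytea in another encoding
select convert_to('filename', 'LATIN1');
-- and back again
select convert_from(convert_to('filename', 'LATIN1'), 'LATIN1');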

Is it possible to search for a file in a backup set?
What is going to happen if I'm searching from a system that has a
different locale from the one the backup was made on?
Can I use regexp? Can accents be ignored during searches?


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] duplicating a schema

2009-12-01 Thread Ivan Sergio Borgonovo
On Tue, 1 Dec 2009 09:53:12 +0100
Ivan Sergio Borgonovo  wrote:

> On Tue, 1 Dec 2009 11:39:06 +0900
> Schwaighofer Clemens  wrote:
> 
> > On Sun, Nov 29, 2009 at 21:21, Ivan Sergio Borgonovo
> >  wrote:
> > > I need to create a new schema with all the content in an
> > > existing one, just with a new name.
> 
> > > The way I've found is:
> > > - make a backup
> > > - load it in a dev box
> > > - rename the schema
> > > - make a backup of the new schema
> > > - restore the new schema on the original DB.
> 
> > > Is there a more efficient approach?
> 
> > Sadly no. With smaller DBs I do a sed on the dump ... wished
> > there would be a restore with not only a target DB but also a
> > target schema.

> I thought about sed, but I think PostgreSQL parses SQL better than
> me and sed together.

semi-tested solution:

pg_dump -d mydb -Fp --schema=XXX > mydb.bak

(echo 'begin;'; \
 echo 'alter schema XXX rename to YYY;'; \
 echo 'create schema XXX authorization AAA;'; \
 cat mydb.bak; \
 echo 'commit;') | psql -d mydb

Some further notes and a script are here:

http://www.webthatworks.it/d1/node/page/howto_duplicating_schema_postgresql

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] duplicating a schema

2009-12-01 Thread Ivan Sergio Borgonovo
On Tue, 1 Dec 2009 11:39:06 +0900
Schwaighofer Clemens  wrote:

> On Sun, Nov 29, 2009 at 21:21, Ivan Sergio Borgonovo
>  wrote:
> > I need to create a new schema with all the content in an existing
> > one, just with a new name.

> > The way I've found is:
> > - make a backup
> > - load it in a dev box
> > - rename the schema
> > - make a backup of the new schema
> > - restore the new schema on the original DB.

> > Is there a more efficient approach?

> Sadly no. With smaller DBs I do a sed on the dump ... wished there
> would be a restore with not only a target DB but also a target
> schema.

I thought about sed, but I think PostgreSQL parses SQL better than me
and sed together.
Why do you prefer sed over backup/restore on smaller DBs?

I didn't test this... but I think it could be even better, if I
didn't prefer to have a full backup before such an operation:

pg_dump -dmydb --schema=XXX -Fp > XXX.bak

begin;
alter schema XXX rename to YYY;
create schema XXX;
\i XXX.bak
commit;

This could be done with a pipe... but in case something goes wrong
I'd prefer to have the "backup" of the schema somewhere, rather than
needing to recreate it.

Renaming a schema seems pretty fast.
So, should the transaction abort, I don't think it would make any big
difference compared to renaming the schema in another DB.

Still, being able to specify a target schema on restore would make
things cleaner, faster and safer.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] duplicating a schema

2009-11-29 Thread Ivan Sergio Borgonovo
I need to create a new schema with all the content in an existing
one, just with a new name.

The way I've found is:
- make a backup
- load it in a dev box
- rename the schema
- make a backup of the new schema
- restore the new schema on the original DB.

Is there a more efficient approach?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] advocating pgsql was:passing parameters to multiple statements

2009-11-18 Thread Ivan Sergio Borgonovo
On Wed, 18 Nov 2009 14:39:05 +0100
Pavel Stehule  wrote:

> Standard is good. And I prefer standard everywhere, where is

[snip]

> Any dogmatism is wrong - yes. But minimally me - and probably David
> has very bad experience with design ala "all sql code for all
> databases". And I have good experience with different strategy -
> early decomposition and separation application and database (engine
> specific) layer. Nothing less, nothing more.

[snip]

I'm not disputing the technical side of the issue; your position is
well respected. I'd like to put the accent on the "social" part of it
and on how it falls back on our beloved DB.

I think there are far more people who know just one DB, and badly,
than companies with such strict (insane) policies, and... well,
PostgreSQL is very standards compliant.

I'd beat another horse ;)

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] passing parameters to multiple statements

2009-11-18 Thread Ivan Sergio Borgonovo
On Wed, 18 Nov 2009 11:38:46 +0100
Pavel Stehule  wrote:

> 2009/11/18 Ivan Sergio Borgonovo :
> > On Tue, 17 Nov 2009 20:16:36 -0800
> > David Fetter  wrote:
> >
> >> On Tue, Nov 17, 2009 at 09:33:05AM -0700, Konstantin Izmailov
> >> wrote:
> >> > Some companies have policy to stay DB agnostic, i.e. use
> >> > standard SQL only.
> >
> >> That's called shooting yourself in the head.
> >
> > I'm a small fish. I use just Free software and still I think that
> > departing from agnosticity has its cost even if you don't have to
> > pay license costs.
> > Especially if you did it without knowing it or with no reason.
> > Many times departing from agnostic code is caused by:
> > - lack of knowledge of standards/more than one DB
> > - early optimization
> >
> > It's just a matter of where you're going to compromise and why,
> > but you've to do it consciously.
> >
> > eg. a lot of code could run on mysql and postgresql as well at no
> > cost, but many people just ignore there is something else other
> > than mysql.
> > That's shooting yourself in the head without even knowing the
> > reason.

> Sorry, but David is right. I understand, so management is happy,

I didn't say he was wrong.

As usual it is a matter of knowledge and trade-offs.
How can you say what's better if:
- you don't know what is standard
- you don't know the performance impact of writing something in a
  dialect of SQL rather than in a standard way

It's one thing to say you accept the need to break compatibility for
some DB; it's another to say that pursuing standard code is reckless
because it makes every project too complex and bloated.

Ignorance and dogmatism are close relatives, but I'd say the former
is the root of the latter.

In fact what I generally observe is:
- we just know [this] (ignorance)
- this *looks* like it will run faster/be easier to write if we write
  it this way
- we tried it on another DB and it performed really bad/was really
  complicated to rewrite
- everything else other than [this] is bad, why should we care
  (dogmatism)

Depending on the application's domain, the DB may not be such a
critical part of the whole, and still many things can easily be
written in a way that is independent of the DB.

In my experience you may end up with 90% of the code being easy to
write in a standard way with no appreciable difference in cost
(performance/coding).

Writing code in a way that makes porting cheaper may even protect you
from changes in the DB you chose as a target.

A company policy that mandates portable SQL for every part of every
application is just as insane as a policy that mandates writing all
code for all applications in Python, in such a way that it could be
automatically translated into any language whose name starts with
p ;)

But I think that kind of policy is rarer than programmers who know
more than a couple of SQL dialects.

I don't think companies with such a high level of dogmatism can
survive long enough to get involved in anything non-trivial, while it
is far more common to see applications that don't have such high
coupling with the DB and yet are still dependent on it, merely
through lack of knowledge of SQL.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] passing parameters to multiple statements

2009-11-18 Thread Ivan Sergio Borgonovo
On Tue, 17 Nov 2009 20:16:36 -0800
David Fetter  wrote:

> On Tue, Nov 17, 2009 at 09:33:05AM -0700, Konstantin Izmailov
> wrote:
> > Some companies have policy to stay DB agnostic, i.e. use standard
> > SQL only.

> That's called shooting yourself in the head.

I'm a small fish. I use just Free software, and still I think that
departing from agnosticism has its cost even if you don't have to
pay license costs.
Especially if you did it without knowing it, or for no reason.
Many times departing from agnostic code is caused by:
- lack of knowledge of standards/more than one DB
- early optimization

It's just a matter of where you're going to compromise and why, but
you have to do it consciously.

E.g. a lot of code could run on MySQL and PostgreSQL equally well at
no cost, but many people just ignore that there is something other
than MySQL.
That's shooting yourself in the head without even knowing the reason.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] impersonating a user/ownership problems

2009-11-17 Thread Ivan Sergio Borgonovo
I have several schemas, each owned by a user.
Then there are websites sharing the same code base, using different
schemas and connecting as different users.

When I have to refactor I generally have to make the same changes to
all the schemas.

I log in as a user that has enough rights on all schemas and apply
the changes.

Since I have to make all the changes to all schemas, I can't apply
the changes to each schema over a different connection.

But this ends up requiring a lot of code for altering ownership of
objects, and e.g. turning an int into a serial becomes particularly
painful:

create sequence
alter table ... owner to
alter sequence ... owned by ...
alter table ... set default ...
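Spelled out for one schema it's something like this (schema, table,
column and role names are just made-up examples):

create sequence shop1.orders_id_seq;
alter table shop1.orders_id_seq owner to shop1_user;
alter sequence shop1.orders_id_seq owned by shop1.orders.id;
select setval('shop1.orders_id_seq',
              coalesce(max(id), 0) + 1, false) from shop1.orders;
alter table shop1.orders
  alter column id set default nextval('shop1.orders_id_seq');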

Is there a less painful approach?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] knowing which table/schema is going to be chosen

2009-11-12 Thread Ivan Sergio Borgonovo
On Thu, 12 Nov 2009 10:38:27 +
Richard Huxton  wrote:

> Ivan Sergio Borgonovo wrote:
> > I have a search_path that may not just consist of $user, public.
> > eg.
> > $user, public, test, import
> > 
> > I'd like to know which table is going to be chosen if I do a
> > select * from mytable;
> 
> > Is there a way to ask postgresql the schema of the table that
> > will be chosen?
> 
> Hmm - I don't know of a specific function. You could do something
> like this though:

> SELECT nspname FROM pg_namespace
> WHERE oid = (
>   SELECT relnamespace FROM pg_class
>   WHERE oid = 'mytable'::regclass::oid
> );

This surely meets my needs, and I'm going to put it in my toolbox.
Still... is there a way that could use information_schema?
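I'm thinking of something along these lines (untested sketch, should
be 8.3-compatible):

-- schemas from the current search_path that contain "mytable",
-- in search_path order; the first row is the one that wins
select s.nspname
from (select (current_schemas(false))[i] as nspname, i
      from generate_series(1,
             array_upper(current_schemas(false), 1)) as i) s
join information_schema.tables t
  on t.table_schema = s.nspname::text and t.table_name = 'mytable'
order by s.i
limit 1;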

My need was caused by a compromise between two immature APIs... so
I'm not surprised that the solution looks like a hack, but I was
wondering whether, in other cases, knowing in advance which table
PostgreSQL is going to pick could be a legitimate interest.

BTW I think I've spotted an error in the docs:
http://www.postgresql.org/docs/8.3/interactive/ddl-schemas.html
http://www.postgresql.org/docs/8.4/interactive/ddl-schemas.html

5.7.2. The Public Schema
 In the previous sections we created tables without specifying any
 schema names. By default, such tables (and other objects) are
 automatically put into a schema named "public". Every new database
 contains such a schema. Thus, the following are equivalent: ...

CREATE TABLE products ( ... );
 and:
CREATE TABLE public.products ( ... );

I think they are not equivalent if the search_path contains the name
of an existing schema ahead of public.

Is there anything equivalent to search_path in the SQL standard?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] knowing which table/schema is going to be chosen

2009-11-12 Thread Ivan Sergio Borgonovo
I may have several tables with the same name in different schemas.

pina.mytable, test.mytable, import.mytable

I have a search_path that may not just consist of $user, public.
eg.
$user, public, test, import

I'd like to know which table is going to be chosen if I do a
select * from mytable;

In this case test.mytable will be chosen.

Is there a way to ask PostgreSQL for the schema of the table that
will be chosen?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] xml import/export tools and performance

2009-11-05 Thread Ivan Sergio Borgonovo
I need to import/export through xml files.

I was looking for tools/examples/HOWTO for postgresql.

Right now I still don't know the details of the requirements.

I know I won't need a GUI.

I know one of the exporting parties will be an MS SQL 2005 server, so
it would be nice if there were an easy way to import XML generated
with FOR XML AUTO, XMLSCHEMA ('forpg').

I'd like to have a tool that can write XSD from queries
automatically.

I may need to strictly specify one of the XML output formats, since
one of the importing parties pretends to support XML with something
that is more like a tagged CSV.

Currently I'm mostly looking around to see where it would be
convenient to move the boundaries of the problem.

I'd be curious about what kind of performance impact they may have
compared to COPY.
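For the import side I guess 8.3's built-in XML functions could
already shred simple documents without an external tool; a rough,
untested sketch (made-up element names, and it needs a server built
with libxml):

select (xpath('/import/row/code/text()', doc))[1]::text as code,
       (xpath('/import/row/name/text()', doc))[1]::text as name
from (select
  '<import><row><code>1</code><name>a</name></row></import>'::xml
  as doc) d;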

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pk vs unique not null differences between 8.3.4 and 8.3.8

2009-11-02 Thread Ivan Sergio Borgonovo
On Mon, 02 Nov 2009 09:53:06 -0500
Tom Lane  wrote:

> Ivan Sergio Borgonovo  writes:
> > This statement succeeded in 8.3.8
> > alter table shop_commerce_gift drop constraint
> > shop_commerce_gift_pkey;
> > but failed in 8.3.4.
> 
> Uh, that doesn't seem to match the name of the index?

Yep... that was my surprise.

I don't have a time machine to be absolutely sure about what I did,
but I should have created a pk on both machines with
create table shop_commerce_gift(
  giftcode varchar primary key
...
);

but in the newer pg I actually find the pk constraint... while in the
older one I find a not null constraint plus a unique index with a
different name from the former's pk name.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pk vs unique not null differences between 8.3.4 and 8.3.8

2009-11-02 Thread Ivan Sergio Borgonovo
I prepared a script to upgrade the schema on an 8.3.8 dev box and now
I was trying to replicate the change on a staging 8.3.4 box... and it
failed.

I don't have a recent backup of my dev box so I can't check if I'm
day dreaming.

This statement succeeded in 8.3.8
alter table shop_commerce_gift drop constraint
shop_commerce_gift_pkey;

but failed in 8.3.4.

I checked the table def in the 8.3.4 and it reports:

giftcode| character varying(16) | not null
...
Indexes:
"shop_commerce_gift_giftcode_key" UNIQUE, btree (giftcode)

but then when I try to drop the index... pg says that the index is
needed for the shop_commerce_gift_giftcode_key constraint.

Was this something that was fixed between the releases, or did I just
not take note of what I really did on the staging box?

BTW I was using the same psql version from my desktop to \d the
tables.
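One way to see exactly what each box has, independent of \d's output,
would be to look at the catalogs directly; a rough sketch:

select conname, contype
from pg_constraint
where conrelid = 'shop_commerce_gift'::regclass;

select indexrelid::regclass as index_name, indisprimary, indisunique
from pg_index
where indrelid = 'shop_commerce_gift'::regclass;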

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still on joining array/inline values was and is: design, ref integrity and performance

2009-10-28 Thread Ivan Sergio Borgonovo
On Wed, 28 Oct 2009 10:12:19 -0500
Peter Hunsberger  wrote:

> > The first approach requires a distinct/group by that may be
> > expensive.
> > The second one requires I keep in memory all the emails while the
> > first statement run.

> Unless you're dealing with 100,000's of these things I think you're
> engaging in a process of "premature optimization".  Group by can
> work efficiently over millions of rows.

We may occasionally get into the range of half that number, but not
feeding emails directly from an HTTP request.
Still, the number of passwords generated in one run may be in the
range of 50K. But well, I could calmly wait 2 or 3 seconds.
In some very rough tests on a box similar to the one I'll have to
use, it takes a few milliseconds on an unindexed table.

> Do the simplest thing possible.  Get it working, then see if you
> have any new problems you need to solve.  Every issue you've
> described so far is database design 101 and should present no real
> problem.  I think you're agonizing over nothing...

That's always good advice. Sometimes you're just out for moral
support.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] still on joining array/inline values was and is: design, ref integrity and performance

2009-10-28 Thread Ivan Sergio Borgonovo
On Tue, 27 Oct 2009 10:54:06 +
Richard Huxton  wrote:

> > Association between email and password is just meant to build up
> > a queue for mailing and there is no uniqueness constraint on
> > (password, email) pair.
> 
> > create table pw_email(
> >   password varchar(16),
> >   email varchar(64)
> > );
> 
> > create table pw_resource(
> >   res int references ...
> >   password varchar(16)
> > );

> > But I've to generate password/email couples first before filling
> > pw_resource.
> 
> The simplest thing would be to do them the other way around, but
> assuming you can't...

Trying to understand why I can't do it the other way around, I made
the constraints clearer to myself.

- There is no reason to have a password without an associated
  recipient.
- There is no reason to have a password without an associated
  resource.
- I don't want the same password for more than one resource.

So to satisfy the first 2 requirements... it looks to me like the
tables should be:


create table resources(
  res int primary key
  /* other characteristics of the resource... */
);

create table pw(
  res int references resources (res) on delete cascade,
  password varchar(16) not null,
  email varchar(64) not null
  /* Should I choose some unique constraint on the couples? Which? */
);

To have the 3rd constraint I'd have a table:
create table pw_res(
  password varchar(16) primary key,
  res int references resources (res) on delete cascade
);

This comes in handy for 2 reasons:
- it helps me to enforce the third constraint
- it makes it easier to find which resource is associated with a
  password, which will be a common search

But this introduces one more problem if I decide to delete a
password. This could happen for "shared" and non-shared passwords.
I don't think it is something that will happen frequently... but it
may happen... and once you start to introduce a mess in your data it
is hard to come back.

So a fk on pw.password may help... but... once I have that fk I can't
insert (res, pw, email) without pw being in pw_res.

If I do it the other way around, inserting password(s) into pw_res
first, I may end up with passwords with no associated destination.

Surely I could wrap everything in a transaction so that if I don't
succeed in inserting the email/pw records everything will be rolled
back.

But I can't see how to make it cleaner.

I can get the emails and associate them with a resource and a
password in one SQL statement if I can defer the constraint check on
password. The next statement would then fill pw_res with distinct
values from pw.
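Concretely, the deferred-check ordering I have in mind looks roughly
like this (untested sketch using the tables above; the resource id
and the password/email values are made up, and a resources row with
res = 1 is assumed to exist):

alter table pw
  add constraint pw_password_fkey
  foreign key (password) references pw_res (password)
  deferrable initially deferred;

begin;
-- 1. associate emails with a resource and a password
insert into pw (res, password, email)
select 1, 'abcdefg1', e.email
from (values ('a@example.com'), ('b@example.com')) as e(email);
-- 2. fill pw_res with the distinct passwords just generated
insert into pw_res (password, res)
select distinct password, res
from pw
where password not in (select password from pw_res);
commit;  -- the deferred fk is checked here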

If I do it the other way around, in case I need different passwords
for different emails, I'll have to:
- count the emails and fill pw_res with as many passwords as needed.
  I think I'll need generate_series.
- find a way to associate these passwords with the emails

I don't know how to do the latter.
Still, supposing I knew how to write the SQL statement that will take
values('email1', 'email2'...)
as suggested by Merlin Moncure in a different thread, I can see some
pros and cons of these approaches.

The first approach requires a distinct/group by that may be
expensive.
The second one requires that I keep all the emails in memory while
the first statement runs.

I'd think that if memory consumption starts to be an issue, it means
the number of emails is getting too large to be loaded from an HTTP
request, so I could have them in a file, load them into a temp table,
and so on...

Still, before I run any test I need to find out how to associate
emails with passwords, considering that I'd like to pass the emails
"inline" in the SQL statement and I'll have the passwords in pw_res.


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] design, ref integrity and performance

2009-10-27 Thread Ivan Sergio Borgonovo
On Tue, 27 Oct 2009 09:17:59 +
Richard Huxton  wrote:

> Ivan Sergio Borgonovo wrote:
> > Hi,

> > I've to generate unique password and associate them with emails.
> > Association with emails is just to mail the password, email +
> > password aren't "the password", just the password is.

> > So a bunch of emails may be associated with the same password.

> > So there are 2 kind of passwords:
> > - shared, multiple use
> > - individual (they could be multiple or single use)

> So are you saying I login using *just* a password, not using my
> email as a user-name?

yes

> How do you know who is logging in and what does it mean for the
> password to be shared?

I don't care who "logged in".
Shared just means several people could use a resource by providing
the same password.
They are a sort of "promotional code". Some of them can be shared; I
don't mind if people give them to others, provided they help make our
service known.
Then there are unique passwords for which we may or may not check
whether they are used by multiple users of the site... but that's
another problem.

The association between email and password is just meant to build up
a mailing queue, and there is no uniqueness constraint on the
(password, email) pair.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] design, ref integrity and performance

2009-10-27 Thread Ivan Sergio Borgonovo
Hi,

I have to generate unique passwords and associate them with emails.
The association with emails is just for mailing the password; email +
password isn't "the password", just the password is.

So a bunch of emails may be associated with the same password.

So there are 2 kinds of passwords:
- shared, multiple use
- individual (they could be multiple or single use)

As input I have a list of emails, and according to the kind of
passwords I have to generate I fill a table:
create table pw_email(
  password varchar(16),
  email varchar(64)
);

So data inside may look like

/* bunch of shared passwords */
abcdefg, 1...@example.com
abcdefg, 2...@example.com
abcdefg, 3...@example.com
abcdefg, 4...@example.com
/* bunch of individual passwords */
abcdefg1, 1...@example.com
abcdefg2, 2...@example.com
abcdefg3, 5...@example.com
abcdefg4, 6...@example.com

Now each password is linked to access to one or more resources.
I could just add the resource id to the pw_email table, but:
- it is not normalized
- considering many passwords will be duplicated, I'd have to check a
  larger table to see which password gives access to what
- I'd like to associate different passwords with different resources,
  so that one password doesn't grant access to more than one resource.

create table pw_resource(
  res int references ...
  password varchar(16)
);

Of course if one password is in pw_email but not in pw_resource I
have a problem.

But I have to generate password/email couples first, before filling
pw_resource.

I could make the constraint deferrable and add an on delete cascade
to pw_email.password, but:
- I'm not sure it is a good design
- I'm still not sure it will work as expected

Before testing whether it can work I'd like to hear some comments on
the design.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] joining an array with a table or...?

2009-10-26 Thread Ivan Sergio Borgonovo
On Mon, 26 Oct 2009 14:56:26 -0400
Merlin Moncure  wrote:

> On Mon, Oct 26, 2009 at 11:05 AM, Ivan Sergio Borgonovo
>  wrote:
> > To make it more concrete I came up with:
> >
> > select coalesce(u.mail,j.mail) from (
> >  select (array['m...@example1.com','m...@example2.com'])[i] as mail
> >   from generate_series(1,2) i) j
> >   left join users u on upper(u.mail)=upper(j.mail);
> 
> how about this:
> select coalesce(u.mail,j.mail) from
> (
>  values ('m...@example1.com'), ('m...@example2.com')
> ) j(mail)
>  left join users u on upper(u.mail)=upper(j.mail);

Yours is between 4 and 10 times faster, excluding the time spent on
the client side escaping the strings.

I'll play a bit with client code to see if the advantage is kept.

It looks nicer too.

Currently I'm testing with very few matches between the input array
and the users table.
Will this impact the 2 methods differently?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] joining an array with a table or...?

2009-10-26 Thread Ivan Sergio Borgonovo
On Mon, 26 Oct 2009 14:15:26 +0100
Ivan Sergio Borgonovo  wrote:

> I've a list of emails and a list of users (with emails).
> 
> If the list of emails was already inside a table
> 
> create table mails (
>  mail varchar(64)
> );
> 
> create table users (
>   name varchar(127),
>   mail varchar(64)
> );
> 
> I'd do:
> select coalesce(u.mail, m.mail) from mails m left join users u on
> u.mail=m.mail;
> 
> Now mails are into a php array and they may be in the range of 5000
> but generally less.
> 
> The final query will be something like
> insert into mailqueue (qid, uid, mail, ...) select ...
> 
> and since some fields are pseudo random sequences computed from a
> serial, it would be "clean" to do it just in one query.

> Any clean technique?

To make it more concrete I came up with:

select coalesce(u.mail, j.mail) from (
  select (array['m...@example1.com','m...@example2.com'])[i] as mail
  from generate_series(1,2) i) j
  left join users u on upper(u.mail) = upper(j.mail);

but I sincerely dislike it.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] joining an array with a table or...?

2009-10-26 Thread Ivan Sergio Borgonovo
I've a list of emails and a list of users (with emails).

If the list of emails was already inside a table

create table mails (
 mail varchar(64)
);

create table users (
  name varchar(127),
  mail varchar(64)
);

I'd do:
select coalesce(u.mail, m.mail) from mails m left join users u on
u.mail=m.mail;

Now the mails are in a PHP array; there may be up to around 5000 of
them, but generally fewer.

The final query will be something like
insert into mailqueue (qid, uid, mail, ...) select ...

and since some fields are pseudo-random sequences computed from a
serial, it would be "clean" to do it all in one query.

Any clean technique?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] setweight detailed doc was: Ranking search results using multiple fields in PostgreSQL fulltext search

2009-10-12 Thread Ivan Sergio Borgonovo
On Mon, 12 Oct 2009 20:02:16 +0530
Gaini Rajeshwar  wrote:

> > inputquery := setweight(cfg, inputtitle, 'A', '&');
> > inputquery := inputquery && setweight(cfg, inputsummary, 'B',
> > '&');

> I didn't understand why did u use '&'  operator in setweight
> function. is that going to help in any way?

I don't understand it either...
I just copied and pasted from a working function I wrote long ago.

select setweight('pg_catalog.english', 'java', 'A', '&');

I can't remember what the meaning of that '&' was, and I can't find
the docs.

Could someone point me to more detailed documentation explaining
setweight?

thanks


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ranking search results using multiple fields in PostgreSQL fulltext search

2009-10-12 Thread Ivan Sergio Borgonovo
On Mon, 12 Oct 2009 19:26:55 +0530
Gaini Rajeshwar  wrote:

> Ivan,
> If i create a tsvector as you mentioned with concatenation
> operator, my search query will search in any of these fields which
> are concatenated in my tsvector.
> For example, if i create tsvector like this,
> UPDATE document_table SET search_col =
>   setweight(to_tsvector(coalesce(title,'')), 'A') ||
>   setweight(to_tsvector(coalesce(summary,'')), 'B');
> 
> and do a query like this
> select title, ts_rank(search_col, to_tsquery('this is my text
> search')) AS rank
> FROM document_table
> WHERE search_col @@ to_tsquery('this & is & my & text & search')
> ORDER BY rank DESC
> the above query will search in title and summary and will give me
> the results. But I don't want it that way. When a user wants to
> search in title, it should just search in title but the results
> should be ranked based on * title* and *summary* field.

Search *just* the title by specifying the weight in the input query,
and rank on title and summary.

/*
-- somewhere else in your code...
search_col := setweight(cfg, title, 'A', '&');
search_col := search_col && setweight(cfg, summary, 'B', '&');
*/


select ts_rank(search_col, to_tsquery(inputtitle)) as rank
-- rank on both, if search_col contains just title and summary
...
where search_col @@ setweight(cfg, inputtitle, 'A', '&')
-- return just matching title
order by ts_rank(...)

Is that what you need?

This is just one of the possible ways to rank things...

Otherwise: really understand how rank is computed, keep the
columns/tsvectors separate, compute a rank for each column and pass
the results to some magic function that will compute a "cumulative"
ranking...
Or you could write your own ts_rank... but I tend to trust Oleg and
common practice with pg rather than inventing my own ranking
function.

Right now the ts_rank* functions are black boxes for me. I imagined I
might enjoy some finer tuning of ranking... but currently they really
do a good job.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ranking search results using multiple fields in PostgreSQL fulltext search

2009-10-12 Thread Ivan Sergio Borgonovo
On Mon, 12 Oct 2009 19:11:01 +0530
Gaini Rajeshwar  wrote:

> I think you misunderstood my question. let me give clear idea
> about what i need.
> 
> I am using PostgreSQL fulltext search (tsearch2) feature to
> implement searching on database. From readings i came to know that
> we can give weights to different fields in database something like
> this:
> 
> *setweight(to_tsvector(title),'A')*
> 
> Where 'A' is weight given to field title. i can give weights to
> other fields in the same way. Where the weights 'A', 'B', 'C', 'D'
> are in will be in the following order *A > B > C > D* according to
> the default fulltext search configuration.
> 
> We can rank the search results using ts_rank function something
> like this,
> 
> *ts_rank(tsv_title,ts_query('this is my search text'))*
> **
> But, I want to rank these results not only based on just title, but
> also using other fields like summary etc.
> Is there a way around to do this?

If you concatenate your fields with different weights into the *same*
tsvector, ranking will take your weights into account...

Someone more knowledgeable than me chose how to use weights to give a
reasonable ranking.

Of course if you have fields a, b and c and you want to search in a
and b only, you'll have to concatenate just a and b.

If you need different assortments of field groups... you'll have to
add some extra redundancy if you plan to store precomputed tsvectors
for each record.

If you need to search "separately" in different fields
(eg. title ~ 'gino' AND summary ~ 'pino')
you just need to weight the input query as well

inputquery := setweight(cfg, inputtitle, 'A', '&');
inputquery := inputquery && setweight(cfg, inputsummary, 'B', '&');

...



-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ranking search results using multiple fields in PostgreSQL fulltext search

2009-10-12 Thread Ivan Sergio Borgonovo
On Mon, 12 Oct 2009 18:46:02 +0530
Gaini Rajeshwar  wrote:

> Hi,
> is there a way to rank the search results based on multiple fields
> in postgreSQL?
> For example,
> i have *title*, *abstract*, *summary*, *body* as fields/columns in
> my database. When user searches on *title*, i want to rank the
> results based on *title* field as well as *summary* field, where
> importance(summary) > importance(title). But the results should be
> exactly matching the terms in "title" rather than "title" OR
> "summary"

http://www.postgresql.org/docs/current/interactive/textsearch-controls.html

Basically, as you can read in the docs:
- you create a tsvector, concatenating the various fields and giving
  each a weight
- then you match your tsvector against the query:
  plainto_tsquery(config, yourinput) @@ yourpreviouslycomputedtsvector

and order by ts_rank(yourpreviouslycomputedtsvector, thequery)
(or ts_rank_cd), as in the sketch below.
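A rough sketch of the recipe (column names taken from your question,
so treat them as placeholders):

-- one-off: build the weighted tsvector
UPDATE document_table SET search_col =
  setweight(to_tsvector(coalesce(title,   '')), 'A') ||
  setweight(to_tsvector(coalesce(summary, '')), 'B');

-- then search and rank
SELECT title,
       ts_rank(search_col,
               plainto_tsquery('english', 'my search text')) AS rank
FROM document_table
WHERE search_col @@ plainto_tsquery('english', 'my search text')
ORDER BY rank DESC;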


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Temp table or normal table for performance?

2009-08-21 Thread Ivan Sergio Borgonovo
On 20 Aug 2009 13:43:10 GMT
Jasen Betts  wrote:

> On 2009-08-19, Stephen Cook  wrote:
> 
> > Let's say I have a function that needs to collect some data from
> > various tables and process and sort them to be returned to the
> > user.
> 
> plpgsql functions don't play well with temp tables IME.

Why?

Do you mean that, since you generally use a temp table for
computations and for looping over the table several times, a more
expressive language would be better suited?
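Or is it the old plan-caching issue, where a cached plan in plpgsql
keeps pointing at a dropped temp table? The workaround I vaguely
remember is to go through EXECUTE, roughly like this (untested
sketch, "some_table" is made up):

create or replace function collect_stuff() returns void as $$
begin
  execute 'create temp table scratch as select * from some_table';
  -- ... work on scratch, again via execute ...
  execute 'drop table scratch';
end;
$$ language plpgsql;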


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] "ownership" of sequences, pseudo random unique id

2009-08-21 Thread Ivan Sergio Borgonovo
On Thu, 20 Aug 2009 14:31:02 -0400
Alvaro Herrera  wrote:

> Ivan Sergio Borgonovo wrote:
> > I've
> > 
> > create table pr(
> >   code varchar(16) primary key,
> >   ...
> > );
> > create sequence pr_code_seq owned by pr.code; -- uh!

> > actually stuff like:
> > alter table pr drop column code;
> > or just
> > drop table pr
> > 
> > seems to work as expected (they drop the sequence too).

> > Should I be concerned of anything since it looks like a hack?

> You need to ensure you have a retry loop in your insertion code,
> because if the generated code conflicts with a manually inserted
> code, it will cause an error.  Other than that, seems like it
> should work ...

I was mainly concerned about assigning ownership of a sequence to a
column that is not an int.
It looks like a hack ;) but it seems to work as expected: dropping
the column or the table drops the sequence.
Assigning ownership just saves me from having to remember that if I
drop the column I no longer need the sequence.
So "owned by" just means "drop it along with", regardless of type or
anything else.

I could even define the table as
create table pr(
  code varchar(16) primary key,
  ...
);
create sequence pr_code_seq owned by pr.code;
alter table pr
  alter column code
  set default to_hex(feistel_encrypt(nextval('pr_code_seq')));
That makes the relationship between the sequence and the column more
explicit.

I think I can avoid conflicts between auto-generated and manually
generated codes by imposing a different format on manual input in the
client code.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] bayesian classification over tsvector

2009-08-20 Thread Ivan Sergio Borgonovo
I was wondering if there are some apps, techniques or tools to take
inspiration from, or just use, to classify rows according to a
weighted tsvector.

Each row contains a tsvector obtained by concatenating 4 fields with
different weights.

I have a corpus for each group.

I'd like to classify the rows that haven't already been assigned to a
group.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Generating random unique alphanumeric IDs

2009-08-20 Thread Ivan Sergio Borgonovo
On Thu, 20 Aug 2009 13:34:51 +0100
Thom Brown  wrote:

Correcting myself.
a) it is a bad idea to pad a hex string with hex... so I should still
find a quick way to map the padding characters to [g-z]... or just
pad with a constant string:

select lpad(to_hex(feistel_encrypt(10)), 8, 'mjkitlh');

b) this is from int (signed) to int (signed).

begin;
create or replace function feistel_encrypt(value int)
  returns int as
  $$
  declare
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
  begin
l1:= (value >> 16) & 65535;
r1:= value & 65535;
while i<3 loop
  l2:=r1;
  r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
  l1:=l2;
  r1:=r2;
  i:=i+1;
end loop;
return ((l1 << 16) | r1);
  end;
  $$ language plpgsql strict immutable;
create or replace function feistel_decrypt(value int)
  returns int as
  $$
  declare
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
  begin
l2:= (value >> 16) & 65535;
r2:= value & 65535;
while i<3 loop
  r1=l2;
  l1:=r2 # ((((1366.0*l2+150889)%714025)/714025.0)*32767)::int;
  l2:=l1;
  r2:=r1;
  i:=i+1;
end loop;
return ((l2 << 16) | r2);
  end;
  $$ language plpgsql strict immutable;
commit;

select * from feistel_decrypt(feistel_encrypt((2^31-1)::int))
union
select * from feistel_decrypt(feistel_encrypt((-2^31)::int))
union
select * from feistel_decrypt(feistel_encrypt((0)::int))
union
select * from feistel_decrypt(feistel_encrypt((-1)::int))
;


> This appears a lot more tricky than I had originally anticipated!
> I may be misunderstanding your example, but by alphanumeric, I
> mean beyond hex (i.e. a-z and possibly uppercase too).

Me too... but to_hex was there, and it was a quick trick to shorten
the string and get rid of the sign.

> I've looked into LFSR, but I'm afraid it goes over my head.   But

There is too much dust on my copy of "Concrete Mathematics"; still,
popular culture (read: Wikipedia) says that LFSRs are not
cryptographically safe, while a Feistel cipher with 4 rounds and a
suitable F is.

Then it is just a problem of "shrinking the string", or representing
it in another base... and that may result in some "waste".
5 bits are 32 chars... you actually have more chars available even
just considering a subset of ASCII.

Picking 5 bits at a time from an LFSR isn't that different from
converting the Feistel cipher output to hex, as I see it. The main
advantage of hex is that ints map very well to hex (no waste) and
that to_hex has a good chance of performing better than any plpgsql
function.

Since I'm generating "gift codes", it wouldn't look nice to present
the user with

A

as a gift code...

And that's going to happen as soon as I've generated 232798491 gift
codes. (/me wondering which is the smallest number with a
corresponding one-digit hex(feistel()) transform.)[1]
So just to make gift codes look nicer I thought about padding them
with some further random noise... but the way initially described is
not going to work. A variant could be to concatenate something
[^a-f0-9] (e.g. '-') and then pad with hex random noise:

A -> -A -> (random noise)-A

I don't know if it is worth it, since it is another round of lpad.

Even if I'm currently not overly concerned about performance, I'm
working in plpgsql, and while I think something to change the base
representation of an int can be written... it will be slow and ugly.
If I were working with PL/Perl I'd just google for some Perl recipe.
Given the premises I'll just embellish the hex with some padding.
But if you really need to use letters, be compact and so on... I
think you're just looking at changing the base of whatever
pseudo-random number your algorithm produces.

That's a common problem; you may just have to adapt a solution to
plpgsql.
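For the record, an untested sketch of such a base change (made-up
function name, slow but simple):

create or replace function to_base36(n bigint) returns text as $$
declare
  digits text := '0123456789abcdefghijklmnopqrstuvwxyz';
  result text := '';
begin
  if n = 0 then
    return '0';
  end if;
  while n > 0 loop
    result := substr(digits, (n % 36)::int + 1, 1) || result;
    n := n / 36;
  end loop;
  return result;
end;
$$ language plpgsql strict immutable;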

[1]
select s.i, feistel_decrypt(s.i) from generate_series(0,16) as s(i)
order by feistel_decrypt(s.i);
did it in a hurry... didn't check


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] "ownership" of sequences, pseudo random unique id

2009-08-20 Thread Ivan Sergio Borgonovo
I've

create table pr(
  code varchar(16) primary key,
  ...
);
create sequence pr_code_seq owned by pr.code; -- uh!

pr.code will *mostly* be obtained as

to_hex(feistel_encrypt(nextval('pr_code_seq')))

and sometimes by 'manually' inserting unique codes.

actually stuff like:
alter table pr drop column code;
or just
drop table pr

seem to work as expected (they drop the sequence too).

Should I be concerned about anything, since it looks like a hack?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Generating random unique alphanumeric IDs

2009-08-20 Thread Ivan Sergio Borgonovo
On Mon, 17 Aug 2009 12:37:33 +0200
"Daniel Verite"  wrote:

> http://archives.postgresql.org/pgsql-general/2009-07/msg00194.php


As an exercise I wrote the decryption counterpart:

create or replace function feistel_encrypt(value int)
  returns int as
  $$
  declare
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
  begin
l1:= (value >> 16) & 65535;
r1:= value & 65535;
while i<3 loop
  l2:=r1;
  r2:=l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
  l1:=l2;
  r1:=r2;
  i:=i+1;
end loop;
return ((l1::bigint<<16) + r1);
  end;
  $$ language plpgsql strict immutable;
create or replace function feistel_decrypt(value int)
  returns int as
  $$
  declare
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
  begin
l2:= (value >> 16) & 65535;
r2:= value & 65535;
while i<3 loop
  r1=l2;
  l1:=r2 # ((((1366.0*l2+150889)%714025)/714025.0)*32767)::int;
  l2:=l1;
  r2:=r1;
  i:=i+1;
end loop;
return ((l2::bigint<<16) + r2);
  end;
  $$ language plpgsql strict immutable;

so that

10 = feistel_decrypt(feistel_encrypt(10))

Since I'm then converting with to_hex to shorten the string, I was
thinking of adding some more bits of randomness, since e.g.
to_hex(10) =  'a'

Along the lines of:

select lpad(
  to_hex(feistel_encrypt(10)), 7, to_hex((random()*2^31)::int)
);

I was wondering if there is any better way to get an alphanumeric
random-looking string quickly. Since uniqueness is assured by passing
a sequence to feistel_encrypt, I just need to turn it into something
alphanumeric quickly.


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] design, plpgsql and sql injection in dynamically generated sql

2009-08-18 Thread Ivan Sergio Borgonovo
On Tue, 18 Aug 2009 12:38:49 +0200
Pavel Stehule  wrote:

> some unsafe function:

I suspected something similar.

I think many would appreciate it if you put these examples at
http://www.okbob.blogspot.com/2008/06/execute-using-feature-in-postgresql-84.html
and substituted the int example there with the text one.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] design, plpgsql and sql injection in dynamically generated sql

2009-08-18 Thread Ivan Sergio Borgonovo
On Mon, 17 Aug 2009 12:48:21 +0200
Pavel Stehule  wrote:

> Hello
> 
> I am not sure, if it's possible for you. PostgreSQL 8.4 has EXECUTE
> USING clause, it is 100% safe.

Sorry I don't get it.

How can I use USING safely when the substitution involves a table
name?

The examples I've seen just involve column values.

Where is the corresponding fine manual page?

Still, I don't get how USING could make plpgsql functions safer...
well... I'm going to check some prejudices I have about pg functions
first...

I thought that if you passed e.g. text to
create or replace function typetest(a int) returns text as
$$
begin
raise notice 'is this an int? %', a;
-- don't do anything else with a

and calling

select * from typetest('tonno');

was going to raise an error anyway.

So somehow I find the example here
http://okbob.blogspot.com/2008/06/execute-using-feature-in-postgresql-84.html
not really helpful in understanding what's going on.

Maybe an example with text comparing a version using quote_literal
and one using USING could be clearer...

or am I completely missing the point?

Far from an optimal solution, I've built a "client side" array of
permitted (table, key) pairs to dynamically build the query on the
client side.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] design, plpgsql and sql injection in dynamically generated sql

2009-08-17 Thread Ivan Sergio Borgonovo
I have several lists of items that have to be rendered in a web app
in the same way.

The structure is:

create table items (
  itemid int primary key,
  /* some fields */
);

create table headert1 (
  ht1 int primary key,
  /* several other fields that varies in nature */
);

create table itemlistt1 (
  ht1 int references headert1 (ht1),
  itemid int references items (itemid)
);

The query always matches this pattern:

select i.fieldA, i.fieldB, ..., from itemlistt1 il
  join items i on i.itemid=il.itemid
  where il.ht1=[somevalue];

The nature of the lists and their usage patterns are very different.
So unless someone comes up with a better design I would still like to
keep the item lists in separate tables.

I'd like to build a function that takes the name of the table and the
key to dynamically build up the query... but I don't know what I
should use to sanitize them.

create or replace function getitemlist(listtable text, listkey text,
keyvalue int, out ...) returns setof record as
$$
declare
  statement text;
begin
  statement:='select i.fieldA, i.fieldB, ... from ' ||
    escapefunc1(listtable) ||
    ' il join items i on i.itemid=il.itemid ' ||
    ' where il.' || escapefunc2(listkey) || '=' || keyvalue;
  return query execute statement; -- can I?

Is quote_ident the right candidate for escapefuncN?
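To make it concrete, this is the kind of thing I have in mind
(untested sketch; "return query execute" is part of what I'm asking
about):

create or replace function getitemlist(listtable text, listkey text,
                                       keyvalue int)
returns setof items as
$$
begin
  -- quote_ident() for the identifiers; keyvalue is already typed as
  -- int, so concatenating it cannot inject anything
  return query execute
    'select i.* from ' || quote_ident(listtable)
    || ' il join items i on i.itemid = il.itemid'
    || ' where il.' || quote_ident(listkey) || ' = ' || keyvalue;
end;
$$ language plpgsql;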

But this is still at risk of misuse... (e.g. passing ('items',
'fieldA', 1) may return something that was not meant to be seen).

One way would be to build up a table of permitted (table, key) pairs
and then just pass the table and the key value.
What would be the equivalent of quote_ident in PHP?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Generating random unique alphanumeric IDs

2009-08-16 Thread Ivan Sergio Borgonovo
On Sun, 16 Aug 2009 12:48:39 +0100
Sam Mason  wrote:

> On Sun, Aug 16, 2009 at 12:07:27PM +0100, Thom Brown wrote:
> > Does anyone know a way to generate a random and unique lowercase
> > alphanumeric ID
> 
> If you want it to be unique then it's not going to be random.  The
> easiest way to keep it from producing duplicates is to have some
> monotonically increasing component.  If you're OK with code/people
> retrying the occasional duplicate then you're going to be relying
> on statistical guarantees and you should look at "birthday
> attacks" to see how often this is going to happen.
> 
> > Notice that I don't mean hexadecimal values either. This would
> > preferrably not resort to trying to generate the ID, then
> > checking for a clash, and if there is one, do it again, although
> > that could do as I can't think of how the ideal solution of a ID
> > hashing algorithm would be possible.


Some time ago Daniel Verite posted an implementation of a Feistel
cipher in plpgsql.

I'm happily using it to generate pseudo-random hex strings.
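
i.e. something along the lines of:

select to_hex(pseudo_encrypt(42));

where pseudo_encrypt is the function from that post.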

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] acl_admin by Afilias CA (bug/feature report)

2009-08-06 Thread Ivan Sergio Borgonovo
Hi,

I can't find any more the place from where I downloaded acl_admin.
There is a very small problem with chown_all

Since a sequence may be owned by a table... and the function may try
to change the ownership of the seq before the table, the function may
abort early.

A quick hack that may work in most situations would be to order the
select that loops through relations, since generally sequences have
"longer" names than their related tables.

I haven't seen any change in that tool and it seems it dates back to
2004. Are there any more modern/complete tools for mass change of
ownership/grant around?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] parameters in functions and overlap with names of columns

2009-08-04 Thread Ivan Sergio Borgonovo
On Tue, 4 Aug 2009 16:01:58 +0200
Pavel Stehule  wrote:

> 2009/8/4 Ivan Sergio Borgonovo :
> > I've
> >
> > create or replace function(...
> >
> > declare
> >  col1 varchar(32);
> > ...
> >
> >  create table pippo(
> >    col1 varchar(32),
> > ...
> >
> > Unfortunately I can't schema specify the column to avoid name
> > overlap.
> >
> > Is there another way other than just simply rename the variable?
> 
> yes - the most common is an using of prefix '_' for local plpgsql
> variables. Other possibility is using qualified names.

Just to be sure... by qualified names you mean schema qualified name
or table qualified names in case of columns... right...

For a second I had the hope there was another way other than having a
col1, a _col1 and a __col1 too ;)

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] parameters in functions and overlap with names of columns

2009-08-04 Thread Ivan Sergio Borgonovo
I've

create or replace function(...

declare
  col1 varchar(32);
...

  create table pippo(
col1 varchar(32),
...

Unfortunately I can't schema specify the column to avoid name
overlap.

Is there another way other than just simply rename the variable?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] MSSQL to PostgreSQL

2009-07-31 Thread Ivan Sergio Borgonovo
On Fri, 31 Jul 2009 13:47:39 +0100
Thom Brown  wrote:

> Hi all,

> We're migrating the contents of an old MSSQL server to PostgreSQL
> 8.3.7, so a full conversion is required.  Does anyone know of any
> guides which highlight common gotchas and other userful
> information?


http://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_PostgreSQL_Migration_by_Ian_Harding

http://edoceo.com/creo/ms2pg

These were the places I read when I had to migrate some MS SQL DB to
Postgresql.

I admit I mostly did it by exporting csv from MS SQL[1],
refactoring the schema and rewriting from scratch functions... but
those were the places where I read the info I needed.

[1] suddenly MS SQL became unable to export proper csv so I remember
I wrote a rudimentary odbc2csv tool as well. Since it was inspired
by an even more rudimentary work of a colleague, I'll ask if he can
release it under GPL if you find yourself in need of such a
beast.

You may also find useful FreeTDS
http://www.freetds.org/
http://www.webthatworks.it/d1/page/odbc,_freetds_and_microsoft_sql_(and_php)
and dblink
http://www.postgresonline.com/journal/index.php?/archives/44-Using-DbLink-to-access-other-PostgreSQL-Databases-and-Servers.html
http://www.postgresql.org/docs/8.3/static/dblink.html


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Constraint between 2 tables and taking a coherent snapshot of both

2009-07-26 Thread Ivan Sergio Borgonovo
Suppose I've

create table m1 (
 a int primary key,
 cm1 int,
 cm2 int,
 ...
);

create table r1 (
 a int references m1 (a),
 cr1 int,
 cr2 int,
 ...
);

and cm1's possible values depend on some function of cr1 for a given
a.

I actually have a plpgsql function that returns the possible choices
for cm1 for each set of cr1.

a) r1 get filled with some values.
b) The user is presented with the possible choices of cm1.
c) I've to take a snapshot of m1 and r1.

Since the user may change cr1 after I've already started c), cm1 may
not be compatible with the new set of cr1.

The user shouldn't (if I coded it right) be able to change cm1 in a
way that is not compatible with cr1.


create or replace function t2c(_a int) returns void as
$$
  insert into m1c select * from m1 where a=_a;
  insert into r1c select * from r1 where a=_a;


t2c should see a snapshot of m1 *and* r1 at a given time.

Of course I'd like to use a system that is as rollback/lock free as
possible.

One way would be to put the function in a serializable
transaction... but that has its drawback (rollback and retry)

I could even create a

create table m1pr1 (
 a int,
 cm1 int,
 cm2 int,
 cr1 int,
 cr2 int
);

insert into m1pr1 select m1.a, m1.cm1, m1.cm2, r1.cr1, r1.cr2 from
m1 join r1 on m1.a=r1.a;

but I'm not sure what's going to happen and this solution has its
own drawback too (denormalized data).

Actually a serializable transaction doesn't even seem too bad a
solution... but I just would like to understand better how to manage
this situation so that I could make it as simple as possible AND
lower as much as possible the chances that the transaction will have
to be rolled back.
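
For reference, the serializable variant I have in mind is just
something like this, with the client retrying on serialization
failures (SQLSTATE 40001):

begin isolation level serializable;
select t2c(42);
commit;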

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] checking for temp tables information_schema vs. EXCEPTION

2009-07-21 Thread Ivan Sergio Borgonovo
I didn't find a definitive answer on how to check for the existence
of a temporary table.

I did an unscientific test to see if I could see a temp table from
another connection in the information_schema... and I can't.

The information_schema approach is more direct (so cleaner) but it
seems to rely on some "behind the scenes" trick I don't fully
understand.

I could do a list of insane things like:

begin;
create or replace function tt_test() returns void as
$$
declare
sch varchar(128);
begin
  create temp table pippo (i int);
  select into sch table_schema from information_schema.tables
  where
  table_name='pippo' and
  table_type='LOCAL TEMPORARY';

  execute 'create table ' || sch || '.pippo (i int);'; -- FAIL
  execute 'create schema ' || sch || ';'; -- NOT TESTED
  create table pippo (i int);  -- SUCCEEDS

  create temp table zzz as
select * from information_schema.tables
where
table_name='pippo';

  return;
end;
$$ language plpgsql;
select * from tt_test();
commit;
select * from zzz limit 10;

It looks like an invisible search path is added.
How are temp schema names obtained? Is there any place in the manual
that says that pg_temp_* is a "reserved schema pattern"?
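
For the record, a check that seems to avoid matching pg_temp_* names
altogether could be (assuming 8.3+, where pg_my_temp_schema() is
available):

select exists (
  select 1 from pg_class
  where relname = 'pippo'
    and relnamespace = pg_my_temp_schema()
);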

I didn't test but the EXCEPTION method may miss the difference
between the temp table and the permanent table. And schema
qualifying the temp table requires some further extra step.
So EXCEPTION method doesn't look safe.

Does EXCEPTION have some other hidden cost? Just for curiosity.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Documentation Improvement suggestions

2009-07-20 Thread Ivan Sergio Borgonovo
On Mon, 20 Jul 2009 09:34:00 -0400
Robert James  wrote:

> Two small suggestions that might make it easier for newcomers to
> take advantage of the wonderful database:
> 1. Googling Postgres docs produces links for all different
> versions.  This is because incoming links are to different
> versions.  Besides being confusing, it pushes the pages lower in
> Google, and makes it harder to find them.
> Could the website offer a link to the 'current' version, whichever
> it is. Eg instead of just :
> http://www.postgresql.org/docs/8.1/static/creating-cluster.html
> Have:
> http://www.postgresql.org/docs/current/static/creating-cluster.html
> which would keep all incoming links pointed to the current page.

Did you actually try the link you proposed?
There is a page behind it... and it actually links to the 8.4 docs.
People still need old docs so you can't just make docs for previous
versions unavailable... and once they are available, they will be
indexed.
Unfortunately current version doesn't mean most used.

Actually what I'd enjoy is a link to the "interactive" version and
maybe aggregating all the comments of previous versions into
the new version (indicating which versions they come from).

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] temp tables and replication/connection sharing/pooling

2009-07-08 Thread Ivan Sergio Borgonovo
I was wondering, if I were going to use any of the available
replication solutions (I'm thinking especially of pgpool but I'm also
interested in any other tool such as slony, pgcluster...), should I
take into account any side effects on temp tables?

I'm currently using php and pg_connect for a web application.
I'm not planning to use pg_pconnect.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL and Poker

2009-07-08 Thread Ivan Sergio Borgonovo
On Wed, 8 Jul 2009 13:22:14 -0600
Scott Marlowe  wrote:

> On Wed, Jul 8, 2009 at 12:27 PM, Ivan Sergio
> Borgonovo wrote:
> > On Wed, 8 Jul 2009 19:39:16 +0200
> > "Massa, Harald Armin"  wrote:
> >
> >> a quite interesting read.
> >>
> >> http://www.codingthewheel.com/archives/stranger-than-fiction-story-online-poker-tracker-postgresql
> >
> > There are a couple of comments comment that maybe someone could
> > correct:
> >
> > "The popularity of PostgreSQL as DBMS for handhistories is by no
> > means just a matter of some alleged technological superiority
> > over MySQL. Let's not forget that Pokertracker, Holdem Manager
> > etc is proprietary software, so they really don't have any other
> > choice but to bundle with postgreSQL. If they were to ship their
> > products with MySQL, they would either have to open-source their
> > products according to the GPL, or pay hefty commercial license
> > fees."
> >
> > or
> >
> > "Bogdan's comment is right on the money. There are licensing
> > issues with MySQL. MySQL commercial licenses are contracts with
> > Sun. Not cheap. It had to be PostgreSQL."
> >
> > I understand the license differences (and for my taste I prefer
> > GPL over BSD) but the above affirmations seems to imply pg
> > couldn't stand up just on its technical merits.
> >
> > I don't think this is the case.

> Exactly, it could have been interbase / firebird, sqllite,
> berkelydb, and a couple other choices that are free.  MySQL's
> licensing just took them out of the running right at the start.

You can actually build closed source software with MySQL as a
server; it depends on how you do it.
Aren't there any DBs with an LGPL client library license?

Still the above statement sounds too much like: pg wasn't chosen for
its technical merits but for the license.

I don't think their only option was pg for licensing reasons.
Or am I misunderstanding what you wrote? or... am I plainly wrong?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL and Poker

2009-07-08 Thread Ivan Sergio Borgonovo
On Wed, 8 Jul 2009 19:39:16 +0200
"Massa, Harald Armin"  wrote:

> a quite interesting read.
> 
> http://www.codingthewheel.com/archives/stranger-than-fiction-story-online-poker-tracker-postgresql

There are a couple of comments that maybe someone could
correct:

"The popularity of PostgreSQL as DBMS for handhistories is by no
means just a matter of some alleged technological superiority over
MySQL. Let's not forget that Pokertracker, Holdem Manager etc is
proprietary software, so they really don't have any other choice but
to bundle with postgreSQL. If they were to ship their products with
MySQL, they would either have to open-source their products
according to the GPL, or pay hefty commercial license fees."

or

"Bogdan's comment is right on the money. There are licensing issues
with MySQL. MySQL commercial licenses are contracts with Sun. Not
cheap. It had to be PostgreSQL."

I understand the license differences (and for my taste I prefer GPL
over BSD) but the above affirmations seem to imply pg couldn't
stand up just on its technical merits.

I don't think this is the case.

Someone more knowledgeable about licenses and with a better English
than mine should correct the comments.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Feistel cipher, shorter string and hex to int

2009-07-07 Thread Ivan Sergio Borgonovo
On Tue, 07 Jul 2009 12:07:48 +0200
"Daniel Verite"  wrote:

>   Ivan Sergio Borgonovo wrote:
> 
> > r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
> > -- but what about this? where does it come from?
> 
> This function:
> (1366.0*r1+150889)%714025
> implements a known method to get random numbers. I think it comes
> from "Numerical recipes" by William Press.
> Note that the algorithm is not tied to that function, it could be 
> replaced by something else (especially one that involves a private 
> key), but it has to be carefully chosen or the end result won't
> look so random.

I don't get the 1366.0 and the 714025.0.
Writing 1366.0 isn't going to use float arithmetic?
Is it there just to avoid an overflow?
I'm going to see if using bigint is going to make any difference in
speed.

Finally... if I were (and I'm not) interested in using 30 bit,
should I turn that *32767 into a *16383?
For shift and bit mask it looks more obvious.
Do you remember the name of this particular F?

Since I don't see anything other than to_hex that could "shorten" an
int to a string easily and quickly... it seems that returning a
signed integer is OK.

Everything else seems to need more processing at no real added value.
Turning the int into base 32 [0-9A-V] with plpgsql looks expensive
just to shorten the string to 4 chars.

Thanks.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] combine multiple row values in to one row

2009-07-06 Thread Ivan Sergio Borgonovo
On Tue, 7 Jul 2009 01:59:35 +0430
Lee Harr  wrote:

> 
> Hi;
> 
> I'm looking for a way to do this:

> # select idn, magic() as codes FROM tbl;
>  idn | codes
> -+--
>1 | A
>2 | B, C
>3 | A, C, E
> (3 rows)
> 
> 
> Right now, I use plpgsql functions, but each time I do it
> I have to rewrite the function to customize it.
> 
> Is there a generic way to do this? An aggregate maybe?

array_accum
http://www.postgresql.org/docs/8.2/static/xaggr.html
?
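
e.g., with array_accum defined as in that page and assuming the
per-row value is in a column named code:

select idn, array_to_string(array_accum(code), ', ') as codes
from tbl
group by idn;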


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Feistel cipher, shorter string and hex to int

2009-07-06 Thread Ivan Sergio Borgonovo
On Sat, 02 May 2009 11:26:28 +0200
"Daniel Verite"  wrote:

> Note that it returns a bigint because we don't have unsigned
> integers in PG. If you're OK with getting negative values, the
> return type can be changed to int.
> Otherwise if you need a positive result that fits in 32 bits, it's 
> possible to tweak the code to use 15 bits blocks instead of 16,
> but then the input will have to be less than 2^30.

I need shorter values (because they should be easier to type).
To be sure to modify the function in a sensible way I really would
appreciate some pointers.
Still, if it returns a signed int that's acceptable.

To further shrink the length of the result I was planning to use
to_hex (actually it would be nice to have a fast to_base36
[0-9a-z])... but I wasn't able to find a way to convert a hex string
back to an int.
x'fff' seems to work just for literals.
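
(The only trick I've found so far goes through bit(32), e.g.:

select ('x' || lpad('7fffffff', 8, '0'))::bit(32)::int;  -- 2147483647

but I don't know how it compares speed-wise.)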


CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) returns
bigint AS $$
DECLARE
 l1 int;
 l2 int;
 r1 int;
 r2 int;
 i int:=0;
BEGIN
  l1:= (value >> 16) & 65535;
-- modifying here seems trivial
  r1:= value&65535;
--  l1:= (value >> 15) & B'111111111111111'::int;
--  r1:= value & B'111111111111111'::int;
  WHILE i<3 LOOP
    l2:=r1;
    r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
-- but what about this? where does it come from?
/*
    r2:=l1 #
    ((((1366.0*r1+150889)%714025)/714025.0)*B'11111111111111'::int)::int;
*/ -- ??
    l1:=l2; r1:=r2; i:=i+1;
  END LOOP;
  return ((l1::bigint<<16) + r1);
-- modifying here seems trivial
END;
$$ LANGUAGE plpgsql strict immutable;


Anything else to suggest or copy from?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication and coding good practices

2009-06-29 Thread Ivan Sergio Borgonovo
On Mon, 29 Jun 2009 19:11:43 +0800
Craig Ringer  wrote:

> On Sun, 2009-06-28 at 09:01 -0700, David Fetter wrote:
> > > Are there any rules of thumb to consider for making an
> > > application easier to work with a "general" replication
> > > solution?
> > > 
> > > The applications I mostly deal with are e-commerce sites.
> > 
> > It really depends on what replication solution you choose, along
> > with the environment you're deploying into.
> 
> ... and why you need replication. Reliability/Availability? Data
> storage redundancy? Performance? And if performance, read-mostly
> performance or write-heavy performance?

1) performance, read-mostly
2) reliability
I'm making large use of plpgsql mainly for:
- encapsulation
- single point of truth
- implicit transaction

Most of the write operations don't have to be aware of a multi
user environment.
Still in some part of the code things have to be aware of
transactions, multi user environment (or better multiple connections
from the same user) etc...

Not only are these parts rare, they are also seldom executed.
So my main concern about the parts that may be problematic in a
replicated context is to keep maintenance low and development easy.

eg. I've started to use temp tables but I guess they may cause some
problems in conjunction with connection pooling systems.


> > That said, I've noticed that the things that are generally good
> > practice help you even more when you're doing replication.
> > 
> > Practices I've seen help directly:
> > 
> > * Separate read users and code from write users and code.
> > 
> > * Separate DDL from both of the above.
> > 
> > * Make DDL changes part of your deployment process and only
> > allow them in files which track in your SCM system.

> Version your schema, storing the schema version in a 1-row table
> or even as a stable function. This makes it much easier for
> deployment tools or staff to easily see what needs to be done to
> get the schema and app to the latest version - there's no "what
> the hell is the current state of this thing, anyway?" to worry
> about.

This is another area where I'd like to learn more about available
techniques for managing development.
But currently I was more interested in coding techniques to avoid
maintenance/porting problems once I'll have to support a replication
solution.
At the moment schema changes are saved in a file together with the
web application code.
I was thinking of automating the application of schema changes with
a hook in svn, but right now it doesn't look like a good investment.

Still, I'm very interested in techniques to version schema changes,
bring them together with code changes, and be able to diff them.
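
(The 1-row version table Craig mentions is trivial enough, something
like:

create table schema_version (
  version int not null,
  applied timestamptz not null default now()
);
insert into schema_version (version) values (1);

with each migration file bumping it.)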

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Replication and coding good practices

2009-06-28 Thread Ivan Sergio Borgonovo
I'm starting to see the need of a replication solution on my horizon.

I've learned that some replication solutions require code
modification, some require different assumptions about how
connections behave...

Are there any rules of thumb to consider for making an application
easier to work with a "general" replication solution?

The applications I mostly deal with are e-commerce sites.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] drawback of array vs join

2009-06-23 Thread Ivan Sergio Borgonovo
On Tue, 23 Jun 2009 17:39:46 -0300
Emanuel Calvo Franco  wrote:

> 2009/6/23 Ivan Sergio Borgonovo :
> > I've a temporary table where I'd like to resume data coming from
> > several other tables (normalised the usual way).
> > eg.
> > a list of items that may belong to several group (eg. item,
> > group, itemgroup table) will end up in something similar to:
> >
> > create temp table itemlisttemp(
> >  lid int,
> >  iid int,
> >  qty int,
> >  famid int,
> >  rating int,
> >  itemgroups int[],
> > );
> >
> > This because I've to loop over the same table several times and
> > because it *should* make easier to write dynamic queries [built
> > from a web interface] like
> >
> > select sum(qty) from where
> >  famid=3
> >  and rating>0
> >  and 10 = any itemgroups
> > ...
> >
> > itemlisttemp tables will be very small (30 as an upper limit) and
> > arrays as well will contain very few elements (4 as an upper
> > limit).

> Which improvement do you want to obtain with the array way?
> If the table is very small, the improvement in terms of performance
> will be negligible.

I'll have to compute different kinds of aggregates over itemlisttemp
and update it as well, but while itemlisttemp will be small,
the tables from which it is derived aren't.

Keeping the join would produce as many rows per item as there are
"groups", which will make it harder to understand and build the
queries that will be used to build up the aggregates.

Furthermore if I skip the condition on group, I'll have to skip
the corresponding join otherwise I'll have a row for each group the
item belongs to.

select sum(l.qty) from list l
  join groupitem gi on l.iid=gi.iid
  where
l.famid=3
and rating>0
and gi.gid=10
;

select sum(l.qty) from list l
  join groupitem gi on l.iid=gi.iid
  where
l.famid=3
and rating>0
; -- oops this is wrong

compared to:

select sum(qty) from itemlisttemp where
 famid=3
 and rating>0
 and 10 = any(itemgroups)
;
select sum(qty) from itemlisttemp where
 famid=3
 and rating>0
;
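
As an aside, the "10 = any(itemgroups)" test could also be written
with the containment operator, which should behave the same here:

select sum(qty) from itemlisttemp
 where famid=3
   and rating>0
   and itemgroups @> array[10];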

I'm still collecting what I really need in itemlisttemp but I'm
expecting no more than 3-4 arrays in itemlisttemp for a very long
time.
Since I've to fill the arrays just one time... and compute
aggregates with different conditions and update itemlisttemp I'm
doing it mostly for readability and ease of building the statements
dynamically.

I was wondering if I could run into any other drawback besides not
doing it the halal relational way.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] drawback of array vs join

2009-06-23 Thread Ivan Sergio Borgonovo
I've a temporary table where I'd like to resume data coming from
several other tables (normalised the usual way).
eg.
a list of items that may belong to several group (eg. item,
group, itemgroup table) will end up in something similar to:

create temp table itemlisttemp(
  lid int,
  iid int,
  qty int,
  famid int,
  rating int,
  itemgroups int[]
);

This because I've to loop over the same table several times and
because it *should* make it easier to write dynamic queries [built
from a web interface] like

select sum(qty) from itemlisttemp where
  famid=3
  and rating>0
  and 10 = any(itemgroups)
...

itemlisttemp tables will be very small (30 as an upper limit) and
arrays as well will contain very few elements (4 as an upper limit).

I'm aware of the drawback of de-normalization.

Should I be aware of anything else when using arrays this way?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] temp tables, sessions, pgpool and disk

2009-06-22 Thread Ivan Sergio Borgonovo
On Mon, 22 Jun 2009 11:40:08 +0200
Martijn van Oosterhout  wrote:

> On Mon, Jun 22, 2009 at 09:53:59AM +0200, Ivan Sergio Borgonovo
> wrote:
> > > The OS knows much less about what anonymous memory (memory not
> > > backed by a file) "means" to a program and can't be as clever
> > > with it. Swapping tends to be _much_ more CPU expensive than
> > > writing
> > 
> > But issuing a write to disk Postgresql doesn't actually say
> > anything more about what it is placing on the disk and how it is
> > going to access it... and it is actually adding overhead to move
> > it back and forward, no matter if this overhead happens on RAM
> > or disk. Actually since temp table are private to the connection
> > they should (?) be private to a postgresql process, so the OS
> > should be able to do a good job.
> > I don't see any atomicity constraint, so... if something fail
> > while writing to RAM, as you said you shouldn't need a WAL.
> 
> For the record, temp tables are in fact handled differently, in
> particular they are not stored in the shared_buffers, but instead
> are in backend local (private) buffers, whose size is controlled by
> temp_buffers. They are indeed not WAL archived, nor written to disk
> unless needed.

> So yes, small temp tables will likely stay in memory, but large
> temp tables may spill to disk. There's no flushing or syncing so
> quite likely they'll end up in the OS disk cache for a while. Once
> the temp table is deleted, the file is deleted and the OS throws
> that data away. So temp tables most likely won't use any disk I/O,
> but they *can* if the need arises.

Just to make it extra-clear to people unaware of pg internals...
since the second paragraph may seem to contradict the first one...

could "nor written to disk unless needed" be rephrased as:
even repeated UPDATE/INSERT won't issue writes to the OS (no matter
whether they would end up on disk or not) if the table fits the
buffer?

I see the default is somehow "large" (8MB) and it is not
pre-allocated. Looks nice.
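
(And it can apparently be raised per session with e.g.
set temp_buffers = '32MB';
as long as it is done before the first temp table is touched in that
session.)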

> Have a nice day,

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] temp tables, sessions, pgpool and disk

2009-06-22 Thread Ivan Sergio Borgonovo
On Mon, 22 Jun 2009 07:26:56 +0800
Craig Ringer  wrote:

> > http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html
> > "If specified, the table is created as a temporary table.
> > Temporary tables are automatically dropped at the end of a
> > session"
> > 
> > I'd interpret it as a connection.
> 
> Correctly.
> 
> > I don't even know if it is possible to send more than one command
> > over a single connection and wait for the results asynchronously.
> > Any clarification?
> 
> To an extent cursors provide that ability. The "result" is returned
> quite promptly, but it's a placeholder that allows you to retrieve
> the real results progressively as you need them. Whether the
> database generates the results immediately and stores them to
> return later, or whether it generates them on demand, isn't
> something you can easily tell it's up to the database.

http://www.postgresql.org/docs/8.3/interactive/libpq-async.html
"PQsendQuery cannot be called again (on the same connection) until
PQgetResult has returned a null pointer, indicating that the command
is done."

Asynchronous calls can't be made to parallelize postgres queries on
the same "session", but just to parallelize client and server work.

So a temp table seems as private as I need it.

I wonder what will happen if I put something like pgpool between
postgresql and a web app.

> > So postgresql actually issues writes to disk and delegate to the
> > OS management of the cache/actual write on disk.

> Yes.

> > I thought it could just try to hold them in RAM and still
> > delegate to the OS to save them on disk in swap if the system is
> > short on RAM.

> For a variety of reasons, you REALLY don't want it to work that
> way.

mmm... first sorry for the noise...
Interpret the following as reality checks.
I'm perfectly aware building up a DB is not easy, and I'm not
pretending I know how to write one. ;)

> OS memory managers tend to be _much_ better and faster at managing
> pages that're backed by a file. They'll write dirty data out
> pre-emptively so that execution doesn't stall when memory runs
> low; they write data to the file in order for best disk
> performance; they efficiently buffer and read-ahead when pulling
> the data back in, etc.

> The OS knows much less about what anonymous memory (memory not
> backed by a file) "means" to a program and can't be as clever with
> it. Swapping tends to be _much_ more CPU expensive than writing

But by issuing a write to disk Postgresql doesn't actually say
anything more about what it is placing on the disk and how it is
going to access it... and it is actually adding overhead to move it
back and forward, no matter if this overhead happens in RAM or on
disk.
Actually since temp tables are private to the connection they should
(?) be private to a postgresql process, so the OS should be able to
do a good job.
I don't see any atomicity constraint, so... if something fails while
writing to RAM, as you said, you shouldn't need a WAL.

> dirty buffers to a file. It's a lot more expensive to retrieve
> from disk, too, and usually involves lots of seeks for quite
> scattered pages instead of nice block readahead.

Once you're running out of memory I see no guarantee your file will
end up in a fast, easily accessible area of your disk... and you're
going to add the overhead associated with a file system
(journalling, permissions/ownership, locks); swap is volatile... and
it should offer the guarantees you "need" for a temp table.

> The OS knows much better than PostgreSQL does when the table will
> fit in RAM and when it needs to spill to disk, and it's much
> better at managing that than Pg can ever be. It's great that Pg
> just uses the OS's hardware knowledge, system-wide awareness, and
> highly optimised memory manager + disk IO management to take care
> of the problem.

The same should be true for virtual memory, not just file management
and postgresql has a configuration file that should give a clue to
the DB about the expected workload and hardware.
Surely postgresql can't forecast how many and how large the temp
tables for a single connection will be... but substantially I got
the idea that a connection is somehow serial in its execution and
that storage could be garbage collected or just released early
(drop table, on commit drop).
This looks as if it is taking temp tables very far from the standard.
And yeah... once you want to do memory management/resource management
inside SQL you've opened the doors of Hell.
But well 

For what I could see about SQL99 the definition of temp table is very
terse... and a bit confusing (at least for me) about global and
local.
I gave a quick look at what's available on MS SQL... and they have
a sort of "in memory temp table" but you can't modify its schema.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cursors, temp tables, dynamic sql technique (was and is: looping over a small record set over and over in a function)

2009-06-21 Thread Ivan Sergio Borgonovo
On Sun, 21 Jun 2009 21:43:16 +0800
Craig Ringer  wrote:

> On Sun, 2009-06-21 at 14:57 +0200, Ivan Sergio Borgonovo wrote:
> 
> > I think everything could be summed up as:
> > 
> > select into t myaggregate1(field) from dataset where condition1;
> > if(t>10) then
> >   update dataset set field=myfunc1(a,b,c) where condition1;
> > end if;
> > 
> > select into t myaggregate2(field) from dataset where condition2;
> > if(t>44) then
> >   update dataset set field=myfunc2(a,b,c) where condition2;
> > end if;
> 
> That's really too simplified to see what you're actually doing.

I'm still checking whether the real cases go far beyond the above or
whether I have some more general case.

> I've found that in the _vast_ majority of non-trigger cases where
> I've considered using PL/PgSQL, a bit more thought and proper
> consideration of the use of generate_series, subqueries in FROM,
> join types, etc has allowed me to find a way to do it in SQL. It's
> almost always faster, cleaner, and nicer if I do find a way to
> express it in SQL, too.

The stuff is really serial in its nature and can't be made parallel.

> > I think I really don't have a clear picture of how temp tables
> > really work.
> > They can be seen by concurrent transactions in the same session.

> Eh? In this context, what do you mean by "session"? Did you mean
> consecutive rather than concurrent, ie:

http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html
"If specified, the table is created as a temporary table. Temporary
tables are automatically dropped at the end of a session"

I'd interpret it as a connection.
I don't even know if it is possible to send more than one command
over a single connection and wait for the results asynchronously.
Any clarification?

If a session is a connection and it's not possible to send more than
one command and wait for the results asynchronously... or the server
doesn't run in parallel several commands sent across the same
connection... then a temp table looks like what I need.

> BEGIN;
> CREATE TEMPORARY TABLE x (...);
> -- do other work
> COMMIT;
> BEGIN;
> -- Can see `x' from here
> COMMIT;
> 
> ?

I was planning to use ON COMMIT DROP

> Normally, you couldn't see another sessions temp tables, even after
> commit, unless you explicitly schema-qualified their names - eg
> 'pg_temp_4.x' . If even then; I haven't actually checked.

> Two concurrent sessions that issue 'CREATE TEMPORARY TABLE x(...)'
> get two DIFFERENT tables, both named `x', in different pg_temp
> schema, eg 'pg_temp_2.x' and 'pg_temp_3.x' .

That's perfect, and what I really need.

> > But if the transaction in which a temp table is created is not
> > committed yet, other transactions won't see it.

> Of course, since PostgreSQL doesn't support the READ UNCOMMITTED
> isolation level.

> > So it may actually look more as a temporary, private storage
> > that doesn't have to be aware of concurrency.

> > So temp tables should require less disk IO. Is it?

> The big thing is that they're private to a session, so different
> sessions can concurrently be doing things with temp tables by the
> same name without treading on each others' feet.

> Because they're limited to the lifetime of the session, though,
> PostgreSQL doesn't need to care about ensuring that they're
> consistent in the case of a backend crash, unexpected server
> reset, etc. Tom Lane recently pointed out that as a result writes
> don't need to go through the WAL, so my understanding is that
> you're avoiding the doubled-up disk I/O from that. They also don't
> need to be written with O_SYNC or fsync()ed since we don't care if
> writes make it to the table in order.

> As a result I'm pretty sure temp tables don't ever have to hit the
> disk. If the OS has enough write-cache space it can just store
> them in RAM from creation to deletion.

So postgresql actually issues writes to disk and delegates to the OS
the management of the cache/actual writes to disk.
I thought it could just try to hold them in RAM and still delegate
to the OS to save them on disk in swap if the system is short on RAM.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] cursors, temp tables, dynamic sql technique (was and is: looping over a small record set over and over in a function)

2009-06-21 Thread Ivan Sergio Borgonovo
On Sun, 21 Jun 2009 10:57:51 +0800
Craig Ringer  wrote:

> On Fri, 2009-06-19 at 20:23 +0200, Ivan Sergio Borgonovo wrote:
> 
> > If I could easily load all the dataset into an array, loop
> > through it and then just update the computed field it would be
> > nice... but how?
> 
> Are you sure you _really_ need a PL/PgSQL function for this?

Not really sure. I'm investigating.

But well I noticed that even working with a fully fledged procedural
language I'll have to loop 2 times to replace/add to the "original"
field the computed one, so it's surely not a big loss if I "loop" 2
times with SQL.

I think everything could be summed up as:

select into t myaggregate1(field) from dataset where condition1;
if(t>10) then
  update dataset set field=myfunc1(a,b,c) where condition1;
end if;

select into t myaggregate2(field) from dataset where condition2;
if(t>44) then
  update dataset set field=myfunc2(a,b,c) where condition2;
end if;

Actually there is a trick I could use to "skip" the update loop and
pass the result to the next loop but it looks convenient in a more
general case if the aggregate should be computed on the "updated"
value. I'm still not sure if the class of functions I'm working with
is always of the above class where myaggregate works on the
"original" field.

Of course myaggregate could compute myfunc... but then I'll have
to compute myfunc twice.

In a loop I could

create temp table t on commit drop as
  select *, field as field1, field as field2, field as result
  from t1
  join t2 on...
  join t3 on...
  where ...
;

flipflop='field1';
flopflip='field2';

foreach(row) {
  if(condition on row) {
    row[flopflip]=myfunc(row[flipflop]);
  } else {
    row[flopflip]=row[flipflop];
  }
  agg+=row[flopflip];
}
if(condition on agg) {
  swap flipflop and flopflip
}

foreach(row) {
  if(condition on row) {
    row[flopflip]=myfunc(row[flipflop]);
  } else {
    row[flopflip]=row[flipflop];
  }
  agg+=row[flopflip];
}
if(condition on agg) {
  swap flipflop and flopflip
}

execute 'update t set result=' || flipflop;

I think this could be obtained using cursors.
This is going to be much more verbose since to use the "flipflop"
technique I'll have to use dynamic statements and EXECUTE.

EXECUTE
 'UPDATE t SET ' || flipflop || '=' || newfield ||
   ' WHERE CURRENT OF cursor';
even "condition on row" should be expressed as a dynamic statement
if the condition involves the computed field.

Compared to the select + update solution it is going to avoid
looping and checking the condition twice but it is going to work on
a larger dataset and run the update even for unchanged rows (maybe
the else condition could be omitted??).
I'm not sure the actual set of functions has the update condition
equal to the select condition anyway.
But I think wasting updates has a larger cost if they require disk
IO.

I think stuff may look more "convenient" if there was a simple and
cheap way to load a record set into an array, do stuff with the
procedural language and substitute the old record set with
the computed one.

Currently I'll get most of the speed-up from looping through a
smaller data set, avoiding a join over a very large table.
Considering the very small data set I'm expecting (no more than 20
records) I don't think dynamic statements, lack of indexes etc... are
going to have an impact.

As the number of clients increases, optimizing the loops may become
necessary.
I'll have to see if all the cases I'm dealing with could be solved
by the SELECT + conditional UPDATE technique since it looks much
easier to maintain.
Furthermore I suspect that for my real case the conditional check
will be made on the original value so that myfunction will be
computed at most once and I can delay an aggregate computation on
the computed field after the last loop has been executed.
The remaining costs will be:
- looping twice on the same table (getting the aggregate + updating
  the table)
- updating if it causes disk IO

I even suspect that the test will mostly fail so that updates will
be rare but I still would like to understand how this could work in
the most general case and how the temp table + cursor + flipflop
technique is going to work internally.


I think I really don't have a clear picture of how temp tables
really work.
They can be seen by concurrent transactions in the same session.
But if the transaction in which a temp table is created is not
committed yet, other transactions won't see it.
If the transaction where the temp table is created is aborted... no
other transaction will ever know about the existence of the temp
table.

If the temp table is defined as DROP ON COMMIT... somehow there is
no need to make the changes happening on the temp table land on
disk. So it may actually look more as a temporary, private storage
that doesn't have to be aware of concurrency.

So temp tables should require less disk IO. Is it?

Re: [GENERAL] looping over a small record set over and over in a function

2009-06-20 Thread Ivan Sergio Borgonovo
On Sat, 20 Jun 2009 12:30:42 +0200
Alban Hertroys  wrote:

> You could add a column to your query as a placeholder for the
> computed value.
> For example, SELECT *, 0 AS computed_value FROM table.

> If you use a scrollable cursor (possible in PL/pgSQL these days,  
> although it still has some limitations) you could just loop
> through its results and rewind the cursor until you're done.

> It does look like you're implementing some kind of aggregate
> function though. Maybe you can implement it as an actual aggregate
> function, maybe in combination with the windowing stuff in 8.4?
> Then you could just add the aggregate to your query instead of
> using your query results in a function. That's about the inverse
> of what you're attempting now.
> You seem to be implementing something that would look like SELECT  
> sum(SELECT * FROM table), while you may be better off aiming for  
> SELECT SUM(value) FROM table.
> 
> Considering you want to loop over the results multiple times that
> may not be possible; only you can tell.

> > Is it really worth to load the whole record set in an array, loop
> > over etc... in spite of eg. building a temp table with the same
> > structure of the input record set + 1 field, loop over the table
> > etc... what about all the UPDATEs involved to change field N+1 of
> > the temp table? Will be they expensive?
> 
> Neither of those look like a convenient approach. With the array
> you lose all the meta-data the record set provided (types, column
> names, etc), while the temp table approach looks like it will be
> kind of inefficient.

> Is the result of that function volatile or can you maybe store it  
> after computing it once and fire a trigger to update the computed  
> value if the underlying data changes?

The function is volatile.
What I was thinking about was to fire a trigger to wipe the temp
table if the table on which the computation is made is changed.

Considering I can't make the loop run in parallel, aggregates may be
a way to go... but somehow they look as if they are making the
implementation a bit hard to manage. Anyway it still has to be seen
if I could actually implement the loops with aggregates since every
loop has 2 "side effects": compute some aggregates for the whole
record set and compute an extra field for each row.

> I wonder why you need to re-iterate over your result set multiple  
> times? It's kind of rare that once isn't enough.

> And lastly, if your current approach really is the only way to
> compute what you're after, then maybe PL/pgSQL isn't the right
> match for the problem; it looks like you'd be better served by a

Yeah. I gave a look at Python but I don't want to add one more
language to the mix.
I enjoy the strict type checking of plpgsql even if a bit of
syntactic sugar would help to make it more pleasing, and I think it
is the most lightweight among what's on offer.
Still I don't know how easy it is with eg. python to load an array
with a result set, change it and place it back into the table where
it was coming from.

> language that can work with arrays of typed structures. As I'm not
> familiar with the other PL languages I can't tell whether they
> would be suitable in that respect, but I suspect Python or Java
> would be able to handle this better.


Your suggestion about cursors could be the way... but I don't know
enough about cursor internals to understand if updating a field
through a cursor will cause disk writes.

Currently my main concern is making these things readable and
extensible. I'm not planning to optimise yet. The result set on which
the computations are made is pretty small. It is just taken out from
a large one. But I don't want to cut off my path to optimisation.

So one way could be:

select into a temp table the record set.

Build up a trigger that will wipe the temp table if the tables on
which the record set is built changes. This may even not be
necessary, since everything related to that record set is going to
happen in one session.

loop several times over the temp table

Since every loop should actually correspond to a function... I may
wonder if I could build up the cursor and pass it along to several
functions.
Every function will MOVE FIRST, update the rows of the record
set one by one and return some computed fields that will be used by
the next function etc...

If this is possible, this approach has the advantage of being
extensible.
I wonder if it is efficient since I don't know if an
UPDATE table SET ... WHERE CURRENT OF cursor
on a temp table is going to incur any disk writes.
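
To make the idea concrete, a single pass could look something like
this (just a sketch: the temp table t and its columns val/result are
made up, and WHERE CURRENT OF needs 8.3+):

create or replace function one_pass() returns numeric as
$$
declare
  cur cursor for select val from t;
  r record;
  agg numeric := 0;
begin
  open cur;
  loop
    fetch cur into r;
    exit when not found;
    -- the per-row computation goes here
    update t set result = r.val * 2 where current of cur;
    agg := agg + r.val;
  end loop;
  close cur;
  return agg;
end;
$$ language plpgsql;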

thanks



-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] looping over a small record set over and over in a function

2009-06-19 Thread Ivan Sergio Borgonovo

I've a record set on which I have to loop several times.
The function should return the same record set + one more computed
field.
Something that could look like:

foreach(row) {
  // compute stuff
  if(...) {
  }
  // place stuff in field[N+1] of the row
}
if(some condition) {
  //
}
foreach(row) {
  // compute stuff
  if(...) {
  }
  // place stuff in a field[N+1] of the row
}
if(some condition) {
  //
}
...

actually return row + computed field.

in plpgsql

where each loop depends on the result of the previous.
The dataset is very small.


If I could easily load all the dataset into an array, loop through
it and then just update the computed field it would be nice... but
how?
Is it really worth loading the whole record set into an array,
looping over it etc... instead of eg. building a temp table with the
same structure as the input record set + 1 field, looping over the
table etc...? What about all the UPDATEs involved to change field N+1
of the temp table? Will they be expensive?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] simulating high load for vacuum full

2009-06-17 Thread Ivan Sergio Borgonovo
I'm trying to diagnose a problem that happened during vacuum full.

It is a programming problem triggered by some lock or delay,
whatever, happening during vacuum.

Making large updates to a bunch of tables is a PITA just to obtain a
slow VACUUM FULL.

Restoring a "fragmented" DB doesn't look like a working strategy.
The restored DB shouldn't be fragmented.

What are the "side effects" of a vacuum full?
Any cheaper way to cause a heavy vacuum full or just its side
effects?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] why dropping a trigger may cause a deadlock

2009-06-08 Thread Ivan Sergio Borgonovo
On Fri, 05 Jun 2009 17:35:19 -0400
Tom Lane  wrote:

> Ivan Sergio Borgonovo  writes:
> > I don't get it.
> > Why dropping the triggers would cause a deadlock anyway?
> 
> > I bet it is due to my naïve view of the problem but I think a
> > trigger is a "function". Unless there is concurrent access to the
> > table where the function is defined... I can't see why dropping
> > the "functions" serially should cause a lock.
> 
> They're not just functions, they are part of the DDL for a table.
> Adding or removing a trigger on a table requires exclusive lock
> on that table, otherwise you can't be sure what will happen in
> concurrent transactions that might (or might not) be supposed to
> fire the trigger.

I'm still wondering why there was anything else requiring a lock on
those tables.

Referring to the previous example

create table b(
  c3id int primary key,
  c3 text
);

create table a(
  pid int primary key,
  fti tsvector,
  c1 text,
  c2 text,
  c3id int references b(c3id),
  c4 int -- not used to build up fti
);

there is a very small chance that while I was dropping the triggers
something like an

update a set c4=37 where pid=12;

was running when I dropped the trigger.


But I can't see how this should require a lock... and still, well...
the chances that the update statement happened during the trigger
drop are really negligible.

And... still I'm quite surprised that even that update happening
when I was dropping the trigger resulted in a deadlock.

Everything happening on tables a and b that involves writes happens
in the same transaction that drops the triggers, or is read only.

Should I look into anything else to get a clue about what happened
and try to avoid it?
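
The only mitigation I can think of so far is to grab both locks up
front, always in the same order, so the drop can't interleave with
whatever else takes locks on those tables (a sketch, with a and b
standing for the real tables):

begin;
lock table a, b in access exclusive mode;
select ft1idx_trigger_drop();
-- bulk update, recreate the triggers...
commit;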

Thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] why dropping a trigger may cause a deadlock

2009-06-05 Thread Ivan Sergio Borgonovo
On Fri, 05 Jun 2009 10:46:11 -0400
Tom Lane  wrote:

> Ivan Sergio Borgonovo  writes:
> > I've encountered this error for the first time
> > psql:./import_stock_scratch.sql:9: ERROR:  deadlock detected
> > DETAIL:  Process 11095 waits for AccessExclusiveLock on relation
> > 250545 of database 248569; blocked by process 11099. Process
> > 11099 waits for AccessShareLock on relation 250510 of database
> > 248569; blocked by process 11095.
> > CONTEXT:  SQL statement "drop trigger if exists
> > FT1IDX_catalog_items_update_trigger on catalog_items" PL/pgSQL
> > function "ft1idx_trigger_drop" line 3 at SQL statement
> 
> > The function just drop 2 triggers that update a tsvector that is
> > gist indexed.
> 
> Are the triggers on two different tables?  It looks like you're

You're right. I forgot the exact schema.
I've a table that contains most of the fields that end up in the
tsvector and a reference to another table that contains a column
that ends up in the tsvector.

create table a(
  fti tsvector,
  c1 text,
  c2 text,
  c3id int references b(c3id)
);
create table b(
  c3id int primary key,
  c3 text
);

One trigger is fired when c1, c2, c3id are modified.
The other is fired when c3 is modified.
Both triggers rebuild the tsvector that is obtained by concatenating
c1, c2, c3.

> probably trying to acquire exclusive lock on two tables, and
> deadlocking against some other process that gets a lesser lock
> on the same tables but in the other order.

I don't get it.
Why would dropping the triggers cause a deadlock anyway?

I bet it is due to my naïve view of the problem but I think a
trigger is a "function". Unless there is concurrent access to the
table where the function is defined... I can't see why dropping the
"functions" serially should cause a lock.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] why dropping a trigger may cause a deadlock

2009-06-05 Thread Ivan Sergio Borgonovo
I've encountered this error for the first time

psql:./import_stock_scratch.sql:9: ERROR:  deadlock detected
DETAIL:  Process 11095 waits for AccessExclusiveLock on relation
250545 of database 248569; blocked by process 11099. Process 11099
waits for AccessShareLock on relation 250510 of database 248569;
blocked by process 11095.
CONTEXT:  SQL statement "drop trigger if exists
FT1IDX_catalog_items_update_trigger on catalog_items" PL/pgSQL
function "ft1idx_trigger_drop" line 3 at SQL statement

The function just drop 2 triggers that update a tsvector that is
gist indexed.

Before running import_stock_scratch.sql I'm making an update to the
columns that are then "aggregated" in the tsvector.

All scripts are wrapped in transactions and are run serially.

What's happening? How to prevent it?


I'd expect that previous scripts don't interfere with the deadlocked
one and at that moment the write activity on the table on which the
triggers are acting is minimal if not absent.

But I suspect my understanding of how these things work is very
naive... so some general clue would be appreciated as well.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] optimize/cleanup SQL

2009-05-29 Thread Ivan Sergio Borgonovo
On Fri, 29 May 2009 08:13:32 -0500 (CDT)
Brandon Metcalf  wrote:

> For some reason this doesn't give me satisfaction that it's written
> optimally, but I haven't found another way.
> 
>   SELECT round(CAST ((EXTRACT(EPOCH FROM clockout)
>   -EXTRACT(EPOCH FROM clockin))/3600 AS NUMERIC),2) AS
> hours FROM timeclock;

satisfying?

template1=# select extract(days from ('2009-01-01'::timestamp -
'2008-01-01'::timestamp))+5::int;

?column?
------
  371
(1 row)
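
Or, sticking to hours, something along the lines of:

select round((extract(epoch from clockout - clockin) / 3600)::numeric,
             2) as hours
from timeclock;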


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Excel and postgresql functions

2009-05-26 Thread Ivan Sergio Borgonovo
On Tue, 26 May 2009 11:41:50 +0200
Ivan Sergio Borgonovo  wrote:

> I've built some functions just for encapsulation reasons and to
> avoid passing the same parameter over and over.
> 
> I'd like to access the result from Excel but it seems (and I'm not
> entirely sure this is the definitive answer) Excel can only pull
> data from views and tables.
> 
> - can someone confirm Excel/Access 2003 can't return result sets
>   coming from functions?
> - if Excel/Access can't return results from a function is there a
> way to masquerade the function (accepting parameters) behind a
>   table/view.

For all the people forced to fight with software that has to be
tricked rather than instructed...

The fault lies with Excel.

- Data -> Import External Data -> New Database Query
- Choose the odbc connection you created for Postgresql
- Don't add any query.
- Edit directly "SQL"
- save an easily "greppable" query (eg. select 'ermenegildo';)
- Ignore complaint.
- Close Microsoft Query.
- Say OK to "Import data" (just select a good "top left corner for
  your table)

- Tools -> Macro -> Visual Basic Script Editor
- search your query and change it with any valid SQL, add ? if you
  need parameters taken from cells
- save
- you'll be asked which cell contain the parameter (I didn't try to
  use more than one parameter)

Now you can have an arbitrary query returned in an Excel sheet.
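
For example, a parameterised query of this kind can be pasted in at
the "Edit SQL" step (the column names here are made up; the ? gets
bound to whatever cell you pick):

select itemid, description, price
from catalog_items
where price <= ?;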

So yes... PostgreSQL can happily work with Excel.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Excel and postgresql functions

2009-05-26 Thread Ivan Sergio Borgonovo
I've built some functions just for encapsulation reasons and to
avoid passing the same parameter over and over.

I'd like to access the result from Excel but it seems (and I'm not
entirely sure this is the definitive answer) Excel can only pull data
from views and tables.

- can someone confirm Excel/Access 2003 can't return result sets
  coming from functions?
- if Excel/Access can't return results from a function is there a way
  to masquerade the function (accepting parameters) behind a
  table/view.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] psql Windows and variables

2009-05-23 Thread Ivan Sergio Borgonovo
I need to loop from DOS [sic] on a set of files to get them imported
into a table.

I noticed that psql is wrapped up in a import.bat

I wrote another bat that substantially do

rem import.bat
FOR %%f in (pattern) do "longpathtopsql.bat" --variable csvfile=%%f
-f some.sql


-- some.sql
\copy import.transaction from :csvfile


but I get 
some.sql:3: :csvfile: No such file or directory

I tried -v --set etc...

Same result.

How can I pass variables to a sql script?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Aggregate Function to return most common value for a column

2009-05-22 Thread Ivan Sergio Borgonovo
On Fri, 22 May 2009 17:48:44 +1000
"Michael Harris"  wrote:

> Hi Experts,
> 
> I want to use an aggregate function that will return the most
> commonly occurring value in a column.
> 
> The column consists of VARCHAR(32) values.
> 
> Is it possible to construct such an aggregate using PL/PgSql ? 
> 
> If I was trying to do something like this in Perl I would use a
> hash table to store the values and the number of times each was
> seen as the table was iterated, but PL/PgSql does not seem to have
> an appropriate data type for that?
> 
> I don't want to use PL/Perl to avoid the overhead of starting a
> perl interpreter for that.
> 
> Do I have to write the function in C maybe?

Isn't it a job for group by?

select count(*), myvalue from table group by myvalue order by
count(*) desc limit 1;

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] package for debugger/wizard was: Feedback on PG?

2009-05-20 Thread Ivan Sergio Borgonovo
On Wed, 20 May 2009 07:46:02 -0400
Dave Page  wrote:

> > Beside the fact I can only thank for all the great work around
> > postgresql, is there a reason we can't have something similar on
> > eg. Debian that will let us have a tuning wizard and a debugger
> > that "just works" with an aptitude install?

> No, other than resources. We maintain close to 100 installers now,
> just for the EnterpriseDB supplied packages. Producing
> platform-specific builds of them as well as the one-click
> installers would be a mammoth task.

My universe is Debian bound... so I even don't know if there is a
*nix version of the tuning wizard.
I'd consider it a quite useful tool even for "marketing" purposes on
Linux too.
Bad performance without tuning is a common thread here.
I really didn't have time to investigate the debugger; I'd
expect that on Windows it "just works".
While many things on *nix just work, debugging pg functions on Linux
is not one of those.
As you may have guessed, my definition of "just works" is not that
different from "aptitude install".

I still have to find a howto for installing edb in Debian.
Could it be packaged for Debian if there were resources?

> It's pitched as a ''get you started" tool. We still expect you to

Exactly.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [Windows] Feedback on PG?

2009-05-20 Thread Ivan Sergio Borgonovo
On Wed, 20 May 2009 06:59:28 -0400
Dave Page  wrote:

> On Wed, May 20, 2009 at 6:49 AM, Ivan Sergio Borgonovo
>  wrote:
> ]
> > I was surprised how polished the installation was.
> 
> Thanks :-)

Beside the fact I can only thank for all the great work around
postgresql, is there a reason we can't have something similar on eg.
Debian that will let us have a tuning wizard and a debugger that
"just works" with an aptitude install?

oh and yeah... I know a "Tuning Wizard" is evil and will hide all
the true unleashed hidden wonderful power you can really squeeze out
of Postgresql and corrupt your soul... but still ;)

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [Windows] Feedback on PG?

2009-05-20 Thread Ivan Sergio Borgonovo
On Wed, 20 May 2009 11:59:34 +0100
Raymond O'Donnell  wrote:

> On 20/05/2009 11:49, Ivan Sergio Borgonovo wrote:
> > I think you could connect from Delphi through ODBC, maybe even on
> > some .NET driver. I think ODBC should be more mature... but don't
> > trust me.
> 
> You can use ODBC and .NET (depending on which version of Delphi you

What is the status of the .NET driver?
Are my assumptions about ODBC vs. .NET still valid, or were they just
a leftover from the past/some nightmare I just had?

Any appreciable difference between the Windows .NET driver and the
*nix one?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [Windows] Feedback on PG?

2009-05-20 Thread Ivan Sergio Borgonovo
On Wed, 20 May 2009 09:32:13 +0200
Gilles  wrote:

> Hello
> 
>   I'm looking for an open-source DBMS that runs on Linux and
> Windows. In order to make an informed choice, I would like some
> feedback on the Windows port of PostgreSQL, as compared to MySQL
> and Firebird (if you know of yet other solutions, I'm also
> interested.)

It's been a while since I last "seriously" touched Windows.
Recently I had to install PostgreSQL on XP.
This is a very superficial first impression.

I was surprised how polished the installation was.
On Windows the installer even comes with a couple of goodies more:
namely a tuning wizard and some debugging components.
Nothing that really makes you miss the *nix environment... but still
quite a nice, impressive experience.

I was even surprised to see the installer offer you a chance to
install Drupal.

> Is the Windows port on par with the *nix version, and with those 
> other alternatives? Apart from the fact that, unlike MySQL, 
> PostgreSQL doesn't require buying a license when developping 
> commercial applications, are there technical reasons why I should 
> choose PostgreSQL instead of MySQL or Firebird?

From a very naive POV I'd say MySQL is still an SQL interface to the
filesystem. PostgreSQL is a mature RDBMS.
On small projects where data aren't that valuable and the
application is nearly totally read-only or you're willing to reinvent
the wheel of concurrent access to your data I'd go with MySQL.
As C++ or python may be "boring/annoying" compared to VB or PHP,
PostgreSQL may be "boring/annoying" compared to MySQL. But as soon
as you reach some grade of complexity/value of your data you're
really going to appreciate what PostgreSQL can offer.

From a pure programming point of view, PostgreSQL is really much
more fun to deal with. It's not sloppy, accepting everything you
throw at it... and regretting it later; it is much more standards
compliant, it warns you earlier about problems and tries to give you
good hints on how to solve them, it lets you use some more complex
SQL features (and more to come), it has support for full text search
and spatial/geographic data/indexes on a transactional engine, and it
has a plethora of extension modules.
Functions, triggers and rules are much more mature than on MySQL.
Schemas may really come in handy, simply as namespaces or as a tool
to manage grants.

I think you could connect from Delphi through ODBC, maybe even on
some .NET driver. I think ODBC should be more mature... but don't
trust me.
For "updating"... PostgreSQL is still not able to do "hot upgrades"
from major versions. You've to backup->restore.
On Debian... this happens auto-magically... I've no idea about what
the Windows installer can do when upgrading.
If you've availability constraint this could be a bit annoying. If
you've HA constraints you're going to have some kind of replication
solution that will somehow let you do "hot upgrades".
Anyway this shouldn't be your concern if you're dealing with a SOHO.

Coming from a web background, but one where transactions and some
"advanced" features are essential to survival, I got the impression
that even for small SOHO applications, even when server and client
are on the same box, pg may be a very good choice.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Excel and pg

2009-05-18 Thread Ivan Sergio Borgonovo
On Mon, 18 May 2009 09:14:41 +0800
Craig Ringer  wrote:

> Ivan Sergio Borgonovo wrote:
> 
> > I'd like to know if:
> > - it is possible to "load" in an Excel sheet a table (view, query
> >   result) coming from postgresql and to use those data to do
> > further computation/presentation work on Excel?
> 
> Certainly. You can do it through the ODBC interface via VB, and I
> think Excel also has some kind of "data browser" that lets the
> user pull data from ODBC-accessed databases interactively.

This is working. I had a bit of fun trying to find the right menu in
the Chinese version.
For the uninitiated/the ones that don't have fresh memory of working
in a MS environment:
- Install Postgresql and ODBC drivers and create a DB/user etc...
- Create a system wide DSN connection. In XP it is a bit hidden
  compared to my memories of W2K (Control Panel -> Performance and
  Maintenance -> Administrative Tools -> Data Source)
- Open Excel, there should be a Data Menu... I can't go further
  since the Excel was localised in Chinese. There are some menu
  that follow but we weren't able to read them maybe due to the
  mismatch of OS and Excel localisation.

You can import tables and views too, and it seems you can apply a SQL
filter on them. Dates seem to be imported correctly and I think
localised according to the setting of the OS (I'd ask; I think in
mainland China dates should follow the European format and not the
UK/US format).

> Beware, though. Excel has funny ideas about dates and has some
> baked-in bugs in some of its functions. It doesn't know about or
> respect the foreign key relationships and constraints in the
> database, either.

Fortunately I won't delegate anything related to data integrity to
Excel. I'll keep an eye on dates.
Having had some experience with MS SQL, I had some difficulties at
times converting dates in and out of Postgresql. I know it
shouldn't be the responsibility of the DB to convert between
formats... but for some dirty work it can really come in handy.
pg -> Excel worked perfectly. I hope Excel -> csv -> pg will be
equally painless. Considering the limited toolset I can rely on, I'm
a bit worried I may have some localisation problems when people try
to save Excel -> csv.
COPY may not support all the flexibility I need if a Chinese
localised Excel/OS outputs some strange stuff in the csv.
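
If the csv turns out to use day-first dates, forcing the input
interpretation on the pg side before the \copy should at least remove
that ambiguity (just a sketch of the idea):

-- tell this session that textual dates in the incoming csv are day/month/year
set datestyle to 'ISO, DMY';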

I chose to pass through pg just because I have to help to write down
some business logic for reporting and I bet I'll have to get back at
what I wrote in 1-6 months. I tried to implement some of the
reporting logic in Excel but:
- Something really fit SQL
- I don't want to relearn how an IF works, especially if I have to
  switch back and forth to Polish notation
- I've found something really weird. People say SQL is hard (yeah it
  may be...) but that S really shines once you compare it with the
  way a spreadsheet operates ;)
- Office SQL is a PITA. I gave up once I saw they don't implement
  COALESCE.

If people would like to elaborate further on data coming from pg
using Excel functions they will be on their own.

> If you really must pull data into Excel, consider giving users an
> account in PostgreSQL that _ONLY_ has access to read-only views of
> the data. Those views should denormalize the data significantly and
> otherwise make it as Excel-friendly as possible. Pull the data in
> using a Visual Basic script that "protects" the data as soon as
> it's been placed on the sheets, so the user can't accidentally
> change it, just reference it.

I've to import data in Postgresql... that comes in other Excel files.
I can't rely on a complete programming environment.

I was thinking about opening another ODBC connection and using
dblink to import data from Excel to pg... but somehow it doesn't
look it is going to really improve the procedure of importing data
from a csv.
I mean... if someone doesn't do it properly (eg. some kind of
lock/whatever on the Excel file) people won't be able to understand
the problem.
Saving the Excel file to csv seems something that could be handled
more easily by the people that will do the job.

I think that somehow "refreshing" data imported by Excel is going to
run the associated query... so if I write a function that runs
\copy
and place a "select * from importdata()" in Excel... all people
need to do is save the Excel file as csv somewhere.

> > - can postgresql load data from an Excel sheet? Or Excel write
> > data to postgresql from an excel sheet? dblink?

> The easiest way is via CSV. You could probably also do it with some
> Visual Basic running in Excel that pushes the data via ODBC.

> If you're going to even vaguely consider putting data from a
> user-modifiable spreadsheet back 

[GENERAL] Excel and pg

2009-05-17 Thread Ivan Sergio Borgonovo
I have to deal with a volunteer pet project and I wouldn't like to go
crazy fighting with the idiosyncrasies of Access, but still I have no
time to build an interface that will be user friendly enough to
overcome the resistance to something new.

So I thought just to use Excel 2003 as the front-end to postgresql,
everything on Windows XP.

I'm not any more (if I have ever been) comfortable with MS Office
stuff and Windows.
I know using postgresql is like using an elephant to fight a
microbe here, but still since I'll have to write the business logic
and I don't want to spend hours understanding why I can't make a
join or what is the equivalent of coalesce etc... I'd like to know
if I'm getting into more trouble just to set the things up.


I'd like to know if:
- it is possible to "load" in an Excel sheet a table (view, query
  result) coming from postgresql and to use those data to do further
  computation/presentation work on Excel? I think the rough path
  should be use ODBC (OleDB?) Do I have to install anything more
  other than postgresql?
- can postgresql load data from an Excel sheet? Or Excel write data
  to postgresql from an excel sheet? dblink?
- am I going to incur in any localisation problem if the Windows
  stuff is localised in Chinese? I see I can chose the "language to
  be used during installation". I'd prefer localization to be in
  English but still let people that will use the front-end to use
  Chinese. What about the encoding (client/server)?
- are there tools to make backup/restore very easy even for
  "point&click" kind of users?
- anything that a non "desktop" oriented guy like me has to realise
  before promising to put up something that will have to be used by
  "desktop/GUI" people? I can't think of anything other than backup
  that they will have to deal with, once they see their data in
  Excel and they can backup/restore easily.

thanks

BTW I saw a lot of nice things on the pg package for Windows...
especially for debugging that I'm not sure I can find for Linux or
can be as easily installed as with Stack Builder:
- debugger
- Tuning Wizard
- replication solution (I wonder if it is easier to have a working
  solution with aptitude or Stack Builder)

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PGSQL-to-MYSQL Migration: Error in a 'simple' inner join query

2009-05-04 Thread Ivan Sergio Borgonovo
On Mon, 4 May 2009 09:27:30 -0700 (PDT)
DaNieL  wrote:

[snip]

> Every id in every table is a PRIMARY KEY, UNIQUE, NOT NULL and
> serial type..
> The query that i have problem with is:
> ---
> SELECT
>  orders.code,
>  customer.name,
>  SUM(order_item.price)
> FROM
>  orders
>  INNER JOIN customer ON (customer.id = orders.id_customer)
>  INNER JOIN order_item ON (order_item.id_order = orders.id)
> GROUP BY orders.id

> Sorry, i know that this maybe is a basically problem, but i come
> from mysql.. and in mysql that query works...

You have to explicitly add to the GROUP BY all the columns that are
not inside aggregates.
Postgresql is a bit stricter than mysql here, but it generally saves
you a lot of debugging later.
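
With the tables you posted it should be enough to write something
like this (untested sketch):

SELECT
 orders.code,
 customer.name,
 SUM(order_item.price)
FROM
 orders
 INNER JOIN customer ON (customer.id = orders.id_customer)
 INNER JOIN order_item ON (order_item.id_order = orders.id)
GROUP BY orders.code, customer.name;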

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Selling an 8.1 to 8.3 upgrade

2009-04-24 Thread Ivan Sergio Borgonovo
On Fri, 24 Apr 2009 13:03:06 -0600
Scott Marlowe  wrote:

> On Fri, Apr 24, 2009 at 11:01 AM, Josh Trutwin
>  wrote:
> > I've been asked to put together a list of reasons to upgrade a db
> > from 8.1 to 8.3 and I've looked over the changelog, but they
> > want a bullet list of 4-5 top things.  I'm curious what others
> > would say the most 5 important updates from 8.1 to 8.3 are.
> 
> There were a lot of improvements from 8.1 to 8.3.  For the system
> at work, the compelling ones were:
> HOT updates.
> General improvements in the query planner.  This has been true for
> nearly every new major release.
> Improved vacuuming, including multi-threaded vacuum capability.
> Concurrent index creation.
> Improved handling of joins
> Fillfactor setting
> Much improved checkpointing
> Changes to the buffering methods so large seq scans don't evict
> more popular data from the buffers.  This is much more important
> than it sounds.

For me the reason to switch was the inclusion of the great work by
Oleg and Teodor: Full-text search into core.

It looks like 8.4 is going to be a great release too and I think it
will be a cool enough reason to switch not only DB release but OS
release as soon as 8.4 ends up in the backports for Lenny.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance

2009-04-23 Thread Ivan Sergio Borgonovo
On Thu, 23 Apr 2009 12:00:30 +0200
Karsten Hilbert  wrote:

> On Thu, Apr 23, 2009 at 11:33:34AM +0200, Ivan Sergio Borgonovo
> wrote:
> 
> > Karsten Hilbert  wrote:
> > 
> > > On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote:
> > > 
> > > > I have a set of dynamically composed objects represented in
> > > > Java, with string values for various attributes, which have
> > > > variable length. In case you have suggestions for a better
> > > > type for this case, it would be my pleasure to hear about
> > > > them.
> > > 
> > > Seref, he's suggesting you use TEXT instead of
> > > VARCHAR(something). In PG it's actually usually *less*
> > > overhead to use the unbounded text datatype (no length check
> > > required).
> > > 
> > > Length checks mandated by business logic can be added by
> > > more dynamic means -- check constraints, triggers, etc which
> > > allow for less invasive change if needed.
> > 
> > Could you point us to some example of a constraint/trigger
> > (etc...) that is going to provide the same checking of varchar
> > and explain (if the code/example... doesn't make it clear) why
> > it should be faster or less invasive?
> 
> check constraint based:
> 
>   create table foo (
>   txt text
>   check (char_length(txt) < 1001)
>   );
> 
> trigger based:
> 
>   create function trf_check_length_1000() ... returns
> trigger ... $$...$$;
> 
>   create table foo (
>   txt text
>   );
> 
>   create trigger check_txt_length before INSERT or
> UPDATE ... execute trf_check_length_1000();
> 
> faster:
> 
> - TEXT is (judging by previous comments on this list)
>   marginally faster than VARCHAR(1000) because a) it runs
>   the same code but b) doesn't have to check for the 1000
>   length
> 
> - other options (VARCHAR, constraint, trigger) incur
>   additional overhead and are thus slower
> 
> less invasive:
> 
> Well, poor wording on my part, perhaps. What I meant is that
> changing a check constraint or trigger appears to be a less
> costly operation on a table than changing the datatype of a
> column (although I seem to remember there being some
> optimizations in place for the case of changing the *length*
> of a varchar).

I'll try to rephrase to check if I understood and for reference.

varchar is slower than text since it has to do some "data type
check".

text is faster but if you add a check... it gets slower (slower than
varchar?, faster?).

constraint and trigger should have the advantage that in case of
refactoring you're not going to touch the table definition, which
*may* end up being faster.

But... if in the postgresql implementation varchar is just text with
a check... how can a change in type be faster?
If it was a char(N) maybe there would be some kind of optimization
since the length of the data is known in advance... so
shrinking/enlarging a char(N) may have a different cost than
shrinking a varchar(N), which in pg *should* have the same
implementation as text.

On the other hand... you're preferring text just because the two have
the same implementation (modulo the check) in Postgresql... but it
may not be so in other DBs.
So *maybe* other DBs do some optimization on varchar vs. text.

Somehow I like the idea of considering a varchar a text with a
check, but I think I prefer the "more traditional" approach since
somehow is the "most expected".

Nothing can handle strings of infinite length, and well before
reaching infinity I'll get in trouble.
People read differently what you'd like to say writing varchar(N).
Most people read:
1) we expect a length around N
Fewer people read:
2) There is something not working if we get something larger than N
But it may also mean:
3) if we get something larger than N something is going to explode
I think the same "ambiguity" is carried by check().
Anyway for a sufficiently large N 2) and 3) can be valid.

Supposing the cost of losing an insert for an unpredictedly large
value of N is high, I'd be tempted to set N to at least protect me
from 3), but I bet people may interpret it as 1).

In my experience anyway varchar is a good early warning for trouble,
and the risk of being misunderstood/getting caught by an
implementation-dependent gotcha when writing varchar(N) where N means
3) largely encompasses the risk of losing an insert you didn't have
to lose.

Maybe I've spotted a potential advantage of check over varchar.
If you use some kind of convention to name checks you could
remove/re-apply them more easily than by spotting varchars().
The name of the constraint may contain metadata to help you.
The name of the constraint may also suggest why it's there to your
colleagues.
But this works only if your implementation performs similarly on
text and varchar().
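
Something along these lines is what I have in mind (just a sketch,
with made-up names):

create table foo (
  txt text
);
alter table foo
  add constraint foo_txt_maxlen_1000 check (char_length(txt) <= 1000);
-- later, if the business rule changes, only the constraint moves:
alter table foo drop constraint foo_txt_maxlen_1000;
alter table foo
  add constraint foo_txt_maxlen_2000 check (char_length(txt) <= 2000);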

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] varchar vs. text + constraint/triggers was: Help request to improve function performance

2009-04-23 Thread Ivan Sergio Borgonovo
On Thu, 23 Apr 2009 11:00:59 +0200
Karsten Hilbert  wrote:

> On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote:
> 
> > I have a set of dynamically composed objects represented in
> > Java, with string values for various attributes, which have
> > variable length. In case you have suggestions for a better type
> > for this case, it would be my pleasure to hear about them.
> 
> Seref, he's suggesting you use TEXT instead of
> VARCHAR(something). In PG it's actually usually *less*
> overhead to use the unbounded text datatype (no length check
> required).
> 
> Length checks mandated by business logic can be added by
> more dynamic means -- check constraints, triggers, etc which
> allow for less invasive change if needed.

Could you point us to some example of a constraint/trigger (etc...)
that is going to provide the same checking of varchar and explain
(if the code/example... doesn't make it clear) why it should be
faster or less invasive?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] backup getting larger and larger

2009-04-14 Thread Ivan Sergio Borgonovo
On Tue, 14 Apr 2009 13:26:24 -0700
Steve Crawford  wrote:

> Ivan Sergio Borgonovo wrote:
> > I still have to investigate if the tables are getting really
> > larger... but at a first guess there shouldn't be any good
> > reason to see tables getting so large so fast... so I was
> > wondering if anything could contribute to make a backup much
> > larger than it was other than table containing more records?
> >
> > The only thing that should have been really changed is the
> > number of concurrent connections during a backup.
> >   
> Can we assume that by backup you mean pg_dump/pg_dumpall? If so,
> then the change is likely due to increasing data in the database.
> I have a daily report that emails me a crude but useful estimate
> of table utilization based on this query:
> 
> select
>   relname as table,
>   to_char(8*relpages, '999,999,999')  as "size (kB)",
>   (100.0*relpages/(select sum(relpages) from pg_class where 
> relkind='r'))::numeric(4,1) as percent
> from
>   pg_class
> where
>   relkind = 'r'
> order by
>   relpages desc
> limit 20;

Thanks, very useful.
May I place it on my site as a reference, giving credits of course?

Still puzzled...

The first and second largest tables make up 70% of the overall DB
size (1st 53%, 2nd 16.1%).
The second one has very few, small fields but ~2x the number of
records of the first.
Comparatively, a row of the first one is at least 10x larger than a
row in the second one.
The first has 1M records.

All the others with a size larger than 1% grow at roughly 1x the
number of records of the first one.
I had an increase of less than 10% in the number of records of the
first table but an increase of roughly 80% in the size of the backup.

Maybe it is due to compression. The table that grew the most doesn't
compress too well.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] backup getting larger and larger

2009-04-14 Thread Ivan Sergio Borgonovo
I still have to investigate if the tables are getting really
larger... but at a first guess there shouldn't be any good reason to
see tables getting so large so fast... so I was wondering if
anything could contribute to make a backup much larger than it was
other than table containing more records?

The only thing that should have been really changed is the number of
concurrent connections during a backup.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] "PostgreSQL in the cloud" now up on media.postgresql.org

2009-04-13 Thread Ivan Sergio Borgonovo
On Mon, 13 Apr 2009 14:36:49 -0700
Steve Crawford  wrote:

> Scott Marlowe wrote:

> >>> http://media.postgresql.org/sfpug/sfpug-cloud-20090408.mov

> >>> Thanks to Josh Berkus for his organizational talents, and Dirk
> >>> Jagdmann for running the live streaming during the event.

> >> wget says it is a 1Gb file.
> >> Is there anything smaller?

> > If you'd like I can dl and rencode it quite a bit smaller I'm
> > sure. It'll take several hours to dl, process, then upload, but
> > that's mostly machine time, right? :)

I was just thinking about a public "official" place for everybody.
I'm going to sleep so I'm not going to miss the bandwidth required
for downloading the full thing ;) but it will still have to
find its way back to a public place.

> The people doing the actual work will correct me if I'm wrong, but
> I believe it will be up on vimeo.com fairly soon as last month's
> is.

Can vimeo movie be downloaded?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] "PostgreSQL in the cloud" now up on media.postgresql.org

2009-04-13 Thread Ivan Sergio Borgonovo
On Mon, 13 Apr 2009 13:11:23 -0700
Christophe  wrote:

> http://media.postgresql.org/sfpug/sfpug-cloud-20090408.mov
> 
> Thanks to Josh Berkus for his organizational talents, and Dirk  
> Jagdmann for running the live streaming during the event.

wget says it is a 1Gb file.
Is there anything smaller?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] database corruption

2009-04-09 Thread Ivan Sergio Borgonovo
On Wed, 8 Apr 2009 22:14:38 -0400
"Jeff Brenton"  wrote:

> 
> There are no filesystem level content size restrictions that I am
> aware of on this system.  The user pgsql should have full access
> to the filesystems indicated except for the root filesystem. 

Out of inodes?
A lot of small files (even empty ones) somewhere?
It happened to me when I was running a spider using curl and the
spiders were not exiting properly when another process was killing
them...


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] recovery after segmentation fault

2009-04-08 Thread Ivan Sergio Borgonovo
On Wed, 8 Apr 2009 23:59:43 +0200
Martijn van Oosterhout  wrote:

> What might be easier is turning on core dumps (ulimit -S -c
> unlimited) and then start postgres and see if it drops a core

thanks.

> > Is there a way to just kill recovery for one DB? Just don't
> > start it at all?
> 
> Unfortunatly, the XLOG is shared betweens all databases on one
> cluster.

bwaaa. That's a bit of a pain.

I'm trying to understand this a bit better...
I think nothing terrible really happened since:
a) the DB that has the higher write load was actually the one that
caused the problem and I restored from a backup.
b) the other DBs have some writes too... but the software using them
doesn't have any idea about transactions so it is built with atomic
statement in mind... No operation I can think of was writing in more
than one table and I'd think most (all?) the operations were atomic
at the statement level.

So if I lost some writes in logs for the other DBs... that shouldn't
be a problem, right? I just lost some data... not coherency? right?

> > This is the same DB having problem with recreation of gin index
> > BTW... and I've the feeling that the problem is related to that
> > index once more... I was vacuuming full, I aborted...

> > I think the DB is trying to recreate the index but due to some
> > problem (can I say bug or is it too early?) it segfaults.

> Interesting, hope you can get a good backtrace.

I backed up the whole data dir.
I'm currently transferring it to my dev box.
I already have the same DB... but it is on lenny.
And it never gave me a problem.
Versions are slightly different anyway:

Version: 8.3.6-1 (working)
Version: 8.3.4-1~bpo40+1 (sometimes problematic[1])

8.4 is at the door... and the only choices I have to fix the problem
on that box are:
- upgrade to lenny
- build postgresql from source, which is going to be a maintenance
  pain.

Could anything related to vacuum and/or the gin index have been fixed
between 8.3.4 and 8.3.6?

I think that if I stick to some rituals I can live with it.
Avoid vacuum full when there is load, and restart the server before
doing it.


[1] slow vacuum full and gin index update

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there performance advantages in storing bulky field in separate table?

2009-04-08 Thread Ivan Sergio Borgonovo
On Wed, 8 Apr 2009 17:39:02 +0100
Sam Mason  wrote:

> On Wed, Apr 08, 2009 at 04:56:35PM +0100, Ian Mayo wrote:
> > One more thing:  hey, did you hear?  I just got some advice from
> > Tom Lane!
> 
> Statistically speaking; he's the person most likely to answer you
> by quite a long way.  Out of the ~24k emails going back to Oct
> 2007 I've got from pgsql-general the most common people who wrote
> them are:

>   who                      num mails   of total
>   Tom Lane                     1,935       8.0%
>   Scott Marlowe                1,077       4.5%
>   Alvaro Herrera                 521       2.2%
>   Joshua Drake                   468       1.9%
>   Richard Huxton                 432       1.8%
>   Craig Ringer                   338       1.4%
>   Ivan Sergio Borgonovo          314       1.3%

I just wrote privately to Tom that I'm ashamed I ask so much and
answer so little.
But well, I'm an exception ;) I'm the top non-contributor.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] recovery after segmentation fault

2009-04-08 Thread Ivan Sergio Borgonovo
On Wed, 08 Apr 2009 10:59:54 -0400
Tom Lane  wrote:

> Ivan Sergio Borgonovo  writes:
> > 2009-04-08 16:36:53 CEST LOG:  startup process (PID 3176) was
> > terminated by signal 11: Segmentation fault 2009-04-08 16:36:53
> > CEST LOG:  aborting startup due to startup process failure
> 
> Hmm, what Postgres version is this?  Can you get a stack trace from
> the startup process crash?

How do I do that on Debian?
Debian does all its automagic stuff in init. I never learned how to
start pg manually.

> The only simple way out of this is to delete the presumably-corrupt
> WAL log by running pg_resetxlog.  That will destroy the evidence

I couldn't find it... mmm what a strange place for an executable:
/usr/lib/postgresql/8.3/bin/pg_resetxlog

> about what went wrong, though, so if you'd like to contribute to
> preventing such problems in future you need to save a copy of
> everything beforehand (eg, tar up all of $PGDATA).  Also you might
> have a corrupt database afterwards :-(

What if I just don't care about recovering *one* DB (which is
probably the culprit), get the server to restart, and then do a
restore from a VERY recent backup?

Is there a way to just kill recovery for one DB? Just don't start it
at all?

This is the same DB having problem with recreation of gin index
BTW... and I've the feeling that the problem is related to that
index once more... I was vacuuming full, I aborted...

I think the DB is trying to recreate the index but due to some
problem (can I say bug or is it too early?) it segfaults.

I think this could be of some help:

2009-04-08 16:47:13 CEST LOG:  database system was not properly shut
down; automatic recovery in progress
2009-04-08 16:47:13 CEST LOG: redo starts at 72/9200EBC8

BTW:
Linux amd64, debian stock kernel
Debian etch/backport: Version: 8.3.4-1~bpo40+1


Now let's learn how to use pg_resetxlog

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] recovery after segmentation fault

2009-04-08 Thread Ivan Sergio Borgonovo
postgresql suddenly died...

during recovery 

2009-04-08 16:35:34 CEST FATAL:  the database system is starting up
^^^ several
2009-04-08 16:35:34 CEST LOG:  incomplete startup packet
2009-04-08 16:36:53 CEST FATAL:  the database system is starting up
2009-04-08 16:36:53 CEST LOG:  startup process (PID 3176) was
terminated by signal 11: Segmentation fault 2009-04-08 16:36:53 CEST
LOG:  aborting startup due to startup process failure

It could be something wrong with the recovery process in an aborted
transaction that is causing the segfault...

How can I resurrect the server and load a backup?
It was serving more than one DB and I assume that only one is
causing problems. Can I skip just that one from recovery and start
from backup?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] media server (mp3 player) on postgresql

2009-04-06 Thread Ivan Sergio Borgonovo
This is a fancy request... but maybe among so many postgresql
aficionados there is someone that know if there is any "media
server/player" that support postgresql.

Once upon a time amarok did support postgres to store and search
metadata on songs. It did some pretty neat tricks with id3 data,
postgresql and the file system.

That was pretty convenient if you shared your media files across your
home and you also wanted to share "scores" among your family.

Now amarok dropped all DB support with the exception of mysql and
most of my lan services are based on postgresql.
KDE4.X.X just landed on sid, so the old amarok that supported
postgresql is going to be wiped by the new one...

I was wondering if there was something around that could be divided
in 3 functional units:
- a DB and file server used to search and "serve" media
- a client (even web) to search and select media and decide on which
  box to play it
- several "servers" that will actually play the media

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing IO and memory usage: sending the content of a table to multiple files

2009-04-03 Thread Ivan Sergio Borgonovo
On Fri, 3 Apr 2009 02:05:19 +0100
Sam Mason  wrote:

> On Thu, Apr 02, 2009 at 09:48:33PM +0200, Ivan Sergio Borgonovo
> wrote:
> > I didn't find any elegant example of cursor use in PHP... OK PHP
> > is not the most elegant language around... but still any good
> > example someone could point me at? 

> You can obviously increase the "FETCH" upwards and if you're
> feeling fancy you could even run the FETCH async from the code
> that processes the results.  Maybe something like:

What kind of advantage should I get from running asynchronously?
Oh, I didn't mean you were suggesting any advantage... just wondering.
It could be an option if, once everything is up, I want to keep the
resources consumed by this process under control (?).

> > So I think the largest cost of the operation will be IO.
> > \copy should be optimised for "raw" data output, but maybe all
> > its advantages get lost once I've to use pipes and adding
> > complexity to filtering.

> Streaming IO is pretty fast, I think you'll be hard pushed to keep
> up with it from PHP and you'll end up CPU bound in no time.  Be
> interesting to find out though.

Filtering is currently very simple... I'm building a very simple xml,
just queueing constant strings and what comes out of the DB.
But if I had to parse a CSV (split) or just assign names to columns
(and I expect this stuff is going to be adjusted frequently) or use
regexps... I was expecting to waste more human cycles or CPU cycles
than I would save by relying on the optimised IO of \copy (if any).
Most of the operations end up being:
$output = SOMEHEADER;
gzwrite($f1, $output);
gzwrite($f2, $output);
gzwrite($f3, $output);
gzwrite($f4, $output);
while (...) {
    // build the xml fragments (the actual tags are omitted here)
    $record1 = SOMECONST1 . $row['col4'] . $row['col3'];
    $record2 = SOMECONST2 . $row['col4'] . $row['col3'];
    gzwrite($f1, $record1);
    gzwrite($f2, $record1);
    gzwrite($f3, $record2);
    gzwrite($f4, $record2);
}
$output = SOMEFOOTER;
gzwrite($f1, $output);
gzwrite($f2, $output);
gzwrite($f3, $output);
gzwrite($f4, $output);

The largest table of my DB has to be sliced into multiple xml files
that have to be written to disk.
So it actually comes down to 1) reading the whole table and returning
30% of its fields and 2) writing all this data multiple times.
This is by far the largest write load the server is going to incur
in a day.
But then it may be the largest CPU load it is going to incur in a
day as well, considering I have to gzip all the files.
Still, I think I've read on this list that compression was going
to be more of a bottleneck than IO.
I just did a preliminary test and xml-ing and gzipping 80K records
out of 1M takes less than 2 sec.
So maybe I was over-concerned.

Anyway I'd like to understand a bit better how IO and memory
consumption is managed once you've cursor vs. plain select and
client drivers in the middle.

> > I was reading about all the php documents and trying to
> > understand how buffers and memory usage works, so I gave a look
> > to MySQL documents too...
> 
> Not sure about PG, but the C api pretty much always buffers
> everything in memory first.  There was mention of getting control
> of this, but I've got no idea where it got.

buffer *everything*?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] reducing IO and memory usage: sending the content of a table to multiple files

2009-04-02 Thread Ivan Sergio Borgonovo
On Thu, 2 Apr 2009 17:27:55 +0100
Sam Mason  wrote:

> On Thu, Apr 02, 2009 at 11:20:02AM +0200, Ivan Sergio Borgonovo
> wrote:
> > This is the work-flow I've in mind:
> >
> > 1a) take out *all* data from a table in chunks (M record for each
> > file, one big file?) (\copy??, from inside a scripting language?)
> 
> What about using cursors here?

The only way I've seen of using cursors with php involves defining a
function... It looks a bit messy for a 10-line script to have to
define a function just as a shell for a sql statement.
I'd even write it in python if the project didn't start to look like
a small Frankenstein... and sooner or later I bet I'll have to
include some php files to recycle some function.
I didn't find any elegant example of cursor use in PHP... OK PHP is
not the most elegant language around... but still, any good example
someone could point me at?

If you could point me to some clean way to use cursors in php I'd be
happy to learn.
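
To make it concrete, this is the kind of plain-SQL cursor use I have
in mind, issued as ordinary statements (e.g. via pg_query) rather
than from inside a plpgsql function; just a sketch:

begin;
declare big_cur no scroll cursor for
  select a, b, c from verylargetable;
-- repeat the FETCH until it returns no rows
fetch forward 1000 from big_cur;
close big_cur;
commit;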

I was thinking about using another embedded language that better
suits data processing (an unsafe version?) so I could directly
output to files from within a postgresql function...

> > 2a) process each file with awk to produce N files very similar
> > each other (substantially turn them into very simple xml)
> > 3a) gzip them

> GZIP uses significant CPU time; there are various lighter weight
> schemes available that may be better depending on where this data
> is going.

That's a requirement.

> > 2b) use any scripting language to process and gzip them avoiding
> > a bit of disk IO

> What disk IO are you trying to save and why?

Because this is going to be the largest write operation the whole
system will have to handle during the day.
I'm not interested in fast complicated queries, planning,
transactions, caching... I just need to get a whole table, pass it
through a filter and output several filtered "versions" of the same
table.
So I think the largest cost of the operation will be IO.
\copy should be optimised for "raw" data output, but maybe all its
advantages get lost once I have to use pipes and add complexity to
the filtering.

> > Does PostgreSQL offer me any contrib, module, technique... to
> > save some IO (and maybe disk space for temporary results?).
> > 
> > Are there any memory usage implication if I'm doing a:
> > pg_query("select a,b,c from verylargetable; --no where clause");
> > vs.
> > the \copy equivalent
> > any way to avoid them?
> 
> As far as I understand it will get all the data from the database
> into memory first and then your code gets a chance.  For large
> datasets this obviously doesn't work well.  CURSORs are you friend
> here.

I was reading about all the php documents and trying to understand
how buffers and memory usage works, so I gave a look to MySQL
documents too...
MySQL has mysql_unbuffered_query.
So I was wondering how memory is managed on the server and on
clients.

What's going to happen when I do a
$result=pg_query("select * from t1;");
while($row=pg_fetch_array($result)) {
}
vs.
using cursors...
vs.
asynchronous queries (they just look like non-blocking queries with
no relationship to memory usage)

Where are the buffers etc...

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] reducing IO and memory usage: sending the content of a table to multiple files

2009-04-02 Thread Ivan Sergio Borgonovo
This is the work-flow I've in mind:

1a) take out *all* data from a table in chunks (M records for each
file, one big file?) (\copy??, from inside a scripting language? see
the sketch below)

2a) process each file with awk to produce N files very similar each
other (substantially turn them into very simple xml)
3a) gzip them

2b) use any scripting language to process and gzip them avoiding a
bit of disk IO

Does PostgreSQL offer me any contrib, module, technique... to save
some IO (and maybe disk space for temporary results?).

Are there any memory usage implications if I'm doing a:
pg_query("select a,b,c from verylargetable; --no where clause");
vs.
the \copy equivalent?
Any way to avoid them?
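
For 1a, the kind of single statement I have in mind is something like
this (just a sketch; whether to keep one big file or split it is
exactly what I'm unsure about):

\copy (select a, b, c from verylargetable) to 'verylargetable.csv' with csv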

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question about hosting and server grade

2009-03-26 Thread Ivan Sergio Borgonovo
On Wed, 25 Mar 2009 13:19:12 -0600
Scott Marlowe  wrote:

> Spend your money on more RAM, (32G isn't much more than 16G and
> I've seen it make a world of difference on our servers).  Spend it
> on disks.  Number of disks is often more important than RPM etc.
> Spend it on fast RAID controllers with battery backed cache.
> Then, consider upgrading your CPUs.  We have 8 opteron cores in
> our servers, and 12 Disk RAID-10s under a very fast RAID
> controller, and we are still I/O not CPU bound.

[snip]

> But all of this depends on the type of workload your db has to
> do.  If you're running memory hungry select queries, focus on more
> memory.  If you're running lots and lots of little queries with a
> mix of update, insert, delete and select, focus on the drives /
> controller.  If you're running queries that require a lot of CPU,
> then focus more on that.

Could IO load show up as apparent CPU load?

I mean I've a pretty small DB. It should fit nearly all in RAM... or
at least... after 1 day of load I can see the box may use 50K of
swap.
Anyway when I update the main table (~1M rows and a gin index) I can
see the CPU reaching its limit. The most frequent updates involve
5K-20K changed records.

On normal workload the most intensive queries run in 200ms with few
exceptions and the BIG table is mostly in read access only.

It would be nice if the updates were a bit faster since I'm still
forced to do them during working hours... because people on the
other side are still convinced it is not worth cleaning rubbish at
the source, so sometimes updates fail because of inconsistent data.
Unfortunately... I can add ram and disks but all the CPU sockets
are used.

The box has 2 old Xeon HT at 3.2GHz. It's on RAID5 (not my choice)
on a decent controller and has 4Gb of RAM.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] text column constraint, newbie question

2009-03-23 Thread Ivan Sergio Borgonovo
On Mon, 23 Mar 2009 14:11:28 -0700 (PDT)
RebeccaJ  wrote:

> now. Before, I was planning to have CHECK constraints in all of my
> text or char fields, to keep out all semicolons, single quotes, and
> anything else that looked dangerous. Now I'm thinking that I'll be
> using htmlentities(), pg_escape_string() and pg_query_params() as

check, htmlentities, pg_escape_string and pg_query_params really
don't belong to the same family of "functions" and serve very
different purposes.

simplifying it very much:
- checks are used to control the quality of the data that gets stored
  in the db
- htmlentities is about formatting for web output
- pg_escape_string is to prepare input for sql and avoiding sql
  injection
- pg_query_params is a relative of pg_escape_string but somehow used
  differently

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] text column constraint, newbie question

2009-03-23 Thread Ivan Sergio Borgonovo
On Mon, 23 Mar 2009 03:30:09 -0600
Scott Marlowe  wrote:

> > I think pg_query_params should make a difference between floats
> > and integers and signal an error if you pass float where
> > integers are expected... but I'm not sure.
> > Not really a security concern, but an early warning for some
> > mistake.

> So, what are the performance implications?  Do both methods get
> planned / perform the same on the db side?

I don't think there is any appreciable advantage. Maybe all the
printf-style stuff performs better in C than building up a
string by concatenating and escaping in php.
Still, I wouldn't consider it a primary source of slowdown.

To make a difference in plan management you have to use another
family of functions: pg_prepare/pg_execute.

I'm not an expert, but caching plans is not always what you want
to do.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] text column constraint, newbie question

2009-03-23 Thread Ivan Sergio Borgonovo
On Mon, 23 Mar 2009 01:07:18 -0600
Scott Marlowe  wrote:

> On Mon, Mar 23, 2009 at 12:59 AM, Stephen Cook 
> wrote:
> > You should use pg_query_params() rather than build a SQL
> > statement in your code, to prevent SQL injection attacks. Also,
> > if you are going to read this data back out and show it on a web
> > page you probably should make sure there is no rogue HTML or
> > JavaScript or anything in there with htmlentities() or somesuch.
> 
> Are you saying pg_quer_params is MORE effective than
> pg_escape_string at deflecting SQL injection attacks?

I didn't follow the thread from the beginning but I'd say yes.
It should prevent queueing multiple statements and it is a more
"general" method that lets you pass parameters in one shot instead
of building the string a bit at a time for every parameter you
insert (string, float, integer...).

Of course if you correctly escape/cast/whatever everything injecting
2 statements shouldn't be possible... but if you don't you give more
freedom to the attacker.

$sql='select * from '.$table.' where a=$1 and b=$2'; //oops I made
a mistake interpolating $table.
$result=pg_query_params($sql,array(1,'bonzo'));

If $table is external input and an attacker passes
existingtable; delete from othertable; --

The attack may just result in a DOS if existingtable is there but
your othertable shouldn't be wiped.

untested... but I recall pg_query and pg_query_params use different C
calls, PQexec vs. PQexecParams, and for the latter "Unlike PQexec,
PQexecParams allows at most one SQL command in the given string."

http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html

I think pg_query_params should make a difference between floats and
integers and signal an error if you pass float where integers are
expected... but I'm not sure.
Not really a security concern, but an early warning for some mistake.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] bash & postgres

2009-03-23 Thread Ivan Sergio Borgonovo
On Mon, 23 Mar 2009 15:03:15 +1100
Greenhorn  wrote:

> Hi,
> 
> I'm trying to pass variables on a bash script embedded with psql
> commands.
> 
> cat header.txt
> 
> "to1","from1","subject1"
> "to2","from2","subject2"
> "to3","from3","subject3"
> "to4","from4","subject4"
> 
> cat b.sh
> 
> #!/bin/bash
> two="2"
> 
> psql -h localhost -U postgres -d mobile -c "create temp table
> header (

I enjoy another technique that's not exactly embedding but makes the
sql file easily executable from other shells too and easier to
maintain (eg. you don't lose syntax highlighting, you don't need to
make wide use of the x bit, you can concatenate files...).

echo "select :a;" | psql --set a=3 test
 ?column?
----------
        3
(1 row)


of course, instead of piping your sql, you could put it into a file.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is PGSQL enough safe for internet?

2009-03-18 Thread Ivan Sergio Borgonovo
On Wed, 18 Mar 2009 09:32:56 +0100
durumdara  wrote:

> Possible he can install an another pgsql service that can be
> opened to the net and that can usable for ONLY THIS PROJECT. But I
> don't know, that is possible or not; and how to upgrade later, if
> needed... :-(

If you can't use another DB... or you don't want to use another DB
(this depends on how much separation/security you really think is
needed...) you can configure postgresql to respond only to certain
IPs, over SSL, to allow access only to certain DBs, even on a
different port.

http://developer.postgresql.org/pgdocs/postgres/auth-pg-hba-conf.html
http://www.postgresql.org/docs/8.3/interactive/client-authentication.html
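
Something along these lines in pg_hba.conf is what I mean (just a
sketch; the DB name, role and address are made up):

# only this project's DB, for one role, from one address, SSL required
hostssl  projectdb  projectuser  203.0.113.10/32  md5
# optionally make the refusal of everything else explicit
host     all        all          0.0.0.0/0        reject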

> Can I use some functions in PGSQL to make some protecting rules?
> Like in Mail Servers: if the IP is same and pwd is wrong in X
> times, the IP blocked for N hours...

I wouldn't recommend this approach. Someone may just lock you out
of your own home.
Unless you're accessing the server from a static IP and you can use
white listing.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] schema, view and role

2009-03-18 Thread Ivan Sergio Borgonovo
I've this view:

create or replace view catalog_promosimple_v as
 select p.PromoSimpleID,
  p.IsPromo, p.Percent, p.OnListPrice, p.Vendible,
  p.OnStock, p.Dist, p.PromoStart, p.PromoEnd, pi.ItemID, pi.Discount
  from catalog_promosimple p
   join catalog_promosimpleitem pi on
pi.PromoSimpleID=p.PromoSimpleID
 where p.PromoStart<=now() and p.PromoEnd>=now() and p.IsPromo=true;

and I have 3 roles: one is just a group and the others are actual
users with their corresponding schemas.

The tables mentioned in the view actually belong to the users'
schemas. They don't exist in public.
The owner of everything is the group role.

group_role no schema
user1_role (member of group_role) -> user1_role schema
user2_role (member of group_role) -> user2_role schema

Tables in the user1_role schema contain some records.
Tables in the user2_role schema contain no records.

I create the above view connected as the group_role.
I get no error message. (?)
If I select * from catalog_promosimple_v I get no error and no
record, no matter with which user I'm logged in.

If I create the view with one of the userN_role I get the expected
result.

Why doesn't creating the view as group_role report any error?
After all, group_role shouldn't have the userN_role schemas in its
search path... and in fact it just has:

show search_path ;
  search_path

 "$user",public

If the view is just an alias for the SQL as "text" and it is not
interpreted at creation time... then once I log in as one of the
userN_role I should see the correct result. If it gets interpreted
when I create the view, it should report an error since those tables
don't exist in the public or group_role schema.
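
For what it's worth, the only workaround I can see is to make the
resolution explicit by schema-qualifying everything when the view is
created (a sketch with a reduced column list; it doesn't explain the
behaviour above, it just sidesteps it):

create or replace view user1_role.catalog_promosimple_v as
 select p.PromoSimpleID, pi.ItemID, pi.Discount
  from user1_role.catalog_promosimple p
   join user1_role.catalog_promosimpleitem pi
     on pi.PromoSimpleID = p.PromoSimpleID;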


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

