Re: [GENERAL] multicolumn index join

2008-05-15 Thread Alban Hertroys

On May 13, 2008, at 5:27 PM, Stephen Ince wrote:

I have to do a multicolumn self-join to find the median. I am using  
postgres 8.2.  How do I force postgres to use an index scan?  I  
have a multicolumn index but postgres is not using it.


Query

explain select e.time
from page_view e, page_view d
where e.test_run_id = d.test_run_id
  and e.web_resource_id = d.web_resource_id
  and e.web_resource_id = 3961
  and e.test_run_id = 2;


How does that get you the median? It just gives you all the records  
in page_view e with the same (test_run_id, web_resource_id) tuple.  
AFAICS you could leave out the entire self-join and get the same  
results. No wonder it uses a seqscan...


Personally, to find the median I'd use a scrolling cursor. Execute your query ordered by time using a scrolling cursor, scroll to the last row, determine the row count from that, then scroll back to halfway through your result set now that you know its size.
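In SQL that could look roughly like this (a minimal sketch using the table and filter from the query above; the row numbers in the comments are only illustrative):

BEGIN;
DECLARE med SCROLL CURSOR FOR
    SELECT e.time
    FROM page_view e
    WHERE e.test_run_id = 2 AND e.web_resource_id = 3961
    ORDER BY e.time;
MOVE FORWARD ALL IN med;       -- the command tag reports the row count, e.g. "MOVE 1001"
FETCH ABSOLUTE 501 FROM med;   -- with 1001 rows, row 501 is the median
CLOSE med;
COMMIT;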


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] PHP Warning: pg_connect(): Unable to connect to PostgreSQL server: could not connect to server:

2008-05-13 Thread Alban Hertroys

On May 12, 2008, at 11:58 AM, Abdus Samad Ansari wrote:


PHP Warning:  pg_connect(): Unable to connect to PostgreSQL server:
could not connect to server:
I have set up PHP/Postgres and it is running fine up to the document root, i.e. /var/www/html, but when I call it through a cgi-bin PHP file it gives a log error:
[error] [client 127.0.0.1] PHP Warning:  pg_connect(): Unable to  
connect

to PostgreSQL server: could not connect to server: \x04\xe6\xe3\xbfPF;


Isn't that the server name it's printing after the message? Seems an  
odd name for a server...
If I enter that in Vim I get ^Dæã¿PF, which doesn't even seem to be a valid UTF-8 sequence according to my quick checks.



what may be the solution.


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] operator is not unique: integer || integer

2008-05-06 Thread Alban Hertroys

another example?:

RAISE NOTICE 'error during validation % :', 'ks:'||ks||'@'||loopdate||'';  (here LoopDate is a DateTime)


Ehm... What's wrong with RAISE NOTICE 'error during validation ks:%@% :', ks, loopdate; ? (I don't quite understand the purpose of that colon at the end, btw.)
That also allows you to format the date to your liking; just add a to_char(loopdate, format string).
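As a hedged illustration (ks and loopdate are the names from the example above; the format string is made up):

RAISE NOTICE 'error during validation ks:%@%', ks, to_char(loopdate, 'YYYY-MM-DD HH24:MI:SS');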


I know these were just a few examples of your troubles, but so far it appears getting rid of those concatenations would make for better code.


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] complex query using postgresql

2008-04-30 Thread Alban Hertroys
))
hladrb11102 btree (idr(p1, p4, p6, p7, p9, 'HLA-DRB1*1102'::character varying))
hladrb11103 btree (idr(p1, p4, p6, p7, p9, 'HLA-DRB1*1103'::character varying))
hladrb11104 btree (idr(p1, p4, p6, p7, p9, 'HLA-DRB1*1104'::character varying))
hladrb11301 btree (idr(p1, p4, p6, p7, p9, 'HLA-DRB1*1301'::character varying))
hladrb11302 btree (idr(p1, p4, p6, p7, p9, 'HLA-DRB1*1302'::character varying))
hladrb11501 btree (idr(p1, p4, p6, p7, p9, 'HLA-DRB1*1501'::character varying))
hladrb40101 btree (idr(p1, p4, p6, p7, p9, 'HLA-DRB4*0101'::character varying))
hladrb50101 btree (idr(p1, p4, p6, p7, p9, 'HLA-DRB5*0101'::character varying))

iid btree (id)
ip1 btree (p1)
ip4 btree (p4)
ip6 btree (p6)
ip7 btree (p7)
ip9 btree (p9)


--
Pau Marc Muñoz Torres

Laboratori de Biologia Computacional
Institut de Biotecnologia i Biomedicina Vicent Villar
Universitat Autonoma de Barcelona
E-08193 Bellaterra (Barcelona)

telèfon: 93 5812807
Email : [EMAIL PROTECTED] 


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] I think this is a BUG?

2008-04-24 Thread Alban Hertroys

On Apr 24, 2008, at 10:11 AM, Kaloyan Iliev wrote:

regbgrgr=# ALTER TABLE test ADD COLUMN not_null INT NOT NULL ;
ERROR:  column "id" contains null values

== EXAMPLE 2 ==

Example 2:
In this case Postgres fills the NOT NULL column ID from the sequence.


What sequence? You never told it you wanted one. A PRIMARY KEY doesn't automatically add a sequence, nor does a NOT NULL constraint; the serial type does that, but you defined the column as type int, not as type serial.


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Need to update all my 60 million rows at once without transactional integrity

2008-04-23 Thread Alban Hertroys

On Apr 21, 2008, at 12:19 AM, [EMAIL PROTECTED] wrote:


Hi!

How can I do an UPDATE of a column in a very large table for all rows without using double the amount of disk space and without any need for an atomic operation?


I have a very large table with about 60 million rows. I sometimes  
need to do a simple update to ALL rows that resets a status-flag to  
zero.


I'll assume they're wide rows then.

I don't need transactional integrity for this (but of course if the system crashes, there should be no data corruption; a separate flag in the file system can record the fact that the bulk update was in progress). I don't care about or want an abort or the all-or-nothing behaviour SQL mandates. The idea is basically that either this update succeeds or it succeeds - there is no 'not'. It must succeed. It must be retried until it works. If the update is halfway finished, that's okay.


If I just do an
UPDATE table SET flag=0;
then Pg will make a copy of every row, which must be cleaned up by vacuum. I understand - and don't like for this specific problem - that PG is an MVCC database and will keep the old versions of the rows from before the update alive. This is very bad.


I'm not sure what you're trying to do exactly, but updating a flag on  
every row you changed will double your data size, as update means an  
insert/delete combination in MVCC. If there was no reason to update  
the row except for changing the flag, that's probably not what you want.


You could instead use referential integrity to do this job for you. Create a (temporary) table that has a foreign key to your records (define it as both PRIMARY KEY and FOREIGN KEY to keep a 1-to-1 relation). Setting the flag is done by inserting a record into the temp table referencing the record you updated. Existence of the record in the temp table signifies that the row was changed.
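A rough sketch of that idea, with made-up table and column names:

CREATE TABLE pending_flags (
    row_id  bigint  PRIMARY KEY REFERENCES big_table (id) ON DELETE CASCADE
);

-- "setting the flag" for a row becomes a small insert:
INSERT INTO pending_flags (row_id) VALUES (12345);

-- "is the flag set?" becomes an existence check:
SELECT EXISTS (SELECT 1 FROM pending_flags WHERE row_id = 12345);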


This way you'll only have inserts of small rows (into the temp table), and fewer than 60 million of them if the update didn't complete.


That still leaves the problem of disabling atomicity of that update  
of course. I think you actually mean to also update data in the rows  
you want to set those flags for, in which case my suggestion is of  
little help...




If I do a batched loop like this:
UPDATE table SET flag=0 where id>=0 and id<200;
UPDATE table SET flag=0 where id>=200 and id<400;
UPDATE table SET flag=0 where id>=400 and id<600;
...

then PG will seek all over my harddrive I think.

It would be much better if it could just start at the beginning of the table and work its way towards the end. But which sort criteria would satisfy this? I don't think that there is any SQL that does something like that.


If that table is clustered over an index, that would be a good  
candidate as clustering means the data is mostly ordered on disk  
according to that index.



Another idea (and I think it's quite a good one) would be to drop the column and recreate it with a new default value.

But the problem is that this is not actually MY database, but an appliance (whose hard drive does not have double the amount of space available, btw) and it has to work reliably whenever something special happens.


And I don't think I should create new columns all the time (the old ones would be hidden and their internal column IDs lost, I think); that might hit a limit.


Can I do it maybe every day??


Is there any other way to go?

I would really like to kind of skip transactions. Of course I want basic data integrity in the on-disk structures, but not atomicity for this bulk update!


I read that PG has an internal command language that is used to build up a database when the system tables (e.g. for table names) are not present yet.


Could I use that to hack my way around transactions?

Basically I can do everything to this PG installation; this is an extremely controlled, locked-down environment. I don't need to care for anyone else, it's always the same schema, the same version, same OS, etc., and I could happily apply any hack that solves the problem.


Many thanks for any pointers or ideas how to solve this problem!
 Christian






Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Problem. createdb: could not connect to database postgres: could not connect to server: No such file or directory

2008-04-12 Thread Alban Hertroys

On Apr 12, 2008, at 7:11 AM, Jaisen N.D. wrote:

localhost:/home/user# su - postgres
[EMAIL PROTECTED]:~$ /usr/lib/postgresql/8.1/bin/initdb -D /var/ 
lib/postgresql/data
The files belonging to this database system will be owned by user  
postgres.

This user must also own the server process.

The database cluster will be initialized with locale en_IN.
The default database encoding has accordingly been set to UTF8.

initdb: directory /var/lib/postgresql/data exists but is not empty
If you want to create a new database system, either remove or empty
the directory /var/lib/postgresql/data or run initdb
with an argument other than /var/lib/postgresql/data.


It says right here. You forgot to move your old data directory out of  
the way. As it's from a Postgres 8.3 installation, Postgres 8.1 has  
no way of knowing how to handle what's in there.


[EMAIL PROTECTED]:~$ /usr/lib/postgresql/8.1/bin/pg_ctl -D /var/ 
lib/postgresql/data -l logfile start

postmaster starting
[EMAIL PROTECTED]:~$ /usr/lib/postgresql/8.1/bin/createdb test
createdb: could not connect to database postgres: could not connect  
to server: No such file or directory

Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

[EMAIL PROTECTED]:~$
-
Why can't it connect to the server? How can I resolve it?



Because initialisation of the database (initdb) failed for the  
aforementioned reason. There are probably some messages in your logs  
about postgres failing to start or to operate on its data directory.  
You can't just run a different major version of postgres on an  
existing data directory.


You didn't tell what your goal is. Are you trying to downgrade an  
existing database? If so, why would you want to do that?


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Quoting table/column names vs performance

2008-04-09 Thread Alban Hertroys

On Apr 9, 2008, at 5:27 PM, Jozef Ševčík wrote:

In MSSQL I had something like:
SELECT Column1,Column2 from MyTable

In PgSQL I write:
SELECT “Column1”, “Column2” from “MyTable”

Which is fine and working, I have no doubt about it. I'm just wondering whether this affects performance in any way.


What are you trying to fix that you don't just write SELECT  
Column1,Column2 from MyTable ?
Postgres understands that fine, why do you want to quote those  
identifiers? Maybe your application code is case-sensitive with  
regards to column (and maybe table) names?


Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Alban Hertroys

On Apr 7, 2008, at 1:32 AM, David Wilson wrote:

I have a reasonably large table (~75m rows,~18gb) called vals. It
includes an integer datestamp column with approximately 4000 unique
entries across the rows; there is a normal btree index on the
datestamp column. When I attempt something like select distinct
datestamp from vals, however, explain tells me it's doing a
sequential scan:

explain select distinct datestamp from vals;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Unique  (cost=15003047.47..15380004.83 rows=4263 width=4)
   ->  Sort  (cost=15003047.47..15191526.15 rows=75391472 width=4)
         Sort Key: datestamp
         ->  Seq Scan on vals v  (cost=0.00..1531261.72 rows=75391472 width=4)


The database's estimates seem consistent with yours, so why is it doing this? Could you provide an EXPLAIN ANALYSE? It shows the actual numbers next to the estimates, although I figure that query might take a while...


Pg estimates the costs quite high too. It's almost as if there isn't an index on that column and it has no other way than doing a sequential scan... Could you show us the table definition and its indexes? What version of Pg is this?


It may be that your index on vals.datestamp doesn't fit into memory;  
what are the relevant configuration parameters for your database?


Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] select distinct and index usage

2008-04-07 Thread Alban Hertroys

On Apr 7, 2008, at 9:47 AM, David Wilson wrote:

On Mon, Apr 7, 2008 at 2:05 AM, Alban Hertroys
[EMAIL PROTECTED] wrote:
The databases estimates seem consistent with yours, so why is it  
doing
this? Could you provide an EXPLAIN ANALYSE? It shows the actual  
numbers next

to the estimates, although I figure that query might take a while...


explain analyze select distinct datestamp from vals;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=15003047.47..15380004.83 rows=4263 width=4) (actual time=649599.159..721671.595 rows=4252 loops=1)
   ->  Sort  (cost=15003047.47..15191526.15 rows=75391472 width=4) (actual time=649599.157..694392.602 rows=75391476 loops=1)
         Sort Key: datestamp
         Sort Method:  external merge  Disk: 1178592kB
         ->  Seq Scan on vals  (cost=0.00..1531261.72 rows=75391472 width=4) (actual time=9.104..93130.468 rows=75391476 loops=1)
 Total runtime: 722379.434 ms


Wow, great estimates! The planner obviously knows how your data is  
structured. So much for the bad planner estimation scenario...


I haven't seen this 'external merge Disk' sort method before, maybe it's new in 8.3, but it doesn't look promising for query performance. Considering it's using 1.1GB, it seems the planner may have chosen the least memory-exhaustive method; I have to admit I don't know the planner in that much detail. Take this with a grain of salt, but my guess is that as the index is even bigger, the planner figures this approach involves the least disk I/O and will therefore be faster.


Have you tried this query with enable_seqscan=off? If my guess is right (and the planner's, in that case) it'd be even slower.
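For example, just for testing in a single session:

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT DISTINCT datestamp FROM vals;
RESET enable_seqscan;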


Something that might help you, but I'm not sure whether it might hurt the performance of other queries, is to cluster that table on val_datestamp_idx. That way the records are already (mostly) sorted on disk in the order of the datestamps, which seems to be the brunt of the above query plan.
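Something like this would do it (note that CLUSTER takes an exclusive lock and rewrites the whole ~18GB table, so plan a maintenance window):

CLUSTER vals USING val_datestamp_idx;
ANALYZE vals;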


Pg estimates the costs quite high too. It's almost as if there isn't an index on that column and it has no other way than doing a sequential scan... Could you show us the table definition and its indexes? What version of Pg is this?


Pg is 8.3.1

Table definition:
CREATE TABLE vals (
sid integer NOT NULL,
eid integer NOT NULL,
datestamp integer NOT NULL,
val_dur integer NOT NULL,
acc real NOT NULL,
yld real NOT NULL,
rt real NOT NULL,
ydev real NOT NULL,
vydev real NOT NULL,
adev real NOT NULL,
achange real NOT NULL,
ychange real NOT NULL,
arsi real NOT NULL,
yrsi real NOT NULL,
UNIQUE (sid,eid,val_dur,datestamp),
FOREIGN KEY (sid,eid,datestamp) REFERENCES preds
(sid,eid,datestamp) ON DELETE CASCADE
);
create index val_datestamp_idx on vals(datestamp);
create index val_evaluator_idx on vals(eid);
create index val_search_key on vals(val_dur,eid,datestamp);
create index val_vd_idx on vals(val_dur,datestamp);


There seems to be quite a bit of overlap in your index definitions.  
From my experience this can confuse the planner.


I suggest you combine them, but not knowing your data... Maybe  
rewriting your UNIQUE constraint to (val_dur, datestamp, eid, sid)  
would be enough to replace all those other indexes.
If not, it's probably better to have one index per column, so that  
the planner is free to combine them as it sees fit. That'd result in  
a bitmap index scan, btw.
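A sketch of that consolidation (the existing constraint name is a guess, check \d vals first; rebuilding a unique index over 75M rows takes a while):

ALTER TABLE vals DROP CONSTRAINT vals_sid_eid_val_dur_datestamp_key;
ALTER TABLE vals ADD CONSTRAINT vals_val_dur_datestamp_eid_sid_key
    UNIQUE (val_dur, datestamp, eid, sid);
-- val_vd_idx (val_dur, datestamp) is now a leading prefix of the unique index
-- and can probably be dropped:
DROP INDEX val_vd_idx;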


(The various indices are for a variety of common queries into the  
table)


 It may be that your index on vals.datestamp doesn't fit into  
memory; what

are the relevant configuration parameters for your database?


That's a very good question. I recently had to rebuild this particular
database and haven't played with the configuration parameters as much
as I'd like- what parameters would be most relevant here? I hadn't
realized that an index needed to fit into memory.


Well, it doesn't _need_ to fit into memory, but if the database needs to fetch different parts of it from disk or swap, the costs of using the index will soar. Especially random access would be bad.


Anything that fits entirely into memory will be faster than having to  
fetch it from disk, as long as it doesn't mean other things will have  
to come from disk instead.


I'm not a postgresql tuning expert (I know my way around though); other people can explain it way better than I can. Bruce Momjian for example: http://www.linuxjournal.com/article/4791



pg_total_relation_size('vals') - pg_relation_size('vals') gives 11gb.
All indexed columns are integers. My guess is that this means that
it's likely the index doesn't fit into memory.


That calculation doesn't look familiar to me, I'm more used to:
 select pg_size_pretty(pg_relation_size('...'));

You can put

[GENERAL] Using tsearch2 in a Bayesian filter

2008-04-06 Thread Alban Hertroys

Hi all,

In my spare time I've started on a general purpose Bayesian filter based on the now built-in tsearch2 functionality. The ability to stem words from a message into lexemes, remove stop words and use GiST indexes looks promising enough to attempt this. However, my experience with tsearch is somewhat limited, so I have a few questions...


The messages entering the filter will be in different languages and encodings. For example, I get a lot of Cyrillic spam these days, while I get a lot of English messages and a few in Dutch. Especially the spam is likely to lie about its encoding. Some messages will be plain text, but many will be HTML.

- Is it possible to stem words from that wide a variety of content?
- If so, what approach would be best?
- Do I need to strip out the HTML tags or can they serve as lexemes  
themselves?


Next, to determine the probability of a lexeme being of a certain classification (for example spam or not spam), I need to be able to count the number of occurrences of that lexeme in a text. I can't store a probability, as the numbers aren't fixed[*] (I was hoping to abuse score() here, but that's probably a no-op). I haven't found any tsearch functions to determine the number of occurrences of each lexeme in a text. Ideally I'd have a resultset with (lexeme, number of occurrences) tuples, so that I can use that directly in a query.

- How do I determine the number of occurrences of each lexeme in a text?

Thanks for your time.


[*] As more messages enter the system, there will be more occurrences  
of lexemes in messages and in classifications. If I start out with  
one lexeme occurring once in a single message, the chance that lexeme  
is in a message is 1. As soon as another message arrives not  
containing that lexeme, the chance is 0.5. The number of messages,  
occurrence of lexemes in messages and classifications is a  
continuously moving number, so I will need the numbers the  
probability was based on (might still decide to add a column with the  
probability calculated from those numbers for speed, of course).


Regards,

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Problem with planner choosing nested loop

2008-04-03 Thread Alban Hertroys


On Apr 2, 2008, at 9:02 PM, Alex Solovey wrote:
The reduced database example has the same problem in EXPLAIN  
ANALYZE as production one, here:


Seq Scan on bar  (cost=0.00..393.07 rows=1 width=4) (actual  
time=0.098..3.561 rows=24 loops=1)


Hang on... You prefer sequential scans because indexes make your  
database too slow, but you don't want a sequential scan now? What  
kind of solution do you expect then? An oracle maybe?


You will need an index if this query is too slow for you, or you will  
have to live with the slowness of this query. Pick one ;)


Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Client-requested cast mode to emulate Pg8.2 on v8.3

2008-03-21 Thread Alban Hertroys

On Mar 21, 2008, at 5:58 PM, Anton Melser wrote:


Tom the Champion strikes again!
Cheers
Anton


I have the suspicion that his mother is named Lois, his father is  
unknown and he has a sensitivity to Kryptonite. But that's just  
speculation of course...


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Problem with async notifications of table updates

2008-03-18 Thread Alban Hertroys

On Mar 18, 2008, at 3:58 AM, Tyler, Mark wrote:


I suggest rethinking your dislike of NOTIFY.


I have thought very hard about using NOTIFY for this but it has two
large problems (from my point of view). The first is that it forces me
to put far more smarts and state into the subscriber applications.  
This
is because I cannot pass any information with the NOTIFY apart from  
the

fact that something happened. Due to this restriction my subscriber
apps would have to go and look up some secondary table to get  
sufficient

information to construct the real query. That is just plain ugly in my
view.


You will have the same problem if you want to send a message about a  
record change in combination with transactions. You can either send a  
message about an /uncommitted/ transaction and include what record  
changed, /or/ you send a message about a /committed/ transaction  
which possibly changed multiple of those records - in which case  
there's no possibility to send a single id along with your message.
You could try sending a set after commit, equivalent to how INSERT  
RETURNING works, but you'll have to marshall those id's into your  
message yourself. And that's pretty similar to putting those id's in  
a table and fetch them from your application - it's just moving the  
work around.


Regards,

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Conditional JOINs ?

2008-03-18 Thread Alban Hertroys

On Mar 18, 2008, at 8:06 PM, Leon Mergen wrote:


Hello,

Now, based on a previous post on the PostgreSQL mailing list
[http://archives.postgresql.org/pgsql-general/2007-11/msg00723.php] I
came up with this solution:

###
-- table where all common request data is stored
CREATE TABLE log.requests (
 id BIGSERIAL,
 type SMALLINT NOT NULL,
 ref1 BIGINT CHECK ((type = 1) = (ref1 IS NOT NULL)),
 ref2 BIGINT CHECK ((type = 2) = (ref2 IS NOT NULL)),
 timestamp INTEGER NOT NULL,
 -- all kinds of common data
)

-- only rows where log.requests.type = 1
CREATE TABLE log.requests1 (
 request_id BIGINT NOT NULL
)

-- only rows where log.requests.type = 2
CREATE TABLE log.requests2 (
 request_id BIGINT NOT NULL
)

etc (about 10 different types)

the current VIEW:
CREATE VIEW requests_view AS
 SELECT * FROM log.requests LEFT JOIN log.requests1 ON
(log.requests.ref1 = log.requests1.request_id) LEFT JOIN log.requests2
ON (log.requests.ref2 = log.requests2.request_id) -- etc
###

Now, in my theory, you would say that if postgresql encounters ref1 =
NULL, it will not attempt to JOIN the log.requests1 table. However,
I've been told that because the PostgreSQL planner doesn't know that
ref1 (or any other refX for that matter) is NULL, it will attempt to
JOIN all tables for all rows.

Is this true, and if so.. is there a workaround for this (perhaps that
my database design is flawed) ?


This looks almost like table partitioning. If you inherit your  
requestxxx tables from a common  requests table and add a check  
constraint to each inheriting table (a partition), the planner is  
smart enough to figure out that no rows in that partition can  
possibly match (constraint exclusion) and skips it.


Instead of joining, it uses something equivalent to a UNION ALL btw,  
which I think is what you're looking for anyway.


There have been plenty of posts about how to set up table  
partitioning, there's even an article about it in the standard  
Postgres documentation:


http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html
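A minimal sketch of that approach for the log schema (the column list is abbreviated compared to the original definition):

CREATE TABLE log.requests (
    id         BIGSERIAL,
    type       SMALLINT NOT NULL,
    timestamp  INTEGER NOT NULL
    -- all kinds of common data
);

CREATE TABLE log.requests1 (
    CHECK (type = 1)
    -- columns specific to type 1 go here
) INHERITS (log.requests);

CREATE TABLE log.requests2 (
    CHECK (type = 2)
) INHERITS (log.requests);

-- With constraint exclusion enabled the planner skips partitions whose
-- CHECK constraint rules them out:
SET constraint_exclusion = on;
SELECT * FROM log.requests WHERE type = 2;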

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Problem with async notifications of table updates

2008-03-18 Thread Alban Hertroys

On Mar 19, 2008, at 2:35 AM, Tyler, Mark wrote:


Alban Hertroys wrote:

On Mar 18, 2008, at 3:58 AM, Tyler, Mark wrote:

I suggest rethinking your dislike of NOTIFY.

I have thought very hard about using NOTIFY for this but it has two
large problems (from my point of view). The first is that it forces

me

to put far more smarts and state into the subscriber applications.
This
is because I cannot pass any information with the NOTIFY apart from
the fact that something happened. Due to this restriction my
subscriber apps would have to go and look up some secondary table to
get sufficient information to construct the real query. That is just
plain ugly in my view.


You will have the same problem if you want to send a message about a
record change in combination with transactions. You can either send
a message about an /uncommitted/ transaction and include what record
changed, /or/ you send a message about a /committed/ transaction  
which



possibly changed multiple of those records - in which case there's no
possibility to send a single id along with your message.


I think you are suggesting that instead of having the trigger function FOR EACH ROW it should be FOR EACH STATEMENT. In the case of an AFTER trigger using a FOR EACH STATEMENT clause, does it get fired /after/ the commit? (The documentation for CREATE TRIGGER does not really specify.)


I think FOR EACH ROW or FOR EACH STATEMENT will make little or no  
difference here, unless you can guarantee that you'll never change  
more than one row in a transaction. Rows changed in a transaction are  
usually in different tables, that's what transactions are for after  
all - Transaction is a good name for what it does.


As an example in an employee database, if you want to send a NOTIFY  
when an employee's salary gets INSERTed, the transaction will usually  
also contain the INSERT of the employee record. You want those two  
statements together in a transaction, or you risk creating an  
employee without a salary (because the application crashed, the  
connection was lost, the office got flooded, etc) - and be sure that  
employee will be grumpy about that!



For the difference between FOR EACH ROW and FOR EACH STATEMENT in triggers, that's best illustrated by queries such as INSERT INTO table1 (column1, column2, fkey1) SELECT 'value1', 'value2', id FROM table2 WHERE column1 = 'value3';
This single statement could result in multiple rows being inserted, and so you can either choose to fire a trigger for each inserted row or for the entire statement at once.


That said, if you need to batch your statements for a client to be notified at the end of the transaction, sending a notify from a FOR EACH STATEMENT trigger will yield NOTIFYs at the same moment in time (at COMMIT).



That may work for me if I can guarantee that the publisher only ever
changes single rows for notifiable tables in a transaction.


That depends on the importance of your data's integrity in relation  
to transactions, as per the employee example above. I can't say  
anything about that.



You could try sending a set after commit, equivalent to how INSERT

RETURNING

works, but you'll have to marshall those id's into your message

yourself. And

that's pretty similar to putting those id's in a table and fetch them

from your

application - it's just moving the work around.


I prefer to put as much of the knowledge about the technicalities  
of the
process into the publishing side of the system keeping the  
interface the
subscriber apps have to deal with relatively simple. Anything I can  
do,

within reason, to help this is good.




Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




[GENERAL] Permission template

2008-03-15 Thread Alban Hertroys

Hi all,

Reading up on the recent (and recurring) discussion about granting  
privileges on multiple tables I suddenly wondered whether the CREATE  
TABLE LIKE approach could be used to copy privileges to a to-be- 
created table. From the documentation it doesn't look like it, but  
wouldn't that be neat? It's pretty much like using a database as a  
template, but for tables instead of databases.


Here's an example of what I mean:
CREATE ROLE normal_user;

-- Create our template table
CREATE TABLE user_template ();
REVOKE ALL ON user_template FROM normal_user;
GRANT SELECT ON user_template TO normal_user;

-- Use our template table to create other tables with the same  
privileges

CREATE TABLE accounts (
account_id  serial  PRIMARY KEY,
nametextNOT NULL,
LIKE user_template INCLUDING PRIVILEGES
);

etc.

After which accounts would have only SELECT privileges for normal_users.

Of course with this approach you'd still have to alter privileges for  
each table if you change your mind on who can access what, but it  
should make setting up privileges in a consistent way easier. Some  
way to copy privileges from another object might be a nice addition  
to this... Comments?


Regards,

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Trigger to run @ connection time?

2008-03-13 Thread Alban Hertroys

On Mar 13, 2008, at 2:00 AM, Berend Tober wrote:


Alban Hertroys wrote:

On Mar 11, 2008, at 10:28 PM, Tom Lane wrote:
An ON CONNECT trigger enforced by the database seems a bit scary  
to me.

If it's broken, how you gonna get into the DB to fix it?

regards, tom lane
If creating the trigger wouldn't be possible from within the  
database that it's defined for (which would be strange anyway as  
far as I'm concerned, since you are already connected at that  
point and thus missed an opportunity to fire that trigger) this  
shouldn't be a problem.

To put that into an SQL statement, something like:
#template1 $ CREATE TRIGGER my_database_connect_trg BEFORE CONNECT  
ON my_database EXECUTE PROCEDURE my_database_setup()
Although of course that begs the question where that procedure  
would be stored; Rather not in template1, I suppose! This points  
to another problem with ON CONNECT triggers, you'll likely need to  
be connected to reach the stored procedure that the trigger calls!  
A nice chicken and egg problem, with some scope issues on the  
horizon...


I envision this not so much as a BEFORE connect trigger, but rather  
as an event that happens after the point of the user being  
successfully authenticated, but before executing any user  
application commands -- in fact before even starting to listen for  
any incoming application commands.


It doesn't matter whether you're talking about a BEFORE or an AFTER  
trigger, that was just an example to illustrate the problem - which  
is the same for both cases.


What I was saying is that if the ON CONNECT trigger is defined in the  
database you're connecting to and it contains an error, there's  
nothing you can do to fix the error (starting with logging into that  
database). Which was what Tom pointed out already.


My idea around this was to define the ON CONNECT trigger outside the  
database you're connecting to, so that you'll at least be able to fix  
or disable the problematic code. It's a bit like the situation with  
PAM authentication to your (UNIX-based) OS - you can define your own  
authorisation methods, but if they are broken your only option is to  
boot into single-user mode and disable that part of authorisation. In  
Postgres template1 is a bit like single-user mode in UNIX.


Another option would be to not fire the trigger if a super user  
connects to the database, but that pretty much depends on what the  
trigger does, which is unknown by definition.


Thinking this over a bit more, it seems you're not so much looking  
for an ON CONNECT trigger, but for an ON AUTHORISE trigger. The  
problems remain pretty much the same though.


Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Trigger to run @ connection time?

2008-03-12 Thread Alban Hertroys

On Mar 11, 2008, at 10:28 PM, Tom Lane wrote:

An ON CONNECT trigger enforced by the database seems a bit scary to  
me.

If it's broken, how you gonna get into the DB to fix it?

regards, tom lane


If creating the trigger wouldn't be possible from within the database  
that it's defined for (which would be strange anyway as far as I'm  
concerned, since you are already connected at that point and thus  
missed an opportunity to fire that trigger) this shouldn't be a problem.


To put that into an SQL statement, something like:
#template1 $ CREATE TRIGGER my_database_connect_trg BEFORE CONNECT ON  
my_database EXECUTE PROCEDURE my_database_setup()


Although of course that begs the question where that procedure would  
be stored; Rather not in template1, I suppose! This points to another  
problem with ON CONNECT triggers, you'll likely need to be connected  
to reach the stored procedure that the trigger calls! A nice chicken  
and egg problem, with some scope issues on the horizon...


Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] ISO something like #if 0 ... #endif for SQL code

2008-03-10 Thread Alban Hertroys

On Mar 10, 2008, at 4:50 PM, Kynn Jones wrote:

Hi!  When it comes to programming SQL, my newbie approach is to  
write my code in a file test.sql, which I test from within psql by  
using


  my_db= \i /some/path/test.sql

...and (once I'm satisfied with the code) copy and paste it to a  
different file that has the SQL I've written so far for the project.


It's a rather cumbersome system, all the more so because I don't  
have a convenient way to comment out large chunks of code (which  
may include C-style comments).


I'm thinking of something like the trick of surrounding C code with  
pairs of #if 0 and #endif, which effectively comments out code,  
even when it contains /* C-style comments */.


Is there some similar trick for SQL code?


I have been playing with the idea of using cpp to pre-process such  
files, and maybe even put them in a Makefile. I don't think there's  
any reason that wouldn't be possible with SQL files. Added bonus, you  
can use macros in your SQL, for things like environment paths (for  
including other SQL files for example) etc.
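As a hypothetical sketch of what a pre-processed test.sql.in could look like (cpp can trip over unbalanced single quotes in SQL literals, so treat this as an idea rather than a recipe):

/* pre-process with:  cpp -P test.sql.in > test.sql */

#if 0
/* everything in this block is stripped by cpp, C-style comments included */
CREATE TABLE scratch_experiment (id integer);
#endif

CREATE TABLE items (
    id    serial PRIMARY KEY,
    name  text NOT NULL
);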


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] searching using indexes 8.3

2008-03-10 Thread Alban Hertroys

On Mar 10, 2008, at 6:33 PM, LARC/J.L.Shipman/jshipman wrote:


Hi,
When I do a search such as:

 EXPLAIN ANALYZE SELECT * FROM itemsbyauthor;
                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on itemsbyauthor  (cost=0.00..971.34 rows=53634 width=34) (actual time=0.029..25.831 rows=53634 loops=1)
 Total runtime: 34.851 ms
(2 rows)


I have an index for the table, but it is not being utilized.  Why  
is this?  How do I get the index to be used?



To fetch all rows unordered from a table the fastest method is a  
sequential scan. An index scan would be slower, probably significantly.


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Trigger to run @ connection time?

2008-03-10 Thread Alban Hertroys

On Mar 10, 2008, at 9:07 PM, Kynn Jones wrote:

So I tried to find some system table that would get modified  
whenever a new connection was made, but I was unsuccessful.  The  
closest I found was the VIEW pg_activity, and the crucial  
information I need from this view comes from procedures like  
pg_stat_get_backend_pid().


You can't define triggers on system tables.

Is there a bona fide table (not a view!) that I could use to define  
an on connect trigger?  (This table would not only have to  
undergo some INSERT or UPDATE event at the time of the new  
connection, but it should also provide enough information to allow  
my code to determine which database is being connected to.)


I'm afraid not, although there's been mention on this ML of on  
connect triggers before. Not sure whether that got solved though...



If not, is there some other way to set up a trigger that


Closest thing I can think of is defining a table that you insert a  
record in as soon as you connect and put a trigger on that.
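A sketch of that workaround (all names made up):

CREATE TABLE session_start (
    logged_in  timestamptz NOT NULL DEFAULT now(),
    username   text        NOT NULL DEFAULT current_user,
    dbname     text        NOT NULL DEFAULT current_database()
);

CREATE OR REPLACE FUNCTION on_session_start() RETURNS trigger AS $$
BEGIN
    -- whatever should happen "on connect" goes here
    RAISE NOTICE 'connect: % on %', NEW.username, NEW.dbname;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER session_start_trg BEFORE INSERT ON session_start
    FOR EACH ROW EXECUTE PROCEDURE on_session_start();

-- every client then runs this as its first statement after connecting:
INSERT INTO session_start DEFAULT VALUES;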


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Ask ctid

2008-03-07 Thread Alban Hertroys

On Mar 7, 2008, at 9:28 AM, Achmad Nizar Hidayanto wrote:


Thank you for the comment,

I just wonder how come I have two identical rows. I have set the primary key and set it as unique. That's why I took a look at ctid (in reality I don't use this id; I just tried to trace why I have two identical rows. After examining the physical id using ctid, I found that the two identical rows differ in their ctid).

Given this case, can I conclude that Postgres cannot guarantee the uniqueness of the primary key? Or is it just a bug of an old version of Postgres?


Well, it's certainly surprising your Postgres installation is doing that - it's one of the things Postgres is good at, but we lack information to see what's really going on here.


You're mentioning you're using an old version, which one? The output  
of select version(); should do.


Can you show us the table definition with the primary key and the two  
identical records you mentioned?



Many thanks for your help.


Nizar
=

Tom Lane wrote:

Achmad Nizar Hidayanto [EMAIL PROTECTED] writes:


I implement database in my faculty using Postgre. I have a  
problem with ctid in my tables. Let say, i have table STUDENT  
with #STU as the primary key. I don't know what happend in this  
table, some rows have exactly the same value ( i also have set  
the #STU as unique). After tracing the table, i found that the  
two rows differ in ctid value. As the impact, my application  
cannot operate well.
There are some known bugs in older PG releases that could lead to  
duplicate rows (actually, to multiple versions of a row all being  
seen as live). If you're not on the latest minor version of your  
release series, update. regards, tom lane





Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Regarding interval conversion functions and a seeming lack of usefulness

2008-02-28 Thread Alban Hertroys

On Feb 27, 2008, at 3:47 PM, Bill Moran wrote:

Something like:
$ SELECT CONVERT('12 days 13 hours'::INTERVAL AS hour);
 hour
--
   301
$ SELECT CONVERT('6 hours 17 minutes'::INTERVAL AS hour);
 hour
--
6.2833

Am I approaching this problem wrong? or is there something out there
and my Google skills are lacking?


One of the obvious problems with this is that you cannot convert  
months to something more fine-grained without knowing the date the  
interval is relative to. I mean, what would be the answer of:


$ SELECT CONVERT('2 months'::INTERVAL AS days);

This month that would be 60 days, next month 61, this month next year  
59, etc.
And I haven't even started on leap seconds and daylight saving time  
yet...


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] How can I get the first and last date of a week, based on the week number and the year

2008-02-27 Thread Alban Hertroys

On Feb 27, 2008, at 2:32 AM, Bruno Baguette wrote:


Hello !

I have a week number (ISO 8601) and a year, based on theses values,  
I would like to get the first and the last dates of that week.


How I can do that ?

The only solution is doing a big generate_series to build a subset that contains the week of all the dates between '01/01' || year and '31/12' || year. But I find that solution quite dirty and resource-consuming.


Is there a cleanest way to do that ?


You can use to_date for most of that, like:
development=> select to_date('01 02 2008', 'ID IW YYYY') AS start,
to_date('07 02 2008', 'ID IW YYYY') AS end;
   start    |    end
------------+------------
 2008-01-07 | 2008-01-07
(1 row)

I'm a bit surprised that specifying the weekdays doesn't make any  
difference here, maybe it's my version?:


development=> select version();
                                            version
-------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.4 on i386-portbld-freebsd6.2, compiled by GCC cc (GCC) 3.4.6 [FreeBSD] 20060305
(1 row)


Anyway, you can solve that by adding an interval '6 days' to the end  
result.


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Regex query not using index

2008-02-24 Thread Alban Hertroys

On Feb 20, 2008, at 5:51 PM, Postgres User wrote:


Now to end my fixation, one last item.  What about the case of a null
or empty param value- is there a way to assign a condition value that
Postgres will ignore when processing the query?

This syntax results in a seq scan:   WHERE fielda = Coalesce(param,  
fielda)

because it applies only to non-nulls

Is there another way to write this - perhaps using your array syntax on an empty array? Basically I'd like PG to ignore the condition just as it ignores   WHERE 1 = 1


Not sure whether you want no rows returned if param is NULL or all  
rows (all rows looking at your example), but you could simply check  
param for NULL before comparing it, so either:


WHERE param IS NOT NULL AND fielda = param

or

WHERE param IS NULL OR fielda = param

In the second case, if param IS NULL you will get a sequential scan  
of course, as that's the most efficient way to return all rows.




On Wed, Feb 20, 2008 at 8:31 AM, Tom Lane [EMAIL PROTECTED] wrote:

Postgres User [EMAIL PROTECTED] writes:


My users are developers and the goal was to accept a simple
comma-delimited list of string values as a function's input  
parameter.

 The function would then parse this input param into a valid regex
expression.


 Why are you fixated on this being a regex?  If you aren't actually
 trying to expose regex capabilities to the users, you'll just be  
having

 to suppress a bunch of strange behaviors for special characters.

 ISTM that the best solution is to use an array-of-text parameter,
 along the lines of

where name = any (array['Smith', 'Jones', ...])

 For what you're doing, you'd not actually want the array[] syntax,
 it would look more like

where name = any ('{Smith,Jones}'::text[])

 This should optimize into an indexscan in 8.2 or later.

regards, tom lane









Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Trigram performance penalty on varchar?

2008-02-24 Thread Alban Hertroys

On Feb 21, 2008, at 10:56 AM, Mario Lopez wrote:


Hi,

I am indexing a 100 million record table composed of varchar(255) as the field to be indexed. I have always seen that examples of pg_trgm are based on text type fields. Is this for any special reason?


A varchar is internally represented as text, with a size constraint  
of 255 characters in your case (I'm assuming your data requires that  
constraint?).


My computer is creating the index since 5 hours ago so I guess  
there must be something wrong...


I guess your server is running low on memory and the index being created doesn't fit in memory. You may want to have a look at http://www.postgresql.org/docs/8.3/interactive/populate.html#POPULATE-WORK-MEM


Another reason may be an exclusive lock on a row that you're trying  
to index, but that would mean that some transaction on some client  
somewhere is keeping that lock for a very long time (should not  
happen). You can check the pg_locks and pg_stat_activity tables for  
that.


Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] How to make update rapidly?

2008-02-24 Thread Alban Hertroys

On Feb 20, 2008, at 5:03 AM, hewei wrote:


table:
CREATE TABLE price (
  TIMESTAMP Timestamp     NULL,
  id        numeric(5,0)  NOT NULL,
  price     numeric(10,3) NULL,
  primary key (id)
);
sql:
update price set price=* where id=*;


So you have about 714us on average per query. That's not impossible,  
but your hardware and database configuration need to be up to the  
task. Updates are generally slower than selects, as they have to find  
a spot for the new record, check constraints, write it, etc.


Your problem could be that you're using a prepared statement. For  
prepared statements the query plan gets calculated when the prepared  
statement is created, without any knowledge of the actual values to  
look up. That can result in a non-optimal plan. EXPLAIN ANALYZE of  
that query should show more. Re-preparing it after analysing the  
table may improve the performance, not sure about that.
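A quick way to compare the generic prepared plan against a plan for known values (hypothetical values, wrapped in a transaction so the test update is rolled back):

BEGIN;
PREPARE upd(numeric, numeric) AS UPDATE price SET price = $1 WHERE id = $2;
EXPLAIN EXECUTE upd(12.345, 42);
EXPLAIN ANALYZE UPDATE price SET price = 12.345 WHERE id = 42;
ROLLBACK;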


Another possible problem, as you're doing updates, is that your data  
files get bloated with old rows that don't exist anymore (in your  
current transaction). An update is effectively an insert and a delete  
(has to be, due to visibility to other transactions - MVCC), so every  
update changes one row into two. If you don't vacuum often enough  
there will be many more than 100,000 rows to search through.
Added to that; if you don't analyze, the query planner is working  
with outdated information and may decide on a bad plan (not a  
sequential scan probably, but non-optimal still).


Additionally, if you're trying to update the same row concurrently  
from multiple sessions, you're waiting on locks. Not much you can do  
about that, not something you're likely to encounter in a real  
situation though.



On Feb 20, 2008 11:56 AM, Webb Sprague [EMAIL PROTECTED] wrote:
Post the table, the query, and the explain output, and then we can  
help you.


On Feb 19, 2008 7:38 PM, hewei [EMAIL PROTECTED] wrote:
 Hi,Every body;
I have a table contains 100,000 rows, and has a primary key(int).
   Now ,I need to execute sql command like update ..  
where id=*(id

 is primary key).
   I expect execute 1200-1600 sqlcommands per second(1200-1600/s).
In test,when the id increase by degrees in sqlcommands, then I  
can reach

 the speed(1600/s);
   But in fact , the id  in sqlcommands  is out of rule, then the  
speed is

 very slow, just 100/s.
   what can i do? can you help me ?





Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] configure build flags

2008-02-24 Thread Alban Hertroys

On Feb 22, 2008, at 3:29 PM, [EMAIL PROTECTED] wrote:


Hi All.

Does anyone know if rebuilding the PostgreSQL sources with the flags --without-tcl --without-perl passed to ./configure can prevent the correct use of triggers written in the plpgsql language?

Which behaviour should I expect from Postgres when using these flags?

Thanks in advance.

Luca




That's like asking whether we'd still be able to speak English if you  
configure Europe without Latin and Hebrew... Of course you can! ;)


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] type casting in 8.3

2008-02-12 Thread Alban Hertroys

On Feb 12, 2008, at 2:55 AM, [EMAIL PROTECTED] wrote:


On Feb 11, 3:15 am, ${spencer} [EMAIL PROTECTED] wrote:

I just literally ran my first search of the day and got the same
error.

i changed my query so that the integer was cast into text and then it
worked fine.

[EMAIL PROTECTED] wrote:




Ya know, I found and fixed my problem today at work. I just explicitly
cast everything in the query and as I worked my way through it I
watched how the error message changed. At the very end I came up to
something like ...and store != $store. I changed it to ...and
store != '$store' and, bingo the query worked again. Then I backed
out all the :: casting I had done and everything still worked. 'magine
that.


Maybe the two of you should be wondering why you're comparing text to  
integer values? Something is wrong there in your design, obviously.  
You could be in for some unpleasant surprises, for example wrt  
ordering (1, 2, 10 vs '1', '10', '2').


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




Re: [GENERAL] Mechanics of Select

2008-02-11 Thread Alban Hertroys

On Feb 11, 2008, at 2:23 AM, Willem Buitendyk wrote:

As others have suggested my big problem with the function I wrote  
was that I had made it Volatile instead of Immutable (it is no  
doubt suffering from code bloat as well).  That made all the  
difference. Curiously though - I tried it just with the date_trunc  
function and it was just as slow as my old Volatile function.


select * from track where datetime >= '2007-04-01' and datetime < date_trunc('month','2007-04-01'::timestamp) + interval '1 month'; was about 55s


That's probably because '2007-04-01'::timestamp can be at different  
time zones depending on client configuration and hence is volatile.


If you need a timestamp you probably want to use the server's TZ, which you can specify using: timestamp AT TIME ZONE your_timezone


Are you always entering the first day of a month for start date? In  
that case you can leave out the entire date_trunc as the interval  
already calculates the correct length internally:


template1= select '2007-04-01'::date + interval '1 month';
  ?column?
-
 2007-05-01 00:00:00

select * from track where datetime >= '2007-04-01' and datetime <
first_day_next_month('2007-04-01'); was about 36s


Also, specifying dates like this without specifying their format is a  
bad habit in my book. You're in trouble as soon as the date format  
for your database changes (different system, for example). I suggest  
you use to_date('2007-04-01', 'YYYY-MM-DD') instead.
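
For example, the range condition from above with explicit formats would look something like this (just a sketch):

    SELECT *
      FROM track
     WHERE datetime >= to_date('2007-04-01', 'YYYY-MM-DD')
       AND datetime <  to_date('2007-04-01', 'YYYY-MM-DD') + interval '1 month';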


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Mechanics of Select

2008-02-10 Thread Alban Hertroys

On Feb 11, 2008, at 12:43 AM, brian wrote:

Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
 RETURNS date AS
$BODY$
DECLARE
resultdate date;
BEGIN
SELECT INTO resultdate to_date(to_char((inputdate + interval '1 month'),
'YYYY-MM') || '-01', 'YYYY-mm-dd');
RETURN resultdate;
END;
$BODY$
LANGUAGE 'plpgsql';



No need for the variable or the SELECT, and it's an immutable  
function, so better define that. Besides that it's probably better to  
use the date_trunc function here.


Try:

CREATE OR REPLACE FUNCTION first_day_next_month(inputdate date)
 RETURNS date AS
$BODY$
BEGIN
RETURN date_trunc('month', inputdate + interval '1 month');
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

And with that I wonder why you'd even need a function :)

Another thing I've taught myself is to prefix local variables and  
parameters in functions, so that they can NEVER accidentally match a  
column name that you use in a query (I usually use '_'). Otherwise  
you can get silly queries like SELECT * FROM table WHERE x = x that  
look perfectly fine while you're writing them down, being perfectly  
able to make the distinction between *variable x* and *column x* in  
your mind.
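
A minimal sketch of that convention (the names here are made up):

    CREATE OR REPLACE FUNCTION track_count(_datetime timestamp)
     RETURNS bigint AS
    $BODY$
    BEGIN
        -- _datetime cannot be confused with the column called datetime
        RETURN (SELECT count(*) FROM track WHERE datetime >= _datetime);
    END;
    $BODY$
    LANGUAGE 'plpgsql' STABLE;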


Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] first message: SELECT column FROM t

2008-02-02 Thread Alban Hertroys

On Feb 2, 2008, at 6:56 PM, Bill Moran wrote:

preserved that.  PostgreSQL is case-sensitive, so try matching the column
name exactly and putting "" around it.  If that doesn't work,
provide some


That is just plain incorrect, PostgreSQL is *not* case sensitive. The  
real problem here (as Scott pointed out) is that the column in the  
table is defined so that the case is preserved (by quoting the column  
name), but the column in the query isn't and thus gets folded (to  
lowercase) and can't be found.
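
A minimal example of that folding (made-up names):

    CREATE TABLE t ("Amount" integer);
    SELECT Amount FROM t;     -- fails: column "amount" does not exist
    SELECT "Amount" FROM t;   -- works, the quoted identifier keeps its case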


I'm inclined to call this a bug in PgAdmin III (except that I hardly  
know it), it shouldn't quote identifiers to keep case if the user  
doesn't explicitly ask it to. This question pops up frequently, so it  
would be nice if something was done about it (my preference would be  
by not /implicitly/ quote identifiers). It certainly seems to confuse  
novice users.


My €0.02.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] case dumbiness in return from functions

2008-01-18 Thread Alban Hertroys

On Jan 18, 2008, at 1:14 PM, Nico Sabbi wrote:

Tom Lane ha scritto:
The SQL standard specifies that unquoted identifiers are case- 
insensitive.

   


You're welcome to spell them as camelCase in your source code if you
feel like it, but don't expect that PG, or any other SQL-compliant
database, will pay attention.

regards, tom lane



yet I find disturbing that Postgres doesn't make the effort
to respect the case specified by the user. If I created a field
called "REF" why should Postgres call it "ref" in the output of  
queries

if the standard doesn't specify any obligation to convert the name ?



If you want to use case sensitive identifiers, then quote them! It's  
not that hard. In your example above you're doing just that, so your  
statement does not even apply ;)


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] query performance

2008-01-17 Thread Alban Hertroys

On Jan 14, 2008, at 3:58 AM, pepone.onrez wrote:

I have this query in a table with 150 thousand tuples and it takes  
too long


t_documentcontent._id AS _id
FROM t_documentcontent LIMIT 50 OFFSET 8



You want an ORDER BY there. Not only will it probably speed things  
up, without it there's no guaranteed order in the results returned.


As table records have no specific order and updates and inserts on  
that table take the first free position, you risk ending up showing  
some records twice in your set (same id on different 'pages') and  
missing others because you've already skipped past them when there's  
concurrent access.


There's still some risk for that if you order, but as you seem to  
order on a sequence-generated column, updates aren't a problem for  
you and inserts end up at the end anyway. But only if you order them.
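
So, something along these lines (the offset is just an example value):

    SELECT t_documentcontent._id AS _id
      FROM t_documentcontent
     ORDER BY _id
     LIMIT 50 OFFSET 50;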


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-17 Thread Alban Hertroys

On Jan 15, 2008, at 3:03 PM, Ivan Sergio Borgonovo wrote:


On Tue, 15 Jan 2008 14:43:35 +0100
Alban Hertroys [EMAIL PROTECTED] wrote:


You need to scroll to the last row to find the size of the result
set, but after that it's pretty easy to return random rows by
scrolling to them (and marking them 'read' in some way to prevent
accidentally returning the same row again).


Could you post a snippet of code or something giving a more detailed
idea of it?

BTW since cursors support offset if you're not interested if the
order of the retrieved rows is random too you don't even have to
remember which one you read I think.


I posted it on this list a while ago when I came up with this  
solution. I had some trouble finding my old post in the pgsql-general  
archives though - I could find the thread, just not my final posting,  
and searching didn't even turn up the thread.


I did find it here: http://www.mail-archive.com/pgsql- 
[EMAIL PROTECTED]/msg103670.html
The thread contains several other approaches to the problem, it  
really depends on your problem domain which one fits your bill.


I think the function in my original posting could do with clearer  
comments though, so here's the function again:


/*
 * Return $limit random rows from the result set of SQL query $query
 */
function randomSet(
    $query, // The query to execute
    $limit  // The (max) number of random rows required
) {
    // SQL to declare the cursor
    query("DECLARE _cur SCROLL CURSOR WITHOUT HOLD FOR $query");

    /* Get the range for random(1, n)
     *
     * Determined by scrolling the cursor to the last row.
     * Equivalent to select count(*), but without a separate query.
     */
    query("MOVE FORWARD ALL IN _cur");
    $count = pg_affected_rows();

    $uniques = array(); // A list of used cursor offsets
    $resultSet = array();

    // Fetch random rows until we have enough or there are no more
    while ($limit > 0 && count($uniques) < $count) {
        // Determine random scroll offset
        $idx = random(1, $count);

        // Skip records with an index we already used
        if (in_array($idx, $uniques))
            continue;

        // Fetch the random row
        $record = query("FETCH ABSOLUTE $idx FROM _cur");

        // Add the row offset to the list of used offsets
        $uniques[] = $idx;

        $resultSet[] = $record;
        $limit--;
    }

    // Close the cursor
    query("CLOSE _cur");

    return $resultSet;
}



Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-15 Thread Alban Hertroys

On Jan 9, 2008, at 8:07 PM, Scott Marlowe wrote:

I could see a use for an approximate count(*) with where clause, just
like I could see a use for the ability to retrieve random rows from a
table without using order by random() on it.  And those are both
things that would require some form of hacking in the db that I'm
certainly not capable of pulling off...


About returning random rows... I've successfully applied a scrolling  
cursor for that.


You need to scroll to the last row to find the size of the result  
set, but after that it's pretty easy to return random rows by  
scrolling to them (and marking them 'read' in some way to prevent  
accidentally returning the same row again).
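
The SQL side of that looks roughly like this (a sketch; the table and ordering are made up):

    BEGIN;
    DECLARE _cur SCROLL CURSOR WITHOUT HOLD FOR
        SELECT * FROM mytable ORDER BY id;
    MOVE FORWARD ALL IN _cur;      -- the command status contains the row count
    FETCH ABSOLUTE 42 FROM _cur;   -- fetch the row at some random position
    CLOSE _cur;
    COMMIT;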


It does require some specific application code though - doing it  
server side would mean to pass the query as a function argument  
(which still requires unnatural SQL statements in your application  
code) or write a function for each query (*cough*).


Performance was quite adequate (a few 100 ms) for a query returning  
random 5 rows from 3 joined tables or more, some of which had a few  
100k rows. Calculating random() for each record in the result set (to  
sort on) was taking much longer. That was on a dual 64-bit opteron  
with 4GB RAM, iirc.


Of course a built-in statement would be preferable, I just felt like  
pointing out that order by random() isn't necessarily the best  
alternative ;)


Regards,
Alban Hertroys.




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Better alternative for Primary Key then serial??

2007-12-15 Thread Alban Hertroys

On Dec 13, 2007, at 14:12, John D. Burger wrote:


Alban Hertroys wrote:

The problem the OP is pointing out seems difficult to solve. A  
sequence doesn't know about existing records with a possibly  
higher number than the sequence is at.


This may be worked around by keeping a list of numbers used up  
beyond the current sequence value so the sequence knows what  
numbers to skip, but that has problems of its own (if there are  
many such numbers, or if the sequence gets created after data has  
been added to the list). It gets ugly.


I would just have a trigger that sets the serial to NEW.id + 1.   
Dunno if this has concurrency issues, though, and it may leave huge  
gaps in the key space, and (more importantly) use up your sequence  
too quickly.


That thought had crossed my mind, but I rejected it.

Besides those issues, you'd need to add that trigger to every table  
that potentially has this issue, it does indeed have concurrency  
issues, and you're in for some fun if someone decides to use the last  
possible value of a serial field to define a special case (although  
the sequence could be defined to end before that value of course) and  
causes immediate wraparound of the sequence (provided it's allowed to  
rotate) and thus causing duplicate key violations as soon as the  
sequence matches the first record in the table.


I was looking at a general solution that would work as automatic as  
sequences already do. Not that it's a big issue anyway, you just have  
to keep in mind that it works like it does.


Just to prevent any mistakes, I am not requesting a change in  
behaviour, I'm fine with how it works (and has worked for ages in  
many database servers). It is an interesting puzzle though ;)


I have, in fact, had situations where I wanted a serial PK, =and= I  
needed to insert with external IDs sometimes - essentially a mix of  
natural and surrogate keys (shudder).  It turned out that the  
natural keys were always positive, so I set up the sequence to  
range =downward= from 0.


That's a common solution to the problem, although it leaves the  
possibility that people are being smart and enter negative integers  
exactly to prevent this problem. And of course you cannot rely on  
sorting it by index to get your data more or less in the order inserted.
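
For reference, such a downward sequence is simply (a sketch, assuming the natural keys are always positive):

    CREATE SEQUENCE my_table_id_seq INCREMENT BY -1 START WITH -1;
    CREATE TABLE my_table (
        id   integer NOT NULL DEFAULT nextval('my_table_id_seq') PRIMARY KEY,
        name text
    );
    -- generated keys run -1, -2, -3, ... and never collide with positive natural keys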


--
Alban Hertroys

If you lose your memory,
 you can't remember where you left it.






---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] For the SQL gurus out there

2007-12-13 Thread Alban Hertroys

On Dec 13, 2007, at 6:55, Uwe C. Schroeder wrote:
What I would like to do is to create a view that sucks the comments  
for a
given blog_id in the order they should be displayed (very commonly  
seen in

pretty much all blogging apps), i.e.

Blog
comment 1
  comment on comment 1
comment on comment on comment 1
comment 2
etc.


Question is, is there a smart way I'm not able to figure out to  
create a
single query on the blog comment table that will return the  
comments in the
right order? Sure I could write a recursive method that assembles  
the data in
correct order, but I'd prefer to leave that to the database to  
handle in a

view.

The solution can be very postgresql specific, because I don't  
intend to run it

on any other db server.


Have a look at the ltree contrib package. It allows you to specify  
the comment hierarchy as a path and order by that, while look-ups on  
it are still quite fast (no need to look up child records separately,  
for example).
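
A rough sketch of how that could look (made-up names; ltree needs the contrib module installed):

    CREATE TABLE blog_comment (
        comment_id serial PRIMARY KEY,
        blog_id    integer NOT NULL,
        path       ltree NOT NULL,   -- e.g. '12', '12.15', '12.15.20'
        body       text
    );
    CREATE INDEX blog_comment_path_idx ON blog_comment USING gist (path);

    SELECT * FROM blog_comment WHERE blog_id = 1 ORDER BY path;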


--
Alban Hertroys

It's not a bug!
 It's a six-legged feature!






---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Better alternative for Primary Key then serial??

2007-12-13 Thread Alban Hertroys

On Dec 13, 2007, at 10:19, Jorge Godoy wrote:


Em Wednesday 12 December 2007 03:42:55 pilzner escreveu:


Does stuff like this cause any aches and pains to developers out  
there, or
do I just need to get in a new mindset??? Also, is there a way to  
be sure
the primary key is *ONLY* ever given a value by serial, and not  
subject to

updates???


Shouldn't the definition of a primary key be an immutable thing  
that is unique

to the row?  If you change it, then it is not immutable anymore...


Why? If you're worried about foreign keys pointing to them, you can  
either define them to follow the change (on update cascade) or to  
throw an integrity violation error (default behaviour).
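
For example (a sketch with made-up names):

    CREATE TABLE country (
        name text PRIMARY KEY
    );
    CREATE TABLE statistic (
        country_name text NOT NULL REFERENCES country (name) ON UPDATE CASCADE,
        value        numeric
    );

    -- renaming the country follows through to statistic automatically
    UPDATE country SET name = 'Myanmar' WHERE name = 'Burma';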


Now I realise this isn't particularly useful for surrogate (primary)  
keys, but it sure is for natural keys. They're both keys, is there a  
reason to handle surrogate keys differently from natural keys?


The problem the OP is pointing out seems difficult to solve. A  
sequence doesn't know about existing records with a possibly higher  
number than the sequence is at.


This may be worked around by keeping a list of numbers used up beyond  
the current sequence value so the sequence knows what numbers to  
skip, but that has problems of its own (if there are many such  
numbers, or if the sequence gets created after data has been added to  
the list). It gets ugly.


The convention with sequences is that if you use a sequence on a  
column (beyond defining one) that you don't insert records with hand- 
coded values for that column (unless you're sure you're using an  
existing gap before the sequences current value).


Regards,
--
Alban Hertroys

Sometimes you wake up thinking:
Galileo was right, the world does turn









---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Trigger - will not perform INSERT

2007-12-12 Thread Alban Hertroys

On Dec 11, 2007, at 21:35, smiley2211 wrote:



ex:

CREATE TRIGGER mytrig AFTER INSERT OR UPDATE ON foo1 FOR EACH ROW  
EXECUTE

PROCEDURE updatefoo1('datarow');

The command itself is able to be executed without error. However,  
what then
happens, is that whenever the application attempts to insert a  
record into

foo1, it simply doesn't insert. Once I take the trigger off, it begins
inserting again.

I have checked permissions but INSERT only FAILS while trigger is  
enabled..


Thanks...Michelle


This would be expected behaviour if the trigger were a BEFORE INSERT  
one (instead of AFTER) and the procedure returned NULL. You might  
want to check that is really not the case.
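
For comparison, a minimal BEFORE trigger (just a sketch, not your actual function) has to return NEW or the row is silently dropped:

    CREATE OR REPLACE FUNCTION check_foo1() RETURNS trigger AS
    $BODY$
    BEGIN
        -- do whatever checks or changes you need on NEW here
        RETURN NEW;   -- returning NULL here makes the INSERT silently disappear
    END;
    $BODY$
    LANGUAGE 'plpgsql';

    CREATE TRIGGER mytrig BEFORE INSERT OR UPDATE ON foo1
        FOR EACH ROW EXECUTE PROCEDURE check_foo1();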


AFTER INSERT triggers don't fire until the row is actually inserted  
into the table, so the only possibility I can see for the behaviour  
you describe is that the stored procedure removes the record that was  
just inserted.
Maybe there are statements in that procedure that attempt to remove  
possible duplicates that also happen to match on the new record?


Regards,
--
Alban Hertroys

If you throw your hands up in the air,
how're you gonna catch them?







---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Abbreviation list

2007-10-19 Thread Alban Hertroys
Steve Crawford wrote:
 My vote is to add Appendix I. Abbreviations. Don't know if it's
 practical for 8.3 documentation but it would be nice to add even if it
 only has a few entries as additional ones could be collected via the
 user notes.
 
 I suggest as a discussion starting-point the following inclusion criteria:
 
 1. Any abbreviation/acronym that appears in the PostgreSQL documentation
 (even if those terms may not be PG specific - we shouldn't assume that
 everyone knows them). Good documentation practice recommends defining
 abbreviations the first time they are used. Better still, ensure that
 they are in the abbreviation list.

Isn't this just what the ABBR tag in html is for?

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Slow TSearch2 performance for table with 1 million documents.

2007-10-05 Thread Alban Hertroys
Benjamin Arai wrote:
 Hi,
 
 I have very slow performance for a TSearch2 table.  I have pasted the
 EXPLAIN ANALYZE queries below.  12 seconds is slow for almost any
 purpose.  Is there any way to speed this up?
 
 # explain analyze select * FROM fulltext_article,
 to_tsquery('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti,
 q) DESC;

Admittedly I'm kind of new to tsearch, but wouldn't

SELECT *
  FROM fulltext_article
 WHERE idxfti @@ to_tsquery('simple','dog')
 ORDER BY rank(idxfti, to_tsquery('simple', 'dog')) DESC;

be faster?

Quick testing shows a similar query in our database to not use a nested
loop and a function scan. For comparison, here are our plans:

Your approach:

                                     QUERY PLAN
--------------------------------------------------------------------------------------
 Sort  (cost=4.86..4.87 rows=1 width=164) (actual time=0.151..0.161 rows=5 loops=1)
   Sort Key: rank(fulltext_article.idxfti, q.q)
   ->  Nested Loop  (cost=0.00..4.85 rows=1 width=164) (actual time=0.067..0.119 rows=5 loops=1)
         ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32) (actual time=0.010..0.012 rows=1 loops=1)
         ->  Index Scan using fulltext_article_idxfti_idx on fulltext_article  (cost=0.00..4.82 rows=1 width=132) (actual time=0.033..0.056 rows=5 loops=1)
               Index Cond: (fulltext_article.idxfti @@ "outer".q)
               Filter: (fulltext_article.idxfti @@ "outer".q)
 Total runtime: 0.242 ms
(8 rows)


My suggested approach:

                                     QUERY PLAN
--------------------------------------------------------------------------------------
 Sort  (cost=4.84..4.84 rows=1 width=132) (actual time=0.085..0.095 rows=5 loops=1)
   Sort Key: rank(idxfti, '''dog'''::tsquery)
   ->  Index Scan using fulltext_article_idxfti_idx on fulltext_article  (cost=0.00..4.83 rows=1 width=132) (actual time=0.025..0.052 rows=5 loops=1)
         Index Cond: (idxfti @@ '''dog'''::tsquery)
         Filter: (idxfti @@ '''dog'''::tsquery)
 Total runtime: 0.163 ms
(6 rows)

I hope this helps.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Select too many ids..

2007-10-03 Thread Alban Hertroys
Rodrigo De León wrote:
 On 10/1/07, Abandoned [EMAIL PROTECTED] wrote:
 Hi..
 I have a id list and id list have 2 million dinamic elements..
 I want to select what id have point..
 I try:

 SELECT id, point FROM table WHERE id in (IDLIST)

 This is working but too slowly and i need to performance..

 I'm sorry my bad english.
 King regards..
 
 DDL please...

I guess the OP's problem is with the time it takes to push a query
containing 2 million ID's through the SQL parser.

A few things that may help:

- See if you can find any uninterrupted ranges of values in your ID's
and combine them into separate where clauses (ie. where id between x and
y). This won't help you much for ranges where y = x+1, but it does help
if (for example) y = x+10...

- Try whether it helps to COPY your ID's into a temp table and join your
table with that instead of using a huge IN list (see the sketch below).

- If you're still using PG 7.4, try a PG 8. There have been significant
performance improvements with IN lists in queries in the 8-series.
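
The temp table approach would look something like this (a sketch; table and file names are made up, and COPY FROM a file needs superuser rights - from psql you can use \copy instead):

    CREATE TEMP TABLE wanted_id (id integer PRIMARY KEY);
    COPY wanted_id (id) FROM '/tmp/idlist.txt';
    ANALYZE wanted_id;

    SELECT t.id, t.point
      FROM mytable t
      JOIN wanted_id w ON w.id = t.id;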

Regards,

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Tsearch2 Dutch snowball stemmer in PG8.1

2007-10-03 Thread Alban Hertroys
Hello,

I'm trying to get a Dutch snowball stemmer in Postgres 8.1, but I can't
find how to do that.

I found CREATE FULLTEXT DICTIONARY commands in the tsearch2 docs on
http://www.sai.msu.su/~megera/postgres/fts/doc/index.html, but these
commands are apparently not available on PG8.1.

I also found the tables pg_ts_(cfg|cfgmap|dict|parser), but I have no
idea how to add a Dutch stemmer to those.

I did find some references to stem.[ch] files that were suggested to
compile into the postgres sources, but I cannot believe that's the right
way to do this (besides that I don't have sufficient privileges to
install such a version).

So... How do I do this?

The system involved is some version of Debian Linux (2.6 kernel); are
there any packages for a Dutch stemmer maybe?

I'm in a bit of a hurry too, as we're on a tight deadline :(

Regards,
-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Tsearch2 Dutch snowball stemmer in PG8.1

2007-10-03 Thread Alban Hertroys
Alban Hertroys wrote:
 The only odd thing is that to_tsvector('dutch', 'some dutch text') now
 returns '|' for stop words...
 
 For example:
  select to_tsvector('nederlands', 'De beste stuurlui staan aan wal');
                   to_tsvector
 -----------------------------------------------
  '|':1,5 'bes':2 'wal':6 'staan':4 'stuurlui':3

I found the cause. The stop words list I found contained comments
prefixed by '|' signs. Removing the contents and recreating the database
solved the problem. Just updating the reference didn't seem to help...

There's undoubtedly some cleaner way to replace the stop words list, but
at the current stage of our project this was the simplest to achieve.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Tsearch2 Dutch snowball stemmer in PG8.1

2007-10-03 Thread Alban Hertroys
Oleg Bartunov wrote:
 Alban,
 
 the documentation you're refereed on is for upcoming 8.3 release.
 For 8.1 and 8.2 you need to do all machinery by hand. It's not
 difficult, for example:

Thanks Oleg.
I think I managed to do this right, although I had to google for some of
the files (we don't have ispell installed).

You also seem to have mixed Russian and English dictionaries in your
example; I'm not sure whether that was on purpose?

Anyway, I changed your example to use dutch dictionaries and locale
where I thought it applicable, and I got something working apparently.
Quite some guess work was involved, so I have a few questions left.

The only odd thing is that to_tsvector('dutch', 'some dutch text') now
returns '|' for stop words...

For example:
 select to_tsvector('nederlands', 'De beste stuurlui staan aan wal');
                  to_tsvector
------------------------------------------------
 '|':1,5 'bes':2 'wal':6 'staan':4 'stuurlui':3


A minor nit... You ended the script with a hidden commit (END;). I would
have preferred to experiment with the results a bit before commiting...

I mixed in a few questions below, if you could answer them please?

 -- sample tsearch2 configuration for search.postgresql.org
 -- Creates configuration 'pg' - default, should match server's locale !!!
 -- Change 'ru_RU.UTF-8'
 
 begin;
 
 -- create special (default) configuration 'pg'
 update pg_ts_cfg set locale=NULL where locale = 'ru_RU.UTF-8';

I suppose this disables a possibly existing stemmer for that locale?

 insert into pg_ts_cfg values('pg','default','ru_RU.UTF8');
 
 -- register 'pg_dict' dictionary using synonym template
 -- postgrespg
 -- pgsql   pg
 -- postgresql  pg
 insert into pg_ts_dict
 (select 'pg_dict',dict_init,
 '/usr/local/pgsql-dev/share/contrib/pg_dict.txt',
 dict_lexize, 'pg-specific dictionary'
 from pg_ts_dict
 where dict_name='synonym'
 );
 
 -- register ispell dictionary, check paths and stop words
 -- I used iconv for english files, since there are some cyrillic stuff
 insert into pg_ts_dict
 (SELECT 'en_ispell', dict_init,
 'DictFile=/usr/local/share/dicts/ispell/utf8/english-utf8.dict,'
  'AffFile=/usr/local/share/dicts/ispell/utf8/english-utf8.aff,'
  'StopFile=/usr/local/share/dicts/ispell/utf8/english-utf8.stop',
  dict_lexize
  FROM pg_ts_dict
  WHERE dict_name = 'ispell_template'
  );

I actually use a .lat file here. I have no idea whether that's
compatible (but it appears to have worked).

I got my .lat and .aff files from:
http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html#Dutch-dicts

My stop words file is from:
http://snowball.tartarus.org/algorithms/dutch/stop.txt

  -- use the same stop-word list as 'en_ispell' dictionary
 UPDATE pg_ts_dict set dict_initoption='/usr/local/share/dicts/english.stop'
 where dict_name='en_stem';

Why change the stop words for the English dictionary? I skipped this
step. Is that right?

 -- default token-dicts mappings
 insert into pg_ts_cfgmap  select 'pg', tok_alias, dict_name from
 public.pg_ts_cfgmap where ts_name='default';
 
 -- modify mappings for latin words for configuration 'pg'
 update pg_ts_cfgmap set dict_name = '{pg_dict,en_ispell,en_stem}'
 where tok_alias in ( 'lword', 'lhword', 'lpart_hword' )
 and ts_name = 'pg';
 
 -- we won't index/search some tokens
 update pg_ts_cfgmap set dict_name = NULL
 --where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float','word')
 where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float')
 and ts_name = 'pg';
 
 end;
 
 -- testing
 
 select * from ts_debug('
 PostgreSQL, the highly scalable, SQL compliant, open source
 object-relational
 database management system, is now undergoing beta testing of the next
 version of our software: PostgreSQL 8.2.
 ');
 
 
 Oleg

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Alban Hertroys
A. Kretschmer wrote:
 Again: an index can't help! Because of MVCC: 'select count(*)' without
 WHERE-condition forces an seq. table-scan. 

That has very little to do with MVCC.

If I understand correctly, MVCC is about the availability of records in
the current transaction. For that it makes no difference whether a seq
scan or an index scan is performed - both cases need to check at the
record level whether it's visible (where the seq scan is already looking
at the actual record, of course).

The only difference MVCC makes here is that the balance between a seq
scan or an index scan being more efficient is somewhat sooner in favour
of the seq scan than on some other databases, because the index scan
needs to look at the actual record for visibility.

I plead not guilty ;)

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Alban Hertroys
Mike Charnoky wrote:
 With respect to the ALTER TABLE SET STATISTICS... how do I determine a
 good value to use?  This wasn't really clear in the pg docs.  Also, do I
 need to run ANALYZE on the table after I change the statistics?
 
 Here are the EXPLAINs from the queries:
 
 db=# explain select count(*) from prediction_accuracy where evtime
 between '2007-09-25' and '2007-09-26';
 
                                         QUERY PLAN
 -------------------------------------------------------------------------------------------
  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy  (cost=0.00..51.44 rows=12490383 width=0)
          Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with time zone))
 (3 rows)
 
 db=# explain select count(*) from prediction_accuracy where evtime
 between '2007-09-26' and '2007-09-27';
 
                                         QUERY PLAN
 -------------------------------------------------------------------------------------------
  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy  (cost=0.00..454671.07 rows=12777586 width=0)
          Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with time zone))
 (3 rows)

Interesting, same plans and no sequential scans... Yet totally different
run times. Almost as if something prevents you from reading some records
between 26 and 27 September...

I'm no expert on locking in Postgres, but AFAIK locks that prevent you
from reading records are rather rare and probably only issued from
userland code.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] 3 tables join update

2007-10-01 Thread Alban Hertroys
rihad wrote:
 Richard Broersma Jr wrote:
 UPDATE Foo foo
 SET ...
 FROM LEFT JOIN Bar bar USING(common_field)
 WHERE blah='blah' AND bar.common_field IS NULL;
 
 ERROR:  syntax error at or near JOIN
 
 
 I know I'm misusing UPDATE ... FROM because I don't really want Bar's
 values to go into Foo, but only using them for a conditional update
 (atomically I hope).

Oh, you mean:

UPDATE foo
SET ...
WHERE blah = 'blah'
  AND NOT EXISTS (
SELECT 1 FROM baz WHERE foo.common_field = baz.common_field
  )

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Alban Hertroys
Albe Laurenz wrote:
 Alban Hertroys wrote:
 A. Kretschmer wrote:
 Again: an index can't help! Because of MVCC: 'select count(*)'
 without
 WHERE-condition forces an seq. table-scan.
 That has very little to do with MVCC.

 [...] For that it makes no difference whether a seq
 scan or an index scan is performed - both cases need to check at the
 record level whether it's visible (where the seq scan is
 already looking at the actual record, of course).
 
 If you do not use MVCC (say, you use DB2), you need not check
 the record itself because if it is there (which it is if there
 is an index entry), it will be 'visible'.

Still, that's not because of MVCC, but because of the way it is
implemented in PostgreSQL.

There has been talk in the past (regularly) about why the MVCC
information is not in the index and whether it should be, see the ML
archives.

Besides, there are still many situations where a sequential scan
(whether for count(*) or not) is faster than an index scan, no matter
whether you have MVCC or not.

As I said, MVCC has little to do with it.
The real problem is that in postgres you cannot tell from an index
whether a record is visible or not, while you can in DB2 (because it has
an index entry or not).

 I plead not guilty ;)
 
 Declined, sorry.

Overruled, sorry.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Normalized Tables SELECT [was: Find smallest common year]

2007-10-01 Thread Alban Hertroys
Stefan Schwarzer wrote:
 
 
 BTW, You didn't actually use type text for your year column, did you? No
 quotes needed then. Otherwise you'd have to make sure your year values
 are all the same length or sorting gets... interesting.
 
 Yep, my comment just before concerns especially this paragraph, I guess.
 With not only years but also time spans for my data - so, not only 1970,
 1971 but also 1970-75, 1975-80 etc. I would need indeed to use text
 for my year column. Why get sorting than interesting?
 
 Stef

Text sorts alphabetically, not numerically:

integer text
--- ---
1   '1'
2   '10'
3   '2'
4   '3'
5   '4'
6   '5'
7   '6'
8   '7'
9   '8'
10  '9'


I guess you won't have any problems with sorting, as you use full
century prefixes and the differences in syntax ('1970' vs '1970-75') are
at the rear of your values.


An entirely different question is whether it is a good idea to write a
range as a value that the database cannot interpret correctly (referring
to the '1970-75' notation). You cannot group records by value this way
if you need to (for example) combine data from '1970' with data from
'1970-75'.

But you seem to use these values just for labels, which I assume are
unique across years (eg. if you have a value '1970-75' you don't have
values '1970', 1971'..'1974'), in which case this is safe to use. As
pointed out by several people earlier, they make an excellent foreign
key too (provided they're unique).

Other options to handle these years involve having a start/end year or
date (to_date('01/01/1970', 'MM/DD/YYYY'), to_date('12/31/1975',
'MM/DD/YYYY')) or a start date and an interval (to_date('01/01/1970',
'MM/DD/YYYY'), INTERVAL '1 year').

That makes operators like OVERLAPS and BETWEEN available to your
queries, which may be convenient ;)
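
A sketch of the start/end variant (made-up names):

    CREATE TABLE measurement (
        country_name text,
        period_start date,
        period_end   date,
        value        numeric
    );

    -- everything whose period overlaps the year 1972
    SELECT *
      FROM measurement
     WHERE (period_start, period_end) OVERLAPS (DATE '1972-01-01', DATE '1973-01-01');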

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Normalized Tables SELECT [was: Find smallest common year]

2007-10-01 Thread Alban Hertroys
Stefan Schwarzer wrote:
 
 An entirely different question is whether it is a good idea to write a
 range as a value that the database cannot interpret correctly (referring
 to the '1970-75' notation). You cannot group records by value this way
 if you need to (for example) combine data from '1970' with data from
 '1970-75'.

 But you seem to use these values just for labels, which I assume are
 unique across years (eg. if you have a value '1970-75' you don't have
 values '1970', 1971'..'1974'), in which case this is safe to use. As
 pointed out by several people earlier, they make an excellent foreign
 key too (provided they're unique).
 
 Yep, this is question I posed myself too. In the moment, when doing for
 example per Capita calculations on the fly of a variable which has
 something like 1970-75, I would then sum up the Total Population over
 the given period, divide it through the number of years and then use it
 with the selected variable to get the per Capita data.
 
 But if I would instead insert yearly data, it would mean that it had
 five lines with the same values. No problem with that?

Not entirely what I suggested, but also a viable solution, sure.

I was suggesting to add a column to your yearly data marking the end of
the range. Given your above examples, you could then do queries like:

SELECT population / num_years FROM my_data;

(Assuming you add the length of the interval as a number of years, which
seems plausible because you don't seem to calculate with any intervals
not divisible by a whole year).

Adding this additional column may justify putting the years (and their
durations) into their own table.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Re: Why the ERROR: duplicate key violates unique constraint master_pkey is raised? - Is this a Bug?

2007-09-28 Thread Alban Hertroys
Nis Jørgensen wrote:
 Alban Hertroys skrev:
 Would something like

 UPDATE master set m2 = master2.m2
   FROM (
  SELECT m2 +1
FROM master m
   WHERE m.master_id = master.master_id
   ORDER BY m2 DESC
  ) master2

 work? I think it might be faster (and possibly cause less index bloat)
 than doing two consequent updates.
 
 
 I don't understand your query. I don't think you can use a correlated
 subquery in that way.

Hmm indeed, it complains something vague: ERROR:  subquery in FROM may
not refer to other relations of same query level. Not sure why?

Effectively it orders the updates descending, so that the new value of
m2 can never be updated to an already existing value, because that has
been updated previously.

The WHERE condition makes the query look a bit more complex than it
actually is, but is necessary of course.

 Anyway, tricks like these might work. They might stop working without
 warning, if the plan changes. Relying on unspecified behavior is a
 recipe for trouble.

If I specifically ask for an ordering, I don't think the planner should
change or ignore that ordering. So I'm not relying on unspecified behaviour.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Normalized Tables SELECT [was: Find smallest common year]

2007-09-28 Thread Alban Hertroys
Stefan Schwarzer wrote:
 Ok, I do understand that.
 
 So, instead of the earlier mentioned database design, I would have
 something like this:
 
- one table for the country names/ids/etc. (Afghanistan, 1; Albania,
 2)
- one table for the variable names/ids/etc. (GDP, 1; Population, 2;
 Fish Catch, 3;)


- one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3;
 )

If you _do_ need this table (because you want to constrain your
statistical data to only contain a specific set of years, or because you
need a quick list of available years to select from): Make the year
primary key and drop the artificial index.

Years are perfectly fine data to constrain on, and it saves you the
joins with that table (the foreign key constraint constrains your data
sufficiently).

You may want to constrain your years further by adding a check
constraint, e.g. CHECK (year > 1900).
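
That could look something like this (a sketch):

    CREATE TABLE year (
        year integer PRIMARY KEY CHECK (year > 1900)
    );
    CREATE TABLE statistical_data (
        year  integer NOT NULL REFERENCES year (year),
        value numeric
    );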


The same argument goes for the country names, but I generally don't like
to have text data for primary key.

Contrary to years, their values might change (for which an ON UPDATE
CASCADE FK constraint trigger would have to go through your data to
update all the references). Plus they take up a bit more space than
integers, although with country names that won't matter that much.

OTOH... If your country names contain non-ascii characters and the
database is configured to collate on those properly, it will be slower.

That aside (they're kind of minor points), the country name is also a
good candidate to become key (primary and foreign respectively).

 and
- one table for all statistical data with four fields -
 id_variable, id_country, id_year, and the actual value

Some things may have changed here due to my above suggestions.

 It seems to me more difficult now to produce a non-normalized output
 based on the normalized table. How would look a query like, if I need
 now to SELECT, say 100 countries and 20 years? Something like this
 (simplified and without joins):
 
 SELECT
 value,
 id.year
 FROM
 main_table
 WHERE
 year = '1970' OR
 year = '1971' OR
 
 country_name = 'Afghanistan' OR
  country_name = 'Albania' OR
 ...

I don't really see what the problem is; with the years as columns in
your table they're just in a different places in this query.

Without knowing what problem you're trying to solve with this query it's
a bit of a guess. I assume those years and countries are user input?

I usually use BETWEEN or IN (...) for such cases, but that boils down to
the same query. It's just shorter to write.

BTW, You didn't actually use type text for your year column, did you? No
quotes needed then. Otherwise you'd have to make sure your year values
are all the same length or sorting gets... interesting.

SELECT
value,
year
FROM
main_table
WHERE
year BETWEEN 1970 AND 1975
AND country_name IN ('Afghanistan', 'Albania', ...)


But I think you'll usually be looking for aggregates, so GROUP BY and
HAVING are probably more appropriate for your queries.

SELECT country_name, year, SUM(value)
  FROM main_table
 WHERE country_name IN ('Afghanistan', 'Albania', ...)
   AND year BETWEEN 1970 AND 1975
 GROUP BY country_name, year
 HAVING SUM(value) > 0;


-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Re: Why the ERROR: duplicate key violates unique constraint master_pkey is raised? - Is this a Bug?

2007-09-28 Thread Alban Hertroys
Nis Jørgensen wrote:
 If you can't wait, you are probably better off working around the
 problem. Standard solution is to do:
 
 UPDATE  master SET m2 = -m2;
 UPDATE  master SET m2 = -m2+1;
 
 or something similar.

Would something like

UPDATE master set m2 = master2.m2
  FROM (
SELECT m2 +1
  FROM master m
 WHERE m.master_id = master.master_id
 ORDER BY m2 DESC
 ) master2

work? I think it might be faster (and possibly cause less index bloat)
than doing two consequent updates.

Haven't tested this though...

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Re: Why the ERROR: duplicate key violates unique constraint master_pkey is raised? - Is this a Bug?

2007-09-28 Thread Alban Hertroys
I had to manipulate the headers a bit, as I hadn't noticed the message
that reached me first was from the newsgroup instead of the ML.

Nis Jørgensen wrote:
 Alban Hertroys skrev:

 As I said, I don't understand what you think it does. What you are doing
 is similar to writing
 
 SELECT m2
 FROM master, (
   SELECT m2
   FROM master m
   WHERE m.master_id = master.master_id
   )
 
 Which doesn' make any sense either.
 
 You probably want
 
 UPDATE master set m2 = master2.m2
   FROM (
   SELECT m2 +1
   FROM master m
   ORDER BY m2 DESC
   ) master2
 WHERE master2.master_id = master.master_id

So you do understand.

As I mentioned earlier, I didn't test that query. The extra alias
bothered me as unnecessary, and now I see why - I put the where clause
at the wrong place.

 According to the SQL spec, all the updates happen at the same time. Thus
 any order this happens in is an implementation detail.

According to the SQL spec the original update statement should have worked.

But it doesn't, so the updates _don't_ all happen at the same time. That
means there is an order in which they occur, and that order is likely to
be manipulatable.

 The fact that you stick an ORDER BY into a subquery guarantees
 nothing. The planner might even see that it has no effect (according to
 the spec) and ignore it. For instance this
 
 SELECT *
 FROM (SELECT *
   FROM mytable
   ORDER BY id
 )
 WHERE some_criteria
 
 is not guaranteed to return an ordered result set. Thus the planner can
 ignore the ORDER BY (but might not do so).

You are probably right that there's no way to guarantee that ordering,
but the method I suggested works in at least the version of Postgres I
have available (8.1.8), and they'll also work in database versions that
update atomically.

There _might_ be a small window of future PG versions where the planner
outsmarts this trick while it doesn't yet update atomically, but I
believe that to be rather unlikely.

I expect the priorities of the developers to be on atomic updates as
opposed to filtering out explicitly requested but unnecessary ordering.
The latter may be in use by many to massage the planner into picking a
different plan (even though it's not the right way to fix a bad plan of
course).

Here's some proof:

CREATE TABLE update_test (
update_id serial NOT NULL PRIMARY KEY,
num integer NOT NULL UNIQUE
);

INSERT INTO update_test (num) SELECT * FROM generate_series(5, 15);

-- fails
UPDATE update_test SET num = u2.num
FROM (
SELECT update_id, num +1 AS num
FROM update_test
) u2
WHERE update_test.update_id = u2.update_id;

-- succeeds
UPDATE update_test SET num = u2.num
FROM (
SELECT update_id, num +1 AS num
FROM update_test
ORDER BY num DESC
) u2
WHERE update_test.update_id = u2.update_id;

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] datestyle question

2007-09-27 Thread Alban Hertroys
Diego Gil wrote:
 Hi,
 
 I have a file to import to postgresql that have an unusual date format.
 For example, Jan 20 2007 is 20022007, in DDMM format, without any
 separator. I know that a 20072002 (MMDD) is ok, but I don't know how
 to handle the DDMM dates.

You could try importing those fields in a text field in a temporary
table and then convert them from there into your final tables using the
to_date() function.

If 20022007 really means 20 Jan instead of 20 Feb, try something like:

insert into my_table (my_date_field)
select to_date(my_date_text_field, 'DDMMYYYY') - interval '1 month'
  from my_temp_table;

Regards,
-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Help tuning a large table off disk and into RAM

2007-09-26 Thread Alban Hertroys
James Williams wrote:
 The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5.  We
 wanted fast query/lookup.  We know we can get fast disk IO.

RAID 5 is usually advised against here. It's not particularly fast or
safe, IIRC. Try searching the ML archives for RAID 5 ;)

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] not in clause too slow?

2007-09-24 Thread Alban Hertroys
Ottavio Campana wrote:
 2) how can I speed it up? by using indexes? or by changing the query?

Do you have indices on mytable.id and copy_mytable.id?
Does using NOT EXISTS get you any better results?
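
Something along these lines (untested against your schema):

    SELECT *
      FROM mytable m
     WHERE NOT EXISTS (
             SELECT 1 FROM copy_mytable c WHERE c.id = m.id
           );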

 db=# EXPLAIN ANALYZE select * from mytable where id not in (select id
 from copy_mytable);
                                          QUERY PLAN
 ---------------------------------------------------------------------------------------------------------
  Seq Scan on mytable  (cost=53.21..148.34 rows=925 width=96) (actual time=9.813..9.813 rows=0 loops=1)
    Filter: (NOT (hashed subplan))
    SubPlan
      ->  Seq Scan on copy_mytable  (cost=0.00..48.57 rows=1857 width=4) (actual time=0.031..3.132 rows=1857 loops=1)
  Total runtime: 10.291 ms
 


-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] avg() of array values

2007-09-12 Thread Alban Hertroys
Martijn van Oosterhout wrote:
 avg(*) is not valid, same for sum(*)

Doh!

Thanks.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] ANY

2007-09-11 Thread Alban Hertroys
Tom Lane wrote:
 Alban Hertroys [EMAIL PROTECTED] writes:
 Richard Huxton wrote:
 AFAIK there are two variants of ANY()
 1. sets
 2. arrays

 So you should be able to do:
 ... WHERE x = ANY( ARRAY[a, b, c] )
 
 But then the documentation isn't entirely correct. It suggests that it
 works similar to IN, but it doesn't.
 
 The subquery variants (section 9.19) do work the same.  There is nothing
 on the other page (9.20) making such a comparison.

Oh right, there are 2 sections about those expressions... Never noticed
that there was 9.16. Subquery Expressions and 9.17. Row and Array
Comparisons listing the same expressions.

(Apparently there are 2 more sections before it in your documentation)

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] avg() of array values

2007-09-11 Thread Alban Hertroys
Hi,

I'm trying to get an avg value of 2 dates (to get the month that the largest
part of an interval falls in). I found SPs to generate rows from array
values, which I figured I could use with the avg aggregate, but to my
surprise:

create or replace function explode_array(in_array anyarray) returns
setof anyelement as
$$
  select ($1)[s] from generate_series(1,array_upper($1, 1)) as s;
$$
language sql immutable;
CREATE FUNCTION

select * from explode_array(array[1, 3]);
 explode_array
---------------
 1
 3
(2 rows)

select avg(*) from explode_array(array[1, 3]);
  avg
--------
 1.
(1 row)

select sum(*) from explode_array(array[1, 3]);
 sum
-----
   2
(1 row)


I would have expected an avg of 2.0 and a sum of 4, where am I going wrong?
Or is there a better way to get the avg of 2 dates (median would
suffice, but I don't know the interval length in days beforehand - and
thus not the middle point of the interval).

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Need suggestion on how best to update 3 million rows

2007-09-06 Thread Alban Hertroys
Ow Mun Heng wrote:
 I found 2 new ways to do this.
 
 option 1
 ---
 
 create table foo as select unique_id, rtrim(number) as number from foo;
 alter table add primary key...
 create index...
 drop org_table
 alter table rename...
 All this is ~10min

This only works if you don't have foreign key constraints involving that
table. Otherwise you just lost your data integrity (although I expect an
error to be thrown).

 option 2
 
 This I saw in the mysql archives (in my laptop).. when I say this I
 went.. WTF? This is possible?? Dang IT! 
 
 update a set number=replace(number,'ABC ', 'ABC') where reptest like '%
 ABC%';

Ehrm... yes, nothing special about it. Basic SQL really ;)
But shouldn't you be using trim() or rtrim() instead?:

update table set number = trim(number)

you could probably speed that up by only querying the records that need
trimming, for example:

create index tmp_idx on table(number) where number != trim(number);
analyze table;
update table set number = trim(number) where number != trim(number);

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


ANY (was: Re: [GENERAL] Alias all fields?)

2007-09-06 Thread Alban Hertroys
Richard Huxton wrote:
 Well you can get closer:
 
 SELECT * FROM gdp WHERE (y1970+y1971+...+y2005) IS NOT NULL;
 
 This makes use of the fact that X+NULL = NULL

I was going to suggest

SELECT * FROM gdp WHERE NULL NOT IN (y1970, y1971, y1972);

But that doesn't work.
So I tried using ANY with IS NOT NULL, as according to the documentation
IN is equivalent to = ANY
(
http://www.postgresql.org/docs/8.2/static/functions-subquery.html#AEN13967).

So maybe:

SELECT * FROM gdp WHERE NULL IS NOT ANY(y1970, y1971, y1972);

I get nothing but syntax errors... I remember trying to use ANY in the
past and never got it to work...

So, how do you use ANY with a fixed set of values (the way IN can)? And
can this be used to solve the OP's problem without using tricks like
summing NULL values?

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] ANY

2007-09-06 Thread Alban Hertroys
Richard Huxton wrote:
 Alban Hertroys wrote:

 SELECT * FROM gdp WHERE NULL IS NOT ANY(y1970, y1971, y1972);

 AFAIK there are two variants of ANY()
   1. sets
   2. arrays
 
 So you should be able to do:
   ... WHERE x = ANY( ARRAY[a, b, c] )

But then the documentation isn't entirely correct. It suggests that it
works similar to IN, but it doesn't.

 Of course, where x is NULL, I don't think that'll help you (x = NULL
 returns NULL). Oh, and you couldn't have nulls in arrays before 8.2 iirc.

Well, as ANY allows the use of an operator, I figured you could use IS
NULL with it.


-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Statistics collection question

2007-09-04 Thread Alban Hertroys
Phoenix Kiula wrote:
 On 04/09/07, Tom Lane [EMAIL PROTECTED] wrote:
 I'm wondering about some transaction taking exclusive lock on the table
 and sitting on it for a minute or so, and also about network problems
 delaying transmission of data to the client.

 How can I check what is causing the lack? When I restart pgsql it goes

You mean the lock?

You can check for active locks querying pg_locks
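
For example (a minimal sketch; procpid and current_query are the 8.1-era
column names in pg_stat_activity):

SELECT l.locktype, l.relation::regclass, l.mode, l.granted, a.current_query
  FROM pg_locks l
  LEFT JOIN pg_stat_activity a ON a.procpid = l.pid;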

 away. The log is empty for a day or too (I'm only logging errors or
 slow queries) and the queries are super fast, but after a day it
 starts filling up with abysmally slow queries, even on simple queries
 with the WHERE clauses that have only one constant on the indexed
 column!

That's new information that we could have used earlier, as it means that
postgres does pick the right plan (at least initially) and things like
network and dns apparently work.

Was the explain analyze you sent from the super fast periods or from a
slow period? It'd be interesting to see a query plan of a problematic query.

I suppose if you try one of your super fast queries it is slow once
other queries slow down too? I ask, because I expect that query to not
be in the cache at that moment, so it could be a good candidate for an
explain analyze.

 Basically, what I am missing is some info on actually tweaking the
 postgresql.conf to suit my system. I run Apache, MySQL, Postgres, and
 Exim (email server) on the same dedicated hosting server. I don't mind
 if Postgres hogs 2GB of memory, but I need to know how to tweak it. I
 have made about eight posts on this list with my entire
 postgresql.conf posted in. I have read and re-read the manual and

Yes, but you gave us conflicting information. Only now it is clear what
your problem is.

 that makes compiles all of it and presents the system's missteps and
 guidelines may be useful, ala Tuning Primer script from MySQL camp)
 but I am not sure where to begin!

I've seen pgadmin III doing quite a nice job at that. Haven't really
used it myself, I usually prefer the command line.

 Would appreciate any help. Why do indexed queries take so much time?
 It's a simple DB with 10 relations including tables and indexes.
 Simple inserts and updates, about 5000 a day, but non-trivial

It looks like your indexes get bloated. Do you vacuum enough?
It'd be a good idea to at least analyze the tables involved in those
inserts regularly.

If you do those inserts in a batch, be sure to call ANALYZE after
committing that batch. That helps quite a bit.
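
Roughly like this (table name made up):

BEGIN;
-- ... the batched INSERTs ...
COMMIT;
ANALYZE my_table;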

Besides that... How are those disks configured? You didn't put them in a
raid-5 array I hope? That wouldn't explain the above problem, but it
would slow things down (such has been mentioned on this list a few
times) and may thus be exacerbating the problem.

Good luck!

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Obtaining random rows from a result set

2007-09-04 Thread Alban Hertroys
To follow up on my own post, I came up with a workable solution based on
scrolling cursors. The SP approach didn't work out for me, I didn't
manage to declare a cursor in PL/pgSQL that could be positioned
absolutely (maybe that's due to us still using PG 8.1.something?).

A solution to that would be appreciated.

Anyway, I solved the problem in our application (PHP). I even got a
workable solution to prevent returning the same record more than once.
Here goes:

function randomSet($query, $limit, $uniqueColumn) {

    // queries; depends on your DB connector
    DECLARE _cur SCROLL CURSOR WITHOUT HOLD FOR $query;
    MOVE FORWARD ALL IN _cur;

    //GET DIAGNOSTICS _count := ROW_COUNT;
    $count = pg_affected_rows(); // rows skipped by the MOVE above

    $uniques = array();
    $resultSet = array();
    while ($limit > 0 && count($uniques) < $count) {
        $idx = rand(1, $count);

        //query
        $record = FETCH ABSOLUTE $idx FROM _cur;

        // Skip records with a column value we want to be unique
        if (in_array($record[$uniqueColumn], $uniques))
            continue;

        $uniques[] = $record[$uniqueColumn];
        $resultSet[] = $record;
        $limit--;
    }

    // query
    CLOSE _cur;

    return $resultSet;
}

I hope this is useful to anyone. It worked for us; it is definitely
faster than order by random(), and more random than precalculated column
values. Plus it translates directly to what we are requesting :)

Alban Hertroys wrote:
 I thought of another solution (with only a few calculations of random())
 that can be deployed in existing versions of PG, using a set-returning
 function with a scrolling cursor that accepts the query string as input
 like this (in pseudoish-code):
 
 
 create function random(text _query, integer _limit)
 returns set
 volatile
 as $$
 DECLARE
 _cur cursor;
 _cnt bigint;
 _idx integer;
 _rowpos bigint;
 
 _rec record;
 BEGIN
 open _cur for execute _query;
 fetch forward all into _rec;
 -- select total nr of records into _cnt
 
 for _idx in 1.._limit loop
 _rowpos := random() * _cnt;
 
 fetch absolute _rowpos into _rec;
 return next _rec;
 end loop;
 
 return;
 END;
 $$
 language 'plpgsql';
 

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Obtaining random rows from a result set

2007-09-04 Thread Alban Hertroys
Alban Hertroys wrote:
 To follow up on my own post, I came up with a workable solution based on
 scrolling cursors. The SP approach didn't work out for me, I didn't
 manage to declare a cursor in PL/pgSQL that could be positioned
 absolutely (maybe that's due to us still using PG 8.1.something?).

Doh! I mean I couldn't use MOVE FORWARD ALL IN _cur for some reason, it
kept saying Syntax error.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Statistics collection question

2007-09-03 Thread Alban Hertroys
Phoenix Kiula wrote:
 Lots of posts here in reponse to performance question have the
 recommendation increase the stats on that column. From whatever
 succint reading is made available on the postgres site, I gather that
 this aids the planner in getting some info about some of the data. Am
 I missing something here, or totally off-base?

As I understand it, it's a sample of how the data is distributed.
Probably it's based on statistical mathematics that specifies a minimum
size for a representative sample of a given data set. It boils down to:
"If you want to know how many people like vanilla ice cream, how many
people do you need to ask their preference?"
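
Raising that sample size for one column is done per column, roughly like
this (table and column names made up):

ALTER TABLE customer ALTER COLUMN name SET STATISTICS 100;
ANALYZE customer;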

 The issue is that I don't quite get why MySQL can fetch one indexed
 row (i.e., SQL that ends with a very simple WHERE indexed_column =
 'constant' ) in a matter of milliseconds, but PgSQL is taking 5 to 6
 seconds on an average at least for the first time. I use RAPTOR 15K
 drives, they're not SCSI but they're not exactly cheap disks either.
 And I have 4GB RAM. The explain select  shows that index is being
 used!

That's definitely not normal. I have a smallish table here containing
2.5 million records, and querying for one with a specific index takes
141 micro(!) seconds. The hardware involved is a dual opteron with 4G,
in a xen domain; I don't know what disks are used, but I doubt they're
raptors.

So something is wrong with your setup, that much is obvious. I sincerely
doubt that postgres is to blame here.

You did check that you're not connecting through the internet and
getting a DNS timeout?

Regards,
-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Obtaining random rows from a result set

2007-09-01 Thread Alban Hertroys

On Aug 31, 2007, at 15:54, Martijn van Oosterhout wrote:


On Fri, Aug 31, 2007 at 02:42:18PM +0200, Alban Hertroys wrote:

Examples:
* random(maxrows) would return random rows from the resultset.
* median() would return the rows in the middle of the result set  
(this

would require ordering to be meaningful).


It would be possible to write an aggregate that returns a single  
random

value from a set. The algorithm is something like:

n = 1
v = null
for each row
  if random() < 1/n:
 v = value of row
  n = n + 1

return v


Doesn't this always return the first record, since random() is always  
less than 1/1?
I don't think this method has a linear distribution, but then again I  
don't understand what 'value of row' refers to...



It does require a seqscan though.


I doubt that a seqscan can be entirely avoided to fetch random rows  
from a set, at least not until the last random result has been  
returned, _unless_ the number of matching records would be known  
before starting taking random samples.



If you're asking for 5 random rows
you probably mean 5 random but distinct rows, which is different to
just running the above set 5 times in parallel.


Indeed, that is one of the distinctions that need some thought for my  
original preposition. I left it out, as it's an implementation detail  
(an important one, admittedly).



I don't know if there's a similar method for median...


I'm not entirely sure, but I think your method is the only one  
suggested that doesn't involve calculating random() a million times  
(for a million records) to return 5 (random) records.



My suggestion involved a way to calculate random() only when  
retrieving records from the result set (only 5 times for a million  
records in this case).
For a linearly distributed random set it does require knowing the  
number of records in the set though, an estimate would make it non- 
linear (although only a little bit if accurate enough).



OTOH, I'm starting to think that the last sort step of an order by  
can be postponed to the result set fetching cycle under the  
conditions that:

- the ordering expression is unrelated to the records involved, and
- only a fraction of the total number of records will be returned.
(Which is somewhat similar to the condition for an index being more  
efficient than a seqscan, btw)


Comparing records with each other for something not related seems a  
waste of effort, while the result set has already been determined  
(just not ordered in any particular way), am I right?


With that change (postponing sorting) my original ORDER BY random()  
LIMIT 5 would perform quite adequately, I think - it'd only involve  
calculating random() at least 5 times, not as often as the number of  
records in the result set.



Or is order by random() acting as some kind of shuffling method? Is  
that a requirement to get a linearly distributed set to randomly draw  
from?
I can see how it wouldn't be linear if you'd start randomly comparing  
records from the beginning of the result set... (Which would be the  
logical method if you don't know the size of the set before hand)



I thought of another solution (with only a few calculations of random 
()) that can be deployed in existing versions of PG, using a set- 
returning function with a scrolling cursor that accepts the query  
string as input like this (in pseudoish-code):



create function random(text _query, integer _limit)
returns set
volatile
as $$
DECLARE
_cur cursor;
_cnt bigint;
_idx integer;
_rowpos bigint;

_rec record;
BEGIN
open _cur for execute _query;
fetch forward all into _rec;
-- select total nr of records into _cnt

for _idx in 1.._limit loop
_rowpos := random() * _cnt;

fetch absolute _rowpos into _rec;
return next _rec;
end loop;

return;
END;
$$
language 'plpgsql';


This method could return the same record twice though, I'll need to  
build in some accounting for used up rowpos'es.

Would it be more efficient than the usual methods?

Sorry for the brain dump, I tried to get everything into this single  
message. I hope it is at least comprehensible and useful, or  
interesting or at least mildly amusing if not.


Regards,

Alban Hertroys
magproductions b.v.




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] JOIN issues (Left vs Right for sorting), and Nested Loop problem

2007-09-01 Thread Alban Hertroys
 a
RIGHT JOIN (my sql above shows a LEFT JOIN). Getting results from a
right join is fine as long as the column is not null in the second
table, but if it is null, then nothing is returned. This is why I do a
LEFT join in the first place! So my question: how can I do a left
join, which is the logic that I wish to accomplish, but get the
sorting to work from the second table and if a column is null then
just return as 0 instead of nothing at all? (The LEFT JOIN used to
work in Mysql).


That's very odd, the right join should work fine.
You constrain tradecounts to require a matching record in trades, so  
a right join with tradecounts can not return NULL values for columns  
in trades; Except where status is null (which is possible), in which  
case the record doesn't match your WHERE-clause. Are you sure you  
don't have NULL values for statuses?


You don't say what "not really sorted" means in your left joins, but
I expect the rows with NULL values for u_count to be grouped together  
at the top (in no particular order, you didn't specify any other  
order than by u_count) and the rest ordered by u_count as expected.


You could use ORDER BY COALESCE(tradecount.u_count, 0) desc if you  
want it to behave like you say mysql sorted it.



Apparently mysql treats NULL values as 0 when ordering? Or do they  
just order them first instead of last like PG does (which is just a  
matter of preference, really)?


You should realize that NULL means 'unknown', so theoretically you  
could encounter databases that put them at random places in your  
otherwise sorted result set, not touching their position among other  
records because they can't know how to compare them.


If you want certain behaviour from NULL values you'll need to specify  
what you want or expect surprises ;)


--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //







---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Obtaining random rows from a result set

2007-09-01 Thread Alban Hertroys

On Sep 1, 2007, at 12:44, Alban Hertroys wrote:

It would be possible to write an aggregate that returns a single  
random

value from a set. The algorithm is something like:

n = 1
v = null
for each row
  if random() < 1/n:
 v = value of row
  n = n + 1

return v


Doesn't this always return the first record, since random() is  
always less than 1/1?
I don't think this method has a linear distribution, but then again  
I don't understand what 'value of row' refers to...


Oh, now I see... The first time guarantees that v has a value (as  
random() < 1/1), and after that there is a decreasing chance that a
new row gets re-assigned to v. That means the last row has a chance
of 1/n, which would be its normal chance if the distribution were
linear, but doesn't the first row have a chance of 1/(n!) to be  
returned?


--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //







---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] JOIN issues (Left vs Right for sorting), and Nested Loop problem

2007-09-01 Thread Alban Hertroys


On Sep 1, 2007, at 14:48, Phoenix Kiula wrote:


On 01/09/07, Alban Hertroys [EMAIL PROTECTED] wrote:


On Sep 1, 2007, at 11:46, Phoenix Kiula wrote:

.
..snip


However, there's a nested loop in there as the EXPLAIN ANALYZE shows
below. What is causing this nested loop?


It looks like it's used to match trades to tradecounts. I think that
makes sense, as the number of matching records from both tables isn't
necessarily equal. The query is looping over trades until each
tradecount has all its trades (for user 'jondoe' with status 'Y')
associated.



So are you suggesting that it would help performance if the number of
rows in each table were to be exactly the same? It can be done I
suppose, but according to our business logic at the moment, the counts
table gets a corresponding row when there is at least one count.
Otherwise, there is nothing for an id in the tradecount table, so
u_count comes back to us as null.


No, it wouldn't help I think. The query planner still would have no  
way of being sure of that, it doesn't know about your business logic.  
I'm not entirely sure that's the problem even...


Is that combination of user_id with a specific status something  
you'll be querying a lot? In that case it may help to create an index  
over that combination, or a partial index on user_id where status =  
'Y' holds true.
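
Something like this, guessing at the actual column names from your query:

CREATE INDEX trades_user_status_idx ON trades (user_id, status);
-- or, if you only ever filter on status = 'Y':
CREATE INDEX trades_user_y_idx ON trades (user_id) WHERE status = 'Y';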


I am kind of surprised that the planner doesn't understand that a  
foreign key with a unique constraint (which a primary key is) means  
there is a 0..1 to 1 relationship with the target table.



It is kind of confusing that you're using the id column in
tradecounts for both primary key and foreign key, and I'm not sure
what that implies to the query planner. It suggests that there can be
only (up to) one tradecounts record for each trade count, but it
appears that either the planner doesn't realise that...



If I drop the primary key and leave only the foreign key, will this
column still be indexed (sorry if this is a stupid question). I can
drop primary if that is true, but I do want to leave the foreign key
intact because of the ON DELETE CASCADE feature to maintain data
integrity.


The index wouldn't drop with the dropping of the constraint. It also  
has no relevance to the ON DELETE CASCADE; that's part of the foreign  
key constraint and unrelated to other indices on that table.


Having an index on that column would help though, and if it's  
required to be unique I'd probably opt for a unique constraint on it  
(which creates a unique index for you). PostgreSQL doesn't  
automatically create indices on foreign keys, btw.


In fact there's nothing wrong with your combined primary/foreign key,  
except that I think it _might_ confuse the planner. I am not  
knowledgeable enough to say for sure.



Is 10 ms problematic for this query?



I think you got 10ms from the query plan? These queries are very fast
after they have been executed once. But the first time is huge.
Sometimes I have to wait as much as 10 seconds (10,000ms?)


10s for a join of what... 2 times 386 rows? That can't be right.  
Sequential scans would be faster than that (by much). Are you running  
out of memory for that query maybe? Or are you looking at a DNS time  
out? Something is wrong there.




You could use ORDER BY COALESCE(tradecount.u_count, 0) desc if you
want it to behave like you say mysql sorted it.


Yes, this does it! I didn't think about the NULL stuff, and yes MySQL
returns NULLs in integer columns as a 0, so those queries work. I


It does? Oh dear... Then how do they expect you to see that there was  
an actual 0 in that column instead of a NULL?



guess I could use the IFNULL or something, but thanks for the COALESCE
idea, this is great. It works. I just hope sorting by a function does
not cause a major hit to query performance, so I'll be watching over
the next few days.


Not much, AFAIK. But in the worst case you could create a functional  
index on that column. That'd move the calculation into the creation  
of the index and would only add a small penalty on inserting and  
updating.
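
A minimal sketch of such an index (names guessed from the thread):

CREATE INDEX tradecount_ucount_idx ON tradecount ((COALESCE(u_count, 0)));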


--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //







---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Obtaining random rows from a result set

2007-09-01 Thread Alban Hertroys

On Sep 1, 2007, at 14:44, Martijn van Oosterhout wrote:


On Sat, Sep 01, 2007 at 02:24:25PM +0200, Alban Hertroys wrote:

Oh, now I see... The first time guarantees that v has a value (as
random() < 1/1), and after that there is a decreasing chance that a
new row gets re-assigned to v. That means the last row has a chance
of 1/n, which would be its normal chance if the distribution were
linear, but doesn't the first row have a chance of 1/(n!) to be
returned?


No. Consider at the first row it has chance 1 of being selected. At  
the

second row it has chance 1/2 of being *kept*. At the third row it has
chance 2/3 of being kept. At row four it's 3/4. As you see, the
numerators and denominators cancel, leaving 1/n at the end...


Ah, now I see where I went wrong. If the first row got through to any  
next iteration, of course there's no chance anymore that it didn't.



Neat huh?


Neat from an algorithmic point of view, yes. But it also means that  
it's calculating random() for every record just like the rest of the  
suggested solutions :(


I'm still convinced doing that isn't the right approach to the problem.

I think I'll do some experimenting with the set returning function on  
Monday to see how that performs comparative to ordering by random().




The problem with the approaches that use pre-calculated random values  
is I need my result to be truly random each time. If I'd return a  
number of records starting at a record that has a certain random  
value, I'd end up returning the records directly after that in the  
same order every time because their order is pre-determined.


I realise the chances of that happening are slim provided enough  
records to choose from, and chance dictates that it could happen  
anyway, but you couldn't conscientiously sell that as an equal  
chance... My boss thinks otherwise though, maybe I'll have to settle  
for almost fair :P


--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //







---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[GENERAL] Obtaining random rows from a result set

2007-08-31 Thread Alban Hertroys
Hello,

I've recently been busy improving a query that yields a fixed number of
random records matching certain conditions. I have tried all the usual
approaches, and although they do work, they're all limited in some way
and don't translate really well to what you want. They're kludges, IMHO.

The methods I've tried are explained quite well on
http://people.planetpostgresql.org/greg/index.php?/archives/40-Getting-random-rows-from-a-database-table.html

All these methods involve calculating a random number for every record
in the result set at some point in time, which is really not what I'm
trying to model. I think the database should provide some means to get
those records, so...

Dear Santa,

I'd like my database to have functionality analogue to how LIMIT works,
but for other - non-sequential - algorithms.

I was thinking along the lines of:

SELECT *
  FROM table
 WHERE condition = true
 RANDOM 5;

Which would (up to) return 5 random rows from the result set, just as
LIMIT 5 returns (up to) the first 5 records in the result set.


Or maybe even with a custom function, so that you could get non-linear
distributions:

SELECT *
  FROM table
 WHERE condition = true
 LIMIT 5 USING my_func();

Where my_func() could be a user definable function accepting a number
that should be (an estimate of?) the number of results being returned so
that it can provide pointers to which rows in the resultset will be
returned from the query.

Examples:
* random(maxrows) would return random rows from the resultset.
* median() would return the rows in the middle of the result set (this
would require ordering to be meaningful).

What do people think, is this feasable? Desirable? Necessary?

If I'd have time I'd volunteer for at least looking into this, but I'm
working on three projects simultaneously already. Alas...

Regards,
Alban Hertroys.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] SQL Diff ?

2007-08-27 Thread Alban Hertroys
Kevin Kempter wrote:
 Hi List;
 
 I have a very large table (52million rows) - I'm creating a copy of it to rid 
 it of 35G worth of dead space, then I'll do a sync, drop the original table 
 and rename table2.
 
 Once I have the table2 as a copy of table1 what's the best way to select all 
 rows that have been changed, modified in table1  since the initial laod from 
 table1 into table2?

I think you could get clever with a few rules for insert/update/delete
on 'table' that keep track of what happens while you work on table2.
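
A minimal sketch of what I mean (table1_changes and its column names are
made up; the rules only record which rows were touched):

CREATE TABLE table1_changes (
    id integer,
    op char(1),
    changed_at timestamptz DEFAULT now()
);

CREATE RULE table1_track_ins AS ON INSERT TO table1
    DO ALSO INSERT INTO table1_changes (id, op) VALUES (NEW.id, 'I');
CREATE RULE table1_track_upd AS ON UPDATE TO table1
    DO ALSO INSERT INTO table1_changes (id, op) VALUES (OLD.id, 'U');
CREATE RULE table1_track_del AS ON DELETE TO table1
    DO ALSO INSERT INTO table1_changes (id, op) VALUES (OLD.id, 'D');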

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [HACKERS] [GENERAL] Undetected corruption of table files

2007-08-27 Thread Alban Hertroys
Jonah H. Harris wrote:
 On 8/27/07, Tom Lane [EMAIL PROTECTED] wrote:
 that and the lack of evidence that they'd actually gain anything
 
 I find it somewhat ironic that PostgreSQL strives to be fairly
 non-corruptable, yet has no way to detect a corrupted page.  The only
 reason for not having CRCs is because it will slow down performance...
 which is exactly opposite of conventional PostgreSQL wisdom (no
 performance trade-off for durability).

Why? I can't say I speak for the developers, but I think the reason is
that data corruption can (with the very rare exception of undetected
programming errors) only be caused by hardware problems.

If you have a proper production database server, your memory has error
checking, and your RAID controller has something of the kind as well. If
not, you would probably be running the database on a filesystem that has
reliable integrity verification mechanisms.

In the worst case (all the above mechanisms fail), you have backups.

IMHO the problem is covered quite adequately. The operating system and
the hardware cover for the database, as they should; it's _their_ job.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] PostgreSQL vs Firebird feature comparison finished

2007-08-24 Thread Alban Hertroys
Joshua D. Drake wrote:
 I agree with you on the multi-threaded.  I think I will add a note
 saying the the multi-threaded architecture is only advantageous  on
 Windows.
 
 And Solaris.

I'm not entirely sure what makes multi-threading be advantageous on a
specific operating system, but I think FreeBSD should be added to that
list as well... They've been bench marking their threading support using
multi-threading in MySQL (not for the db, mind you - just for load ;),
and it performs really well.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] PostgreSQL vs Firebird feature comparison finished

2007-08-24 Thread Alban Hertroys
Dave Page wrote:
 Alban Hertroys wrote:
 Joshua D. Drake wrote:
 I agree with you on the multi-threaded.  I think I will add a note
 saying the the multi-threaded architecture is only advantageous  on
 Windows.
 And Solaris.

 I'm not entirely sure what makes multi-threading be advantageous on a
 specific operating system, but I think FreeBSD should be added to that
 list as well... They've been bench marking their threading support using
 multi-threading in MySQL (not for the db, mind you - just for load ;),
 and it performs really well.

 
 I'm not sure I necessarily agree with those two - we have no real proof
 that a multithreaded architecture would be significantly more efficient
 than a multi process. It certainly wouldn't be as robust as an error in
 one backend thread could bring down the entire server.
 
 Windows is a special case in this regard. The OS has been designed from
 the outset as a threaded environment. The important point is not that
 Windows threads are necessarily any more efficient than their Solaris or
 FreeBSD counterparts, but that the multi-process architecture is alien
 to Windows and is inherently slower. Two of the major bottlenecks we
 have on Windows as a result are backend startup time and shared memory
 access speed - both of which are significantly slower than on *nix.
 
 Regards, Dave

Thanks for explaining (again).

So actually the remark shouldn't be that the multi-threaded
architecture is only advantageous  on Windows, but more like the
multi-process architecture is disadvantageous on Windows and hence a
multi-threaded architecture is preferred (on that particular OS).

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Converting non-null unique idx to pkey

2007-08-23 Thread Alban Hertroys
Ed L. wrote:
 On Tuesday 21 August 2007 1:45 pm, Scott Marlowe wrote:
 If you have a large db in 7.4.6, you should do two things.

 1: Update to 7.4.19 or whatever the latest flavor of 7.4 is,
 right now.  There are a few known data eating bugs in 7.4.6.
 
 Sounds like good advice from a strictly technical viewpoint.  
 Unfortunately, in our particular real world, there are also 
 political, financial, and resource constraints and impacts from 
 downtime that at times outweigh the technical merits of 
 upgrading 'right now'.

Since you're setting up replication to another database, you might as
well try replicating to a newer release and swap them around once it's
done. I've seen that method of upgrading mentioned on this list a few times.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] [RESEND] Transaction auto-abort causes grief with Spring Framework

2007-08-17 Thread Alban Hertroys
Tyson Lloyd Thwaites wrote:
 I am not opposed to introducing checkpoints to our API, but it would be
 nicer if I didn't have to. At the moment I have resigned myself to
 turning off spring declarative txns for certain methods, and handling
 them manually by doing multiple txn blocks. In the above example
 however, the bit that I want to allow to fail is inside a method that
 would have to be wrapped in a transaction  see the web of
 complexity that is growing?

Isn't the 'try' statement rather similar to a 'savepoint' command? I
realize it would be difficult to override the behaviour of try {...}
catch (...) {...}, but it shouldn't be too hard to wrap it somehow for
exceptions in database code.
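
On the SQL side that would look roughly like this (audit_log is just a
made-up table for the example):

BEGIN;
INSERT INTO audit_log (message) VALUES ('attempting the risky part');
SAVEPOINT risky;
-- statement that may fail goes here; on failure:
--   ROLLBACK TO SAVEPOINT risky;
-- and the transaction can continue
COMMIT;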

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] [RESEND] Transaction auto-abort causes grief with Spring Framework

2007-08-17 Thread Alban Hertroys
Webb Sprague wrote:
 I am not sure how you can insert into a log even with savepoints,
 unless you put the logging statement first and then follow it with the
 insert.

and delete it after success?

Alternatively you could use one connection for your normal queries, and
another for auditing. Your application will need twice as many
connections that way, though... In that case you shouldn't commit
records on the 'normal' connection before the audit records have been
committed I suppose?

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Repeat posts

2007-08-17 Thread Alban Hertroys
Raymond O'Donnell wrote:
 Hi all,
 
 Is it just me? :-) from time to time I get repeat broadcasts from
 various PG mailing lists - posts that I've already received several days
 previously are sent again.
 
 It's not a major problem, nor even annoying in any wayI was just
 wondering if anyone else has noticed it.

I just finished going through my new mail since this morning, which
contained several fresh duplicates of messages I already read. So yes,
it happens to me too.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Moving to postgresql and some ignorant questions

2007-08-14 Thread Alban Hertroys
Phoenix Kiula wrote:
 Agree. That's what I am trying to do. Include as many UPDATEs etc into
 the same TRANSACTION block, but my worry is when I read about
 autocommit and how it is enabled by default in postgresql 8.
 Specifying explicit BEGIN and COMMIT blocks should only commit when I
 want the DB to, or will each UPDATE in the middle of this block get
 executed?

The database doesn't have autocommit, AFAIK.
Some of the clients have, but it only applies if you don't put your SQL
statements between BEGIN; and COMMIT; (or ROLLBACK;) statements.
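
In psql that looks like this (made-up statements); nothing becomes
visible to other sessions until the COMMIT:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- or ROLLBACK; to undo both updates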

I never really use anything but psql for a client, so I can't really say
how other clients (pgAdmin, for example) handle this.


 Incidentally, most data integrity checks are handled with CHECK constraints
 and FOREIGN KEY constraints rather than manual triggers. They're both easier
 and cheaper.
 
 The problem with simple CHECK constraints is that they can only
 reference the primary key in another table. What if I want more than
 one column to be the same as the referenced table, but do not want to
 have a compound primary key in the referenced table? From reading and
 re-reading the manual, I dont think FKs allow for this. Only primary
 key references are supported.

You're confusing CHECK constraints and FOREIGN KEY constraints. They're
different things ;)

CHECK constraints verify that data in a certain column matches a certain
condition. I'm not sure they can reference columns in other tables,
unless you wrap those checks in stored procedures maybe...

For example:
CREATE TABLE test (
age int NOT NULL CHECK (age > 0)
);

Next to that, you can define DOMAINs - basically your own customized
data types that can follow _your_ rules. Admittedly I have never done
that yet, but it's supposed to be one of the key features of the
relational model (I've seen claims that you're actually not supposed to
use the base types, but define domains for all your data types).
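
A minimal, made-up example of a domain:

CREATE DOMAIN percentage AS integer
    CHECK (VALUE BETWEEN 0 AND 100);

CREATE TABLE poll_result (
    answer text NOT NULL,
    share  percentage NOT NULL
);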

*And* you can define compound foreign key constraints,
for example:

CREATE TABLE employee (
employee_id serial NOT NULL,
company_id integer NOT NULL,
name text NOT NULL,
CONSTRAINT employee_pk
PRIMARY KEY (employee_id, company_id)
);

CREATE TABLE division (
employee_id integer NOT NULL,
company_id integer NOT NULL,
name text NOT NULL,
CONSTRAINT division_fk
FOREIGN KEY (employee_id, company_id)
REFERENCES employee
ON DELETE SET NULL
ON UPDATE CASCADE
);


Also a nice trick, when performing DDL statements (CREATE TABLE and
friends), you can wrap them in a transaction and commit (or rollback) if
you like the end result (or not). I believe the only exception to that
rule is CREATE DATABASE.
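
For example:

BEGIN;
CREATE TABLE scratch (id integer);
ALTER TABLE scratch ADD COLUMN name text;
ROLLBACK;  -- the table never existed; COMMIT; would keep it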

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] LIKE conditions in PGSQL very, very slow!

2007-08-13 Thread Alban Hertroys
.ep wrote:
 Hi,
 
 I'm moving from the mysql camp and quite liking things like functions
 and such, but a lot of my functionality depends on queries such as
 
SELECT id, name, start_date
FROM customer
WHERE name LIKE 'eri%';
 
 These kinds of queries are super fast in MySQL because eri% type
 conditions also use the index. Is this not the case with PG?
 
 Here's the EXPLAIN output:
 
 
 CUSTDB=# explain select id,name,start_date from customer where name
 like 'eri%';
QUERY PLAN
 
  Seq Scan on customer  (cost=0.00..86032.18 rows=1 width=111)
Filter: ((name)::text ~~ 'eri%'::text)
 (2 rows)

I think there's either no index on customer.name or you didn't analyze
the table, so PG has outdated statistics on its contents (probably
stating the table is still empty) and thinks a sequential scan will be
faster. You probably want to become acquainted with autovacuum.
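
If the index is missing, something along these lines should let the
prefix search use it (the pattern_ops opclass is only needed when the
database doesn't use the C locale):

CREATE INDEX customer_name_idx ON customer (name varchar_pattern_ops);
ANALYZE customer;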

Another possibility is that most of your customers names start with
'eri', in which case a seq scan is actually faster... In that case you
should probably do something about your customer base ;)

Regards,

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Modified FIFO queue and insert rule

2007-08-08 Thread Alban Hertroys
Leif B. Kristensen wrote:
 CREATE RULE placelimit AS
 ON INSERT TO recent_places DO ALSO
 DELETE FROM recent_places
 WHERE
 -- this clause doesn't work
 -- (place_fk = NEW.place_fk AND id < NEW.id) OR
 id NOT IN (SELECT id FROM recent_places ORDER BY id DESC LIMIT 10);
 
 When I try to use the commented clause above, no records are written to 
 the table at all! Why?

Do you use nextval() for that id?
In that case I think you immediately delete the record after inserting
it, as nextval gets called again in the delete statement, and thus id <
NEW.id.

You should probably use a trigger (a before one maybe) instead of a rule.
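
A minimal sketch of the trigger alternative, written as an AFTER INSERT
trigger so NEW.id is the id that actually got inserted:

CREATE FUNCTION trim_recent_places() RETURNS trigger AS $$
BEGIN
    DELETE FROM recent_places
     WHERE (place_fk = NEW.place_fk AND id < NEW.id)
        OR id NOT IN (SELECT id FROM recent_places ORDER BY id DESC LIMIT 10);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER recent_places_limit AFTER INSERT ON recent_places
    FOR EACH ROW EXECUTE PROCEDURE trim_recent_places();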

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] What do people like to monitor (or in other words, what might be nice in pgsnmpd)?

2007-08-02 Thread Alban Hertroys
Josh Tolley wrote:

 So please respond, if you feel so inclined, describing things you like
 to monitor in your PostgreSQL instances as well as things you would
 like to be able to easily monitor in a more ideal world.

I can think of a few things I'd like to be able to monitor...

Connection usage:
- total number of connections
- number of idle vs active connections
- total number per user/database
- number of idle vs active connections per user/database

I'm not entirely sure whether to split on user or on database or maybe both?
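
Most of the connection numbers can already be pulled from
pg_stat_activity; a rough sketch (8.1-era column names, and it assumes
stats_command_string is on so idle backends report '<IDLE>'):

SELECT datname, usename,
       count(*) AS connections,
       sum(CASE WHEN current_query = '<IDLE>' THEN 1 ELSE 0 END) AS idle
  FROM pg_stat_activity
 GROUP BY datname, usename;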

Also interesting: the number of queries that take more than an arbitrary
amount of time to complete. Maybe per user/database?
I suppose this number is only interesting on an uncongested database
server. Otherwise there will be queries passing that threshold that
normally wouldn't, because they have to wait for the real troublemakers
to finish.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] psql 8.2 client vs pg 8.1 server problem

2007-07-31 Thread Alban Hertroys
We have psql 8.2 clients on our workstations, while we still have pg 8.1
on our development and production servers. This causes problems like the
following:

database \d table
ERROR:  column i.indisvalid does not exist
database

We can log into the server and use the local client of course, but this
problem is causing some confusion among our new employees and doesn't
exactly inspire them with the reliability we know PostgreSQL has...

Is there a better workaround, or may this be a bug even?

Versions:

psql --version
psql (PostgreSQL) 8.2.4
contains support for command-line editing


database select version();
   version

-
 PostgreSQL 8.1.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.5
(Debian 1:3.3.5-13)
(1 row)


-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] psql 8.2 client vs pg 8.1 server problem

2007-07-31 Thread Alban Hertroys
Dave Page wrote:
 Alban Hertroys wrote:
 We have psql 8.2 clients on our workstations, while we still have pg 8.1
 on our development and production servers. This causes problems like the
 following:

 database \d table
 ERROR:  column i.indisvalid does not exist
 database

 We can log into the server and use the local client of course, but this
 problem is causing some confusion among our new employees and doesn't
 exactly inspire them with the reliability we know PostgreSQL has...

 Is there a better workaround, or may this be a bug even?
 
 Install the 8.1 version of psql on your workstations (and symlink it to
 psql81 or something if necessary). psql doesn't make any promises about
 backward compatibility - for that, you'll need to use something like
 pgAdmin (even that may need to be pointed at the older versions of
 pg_dump etc if you're going to run backups from the workstations).

But wouldn't it suffice to issue the old versions of the command
queries on an old server? It shouldn't be that hard to keep backward
compatibility among minor versions of psql. Even the server version is
known... They're just queries, right? How hard can it be to keep various
versions around so as to match the server version?

I can understand that new psql client features wouldn't be backward
compatible, but normal commands like \d should keep working.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] query to match '\N'

2007-07-30 Thread Alban Hertroys
Bertram Scharpf wrote:
 Hi,
 
 Am Freitag, 27. Jul 2007, 18:35:21 - schrieb pc:
 I have a table test with columns  col1 col2.col2 contains an entry
 '\N' .I want to select all entries which have '\N'  in col2.How do i
 do that?

 select * from test where col2 like '\N' ;
 select * from test where col2 like '\\N' ;
 
   select * from test where col2 like E'N';
   select * from test where col2 =E'\\N';
 
 Why use `like' here at all?

Presumably he wanted col2 like E'%\\N%'.
But doesn't \N mean NULL, or would the OP be looking for literal '\N'
strings in his data? Because if he's looking for NULLs it may be better
to query for col2 IS NULL.
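
In SQL terms the difference would be something like this (the E'' form
assumes 8.x string handling):

-- rows where the value really is NULL:
SELECT * FROM test WHERE col2 IS NULL;

-- rows where a literal backslash-N ended up in the data:
SELECT * FROM test WHERE col2 = E'\\N';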

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] query to match '\N'

2007-07-30 Thread Alban Hertroys
Nis Jørgensen wrote:
 Alban Hertroys skrev:
 
 Presumably he wanted col2 like E'%\\N%'.
 But doesn't \N mean NULL, or would the OP be looking for literal '\N'
 strings in his data? Because if he's looking for NULLs it may be better
 to query for col2 IS NULL.
 
 My guess is that this string was used to signify NULL in the file
 originally imported into the db.

Which is basically what I was pointing out ;)
It's all speculation anyway, we're guessing at what problem the OP tries
to solve.

I think either he is looking for NULL column values that exist in his
input file as '\N' strings (strings cannot contain NULLs, so using
like is pointless), or he is looking for failed conversions of \N from
his input file that thus may have ended up as literal \N characters in
column data.

In the latter case there shouldn't be any columns that match like
'%\N%' but not = '\N'. OTOH, we may be talking about an import
failure, in which case anything is possible. Fixing that would probably
be more difficult than fixing the cause of the failure and re-doing the
import.

As I said, it's all speculation. Without input from the OP there's not
much point in continuing this discussion.

Regards,
-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] a JOIN on same table, but 'slided over'

2007-06-28 Thread Alban Hertroys
Rafal Pietrak wrote:
 Gurjeet,
 
 Focusing on the standars solution, I did some 'exercises' - works fine,
 just learning. 
 
 But the ambarasing thing is, that I looks like I really don't get it,
 meaning - what exactly the internal query does. I've never ever seen or
 used a subquery with data/params from 'upper level' query used within a
 subquery - any time I've written a hierarchical query (e.g. with
 subqueries), the relations were always hierarchical. In other words, I
 was always able to run an internal subquery outside of the compound
 query and get consistant results. With this one I cannot do that due to
 the 'entanglement' of t3 and t1.

This is called a 'correlated subquery'. Basically the subquery is
performed for each record in the top query.
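
A tiny, made-up example of the shape, just to show where the upper level
value enters the subquery:

SELECT t1.id
  FROM t1
 WHERE EXISTS (SELECT 1
                 FROM t3
                WHERE t3.parent_id = t1.id);  -- t1.id comes from the outer query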

Google gave me this:
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/sqlp/rbafycorrs.htm

And there's probably more to find. Interestingly enough wikipedia
doesn't seem to have an article on the subject.

 Postgress query plan from EXPLAIN doesn't help me here - probably I'm
 unable to interpret it correctly without 'a paradigm mind shift'.
 
 So, would you mind commenting a little on how exactly the t1.id
 influences subquery (with t3), and the result influences back the
 selection of t1 set?
 
 Will greatly apreciate that.


-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] AutoVacuum Behaviour Question

2007-06-28 Thread Alban Hertroys
Bruce McAlister wrote:
 Which brings me onto a possibly related question. I've noticed that in
 this particular database, that there are temporary tables that are
 created. I'm not 100% sure how/why these temporary tables are being
 created, but I do assume that it must be by some sort of SQL query that
 runs against the database. How does postgresql handle these temporary
 tables, i mean, if a temporary table is created by some sql query, is it
 up to the user performing the query to remove the temporary table
 afterwards, or does postgresql automagically remove the temporary table
 when the query completes?

That would defeat the purpose of temporary tables. You usually create
them to perform queries on a data set from another query from the same
session.

AFAIK temporary tables are dropped when the session in which they were
created terminates, or optionally on commit if specified that way (ON
COMMIT DROP).
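
For example:

BEGIN;
CREATE TEMPORARY TABLE scratch (id integer) ON COMMIT DROP;
-- ... use scratch here ...
COMMIT;  -- scratch is gone after this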

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: Intervals (was: [GENERAL] DeadLocks..., DeadLocks...)

2007-06-18 Thread Alban Hertroys
Tom Allison wrote:
 I have a question though.
 I noticed a particular format for identifying dates like:
 now()-'3 days'::interval;
 
 What's '::interval' and why should I use it?

Intervals are convenient, simply said. They are a special type dealing
with date calculations relative to a given date. Basically they move
calculation of relative dates to the database server instead of the
programmer (always a good thing IMO).

Next to that, they're much more readable compared to the alternative
(which is in fact an implicit interval type measured in days, I suppose).

Compare:

SELECT now() + INTERVAL '1 month';
SELECT now() + CASE WHEN extract('month' from now()) IN (1, 3, 5, 7, 8,
10, 12) THEN 31 WHEN ...etc... END

or:

SELECT now() + INTERVAL '3 weeks - 5 days'
SELECT now() + 16;

The only drawback I know is that various query engines (e.g. PHP's pg_
functions) don't know how to handle intervals. Suffice to say, I'm a big
fan of the interval type.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Q: Tree traversal with SQL query?

2007-06-15 Thread Alban Hertroys
Adrian von Bidder wrote:
 Hi,
 
 Is there any way to do tree traversal with only one SQL query (no 
 procedure/function)?
 
 CREATE TABLE foo (
   node INTEGER,
   parent INTEGER REFERENCES foo(node)
 );
 
 Ideally the output would be a depth-first tree traversal starting at root 
 (marked by parent == node, for example.)
 
 Obviously, I can do this with normal programming and loops, but it bugged 
 me a while if its at all possible doing this in one query.

Have a look at contrib/ltree ;)
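
A minimal sketch of how that looks (the path column is maintained next to
the parent column; labels made up):

CREATE TABLE foo (
    node   integer PRIMARY KEY,
    parent integer REFERENCES foo(node),
    path   ltree
);

-- depth-first traversal of the subtree under node 1:
SELECT node FROM foo WHERE path <@ '1'::ltree ORDER BY path;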

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Using the GPU

2007-06-11 Thread Alban Hertroys
Alexander Staubo wrote:
 On 6/8/07, Billings, John [EMAIL PROTECTED] wrote:
 If so  which part of the database, and what kind of parallel
 algorithms would be  used?
 
 GPUs are parallel vector processing pipelines, which as far as I can
 tell do not lend themselves right away to the data structures that
 PostgreSQL uses; they're optimized for processing high volumes of
 homogenously typed values in sequence.

But wouldn't vector calculations on database data be sped up? I'm
thinking of GIS data, joins across ranges like matching one (start, end)
range with another, etc.
I realize these are rather specific calculations, but if they're
important to your application...

OTOH modern PC GPU's are optimized for pushing textures; basically
transferring a lot of data in as short a time as possible. Maybe it'd be
possible to move result sets around that way? Do joins even maybe?

And then there are the vertex and pixel shaders...

It'd be kind of odd though, to order a big time database server with a
high-end gaming card in it :P

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Postmaster processes taking all the CPU

2007-06-11 Thread Alban Hertroys
MC Moisei wrote:
 
 I'm not sure I understand the question. What else runs on it ?I have an 
 Apache that fronts a Tomcat (Java Enterprise App Server). In tomcat I only 
 run this application that has a connection pool of 30 connections(if I 
 remember correctly).Once the application starts to open connections it looks 
 that the each postmaster associated with the connection is not exiting as 
 fast as was before. I can follow up with a ps -aux capture if you think 
 that's helpful. Till yesterday all was working smoothly for about 2 years. It 
 looks like the postmasters are not finishing of if they do takes a good while 
 to finish. Also I've seen that the swap increases. I never use to have swap 
 used. I don't have space problems not errors in the syslog.Am I running out 
 of memory and all gets delayed by the swap handling ? I have the feeling that 
 I spin around my tail. So these processes are taking all the CPU and memory 
 and they hold for too long just doing a select. The traffic didn't increase 
 by any me
ans so one can say that causes the problem - at one point it sustained 4 times 
more traffic without problems.Hope this provide more insight.MC Date: Fri, 8 
Jun 2007 16:35:40 -0400 From: [EMAIL PROTECTED] To: 
pgsql-general@postgresql.org Subject: Re: [GENERAL] Postmaster processes 
taking all the CPU  On Fri, Jun 08, 2007 at 03:20:28PM -0500, MC Moisei 
wrote:pack of postmaster(4-22) processes ran by postgres user are 
taking  over almost all the CPU.   What else is the box doing?  If it 
doesn't have any other work to do, why shouldn't postgres use the CPU time?  
(This is a way of saying, You didn't tell us anything that would allow us to 
help.)  A  --  Andrew Sullivan  | [EMAIL PROTECTED] In the future this 
spectacle of the middle classes shocking the avant- garde will probably become 
the textbook definition of Postmodernism.  --Brad Holland  
---(end of broadcast)--- TIP 
4: Have 
you searched our list archives? 
http://archives.postgresql.org/
 

Could you please send your messages as plain text? Your mail client is
doing something foul to the plain text alternative incorporated in the
multi-part message. It's almost unreadable this way.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] parametered views

2007-06-11 Thread Alban Hertroys
Rodrigo De León wrote:
 On Jun 8, 7:59 pm, [EMAIL PROTECTED] [EMAIL PROTECTED]
 wrote:
 I have 4 tables:

 date_table (date_id, ...)
 A_table (A_table_id, something1, something2, ...)
 A1_table (A1_table_id references A_Table(A_Table_id), A11, A12)
 A2_table (A2_table_id references A_Table(A_table_id), A21, A22, ...)

 So I want to create a view with date_id, A_table_id, something1;
 if something2 = x then my view contains
  (A1_table.A11 and A1_table.A12),
 else it contains (A2_table.A21 and A2_table.A22).

 So my view depends on the value of something2.

 I hope I'm being specific.

 Thx
 Lhaj
 
 create view foobar as
 select date_id,A_table_id,something1
 ,case when something2=x then A1_table.A11 else A2_table.A21 end as foo
 ,case when something2=x then A1_table.A12 else A2_table.A22 end as bar
 from (... etc., etc. ...)

Not sure what you're trying to say here... You didn't formulate a question.

Maybe you mean this?:

CREATE VIEW foobar AS
SELECT date_id, A_table_id, A1_table.A11, A1_table.A12
 WHERE something2 = x
UNION
SELECT date_id, A_table_id, A2_table.A21, A2_table.A22
 WHERE something2 <> x

If the result sets for something2 = x and something2 <> x can't contain
duplicate rows anyway, or if you don't care about duplicates, use UNION ALL
instead of UNION; it skips the duplicate-elimination step.
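
For completeness, here's a rough sketch of what the full view could look
like. This is only a guess at your schema: I'm assuming A1_table and
A2_table join to A_table on their id columns, that date_id lives in
A_table, and that x stands for whatever literal you compare something2
against. Adjust the join conditions to your real foreign keys.

CREATE VIEW foobar AS
SELECT a.date_id, a.A_table_id, a.something1,
       a1.A11 AS col1, a1.A12 AS col2
  FROM A_table a
  JOIN A1_table a1 ON a1.A1_table_id = a.A_table_id
 WHERE a.something2 = x
UNION ALL
SELECT a.date_id, a.A_table_id, a.something1,
       a2.A21, a2.A22
  FROM A_table a
  JOIN A2_table a2 ON a2.A2_table_id = a.A_table_id
 WHERE a.something2 <> x;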

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //




Re: [GENERAL] performance problem with loading data

2007-06-11 Thread Alban Hertroys
Sergey Karin wrote:
 Hi all.
 
 I use PG 8.1.8 and PostGIS 1.1.1
 vka1=# select version();

 I develop an application for loading geodata into the database. In a
 separate transaction the application inserts the data into a separate
 table that is created dynamically when the transaction starts. All tables
 have an identical structure.

 I found a *VERY* strange problem: the speed of the loading process slows
 down (the first 1 objects are loaded in 69792 ms and the last 1 objects
 in 123737 ms). And furthermore, if I do NOT close the current session but
 start a new transaction, the first 1 objects will be loaded in 192279 ms
 and the last 1 objects in 251742 ms. And so on!! :-(

 But if I drop the trigger, the speed of the loading process does NOT slow
 down.

 Who can explain to me what I am doing wrong?

I think I can explain what happens (to my understanding, I'm just a user).

When you start you have an empty table (right?). If PG's statistics
reflect that situation at that point, it knows there are 0 records in the
table, and querying a table with 0 records is fastest with a sequential
scan.

Next you put data into the table, and at a certain point a sequential
scan will not be optimal anymore. But PG doesn't know that, so it keeps
using sequential scans!

I think you'll see performance improve if you add ANALYZE table
statements periodically. That way PG updates its stats on that table.

I know my explanation is a bit simplified; there are other statistics in
play. That is why I didn't advise running ANALYZE just once after a
certain number of inserts ;)
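
To make that concrete, here's a rough sketch of how the load could be
interleaved with ANALYZE. The table name and batch contents are made up
for the example (your tables are created dynamically and filled by your
application), but the pattern is the same:

-- Hypothetical stand-in for one of the dynamically created tables.
CREATE TABLE geo_objects_demo (id integer PRIMARY KEY, name text);

-- First batch of inserts, as the application would issue them.
INSERT INTO geo_objects_demo (id, name)
SELECT i, 'object ' || i FROM generate_series(1, 1000) AS s(i);

-- Refresh the planner statistics before the next batch, so queries fired
-- by the trigger stop being planned against "empty table" statistics.
ANALYZE geo_objects_demo;

-- Next batch, then ANALYZE again, and so on.
INSERT INTO geo_objects_demo (id, name)
SELECT i, 'object ' || i FROM generate_series(1001, 2000) AS s(i);
ANALYZE geo_objects_demo;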

Now this would be problematic with INSERT ... SELECT, as there's no room
to run periodic ANALYZEs in between (maybe if you'd split it up with
LIMIT/OFFSET, but that feels kludgy).

For the technical people: would it be possible to use the statistics on
the table(s) in the SELECT part of the statement to update the statistics
of the table being inserted into? Maybe they wouldn't be entirely
accurate, but wouldn't they be more accurate than statistics that say the
table is empty?

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //


