Re: [GENERAL] value

2010-09-16 Thread Gissur Þórhallsson
>
> OK, but you still need to be careful. That trick will only work if you
> insert rows one at a time into the table. If you have a single insert
> that inserts multiple rows (as in my example), it would fail because
> the currval() call would return the same value for each row in the
> insert created by the rule.
>


My advice would be to convert to triggers.
>

After going over your example a bit better, I think I will follow your
advice.
Thanks a lot for the heads up.

Kind regards from Iceland,
Gissur Þórhallsson


Re: [GENERAL] Simple schema diff script in Perl

2010-09-16 Thread hans
Hi,

Thanks for the quick response, i installed sqlfaiyr and guess that  sqlt-diff 
is what i'm loogin for, but i can't guess how to use it, do you have any call 
samples, i found this:

sqlt-diff --to MySQL Schema1=MySQL Schema2=MySQL > SchemaUpdate.mysql.sql

i can't guess where is the database name or user to use, if it work with dumps 
i need to give the dump files and the database type...

Hans

- "Randal L. Schwartz"  escribió:

> > "hans" == hans   writes:
> 
> hans> Actually just compare tables and fields, for my current
> hans> requirements is ok, i plan to add the sequences, but no more.
> 
> I believe SQLFairy (aka SQL::Translator) can canonicalize schemas,
> and
> even give DDL to turn one schema into another.
> 
> -- 
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777
> 0095
>  http://www.stonehenge.com/merlyn/>
> Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
> See http://methodsandmessages.vox.com/ for Smalltalk and Seaside
> discussion

-- 
Hans Poo, WeLinux S.A. http://www.welinux.cl
Oficina: +56-2-372.9770, Celular: 09-319.93.05
Bombero Ossa # 1010, Of: 526, Santiago


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Simple schema diff script in Perl

2010-09-16 Thread Randal L. Schwartz
> "hans" == hans   writes:

hans> Actually just compare tables and fields, for my current
hans> requirements is ok, i plan to add the sequences, but no more.

I believe SQLFairy (aka SQL::Translator) can canonicalize schemas, and
even give DDL to turn one schema into another.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
 http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transposing rows and columns

2010-09-16 Thread Sam Mason
On Thu, Sep 16, 2010 at 01:44:30PM -0400, Aram Fingal wrote:
> On Sep 16, 2010, at 12:28 PM, Sam Mason wrote:
> > If you want to do the transformation in SQL, you'd be writing something
> > like:
> > 
> >  SELECT drug, dose
> >MIN(CASE subject WHEN 1 THEN response END) AS resp_1,
> >MIN(CASE subject WHEN 2 THEN response END) AS resp_2,
> >MIN(CASE subject WHEN 3 THEN response END) AS resp_3
> >  FROM results
> >  WHERE expt_no = 1
> >AND subject IN (1,2,3)
> >  GROUP BY drug, dose
> >  ORDER BY drug, dose;
> 
> That's a good trick to know but I just tried it and found that it
> begins to get complicated with the actual data.  It also returns
> a separate row for each drug/dose/subject combination which isn't
> exactly what I want.  Each row has one column with a value and the
> rest of the columns in that row are all .

It shouldn't give a row per subject as the subject isn't in the GROUP
BY list.  Either that or you've got rounding problems that means that
the dose or drug combination means that they're effectively unique per
subject.  You need to find the set of columns that gives you the "right"
number of rows, the same issue applies to pivot tables in Excel.

> > Or you can use the tablefunc contrib module as suggested by Uwe.  I
> > prefer doing it by hand as you get more options, but it can be quite
> > tedious if you've got lots of columns you're trying to deal with.
> 
> I guess I need to read up on pivot tables.  At first glance, this
> looks like the best solution.

Maybe pull them out by experiment?

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Simple schema diff script in Perl

2010-09-16 Thread hans
Hi,

I was looking for a simple diff program to compare two schemas, i found two 
scripts out there but one compared everything and was too much for me, the 
other was a ruby script i wasn't able to run, so i wrote a little perl script 
that do the trick:

http://www.welinux.cl/hans/dbdiff.pl

Actually just compare tables and fields, for my current requirements is ok, i 
plan to add the sequences, but no more.

Actually it uses Pg as default perl driver, but any installed perl DBI driver 
should work.

The final comparison is done with vimdiff command of the vim suite, so it 
requires it to be installed. If there is some better (and simple) tool please 
let me know.

Bye
Hans

-- 
Hans Poo, WeLinux S.A. http://www.welinux.cl
Oficina: +56-2-372.9770, Celular: 09-319.93.05
Bombero Ossa # 1010, Of: 526, Santiago


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query join issue

2010-09-16 Thread Alban Hertroys
On 16 Sep 2010, at 18:23, Christine Penner wrote:

> There could be many training_course records for each of the other tables. I 
> want to get all records from the Train_mod and Train_comp table even if there 
> are no training course records available. This is the query I'm trying and I 
> get nothing. The data I'm trying this on has no training_course records but 
> does have records in the other tables. What am I doing wrong.
> 
> SELECT *
> FROM TRAIN_MOD LEFT OUTER JOIN TRAINING_COURSE ON 
> TRAIN_MOD.TRM_SEQ_NO=TRAINING_COURSE.TC_TRM_SEQ
> LEFT OUTER JOIN TRAIN_COMP ON TRAIN_MOD.TRM_TRC_SEQ=TRAIN_COMP.TRC_SEQ_NO
> where TC_PUB_ED  IS TRUE OR  TC_SEQ_NO IS NULL


Most likely TC_PUB_ED and/or TC_SEQ_NO in your WHERE clause are actually 
missing rows from TRAIN_COMP. The IS NULL condition may be succeeding, but 
TC_PUB_ED is most not TRUE but NULL in all those cases.

The solution is to put those conditions in your ON clause, like so:
LEFT OUTER JOIN TRAIN_COMP ON (
TRAIN_MOD.TRM_TRC_SEQ=TRAIN_COMP.TRC_SEQ_NO
AND (TC_PUB_ED  IS TRUE OR  TC_SEQ_NO IS NULL)
)

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4c92942d10252119096304!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transposing rows and columns

2010-09-16 Thread Aram Fingal

On Sep 16, 2010, at 4:37 PM, John R Pierce wrote:

> On 09/16/10 10:44 AM, Aram Fingal wrote:
>> I have thought about that but later on, when we do the full sized 
>> experiments, there will be too many rows for Excel to handle.
> 
> if you insist on this transposing, won't that mean you'll end up with more 
> columns than SQL can/should handle?


No.  The organization in Excel is much more efficient of the total number of 
cells used but not much good for querying.  When I transpose it for use in the 
database (or pivot it in Excel), it actually multiplies the number of rows.  
So, if the version with separate columns for each subject has X rows and Y 
columns, you get X * Y rows in the database version.  For example, If there are 
100 subjects, and 1000 drug/dose combinations.  Then the Excel version has 102 
columns (drug, dose and a column for each subject) and 1000 rows.  The database 
(or pivoted) version would have 4 columns (subject, drug, dose and response) 
and 100,000 rows.  Excel maxes out at 65,535 rows and PostgreSQL has no limit.  

The subjects, by the way, are not people, they are cancer cell tissue cultures 
in 384-well plates, handled by robots.  That's how we can do so many drug/dose 
combinations.  We'll do even more in the future.

-Aram

Re: [GENERAL] Getting FATAL: terminating connection due to administrator command

2010-09-16 Thread Frank Ch. Eigler

Peter Hopfgartner  writes:

> [...]
> > >http://sourceware.org/systemtap/examples/process/sigmon.stp

> Now we had the error, but systemtap did not report any SIGTERM. Is
> it possible to have this error without a SIGTERM being involved? As
> mentioned in a previous mail, I've modified the script to report
> SIGTERM sent to any process.

There are some other possibilities.  It's possible that the version of
stap you're using is not expanding signal.send to all possible paths
of the kernel dispatching signals to your process.

So one might try a few different things:


# see what die() is getting to work with
probe process("/usr/bin/postgres").function("die") {
   printf("%s[%d] received %d\n", execname(), pid(), $postgres_signal_arg)
}

# check for another process sending SIGTERM
probe syscall.kill {
   if (sig == 15) {
printf("%s[%d] sending %s\n", execname(), pid(), argstr)
print_ubacktrace() 
   }
}

# heck, trace the whole statement sequence during the signal handling
probe process("/usr/bin/postgres").statement("d...@*:*"),
  process("/usr/bin/postgres").statement("processinterru...@*:*") {
   printf("%s %s\n", pp(), $$vars)
}


You can run that in the background.  The second probe will give
systemwide SIGTERM activity, so you may need to filter it a bit.
If you know the appropriate postmaster process-id, you could change
the syscall.kill probe:

< if (sig == 15) {
> if (sig == 15 && pid == target_pid()) {

and invoke the script with   stap ... -x PID_OF_YOUR_POSTGRES_SERVER
(In this case, "sig" and "pid" come from the syscall arguments, that
is represent the intended signal recepient, rather than the sender;
see also 'stap -L signal.send'.)

Note that postgres does sometimes send signals to itself, so don't be
surprised to see post* processes show up there.

(A more modern system compiler & systemtap would give you much better
variable-value dumping options.)


- FChE

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transposing rows and columns

2010-09-16 Thread John R Pierce

 On 09/16/10 10:44 AM, Aram Fingal wrote:

I have thought about that but later on, when we do the full sized experiments, 
there will be too many rows for Excel to handle.


if you insist on this transposing, won't that mean you'll end up with 
more columns than SQL can/should handle?




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query join issue

2010-09-16 Thread Christine Penner
I decided to approach this a different way. Not worth the time to 
figure out. I get all records from all tables and put them together 
in the program using this.


Thanks for your help.

Christine

At 12:20 PM 16/09/2010, you wrote:

On 16/09/2010 18:33, Christine Penner wrote:

Training Course
tc_seq_no (primary key)
tc_trm_seq (foreign key to train_mod table)
... and some other title, desc etc fields

train_mod
trm_seq_no (primary key)
trm_trc_seq (foreign key to train_comp table)
.. and title etc fields

train_comp
trc_seq_no (primary key)
.. and title etc fields


[snip]


SELECT *
FROM TRAIN_MOD LEFT OUTER JOIN TRAINING_COURSE ON
TRAIN_MOD.TRM_SEQ_NO=TRAINING_COURSE.TC_TRM_SEQ
LEFT OUTER JOIN TRAIN_COMP ON
TRAIN_MOD.TRM_TRC_SEQ=TRAIN_COMP.TRC_SEQ_NO
where TC_PUB_ED IS TRUE OR TC_SEQ_NO IS NULL


Hi Christine,

I can't see it either, and without any data to try it on I'm really 
only guessing


Try removing the WHERE clause and see if you get any rows back.

Also, try each join separately - just two tables at a time - and see 
what happens.


Hope this helps

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query join issue

2010-09-16 Thread Raymond O'Donnell

On 16/09/2010 18:33, Christine Penner wrote:

Training Course
tc_seq_no (primary key)
tc_trm_seq (foreign key to train_mod table)
... and some other title, desc etc fields

train_mod
trm_seq_no (primary key)
trm_trc_seq (foreign key to train_comp table)
.. and title etc fields

train_comp
trc_seq_no (primary key)
.. and title etc fields


[snip]


SELECT *
FROM TRAIN_MOD LEFT OUTER JOIN TRAINING_COURSE ON
TRAIN_MOD.TRM_SEQ_NO=TRAINING_COURSE.TC_TRM_SEQ
LEFT OUTER JOIN TRAIN_COMP ON
TRAIN_MOD.TRM_TRC_SEQ=TRAIN_COMP.TRC_SEQ_NO
where TC_PUB_ED IS TRUE OR TC_SEQ_NO IS NULL


Hi Christine,

I can't see it either, and without any data to try it on I'm really only 
guessing


Try removing the WHERE clause and see if you get any rows back.

Also, try each join separately - just two tables at a time - and see 
what happens.


Hope this helps

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Getting FATAL: terminating connection due to administrator command

2010-09-16 Thread Peter Hopfgartner
Tom Lane  wrote
Subject: Re: [GENERAL] Getting FATAL: terminating connection due to 
administrator command 
Date: 16.09.2010 18:49

>Peter Hopfgartner  writes:
>> Tom Lane  wrote
>>> Peter Hopfgartner  writes:
 Now we had the error, but systemtap did not report any SIGTERM. Is it
 possible to have this error without a SIGTERM being involved?
>>> 
>>> Hmph.  I would have said not, but ...
>>> 
>>> What PG version is this exactly?
>
>> "PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
>4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit"
>
>Well, I just took another careful look at the 8.4.x source code.  The
>only occurrence of that error string is in ProcessInterrupts() when it
>sees ProcDiePending set, and so far as I can see ProcDiePending is only
>set by the die() signal handler, and in postmaster child processes die()
>is only used as a SIGTERM handler.  So it's pretty hard to come to the
>conclusion that it could be anything else.
>
>I don't know systemtap well at all, so I can't suggest any reason why
>your script might be missing SIGTERM events, but it seems that it must
>be.
>Perhaps you can find a systemtap expert who can comment on that.
>
>   regards, tom lane
>
Is there any other way I can attack this issue? Attach GDB? How (I haven't done 
any serious C programming in this millenium).

Peter


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transposing rows and columns

2010-09-16 Thread Aram Fingal
On Sep 16, 2010, at 12:28 PM, Sam Mason wrote:

> On Thu, Sep 16, 2010 at 11:42:21AM -0400, Aram Fingal wrote:
>> create table results(
>> expt_no int references experiments(id),
>> subject int references subjects(id),
>> drug text references drugs(name),
>> dose numeric,
>> response numeric
>> )
> 
> What's the primary key?  I presume it's (expt_no,subject,drug,dose).

Yes that's correct.  I copied and simplified from the actual table, which has a 
lot more in the table definition.  


> 
>> Now, suppose I do some computation on the results in the database and
>> want to export it back out to the same kind of format that I received
>> it (drugs and doses in rows and subjects in columns.)
> 
> Have you tried setting up an ODBC data source to the database and use
> the PivotTable functionality in Excel to do the transformation?

I have thought about that but later on, when we do the full sized experiments, 
there will be too many rows for Excel to handle.


> 
> If you want to do the transformation in SQL, you'd be writing something
> like:
> 
>  SELECT drug, dose
>MIN(CASE subject WHEN 1 THEN response END) AS resp_1,
>MIN(CASE subject WHEN 2 THEN response END) AS resp_2,
>MIN(CASE subject WHEN 3 THEN response END) AS resp_3
>  FROM results
>  WHERE expt_no = 1
>AND subject IN (1,2,3)
>  GROUP BY drug, dose
>  ORDER BY drug, dose;

That's a good trick to know but I just tried it and found that it begins to get 
complicated with the actual data.  It also returns a separate row for each 
drug/dose/subject combination which isn't exactly what I want.  Each row has 
one column with a value and the rest of the columns in that row are all .

> 
> Or you can use the tablefunc contrib module as suggested by Uwe.  I
> prefer doing it by hand as you get more options, but it can be quite
> tedious if you've got lots of columns you're trying to deal with.

I guess I need to read up on pivot tables.  At first glance, this looks like 
the best solution.

-Aram


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query join issue

2010-09-16 Thread Christine Penner

Training Course
tc_seq_no (primary key)
tc_trm_seq (foreign key to train_mod table)
... and some other title, desc etc fields

train_mod
trm_seq_no (primary key)
trm_trc_seq (foreign key to train_comp table)
.. and title etc fields

train_comp
trc_seq_no (primary key)
.. and title etc fields

We don't qualify the field names in the where because there is really 
no chance we have duplicate field names. This is because of the 
software we use to access Postgres. Thats also the reason for the all 
caps. all of the query up to the where is put together for me based 
on info I set up for the query.


Hope this info helps. I'm sure there is something wrong with the join 
or something, I just don't see it.


Christine

At 10:22 AM 16/09/2010, you wrote:

On 16/09/2010 16:05, Christine Penner wrote:

I have a query that joins 3 tables, TRAIN_MOD,TRAIN_COMP and
TRAINING_COURSE,

There could be many training_course records for each of the other
tables. I want to get all records from the Train_mod and Train_comp
table even if there are no training course records available. This is
the query I'm trying and I get nothing. The data I'm trying this on has
no training_course records but does have records in the other tables.
What am I doing wrong.


SELECT *
FROM TRAIN_MOD LEFT OUTER JOIN TRAINING_COURSE ON
TRAIN_MOD.TRM_SEQ_NO=TRAINING_COURSE.TC_TRM_SEQ
LEFT OUTER JOIN TRAIN_COMP ON TRAIN_MOD.TRM_TRC_SEQ=TRAIN_COMP.TRC_SEQ_NO
where TC_PUB_ED IS TRUE OR TC_SEQ_NO IS NULL


Can you show us the table schemas?

Also, I think it's a good idea to qualify the columns in the WHERE 
clause, to prevent any possible ambiguity. Without seeing the table 
definitions I'm only guessing, but is it possible that these are 
doing something other than what you expect?


Ray.

PS - I personally find all-caps SQL very hard to read. :-)

--
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query join issue

2010-09-16 Thread Raymond O'Donnell

On 16/09/2010 16:05, Christine Penner wrote:

I have a query that joins 3 tables, TRAIN_MOD,TRAIN_COMP and
TRAINING_COURSE,

There could be many training_course records for each of the other
tables. I want to get all records from the Train_mod and Train_comp
table even if there are no training course records available. This is
the query I'm trying and I get nothing. The data I'm trying this on has
no training_course records but does have records in the other tables.
What am I doing wrong.


SELECT *
FROM TRAIN_MOD LEFT OUTER JOIN TRAINING_COURSE ON
TRAIN_MOD.TRM_SEQ_NO=TRAINING_COURSE.TC_TRM_SEQ
LEFT OUTER JOIN TRAIN_COMP ON TRAIN_MOD.TRM_TRC_SEQ=TRAIN_COMP.TRC_SEQ_NO
where TC_PUB_ED IS TRUE OR TC_SEQ_NO IS NULL


Can you show us the table schemas?

Also, I think it's a good idea to qualify the columns in the WHERE 
clause, to prevent any possible ambiguity. Without seeing the table 
definitions I'm only guessing, but is it possible that these are doing 
something other than what you expect?


Ray.

PS - I personally find all-caps SQL very hard to read. :-)

--
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] value

2010-09-16 Thread Dean Rasheed
2010/9/16 Gissur Þórhallsson :
>> which is probably not what you might expect.
>
> No, indeed it is not.
> My solution - which seems to be working - is replacing:
> new.my_table_id
> with:
> currval(pg_get_serial_sequence('my_table', 'my_table_id'))
> in the on_insert rule
>

OK, but you still need to be careful. That trick will only work if you
insert rows one at a time into the table. If you have a single insert
that inserts multiple rows (as in my example), it would fail because
the currval() call would return the same value for each row in the
insert created by the rule.

My advice would be to convert to triggers.

Regards,
Dean


> This does the trick.
> Thanks,
> Gissur Þórhallsson
>
> Loftmyndir ehf.
> Laugavegur 13
> IS 101 Reykjavík - Iceland
> sími (tel): (+354) 540 2500
> tölvupóstur (email): gis...@loftmyndir.is
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Bill Thoen



> On 9/16/2010 5:18 AM, Sam Mason wrote:
  

>> On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote:


>>> I need to clean up a lot of contact data because of a merge of customer
>>> lists that used to be kept separate.
>>> I allready know that there are double entries within the lists and they
>>> do overlap, too.
>>>
>>> Relevant fields could be  name, street, zip, city, phon
  

>
> Since your're working with words, a better "fuzzy" match algorithm is  
> Levenshtein distance. It's surprizingly good about coming up with an  
> index number for words (or even phrases). Calculate the Levenshtein  
> distance for two words and the closer the indexes are to each other the  
> more alike the words are. It's well-documented on Wikipedia (and other  
> places)
  


The problem with using something like this is that it's awkward to do
anything apart from the Cartesian product and compare everything with
everything else.  If you've got 500k rows to start with, that's a *lot*
of comparisons to be doing.  The nice thing about using something like
soundex is that it allows you to use equality when comparing and hence
PG has a chance to finish the query in a reasonable amount of time.

If you're dealing with a small number of possibilities, I'd agree that
some form of edit distance is a very useful tool.


> But overall, you're basically normalizing an address list with people  
> names. So I first go through the lists with an eye to getting rid of the  
> junk (regexp is great for identifying names with "illeagal" characters)  
> and creating a starting name in uppercase. There's always at least one  
> data entry clerk (usually several) who do quirky things like include  
> notes in the last name field.
  


Yup, normalisation is very important when dealing with this sort of
thing.  Reducing free form text down to something more constrained is
important.


> You really need to parse each word,  
> correct spelling and consider it in the local context, i.e. does it  
> "fit" where it is?  (as a geocding routine would do with street names)  
> There are canonical forms for names and though these vary from country  
> to country, they make good templates for a name parser.
  


I've tried doing something like this before, but tended to give up.  I
guess all the matching I've been doing has been small enough that it was
quicker to handle the odd cases by hand than spending more time writing
general purpose code that never seemed to be quite good enough.

-- Sam http://samason.me.uk/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] query join issue

2010-09-16 Thread Christine Penner

I have a query that joins 3 tables, TRAIN_MOD,TRAIN_COMP and TRAINING_COURSE,

There could be many training_course records for each of the other 
tables. I want to get all records from the Train_mod and Train_comp 
table even if there are no training course records available. This is 
the query I'm trying and I get nothing. The data I'm trying this on 
has no training_course records but does have records in the other 
tables. What am I doing wrong.



SELECT *
FROM TRAIN_MOD LEFT OUTER JOIN TRAINING_COURSE ON 
TRAIN_MOD.TRM_SEQ_NO=TRAINING_COURSE.TC_TRM_SEQ

LEFT OUTER JOIN TRAIN_COMP ON TRAIN_MOD.TRM_TRC_SEQ=TRAIN_COMP.TRC_SEQ_NO
where TC_PUB_ED  IS TRUE OR  TC_SEQ_NO IS NULL

Christine Penner
Ingenious Software
250-352-9495
christ...@ingenioussoftware.com 



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Getting FATAL: terminating connection due to administrator command

2010-09-16 Thread Tom Lane
Peter Hopfgartner  writes:
> Tom Lane  wrote
>> Peter Hopfgartner  writes:
>>> Now we had the error, but systemtap did not report any SIGTERM. Is it
>>> possible to have this error without a SIGTERM being involved?
>> 
>> Hmph.  I would have said not, but ...
>> 
>> What PG version is this exactly?

> "PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 
> 20080704 (Red Hat 4.1.2-48), 64-bit"

Well, I just took another careful look at the 8.4.x source code.  The
only occurrence of that error string is in ProcessInterrupts() when it
sees ProcDiePending set, and so far as I can see ProcDiePending is only
set by the die() signal handler, and in postmaster child processes die()
is only used as a SIGTERM handler.  So it's pretty hard to come to the
conclusion that it could be anything else.

I don't know systemtap well at all, so I can't suggest any reason why
your script might be missing SIGTERM events, but it seems that it must be.
Perhaps you can find a systemtap expert who can comment on that.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transposing rows and columns

2010-09-16 Thread Sam Mason
On Thu, Sep 16, 2010 at 11:42:21AM -0400, Aram Fingal wrote:
> create table results(
> expt_no int references experiments(id),
> subject int references subjects(id),
> drug text references drugs(name),
> dose numeric,
> response numeric
> )

What's the primary key?  I presume it's (expt_no,subject,drug,dose).

> Now, suppose I do some computation on the results in the database and
> want to export it back out to the same kind of format that I received
> it (drugs and doses in rows and subjects in columns.)

Have you tried setting up an ODBC data source to the database and use
the PivotTable functionality in Excel to do the transformation?

If you want to do the transformation in SQL, you'd be writing something
like:

  SELECT drug, dose
MIN(CASE subject WHEN 1 THEN response END) AS resp_1,
MIN(CASE subject WHEN 2 THEN response END) AS resp_2,
MIN(CASE subject WHEN 3 THEN response END) AS resp_3
  FROM results
  WHERE expt_no = 1
AND subject IN (1,2,3)
  GROUP BY drug, dose
  ORDER BY drug, dose;

Or you can use the tablefunc contrib module as suggested by Uwe.  I
prefer doing it by hand as you get more options, but it can be quite
tedious if you've got lots of columns you're trying to deal with.

If I've got my assumption about primary key wrong then my code, as well
as the tablefunc, will probably both fail to do the "right thing".

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] query join issue

2010-09-16 Thread Christine Penner

I have a query that joins 3 tables, TRAIN_MOD,TRAIN_COMP and TRAINING_COURSE,

There could be many training_course records for each of the other 
tables. I want to get all records from the Train_mod and Train_comp 
table even if there are no training course records available. This is 
the query I'm trying and I get nothing. The data I'm trying this on 
has no training_course records but does have records in the other 
tables. What am I doing wrong.



SELECT *
FROM TRAIN_MOD LEFT OUTER JOIN TRAINING_COURSE ON 
TRAIN_MOD.TRM_SEQ_NO=TRAINING_COURSE.TC_TRM_SEQ

LEFT OUTER JOIN TRAIN_COMP ON TRAIN_MOD.TRM_TRC_SEQ=TRAIN_COMP.TRC_SEQ_NO
where TC_PUB_ED  IS TRUE OR  TC_SEQ_NO IS NULL

Christine Penner
Ingenious Software
250-352-9495
christ...@ingenioussoftware.com  



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transposing rows and columns

2010-09-16 Thread Uwe Schroeder


> I'm working with some people who live and breath Excel.  I need to be able
> to move data back and forth between formats which make sense for Excel and
> for PostgreSQL.  In some cases, this is just to accommodate what people are
> used to.  In other cases, like statistical clustering, it's something that
> really has to be done.
>
> Here is a simplified example:
>
> I'm given data in Excel with one sheet each for a bunch of experiments.  In
> each sheet, there are rows with different drugs at different doses and
> columns for each subject.  The cells contain the response data.  I wrote a
> Perl script which automates the process of extracting that data into a csv
> file which can be imported into a table like the following:
>
> create table results(
> expt_no int references experiments(id),
> subject int references subjects(id),
> drug text references drugs(name),
> dose numeric,
> response numeric
> )
>
> Now, suppose I do some computation on the results in the database and want
> to export it back out to the same kind of format that I received it (drugs
> and doses in rows and subjects in columns.)   One method would be to use
> Perl.  I could use DBD::Pg and loop through a bunch of queries to build a
> two dimensional array and then spit that back out but is there a good way
> to do this just in SQL?  Is there a better way than creating a temporary
> table for each subject and then joining all the temp tables?

You may want to look into the tablefunc contrib module. It contains a crosstab 
which will transpose rows and columns in the result.
This may be slow though.

HTH






-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] value

2010-09-16 Thread Gissur Þórhallsson
>
> which is probably not what you might expect.
>
No, indeed it is not.

My solution - which seems to be working - is replacing:
new.my_table_id
with:
currval(pg_get_serial_sequence('my_table', 'my_table_id'))
in the on_insert rule

This does the trick.

Thanks,
Gissur Þórhallsson

Loftmyndir ehf.
Laugavegur 13
IS 101 Reykjavík - Iceland
sími (tel): (+354) 540 2500
tölvupóstur (email): gis...@loftmyndir.is


Re: [GENERAL] Getting FATAL: terminating connection due to administrator command

2010-09-16 Thread Peter Hopfgartner
Tom Lane  wrote
Subject: Re: [GENERAL] Getting FATAL: terminating connection due to 
administrator command 
Date: 16.09.2010 17:37

>Peter Hopfgartner  writes:
>> Now we had the error, but systemtap did not report any SIGTERM. Is it
>> possible to have this error without a SIGTERM being involved?
>
>Hmph.  I would have said not, but ...
>
>What PG version is this exactly?
>
>   regards, tom lane
>

It says:

"PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 
20080704 (Red Hat 4.1.2-48), 64-bit"

and should be identical to the current Red Hat implementation of 
postgresql84-server as in RHEL 5.5

Peter


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Transposing rows and columns

2010-09-16 Thread Aram Fingal
I'm working with some people who live and breath Excel.  I need to be able to 
move data back and forth between formats which make sense for Excel and for 
PostgreSQL.  In some cases, this is just to accommodate what people are used 
to.  In other cases, like statistical clustering, it's something that really 
has to be done.

Here is a simplified example:

I'm given data in Excel with one sheet each for a bunch of experiments.  In 
each sheet, there are rows with different drugs at different doses and columns 
for each subject.  The cells contain the response data.  I wrote a Perl script 
which automates the process of extracting that data into a csv file which can 
be imported into a table like the following:

create table results(
expt_no int references experiments(id),
subject int references subjects(id),
drug text references drugs(name),
dose numeric,
response numeric
)

Now, suppose I do some computation on the results in the database and want to 
export it back out to the same kind of format that I received it (drugs and 
doses in rows and subjects in columns.)   One method would be to use Perl.  I 
could use DBD::Pg and loop through a bunch of queries to build a two 
dimensional array and then spit that back out but is there a good way to do 
this just in SQL?  Is there a better way than creating a temporary table for 
each subject and then joining all the temp tables?  

-Aram
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Getting FATAL: terminating connection due to administrator command

2010-09-16 Thread Tom Lane
Peter Hopfgartner  writes:
> Now we had the error, but systemtap did not report any SIGTERM. Is it
> possible to have this error without a SIGTERM being involved?

Hmph.  I would have said not, but ...

What PG version is this exactly?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Getting FATAL: terminating connection due to administrator command

2010-09-16 Thread Peter Hopfgartner
Peter Hopfgartner  wrote
Subject: Re: [GENERAL] Getting FATAL: terminating connection due to 
administrator command
Date: 16.09.2010 16:56

>Tom Lane  wrote
>
>Subject: Re: [GENERAL] Getting FATAL: terminating connection due to
>administrator command 
>
>Date: 15.09.2010 17:40
>
>
>
>>Peter Hopfgartner  writes:
>
>>> Tom Lane  wrote
>
 This is a result of something sending SIGTERM to the backend process.
>
>>
>
>>> Can I trace where the SIGTERM comes from?
>
>>
>
>>If this is a recent Red-Hat-based release, I think that systemtap could
>
>>probably be used to determine that.  There's a script here that solves
>
>>a related problem:
>
>>http://sourceware.org/systemtap/examples/process/sigmon.stp
>


Now we had the error, but systemtap did not report any SIGTERM. Is it possible 
to have this error without a SIGTERM being involved? As mentioned in a previous 
mail, I've modified the script to report SIGTERM sent to any process.

Peter


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Getting FATAL: terminating connection due to administrator command

2010-09-16 Thread Peter Hopfgartner
Tom Lane  wrote
Subject: Re: [GENERAL] Getting FATAL: terminating connection due to 
administrator command 
Date: 15.09.2010 17:40

>Peter Hopfgartner  writes:
>> Tom Lane  wrote
>>> This is a result of something sending SIGTERM to the backend process.
>
>> Can I trace where the SIGTERM comes from?
>
>If this is a recent Red-Hat-based release, I think that systemtap could
>probably be used to determine that.  There's a script here that solves
>a related problem:
>http://sourceware.org/systemtap/examples/process/sigmon.stp
>

Ok, I was able to create the packages with the debug info, adapted the script 
to report SIGTERM sent so any process, see attachment, tested this sending some 
SIGTERM and seeing them reported by systemtap.

Anyway, currently we are not able to reproduce the problem on the server, 
magically everything seems to work on this machine. The only changes should be 
the 2 debug-info packages 
(kernel-debuginfo-common-2.6.18-194.11.3.el5.x86_64.rpm and 
kernel-xen-debuginfo-2.6.18-194.11.3.el5.x86_64.rpm).

In the same time, the problem appeared twice today on our testing server. This 
suggests to drop the hardware failure hypothesis.

I'm running systemtap on the testing server now, too.

>   regards, tom lane
>

sigmon1.stp
Description: Binary data

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] missing chunk number 497 for toast value 504723663

2010-09-16 Thread Tom Lane
"Utsav Turray"  writes:
> I am using postgres  7.3.2  on RHEL 4.0.

Egad.

> Secondly what are probable  reasons behind corruption and what can we do to
> prevent this error.

Update.  Whatever reasons you might have for running 7.3.2 are bad ones.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] missing chunk number 497 for toast value 504723663

2010-09-16 Thread Utsav Turray
Dear All ,

 

I am using postgres  7.3.2  on RHEL 4.0.

I know the version is too old but I have no option upgrading the postgres
version.

 

I have a table with a column type "text". The column consists  large data
ranging from 500 kb to 900 kb. 

Total there are 5 records approximately. There are large number of
updates occurring on this table. 

Few days ago on another installation an error was generated during pg_dump
"missing chunk number 0 for toast value xxx" or "missing chunk number 0 for
toast value xxx". This problem was resolved by re-indexing the toast table.

But this time there are total 422 records which are giving error of both
"missing chunk number" and "unexpected chunk number" on the "text column
type" and even re-indexing toast table is not resolving the issue. The only
option I am left is to delete these records.

Is there any other way to resolve this issue as deleting the records will
lead to loss of information and the backup I am having also consists of
corrupted records.

Secondly what are probable  reasons behind corruption and what can we do to
prevent this error.

 

Thanks And Regards,

Utsav Turray


Disclaimer :- This e-mail and any attachment may contain confidential, 
proprietary or legally privileged information. If you are not the original 
intended recipient and have erroneously received this message, you are 
prohibited from using, copying, altering or disclosing the content of this 
message. Please delete it immediately and notify the sender. Newgen Software 
Technologies Ltd (NSTL)  accepts no responsibilities for loss or damage arising 
from the use of the information transmitted by this email including damages 
from virus and further acknowledges that no binding nature of the message shall 
be implied or assumed unless the sender does so expressly with due authority of 
NSTL. 



Re: [GENERAL] "EXECUTE ... into var" doesn't set FOUND: bug or feature?

2010-09-16 Thread Pavel Stehule
2010/9/16 Louis-David Mitterrand :
> On Thu, Sep 16, 2010 at 10:12:57AM -0400, Tom Lane wrote:
>> Louis-David Mitterrand  writes:
>> > I noticed that in a pl/pgsql function FOUND is not set after an
>> >     EXECUTE ... into var;
>> > Bug or feature?
>>
>> It's behaving as documented:
>> http://www.postgresql.org/docs/9.0/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS
>>
>> Adding EXECUTE to the list of statements that change FOUND would have a
>> rather large risk of breaking existing plpgsql functions, IMO.
>
> but is really surprising and non-intuitive, given that 'return query
> execute' does.

RETURN QUERY EXECUTE is a new statement. And it has not original in
PL/SQL so there are not reason to copy misfeature from Oracle.

Regards

Pavel Stehule

>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] "EXECUTE ... into var" doesn't set FOUND: bug or feature?

2010-09-16 Thread Louis-David Mitterrand
On Thu, Sep 16, 2010 at 10:12:57AM -0400, Tom Lane wrote:
> Louis-David Mitterrand  writes:
> > I noticed that in a pl/pgsql function FOUND is not set after an 
> > EXECUTE ... into var;
> > Bug or feature?
> 
> It's behaving as documented:
> http://www.postgresql.org/docs/9.0/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS
> 
> Adding EXECUTE to the list of statements that change FOUND would have a
> rather large risk of breaking existing plpgsql functions, IMO.

but is really surprising and non-intuitive, given that 'return query
execute' does.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] "EXECUTE ... into var" doesn't set FOUND: bug or feature?

2010-09-16 Thread Tom Lane
Louis-David Mitterrand  writes:
> I noticed that in a pl/pgsql function FOUND is not set after an 
>   EXECUTE ... into var;
> Bug or feature?

It's behaving as documented:
http://www.postgresql.org/docs/9.0/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS

Adding EXECUTE to the list of statements that change FOUND would have a
rather large risk of breaking existing plpgsql functions, IMO.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] value

2010-09-16 Thread Dean Rasheed
2010/9/16 Gissur Þórhallsson :
>> Yes.  You're using RULEs where TRIGGERs would do.  Change to TRIGGERs.
>
> While this could possibly solve my problem in particular; it doesn't explain
> why this is happening.
> Is this somehow expected behavior on an INSERT rule?
>

Rules can be pretty tricky things to work with, and this is one of the
well-known gotchas (to those who know it well!).

Consider the following simplified version of your example:

CREATE TABLE foo(a serial, b text);
CREATE TABLE bar(a int, b text);

CREATE RULE ins_rule AS ON INSERT TO foo
  DO ALSO INSERT INTO bar VALUES(new.a, new.b);

You might think that the rule would guarantee that any insert into foo
would be mirrored with an identical insert on bar. However, this is
not the case. Consider, for example, this insert:

INSERT INTO foo(b) VALUES ('Row 1'), ('Row 2'), ('Row 3');

What the rule will actually do is cause 2 separate INSERT commands to
be executed. The first will add 3 rows to foo, choosing 3 successive
values for 'a' from the sequence. The second command is an insert into
bar, and since 'a' isn't specified, it will use the default for 'a'
from foo, causing another 3 values to be pulled from the sequence. So
the end result is:

SELECT * FROM foo;
 a |   b
---+---
 1 | Row 1
 2 | Row 2
 3 | Row 3
(3 rows)

SELECT * FROM bar;
 a |   b
---+---
 4 | Row 1
 5 | Row 2
 6 | Row 3
(3 rows)

which is probably not what you might expect.

It's this sort of thing that makes many people prefer triggers to rules.

Regards,
Dean

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] "EXECUTE ... into var" doesn't set FOUND: bug or feature?

2010-09-16 Thread Pavel Stehule
Hello

2010/9/16 Louis-David Mitterrand :
> Hi,
>
> I noticed that in a pl/pgsql function FOUND is not set after an
>
>        EXECUTE ... into var;
>
> Bug or feature?
>

feature :(

use a GET DIAGNOSTICS statement instead

Regards

Pavel Stehule

> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] "EXECUTE ... into var" doesn't set FOUND: bug or feature?

2010-09-16 Thread Louis-David Mitterrand
Hi,

I noticed that in a pl/pgsql function FOUND is not set after an 

EXECUTE ... into var;

Bug or feature?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libssl issue ?

2010-09-16 Thread Tom Lane
Craig Ringer  writes:
> On 16/09/2010 4:35 PM, Peter Roethlisberger wrote:
>> /usr/local/openssl/lib64/libssl.a: could not read symbols: Bad value

> Bad build of OpenSSL? What does:

Well, the real question here is why it's not picking up a shared-library
version of libssl instead.  By and large you don't want libpq linking
to a static version of openssl, so even if this configuration worked
it wouldn't be a good thing to do.

I'd get rid of the /usr/local version of openssl and use the
distro-provided one instead.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Sam Mason
On Thu, Sep 16, 2010 at 03:22:15PM +0200, Andreas wrote:
> We are talking about nearly 500.000 records with considerable overlapping.

Other things to consider is whether each one contains unique entries and
hence can you do a "best match" between datasets--FULL OUTER JOIN is
your friend here, but duplicates become a problem.

> It's not only typos to catch. There is variation in the way to write  
> things that not necessarily are wrong.
> e.g.
> Miller's Bakery
> Bakery Miller
> Bakery Miller, Ltd.
> Bakery Miller and sons
> Bakery Smith (formerly Miller)

Soundex is tolerant to quite a lot of this, but word order is important.
When I've had to do this before ~360k merging with ~80k addresses I've
gone with normalised postcodes (in the UK postcodes contain a nice mix
of letters and numbers meaning that I can be reasonable sure about
typos) and then gone through a reasonable chunk by hand to make sure
things are working "correctly".

Just thought; depending on your spacial sparsity, you may be able
to get away with trusting the zip code and checking when the soundex of
the name is different.

> and the usual
> Strawberry Street
> Strawberrystreet
> Strawberry Str.42
> Strawberry Str. 42
> Strawberry Str. 42-45

Soundex gets those all the same (and even '42-45 Strawberry Str'), so
that's easy.  In fact it completely ignores the numbers so you'll have
to do something specific about them.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Using libpq, floats and binary data

2010-09-16 Thread Merlin Moncure
On Wed, Sep 15, 2010 at 6:52 PM, Arjen Nienhuis  wrote:
> On Thu, Sep 16, 2010 at 12:37 AM, Merlin Moncure  wrote:
>> On Wed, Sep 15, 2010 at 6:18 PM, Arjen Nienhuis 
>> wrote:
>> > Hi,
>> > Inserting many of rows is almost always IO bound. Converting ints and
>> > floats
>> > to text is CPU bound and really fast anyway. To speed things up first
>> > look
>> > at things like indexes, how often you need to COMMIT or using COPY. Only
>> > then look at prepared statements and binary transfer modes. Else it's
>> > simply
>> > not worth the headache.
>>
>> That's an awfully broad statement, and untrue...many applications are
>> cpu bound. It's easier to scale storage than cpu after a point.  Also,
>> solid state storage is going to become increasingly common moving
>> forwards.
>>
>> Not all type receiving parsing is trivial as you claim; timestamps and
>> bytea for example are significantly cheaper to send in binary wire
>> format.  Anyways, libpqtypes gives you all the advantages without all
>> the fuss.  If you are really looking to shave cycles we allow you to
>> prepare the format string as well as prepare the statement before
>> sending it.  We wrote this interface for a reason: I'd say on average
>> it cuts down query time around 20% on average in addition to the other
>> advantages it provides.
>>
>> merlin
>
> I think we can agree on one thing: trying to do it without libpqtypes is a
> bad idea.
> Groeten, Arjen

Yeah, except in the special case of bytea, so you can avoid
escaping/encoding and the fairly large efficiency hit.  The
performance benefit of sending ints in binary vs text is barely
measurable.  Timestamps are the biggest win of the common types,
especially if you are starting from something that is itself a binary
time representation.

Another reason a lot of people try and eek every cycle out of the
query they can (preparing the statement and such) is to try and cut
down query execution time as much as possible even in cases where it
doesn't make much difference in terms of overall system load.   This
can happen for example when you have generated sql that crawls the
server record by record.  We all know this is often not the best way,
but when working under those constraints the method of query execution
becomes quite important.   exec prepared in binary vs exec params in
text can result in a greater than 50% reduction in query time in such
cases (some of that benefit is on the client).  If you're firing off
big queries (in terms of running time), obviously all this matters
very little.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Andreas

 Am 16.09.2010 13:18, schrieb Sam Mason:

On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote:

I need to clean up a lot of contact data because of a merge of customer
lists that used to be kept separate.

What to do depends on how much data you have; a few thousand and you can
do lots of fiddling by hand, whereas if you have a few tens of millions
of people you want to try and do more with code.


Thanks Sam,
I'll check this fuzzystrmatch.

We are talking about nearly 500.000 records with considerable overlapping.
It's not only typos to catch. There is variation in the way to write 
things that not necessarily are wrong.

e.g.
Miller's Bakery
Bakery Miller
Bakery Miller, Ltd.
Bakery Miller and sons
Bakery Smith (formerly Miller)

and the usual
Strawberry Street
Strawberrystreet
Strawberry Str.42
Strawberry Str. 42
Strawberry Str. 42-45


Regards
Andreas



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] value

2010-09-16 Thread Gissur Þórhallsson
>
> Yes.  You're using RULEs where TRIGGERs would do.  Change to TRIGGERs.
>

While this could possibly solve my problem in particular; it doesn't explain
why this is happening.
Is this somehow expected behavior on an INSERT rule?

 PS. It seems that I forgot to create a meaningful subject in my original
post, sorry about that,

Kind regards,
Gissur Þórhallsson

Loftmyndir ehf.
Laugavegur 13
IS 101 Reykjavík - Iceland
sími (tel): (+354) 540 2500
tölvupóstur (email): gis...@loftmyndir.is


Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Sam Mason
On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote:
> I need to clean up a lot of contact data because of a merge of customer  
> lists that used to be kept separate.
> I allready know that there are double entries within the lists and they  
> do overlap, too.
>
> Relevant fields could be  name, street, zip, city, phone

GROUP BY is your friend here; you basically want to normalise things as
much as possible and then GROUP BY counting number of duplicates and
where this count is greater than one you need to intervene somehow.

Humans are great at typos (2% of records seem to contain a typo of
some sort in my experience, with almost all of them not mattering)
so the first thing would be to correct the typos (or use algorithms
that are less susceptible to typos) and to start getting things
normalised.  Free form text is a bit of a fiddle to normalise, but the
fuzzystrmatch[1] module in PG can help with this.

An example of query that I do a lot of is:

  SELECT soundex(city), array_agg(DISTINCT city) -- array_accum for 8.3 and 
earlier
  FROM tbl
  GROUP BY 1
  HAVING COUNT(DISTINCT city) > 1
  ORDER BY 2 DESC;

Another common one is:

  SELECT t.*
  FROM tbl t, (
SELECT soundex(city) AS cty, soundex(name) AS name
FROM tbl
GROUP BY 1, 2
HAVING COUNT(DISTINCT name) > 1) x
  WHERE soundex(t.city) = x.city
AND soundex(t.name) = x.name;

I.e. find all the entries with similar sounding cities and names where
they have spelled their names differently.  You can then check through
and correct the entries where they really should be the same.

What to do depends on how much data you have; a few thousand and you can
do lots of fiddling by hand, whereas if you have a few tens of millions
of people you want to try and do more with code.

-- 
  Sam  http://samason.me.uk/

 [1] http://www.postgresql.org/docs/current/static/fuzzystrmatch.html

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] libssl issue ?

2010-09-16 Thread Craig Ringer

On 16/09/2010 4:35 PM, Peter Roethlisberger wrote:

I was able to configure Postgres Version 8.4.4. Now when I execute the make
I get the following:


It would be helpful if you provided a few basic details, like your OS, 
compiler & version, configure command line, etc.


You're clearly on some 64-bit SuSE, but that doesn't really tell anybody 
much.



/usr/local/openssl/lib64/libssl.a: could not read symbols: Bad value


Bad build of OpenSSL? What does:

 nm /usr/local/openssl/lib64/libssl.a

say.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] libssl issue ?

2010-09-16 Thread Peter Roethlisberger
I was able to configure Postgres Version 8.4.4. Now when I execute the make
I get the following:

/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld:
/usr/local/openssl/lib64/libssl.a(t1_meth.o): relocation R_X86_64_32 against
`a local symbol' can not be used when making a shared object; recompile with
-fPIC
/usr/local/openssl/lib64/libssl.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[3]: *** [libpq.so.5.2] Error 1
make[3]: Leaving directory `/tmp/postgresql-8.4.4/src/interfaces/libpq'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/tmp/postgresql-8.4.4/src/interfaces'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/tmp/postgresql-8.4.4/src'
make: *** [all] Error 2


Thansk: Peter