Re: [GENERAL] column names query

2017-09-08 Thread hamann . w
>> On Thu, Sep 7, 2017 at 3:28 PM  wrote:
>> 
>> > Example query
>> > select a.col1, case when a.col2 > 0 then a.col3 else b.xcol1 end as mycol3
>> > from a left join b on 
>> > Expected response
>> > col1 mycol3
>> >
>> 
>> This may be overkill, but works:
>> 
>> postgres=# \copy (select 1 as foo, 2 as bar limit 0) to stdout with (format
>> csv, header, delimiter ' ')
>> foo bar
>> 

Hi Manuel,

this works perfectly, many thanks

Wolfgang




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] column names query

2017-09-07 Thread hamann . w
>> SELECT table_name, column_name
>> FROM information_schema.columns
>> WHERE table_name = 'your_name';
>>  
>> - Original Message -
>> From: hamann w <haman...@t-online.de>
>> To: pgsql-general@postgresql.org
>> Sent: Thu, 07 Sep 2017 07:18:12 - (UTC)
>> Subject: [GENERAL] column names query
>> 
>> 
>> 
>> Hi,
>> 
>> is there a simple way to retrieve column names from a query - basically the 
>> way psql adds column headings when I do a select?
>> 
>> Best regards
>> Wolfgang Hamann
>> 
>> 
>> 
>> -- 
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>> 

Hi,

I am already using something like this (but using the pg_attribute table) to 
retrieve the current table layout
for static tables. At the moment I am looking for a solution for ad hoc queries.

Example query
select a.col1, case when a.col2 > 0 then a.col3 else b.xcol1 end as mycol3 from 
a left join b on 
Expected response
col1 mycol3

Obviously, I could save the result into a temp table and retrieve the table's 
heading.
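
In sketch form, that temp-table detour would be (made-up join column; the
LIMIT 0 keeps the table empty):

create temp table q as
  select a.col1,
         case when a.col2 > 0 then a.col3 else b.xcol1 end as mycol3
  from a left join b on a.id = b.id
  limit 0;

select column_name
  from information_schema.columns
 where table_name = 'q'
 order by ordinal_position;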

Best regards
Wolfgang Hamann 





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] column names query

2017-09-07 Thread hamann . w


>> On Thu, Sep 7, 2017 at 9:18 AM,   wrote:
>> > is there a simple way to retrieve column names from a query - basically 
>> > the way psql adds
>> > column headings when I do a select?
>> 
>> How do you do the query? I mean, JDBC, PERL? After all psql is just a
>> C program doing a query using libpq and can do it, we may provide some
>> useful info if you show yours first.
>> 
>> Francisco Olarte.

Hi,

I imagine this situation:
I start working on a query... when I am happy with the result, I see a table 
(with headers) in psql.
Now I can do
\copy (select  my query here) to /tmp/testfile1
I would like to do something like
\copy (select heading(select  my query here)) to /tmp/heading_testfile1

Best regards
Wolfgang Hamann




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] column names query

2017-09-07 Thread hamann . w


Hi,

is there a simple way to retrieve column names from a query - basically the way 
psql adds
column headings when I do a select?

Best regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Table create time

2017-08-31 Thread hamann . w
>> On 31/08/2017 09:56, haman...@t-online.de wrote:
>> > Hi,
>> >
>> > is there a way to add a table create (and perhaps schema modify) timestamp 
>> > to the system?
>> > I do occasionally create semi-temporary tables (meant to live until a 
>> > problem is solved, i.e. longer
>> > than a session) with conveniently short names.
>> In FreeBSD you'd do smth like this to find the file creation time :
>> ls -lU /data/PG_9.3_201306121/16425/12344
>> 
>> where 12344 is the filenode of the relation in question. In ext4 you may do 
>> this albeit with more difficulty.
>> 

Hello Achilleas,

many thanks for responding. There are two problems:
a) accessing the filesystem will likely require some extra effort (e.g. 
installing an untrusted programming
language)
b) a dump/restore will modify the dates 

best regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Table create time

2017-08-31 Thread hamann . w

Hi,

is there a way to add a table create (and perhaps schema modify) timestamp to 
the system?
I do occasionally create semi-temporary tables (meant to live until a problem 
is solved, i.e. longer
than a session) with conveniently short names.
Also, is there a simple query to identify tables without a table comment? (so a 
weekly cron
could remind me of tables that are already a few days old but have no 
explanatory comment)

I am running PG 9.3
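
For the comment part, I imagine something along these lines could do it (a sketch
that only looks at ordinary tables in the public schema):

select c.relname
  from pg_class c
  join pg_namespace n on n.oid = c.relnamespace
 where c.relkind = 'r'
   and n.nspname = 'public'
   and obj_description(c.oid, 'pg_class') is null;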

Best regards
Wolfgang Hamann




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How pg_dump works

2016-10-10 Thread hamann . w
>> Hi all,
>>  I need to pg_dump a database to another server.
>> The particularity is that the database is bigger than remaining space on 
>> disk. Obviously, this is not a problem because i'm dumping to another 
>> host, but I need to know if the procedure streams data to remote host or 
>> the data itself is packed up in temporary file(s) that are sent to 
>> remote host.
>> The former would be such a problem, because there are good chances I'll 
>> have my disk filled up...
>> 
>> I've not found details on this in documentation.
>> 
>> Thanks
>> Moreno.
>> 
Hi Moreno,

it is one big stream.
I am sending a compressed pg_dump and can actually use
zcat dumpfile.gz | psql mydatabase
on the other end.
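
If there is no room for a dump file at all, the dump can also be piped straight
to the other host, e.g. (a sketch, host and database names made up):

pg_dump mydatabase | gzip | ssh otherhost 'zcat | psql mydatabase'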

Regards
Wolfgang




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] SQL help - multiple aggregates

2016-08-18 Thread hamann . w

Hi,

I have a table cv with custid and vendid columns. Every entry represents the 
purchase of a product 
available from a specific vendor.
Now, for a set of "interesting" vendors, I would like to select a new table
custid, c415, c983, c1256
based upon part queries
select custid, count(vendid) as c415 from cv where vendid = 415 group by custid

The only way i managed to achieve that was

select distinct custid into temp table cv1 from cv;
alter table cv1 add column c415 int;
update cv1 set c415 = part.c415 from
 (select custid, count(vendid) as c415 from cv where vendid = 415 group by 
custid) part
 where cv1.custid = part.custid;
and repeating the process for every column requested

Is there a better way (by creating an aggregate function, perhaps)?
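
What I would like is a single pass over cv, something like (a sketch; on 9.4 or
later the FILTER clause could replace the CASE expressions):

select custid,
       count(case when vendid = 415  then 1 end) as c415,
       count(case when vendid = 983  then 1 end) as c983,
       count(case when vendid = 1256 then 1 end) as c1256
  from cv
 group by custid;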

Regards
Wolfgang




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Slow SQL?

2016-07-12 Thread hamann . w
Hi Bjorn,

I have experienced that some subqueries can be quite slow, and would suspect 
the NOT IN
clause. I occasionally rewrite 
NOT IN (select key from unwanted_candidates)
as
IN (select key from possible_candidates except select key from 
unwanted_candidates)

Admittedly, I am not running latest version
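
Spelled out against the tables in the query below, the rewrite is (a sketch;
note that NOT IN and NOT EXISTS treat NULLs differently):

-- NOT IN, as in the original query
select a.avlsnr from sem_avlsverdi a
 where a.avlsnr not in (select avlsnr from dyr_pause_mot);

-- the EXCEPT form suggested above
select a.avlsnr from sem_avlsverdi a
 where a.avlsnr in (select avlsnr from sem_avlsverdi
                    except
                    select avlsnr from dyr_pause_mot);

-- or NOT EXISTS, which the planner can turn into an anti-join
select a.avlsnr from sem_avlsverdi a
 where not exists (select 1 from dyr_pause_mot p where p.avlsnr = a.avlsnr);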

Regards
Wolfgang Hamann

>> I am trying to move a small system from Oracle to PostgreSQL and I have come 
>> upon a sql that runs really slow compared to on the Oracle database and
>> I am not able to interpret why this is slow.
>> 
>> The SQL looks like this:
>> 
>> 
>> Select a.status, a.plass, a.navn, a.avlsnr,
>>date_part('day',(now() - s.dato)) dato_diff, v.tekst, 
>> COALESCE(a.avlsverdi,0)
>> From   sed_uttak s, sem_avlsverdi a, semin_vare v
>> where a.aktiv = 1
>> Ands.dato  = (Select Max(y.dato)
>>   From sed_uttak y
>>   Where y.avlsnr = s.avlsnr)
>> Anda.avlsnr = s.avlsnr
>> Ands.sedtype  = v.tallkode
>> And a.avlsnr Not In (Select avlsnr
>> From dyr_pause_mot)
>> Union
>> Select a.status, a.plass, a.navn, a.avlsnr,
>>date_part('day',(now() - s.dato)) dato_diff, 'Tappe pause', 
>> COALESCE(a.avlsverdi,0)
>> From   sed_uttak s, sem_avlsverdi a,  dyr_pause_mot p
>> Where s.dato = (Select Max(x.dato)
>>   From sed_uttak x
>>   Where x.avlsnr = s.avlsnr)
>> And  a.avlsnr = s.avlsnr
>> And  a.avlsnr = p.avlsnr
>> Union
>> Select a.status, a.plass, a.navn, a.avlsnr, null dato_diff, 'IKKE TAPPET', 
>> COALESCE(a.avlsverdi,0)
>> From   sem_avlsverdi a
>> Where a.aktiv = 1
>> And   a.avlsnr Not In (Select avlsnr From sed_uttak)
>> And   a.avlsnr Not In (Select avlsnr From dyr_pause_mot)
>> 
>> 
>> 
>> And the explain result looks like this:
>> 
>> HashAggregate  (cost=7288068.92..7288092.10 rows=2318 width=36) (actual 
>> time=10740.366..10741.879 rows=6475 loops=1) 
>>   
>>   Group Key: a.status, a.plass, a.navn, a.avlsnr, (date_part('day'::text, 
>> (now() - (s.dato)::timestamp with time zone))), v.tekst, 
>> (COALESCE((a.avlsverdi)::integer, 0))   
>>   ->  Append  (cost=1.46..7288028.35 rows=2318 width=36) (actual 
>> time=0.203..10730.906 rows=8915 loops=1) 
>>  
>> ->  Nested Loop  (cost=1.46..7274678.41 rows=698 width=82) (actual 
>> time=0.203..10638.870 rows=8602 loops=1) 
>>
>>   Join Filter: (s.sedtype = v.tallkode)  
>>  
>>  
>>   Rows Removed by Join Filter: 127006
>>  
>>  
>>   ->  Nested Loop  (cost=1.46..7274438.07 rows=698 width=26) 
>> (actual time=0.189..10607.509 rows=6164 loops=1) 
>>  
>> ->  Seq Scan on sem_avlsverdi a  (cost=1.04..153.19 
>> rows=3238 width=16) (actual time=0.024..4.027 rows=6474 loops=1) 
>>   
>>   Filter: ((NOT (hashed SubPlan 5)) AND (aktiv = 1)) 
>>  
>>  
>>   Rows Removed by Filter: 3  
>>  
>>  
>>   SubPlan 5  
>>  
>>  
>> ->  Seq Scan on dyr_pause_mot dyr_pause_mot_1  
>> (cost=0.00..1.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1)   
>>
>> ->  Index Scan using idx_seduttak_avlsnr on sed_uttak s  
>> (cost=0.42..2246.53 rows=1 width=14) (actual time=1.404..1.637 rows=1 
>> loops=6474) 
>>   Index Cond: (avlsnr = a.avlsnr)
>>  
>>  
>>   Filter: (dato = (SubPlan 4))   
>>  
>>  
>>   Rows Removed by Filter: 42 
>>  
>>  
>>   SubPlan 4  
>>

Re: [GENERAL] Running query without trigger?

2016-07-11 Thread hamann . w

>> On 2016-07-09 08:20, haman...@t-online.de wrote:
>> > Hi,
>> >
>> > a table is associated with a trigger for normal use.
>> > An admin (someone with ALTER privilege) can disable tthe trigger, run some 
>> > bulk update,
>> > and then re-enable it. This means, however, that  normal user activity has 
>> > to be locked out.
>> >
>> > There are two possible scenarios: the bulk update would not cause trigger 
>> > activity at all,
>> > because of the values and columns involved.
>> > or - the bulk update is followed by another bulk transaction that  is 
>> > equivalent to trigger
>> > invocations per row.
>> > At least in the first case, running this particular query without 
>> > triggering the trigger,
>> > but normal activity still going on, would be really great
>> > Is there a way to achieve this?
>> >
>> > Regards
>> > Wolfgang Hamann
>> 
>> Well for temporary disabling triggers the easiest is to run:
>>  SET session_replication_role = replica;
>>  UPDATE ...
>>  SET session_replication_role = DEFAULT;
>> 
>> This only affects the current session i.e. you and not any other 
>> sessions which is what you seemed to require. All this assuming you run 
>> a supported version of postgres
>> 
>> Regards,
>> Christian Elmerot, Systems Engineer One.com
>> 
Hello Christian,

thanks for the tip.
Out of the suggestions I got, specifying a list of relevant columns seems to be 
the best one
for my specific case. I believe your suggestion will be helpful in other - more 
ad hoc -
situations

Regards
Wolfgang








-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Running query without trigger?

2016-07-09 Thread hamann . w

Hi,

a table is associated with a trigger for normal use.
An admin (someone with ALTER privilege) can disable the trigger, run some bulk 
update,
and then re-enable it. This means, however, that normal user activity has to 
be locked out.

There are two possible scenarios: the bulk update would not cause trigger 
activity at all,
because of the values and columns involved;
or - the bulk update is followed by another bulk transaction that is 
equivalent to the trigger
invocations per row.
At least in the first case, running this particular query without triggering 
the trigger,
while normal activity still goes on, would be really great.
Is there a way to achieve this?

Regards
Wolfgang Hamann




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Skip trigger?

2016-05-01 Thread hamann . w

>> On Sat, Apr 30, 2016 at 1:38 AM,   wrote:
>> > I have a table with a row update trigger that is quite slow.
>> > The trigger function basically sets some bits in a "changed" column
>> > depending on which values really changed.
>> > For some bulk updates it can be determined in advance that the
>> > trigger function will not have any effect.
>> > Is there any way to run an update query and specify that it should not
>> > activate the trigger.
>> > I know that I can disable the trigger and reenable it later;
>> > however other concurrent updates might need it
>> 
>> Indeed the main issue is how you want to handle concurrency.  ALTER
>> TABLE statements to disable triggers works and is transactional, but
>> locks the table, which may be undesired.  Here are some useful
>> pointers: 
>> http://blog.endpoint.com/2015/07/selectively-firing-postgres-triggers.html
>> 
Hi Manuel,

many thanks ... this seems to be just what I was looking for. I will give it a 
try tomorrow

Regards
Wolfgang Hamann





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Skip trigger?

2016-05-01 Thread hamann . w
>> On 2016-04-30 02:08, wolfg...@alle-noten.de wrote:
>> > Hi,
>> > 
>> > I have a table with a row update trigger that is quite slow.
>> > The trigger function basically sets some bits in a "changed" column
>> > depending on which values really changed.
>> > For some bulk updates it can be determined in advance that the
>> > trigger function will not have any effect.
>> > Is there any way to run an update query and specify that it should not
>> > activate the trigger.
>> > I know that I can disable the trigger and reenable it later;
>> > however other concurrent updates might need it
>> 
>> 
>> I always disable the trigger, run the update, and enable the trigger
>> within a transaction. This locks the table and will prevent other
>> sessions from doing updates without the trigger (I run it during
>> off-hours if it is going to take more time than is acceptable).
>> 
Hi Stephen,
this is what I do now occasionally. However, I would like to be able to run 
some things right away.
The solutions in 
http://blog.endpoint.com/2015/07/selectively-firing-postgres-triggers.html
look very promising
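
If I understand it right, that variant is roughly (a sketch, all names made up):

begin;
alter table mytable disable trigger my_changed_trigger;  -- takes an exclusive lock
update mytable set price = price * 1.1;                  -- the bulk update
alter table mytable enable trigger my_changed_trigger;
commit;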

Regards
Wolfgang Hamann





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] running script on the server

2015-07-28 Thread hamann . w

Hi,

can any of the programming languages (perl, tcl) be used to exec a process on 
the server
and open files?
The scenario I have in mind would
- copy some fields of a table to an external text file
- have a script generate binary data from the text file
- load that data into a large object
The client would just
select rebuild()
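
A very rough sketch of such a rebuild() in plperlu (file names and the external
converter program are made up):

create or replace function rebuild() returns oid as $$
    # plperlu may open files and run programs on the server
    my $rows = spi_exec_query(q{select id, txt from mytable})->{rows};
    open my $fh, '>', '/tmp/rebuild_input.txt' or elog(ERROR, "open: $!");
    print {$fh} "$_->{id}\t$_->{txt}\n" for @$rows;
    close $fh;
    # external converter producing the binary blob
    system('/usr/local/bin/make_blob', '/tmp/rebuild_input.txt',
           '/tmp/rebuild_output.bin') == 0 or elog(ERROR, 'make_blob failed');
    # import the result as a new large object and hand back its oid
    my $lo = spi_exec_query(q{select lo_import('/tmp/rebuild_output.bin') as loid});
    return $lo->{rows}[0]{loid};
$$ language plperlu;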

Best regards
Wolfgang Hamann

--


gpg key: 92FCFB25




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [SOLVED] running script on the server

2015-07-28 Thread hamann . w
Adrian Klaver wrote:
 On 07/28/2015 08:28 PM, haman...@t-online.de wrote:
 
  Hi,
 
  can any of the programming languages (perl, tcl) be used to exec a process 
  on the server
  and open files?
 
 You are looking for untrusted.
 
 For Python that is the only way it comes, hence plpythonu(ntrusted)
 
 For Perl see here:
 
 http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html
 
 That is all I know about.
 
  The scenario I have in mind would
  - copy some fields of a table to an external text file
  - have a script generate binary data from the text file
 
 So what exactly does the above step entail?
 
 Just wondering if it could be done without leaving the database?

Hi Adrian,
the untrusted part seems to be the solution.
The script consists of a perl part - text preprocessing likely to be changed -
as well as a C program that actually builds the blob.
I have a little experience with building postgresql binary modules,
but I would prefer to keep the - tested - program separate.
Copying data from the table and text preprocessing could run in plperl

 
  - load that data into a large object
  The client would just
  select rebuild()

Best regards
Wolfgang Hamann

--


gpg key: 92FCFB25




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_attribute growing extremely

2014-06-05 Thread hamann . w

Hi,

on a server running 8.4 I observe that, since a while, the pg_attribute table 
is growing
enormously. 
Soon after reloading I have one file
ls -s 1249
1048580 1249
a day later this is 
1048580 1249
1048580 1249.1
1048580 1249.10
1048580 1249.11
1048580 1249.12
1048580 1249.13
1048580 1249.14
1048580 1249.15
 682212 1249.16
1048580 1249.2
1048580 1249.3
1048580 1249.4
1048580 1249.5
1048580 1249.6
1048580 1249.7
1048580 1249.8
1048580 1249.9
   4316 1249_fsm
 24 1249_vm
and 5 days later the system had arrived at 102 files

The server is running since quite some time, without any major change.
The workload is mostly reading, a little text updates and a little more binary 
updates.
Every two weeks or so the database is recreated and reloaded
The total size of data is growing slowly (say a rate of 10% per year)
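
To see whether this is dead-tuple bloat rather than genuinely more attribute
rows, I suppose something like this would show it (a sketch):

select pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute')) as size,
       n_live_tup, n_dead_tup, last_autovacuum
  from pg_stat_sys_tables
 where relname = 'pg_attribute';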

Regards
Wolfgang Hamann





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Debugging of C functions

2013-11-25 Thread hamann . w
 Hi,
 
 How is it possible to debug user defined functions written in C?
 I can't just include all the files and run it.
 
 Does anyone have an idea?
 
 Janek Sendrowski
 

Hi,

I recall compiling with debug info as normal, adding a printf followed by a 30 sec 
sleep, and using the pause to attach gdb.
If it is at all possible, you should try to run the database in single-user mode 
rather than
normal server operation.
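
In C that pause is just a couple of lines near the top of the function under
test, compiled with -g (a sketch):

#include <stdio.h>
#include <unistd.h>

/* call this at the start of the function to be debugged */
static void
wait_for_debugger(void)
{
    fprintf(stderr, "pid %d: attach with \"gdb -p %d\" within 30 seconds\n",
            (int) getpid(), (int) getpid());
    fflush(stderr);
    sleep(30);
}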

Regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] incremental dumps

2013-08-10 Thread hamann . w
 On 8/1/13, haman...@t-online.de haman...@t-online.de wrote:
  Hi,
  I want to store copies of our data on a remote machine as a security
  measure.
 
 
  Wolfgang
 
 2 questions:
 
 1.  How secure is the remote site?
 2.  How much data are we talking about?
 --
 Mike Nolan

Hi Mike,

currently the source uses some 20 GB in a database partition and about 700 GB
in a general data partition. For the database, a diff -e grows to about 10% of 
the size
of a full dump in a week
The remote site is a raid box at a hosting center, with paid backup

Regards
Wolfgang




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] incremental dumps

2013-08-05 Thread hamann . w

Luca Ferrari wrote:
On Fri, Aug 2, 2013 at 6:55 PM,  haman...@t-online.de wrote:

 thanks for the hint - this is probably one of the things to do.
 I have something else in mind, but at present I just suspect that this might 
 happen:
 when I modify data and select _without an ordering_, I am pretty sure to get 
 the data
 in a different sequence than before. So I wonder whether forcing the dump to 
 honor
 a particular ordering (if that is at all possible) would also reduce the size 
 of dumps ... or the
 time diff takes to produce them



May I ask what is the final purpose? Because if it is to take a backup
I guess this is not the right way to go, while if it is keeping (and
rebuilding) an history of data, than using a more specific approach
(like logging) could give you less headaches.

Luca
--
Hi Luca,

we recently decided to have off-site backups rather than burning piles of DVDs 
that are kept
on-site. The backup server sits in a data center and is fed nightly via rsync.
The link is not too fast.
One thought in favor of text files: if disaster really strikes (the regular 
machine catches fire),
it is quite likely that a replacement would be installed with the latest versions 
of all software.
Now, if I had binary files, I would probably have to install the old version of 
the software
just to be able to do a regular dump and then reload into the newer one.
With the planned setup, I would be able to look up previous states of the 
database as well.
(Sample scenario: when was the price of product xyz actually changed?)
This is likely not too convenient ... but loading successive dumps into a 
secondary installation
of the database is definitely worse.

Regards
Wolfgang


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] incremental dumps

2013-08-02 Thread hamann . w


 On 08/01/2013 02:59 AM, haman...@t-online.de wrote:
 
  However, the diff files seem to be considerably larger than one would 
  expect.
  One obvious part of the problem is the fact that diff shows old and new 
  text,
 
 You could try using
   diff --suppress-common-lines -ed
 which in my experience creates the smallest diffs (actually ed scripts).
 Apply them with
   patch -e
 
 Martin


Hi Martin,

thanks for the hint - this is probably one of the things to do.
I have something else in mind, but at present I just suspect that this might 
happen:
when I modify data and select _without an ordering_, I am pretty sure to get 
the data
in a different sequence than before. So I wonder whether forcing the dump to 
honor
a particular ordering (if that is at all possible) would also reduce the size 
of dumps ... or the
time diff takes to produce them.
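
As far as I know pg_dump has no ordering option, but for individual tables an
ordered text export is easy to do from psql (a sketch, table names made up):

\copy (select * from products  order by id) to '/backup/products.txt'
\copy (select * from customers order by id) to '/backup/customers.txt'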

Regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] incremental dumps

2013-08-01 Thread hamann . w
Hi,
I want to store copies of our data on a remote machine as a security measure.
My first attempt was a full dump (which takes too long to upload)
followed by diffs between the pgdump files.
This provides readable / searchable versioned data (I could alway apply
the diffs on the remote machine and search the text file, without having
an instance of postgres running on that machine)

However, the diff files seem to be considerably larger than one would expect.
One obvious part of the problem is the fact that diff shows old and new text,
so e.g. changing the amount of stock for a product with a 1kB description
would generate at least 2kb of text in the diff file.

What would be a better approach?

Best regards
Wolfgang


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] File System backup

2013-06-23 Thread hamann . w
 yes, when I tried to start the postgresql service using init.d it gave me the
 error "removed stale pid", and postgresql failed to start.
 
 Regards,
 Itishree
 
 

Hi,

have you tried to execute the startup sequence step by step?
On systems without systemd, something like

sh -x /etc/init.d/postgresql start 2>&1 | less

should reveal the entire sequence. I don't know whether systemd has something to 
help;
I really only did such a step-by-step run once.

Regards
Wolfgang Hamann




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] how can this get faster

2013-04-22 Thread hamann . w

Hi,

I have a table with mainly a list of keywords and a few odd columns referring 
to their origin.
There is a standard btree index, and searches are carried out as anchored regex.
Entries in the table and queries are uppercased; this turned out to have a big 
speed
advantage over a ~* condition.
So queries are like:
select * from mytable where word ~ '^CARNEVAL'
The table currently holds 15.000.000 rows, and query times are on the order of 
10-20 msec

Now I am trying something different: I have a temp table with words appearing 
in the table
plus some extra data, and I want to get the other columns from that table too

select aux.col1, aux.col2, mytable.inf1, mytable.inf2 from aux, mytable where 
aux.word = mytable.word;

I specifically do
analyze aux;
before the operation. On a sample my aux table has 433 entries (different 
words), and I get
this behaviour
 Hash Join  (cost=12.74..321957.88 rows=472279 width=12) (actual time=0.530..2242.837 rows=29342 loops=1)
   Hash Cond: (mytable.word = aux.word)
   ->  Seq Scan on mytable  (cost=0.00..259779.89 rows=15317989 width=12) (actual time=0.009..808.577 rows=15317717 loops=1)
   ->  Hash  (cost=7.33..7.33 rows=433 width=14) (actual time=0.173..0.173 rows=433 loops=1)
         ->  Seq Scan on aux  (cost=0.00..7.33 rows=433 width=14) (actual time=0.004..0.063 rows=433 loops=1)
 Total runtime: 2243.924 ms

experimenting with different query data (i.e. different collections of words) 
reveals a
strong dependency on actual data. In particular, I split the above in two 
subtables.
The first one containing 45 rows produced
 Nested Loop  (cost=31.41..169444.22 rows=49082 width=12) (actual time=0.040..37.739 rows=590 loops=1)
   ->  Seq Scan on aux1  (cost=0.00..1.45 rows=45 width=18) (actual time=0.003..0.011 rows=45 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=31.41..3751.76 rows=1091 width=12) (actual time=0.014..0.054 rows=13 loops=45)
         Recheck Cond: (mytable.word = aux1.word)
         ->  Bitmap Index Scan on mytable_word_idx  (cost=0.00..31.14 rows=1091 width=0) (actual time=0.008..0.008 rows=13 loops=45)
               Index Cond: (mytable.word = aux1.word)
 Total runtime: 37.800 ms
while the second one (remaining 388 rows) resulted in another sequential scan 
on the large table
(and even no result within 3 minutes, when omitting the analyze step).

Is there a way to help the planner choose the index scan?
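
So far the only knobs I can think of are raising the statistics target on the
join column and checking what comes out when sequential scans are discouraged
for the session (a sketch):

alter table mytable alter column word set statistics 1000;
analyze mytable;

set enable_seqscan = off;   -- only for testing, not for production
explain analyze
select aux.col1, aux.col2, mytable.inf1, mytable.inf2
  from aux, mytable where aux.word = mytable.word;
reset enable_seqscan;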

Regards
Wolfgang Hamann




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pltcl and modules question

2013-03-27 Thread hamann . w


 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: RIPEMD160
 
 
  is there any way to use a module within a pltcl script, i.e. have
  load /path/to/mystuff.so
  or
  package require mystuff
  in a script.
 
 You can load tcl code by putting it in the pltcl_modules tables. See:
 
 http://www.postgresql.org/docs/9.2/static/pltcl-unknown.html
 
  Similarly, would it be possible to access loaded module from perl 
  script, i.e. have
  use mystuff;
 
 You can load any Perl module you want within a Pl/PerlU function.
 
 It's possible to access shared code with Pl/Perl, but it's a little 
 more involved. See:
 
 http://www.postgresql.org/docs/9.2/static/plperl-under-the-hood.html
 
 - -- 

Hi Greg,

thanks a lot, I will give it a try next week. Comparing the two references, I 
sort of fear
that there might be a problem when tcl wants to load a binary rather than a script 
module.
I can try pltclu as well

Regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pltcl and modules question

2013-03-26 Thread hamann . w

Hi,

is there any way to use a module within a pltcl script, i.e. have
load /path/to/mystuff.so
or
package require mystuff
in a script.
The reason why I am asking: I have recently converted a fairly slow script
(lots of regex) into one compiled module that basically does all regex at once, 
and critcl
made it fairly easy to build a .so or a tcl package out of the code
 
Similarly, would it be possible to access loaded module from perl script, i.e. 
have
use mystuff;
line

Regards
Wolfgang Hamann





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Optimizing query?

2013-02-02 Thread hamann . w
Pavel Stehule wrote:

 
 but maybe  https://github.com/dimitri/prefix can help
 

Hi Pavel,

thanks - this works perfect. However, it does not seem to play well
with the optimizer, so I ended up with

select all candidates into a temp table using prefix operator
apply all other conditions by joining that temp table to original ones

Regards
Wolfgang



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Deleting 173000 records takes forever

2013-02-01 Thread hamann . w

Alexander Farber wrote:
delete from pref_users
where id not in (select distinct id from pref_money);

Hi,

try instead
delete from pref_users
where id in (select id from pref_users except select id from pref_money);

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Optimizing query?

2013-01-31 Thread hamann . w

Pavel Stehule wrote:

  Hi,
 
  I am trying to match items from 2 tables based on a common string.
  One is a big table which has one column with entries like XY123, ABC44, etc
  The table has an index on that column.
  The second table is, typically, much smaller
 
  select  from tab1, tab2 where tab1.code = tab2.code;
 
  This works fine and fast.
  Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D
  in the
  big table and want them to match XY423, GF55 in the second table
 
  Variants I have tried
 
  select   from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
  select   from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');
 
 
  Have you tried the substring function?
 
  select   from tab1, tab2 where substring(tab1.code from 1 for 5) =
  tab2.code
 

Hi Pavel, it was just by chance that a fixed size substring would match the
data at hand. It is more common to have a digit/letter (or vice versa) boundary
or a hyphen there

 
  both take an enormous time. In the better case that I can subset (e.g. all
  candidates in table 2
  share initial AX) I get back to manageable times by adding
and tab1.code ~ '^AX'
  into the recipe. Actual runtime with about a million entries in tab1 and
  800 entries in tab2
  is about 40 seconds.
 
 any join where result is related to some function result can be very
 slow, because estimation will be out and any repeated function
 evaluation is just expensive.

I see the problem: obviously the ~ operator with a non-constant
pattern is constantly recompiling the pattern.

I wonder whether it would be possible to invent a prefix-match operator that 
approaches
the performance of string equality. I noted in the past (not sure whether 
anything
has changed in regex matching) that a constant leading part of regex would 
improve
performance, i.e. use an index scan to select possible candidates.
  
 You can try use a functional index.
 
 create index on tab2 ((substring(tab1.code from 1 for 5))
 

What kind of trick is that - mixing two tables into a functional index?
What would the exact syntax be for that?
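
If the prefix length really were fixed, I guess the index would have to live on
tab1 alone, and the join would repeat the same expression, something like (a
sketch):

create index tab1_code_prefix_idx on tab1 ((substring(code from 1 for 5)));

select tab1.code, tab2.code
  from tab1, tab2
 where substring(tab1.code from 1 for 5) = tab2.code;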

Regards
Wolfgang Hamann






-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] optimize query?

2013-01-30 Thread hamann . w
Hi,

I am trying to match items from 2 tables based on a common string.
One is a big table which has one column with entries like XY123, ABC44, etc
The table has an index on that column.
The second table is, typically, much smaller

select  from tab1, tab2 where tab1.code = tab2.code;

This works fine and fast.
Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D in the
big table and want them to match XY423, GF55 in the second table

Variants I have tried

select   from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])');
select   from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z])');

both take an enormous time. In the better case that I can subset (e.g. all 
candidates in table 2
share initial AX) I get back to manageable times by adding
  and tab1.code ~ '^AX'
into the recipe. Actual runtime with about a million entries in tab1 and 800 
entries in tab2
is about 40 seconds.

Regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] optimize query?

2013-01-30 Thread hamann . w

Bob Futrelle wrote:

If looking for the variants with a single suffixed character is all you'll
ever need to do:
Do a single pass on the large table, after creating a field, 'trunc' that
contains a truncated version of the item, e.g., adding XY423 to trunc for
each entry of the form XY423A, or XY423B, etc.  This would be a one-time
operation.

Hi Bob,

unfortunately it is not, because the suffix pattern is not always the same.
On the other hand, I tried to split the query into

select substring(code from '^[A-Z]+[0-9]+') as code into temp table t from tab1;
select ... from tab2, t where tab2.code = t.code

which definitely was a big improvement. I believe that most of the time
passing in a pattern like that would work

So thank you for that tip

Regards
Wolfgang


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Using POSIX Regular Expressions on xml type fields gives inconsistent results

2012-12-22 Thread hamann . w
 
 David Johnston pol...@yahoo.com writes:
  [optionally go look at  ~*  in the documentation at this point; or just
  try a simple flip-flop of the expression]
 
  SELECT 'Kumar' ~* 'kuma' - TRUE !!! (ah ha! I had the order reversed)
 
  SELECT any( ARRAY['Kumar','Gozales']::text ) ... wait...ANY doesn't work on
  the other side... :( 
 
  [At this point I'd confirm or question why ANY hasn't been made to go both
  ways but also realize that I will have to approach this in a different way
  to achieve my goal.]
 
 It's been awhile since I looked at the point in detail, but I seem to
 recall that there are fundamental syntactic-ambiguity reasons why the
 ANY/ALL part has to be on the righthand side of the comparison operator.
 
 There's a much easier fix to this problem though, which is to invent a
 reverse ~ operator that does POSIX comparison with the pattern on the
 left.  The hardest part of doing that for yourself is choosing a name
 for the reverse operator --- it just goes like
 
 create function reverse_regexeq(text, text) returns bool as
 'select $2 ~ $1' language sql strict immutable;
 
 create operator ~~~ (procedure = reverse_regexeq,
 leftarg = text, rightarg = text);
 
 and similarly for the case-insensitive version, and there you go:
 pattern ~~~ ANY (whatever) solves the problem.
 

Hi Tom,

while it might work to reverse the order in a regex, I have experienced severe 
slowdown
when the pattern comes from table data

Regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Backslashitis

2012-06-14 Thread hamann . w
Hi,

I have a column declared as array of text. I can get a single backslash into 
one of the array elements by
update ... set mycol[1] = E'blah \\here'
If I try to update the whole array
update ... set mycol = E'{blah \\here}'
the backslash is missing. I can get two backslashes there.
Is there a good way to solve the problem, other than rewriting my update script 
to do array updates one element at a time?
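
One candidate seems to be the ARRAY constructor, where each element is an
ordinary string literal and only one level of quoting applies (a sketch with a
made-up table):

-- array literal: the string parser and the array parser each eat one level
update mytab set mycol = E'{"blah \\\\here"}' where id = 1;
-- ARRAY constructor: one level only
update mytab set mycol = ARRAY[E'blah \\here'] where id = 1;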

Regards
Wolfgang Hamann

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Horizontal aggregation?

2012-04-14 Thread hamann . w


Hi,

I am looking for a feature that would select from a table with
k1  a
k1  b
k1  c
k2  a
k3  b
k3  c
something like
k1  a b c
k2  a
k3  b c
(Just all elements next to each other, with a space in between)
or perhaps an array output
k1  {a,b,c}
k2  {a}
k3  {b,c}

If such an operator exists, would there be a remove duplicates option?
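
In other words something like this, if such aggregates exist (a sketch, assuming
the columns are called k and v; string_agg needs 9.0 or later):

select k,
       string_agg(distinct v, ' ') as joined,   -- space-separated
       array_agg(distinct v)       as arr       -- array form
  from t
 group by k;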

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] non-static LIKE patterns

2012-04-14 Thread hamann . w
 
 haman...@t-online.de writes:
  Tom Lane wrote:
  If you want it to be bulletproof, what I'd think about is something like
 WHERE second.path LIKE quote_like(first.path)||'%'
 
  Just out of curiosity: wouldn't that (as well as using non-static like)
  be an enormous performance problem?
 
 Well, it won't be free, but I think you've already doomed yourself to
 a not-very-bright plan by using LIKE in this way at all.
 
 In any case, as a wise man once said, you can make it run arbitrarily
 fast if it doesn't have to give the right answer.  Correctness trumps
 any micro-optimization questions, so if you have to have prefix matching
 of this sort, it's gonna cost ya somehow.
 

Hi Tom,

I just stumbled across this question because I regularly come across problems 
that,
at first, look like they should be solved with non-static LIKE or REGEX patterns

I actually have two situations where I would need a better plan. One is, 
fortunately,
fairly static (mostly lookups, hardly inserts) for name matches. Many famous 
people
appear in different spellings, say these two musicians
Franz|Ferenc Liszt
Fr(e|y)der(ic|yk) Chopin
So the first plan would be to regex-compare the sought name against the first 
name (or last name)
regexes. Run-time is astronomical, though
My current approach is to
a) keep the regexes in a separate table/column, so names with a regex entry are 
handled in
a smaller query
b) reverse the query: for every regex (they are well-behaved in this context) I 
pre-create a pattern
so that my actual query becomes
where pre-made-pattern ~ searched_name
c) while preparing the pattern, a common initial character (the F for Franz 
and Ferenc) is identified
to build an index. In the rare case that the first letter is already different, 
there would be
two entries in the table. So the actual query can check for first letter before 
it does the regex.

The other situation, unfortunately, is ad-hoc queries where I cannot do that 
kind of preparation
typically, the DB would contain strings like XY4711A, XY271, XY17321AAA, and I 
want to
check whether an input like XY17321 matches a database entry up to the end of 
the
numerals. So I add [^0-9]*$ to the end of my candidates, select
  where candidate ~ entry-in-table
and go for a coffee or two

Of course I would prefer to see a pre-built solution do all that mess for me...

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Horizontal aggregation?

2012-04-14 Thread hamann . w


 
  Hi,
 
  I am looking for a feature that would select from a table with
 
  If such an operator exists, would there be a remove duplicates option?
 
  Regards
  Wolfgang Hamann
 
 
  select k, array_agg(distinct val ) from t;
 
   See the docs: http://www.postgresql.org/docs/9.1/static/sql-expressions.html#SYNTAX-AGGREGATES
 
 obviously I forgot the group by:
 
 
 select k, array_agg(distinct val ) from t group by k;

Hi Abel,

thanks a lot.

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] non-static LIKE patterns

2012-04-12 Thread hamann . w

Tom Lane wrote:

patrick keshishian pkesh...@gmail.com writes:
 Thanks for the quick reply. Would be tough choosing another
 reasonable ESCAPE character while dealing with paths. Will think
 more about this.

If you want it to be bulletproof, what I'd think about is something like

WHERE second.path LIKE quote_like(first.path)||'%'

where quote_like() is a function that inserts a backslash before each
backslash, percent, and underscore in the given value.  Probably not
hard to cons that up from regexp_replace().

regards, tom lane


Just out of curiosity: wouldn't that (as well as using non-static like)
be an enormous performance problem?
I tried something with normal ~ regex matching some time ago but
gave up on the idea pretty soon

Regards
Wolfgang Hamann

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] question about alternate ordering of results

2012-04-10 Thread hamann . w

Hi Tom,

declaring another operator class helped. At first, however,
results were sorted differently than expected. A little gdb session revealed that
in fact only the FUNCTION 1 entry in the operator class is used.

Regards
Wolfgang Hamann

 
 haman...@t-online.de writes:
  Now, in versions 8 and later the using <- is rejected,
  the ordering op needs to be the < or > member of a btree operator class. 
  What is needed to create the old behaviour again 
  - create a complete operator class, including new names for the unchanged 
  equals/not equals function?
 
 Yes.  It sounds like you have pretty much all the spare parts you need,
 you just have to collect them together into an opclass for each
 ordering you want.
 
  Is this relevant to performance?
 
 Somewhat, in that it helps the planner optimize ordering considerations.
 But IIRC the main argument for tightening it up was to catch mistakes
 wherein somebody says ORDER BY x USING <>, or some other operator that
 doesn't produce a consistent sort order.
 
  regards, tom lane





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pltcl and modules

2012-04-10 Thread hamann . w


Hi,

I recently found out about critcl, which does a fairly decent job of converting 
a C
fragment inside a tcl procedure into a loadable module.
Now, if I wanted to use the same code inside pltcl, would that be possible?
If yes, any special precautions other than getting the file paths right?

Regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] question about alternate ordering of results

2012-04-06 Thread hamann . w
Hi,

in the old days (version 7.x) I created a set of btree operators

create operator << (
  procedure = mytext_lt,
  leftarg = text,
  rightarg = text,
  commutator = >>,
  negator = >>=,
  restrict = scalarltsel,
  join = scalarltjoinsel
);

etc. for a complete operator class

create operator class mytext_ops
  for type text using btree as
 operator 1 << (text,text),

);

Some tables have an index using this operator class
create index blah_t  on blah using btree (col1 mytext_ops);

I also created a few extra operators
create operator <- (
  procedure = mytext_lt_x,
  leftarg = text,
  rightarg = text,
  commutator = >-,
  negator = >=-,
  restrict = scalarltsel,
  join = scalarltjoinsel
);
and could use them in a query, like

select  from blah order by col1;
select  from blah order by col1 using <<;
select  from blah order by col1 using <-;
(resulting in 3 different result orderings)

BTW: the difference between the two ops is natural ordering of numerics: if 
two
strings start to differ at 3rd vs 17th, the << returns them in alpha sort 
and the <-
produces proper numeric order

Now, in versions 8 and later the using <- is rejected,
the ordering op needs to be the < or > member of a btree operator class. 
What is needed to create the old behaviour again 
- create a complete operator class, including new names for the unchanged 
equals/not equals function?
- adding another index to get the feature

Is this relevant to performance? I guess that an index using one opclass and 
ordering using the
other one would result in an index scan to locate the data and then an extra 
ordering pass to
order them the other way

Regards
Wolfgang Hamann





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] can these queries be combined into one?

2012-03-25 Thread hamann . w


Hi,

I am currently doing something like

select ordercode, descr, codes into temp table x from products where ...
Here codes is a bit-mapped field
update x set codes = codes | 512 from othertable t where ordercode = 
t.ordercode and 
select * from x

Is there a way to avoid that temp table?
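
What I am hoping for is a single pass with an outer join, roughly like this (a
sketch; the real WHERE and join conditions are elided above, so stand-ins are
used, and it assumes at most one matching row per ordercode in othertable):

select p.ordercode,
       p.descr,
       p.codes | case when t.ordercode is null then 0 else 512 end as codes
  from products p
  left join othertable t on t.ordercode = p.ordercode
 where p.descr is not null;          -- stand-in for the original conditions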

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] table alias for update

2012-03-25 Thread hamann . w


is there a way to use a table alias for the target table in an UPDATE query?
I would like to avoid spelling out things like that in psql:

update myverylongtablename set col1 = r.col1repl from repltable r 
 where myverylongtablename.id = r.id and myverylongtablename.col2  0
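
What I would like to write is something along these lines (a sketch; the col2
condition is just a stand-in):

update myverylongtablename as m
   set col1 = r.col1repl
  from repltable r
 where m.id = r.id and m.col2 <> 0;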

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY and indices?

2012-03-13 Thread hamann . w

Scott Marlowe wrote:
2012/3/12 François Beausoleil franc...@teksol.info:
 Hi all,

 When using COPY FROM STDIN to stream thousands of rows (20k and more hourly), 
 what happens with indices? Are they updated only once after the operation, or 
 are they updated once per row? Note that I'm not replacing the table's data: 
 I'm appending to what's already there. I suspect batching writes will be 
 faster than writing each individual row using an INSERT statement.

Copy is faster, but if a single row fails (dup key etc) the whole copy fails.

Hi Scott,

my impression: it would be even faster to drop the indices, do the bulk copy,
and rebuild the indices after the fact.

Regards
Wolfgang Hamann

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] what Linux to run

2012-02-28 Thread hamann . w
 
 If we move to Linux, what is the preferred Linux for running Postgres
  on.  This machine would be dedicated to the database only.
  
  I'd like a recommendation for both a GUI hosted version and a non-GUI
  version.  I haven't used Linux in the past but did spend several years
  in a mixed Unix and IBM mainframe environment at the console level.
  
 
Hi,

one thing you might want to consider is system lifetime: some distro may be set 
up so that you
more or less have to reinstall within 2 years, if you plan to use update 
service - others may be
longer.
Now, fast development is great AND allows you to change to better hardware 
easily.
It does however mean that you might get surprised with a different postgres 
version at times
you don't really like it.
If you plan to install from source, this would not be of any concern.

regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] what Linux to run

2012-02-28 Thread hamann . w
 
 On Tue, 28 Feb 2012, haman...@t-online.de wrote:
 
  one thing you might want to consider is system lifetime: some distro may
  be set up so that you more or less have to reinstall within 2 years, if
  you plan to use update service - others may be longer. Now, fast
  development is great AND allows you to change to better hardware easily.
  It does however mean that you might get surprised with a different
  postgres version at times you dont really like it. If you plan to install
  from source, this would not be of any concern
 
 Wolfgang,
 
Most updates fix security vulnerabilities. If you keep current with those
 there's not a compelling need to upgrade the distribution itself unless you
 want to do so. There's a distinction between the distribution itself
 (kernel, and GNU tools) and the end-user applications bundled with the
 distribution. Also, the distributions with which I'm familiar allow you to
 select the applications to upgrade so you can avoid surprises.
 

Hi Rich,

if - after, say, 18 months - I no longer get updates (this seems to be the 
lifecycle of
the locally popular SuSE), it means that you either have to do an upgrade 
install
or forget about security fixes. Now the upgrade install might bring you some 
software
with incompatible changes, or might even replace some software you used to rely 
on
with something different.
After some unpleasant surprises I stopped upgrading: rather get a fresh box, 
install
everything there, and once it plays nicely, swap them.

Regards
Wolfgang




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] problem setting up

2012-02-27 Thread hamann . w
Hi, I am just trying to build a duplicate of our database server.
Like the main one, I compiled it from source, ran initdb, created a
superuser, and then proceeded to create a database.
However, db creation is denied to the superuser. Likewise,
if I create the db as user postgres and then change ownership, I still
do not have enough rights to, say, install a language.
The pg_database.datdba as well as pg_user tables seem to have correct values.
Where could I look next?
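
I suppose the next things to check would be along these lines (a sketch):

-- does the role really have the superuser / createdb bits set?
select rolname, rolsuper, rolcreatedb from pg_roles;
-- which role am I actually connected as?
select current_user, session_user;
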
Regards
Wolfgang Hamann

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Searching for bare letters

2011-10-02 Thread hamann . w
Reuven M. Lerner wrote:

 Hi, everyone.  I'm working on a project on PostgreSQL 9.0 (soon
   to be upgraded to 9.1, given that we haven't yet launched).  The
   project will involve numerous text fields containing English,
   Spanish, and Portuguese.  Some of those text fields will be
   searchable by the user.  That's easy enough to do; for our
   purposes, I was planning to use some combination of LIKE searches;
   the database is small enough that this doesn't take very much
   time, and we don't expect the number of searchable records (or
   columns within those records) to be all that large.
 
 The thing is, the people running the site want searches to work
   on what I'm calling (for lack of a better term) "bare letters".
   That is, if the user searches for "n", then the search should also
   match Spanish words containing "ñ".  I'm told by Spanish-speaking
   members of the team that this is how they would expect searches to
   work.  However, when I just did a quick test using a UTF-8 encoded
   9.0 database, I found that PostgreSQL didn't see the two
   characters as identical.  (I must say, this is the behavior that I
   would have expected, had the Spanish-speaking team member not said
   anything on the subject.)
 
 So my question is whether I can somehow wrangle PostgreSQL into
   thinking that "n" and "ñ" are the same character for search
   purposes, or if I need to do something else -- use regexps, keep a
   "naked", searchable version of each column alongside the native
   one, or something else entirely -- to get this to work.
 
   Any ideas?
 
   Thanks,
 
   Reuven

I had the same problem with German (there is ä ö ü).
I ended up with a normalized version of the database (for many purposes, this 
could
be just an extra column) plus preprocessing the input.
There is one difficulty with German searches: these letters are commonly 
transliterated into
"ue" etc, like in "Muenchen". So depending on culture, some people would expect a 
"u" search
term to match, and others the "ue". So preprocessing the query means replacing a 
bare "u"
(not followed by "e") with a "ue?" regex.

BTW: if your search form does not explicitly tell the browser to use utf8 to 
encode the search field,
you might expect a small proportion of iso-latin1 requests
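
In sketch form, the extra-column variant for German looks about like this
(made-up table and column names):

alter table people add column name_norm text;
update people
   set name_norm = replace(replace(replace(replace(lower(name),
                   'ä', 'ae'), 'ö', 'oe'), 'ü', 'ue'), 'ß', 'ss');
-- the query side preprocesses its input the same way
select * from people where name_norm like 'mue%';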

Regards
Wolfgang




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] how to improve this similarity query?

2011-09-30 Thread hamann . w
Hi,

I have a table of names, and searches are usually performed on prefix match.
This could nicely translate into an index search
Suppose first name is stored as either 'Jim' or 'Jimmy', searching 
... where firstname ~* '^jim';
gets proper result. I had hoped that creating a functional
index on lower(firstname) and using a query like
 where lower(firstname) ~ '^jim'
would improve the search, but it does not.
I ended up with adding a lowercased column for matching
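
A variant that should let the planner use an index even in a non-C locale is an
expression index with text_pattern_ops plus an anchored LIKE (a sketch, table
name made up):

create index names_lower_firstname_idx
    on names (lower(firstname) text_pattern_ops);
select * from names where lower(firstname) like 'jim%';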

Now a few names (a few percent of the entire lot) have alternates, like 'James'.
These could be nicknames, informal variants, language variants, alternate 
spellings

I have already split off these few percent into a separate table and can query 
that like
... where 'jim' ~* firstname_pattern;

There are two problems with this approach: when I use 'Jimbo' for the plain 
query,
I do not get any 'Jim' or 'Jimmy', as expected. To achieve the same effect on 
the pattern
query, I need to decorate the pattern somewhat. Actually, when I specify 
'J(im|ames)' for
the pattern, it gets preprocessed - and is stored in the database as - 
'J(im|am($|e($|s)))$'.
Unfortunately there are regex patterns which the preprocessing script cannot
handle, so
I might have to try a different regex.
The other, bigger, problem: the search cannot make use of an index, and it has
to compile
a regex for every entry in the table. I am considering a change to that part: 
in the Jim/James
case it is obvious that I could speed up the query with
 where firstname_pattern ~* '^j' and 'jim' ~* firstname_pattern;
If the pattern was 'Bob|Robert' instead, I would have to change the 
preprocessing so the
'B' and 'R' parts would be separate.
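
One way to make that split mechanical - the table names below are made up, and only
top-level alternations without nested groups are handled:

-- turn 'Bob|Robert' into two rows, each carrying its own first letter,
-- so the ^-prefix pre-filter works per variant
insert into firstname_alt (name_id, firstname_pattern)
select name_id, '^' || regexp_split_to_table(raw_pattern, E'\\|')
  from firstname_alt_raw;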

So, I wonder whether there is a better way of doing this. I have looked into tsquery
and full text search, but they do not seem to support prefix matches, only exact ones.
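
(Hedged aside, untested in this setup: since 8.4, to_tsquery seems to accept a prefix
marker, e.g.

select to_tsvector('simple', 'Jimmy') @@ to_tsquery('simple', 'jim:*');   -- true

so full text search might cover the prefix case after all.)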

Regards
Wolfgang Hamann

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] looking for a faster way to do that

2011-09-26 Thread hamann . w

Eduardo Morras nec...@retena.com wrote:

 
 At 08:04 25/09/2011, haman...@t-online.de wrote:
 
   select * from items where regexp_matches(items.code,'(ABC) (DE1)
   (any_substring)'){};
  
 
 Hi Eduardo,
 
 it is clear that scanning the table once with a list of matches will 
 outperform
 rescanning the table for every string wanted. Now, my problem is 
 that the patterns are
 dynamic as well. So if I could translate a table with one 
 column  and a few thousand rows
 into something like
 regexp_matches(code,'string1|string2|.string2781')
 would ideally be a performant query. Unfortunately I have no idea 
 how I could achieve this
 transformation inside the database. Doing it externally fails, 
 because any single query cannot
 be more than so few characters.
 
 You can create a plsql function and pass a setof text that do it. 
 Sorry but instead saying you What Must You Type, WMYT(c), i prefer 
 the How Should You Do way, HSYD(c). Note that you can get the same 
 results using other approachs (f.ex. using FTS described in chapter 12)
 
 Check this topics:
 
 Function 
 Creation  http://www.postgresql.org/docs/9.0/static/sql-createfunction.html
 
 Tutorial about Function 
 Creation  http://www.adderpit.com/practical-postgresql/x10374.htm
 

Hi,

I tried the PL/pgSQL approach to convert the contents of that patterns table into
a single regex.
Results: 40 seconds runtime for 9500 candidates and 815 patterns,
718 seconds for the same set of 9500 candidates but using 4000 patterns instead.
So it seems that I am reaching the limits of the pattern matcher.
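
For the record, the conversion can also be done in one statement instead of a loop -
assuming 9.0's string_agg, the items / n naming used in the test runs elsewhere in
this thread, and patterns free of unescaped regex metacharacters:

select i.num, i.code
  from items i,
       (select '^(' || string_agg(wantcode, '|') || ')' as pat from n) p
 where i.code ~ p.pat;

This removes the per-pattern rescan, but the combined regex still has to be compiled
and evaluated once per row, which is presumably where the 40 resp. 718 seconds go.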


As for the fulltext index (and the underlying tsquery): this is an exact match 
rather than prefix
match, so I would need to know match patterns in advance in order to build the 
index

I am thinking about that anyway (because ABC1234 likely should not match ABC123 
pattern
in my context), but I would sort of prefer a system where I can state the rules 
when I
see the data set, rather than having to pre-create an index.
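
A throwaway illustration of the kind of rule I mean (names as in the test tables):
treat the wanted code as a whole token, so ABC123 still catches ABC123G but not ABC1234:

select i.num, i.code
  from items i, n
 where i.code ~ ('^' || n.wantcode || '($|[^0-9])');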

Thanks for the tutorial link :)
It seems that the responses on my post give all sorts of input that will help 
me on other
tasks

Regards
Wolfgang Hamann






-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] looking for a faster way to do that

2011-09-26 Thread hamann . w


Alban Hertroys haram...@gmail.com wrote:

  Hi,

  the strings are not really known before.
  Let me explain the scenario; there is one table about products, and code is the
  manufacturer's (or resellers') product id.
  So, if ABC were a maker of laptops, ABC123 and ABC456 might be two different machines,
  and ABC123G might have G3 mobile installed, or ABC123X might be the same thing
  with extra memory. Obviously these device variants all look the same.
  Now reseller sends us a collection of product images, so there would be ABC123.jpg
  and ABC456.jpg
  The database task at hand is matching products to images (and then inserting the image
  name into a column of the products table).


 I guessed right then. The origin of your problem is that you have
 similar items in your database, but the database doesn't know they are
 similar. I'd fix that first, it makes the problem a whole lot easier to
 handle.

 For example, if an image comes in named ABC123G.jpg, you look up the
 product and manufacturer and update its image. Then you query for
 products of the same manufacturer that are similar to ABC123G (result:
 ABC123 and ABC123X) and update their images as well (if appropriate;
 perhaps they have a recent enough image of their own?).


 As another whacky alternative to your regular expressions; I think it
 would be possible to abuse the text-search functionality in Postgres to
 match product id's. Those id's are basically a language per manufacturer
 describing product details.

 If you can split the product id's up into lexemes that describe the base
 product id and it's options, then you can use full-text search to match
 up expressions similar to the lexemes derived from the image name.

 For example:
  productid | lexemes
 -----------+-------------------
  ABC123    | {'ABC' '123'}
  ABC123G   | {'ABC' '123' 'G'}
  ABC123X   | {'ABC' '123' 'X'}
  ABC456    | {'ABC' '456'}

 I'm not really sure if that's possible, or how much work it would be per
 manufacturer - I haven't used FTS much.

 I'd first see if I couldn't add that similarity information to the
 products table, though ;)
 
 Alban Hertroys
 
 --
 The scale of a problem often equals the size of an ego.
 
 

Hi,

the actual process tends to be
- add products to database
- then receive images and try to match them to products.

So I know about the naming scheme only when I see a list of files, and would 
want to have
a cmdline option for my matching script that distinguishes formats like
a) exact
b) alpha suffix following numeric main body
c) period or slash between main and related
To make things even more complex, I might receive images from a reseller that 
offers
a few manufacturers using different conventions.

I wonder whether this would translate well into building a temporary index, if 
I detect b or c
patterns.
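
Purely illustrative, with invented column names - a format b) rule boiled down to a
single anchored pattern per image:

-- image 'ABC123.jpg' should also claim suffix variants such as ABC123G or ABC123X
update items
   set image = 'ABC123.jpg'
 where code ~ '^ABC123[A-Z]*$';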

When I asked first, I also started working on a different approach. This did 
work on the
current case (I got similar datasets before, but with considerably fewer 
items), so I will
probably try that and some ideas I got from this discussion, and see how far I 
get.
This approach is a separate perl script that builds a tree structure of all the 
image names
and then tries to create a regex for a crude filter. In the example it would 
have
determined that all images in the lot match an ABC prefix. Then it selects all 
matching
codes (so it can work with the entire database) and runs them through the same 
tree
structure.

Regards
Wolfgang Hamann



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] looking for a faster way to do that

2011-09-25 Thread hamann . w
Eduardo Morras wrote:

 
 Hi,
 
 if I understand this right, it does not mean check if the string 
 appears at position 0
 which could translate into an index query, but rather check if the 
 string appears anywhere
 and then check if that is position 0, so the entire table is checked.
 
 The second one yes, as it checks all patterns you want only one time 
 per row they only needs one table scan. The first one eliminates the 
 substring 'ABC' from the string, if the lengths of both strings are 
 equal, the substring 'ABC' wasn't in it. If they are different, the 
 trimmed string will be shorter.
 
 explain analyze select items.num, wantcode from items, n where strpos(code, wantcode) = 0;
   Nested Loop  (cost=167.14..196066.54 rows=39178 width=36) (actual time=0.074..36639.312 rows=7832539 loops=1)
     Join Filter: (strpos(("inner".code)::text, "outer".wantcode) = 0)
     ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual time=0.005..2.212 rows=815 loops=1)
     ->  Materialize  (cost=167.14..263.28 rows=9614 width=42) (actual time=0.007..13.970 rows=9614 loops=815)
           ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.044..14.855 rows=9614 loops=1)
   Total runtime: 46229.836 ms
 
 
 The query ran much faster than the pattern query, however. This 
 seems to be the performance
 of just searching for a plain string vs. initializing the regex 
 engine every time (for 815
 queries in a test set)
 
 It will do only one table scan while your original code will do one 
 for each substring you want to test. You can add more and more 
 substrings without too much cost. If you want to use the regex engine 
 instead the postgresql string funtions check the regexp_matches(), it 
 should be faster if you have 3000 substrings.
 
 select * from items where regexp_matches(items.code,'(ABC) (DE1) 
 (any_substring)'){};
 

Hi Eduardo,

it is clear that scanning the table once with a list of matches will outperform
rescanning the table for every string wanted. Now, my problem is that the patterns are
dynamic as well. So if I could translate a table with one column and a few thousand rows
into something like
regexp_matches(code,'string1|string2|.string2781')
that would ideally be a performant query. Unfortunately I have no idea how I could
achieve this transformation inside the database. Doing it externally fails, because any
single query cannot be more than so few characters.

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] looking for a faster way to do that

2011-09-23 Thread hamann . w
Alban Hertroys haram...@gmail.com wrote:

 What is the output of explain?

 You say 'the other table', so presumably we're dealing with a foreign key
 here. Is there an index on that column?

Albe Laurenz wrote:

 Is the index used for where code ~ '^ABC3563'?
 
 If not, then the result is fast only because the table is scanned only once,
 and it's just the factor of 3000 that's killing you.
  
 The second query (where code ~ wantcode) can never use an index because
 the pattern wantcode is unknown at query planning time.
  
 Yours,
 Laurenz Albe


Here I created a subset (just number and code matching a certain prefix)

\d items
      Table "pg_temp_1.items"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 num    | integer               |
 code   | character varying(40) |
create index itemsc on items (code);

select count(*) from items;
 count
-------
  9614

A single anchored query
select * from items where code ~ '^ABC';
does indeed use the index to retrieve data.
 
Next I copied a file of wanted codes

create temp table n (wantcode text);
\copy n from /tmp/rmartin.tmp

the file contains plain names, i.e. unanchored matches

explain analyze select num, n.wantcode from items, n where items.code ~ n.wantcode;
 Nested Loop  (cost=20.00..216502.14 rows=48070 width=36) (actual time=148.479..336280.488 rows=2871 loops=1)
   Join Filter: (("outer".code)::text ~ "inner".wantcode)
   ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.048..38.666 rows=9614 loops=1)
   ->  Materialize  (cost=20.00..30.00 rows=1000 width=32) (actual time=0.001..1.049 rows=815 loops=9614)
         ->  Seq Scan on n  (cost=0.00..20.00 rows=1000 width=32) (actual time=0.003..1.839 rows=815 loops=1)
 Total runtime: 336286.692 ms

An exact match (where items.code = n.wantcode) on the same data completes in 40 ms.

BTW: indexing the second table does not affect the query plan or the runtime,
it just shows the actual row count rather than an estimate.

This is, of course, bad; an anchored match could be faster and also is more 
appropriate
to the scenario. So I change the contents of the second table

update n set wantcode = textcat('^', wantcode);

and try again, with similar results
 Nested Loop  (cost=14.15..176478.01 rows=39178 width=36) (actual time=125.114..308831.697 rows=2871 loops=1)
   Join Filter: (("outer".code)::text ~ "inner".wantcode)
   ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.061..2034.572 rows=9614 loops=1)
   ->  Materialize  (cost=14.15..22.30 rows=815 width=32) (actual time=0.001..1.095 rows=815 loops=9614)
         ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual time=0.114..1.893 rows=815 loops=1)
 Total runtime: 308837.746 ms


I am aware that this is unlikely to work fast (the planner would perhaps need a hint
in the query, rather than in the data column, to choose an anchored match algorithm -
in case there is such an algorithm, of course).

So I wonder whether there might be a different approach to this problem rather 
than
pattern matching.
I recall I had a similar problem before with a contacts column possibly containing one
or more email addresses. There, too, search cost would have been number of people times
number of requests. I finally ended up with a @@ match (contrib/tsquery) and a
supporting GiST index, but that only supports exact matches, not prefixes.
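
For later readers, a rough sketch of that @@ / GiST setup with invented names (written
against the core text search syntax; the old installation used the contrib module):

create index contacts_fts on contacts
    using gist (to_tsvector('simple', contact_emails));

select person_id
  from contacts
 where to_tsvector('simple', contact_emails)
       @@ plainto_tsquery('simple', 'someone@example.org');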

Regards
Wolfgang Hamann






-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] looking for a faster way to do that

2011-09-23 Thread hamann . w
Eduardo Morras wrote:

 You can try these, i doubt they will use any index but its a 
 different approach:
 
 select * from items where 
 length(items.code)length(rtrim(items.code,'ABC'));
 
 select * from items where strpos(items.code,'ABC')=0 or 
 strpos(items.code,'any_substring')=0;

Hi,

if I understand this right, it does not mean check if the string appears at 
position 0
which could translate into an index query, but rather check if the string 
appears anywhere
and then check if that is position 0, so the entire table is checked.
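
For the record, strpos() is 1-based and returns 0 when the substring is absent, so a
genuine prefix test would compare against 1:

select strpos('ABC3563X', 'ABC3563');   -- 1, found at the start
select strpos('ABC3563X', 'XYZ');       -- 0, not found at all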

explain analyze select items.num, wantcode from items, n where strpos(code, wantcode) = 0;
 Nested Loop  (cost=167.14..196066.54 rows=39178 width=36) (actual time=0.074..36639.312 rows=7832539 loops=1)
   Join Filter: (strpos(("inner".code)::text, "outer".wantcode) = 0)
   ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual time=0.005..2.212 rows=815 loops=1)
   ->  Materialize  (cost=167.14..263.28 rows=9614 width=42) (actual time=0.007..13.970 rows=9614 loops=815)
         ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.044..14.855 rows=9614 loops=1)
 Total runtime: 46229.836 ms


The query ran much faster than the pattern query, however. This seems to reflect the
cost of just searching for a plain string vs. initializing the regex engine every
time (for the 815 patterns in the test set).

Regards
Wolfgang Hamann

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] looking for a faster way to do that

2011-09-23 Thread hamann . w
Alban Hertroys wrote:

 So you're comparing a variable field value to a variable pattern - yeah,
 that's going to hurt. There's no way you could index exactly that.
 
 Perhaps there's some way you can transform the problem so that you get
 something indexable?
 For example, if your match patterns follow a certain pattern by themselves,
 you could add a column with the longest match pattern that would match the
 string. Then you could just do a query for which records have the match
 pattern (in that new column) that you're looking for and voila!
 
 If something like that is possible strongly depends on what kind of match
 patterns you're using, of course.

Hi Alban,

I already did that - the test set is just all records from the real table 
(about a million
entries) that match the common 'ABC' prefix

 An exact match  where items.code = n.wantcode on the same data completes
 in 40 ms


 That's an exact string match, of course that will be fast ;)

The main difference is: the fast query looks like

explain select items.num, wantcode from items, n where code = wantcode;
 Merge Join  (cost=53.56..1104.02 rows=39178 width=36)
   Merge Cond: (("outer".code)::text = "inner".wantcode)
   ->  Index Scan using itemsc on items  (cost=0.00..438.75 rows=9614 width=42)
   ->  Sort  (cost=53.56..55.60 rows=815 width=32)
         Sort Key: n.wantcode
         ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32)

and the slow one looks like this:

 Nested Loop  (cost=14.15..176478.01 rows=39178 width=36)
   Join Filter: (("outer".code)::text ~ "inner".wantcode)

So the database takes an entirely different approach to retrieving the entries.

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] looking for a faster way to do that

2011-09-23 Thread hamann . w
 
  Hi Alban,

  I already did that - the test set is just all records from the real table (about a million
  entries) that match the common 'ABC' prefix

 I think you misunderstood what I wrote. Notice the difference between
 which strings match the pattern and which records have the match
 pattern (in that new column) - the first is a regular expression
 match (unindexable), while the second is a string equality match
 (indexable).

 What I'm suggesting is to add a column, which for the string 'ABCDEFG'
 would contain 'ABC%'.
 Data would look like:

 SELECT str, pattern FROM tbl;
   str     | pattern
 ----------+---------
  ABCDEFG  | ABC%
  ABCDEF   | ABC%
  BCDEFGH  | BCD%
  etc.

 (can't format this properly in webmail, sorry)

 When you look for records that match the pattern 'ABC%', you would
 normally perform a query like:

 SELECT str FROM tbl WHERE str LIKE 'ABC%';

 But with this new column, you would query:

 SELECT str FROM tbl WHERE pattern = 'ABC%';

 As I said, it depends a lot on your pattern needs whether this
 solution would work at all for you. If you only ever use a few
 patterns, it will work. If you use many different patterns or don't
 know before-hand which patterns will be used, it won't work well at
 all.

  The main difference is: the fast query looks like

  explain select items.num, wantcode from items, n where code = wantcode;
   Merge Join  (cost=53.56..1104.02 rows=39178 width=36)
     Merge Cond: (("outer".code)::text = "inner".wantcode)
     ->  Index Scan using itemsc on items  (cost=0.00..438.75 rows=9614 width=42)
     ->  Sort  (cost=53.56..55.60 rows=815 width=32)
           Sort Key: n.wantcode
           ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32)

 Is there an index on wantcode? If you have a million or more records,
 I would expect an index scan for a measly 815 matches...

  and the slow one looks like this:

   Nested Loop  (cost=14.15..176478.01 rows=39178 width=36)
     Join Filter: (("outer".code)::text ~ "inner".wantcode)

  So the database takes an entirely different approach to retrieving the entries.

 Yes, because you're still using ~ there, with a pattern that's unknown
 at query planning time. That will only be fast under some fairly rare
 circumstances.

Hi,

the problem is that I read the patterns from a file, as part of the script. They are not
known separately. So it seems that creating the extra column is just the same effort as
running the original query. The processing is one-time in nature.
The one thing I can do is select a range of items on a common prefix, if all the
codes in the second table have some characters in common.
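
A sketch of that idea, with the common prefix determined beforehand outside the query
(names as in the earlier test):

select i.num, n.wantcode
  from items i
  join n on i.code ~ n.wantcode
 where i.code like 'ABC%';   -- indexable range filter from the common prefix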

 Is there an index on wantcode? If you have a million or more records,
 I would expect an index scan for a measly 815 matches..

When I ran a test, there was no real difference with wantcode indexed or not.
It was interesting to try another suggestion and to note the difference between comparison
functions, with identical (lack of) use of an index.

Regards
Wolfgang Hamann


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] looking for a faster way to do that

2011-09-21 Thread hamann . w


Hi,

I have one large table (about a million entries) with an indexed column 
containing codes
like ABC3561A, ABC3563X, 72-451-823 etc. (lots of order numbers from different
manufacturers)

When I ask for a specific item
select code  where code = 'ABC3563X'
I get a fast result. I also get a fast result when doing a prefix match
select code  where code ~ '^ABC3563'

If I am retrieving many items by joining with another table
select code . where code = wantcode
this is still fast.
If I try to get many items on a prefix match
select code  where code ~ wantcode
things go very slow. Explain shows a nested loop, so seemingly the table is rescanned
for every wanted item in the other table. A test run (3000 wanted codes against a
shortened table of 1 ones) took about 200 seconds to complete.

What other queries could I use to get the requested selection?

Regards
Wolfgang




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general