[GENERAL] arrays, inline to pointer

2016-05-03 Thread Marcus Engene

Hi,

I have some whopper tables for machine learning. One table has a handful 
of 200-500-element double precision arrays (representing feature vectors). 
It's a 9.5 on an SSD (over USB3). Each table has 5-15M rows in it.


For each array I've added and populated, any dealings with the table 
have become way, way slower. I can only assume this is because the array 
data is inline in the data block on disk that stores the row.


Is it possible to "have a pointer instead", or is the only workaround a 
child table with just the vector in it?
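
For what it's worth, TOAST is essentially that pointer: PostgreSQL moves a
value out of line once a row grows past roughly 2 kB, and the per-column
strategy can be forced. A minimal sketch, with hypothetical table and
column names:

ALTER TABLE big_table ALTER COLUMN features SET STORAGE EXTERNAL;
-- EXTERNAL stores the value out of line without trying to compress it
-- (float8 vectors barely compress anyway). Whether a column crosses the
-- threshold on its own depends on its size: a 300-element float8 array
-- is about 2.4 kB, so these vectors sit right at the border.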


Best regards,
Marcus





[GENERAL] efficient math vector operations on arrays

2015-12-23 Thread Marcus Engene

Hi,

Are there highly efficient C extensions out there for math operations on 
arrays? Dot product and whatnot.


Example use case: sort items by Euclidean distance.
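
For reference, a plain-SQL baseline (the kind of thing a C extension would
beat) could look like this sketch; unnest() over two arrays in parallel
needs 9.4+, and the names are hypothetical:

CREATE FUNCTION euclid_distance(a float8[], b float8[]) RETURNS float8 AS $$
    -- walks both arrays element by element; correct, but with per-row
    -- unnest overhead, which is what a C implementation would avoid
    SELECT sqrt(sum((x - y) ^ 2)) FROM unnest(a, b) AS t(x, y)
$$ LANGUAGE sql IMMUTABLE;

-- usage sketch: ten nearest items to a probe vector
-- SELECT objectid FROM items ORDER BY euclid_distance(vec, :probe) LIMIT 10;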

Kind regards,
Marcus





Re: [GENERAL] efficient math vector operations on arrays

2015-12-23 Thread Marcus Engene

On 24/12/15 07:13, Pavel Stehule wrote:

Hi

2015-12-24 8:05 GMT+01:00 Marcus Engene <meng...@engene.se>:


Hi,

Are there highly efficient C extensions out there for math
operations on arrays? Dot product and whatnot.


what do you mean by "highly efficient"?


Implemented as a C module, so I won't have to unnest or use plpgsql.

Kind regards,
Marcus



[GENERAL] Trigger to a queue for sending updates to a cache layer

2014-08-20 Thread Marcus Engene

Hi,

I'm working with a retail site with tens of millions of products in 
several languages.


For the detail pages, we try to cache in memcached. We also have quite a 
bit of keyword translation lookups (for international queries to solr).


We're thinking of adding a nosql layer that takes the big beating with 
the mundane reads. I don't fancy manually adding code to every possible 
place in the php code that touches the related tables.


What fast trigger solution would you use for this? I'm thinking of 
something that would just push (via fdw?) data to a queue: table name 
and item id. Naturally it should not slow down pg noticeably and should 
have stellar uptime.
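
One lightweight shape for this is a trigger that does nothing but
pg_notify() (9.0+), leaving the actual queue/nosql write to a listening
daemon outside the transaction path. A sketch, with a hypothetical
product table:

CREATE OR REPLACE FUNCTION queue_cache_invalidation() RETURNS trigger AS $$
BEGIN
    -- the message is just 'table:id'; the listener does the heavy lifting
    PERFORM pg_notify('cache_invalidate', TG_TABLE_NAME || ':' || NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_cache_inval
AFTER INSERT OR UPDATE ON product
FOR EACH ROW EXECUTE PROCEDURE queue_cache_invalidation();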


Is the whole concept a stupid idea?

Thanks,
Marcus





[GENERAL] copy expensive local view to an RDS instance

2014-05-06 Thread Marcus Engene

Hi,

I have a local db behind a firewall etc. Basically, I'd like to do what 
I would do locally...


create table abc
as
select
*
from
local_expensive_view;

abc - on RDS
local_expensive_view - on local machine

How would you go about doing this?
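
One approach that keeps the connection outbound-only (nothing behind the
firewall has to accept connections) is postgres_fdw on the local side,
available from 9.3. A sketch; the endpoint, credentials and column list
are hypothetical:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER rds FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'myinstance.xyz.rds.amazonaws.com', dbname 'mydb');

CREATE USER MAPPING FOR CURRENT_USER SERVER rds
    OPTIONS (user 'me', password 'secret');

-- columns must match the abc table on the RDS side
CREATE FOREIGN TABLE abc_remote (objectid int, payload text)
    SERVER rds OPTIONS (table_name 'abc');

INSERT INTO abc_remote SELECT * FROM local_expensive_view;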

Thanks,
Marcus





Re: [GENERAL] copy expensive local view to an RDS instance

2014-05-06 Thread Marcus Engene

On 06/05/14 16:58, bricklen wrote:


A very quick search shows that rds supports dblink, so perhaps that 
would work.

http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html


Then I'd need to open our servers to external visits. It would be lovely 
if dblink_exec could push a subselect of data instead of pulling 
from RDS. Does this make sense?


Thanks,
Marcus





Re: [GENERAL] copy expensive local view to an RDS instance

2014-05-06 Thread Marcus Engene

On 06/05/14 17:15, bricklen wrote:
On Tue, May 6, 2014 at 8:07 AM, Marcus Engene <meng...@engene.se> wrote:


On 06/05/14 16:58, bricklen wrote:


A very quick search shows that rds supports dblink, so perhaps
that would work.

http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html


Then I'd need to open our servers to external visits. It would be
lovely if dblink_exec could push a subselect of data instead
of pulling from RDS. Does this make sense?


Is the idea to pull data from the RDS to your local machine? If so, 
dblink or possibly plproxy[1] should be able to do that.


[1] https://wiki.postgresql.org/wiki/PL/Proxy

Sorry, no, I want to push data from my local machine, but I've only seen 
examples of push with dblink_exec and literal values rather than a 
subselect or somesuch.


Best regards,
Marcus



[GENERAL] insert data over dblink from bulk select of local table

2014-03-25 Thread Marcus Engene

Hi,

I have a production database with some heavy views.

I'd like to make an hourly cron (or something) that bulk inserts new entries 
into a table on RDS.


It's easy to get data down to the prod side, but is it possible to do a 
bulk insert the other way around? All examples I've seen add the data 
as literal values.
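
A sketch of one way to push a bulk insert without writing the literals by
hand: build a single multi-row INSERT from the local query and send it with
dblink_exec from the local side (format()/%L need 9.1+; the connection
string, table and columns are hypothetical):

SELECT dblink_exec(
    'host=myinstance.xyz.rds.amazonaws.com dbname=mydb user=me password=secret',
    'INSERT INTO abc VALUES ' ||
    string_agg(format('(%L, %L)', t.objectid, t.payload), ', '))
FROM (SELECT objectid, payload FROM heavy_view) t;
-- %L quotes each value (NULLs included), so the statement stays valid SQL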


Thanks,
Marcus





[GENERAL] varchar lengths

2010-09-21 Thread Marcus Engene

Hi list,

In Oracle I can...
create table a
(
 b varchar2(10 chars)
);
...and then, regardless of character encoding and how much space an 
ASCII character vs. an ö takes, 10 characters will fit there.


If I do say a web-thing in php I have to do horrors like...
if (10 < mb_strlen($b, '8bit'))
{
    // fail to explain to the user why it's too long
}

I could be liberal with the length and instead do a...
create table a
(
 b varchar(40)
);
...but I like constraints and this isn't a mysql list so I don't have to 
motivate that.


Is there anything I've misunderstood? How do the rest of you deal with 
this situation?
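
For the record, and as the follow-up below confirms: PostgreSQL's
varchar(10) limits characters, not bytes, whatever the server encoding.
A quick check:

create table a
(
 b varchar(10)
);
insert into a values ('öööööööööö');          -- fits: 10 characters
select char_length('öööööööööö')  as chars,   -- 10
       octet_length('öööööööööö') as bytes;   -- 20 in a UTF-8 database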


Thanks,
Marcus



Re: [GENERAL] varchar lengths

2010-09-21 Thread Marcus Engene

On 9/21/10 1:29 , Terry Lee Tucker wrote:

On Tuesday, September 21, 2010 07:23:45 Massa, Harald Armin wrote:

I recommend to use TEXT as the type for that kind of column.
99 out of 100 theories of "this value will never be longer than xx
characters" fail in the long run.

And text, limited only by PostgreSQL's limits, performs as well as or
better than varchar(length_limit). The times of "we can only allow n
chars for first name for performance reasons" have gone by, together
with walkmen and VHS.

Harald

Also, if you are absolutely set on a constraint on the length of the text, you
can use a trigger for this and when the constraint changes, and it will, you
simply modify the trigger.

Thanks for your answers!

Richard was completely right of course. I hadn't actually tested this 
since 8.0 but now it works splendidly. Apologies for the noise.


I do use text in several places, but in some, where it's motivated, I 
like to use constrained lengths. Be it tables that hold data that 
goes to legacy systems, indexed columns (such as username), or the name 
of a product.


Have a lovely Tuesday everyone,
Marcus




[GENERAL] Plan for in with one argument

2010-07-11 Thread Marcus Engene

Hi List,

With automated queries where I have COLUMN IN (), I get a different plan 
from COLUMN = (). That would make sense if there are several arguments, 
but in the case with just one argument the penalty can be severe. One 
query went from 5s execution time to a few hundreds of ms when I 
changed IN to = if the number of arguments is 1.


Is there a technical reason for not treating IN with one argument as = 
in that case?


pondDump=> explain analyze select
pic.objectid as pic_objectid
from
pond_item_common pic
where
pic.pond_user IN (select pu2.objectid from pond_user pu2 where 
username_locase IN ('iceberger'))

limit 100;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15.41..396.89 rows=100 width=4) (actual time=0.047..0.061 rows=11 loops=1)
   ->  Nested Loop  (cost=15.41..1400.19 rows=363 width=4) (actual time=0.046..0.056 rows=11 loops=1)
         ->  HashAggregate  (cost=8.28..8.29 rows=1 width=4) (actual time=0.026..0.026 rows=1 loops=1)
               ->  Index Scan using pond_user_c2 on pond_user pu2  (cost=0.00..8.27 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
                     Index Cond: ((username_locase)::text = 'iceberger'::text)
         ->  Bitmap Heap Scan on pond_item_common pic  (cost=7.13..1387.36 rows=363 width=8) (actual time=0.015..0.024 rows=11 loops=1)
               Recheck Cond: (pic.pond_user = pu2.objectid)
               ->  Bitmap Index Scan on pond_item_common_x1  (cost=0.00..7.04 rows=363 width=0) (actual time=0.012..0.012 rows=11 loops=1)
                     Index Cond: (pic.pond_user = pu2.objectid)
 Total runtime: 0.181 ms
(10 rows)

pondDump=> explain analyze
select
pic.objectid as pic_objectid
from
pond_item_common pic
where
pic.pond_user = (select pu2.objectid from pond_user pu2 where 
username_locase IN ('iceberger'))

limit 100;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15.41..395.88 rows=100 width=4) (actual time=0.043..0.055 rows=11 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using pond_user_c2 on pond_user pu2  (cost=0.00..8.27 rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
           Index Cond: ((username_locase)::text = 'iceberger'::text)
   ->  Bitmap Heap Scan on pond_item_common pic  (cost=7.13..1388.27 rows=363 width=4) (actual time=0.042..0.053 rows=11 loops=1)
         Recheck Cond: (pond_user = $0)
         ->  Bitmap Index Scan on pond_item_common_x1  (cost=0.00..7.04 rows=363 width=0) (actual time=0.038..0.038 rows=11 loops=1)
               Index Cond: (pond_user = $0)
 Total runtime: 0.096 ms
(9 rows)

pondDump=>

Best regards,
Marcus




Re: [GENERAL] Plan for in with one argument

2010-07-11 Thread Marcus Engene

On 7/11/10 12:42 , Alban Hertroys wrote:

On 11 Jul 2010, at 11:38, Marcus Engene wrote:

Hi List,

With automated queries where I have COLUMN IN (), I get a different plan from 
COLUMN = (). That would make sense if there are several arguments, but in the 
case with just one argument the penalty can be severe. One query went from 5s 
execution time to a few hundreds of ms when I changed IN to = if the number of 
arguments is 1.

Is there a technical reason for not treating IN with one argument as = in that 
case?

It does that already for constant IN-lists:
=> create table test (id serial PRIMARY KEY);
=> insert into test (id) SELECT nextval('test_id_seq') from generate_series(1, 1);
=> ANALYZE test;
=> explain analyse select * from test where id IN (15);
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..8.27 rows=1 width=4) (actual time=0.024..0.029 rows=1 loops=1)
   Index Cond: (id = 15)
 Total runtime: 0.102 ms
(3 rows)


However, you're using a subquery to get the IN-list. I'm pretty sure the 
planner cannot know for certain that your subquery will return only one row, so 
it cannot substitute your IN(subquery) with =(subquery).

You'd probably be better off using an EXISTS instead of an IN there, that 
should certainly help for cases where the subquery returns many records, but it 
also gives the planner a better idea of your intentions.
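
(Concretely, the EXISTS form of the original query would be something like
this sketch:)

select
    pic.objectid as pic_objectid
from
    pond_item_common pic
where
    exists (
        select 1
        from pond_user pu2
        where pu2.objectid = pic.pond_user AND
              pu2.username_locase = 'iceberger'
    )
limit 100;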

   


Hi Alban,

This makes great sense both in theory and empiric tests. Thanks for the 
explanation.


Best regards,
Marcus



[GENERAL] short circuit joins

2009-08-30 Thread Marcus Engene

Hi,

-- a
select
   ...
from
   base_table bt
   left outer join sub_table_1 st1 on (st1.base_table = bt.id)
   left outer join sub_table_2 st2 on (st2.base_table = bt.id)

-- b
select
   ...
from
   base_table bt
   left outer join sub_table_1 st1 on (bt.objecttype = 1 AND 
st1.base_table = bt.id)
   left outer join sub_table_2 st2 on (bt.objecttype = 2 AND 
st2.base_table = bt.id)


Pretend this is some stupid way of object orientation. base_table is the 
base class and sub_table_x are subclasses. base_table.objecttype tells 
which instance it is. Just for the sake of discussion, it could also be 
like bt.is_married_monogamous = 1 AND wife.husband = bt.id for example.


In case b, does Postgres avoid unnecessarily looking in the respective 
sub_table? Is it worthwhile to have the extra condition?


I'm on 8.3 presently. Still curious if newer versions have some 
optimization here.


Best regards,
Marcus




[GENERAL] somewhat slow query with subselect

2009-08-25 Thread Marcus Engene

Hi list,

I have a table, not that many rows (400k) but with about 55 columns. 
There are some maintenance selects I want to do (every now and then) 
that I don't want to add indexes for.


select
   ...
from
   quite_big_table qbt
where
   qbt.owner = 123 AND
   exists (
   select null
   from quite_big_table qbt2
   where
   qbt2.owner = qbt.owner AND
    qbt2.id <> qbt.id AND
   qbt2.filelength = qbt.filelength
   )

In a case with some 5000 rows belonging to owner 123, this select really 
takes a long time. Way longer than without the subselect and order by 
filelength. I agree that with the subselect it would take some extra 
juice, but in my mind it would do some hash in memory which wouldn't be 
too slow to look up in.


shared_buffers = 16GB
temp_buffers = 1GB
work_mem = 32MB
maintenance_work_mem = 1GB
#server has 64GB (64bit)

I guess there is some penalty because of the size (# of columns) of the 
table, since it has to go through more blocks on the disk. Would it have 
been beneficial if filelength was in a joined child table instead?

How would you do this? Create a temporary table with owner+filelength?
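
For what it's worth, the follow-up below found a plain self-join way
faster on this release; a sketch of that rewrite (DISTINCT guards against
one row matching several duplicates):

select distinct qbt.id
from
    quite_big_table qbt
    join quite_big_table qbt2
      on qbt2.owner = qbt.owner AND
         qbt2.id <> qbt.id AND
         qbt2.filelength = qbt.filelength
where
    qbt.owner = 123;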

Thanks!
Marcus




Re: [GENERAL] somewhat slow query with subselect

2009-08-25 Thread Marcus Engene

Tom Lane wrote:

Marcus Engene <meng...@engene.se> writes:

... In a case with some 5000 rows belonging to owner 123, this select really 
takes a long time. Way longer than without the subselect and order by 
filelength. I agree that with the subselect it would take some extra 
juice, but in my mind it would do some hash in memory which wouldn't be 
too slow to look up in.

8.4 can turn EXISTS subqueries into hash joins, but previous releases
won't...

regards, tom lane
Thank you very much for your answer, Tom. I tried to join the table 
instead and it was way faster.


 Sort  (cost=46769.87..46770.51 rows=258 width=48)
   Sort Key: pic.filesize
   ->  Nested Loop  (cost=34.30..46759.54 rows=258 width=48)
         Join Filter: ((picsame.objectid <> pic.objectid) AND (pic.filesize = picsame.filesize))
         ->  Nested Loop  (cost=8.27..3099.28 rows=16 width=56)
               ->  HashAggregate  (cost=8.27..8.28 rows=1 width=4)
                     ->  Index Scan using user_c2 on user pu2  (cost=0.00..8.27 rows=1 width=4)
                           Index Cond: ((username_locase)::text = 'prolificarts'::text)
               ->  Index Scan using item_common_x1 on item_common pic  (cost=0.00..3081.41 rows=767 width=52)
                     Index Cond: (pic.user = pu2.objectid)
         ->  Bitmap Heap Scan on item_common picsame  (cost=26.03..2715.34 rows=767 width=16)
               Recheck Cond: (picsame.user = pic.user)
               ->  Bitmap Index Scan on item_common_x1  (cost=0.00..25.84 rows=767 width=0)
                     Index Cond: (picsame.user = pic.user)

Best regards,
Marcus



Re: [GENERAL] where in (select array)

2008-11-21 Thread Marcus Engene

Richard Huxton wrote:

I imagine it's cheaper disk & dump wise to do the array thing in this and
some other similar usages, and therefore it would be nice to have a
non-ugly usage pattern.



Don't imagine, test. And then factor in the cost of fiddling around with
arrays when you need to access individual values. And the cost of the
time you spent working on all this.
  

On my dev 8.2.4 I get
using real values from a db dump with
931873 might like rows
46539 might like vector rows

Might like (row version):
10s to dump the second time, 38MB txt, 4MB gzip

Might like vector:
2s to dump the second time, 7.6MB text, 2MB gzip

Might like (row version)
explain cost, my in () version: ~200
explain cost, join on: ~670
explain cost, virtual table *): ~670

*)
select
  ...
from
  (select ...) as a.b

Might like vector:
explain cost, my in (): 1669

If there were a generate_series function for vectors, the 
choice would have been easy, I think.


Best regards,
Marcus




[GENERAL] where in (select array)

2008-11-20 Thread Marcus Engene

Hi List,

I have the might_like table that contains products a user might like if 
he likes the present one (item).


CREATE TABLE might_like
(
item   INTEGER NOT NULL
,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
,child  INTEGER NOT NULL
)
WITHOUT OIDS;

CREATE INDEX might_like_x1 ON might_like(item);

Since there are (will be) hundreds of thousands of items, and 20+ might 
like items each, I thought it would be nice to reduce the set to 1/20th by 
using a vector.


CREATE TABLE might_like_vector
(
item   INTEGER NOT NULL
,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
,child_arr  INTEGER[]
)
WITHOUT OIDS;

CREATE INDEX might_like_vector_x1 ON might_like_vector(item);

But then this doesn't work:

select
   ...
from
   item pic
where
   pic.objectid in (
   select mlv.child_arr
   from might_like_vector mlv
   where mlv.item = 125 AND
  mlv.created_at > now() - interval '1 week'
   )
limit 16

This seems to work but is ugly:

select
   ...
from
   item pic
where
   pic.objectid in (
   select mlv.child_arr[s.a]
   from might_like_vector mlv
   ,generate_series(1,20) as s(a)
   where mlv.item = 125 AND
  mlv.created_at > now() - interval '1 week'
   )
limit 16

Is there a better way to do it?
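
(For later readers: from 8.4, unnest() expands the array directly and
makes the intent explicit; a sketch of the same query in that style:)

select
   pic.objectid
from
   item pic
where
   pic.objectid in (
   select unnest(mlv.child_arr)
   from might_like_vector mlv
   where mlv.item = 125 AND
 mlv.created_at > now() - interval '1 week'
   )
limit 16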

Thanks,
Marcus




Re: [GENERAL] where in (select array)

2008-11-20 Thread Marcus Engene

Richard Huxton wrote:

Marcus Engene wrote:
  

Hi List,

I have the might_like table that contains products a user might like if
he likes the present one (item).

CREATE TABLE might_like
(
item   INTEGER NOT NULL
,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
,child  INTEGER NOT NULL
)
WITHOUT OIDS;

CREATE INDEX might_like_x1 ON might_like(item);

Since there are (will be) hundreds of thousands of items, and 20+ might
like items each, I thought it would be nice to reduce the set to 1/20th by
using a vector.

CREATE TABLE might_like_vector
(
item   INTEGER NOT NULL
,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
,child_arr  INTEGER[]
)
WITHOUT OIDS;



You haven't reduced the set at all, you've just turned part of it
sideways. You might gain something on your search, but I'm guessing
you've not tested it.

Hmm - the attached script generates 100,000 items and 10 liked ones for
each (well for the first 99,990 it says you like the next 10 items).
They're all given different timestamps at day intervals which means
you'll end up with 6 or seven matches for you sample query.
  
Sorry, I was a bit unclear. This is run on a web server. The might-like 
rows are generated if they are not available for an item when the 
corresponding page is generated. The one-row-per-might-like-item version 
has been live since yesterday and, as this is written, has 1/2 M rows in 
it. The caching is primarily initiated by a google indexer agent.


This might-like generation is expensive, so caching it in the db is a 
must; the web server's CPU usage halved when this caching went live and 
the most commonly used items were cached.


When the might-like data is read from the database, the generated 
presentation html is put in memcached with a timeout of 1h (presently). 
Memcached here is probably way overkill, but using it in situations like 
this makes the site more scalable when adding webservers, and postpones 
the problem of clustering pg.


So with memcached I care less about saving a few ms in select latency 
and more about postponing other approaching problems, like keeping the 
db dump manageable. Right now it's a 100MB gzipped dump, which is very 
manageable, so where it's possible I'd like to keep the data compact. I 
imagine it's cheaper disk & dump wise to do the array thing in this and 
some other similar usages, and therefore it would be nice to have a 
non-ugly usage pattern. Also, we're going to collect usage statistics to 
further optimize the behavior of the site, and I'm really worried about 
those millions of rows.


If this is a bad approach to the problem I'm very interested to hear 
it. Regardless of the best approach, a proper solution to the 
IN-subquery would be super appreciated too :)


Thanks for your answer!

Best regards,
Marcus



Re: [GENERAL] Stroring html form settings

2008-09-26 Thread Marcus Engene

Dianne Yumul wrote:

Hello,

I have some html forms that I save the settings into the database, 
things like which item was selected in the menu and if a checkbox was 
checked. The table looks like this:


 user_id | report_id |  info
-+---+
 111 | 1 | A:CHECKEDCHECKEDCHECKED::
 111 | 2 | A:::CHECKED::
 111 | 3 | A::CHECKED:CHECKED::CHECKEDCHECKED:::

The info column has the settings separated with a : and consecutive 
colons mean the user didn't make a selection. Would this be the way to 
store them?


I've done some searching and I could use XML (I will read some 
tutorials after writing this email). But there may be other ways and 
I'm just too much of a newbie to know. I'm using Postgresql 8.1.11 and 
PHP on CentOS 5.2.


Thanks in advance.

Dianne


Hi,

I would serialize to JSON instead of XML.
http://www.aurore.net/projects/php-json/

A simple json_encode($_POST) might do the trick. You could either use 
json_decode() to make a PHP struct of it or send it more or less as it 
is to Javascript.


I usually find XML horrible to work with in a web context.

Best regards,
Marcus




[GENERAL] regexp_replace() [noindex] thing

2008-09-25 Thread Marcus Engene

Hi!

I'm using tsearch2 and sometimes there are blocks of text that shouldn't 
be indexed.


There is a trigger function that gathers data from the usual suspects 
and updates the index. In this trigger, I'd like to exclude things in a 
[noindex] tag:

select
   regexp_replace('innan[noindex]apa[/noindex]klas[noindex]banan[/noindex]',
   '(\\\[noindex\\\])+.*?(\\\[/noindex\\\])+',
   ' ')
I would like to have a function like the above that returns 'innan klas' 
for this data. I would have expected it to, as I use the non-greedy version.


Could someone please point me in the general direction here?

Best regards,
Marcus



Re: [GENERAL] regexp_replace() [noindex] thing

2008-09-25 Thread Marcus Engene

Tom Lane wrote:

Marcus Engene [EMAIL PROTECTED] writes:
  
I would like to have a function like the above that returns innan klas 
for this data. I would have expected it to as I use the non greedy version.



regression=# select
regexp_replace('innan[noindex]apa[/noindex]klas[noindex]banan[/noindex]',
'\\[noindex\\].*?\\[/noindex\\]',
' ', 'g');
 regexp_replace
----------------
 innan klas
(1 row)


Super!

Best regards,
Marcus




Re: [GENERAL] match an IP address

2008-09-23 Thread Marcus Engene

Phoenix Kiula wrote:

 If you don't want to store IPs for registered users, I'd use:

 user_id INTEGER,
 ip cidr,
 CONSTRAINT must_have_userstamp
 CHECK ( user_id IS NOT NULL OR ip IS NOT NULL)

 ... and yes, I'd use a functional index to look it up, or even a
 trigger-maintained cache of the text representation if I had to. Then




Ok, this is an idea. And I do agree that multifunction fields are a
potential pain in the distant future.

My questions:

1. What extra tax will this constraint levy on an INSERT or UPDATE on
this table? There are about 100,000 inserts a day, and over three
times as many UPDATES. The concurrency is pretty high -- I mean
sometimes 1,000 users at the same time but no more than that. If the
additional cost of insertion/updating is not too heavy, I suppose this
could be a nice approach.

2. Why not have an INET field...why a CIDR? What's the benefit? It
stores those pesky .../8 type additional data which one has to mask
with functions. Would INET work just as well?

3. Storage wise does this add significantly? How much space does an
INET field take as opposed to, say, a VARCHAR field?

4. Most importantly, how would you structure the index for this? I
would much rather have a fast = in my sql's WHERE clause. No OR
etc. Any thoughts?

Thanks

  
Use the best of both worlds - consider memcached, and use the db only when 
you create/update an entry, so that you can restore it if memcached 
(perhaps as a consequence of a server reboot) gets restarted.

http://www.slideshare.net/vishnu/livejournals-backend-a-history-of-scaling

best regards,
Marcus




Re: [GENERAL] Getting several columns from subselect with LIMIT 1

2008-09-20 Thread Marcus Engene

Pierre Thibaudeau wrote:

I have a PERSONS table.
I also have a PROGENY table, which is a many-to-many association table
with two foreign keys to the PERSONS table to itself.
(In this day and age, not only can an individual have any number of
children, but also a person can have any number of parents!  At least,
let's assume that's true for the sake of this setup.)

Suppose I wish to construct a view of the persons, along with the name
of their first-born (if they have one;  NULL otherwise).  The
following SELECT does just that:

SELECT
persons.*,
(
SELECT child.name
FROM progeny JOIN persons child ON child.id = progeny.child
WHERE progeny.parent = persons.id
ORDER BY child.birthdate ASC
LIMIT 1
) AS firstborn_name
 FROM persons;

Now, this is probably not the most elegant piece of code, but the real
problem is that
I cannot see how to extend it to the case where I want not only the
firstborn's name but also the firstborn's ID
(short of repeating the entire subselect a second time).  At the
moment, with this current syntax, my subSELECT statement would not be
allowed to return more than a single column.

Any suggestion?

  

Would this work?

select
   p.*
  ,pp.*
from
   persons p
  ,(
   SELECT child.name, child.id
   FROM progeny JOIN persons child ON child.id = progeny.child
   WHERE progeny.parent = p.id
   ORDER BY child.birthdate ASC
   LIMIT 1
   ) as kid(kid_name,kid_id)
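
(As written, the subquery can't legally reference p.id from inside a plain
FROM-list subselect; from PostgreSQL 9.3, a LATERAL join expresses exactly
this correlation. A sketch:)

select
   p.*
  ,kid.kid_name
  ,kid.kid_id
from
   persons p
   left join lateral (
   SELECT child.name AS kid_name, child.id AS kid_id
   FROM progeny JOIN persons child ON child.id = progeny.child
   WHERE progeny.parent = p.id
   ORDER BY child.birthdate ASC
   LIMIT 1
   ) as kid on true;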

best regards,
Marcus



[GENERAL] index on id and created_at

2008-09-11 Thread Marcus Engene

Hi,

If I have a table like...

CREATE TABLE apa
(
objectid   SERIAL PRIMARY KEY NOT NULL
,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
,somedata   text
)
WITHOUT OIDS;

...where if rowX has a higher objectid than rowY, it is implied that rowX 
has a later created_at than rowY.


select
   objectid
from
   apa
where
   created_at > now() - interval '1 day'
order by
   objectid desc;

In this select, it would have been nice to dodge the full table scan 
without adding an index to created_at. Is this possible somehow?
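
One workaround sketch that leans on exactly that objectid/created_at
correlation: find the boundary objectid by walking the primary key index
backwards, then range-scan on objectid. Whether the planner cooperates is
worth an EXPLAIN; treat this as an idea, not a guarantee:

select objectid
from apa
where objectid > coalesce(
   (select objectid
    from apa
    where created_at <= now() - interval '1 day'
    order by objectid desc
    limit 1), 0)
order by objectid desc;
-- the inner query can stop at the first row older than the cutoff;
-- coalesce covers the case where every row is newer than one day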


Best regards,
Marcus




Re: [GENERAL] stem tsearch2, want different stemmed words

2007-07-01 Thread Marcus Engene

Oleg Bartunov wrote:

On Sat, 30 Jun 2007, Marcus Engene wrote:


Hi!

bond=> SELECT to_tsvector('default','animation animal');
 to_tsvector
-------------
 'anim':1,2
(1 row)

bond=>

Sorry for a silly question, I wonder, how do I override this? I would 
want different stemmed words for these.


create synonym dictionary. Read about this 
http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes

Many thanks!

For future googlers: do check what was in your pg_ts_cfgmap before updating;

update  pg_ts_cfgmap set dict_name='{ts_p5_syn,en_stem}' where
ts_name='default' and  tok_alias in
('lword', 'lpart_hword','lhword' );

;-P

Best regards,
Marcus




[GENERAL] stem tsearch2, want different stemmed words

2007-06-30 Thread Marcus Engene

Hi!

bond=> SELECT to_tsvector('default','animation animal');
 to_tsvector
-------------
 'anim':1,2
(1 row)

bond=>

Sorry for a silly question, I wonder, how do I override this? I would 
want different stemmed words for these.


Best regards,
Marcus




Re: [GENERAL] performance; disk bad or something?

2007-04-07 Thread Marcus Engene

Tom Lane wrote:

Marcus Engene <[EMAIL PROTECTED]> writes:

 ->  Index Scan using apa_item_common_x1 on apa_item_common aic  
(cost=0.00..4956.68 rows=1174 width=8) (actual time=19.854..9557.606 
rows=1226 loops=1)



If the table only has 12000 rows then it should never have used an index
scan here at all --- a plain seqscan is usually the best bet for
retrieving 10% of a table.  Are you using nondefault planner settings?

How big is the table physically (VACUUM VERBOSE output about it might
tell something)?

Hi and thanks for your answer!

All planner settings in postgresql.conf are commented out. Until 
yesterday the only setting I've poked with is shared_buffers.


Best regards,
Marcus

apa=> vacuum verbose apa_item_common;
INFO:  vacuuming public.apa_item_common
INFO:  index apa_item_common_pkey now contains 12863 row versions in 
36 pages

DETAIL:  1246 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.21 sec.
INFO:  index apa_item_common_x1 now contains 12863 row versions in 38 
pages

DETAIL:  1246 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.72 sec.
INFO:  index apa_item_common_fts now contains 12863 row versions in 
391 pages

DETAIL:  1246 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.01u sec elapsed 1.16 sec.
INFO:  index apa_item_common_x2 now contains 12863 row versions in 36 
pages

DETAIL:  1246 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.51 sec.
INFO:  apa_item_common: removed 9028 row versions in 3651 pages
DETAIL:  CPU 0.24s/0.36u sec elapsed 30.69 sec.
INFO:  apa_item_common: found 9028 removable, 12863 nonremovable row 
versions in 14489 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 76646 unused item pointers.
0 pages are entirely empty.
CPU 0.64s/0.47u sec elapsed 84.91 sec.
INFO:  vacuuming pg_toast.pg_toast_181470
INFO:  index pg_toast_181470_index now contains 1040 row versions in 5 
pages

DETAIL:  71 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.07 sec.
INFO:  pg_toast_181470: removed 618 row versions in 383 pages
DETAIL:  CPU 0.01s/0.03u sec elapsed 4.55 sec.
INFO:  pg_toast_181470: found 618 removable, 1040 nonremovable row 
versions in 1288 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 6121 unused item pointers.
0 pages are entirely empty.
CPU 0.03s/0.04u sec elapsed 16.64 sec.
VACUUM
apa=>




Re: [GENERAL] performance; disk bad or something?

2007-04-07 Thread Marcus Engene

Michael Fuhr wrote:

On Sat, Apr 07, 2007 at 09:28:58AM +0200, Marcus Engene wrote:
  

INFO:  apa_item_common: removed 9028 row versions in 3651 pages
DETAIL:  CPU 0.24s/0.36u sec elapsed 30.69 sec.
INFO:  apa_item_common: found 9028 removable, 12863 nonremovable row 
versions in 14489 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 76646 unused item pointers.



How often does this table receive updates and deletes and how often
are you vacuuming it?  It averages less than one row per page (12863
nonremovable row versions in 14489 pages) and appears to have become
quite bloated with dead rows sometime in the past (76646 unused
item pointers).  Use CLUSTER or VACUUM FULL + REINDEX to compact
the table and run ANALYZE afterwards to update the statistics, then
make sure you're vacuuming it often enough to keep it from becoming
bloated again.

Is your free space map sufficiently sized?  If you do a database-wide
VACUUM VERBOSE, what are the last few lines of the output that
mention free space map settings?

  
If I should take a guess, there are 5 deletes per day and 5 updates or 
inserts per hour. The table is 1.5 years old and I try to vacuum it 
once a week, although without FULL. I normally do a reindex as well.


I've googled a bit to find optimizer hints a la Oracle's /*+ 
index(asdasd) */ but from what I can tell pg has chosen not to have that? 
I find them convenient for testing at least, even if I agree that one 
perhaps should avoid having them in a final product. Toggling RULE/CHOOSE 
in Oracle is something I've often had use for too.


The original select seems to be consistently fast now. That is good, but 
do I have a ticking bomb? 12k rows is little by any measure, and if it 
got this slow from a little bloat, is a recurrence inevitable? 
Worth mentioning is perhaps that I also have a tsearch2 index on each 
row, of about 50 words each. But timed lookups on that index, which 
one would expect to be the slowest(?), always seem to be blazingly fast.


Is 8.2.x better at these simple things too, or is it mainly complex 
multithreadable queries that will benefit from it?


I hadn't touched any fsm settings but I've now set it to
max_fsm_pages = 20  # min max_fsm_relations*16, 6 bytes each 20k
max_fsm_relations = 1   # min 100, ~50 bytes each 1k

Thanks for your help!
Marcus

apa=# vacuum full verbose apa_item_common;
INFO:  vacuuming public.apa_item_common
INFO:  apa_item_common: found 176 removable, 12866 nonremovable row 
versions in 14489 pages

DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 220 to 2032 bytes long.
There were 85496 unused item pointers.
Total free space (including removable row versions) is 103371272 bytes.
8673 pages are or will become empty, including 0 at the end of the table.
14479 pages containing 103370096 free bytes are potential move destinations.
CPU 0.38s/0.04u sec elapsed 60.17 sec.
INFO:  index apa_item_common_pkey now contains 12866 row versions in 
36 pages

DETAIL:  176 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.20 sec.
INFO:  index apa_item_common_x1 now contains 12866 row versions in 38 
pages

DETAIL:  176 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.84 sec.
INFO:  index apa_item_common_fts now contains 12866 row versions in 
396 pages

DETAIL:  176 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.01u sec elapsed 1.67 sec.
INFO:  index apa_item_common_x2 now contains 12866 row versions in 36 
pages

DETAIL:  176 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.88 sec.
INFO:  apa_item_common: moved 10868 row versions, truncated 14489 to 
1832 pages

DETAIL:  CPU 1.77s/21.13u sec elapsed 294.11 sec.
INFO:  index apa_item_common_pkey now contains 12866 row versions in 
58 pages

DETAIL:  10868 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.02u sec elapsed 0.35 sec.
INFO:  index apa_item_common_x1 now contains 12866 row versions in 69 
pages

DETAIL:  10868 index row versions were removed.
4 index pages have been deleted, 4 are currently reusable.
CPU 0.00s/0.02u sec elapsed 0.40 sec.
INFO:  index apa_item_common_fts now contains 12866 row versions in 
671 pages

DETAIL:  10868 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.04u sec elapsed 1.80 sec.
INFO:  index apa_item_common_x2 now contains 12866 row versions in 67 
pages

DETAIL:  10868 index row versions were removed.
21 index pages have been deleted, 21 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.34 sec.
INFO:  vacuuming pg_toast.pg_toast_181470
INFO:  pg_toast_181470: found 10 removable, 1040 nonremovable row

[GENERAL] performance; disk bad or something?

2007-04-06 Thread Marcus Engene

Hi,

There are some performance issues I am still confused over.

I've got a Linux box, raid1, 1GB memory
CPU-wise the server is mostly idle

PG 8.0.10,
shared_buffers = 1
work_mem = 16348
maintenance_work_mem = 65536
Parameters are tweaked without much difference.

The following query is over a 12000 row table and apa_user is indexed.
I've recently vacuumed the db, analyzed, and recreated indexes on 
apa_item_common.


So how on earth could this query take 10s to complete? Are my disks bad? 
Are there other possible reasons?
If there were a couple of million rows I would still think it would run 
way faster than this.
I cannot see odd messages in dmesg. It more or less started to act like 
this overnight.


Best regards,
Marcus

apa=> explain analyze
apa-> select
apa-> aic.objectid as aic_objectid
apa-> from
apa-> apa_item_common aic
apa-> where
apa-> aic.apa_user = 704 AND
apa-> aic.status = 30
apa-> ORDER BY aic.calc_rating desc
apa-> LIMIT 1000;

 Limit  (cost=5016.54..5019.04 rows=1000 width=8) (actual time=9560.471..9562.201 rows=1000 loops=1)
   ->  Sort  (cost=5016.54..5019.47 rows=1174 width=8) (actual time=9560.469..9561.065 rows=1000 loops=1)
         Sort Key: calc_rating
         ->  Index Scan using apa_item_common_x1 on apa_item_common aic  (cost=0.00..4956.68 rows=1174 width=8) (actual time=19.854..9557.606 rows=1226 loops=1)
               Index Cond: (apa_user = 704)
               Filter: (status = 30)
 Total runtime: 9563.016 ms
(7 rows)

running the same query again, now it is cached then?

 Limit  (cost=5016.54..5019.04 rows=1000 width=8) (actual time=7.261..9.016 rows=1000 loops=1)
   ->  Sort  (cost=5016.54..5019.47 rows=1174 width=8) (actual time=7.258..7.870 rows=1000 loops=1)
         Sort Key: calc_rating
         ->  Index Scan using pond_item_common_x1 on pond_item_common pic  (cost=0.00..4956.68 rows=1174 width=8) (actual time=0.029..5.483 rows=1226 loops=1)
               Index Cond: (pond_user = 704)
               Filter: (status = 30)
 Total runtime: 9.840 ms
(7 rows)




[GENERAL] speeding up a query

2007-04-03 Thread Marcus Engene

Hi,

I'm on 8.0.10 and there is a query I cannot quite get adequately fast.
Should it take 2.5s to sort these 442 rows? Are my settings bad? Is
my query stupid?

Would appreciate any tips.

Best regards,
Marcus


apa=> explain analyze
apa-> select
apa-> ai.objectid as ai_objectid
apa-> from
apa-> apa_item ai
apa-> where
apa-> idxfti @@ to_tsquery('default', 'KCA0304')  AND
apa-> ai.status = 30
apa-> ORDER BY ai.calc_rating desc
apa-> LIMIT 1000;

 Limit  (cost=54.40..54.43 rows=12 width=8) (actual time=2650.254..2651.093 rows=442 loops=1)
   ->  Sort  (cost=54.40..54.43 rows=12 width=8) (actual time=2650.251..2650.515 rows=442 loops=1)
         Sort Key: calc_rating
         ->  Index Scan using apa_item_fts on apa_item ai  (cost=0.00..54.18 rows=12 width=8) (actual time=61.261..2649.045 rows=442 loops=1)
               Index Cond: (idxfti @@ '''kca0304'''::tsquery)
               Filter: (status = 30)
 Total runtime: 2651.659 ms
(7 rows)

apa=> explain analyze
apa-> select
apa-> ai.objectid as ai_objectid
apa-> from
apa-> apa_item ai
apa-> where
apa-> idxfti @@ to_tsquery('default', 'KCA0304')  AND
apa-> ai.status = 30
apa-> LIMIT 1000;

 Limit  (cost=0.00..54.18 rows=12 width=4) (actual time=0.186..18.628 rows=442 loops=1)
   ->  Index Scan using apa_item_fts on apa_item ai  (cost=0.00..54.18 rows=12 width=4) (actual time=0.183..17.999 rows=442 loops=1)
         Index Cond: (idxfti @@ '''kca0304'''::tsquery)
         Filter: (status = 30)
 Total runtime: 19.062 ms
(5 rows)




Re: [GENERAL] speeding up a query

2007-04-03 Thread Marcus Engene

Hi again,

I was thinking, in my slow query it seems the sorting is the villain. 
Doing a simple qsort test I notice that:

[EMAIL PROTECTED] /cygdrive/c/pond/dev/tt
$ time ./a.exe 430

real0m0.051s
user0m0.030s
sys 0m0.000s

[EMAIL PROTECTED] /cygdrive/c/pond/dev/tt
$ time ./a.exe 43

real0m0.238s
user0m0.218s
sys 0m0.015s

[EMAIL PROTECTED] /cygdrive/c/pond/dev/tt
$ time ./a.exe 430

real0m2.594s
user0m2.061s
sys 0m0.108s

From this very unfair test indeed, I see that my machine can sort 4.3 
million entries in the same time my pg takes to sort 430.


And I cannot stop wondering if there is some generic sorting routine 
that is incredibly slow? Would it be possible, in the situations 
where the ORDER BY is on a single column of a simple datatype, to do a 
special sort, like the qsort example at the end of this mail? Is this 
already addressed in later versions?

If no, why? And if yes, where in the pg code do I look?

Best regards,
Marcus


#include <stdio.h>
#include <stdlib.h>

typedef struct {
   int     val;
   void   *pek;
} QSORTSTRUCT_INT_S;

int sortstruct_int_compare(void const *a, void const *b)
{
   return ( ((QSORTSTRUCT_INT_S *)a)->val - ((QSORTSTRUCT_INT_S *)b)->val );
}

int main (int argc, char **argv)
{
   int nbr = 0;
   int i = 0;
   QSORTSTRUCT_INT_S *sort_arr = 0;
   if (1 == argc) {
       printf("forgot amount argument\n");
       exit(1);
   }
   nbr = atoi (argv[1]);
   if (0 == (sort_arr = malloc (sizeof(QSORTSTRUCT_INT_S) * nbr))) {
       printf("cannot alloc\n");
       exit(1);
   }
   srand(123);
   for (i = 0; i < nbr; i++) {
       sort_arr[i].val = rand();
   }
   qsort(sort_arr, nbr, sizeof(QSORTSTRUCT_INT_S), sortstruct_int_compare);
   return 0;
}




Re: [GENERAL] speeding up a query

2007-04-03 Thread Marcus Engene

Tom Lane wrote:

Marcus Engene <[EMAIL PROTECTED]> writes:

Should it take 2.5s to sort these 442 rows?

 Limit  (cost=54.40..54.43 rows=12 width=8) (actual time=2650.254..2651.093 rows=442 loops=1)
   ->  Sort  (cost=54.40..54.43 rows=12 width=8) (actual time=2650.251..2650.515 rows=442 loops=1)
         Sort Key: calc_rating
         ->  Index Scan using apa_item_fts on apa_item ai  (cost=0.00..54.18 rows=12 width=8) (actual time=61.261..2649.045 rows=442 loops=1)
               Index Cond: (idxfti @@ '''kca0304'''::tsquery)
               Filter: (status = 30)
 Total runtime: 2651.659 ms

It's not the sort that's taking 2.5s --- the sort looks to be taking
about a millisec and a half.  The indexscan is eating the other 2649
msec.  The question that seems to be interesting is what's the
difference between the contexts of your two queries, because they
sure look like the indexscans were the same.  Maybe the second one
is merely benefiting from the first one having already sucked all the
data into cache?

regards, tom lane

Yes indeed you are completely right! Both queries take about the same when
run after each other. And I just made a fool of myself with an optimizing 
idea I had...

Sorry for the noise and thanks for your answer!

Best regards,
Marcus




[GENERAL] generate_series with month intervals

2007-01-11 Thread Marcus Engene

Hi list,

I'd like to generate the latest year dynamically with generate_series.
This select works day wise:

select date_trunc ('month', now())::date + s.a
from generate_series(0, 11) as s(a)

I tried this but it didn't work:

select date_trunc ('month', now())::date + interval s.a || ' months'
from generate_series(0, 11) as s(a)

This works but looks grotesque:

select distinct date_trunc ('month', now()::date + s.a)::date
from generate_series(0, 365) as s(a)

Is there a way to do this more elegantly?

Best regards,
Marcus




Re: [GENERAL] generate_series with month intervals

2007-01-11 Thread Marcus Engene

Tom Lane wrote:

Marcus Engene [EMAIL PROTECTED] writes:

I tried this but it didn't work:



select date_trunc ('month', now())::date + interval s.a || ' months'
from generate_series(0, 11) as s(a)


People keep trying that :-(.  The typename 'foo' syntax is for a
*literal constant* only.  Instead use multiplication, something like

select (date_trunc('month', now())::date + s.a * '1 month'::interval)::date
from generate_series(0, 11) as s(a)

You'll want the cast back to date as the last step here because
date+interval will give timestamp.

regards, tom lane


Wonderful! Worked like a charm. Thanks!

Marcus



Re: [GENERAL] Changes in 8.2's PHP behaviour?

2006-12-19 Thread Marcus Engene

semi-ambivalent wrote:

All,

I have a simple web page that inserts data into a table in 8.1.5 using
PHP4. It's pretty amateurish but gets the job done. A few days ago I
upgraded to 8.2 (this is on a FreeBSD system, and I used the port) but
when I tried to use the web page this morning I got errors about:

PHP Fatal error:  Call to undefined function:  pg_escape_string() in
/usr/local/www/data-dist/some_dir/some_dir/dataInsert.php on line 9,
referer: http://localhost/some_dir/some_dir/dataEnter.php

I'm a pretty poor PHP person, and I think its error messages can be
cryptic, but nothing has changed in this except the version of
postgresql. Does 8.2 handle input differently than does 8.1.x? Is that
where to begin looking? I have seen talk about handling of escape
strings but thought is was 'this way is good and this way is bad' kind
of thing, not a 'this way will no longer work with postgresql' thing.
FWIW a quick pg_dump, a fallback to 8.1.5 and a restore and things work
again.

thanks for any pointers,

r


If a recompile of php is necessary, perhaps you might want to consider
php 5.1, in which a kind soul implemented support for bind variables.
http://ch2.php.net/manual/en/function.pg-query-params.php

Best regards,
Marcus




Re: [GENERAL] TOAD-like query builder for PostgreSQL?

2006-12-09 Thread Marcus Engene

[EMAIL PROTECTED] wrote:

I really like TOAD for building Oracle queries. Is there a TOAD-like,
FOSS query builder for PostgreSQL?


Hi,

I use Toad for Oracle too.

pgAdmin III is included with Postgres; it's good, but I prefer PG 
Lightning Admin, which I've been using daily for almost a year now.
It's sold for $10 at the moment.
http://amsoftwaredesign.com/

The author seems responsive; I made him add the AltGr-Return shortcut
from Toad's query editor ;-)

Best regards,
Marcus



Re: [GENERAL] Only MONO/WinForms is a way to go

2006-11-28 Thread Marcus Engene

John DeSoi wrote:

There are highly productive IDEs for the Mac with all the goodies you 
mention. But few are cross-platform.


Your statement about Windows desktop market share is correct, but it is 
not the relevant point. Many people are interested in cross-platform 
tools because they want to serve the Windows desktop market, but not 
have to give up Linux or OS X to do it.


Personally, I bothered with cross-platform for my product because
I wanted to develop it in Windows/Visual Studio. I have found nothing
that comes near it on Linux, and I'm too old to waste time with emacs
for idealistic reasons. Emacs serves a purpose, but not mine.

Best regards,
Marcus




Re: [GENERAL] IN clause

2006-11-24 Thread Marcus Engene

I see we have a C J Date fan on the list! ;-)

There is one other case where I personally find nullable
columns a good thing: process_me ish flags. When a row
is not supposed to be processed that field is null and
when a field is null it wont be in the index [at least
on Oracle].

Best regards,
Marcus

Brandon Aiken wrote:

Hasn't it been said enough?  Don't allow NULLs in your database.
Databases are for storing data, not a lack of it.  The only time NULL
should appear is during outer joins.

--
Brandon Aiken
CS/IT Systems Engineer
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martijn van
Oosterhout
Sent: Friday, November 24, 2006 7:20 AM
To: surabhi.ahuja
Cc: A. Kretschmer; pgsql-general@postgresql.org
Subject: Re: [GENERAL] IN clause

On Fri, Nov 24, 2006 at 05:31:07PM +0530, surabhi.ahuja wrote:

That is fine,
but what I was actually expecting is this:
if
select * from table where col_name in (null, 'a', 'b');

to return those rows where col_name is null or if it = 'a' or if it = 'b'.

But I think IN does not support null tests, am I right?


You'll need to check the standard, but IN() treats NULL specially, I
think it returns NULL if any of the elements is null, or something like
that. It certainly doesn't work the way you think it does.

Have a nice day,





Re: [GENERAL] Discover temporary INDEX/TABLE name

2006-11-15 Thread Marcus Engene

Ilja Golshtein wrote:

Hello!

How could I find out if a temporary table 
(or index on a temporary table) was created 
by current session?


The problem is something like
SELECT COUNT(*) FROM PG_INDEXES WHERE INDEXNAME='tmpind1'
does not work, since temporary indexes from other sessions
are visible. I need a way to distinguish
between temporary things belonging to the current session
and others.


I really do appreciate any help.




Hi,

Would it be terribly stupid of me to suggest you name
the temporary things with e.g. pg_backend_pid() appended
to the name? Or is the naming outside of your control?
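
Another angle, for later versions: 8.3 added pg_my_temp_schema(), which
identifies the current session's temp schema directly. A sketch:

SELECT count(*)
FROM pg_class c
WHERE c.relname = 'tmpind1'
  AND c.relnamespace = pg_my_temp_schema();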

Best regards,
Marcus



Re: [GENERAL] encoding advice requested

2006-11-13 Thread Marcus Engene

Rick Schumeyer wrote:

I will have to try the WIN1252 encoding.

On the client side, my application is a web browser.  On the server 
side, it is php scripts on a linux box.  The data comes from copying 
data from a browser window (pointing to another web site) and pasting it 
into an html textarea, which is then submitted.

Given this, would you still suggest the WIN1252 encoding?


In my setup I compiled php with
--enable-zend-multibyte
...which makes all strings unicode internally (I suppose they use 
wchar_t instead of char or something). Thus mb_*() are [from what I can 
tell] not necessary [for me] anymore. Do use a fairly recent php, not 
only for bind variables in the pg api.


In php.ini i've got
default_charset = utf-8
mbstring.internal_encoding = UTF-8;

in the html head:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

The db is in utf-8.

Flawlessly it has saved everything I've tossed at it, including all 
sorts of apostrophes. I've copy & pasted Chinese, Hebrew, Swedish, 
Arabic... texts into the textarea with no other problem than that Hebrew 
and Arabic make most sense written from right to left ;-)


Best regards,
Marcus



[GENERAL] tsearch2() with data from other table

2006-11-10 Thread Marcus Engene

Hi,

I have a table with items.
One field is site_user, which FKs the user row that owns the item.

There are a few fields I throw to tsearch2() via an index right now,
but one user wanted his name to be in the search as well. So, what I
would want to do is:

CREATE FUNCTION euits(int)
RETURNS text AS 'select username || \' \' || firstname || \' \' || 
lastname from site_user where objectid = $1;' LANGUAGE SQL;


CREATE TRIGGER site_item_fts
BEFORE UPDATE OR INSERT ON site_item
FOR EACH ROW EXECUTE PROCEDURE
tsearch2(idxfti, name, keywords, keywords_cb, location_country, 
location_city, media_source, description, euits, site_user);


Pg accepts this, but when I

update site_item
set
description = description

I get:
Query executed in 78 ms, 6 Row(s) Affected
WARNING:  TSearch: 'site_user' is not of character type
WARNING:  TSearch: 'site_user' is not of character type
...

I suppose I am doing something stupid here? If I remove euits
I still receive the same error.

Any suggestions are very welcome.

Best regards,
Marcus



Re: [GENERAL] tsearch2() with data from other table

2006-11-10 Thread Marcus Engene

Teodor Sigaev wrote:

CREATE FUNCTION euits(int)
RETURNS text AS 'select username || \' \' || firstname || \' \' || 
lastname from site_user where objectid = $1;' LANGUAGE SQL;


CREATE TRIGGER site_item_fts
BEFORE UPDATE OR INSERT ON site_item
FOR EACH ROW EXECUTE PROCEDURE
tsearch2(idxfti, name, keywords, keywords_cb, location_country, 
location_city, media_source, description, euits, site_user);


site_user is a table, isn't it?
The tsearch2 trigger accepts only column names and functions with the 
prototype TEXT func(TEXT).

To clarify, show your table definitions.


Hi,

I'm sorry I was unclear,

create table site_user
(
 objectid integer
,firstname varchar(60)
,lastname varchar(60)
,username varchar(10)
...
) without oids;


create table site_item
(
 objectid integer
,site_user integer -- fk to site_user.objectid
,description text
,idxfti tsvector
...
) without oids;

So, when updating site_item I want to fetch the names from site_user and 
give this data to tsearch2() along with other stuff from site_item.
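
Following Teodor's prototype requirement, a sketch of the reworked
function: TEXT in, TEXT out, with the int cast moved inside, so the
trigger call tsearch2(idxfti, ..., euits, site_user) can stay as it is:

CREATE FUNCTION euits(text)
RETURNS text AS 'select username || '' '' || firstname || '' '' ||
lastname from site_user where objectid = $1::integer;' LANGUAGE SQL;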


Best regards,
Marcus



Re: [GENERAL] Indices for select count(*)?

2005-12-21 Thread Marcus Engene

Greg Stark wrote:

Alexander Scholz [EMAIL PROTECTED] writes:


Hi, thank you for your answer.

Regarding the performance flow when trying to find out how many records are
currently being stored in the table, I don't see how an index should help...
Nevertheless we've created a unique index on ID but SELECT count(ID) from
XYZ still takes 35 seconds*. (ID is the primary key based on a sequence,
select count(*) isn't faster.)

So - what kind of indexing would speed this up then?



No form of indexing can speed this up. To answer, the server has to look at
every record and count up how many of them should be included in your result.

Why couldn't it be possible to count the # of items in an index?
The density of the information (items/inode|block|whatever it's called 
in btrees) is likely to be much higher, giving less disk i/o.
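
The catch, for the record, is that tuple visibility lives only in the
heap in this era, so the index alone can't say which entries count. Much
later (9.2+), index-only scans consulting the visibility map made exactly
this work; a sketch against the table from this thread:

VACUUM XYZ;                           -- keeps the visibility map current
EXPLAIN SELECT count(*) FROM XYZ;     -- may now show an Index Only Scan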


I'm sorry if this has been discussed recently.

Best regards,
Marcus



Re: [GENERAL] fts, compond words?

2005-12-12 Thread Marcus Engene
 That's a simple case, but what about languages such as Norwegian or German?
 They have compound words, and the ispell dictionary can split them to lexemes.

 But usually there is more than one variant of separation:

 forbruksvaremerkelov
 forbruk vare merke lov
 forbruk vare merkelov
 forbruk varemerke lov
 forbruk varemerkelov
 forbruksvare merke lov
 forbruksvare merkelov
 (notice: I don't know the translation, just an example. When we were working
 on compound word support we found a word which has 24 variants of
 separation!!)

 So, query 'a + forbruksvaremerkelov' will be awful:

 a + ( (forbruk & vare & merke & lov) | (forbruk & vare & merkelov) | ... )

 Of course, that is an example just from mind, but a solution for phrase
 search should work reasonably with such corner cases.

(Sorry for replying in the wrong place in the thread, I was away for a 
trip and unsubscribed meanwhile)


I'm a Swede, and Swedish is similar to Norwegian and German. Take this 
example:

lång hårig kvinna
långhårig kvinna

Words are put together to make a new word with a different meaning. The 
first example means "tall hairy woman" and the second is "woman with 
long hair". If I were on e.g. a dating site, I'd want the distinction. 
;-) If not, I should enter both strings

(lång hårig | långhårig) & kvinna
...which is perfectly acceptable.

IMHO I don't see any point in splitting these words.


Let's go back to the subject: what about a syntax like this?

idxfti @@ to_tsquery('default', 'pizza & (Chicago | [New York])')

I.e., the exact-match string is always atomic. Wouldn't that be doable 
without any logical implications?
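
(Postscript: 9.6 eventually grew a tsquery phrase operator that covers
this wish; a sketch:)

SELECT to_tsvector('english', 'New York has traffic problems')
       @@ phraseto_tsquery('english', 'New York');   -- true
SELECT to_tsvector('english', 'New axe murderer incident in British York')
       @@ phraseto_tsquery('english', 'New York');   -- false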


Best regards,
Marcus



[GENERAL] fts, compond words?

2005-12-05 Thread Marcus Engene

Hi,

I use the tsearch full text search with pg 8.0.3. It works great, but I 
wonder if it's possible to search for compound words?

I.e., if I search for "New York" I want to get a match on
  "New York has traffic problems."
but not on
  "New axe murderer incident in British York."

Is this possible?

I don't use any wrapper, just
select
...
from
...
where
idxfti @@ to_tsquery('default', 'searchstring')

Thanks,
Marcus




Re: [GENERAL] fts, compond words?

2005-12-05 Thread Marcus Engene

Oleg Bartunov wrote:

On Mon, 5 Dec 2005, Marcus Engene wrote:


Hi,

I use the tsearch full text search with pg 8.0.3. It works great, but 
I wonder if it's possible to search for compound words?

I.e., if I search for "New York" I want to get a match on
 "New York has traffic problems."
but not on
 "New axe murderer incident in British York."

Is this possible?

I don't use any wrapper, just
select
...
from
...
where
idxfti @@ to_tsquery('default', 'searchstring')




ranking function is what you need. Read documentation.



Hi,

I realized from the documentation that I'm not looking for
compound words after all; I meant exact phrase.

I can't see how to make rank tell me which results have an
exact phrase? Like, there must be an occurrence of 'new' before
'york' (stemmed, so not really an exact phrase)?

Is there something new in rank for pg 8.1?

Thanks!
Marcus
