Re: [PERFORM] What is the best way to do attribute/values?

2004-08-25 Thread Mark Kirkwood

Josh Berkus wrote:
Things we've already tried to avoid going over old ground:
1) increasing statistics;
2) increasing sort_mem (to 256MB, which is overkill)
3) testing on 8.0 beta, which does not affect the issue.
At this point I'm looking for ideas.   Suggestions, anyone?
 

with respect to query design:
consider instead of:
select
pav1.person_id
from
person_attributes_vertical pav1,
person_attributes_vertical pav2
where
pav1.attribute_id = 1
and pav1.value_id in (2,3)
and pav2.attribute_id = 2
and pav2.value_id in (2,3)
and pav1.person_id = pav2.person_id
try:
select
pav1.person_id
from
person_attributes_vertical pav1
where
   (pav1.attribute_id = 1
and pav1.value_id in (2,3))
or (pav1.attribute_id = 2
and pav1.value_id in (2,3))

I am gambling that the 'or's' might be less expensive than the multiple self joins 
(particularly in the more general cases!).
To make access work well you might want to have *several* concatenated indexes of 2 - 
4 attributes - to work around Pg inability to use more than 1 in a given query.
For this query indexing (attribute_id, value_id) is probably good.
Consider playing with 'random_page_cost' and maybe 'effective_cache_size' to encourage 
the planner to use 'em.
regards
Mark


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] What is the best way to do attribute/values?

2004-08-25 Thread Jeff
On Aug 25, 2004, at 4:22 AM, Mark Kirkwood wrote:
select
pav1.person_id
from
person_attributes_vertical pav1
where
   (pav1.attribute_id = 1
and pav1.value_id in (2,3))
or (pav1.attribute_id = 2
and pav1.value_id in (2,3))
You know..
It may help if you toss in a group by
ie
select pav1.person_id, count(*) from person_attributes_vertical pav1
where (pav1.attribute_id = 1 and pav1.value_id in (2,3)) or ( ... ) or 
(...)
group by pav1.person_id
order by count(*) desc

that should give you the person_id's that matched the most 
criteria
I've used similar things before now that I've thought about it.

If you want an exact match you could put
having count(*) = $myNumAttributes in there too.. By definition an 
exact match would match that definition..

it has an added side effect of producing closest matches when an 
exact match cannot be found... granted you may not want that for a 
dating site : )

You asked for a blond female, blue eyes.. but I couldn't find any... 
but I *DID* find a brown haired male with brown eyes! Is that good 
enough?

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] What is the best way to do attribute/values?

2004-08-25 Thread Leeuw van der, Tim
Hi,

On Aug 25, 2004, at 4:22 AM, Mark Kirkwood wrote:

 select
   pav1.person_id
 from
   person_attributes_vertical pav1
 where
  (pav1.attribute_id = 1
   and pav1.value_id in (2,3))
   or (pav1.attribute_id = 2
   and pav1.value_id in (2,3))


[...]

Why not combine attribute_id and value_id? Then you have nothing but an OR (or IN).

It should, AFAICS, give you much better selectivity on your indexes:

There will be a lot of attributes with the same ID; there will also be a lot of 
attributes with the same value. However, there should be much less attributes with a 
specific combination of (ID/Value).
Right now I think it will be very hard to determine which field has a better 
selectivity: attribute_id or value_id.


The combined attribute/value field could be an int8 or so, where the upper 4 bytes are 
for attribute_id and the lower 4 bytes for value_id.
Depending on the number of attributes and possible values a smaller datatype and / or 
a different split can be made. A smaller datatype will result in faster access.

What difference does that make?

regards,

--Tim

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] What is the best way to do attribute/values?

2004-08-25 Thread Josh Berkus
Mark, Tim,

 select
   pav1.person_id
 from
   person_attributes_vertical pav1
 where
  (pav1.attribute_id = 1
   and pav1.value_id in (2,3))
   or (pav1.attribute_id = 2
   and pav1.value_id in (2,3))

Not the same query, sorry.   Daniel's query yields all the person_id's which 
have criteria A AND criteria B.   Yours gives all the person_id's which have 
criteria A OR criteria B.

 There will be a lot of attributes with the same ID; there will also be a
 lot of attributes with the same value. However, there should be much less
 attributes with a specific combination of (ID/Value). Right now I think it
 will be very hard to determine which field has a better selectivity:
 attribute_id or value_id.

Given that there is already an index on ( attribute_id, value_id ) I don't 
quite see what difference this makes.   Unless you're suggesting this as a 
workaround for the PG Planner's poor use of the index?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] What is the best way to do attribute/values?

2004-08-25 Thread Mark Kirkwood
Josh Berkus wrote:
Mark, Tim,
 

select
	pav1.person_id
from
	person_attributes_vertical pav1
where
	   (pav1.attribute_id = 1
	and pav1.value_id in (2,3))
	or (pav1.attribute_id = 2
	and pav1.value_id in (2,3))
   

Not the same query, sorry.   Daniel's query yields all the person_id's which 
have criteria A AND criteria B.   Yours gives all the person_id's which have 
criteria A OR criteria B.

 

Apologies, not thinking clearly enough there...
Maybe try out intersection :
select
   pav1.person_id
from
   person_attributes_vertical pav1
where
  (pav1.attribute_id = 1
   and pav1.value_id in (2,3))
intersect
select
   pav1.person_id
from
   person_attributes_vertical pav1
where (pav1.attribute_id = 2
   and pav1.value_id in (2,3))
In the advent that is unhelpful, I wonder about simplifying the 
situation and investigating how

select
   pav1.person_id
from
   person_attributes_vertical pav1
where
  pav1.attribute_id = 1
performs, compared to
select
   pav1.person_id
from
   person_attributes_vertical pav1
where
  (pav1.attribute_id = 1
   and pav1.value_id in (2,3))
If the first performs ok and the second does not, It may be possible to 
get better times by doing some horrible re-writes :e.g:

select
   pav1.person_id
from
   person_attributes_vertical pav1
where
  (pav1.attribute_id = 1
   and pav1.value_id||null in (2,3))
etc.
regards
Mark

   

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] What is the best way to do attribute/values?

2004-08-24 Thread Daniel Ceregatti
Hi list,

I have a database with 1M people in it. Each person has about 20
attributes, such as height, weight, eye color, etc. I need to be able to
search for people based on these attributes. A search can be conducted
on one attribute, all attributes, or any number in between. How would
_you_ do this?

I have already attempted to answer this. My attempts are detailed here:

http://sh.nu/email.txt

This is the email I was originally going to send to this list. Since
it's so large, I decided to link to it instead. If you feel that it
belongs in a post to the list, let me know, and I'll post again.

I've discussed these attempts with people in #postgresql on
irc.freenode.net. Agliodbs (I presume you know who this is) was very
helpful, but in end was at a loss. I find myself in the same postition
at this time. He suggested I contact this list.

My ultimate goal is performance. This _must_ be fast. And by fast, I
mean,  1 second, for every permutation of the number of attributes
searched for. Flexibility would be a bonus, but at this point I'll
settle for something that's harder to maintain if I can get the speed
gain I need.

Thanks,

Daniel Ceregatti

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] What is the best way to do attribute/values?

2004-08-24 Thread Josh Berkus
Folks,

 I've discussed these attempts with people in #postgresql on
 irc.freenode.net. Agliodbs (I presume you know who this is) was very
 helpful, but in end was at a loss. I find myself in the same postition
 at this time. He suggested I contact this list.

There's a couple of issues here to attack:

1) PostgreSQL is not using the most optimal plan.First, it's ignoring the 
fact that all referenced columns are indexed and only using the first column, 
then filtering based on the other criteria.   Second, testing has shown that 
a hash join would actually be faster.   We've tried upping the statistics, 
but it doesn't seem to have an effect on the planner's erroneous estimates.

2) Even were it using the most optimal plan, it's still to slow.   As you can 
see from the plan, each merge join takes about 1.5 to 2 seconds.(hash 
joins are only about 0.5 seconds slower).  Mysteriously, a big chunk of this 
time is spent *in bewtween* planner steps, as if there was some hold-up in 
retrieving the index or table pages.   There may be, but Daniel and I have 
not been able to diagnose the cause.   It's particularly mysterious since a 
filter-and-sort on a *single* criteria set, without join, takes  400ms.

Things we've already tried to avoid going over old ground:
1) increasing statistics;
2) increasing sort_mem (to 256MB, which is overkill)
3) testing on 8.0 beta, which does not affect the issue.

At this point I'm looking for ideas.   Suggestions, anyone?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] What is the best way to do attribute/values?

2004-08-24 Thread Richard Huxton
Daniel Ceregatti wrote:
Hi list,
I have a database with 1M people in it. Each person has about 20
attributes, such as height, weight, eye color, etc. I need to be able to
search for people based on these attributes. A search can be conducted
on one attribute, all attributes, or any number in between. How would
_you_ do this?
I have already attempted to answer this. My attempts are detailed here:
http://sh.nu/email.txt
Hmm... interesting.
Shot in the dark - try a tsearch2 full-text index. Your problem could be 
translated into searching strings of the form
  hair=black eyes=blue age=117

Not pretty, but might give you the speed you want.
--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly