date:20030714

Re: [SQL] Replacing a simple nested query?

2003-07-14 Thread Steve Wampler

On Sun, 2003-07-13 at 14:50, Steve Wampler wrote:
> I've got a simple nested query:
> 
>   select * from attributes where id in (select id from
>  attributes where (name='obsid') and (value='oid00066'));
> 
> that performs abysmally.  I've heard this described as the
> 'classic WHERE IN' problem.
> 
> Is there a better way to obtain the same results?  The inner
> select identifies a set of ids (2049 of them, to be exact)
> that are then used to locate records that have the same id
> (about 30-40K of those, including the aforementioned 2049).

For the record, Joe Conway and Hannu Krosing both provided
the same solution:

   select at.* from attributes_table at, attributes a
  where at.id = a.id and a.name='obsid' and a.value='oid00066';

which is several orders of infinity faster than than my naive
approach above:
-
lab.devel.configdb=# explain analyze select * from
attributes_table where id in (select id from attributes
where (name='obsid') and (value='oid00066')) order by id;
NOTICE:  QUERY PLAN:

Index Scan using id_index on attributes_table  (cost=0.00..8773703316.10
rows=241201 width=59) (actual time=136297.91..3418016.04 rows=32799
loops=1)
  SubPlan
->  Materialize  (cost=18187.48..18187.48 rows=15 width=25) (actual
time=0.01..1.68 rows=1979 loops=482402)
  ->  Index Scan using name_index on attributes_table 
(cost=0.00..18187.48 rows=15 width=25) (actual time=0.27..251.95
rows=2049 loops=1)
Total runtime: 3418035.38 msec
--
lab.devel.configdb=# explain analyze select at.* from
attributes_table at, attributes a
where at.id = a.id and a.name='obsid' and a.value='oid00066';
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..18739.44 rows=217 width=84) (actual
time=0.76..1220.65 rows=32799 loops=1)
  ->  Index Scan using name_index on attributes_table 
(cost=0.00..18187.48 rows=15 width=25) (actual time=0.47..507.31
rows=2049 loops=1)
  ->  Index Scan using id_index on attributes_table at 
(cost=0.00..35.80 rows=12 width=59) (actual time=0.11..0.31 rows=16
loops=2049)
Total runtime: 1235.42 msec
---

My thanks to both Joe and Hannu!
Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
Quantum materiae materietur marmota monax si marmota
monax materiam possit materiari?

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [SQL] max length of sql select statement (long!)

2003-07-14 Thread markus brosch


It's me again - with the easy and complex statement at the same time ;-)
Who can help me?!

As far, my solution is still unsatisfactory. 
I'll explain my full task this time, 
otherwise it is hard to explain!

My tables: 
(redundant for speed reasons - doesn't matter - only requests!)

===

table: goGraph
primarykey(parent,child)
index (B-tree) on parent column and on child column.

parent |   child
--
GO:001 |   GO:002
GO:001 |   GO:003
GO:002 |   GO:005
GO:002 |   GO:006
GO:003 |   GO:006
GO:003 |   GO:007
...|   ...
about 15000 elements more!

This represents a direct acyclic graph:

 GO:001
 /\
v  v
GO:002GO:003
/\/\
   v  v  v  v
GO:005   GO:006GO:007

===

table: binaryInteraction
primarykey(baitProtein, preyProtein, gogoKey)
index (B-tree) on gogokey.

baitProtein | preyProtein | "baitGoId" | "preyGoId" | gogoKey
--
123 | 123 | GO:001 | GO:001 | 11
123 | 123 | GO:020 | GO:001 | 120
123 | 123 | GO:020 | GO:020 | 2020
123 | 345 | GO:001 | GO:001 | 11 
123 | 345 | GO:020 | GO:001 | 120
... | ... | ...| ...| ...
up to several millions entries!

===

So, what I have to do is:
Someone choose 2 goIds. For these ids and all their children
(information from goGraph) I want the count(*) of "fits" within the
table binaryInteraction. That means, baitGoId and preyGoId must be
either the two choosen goIds or one of the related children. I use the
gogoKey for this task: I compute Integers (within Java) from the goIds
and sort smallerGoId.concatenate(biggerGoId) - so I've unique keys for
baitGo / preyGo pairs. One more problem: The goGraph (see table) is a
graph ... and for that reason one node can have more parents. If I
choose e.g. GO:02 and GO:03 from the example above and want to
compute their children, I don't want to take GO:06 into account (two
parents in which I am intested in). That menas, whenever I ask for
children of two nodes, I want a DISTINCT SET of children.  

Example how I am processing the data at the moment:

User chosse GoId1: GO:005 and GO:008;

Java: 
- get all childs of GO:05 by jdbc from goGraph - collection "one"
- get all childs of GO:06 by jdbc from goGraph - collection "two"
- compute intersection of the two collections
- substract of each collection the intersection
- result: distinct children of two goIds

Now the binaryInteraction table came into our game ;-)
I apply the one distinct collection "one" and collection "two" to
gogoKey:

---

1. SELECT count(*) 
   FROM (SELECT DISTINCT bait, prey 
 FROM binaryInteraction 
 WHERE gogoKey IN (gogokey1, gogokey2, gogokey3, ... )
)
   AS foo;

gogokeyX will be processed by java -> it generates for each
go-go-combination the unique gogokey. 

Speed: medium

Limit: only 1 Elements within the IN statement can be processed by
default. In postgres.conf I can change this limit, as Rod Taylor already
posted. But in the manual there is a warning of BufferOverflow if the
value it too high ...

If I reach the limit, I get something like:
java.sql.SQLException: ERROR:  Expression too complex: nesting depth
exceeds max_expr_depth = 1

BTW: Why nested depth? Within IN I compute thousands of "OR"s, but why
nested ... ???

---

2. for all gogokeys I generate a temporary table and do a join with the
binaryInteraction table.

SELECT COUNT(*)
FROM (SELECT DISTINCT binaryInteraction.bait binaryInteraction.prey 
  FROM binaryInteraction, tempTable 
  WHERE binaryInteraction.gogokey = tempTable.gogokey) 
AS temp)

Speed: very slow, but no limit!

---

After all, I want to compute and precalculate every possible
goId-goId-combination ;-) So, each of these things I explained here,
I've to do millions of time! I tried to find a recursive solution -
impossible!

Who can help? Maybe there is a way to compute it in one but more complex
sql-statemnt. Me as far I am still new to sql I don't know the "tricks"!
If there would be a way to calculate the intersection ... then maybe a
solution is not that far ... Also I was thinking about a denormalized
goGraph table: parent -> allChilds. Then I compute (within SQL?) the
intersection between the 2 GoIds and then I apply it to
binaryInteraction.

Thanks for any ideas and thanks at least 
for reading that long story ;-)

Cheers Markus

PS: If you have any additiona

[SQL] ad hoc referential integrity

2003-07-14 Thread Matt Tenenbaum

At the outset I hope that this is the right place to ask this question 
(also, I hope the question isn't a foolish one).

Say I created a couple tables A and B, where A contains a column p 
which B references as a foreign key. Further say that, when I created 
these tables, I meant to specify 'on delete cascade' for p in the 
definition of B but didn't. What's the proper way to ensure both the 
integrity of the tables and that, when I delete rows from A, the 
corresponding rows from B are removed as well? (In particular, is there 
a shorter way than dropping the table and recreating it? Are rules the 
answer?)

Again, I hope this is the right place to answer this question; I guess 
I'm not 'new' to SQL, but I'm still a bit rough around the edges, so 
trying to find the answer on my own has been a bit slow-going.

Thanks for any help.
-matt
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [SQL] ad hoc referential integrity

2003-07-14 Thread Stephan Szabo

On Tue, 15 Jul 2003, Matt Tenenbaum wrote:

> Say I created a couple tables A and B, where A contains a column p
> which B references as a foreign key. Further say that, when I created
> these tables, I meant to specify 'on delete cascade' for p in the
> definition of B but didn't. What's the proper way to ensure both the
> integrity of the tables and that, when I delete rows from A, the
> corresponding rows from B are removed as well? (In particular, is there
> a shorter way than dropping the table and recreating it? Are rules the
> answer?)

If I'm understanding what you want, I'd suggest dropping the constraint
and re-adding it with the on delete cascade.  If you're on 7.3 or better
you should be able to do this entirely with alter table, before that you
have to do a little more work.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [SQL] Replacing a simple nested query?

Re: [SQL] max length of sql select statement (long!)

[SQL] ad hoc referential integrity

Re: [SQL] ad hoc referential integrity

4 matches

Site Navigation

Mail list logo

Footer information