subject:"\[GENERAL\] Problem search on text arrays, using the overlaps \(\) operator"

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-22 Thread John Cheng

- "nha"  wrote:

> 
> Another way could concern the hash join. It has been shown that this
> step costs a lot with respect to the overall runtime. Depending on
> available storage space and DBMS load, a kind of materialized view
> may
> be handled in order to cut off the overloading join. Here are some
> suggested statements to create this helper table:
> 
[snip]

Hi nha,

Sorry about the long lag after your last post. I didn't want to post
back until I had something solid to report on. Using a materialized
view turned out to be the best way to solve my problem. My coworker
designed a new table that consists of the key columns for 3 large
tables that were being joined. A trigger is used to make sure
the "materialized view" is kept up-to-date. Since new data is added
infrequently (once a month), the cost of keeping the materialized view
up-to-date is cheap. The resulting query runs exceedingly fast! :)

Thank you so much for your guidance. I have learned a lot from this
incident! 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-08 Thread John Cheng


Accidentally sent to nha only

--- On Wed, 7/8/09, John Cheng  wrote:

> From: John Cheng 
> Subject: Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) 
> operator
> To: "nha" 
> Date: Wednesday, July 8, 2009, 4:24 PM
> Hi nha,
> 
> I will try out your suggestion about a materialized view. I
> had never even thought about trying it.
> 
> As luck would have it, I had to try out these tests on a
> different database today, which resulted in a different
> query plan that executed both forms in the same time. 
> This different plan used a merge join instead of a hash
> join. I will research this are more to see if I learn
> anything new.
> 
> You also pointed out that my queries (reports) are simply
> reporting off of a lot of data. Perhaps I need to see if the
> windowing functions in 8.4 can help improve things, or
> perhaps try to partition the data. Unfortunately, these kind
> of changes will be bigger than I had expected.
> 
> 
>       
> 





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-08 Thread nha

Hello,

Le 8/07/09 0:52, John Cheng a écrit :
> I don't mean to be pesky. I was just wondering if there is anything
> else I should try? 
> 
> Should I simply rewrite all queries, change the form
> 
> WHERE textarr && '{foo, bar}'::text[]
> 
> To
> 
> WHERE (textarr && '{foo}'::text[]
>  OR textarr && '{bar}'::text[])
> 
> ?
> 

While still puzzled by the big runtime difference you report between the
2 condition forms, I went on assessing these runtimes on my side from
the new case statements that are assumed to reflect more the real world.

Here are some measure results I got: (sorry for this long table)

seq style   runtime
--- -   --- 
(db=slf)
N01 OR-EA   6 237
N02 CC-EA   5 250
N03 OR+EA   12 628  
N04 CC+EA   12 700  
N05 OR+EA   15 679  
N06 CC+EA   11 510  
N07 CC-EA   7 712
N08 OR-EA   8 741
N09 CC-EA   4 963
N10 OR-EA   6 499
(db=stg)
N11 CC+EA   15 978  
N12 OR+EA   15 350  
N13 CC-EA   8 102
N14 OR-EA   9 428
N15 OR-EA   5 267
N16 CC-EA   5 017
N17 OR-EA   6 119
N18 CC-EA   4 955
N19 OR+EA   11 722  
N20 CC+EA   11 532  
N21 OR-EA   7 303
N22 CC-EA   5 438
N23 CC-EA   5 519
N24 OR-EA   5 373
N25 OR-EA   5 422
N26 CC-EA   5 064
(db=stg)
N27 CC-EA   8 314
(db=slf)
N28 OR-EA   6 656
(db=stg)
N29 OR-EA   6 760
(db=slf)
N30 CC-EA   6 753
(db=stg)
N31 CC-EA   5 500
(db=slf)
N32 OR-EA   5 907
(db=stg)
N33 OR-EA   5 391
(db=slf)
N34 CC-EA   5 517
--- -   --  

Legend
--
seq: sequence order.
style: condition style of query.
CC: style "arr&&{f,b}" (one clause with multi-value text table).
OR: style "arr&&{f} or arr&&{b}" (many clauses with 1-value text table).
OR2: same style as style OR, with explicit JOIN in query expression.
+EA: performed with EXPLAIN ANALYZE on query. Slower.
-EA: performed without EXPLAIN ANALYZE on query. Faster.
runtime: run time in milliseconds.
(db=?): indicates that the following sequences have been performed after
a drop-and-create process for all the tables and indexes.
--

Results from 2 selected EXPLAIN ANALYZE sequences:

-- seq 03 (OR+EA)
Aggregate  (cost=37630.52..37630.53 rows=1 width=0) (actual
time=12628.182..12628.184 rows=1 loops=1)
  ->  Hash Join  (cost=25989.12..37601.04 rows=11792 width=0) (actual
time=8796.002..12231.422 rows=30 loops=1)
Hash Cond: ((bar.id)::numeric = foo.bar_id)
->  Seq Scan on bar  (cost=0.00..4328.00 rows=30 width=4)
(actual time=0.025..402.458 rows=30 loops=1)
->  Hash  (cost=24636.81..24636.81 rows=82425 width=8) (actual
time=8795.027..8795.027 rows=2097152 loops=1)
  ->  Bitmap Heap Scan on foo  (cost=1565.44..24636.81
rows=82425 width=8) (actual time=670.248..5098.109 rows=2097152 loops=1)
Recheck Cond: ((keywords && '{ford}'::text[]) OR
(keywords && '{toyota}'::text[]) OR (keywords && '{volkswagen}'::text[])
OR (keywords && '{saturn}'::text[]) OR (keywords && '{honda}'::text[])
OR (keywords && '{porsche}'::text[]) OR (keywords && '{hummer}'::text[])
OR (keywords && '{ferrari}'::text[]))
->  BitmapOr  (cost=1565.44..1565.44 rows=83879
width=0) (actual time=665.516..665.516 rows=0 loops=1)
  ->  Bitmap Index Scan on foo_idx
(cost=0.00..175.07 rows=10485 width=0) (actual time=114.013..114.013
rows=262144 loops=1)
Index Cond: (keywords && '{ford}'::text[])
  ->  Bitmap Index Scan on foo_idx
(cost=0.00..175.07 rows=10485 width=0) (actual time=72.398..72.398
rows=262144 loops=1)
Index Cond: (keywords && '{toyota}'::text[])
  ->  Bitmap Index Scan on foo_idx
(cost=0.00..175.07 rows=10485 width=0) (actual time=74.118..74.118
rows=262144 loops=1)
Index Cond: (keywords &&
'{volkswagen}'::text[])
  ->  Bitmap Index Scan on foo_idx
(cost=0.00..175.07 rows=10485 width=0) (actual time=58.486..58.486
rows=262144 loops=1)
Index Cond: (keywords && '{saturn}'::text[])
  ->  Bitmap Index Scan on foo_idx
(cost=0.00..175.07 rows=10485 width=0) (actual time=114.671..114.671
rows=524288 loops=1)
Index Cond: (keywords && '{honda}'::text[])
  ->  Bitmap Index Scan on foo_idx
(cost=0.00..175.07 rows=10485 width=0) (actual time=115.290..115.290
rows=524288 loops=1)
Index Cond: (keywords &&
'{porsche}'::text[])
  ->  Bitmap Index Scan on foo_idx
(cost=0.00..175.07 rows=10485 width=0) (actual time=58.184..58.184
rows=262144 loops=1)
Index Cond: (keywords && '{hummer}'::text[])
  ->  Bitmap Index Scan on fo

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-07 Thread John Cheng


I don't mean to be pesky. I was just wondering if there is anything
else I should try? 

Should I simply rewrite all queries, change the form

WHERE textarr && '{foo, bar}'::text[]

To

WHERE (textarr && '{foo}'::text[]
   OR textarr && '{bar}'::text[])

?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



  


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-06 Thread John Cheng



- "nha"  wrote:

> From: "nha" 
> To: "John Cheng" 
> Cc: pgsql-general@postgresql.org
> Sent: Monday, July 6, 2009 9:12:22 AM GMT -08:00 US/Canada Pacific
> Subject: Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) 
> operator
>
> Hello,
> 
> With your exhaustive example statements based on table foo and cars, I
> 
> performed some measures on my side (PostgreSQL 8.3.1 server). Here are
> 
> some statistical results:
> 
[ ... snipped ... ]
> 
> In my opinion, analysis and optimization may be deepen over table 
> indexes used for join planning. As your reported query plans show, the
> 
> Where clauses are performed independantly from the table ml_lead; the
> 
> reason is that all the attributes of the clauses belong to the table 
> lead_reporting_data. Time may be reduced on join condition
> achievements.
> 
> Hoping this observation will contribute a little to your opinion.
> 
> Without any claim, I attached a document to this email for details on
> 
> the measures I took with the overlap operator -- OpenDocument 
> Spreadsheet (ODS) v2 formatted file, 24 kiB. The 3rd sheet "various" 
> presents the detailed measures related to the data reported in this
> email.
> 
> Regards.
> 
> --
> nha / Lyon / France.

Hi nha,

I had not expected anyone to go to such lengths to evaluate my
situation, thank you so much!

After looking at your analysis, I realized that the test case I
created isn't close enough to the queries running in our prod
environment. For one, table 'foo' does not join to another table; The
other thing is that the amount of data isn't the same; Finally, these
tables have been ANALYZED.

So I took some time to update the test case. On our server, running
8.3.6, I was able to reproduce the difference between the two styles:
"arr&&{f,b}" and "arr&&{f} or arr&&{b}".

First, the setup:

-- Begin test case
-- Sets up 'bar'
SELECT id INTO TEMP TABLE bar FROM (SELECT generate_series(1,30) as id) AS 
bar;
CREATE INDEX bar_idx ON bar (id);
ANALYZE bar;
-- Sets up 'foo'
CREATE TEMP SEQUENCE foo_bar_id_seq;
CREATE TEMP TABLE foo (
bar_id numeric DEFAULT NEXTVAL('foo_bar_id_seq'),
keywords text[]
);
CREATE INDEX foo_idx ON foo USING gin (keywords);
INSERT INTO foo (keywords) VALUES ('{ford}'::text[]);
INSERT INTO foo (keywords) VALUES ('{toyota}'::text[]);
INSERT INTO foo (keywords) VALUES ('{volkswagen}'::text[]);
INSERT INTO foo (keywords) VALUES ('{saturn}'::text[]);
INSERT INTO foo (keywords) VALUES ('{honda}'::text[]);
INSERT INTO foo (keywords) VALUES ('{porsche}'::text[]);
INSERT INTO foo (keywords) VALUES ('{porsche, audi, chrysler}'::text[]);
INSERT INTO foo (keywords) VALUES ('{honda, hummer, ferrari}'::text[]);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
INSERT INTO foo (keywords) (SELECT keywords FROM foo);
ANALYZE foo;
-- End test case

Query for the form "arr&&{f,b}"
SELECT 
count(*) 
FROM foo
INNER JOIN bar ON foo.bar_id = bar.id
WHERE
foo.keywords && '{ford, toyota, volkswagen, saturn, honda, porsche, hummer, 
ferrari}'::text[];

Query for the form "arr&&{f} or arr&&{b}":
SELECT 
count(*)
FROM foo, bar
WHERE 
foo.bar_id = bar.id
AND
(
keywords && '{ford}'::text[]
OR keywords && '{toyota}'::text[] 
OR keywords && '{volkswagen}'::text[] 
OR keywords && '{saturn}'::text[] 
OR keywords && '{honda}'::text[] 
OR keywords && '{porsche}'::text[] 
OR keywords && '{hummer}'::text[] 
OR keywords && '{ferrari}'::text[] 
);

For the first form, "arr&&{f,b}&q

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-06 Thread nha


Hello,

Le 2/07/09 2:07, John Cheng a écrit :

We use text[] on one of our tables. This text[] column allows us to
search for records that matches a keyword in a set of keywords. For
example, if we want to find records that has a keyword of "foo" or
"bar", we can use the condition:

   keywords&&  '{foo, bar}'::text[]

Another wau is to do this:

   (keywords&&  '{foo}::text[]' OR keywords&&  '{bar}::text[]')

I am noticing a big difference between the two ways. I'm trying to
find out if we need to re-write our queries to speed them up, or
perhaps I am just missing something about how to use text[].

[...]
For some reason, I am seeing a big difference in our real database. I
don't want to just rewrite all of our queries yet. I'm guessing the
data makes a big difference.  What would be a good way to examine the
data to figure out what's the best way to write our queries? Is there
any features in PostgreSQL that can help me improve the performance?

Any advice would be greatly appreciated!



With your exhaustive example statements based on table foo and cars, I 
performed some measures on my side (PostgreSQL 8.3.1 server). Here are 
some statistical results:


seq rtm delta   ratio   deco
--- --- -   -   
s1cc873 -1,74%  2//91+1
s1or889 1,71%   2//91+1
s2cc13228,53%   3//91+2
s2or1209-9,32%  3//91+2
s3cc892 -2,61%  2//91+(.5*2)
s3or915 2,54%   2//91+(.5*2)
s4cc511 -3,00%  1//9(.5*2)
s4or526 2,91%   1//9(.5*2)
s5cc16352,13%   4//91+1+2
s5or1600-2,17%  4//91+1+2
--- --- -   -   

seq where clauses
--- -
s1cckeywords && '{ford, toyota}'::text[]
s1orkeywords && '{ford}'::text[] OR keywords && '{toyota}'::text[]
s2cckeywords && '{ford, honda}'::text[]
s2orkeywords && '{ford}'::text[] OR keywords && '{honda}'::text[]
s3cckeywords && '{honda, ferrari}'::text[]
s3orkeywords && '{honda}'::text[] OR keywords && '{ferrari}'::text[]
s4cckeywords && '{ferrari, hummer}'::text[]
s4orkeywords && '{ferrari}'::text[] OR keywords && '{hummer}'::text[]
s5cckeywords && '{ford, toyota, porsche}'::text[]
s5or	keywords && '{ford}'::text[] OR keywords && '{toyota}'::text[] OR 
keywords && '{porsche}'::text[]


legend
--
seq sequence of 10 subsequent explain analyze per row
rtm runtime mean (in milliseconds) of 10 subsequent measures
delta   difference percentage between cc and or sequences
cc  unique where clause with >1-size table (eg. {foo,bar})
or  multiple where clauses with 1-size text table (eg. {foo})
ratio   ratio between # of result rows and # of table rows
deco	result row partition between constant text table values in where 
clause.


In the following, I refer to your condition forms as:
- arr&&{f,b}
- arr&&{f} or arr&&{b}

I noticed first that, contrarily to your observation, for the "ford or 
toyata" case (sequence s1 developped into 2 subcases s1cc and s1or for 
both forms of condition), runtime mean is shorter for s1cc (arr&&{f,t}) 
than for s1or (arr&&{f} or arr&&{t}). But the difference percentage is 
only about 1,7% (ie. not enough relevant).


This empirical advantage of form arr&&{f,t} over form (arr&&{f} or 
arr&&{t}) is also observed for 2 cases (s3 and s4) out of 4 (s2 up to 
s5). The difference percentage looks more relevant (about 3%). The cases 
s3 and s4 differ from the others (s1, s2, and s5) by the fact that the 
sets of matching rows for each compared text table value intersect: all 
the rows matched by ferrari also match honda (strict inclusion not 
equality); all the rows matched by ferrari also match hummer and 
conversely (double inclusion here, ie. equality). In the other 3 cases, 
each compared text table value matches set of rows without intersecting 
the matching row set of the other(s) value(s). We may then assume that 
form arr&&{f,t} would fit better when there are lots of rows matched by 
several terms--however this cannot be generalized at this stage.


The reported data let us also guess some linear relationship between 
runtime and # of result rows. Here this relationship seems equally 
applicable with both forms arr&&{f,t} and (arr&&{f} or arr&&{t}).


Out of these measures and report, I notice that, regarding the query 
plan explanation of your queries over real data, the difference between 
actual runtimes reported for each case of condition forms is not so 
relevant with respect to the overall runtime of the queries. At bitmap 
heap scan on the table over which conditions are performed, when the 
last row is retrieved, actual runtime is respectively of:

- for arr&&{f,b}: 1276.990 ms;
- for (arr&&{f) or arr&&{b}): 1211.535 ms.
While quite close (difference percentage of about 5%), the difference is 
not really harmful with respect to the overall runtimes (resp. 13197 ms 
and 7768 ms), ie. in terms of part of overall runtimes resp. 
(1276/13197

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-03 Thread John Cheng


Hi Andreas,

I'm afraid fulltext search won't fit our app here. Our application
tags each record with "source flags", which is a text[] of strings
that describes where the record came from. These flags are already
passed into the application when we store the records. So we can
simply store them as text[]. Contrast to this, doing a fulltext search
would be storing these flags as one single string, then using the
to_tsvector() to have PostgreSQL parse it out again. The fulltext
search approach doesn't seem to make sense for us.

I'm also suspcious that the same type of problem would affect queries
on tsvector columns, but I have not tested myself.

- Original Message -
From: "Andreas Wenk" 
To: "John Cheng" , "PG-General Mailing List" 

Sent: Friday, July 3, 2009 2:12:46 AM GMT -08:00 US/Canada Pacific
Subject: Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) 
operator

John Cheng schrieb:
> ---
> For some reason, I am seeing a big difference in our real database. I
> don't want to just rewrite all of our queries yet. I'm guessing the
> data makes a big difference.  What would be a good way to examine the
> data to figure out what's the best way to write our queries? Is there
> any features in PostgreSQL that can help me improve the performance?
> 
> Any advice would be greatly appreciated!

Hi,

did you think about using the fulltext search integrated up from version 8.3. I 
never used 
your approach and don't know if the fulltextsearch is suitable for your case 
... just a hint.

http://www.postgresql.org/docs/8.4/interactive/textsearch.html

Cheers

Andy



  


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-03 Thread Andreas Wenk


John Cheng schrieb:

---
For some reason, I am seeing a big difference in our real database. I
don't want to just rewrite all of our queries yet. I'm guessing the
data makes a big difference.  What would be a good way to examine the
data to figure out what's the best way to write our queries? Is there
any features in PostgreSQL that can help me improve the performance?

Any advice would be greatly appreciated!


Hi,

did you think about using the fulltext search integrated up from version 8.3. I never used 
your approach and don't know if the fulltextsearch is suitable for your case ... just a hint.


http://www.postgresql.org/docs/8.4/interactive/textsearch.html

Cheers

Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Problem search on text arrays, using the overlaps (&&) operator

2009-07-01 Thread John Cheng


We use text[] on one of our tables. This text[] column allows us to
search for records that matches a keyword in a set of keywords. For
example, if we want to find records that has a keyword of "foo" or
"bar", we can use the condition:

  keywords && '{foo, bar}'::text[]

Another wau is to do this:

  (keywords && '{foo}::text[]' OR keywords && '{bar}::text[]')

I am noticing a big difference between the two ways. I'm trying to
find out if we need to re-write our queries to speed them up, or
perhaps I am just missing something about how to use text[].

To set up a simple test case, use:

CREATE TEMP TABLE foo (
keywords text[]
);
CREATE INDEX foo_idx ON foo USING gin (keywords);
INSERT INTO foo VALUES ('{ford}'::text[]);
INSERT INTO foo VALUES ('{toyota}'::text[]);
INSERT INTO foo VALUES ('{volkswagen}'::text[]);
INSERT INTO foo VALUES ('{dodge}'::text[]);
INSERT INTO foo VALUES ('{saturn}'::text[]);
INSERT INTO foo VALUES ('{honda}'::text[]);
INSERT INTO foo VALUES ('{porsche}'::text[]);
INSERT INTO foo VALUES ('{porsche, audi, chrysler}'::text[]);
INSERT INTO foo VALUES ('{honda, hummer, ferrari}'::text[]);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
INSERT INTO foo (SELECT keywords FROM foo);
-- Query of form 'arr && {foo, bar}'
EXPLAIN ANALYZE SELECT 
count(*) 
FROM foo 
WHERE keywords && '{ford, toyota}'::text[];

-- result --
QUERY PLAN:
 Aggregate  (cost=9870.50..9870.51 rows=1 width=0) (actual 
time=449.937..449.938 rows=1 loops=1)
   ->  Bitmap Heap Scan on foo  (cost=104.88..9853.56 rows=6778 width=0) 
(actual time=61.197..308.724 rows=262144 loops=1)
 Recheck Cond: (keywords && '{ford,toyota}'::text[])
 ->  Bitmap Index Scan on foo_idx  (cost=0.00..103.19 rows=6778 
width=0) (actual time=58.816..58.816 rows=262144 loops=1)
   Index Cond: (keywords && '{ford,toyota}'::text[])
 Total runtime: 450.121 ms
(6 rows)


-- Query of form 'arr && {foo} OR arr && bar'
EXPLAIN ANALYZE SELECT count(*) FROM foo WHERE 
(
keywords && '{ford}'::text[]
OR keywords && '{toyota}'::text[] 
)

-- result --
QUERY PLAN:
 Aggregate  (cost=11351.85..11351.86 rows=1 width=0) (actual 
time=424.389..424.389 rows=1 loops=1)
   ->  Bitmap Heap Scan on foo  (cost=213.13..11318.04 rows=13522 width=0) 
(actual time=43.728..273.913 rows=262144 loops=1)
 Recheck Cond: ((keywords && '{ford}'::text[]) OR (keywords && 
'{toyota}'::text[]))
 ->  BitmapOr  (cost=213.13..213.13 rows=13556 width=0) (actual 
time=41.386..41.386 rows=0 loops=1)
   ->  Bitmap Index Scan on foo_idx  (cost=0.00..103.19 rows=6778 
width=0) (actual time=21.216..21.216 rows=131072 loops=1)
 Index Cond: (keywords && '{ford}'::text[])
   ->  Bitmap Index Scan on foo_idx  (cost=0.00..103.19 rows=6778 
width=0) (actual time=20.167..20.167 rows=131072 loops=1)
 Index Cond: (keywords && '{toyota}'::text[])
 Total runtime: 424.431 ms
(9 rows)


The difference is very little here. However, in our application I am
seeing a much bigger difference. The affected query is a lot more
complicated:

First, a query of the form "keywords && '{foo, bar}'::text[]"

EXPLAIN ANALYZE SELECT
count(*)
FROM mb_lead ml
INNER JOIN lead_reporting_data lrd ON lrd.lead_id = ml.lead_id 
WHERE lrd.typeflags && '{autobytel.volume, automotive}'::text[];


-- result --
QUERY PLAN:
 Aggregate  (cost=71602.10..71602.11 rows=1 width=0) (actual 
time=13196.895..13196.896 rows=1 loops=1)
   ->  Hash Join  (cost=60278.37..71598.14 rows=1582 width=0) (actual 
time=1924.076..13170.602 rows=29567 loops=1)
 Hash Cond: (ml.lead_id = lrd.lead_id)
 ->  Seq Scan on mb_lead ml  (cost=0.00..6557.98 rows=316398 width=8) 
(actual time=0.014..293.214 rows=316398 loops=1)
 ->  Hash  (cost=60022.57..60022.57 rows=20464 width=8) (actual 
time=1922.050..1922.050 rows=473743 loops=1)
   ->  Bitmap Heap Scan on lead_reporting_data lrd  
(cost=808.14..60022.57 rows=20464 width=8) (actual time=424.841..1276.990 
rows=473743 loops=1)
 Recheck Cond: (typeflags && 
'{autobytel.volume,automotive}'::text[])
 ->  Bitmap Index Scan on lead_reporting_data_typeflags_idx 
 (cost=0.00..803.02 rows=20464 width=0) (actual time=308.941..308.941 
rows

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

Re: [GENERAL] Problem search on text arrays, using the overlaps (&&) operator

[GENERAL] Problem search on text arrays, using the overlaps (&&) operator

9 matches

Site Navigation

Mail list logo

Footer information