Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-03-06 Thread Joris Dobbelsteen
-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 5 March 2008 0:29
To: Joris Dobbelsteen
Cc: Gregory Stark; Scott Marlowe; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Planner: rows=1 after similar to 
where condition. 

Joris Dobbelsteen [EMAIL PROTECTED] writes:
[snip]
   Recheck Cond: (((program)::text = 'amavis'::text) AND 
 ((facility)::text = 'mail'::text))

selects rows that match the regex with much higher probability 
than the general row population does.  Since we don't yet have 
any cross-column statistics the planner has no chance of 
realizing that.

Yes, that was it, should have realized that.

Thanks,

- Joris

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-03-04 Thread Joris Dobbelsteen
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Joris 
Dobbelsteen
Sent: Monday, 25 February 2008 17:08
To: Tom Lane
Cc: Gregory Stark; Scott Marlowe; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Planner: rows=1 after similar to 
where condition. 

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Monday, 25 February 2008 16:34
To: Joris Dobbelsteen
Cc: Gregory Stark; Scott Marlowe; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Planner: rows=1 after similar to 
where condition. 

Joris Dobbelsteen [EMAIL PROTECTED] writes:
 Bitmap Heap Scan on log_syslog syslog  (cost=11168.32..16988.84
 rows=1
 width=221) (actual time=11145.729..30067.606 rows=212 loops=1)
   Recheck Cond: (((program)::text = 'amavis'::text) AND 
 ((facility)::text = 'mail'::text))
   Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~
 '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed
[A-Za-z0-9]+,
 [][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+,
 (Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+,
 queued.as: [^ ,]+, [0-9]+ ms)$'::text))

It's not too surprising that you'd get a small selectivity 
estimate for 
such a long regexp; the default estimate is just based on the 
amount of 
fixed text in the pattern, and you've got a lot.

If you increase the stats target for the column to 100 or 
more then it 
will try actually applying the regexp to all the histogram entries.
That might or might not give you a better estimate.

I will try that, expect result back within a few days (have it 
collect some better sample set). Unfortunally the regex is not 
so much for narrowing down the selection, but rather 
guarenteeing the format of the messages.
You seem to consider the common case differently, and I can 
agree for most part. Unfortunally my use-case is different 
from the expected. That said, might a less aggressive 
selectivity estimation for long strings work better in the common case?

A new test case (I did a fresh VACUUM ANALYZE with your statistics for
text set to 100):
Arround 5288 rows out of 4.3 Million match.

Bitmap Heap Scan on log_syslog syslog  (cost=1.94..53522.27 rows=1
width=226) (actual time=41661.354..92719.083 rows=5288 loops=1)
  Recheck Cond: (((program)::text = 'amavis'::text) AND
((facility)::text = 'mail'::text))
  Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~
'***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed [A-Za-z0-9]+,
[][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+,
(Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+,
queued.as: [^ ,]+, [0-9]+ ms)$'::text))
  -  BitmapAnd  (cost=1.94..1.94 rows=15279 width=0) (actual
time=4641.009..4641.009 rows=0 loops=1)
-  Bitmap Index Scan on IX_log_syslog_program
(cost=0.00..2908.86 rows=113370 width=0) (actual time=2913.718..2913.718
rows=113897 loops=1)
  Index Cond: ((program)::text = 'amavis'::text)
-  Bitmap Index Scan on IX_log_syslog_facility
(cost=0.00..14868.57 rows=591426 width=0) (actual
time=1715.591..1715.591 rows=586509 loops=1)
  Index Cond: ((facility)::text = 'mail'::text)
Total runtime: 92738.389 ms

Unfortunally, Tom, it seems the data varies to much and is not included
in the histogram. Probably the data varies too much. In this case, a
regex NOT for selection but rather for forcing the input format should
be done differently.
My construction with the regex as substring() construction and a
WHERE substring() IS NOT NULL seems to give a better estimate in these
cases. The result seems equivalent.

Bitmap Heap Scan on log_syslog syslog  (cost=17783.78..53966.33
rows=5844 width=226) (actual time=59095.076..110913.152 rows=5295
loops=1)
  Recheck Cond: (((program)::text = 'amavis'::text) AND
((facility)::text = 'mail'::text))
  Filter: (((priority)::text = 'notice'::text) AND
(substring((text)::text, 'amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\)
Passed \\[A-Za-z0-9]+\\, [][0-9.]* [^]+ - [^]+, Message-ID:
[^]+, (Resent-Message-ID: [^]+, |)mail_id: [^ ,]+, Hits:
[-+0-9.,]+, queued_as: [^ ,]+, [0-9]+ ms'::text, '\\'::text) IS NOT
NULL))
  -  BitmapAnd  (cost=17783.78..17783.78 rows=15279 width=0) (actual
time=4003.657..4003.657 rows=0 loops=1)
-  Bitmap Index Scan on IX_log_syslog_program
(cost=0.00..2908.86 rows=113370 width=0) (actual time=1652.278..1652.278
rows=113939 loops=1)
  Index Cond: ((program)::text = 'amavis'::text)
-  Bitmap Index Scan on IX_log_syslog_facility
(cost=0.00..14868.57 rows=591426 width=0) (actual
time=2339.943..2339.943 rows=586653 loops=1)
  Index Cond: ((facility)::text = 'mail'::text)
Total runtime: 110921.978 ms

Note: few added rows in second run is due to the fact that this is a
live table that receives input continuesly.

Concluding:
Your estimator is really great and seems to give pretty good estimates!
Except for regular expressions, which seem more tricky in this regard.

A good note might

Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-03-04 Thread Tom Lane
Joris Dobbelsteen [EMAIL PROTECTED] writes:
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 If you increase the stats target for the column to 100 or 
 more then it 
 will try actually applying the regexp to all the histogram entries.
 That might or might not give you a better estimate.

 A new test case (I did a fresh VACUUM ANALYZE with your statistics for
 text set to 100):
 Arround 5288 rows out of 4.3 Million match.

Ah, you had not given us that number before.  That's one in 800 rows,
more or less, which means that there's no chance of getting a
well-founded statistical estimate with less than 800 items in the stats
collection.  Does it do any better with stats target set to 1000?

I think though that the real problem may be that the index condition

   Recheck Cond: (((program)::text = 'amavis'::text) AND
 ((facility)::text = 'mail'::text))

selects rows that match the regex with much higher probability than the
general row population does.  Since we don't yet have any cross-column
statistics the planner has no chance of realizing that.

 My construction with the regex as substring() construction and a
 WHERE substring() IS NOT NULL seems to give a better estimate in these
 cases. The result seems equivalent.

Actually, it's got exactly 0 knowledge about substring() and is giving
you a completely generic guess for this clause :-(

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-02-25 Thread Joris Dobbelsteen
-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED] 
Sent: Monday, 25 February 2008 7:14
To: Joris Dobbelsteen
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Planner: rows=1 after similar to 
where condition.

On Sun, Feb 24, 2008 at 4:35 PM, Joris Dobbelsteen 
[EMAIL PROTECTED] wrote:
 I seem to have some planner oddity, where it seems to completely  
 mispredict the output after a regex compare. I've seem it on other  
 occasions, where it completely screws up the join. You can note the  
 rows=1 after the filter.
  A similar sitution has occurred when doing a regex filter in a 
 subquery,  which was subsequently predited as 1 row and triggered 
 (oddly enough) a  sequencial scan. Doing the same using 
equality on 
 the result to  substring(text from regex) seemed to work and 
 produced a useful  plan, since it did a hash-join (as it 
should have).
  Is this a known problem? Otherwise I think I should build a smaller 
 test  case...

  Using Postgresql 8.2.6 from Debian Etch-backports.

Should be:
PostGreSQL 8.2.5 on x86_64-pc-linux-gnu (GCC 4.1.2.20061115) (Debian
4.1.1-21).
Should have paid closer attention.


  Bitmap Heap Scan on log_syslog syslog  (cost=13124.26..51855.25 
 rows=1  width=270)
Recheck Cond: (((program)::text = 'amavis'::text) AND  
 ((facility)::text = 'mail'::text))
Filter: ***SOME VERY LONG SIMILAR TO REGEX
-  BitmapAnd  (cost=13124.26..13124.26 rows=18957 width=0)
  -  Bitmap Index Scan on IX_log_syslog_program
  (cost=0.00..2223.95 rows=92323 width=0)
Index Cond: ((program)::text = 'amavis'::text)
  -  Bitmap Index Scan on IX_log_syslog_facility
  (cost=0.00..10899.81 rows=463621 width=0)
Index Cond: ((facility)::text = 'mail'::text)

It's not saying it will only get one row back for sure, it's 
saying it thinks it will return one row.  and depending on 
your query, it might.
 What's the query, and what's the explain analyze of that query?


See the attached file for the query and the explain (hopefully this
gives a consistent view and maintains the layout for easier reading).

The point is that it will NOT, not even close. The planner guesses 1
row, but the output was arround 13000 rows (of the 2.2M rows in the
table). Oddly enough the 18k rows on the bitmap and seems a very good
estimation. In fact, if I omit the SIMILAR TO, it estimates ~12000
rows, which is spot on. So it seems the SIMILAR TO really gets the
planner confused.

The real value was 12981 rows that were returned by the query in the
first case. However, since I removed this data from the original table
(its now somewhere else), I cannot present the original EXPLAIN ANALYZE
any more. The new dataset only contains ~137 (but I still have the old
statistics, I think, or at least they provide the same predictions).
I also included a run after EXPLAIN ANALYZE on the current dataset.

Hopefully this helps.

Thanks,

- Joris


QUERY=



SELECT  a.msgid, a.rcv_time, a.sent_time, a.hostname, 
a.passedas, a.from, a.to,
a.message_id, a.resent_message_id,
a.mail_id,
a.queue_id,
convert_score(a.spamscore) AS spamscore,
CAST(a.duration AS integer)
FROM(SELECT syslog.msgid, syslog.rcv_time, syslog.sent_time, 
syslog.hostname,
substring(syslog.text from 'amavis% Passed #%#, % % - %, 
Message-ID: %, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as passedas,
substring(syslog.text from 'amavis% Passed %, % #%# - %, 
Message-ID: %, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as from,
substring(syslog.text from 'amavis% Passed %, % % - #%#, 
Message-ID: %, mail_id: %, Hits: %, queued_as: %, % ms' for '#') as to,
substring(syslog.text from 'amavis% Passed %, % % - %, 
Message-ID: #[^]+#, (Resent-Message-ID: #%#, |)mail_id: %, Hits: %, 
queued_as: %, % ms' for '#') as message_id,
substring(syslog.text from 'amavis% Passed %, % % - %, 
Message-ID: %, Resent-Message-ID: #%#, mail_id: %, Hits: %, queued_as: %, 
% ms' for '#') as resent_message_id,
substring(syslog.text from 'amavis% Passed %, % % - %, 
Message-ID: %, mail_id: #%#, Hits: %, queued_as: %, % ms' for '#') as 
mail_id,
substring(syslog.text from 'amavis% Passed %, % % - %, 
Message-ID: %, mail_id: %, Hits: #%#, queued_as: %, % ms' for '#') as 
spamscore,
substring(syslog.text from 'amavis% Passed %, % % - %, 
Message-ID: %, mail_id: %, Hits: %, queued_as: #%#, % ms' for '#') as 
queue_id,
substring(syslog.text from 'amavis% Passed %, % % - %, 
Message-ID: %, mail_id: %, Hits: %, queued_as: %, #%# ms' for '#') as 
duration,
syslog.text,
syslog.facility, syslog.priority, syslog.program
 FROM public.log_syslog syslog
 WHERE syslog.program = 'amavis'
 AND syslog.facility = 'mail'
 AND syslog.priority

Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-02-25 Thread Gregory Stark
Joris Dobbelsteen [EMAIL PROTECTED] writes:

 Should be:
 PostGreSQL 8.2.5 on x86_64-pc-linux-gnu (GCC 4.1.2.20061115) (Debian 
 4.1.1-21).

In this case that may matter. One of the changes in the 8.2.6 update was:

. Improve planner's handling of LIKE/regex estimation in non-C locales (Tom)

I seem to recall this largely had to do with negated regexp matches but I
might only be remembering part of it. I would start by taking all the fixes
for known bugs before trying to diagnose a new one :)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-02-25 Thread Joris Dobbelsteen
-Original Message-
From: Gregory Stark [mailto:[EMAIL PROTECTED] 
Sent: Monday, 25 February 2008 12:31
To: Joris Dobbelsteen
Cc: Scott Marlowe; pgsql-general@postgresql.org
Subject: Re: Planner: rows=1 after similar to where condition.

Joris Dobbelsteen [EMAIL PROTECTED] writes:

 Should be:
 PostGreSQL 8.2.5 on x86_64-pc-linux-gnu (GCC 4.1.2.20061115) 
(Debian 4.1.1-21).

In this case that may matter. One of the changes in the 8.2.6 
update was:

. Improve planner's handling of LIKE/regex estimation in non-C 
locales (Tom)

I seem to recall this largely had to do with negated regexp 
matches but I might only be remembering part of it. I would 
start by taking all the fixes for known bugs before trying to 
diagnose a new one :)

It seems debian backports are not upgraded on my system. So I performed
the upgrade to:
PostgreSQL 8.2.6 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)
And indeed, I should have upgraded it before to the latest version.

It still spits out the same EXPLAIN ANALYZE result.

Bitmap Heap Scan on log_syslog syslog  (cost=11168.32..16988.84 rows=1
width=221) (actual time=11145.729..30067.606 rows=212 loops=1)
  Recheck Cond: (((program)::text = 'amavis'::text) AND
((facility)::text = 'mail'::text))
  Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~
'***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed [A-Za-z0-9]+,
[][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+,
(Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+,
queued.as: [^ ,]+, [0-9]+ ms)$'::text))
  -  BitmapAnd  (cost=11168.32..11168.32 rows=2095 width=0) (actual
time=608.771..608.771 rows=0 loops=1)
-  Bitmap Index Scan on IX_log_syslog_program
(cost=0.00..490.06 rows=19160 width=0) (actual time=94.982..94.982
rows=85238 loops=1)
  Index Cond: ((program)::text = 'amavis'::text)
-  Bitmap Index Scan on IX_log_syslog_facility
(cost=0.00..10677.75 rows=426214 width=0) (actual time=504.960..504.960
rows=455084 loops=1)
  Index Cond: ((facility)::text = 'mail'::text)
Total runtime: 30068.868 ms

From what little I know statistics and how, in postgres, the filter
expressions are computed, the rows=1 prediction seems still a bit odd.

It seems the fix did not effect this query.

Thanks for your support so far,

- Joris


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-02-25 Thread Tom Lane
Joris Dobbelsteen [EMAIL PROTECTED] writes:
 Bitmap Heap Scan on log_syslog syslog  (cost=11168.32..16988.84 rows=1
 width=221) (actual time=11145.729..30067.606 rows=212 loops=1)
   Recheck Cond: (((program)::text = 'amavis'::text) AND
 ((facility)::text = 'mail'::text))
   Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~
 '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed [A-Za-z0-9]+,
 [][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+,
 (Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+,
 queued.as: [^ ,]+, [0-9]+ ms)$'::text))

It's not too surprising that you'd get a small selectivity estimate
for such a long regexp; the default estimate is just based on the amount
of fixed text in the pattern, and you've got a lot.

If you increase the stats target for the column to 100 or more then it
will try actually applying the regexp to all the histogram entries.
That might or might not give you a better estimate.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-02-25 Thread Joris Dobbelsteen
Resent due to bounce... 
orange.nl #5.0.0 X-SMTP-Server; host sss.pgh.pa.us[66.207.139.130] said:
550

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Monday, 25 February 2008 16:34
To: Joris Dobbelsteen
Cc: Gregory Stark; Scott Marlowe; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Planner: rows=1 after similar to 
where condition. 

Joris Dobbelsteen [EMAIL PROTECTED] writes:
 Bitmap Heap Scan on log_syslog syslog  (cost=11168.32..16988.84 
 rows=1
 width=221) (actual time=11145.729..30067.606 rows=212 loops=1)
   Recheck Cond: (((program)::text = 'amavis'::text) AND 
 ((facility)::text = 'mail'::text))
   Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~
 '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed 
[A-Za-z0-9]+,
 [][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+,
 (Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+,
 queued.as: [^ ,]+, [0-9]+ ms)$'::text))

It's not too surprising that you'd get a small selectivity 
estimate for such a long regexp; the default estimate is just 
based on the amount of fixed text in the pattern, and you've got a lot.

If you increase the stats target for the column to 100 or more 
then it will try actually applying the regexp to all the 
histogram entries.
That might or might not give you a better estimate.

I will try that, expect result back within a few days (have it collect
some better sample set). Unfortunally the regex is not so much for
narrowing down the selection, but rather guarenteeing the format of the
messages.
You seem to consider the common case differently, and I can agree for
most part. Unfortunally my use-case is different from the expected. That
said, might a less aggressive selectivity estimation for long strings
work better in the common case?

Might an alternative be to use a function and check for a positive
result, i.e. something that the predictor cannot take into account?

- Joris



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-02-25 Thread Joris Dobbelsteen
-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Monday, 25 February 2008 16:34
To: Joris Dobbelsteen
Cc: Gregory Stark; Scott Marlowe; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Planner: rows=1 after similar to 
where condition. 

Joris Dobbelsteen [EMAIL PROTECTED] writes:
 Bitmap Heap Scan on log_syslog syslog  (cost=11168.32..16988.84 
 rows=1
 width=221) (actual time=11145.729..30067.606 rows=212 loops=1)
   Recheck Cond: (((program)::text = 'amavis'::text) AND 
 ((facility)::text = 'mail'::text))
   Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~
 '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed 
[A-Za-z0-9]+,
 [][0-9\\.]* [^]+ - [^]+, Message-ID: [^]+,
 (Resent-Message-ID: [^]+, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+,
 queued.as: [^ ,]+, [0-9]+ ms)$'::text))

It's not too surprising that you'd get a small selectivity 
estimate for such a long regexp; the default estimate is just 
based on the amount of fixed text in the pattern, and you've got a lot.

If you increase the stats target for the column to 100 or more 
then it will try actually applying the regexp to all the 
histogram entries.
That might or might not give you a better estimate.

I will try that, expect result back within a few days (have it collect
some better sample set). Unfortunally the regex is not so much for
narrowing down the selection, but rather guarenteeing the format of the
messages.
You seem to consider the common case differently, and I can agree for
most part. Unfortunally my use-case is different from the expected. That
said, might a less aggressive selectivity estimation for long strings
work better in the common case?

Might an alternative be to use a function and check for a positive
result, i.e. something that the predictor cannot take into account?

- Joris


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Planner: rows=1 after similar to where condition.

2008-02-24 Thread Joris Dobbelsteen
I seem to have some planner oddity, where it seems to completely
mispredict the output after a regex compare. I've seem it on other
occasions, where it completely screws up the join. You can note the
rows=1 after the filter.
A similar sitution has occurred when doing a regex filter in a subquery,
which was subsequently predited as 1 row and triggered (oddly enough) a
sequencial scan. Doing the same using equality on the result to
substring(text from regex) seemed to work and produced a useful
plan, since it did a hash-join (as it should have).
Is this a known problem? Otherwise I think I should build a smaller test
case...

Using Postgresql 8.2.6 from Debian Etch-backports.

Bitmap Heap Scan on log_syslog syslog  (cost=13124.26..51855.25 rows=1
width=270)
  Recheck Cond: (((program)::text = 'amavis'::text) AND
((facility)::text = 'mail'::text))
  Filter: ***SOME VERY LONG SIMILAR TO REGEX
  -  BitmapAnd  (cost=13124.26..13124.26 rows=18957 width=0)
-  Bitmap Index Scan on IX_log_syslog_program
(cost=0.00..2223.95 rows=92323 width=0)
  Index Cond: ((program)::text = 'amavis'::text)
-  Bitmap Index Scan on IX_log_syslog_facility
(cost=0.00..10899.81 rows=463621 width=0)
  Index Cond: ((facility)::text = 'mail'::text)

- Joris


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Planner: rows=1 after similar to where condition.

2008-02-24 Thread Scott Marlowe
On Sun, Feb 24, 2008 at 4:35 PM, Joris Dobbelsteen
[EMAIL PROTECTED] wrote:
 I seem to have some planner oddity, where it seems to completely
  mispredict the output after a regex compare. I've seem it on other
  occasions, where it completely screws up the join. You can note the
  rows=1 after the filter.
  A similar sitution has occurred when doing a regex filter in a subquery,
  which was subsequently predited as 1 row and triggered (oddly enough) a
  sequencial scan. Doing the same using equality on the result to
  substring(text from regex) seemed to work and produced a useful
  plan, since it did a hash-join (as it should have).
  Is this a known problem? Otherwise I think I should build a smaller test
  case...

  Using Postgresql 8.2.6 from Debian Etch-backports.

  Bitmap Heap Scan on log_syslog syslog  (cost=13124.26..51855.25 rows=1
  width=270)
Recheck Cond: (((program)::text = 'amavis'::text) AND
  ((facility)::text = 'mail'::text))
Filter: ***SOME VERY LONG SIMILAR TO REGEX
-  BitmapAnd  (cost=13124.26..13124.26 rows=18957 width=0)
  -  Bitmap Index Scan on IX_log_syslog_program
  (cost=0.00..2223.95 rows=92323 width=0)
Index Cond: ((program)::text = 'amavis'::text)
  -  Bitmap Index Scan on IX_log_syslog_facility
  (cost=0.00..10899.81 rows=463621 width=0)
Index Cond: ((facility)::text = 'mail'::text)

It's not saying it will only get one row back for sure, it's saying it
thinks it will return one row.  and depending on your query, it might.
 What's the query, and what's the explain analyze of that query?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq