Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Tom Lane
Andy Colson  writes:
> No, that's not right, the table was empty.  I rebuilt the table as it 
> was before, here are all three queries again:

Ah, thanks for the more solid data.

> ->  Bitmap Index Scan on search_key  (cost=0.00..63623.00 rows=1 width=0) 
> (actual time=4.996..4.996 rows=1 loops=1)
>   Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & 
> N:*'::text))

> ->  Bitmap Index Scan on search_key  (cost=0.00..23.00 rows=1 width=0) 
> (actual time=4.057..4.057 rows=1 loops=1)
>   Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & 
> N'::text))

This says there's only about a 25% runtime penalty for the partial match,
at least on your example, compared to the planner's estimate of 2700x
penalty :-(.  Definitely need to fix that.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Tom Lane
Andy Colson  writes:
> Here are my results, if there are any others you'd like to see please 
> let me know.  Thanks Tom.

For comparison, could we see the results for the non-partial case, ie

explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N');

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 9:53 AM, Tom Lane wrote:

Andy Colson  writes:

Here are my results, if there are any others you'd like to see please
let me know.  Thanks Tom.


For comparison, could we see the results for the non-partial case, ie

explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N');

regards, tom lane



                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=6.00..7.02 rows=1 width=100) (actual time=0.029..0.029 rows=0 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..6.00 rows=1 width=0) (actual time=0.025..0.025 rows=0 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))
 Total runtime: 0.060 ms
(5 rows)




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 9:55 AM, Andy Colson wrote:

On 12/30/2015 9:53 AM, Tom Lane wrote:

Andy Colson  writes:

Here are my results, if there are any others you'd like to see please
let me know.  Thanks Tom.


For comparison, could we see the results for the non-partial case, ie

explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N');

regards, tom lane



                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=6.00..7.02 rows=1 width=100) (actual time=0.029..0.029 rows=0 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..6.00 rows=1 width=0) (actual time=0.025..0.025 rows=0 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))
 Total runtime: 0.060 ms
(5 rows)






Oh!  I just realized... this is on my test box and I just blew away that 
table to rebuild it.  And I have to run off to a meeting, so if this 
analyze makes no sense, that's why.


I'll re-re-do it within the hour.  Sorry about that.

-Andy



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 10:09 AM, Tom Lane wrote:

Andy Colson  writes:

 ->  Bitmap Index Scan on search_key  (cost=0.00..6.00 rows=1 width=0) 
(actual time=0.025..0.025 rows=0 loops=1)
   Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))


Hmm ... so the partial case actually is significantly more expensive than
the non-partial case: 4 msec vs .025 msec.  Still, that's about a 200x
penalty, not the ~10000x penalty the planner is ascribing to it.

Thanks for the data!  I'll go moan about this on -hackers.

regards, tom lane



No, that's not right, the table was empty.  I rebuilt the table as it 
was before, here are all three queries again:


explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*');

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on search  (cost=0.00..2211.08 rows=1 width=73) (actual time=31.904..380.568 rows=1 loops=1)
   Filter: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
   Rows Removed by Filter: 79071
 Total runtime: 380.609 ms



set enable_seqscan TO 0;
explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*');

                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=63623.00..63624.02 rows=1 width=73) (actual time=5.004..5.004 rows=1 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..63623.00 rows=1 width=0) (actual time=4.996..4.996 rows=1 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
 Total runtime: 5.045 ms


explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N');

                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=23.00..24.02 rows=1 width=73) (actual time=4.067..4.067 rows=1 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..23.00 rows=1 width=0) (actual time=4.057..4.057 rows=1 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N'::text))
 Total runtime: 4.109 ms



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Tom Lane
I wrote:
> This says there's only about a 25% runtime penalty for the partial match,
> at least on your example, compared to the planner's estimate of 2700x
> penalty :-(.  Definitely need to fix that.

I tried to reproduce this behavior with simple generated data, and could
not: the estimates seem to track the actual cost reasonably well.  So
it seems like you've got some weird data statistics that are causing a
misestimate.  Could we see the pg_stats row for that tsvector column?
Or maybe even the actual data?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
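
For reference, the pg_stats row being requested can be pulled with a query along
these lines (a sketch, using the schema, table, and column names that appear
elsewhere in the thread):

    select *
    from pg_stats
    where schemaname = 'jasperia'
      and tablename  = 'search'
      and attname    = 'search_vec';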


Re: [GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-30 Thread Tom Lane
Cory Tucker  writes:
> This table is almost always queried using a combination of (account_id,
> record_id) and is generally pretty fast.  However, under certain loads, the
> query becomes slower and slower as time goes on.  The workload that causes
> this to happen is when data for a new account_id is being inserted into the
> table.  This will happen in rapid succession and may insert millions of
> rows over the course of several hours.

Are those insertions happening in one enormous transaction, or even just
a few very large ones?

> The pattern that I notice when this happens is that the CPU on DB will be
> pegged much higher than usual, and the query to lookup records for the
> (account_id, record_id) combo will steadily rise from <1ms to more then 2
> or 3 seconds over time.

I'm suspicious that this is not autovacuum's fault but reflects the cost
of checking uncommitted tuples to see if they've become committed yet.
If so, there may be little you can do about it except break the insertion
into smaller transactions ... which might or might not be all right from
a data consistency standpoint.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-30 Thread Cory Tucker
On Wed, Dec 30, 2015 at 11:20 AM Tom Lane  wrote:

> Cory Tucker  writes:
> > This table is almost always queried using a combination of (account_id,
> > record_id) and is generally pretty fast.  However, under certain loads,
> the
> > query becomes slower and slower as time goes on.  The workload that
> causes
> > this to happen is when data for a new account_id is being inserted into
> the
> > table.  This will happen in rapid succession and may insert millions of
> > rows over the course of several hours.
>
> Are those insertions happening in one enormous transaction, or even just
> a few very large ones?
>

No, one transaction per row insert.


>
> > The pattern that I notice when this happens is that the CPU on DB will be
> > pegged much higher than usual, and the query to lookup records for the
> > (account_id, record_id) combo will steadily rise from <1ms to more then 2
> > or 3 seconds over time.
>
> I'm suspicious that this is not autovacuum's fault but reflects the cost
> of checking uncommitted tuples to see if they've become committed yet.
> If so, there may be little you can do about it except break the insertion
> into smaller transactions ... which might or might not be all right from
> a data consistency standpoint.
>
> regards, tom lane
>


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 1:55 PM, Tom Lane wrote:

Andy Colson  writes:

On 12/30/2015 1:07 PM, Tom Lane wrote:

it seems like you've got some weird data statistics that are causing a
misestimate.  Could we see the pg_stats row for that tsvector column?
Or maybe even the actual data?



The table exists in a schema named jasperia, I've been removing the name
for simplicity.  The dump of the table is here:


Thanks very much for sharing the data.  But now I am well and truly
confused, because I still can't reproduce your results.  I get

regression=# explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*');
                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=76.00..80.02 rows=1 width=72) (actual time=8.119..8.119 rows=1 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..76.00 rows=1 width=0) (actual time=8.113..8.113 rows=1 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
 Total runtime: 8.210 ms

This is on 9.3 branch tip, not 9.3.9 which I don't have installed;
but I see no bug fixes related to GIN estimation in the commit logs
since 9.3.9.

Are you using any nondefault planner settings?  Anything else
unusual about your installation?

regards, tom lane



There are others, but I'll bet it's:

random_page_cost = 1


The Others:

max_connections = 20
shared_buffers = 400MB
work_mem = 5MB
maintenance_work_mem = 64MB
effective_cache_size = 1700MB
synchronous_commit = off
effective_io_concurrency = 3
track_io_timing = on
max_locks_per_transaction = 2300


I can't honestly say why I've set random_page_cost.  It's been way too 
long for me to remember.  The box is running a 4-drive SATA software 
RAID 10, on Slackware64.


-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
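
A quick way to answer the "any nondefault settings?" question directly from the
server is to ask pg_settings; a minimal sketch:

    -- list every setting that is not at its built-in default
    select name, setting, source
    from pg_settings
    where source not in ('default', 'override')
    order by name;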


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Tom Lane
Andy Colson  writes:
> On 12/30/2015 1:55 PM, Tom Lane wrote:
>> Are you using any nondefault planner settings?  Anything else
>> unusual about your installation?

> There are others, but I'll bet its:
> random_page_cost = 1

Nope...

Maybe something weird about the build you're using?  What does
pg_config print?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

Wow, that's bad.

Here's another link:

http://camavision.com/dn/stats.txt

-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-30 Thread Cory Tucker
We have a performance problem accessing one of our tables, I think because
the statistics are out of date.  The table is fairly large, on the order of
100M rows or so.
The general structure of the table is as follows:

   Column   |  Type  |                    Modifiers
------------+--------+---------------------------------------------------
 id         | bigint | not null default nextval('foo_id_seq'::regclass)
 record_id  | text   |
 account_id | bigint | not null

With indexes:
"foo_pkey" PRIMARY KEY, btree (id)
"uq_account_id_record_id" UNIQUE CONSTRAINT, btree (account_id,
record_id)


This table is almost always queried using a combination of (account_id,
record_id) and is generally pretty fast.  However, under certain loads, the
query becomes slower and slower as time goes on.  The workload that causes
this to happen is when data for a new account_id is being inserted into the
table.  This will happen in rapid succession and may insert millions of
rows over the course of several hours.

The pattern that I notice when this happens is that the CPU on DB will be
pegged much higher than usual, and the query to lookup records for the
(account_id, record_id) combo will steadily rise from <1ms to more then 2
or 3 seconds over time.

The fix I have employed to restore the speed of the query after I notice it
is happening is to manually issue a VACUUM ANALYZE on the table.  After the
analyze is done, the query returns to its normal speed.

I am looking for suggestions for how to tune, or perhaps automatically
detect this pattern, so that I don't have to manually intervene whenever
this happens.

Here are my autovacuum settings:

name |  setting  | unit
-+---+--
 autovacuum  | on|
 autovacuum_analyze_scale_factor | 0.05  |
 autovacuum_analyze_threshold| 50|
 autovacuum_freeze_max_age   | 2 |
 autovacuum_max_workers  | 3 |
 autovacuum_multixact_freeze_max_age | 4 |
 autovacuum_naptime  | 30| s
 autovacuum_vacuum_cost_delay| 20| ms
 autovacuum_vacuum_cost_limit| -1|
 autovacuum_vacuum_scale_factor  | 0.1   |
 autovacuum_vacuum_threshold | 50|
 autovacuum_work_mem | -1| kB

We're using 9.4.4 (RDS)
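
One way to watch for this pattern before queries slow down is to poll
pg_stat_user_tables and see how far the table has drifted since the last
analyze; a sketch, with "foo" standing in for the real table name
(n_mod_since_analyze is available from 9.4 on):

    select relname,
           n_live_tup,
           n_dead_tup,
           n_mod_since_analyze,
           last_autovacuum,
           last_autoanalyze
    from pg_stat_user_tables
    where relname = 'foo';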


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Tom Lane
Andy Colson  writes:
> On 12/30/2015 1:07 PM, Tom Lane wrote:
>> it seems like you've got some weird data statistics that are causing a
>> misestimate.  Could we see the pg_stats row for that tsvector column?
>> Or maybe even the actual data?

> The table exists in a schema named jasperia, I've been removing the name 
> for simplicity.  The dump of the table is here:

Thanks very much for sharing the data.  But now I am well and truly
confused, because I still can't reproduce your results.  I get

regression=# explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*');
                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=76.00..80.02 rows=1 width=72) (actual time=8.119..8.119 rows=1 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..76.00 rows=1 width=0) (actual time=8.113..8.113 rows=1 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
 Total runtime: 8.210 ms

This is on 9.3 branch tip, not 9.3.9 which I don't have installed;
but I see no bug fixes related to GIN estimation in the commit logs
since 9.3.9.

Are you using any nondefault planner settings?  Anything else
unusual about your installation?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 2:03 PM, Andy Colson wrote:

On 12/30/2015 1:55 PM, Tom Lane wrote:

Andy Colson  writes:

On 12/30/2015 1:07 PM, Tom Lane wrote:

it seems like you've got some weird data statistics that are causing a
misestimate.  Could we see the pg_stats row for that tsvector column?
Or maybe even the actual data?



The table exists in a schema named jasperia, I've been removing the name
for simplicity.  The dump of the table is here:


Thanks very much for sharing the data.  But now I am well and truly
confused, because I still can't reproduce your results.  I get

regression=# explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*');
                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=76.00..80.02 rows=1 width=72) (actual time=8.119..8.119 rows=1 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..76.00 rows=1 width=0) (actual time=8.113..8.113 rows=1 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
 Total runtime: 8.210 ms

This is on 9.3 branch tip, not 9.3.9 which I don't have installed;
but I see no bug fixes related to GIN estimation in the commit logs
since 9.3.9.

Are you using any nondefault planner settings?  Anything else
unusual about your installation?

regards, tom lane



There are others, but I'll bet its:

random_page_cost = 1



Hmm, nope.  I removed the config option, restarted PG, then analyzed the 
search table:


# show random_page_cost ;
 random_page_cost
--
 4

# analyze search;

And it still won't use the index.  I'll tool around a little more and see 
if I can find something.


Thanks much for all your help on this.

-Andy





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 2:18 PM, Tom Lane wrote:

Andy Colson  writes:

On 12/30/2015 1:55 PM, Tom Lane wrote:

Are you using any nondefault planner settings?  Anything else
unusual about your installation?



There are others, but I'll bet its:
random_page_cost = 1


Nope...

Maybe something weird about the build you're using?  What does
pg_config print?

regards, tom lane



BINDIR = /usr/local/pg93/bin
DOCDIR = /usr/local/pg93/share/doc/postgresql
HTMLDIR = /usr/local/pg93/share/doc/postgresql
INCLUDEDIR = /usr/local/pg93/include
PKGINCLUDEDIR = /usr/local/pg93/include/postgresql
INCLUDEDIR-SERVER = /usr/local/pg93/include/postgresql/server
LIBDIR = /usr/local/pg93/lib
PKGLIBDIR = /usr/local/pg93/lib/postgresql
LOCALEDIR = /usr/local/pg93/share/locale
MANDIR = /usr/local/pg93/share/man
SHAREDIR = /usr/local/pg93/share/postgresql
SYSCONFDIR = /usr/local/pg93/etc/postgresql
PGXS = /usr/local/pg93/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/pg93' '--with-perl' 
'--enable-thread-safety' '--build=x86_64-slackware-linux' 
'build_alias=x86_64-slackware-linux' 'CFLAGS=-O2 -fPIC 
-DLINUX_OOM_SCORE_ADJ=0'

CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -fPIC -DLINUX_OOM_SCORE_ADJ=0 -Wall -Wmissing-prototypes 
-Wpointer-arith -Wdeclaration-after-statement -Wendif-labels 
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing 
-fwrapv -fexcess-precision=standard

CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed 
-Wl,-rpath,'/usr/local/pg93/lib',--enable-new-dtags

LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lpgcommon -lz -lreadline -ltermcap -lcrypt -ldl -lm
VERSION = PostgreSQL 9.3.9




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Tom Lane
Andy Colson  writes:
> On 12/30/2015 2:18 PM, Tom Lane wrote:
>> Maybe something weird about the build you're using?  What does
>> pg_config print?

> [ output ]

No smoking gun there either.

It might be worthwhile to update to 9.3.10, just in case there is
something wonky about this particular build you've got.  But I'm
starting to get the feeling that you may not get an answer short
of tracing through gincostestimate to see where it's going nuts.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 2:33 PM, Tom Lane wrote:

Andy Colson  writes:

On 12/30/2015 2:18 PM, Tom Lane wrote:

Maybe something weird about the build you're using?  What does
pg_config print?



[ output ]


No smoking gun there either.

It might be worthwhile to update to 9.3.10, just in case there is
something wonky about this particular build you've got.  But I'm
starting to get the feeling that you may not get an answer short
of tracing through gincostestimate to see where it's going nuts.

regards, tom lane



The entire database is 78 gigs; would you expect a "vacuum analyze" to fix 
it?  I never run it.


Because I started one, and it's still going, but at this point it's 
already preferring index scans.  So it seems fixed.


I'd run: analyze jasperia.search

many times, before and after I'd emailed the list. I've rebuilt the 
search table several times over, but never vacuumed it.



explain analyze
select *
from jasperia.search
where search_vec @@ to_tsquery_partial('213 E 13 ST N')

                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=76.01..80.03 rows=1 width=73) (actual time=62.803..62.804 rows=1 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) || ':*'::text)))
   ->  Bitmap Index Scan on search_key  (cost=0.00..76.01 rows=1 width=0) (actual time=62.797..62.797 rows=1 loops=1)
         Index Cond: (search_vec @@ to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) || ':*'::text)))
 Total runtime: 62.869 ms



(* The vacuum analyze is still running *)





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 2:39 PM, Andy Colson wrote:

On 12/30/2015 2:33 PM, Tom Lane wrote:

Andy Colson  writes:

On 12/30/2015 2:18 PM, Tom Lane wrote:

Maybe something weird about the build you're using?  What does
pg_config print?



[ output ]


No smoking gun there either.

It might be worthwhile to update to 9.3.10, just in case there is
something wonky about this particular build you've got.  But I'm
starting to get the feeling that you may not get an answer short
of tracing through gincostestimate to see where it's going nuts.

regards, tom lane



The entire database is 78Gig, would you expect a "vacuum analyze" to fix
it?  I never run it.



Ok, I can reproduce this now.  The full vacuum analyze isn't needed.

If I drop and recreate the table it goes back to preferring table scan. 
 I can "analyze search" and it still table scans.


But once I "vacuum analyze search", then it starts index scanning.


-Andy




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
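
A condensed sketch of the reproduction described above (the real table has more
columns; the key point is that the GIN index exists before the bulk load):

    drop table if exists search;
    create table search (search_vec tsvector);
    create index search_key on search using gin (search_vec);
    -- ... bulk-load the ~79k rows here ...
    analyze search;          -- planner still prefers the seq scan
    vacuum analyze search;   -- now the bitmap index scan is chosen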


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/30/2015 1:07 PM, Tom Lane wrote:

I wrote:

This says there's only about a 25% runtime penalty for the partial match,
at least on your example, compared to the planner's estimate of 2700x
penalty :-(.  Definitely need to fix that.


I tried to reproduce this behavior with simple generated data, and could
not: the estimates seem to track the actual cost reasonably well.  So
it seems like you've got some weird data statistics that are causing a
misestimate.  Could we see the pg_stats row for that tsvector column?
Or maybe even the actual data?

regards, tom lane



The table exists in a schema named jasperia, I've been removing the name 
for simplicity.  The dump of the table is here:


http://camavision.com/dn/search.sql.bz2

Here is the pg_stats, I hope word wrap doesn't mess this up too bad:

 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------
(the pg_stats row itself wrapped beyond legibility; a readable copy was posted 
elsewhere in the thread: http://camavision.com/dn/stats.txt)

Re: [GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-30 Thread Joe Conway
On 12/30/2015 11:09 AM, Cory Tucker wrote:
> We have a performance problem accessing one of our tables, I think
> because the statistics are out of date.  The table is fairly large, on
> the order of 100M rows or so. 

> The fix I have employed to restore the speed of the query after I notice
> it is happening is to manually issue a VACUUM ANALYZE on the table. 
> After the analyze is done, the query returns to its normal speed.

>  autovacuum_analyze_scale_factor | 0.05  |
>  autovacuum_analyze_threshold| 50|
>  autovacuum_vacuum_scale_factor  | 0.1   |
>  autovacuum_vacuum_threshold | 50|

With this scenario you can expect an autoanalyze every 5 million rows
and autovacuum every 10 million. In my experience (and based on your
description, yours as well) this is not often enough. Not only that,
when it does run it runs longer than you would like, causing an I/O hit
while it does.

You probably should tune this table specifically, e.g.

ALTER TABLE foo SET (autovacuum_vacuum_threshold=100000,
 autovacuum_vacuum_scale_factor=0);
ALTER TABLE foo SET (autovacuum_analyze_threshold=100000,
 autovacuum_analyze_scale_factor=0);

That will cause autovacuum and autoanalyze to run every 100k records
changed (pick your own number here, but I have used this very
successfully in the past). This way, not only will the table remain well
vacuumed and analyzed, but when they run they will finish quickly and have
minimal impact.

HTH,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



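
Whether those per-table overrides took effect can be confirmed from
pg_class.reloptions; a sketch:

    -- show the storage parameters set by the ALTER TABLE statements above
    select relname, reloptions
    from pg_class
    where relname = 'foo';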


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Tom Lane
Andy Colson  writes:
> Ok, I can reproduce this now.  The full vacuum analyze isn't needed.
> If I drop and recreate the table it goes back to preferring table scan. 
>   I can "analyze search" and it still table scans.
> But once I "vacuum analyze search", then it starts index scanning.

Hah.  You didn't say what your table recreation process is, but now I bet
it involves create the table, create the index, *then* fill the table.
I was just running the dump script, which creates the index last.
If I do it the other way then I get insane estimates.

Tracing through that, when gincostestimate looks at the GIN index's
metapage stats, it sees this:

(gdb) p *metadata
$1 = {head = 2, tail = 136, tailFreeSize = 3272, nPendingPages = 135, 
  nPendingHeapTuples = 33424, nTotalPages = 2, nEntryPages = 1, 
  nDataPages = 0, nEntries = 0, ginVersion = 2}

ie, the page counts are as of the time of index creation not current.
The insanity must come from trying to scale these up to the current index
size and getting silly results.  In particular, it's still gonna end up
with numDataPages equal to zero, which I bet is bad news ...

After VACUUM I see

(gdb) p *metadata
$2 = {head = 4294967295, tail = 4294967295, tailFreeSize = 0, 
  nPendingPages = 0, nPendingHeapTuples = 0, nTotalPages = 685, 
  nEntryPages = 410, nDataPages = 16, nEntries = 44125, ginVersion = 2}

and the cost estimate is far saner.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
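
The pending-list part of those metapage counters can also be inspected from SQL,
without gdb, via pgstattuple's pgstatginindex(); a sketch, assuming the
pgstattuple extension is available:

    create extension if not exists pgstattuple;

    select * from pgstatginindex('search_key');
    -- returns version, pending_pages, pending_tuples; after VACUUM the pending
    -- list (nPendingPages / nPendingHeapTuples above) drops back to zero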


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Tom Lane
Andy Colson  writes:
> ->  Bitmap Index Scan on search_key  (cost=0.00..6.00 rows=1 width=0) 
> (actual time=0.025..0.025 rows=0 loops=1)
>   Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & 
> N'::text))

Hmm ... so the partial case actually is significantly more expensive than
the non-partial case: 4 msec vs .025 msec.  Still, that's about a 200x
penalty, not the ~10000x penalty the planner is ascribing to it.

Thanks for the data!  I'll go moan about this on -hackers.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transfer db from one port to another

2015-12-30 Thread Andreas Kretschmer
Killian Driscoll  wrote:

> It worked - thank you very much for your time.

Great! 


> Regarding the file format used: I had used the pg_dump with .sql, but you
> suggested .out. Is there a particular reason to use .out instead of .sql when
> backing up?

No, doesn't matter.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transfer db from one port to another

2015-12-30 Thread Adrian Klaver

On 12/29/2015 11:38 PM, Killian Driscoll wrote:

On 24 December 2015 at 18:33, Adrian Klaver wrote:






It worked - thank you very much for your time.

Regarding the file format used: I had used the pg_dump with .sql, but
you suggested .out. Is there a particular reason to use .out instead of
.sql when backing up?


I do that to distinguish between plain text dumps (*.sql) and custom 
format dumps (*.out) for myself. Postgres itself does not care about the 
extension, or whether there is an extension at all.






--
Adrian Klaver
adrian.kla...@aklaver.com 





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/29/2015 6:03 PM, Jim Nasby wrote:


If I'm reading EXPLAIN ANALYZE correctly, to_tsquery_partial is being
simplified out of the query entirely:

Filter: (search_vec @@
to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) ||
':*'::text)))

Part of this could well be that you're not feeding the same data to
to_tsquery. Your hard-coded example is

where search_vec @@ to_tsquery('213 & E & 13 & ST & N');

but your second query becomes '213 & E & 13 & ST & N:*'. Have you tried
that as a hard-coded value?


Ahh!  Yep, that was the missing link.  Jeez, I can't believe I couldn't 
find it.  The :* is for matching partials; it's even IN the name 
to_tsquery_partial.


Indeed, this does not use an index:

explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*')

Thank you!

-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
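
to_tsquery_partial itself never appears in the thread, but judging from the
filter expression in the plans it is presumably something close to the following
(a hedged reconstruction, not necessarily Andy's actual definition):

    create or replace function to_tsquery_partial(text)
    returns tsquery language sql stable as $$
      -- AND the space-separated terms together and make the last one a prefix match
      select to_tsquery(array_to_string(string_to_array($1, ' '), ' & ') || ':*');
    $$;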


Re: [GENERAL] cannot get stable function to use index

2015-12-30 Thread Andy Colson

On 12/29/2015 6:35 PM, Tom Lane wrote:

Andy Colson  writes:

I cannot get this sql to use the index:



explain analyze
select *
from search
where search_vec @@ to_tsquery_partial('213 E 13 ST N')



---------------------------------------------------------------------------------------------------------------
 Seq Scan on search  (cost=0.00..2526.56 rows=1 width=69) (actual time=68.033..677.490 rows=1 loops=1)
   Filter: (search_vec @@ to_tsquery((array_to_string('{213,E,13,ST,N}'::text[], ' & '::text) || ':*'::text)))
   Rows Removed by Filter: 76427
 Total runtime: 677.548 ms
(4 rows)


If you force it with enable_seqscan = off, you'll soon see that it's
capable of picking the indexscan plan, but it doesn't want to because it
estimates that the cost will be much higher, which seems to be a
consequence of the ":*" in the query.  (Even though the functions involved
are only stable, the planner is capable of seeing through them to look at
the pattern that will be fed to the GIN index search.)  You get the same
results if you use the resulting tsquery without any function at all.
For example (with dummy data), I get

regression=# explain select * from search
where search_vec @@ '213 & e & 13 & st & n:*'::tsquery;
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Seq Scan on search  (cost=0.00..3774.01 rows=1 width=21)
   Filter: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'':*'::tsquery)
(2 rows)

regression=# set enable_seqscan TO 0;
SET
regression=# explain select * from search
where search_vec @@ '213 & e & 13 & st & n:*'::tsquery;
                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=10.00..104448.01 rows=1 width=21)
   Recheck Cond: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'':*'::tsquery)
   ->  Bitmap Index Scan on search_key  (cost=0.00..10.00 rows=1 width=0)
         Index Cond: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'':*'::tsquery)
(4 rows)

but for comparison, with a pattern without ':*', I get

regression=# explain select * from search
where search_vec @@ '213 & e & 13 & st & n'::tsquery;
                                      QUERY PLAN
----------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=44.00..48.01 rows=1 width=21)
   Recheck Cond: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'''::tsquery)
   ->  Bitmap Index Scan on search_key  (cost=0.00..44.00 rows=1 width=0)
         Index Cond: (search_vec @@ '''213'' & ''e'' & ''13'' & ''st'' & ''n'''::tsquery)
(4 rows)

I'm inclined to think this is a bug in the estimator; it seems to be
charging for many more "entry page" fetches than there are pages in
the index.  But maybe it's right and there will be lots of repeated
work involved.  It would be interesting to see EXPLAIN ANALYZE results
from your data for these examples.

regards, tom lane




Here are my results, if there are any others you'd like to see please 
let me know.  Thanks Tom.



# explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*');

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on search  (cost=0.00..2144.42 rows=1 width=69) (actual time=30.584..361.147 rows=1 loops=1)
   Filter: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
   Rows Removed by Filter: 76427
 Total runtime: 361.181 ms
(4 rows)

Time: 363.012 ms



# set enable_seqscan TO 0;
SET
Time: 0.185 ms

# explain analyze
select *
from search
where search_vec @@ to_tsquery('213 & E & 13 & ST & N:*');
                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=63716.00..63717.02 rows=1 width=69) (actual time=4.354..4.355 rows=1 loops=1)
   Recheck Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
   ->  Bitmap Index Scan on search_key  (cost=0.00..63716.00 rows=1 width=0) (actual time=4.351..4.351 rows=1 loops=1)
         Index Cond: (search_vec @@ to_tsquery('213 & E & 13 & ST & N:*'::text))
 Total runtime: 4.370 ms
(5 rows)

Time: 4.794 ms





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general