Re: [HACKERS] char() overhead on read-only workloads not so insignifcant as the docs claim it is...

2009-06-15 Thread Gurjeet Singh
Comments?

On Sat, Jun 13, 2009 at 3:44 PM, Stefan Kaltenbrunner
ste...@kaltenbrunner.cc wrote:

 I'm currently doing some benchmarking on a Nehalem box(
 http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html)
 with 8.4 and while investigating what looks like issues in pgbench I also
 noticed that using char() has more than a negligable overhead on some (very
 special) readonly(!) workloads.

 for example running sysbench in read-only mode against 8.4 results in a
 profile(for the full run) that looks similiar to:

 samples  %symbol name
 981690   11.0656  bcTruelen
 3591834.0487  index_getnext
 3111283.5070  AllocSetAlloc
 2723303.0697  hash_search_with_hash_value
 2581572.9099  LWLockAcquire
 1956732.2056  _bt_compare
 1903032.1451  slot_deform_tuple
 1681011.8948  PostgresMain
 1641911.8508  _bt_checkkeys
 1261101.4215  FunctionCall2
 1239651.3973  SearchCatCache
 1206291.3597  LWLockRelease

 the default sysbench mode actually uses a number of different queries and
 the ones dealing with char() are actually only a small part of the full set
 of queries sent.
 The specific query is causing bcTruelen to show up in the profile is:

 SELECT c from sbtest where id between $1 and $2 order by c where the
 parameters are for example
 $1 = '5009559', $2 = '5009658' - ie ranges of 100.


 benchmarking only that query results in:

 samples  %symbol name
 2148182  23.5861  bcTruelen
 3694634.0565  index_getnext
 3627843.9832  AllocSetAlloc
 2841983.1204  slot_deform_tuple
 1852792.0343  _bt_checkkeys
 1801191.9776  LWLockAcquire
 1727331.8965  appendBinaryStringInfo
 1441581.5828  internal_putbytes
 1410401.5486  AllocSetFree
 1380931.5162  printtup
 1242551.3643  hash_search_with_hash_value
 1170541.2852  heap_form_minimal_tuple

 at around 46000 queries/s

 changing the fault sysbench schema from:

 Table public.sbtest
  Column |  Type  |  Modifiers

 ++-
  id | integer| not null default
 nextval('sbtest_id_seq'::regclass)
  k  | integer| not null default 0
  c  | character(120) | not null default ''::bpchar
  pad| character(60)  | not null default ''::bpchar
 Indexes:
sbtest_pkey PRIMARY KEY, btree (id)
k btree (k)


 to
  Table public.sbtest
  Column |   Type|  Modifiers

 +---+-
  id | integer   | not null default
 nextval('sbtest_id_seq'::regclass)
  k  | integer   | not null default 0
  c  | character varying | not null default ''::character varying
  pad| character(60) | not null default ''::bpchar
 Indexes:
sbtest_pkey PRIMARY KEY, btree (id)
k btree (k)

 results in a near 50%(!) speedup in terms of tps to around 67000 queries/s.
 This is however an extreme case because the c column actually contains no
 data at all (except for an empty string).

 the profile for the changed testcase looks like:
 4307975.  index_getnext
 3967504.8095  AllocSetAlloc
 3455084.1883  slot_deform_tuple
 2282222.7666  appendBinaryStringInfo
 2277662.7610  _bt_checkkeys
 1938182.3495  LWLockAcquire
 1799252.1811  internal_putbytes
 1688712.0471  printtup
 1520261.8429  AllocSetFree
 1463331.7739  heap_form_minimal_tuple
 1443051.7493  FunctionCall2
 1283201.  hash_search_with_hash_value


 at the very least we should reconsider this part of our docs:

   There is no performance difference between these three types, apart from
 increased storage space when using the blank-padded type, and a few extra
 CPU cycles to check the length when storing into a length-constrained
 column.

 from http://www.postgresql.org/docs/8.4/static/datatype-character.html



 regards

 Stefan

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Lets call it Postgres

EnterpriseDB  http://www.enterprisedb.com

gurjeet[.sin...@enterprisedb.com
singh.gurj...@{ gmail | hotmail | indiatimes | yahoo }.com
Mail sent from my BlackLaptop device


Re: [HACKERS] char() overhead on read-only workloads not so insignifcant as the docs claim it is...

2009-06-15 Thread Alvaro Herrera
 On Sat, Jun 13, 2009 at 3:44 PM, Stefan Kaltenbrunner
 ste...@kaltenbrunner.cc wrote:

  The specific query is causing bcTruelen to show up in the profile is:
 
  SELECT c from sbtest where id between $1 and $2 order by c where the
  parameters are for example
  $1 = '5009559', $2 = '5009658' - ie ranges of 100.
 
 
  benchmarking only that query results in:
 
  samples  %symbol name
  2148182  23.5861  bcTruelen
  3694634.0565  index_getnext
  3627843.9832  AllocSetAlloc

Gurjeet Singh escribió:
 Comments?

Maybe bcTruelen could be optimized to step on one word at a time
(perhaps by using XOR against a precomputed word filled with ' '),
instead of one byte at a time ...

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] char() overhead on read-only workloads not so insignifcant as the docs claim it is...

2009-06-15 Thread Jeremy Kerr
Alvaro,

 Maybe bcTruelen could be optimized to step on one word at a time
 (perhaps by using XOR against a precomputed word filled with ' '),
 instead of one byte at a time ...

I have a patch for this, will send soon.

Regards,


Jeremy

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] char() overhead on read-only workloads not so insignifcant as the docs claim it is...

2009-06-13 Thread Stefan Kaltenbrunner
I'm currently doing some benchmarking on a Nehalem 
box(http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html) 
with 8.4 and while investigating what looks like issues in pgbench I 
also noticed that using char() has more than a negligable overhead on 
some (very special) readonly(!) workloads.


for example running sysbench in read-only mode against 8.4 results in a 
profile(for the full run) that looks similiar to:


samples  %symbol name
981690   11.0656  bcTruelen
3591834.0487  index_getnext
3111283.5070  AllocSetAlloc
2723303.0697  hash_search_with_hash_value
2581572.9099  LWLockAcquire
1956732.2056  _bt_compare
1903032.1451  slot_deform_tuple
1681011.8948  PostgresMain
1641911.8508  _bt_checkkeys
1261101.4215  FunctionCall2
1239651.3973  SearchCatCache
1206291.3597  LWLockRelease

the default sysbench mode actually uses a number of different queries 
and the ones dealing with char() are actually only a small part of the 
full set of queries sent.

The specific query is causing bcTruelen to show up in the profile is:

SELECT c from sbtest where id between $1 and $2 order by c where the 
parameters are for example

$1 = '5009559', $2 = '5009658' - ie ranges of 100.


benchmarking only that query results in:

samples  %symbol name
2148182  23.5861  bcTruelen
3694634.0565  index_getnext
3627843.9832  AllocSetAlloc
2841983.1204  slot_deform_tuple
1852792.0343  _bt_checkkeys
1801191.9776  LWLockAcquire
1727331.8965  appendBinaryStringInfo
1441581.5828  internal_putbytes
1410401.5486  AllocSetFree
1380931.5162  printtup
1242551.3643  hash_search_with_hash_value
1170541.2852  heap_form_minimal_tuple

at around 46000 queries/s

changing the fault sysbench schema from:

 Table public.sbtest
 Column |  Type  |  Modifiers 


++-
 id | integer| not null default 
nextval('sbtest_id_seq'::regclass)

 k  | integer| not null default 0
 c  | character(120) | not null default ''::bpchar
 pad| character(60)  | not null default ''::bpchar
Indexes:
sbtest_pkey PRIMARY KEY, btree (id)
k btree (k)


to
  Table public.sbtest
 Column |   Type|  Modifiers 


+---+-
 id | integer   | not null default 
nextval('sbtest_id_seq'::regclass)

 k  | integer   | not null default 0
 c  | character varying | not null default ''::character varying
 pad| character(60) | not null default ''::bpchar
Indexes:
sbtest_pkey PRIMARY KEY, btree (id)
k btree (k)

results in a near 50%(!) speedup in terms of tps to around 67000 
queries/s. This is however an extreme case because the c column actually 
contains no data at all (except for an empty string).


the profile for the changed testcase looks like:
4307975.  index_getnext
3967504.8095  AllocSetAlloc
3455084.1883  slot_deform_tuple
2282222.7666  appendBinaryStringInfo
2277662.7610  _bt_checkkeys
1938182.3495  LWLockAcquire
1799252.1811  internal_putbytes
1688712.0471  printtup
1520261.8429  AllocSetFree
1463331.7739  heap_form_minimal_tuple
1443051.7493  FunctionCall2
1283201.  hash_search_with_hash_value


at the very least we should reconsider this part of our docs:

  There is no performance difference between these three types, apart 
from increased storage space when using the blank-padded type, and a few 
extra CPU cycles to check the length when storing into a 
length-constrained column.


from http://www.postgresql.org/docs/8.4/static/datatype-character.html



regards

Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers