date:20031008

Re: [PERFORM] Compare rows

2003-10-08 Thread Josh Berkus

Chris,

> Some time in the late '80s, probably '88 or '89, there was a paper
> presented in Communications of the ACM that proposed using this sort
> of "hypernormalized" schema as a way of having _really_ narrow schemas
> that would be exceedingly expressive.  They illustrated an example of

> The entertaining claim was that they felt they could model the
> complexities of the operations of any sort of company using not more
> than 50 tables.  It seemed somewhat interesting, at the time; it truly
> resonated as Really Interesting when I saw SAP R/3, with its bloat of
> 1500-odd tables.

One can always take things too far.   Trying to make everying 100% dynamic so 
that you can cram your whole database into 4 tables is going too far; so is 
the kind of bloat that produces systems like SAP, which is more based on 
legacy than design (I analyzed a large commercial billing system once and was 
startled to discover that 1/4 of its 400 tables and almost half of the 40,000 
collective columns were not used and present only for backward 
compatibility).

The usefulness of the "vertical values child table" which I suggest is largely 
dependant on the number of values not represented.   In Greg's case, fully 
75% of the fields in his huge table are NULL; this is incredibly inefficient, 
the more so when you consider his task of calling each field by name in each 
query.

The "vertical values child table" is also ideal for User Defined Fields or any 
other form of user-configurable add-on data which will be NULL more often 
than not.

This is an old SQL concept, though; I'm sure it has an official name 
somewhere.

> The need to do a lot of joins would likely hurt performance somewhat,
> as well as the way that it greatly increases the number of rows.
> Although you could always split it into several tables, one for each
> "value_type", and UNION them into a view...

It increases the number of rows, yes, but *decreases* the storage size of data 
by eliminating thousands ... or millions ... of NULL fields.   How would 
splitting the vertical values into dozens of seperate tables help things?

Personally, I'd rather have a table with 3 columns and 8 million rows than a 
table with 642 columns and 100,000 rows.  Much easier to deal with.

And we are also assuming that Greg seldom needs to see all of the fields at 
once.   I'm pretty sure of this; if he did, he'd have run into the "wide row" 
bug in 7.3 and would be complaining about it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Tom Lane


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Bruce Momjian

Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > On Wed, 2003-10-08 at 14:31, Bruce Momjian wrote:
> >> Well, this is really embarassing.  I can't imagine why we would not set
> >> at least -O on all platforms.
> 
> I believe that autoconf will automatically select -O2 (when CFLAGS isn't
> already set) *if* it's chosen gcc.  It won't select anything for vendor
> ccs.
> 
> > Can we get these optimizations enabled in time for the next 7.4 beta?
> 
> I think it's too late in the beta cycle to add optimization flags except
> for platforms we can get specific success results for.  (Solaris is
> probably okay for instance.)  The risk of breaking things seems too
> high.

OK, patch attached and applied.  It centralizes the optimization
defaults into configure.in, rather than having CFLAGS= in the template
files.

It used -O2 for gcc (generated automatically by autoconf), and -O for
non-gcc, unless the template overrides it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
Index: configure
===
RCS file: /cvsroot/pgsql-server/configure,v
retrieving revision 1.302
diff -c -c -r1.302 configure
*** configure   3 Oct 2003 03:08:14 -   1.302
--- configure   9 Oct 2003 03:16:44 -
***
*** 2393,2398 
--- 2393,2402 
  if test "$ac_env_CFLAGS_set" = set; then
CFLAGS=$ac_env_CFLAGS_value
  fi
+ # configure sets CFLAGS to -O2 for gcc, so this is only for non-gcc
+ if test x"$CFLAGS" = x""; then
+   CFLAGS="-O"
+ fi
  if test "$enable_debug" = yes && test "$ac_cv_prog_cc_g" = yes; then
CFLAGS="$CFLAGS -g"
  fi
Index: configure.in
===
RCS file: /cvsroot/pgsql-server/configure.in,v
retrieving revision 1.293
diff -c -c -r1.293 configure.in
*** configure.in3 Oct 2003 03:08:14 -   1.293
--- configure.in9 Oct 2003 03:16:46 -
***
*** 238,243 
--- 238,247 
  if test "$ac_env_CFLAGS_set" = set; then
CFLAGS=$ac_env_CFLAGS_value
  fi
+ # configure sets CFLAGS to -O2 for gcc, so this is only for non-gcc
+ if test x"$CFLAGS" = x""; then
+   CFLAGS="-O"
+ fi
  if test "$enable_debug" = yes && test "$ac_cv_prog_cc_g" = yes; then
CFLAGS="$CFLAGS -g"
  fi
Index: src/template/beos
===
RCS file: /cvsroot/pgsql-server/src/template/beos,v
retrieving revision 1.6
diff -c -c -r1.6 beos
*** src/template/beos   21 Oct 2000 22:36:13 -  1.6
--- src/template/beos   9 Oct 2003 03:16:51 -
***
*** 1 
- CFLAGS='-O2'
--- 0 
Index: src/template/bsdi
===
RCS file: /cvsroot/pgsql-server/src/template/bsdi,v
retrieving revision 1.16
diff -c -c -r1.16 bsdi
*** src/template/bsdi   27 Sep 2003 16:24:44 -  1.16
--- src/template/bsdi   9 Oct 2003 03:16:51 -
***
*** 5,13 
  esac
  
  case $host_os in
!   bsdi2.0 | bsdi2.1 | bsdi3*)
! CC=gcc2
! ;;
  esac
  
  THREAD_SUPPORT=yes
--- 5,11 
  esac
  
  case $host_os in
!   bsdi2.0 | bsdi2.1 | bsdi3*) CC=gcc2;;
  esac
  
  THREAD_SUPPORT=yes
Index: src/template/cygwin
===
RCS file: /cvsroot/pgsql-server/src/template/cygwin,v
retrieving revision 1.2
diff -c -c -r1.2 cygwin
*** src/template/cygwin 9 Oct 2003 02:37:09 -   1.2
--- src/template/cygwin 9 Oct 2003 03:16:51 -
***
*** 1,2 
- CFLAGS='-O2'
  SRCH_LIB='/usr/local/lib'
--- 1 
Index: src/template/dgux
===
RCS file: /cvsroot/pgsql-server/src/template/dgux,v
retrieving revision 1.10
diff -c -c -r1.10 dgux
*** src/template/dgux   21 Oct 2000 22:36:13 -  1.10
--- src/template/dgux   9 Oct 2003 03:16:51 -
***
*** 1 
- CFLAGS=
--- 0 
Index: src/template/freebsd
===
RCS file: /cvsroot/pgsql-server/src/template/freebsd,v
retrieving revision 1.23
diff -c -c -r1.23 freebsd
*** src/template/freebsd27 Sep 2003 16:24:44 -  1.23
--- src/template/freebsd9 Oct 2003 03:16:51 -
***
*** 1,17 
- CFLAGS='-pipe'
- 
  case $host_cpu in
!   alpha*)   CFLAGS="$CFLAGS -O" ;;
  esac
  
  THREAD_SUPPORT=yes
  NEED_REENTRANT_FUNCS=yes
  THREAD_CPPFLAGS="-D_THREAD_SAFE"
  case $host_os in
!   freebsd2*|freebsd3*|freebsd4*)
!   THREAD_LIBS="-pthread"
!   ;;
!   *)
!   THREAD_LIBS="-lc_r"
!   ;;
  esac
--- 1,11 
  case $host_cpu in
!   alpha*)   CFLAGS="-

Re: [PERFORM] Compare rows

2003-10-08 Thread Christopher Browne

In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Josh Berkus) 
transmitted:
> child table
> idvalue_type  value
> 3 uptime  0.3
> 3 speed   11.2
> 3 memory  37
> 3 tty 6
> 7 uptime  1.1
> 7 memory  15
> 9 uptime  0.1
> 9 memory  94
> 9 tty 2
>
> As you can see, the NULLs are not stored, making this system much more 
> efficient on storage space.

Wow, that takes me back to a paper I have been looking for for
_years_.

Some time in the late '80s, probably '88 or '89, there was a paper
presented in Communications of the ACM that proposed using this sort
of "hypernormalized" schema as a way of having _really_ narrow schemas
that would be exceedingly expressive.  They illustrated an example of
an address table that could hold full addresses with a schema with
only about half a dozen columns, the idea being that you'd have
several rows linked together.

The methodology was _heavy_ on metadata, though not so much so that
there were no columns left over for "real" data.

The entertaining claim was that they felt they could model the
complexities of the operations of any sort of company using not more
than 50 tables.  It seemed somewhat interesting, at the time; it truly
resonated as Really Interesting when I saw SAP R/3, with its bloat of
1500-odd tables.

(I seem to remember the authors being Boston-based, and they indicated
that they had implemented this "on VMS," which would more than likely
imply RDB; somehow I doubt that'll be the set of detail that makes
someone remember it...)

The need to do a lot of joins would likely hurt performance somewhat,
as well as the way that it greatly increases the number of rows.
Although you could always split it into several tables, one for each
"value_type", and UNION them into a view...
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org').
http://cbbrowne.com/info/unix.html
You shouldn't anthropomorphize computers; they don't like it.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Christopher Kings-Lynne

Well, this is really embarassing.  I can't imagine why we would not set
at least -O on all platforms.  Looking at the template files, I see
these have no optimization set:


	freebsd (non-alpha)


I'm wondering what that had in mind:

http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/template/freebsd.diff?r1=1.10&r2=1.11
When I used to build pgsql on freebsd/alpha, I would get heaps of GCC 
warnings saying 'optimisations for the alpha are broken'.  I can't 
remember if that meant anything more than just -O or not though.

Chris



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Bruce Momjian

Peter Eisentraut wrote:
> Bruce Momjian writes:
> 
> > Well, this is really embarassing.  I can't imagine why we would not set
> > at least -O on all platforms.  Looking at the template files, I see
> > these have no optimization set:
> 
> > freebsd (non-alpha)
> 
> I'm wondering what that had in mind:
> 
> http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/template/freebsd.diff?r1=1.10&r2=1.11

I was wondering that myself.  I think the idea was that we already do
-O2 in configure if it is gcc, so why do it in the template files.  What
is killing us is the CFLAGS= lines in the configuration files.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Bruce Momjian

Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > On Wed, 2003-10-08 at 14:31, Bruce Momjian wrote:
> >> Well, this is really embarassing.  I can't imagine why we would not set
> >> at least -O on all platforms.
> 
> I believe that autoconf will automatically select -O2 (when CFLAGS isn't
> already set) *if* it's chosen gcc.  It won't select anything for vendor
> ccs.

I think the problem is that template/solaris overrides that with:

  CFLAGS=

> > Can we get these optimizations enabled in time for the next 7.4 beta?
> 
> I think it's too late in the beta cycle to add optimization flags except
> for platforms we can get specific success results for.  (Solaris is
> probably okay for instance.)  The risk of breaking things seems too
> high.

Agreed.  Do we set them all to -O2, then remove it from the ones we
don't get successful reports on?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Tom Lane

Neil Conway <[EMAIL PROTECTED]> writes:
> On Wed, 2003-10-08 at 14:31, Bruce Momjian wrote:
>> Well, this is really embarassing.  I can't imagine why we would not set
>> at least -O on all platforms.

I believe that autoconf will automatically select -O2 (when CFLAGS isn't
already set) *if* it's chosen gcc.  It won't select anything for vendor
ccs.

> Can we get these optimizations enabled in time for the next 7.4 beta?

I think it's too late in the beta cycle to add optimization flags except
for platforms we can get specific success results for.  (Solaris is
probably okay for instance.)  The risk of breaking things seems too
high.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Presentation

2003-10-08 Thread Tom Lane

Jeff <[EMAIL PROTECTED]> writes:
> On Wed, 8 Oct 2003, Neil Conway wrote:
>> Slide 37: as far as I know, reordering of outer joins is not implemented
>> in 7.4

> Huh. I could have sworn Tom did something like that.

Not yet.  7.4 can reorder *inner* joins that happen to be written
with JOIN syntax.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] Compare rows

2003-10-08 Thread Josh Berkus

Greg,

> You lost me on that one.  What's a "vertical child table"?

Currently, you store data like this:

id  address uptime  speed   memory  tty
3   67.92   0.3 11.237  6
7   69.51.1 NULL15  NULL
9   65.50.1 NULL94  2

The most efficient way for you to store data would be like this:

main table
id  address
3   67.92
7   69.5
9   65.5

child table
id  value_type  value
3   uptime  0.3
3   speed   11.2
3   memory  37
3   tty 6
7   uptime  1.1
7   memory  15
9   uptime  0.1
9   memory  94
9   tty 2

As you can see, the NULLs are not stored, making this system much more 
efficient on storage space.

Tommorrow I'll (hopefully) write up how to query this for comparisons.   It 
would help if you gave a little more details about what specific comparison 
you're doing, e.g. between tables or table to value, comparing just the last 
value or all rows, etc.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [PERFORM] IMMUTABLE function's flag do not work: 7.3.4, plpgsql

2003-10-08 Thread Tom Lane

Andriy Tkachuk <[EMAIL PROTECTED]> writes:
> At second. calc_total() is immutable function:
> but it seems that it's not cached in one session:

It's not supposed to be.

The reason the "runtime" is small in your example is that the planner
executes the function call while preparing the plan, and this isn't
counted in EXPLAIN's runtime measurement.  There's no claim anywhere
that the results of such an evaluation would be saved for other plans.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] Presentation

2003-10-08 Thread Neil Conway

On Wed, 2003-10-08 at 15:38, Jeff wrote:
> Huh. I could have sworn Tom did something like that.
> Perhaps I am thinking of something else.
> You had to enable some magic GUC.

Perhaps you're thinking of the new GUC var join_collapse_limit, which is
related, but doesn't effect the reordering of outer joins.

-Neil



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [PERFORM] PostgreSQL vs. MySQL

2003-10-08 Thread Vivek Khera

> "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:

JB> Hmmm ... both, I think.   The Install Docs should have:

JB> "Here are the top # things you will want to adjust in your PostgreSQL.conf:
JB> 1) Shared_buffers 
JB> 2) Sort_mem 
JB> 3) effective_cache_size 
JB> 4) random_page_cost 
JB> 5) Fsync 
JB> etc."

Add:

max_fsm_relations (perhaps it is ok with current default)
max_fsm_pages

I don't think you really want to diddle with fsync in the name of
speed at the cost of safety.

and possibly:

checkpoint_segments (if you do a lot of writes to the DB for extended
durations of time)  With 7.4 it warns you in the
logs if you should increase this.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Compare rows

2003-10-08 Thread Dror Matalon

Greg,

On Wed, Oct 08, 2003 at 03:07:30PM -0400, Greg Spiegelberg wrote:
> Dror,
> 
> I gave this some serious thought at first.  I only deal with
> int8, numeric(24,12) and varchar(32) columns which I could
> reduce to 3 different tables.  Problem was going from 1700-3000

I'm not sure how the data types come into play here. I was for the most
part following your examples.

> rows to around 300,000-1,000,000 rows per system per day that
> is sending data to our database.
> 

Depending on the distribution of your data you can end up with more,
less or roughly the same amount of data in the end. It all depends on
how many of the 600+ columns change every time you insert a row. If only
a few of them do, then you'll clearly end up with less total data, since
you'll be writing several rows that are very short instead of one
huge row that contains all the information. In other words, you're
tracking changes better.

It also sounds like you feel that having a few thousand rows in a very
"wide" table is better than having 300,000 - 1,00,000 rows in a "narrow"
table. My gut feeling is that it's the other way around, but there are
plenty of people on this list who can provide a more informed answer.

Using the above eample, assuming that both tables roughly have the same
number of pages in them, would postgres deal better with a table with
3-4 columns with 300,000 - 1,000,000 rows or with a table with several
hundred columns with only 3000 or so rows?

Regards,

Dror

> BTW, the int8 and numeric(24,12) are for future expansion.
> I hate limits.
> 
> Greg
> 
> 
> Dror Matalon wrote:
> >It's still not quite clear what you're trying to do. Many people's gut
> >reaction is that you're doing something strange with so many columns in
> >a table.
> >
> >Using your example, a different approach might be to do this instead:
> >
> > Day  |  Name |   Value
> > --+-+---
> > Oct 1 | OS  | Solaris 5.8 
> > Oct 1 | Patch   | 108528-12
> > Oct 3 | Patch   | 108528-13
> >
> >
> >You end up with lots more rows, fewer columns, but it might be
> >harder to query the table. On the other hand, queries should run quite
> >fast, since it's a much more "normal" table.
> >
> >But without knowing more, and seeing what the other columns look like,
> >it's hard to tell.
> >
> >Dror
> 
> 
> -- 
> Greg Spiegelberg
>  Sr. Product Development Engineer
>  Cranel, Incorporated.
>  Phone: 614.318.4314
>  Fax:   614.431.8388
>  Email: [EMAIL PROTECTED]
> Cranel. Technology. Integrity. Focus.
> 
> 
> 
> ---(end of broadcast)---
> TIP 7: don't forget to increase your free space map settings

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [PERFORM] Presentation

2003-10-08 Thread Jeff

On Wed, 8 Oct 2003, Neil Conway wrote:

> On Wed, 2003-10-08 at 11:02, Jeff wrote:
> > The boss cleared my de-company info-ified pg presentation.
>
> Slide 37: as far as I know, reordering of outer joins is not implemented
> in 7.4
>

Huh. I could have sworn Tom did something like that.
Perhaps I am thinking of something else.
You had to enable some magic GUC.

Maybe he did a test and it never made it in.


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Compare rows

2003-10-08 Thread Jean-Luc Lachance

Here is what i think you can use:

One master table with out duplicates and one anciliary table with
duplicate for the day.
Insert the result of the select from the anciliary table into the master
table, truncate the anciliary table.


select distinct on ( {all the fields except day}) * from table order by
{all the fields except day}, day;

As in:

select distinct on ( OS, Patch) * from table order by OS, Patch, Day;

JLL

BTW, PG developper, since the distinct on list MUST be included in the
order by clause why not make it implicitly part of the order by clause?



Greg Spiegelberg wrote:
> 
> Joe Conway wrote:
> > Greg Spiegelberg wrote:
> >
> >> The reason for my initial question was this.  We save changes only.
> >> In other words, if system S has row T1 for day D1 and if on day D2
> >> we have another row T1 (excluding our time column) we don't want
> >> to save it.
> >
> >
> > It still isn't entirely clear to me what you are trying to do, but
> > perhaps some sort of calculated checksum or hash would work to determine
> > if the data has changed?
> 
> Best example I have is this.
> 
> You're running Solaris 5.8 with patch 108528-X and you're collecting
> that data daily.  Would you want option 1 or 2 below?
> 
> Option 1 - Store it all
>   Day  |  OS |   Patch
> --+-+---
> Oct 1 | Solaris 5.8 | 108528-12
> Oct 2 | Solaris 5.8 | 108528-12
> Oct 3 | Solaris 5.8 | 108528-13
> Oct 4 | Solaris 5.8 | 108528-13
> Oct 5 | Solaris 5.8 | 108528-13
> and so on...
> 
> To find what you're running:
> select * from table order by day desc limit 1;
> 
> To find when it last changed however takes a join.
> 
> Option 2 - Store only changes
>   Day  |  OS |   Patch
> --+-+---
> Oct 1 | Solaris 5.8 | 108528-12
> Oct 3 | Solaris 5.8 | 108528-13
> 
> To find what you're running:
> select * from table order by day desc limit 1;
> 
> To find when it last changed:
> select * from table order by day desc limit 1 offset 1;
> 
> I selected Option 2 because I'm dealing with mounds of complicated and
> varying data formats and didn't want to have to write complex queries
> for everything.
> 
> Greg
> 
> --
> Greg Spiegelberg
>   Sr. Product Development Engineer
>   Cranel, Incorporated.
>   Phone: 614.318.4314
>   Fax:   614.431.8388
>   Email: [EMAIL PROTECTED]
> Cranel. Technology. Integrity. Focus.
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faqs/FAQ.html

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [PERFORM] Compare rows

2003-10-08 Thread Jason Hihn



> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Greg
> Spiegelberg
> Sent: Wednesday, October 08, 2003 3:11 PM
> To: PgSQL Performance ML
> Subject: Re: [PERFORM] Compare rows
> 
> 
> Josh Berkus wrote:
> > Greg,
> > 
> > 
> >>The data represents metrics at a point in time on a system for
> >>network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
> >>speed, and whatever else can be gathered.
> >>
> >>We arrived at this one 642 column table after testing the whole
> >>process from data gathering, methods of temporarily storing then
> >>loading to the database.  Initially, 37+ tables were in use but
> >>the one big-un has saved us over 3.4 minutes.
> > 
> > 
> > Hmmm ... if few of those columns are NULL, then you are 
> probably right ... 
> > this is probably the most normalized design.   If, however, 
> many of columns 
> > are NULL the majority of the time, then the design you should 
> be using is a 
> > vertial child table, of the form  ( value_type  | value ).   
> > 
> > Such a vertical child table would also make your comparison 
> between instances 
> > *much* easier, as it could be executed via a simple 
> 4-table-outer-join and 3 
> > where clauses.  So even if you don't have a lot of NULLs, you 
> probably want 
> > to consider this.
> 
> You lost me on that one.  What's a "vertical child table"?

Parent table Fkey | Option | Value
--++---
  | OS | Solaris
  | DISK1  | 30g
      ^^^-- values 
  fields are values in a column rather than 'fields'


> Statistically, about 6% of the rows use more than 200 of the columns,
> 27% of the rows use 80-199 or more columns, 45% of the rows use 40-79
> columns and the remaining 22% of the rows use 39 or less of the columns.
> That is a lot of NULLS.  Never gave that much thought.
> 
> To ensure query efficiency, hide the NULLs and simulate the multiple
> tables I have a boatload of indexes, ensure that every query makees use
> of an index, and have created 37 views.  It's worked pretty well so
> far
> 
> 
> >>The reason for my initial question was this.  We save changes only.
> >>In other words, if system S has row T1 for day D1 and if on day D2
> >>we have another row T1 (excluding our time column) we don't want
> >>to save it.
> > 
> > 
> > If re-designing the table per the above is not a possibility, 
> then I'd suggest 
> > that you locate 3-5 columns that:
> > 1) are not NULL for any row;
> > 2) combined, serve to identify a tiny subset of rows, i.e. 3% 
> or less of the 
> > table.
> 
> There are always, always, always 7 columns that contain data.
> 
> 
> > Then put a multi-column index on those columns, and do your 
> comparison.  
> > Hopefully the planner should pick up on the availablity of the 
> index and scan 
> > only the rows retrieved by the index.   However, there is the distinct 
> > possibility that the presence of 637 WHERE criteria will 
> confuse the planner, 
> > causing it to resort to a full table seq scan; in that case, 
> you will want to 
> > use a subselect to force the issue.
> 
> That's what I'm trying to avoid is a big WHERE (c1,c2,...,c637) <> 
> (d1,d2,...,d637) clause.  Ugly.
> 
> 
> > Or, as Joe Conway suggested, you could figure out some kind of 
> value hash that 
> > uniquely identifies your rows.
> 
> I've given that some though and though appealing I don't think I'd care
> to spend the CPU cycles to do it.  Best way I can figure to accomplish
> it would be to generate an MD5 on each row without the timestamp and
> store it in another column, create an index on the MD5 column, generate
> MD5 on each line I want to insert.  Makes for a simple WHERE...
> 
> Okay.  I'll give it a whirl.  What's one more column, right?
> 
> Greg
> 
> -- 
> Greg Spiegelberg
>   Sr. Product Development Engineer
>   Cranel, Incorporated.
>   Phone: 614.318.4314
>   Fax:   614.431.8388
>   Email: [EMAIL PROTECTED]
> Cranel. Technology. Integrity. Focus.
> 
> 
> 
> ---(end of broadcast)---
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
> 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread pgsql-performance


In message <[EMAIL PROTECTED]>, Jeff writes:

I'll go run the regression test suite with my gcc -O2 pg and the suncc pg.
See if they pass the test.

My default set of gcc optimization flags is:

-O3 -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -falign-functions 
-mcpu=i686 -march=i686

Obviously the last two flags product CPU specific code, so would have
to differ...autoconf is always possible, but so is just lopping them off.

I have found these flags to produce faster code that a simple -O2, but
I understand the exact combination which is best for you is
code-dependent.  Of course, if you are getting really excited, you can
use -fbranch-probabilities, but as you will see if you investigate
that requires some profiling information, so is not very easy to
actually practically use.

-Seth Robertson

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [PERFORM] Compare rows

2003-10-08 Thread Greg Spiegelberg

Josh Berkus wrote:
Greg,


The data represents metrics at a point in time on a system for
network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
speed, and whatever else can be gathered.
We arrived at this one 642 column table after testing the whole
process from data gathering, methods of temporarily storing then
loading to the database.  Initially, 37+ tables were in use but
the one big-un has saved us over 3.4 minutes.


Hmmm ... if few of those columns are NULL, then you are probably right ... 
this is probably the most normalized design.   If, however, many of columns 
are NULL the majority of the time, then the design you should be using is a 
vertial child table, of the form  ( value_type  | value ).   

Such a vertical child table would also make your comparison between instances 
*much* easier, as it could be executed via a simple 4-table-outer-join and 3 
where clauses.  So even if you don't have a lot of NULLs, you probably want 
to consider this.
You lost me on that one.  What's a "vertical child table"?

Statistically, about 6% of the rows use more than 200 of the columns,
27% of the rows use 80-199 or more columns, 45% of the rows use 40-79
columns and the remaining 22% of the rows use 39 or less of the columns.
That is a lot of NULLS.  Never gave that much thought.
To ensure query efficiency, hide the NULLs and simulate the multiple
tables I have a boatload of indexes, ensure that every query makees use
of an index, and have created 37 views.  It's worked pretty well so
far

The reason for my initial question was this.  We save changes only.
In other words, if system S has row T1 for day D1 and if on day D2
we have another row T1 (excluding our time column) we don't want
to save it.


If re-designing the table per the above is not a possibility, then I'd suggest 
that you locate 3-5 columns that:
1) are not NULL for any row;
2) combined, serve to identify a tiny subset of rows, i.e. 3% or less of the 
table.
There are always, always, always 7 columns that contain data.


Then put a multi-column index on those columns, and do your comparison.  
Hopefully the planner should pick up on the availablity of the index and scan 
only the rows retrieved by the index.   However, there is the distinct 
possibility that the presence of 637 WHERE criteria will confuse the planner, 
causing it to resort to a full table seq scan; in that case, you will want to 
use a subselect to force the issue.
That's what I'm trying to avoid is a big WHERE (c1,c2,...,c637) <> 
(d1,d2,...,d637) clause.  Ugly.


Or, as Joe Conway suggested, you could figure out some kind of value hash that 
uniquely identifies your rows.
I've given that some though and though appealing I don't think I'd care
to spend the CPU cycles to do it.  Best way I can figure to accomplish
it would be to generate an MD5 on each row without the timestamp and
store it in another column, create an index on the MD5 column, generate
MD5 on each line I want to insert.  Makes for a simple WHERE...
Okay.  I'll give it a whirl.  What's one more column, right?

Greg

--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [PERFORM] Compare rows

2003-10-08 Thread Greg Spiegelberg

Dror,

I gave this some serious thought at first.  I only deal with
int8, numeric(24,12) and varchar(32) columns which I could
reduce to 3 different tables.  Problem was going from 1700-3000
rows to around 300,000-1,000,000 rows per system per day that
is sending data to our database.
BTW, the int8 and numeric(24,12) are for future expansion.
I hate limits.
Greg

Dror Matalon wrote:
It's still not quite clear what you're trying to do. Many people's gut
reaction is that you're doing something strange with so many columns in
a table.
Using your example, a different approach might be to do this instead:

 Day  |  Name |   Value
 --+-+---
 Oct 1 | OS  | Solaris 5.8 
 Oct 1 | Patch   | 108528-12
 Oct 3 | Patch   | 108528-13

You end up with lots more rows, fewer columns, but it might be
harder to query the table. On the other hand, queries should run quite
fast, since it's a much more "normal" table.
But without knowing more, and seeing what the other columns look like,
it's hard to tell.
Dror


--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] Compare rows

2003-10-08 Thread Dror Matalon

It's still not quite clear what you're trying to do. Many people's gut
reaction is that you're doing something strange with so many columns in
a table.

Using your example, a different approach might be to do this instead:

 Day  |  Name |   Value
 --+-+---
 Oct 1 | OS  | Solaris 5.8 
 Oct 1 | Patch   | 108528-12
 Oct 3 | Patch   | 108528-13


You end up with lots more rows, fewer columns, but it might be
harder to query the table. On the other hand, queries should run quite
fast, since it's a much more "normal" table.

But without knowing more, and seeing what the other columns look like,
it's hard to tell.

Dror

On Wed, Oct 08, 2003 at 02:39:54PM -0400, Greg Spiegelberg wrote:
> Joe Conway wrote:
> >Greg Spiegelberg wrote:
> >
> >>The reason for my initial question was this.  We save changes only.
> >>In other words, if system S has row T1 for day D1 and if on day D2
> >>we have another row T1 (excluding our time column) we don't want
> >>to save it.
> >
> >
> >It still isn't entirely clear to me what you are trying to do, but 
> >perhaps some sort of calculated checksum or hash would work to determine 
> >if the data has changed?
> 
> Best example I have is this.
> 
> You're running Solaris 5.8 with patch 108528-X and you're collecting
> that data daily.  Would you want option 1 or 2 below?
> 
> Option 1 - Store it all
>  Day  |  OS |   Patch
> --+-+---
> Oct 1 | Solaris 5.8 | 108528-12
> Oct 2 | Solaris 5.8 | 108528-12
> Oct 3 | Solaris 5.8 | 108528-13
> Oct 4 | Solaris 5.8 | 108528-13
> Oct 5 | Solaris 5.8 | 108528-13
> and so on...
> 
> To find what you're running:
> select * from table order by day desc limit 1;
> 
> To find when it last changed however takes a join.
> 
> 
> Option 2 - Store only changes
>  Day  |  OS |   Patch
> --+-+---
> Oct 1 | Solaris 5.8 | 108528-12
> Oct 3 | Solaris 5.8 | 108528-13
> 
> To find what you're running:
> select * from table order by day desc limit 1;
> 
> To find when it last changed:
> select * from table order by day desc limit 1 offset 1;
> 
> I selected Option 2 because I'm dealing with mounds of complicated and
> varying data formats and didn't want to have to write complex queries
> for everything.
> 
> Greg
> 
> -- 
> Greg Spiegelberg
>  Sr. Product Development Engineer
>  Cranel, Incorporated.
>  Phone: 614.318.4314
>  Fax:   614.431.8388
>  Email: [EMAIL PROTECTED]
> Cranel. Technology. Integrity. Focus.
> 
> 
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
>   http://www.postgresql.org/docs/faqs/FAQ.html

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Jeff

On Wed, 8 Oct 2003, Neil Conway wrote:

> ISTM the most legitimate reason for not enabling compilater
> optimizations on a given compiler/OS/architecture combination is might
> cause compiler errors / bad code generation.
>
> Can we get these optimizations enabled in time for the next 7.4 beta? It
> might also be good to add an item in the release notes about it.
>
> -Neil
>

I just ran make check for sun with gcc -O2 and suncc -fast and both
passed.

We'll need other arguments to suncc to supress some warnings, etc. (-fast
generates a warning for every file compiled telling you it will only
run on ultrasparc machines)

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [PERFORM] Presentation

2003-10-08 Thread Neil Conway

On Wed, 2003-10-08 at 11:02, Jeff wrote:
> The boss cleared my de-company info-ified pg presentation.

Slide 37: as far as I know, reordering of outer joins is not implemented
in 7.4

-Neil



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] Compare rows

2003-10-08 Thread Greg Spiegelberg

Joe Conway wrote:
Greg Spiegelberg wrote:

The reason for my initial question was this.  We save changes only.
In other words, if system S has row T1 for day D1 and if on day D2
we have another row T1 (excluding our time column) we don't want
to save it.


It still isn't entirely clear to me what you are trying to do, but 
perhaps some sort of calculated checksum or hash would work to determine 
if the data has changed?
Best example I have is this.

You're running Solaris 5.8 with patch 108528-X and you're collecting
that data daily.  Would you want option 1 or 2 below?
Option 1 - Store it all
 Day  |  OS |   Patch
--+-+---
Oct 1 | Solaris 5.8 | 108528-12
Oct 2 | Solaris 5.8 | 108528-12
Oct 3 | Solaris 5.8 | 108528-13
Oct 4 | Solaris 5.8 | 108528-13
Oct 5 | Solaris 5.8 | 108528-13
and so on...
To find what you're running:
select * from table order by day desc limit 1;
To find when it last changed however takes a join.

Option 2 - Store only changes
 Day  |  OS |   Patch
--+-+---
Oct 1 | Solaris 5.8 | 108528-12
Oct 3 | Solaris 5.8 | 108528-13
To find what you're running:
select * from table order by day desc limit 1;
To find when it last changed:
select * from table order by day desc limit 1 offset 1;
I selected Option 2 because I'm dealing with mounds of complicated and
varying data formats and didn't want to have to write complex queries
for everything.
Greg

--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Neil Conway

On Wed, 2003-10-08 at 14:31, Bruce Momjian wrote:
> Well, this is really embarassing.  I can't imagine why we would not set
> at least -O on all platforms.

ISTM the most legitimate reason for not enabling compilater
optimizations on a given compiler/OS/architecture combination is might
cause compiler errors / bad code generation.

Can we get these optimizations enabled in time for the next 7.4 beta? It
might also be good to add an item in the release notes about it.

-Neil

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Bruce Momjian

Jeff wrote:
> On Wed, 8 Oct 2003, Neil Conway wrote:
> 
> >
> > What CFLAGS does configure pick for gcc? From
> > src/backend/template/solaris, I'd guess it's not enabling any
> > optimization. Is that the case? If so, some gcc numbers with -O and -O2
> > would be useful.
> >
> 
> I can't believe I didn't think of this before! heh.
> Turns out gcc was getting nothing for flags.
> 
> I added -O2 to CFLAGS and my 60 seconds went down to 21.  A rather mild
> improvment huh?
> 
> I did a few more tests and suncc still beats it out - but not by too much
> now (Not enought to justify buying a license just for compiling pg)
> 
> I'll go run the regression test suite with my gcc -O2 pg and the suncc pg.
> See if they pass the test.
> 
> If they do we should consider adding -O2 and -fast to the CFLAGS.

[ CC added for hackers.]

Well, this is really embarassing.  I can't imagine why we would not set
at least -O on all platforms.  Looking at the template files, I see
these have no optimization set:

darwin
dgux
freebsd (non-alpha)
irix5
nextstep
osf (gcc)
qnx4
solaris
sunos4
svr4
ultrix4

I thought we used to have code that did -O for any platforms that set no
cflags, but I don't see that around anywhere.  I recommend adding -O2,
or at leaset -O to all these platforms --- we can then use platform
testing to make sure they are working.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [PERFORM] PostgreSQL vs. MySQL

2003-10-08 Thread Neil Conway

On Wed, 2003-10-08 at 14:05, Josh Berkus wrote:
> Hmmm ... both, I think.   The Install Docs should have:
> 
> "Here are the top # things you will want to adjust in your PostgreSQL.conf:
> 1) Shared_buffers 
> 2) Sort_mem 
> 3) effective_cache_size 
> 4) random_page_cost 
> 5) Fsync 
> etc."

> Barring an objection, I'll get to work on this.

I think this kind of information belongs in the documentation proper,
not in the installation instructions. I think you should put this kind
of tuning information in the "Performance Tips" chapter, and include a
pointer to it in the installation instructions.

-Neil

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [PERFORM] Speeding up Aggregates

2003-10-08 Thread Dror Matalon


Actually what finally sovled the problem is repeating the 
dtstamp > last_viewed
in the sub select

select articlenumber, channel, description, title, link, dtstamp  from items i1, 
my_channels where ((i1.channel = '2' and
my_channels.id = '2' and owner = 'drormata'  and (dtstamp > last_viewed)) ) and 
(dtstamp = (select max (dtstamp) from items i2 
where channel = '2' and i1.link = i2.link));

to
explain analyze select articlenumber, channel, description, title, link, dtstamp  from 
items i1, my_channels where ((i1.channel = '2' and
my_channels.id = '2' and owner = 'drormata'  and (dtstamp > last_viewed)) ) and 
(dtstamp = (select max (dtstamp) from items i2 where
channel = '2' and i1.link = i2.link and dtstamp > last_viewed));

Which in the stored procedure looks like this:
CREATE or REPLACE FUNCTION item_max_date (int4, varchar, timestamptz)
RETURNS
timestamptz AS '
select max(dtstamp) from items where channel = $1 and link = $2 and
dtstamp > $3;
' LANGUAGE 'sql';


Basically I have hundreds or thousands of items but only a few that
satisfy "dtstamp > last_viewed". Obviously I want to run the max() only on
on a few items. Repeating "dtstamp > last_viewed" did the trick, but it
seems like there should be a more elegant/clear way to tell the planner
which constraint to apply first.

Dror



On Wed, Oct 08, 2003 at 10:54:24AM -0400, Greg Stark wrote:
> Rod Taylor <[EMAIL PROTECTED]> writes:
> 
> > On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> > > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > > >
> > > > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > > > of it, otherwise the below query would be a gain (might be a small one
> > > > anyway).
> > > > 
> > > >   select dtstamp
> > > > from items
> > > >where channel = $1
> > > >  and link = $2
> > > > ORDER BY dtstamp DESC
> > > >LIMIT 1;
> > 
> > It didn't make a difference even with the 3 term index? I guess you
> > don't have very many common values for channel / link combination.
> 
> You need to do:
> 
>  ORDER BY channel DESC, link DESC, dtstamp DESC
> 
> This is an optimizer nit. It doesn't notice that since it selected on channel
> and link already the remaining tuples in the index will be ordered simply by
> dtstamp.
> 
> (This is the thing i pointed out previously in
> <[EMAIL PROTECTED]> on Feb 13th 2003 on pgsql-general)
> 
> 
> -- 
> greg
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] PostgreSQL vs. MySQL

2003-10-08 Thread Bruce Momjian


Totally agree.

---

Josh Berkus wrote:
> Bruce,
> 
> > Yes, I think that is a good idea --- now, does it go in the install
> > docs, or in the docs next to each GUC item?
> 
> Hmmm ... both, I think.   The Install Docs should have:
> 
> "Here are the top # things you will want to adjust in your PostgreSQL.conf:
> 1) Shared_buffers 
> 2) Sort_mem 
> 3) effective_cache_size 
> 4) random_page_cost 
> 5) Fsync 
> etc."
> 
> Then next to each of these items in the Docs, I add 1-2 sentences about how to 
> set that item.
> 
> Hmmm ... do we have similar instructions for setting connection options and 
> pg_hba.conf?  We should have a P telling people they need to do this.
> 
> Barring an objection, I'll get to work on this.
> 
> -- 
> Josh Berkus
> Aglio Database Solutions
> San Francisco
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Jeff

On Wed, 8 Oct 2003, Neil Conway wrote:

>
> What CFLAGS does configure pick for gcc? From
> src/backend/template/solaris, I'd guess it's not enabling any
> optimization. Is that the case? If so, some gcc numbers with -O and -O2
> would be useful.
>

I can't believe I didn't think of this before! heh.
Turns out gcc was getting nothing for flags.

I added -O2 to CFLAGS and my 60 seconds went down to 21.  A rather mild
improvment huh?

I did a few more tests and suncc still beats it out - but not by too much
now (Not enought to justify buying a license just for compiling pg)

I'll go run the regression test suite with my gcc -O2 pg and the suncc pg.
See if they pass the test.

If they do we should consider adding -O2 and -fast to the CFLAGS.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [PERFORM] PostgreSQL vs. MySQL

2003-10-08 Thread Josh Berkus

Bruce,

> Yes, I think that is a good idea --- now, does it go in the install
> docs, or in the docs next to each GUC item?

Hmmm ... both, I think.   The Install Docs should have:

"Here are the top # things you will want to adjust in your PostgreSQL.conf:
1) Shared_buffers 
2) Sort_mem 
3) effective_cache_size 
4) random_page_cost 
5) Fsync 
etc."

Then next to each of these items in the Docs, I add 1-2 sentences about how to 
set that item.

Hmmm ... do we have similar instructions for setting connection options and 
pg_hba.conf?  We should have a P telling people they need to do this.

Barring an objection, I'll get to work on this.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [PERFORM] Compare rows

2003-10-08 Thread Greg Spiegelberg

See below.

Shridhar Daithankar wrote:
Greg Spiegelberg wrote:

The data represents metrics at a point in time on a system for
network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
speed, and whatever else can be gathered.
We arrived at this one 642 column table after testing the whole
process from data gathering, methods of temporarily storing then
loading to the database.  Initially, 37+ tables were in use but
the one big-un has saved us over 3.4 minutes.


I am sure you changed the desing because those 3.4 minutes were 
significant to you.

But I suggest you go back to 37 table design and see where bottleneck 
is. Probably you can tune a join across 37 tables much better than 
optimizing a difference between two 637 column rows.
The bottleneck is across the board.

On the data collection side I'd have to manage 37 different methods
and output formats whereas now I have 1 standard associative array
that gets reset in memory for each "row" stored.
On the data validation side, I have one routine to check the incoming
data for errors, missing columns, data types and so on.  Quick & easy.
On the data import it's easier and more efficient to do one COPY for
a standard format from one program instead of multiple programs or
COPY's.  We were using 37 PHP scripts to handle the import and the
time it took to load, execute, exit, reload each script was killing
us.  Now, 1 PHP and 1 COPY.

Besides such a large number of columns will cost heavily in terms of 
defragmentation across pages. The wasted space and IO therof could be 
significant issue for large number of rows.
No arguement here.


642 column is a bad design. Theoretically and from implementation of 
postgresql point of view. You did it because of speed problem. Now if we 
can resolve those speed problems, perhaps you could go back to other 
design.

Is it feasible for you right now or you are too much committed to the 
big table?
Pretty commited though I do try to be open.

Greg

--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Neil Conway

On Wed, 2003-10-08 at 11:46, Jeff wrote:
> Yeah - like I expected it was able to generate much better code for
> _bt_checkkeys which was the #1 function in gcc on both sun & linux.
> 
> and as you can see, suncc was just able to generate much nicer code.

What CFLAGS does configure pick for gcc? From
src/backend/template/solaris, I'd guess it's not enabling any
optimization. Is that the case? If so, some gcc numbers with -O and -O2
would be useful.

-Neil

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [PERFORM] Compare rows

2003-10-08 Thread Jason Hihn

Comment interjected below.

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Greg
> Spiegelberg
> Sent: Wednesday, October 08, 2003 12:28 PM
> To: PgSQL Performance ML
> Subject: Re: [PERFORM] Compare rows
>
>
> Josh Berkus wrote:
> > Greg,
> >
> >
> >>Anyone have any suggestions on how to efficiently compare
> >>rows in the same table?  This table has 637 columns to be
> >>compared and 642 total columns.
> >
> >
> > 637 columns?   Are you sure that's normalized?   It's hard for
> me to conceive
> > of a circumstance where that many columns would be necessary.
> >
> > If this isn't a catastrophic normalization problem (which it
> sounds like),
> > then you will probably still need to work through procedureal
> normalization
> > code, as SQL simply doesn't offer any way around naming all the
> columns by
> > hand.   Perhaps you could describe the problem in more detail?
> >
>
> The data represents metrics at a point in time on a system for
> network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
> speed, and whatever else can be gathered.
>
> We arrived at this one 642 column table after testing the whole
> process from data gathering, methods of temporarily storing then
> loading to the database.  Initially, 37+ tables were in use but
> the one big-un has saved us over 3.4 minutes.
>
> The reason for my initial question was this.  We save changes only.
> In other words, if system S has row T1 for day D1 and if on day D2
> we have another row T1 (excluding our time column) we don't want
> to save it.

Um, isn't this a purpose of a key? And I am confused. Do you want to UPDATE
the changed columns? or skip it all together?
You have: (System, Day, T1 | T2 |...Tn )
But should use:
Master: (System, Day, Table={T1, T2, .. Tn)) [Keys: sytem, day, table]
T1 { System, Day, {other fields}}  [foreign keys [system, day]

This should allow you to find your dupes very fast (indexes!) and save a lot
of space (few/no null columns), and now you don't have to worry about
comparing fields, and moving huge result sets around.


> That said, if the 3.4 minutes gets burned during our comparison which
> saves changes only we may look at reverting to separate tables.  There
> are only 1,700 to 3,000 rows on average per load.
>
> Oh, PostgreSQL 7.3.3, PHP 4.3.1, RedHat 7.3, kernel 2.4.20-18.7smp,
> 2x1.4GHz PIII, 2GB memory, and 1Gbs SAN w/ Hitachi 9910 LUN's.
>
> Greg
>
> --
> Greg Spiegelberg
>   Sr. Product Development Engineer
>   Cranel, Incorporated.
>   Phone: 614.318.4314
>   Fax:   614.431.8388
>   Email: [EMAIL PROTECTED]
> Cranel. Technology. Integrity. Focus.
>
>
>
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>


---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [PERFORM] Compare rows

2003-10-08 Thread Josh Berkus

Greg,

> The data represents metrics at a point in time on a system for
> network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
> speed, and whatever else can be gathered.
>
> We arrived at this one 642 column table after testing the whole
> process from data gathering, methods of temporarily storing then
> loading to the database.  Initially, 37+ tables were in use but
> the one big-un has saved us over 3.4 minutes.

Hmmm ... if few of those columns are NULL, then you are probably right ... 
this is probably the most normalized design.   If, however, many of columns 
are NULL the majority of the time, then the design you should be using is a 
vertial child table, of the form  ( value_type  | value ).   

Such a vertical child table would also make your comparison between instances 
*much* easier, as it could be executed via a simple 4-table-outer-join and 3 
where clauses.  So even if you don't have a lot of NULLs, you probably want 
to consider this.

> The reason for my initial question was this.  We save changes only.
> In other words, if system S has row T1 for day D1 and if on day D2
> we have another row T1 (excluding our time column) we don't want
> to save it.

If re-designing the table per the above is not a possibility, then I'd suggest 
that you locate 3-5 columns that:
1) are not NULL for any row;
2) combined, serve to identify a tiny subset of rows, i.e. 3% or less of the 
table.

Then put a multi-column index on those columns, and do your comparison.  
Hopefully the planner should pick up on the availablity of the index and scan 
only the rows retrieved by the index.   However, there is the distinct 
possibility that the presence of 637 WHERE criteria will confuse the planner, 
causing it to resort to a full table seq scan; in that case, you will want to 
use a subselect to force the issue.

Or, as Joe Conway suggested, you could figure out some kind of value hash that 
uniquely identifies your rows.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [PERFORM] Presentation

2003-10-08 Thread Jeff

On Wed, 8 Oct 2003, Shridhar Daithankar wrote:

> * Same slide. IIRC postgresql always compresses bytea/varchar. Not too much sure
> about which but there is something that is compressed by default..:-)

> * Tablespaces has a patch floating somewhere. IIRC Gavin Sherry is the one who
> is most ahead of it. For all goodness, they will feature in 7.5 and design is

For the sake of things, I didn't include any features a patch provides. I
did include things that may appear in contrib/.

> * Mysql transaction breaks down if tables from different table types are involved.
> * Mysql transactions do not feature constant time commit/rollback like
> postgresql. The time to rollback depends upon size of transaction
> * Mysql does not split large files in segments the way postgresql do. Try
> storing 60GB of data in single mysql table.

I didn't add these ones. The user can figure this one out.
Perhaps when we/me expands this into multiple documents we can expand on
this.

> * Slide on caching. Postgresql can use 7000MB of caching. Important part is it
> does not lock that memory in it's own process space. OS can move around buffer
> cache but not memory space of an application.

I'm guilty of this myself - when I first started pg I was looking for a
way to make it use a zillion megs of memory like we have informix do -
Perhaps I'll reword that segment.. the point was to show PG relies on the
OS to do a lot of caching and that it doesn't do it itself.

> * Using trigger for maintening a row count would generate as much dead rows as
> you wanted to avoid in first place..:-)

We all know this.. but it is a way to get a fast select count(*) from
table

> All of them are really minor. It's a very well done presentation but 45 slides
> could be bit too much at a time. I suggest splitting the presentation in 3.
> Intro and comparison, features, administration, programming and tuning. Wow..
> they are 5..:-)
>

Yeah. What I'd really love to do is de-powerpointify it and make it a nice
set of "real" web pages.

> Can you rip out informix migration? It could be a good guide by itself.
>

I agree. It would be good to rip out. I think we have the oracle guide
somewhere..

I've put this updated on up on hte postgres.jefftrout.com site
along with openoffice version.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [PERFORM] Compare rows

2003-10-08 Thread Joe Conway

Greg Spiegelberg wrote:
The reason for my initial question was this.  We save changes only.
In other words, if system S has row T1 for day D1 and if on day D2
we have another row T1 (excluding our time column) we don't want
to save it.
It still isn't entirely clear to me what you are trying to do, but 
perhaps some sort of calculated checksum or hash would work to determine 
if the data has changed?

Joe



---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Neil Conway

On Wed, 2003-10-08 at 10:48, Andrew Sullivan wrote:
> My worry about this test is that it gives us precious little
> knowledge about concurrent connection slowness, which is where I find
> the most significant problems.

As Jeff points out, the second set of results is for 20 concurrent
connections. Note that the advantage sunsoft cc has over gcc decreases
as the number of connections increases (which makes sense, as the 20x
workload is likely to be more I/O bound).

-Neil

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Compare rows

2003-10-08 Thread Shridhar Daithankar

Greg Spiegelberg wrote:

The data represents metrics at a point in time on a system for
network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
speed, and whatever else can be gathered.
We arrived at this one 642 column table after testing the whole
process from data gathering, methods of temporarily storing then
loading to the database.  Initially, 37+ tables were in use but
the one big-un has saved us over 3.4 minutes.
I am sure you changed the desing because those 3.4 minutes were significant to you.

But I suggest you go back to 37 table design and see where bottleneck is. 
Probably you can tune a join across 37 tables much better than optimizing a 
difference between two 637 column rows.

Besides such a large number of columns will cost heavily in terms of 
defragmentation across pages. The wasted space and IO therof could be 
significant issue for large number of rows.

642 column is a bad design. Theoretically and from implementation of postgresql 
point of view. You did it because of speed problem. Now if we can resolve those 
speed problems, perhaps you could go back to other design.

Is it feasible for you right now or you are too much committed to the big table?

And of course, then it is routing postgresql tuning exercise..:-)

 Shridhar



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html

Re: [PERFORM] Presentation

2003-10-08 Thread Jeff

On Wed, 8 Oct 2003, Shridhar Daithankar wrote:


Thanks for the nitpicks :)

I've taken some into consideration.
I also signed onto the advocacy list so I can be in on discussions there.

Feel free to convert to whatever format you'd like. I originally started
working on it in OpenOffice, but I got mad at it. So I switched to
powerpoint and got mad at that too :)


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Compare rows

2003-10-08 Thread Greg Spiegelberg

Josh Berkus wrote:
Greg,


Anyone have any suggestions on how to efficiently compare
rows in the same table?  This table has 637 columns to be
compared and 642 total columns.


637 columns?   Are you sure that's normalized?   It's hard for me to conceive 
of a circumstance where that many columns would be necessary.

If this isn't a catastrophic normalization problem (which it sounds like), 
then you will probably still need to work through procedureal normalization 
code, as SQL simply doesn't offer any way around naming all the columns by 
hand.   Perhaps you could describe the problem in more detail?

The data represents metrics at a point in time on a system for
network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
speed, and whatever else can be gathered.
We arrived at this one 642 column table after testing the whole
process from data gathering, methods of temporarily storing then
loading to the database.  Initially, 37+ tables were in use but
the one big-un has saved us over 3.4 minutes.
The reason for my initial question was this.  We save changes only.
In other words, if system S has row T1 for day D1 and if on day D2
we have another row T1 (excluding our time column) we don't want
to save it.
That said, if the 3.4 minutes gets burned during our comparison which
saves changes only we may look at reverting to separate tables.  There
are only 1,700 to 3,000 rows on average per load.
Oh, PostgreSQL 7.3.3, PHP 4.3.1, RedHat 7.3, kernel 2.4.20-18.7smp,
2x1.4GHz PIII, 2GB memory, and 1Gbs SAN w/ Hitachi 9910 LUN's.
Greg

--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] Compare rows

2003-10-08 Thread Josh Berkus

Greg,

> Anyone have any suggestions on how to efficiently compare
> rows in the same table?  This table has 637 columns to be
> compared and 642 total columns.

637 columns?   Are you sure that's normalized?   It's hard for me to conceive 
of a circumstance where that many columns would be necessary.

If this isn't a catastrophic normalization problem (which it sounds like), 
then you will probably still need to work through procedureal normalization 
code, as SQL simply doesn't offer any way around naming all the columns by 
hand.   Perhaps you could describe the problem in more detail?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] Presentation

2003-10-08 Thread Josh Berkus

Jeff,

> Its avail in powerpoint and (ugg) powerpoint exported html.

I can probably convert it to OpenOffice.org and Flash.  OK?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [PERFORM] Presentation

2003-10-08 Thread Rod Taylor

> * Same slide. IIRC postgresql always compresses bytea/varchar. Not too much sure 
> about which but there is something that is compressed by default..:-)

I'm not sure about that.

Even toasted values are not always compressed, though they certainly can
be and usually are.


signature.asc
Description: This is a digitally signed message part

Re: [PERFORM] Presentation

2003-10-08 Thread Shridhar Daithankar

Jeff wrote:
Let me know if there are blatant errors, etc in there.
Maybe even slightly more subtle blatant errors :)
Some minor nitpicks,

* Slide 5, postgresql already features 64 bit port. The sentence is slightly 
confusing
* Same slide. IIRC postgresql always compresses bytea/varchar. Not too much sure 
about which but there is something that is compressed by default..:-)
* Tablespaces has a patch floating somewhere. IIRC Gavin Sherry is the one who 
is most ahead of it. For all goodness, they will feature in 7.5 and design is 
done. There aren't much issues there.
* Mysql transaction breaks down if tables from different table types are involved.
* Mysql transactions do not feature constant time commit/rollback like 
postgresql. The time to rollback depends upon size of transaction
* Mysql does not split large files in segments the way postgresql do. Try 
storing 60GB of data in single mysql table.
* Slide on informix. It would be better if you mention what database you were 
using on your pentium. Assuming postgresql is fine, but being specific helps.
* Slide on caching. Postgresql can use 7000MB of caching. Important part is it 
does not lock that memory in it's own process space. OS can move around buffer 
cache but not memory space of an application.
* Installation slide. We can do without 'yada' for being formal, right..:-) 
(Sorry if thats too nitpicky but couldn't help it..:-))
* initdb could be coupled with configure/make install but again, it's a matter 
of choice.
* Slide on configuration. 'Reliable DB corruption' is a confusing term. 'DB 
corruption for sure' or something on that line would be more appropriate 
especially if presentation is read in documentation form and not explained in a 
live session. but you decide.
*  I doubt pg_autovacuum will be in core source but predicting that long is 
always risky..:-)
* Using trigger for maintening a row count would generate as much dead rows as 
you wanted to avoid in first place..:-)

All of them are really minor. It's a very well done presentation but 45 slides 
could be bit too much at a time. I suggest splitting the presentation in 3. 
Intro and comparison, features, administration, programming and tuning. Wow.. 
they are 5..:-)

Can you rip out informix migration? It could be a good guide by itself.

Thanks again for documentation. After you decide what license you want to 
release it under, the team can put it on techdocs.postgresql.org..

Again, thanks for a good presentation..

 Shridhar



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Jeff

On Wed, 8 Oct 2003, Neil Conway wrote:

> Interesting (and surprising that the performance differential is that
> large, to me at least). Can you tell if the performance gain comes from
> an improvement in a particular subsystem? (i.e. could you get a profile
> of Sun/gcc and compare it with Sun/sunsoft).
>

Yeah - like I expected it was able to generate much better code for
_bt_checkkeys which was the #1 function in gcc on both sun & linux.

and as you can see, suncc was just able to generate much nicer code. I'd
look at the assembler output but that won't be useful since I am very
unfamiliar with the [ultra]sparc instruction set..


Here's the prof and gprof output for the latest run:
GCC:
  %   cumulative   self  self total
 time   seconds   secondscalls  ms/call  ms/call  name
 31.52 19.4419.44 internal_mcount
 20.28 31.9512.51  8199466 0.00 0.00  _bt_checkkeys
  5.61 35.41 3.46  8197422 0.00 0.00  _bt_step
  5.01 38.50 3.09 24738620 0.00 0.00  FunctionCall2
  3.00 40.35 1.85  8194186 0.00 0.00  varchareq
  2.61 41.96 1.6124309 0.07 0.28  _bt_next
  2.42 43.45 1.49 1003 1.49 1.51  AtEOXact_Buffers
  2.37 44.91 1.4612642 0.12 0.12  _read
  2.33 46.35 1.44 16517771 0.00 0.00  pg_detoast_datum
  2.08 47.63 1.28  8193186 0.00 0.00  int4lt
  1.35 48.46 0.83  8237204 0.00 0.00  BufferGetBlockNumber
  1.35 49.29 0.83  8193888 0.00 0.00  int4ge
  1.35 50.12 0.83 _mcount


SunCC -pg -fast.
 %Time Seconds Cumsecs  #Calls   msec/call  Name

  23.24.274.27108922056  0.  _mcount
  20.73.828.09 8304052  0.0005  _bt_checkkeys
  13.72.53   10.6225054788  0.0001  FunctionCall2
   5.10.94   11.56   24002  0.0392  _bt_next
   4.40.81   12.37 8301867  0.0001  _bt_step
   3.40.63   13.00 8298219  0.0001  varchareq
   2.70.50   13.5016726855  0.  pg_detoast_datum
   2.40.45   13.95 8342464  0.0001  BufferGetBlockNumber
   2.40.44   14.39 8297941  0.0001  int4ge
   2.20.41   14.801003  0.409   AtEOXact_Buffers
   2.00.37   15.17 4220349  0.0001  lc_collate_is_c
   2.00.37   15.54 8297219  0.  int4lt
   1.60.29   15.83   26537  0.0109  AllocSetContextCreate
   0.90.16   15.991887  0.085   pglz_decompress
   0.70.13   16.12  159966  0.0008  nocachegetattr
   0.70.13   16.25 4220349  0.  varstr_cmp
   0.60.11   16.36  937576  0.0001  MemoryContextAlloc
   0.50.09   16.45  150453  0.0006  hash_search





> -Neil
>
>
>
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>
>

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [PERFORM] planner doesn't use multicolumn index

2003-10-08 Thread Manfred Koizar

On Wed, 8 Oct 2003 09:08:59 -0500 (CDT), Adrian Demaestri
<[EMAIL PROTECTED]> wrote:
>the type of the fields are int2 and
>int4, the where condition is v.g. partido=99 and partida=123).

Write your search condition as

WHERE partido=99::int2 and partida=123

Servus
 Manfred

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

[PERFORM] Presentation

2003-10-08 Thread Jeff

The boss cleared my de-company info-ified pg presentation.
It deals with PG features, crude comparison to other dbs, install, admin,
and most importantly - optimization & quirks.

Its avail in powerpoint and (ugg) powerpoint exported html.

Let me know if there are blatant errors, etc in there.
Maybe even slightly more subtle blatant errors :)

The people here thought it was good.

http://postgres.jefftrout.com/

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Jeff

On Wed, 8 Oct 2003, Neil Conway wrote:

> What is the query?
>

It retrieves an index listing for our boards. The boards are flat (not
threaded) and messages are numbered starting at 1 for each board.

If you pass in 0 for the start_from it assumes the latest 60.

And it should be noted - in some cases some boards have nearly 2M posts.
Index on board_name, number.

I cannot give out too too much stuff ;)

create or replace function get_index2(integer, varchar, varchar)
returns setof snippet
as '
DECLARE
p_start alias for $1;
p_board alias for $2;
v_start integer;
v_num integer;
v_body text;
v_sender varchar(35);
v_time timestamptz;
v_finish integer;
v_row record;
v_ret snippet;
BEGIN

v_start := p_start;

if v_start = 0 then
select * into v_start from get_high_msg(p_board);
v_start := v_start - 59;
end if;

v_finish := v_start + 60;

for v_row in
select number, substr(body, 0, 50) as snip, member_handle,
timestamp
from posts
where board_name = p_board and
number >= v_start and
number < v_finish
order by number desc
LOOP
return next v_row;
END LOOP;

return;
END;
' language 'plpgsql';


> Interesting (and surprising that the performance differential is that
> large, to me at least). Can you tell if the performance gain comes from
> an improvement in a particular subsystem? (i.e. could you get a profile
> of Sun/gcc and compare it with Sun/sunsoft).
>

I'll get these later today.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Jeff

On Wed, 8 Oct 2003, Andrew Sullivan wrote:

> My worry about this test is that it gives us precious little
> knowledge about concurrent connection slowness, which is where I find
> the most significant problems.  When we tried a Sunsoft cc vs gcc 2.95
> on Sol 7 about 1 1/2 years ago, we found more or less no difference
> once we added more than 5 connections (and we always have more than 5
> connections).  It might be worth trying again, though, since we moved
> to Sol 8.
>

The 20x column are the results when I fired up 20 beater concurrently.


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] Speeding up Aggregates

2003-10-08 Thread Greg Stark

Rod Taylor <[EMAIL PROTECTED]> writes:

> On Fri, 2003-10-03 at 17:53, Dror Matalon wrote:
> > On Fri, Oct 03, 2003 at 05:44:49PM -0400, Rod Taylor wrote:
> > >
> > > It is too bad the (channel, link) index doesn't have dtstamp at the end
> > > of it, otherwise the below query would be a gain (might be a small one
> > > anyway).
> > > 
> > >   select dtstamp
> > > from items
> > >where channel = $1
> > >  and link = $2
> > > ORDER BY dtstamp DESC
> > >LIMIT 1;
> 
> It didn't make a difference even with the 3 term index? I guess you
> don't have very many common values for channel / link combination.

You need to do:

 ORDER BY channel DESC, link DESC, dtstamp DESC

This is an optimizer nit. It doesn't notice that since it selected on channel
and link already the remaining tuples in the index will be ordered simply by
dtstamp.

(This is the thing i pointed out previously in
<[EMAIL PROTECTED]> on Feb 13th 2003 on pgsql-general)


-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Neil Conway

On Wed, 2003-10-08 at 08:36, Jeff wrote:
> So here's the results using my load tester (single connection per beater,
> repeats the query 1000 times with different input each time (we'll get
> ~20k rows back), the query is a common query around here.

What is the query?

> Linux - 1x - 35 seconds, 20x - 180 seconds

"20x" means 20 concurrent testing processes, right?

> Sun - gcc - 1x 60 seconds  20x 245 seconds
> Sun - sunsoft defaults - 1x 52 seonds 20x [similar to gcc most likely]
> Sun - sunsoft -fast  - 1x 28 seconds  20x 164 seconds

Interesting (and surprising that the performance differential is that
large, to me at least). Can you tell if the performance gain comes from
an improvement in a particular subsystem? (i.e. could you get a profile
of Sun/gcc and compare it with Sun/sunsoft).

-Neil

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Andrew Sullivan

On Wed, Oct 08, 2003 at 08:36:56AM -0400, Jeff wrote:
> 
> So here's the results using my load tester (single connection per beater,
> repeats the query 1000 times with different input each time (we'll get
> ~20k rows back), the query is a common query around here.

My worry about this test is that it gives us precious little
knowledge about concurrent connection slowness, which is where I find
the most significant problems.  When we tried a Sunsoft cc vs gcc 2.95
on Sol 7 about 1 1/2 years ago, we found more or less no difference
once we added more than 5 connections (and we always have more than 5
connections).  It might be worth trying again, though, since we moved
to Sol 8.

Thanks for the result. 

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] planner doesn't use multicolumn index

2003-10-08 Thread Shridhar Daithankar

Adrian Demaestri wrote:

We've a table with about 8 million rows, and we need to get rows by the value 
>of two of its fields( the type of the fields are int2 and int4,
the where condition is v.g. partido=99 and partida=123). We created a
>multicolumn index on that fields but the planner doesn't use it, it still use
>a seqscan. That fields are primary key of the table and we clusterded the table
>based on that index, but it still doesn't work. We also set the enviroment
> variable enable_seqscan to false and nathing happends. The only way the
>planner use it is in querys that order by the expression of the index.
Use partido=99::int2 and partida=123::int4

Match the data types basically..

 Shridhar

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [PERFORM] planner doesn't use multicolumn index

2003-10-08 Thread Tomasz Myrta

We've a table with about 8 million rows, and we need to get rows by the 
value of two of its fields( the type of the fields are int2 and int4, 
the where condition is v.g. partido=99 and partida=123). We created a 
multicolumn index on that fields but the planner doesn't use it, it 
still use a seqscan. That fields are primary key of the table and we 
clusterded the table based on that index, but it still doesn't work. We 
also set the enviroment variable enable_seqscan to false and nathing 
happends. The only way the planner use it is in querys that order by the 
expression of the index.
Any idea?
thanks.
Adrián
where partido=99::int2 and partida=123;

Regards,
Tomasz Myrta
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match

[PERFORM] planner doesn't use multicolumn index

2003-10-08 Thread Adrian Demaestri

We've a table with about 8 million rows, and we need to get rows by the value of two of its fields( the type of the fields are int2 and int4, the where condition is v.g. partido=99 and partida=123). We created a multicolumn index on that fields but the planner doesn't use it, it still use a seqscan. That fields are primary key of the table and we clusterded the table based on that index, but it still doesn't work. We also set the enviroment variable enable_seqscan to false and nathing happends. The only way the planner use it is in querys that order by the _expression_ of the index.
Any idea?
thanks.
AdriánDo You Yahoo!?

Todo lo que quieres saber de Estados Unidos, América Latina y el resto del Mundo.
Visíta Yahoo! Noticias.

[PERFORM] Compare rows

2003-10-08 Thread Greg Spiegelberg

All,

Anyone have any suggestions on how to efficiently compare
rows in the same table?  This table has 637 columns to be
compared and 642 total columns.
TIA,
Greg
--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[PERFORM] IMMUTABLE function's flag do not work: 7.3.4, plpgsql

2003-10-08 Thread Andriy Tkachuk

Hi folks. I notice that immutable flag does nothing when i invoke
my plpgsql function within one session with same args.


tele=# SELECT version();
   version
-
 PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC 2.96



At first EXPLAIN ANALYZE shown strange runtime :)

[15:41]/0:[EMAIL PROTECTED]:~>time psql -c 'EXPLAIN ANALYZE SELECT calc_total(1466476, 
1062363600, 1064955599)' tele
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1)
 Total runtime: 0.02 msec
^
(2 rows)

real0m19.282s
^




At second. calc_total() is immutable function:

tele=# SELECT provolatile from pg_proc where proname = 'calc_total' and pronargs =3;
 provolatile
-
 i

but it seems that it's not cached in one session:

[15:38]/0:[EMAIL PROTECTED]:~>psql tele
Welcome to psql 7.3.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help on internal slash commands
   \g or terminate with semicolon to execute query
   \q to quit

tele=# EXPLAIN ANALYZE SELECT calc_total(1466476, 1062363600, 1064955599);
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1)
 Total runtime: 0.02 msec
(2 rows)

tele=# EXPLAIN ANALYZE SELECT calc_total(1466476, 1062363600, 1064955599);
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1)
 Total runtime: 0.02 msec
(2 rows)


What i miss?

Thanks,
  Andriy Tkachuk

http://www.imt.com.ua


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

[PERFORM] Sun performance - Major discovery!

2003-10-08 Thread Jeff

Well, as you guys know I've been tinkering with sun-vs-linux postgres for
a while trying to come up with reasons for the HUGE performance
differences. We've all had our anecdotal thoughts (fork sucks, ipc sucks,
ufs sucks, etc) and I've had a breakthrough.

Knowing that GCC only produces good code on x86 (and powerpc with apple's
mods, but it is doubtful that is as good as ibm's power compiler) I
decided to try out Sunsoft CC.  I'd heard from more than one person/place
that gcc makes abysmal sparc code.  Given that the performance profiles
for both the linux and sun boxes showed the same functions taking up most
of the time I thought I'd see what a difference sunsoft could give me.

So - hardware -
Sun E450 4x400mhz ultrasparc IIi, 4GB ram, scsi soemthing disk. (not
raid) solaris 2.6

Linux - 2xP3 500mhz, 2GB, scsi disk of some flavor (not raid) linux 2.2.17
(old I know!)

So here's the results using my load tester (single connection per beater,
repeats the query 1000 times with different input each time (we'll get
~20k rows back), the query is a common query around here.

I discounted the first run of the test as caches populated.

Linux - 1x - 35 seconds, 20x - 180 seconds

Sun - gcc - 1x 60 seconds  20x 245 seconds
Sun - sunsoft defaults - 1x 52 seonds 20x [similar to gcc most likely]
Sun - sunsoft -fast  - 1x 28 seconds  20x 164 seconds

As you math guru's can probably deduce - that is a rather large
improvement.  And by rather large I mean hugely significant.  With results
like this, I think it warrants mentioning in the FAQ_Solaris, and probably
the performance guide.

Connecting will always be a bit slower. But I think most people realize
that connecting to a db is not cheap.

I think update/etc will cause more locking, but I think IO will become the
bottle neck much sooner than lock/unlock will. (This is mostly anecdotal
given how fast solaris can lock/unlock a semaphore and how much IO I know
I have)

Oh yes, with was with 7.3.4 and sunsoft cc Sun WorkShop 6 update 1 C
5.2 2000/09/11 (which is old, perhaps newer ones make even better code?)

I'm not sure of PG's policy of non-gcc things in configure, but perhaps if
we detect sunsoft we toss in the -fast flag and maybe make it the
preferred one on sun? [btw, it compiled with no changes but it did spew
out tons of warnings]

comments?

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

59 matches

Mail list logo