[HACKERS] Postgresql on multi-core CPU's: is this old news?

2011-04-06 Thread Mischa Sandberg
Came across the following in a paper from Oct 2010. Was wondering if this is 
old news I missed in this group.
http://pdos.csail.mit.edu/papers/linux:osdi10.pdf
about Linux optimization on multi-core CPU's.

The group at MIT were exploring how some Linux apps were scaling up --- 
sometimes badly, mostly due to hidden contention over cache-line consistency 
across the cores' caches.
In a nutshell: if an app, or the system calls it uses, tries to modify anything 
in a cache line (32-64 byte slice of memory) that another core is using, 
there's a lot of fumbling in the dark to make sure there is no conflict. When I 
saw PostgreSQL named in the abstract, I thought, "Aha! Contention over shm". 
Not so. Skip to page 11 (section 5.5) for most of the PG specifics.


Re: [HACKERS] Multi-pass planner

2009-08-20 Thread Mischa Sandberg
A federated database engine I built in the mid-90's more or less ran both 
plans in parallel, to implement both fast-first and min-total cost.
The index query in general started returning rows whose oids went into a filter
that discarded them from the serial query once it started to crank things out.

'Index' and 'serial' are not exactly what was going on; the federated engine 
was joining multiple tables across multiple (sometimes hundreds of) databases, 
with really variable transmission times, and tolerance for timed-out db servers.
It had no reliable way to get cost estimates in advance, since it didn't have 
access to statistical metadata (no, Postgres wasn't one of the databases I had 
to take as input, more's the pity).

'Parallel' is also not quite accurate. It retained plans (queries were heavily 
repeated) and did mickey-mouse simulated annealing of past performance, so that 
if one of two plans was the faster 90% of the time, then there was a 10% 
probability it would run both plans in parallel, just to check whether 
something had changed.

The code was part of a proprietary product, and was used by Acxiom and Cisco.
But the concept was pretty simple and as described above.

> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Greg Stark
> Sent: Thursday, August 20, 2009 10:32 AM
> To: Robert Haas
> Cc: Kevin Grittner; decibel; Pg Hackers
> Subject: Re: [HACKERS] Multi-pass planner
>
> On Thu, Aug 20, 2009 at 6:28 PM, Greg Stark wrote:
> > I don't think it's a bad idea, I just think you have to set your
> > expectations pretty low. If the estimates are bad there
> isn't really
> > any plan that will be guaranteed to run quickly.
>
> Actually this is usually Tom's point when this topic comes
> up. Say you're deciding between an index scan and a
> sequential scan. The sequential scan has a total cost of
> 1000..1000 but the index scan has an estimated total cost of
> 1..1. If you pick the sequential scan you might be
> running 1000x slower than the index scan in the worst case.
> But if you pick the index scan you might be running 10x
> slower than the sequential scan in the worst case. If you
> don't trust the estimate where does that leave you? Making a
> mistake either way is fatal.
>
>
> --
> greg
> http://mit.edu/~gsstark/resume.pdf
>


Re: [HACKERS] Index-only scans

2009-07-14 Thread Mischa Sandberg
Now I'm back where I can go look at the source code :-)
Thanks.
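
For the record, the bitmap scan Jaime points to does essentially this: the
bitmap index scan collects matching TIDs, and the bitmap heap scan then visits
the heap pages in physical order. A rough illustration (table, index and the
plan text below are made up):

    -- hypothetical table and index
    CREATE TABLE t (id int, payload text);
    CREATE INDEX t_id_idx ON t (id);

    EXPLAIN SELECT * FROM t WHERE id BETWEEN 1000 AND 2000;
    -- With enough rows, and the planner choosing a bitmap scan, the plan
    -- comes out shaped like:
    --   Bitmap Heap Scan on t
    --     Recheck Cond: ((id >= 1000) AND (id <= 2000))
    --     ->  Bitmap Index Scan on t_id_idx
    --           Index Cond: ((id >= 1000) AND (id <= 2000))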

> -Original Message-
> From: Jaime Casanova [mailto:jcasa...@systemguards.com.ec]
> Sent: Monday, July 13, 2009 8:40 PM
> To: Mischa Sandberg
> Cc: Heikki Linnakangas; PostgreSQL-development
> Subject: Re: [HACKERS] Index-only scans
>
> On Mon, Jul 13, 2009 at 5:38 PM, Mischa
> Sandberg wrote:
> > Does PG have an intermediate execution node to sort/batch
> index entries (heap tuple ptrs) by heap page prior to lookup?
> Something mssql does ...
> >
>
> it sounds a lot like a bitmap index scan
>
>
> --
> Atentamente,
> Jaime Casanova
> Soporte y capacitación de PostgreSQL
> Asesoría y desarrollo de sistemas
> Guayaquil - Ecuador
> Cel. +59387171157
>



Re: [HACKERS] Index-only scans

2009-07-13 Thread Mischa Sandberg
Does PG have an intermediate execution node to sort/batch index entries (heap 
tuple ptrs) by heap page prior to lookup? Something mssql does ...

> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Bruce Momjian
> Sent: Monday, July 13, 2009 6:38 AM
> To: Heikki Linnakangas
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Index-only scans
>
> Heikki Linnakangas wrote:
> > Even if we don't solve the visibility
> > map problem, just allowing the executor to evaluate quals
> that are not
> > directly indexable using data from the index, would be useful. For
> > example, "SELECT * FROM foo WHERE textcol LIKE '%bar%', and
> you have a
> > b-tree index on textcol, the planner could choose a
> full-index-scan,
> > apply the '%bar%' filter on the index tuples, and only fetch those
> > heap tuples that match that qual.
>
> Interesting, I had not considered that.  You are using the
> index as a single-column table that can be scanned more
> quickly than the heap.
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
>   + If your life is a hard drive, Christ can be your backup. +
>


Re: [HACKERS] [ADMIN] postgresql in FreeBSD jails: proposal

2008-01-17 Thread Mischa Sandberg
Quoting Tom Lane <[EMAIL PROTECTED]>:

> Mischa Sandberg <[EMAIL PROTECTED]> writes:
> > I'm going to skip the kill(1,0) test and depend on nattch only,
> > with a function that PGSharedMemoryIsInUse() can also use.
> > (For a healthy server, nattch is never less than 2, right?)
> 
> Oh, forgot to mention: healthy servers are not the point here.
> You should make the code keep its hands off any segment with
> nonzero nattch, because even one orphaned backend is enough
> to cause trouble.

Note taken. Worth putting a warning in the log, too?

Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.




Re: [HACKERS] [ADMIN] postgresql in FreeBSD jails: proposal

2008-01-17 Thread Mischa Sandberg
Quoting Stephen Frost <[EMAIL PROTECTED]>:

> * Tom Lane ([EMAIL PROTECTED]) wrote:
> > "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> > > Easiest fix: change the UID of the user running the postmaster
> (ie. pgsql) so
> > > that each runs as a distinct UID (instead of distinct PGPORT) ...
> been doing 
> > > this since moving to FreeBSD 6.x ... no patches required ...
> > 
> > Sure, but in the spirit of "belt and suspenders too", I'd think
> that
> > doing that *and* something like Mischa's proposal wouldn't be bad.
> 
> I agree that we should try to be careful about stepping on 
> segments that might still be in use, but I would also discourage
> jail users from using the same uid for multiple PG clusters 
> since the jail doesn't protect the shmem segment.  
> We use seperate uids even w/ linux-vservers where shmem
> and everything *is* seperate, following the same 
> 'belt and suspenders too' spirit for security.

Thanks for all the input. Fixing freebsd might get answered
on a different channel :-)

Unfortunately, different uid's are not even an option here;
but serious security in this sitch is not relevant, either.

We have a freebsd core guy here, and he says that there's no
pressing incentive for jails to handle sysv ipc, given mmap
and file locking :-( And given his other comments, I wouldn't
consider jails a "secure" environment, just a modest and
convenient way to emulate multiple machines with caveats.
So, given Tom's comment, that it's antisocial to zap a shm seg 
that other processes have attached ...

I'm going to skip the kill(1,0) test and depend on nattch only,
with a function that PGSharedMemoryIsInUse() can also use.
(For a healthy server, nattch is never less than 2, right?)
If no unpleasant edge cases come out of this in our test
framework, I'd like to submit that as a patch. 
Talked with our Linux guys about vserver, and they see no issues. 
Mr. Solaris here is currently a long way ooto ... opinions?

AFAICS the change in behaviour is: if a degraded server exited
with some backend hanging, the second server will create a new
segment after bumping the ipc key; if system shm limits do not
allow for two such shm segments, the second server will bail.
For production systems, ensuring there are no orphan shm segs
is not left to heuristic clean-up at server restart.

Hope that makes sense for the generic Postgres world.

If anyone is interested in creating hung backends, you can
create a named pipe, and tell the server to COPY from it.
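
A sketch of that recipe (paths and names hypothetical); the backend blocks
inside COPY until something writes to, or closes, the pipe:

    -- on the server host:  mkfifo /tmp/hang_me
    -- then, from the session you want to wedge (server-side COPY, so superuser):
    COPY some_table FROM '/tmp/hang_me';
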
---
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.





Re: [HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Mischa Sandberg
Quoting Tom Lane <[EMAIL PROTECTED]>:

> Mischa Sandberg <[EMAIL PROTECTED]> writes:
> > +   /* In a FreeBSD jail, you can't "kill -0" a postmaster
> > +    * running in a different jail, so the shm seg might
> > +    * still be in use. Safer to test nattch ?
> > +    */
> > +   if (kill(1,0) && errno == ESRCH && PGSharedMemoryIsInUse(0,NextShmemSegID))
> > +           continue;
> 
> Isn't the last part of that test backward?  If it isn't, I don't
> understand what it's for at all.

Serious blush here. Yes.




[HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Mischa Sandberg
Here (@sophos.com) we run machine cluster tests using FreeBSD jails. A
jail is halfway between a chroot and a VM. Jails blow a number of
assumptions about a unix environment: sysv ipc's are global to all
jails; but a process can only "see" other processes also running in the
jail. In fact, the quickest way to tell whether you're running in a jail
is to test for process 1.

PGSharedMemoryCreate chooses/reuses an ipc key in a reasonable way to
cover previous postmasters crashing and leaving a shm seg behind,
possibly with some backends still running.

Unfortunately, with multiple jails running PG servers and (due to app
limitations) all servers having the same PGPORT, you get the situation that
when the jail #2 (jail #3, ...) server comes up, it:
- detects that there is a shm seg with ipc key 5432001
- checks whether the associated postmaster process exists (with kill -0)
- overwrites the segment created and being used by jail #1

There's a workaround (there always is) other than this patch, involving
NAT translation so that the postmasters listen on different ports, but
the outside world sees them each listening on 5432. But that seems
somewhat circuitous.

I've hacked sysv_shmem.c (in PG 8.0.9) to handle this problem. Given the
trouble that postmaster goes to, to stop shm seg leakage, I'd like to
solicit any opinions on the wisdom of this edge case. If this patch IS
useful, what would be the right level of compile-time restriction
("#ifdef __FreeBSD__" ???)

@@ -319,7 +319,8 @@

         if (makePrivate)        /* a standalone backend shouldn't do this */
             continue;
-
+        /* In a FreeBSD jail, you can't "kill -0" a postmaster
+         * running in a different jail, so the shm seg might
+         * still be in use. Safer to test nattch ?
+         */
+        if (kill(1,0) && errno == ESRCH && !PGSharedMemoryIsInUse(0, NextShmemSegID))
+            continue;
         if ((memAddress = PGSharedMemoryAttach(NextShmemSegID, &shmid)) == NULL)
             continue;           /* can't attach, not one of mine */

End of Patch.




Re: [HACKERS] Porting MSSQL to PGSQL -- triggers

2006-05-23 Thread Mischa Sandberg

Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:
How expensive is this going to be, especially for huge numbers of rows? 


Certainly cheaper than firing a per-row trigger.


I'm curious: I've never written a MSSQL trigger that did NOT use the 
INSERTED/DELETED pseudotables (aka NEW/OLD). I know STATEMENT-level triggers 
have existed in PG for a while ... but what do people use them for???


--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



Re: Porting MSSQL to PGSQL (Was: [HACKERS] [OT] MySQL is bad, but

2006-05-22 Thread Mischa Sandberg

Jim C. Nasby wrote:

On Sun, May 21, 2006 at 02:58:17PM -0700, Josh Berkus wrote:


Actually, porting TSQL to PL/pgSQL would be very hard.   I speak as an expert 
TSQL developer.  For example, most data manipulation in TSQL is done through 
updatable cursors, something we don't currently support.  


There are plenty of direct runtime incompatibilities that have to be 
implemented not as language-to-language, but as language-to-procedure.

That also makes them stick out as the first thing to refactor and reimplement,
once the basic working system has been ported :-)

Hadn't thought about updatable cursors, but solved that problem before in 
SimbaExpress and SimbaFusion with (effectively) arrays of OID's.


That's what I love about PG: so many rich consistent facilities
that you can implement things that tie other DB's in knots.

Also, T-SQL uses 
un-ordered, callable parameters for SPs, something which we *also* don't 
support.


Not such a big deal. Any translation has full access to the catalog
(ported over to the target PG system) and the parser/translator
can fill in the blanks.


And T-SQL doesn't fail a transaction on an error, resulting in code like

UPDATE ...

IF @@ERROR = 0
BEGIN
    UPDATE ...
END
IF @@ERROR = 0
    ...

Yeah, transaction error behaviour always brings out the nasty incompatibilities
(Jim, was that a major problem on Project Lance?).
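
For contrast, a rough sketch of the PostgreSQL side, where the first error
aborts the transaction unless you catch it in the block (names here are
hypothetical):

    CREATE FUNCTION move_money(src int, dst int, amt numeric) RETURNS void
    LANGUAGE plpgsql AS $$
    BEGIN
        UPDATE accounts SET balance = balance - amt WHERE id = src;
        UPDATE accounts SET balance = balance + amt WHERE id = dst;
    EXCEPTION WHEN others THEN
        -- everything since BEGIN rolls back to the block's implicit savepoint
        RAISE NOTICE 'transfer failed: %', SQLERRM;
    END $$;
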
--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



Re: [HACKERS] Porting MSSQL to PGSQL: trace and profile

2006-05-22 Thread Mischa Sandberg

Martijn van Oosterhout wrote:

For high-end MSSQL shops, a high value is being able to trace and profile 
(EXPLAIN) every client SQL command from the server side ... with plenty of 
options for selective trace.

This would also be highly valuable to have in PostgreSQL.


Are we talking EXPLAIN (which is cheap) or EXPLAIN ANALYZE (which is
less so)?


No, that's the thing; server-side trace and filtering are EXPLAIN ANALYZE, and 
require no change to the app, catching (problem) plans in context.


For example (using PG here), we had an ETL program that occasionally went very 
stupid. It turned out that the problem was the interaction between autovacuum 
timing, and when the program wiped out and rebuilt a worktable.
I had modified the app to EXPLAIN ANALYZE the update commands,
but how do you modify the straight SELECTs? Taking the statements out of context
and trying them offline with EXPLAIN ANALYZE sent me up the wrong tree
a few times :-(

With PG, where I end up is setting log_min_duration_statement=10
and grinding the logs for the information I want, hoping to convincingly replay 
the SQL (with EXPLAIN ANALYZE) under similar circumstances.

If there are no applications that PREPARE their statements, I have a chance.
BTW was logging of (execution of) prepared statements with context info ever 
considered? Or have I missed something?

--
The MSSQL facility makes on-the-fly EXPLAIN ANALYZE possible for all statements.
Its selective filter functions make it feasible in production systems,
where you can turn it on for a week to catch hard-to-reproduce issues.
--
I did go digging into the source to make EXPLAIN ANALYZE work for a server-side 
trace, but it looked like a major undertaking. Any pointers? Willing to dive into it again.

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



Porting MSSQL to PGSQL (Was: [HACKERS] [OT] MySQL is bad, but THIS bad?)

2006-05-21 Thread Mischa Sandberg

On Thursday 18 May 2006 12:38, Josh Berkus wrote:

Personally, I'd go after MSSQL before I bothered with MySQL.   Sure, let's
make *migration* easier for those who wake up and smell the BS, but
migration can (and probably should) be one-way.


Somebody earlier was asking why there's no automatic transformer from Transact-SQL
to PL/pgSQL (maybe with a bunch of glue routines). The grammar is not a problem,
though you have to wonder at all the wired-in keywords (T-SQL always felt like 
COBOL).


The stumbling blocks are not in language, but function. Many of those functions 
are rarely used, but some big ones are quite common ...


T-SQL has statement-level triggers, and they get used a lot (some big apps ONLY 
put code in triggers). Statement-level triggers are very efficient for 
maintaining aggregates; the closest PG has are rewrite rules.
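
For the record, the shape MSSQL users expect, sketched here with transition
tables (a feature PostgreSQL only grew much later, in 10); all names are
hypothetical:

    CREATE FUNCTION bump_daily_totals() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        -- 'inserted' is the statement-level NEW rowset, a la MSSQL's INSERTED
        UPDATE daily_totals d
           SET amount = d.amount + s.amount
          FROM (SELECT order_date, sum(amount) AS amount
                  FROM inserted GROUP BY order_date) s
         WHERE d.order_date = s.order_date;
        RETURN NULL;
    END $$;

    CREATE TRIGGER orders_totals
        AFTER INSERT ON orders
        REFERENCING NEW TABLE AS inserted
        FOR EACH STATEMENT EXECUTE PROCEDURE bump_daily_totals();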


Other issues: stored procs returning multiple result sets; "print" statements; 
SELECT TOP n PERCENT; COMPUTE expressions (subtotals); and some of the @@global 
variables that are hard to emulate, @@IDENTITY being the main problem in older 
T-SQL code.
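
(The usual PG-side stand-ins, for what it's worth; RETURNING is itself a later
addition than this thread, and the table/column names are hypothetical:)

    -- assuming a serial column named id on a hypothetical customers table
    INSERT INTO customers (name) VALUES ('acme') RETURNING id;
    SELECT currval(pg_get_serial_sequence('customers', 'id'));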


OpenXML is cool, but such a pig, that its difficulty in emulation is probably 
not an issue.


There are plenty of things that happily go away, or can be implemented with a 
client wrapper; for example, BULK INSERT and BACKUP. Other things just have no 
equivalent, and amount to no-ops in a PG world (partition functions)


A few things require some custom metadata tables (MSSQL "RULE" != PG "RULE").

If you want to convince MSSQL users to move over to PG, statement-level triggers
(with "OLD" and "NEW" rowsets) are a bottom-line requirement.
...

For high-end MSSQL shops, a high value is being able to trace and profile 
(EXPLAIN) every client SQL command from the server side ... with plenty of 
options for selective trace.




Re: [HACKERS] GPUSort project

2006-04-12 Thread Mischa Sandberg

[short]
This probably would be an uneasy fit into generic backend code.
Was hoping the GPUSort project might have fleeced/sorted out some issues.

[long]
Simon Riggs wrote:

On Wed, 2006-04-12 at 10:00 -0700, Mischa Sandberg wrote:

...
Long answer: we're shipping a server (appliance) product built on stock 
rackmount hardware, that includes an ATI Rage (8MB) with nothing to do. Much of 
what the box does is a single cpu-bound process, sorting  maillog extracts. The 
GPU is an asset, even at 8MB; the headwork is in mapping/truncating sort keys 
down to dense ~32bit prefixes; and in making smooth judgements as to when to 
give the job to (a) the GPU (b) quicksort (c) a tiny bitonic sort in the SSE2 
registers.



It sounds like its possible, but it would have to give incredible gains
before its worth the effort to make it happen. 8MB of video RAM doesn't
score much against 256MB of normal RAM, which is pretty cheap these
days.


A better comparison is 8MB of video RAM vs 512K of L2 cache. GPU's (also) have 
faster access (>32GB/s) to RAM than the CPU, using AGP/PCI with no contention. 
Our product uses Xeons instead of Opterons; the 3GHz CPUs are just slogging, 
waiting >70% for RAM fetch.



The hardware dependency would make this extremely sensitive to change,
so effort in this area might not give lasting benefit. As it happens,
I'm in favour of making code changes to exploit hardware, but this one
is too far for me to encourage anybody to pursue it further.


Fair comment. I'm using OpenGL, and looking at Glift, so it's not as 
hardware-specific as you might think. Other projects at gpgpu.org seem to be 
able to switch among GPU's.


That being said, I humbly admit that targeting specific hardware tends to give 
one tunnel vision. Coding "if all these conditions are true, use the fast 
algorithm, else do it the normal way" is also messier to extend than a nice 
clean interface layer :-(


Any of this would apply to postgres, if tuplesort.c can tolerate a preprocessing 
step that looks for special cases, and degrades gracefully into the standard 
case. 



For other techniques, I think it can, depending upon the cost of the
preprocessing step. But the overall improvement from improving small
sorts could well be lost in the noise...so maybe not worth it.


Agreed. GPU setup makes sorts <1MB not worth it.

Small sorts get a boost from bitonic sort in SSE2, which wires into the bottom 
of a special-case quicksort, where any subrange of 9..16 elements gets done in 
xmm registers.


I think the preprocessing to test and format keys for such sorts
is useful anyway. I was trying to make radix sort usable, and that requires the 
same key prep. Even if the key prep hits its space limit and says the input is 
unsuitable for radix sort, it still makes the normal quicksort faster, since 
some key prefixes are shorter.


--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



Re: [HACKERS] Get explain output of postgresql in Tables

2006-04-12 Thread Mischa Sandberg

Greg Sabino Mullane wrote:


I wonder if it would help much just to change EXPLAIN to indent with
something other than spaces?


I like that. Maybe even decrease the indenting a little more, and compress
some of the inner whitespace (such as the 2 spaces after the operator name)


Might it be worth checking how many people (and apps) use EXPLAIN output to 
drive apps? Our (web) reporting has a paging system for long reports that 
depends on getting the row/cost estimate from "EXPLAIN somequery" before 
actually executing "somequery". (Yep, we have pg_autovacuum run ANALYZE a lot :-)
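
Roughly, the trick is nothing fancier than this (names and numbers illustrative):

    EXPLAIN SELECT * FROM report_rows WHERE customer_id = 42;
    --   Seq Scan on report_rows  (cost=0.00..431.00 rows=21000 width=36)
    --     Filter: (customer_id = 42)
    -- The app pulls "rows=21000" out of the top plan line to size its pager,
    -- then runs the real query.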


Anybody else out there using explain output in an automated way?

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



Re: [HACKERS] GPUSort project

2006-04-12 Thread Mischa Sandberg

Martijn van Oosterhout wrote:

On Tue, Apr 11, 2006 at 04:02:07PM -0700, Mischa Sandberg wrote:

Anybody on this list hear/opine anything of the GPUSort project for 
postgresql? I'm working on a radix-sort subcase for tuplesort, and there 
are similarities.


http://www.andrew.cmu.edu/user/ngm/15-823/project/


I've heard it mentioned, didn't know they'd got it working. However,
none of my database servers have a 3D graphics card anywhere near the power
they suggest in the article.

Is this of practical use for run-of-the-mill video cards?


Short answer: maybe.

Long answer: we're shipping a server (appliance) product built on stock 
rackmount hardware, that includes an ATI Rage (8MB) with nothing to do. Much of 
what the box does is a single cpu-bound process, sorting  maillog extracts. The 
GPU is an asset, even at 8MB; the headwork is in mapping/truncating sort keys 
down to dense ~32bit prefixes; and in making smooth judgements as to when to 
give the job to (a) the GPU (b) quicksort (c) a tiny bitonic sort in the SSE2 
registers.


Any of this would apply to postgres, if tuplesort.c can tolerate a preprocessing 
step that looks for special cases, and degrades gracefully into the standard 
case. I'm guessing that there are enough internal sorts (on oid, for example) 
having only small, memcmp-able sort keys, that this is worth adding in.


--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



Re: [HACKERS] Get explain output of postgresql in Tables

2006-04-12 Thread Mischa Sandberg

Jim C. Nasby wrote:

On Wed, Apr 12, 2006 at 04:53:20PM +0200, Thomas Hallgren wrote:

[quoted example lost in archiving]

Well, the downside is that such a format means explain output is now
twice as long. But I'd love to see something like that as an option. I'd
also still like to see an SQL-parseable version as well, since I think
there's applications for that.


On the plus side, a complex xml document is an easy read in a browser (IE or 
Firefox, either way). Hard to picture the representation in relational tables, 
though ... did you have some specific idea for what to do with a plan in SQL, 
once it was parsed?

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



[HACKERS] GPUSort project

2006-04-11 Thread Mischa Sandberg
Anybody on this list hear/opine anything of the GPUSort project for postgresql? 
I'm working on a radix-sort subcase for tuplesort, and there are similarities.


http://www.andrew.cmu.edu/user/ngm/15-823/project/

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



Re: [HACKERS] plpgsql by default

2006-04-11 Thread Mischa Sandberg
Are there more possibilities for some bug in the plpgsql engine to allow an 
exploit: actually changing the stack through a buffer overflow, or a bug in an 
intrinsic function, or allowing an injection that crosses some privilege 
boundary, via someone else's EXECUTE?


It's a lot easier to verify the few places where straight SQL can interact with 
the outside world (NOTIFY, COPY, and trojan .so's come to mind). It is harder 
for someone to find an unexpected combined-effect exploit, since there's not 
much you can combine.


Perhaps somebody in the core team has reservations about possible points of 
error to certify in plpgsql: is every possible weird array-overflow case 
covered? Further, can some innocuous side-effects in execution (INOUT 
parameters; function ownership; schema settings) combine to create a hole?

There's just that much more to worry about.

As they say, in theory, theory and practice are the same.
In practice, they differ :0)

I can understand someone being cautious about making guarantees (or even risk 
estimates) about plpgsql versus
the core engine. And so, just like not INITIALLY letting the server listen on 
all TCP sockets, it's modest conservatism to let the default be a bit restricted.


--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.



[HACKERS] Logging explain-analyze output in pg log?

2005-08-11 Thread Mischa Sandberg
I'm stuck with a web app that periodically has truly awful query
response times. The cause may be an interaction between system load,
vacuum-analyze scheduling, the occasional wild variation in join
selectivity, and stats collection. Logging the queries and running them
later doesn't create an obvious pattern, because it's hard to catch the
exact circumstances where the query takes forever, or returns zero rows.

It would be difficult to change the app to run an EXPLAIN ANALYZE and
record the results. I do this when I can, because it's the fastest way
to figure out what went wrong in a 10-way join. 

I would like to add a guc variant on debug_print_plan that logs the
output of ExplainOneQuery(), rather than dumping the plan node-traversal
output --- perhaps it's just me, but I have trouble relating that output
to the tables and columns named neatly in EXPLAIN ANALYZE output.

This is different from just running EXPLAIN ANALYZE: this is asking the
backend to log the EXPLAIN ANALYZE output, but ALSO return the normal
result set. 

Some points I ponder:

(1) Could not find this on the TODO list, nor mentioned in the mail
archive. Has this truly never come up? Logging detailed query execution
to the profiler is something one gets accustomed to, with MSSQL.

(2) src/backend/commands/explain.c:ExplainQuery gives a cautionary
comment on scribbling (recording stats) in the execution tree,
particularly when that tree will be executed repeatedly from a PLPGSQL
proc (not a problem in my case, but ...)

(3) any suggestions on what to avoid ...





Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-05-03 Thread Mischa Sandberg
Quoting Josh Berkus : 
 
> Mischa, 
>  
> > Okay, although given the track record of page-based sampling for 
> > n-distinct, it's a bit like looking for your keys under the 
> streetlight, 
> > rather than in the alley where you dropped them :-) 
>  
> Bad analogy, but funny. 
 
Bad analogy? Page-sampling effort versus row-sampling effort, c'est 
moot. It's not good enough for stats to produce good behaviour on the 
average. Straight random sampling, page or row, is going to cause 
enough untrustworthy engine behaviour, for any %ages small enough to 
allow sampling from scratch at any time. 
 
I'm curious what the problem is with relying on a start-up plus 
incremental method, when the method in the distinct-sampling paper 
doesn't degenerate: you can start when the table is still empty. 
Constructing an index requires an initial full scan plus incremental 
update; what's the diff? 
 
> Unless, of course, we use indexes for sampling, which seems like a 
> *really  
> good* idea to me  
 
"distinct-sampling" applies for indexes, too. I started tracking the 
discussion of this a bit late.  Smart method for this is in VLDB'92: 
Gennady Antoshenkov, "Random Sampling from Pseudo-ranked B+-trees". I 
don't think this is online anywhere, except if you have a DBLP 
membership. Does nybod else know better? 
Antoshenkov was the brains behind some of the really cool stuff in DEC 
Rdb (what eventually became Oracle). Compressed bitmap indices, 
parallel competing query plans, and smart handling of keys with 
hyperbolic distributions.  
--  
Engineers think equations approximate reality. 
Physicists think reality approximates the equations. 
Mathematicians never make the connection. 




Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-05-03 Thread Mischa Sandberg
Quoting Markus Schaber <[EMAIL PROTECTED]>:

> Hi, Josh,
> 
> Josh Berkus wrote:
> 
> > Yes, actually.  We need 3 different estimation methods:
> > 1 for tables where we can sample a large % of pages (say, >= 0.1)
> > 1 for tables where we sample a small % of pages but are "easily estimated"
> > 1 for tables which are not easily estimated but we can't afford to sample
> > a large % of pages.
> >
> > If we're doing sampling-based estimation, I really don't want people to
> > lose sight of the fact that page-based random sampling is much less
> > expensive than row-based random sampling.  We should really be focusing
> > on methods which are page-based.

Okay, although given the track record of page-based sampling for
n-distinct, it's a bit like looking for your keys under the streetlight,
rather than in the alley where you dropped them :-)

How about applying the distinct-sampling filter on a small extra data
stream to the stats collector? 

-- 
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.




[HACKERS] OLAP and PG and deja-vu

2005-04-30 Thread Mischa Sandberg
Just finished writing the PG rules to maintain a bunch of materialized 
(aggregate) views on a ROLAP cube --- yes, I've seen Jonathan 
Gardner's "matview" for postgres; didn't cover what I needed :-(. PG 
happens to be pretty convenient for roll-your-own OLAP, thanks to 
RULES and ARRAY datatypes. So it hasn't been a pain to implement. 
As it happens, what I've done amounts to implementing covering indexes 
(that's another thread), and an aid for join-selectivity/result-size 
estimation. 
 
And suddenly I had a case of deja-vu... 
 
A decade ago, working with Sybase 4.8/MSSQL 4.2, you could only 
enforce FK relationships by coding them in triggers.  
 
Then, one day, declarative-FK's could be recognized by the engine 
itself. Suddenly, the query planner could make real use of those 
relations, typically dropping them out of complex joins. Hurrah! 
 
Now, here I am, doing what every ROLAP system needs: the ability to 
store and query aggregates of the fact table, then parsing a query to 
determine which (smallest) level of aggregate can answer the query. 
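
Concretely, the sort of thing involved (a sketch; names hypothetical): 
 
    -- one stored level of the cube 
    CREATE TABLE sales_by_day_region AS 
        SELECT day, region, sum(amount) AS amount, count(*) AS n 
          FROM sales_fact 
         GROUP BY day, region; 
 
    -- a query written against the fact table ... 
    SELECT region, sum(amount) 
      FROM sales_fact 
     WHERE day = DATE '2005-04-01' 
     GROUP BY region; 
    -- ... that the query generator (or, one day, the planner) could answer 
    -- from sales_by_day_region instead of rescanning sales_fact. 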
 
There are many such app-level query generators that bear the brunt of 
keeping track of, and choosing from, multiple aggregates --- a task 
that's on a par with creating appropriate indexes, and about as much 
fun.  
 
Maybe one could kill a couple of birds with one stone, by making 
materialized views visible to the query optimizer? 
 
Discussion? 




Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-28 Thread Mischa Sandberg
Quoting Josh Berkus :

> > >Perhaps I can save you some time (yes, I have a degree in Math). If I
> > >understand correctly, you're trying extrapolate from the correlation
> > >between a tiny sample and a larger sample. Introducing the tiny sample
> > >into any decision can only produce a less accurate result than just
> > >taking the larger sample on its own; GIGO. Whether they are consistent
> > >with one another has no relationship to whether the larger sample
> > >correlates with the whole population. You can think of the tiny sample
> > >like "anecdotal" evidence for wonderdrugs.
>
> Actually, it's more to characterize how large of a sample we need.  For
> example, if we sample 0.005 of disk pages, and get an estimate, and then
> sample another 0.005 of disk pages and get an estimate which is not even
> close to the first estimate, then we have an idea that this is a table
> which defies analysis based on small samples.  Whereas if the two
> estimates are < 1.0 stdev apart, we can have good confidence that the
> table is easily estimated.  Note that this doesn't require progressively
> larger samples; any two samples would work.

We're sort of wandering away from the area where words are a good way
to describe the problem. Lacking a common scratchpad to work with,
could I suggest you talk to someone you consider has a background in
stats, and have them draw for you why this doesn't work?

About all you can get out of it is, if the two samples are
disjunct by a stddev, yes, you've demonstrated that the union
of the two populations has a larger stddev than either of them;
but your two stddevs are less info than the stddev of the whole.
Breaking your sample into two (or three, or four, ...) arbitrary pieces
and looking at their stddevs just doesn't tell you any more than what
you start with.

-- 
"Dreams come true, not free." -- S.Sondheim, ITW 




Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Mischa Sandberg
Quoting Andrew Dunstan <[EMAIL PROTECTED]>: 
 
> After some more experimentation, I'm wondering about some sort of 
> adaptive algorithm, a bit along the lines suggested by Marko Ristola, 
> but limited to 2 rounds. 
> 
> The idea would be that we take a sample (either of fixed size, or some 
> small proportion of the table), see how well it fits a larger sample 
> (say a few times the size of the first sample), and then adjust the 
> formula accordingly to project from the larger sample the estimate for 
> the full population. Math not worked out yet - I think we want to 
> ensure that the result remains bounded by [d,N]. 
 
Perhaps I can save you some time (yes, I have a degree in Math). If I 
understand correctly, you're trying extrapolate from the correlation 
between a tiny sample and a larger sample. Introducing the tiny sample 
into any decision can only produce a less accurate result than just 
taking the larger sample on its own; GIGO. Whether they are consistent 
with one another has no relationship to whether the larger sample 
correlates with the whole population. You can think of the tiny sample 
like "anecdotal" evidence for wonderdrugs.  
--  
"Dreams come true, not free." -- S.Sondheim, ITW  




Re: [HACKERS] Indexed views?

2004-09-13 Thread Mischa Sandberg
Greg Stark wrote:
Mischa Sandberg <[EMAIL PROTECTED]> writes:
I take it that it is a very reasonable assumption that only a small proportion
of index records are actually invalid (else Yurk why use the index?). 

That's faulty logic, the percentage of tuples that are valid is entirely
independent from the percentage of tuples that match your range criterion. Ie,
I could be selecting 100 tuples out of a million -- even if 99 are invalid
it's still worthwhile to use the index.
Ummm ... perhaps I glossed over a bit of inference. If only a small 
proportion of the index contains invalid row references, then (in the 
absence of specific biases otherwise) arbitrary queries using that index 
will average the same proportion of invalid row references. And agreed, 
it would still be worthwhile to use the index in that case. Your analyze 
stats would be a bit queered, though.

Since you're using an index at all, the planner must be expecting a restricted
set of rows to make it up through to the root. If there is any  filter criteria
against the values from the index rows, you won't even have to check rows for
tuple visibility, that don't pass that filter.
It's an interesting idea though. But I can't think of many queries where it
would be interesting. The query would still have to visit every page
containing a record used in the final result. So the only time this would be a
significant win is if you're applying very selective restrictions to columns
that were in the index but weren't able to put in the index condition. 

This seems like a pretty rare situation; usually the reason you put columns in
an index definition is because it is going to be useful for index conditions--
especially if it's a particularly selective column.
Ummm ... two situations where filters on index columns do not fit the 
standard index probe are:

- filtering by restrictive join. Whether the index is the source or 
target of restriction, you get a better choice of join operators/orders.
For example, if the index is the restrictor, you may be able to build a 
hash table of (filtered) index rows, where building a hash table from a 
heap scan would be a bad choice.

- filtering of non-root index fields, or filtering with inequalities. 
Normally, the planner will not bother with the index for these, and may 
do a serial scan of the table. This can be done with a serial scan of 
the index, with possible optimizations like Oracle's "skip scan".

Furthermore, what 'covering' indexes buy you is, they have all the data 
you need for the query results, whether you apply predicates to them all 
or not.
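
(For what it's worth, a sketch of the covering-index idea in later PostgreSQL 
syntax; INCLUDE columns and index-only scans arrived long after this thread, 
and the names are hypothetical:)

    CREATE INDEX orders_cust_idx ON orders (customer_id) INCLUDE (amount);

    -- once the visibility map shows the heap pages all-visible, this can run
    -- as an Index Only Scan, answering from the index alone:
    EXPLAIN SELECT customer_id, sum(amount) FROM orders GROUP BY customer_id;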

At another level, people are talking about decomposition storage models 
for data in disk pages, versus n-ary storage models. That's more or less 
a fancy way of saying: organize data by columns instead of rows. This 
storage model pays you back in CPU cycles on most computers with L1/L2 
cache splits. At such point as PG might consider moving to that, then 
the row validity columns would be grouped together in a page, and the 
verification of index rows would be significantly faster: only a small 
portion of a large page need be read and pushed through the CPU.

[For more on DSM vs NSM, google: NSM n-ary DSM ]


Re: [HACKERS] Indexed views?

2004-09-10 Thread Mischa Sandberg
Greg Stark wrote:
Doug McNaught <[EMAIL PROTECTED]> writes:

Short answer: MVCC tuple visibility status isn't (and can't be) stored
in the index.  

Well the "can't" part is false or at least unproven. From prior discussion the
only thing that would be technically challenging would be avoiding deadlocks.
Rather than yank the MVCC visibility around, how about a (relatively 
small) change to the query plan ...

I take it that it is a very reasonable assumption that only a small 
proportion of index records are actually invalid (else Yurk why use the 
index?). In that case, how about tacking the heap table row ptr onto 
result tuples, and letting them percolate up through the tree?

Since you're using an index at all, the planner must be expecting a 
restricted set of rows to make it up through to the root. If there is 
any  filter criteria against the values from the index rows, you won't 
even have to check rows for tuple visibility, that don't pass that filter.

As soon as you rise to a point where (index) rows will either be 
multiplied (a join) or combined (a group-by/distinct), you can validate 
them against the heap file in relatively large batches, with a hash 
caching of which ones have been checked for visibility.

Just a thought.


Re: [HACKERS] Installing PostgreSQL in a Unix Platform

2004-08-24 Thread Mischa Sandberg
Ummm ... time to get a 'Unix in 21 days' book,
because what you're relating indicates you are battling with
some beginner basics:
Try:
$ cd /usr/local

or:
$ cd /usr
$ cd local
not:
$ cd /usr
$ cd /local
And you can't execute a program in the current directory
(like DOS) unless you explicitly have '.' in your $PATH.
It would have executed if you'd entered:
$ ./pg_ctl
... but I STRONGLY suggest that, until you understand
why, you do not continue with what you're doing, but
rather, learn the Unix basics first.
Eyinagho Newton wrote:
Dear Drake,
Thanks for your response and suggestions on what i
should do to solve the above problem.
I have tried it and here are the results i had:
doing a cd  /usr from the command line took me into
usr directory. However when i tried to do another 
cd  /local from the above resulting command line, i
got the following error:

bash: cd: /local: no such file or directory.
I later tried to change directory again to 'bin' and
therein i found 'pg_ctl' . However, when i tried to
click on it, i got a display message with the words
'couldn't find the program "pg_ctl" . Again, i tried
to enter shell command as a superuser, and through it
executed the following shell command cd /usr and then
cd /bin. When in the bin directory, i did an 'ls'
command but couldn't find pg_ctl. I suspect that it
truely wasn't installed.
What should i do next please?
Cheers,
Newton Eyinagho


--- "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:

Hello,
If you are compiling from source then the postgresql
startup script does 
not get installed.
You can use pg_ctl in /usr/local/pgsql/bin/pg_ctl to
start postgresql. 
Alternatively you
can install the startup script from the contrib
directory startscripts/linux

Sincerely,
Joshua D. Drake
Eyinagho Newton wrote:

It has been impossible to install PostgreSQL, an
open
source software, from my SUSE 8.0 . Although it
tells
you its been installed, each time i try to start it
using SysV-Init Editor, i get the following
message:
starting/etc/init.d/postgresql""
starting postgreSQL
/etc/init.d/postgresql.checkproc:command not found
done
What exactly am i doing wrong? Or better still,
what
haven't i done yet? Please help me
Newton Eyinagho

	
		



--
Command Prompt, Inc., home of Mammoth PostgreSQL -
S/ODBC and S/JDBC
Postgresql support, programming shared hosting and
dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] -
http://www.commandprompt.com
PostgreSQL Replicator -- production quality
replication for PostgreSQL






[HACKERS] PG replic refs

2004-08-24 Thread Mischa Sandberg
http://www.cs.mcgill.ca/~kemme/papers/vldb00.html
ARC buffer replacement policy supersedes LRU.