Re: [GENERAL] postgres generates too much processes per minute

2008-06-22 Thread Josh Tolley
On Thu, Jun 19, 2008 at 8:17 AM, Bill Moran
<[EMAIL PROTECTED]> wrote:
> In response to "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>:
>
>> Hello,
>>
>> i have the problem that postgres is starting and stopping several (up
>> to 4) processes per minute, so that the error log in Windows is
>> filling up, with more than 14 entries every minute.
>>
>> Does someone know how to reduce the starting and stopping of so many
>> processes? Is there a variable or something?
>
> A process is started for every connection and ends when the connection
> is closed.  If you're opening/closing connections frequently, you'll
> see this.  It's by design.

You might consider using a connection pooler as well. The point of
such a tool is to allow applications requiring lots of connections to
share a smaller set of real connections to the database by proxying
them. Look at pgBouncer and pgPool as example connection pooling
applications.
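
For illustration, a minimal pgBouncer setup might look something like
the following; the database name, addresses, and pool sizes here are
hypothetical, so treat it as a sketch rather than a recommended
configuration:

; pgbouncer.ini (excerpt)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
pool_mode = session
max_client_conn = 200
default_pool_size = 20

The application connects to port 6432 instead of 5432, and pgBouncer
multiplexes those client connections onto at most 20 real server
connections.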

- Josh / eggyknap



Re: [GENERAL] Server not listening

2008-05-12 Thread Josh Tolley
On Mon, May 12, 2008 at 4:53 PM, D Galen <[EMAIL PROTECTED]> wrote:
> If this isn't the right place to post this, please advise.
>
>  I've spent a week trying to get PostgreSQL 8.3 to install correctly on
> WIN2K.  The server will load & I see the server processes loaded, but none
> of them have any open ports.  I keep getting the message that the server
> isn't listening.  The server is set up to listen on the default port 5432
> on localhost but doesn't appear to be opening the port when it loads.
>
>Any help would be welcome.
> Thanks,
>
> Dennis

Well, easy ones first, I guess: what does postgresql.conf look like,
specifically "listen_addresses" and "port"; what are their values and
are they commented out?
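
For reference, the relevant postgresql.conf lines usually look like
this; the values shown are the shipped defaults, not necessarily
what's on the affected machine:

# postgresql.conf (excerpt)
listen_addresses = 'localhost'  # which addresses to listen on;
                                # 'localhost' is the default even when
                                # the line is commented out
port = 5432                     # TCP port the server listens on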

- Josh / eggyknap



[GENERAL] Problem returning strings with pgsql 8.3.x

2008-05-10 Thread Josh Tolley
While developing PL/LOLCODE, I've found something wrong with returning
strings from LOLCODE functions using 8.3.0 or greater. Using 8.4beta
from a few days ago, for instance, a function that should return "test
string" returns
"\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F" in
pgsql (sometimes the number of \x7F characters varies). In 8.2.4 it
works fine.

Here's the code involved, from pl_lolcode_call_handler, the call
handler function for PL/LOLCODE. First, the bit that finds the
FmgrInfo structure and typioparam for the result type:

procTup = SearchSysCache(PROCOID,
                         ObjectIdGetDatum(fcinfo->flinfo->fn_oid),
                         0, 0, 0);
if (!HeapTupleIsValid(procTup))
    elog(ERROR, "Cache lookup failed for procedure %u",
         fcinfo->flinfo->fn_oid);
procStruct = (Form_pg_proc) GETSTRUCT(procTup);

typeTup = SearchSysCache(TYPEOID,
                         ObjectIdGetDatum(procStruct->prorettype),
                         0, 0, 0);
if (!HeapTupleIsValid(typeTup))
    elog(ERROR, "Cache lookup failed for type %u",
         procStruct->prorettype);
typeStruct = (Form_pg_type) GETSTRUCT(typeTup);

resultTypeIOParam = getTypeIOParam(typeTup);
fmgr_info_cxt(typeStruct->typinput, &flinfo,
              TopMemoryContext); /* CurTransactionContext); */
ReleaseSysCache(typeTup);

Here's the code that converts the return value into a Datum later on
in the function:

if (returnTypeOID != VOIDOID) {
    if (returnVal != NULL) {
        if (returnVal->type == ident_NOOB)
            fcinfo->isnull = true;
        else {
            SPI_push();
            if (returnTypeOID == BOOLOID)
                retval = InputFunctionCall(&flinfo,
                                           lolVarGetTroof(returnVal) == lolWIN ?
                                               "TRUE" : "FALSE",
                                           resultTypeIOParam, -1);
            else {
                /* elog(NOTICE, lolVarGetString(returnVal, true)); */
                retval = InputFunctionCall(&flinfo,
                                           lolVarGetString(returnVal, true),
                                           resultTypeIOParam, -1);
            }
            SPI_pop();
        }
    }
    else {
        fcinfo->isnull = true;
    }
}

SPI_finish();
/* elog(NOTICE, "PL/LOLCODE ending"); */

return retval;

returnVal is an instance of the struct PL/LOLCODE uses to store its
variables. The key line in this case is the one after the
commented-out call to elog. retval is a Datum type. lolVarGetString()
returns the string value the returnVal struct represents -- I'm
certain of that thanks to gdb and other testing. All other data types
PL/LOLCODE knows about internally seem to return just fine. I'm fairly
certain I'm screwing up memory somewhere, but I can't see what I've
done wrong. (It may be relevant that assert-enabled builds overwrite
freed memory with 0x7F bytes, which would suggest the string is being
read after the memory backing it has been freed.)

I'm glad to provide further details, but those included above are all
the ones I thought were relevant. Thanks in advance for any help you
can provide.

- Josh / eggyknap

Note: The -hackers list seemed like the place for this post, but its
list description gives instructions to try another list first, hence
the post here.



Re: [GENERAL] Using PL/R for predictive analysis of data.

2008-03-17 Thread Josh Tolley
On Mon, Mar 17, 2008 at 2:27 AM, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:
> Hi Sam,
>
>  Thankyou for the suggestions. They make perfect sense to me. I
>  appreciate your time and input. The lack of optimiser usage was
>  something that I had not considered, and I thank you for making me
>  aware of it.
>
>  Cheers
>
>  The Frog

On the subject of the planner and optimizer, as of 8.3 (I think it's
new to 8.3...) you can tell the planner somewhat about how it might
expect your function to behave. See
http://www.postgresql.org/docs/8.3/interactive/sql-createfunction.html
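
For instance, the COST and ROWS clauses (new in 8.3) give the planner
a rough idea of a function's per-call expense and, for set-returning
functions, its result size. The function below is purely illustrative:

CREATE FUNCTION expensive_func(integer) RETURNS SETOF integer AS $$
    SELECT * FROM generate_series(1, $1);
$$ LANGUAGE sql
    STABLE      -- the volatility category also informs the planner
    COST 1000   -- estimated per-call cost, in cpu_operator_cost units
    ROWS 500;   -- estimated number of rows returned per call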

- Josh/eggyknap



[GENERAL] Plotting with PL/R on *NIX - A HOWTO

2007-12-28 Thread Josh Tolley
People tend to like the idea of PL/R, because R draws some awfully
nice (and frequently useful) eye candy, and it sounds cool to be able
to make such things from inside PostgreSQL. In my (probably very
limited) experience those people often find that it's harder than it
looks to make these plots work, and if Google and my ability to come
up with applicable search terms are any indication, instructions are
not readily available. Hence this HOWTO. I hope I'll be forgiven for
the cross post.

This HOWTO covers two topics: "How to get PL/R to plot something other
than PDFs", and "How to make a function return the image data".

HOW TO GET PL/R TO PLOT SOMETHING OTHER THAN PDFS

The PL/R initiate will often try to create a plot as follows:
CREATE FUNCTION plr_plot_simple() RETURNS VOID AS $$
png('test.png')
plot(c(1:10))
dev.off()
$$ LANGUAGE plr;

Simple enough, but when called, this happens:

eggyknap=# select plr_plot_simple();
ERROR:  R interpreter expression evaluation error
DETAIL:  Error in X11(paste("png::", filename, sep = ""), width,
height, pointsize,  :
  unable to start device PNG
CONTEXT:  In PL/R function plr_plot_simple

This means R is trying to connect to an X server, and can't. R
requires an X server to handle most plot devices, and PostgreSQL
typically doesn't have a DISPLAY environment variable set to tell it
where to find such a server. R's built-in pdf() device is a notable
exception, but the PDFs R will create when asked to plot anything
particularly complex are very large, and slow to render, making them
an unattractive option.

The answer is to tell PostgreSQL where to find an X server, and here
there are two alternatives: connect to an existing X server, or make a
new one. The casual desktop experimenter will probably have an X
server running on the same machine as PostgreSQL, and can simply set
the DISPLAY variable PostgreSQL uses to point to this existing server.
Production PostgreSQL servers run X servers less commonly, so an
alternative is to use Xvfb, a "virtual" X server designed for just
such an occasion. This is simple enough:

$ Xvfb :5 -screen 0 1024x768x24

This will start Xvfb as server 5, defining screen 0 on that server as
having a resolution of 1024x768 with 24-bit color depth. Users may
also need to configure Xvfb's authorization properly, using the -ac
option (allow all hosts to connect to this server), or the -auth
<file> option, where <file> is a file that contains a list of hosts
allowed to connect to this Xvfb server. If this is running on the
PostgreSQL server itself, most likely this file just contains
"localhost" on one line.

The PostgreSQL server will have to know where to connect to this
server; for the Xvfb command line given above and assuming Xvfb runs
on the same server as PostgreSQL, the DISPLAY variable is ":5.0". Most
of the time, a script in /etc/init.d calls pg_ctl to start the
database when the system boots, and the DISPLAY variable can be set
there. Users of Debian packages for PostgreSQL may find that the
packaging abstracts pg_ctl far enough away that it's difficult to
tell in what script the variable should be set; as it turns out the
packagers have created a new file for the sole purpose of managing
environment variables. On my box it's called
/etc/postgresql/8.2/main/environment, and can just contain the
following line:

DISPLAY = ":5.0"

Note that since PL/R requires a properly set R_HOME environment
variable, PL/R users may already have had to deal with setting
environment variables inside of PostgreSQL. In fact, R provides a nice
function to examine the server's environment variables, called
plr_environ().

eggyknap=# select * from plr_environ() where name = 'DISPLAY';
  name   | value
---------+-------
 DISPLAY | :5.0
(1 row)

With that environment variable set, the original function works:

eggyknap=# select plr_plot_simple();
 plr_plot_simple
-----------------

(1 row)

But where's the image? The function put it in a file called test.png,
and it turns out that file is in the server's data directory by
default. If we want it somewhere else, we need to code the png() call
accordingly. Note that the PL/R process runs as the same user as the
PostgreSQL server; that user must have proper permissions to create
the specified file in the specified location, or it won't work.
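
For example, an absolute path (hypothetical, of course) keeps the
output out of the data directory:

png('/tmp/test.png')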

HOW TO MAKE A FUNCTION RETURN THE IMAGE DATA

Telling PostgreSQL to make exciting graphs is all well and good, but
my end goal is to have some client somewhere (for instance, a web
application) display the graphs. It's possible to define a function
like plr_plot_simple() above, and use pg_read_file() to read the image
from the filesystem, but only superusers can call pg_read_file(), so
the client would have to connect as a superuser, which is a deal
breaker for those web applications interested in staying decently
secured. Another option is to make the PL/R function return the binary
data. WARNING: The method shown here is fairly convoluted; I'm
unconvinced that there exists no simpler way.
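
One conceptually simple shape for such a function -- sketched here on
the assumption that PL/R will map an R raw vector onto a bytea return
value -- is to plot into a temporary file and read the bytes back:

CREATE OR REPLACE FUNCTION plr_plot_bytes() RETURNS bytea AS $$
  fn <- tempfile()
  png(fn)
  plot(c(1:10))
  dev.off()
  # read the finished image back in as raw bytes
  bytes <- readBin(fn, what = 'raw', n = file.info(fn)$size)
  unlink(fn)
  bytes
$$ LANGUAGE plr;

A client can then select plr_plot_bytes() over an ordinary,
non-superuser connection and receive the PNG data directly.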

Re: [GENERAL] COPY ... FROM and index usage

2007-11-04 Thread Josh Tolley
On 11/4/07, Reg Me Please <[EMAIL PROTECTED]> wrote:
> Hi all.
>
> I'd like to know whether the indexes on a table are updated or not during
> a "COPY ... FROM" request.
>
> That is, should I drop all indexes during a "COPY ... FROM" in order to gain
> the maximum speed to load data?
>
> Thanks.

Although questions of "which is faster" often depend very heavily on
the data involved, the database schema, the hardware, etc., typically
people find it best to drop all indexes during a large import and
recreate them afterward.
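
In practice that usually looks something like the following; the
table, column, index, and file names are all hypothetical:

DROP INDEX mytable_name_idx;
COPY mytable FROM '/path/to/data.csv' WITH CSV;
CREATE INDEX mytable_name_idx ON mytable (name);
ANALYZE mytable;  -- refresh planner statistics after the bulk load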

- Josh/eggyknap



Re: [GENERAL] PostgreSQL and AutoCad

2007-10-25 Thread Josh Tolley
On 10/24/07, Bob Pawley <[EMAIL PROTECTED]> wrote:
> Is there any way of converting text from an AutoCad (.dwg or .dxf) file into
> a PostgreSQL  Database??
>
> Bob Pawley

I know nothing of AutoCad, but your message has been sitting for a
while without response, so I'll throw out the suggestion that you
probably want AutoCad to export the text to some more common format
(like an ASCII or UTF-8 file or some such) and import that.

-Josh/eggyknap



Re: [GENERAL] Partitioned tables, rules, triggers

2007-10-08 Thread Josh Tolley
On 10/6/07, Goboxe <[EMAIL PROTECTED]> wrote:
> Josh,
>
> Thanks for sharing a very good info on partitioning.
>

Don't thank me -- this comes from Robert Treat. I'm just the messenger :)

-Josh/eggyknap



Re: [GENERAL] Partitioned tables, rules, triggers

2007-10-05 Thread Josh Tolley
On 10/3/07, Goboxe <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have two questions on the above:
>
> 1. I found in some postings recommended to use triggers instead of
> rules. Is this documented somewhere?

A howto is available at
http://images.omniti.net/omniti.com/talks/partitions-public.pdf

- Josh/eggyknap



Re: [GENERAL] Design Question (Time Series Data)

2007-10-05 Thread Josh Tolley
On 10/4/07, Ted Byers <[EMAIL PROTECTED]> wrote:
> --- Michael Glaesemann <[EMAIL PROTECTED]> wrote:
>
> >
> > On Oct 4, 2007, at 9:30 , Ted Byers wrote:
> >
> > > I do not know if PostgreSQL, or any other RDBMS,
> > > includes the ability to call on software such as
> > "R"
> >
> > See PL/R:
> >
> > http://www.joeconway.com/plr/
> >
> Thanks.  Good to know.

See also RdbiPgSQL
(http://bioconductor.org/packages/2.0/bioc/html/RdbiPgSQL.html)

PL/R lets you write R functions as procedural functions you can call
from pgsql (e.g. select my_r_function(myfield) from mytable). RdbiPgSQL
creates R functions you can use to query pgsql from within R.

-Josh/eggyknap



Re: [GENERAL] Feature Request - Defining default table space for Indexes in Conf file

2007-10-03 Thread Josh Tolley
On 10/1/07, S Sharma <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> The default table space defined in db conf file is used for all database
> tables as well as indexes. So putting the indexes on another table space
> requires manually dropping and re-creating indexes.
> It would be nice to have a feature to define a default table space for
> indexes in db conf file and all indexed are created in that table space.
> This would allow creating a good database architecture to avoid disc
> contention easily.
>
> Thanks
> Data_arch

Although the most basic optimization suggested when using tablespaces
is always "Put indexes on one and data on another to avoid disk
contention", I doubt that the ideal optimization for many workloads,
which means sticking such a thing in a config file might not be such a
good idea. In other words, a DBA probably ought to think harder about
optimizing his/her use of tablespaces than just "I'll put indexes on
this one and data on another".  See
http://www.depesz.com/index.php/2007/09/30/finding-optimum-tables-placement-in-2-tablespace-situation/
and http://people.planetpostgresql.org/xzilla/ for two recent blog
posts on the subject. But now I'll be quiet, because I have no
evidence to prove any of the above :)

 - Josh



Re: [GENERAL] Open Source Application Server (PostgreSQL Perspective)

2007-09-02 Thread Josh Tolley
On 9/1/07, john_sm <[EMAIL PROTECTED]> wrote:
>
> Hey Guys, can you suggest any Open Source Application Server for SOA
> deployments.  Also, wondering, if our needs are somewhat lightweight, is
> there some lightweight open source Application Server you could Suggest.

What language are your applications written in? What services do you
expect your app server to provide?

- Josh/eggyknap



Re: [GENERAL] Obtaining random rows from a result set

2007-08-31 Thread Josh Tolley
On 8/31/07, Alban Hertroys <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I've recently been busy improving a query that yields a fixed number of
> random records matching certain conditions. I have tried all the usual
> approaches, and although they do work, they're all limited in some way
> and don't translate really well to what you "want". They're kludges, IMHO.
>
> The methods I've tried are explained quite well on
> http://people.planetpostgresql.org/greg/index.php?/archives/40-Getting-random-rows-from-a-database-table.html
>
> All these methods involve calculating a random number for every record
> in the result set at some point in time, which is really not what I'm
> trying to model. I think the database should provide some means to get
> those records, so...
>
> Dear Santa,
>
> I'd like my database to have functionality analogue to how LIMIT works,
> but for other - non-sequential - algorithms.
>
> I was thinking along the lines of:
>
> SELECT *
>   FROM table
>  WHERE condition = true
>  RANDOM 5;
>
> Which would (up to) return 5 random rows from the result set, just as
> LIMIT 5 returns (up to) the first 5 records in the result set.
>
>
> Or maybe even with a custom function, so that you could get non-linear
> distributions:
>
> SELECT *
>   FROM table
>  WHERE condition = true
>  LIMIT 5 USING my_func();
>
> Where my_func() could be a user definable function accepting a number
> that should be (an estimate of?) the number of results being returned so
> that it can provide pointers to which rows in the resultset will be
> returned from the query.
>
> Examples:
> * random(maxrows) would return random rows from the resultset.
> * median() would return the rows in the middle of the result set (this
> would require ordering to be meaningful).
>
> What do people think, is this feasible? Desirable? Necessary?
>
> If I'd have time I'd volunteer for at least looking into this, but I'm
> working on three projects simultaneously already. Alas...
>
> Regards,
> Alban Hertroys.
>
> --
> Alban Hertroys
> [EMAIL PROTECTED]
>
> magproductions b.v.
>
> T: ++31(0)534346874
> F: ++31(0)534346876
> M:
> I: www.magproductions.nl
> A: Postbus 416
>7500 AK Enschede
>
> // Integrate Your World //
>

It seems to me that anything that wants to return a random set of rows
will need to calculate a random number for all the rows it processes,
unless you change how the database scans rows in indexes or tables,
which if at all possible will probably make things *really* slow. If
it's a given that the database will always sequentially scan whatever
it is the query plan tells it to scan, you're pretty much stuck with
the rows in your result set being in the same order unless you start
picking random numbers.

One possible alternative not mentioned on the site you linked to is
as follows:

select [whatever] from [table] where random() < [some number between 0
and 1] limit [limit value]

That doesn't require assigning a random number for *every* row in the
table, nor does it require sorting everything. It does mean that
rows encountered earlier in the query processing have a higher
likelihood of being returned, and it also means that there's some
chance you won't actually get as many as [limit value] rows returned.

jtolley=# create table a (i integer);
CREATE TABLE
jtolley=# insert into a (i) select * from generate_series(1, 100);
INSERT 0 100
jtolley=# select * from a where random() < .1 limit 3;
 i
----
 22
 23
 25
(3 rows)

Hope this helps...

-Josh



Re: [GENERAL] URGENT: Whole DB down ("no space left on device")

2007-08-31 Thread Josh Tolley
On 8/31/07, Zoltan Boszormenyi <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:
> > I am getting this message when I start the DB:
> >
> >
> > psql: FATAL:  could not access status of transaction 0
> > DETAIL:  Could not write to file "pg_subtrans/01F8" at offset 221184:
> > No space left on device.
> >
> >
> > What is this about and how do I solve this? A "df -h" on my system shows 
> > this:
> >
> >
> > Filesystem            Type    Size  Used Avail Use% Mounted on
> > ...
> > /dev/sda2             ext3    9.9G  9.5G     0 100% /var
> >

In addition to what others have already said, when things calm down
you should consider implementing some sort of monitoring system that
is configured to start screaming before you run into problems like
this. At my place of work, we've set up Nagios to monitor the space
left on various partitions, and email us when a partition gets above
90% full.
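
As an illustration, the stock Nagios check_disk plugin can express
exactly that kind of policy; the path and thresholds below are just
examples:

# warn when less than 10% of /var is free, go critical below 5%
check_disk -w 10% -c 5% -p /var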

- eggyknap



Re: [GENERAL] Can this function be declared IMMUTABLE?

2007-08-29 Thread Josh Tolley
On 8/27/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have a question about whether I can safely declare a function IMMUTABLE.
>  Citing the documentation under "Function Volatility Categories" in the
> section on "Extending SQL":
> 
> It is generally unwise to select from database tables within an IMMUTABLE
> function at all, since the immutability will be broken if the table
> contents ever change.
> 
>
> Well, I am considering a function that does read from a table, but the
> table contents change extremely infrequently (the table is practically a
> list of constants).  Would it be safe to declare the function IMMUTABLE
> provided that the table itself is endowed with a trigger that will drop
> and recreate the function any time the table contents are modified?  In
> this way, it seems that the database would gain the performance benefit of
> an immutable function for the long stretches of time in between changes to
> the table.
>
> I apologize that I don't have any details -- it is still very early in the
> development of the database design, and I was just hoping to get a better
> understanding of whether an immutable function would safely offer any
> benefit in this scenario.
>

Lemme see if I can embarrass myself trying to answer something like
this. It seems like your function really ought to be declared STABLE,
because during a single transaction MVCC will make sure your function
sees the same values in its reference table each time you call it,
but between transactions the reference table might change, changing
the result of the function. The benefits of an IMMUTABLE function over
a STABLE one, as far as I know, are these:

1) The planner can take advantage of the fact that this function is
IMMUTABLE to evaluate it only once if its arguments are constant, and
keep this result throughout the life of the query plan. If you don't
cache the query plan (e.g. with PREPARE) this is identical to STABLE,
but if you PREPARE a query, for instance, involving a call to an
IMMUTABLE function with constant arguments, the system can evaluate
the function only once during planning, and never again, whereas for
STABLE you'd have to execute the function each time it was called. For
this to be a big win over STABLE, you have to both call your function
with constant arguments and cache the query plan somehow, such as by
having the query inside another pl/pgsql function or by using PREPARE.

2) You can use IMMUTABLE functions, but not STABLE ones, in
expression-based indexes. If you declared your function IMMUTABLE, you
could build an index on a bunch of data using an index expression
involving your function, have your referenced table change somehow,
and end up not being able to correctly use that index anymore.

There may well be other advantages of IMMUTABLE over STABLE that I
don't know about. Were we talking about data I was supposed to care
for, I'd make the function STABLE, not IMMUTABLE, because that's the
most appropriate for the function.
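
As a sketch of the difference -- the table, column, and function names
here are all made up:

-- reads a near-constant reference table, so STABLE is the honest label
CREATE FUNCTION lookup_rate(text) RETURNS numeric AS $$
    SELECT rate FROM reference_rates WHERE name = $1;
$$ LANGUAGE sql STABLE;

-- expression indexes demand IMMUTABLE; declaring the function
-- IMMUTABLE just to allow this would leave the index silently stale
-- the moment reference_rates changed
CREATE INDEX orders_rate_idx ON orders (lookup_rate(rate_name));
-- ERROR:  functions in index expression must be marked IMMUTABLE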

-Josh



Re: [GENERAL] Writing most code in Stored Procedures

2007-08-18 Thread Josh Tolley
On 8/18/07, Ron Johnson <[EMAIL PROTECTED]> wrote:
> Interesting.  Does PG have to initiate the Perl interpreter every
> time you call a Perl-written SP?

IIRC PostgreSQL should only load the perl interpreter once per session.

- Josh



Re: [GENERAL] Partitioning

2007-08-18 Thread Josh Tolley
On 8/18/07, Julio Cesar Sánchez González
<[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> Is it natural for the master table in a partitioned setup to contain data
> (http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html),
> or should it be empty?

I'm no partitioning expert, but I would say most of the time the
parent table is supposed to be empty. That said, partitioning schemes
need to be developed based on what data you have and what exactly
you're trying to do. If you're using the common example of keeping a
bunch of historical data where partitions represent a distinct chunk
of time (for instance, one partition for every month of data) you
probably want to keep the parent table empty. Every row has a date,
so each row naturally belongs to some child table, and it doesn't
make sense to put anything in the parent table. But I'm sure someone
can come up with a scheme where having data in the parent is also
useful.
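
A minimal sketch of that common monthly scheme, with all names
invented:

CREATE TABLE measurement (
    logdate date NOT NULL,
    value   integer
);

-- one child per month; the CHECK constraint lets constraint_exclusion
-- skip partitions that can't match a query
CREATE TABLE measurement_2007_08 (
    CHECK (logdate >= DATE '2007-08-01' AND logdate < DATE '2007-09-01')
) INHERITS (measurement);

-- rows are routed to the children by a rule or trigger, so the parent
-- itself never accumulates data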

- Josh



Re: [GENERAL] Writing most code in Stored Procedures

2007-08-15 Thread Josh Tolley
On 8/15/07, Rohit <[EMAIL PROTECTED]> wrote:
> I have few queries regarding the use of Stored Procedures, Functions
> and Triggers in an RDBMS.
>
> (1) When to use Stored Procedure? Writing an INSERT query in a Stored
> Procedure is better or firing it from the application level?
>
> (2) Can a Trigger call a Stored Procedure?
>
> (3) What type of code must reside in application and what type in
> RDBMS?
>
> (4) Is it faster to work at application level or at the database level?

One of the major advantages of stored procedures over application code
is that stored procedures can get to the data much more quickly than
an application can, in general. An application needs to talk to
PostgreSQL through some sort of driver (JDBC, libpq, etc.) and
typically data need to traverse a network to get from the database
machine to the application server. Stored procedures, on the other
hand, don't have the overhead of either the network or the driver.
However, stored procedures can be more difficult to debug, profile,
etc., so they might not be the best for really complex logic. They
tend to be really useful if you're doing something that requires lots
and lots of queries to the database, and don't need anything else
(data from other sources, user interaction, etc.), because that takes
greatest advantage of their quick connection to the data. Richard
Huxton's point that stored procedures are typically best for data
integrity types of functions, whereas business logic should often be
in application code is an excellent rule of thumb.

-Josh



Re: [GENERAL] language interface in postgresql

2007-08-15 Thread Josh Tolley
On 8/15/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> Trevor Talbot wrote:
> > On 8/14/07, Jasbinder Singh Bali <[EMAIL PROTECTED]> wrote:
> >
> >> Let me fine tune my question here. What I mean to say is the way we can
> >> write stored procedures in C, perl etc in Postgres specifying the language
> >> parameter at the end of stored procedure, compared to that, in SQL Server
> >> 2000 I've seen SP writing in pure SQL only.
> >> Can you write Stored Procedures in SQL Server 2000 or Oracle in different
> >> programing languages as well ?
> >
> > AFAIK SQL Server 2000 only has a C interface as the other option; CLR
> > hosting was added in SQL Server 2005.  Because the CLR is a virtual
> > machine that runs compiled bytecode, and compilers for all of the
> > available languages are not necessarily available at runtime, it
> > doesn't make sense to specify such code in source form.  The process
> > is more like creating a function in C in PostgreSQL (compile and load
> > a shared library).  Details here, if you're curious:
> > http://msdn2.microsoft.com/en-us/library/ms345136.aspx
> >
> > I don't know what Oracle supports.
>
> I believe Oracle support Java in the same way MSSQL supports .net, give
> or take.

I don't know specifics of what exactly you can do with it nor how
exactly you go about it, but I know at least Oracle 10g supports
extension in Java. We had grand plans to improve some sort of
processing by writing a comparison function in Java for one of the Oracle
databases I've been cursed to associate with. I don't know of any
other languages supported by Oracle for stored procedures and the
like, though it does have an ECPG-like system for C, C++, COBOL,
FORTRAN, and PL/1 of all things.

- Josh



Re: [GENERAL] query help

2007-08-14 Thread Josh Tolley
On 8/14/07, Kirk Wythers <[EMAIL PROTECTED]> wrote:
>
> I need some help with rewriting a query. I have a fairly complicated query
> (for me anyway) that dumps daily climate data, filling in missing data with
> monthly averages (one line per day).
>
> I want to output monthly averages (one line per month). I am having a hard
> time wrapping my head around this. Particularly how to deal with the doy
> column (day of year). I have tried several approaches and my forehead is
> starting to get my keyboard bloody.

I think this came up on IRC today, so perhaps this is only for the
archives' sake, but you want to do something like this:

Assuming you have a table as follows:

CREATE TABLE climate_data (
   measurement_time  timestamp,
   measurement_value  integer);

...and you insert data into it regularly, you can get the average
measurement over a period of time with date_trunc(), which will
truncate a date or timestamp value to match whatever precision you
specify. For example, see the following:

eggyknap=# select date_trunc('month', now());
       date_trunc
------------------------
 2007-08-01 00:00:00-06
(1 row)

Note: the -06 at the end means I'm in mountain time.

So if you want to get the average measurement over a month's time, you
need to do something like this:

SELECT DATE_TRUNC('MONTH', measurement_time), AVG(measurement_value)
FROM climate_data GROUP BY DATE_TRUNC('MONTH', measurement_time);

This will chop all the measurement_time values down to the month the
measurement was taken in, put all measurements in groups based on the
resulting value, and take the average measurement_value from each
group.

- Josh



Re: [GENERAL] Intelligent Database in postgreSQL

2007-08-06 Thread Josh Tolley
On 8/6/07, Rodrigo Faccioli <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm a new postgreSQL user and I have a question: Is it possible to add
> intelligent-system functions like neural networks or fuzzy logic within
> the postgres DBMS?
>
> For example: I have a database and I want to execute a select instruction.
> But this select contains a column that uses a perceptron neural
> network to return a result, as follows:
>
> Select perceptron(vendas_cli)
> from client
> Where vendas_cli > 1000
>
> Thanks,

PostgreSQL allows you to add your own functions in any of several
different languages, including Perl and C. If you can code your
perceptron function in one of the languages PostgreSQL understands,
you can do exactly what you suggest. Refer to
http://www.postgresql.org/docs/8.2/static/xplang.html for starter
documentation.
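
Just to make the idea concrete, here is a toy single-input perceptron
as a PL/pgSQL function; the weight, bias, and step activation are
invented for illustration, not a real trained model:

CREATE FUNCTION perceptron(x numeric) RETURNS integer AS $$
DECLARE
    weight numeric := 0.002;  -- hypothetical trained weight
    bias   numeric := -2.0;   -- hypothetical bias term
BEGIN
    -- step activation: fire if the weighted input plus bias is positive
    IF weight * x + bias > 0 THEN
        RETURN 1;
    END IF;
    RETURN 0;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- usage, following the original example:
-- SELECT perceptron(vendas_cli) FROM client WHERE vendas_cli > 1000;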

- Josh



Re: [GENERAL] PG for DataWarehouse type Queries

2007-08-03 Thread Josh Tolley
On 8/3/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
> Can anyone shed some light on this? I just would like to know if queries
> for raw data (not aggregates) are expected to take a long time.
> Running times are between 30 minutes and 2 hours for large dataset pulls.
>
> Involves lots of joins on very large tables (min 1 millon rows each
> table, 300 columns per table)
>
> Joins are done in the form of Left joins (sometimes on the same tables,
> due to normalisation)
>
> Is 30min - 2hours too long or is this considered "normal"??

That depends entirely on your query, your dataset, the machine you're
using, etc. Your best bet is probably to post an EXPLAIN ANALYZE of
your query along with some description of what's going on and what
you're running this all on (pg version, machine specs, etc.) and see
if someone can come up with an acceptable way to make the query
faster.

- Josh



Re: [GENERAL] What do people like to monitor (or in other words, what might be nice in pgsnmpd)?

2007-08-02 Thread Josh Tolley
On 8/2/07, Gavin M. Roy <[EMAIL PROTECTED]> wrote:
> Are you contemplating providing access to data that's currently not stored
> in the pg_ catalog tables?  I currently monitor the statio data,
> transactions per second, and active/idle backends.  Things that I think
> would be useful would be average query execution time, longest execution
> time, etc.  Other pie in the sky ideas would include current level of total
> bloat in a database, total size on disk of a database broken down by tables,
> indexes, etc.
>
> Regards,
>
> Gavin

My own goal is to have pgsnmpd able, as much as possible, to fill the
same role as the set of scripts an arbitrary PostgreSQL DBA sets up on a
typical production server. That includes statistics tables and catalog
tables, but certainly isn't limited to just that. So doing things like
categorizing total sessions in interesting and useful ways (for
instance, # of idle connections, # of active connections, max
transaction length, etc.) are certainly within pgsnmpd's purview.

In short, all the suggestions you listed are useful, and provided the
framework allows us to get reasonably good values for them, worthy of
implementation in pgsnmpd. Thanks.

-Josh



[GENERAL] What do people like to monitor (or in other words, what might be nice in pgsnmpd)?

2007-08-01 Thread Josh Tolley
Work is beginning on pgsnmpd v 2.0, and I figured it would be a good
time to ask folks what they typically like to monitor, so we can make
sure pgsnmpd instruments it properly. The current version of pgsnmpd
supports something called RDBMS-MIB, which is a set of data designed
to be applicable to any relational database, so it doesn't get very
PostgreSQL-specific. The next version will augment that with
PGSQL-MIB, which we have yet to write.

PGSQL-MIB should contain data elements for, ideally, anything specific
to the database that someone could possibly want to monitor in a
generic PostgreSQL installation within reason. Things like CPU load,
available disk space, total system memory, etc. would not be included,
because they're not PostgreSQL specific, but things like CPU and
memory usage of individual PostgreSQL processes are very good
candidates for inclusion in PGSQL-MIB. Current plans have us including
SNMP representations of all the statistics tables as well as the
system catalogs, runtime information about PostgreSQL processes (such
as CPU and RAM usage), shared memory usage information, and
potentially mechanisms to easily include administrator-specified
queries and generate SNMP traps based on LISTEN/NOTIFY.

So please respond, if you feel so inclined, describing things you like
to monitor in your PostgreSQL instances as well as things you would
like to be able to easily monitor in a more ideal world. Many thanks,
and apologies for any breach of netiquette I may have committed in
posting to two lists simultaneously.

- Josh Tolley



Re: [GENERAL] Difference between PRIMARY KEY index and UNIQUE-NOT NULL index

2007-07-20 Thread Josh Tolley

On 7/20/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
>
> On Jul 20, 2007, at 17:54 , Vincenzo Romano wrote:
>
> > In an inner join involving a 16M+ rows table and a 100+ rows table
> > performances got drastically improved by 100+ times by replacing a
> > UNIQUE-NOT NULL index with a PRIMARY KEY on the very same columns in
> > the very same order. The query has not been modified.
>
> There should be no difference in query performance, AIUI.

If I read the documentation correctly, PRIMARY KEY is simply syntactic
sugar equivalent to UNIQUE + NOT NULL, the only difference being that
a PRIMARY KEY is reported as such to someone looking at the table
structure, which is more intuitive than seeing UNIQUE + NOT NULL.

> > In the older case, thanks to the EXPLAIN command, I saw that the join
> > was causing a sort on the index elements, while the primary key was
> > not.

Might it just be that the original UNIQUE + NOT NULL index was bloated
or otherwise degraded, and reindexing it would have resulted in the
same performance gain? That's just a guess.
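
If that guess is right, rebuilding the index should be enough to test
it; the index name here is hypothetical:

REINDEX INDEX my_unique_not_null_idx;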

-Josh



Re: [GENERAL] The speed problem of Varchar vs. Char

2007-07-08 Thread Josh Tolley

On 7/8/07, Crystal <[EMAIL PROTECTED]> wrote:

Hi All,

Our company need to save contact details into the PostgreSQL database. I
just begin to learn it, so I got many questions. I am not sure which
data type I should choose for website address, varchar or char. The
website address may be very long, and we also don't want to lose the
speed. Thus, the question is: if we have a large contact database, how
much slowdown or speed up will be expected if we choose variable length
rather than fixed length? Thanks forward.

Best Wishes,
Crystal




See http://www.postgresql.org/docs/8.2/interactive/datatype-character.html
(or whatever the corresponding page is for the version you're using).
Specifically, the following:

Tip:  There are no performance differences between these three types,
apart from the increased storage size when using the blank-padded
type. While character(n) has performance advantages in some other
database systems, it has no such advantages in PostgreSQL. In most
situations text or character varying should be used instead.

- Josh



Re: [GENERAL] ORDER BY with exception

2007-06-21 Thread Josh Tolley

On 6/21/07, brian <[EMAIL PROTECTED]> wrote:

I have a lookup table with a bunch of disciplines:

# SELECT id, name FROM discipline;
 id |        name
----+---------------------
  1 | writing
  2 | visual arts
  3 | music
  4 | dance
  5 | film and television
  6 | theatre
  7 | media arts
  8 | community
  9 | fine craft
 10 | other
(10 rows)

and a function that returns each discipline name along with the total
number of records in another table (showcase) that are related to each
discipline. Each showcase entry may have 0 or more items (showcase_item)
related to it, so ones that have no items are disregarded here. Also,
only showcases that have been accepted should be counted.

First, here's the working function:

CREATE FUNCTION getshowcasetotalsbydiscipline(OUT name text, OUT total
integer) RETURNS SETOF record
AS $$

DECLARE
  rec record;

BEGIN
  FOR rec IN
    EXECUTE 'SELECT id, name, 1 AS total FROM discipline'
  LOOP
    name := rec.name;

    SELECT INTO rec.total

      -- a showcase may be in the DB but not accepted by an admin
      SUM(CASE s.accepted WHEN TRUE THEN 1 ELSE 0 END)
      FROM showcase AS s
      WHERE s.id IN

        -- a showcase may exist with no items, so should be ignored
        (SELECT si.showcase_id FROM showcase_item AS si
          WHERE si.discipline_id = rec.id);

    -- If no showcase items have this discipline,
    -- give it a total of zero

    IF rec.total IS NULL THEN
      SELECT INTO total 0;
    ELSE
      total := rec.total;
    END IF;

    RETURN NEXT;
  END LOOP;

  RETURN;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

test=# SELECT * FROM getShowcaseTotalsByDiscipline();
        name         | total
---------------------+-------
 writing             |   130
 visual arts         |   252
 music               |   458
 dance               |   131
 film and television |   102
 theatre             |   271
 media arts          |    83
 community           |    20
 fine craft          |    78
 other               |    59
(10 rows)

Works fine, but i'd like to order the disciplines alphabetically
*except* have 'other' fall at the end. So, should i loop a second time,
after summing the totals, and keep the 'other' row aside, then add it to
the end?

(btw, the output of this function is cached until a new showcase is
accepted)

Or, should i re-order the disciplines alphabetically in the lookup
table, keeping 'other' to be last?

I could do the latter, although it would mean a fair bit of work because
the disciplines table relates to a bunch of other stuff, as well. Also,
there's always the chance that a new discipline will be added in the
future. I suppose i could write a trigger that bumped the 'other' id
above that of the new entry, then re-relate everything else in the DB
that's connected to the 'other' discipline. But that strikes me as kind
of a hack.

The third option is to re-order the resultset in the PHP script that
displays this. But that wasn't why i chose Postgres for this app ;-)

brian




It seems to me you could replace it all with one query, something like this:

SELECT discipline, COUNT(1) FROM showcase WHERE EXISTS (SELECT * FROM
showcase_item WHERE showcase_id = showcase.id LIMIT 1) GROUP BY
discipline ORDER BY (discipline = 'other'), discipline;

- Josh



Re: [GENERAL] finding items with 0 rels for a 0 to many relationship

2007-06-21 Thread Josh Tolley

On 6/21/07, danmcb <[EMAIL PROTECTED]> wrote:

Hi

I have two tables, say A and B,  that have a many-to-many
relationship, implemented in the usual way with a join table A_B.

How can I economically find all the rows in table A whose id's are not
in A_B at all (i.e. they have zero instances of B associated)?


Use a left join. For instance, say there are a.id and b.id columns,
which are the primary keys in A and B respectively. Also say A_B
contains columns aid and bid which reference a.id and b.id
respectively.


SELECT * FROM A LEFT JOIN A_B ON (A.ID = A_B.AID) WHERE A_B.BID IS NULL;


- Josh
