[GENERAL] optimizing postgres

2007-07-12 Thread lawpoop
Hello all -

I'm working on a Postgres project after coming from a MySQL background
(no flames, please :). We are importing fairly large XML datasets
(10-20 MB of XML files per 'project', currently 5 projects) into the
database for querying.

We are using PHP to create a web interface where users can create
projects and upload their files. We have a parsing routine that gets
the xml data into the database.

The parsing has turned out to be fairly resource-intensive: it takes
about 10-20 minutes for any project, and while it runs it really slows
down the site's response. I tested serving static webpages from Apache
and running endless loops in PHP, and those stayed fast; the choke
point seems to be doing any other query on Postgres while constructing
a PHP page during parsing.

As an example, the original designer specified separate tables for
each project. Since they were all the same structure, I suggested
combining them into a single table with a project_id column, but he
said it would take too long to query. I was suspicious, but I went
with his design anyway.

It turned out he was right for our current setup. When I needed to
empty a project's table to re-parse its data, a cascading delete could
take up to 10 minutes! I cut re-parsing time in half just by dropping
the table and creating a new one, which was fine, since the table
belonged to only one project anyway. But I hate to think how long a
delete, cascading to child tables, would have taken if all the project
data were in a single table.
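One thing worth checking here (an assumption on my part, since I haven't
posted the schema): Postgres indexes the referenced side of a foreign key
automatically, but not the referencing column, so every cascading delete
has to scan the child tables. A sketch with a hypothetical child table
named records:

```sql
-- Assumption: a child table 'records' references projects via a foreign
-- key on records.project_id. Without an index on that column, each
-- cascading delete does a sequential scan of records per parent row.
CREATE INDEX records_project_id_idx ON records (project_id);

-- With the index in place, a delete like this can use index scans on
-- the child table instead of full scans:
DELETE FROM projects WHERE project_id = 42;
```

If there are several child tables hanging off the project table, each of
their referencing columns would need its own index.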

Since I'm not an expert in Postgres database design, I'm assuming I've
done something sub-optimal. Are there some common techniques for
tuning postgres performance? Do we need beefier hardware?

Or is it a problem with how PHP or Apache pools connections to the
database?
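For reference, the knobs people usually turn first live in
postgresql.conf; the values below are illustrative guesses for a small
box, not recommendations tuned to this workload:

```
# postgresql.conf -- illustrative starting points, not tuned values
shared_buffers = 128MB          # shared cache; changing it needs a restart
work_mem = 16MB                 # per-sort/per-hash memory for each query
maintenance_work_mem = 128MB    # speeds up VACUUM, CREATE INDEX, FK checks
checkpoint_segments = 16        # fewer, larger checkpoints during bulk loads
effective_cache_size = 512MB    # planner hint: total cache the OS provides
```

Raising work_mem too far is risky, since it is allocated per sort/hash
per query, not per connection.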


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] table disk space usage query?

2007-06-29 Thread lawpoop
Hello all -

I was looking for a way to find out how much disk space each table is
using.

I stumbled upon this page ( 
http://www.ffnn.nl/pages/articles/linux/postgresql-tips-and-tricks.php
) which gave me a query to show the number of disk pages per object.
Given that a page is 8 KB by default, I added these calculated columns
to the query:

SELECT relname, reltuples, relpages,
relpages * 8 AS relpagesKB,
(relpages * 8 )/1024 AS relpagesMB
FROM pg_class ORDER BY relpages DESC ;

Is this correct?
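Two caveats I've since run into (worth verifying): relpages and
reltuples are only estimates, refreshed by VACUUM/ANALYZE rather than
kept live, and the 8 KB page size is a compile-time default. On 8.1 and
later, the size functions report actual on-disk sizes; a sketch:

```sql
-- pg_relation_size reports the main table file; pg_total_relation_size
-- (8.1+) also includes indexes and TOAST data. pg_size_pretty formats
-- the byte counts for human consumption.
SELECT relname,
       pg_size_pretty(pg_relation_size(oid))       AS table_size,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC;
```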




Re: [GENERAL] persistent db connections in PHP

2007-06-18 Thread lawpoop
This seems to be a problem with PHP, or at least with my setup.

I'm writing pages in basically the same way. Each page has an include
at the top that gets you a database session. The function, either
pg_connect() or mysql_connect(), is supposed to either create a new
connection, or return your existing one.

So after I have a connection, I can navigate to other pages, reload or
post to the current one, trigger the x_connect(), and get the session
I created earlier.

On my MySQL site, if I create temporary tables, I still have access to
them after passing through another mysql_connect() call. So it looks
like PHP is handing back the connection I created the temp tables on.

However, with this new Postgres site, I don't have access to my temp
tables after passing through another pg_connect(). So PHP is either
creating a new connection, or giving me a different session, not the
one in which I created my tables.

Steve




[GENERAL] persistent db connections in PHP

2007-06-16 Thread lawpoop
Hello all!

I'm working on a PHP site using Postgres as a back-end. I have an
include at the top of each page that runs the pg_connect function.

I'm implementing some temporary tables, which I understand are
destroyed automatically at the end of the session. It seems that when
I navigate to a new page, or reload my existing page, I lose my
temporary tables. Is this because I am creating a new session at the
beginning of each page with my pg_connect include?

Also, are temporary tables globally visible, or only within that
session? I mean, can one user see another user's temporary tables?
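As I understand the docs (worth verifying): temporary tables live in a
per-session schema, so they are visible only to the session that created
them, and two sessions can even use the same temp table name without
conflicting. A sketch:

```sql
-- Session A:
CREATE TEMPORARY TABLE scratch (id integer);
INSERT INTO scratch VALUES (1);

-- Session B, running at the same time, sees no 'scratch' table from
-- session A and may create its own independent one under the same name:
CREATE TEMPORARY TABLE scratch (id text);

-- Both copies are dropped automatically when their sessions end.
```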




[GENERAL] using subselects

2007-06-07 Thread lawpoop
Hello everyone -

I'm moving from MySQL to Postgres, and I'm trying to use subselects
properly.

I have three tables: projects, users, and user_projects. The
user_projects table creates a many-to-many relationship between users
and projects.

I'm creating a select list on a web form for adding a new user to a
project. I want to select users from the user table, who aren't
already listed in the join table under that project.

Here's my select:
SELECT * FROM users
WHERE user_id <> $current_user_id
AND user_id <> ( SELECT user_id FROM user_projects
                 WHERE project_id = $project_id );

This query returns no rows, even on projects that have no records in
the user_projects table!

I am certain that I am not the $current_user_id. If I run this query:
SELECT * FROM users
WHERE user_id <> $current_user_id

I get all the user records besides myself.


What am I doing wrong?
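For the record, my best guess at the cause (someone correct me if this
is wrong): `<>` compares against a single scalar value, so a multi-row
subquery raises an error, and an empty subquery makes the comparison
`user_id <> NULL`, which evaluates to NULL and filters out every row.
The set-membership forms behave the way I intended:

```sql
-- NOT IN tests set membership; note it also returns no rows if the
-- subquery can ever yield a NULL user_id.
SELECT * FROM users
WHERE user_id <> $current_user_id
  AND user_id NOT IN ( SELECT user_id FROM user_projects
                       WHERE project_id = $project_id );

-- NOT EXISTS is NULL-safe and usually plans at least as well:
SELECT * FROM users u
WHERE u.user_id <> $current_user_id
  AND NOT EXISTS ( SELECT 1 FROM user_projects up
                   WHERE up.project_id = $project_id
                     AND up.user_id = u.user_id );
```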




[GENERAL] collision in serial numbers after INSERT?

2007-06-01 Thread lawpoop
Hello all -

I'm working on a site with PHP and Postgres, coming from a MySQL
background.

I was looking for an equivalent to the mysql_insert_id() function, and
a site recommended this:

function postg_insert_id($tablename, $fieldname)
{
    global $connection_id;
    $result = pg_exec($connection_id,
        "SELECT last_value FROM {$tablename}_{$fieldname}_seq");
    $seq_array = pg_fetch_row($result, 0);
    return $seq_array[0];
}

It relies on pg's sequencing ability.

However, I wondered: in an environment with many concurrent inserts,
could I get back a serial number from someone else's insert rather
than the one *I* just did? If several inserts happen right after mine,
wouldn't last_value give me the id of the latest row, several inserts
past my own?

I don't think this would be a problem in our environment, but I am
just wondering.
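The worry seems justified for last_value, which is the last value the
sequence handed out to anyone. Two alternatives that are tied to the
current session or statement, sketched against a hypothetical widgets
table (the second form needs 8.2 or later):

```sql
-- currval() returns the value most recently generated by nextval()
-- IN THIS SESSION, so concurrent inserts by others cannot affect it:
INSERT INTO widgets (name) VALUES ('foo');
SELECT currval('widgets_widget_id_seq');

-- On 8.2 and later, RETURNING gets the id back with the insert itself,
-- avoiding the second round trip entirely:
INSERT INTO widgets (name) VALUES ('bar') RETURNING widget_id;
```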

