Re: [GENERAL] v8.3.4 metadata changes while users active

2012-04-05 Thread Martin Gregorie
On Thu, 2012-04-05 at 03:17 +, Gauthier, Dave wrote:
 H... I don't have root access :-(
 
In that case, ask your sysadmin to grant you sudo access to iptables or,
if he thinks that is excessive, to write a wrapper script that
enables/disables just that port and to give you sudo access to that script.
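
For illustration, a minimal, untested sketch of such a wrapper in Python
(the port number, chain and script name are my assumptions - adjust them
to suit the local firewall rules):

    #!/usr/bin/python
    # toggle_pg_port.py - block or unblock the PostgreSQL port, so the
    # sysadmin can grant sudo access to just this script. Untested sketch.
    import subprocess
    import sys

    PORT = "5432"    # assumed PostgreSQL port

    def toggle(action):
        # "off" appends (-A) a REJECT rule; "on" deletes (-D) it again
        flag = {"off": "-A", "on": "-D"}[action]
        subprocess.check_call(["iptables", flag, "INPUT",
                               "-p", "tcp", "--dport", PORT,
                               "-j", "REJECT"])

    if __name__ == "__main__":
        if len(sys.argv) != 2 or sys.argv[1] not in ("on", "off"):
            sys.exit("usage: toggle_pg_port.py on|off")
        toggle(sys.argv[1])

Your sysadmin would then grant sudo on just that one script.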


Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Versioned, chunked documents

2012-04-01 Thread Martin Gregorie
On Mon, 2012-04-02 at 00:38 +0200, Ivan Voras wrote:
 Hi,
 
 I have documents which are divided into chunks, so that the (ordered)
 concatenation of chunks makes the whole document. Each of the chunks may
 be edited separately, and past versions of the chunks need to be kept.
 
 The structure looks fairly simple:
 
 The first goal is to retrieve the latest version of the whole document,
 made from the latest versions of all chunks, but later the goal will
 also be to fetch the whole version at some point in time (i.e. with
 chunks created before a point in time).
 
 I did the first goal by creating two helper views:
 
 CREATE VIEW documents_chunks_last_version_chunk_ids AS
 SELECT documents_id, max(id), seq FROM documents_chunks GROUP BY
 documents_id, seq;
 
 CREATE VIEW documents_chunks_last_version_content AS
 SELECT documents_chunks.documents_id, content
 FROM documents_chunks
 JOIN documents_chunks_last_version_chunk_ids ON
 documents_chunks.id=documents_chunks_last_version_chunk_ids.max
 ORDER BY documents_chunks_last_version_chunk_ids.seq;
 
 There are indexes on the document_chunks fields seq and documents_id.
 
 Everything looked fine until I examined the output of EXPLAIN ANALYZE
 and saw this:
 
I'm not surprised. First guess: I'd use 
  id (FK of documents), seq, ctime 

as the primary key of documents_chunks, which would work for your initial
requirement but is far too simplistic to deal with the general
requirement of retrieving a specific document version. You'd probably
need something like:

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    ...
);

CREATE TABLE version (
    version_no SERIAL PRIMARY KEY,
    author VARCHAR(40),
    ctime TIMESTAMP
);

CREATE TABLE document_version (
    id INTEGER REFERENCES documents(id),
    version_no INTEGER REFERENCES version(version_no),
    PRIMARY KEY (id, version_no)
);

CREATE TABLE documents_chunks (
    id INTEGER,
    version_no INTEGER,
    seq INTEGER,
    content TEXT,
    PRIMARY KEY (id, version_no, seq),
    FOREIGN KEY (id, version_no) REFERENCES document_version (id, version_no)
);

Disclaimer: this is not syntax checked or tested. It may or may not
match your requirements, but since I haven't seen your ERD or the 3NF
you derived from it, I can't offer any more applicable advice.
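
For what it's worth, a point-in-time fetch against the tables above might
look something like this untested Python/psycopg2 sketch. It assumes every
version stores a complete set of chunks, which may or may not match your
intended design:

    # Untested sketch: rebuild a document as it stood at a given time.
    import psycopg2

    def document_as_of(conn, doc_id, cutoff):
        cur = conn.cursor()
        cur.execute("""
            SELECT c.content
            FROM documents_chunks c
            WHERE c.id = %s
              AND c.version_no = (SELECT max(dv.version_no)
                                  FROM document_version dv
                                  JOIN version v
                                    ON v.version_no = dv.version_no
                                  WHERE dv.id = c.id
                                    AND v.ctime <= %s)
            ORDER BY c.seq""", (doc_id, cutoff))
        # concatenate the chunks in order to form the whole document
        return "".join(row[0] for row in cur.fetchall())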
 

Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Versioned, chunked documents

2012-04-01 Thread Martin Gregorie
Correction interpolated - see below

On Mon, 2012-04-02 at 00:22 +0100, Martin Gregorie wrote:
 On Mon, 2012-04-02 at 00:38 +0200, Ivan Voras wrote:
  Hi,
  
  I have documents which are divided into chunks, so that the (ordered)
  concatenation of chunks makes the whole document. Each of the chunks may
  be edited separately, and past versions of the chunks need to be kept.
  
  The structure looks fairly simple:
  
  The first goal is to retrieve the latest version of the whole document,
  made from the latest versions of all chunks, but later the goal will
  also be to fetch the whole version at some point in time (i.e. with
  chunks created before a point in time).
  
  I did the first goal by creating two helper views:
  
  CREATE VIEW documents_chunks_last_version_chunk_ids AS
  SELECT documents_id, max(id), seq FROM documents_chunks GROUP BY
  documents_id, seq;
  
  CREATE VIEW documents_chunks_last_version_content AS
  SELECT documents_chunks.documents_id, content
  FROM documents_chunks
  JOIN documents_chunks_last_version_chunk_ids ON
  documents_chunks.id=documents_chunks_last_version_chunk_ids.max
  ORDER BY documents_chunks_last_version_chunk_ids.seq;
  
  There are indexes on the document_chunks fields seq and documents_id.
  
  Everything looked fine until I examined the output of EXPLAIN ANALYZE
  and saw this:
  
 I'm not surprised. First guess: I'd use 
   id (FK of documents), seq, ctime 
 
 as the primary key of documents_chunks, which would work for your initial
 requirement but is far too simplistic to deal with the general
 requirement of retrieving a specific document version. You'd probably
 need something like:
 
 CREATE TABLE documents (
     id SERIAL PRIMARY KEY,
     title TEXT NOT NULL,
     ...
 );
 
 CREATE TABLE version (
     version_no SERIAL PRIMARY KEY,
     author VARCHAR(40),
     ctime TIMESTAMP
 );
 
 CREATE TABLE document_version (
     id INTEGER REFERENCES documents(id),
     version_no INTEGER REFERENCES version(version_no),
     PRIMARY KEY (id, version_no)
 );
I used a version number in the key because a TIMESTAMP is much too
fine-grained unless you're planning to obtain its value before committing
all the changed documents_chunks rows affected by this editing session.

BTW, why use documents_chunks at all when a text field can hold megabytes,
especially if the chunks will be concatenated to form a complete document
which is then edited as a whole item before being split into chunks
and saved back to the database? If the chunks represent chapters or
other logical sections that are always edited separately, why not name
them to reflect this? 'documents_chunks' just sounds too arbitrary to me.

  
 CREATE TABLE documents_chunks (
     id INTEGER,
     version_no INTEGER,
     seq INTEGER,
     content TEXT,
     PRIMARY KEY (id, version_no, seq),
== should be 
     PRIMARY KEY (id, seq, version_no),   -- of course!
     FOREIGN KEY (id, version_no) REFERENCES document_version (id, version_no)
 );
 

 Disclaimer: this is not syntax checked or tested. It may or may not
 match your requirements, but since I haven't seen your ERD or the 3NF
 you derived from it, I can't offer any more applicable advice.
 

Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help in Parsing PG log usings CSV format

2012-03-22 Thread Martin Gregorie
On Thu, 2012-03-22 at 09:32 +, Arvind Singh wrote:
 Help needed in parsing PostgreSQL CSV Log
 Hello friends,
 I am working on a section of an application which needs to parse the CSV
 logs generated by the PostgreSQL server.
 - The logs are stored in C:\Program Files\PostgreSQL\9.0\data\pg_log
 - The server version is 9.0.4
 - The application is developed in C Sharp 
 * The basic purpose, after parsing the log, is to show its contents in a 
 DataGridView. 
 * There are other filter options, like viewing the log contents for a 
 particular range of time within a day.
  
 **However, the main problem is that the log format is not readable** 
  
 
 A sample log data line: 
  2012-03-21 11:59:20.640 IST,postgres,stock_apals,3276,localhost:1639,
  4f697540.ccc,10,idle,2012-03-21 11:59:20 IST,2/163,0,LOG,0,
  statement: SELECT version()exec_simple_query, .\src\backend\tcop\postgres.c:900,
  
 As you can see, the columns in the log are comma separated, but the 
 individual values are not quote-enclosed.
  
 For instance the 1st, 4th, 6th ... columns

  
That's fairly standard. A good CSV parser only requires a field to be
quoted if it contains commas or quotes. In the latter case the internal
quotes should be doubled, i.e. the three fields in the following:

  unquoted field,"contains commas, etc.","Fred said ""Cool!""."

should be handled correctly by a decent CSV parser.
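
As a quick illustration (in Python rather than C#, but any decent CSV
library follows the same quoting rules):

    import csv

    line = 'unquoted field,"contains commas, etc.","Fred said ""Cool!""."'
    # csv.reader accepts any iterable of lines; a one-element list will do
    fields = next(csv.reader([line]))
    print(fields)
    # ['unquoted field', 'contains commas, etc.', 'Fred said "Cool!".']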


Martin




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] copy in date string 00-00-00 00:00:00

2012-03-15 Thread Martin Gregorie
On Wed, 2012-03-14 at 21:52 -0700, Mark Phillips wrote:
 I am not familiar with sed, except for some trivial bits I nicked off
 the web. Enough to know it works, and to be dangerous. Nonetheless,
 using SED may be the way to go as there are two tables that contain a
 bit over 3,000,000 rows each. 
 
You should also consider using awk/gawk with the field separator (FS
variable) set to match the one in your input (','). The advantages in
this case are that it can be made to work on specific fields in the CSV
file and not look at the rest, something like:

BEGIN { FS = OFS = "," }                 # set input and output field sep to comma
$5 == "00-00-00 00:00:00" { $5 = "" }    # empty field 5 if it matches
{ print }                                # output all lines
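
(Save the above as, say, fix.awk and run it with awk -f fix.awk old.csv
>new.csv. Note that assigning to $5 makes awk rebuild the line using OFS,
which is why OFS is set to a comma as well.)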

Disclaimer: this is untested example code


Martin






-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to find compiled-in default port number?

2012-03-13 Thread Martin Gregorie
On Tue, 2012-03-13 at 11:16 +0200, Виктор Егоров wrote:
 Greetings.
 
 Is there a way to find out the compiled-in port number?
 
Two ways, with Postgres running:
- Scan the server's ports with nmap.
- As root on the server, run lsof | less and look at the 
  Postgres process(es).

Both are fast, so do both. If lsof shows the PostgreSQL port but nmap
doesn't, then you'll know it's being blocked by a firewall.


Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Error installing postgresq91-python package

2012-03-12 Thread Martin Gregorie
On Sat, 2012-03-10 at 14:08 -0800, PgSQL wrote:
 If the OP can identify and download the relevant .rpm file
 
Have you checked the CentOS bugzilla to see if the yum failure has been
reported? You should raise a bug report if it isn't there: if nobody
reports a bug, it's unlikely to get fixed.

 
 I download this packages:
 
When you download rpms manually, you should use the rpm command to install
them, not yum.
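
(Typically rpm -ivh package.rpm to install a new package, or rpm -Uvh
package.rpm to upgrade one that is already installed.)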

 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-0.137-3.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libs-0.137-3.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libelf-0.137-3.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/elfutils-libs-0.137-3.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/expat-1.95.8-8.3.el5_5.3.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/gmp-4.1.4-10.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/libxml2-2.6.26-2.1.12.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/libxml2-python-2.6.26-2.1.12.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/m2crypto-0.16-8.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-2.4.3-44.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-elementtree-1.2.6-5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-iniparse-0.2.3-4.el5.noarch.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-sqlite-1.1.7-1.2.1.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/python-urlgrabber-3.1.0-6.el5.noarch.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/readline-5.1-3.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-4.4.2.3-22.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-libs-4.4.2.3-22.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/rpm-python-4.4.2.3-22.el5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/sqlite-3.3.6-5.i386.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-3.2.22-37.el5.centos.noarch.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-NetworkManager-dispatcher-1.1.16-16.el5.centos.noarch.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-fastestmirror-1.1.16-16.el5.centos.noarch.rpm
 wget 
 http://mirror.centos.org/centos-5/5.7/os/i386/CentOS/yum-metadata-parser-1.1.2-3.el5.centos.i386.rpm
 
Why did you download these? Only five of them appear to have anything to
do with Python and four to be connected with yum.  

Which, if any, of them contain mx? 

Hint: try using rpm -qip filename.rpm or rpm -qilp filename.rpm to
see what is in them. 

In my Fedora 15 installation, mx lives in the mx and mx-devel
packages: running yum info '*mx*' told me that.


Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Matching on keyword or phrases within a field that is delimited with an or operator |

2012-03-12 Thread Martin Gregorie
On Mon, 2012-03-12 at 15:57 -0700, Jim Ostler wrote:
 I have a table that is around 20 GB, so I need to optimize as best as
 possible the matching with another table on keywords across multiple
 fields. I have around 10 fields that have keywords or phrases
 delimited with the "or" operator, |. So it would be in the form of
 "a | b | and jack | cd". There are around 20 keywords or phrases
 per field, and these keywords could be any word. 
 
How static is the list of keywords?
Is there any significance in their order? (I'm guessing there isn't).
How many rows are there in the table, IOW how big is each row?

I wonder if putting the keywords in a separate table with a many:many
relationship with your big table would help. This would retain your
ability to add or remove keywords without affecting the schema. The
selection rules used to combine keywords would also run fast. 

I'm getting good performance from a mail archive where messages are
selected using:
- a 1:M relationship with a subject table
- a M:M relationship with an address table
- a range comparison with the date sent (a non-indexed field in the message)
- an ILIKE '%phrase%' comparison with the message text

The query is assembled using WHERE clauses for each of the four
selection possibilities listed above before being prepared and issued. A
WHERE clause is only included if the user has specified a value for it.
There is an AND relationship between the clauses. Currently there are
around 130,000 messages in the database. Speed: it took 0.6 seconds to
find 209 messages to or from an address in the last 5 years and
containing the keyword 'f1a'. The same search without the date range
specified ran in 0.86 seconds. It took 16.9 seconds to find the 12331
messages containing 'f1a' when no address or date range was specified.
This is on a dual core, 3.2 GHz Athlon with 4GB of RAM using PG 9.1. 
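
If it helps, the assembly approach looks roughly like this in Python with
psycopg2 (untested, and all the table and column names are invented for
illustration - they are not my real schema):

    import psycopg2

    def find_messages(conn, address=None, sent_after=None, phrase=None):
        clauses, params = [], []
        if address is not None:
            clauses.append("a.address ILIKE %s")
            params.append(address)
        if sent_after is not None:
            clauses.append("m.date_sent >= %s")
            params.append(sent_after)
        if phrase is not None:
            clauses.append("m.body ILIKE %s")
            params.append("%" + phrase + "%")
        sql = ("SELECT DISTINCT m.id, m.subject FROM message m "
               "JOIN message_address ma ON ma.message_id = m.id "
               "JOIN address a ON a.id = ma.address_id")
        if clauses:
            # a WHERE clause is only added when a value was supplied
            sql += " WHERE " + " AND ".join(clauses)
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()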

The other possibility is to use a star schema with each keyword being a
dimension - IOW a traditional data warehouse set-up, but of course this
does require a static list of keywords to be defined.


Martin




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Error installing postgresq91-python package

2012-03-10 Thread Martin Gregorie
On Sat, 2012-03-10 at 13:28 -0800, Adrian Klaver wrote:
 On 03/10/2012 09:59 AM, PgSQL wrote:
  Thanks.
 
  You can see:
 
  root@s1 [/000/yuyuyum/2012/2012]# yum install mx
  Traceback (most recent call last):
    File "/usr/bin/yum", line 4, in ?
      import yum
    File "/usr/lib/python2.4/site-packages/yum/__init__.py", line 50, in ?
      import comps
    File "/usr/lib/python2.4/site-packages/yum/comps.py", line 23, in ?
      iterparse = cElementTree.iterparse
  AttributeError: 'module' object has no attribute 'iterparse'
  root@s1 [/000/yuyuyum/2012/2012]#
 
 A yum problem with Python 2.4. A test on my Ubuntu machine confirmed 
 that the iterparse function is not included in the cElementTree shipped with 
 Python 2.4, even though it is supported from 2.2+.  In my previous post I 
 erroneously said you were on Fedora, not CentOS.  Might want to file a 
 bug report with the CentOS folks. Is there another mechanism to install 
 mx, other than yum, in the meantime?
 
If the OP can identify and download the relevant .rpm file it should be
possible to install mx with the rpm utility.

The next time yum is used it will complain that the package database was
updated outside of it but that can be ignored: this happens every time I
upgrade Opera, which is distributed for Fedora as a manually applied
RPM.


Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Complex transactions without using plPgSQL Functions. It is possible?

2012-03-07 Thread Martin Gregorie
On Wed, 2012-03-07 at 14:19 +0400, Sergey Konoplev wrote:
 On Tue, Mar 6, 2012 at 11:30 PM, Andre Lopes lopes80an...@gmail.com wrote:
  This is the plPgSQL code that I need to write in Python. It is
  possible to do this without using PlPgSQL?
 
Have you looked at pyodbc?

ODBC will usually accept statements allowing you to turn autocommit off
and to use connection.commit() to group a set of statements into a
transaction. 

Note that there's a documentation comment saying that autocommit
settings are not passed to the driver. However, as the documentation is
still talking about bytea fields, it may be out of date, so I'd suggest
running a test. Try writing a Python test program that turns autocommit
off and does a couple of inserts followed by a rollback. File a bug if
the inserted data is in the table after a successful run.
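
A rough sketch of such a test (untested; the DSN and table name are
placeholders):

    import pyodbc

    # autocommit=False is the behaviour under test
    conn = pyodbc.connect("DSN=mydb", autocommit=False)
    cur = conn.cursor()
    cur.execute("CREATE TABLE tx_test (n integer)")
    conn.commit()

    cur.execute("INSERT INTO tx_test VALUES (1)")
    cur.execute("INSERT INTO tx_test VALUES (2)")
    conn.rollback()

    cur.execute("SELECT count(*) FROM tx_test")
    print("rows after rollback:", cur.fetchone()[0])   # expect 0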


Martin




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Fixing the loss of 'template1'

2012-03-07 Thread Martin Gregorie
In the course of migrating from (I think) Postgres 8.4 under Fedora 12
to Postgres 9.1 under Fedora 16, I managed to lose 'template1' and its
associated data. The immediate effect is that, although my schema and
data seem to be intact and are working correctly, pg_dumpall refuses to
run because it can't find 'template1'.

The sequence was something like this:

1) The first step was to bring my backups fully up to date by running
pg_dumpall with output to a USB hard drive and running the incremental
backup that's part of my application (a mail archive that backs up into
a set of mbox files).

2) Installed Fedora 16, including the Postgres 9.1 package, and fully
updated all packages.

3) Initialised the Postgres user and basic database structures, and
started the server.

4) Attempted to use pg_restore to recover my database. This threw lots of
errors and was obviously stupid, so I killed it and reloaded the backup
with psql. This ran OK except that some data was incorrectly restored to
my mail messages table because tabs in some messages confused the COPY
command.

5) Started again: I dropped my schema and recreated it before letting my
application restore the database from its mbox files.

6) The next scheduled backup using pg_dumpall failed immediately because
it couldn't find 'template1'.

Observations

a) Could the abrupt termination of pg_restore have removed 'template1'
from the database structure?

b) I've seen the confusion between tabs in data and the tabs used as
delimiters by pg_dump cause this problem in the past. It was fixed then,
but seems to have crept back in. 

Question

Is it possible to reinstate 'template1' and all its works without
reinitialising the database from scratch? It was suggested to me that
running initdb might fix it while leaving my data in place, though my
source was uncertain about its success. Would this work, or is there a
better way to restore 'template1' in a populated database system?


Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fixing the loss of 'template1'

2012-03-07 Thread Martin Gregorie
On Wed, 2012-03-07 at 09:38 -0500, Gary Chambers wrote:
 Martin,
 
  6) The next scheduled backup using pg_dumpall failed immediately because
  it couldn't find 'template1'.
 
 The template1 database is the default database to which pg_dumpall attempts
 to connect.  If you use the -l or --database option, you can change that and
 pg_dumpall will resume functioning as you expect.
 
I've just logged in under postgres and run '\l' - and the databases
postgres, template0 and template1 are still there:

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | postgres=CTc/postgres+
           |          |          |             |             | =c/postgres

So, is this a privilege issue? I don't understand the content of that
somewhat cryptic 'privilege' column. Is it set how you'd expect?

Marti:
==
I got this output:

postgres=# SELECT datname FROM pg_database WHERE datistemplate;
  datname  
-----------
 template0
 template1
(2 rows)

so it doesn't look like it's been renamed.
 

Martin



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general