Re: [GENERAL] Need suggestion

2011-06-02 Thread tomas

Following up on myself:

Just stumbled upon this in the hackers mailing list, which might be
interesting to you, since it highlights pros & cons of current
implementations:

  

Regards
-- tomás



Re: [GENERAL] Need suggestion

2011-06-02 Thread tomas

On Wed, Jun 01, 2011 at 10:08:14AM +0200, Carl von Clausewitz wrote:
> Hello Everyone,
> 
> I got a new project, with 100 users in Europe. In this case, I need to handle
> production and sales processes and their documentation in PostgreSQL with PHP.

Something to consider too -- if you decide to store the big objects in
the database, that is -- is PostgreSQL's large object interface. The
problems with backup someone else mentioned in this thread would remain,
but you wouldn't have large blobs of data clobbering your "regular"
queries. You could pass the scans and pics piecemeal between client and
database without having to store them in the middleware (which may be an
advantage or a disadvantage, mind you).
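
For illustration, a rough sketch of the server-side flavour of that interface
(file paths and table/column names invented here):

-- keep only the large object's OID in the row; the bytes live in pg_largeobject
CREATE TABLE scan (
    id      serial PRIMARY KEY,
    case_id bigint,
    blob    oid
);

-- server-side import/export (libpq also offers client-side lo_import/lo_export)
INSERT INTO scan (case_id, blob)
    VALUES (42, lo_import('/tmp/scan_0001.tiff'));

SELECT lo_export(blob, '/tmp/scan_0001_copy.tiff') FROM scan WHERE id = 1;

-- remember to lo_unlink() the large object when its row goes away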

Don't know whether PHP has bindings for that, though.

Regards
-- tomás



[GENERAL] Hidden Risk w/ UPDATE Cascade and Trigger-Based Validation

2011-06-02 Thread David Johnston
Hi,

 

I am trying to get a better understanding of how the following foreign keys
with Update Cascades and a validation trigger interact.  The basic setup is a
permission table whose two permission parts share a common "group/parent",
which is embedded into their id/PK and which changes via the FK cascade
mechanism.  The rest of my thoughts and questions follow the setup.

 

I have the following schema (parts omitted/simplified for brevity since
everything works as expected)

 

CREATE TABLE userstorepermission (
    userid  text NOT NULL REFERENCES "user" (userid) ON UPDATE CASCADE,
    storeid text NOT NULL REFERENCES store (storeid) ON UPDATE CASCADE,
    PRIMARY KEY (userid, storeid)
);

CREATE FUNCTION validate() RETURNS trigger AS $$
DECLARE
    usergroup  text;
    storegroup text;
BEGIN
    SELECT groupid INTO storegroup FROM store WHERE storeid = NEW.storeid;
    SELECT groupid INTO usergroup  FROM "user" WHERE userid  = NEW.userid;

    RAISE NOTICE 'Validating User Store Permission U:%;%, S:%;%',
        NEW.userid, usergroup, NEW.storeid, storegroup;

    IF usergroup <> storegroup THEN
        RAISE NOTICE 'Disallow';
        RETURN NULL;
    ELSE
        RAISE NOTICE 'Allow';
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER validate_userstorepermission
    BEFORE INSERT OR UPDATE ON userstorepermission
    FOR EACH ROW EXECUTE PROCEDURE validate();

 

Basically, if I change the groupid, both the userid and storeid values in
userstorepermission change as well.  This is desired.  When I update the
shared groupid, the following NOTICEs are raised from the validation
function above:

 

The change for groupid was TESTSGB -> TESTSGD:

 

NOTICE:  Validating User Store Permission U:tester@TESTSGB;
S:[TESTSGD]STORE01;TESTSGD [at this point apparently both user and store
have been updated and storeid in the permission table is being changed]

CONTEXT:  SQL statement "UPDATE ONLY "domain"."userstorepermission" SET
"s_id" = $1 WHERE $2 OPERATOR(pg_catalog.=) "s_id""

 

NOTICE:  Allow

CONTEXT:  SQL statement "UPDATE ONLY "domain"."userstorepermission" SET
"s_id" = $1 WHERE $2 OPERATOR(pg_catalog.=) "s_id""

 

NOTICE:  Validating User Store Permission U:tester@TESTSGD;TESTSGD
S:[TESTSGD]STORE01;TESTSGD [and now the userid in the permission table gets
its turn]

CONTEXT:  SQL statement "UPDATE ONLY "domain"."userstorepermission" SET
"u_id" = $1 WHERE $2 OPERATOR(pg_catalog.=) "u_id""

 

NOTICE:  Allow

CONTEXT:  SQL statement "UPDATE ONLY "domain"."userstorepermission" SET
"u_id" = $1 WHERE $2 OPERATOR(pg_catalog.=) "u_id""

 

The end result is that both values are changed as desired, but the notices,
while they indirectly make sense (only one of the values can be update-cascaded
at a time), are somewhat confusing, and thus I am not sure whether I am
missing something that could eventually blow up in my face.  I expect other
similar situations to present themselves in my model, so I want to understand
at least whether what I am doing is safe, and ideally whether the CASCADE rules
relax intra-process enforcement of constraints in order to allow this kind of
multi-column key update to succeed.

 

I see BUG #5505 from January of last year where Tom confirms that the
trigger will fire but never addresses the second point about the referential
integrity check NOT FAILING since the example's table_2 contains a value not
present in table_1.

 

Conceptually, as long as I consistently update ALL the relevant FKs, the
initial and resulting states should both be consistent, just with a different
value.  I'll probably do some more playing with "missing" an FK Update Cascade
and see whether the proper failures occur, but regardless, some thoughts
and/or pointers are welcome.

 

Thanks,

 

David J.

 

 

 

 



Re: [GENERAL] invalid byte sequence for encoding "UTF8"

2011-06-02 Thread Derrick Rice
That specific character sequence is a result of Unicode implementations
prior to 6.0 mixing with later implementations.  See here:

http://en.wikipedia.org/wiki/Specials_%28Unicode_block%29#Replacement_character

You could replace that sequence with the correct UTF-8 encoding of U+FFFD
(bytes EF BF BD) using `sed`, for example (if you are using a plain-text dump
format).

On Thu, Jun 2, 2011 at 4:17 PM, BRUSSER Michael wrote:

>  We upgrading some old  database (7.3.10 to 8.4.4).   This involves
> running pg_dump on the old db
>
> and loading the datafile to the new db.  If this matters we do not use
> pg_restore, the dump file is just sourced with psql,
>
> and this is where I ran into problem:
>
>
>
> psql: .../postgresql_archive.src/... ERROR:  invalid byte sequence for
> encoding "UTF8": 0xedbebf
>
> HINT:  This error can also happen if the byte sequence does not match the
> encoding
>
> expected by the server, which is controlled by "client_encoding".
>
>
>
> The server and client encoding are both Unicode. I think we may have some
> copy/paste MS-Word markup
>
> and possibly other odd things  on the old database.  All this junk is found
> on the ‘text’ fields.
>
>
>
> I found a number of related postings, but did not see a good solution.
> Some folks suggested cleaning the datafile prior to loading,
>
> while someone else did essentially the same thing on the database before
> dumping it.
>
> I am looking for advice, hopefully the “best technique” if there is one,
>   any suggestion is appreciated.
>
>
>
> Thanks,
>
> Michael.


[GENERAL] invalid byte sequence for encoding "UTF8"

2011-06-02 Thread BRUSSER Michael
We are upgrading an old database (7.3.10 to 8.4.4).  This involves running
pg_dump on the old db and loading the datafile into the new db.  If this
matters, we do not use pg_restore; the dump file is just sourced with psql,
and this is where I ran into a problem:

psql: .../postgresql_archive.src/... ERROR:  invalid byte sequence for encoding 
"UTF8": 0xedbebf
HINT:  This error can also happen if the byte sequence does not match the 
encoding
expected by the server, which is controlled by "client_encoding".

The server and client encoding are both Unicode.  I think we may have some
copy/paste MS-Word markup and possibly other odd things in the old database.
All this junk is found in the 'text' fields.

I found a number of related postings, but did not see a good solution.  Some
folks suggested cleaning the datafile prior to loading, while someone else did
essentially the same thing on the database before dumping it.
I am looking for advice -- hopefully the "best technique", if there is one.
Any suggestion is appreciated.

Thanks,
Michael.




Re: [GENERAL] Postgres 8.3.5 - ECPG and the use of descriptors and cursors in multi-threaded programs

2011-06-02 Thread Bosco Rama
Merlin Moncure wrote:
> On Tue, May 31, 2011 at 7:35 PM, Bosco Rama  wrote:
>> Unfortunately, like you, I am just a user of this wonderful DB.  Since
>> we are not seeing any other input here on the 'general' list it may be
>> time to move this thread to the pgsql-interfaces list.  Are you subscribed
>> to it?  It is a very low bandwidth list but it does tend to highlight the
>> interface issues distinct from the general DB discussions.
> 
> hm, iirc pg-interfaces is deprecated.

There was discussion of that some time ago.  I'm not sure what the final
decision was.  I still get the occasional message on that list.  And in
the past, messages sent to that list got some sort of attention.  It
seems that ecpg gets lost in the crowd here on the general list.  I'm not
sure if this is because of the ecpg folks not being subscribed to general
(which I highly doubt since I see Tom here, though I don't see Michael) or
if it's due to the different SNR.

Bosco.



Re: [GENERAL] Passing parameters into an in-line psql invocation

2011-06-02 Thread Leif Biberg Kristensen
On Thursday 2. June 2011 18.58.23 Gauthier, Dave wrote:
> Hi:
> 
> I'd like to pass a parameter into an inline psql call that itself calls an
> sql script, something like...
> 
> psql mydb -c "\i thesqlscript foo"
> 
> Where"foo" is the value I want to pass in.
> 
> Just as good would be the ability to sniff out an environment variable from
> within the sql script ("thesqlscript" in the example above).  In perl, I
> would use $ENV{VARNAME}.  Is there something like that in Postgres SQL?
> 
> V8.3.4 on Linux (upgrading to v9 very soon).
> 
> Thanks for any ideas !

Personally I prefer to write a small wrapper in Perl for interaction with 
Postgres from the command line. Here's a boilerplate:

#! /usr/bin/perl

use strict;
use DBI;

my $val = shift;
if ((!$val) || !($val =~ /^\d+$/)) {
    print "Bad or missing parameter $val\n";
    exit;
}
my $dbh = DBI->connect("dbi:Pg:dbname=mydb", '', '',
    {AutoCommit => 1}) or die $DBI::errstr;
my $sth = $dbh->prepare("SELECT foo(?)");
while (my $text = <STDIN>) {
    chomp($text);
    $sth->execute($val);
    my $retval = $sth->fetch()->[0];
    if ($retval < 0) {
        $retval = abs($retval);
        print "Duplicate of $retval, not added.\n";
    }
    else {
        print "$retval added.\n";
    }
}
$sth->finish;
$dbh->disconnect;



Re: [GENERAL] Passing parameters into an in-line psql invocation

2011-06-02 Thread Bosco Rama
Gauthier, Dave wrote:
> 
> I'd like to pass a parameter into an inline psql call that itself
> calls an sql script, something like...
> 
> psql mydb -c "\i thesqlscript foo"
> 
> Where"foo" is the value I want to pass in.

You may want to use the --set or --variable options of psql and then
reference the variable name in thesqlscript.

So the psql invocation becomes:
   psql mydb --set 'var=foo' -c '\i thesqlscript'

and then in thesqlscript:
   update table set column = :var;

HTH

Bosco.



Re: [GENERAL] Passing parameters into an in-line psql invocation

2011-06-02 Thread John R Pierce

On 06/02/11 9:58 AM, Gauthier, Dave wrote:
> Hi:
>
> I'd like to pass a parameter into an inline psql call that itself
> calls an sql script, something like...
>
> psql mydb -c "\i thesqlscript foo"
>
> Where "foo" is the value I want to pass in.



On the psql command line, use
  -v name=value
or
  --set name=value

then in your script, use :name if you want "value" interpolated as a SQL
identifier, and (in 9.x) you can use :'name' if you want it interpolated
as the string literal 'value'.
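
For example, a minimal sketch (script and variable names invented here):

psql mydb -v tbl=widgets -v owner_name=carol -f thesqlscript

and inside thesqlscript (the :'...' form needs 9.x):

SELECT count(*) FROM :tbl WHERE owner = :'owner_name';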




--
john r pierce                            N 37, W 123
santa cruz ca                         mid-left coast




[GENERAL] Passing parameters into an in-line psql invocation

2011-06-02 Thread Gauthier, Dave
Hi:

I'd like to pass a parameter into an inline psql call that itself calls an sql 
script, something like...

psql mydb -c "\i thesqlscript foo"

Where "foo" is the value I want to pass in.

Just as good would be the ability to sniff out an environment variable from 
within the sql script ("thesqlscript" in the example above).  In perl, I would 
use $ENV{VARNAME}.  Is there something like that in Postgres SQL?

V8.3.4 on Linux (upgrading to v9 very soon).

Thanks for any ideas !


Re: [GENERAL] Need suggestion

2011-06-02 Thread John R Pierce

On 06/02/11 2:19 AM, Ognjen Blagojevic wrote:

1. Files stored on the filesystem:
- Small database footprint
- Faster backup, export and import

2. Files stored in the database
- RDBMS takes care of transactions and ref. int.
- Slower backup, export and import but all done in one step
- Easier continuous archiving 


With many terabytes of large file data accumulating, the database will
become very unwieldy to do any maintenance on.  A simple pg_dump will
take many hours vs a few minutes.


I would almost certainly use a filesystem for an app like this, and just 
store the metadata in the database.


--
john r pierce                            N 37, W 123
santa cruz ca                         mid-left coast




Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread akp geek
The only problem I am seeing is with dates, as you mentioned. When I export
the data to CSV, the dates come out in the format 8/1/1955 0:00:00, but
Postgres is not accepting that. Any clues?

Regards

On Thu, Jun 2, 2011 at 11:23 AM, Vick Khera  wrote:

> On Thu, Jun 2, 2011 at 10:01 AM, akp geek  wrote:
> > Thanks so much . I was using bullzip What I felt with Bullzip was it is
> good
> > for less amount of data.  I have 2 tables each of which has 2.5 million
> > records.  For me it is taking for ever, The job that I set up has been
> > running since 12 hours.
>
> Export to CSV or tab delimited file, then suck it in with a COPY
> statement in postgres.  Just make sure that there is no "invalid" data
> like fake dates.  2.5 million rows should take a couple of minutes
> tops to insert into a modern hardware server.
>


Re: [GENERAL] Postgres 8.3.5 - ECPG and the use of descriptors and cursors in multi-threaded programs

2011-06-02 Thread Merlin Moncure
On Tue, May 31, 2011 at 7:35 PM, Bosco Rama  wrote:
> Unfortunately, like you, I am just a user of this wonderful DB.  Since
> we are not seeing any other input here on the 'general' list it may be
> time to move this thread to the pgsql-interfaces list.  Are you subscribed
> to it?  It is a very low bandwidth list but it does tend to highlight the
> interface issues distinct from the general DB discussions.

hm, iirc pg-interfaces is deprecated.

merlin



Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread Vick Khera
On Thu, Jun 2, 2011 at 12:01 PM, akp geek  wrote:
> The only problem I am seeing with dates as you mentioned. when I export the
> data to csv the date is getting the format of 8/1/1955 0:00:00 , but
> postgres not accepting that. Any clues?

Should work:


test=> select '8/1/1955 0:00:00'::date;
    date
------------
 1955-08-01
(1 row)

Time: 0.325 ms
test=> select '8/1/1955 0:00:00'::timestamp;
      timestamp
---------------------
 1955-08-01 00:00:00
(1 row)


at worst I guess you run the export file thru an editing filter that
removes the '0:00:00' off the date column.



Re: [GENERAL] Need suggestion

2011-06-02 Thread Ben Chobot
On Jun 1, 2011, at 1:08 AM, Carl von Clausewitz wrote:

> Hello Everyone,
> 
> I got a new project, with 100 users in Europe. In this case, I need to handle
> production and sales processes and their documentation in PostgreSQL with PHP.
> The load of the sales process is negligible, but every user produces 2
> transactions in the production process, with 10-30 scanned documents (each
> 400kb - 800kb) and 30-50 high resolution pictures (each 3-8 MB), and
> they wanted to upload it to 'somewhere'. 'Somewhere' could be the server
> file system, and a link in the PostgreSQL database for the location of the
> files (with some metadata), or it could be the PostgreSQL database.
>
> My question is: what is your opinion about storing the scanned
> documentation and the pictures in the database? This is a huge amount of data
> (between 188MB and 800MB daily; an average year is about 1 TB of data), but it
> must be searchable, and any document must be retrievable within 1 hour. All
> documentation must be stored for up to 5 years... It means the database
> could be about 6-7 TB large after 5 years, and then we can start to archive
> documents. Any other data size is negligible.
>
> If you suggest storing all of the data in PostgreSQL, what is your
> recommendation about table and index structure, clustering, and archiving?

So, you're mostly storing ~1TB of images/year? That doesn't seem so bad. How 
will the documents be searched? Will their contents be OCR'd out and put into a 
full text search? How many searches will be going on?

If you're asking whether or not it makes sense to store 7TB of images in the 
database, as opposed to storing links to those images and keeping the images 
themselves on a normal filesystem, there's no clear answer. Check the archives 
for pros and cons of each method.


Re: [GENERAL] Need suggestion

2011-06-02 Thread Ognjen Blagojevic

Carl,

Please keep in mind that I am neither a Postgres expert nor a consultant; I'm
just sharing my experience. I would also like to hear the opinion of other
people who have worked on projects with similar database sizes.


I would keep all files in a single table -- most probably they will be
served to the user by the same code (e.g. a file download servlet or
something similar), so it is good if all relevant info is in one table,
something like:


file {
 id
 mime_type
 name
 content bytea
 creation_date datetime
 modification_date datetime
 creation_user
 modification_user
}
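
A rough SQL rendering of that sketch (types for the user columns are
assumptions):

CREATE TABLE file (
    id                 bigserial PRIMARY KEY,
    mime_type          text NOT NULL,
    name               text NOT NULL,
    content            bytea NOT NULL,
    creation_date      timestamp NOT NULL DEFAULT now(),
    modification_date  timestamp,
    creation_user      text,
    modification_user  text
);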

Since both image and document inherit file, you may choose any of the
common RDB inheritance modeling strategies (one table per hierarchy, one
table per class...), but since there are just a few fields, I would put
everything in the same table.

Consider the cardinality between cases and files/users. Can one file be
related to two cases, and so on...

The TOAST table will be split into 1 GB pieces.

Create indexes based on the ways your users will browse or search the data.

Regards,
Ognjen


On 2.6.2011 12:22, Carl von Clausewitz wrote:

Dear Ognjen,

thank you - that was my idea too, but I've never seen such a workload
like this. The docu's (which are not processed by any ocr hopefully) and
the pictures are not indexed off course, just some metadatas, which are
related to the exact docu, or pic For example:

productions_docu1:
-sent date
-recieved date
-type
-owner
-case_id
-etc

image_001:
-picturetaken date
-case_id
-image_type

Just these metadatas need to be searched. My questions about the
structure was like this: do you recommend, to store the images and the
docu's in a same table (CREATE TABLE docu_img_store (id
BIGSERIAL, case_id BIGINT, content_type INTEGER, content bytea), or
store it in two different tables? Is there any special settings while
table creations, that I have to set for optimal work (like index,
storage parameter, toast, etc).

(:-) I know, that this project could be a high value revenue for any DB
consultancy related company, but this is a small country, with small
project fees, and I'm employee, not a contractor at my company :-)

Thanks you in advance,
Regards,
Carl


2011/6/2 Ognjen Blagojevic <ognjen.d.blagoje...@gmail.com>:

Carl,

I don't have experience with that big databases, but I did both
solutions, and here are pros of both of them:

1. Files stored on the filesystem:
- Small database footprint
- Faster backup, export and import

2. Files stored in the database
- RDBMS takes care of transactions and ref. int.
- Slower backup, export and import but all done in one step
- Easier continuous archiving

I slightly prefer option no. 2, since transaction handling, rollback
and ref. integrity is not so easy to implement when you have two
different storage systems (FS and RDB).

As for indexes and tables it is not clear form your message whether
you need just a regular search (field LIKE 'something'), full text
search of metadata, or full text search of scanned documents (in
case they are OCRed).

Regards,
Ognjen



On 1.6.2011 10:08, Carl von Clausewitz wrote:

Hello Everyone,

I got a new project, with 100 user in Europe. In this case, I
need to
handle production and sales processes an its documentations in
PostgreSQL with PHP. The load of the sales process is
negligible, but
every user produces 2 transaction in the production process,
with 10-30
scanned documents (each are 400kb - 800kb), and 30-50 high
resolution
pictures (each are 3-8 MB), and they wanted to upload it to
'somewhere'.
'Somewhere' could be the server files system, and a link in
the PostgreSQL database for the location of the files (with some
metadata), or it could be the PostgreSQL database.

My question is that: what is your opinion about to store the scanned
documentation and the pictures in the database? This is a huge
amount of
data (between daily 188MB and 800MB data, average year is about 1 TB
data), but is must be searchable, and any document must be retrieved
within 1 hour. Every documentations must be stored for up to 5
years...
It means the database could be about 6-7 TB large after 5 years, and
then we can start to archive documents. Any other data size is
negligible.

If you suggest, to store all of the data in PostgreSQL, what is your
recommendation about table, index structure, clustering, archiving?

Thank you in advance!
Regards,
Carl




Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread Vick Khera
On Thu, Jun 2, 2011 at 10:01 AM, akp geek  wrote:
> Thanks so much . I was using bullzip What I felt with Bullzip was it is good
> for less amount of data.  I have 2 tables each of which has 2.5 million
> records.  For me it is taking for ever, The job that I set up has been
> running since 12 hours.

Export to CSV or tab delimited file, then suck it in with a COPY
statement in postgres.  Just make sure that there is no "invalid" data
like fake dates.  2.5 million rows should take a couple of minutes
tops to insert into a modern hardware server.
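
Something along these lines, for example (file path and column list are
placeholders):

COPY mytable (id, name, created_at)
    FROM '/tmp/access_export.csv'
    WITH CSV HEADER;

or, if the file sits on the client side, the psql equivalent:

\copy mytable (id, name, created_at) FROM '/tmp/access_export.csv' WITH CSV HEADER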



Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread akp geek
Thanks so much. I was using Bullzip; what I felt was that Bullzip is good
for smaller amounts of data. I have 2 tables, each of which has 2.5 million
records. For me it is taking forever -- the job that I set up has been
running for 12 hours now.

I would appreciate it if you could share the VBA code you mentioned. I
would love to use it to make the data transfer faster.

Regards



On Thu, Jun 2, 2011 at 9:32 AM, Thomas Harold wrote:

> On 5/25/2011 3:42 PM, akp geek wrote:
>
>> Dear all -
>>
>> I would like to know if any one has migrated database from
>> MS access to Postgres . We use postgres 9.0.2 on solaris . Are there any
>> open source tools that you have used to do this task. Can you please
>> share your experiences ?
>>
>>
> I rolled my own.
>
> If the number of rows in the MDB table is not that many (under 100k), then
> I'll create a new table up on pgsql, link to it with the ODBC driver, and
> append from the source table to the pgsql table.  You can get away with
> larger appends if both systems are on the same network.
>
> If it was a table with a few million rows, then I wrote a little VBA
> snippet that created a pgdump compatible SQL text file from the source data.
>  To figure out the format, I just pgdump'd an existing table from
> PostgreSQL, then patterned my SQL file after it.  While it was extremely
> fast at doing the conversion (both generating the SQL file and the time it
> took for pgdump to process the SQL file), I only recommend that method for
> cases where you have millions and millions of rows.  Or a lot of identical
> tables.
>
> (The VBA module was about 100-150 lines of code in total.)
>


Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread Thomas Harold

On 5/25/2011 3:42 PM, akp geek wrote:

Dear all -

     I would like to know if anyone has migrated a database from
MS Access to Postgres. We use Postgres 9.0.2 on Solaris. Are there any
open source tools that you have used for this task? Can you please
share your experiences?



I rolled my own.

If the number of rows in the MDB table is not that many (under 100k), 
then I'll create a new table up on pgsql, link to it with the ODBC 
driver, and append from the source table to the pgsql table.  You can 
get away with larger appends if both systems are on the same network.


If it was a table with a few million rows, then I wrote a little VBA
snippet that created a pg_dump compatible SQL text file from the source
data.  To figure out the format, I just pg_dump'd an existing table from
PostgreSQL, then patterned my SQL file after it.  While it was extremely
fast at doing the conversion (both generating the SQL file and the time
it took for psql to process the SQL file), I only recommend that
method for cases where you have millions and millions of rows.  Or a lot
of identical tables.
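
The generated file is basically just a COPY block in pg_dump's plain-text
format -- something like this sketch (table and columns invented, values
tab-separated):

COPY public.mytable (id, name, amount) FROM stdin;
1	Widget	19.95
2	Gadget	5.00
\.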


(The VBA module was about 100-150 lines of code in total.)



[GENERAL] using jboss with ident auth

2011-06-02 Thread eyal edri
Hi,

I'm trying to use postgres with ident auth and jboss.

here's my postgres files:

pg_hba.conf:

# TYPE  DATABASEUSERCIDR-ADDRESS  METHOD

# "local" is for Unix domain socket connections only
local   all all  ident map=vmap
# IPv4 local connections:
hostall all 127.0.0.1/32  ident map=vmap
# IPv6 local connections:
hostall all ::1/128   ident map=vmap


pg_ident.conf:

# MAPNAME SYSTEM-USERNAMEPG-USERNAME
vmap postgres  postgres
vmap rootpostgres

I've set the postgres password to 'postgres' and updated the postgres-ds.xml
file under JBoss:



<jndi-name>DataSource</jndi-name>
<connection-url>jdbc:postgresql://localhost:5432/dbname</connection-url>
<driver-class>org.postgresql.Driver</driver-class>
<user-name>postgres</user-name>
<password>postgres</password>
<max-pool-size>100</max-pool-size>
<check-valid-connection-sql>select 1</check-valid-connection-sql>



When I try to run any command that uses the XML file for auth, it fails
with:

FATAL: Ident authentication failed for user "postgres"

When I use 'password' or 'md5' it works.


BTW, I would have used password or md5 if there was a way to run psql
commands with a password unattended (without a password prompt..).
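
(A sketch of the usual workaround, in case it helps: a ~/.pgpass file with
mode 0600, one line per connection in the form
hostname:port:database:username:password, e.g. to match the datasource above:

localhost:5432:dbname:postgres:postgres

lets psql connect over md5 without prompting.)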

any ideas?

thanks!!


Re: [GENERAL] Table with active and historical data

2011-06-02 Thread salah jubeh
Hello,

I think one good thing to do is partitioning; you have already mentioned that
in your mail:
http://www.postgresql.org/docs/current/static/ddl-partitioning.html

Also try running the VACUUM command; it might help increase performance.

Create a boolean flag, i.e. active, and create an index on it (see the sketch
below). I think this will be faster than having an index on the date field.
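
For instance, a minimal sketch of that flag as a partial index (the table name
full_table is borrowed from the original mail, everything else is assumed):

ALTER TABLE full_table ADD COLUMN active boolean NOT NULL DEFAULT false;
CREATE INDEX full_table_active_idx ON full_table (active_date) WHERE active;
-- something (a trigger or a nightly job) has to clear the flag as rows age out
SELECT * FROM full_table WHERE active AND active_date = current_date;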

Regards
 





From: Robert James 
To: Postgres General 
Cc: srobertja...@gmail.com
Sent: Thu, June 2, 2011 1:30:11 AM
Subject: [GENERAL] Table with active and historical data

I have a table with a little active data and a lot of historical data.
I'd like to be able to access the active data very quickly - quicker
than an index.  Here are the details:

1. Table has about 1 million records
2. Has a column active_date - on a given date, only about 1% are
active.  active_date is indexed and clustered on.
3. Many of my queries are WHERE active_date = today.  Postgres uses
the index for these, but it still takes quite a lot of time.  I repeat
these queries regularly.
4. I'd like to somehow presort or partition the data so that Postgres
doesn't have to do an index scan each time.  I'm not sure how to do
this? Idea?  I know it can be done with inheritance and triggers (
http://stackoverflow.com/questions/994882/what-is-a-good-way-to-horizontal-shard-in-postgresql

), but that method looks a little too complex for me.  I'm looking for
something simple.
5. Finally, I should point out that I still do a large number of
queries on historical data as well.

What do you recommend? Ideas? Also: Why doesn't cluster on active_date
solve the problem? Specifically, if I run SELECT * FROM full_table
WHERE active_date = today, I get a cost of 3500.  If I first select
those records into a new table, and then do SELECT * on the new table,
I get a cost of 64.  Why is that? Why doesn't clustering pregroup
them?



Re: [GENERAL] Need suggestion

2011-06-02 Thread Carl von Clausewitz
Dear Ognjen,

thank you - that was my idea too, but I've never seen a workload like
this. The docu's (which are hopefully not processed by any OCR) and the
pictures are not indexed of course, just some metadata, which is related
to the exact docu or pic. For example:

productions_docu1:
-sent date
-received date
-type
-owner
-case_id
-etc

image_001:
-picturetaken date
-case_id
-image_type

Only this metadata needs to be searched. My question about the structure
was this: do you recommend storing the images and the docu's in the same
table (CREATE TABLE docu_img_store (id BIGSERIAL, case_id BIGINT,
content_type INTEGER, content bytea)), or storing them in two different
tables? Are there any special settings at table creation that I have to
set for optimal behavior (like indexes, storage parameters, TOAST, etc.)?

(:-) I know that this project could be high-value revenue for any DB
consultancy, but this is a small country with small project fees, and I'm
an employee, not a contractor, at my company :-)

Thank you in advance,
Regards,
Carl


2011/6/2 Ognjen Blagojevic 

> Carl,
>
> I don't have experience with that big databases, but I did both solutions,
> and here are pros of both of them:
>
> 1. Files stored on the filesystem:
> - Small database footprint
> - Faster backup, export and import
>
> 2. Files stored in the database
> - RDBMS takes care of transactions and ref. int.
> - Slower backup, export and import but all done in one step
> - Easier continuous archiving
>
> I slightly prefer option no. 2, since transaction handling, rollback and
> ref. integrity is not so easy to implement when you have two different
> storage systems (FS and RDB).
>
> As for indexes and tables it is not clear form your message whether you
> need just a regular search (field LIKE 'something'), full text search of
> metadata, or full text search of scanned documents (in case they are OCRed).
>
> Regards,
> Ognjen
>
>
>
> On 1.6.2011 10:08, Carl von Clausewitz wrote:
>
>> Hello Everyone,
>>
>> I got a new project, with 100 user in Europe. In this case, I need to
>> handle production and sales processes an its documentations in
>> PostgreSQL with PHP. The load of the sales process is negligible, but
>> every user produces 2 transaction in the production process, with 10-30
>> scanned documents (each are 400kb - 800kb), and 30-50 high resolution
>> pictures (each are 3-8 MB), and they wanted to upload it to 'somewhere'.
>> 'Somewhere' could be the server files system, and a link in
>> the PostgreSQL database for the location of the files (with some
>> metadata), or it could be the PostgreSQL database.
>>
>> My question is that: what is your opinion about to store the scanned
>> documentation and the pictures in the database? This is a huge amount of
>> data (between daily 188MB and 800MB data, average year is about 1 TB
>> data), but is must be searchable, and any document must be retrieved
>> within 1 hour. Every documentations must be stored for up to 5 years...
>> It means the database could be about 6-7 TB large after 5 years, and
>> then we can start to archive documents. Any other data size is negligible.
>>
>> If you suggest, to store all of the data in PostgreSQL, what is your
>> recommendation about table, index structure, clustering, archiving?
>>
>> Thank you in advance!
>> Regards,
>> Carl
>>
>
>


Re: [GENERAL] Need suggestion

2011-06-02 Thread Ognjen Blagojevic

Carl,

I don't have experience with databases that big, but I have done both
solutions, and here are the pros of each:


1. Files stored on the filesystem:
- Small database footprint
- Faster backup, export and import

2. Files stored in the database
- RDBMS takes care of transactions and ref. int.
- Slower backup, export and import but all done in one step
- Easier continuous archiving

I slightly prefer option no. 2, since transaction handling, rollback and
ref. integrity are not so easy to implement when you have two different
storage systems (FS and RDB).


As for indexes and tables, it is not clear from your message whether you
need just a regular search (field LIKE 'something'), full text search of
metadata, or full text search of the scanned documents (in case they are
OCRed).


Regards,
Ognjen


On 1.6.2011 10:08, Carl von Clausewitz wrote:

Hello Everyone,

I got a new project, with 100 users in Europe. In this case, I need to
handle production and sales processes and their documentation in
PostgreSQL with PHP. The load of the sales process is negligible, but
every user produces 2 transactions in the production process, with 10-30
scanned documents (each 400kb - 800kb) and 30-50 high resolution
pictures (each 3-8 MB), and they wanted to upload it to 'somewhere'.
'Somewhere' could be the server file system, and a link in
the PostgreSQL database for the location of the files (with some
metadata), or it could be the PostgreSQL database.

My question is: what is your opinion about storing the scanned
documentation and the pictures in the database? This is a huge amount of
data (between 188MB and 800MB daily; an average year is about 1 TB of
data), but it must be searchable, and any document must be retrievable
within 1 hour. All documentation must be stored for up to 5 years...
It means the database could be about 6-7 TB large after 5 years, and
then we can start to archive documents. Any other data size is negligible.

If you suggest storing all of the data in PostgreSQL, what is your
recommendation about table and index structure, clustering, and archiving?

Thank you in advance!
Regards,
Carl





Re: [GENERAL] Question about configuration and SSD

2011-06-02 Thread Craig Ringer
On 02/06/11 16:26, Szymon Guz wrote:
> Hi,
> do we need some special configuration for SSD drives, or is that enough
> to treat those drives normally?

Make sure the SSDs have a supercapacitor or battery backup for their
write cache. If they do not, then do not use them unless you can disable
write caching completely (probably resulting in horrible performance),
because you WILL get a corrupt database when power fails.

If the SSDs have a supercap or a battery backed write cache so that they
can guarantee that all cached data will be written out if the power goes
down, you won't need any special configuration. You may want to tune
differently for best performance, though - for example, reducing
random_page_cost .
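
For example (the value is purely illustrative, and ALTER DATABASE is just one
convenient place to set it):

ALTER DATABASE mydb SET random_page_cost = 1.5;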

--
Craig Ringer



[GENERAL] Question about configuration and SSD

2011-06-02 Thread Szymon Guz
Hi,
do we need some special configuration for SSD drives, or is it enough to
treat them normally?


regards
Szymon


Re: [GENERAL] Returning from insert on view

2011-06-02 Thread Dean Rasheed
On 1 June 2011 10:32, Aleksey Chirkin  wrote:
> Hello!
>
> I need your advice.
> My problem is to ensure that the right returning from insert on the view.
>
> For example, I have two tables:
>
> CREATE TABLE country (id serial, nm text);
> CREATE TABLE city (id serial, country_id integer, nm text);
>
> And one view on table "city", which join table "county" and adds
> country_nm column.
>
> CREATE VIEW city_view AS
>  SELECT city.id, city.nm, city.country_id, country.nm AS country_nm
>    FROM city
>    JOIN country ON city.country_id = country.id;
>
> I have two options for ensuring the returning from insert operation on view:
>
> 1) Create rule:
>
> CREATE RULE ins AS ON INSERT
>  TO city_view
>  DO INSTEAD
>  INSERT INTO city (nm, country_id) VALUES (NEW.nm, NEW.country_id)
>    RETURNING id, nm, country_id,
>      (SELECT nm FROM country WHERE id = city.country_id) AS country_nm;
>
> 2) Create trigger on view (for PostgreSQL 9.1):
>
> CREATE FUNCTION city_view_insert()
>  RETURNS trigger AS
> $BODY$
> BEGIN
>  INSERT INTO city
>    ( nm, country_id )
>    VALUES ( NEW.nm, NEW.country_id )
>    RETURNING id INTO NEW.id;
>
>  SELECT * INTO NEW FROM city_view WHERE id = NEW.id;
>
>  RETURN NEW;
> END;
> $BODY$
> LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;
>
> CREATE TRIGGER on_insert INSTEAD OF INSERT  ON city_view
>  FOR EACH ROW
>  EXECUTE PROCEDURE city_view_insert();
>
> It looks like a trick, and slows the insert, but it looks better and
> avoids the complexities in returning.
>
> Perhaps there is another way (may be a trick) to ensure the returning
> from the insert on the view, without a manual compilation of the
> returning columns?
>

Selecting from the view at the end of the trigger will be slower, so
if performance is a factor it would be better to just select the
required columns from the underlying table, but I'm not aware of any
trick to avoid listing the columns.
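
i.e. something like this inside the trigger function, reusing the columns
from the example above (sketch only):

INSERT INTO city (nm, country_id)
    VALUES (NEW.nm, NEW.country_id)
    RETURNING id INTO NEW.id;
SELECT nm INTO NEW.country_nm FROM country WHERE id = NEW.country_id;
RETURN NEW;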

The trigger has greater scope for flexibility and validation of the
input data - maybe country names are supplied, which the trigger could
validate and get the corresponding ids. Maybe both are supplied, and
the trigger could check they are consistent, etc...

For bulk operations the rule should out-perform the trigger, since it
is just a query rewrite (like a macro definition). However, there are
a lot more gotchas when it comes to writing rules. So the main
advantages of the trigger are that it is less error-prone, and it is
easier to write complex logic in a procedural language.

Regards,
Dean



> Regards,
> Aleksey
>



Re: [GENERAL] DBD::PG and long running queries and tcp stack timeout

2011-06-02 Thread Clemens Schwaighofer
Yeah, there is a NAT firewall in between. I can check there too.

The interesting thing is that if I set tcp_keepalive_time higher it
won't time out. But still ... a bit strange.
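
For reference, the knobs on the Postgres side are the tcp_keepalives_*
settings, which a client session can set for its own connection (values
below are only an illustration; 0 means "use the OS default"):

SET tcp_keepalives_idle = 60;
SET tcp_keepalives_interval = 10;
SET tcp_keepalives_count = 5;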

2011/6/2 John R Pierce :
> On 06/01/11 11:35 PM, Clemens Schwaighofer wrote:
>>
>> Hi,
>>
>> I have a script that runs a query on a remote server. The query will
>> take quite some time to finish.
>> Now the problem is that the tcp stack will timeout before the query is
>> finished.
>
>
> is there a NAT firewall or something else in the middle thats doing
> connection tracking?    tcp shouldn't time out like that even if your query
> is taking multiple hours.
>
>
> --
> john r pierce                            N 37, W 123
> santa cruz ca                         mid-left coast
>
>



-- 
★ Clemens 呉 Schwaighofer
★ IT Engineer/Web Producer/Planning
★ E-Graphics Communications SP Digital
★ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN
★ Tel: +81-(0)3-3545-7706
★ Fax: +81-(0)3-3545-7343
★ http://www.e-gra.co.jp



Re: [GENERAL] DBD::PG and long running queries and tcp stack timeout

2011-06-02 Thread John R Pierce

On 06/01/11 11:35 PM, Clemens Schwaighofer wrote:

Hi,

I have a script that runs a query on a remote server. The query will
take quite some time to finish.
Now the problem is that the tcp stack will timeout before the query is finished.



Is there a NAT firewall or something else in the middle that's doing
connection tracking?  TCP shouldn't time out like that even if your
query is taking multiple hours.



--
john r pierce                            N 37, W 123
santa cruz ca                         mid-left coast




[GENERAL] DBD::PG and long running queries and tcp stack timeout

2011-06-02 Thread Clemens Schwaighofer
Hi,

I have a script that runs a query on a remote server. The query will
take quite some time to finish.
Now the problem is that the TCP connection times out before the query is
finished.

I am running the query as async and have a loop where I query the
pg_ready status every 5 seconds.

Is there any way to send some NOOP or anything so this connection does
not time out? Just reading pg_ready seems to do nothing, probably
because it only gets filled once the query is done.

Running the query as non-async has the same timeout issues.

-- 
★ Clemens 呉 Schwaighofer
★ IT Engineer/Web Producer/Planning
★ E-Graphics Communications SP Digital
★ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN
★ Tel: +81-(0)3-3545-7706
★ Fax: +81-(0)3-3545-7343
★ http://www.e-gra.co.jp
