[GENERAL] DBD::PG and long running queries and tcp stack timeout

2011-06-02 Thread Clemens Schwaighofer
Hi,

I have a script that runs a query on a remote server. The query will
take quite some time to finish.
Now the problem is that the TCP connection times out before the query is finished.

I am running the query asynchronously and have a loop where I check the
pg_ready status every 5 seconds.

Is there any way to send some NOOP or anything so this connection does
not time out? Just reading pg_ready seems to do nothing, probably
because it only gets filled in once the query is done.

Running the query without async has the same timeout issues.
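For reference, a minimal DBD::Pg sketch of the submit-async-then-poll pattern
described above (connection details, table and query are made up):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use DBD::Pg qw(:async);

# connection details, table and query are illustrative only
my $dbh = DBI->connect('dbi:Pg:dbname=mydb;host=remote.example.com',
                       'user', 'secret', { RaiseError => 1, AutoCommit => 1 });

# submit the long-running query without blocking
my $sth = $dbh->prepare('SELECT count(*) FROM big_table',
                        { pg_async => PG_ASYNC });
$sth->execute();

# poll every 5 seconds until the backend reports the result is ready
while (!$dbh->pg_ready) {
    sleep 5;
}
$dbh->pg_result;                     # collect the completed result
my ($count) = $sth->fetchrow_array;
print "rows: $count\n";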

-- 
★ Clemens 呉 Schwaighofer
★ IT Engineer/Web Producer/Planning
★ E-Graphics Communications SP Digital
★ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN
★ Tel: +81-(0)3-3545-7706
★ Fax: +81-(0)3-3545-7343
★ http://www.e-gra.co.jp

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] DBD::PG and long running queries and tcp stack timeout

2011-06-02 Thread John R Pierce

On 06/01/11 11:35 PM, Clemens Schwaighofer wrote:

Hi,

I have a script that runs a query on a remote server. The query will
take quite some time to finish.
Now the problem is that the tcp stack will timeout before the query is finished.



is there a NAT firewall or something else in the middle that's doing
connection tracking?    tcp shouldn't time out like that even if your
query is taking multiple hours.



--
john r pierce                            N 37, W 123
santa cruz ca                         mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] DBD::PG and long running queries and tcp stack timeout

2011-06-02 Thread Clemens Schwaighofer
yeah, there is a NAT firewall in between. I can check there too.

The interesting thing is that if I set tcp_keepalive_time higher it
won't time out. But still ... a bit strange.
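If the client links against libpq 9.0 or later, the keepalive knobs can also
be set per connection instead of via the system-wide tcp_keepalive_time. A
sketch (host/database/credentials are made up, and support for these
parameters should be verified against the libpq version in use):

my $dbh = DBI->connect(
    'dbi:Pg:dbname=mydb;host=remote.example.com;'
  . 'keepalives=1;keepalives_idle=120;keepalives_interval=30;keepalives_count=5',
    'user', 'secret', { RaiseError => 1 }
);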

2011/6/2 John R Pierce pie...@hogranch.com:
 On 06/01/11 11:35 PM, Clemens Schwaighofer wrote:

 Hi,

 I have a script that runs a query on a remote server. The query will
 take quite some time to finish.
 Now the problem is that the tcp stack will timeout before the query is
 finished.


 is there a NAT firewall or something else in the middle thats doing
 connection tracking?    tcp shouldn't time out like that even if your query
 is taking multiple hours.


 --
 john r pierce                            N 37, W 123
 santa cruz ca                         mid-left coast


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




-- 
★ Clemens 呉 Schwaighofer
★ IT Engineer/Web Producer/Planning
★ E-Graphics Communications SP Digital
★ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN
★ Tel: +81-(0)3-3545-7706
★ Fax: +81-(0)3-3545-7343
★ http://www.e-gra.co.jp

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Returning from insert on view

2011-06-02 Thread Dean Rasheed
On 1 June 2011 10:32, Aleksey Chirkin a4ir...@gmail.com wrote:
 Hello!

 I need your advice.
 My problem is ensuring the correct RETURNING result from an INSERT on the view.

 For example, I have two tables:

 CREATE TABLE country (id serial, nm text);
 CREATE TABLE city (id serial, country_id integer, nm text);

 And one view on table city, which joins table country and adds a
 country_nm column.

 CREATE VIEW city_view AS
  SELECT city.id, city.nm, city.country_id, country.nm AS country_nm
    FROM city
    JOIN country ON city.country_id = country.id;

 I have two options for ensuring the returning from insert operation on view:

 1) Create rule:

 CREATE RULE ins AS ON INSERT
  TO city_view
  DO INSTEAD
  INSERT INTO city (nm, country_id) VALUES (NEW.nm, NEW.country_id)
    RETURNING id, nm, country_id,
      (SELECT nm FROM country WHERE id = city.country_id) AS country_nm;

 2) Create trigger on view (for PostgreSQL 9.1):

 CREATE FUNCTION city_view_insert()
  RETURNS trigger AS
 $BODY$
 BEGIN
  INSERT INTO city
    ( nm, country_id )
    VALUES ( NEW.nm, NEW.country_id )
    RETURNING id INTO NEW.id;

  SELECT * INTO NEW FROM city_view WHERE id = NEW.id;

  RETURN NEW;
 END;
 $BODY$
 LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;

 CREATE TRIGGER on_insert INSTEAD OF INSERT  ON city_view
  FOR EACH ROW
  EXECUTE PROCEDURE city_view_insert();

 It looks like a trick, and slows the insert, but it reads better and
 avoids the complexity in the RETURNING clause.

 Perhaps there is another way (maybe a trick) to get the right RETURNING
 result from the insert on the view, without manually spelling out the
 returned columns?


Selecting from the view at the end of the trigger will be slower, so
if performance is a factor it would be better to just select the
required columns from the underlying table, but I'm not aware of any
trick to avoid listing the columns.
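A sketch of that variant, filling in the extra view column directly from the
underlying tables rather than re-reading city_view (one possible shape, not
tested against the schema above):

CREATE OR REPLACE FUNCTION city_view_insert()
  RETURNS trigger AS
$BODY$
BEGIN
  INSERT INTO city (nm, country_id)
    VALUES (NEW.nm, NEW.country_id)
    RETURNING id INTO NEW.id;

  -- populate the joined column straight from the base table
  SELECT nm INTO NEW.country_nm FROM country WHERE id = NEW.country_id;

  RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;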

The trigger has greater scope for flexibility and validation of the
input data - maybe country names are supplied, which the trigger could
validate and get the corresponding ids. Maybe both are supplied, and
the trigger could check they are consistent, etc...

For bulk operations the rule should out-perform the trigger, since it
is just a query rewrite (like a macro definition). However, there are
a lot more gotchas when it comes to writing rules. So the main
advantages of the trigger are that it is less error-prone, and it is
easier to write complex logic in a procedural language.

Regards,
Dean



 Regards,
 Aleksey

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Question about configuration and SSD

2011-06-02 Thread Szymon Guz
Hi,
do we need any special configuration for SSD drives, or is it enough to
treat them like normal drives?


regards
Szymon


Re: [GENERAL] Question about configuration and SSD

2011-06-02 Thread Craig Ringer
On 02/06/11 16:26, Szymon Guz wrote:
 Hi,
 do we need some special configuration for SSD drives, or is that enough
 to treat those drives normally?

Make sure the SSDs have a supercapacitor or battery backup for their
write cache. If they do not, then do not use them unless you can disable
write caching completely (probably resulting in horrible performance),
because you WILL get a corrupt database when power fails.

If the SSDs have a supercap or a battery backed write cache so that they
can guarantee that all cached data will be written out if the power goes
down, you won't need any special configuration. You may want to tune
differently for best performance, though - for example, reducing
random_page_cost.
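As an illustration only (the value below is a common starting point, not a
recommendation; benchmark on the actual hardware):

# postgresql.conf
random_page_cost = 2.0    # default is 4.0; random reads are much cheaper on SSDs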

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need suggestion

2011-06-02 Thread Ognjen Blagojevic

Carl,

I don't have experience with databases that big, but I have done both
solutions, and here are the pros of each:


1. Files stored on the filesystem:
- Small database footprint
- Faster backup, export and import

2. Files stored in the database
- RDBMS takes care of transactions and ref. int.
- Slower backup, export and import but all done in one step
- Easier continuous archiving

I slightly prefer option no. 2, since transaction handling, rollback and 
ref. integrity is not so easy to implement when you have two different 
storage systems (FS and RDB).


As for indexes and tables, it is not clear from your message whether you
need just a regular search (field LIKE 'something'), full text search of
metadata, or full text search of the scanned documents (in case they are OCRed).


Regards,
Ognjen


On 1.6.2011 10:08, Carl von Clausewitz wrote:

Hello Everyone,

I got a new project with 100 users in Europe. In this case, I need to
handle production and sales processes and their documentation in
PostgreSQL with PHP. The load of the sales process is negligible, but
every user produces 2 transactions in the production process, with 10-30
scanned documents (each 400 kB - 800 kB) and 30-50 high resolution
pictures (each 3-8 MB), and they want to upload them to 'somewhere'.
'Somewhere' could be the server file system, with a link in
the PostgreSQL database to the location of the files (plus some
metadata), or it could be the PostgreSQL database itself.

My question is: what is your opinion about storing the scanned
documentation and the pictures in the database? This is a huge amount of
data (between 188 MB and 800 MB daily, roughly 1 TB per year),
but it must be searchable, and any document must be retrievable
within 1 hour. All documentation must be stored for up to 5 years...
That means the database could be about 6-7 TB after 5 years, and
then we can start to archive documents. All other data sizes are negligible.

If you suggest storing all of the data in PostgreSQL, what is your
recommendation regarding table and index structure, clustering, and archiving?

Thank you in advance!
Regards,
Carl



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need suggestion

2011-06-02 Thread Carl von Clausewitz
Dear Ognjen,

thank you - that was my idea too, but I've never seen a workload like
this. The documents (which are hopefully not processed by any OCR) and the
pictures are of course not indexed, just some metadata related
to the exact document or picture. For example:

productions_docu1:
-sent date
-recieved date
-type
-owner
-case_id
-etc

image_001:
-picturetaken date
-case_id
-image_type

Only this metadata needs to be searched. My question about the structure
is this: do you recommend storing the images and the documents in the
same table (CREATE TABLE docu_img_store (id BIGSERIAL, case_id
BIGINT, content_type INTEGER, content bytea)), or storing them in two different
tables? Are there any special settings at table creation time that I should
use for optimal operation (like indexes, storage parameters, TOAST, etc.)?

(:-) I know this project could be high-value revenue for any DB
consultancy, but this is a small country with small project
fees, and I'm an employee, not a contractor, at my company :-)

Thank you in advance,
Regards,
Carl


2011/6/2 Ognjen Blagojevic ognjen.d.blagoje...@gmail.com

 Carl,

 I don't have experience with that big databases, but I did both solutions,
 and here are pros of both of them:

 1. Files stored on the filesystem:
 - Small database footprint
 - Faster backup, export and import

 2. Files stored in the database
 - RDBMS takes care of transactions and ref. int.
 - Slower backup, export and import but all done in one step
 - Easier continuous archiving

 I slightly prefer option no. 2, since transaction handling, rollback and
 ref. integrity is not so easy to implement when you have two different
 storage systems (FS and RDB).

 As for indexes and tables it is not clear form your message whether you
 need just a regular search (field LIKE 'something'), full text search of
 metadata, or full text search of scanned documents (in case they are OCRed).

 Regards,
 Ognjen



 On 1.6.2011 10:08, Carl von Clausewitz wrote:

 Hello Everyone,

 I got a new project, with 100 user in Europe. In this case, I need to
 handle production and sales processes an its documentations in
 PostgreSQL with PHP. The load of the sales process is negligible, but
 every user produces 2 transaction in the production process, with 10-30
 scanned documents (each are 400kb - 800kb), and 30-50 high resolution
 pictures (each are 3-8 MB), and they wanted to upload it to 'somewhere'.
 'Somewhere' could be the server files system, and a link in
 the PostgreSQL database for the location of the files (with some
 metadata), or it could be the PostgreSQL database.

 My question is that: what is your opinion about to store the scanned
 documentation and the pictures in the database? This is a huge amount of
 data (between daily 188MB and 800MB data, average year is about 1 TB
 data), but is must be searchable, and any document must be retrieved
 within 1 hour. Every documentations must be stored for up to 5 years...
 It means the database could be about 6-7 TB large after 5 years, and
 then we can start to archive documents. Any other data size is negligible.

 If you suggest, to store all of the data in PostgreSQL, what is your
 recommendation about table, index structure, clustering, archiving?

 Thank you in advance!
 Regards,
 Carl



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Table with active and historical data

2011-06-02 Thread salah jubeh
Hello,

I think one good thing to do is partitioning; you have already mentioned that
in your mail:
http://www.postgresql.org/docs/current/static/ddl-partitioning.html

Also try running the VACUUM command; it might help increase performance.

Create a boolean flag, i.e. active, and create an index on it. I think this will
be faster than having an index on the date field.
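A minimal sketch of that idea (the table name is made up; active_date is the
column from the original post), with a partial-index variant:

ALTER TABLE mytable ADD COLUMN active boolean NOT NULL DEFAULT false;
UPDATE mytable SET active = true WHERE active_date = current_date;
CREATE INDEX mytable_active_idx ON mytable (active);

-- partial-index variant: index only the ~1% of rows that are active
CREATE INDEX mytable_active_date_idx ON mytable (active_date) WHERE active;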

Regards
 





From: Robert James srobertja...@gmail.com
To: Postgres General pgsql-general@postgresql.org
Cc: srobertja...@gmail.com
Sent: Thu, June 2, 2011 1:30:11 AM
Subject: [GENERAL] Table with active and historical data

I have a table with a little active data and a lot of historical data.
I'd like to be able to access the active data very quickly - quicker
than an index.  Here are the details:

1. Table has about 1 million records
2. Has a column active_date - on a given date, only about 1% are
active.  active_date is indexed and clustered on.
3. Many of my queries are WHERE active_date = today.  Postgres uses
the index for these, but they still take quite a lot of time.  I repeat
these queries regularly.
4. I'd like to somehow presort or partition the data so that Postgres
doesn't have to do an index scan each time.  I'm not sure how to do
this - ideas?  I know it can be done with inheritance and triggers (
http://stackoverflow.com/questions/994882/what-is-a-good-way-to-horizontal-shard-in-postgresql

), but that method looks a little too complex for me.  I'm looking for
something simple.
5. Finally, I should point out that I still do a large number of
queries on historical data as well.

What do you recommend? Ideas? Also: Why doesn't cluster on active_date
solve the problem? Specifically, if I run SELECT * FROM full_table
WHERE active_date = today, I get a cost of 3500.  If I first select
those records into a new table, and then do SELECT * on the new table,
I get a cost of 64.  Why is that? Why doesn't clustering pregroup
them?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] using jboss with ident auth

2011-06-02 Thread eyal edri
Hi,

I'm trying to use postgres with ident auth and jboss.

here's my postgres files:

pg_hba.conf:

# TYPE  DATABASEUSERCIDR-ADDRESS  METHOD

# local is for Unix domain socket connections only
local   all all  ident map=vmap
# IPv4 local connections:
hostall all 127.0.0.1/32  ident map=vmap
# IPv6 local connections:
hostall all ::1/128   ident map=vmap


pg_ident.conf:

# MAPNAME       SYSTEM-USERNAME     PG-USERNAME
vmap            postgres            postgres
vmap            root                postgres

I've set the postgres password to 'postgres' and updated the postgres-ds.xml
file under JBoss:

<datasources>
  <local-tx-datasource>
    <jndi-name>DataSource</jndi-name>
    <connection-url>jdbc:postgresql://localhost:5432/dbname</connection-url>
    <driver-class>org.postgresql.Driver</driver-class>
    <user-name>postgres</user-name>
    <password>postgres</password>
    <max-pool-size>100</max-pool-size>
    <check-valid-connection-sql>select 1</check-valid-connection-sql>
  </local-tx-datasource>
</datasources>

When I try to run any command that uses the XML file for auth, it fails
with:

FATAL: Ident authentication failed for user postgres

When I use 'password' or 'md5' it works.


By the way, I would have used password or md5 if there were a way to run psql
commands with a password unattended (without a password prompt).
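For the unattended case, a ~/.pgpass file avoids the prompt when using
password or md5 auth. A minimal sketch (host, database and credentials here
just mirror the datasource above):

# ~/.pgpass  --  format: hostname:port:database:username:password
localhost:5432:dbname:postgres:postgres

$ chmod 600 ~/.pgpass          # libpq ignores the file if it is group/world readable
$ psql -h localhost -U postgres dbname -c 'select 1'   # no password prompt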

any ideas?

thanks!!


Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread Thomas Harold

On 5/25/2011 3:42 PM, akp geek wrote:

Dear all -

 I would like to know if any one has migrated database from
MS access to Postgres . We use postgres 9.0.2 on solaris . Are there any
open source tools that you have used to do this task. Can you please
share your experiences ?



I rolled my own.

If the number of rows in the MDB table is not that many (under 100k), 
then I'll create a new table up on pgsql, link to it with the ODBC 
driver, and append from the source table to the pgsql table.  You can 
get away with larger appends if both systems are on the same network.


If it was a table with a few million rows, then I wrote a little VBA
snippet that created a pg_dump-compatible SQL text file from the source
data.  To figure out the format, I just pg_dump'd an existing table from
PostgreSQL, then patterned my SQL file after it.  While it was extremely
fast at doing the conversion (both generating the SQL file and the time
it took to load the SQL file), I only recommend that
method for cases where you have millions and millions of rows.  Or a lot
of identical tables.


(The VBA module was about 100-150 lines of code in total.)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread akp geek
Thanks so much. I was using Bullzip. What I felt with Bullzip was that it is
good for smaller amounts of data.  I have 2 tables, each of which has 2.5
million records.  For me it is taking forever; the job that I set up has been
running for 12 hours.

I would appreciate it if you could share the VBA code that you mentioned. I
would love to use it to make the data transfer faster.

Regards



On Thu, Jun 2, 2011 at 9:32 AM, Thomas Harold thomas-li...@nybeta.com wrote:

 On 5/25/2011 3:42 PM, akp geek wrote:

 Dear all -

 I would like to know if any one has migrated database from
 MS access to Postgres . We use postgres 9.0.2 on solaris . Are there any
 open source tools that you have used to do this task. Can you please
 share your experiences ?


 I rolled my own.

 If the number of rows in the MDB table is not that many (under 100k), then
 I'll create a new table up on pgsql, link to it with the ODBC driver, and
 append from the source table to the pgsql table.  You can get away with
 larger appends if both systems are on the same network.

 If it was a table with a few million rows, then I wrote a little VBA
 snippet that created a pgdump compatible SQL text file from the source data.
  To figure out the format, I just pgdump'd an existing table from
 PostgreSQL, then patterned my SQL file after it.  While it was extremely
 fast at doing the conversion (both generating the SQL file and the time it
 took for pgdump to process the SQL file), I only recommend that method for
 cases where you have millions and millions of rows.  Or a lot of identical
 tables.

 (The VBA module was about 100-150 lines of code in total.)



Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread Vick Khera
On Thu, Jun 2, 2011 at 10:01 AM, akp geek akpg...@gmail.com wrote:
 Thanks so much . I was using bullzip What I felt with Bullzip was it is good
 for less amount of data.  I have 2 tables each of which has 2.5 million
 records.  For me it is taking for ever, The job that I set up has been
 running since 12 hours.

Export to a CSV or tab-delimited file, then suck it in with a COPY
statement in postgres.  Just make sure that there is no invalid data,
like fake dates.  2.5 million rows should take a couple of minutes
tops to insert on modern server hardware.
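For illustration, the two usual forms (table name and file path are made up;
the first reads the file on the database server, the psql \copy form reads
it on the client):

COPY mytable FROM '/path/to/export.csv' WITH CSV;

\copy mytable from '/path/to/export.csv' with csv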

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need suggestion

2011-06-02 Thread Ognjen Blagojevic

Carl,

Please keep in mind I am not a Postgres expert nor consultant, I'm just 
sharing my experience. I would also like to hear the opinion of other 
people who worked on projects with similar database sizes.


I would keep all files in a single table -- most probably they will be
served to the user by the same code (e.g. a file download servlet or
something similar), so it is good if all relevant info is in one table,
something like:


file {
 id
 mime_type
 name
 content bytea
 creation_date datetime
 modification_date datetime
 creation_user
 modification_user
}

Since both image and document inherit from file, you may choose any of the
common RDB inheritance modeling strategies (one table per hierarchy, one
table per class...), but since there are just a few fields, I would put
everything in the same table.
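One possible translation of that sketch into DDL (types, constraints and
defaults are illustrative, not from the original mail):

CREATE TABLE file (
    id                 bigserial PRIMARY KEY,
    mime_type          text NOT NULL,
    name               text NOT NULL,
    content            bytea NOT NULL,
    creation_date      timestamp NOT NULL DEFAULT now(),
    modification_date  timestamp,
    creation_user      text,
    modification_user  text
);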


Consider the cardinality between cases and files/users. Can one file be
related to two cases, and so on...


The TOAST table will be split into 1 GB pieces.

Create indexes considering ways your users will browse or search data.

Regards,
Ognjen


On 2.6.2011 12:22, Carl von Clausewitz wrote:

Dear Ognjen,

thank you - that was my idea too, but I've never seen such a workload
like this. The docu's (which are not processed by any ocr hopefully) and
the pictures are not indexed off course, just some metadatas, which are
related to the exact docu, or pic For example:

productions_docu1:
-sent date
-recieved date
-type
-owner
-case_id
-etc

image_001:
-picturetaken date
-case_id
-image_type

Just these metadatas need to be searched. My questions about the
structure was like this: do you recommend, to store the images and the
docu's in a same table (CREATE TABLE docu_img_store (id
BIGSERIAL, case_id BIGINT, content_type INTEGER, content bytea), or
store it in two different tables? Is there any special settings while
table creations, that I have to set for optimal work (like index,
storage parameter, toast, etc).

(:-) I know, that this project could be a high value revenue for any DB
consultancy related company, but this is a small country, with small
project fees, and I'm employee, not a contractor at my company :-)

Thanks you in advance,
Regards,
Carl


2011/6/2 Ognjen Blagojevic ognjen.d.blagoje...@gmail.com
mailto:ognjen.d.blagoje...@gmail.com

Carl,

I don't have experience with that big databases, but I did both
solutions, and here are pros of both of them:

1. Files stored on the filesystem:
- Small database footprint
- Faster backup, export and import

2. Files stored in the database
- RDBMS takes care of transactions and ref. int.
- Slower backup, export and import but all done in one step
- Easier continuous archiving

I slightly prefer option no. 2, since transaction handling, rollback
and ref. integrity is not so easy to implement when you have two
different storage systems (FS and RDB).

As for indexes and tables it is not clear form your message whether
you need just a regular search (field LIKE 'something'), full text
search of metadata, or full text search of scanned documents (in
case they are OCRed).

Regards,
Ognjen



On 1.6.2011 10:08, Carl von Clausewitz wrote:

Hello Everyone,

I got a new project, with 100 user in Europe. In this case, I
need to
handle production and sales processes an its documentations in
PostgreSQL with PHP. The load of the sales process is
negligible, but
every user produces 2 transaction in the production process,
with 10-30
scanned documents (each are 400kb - 800kb), and 30-50 high
resolution
pictures (each are 3-8 MB), and they wanted to upload it to
'somewhere'.
'Somewhere' could be the server files system, and a link in
the PostgreSQL database for the location of the files (with some
metadata), or it could be the PostgreSQL database.

My question is that: what is your opinion about to store the scanned
documentation and the pictures in the database? This is a huge
amount of
data (between daily 188MB and 800MB data, average year is about 1 TB
data), but is must be searchable, and any document must be retrieved
within 1 hour. Every documentations must be stored for up to 5
years...
It means the database could be about 6-7 TB large after 5 years, and
then we can start to archive documents. Any other data size is
negligible.

If you suggest, to store all of the data in PostgreSQL, what is your
recommendation about table, index structure, clustering, archiving?

Thank you in advance!
Regards,
Carl



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
mailto:pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general






Re: [GENERAL] Need suggestion

2011-06-02 Thread Ben Chobot
On Jun 1, 2011, at 1:08 AM, Carl von Clausewitz wrote:

 Hello Everyone,
 
 I got a new project, with 100 user in Europe. In this case, I need to handle 
 production and sales processes an its documentations in PostgreSQL with PHP. 
 The load of the sales process is negligible, but every user produces 2 
 transaction in the production process, with 10-30 scanned documents (each are 
 400kb - 800kb), and 30-50 high resolution pictures (each are 3-8 MB), and 
 they wanted to upload it to 'somewhere'. 'Somewhere' could be the server 
 files system, and a link in the PostgreSQL database for the location of the 
 files (with some metadata), or it could be the PostgreSQL database. 
 
 My question is that: what is your opinion about to store the scanned 
 documentation and the pictures in the database? This is a huge amount of data 
 (between daily 188MB and 800MB data, average year is about 1 TB data), but is 
 must be searchable, and any document must be retrieved within 1 hour. Every 
 documentations must be stored for up to 5 years... It means the database 
 could be about 6-7 TB large after 5 years, and then we can start to archive 
 documents. Any other data size is negligible. 
 
 If you suggest, to store all of the data in PostgreSQL, what is your 
 recommendation about table, index structure, clustering, archiving? 

So, you're mostly storing ~1TB of images/year? That doesn't seem so bad. How 
will the documents be searched? Will their contents be OCR'd out and put into a 
full text search? How many searches will be going on?

If you're asking whether or not it makes sense to store 7TB of images in the 
database, as opposed to storing links to those images and keeping the images 
themselves on a normal filesystem, there's no clear answer. Check the archives 
for pros and cons of each method.
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread Vick Khera
On Thu, Jun 2, 2011 at 12:01 PM, akp geek akpg...@gmail.com wrote:
 The only problem I am seeing with dates as you mentioned. when I export the
 data to csv the date is getting the format of 8/1/1955 0:00:00 , but
 postgres not accepting that. Any clues?

Should work:


test=> select '8/1/1955 0:00:00'::date;
    date
------------
 1955-08-01
(1 row)

Time: 0.325 ms
test=> select '8/1/1955 0:00:00'::timestamp;
      timestamp
---------------------
 1955-08-01 00:00:00
(1 row)


at worst I guess you run the export file thru an editing filter that
removes the '0:00:00' off the date column.
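One more thing worth checking before a bulk load: how an ambiguous date like
8/1/1955 is interpreted depends on the DateStyle setting, so a quick sanity
check (illustrative session) is:

SHOW datestyle;                          -- e.g. "ISO, MDY"
SET datestyle = 'ISO, MDY';              -- read 8/1/1955 as August 1st
SELECT '8/1/1955 0:00:00'::timestamp;    -- 1955-08-01 00:00:00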

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres 8.3.5 - ECPG and the use of descriptors and cursors in multi-threaded programs

2011-06-02 Thread Merlin Moncure
On Tue, May 31, 2011 at 7:35 PM, Bosco Rama postg...@boscorama.com wrote:
 Unfortunately, like you, I am just a user of this wonderful DB.  Since
 we are not seeing any other input here on the 'general' list it may be
 time to move this thread to the pgsql-interfaces list.  Are you subscribed
 to it?  It is a very low bandwidth list but it does tend to highlight the
 interface issues distinct from the general DB discussions.

hm, iirc pg-interfaces is deprecated.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Access to postgres conversion

2011-06-02 Thread akp geek
The only problem I am seeing is with dates, as you mentioned. When I export the
data to CSV the dates come out in the format 8/1/1955 0:00:00, but
Postgres is not accepting that. Any clues?

Regards

On Thu, Jun 2, 2011 at 11:23 AM, Vick Khera vi...@khera.org wrote:

 On Thu, Jun 2, 2011 at 10:01 AM, akp geek akpg...@gmail.com wrote:
  Thanks so much . I was using bullzip What I felt with Bullzip was it is
 good
  for less amount of data.  I have 2 tables each of which has 2.5 million
  records.  For me it is taking for ever, The job that I set up has been
  running since 12 hours.

 Export to CSV or tab delimited file, then suck it in with a COPY
 statement in postgres.  Just make sure that there is no invalid data
 like fake dates.  2.5 million rows should take a couple of minutes
 tops to insert into a modern hardware server.

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Need suggestion

2011-06-02 Thread John R Pierce

On 06/02/11 2:19 AM, Ognjen Blagojevic wrote:

1. Files stored on the filesystem:
- Small database footprint
- Faster backup, export and import

2. Files stored in the database
- RDBMS takes care of transactions and ref. int.
- Slower backup, export and import but all done in one step
- Easier continuous archiving 


with many terabytes of large file data accumulating, the database will
become very unwieldy to do any maintenance on.  a simple pg_dump will
take many hours vs a few minutes.


I would almost certainly use a filesystem for an app like this, and just 
store the metadata in the database.


--
john r pierce                            N 37, W 123
santa cruz ca                         mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Passing parameters into an in-line psql invocation

2011-06-02 Thread Gauthier, Dave
Hi:

I'd like to pass a parameter into an inline psql call that itself calls an SQL
script, something like...

psql mydb -c \i thesqlscript foo

Where foo is the value I want to pass in.

Just as good would be the ability to sniff out an environment variable from
within the SQL script (thesqlscript in the example above).  In Perl, I would
use $ENV{VARNAME}.  Is there something like that in Postgres SQL?

V8.3.4 on Linux (upgrading to v9 very soon).

Thanks for any ideas !


Re: [GENERAL] Passing parameters into an in-line psql invocation

2011-06-02 Thread John R Pierce

On 06/02/11 9:58 AM, Gauthier, Dave wrote:


Hi:

I'd like to pass a parameter into an inline psql call that itself 
calls an sql script, something like...


psql mydb -c \i thesqlscript foo

Wherefoo is the value I want to pass in.



on the psql command line,
-v name=value
or
 --set name=value

then in your script, use :name if you want to interpolate the value as a sql
identifier and (in 9.x) you can use :'name' if you want to interpolate it
as a quoted string literal.
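A small end-to-end sketch of that (database, table and variable names are made
up; -f is used here instead of -c so the script is read as a file):

psql mydb -v myvar=foo -f thesqlscript

-- inside thesqlscript:
SELECT * FROM mytable WHERE mycolumn = :'myvar';  -- quoted form, the 9.x feature above
ALTER TABLE :myvar ADD COLUMN note text;          -- unquoted form, value as an identifier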




--
john r pierce                            N 37, W 123
santa cruz ca                         mid-left coast


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Passing parameters into an in-line psql invocation

2011-06-02 Thread Bosco Rama
Gauthier, Dave wrote:
 
 I'd like to pass a parameter into an inline psql call that itself
 calls an sql script, something like...
 
 psql mydb -c \i thesqlscript foo
 
 Wherefoo is the value I want to pass in.

You may want to use the --set or --variable options of psql and then
reference the variable name in thesqlscript.

So the psql becomes:
   psql --set 'var=foo' -c '\i thesqlscript'

and then in thesqlscript:
   update table set column = :var;

HTH

Bosco.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Passing parameters into an in-line psql invocation

2011-06-02 Thread Leif Biberg Kristensen
On Thursday 2. June 2011 18.58.23 Gauthier, Dave wrote:
 Hi:
 
 I'd like to pass a parameter into an inline psql call that itself calls an
 sql script, something like...
 
 psql mydb -c \i thesqlscript foo
 
 Wherefoo is the value I want to pass in.
 
 Just as good would be the ability to sniff out an environment variable from
 within the sql script (thesqlscript in the example above).  In perl, I
 would use $ENV{VARNAME}.  Is there something like that in Postgres SQL?
 
 V8.3.4 on Linux (upgrading to v9 very soon).
 
 Thanks for any ideas !

Personally I prefer to write a small wrapper in Perl for interaction with 
Postgres from the command line. Here's a boilerplate:

#! /usr/bin/perl

use strict;
use DBI;

my $val = shift;
if ((!$val) || !($val =~ /^\d+$/)) {
    print "Bad or missing parameter $val\n";
    exit;
}
my $dbh = DBI->connect("dbi:Pg:dbname=mydb", '', '',
    {AutoCommit => 1}) or die $DBI::errstr;
my $sth = $dbh->prepare("SELECT foo(?)");
while (my $text = <STDIN>) {
    chomp($text);
    $sth->execute($val);
    my $retval = $sth->fetch()->[0];
    if ($retval < 0) {
        $retval = abs($retval);
        print "Duplicate of $retval, not added.\n";
    }
    else {
        print "$retval added.\n";
    }
}
$sth->finish;
$dbh->disconnect;

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres 8.3.5 - ECPG and the use of descriptors and cursors in multi-threaded programs

2011-06-02 Thread Bosco Rama
Merlin Moncure wrote:
 On Tue, May 31, 2011 at 7:35 PM, Bosco Rama postg...@boscorama.com wrote:
 Unfortunately, like you, I am just a user of this wonderful DB.  Since
 we are not seeing any other input here on the 'general' list it may be
 time to move this thread to the pgsql-interfaces list.  Are you subscribed
 to it?  It is a very low bandwidth list but it does tend to highlight the
 interface issues distinct from the general DB discussions.
 
 hm, iirc pg-interfaces is deprecated.

There was discussion of that some time ago.  I'm not sure what the final
decision was.  I still get the occasional message on that list.  And in
the past, messages sent to that list got some sort of attention.  It
seems that ecpg gets lost in the crowd here on the general list.  I'm not
sure if this is because of the ecpg folks not being subscribed to general
(which I highly doubt since I see Tom here, though I don't see Michael) or
if it's due to the different SNR.

Bosco.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] invalid byte sequence for encoding UTF8

2011-06-02 Thread BRUSSER Michael
We are upgrading an old database (7.3.10 to 8.4.4).  This involves running
pg_dump on the old db and loading the datafile into the new db.  If this
matters, we do not use pg_restore; the dump file is just sourced with psql,
and this is where I ran into a problem:

psql: .../postgresql_archive.src/... ERROR:  invalid byte sequence for encoding
UTF8: 0xedbebf
HINT:  This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by client_encoding.

The server and client encoding are both Unicode. I think we may have some
copy/paste MS-Word markup and possibly other odd things in the old database.
All this junk is found in the 'text' fields.

I found a number of related postings, but did not see a good solution.  Some
folks suggested cleaning the datafile prior to loading, while someone else did
essentially the same thing on the database before dumping it.
I am looking for advice, hopefully the "best technique" if there is one; any
suggestion is appreciated.

Thanks,
Michael.




Re: [GENERAL] invalid byte sequence for encoding UTF8

2011-06-02 Thread Derrick Rice
That specific character sequence is a result of Unicode implementations
prior to 6.0 mixing with later implementations.  See here:

http://en.wikipedia.org/wiki/Specials_%28Unicode_block%29#Replacement_character

You could replace that sequence with the correct 0xFFFD sequence with `sed`
for example (if using a plaintext dump format).
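A rough one-liner along those lines, assuming GNU sed (which understands \xHH
escapes) and a plain-text dump; the file names are made up:

sed 's/\xed\xbe\xbf/\xef\xbf\xbd/g' old_dump.sql > fixed_dump.sql

0xEF 0xBF 0xBD is the UTF-8 encoding of U+FFFD, the replacement character.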

On Thu, Jun 2, 2011 at 4:17 PM, BRUSSER Michael michael.brus...@3ds.com wrote:

  We upgrading some old  database (7.3.10 to 8.4.4).   This involves
 running pg_dump on the old db

 and loading the datafile to the new db.  If this matters we do not use
 pg_restore, the dump file is just sourced with psql,

 and this is where I ran into problem:



 psql: .../postgresql_archive.src/... ERROR:  invalid byte sequence for
 encoding UTF8: 0xedbebf

 HINT:  This error can also happen if the byte sequence does not match the
 encoding

 expected by the server, which is controlled by client_encoding.



 The server and client encoding are both Unicode. I think we may have some
 copy/paste MS-Word markup

 and possibly other odd things  on the old database.  All this junk is found
 on the ‘text’ fields.



 I found a number of related postings, but did not see a good solution.
 Some folks suggested cleaning the datafile prior to loading,

 while someone else did essentially the same thing on the database before
 dumping it.

 I am looking for advice, hopefully the “best technique” if there is one,
   any suggestion is appreciated.



 Thanks,

 Michael.






[GENERAL] Hidden Risk w/ UPDATE Cascade and Trigger-Based Validation

2011-06-02 Thread David Johnston
Hi,

 

I am trying to get a better understanding of how the following foreign keys
with UPDATE CASCADE and a validation trigger interact.  The basic setup is a
permission table where the two permission parts share a common
group/parent which is embedded into their id/PK and which changes via the
FK cascade mechanism.  The rest of my thoughts and questions follow the setup.

 

I have the following schema (parts omitted/simplified for brevity since
everything works as expected)

 

CREATE TABLE userstorepermission (
  userid text NOT NULL FK UPDATE CASCADE,
  storeid text NOT NULL FK UPDATE CASCADE,
  PRIMARY KEY (userid, storeid)
);

FUNCTION validate() RETURNS trigger AS
  SELECT groupid FROM store WHERE storeid = [NEW.storeid] INTO storegroup
  SELECT groupid FROM user WHERE userid = [NEW.userid] INTO usergroup

  RAISE NOTICE 'Validating User Store Permission U:%;%, S:%;%', NEW.userid,
    usergroup, NEW.storeid, storegroup;

  IF (usergroup <> storegroup) THEN
    RAISE NOTICE 'Disallow';
    RETURN null;
  ELSE
    RAISE NOTICE 'Allow';
    RETURN NEW;
  END;

CREATE TRIGGER INSERT OR UPDATE EXECUTE validate();

Basically if I change the groupid both the userid and storeid values in
userstorepermission will change as well.  This is desired.  When I do update
the shared groupid the following NOTICES are raised from the validation
function above:

 

The change for groupid was TESTSGB -> TESTSGD:

NOTICE:  Validating User Store Permission U:tester@TESTSGB;NULL
S:[TESTSGD]STORE01;TESTSGD [at this point apparently both user and store
have been updated and storeid in the permission table is being changed]
CONTEXT:  SQL statement UPDATE ONLY domain.userstorepermission SET
s_id = $1 WHERE $2 OPERATOR(pg_catalog.=) s_id

NOTICE:  Allow
CONTEXT:  SQL statement UPDATE ONLY domain.userstorepermission SET
s_id = $1 WHERE $2 OPERATOR(pg_catalog.=) s_id

NOTICE:  Validating User Store Permission U:tester@TESTSGD;TESTSGD
S:[TESTSGD]STORE01;TESTSGD [and now the userid in the permission table gets
its turn]
CONTEXT:  SQL statement UPDATE ONLY domain.userstorepermission SET
u_id = $1 WHERE $2 OPERATOR(pg_catalog.=) u_id

NOTICE:  Allow
CONTEXT:  SQL statement UPDATE ONLY domain.userstorepermission SET
u_id = $1 WHERE $2 OPERATOR(pg_catalog.=) u_id

The end result is that both values are changed as desired but the notices,
while they indirectly make sense (only one of the values can be update
cascaded at a time), are somewhat confusing and thus I am not sure if I am
possibly missing something that could eventually blow up in my face.  I
expect other similar situations will present themselves in my model so I
want to get more understanding on at least whether what I am doing is safe
and ideally whether the CASCADE rules possibly relax intra-process
enforcement of constraints in order to allow this kind of multi-column key
update to succeed.

 

I see BUG #5505 from January of last year where Tom confirms that the
trigger will fire but never addresses the second point about the referential
integrity check NOT FAILING since the example's table_2 contains a value not
present in table_1.

 

Conceptually, as long as I consistently update ALL the relevant FKs the
initial and resulting state should remain consistent but only with a
different value.  I'll probably do some more playing with missing a FK
Update Cascade and see whether the proper failures occur, but regardless,
some thoughts and/or pointers are welcome.

 

Thanks,

 

David J.

 

 

 

 



Re: [GENERAL] Need suggestion

2011-06-02 Thread tomas

On Wed, Jun 01, 2011 at 10:08:14AM +0200, Carl von Clausewitz wrote:
 Hello Everyone,
 
 I got a new project, with 100 user in Europe. In this case, I need to handle
 production and sales processes an its documentations in PostgreSQL with PHP.

Something to consider too -- if you decide to store the big objects in
the database, that is -- is PostgreSQL's large object interface
http://www.postgresql.org/docs/9.1/static/largeobjects.html. The
problems with backup someone else mentioned in this thread would remain,
but you wouldn't have large blobs of data clobbering your regular
queries. You could pass the scans and pics piecemeal between client and
database without having to store them in the middleware (which may be an
advantage or a disadvantage, mind you).

Don't know whether PHP has bindings for that, though.
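For a flavor of the server-side helpers (the path and OID below are invented,
and both functions read/write files on the database server, so they normally
require superuser rights):

SELECT lo_import('/tmp/scan_0001.pdf');         -- returns the OID of the new large object
SELECT lo_export(16409, '/tmp/scan_0001.out');  -- writes large object 16409 back to a file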

Regards
-- tomás

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need suggestion

2011-06-02 Thread tomas

Following up on myself:

Just stumbled upon this in the hackers mailing list, which might be
interesting to you, since it highlights pros & cons of current
implementations:

  http://archives.postgresql.org/pgsql-hackers/2011-06/threads.php#00049

Regards
-- tomás

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general