[GENERAL] tsearch2, locale, UTF-8 and Windows

2007-01-28 Thread Pierre Thibaudeau

Over the past few days, I have been reading everything I could about
tsearch2, but I cannot figure out what the latest status is concerning the
default locale on a Windows UTF-8 database under PostgreSQL 8.2.

More specifically, I have a UTF-8 database containing information in five
different European languages (English, French, Spanish, German and
Italian).  I am coding on a Windows system with locale French_Canada.1252.
The server that will soon run the database will likely have locale en_US...
I am at a loss concerning the locale with which I should initdb on my
system!

What am I getting myself into?  A lot of what I read on this matter was
pretty bleak; has version 8.2 cleared all that up?  What is the latest word on
this?


Re: [GENERAL] Predicted lifespan of different PostgreSQL branches

2007-01-28 Thread Dave Page

Oisin Glynn wrote:

My 8.2c,
Having 8.1 end of life this soon after the release of 8.2 seems pretty 
harsh.  


Yeah, I agree. In part I'm basing the idea to support the current and 2 
previous branches on the amount of work required to build a complete set 
of point releases in one go - 3 seems manageable over a weekend. 4 would 
 probably be possible, but would be more rushed than I'd like.


Also, three just seems like a sensible number to maintain. I kinda like 
Magnus' idea to put older releases into a sort of 'retired' mode though, 
and build only the binaries for PostgreSQL itself.


Regards, Dave.



[GENERAL] counting query

2007-01-28 Thread garry saddington
I have a table definition such as:

CREATE TABLE attendance
(
  attendanceid serial primary key,
  entered date DEFAULT current_date NOT NULL,
  absent boolean,
  authorization text default 'N',
  timeperiod char(2) check(timeperiod in('AM','PM')),
  days varchar(10),
  studentid int,
  unique(entered,timeperiod,studentid)
)

Which is used to record school attendance data. I am now trying to write
a query to identify trends in absences by counting the days column and
returning any student that has repeated absences on certain days. I am
struggling to return anything that does not need further manipulation in
Python before being useful.
Does anyone have any ideas?




Re: [GENERAL] Predicted lifespan of different PostgreSQL branches

2007-01-28 Thread Shane Ambler

Dave Page wrote:

Oisin Glynn wrote:

My 8.2c,
Having 8.1 end of life this soon after the release of 8.2 seems pretty 
harsh.  


Yeah, I agree. In part I'm basing the idea to support the current and 2 
previous branches on the amount of work required to build a complete set 
of point releases in one go - 3 seems manageable over a weekend. 4 would 
 probably be possible, but would be more rushed than I'd like.


I don't see a problem if the updates for the current and previous 
release binaries are available on the current schedule and older release 
 update binaries are available a few days to a week later.


The other option is to extend the time allocated between the decision to 
OK the releases and making the binaries available.


For example, we could say the source release is available Friday, with 8.2.x
and 8.1.x binaries available Monday, then 8.0.x and 7.4.x binaries available
Thursday or Friday.


A lot of users may download the releases as they come out but most with 
production servers won't be installing them the same day.




--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



Re: [GENERAL] Password issue revisited

2007-01-28 Thread Shane Ambler

Michael Schmidt wrote:

Fellow PostgreSQL fans,


1.  I don't see that this would pose a major security risk.  In 

 fact, in applications where the user enters the password for each
 session, the password need never be saved to disk, which seems a
 definite security advantage.  Some folks have noted that .pgpass is
 a plain text file, hence it could be vulnerable.

Yes, it is a plain text file, but if you want to use it you need to ensure
that its permissions are sufficiently restrictive or it won't be used.


As per the manual -

 The permissions on .pgpass must disallow any access to world or
 group; achieve this by the command chmod 0600 ~/.pgpass. If the
 permissions are less strict than this, the file will be ignored.
 (The file permissions are not currently checked on Microsoft Windows,
 however.)


So this check is something that should be added to the Windows version.
Otherwise, the security of the user account that owns the .pgpass file is
what decides whether it is vulnerable.



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



Re: [GENERAL] counting query

2007-01-28 Thread Joris Dobbelsteen
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of garry 
saddington
Sent: zondag 28 januari 2007 14:06
To: pgsql-general@postgresql.org
Subject: [GENERAL] counting query

I have a table definition such as:

CREATE TABLE attendance
(
  attendanceid serial primary key,

Why do you have this? You already have (entered, timeperiod, studentid),
which you can use, since that must be unique too. Try to avoid surrogate
keys as much as possible (it really improves performance and
ease of writing for complex queries!)

  entered date DEFAULT current_date NOT NULL,
  absent boolean,
  authorization text default 'N',
  timeperiod char(2) check(timeperiod in('AM','PM')),
  days varchar(10),
  studentid int,
  unique(entered,timeperiod,studentid)
)

I'm guessing at the meaning a bit; the column names are not very self-explanatory.

Which is used to record school attendance data. I am now 
trying to write a query to identify trends in absences by 
counting the days column and returning any student that has 
repeated absences on certain days. I am struggling to return 
anything that does not need further manipulation in Python 
before being useful.
Does anyone have any ideas?

Yes, before starting you must have a well-defined idea of what you want
to know. What should the result look like? In most situations start with
simple things, and eventually combine these into something more complex,
but always know what you are querying. You should have some idea of what
results you will be getting.
Of course, this is a process of discovery rather than following set
rules. Some background in probability and statistics really helps.
Personally, I find that visualization helps me a lot with this.

Personally, I've found nothing that beats Excel for doing data
analysis. Learn to use pivot tables and pivot charts; they are
extremely powerful.
However, it can be a bit tricky to transform the input into something
the tool can use. A good starting point is to split the dates into
separate year, month, day and week values.
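
For example, a minimal sketch of such a split in plain SQL (assuming the
attendance table from the original post):

-- Break the entered date into separate components that a pivot table
-- can group on (absences only).
SELECT studentid,
       timeperiod,
       EXTRACT(YEAR  FROM entered) AS year,
       EXTRACT(MONTH FROM entered) AS month,
       EXTRACT(WEEK  FROM entered) AS week,
       EXTRACT(DAY   FROM entered) AS day,
       EXTRACT(DOW   FROM entered) AS weekday  -- 0 = Sunday .. 6 = Saturday
FROM attendance
WHERE absent = TRUE;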

Some ideas that might help get you started:
* A graph of days vs. occurrences (count).
* A graph of total sick days per student vs. occurrences.
* A graph of the occurrences per month/day/week.

Next, try filtering the data to form sequential periods and other, more
interesting things.

I believe all of this can be done with plain SQL; you don't need any
Python, PL/pgSQL or other languages.

- Joris Dobbelsteen



Re: [GENERAL] counting query

2007-01-28 Thread John Meyer
Joris Dobbelsteen wrote:

 CREATE TABLE attendance
 (
  attendanceid serial primary key,
 
 Why do you have this? You already have (entered, timeperiod, studentid),
 which you can use, since that must be unique too. Try to avoid surrogate
 keys as much as possible (it really improves performance and
 ease of writing for complex queries!)


Correct me if I am wrong, but wouldn't a simple number be a lot easier
to look up than a composite key?



Re: [GENERAL] counting query

2007-01-28 Thread Joris Dobbelsteen
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of John Meyer
Sent: zondag 28 januari 2007 15:36
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] counting query

Joris Dobbelsteen wrote:

 CREATE TABLE attendance
 (
  attendanceid serial primary key,
 
 Why do you have this? You already have (entered, timeperiod, studentid),
 which you can use, since that must be unique too. Try to avoid
 surrogate keys as much as possible (it really improves performance
 and ease of writing for complex queries!)


Correct me if I am wrong, but wouldn't a simple number be a 
lot easier to look up than a composite key?

No, it is not.
Or better: it might be, until you go just a bit larger.

I learned this when I built a production database (which is still used
in production and still performs excellently; at the time I was only 15,
16 or 17 years old, which was 7 to 10 years ago, and since then I've
learned a lot).

The trouble is, the database consists of well over 40 tables (with
surrogate keys nearly everywhere), and joining a bit of data from the far
ends of the database requires you to join around 10 to 15 tables. If you
are doing something complex you will get lost at some point and really
need graphical tools just to grasp what you are trying to query.
So even a slightly complex query easily results in an excessive number of
tables that must be queried.

Why did I do this? At the time I was not aware that you could build a
primary key consisting of multiple columns. And if you look around on
the Internet you see (nearly) all databases of free 'web applications'
making excessive use of surrogate keys.
This resulted in a lot of unnecessary surrogate keys that could have been
easily avoided and were not a required artifact of the inconsistent
data I had to import.

What would have been better without surrogate keys all-over:
* Easier to write complex queries with much fewer tables to be queried.
* Much faster query performance, as fewer tables need to be referenced.
* Better integrity enforcement with simple foreign key constraints.

In fact, in (I'd guess) 50% of the queries I could have avoided at least
two table joins!

Think big. That is why...
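
For illustration, a minimal sketch (not the poster's actual schema) of how a
dependent table can reference the composite natural key directly, so the
meaningful columns travel with every reference:

CREATE TABLE attendance_note
(
  entered    date    NOT NULL,
  timeperiod char(2) NOT NULL,
  studentid  int     NOT NULL,
  note       text,
  -- references the UNIQUE (entered, timeperiod, studentid) constraint
  FOREIGN KEY (entered, timeperiod, studentid)
    REFERENCES attendance (entered, timeperiod, studentid)
);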

- Joris Dobbelsteen



Re: [GENERAL] counting query

2007-01-28 Thread Douglas McNaught
Joris Dobbelsteen [EMAIL PROTECTED] writes:

 What would have been better without surrogate keys all-over:
 * Easier to write complex queries with much fewer tables to be queried.
 * Much faster query performance, as fewer tables need to be referenced.
 * Better integrity enforcement with simple foreign key constraints.

Not this debate again.  ;)

Surrogate vs. natural keys shouldn't make a difference in how many
tables you have--that depends on the degree of normalization.  Sounds
like you denormalized your database and happened to eliminate
surrogate keys at the same time.  Using that to say surrogate keys
are bad is kind of misleading.

-Doug



Re: [GENERAL] counting query

2007-01-28 Thread Ron Johnson

On 01/28/07 08:36, John Meyer wrote:
 Joris Dobbelsteen wrote:
 CREATE TABLE attendance
 (
  attendanceid serial primary key,
 Why do you have this? You already have (entered, timeperiod, studentid),
 which you can use, since that must be unique too. Try to avoid surrogate
 keys as much as possible (it really improves performance and
 ease of writing for complex queries!)
 
 
 Correct me if I am wrong, but wouldn't a simple number be a lot easier
 to look up than a composite key?

This is the great synthetic-vs-natural key debate.

Sure, it's easier to write queries that join on a synthetic integer
field.  However, adding 3 extra fields to a few other tables is not
onerous, and it adds useful information to the other tables, since
(entered,timeperiod,studentid) are what *really* makes a record
unique.  Also, synthetic keys mean that you have to do more joins,
since if you want to know about entered and table T_FOO, you'd
have to join attendance to T_FOO.



Re: [GENERAL] counting query

2007-01-28 Thread Ron Johnson

On 01/28/07 07:05, garry saddington wrote:
 I have a table definition such as:
 
 CREATE TABLE attendance
 (
   attendanceid serial primary key,
   entered date DEFAULT current_date NOT NULL,
   absent boolean,
   authorization text default 'N',
   timeperiod char(2) check(timeperiod in('AM','PM')),
   days varchar(10),
   studentid int,
   unique(entered,timeperiod,studentid)
 )
 
 Which is used to record school attendance data. I am now trying to write
 a query to identify trends in absences by counting the days column and
 returning any student that has repeated absences on certain days. I am
 struggling to return anything that does not need further manipulation in
 Python before being useful.
 Does anyone have any ideas?

When you say "certain days", do you mean days of the week?

If so, create a view like:
CREATE VIEW V_DAY_ABSENCES AS
SELECT ENTERED,
   AUTHORIZATION,
   TIMEPERIOD,
   DAYS,
   STUDENTID,
   EXTRACT(DOW FROM ENTERED) AS WEEKDAY
FROM ATTENDANCE
WHERE ABSENT = TRUE;

Then, this query should do what you want:
SELECT STUDENTID,
   TIMEPERIOD,
   WEEKDAY,
   COUNT(*)
FROM V_DAY_ABSENCES
GROUP BY STUDENTID,
 TIMEPERIOD,
 WEEKDAY
HAVING COUNT(*) > 3;









Re: [GENERAL] Limit on number of users in postgresql?

2007-01-28 Thread Furface
Thanks Tom.  You know I thought about this approach a little more.  I 
don't think there's a simple answer to this security problem short of 
placing a proxy server application between the clients and the 
database.  The problem with giving database role accounts to each and 
every user is that the users now have uncontrolled access to the 
database.  That's not good either because I want to do fine grained 
access control within my application.


The more I think about it, the more I think a proxy app is necessary.  
It seems like a lot of work just for security issues, but basically most 
web based database apps use this model, with the web application acting 
as a proxy between the database and the client.


Thanks.


Tom Lane wrote:

Mark Walker [EMAIL PROTECTED] writes:
  
I'm wondering if there's a manageable limit on the number of users in 
postgresql.  I could have conceivably thousands of different users for 
any particular app and multiple apps in the same server.



Numbers in the low thousands are probably workable, as long as you
aren't adding and removing users at a great rate.  I think that the
main bottleneck would be the flat file that's used to tell the
postmaster about the set of valid users --- every time a user is
added/dropped/changed, that file gets rewritten and then re-parsed
by the postmaster.  So you could eat a lot of overhead if you change
users every few seconds or something like that.

I'd suggest doing some testing to see if there are any unexpected
bottlenecks, but I don't see a reason to reject the idea out of hand.
(If you find any slownesses, report 'em, and we might be able to fix
'em.)

regards, tom lane

  





Re: [GENERAL] counting query

2007-01-28 Thread Chad Wagner

On 1/28/07, Ron Johnson [EMAIL PROTECTED] wrote:


This is the great synthetic-vs-natural key debate.



Truly.  But what the heck!

Surrogate keys are not evil, and they do have value.  I see no value in
proclaiming "surrogate keys are evil, do not use them."

Surrogate keys do have advantages:
- Disassociation of natural data from other referential tables (which can
  also be confusing).  Imagine a social security number, driver's license
  number, or any other natural key.  Now imagine that key value has changed
  for a specific person, and you have used it as a natural key throughout
  your data structures.  (And they do change.)
- Reduced storage requirements (yields better performance).  It is cheaper
  to store a 50 byte field + a 4 byte surrogate key once than it is to
  store it a million times:
    (surrogate key)   54 bytes + (4 bytes * 1 million) = 4 MB
    vs.
    (natural key)     50 bytes * 1 million = 50 MB


Natural keys are not evil either, and they have their own advantages.  But
when you're modeling very large databases (around the 10TB range) you
absolutely have to consider every single decision, and a natural key (in my
opinion) is not always a good choice, as a single natural key could result in
another 100GB of storage requirements.

There should be some thought when you are modeling, and these are some of the
things to consider.  I don't see a 10-table join being a major performance
penalty, especially when 8 of the tables may be only a few MB in size.


--
Chad
http://www.postgresqlforums.com/


[GENERAL] explain analyse much slower than actual query

2007-01-28 Thread Phil Endecott
Dear All,

I want to find all of the msg_ids from messages that are not in table 
part_tsearch, where both tables are large but the result of the 
query is normally small, and often empty.  To 'protect' the application
against the unusual case where the result of the query is large I have 
added a limit clause:

select msg_id from messages where msg_id not in (select msg_id from 
part_tsearch) limit 10;

Currently there are about 30,000 rows in each table and about 500
rows in the result of the (un-limit-ed) query.  So it is unusual in
the sense that the size of the result is relatively large and the limit
clause will take effect.  Both tables are indexed by msg_id.

This was taking longer than I expected, so I decided to 
explain-analyse it.  However, when I run it under explain-analyse, 
it takes even longer than before:

decimail= select msg_id from messages where msg_id not in (select msg_id from 
part_tsearch) limit 10;
 msg_id 

  67894
  67809
  52548
  67745
  67538
  67540
  67329
  67246
  67235
  67140
(10 rows)

(that takes about 2 seconds)

decimail= explain analyse select msg_id from messages where msg_id not in 
(select msg_id from part_tsearch) limit 10;
   QUERY PLAN   


 Limit  (cost=4301.99..10534.34 rows=10 width=4) (actual time=6677.791..72417.068 rows=10 loops=1)
   ->  Seq Scan on messages  (cost=4301.99..11966058.86 rows=19193 width=4) (actual time=6677.725..72416.427 rows=10 loops=1)
         Filter: (NOT (subplan))
         SubPlan
           ->  Materialize  (cost=4301.99..4830.07 rows=37908 width=4) (actual time=0.097..39.494 rows=862 loops=903)
                 ->  Seq Scan on part_tsearch  (cost=0.00..4115.08 rows=37908 width=4) (actual time=0.104..1679.258 rows=37901 loops=1)
 Total runtime: 72424.066 ms
(7 rows)


As I increase the limit the runtime increases as follows (times in seconds):

limit   normal runtime   explain-analyse runtime
10      2                77
20      5                309
40      12               807
80      88
160     149
320     1016

I was not patient enough to wait for the remaining explain-analyse results,
but I feel that there is a linear slowdown of about 60x between the raw 
query and the explain-analyse version.

In general, for other queries, explain-analyse reports runtimes that agree 
with the runtimes of the actual queries.

But the peculiar behaviour of explain-analyse is really a distraction from
the fact that the query is slow, especially when the limit value is large.

The system seems to be largely CPU-bound during these long run-times.

The query plan reported by explain-analyse is the same in each case.  How
many times is it actually doing the seq-scan on part_tsearch?  I see
that the rows value reported for Materialize is rather different in the
planned and actual numbers.  What is this telling me?  I analysed the
tables immediately before starting on this.

I was hoping that it would be implemented using some sort of index scan on
the two tables, maybe something involving bitmaps.  Is there something that
I can do to the query, or to my indexes, to make this happen?

I tried using EXCEPT rather than NOT IN (subquery); this isn't
exactly identical, as EXCEPT is required to return sorted results, which
I don't need; when the limit clause is in effect I don't care
which of the possible ids are returned, and I never care about the order.
In this case the runtimes are as follows:

limit   normal runtime   explain-analyse runtime
10      15               54
20      15               55
40      19               94
80      9                55
160     20               68
320     10               70

Note that again explain-analyse is slower than the normal runtime, but now
by a factor of about 5 rather than the previous factor of about 60.

Basically, the query runtimes are now essentially constant; I imagine that 
they would be flatter if the machine wasn't also running other processes.
The query plan shows that it is doing a sequential pass over each of the
tables, sorting and then doing a set-op - exactly as expected and OK
for large limits, but not for small limits.

I feel that it should be possible to do this in milliseconds, not seconds,
using the existing indexes.

This is with Postgresql 8.1 on Debian GNU/Linux.

(BTW, this is for Decimail, my postgresql-based IMAP mail server.  I have
recently added tsearch2-based searching.  It is still somewhat experimental
but has never lost any mail.  If you're interested, have a look at
http://decimail.org/.)

Many thanks for any advice,

Phil.


(You are welcome to CC: me in any replies)





Re: [GENERAL] virtual (COMPUTED BY) columns?

2007-01-28 Thread A. Kretschmer
On Sun, 28.01.2007, at 10:25:33 -0600, Ron Johnson wrote the following:
 Hi.
 
 These fields do not use any disk space, as the data in them is
 derived on the fly.
 
 For example:
 CREATE TABLE T_EXAMPLE (
 SOME_DATE DATE,
 JDATE COMPUTED BY EXTRACT(JULIAN FROM SOME_DATE)
 );
 
 A work-around is to create a function, and reference it in every
 query, but storing them in the table definition is the tidy,
 low-maintenance way to do it.

Was this a question about how to do this in PostgreSQL?

You can use a VIEW. Just create T_EXAMPLE ( SOME_DATE DATE ); 
and then 

CREATE VIEW view_example AS SELECT some_date, EXTRACT(JULIAN FROM
some_date) AS julian_date FROM T_EXAMPLE;


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net



Re: [GENERAL] virtual (COMPUTED BY) columns?

2007-01-28 Thread Ron Johnson

On 01/28/07 10:43, A. Kretschmer wrote:
 On Sun, 28.01.2007, at 10:25:33 -0600, Ron Johnson wrote the following:
 Hi.

 These fields do not use any disk space, as the data in them is
 derived on the fly.

 For example:
 CREATE TABLE T_EXAMPLE (
 SOME_DATE DATE,
 JDATE COMPUTED BY EXTRACT(JULIAN FROM SOME_DATE)
 );

 A work-around is to create a function, and reference it in every
 query, but storing them in the table definition is the tidy,
 low-maintenance way to do it.
 
 Was this a question how to do this in PostgreSQL?
 
 You can use a VIEW. Just create T_EXAMPLE ( SOME_DATE DATE ); 
 and then 
 
 CREATE VIEW view_example AS SELECT some_date, EXTRACT(JULIAN FROM
 some_date) AS julian_date FROM T_EXAMPLE;

Good point.

But then you have 2 bits of metadata, and yet one more object to
update when you add a column.




Re: [GENERAL] explain analyse much slower than actual query

2007-01-28 Thread Tom Lane
Phil Endecott [EMAIL PROTECTED] writes:
 I was not patient enough to wait for the remaining explain-analyse results,
 but I feel that there is a linear slowdown of about 60x between the raw 
 query and the explain-analyse version.

Slow gettimeofday() ... fairly common on desktop-grade PC hardware :-(.
You seem to have a particularly bad case of it, though, as extrapolating
from your numbers suggests the overhead is something like 20
microseconds per clock reading; the other reporters we've heard from
seemed to get around 1 to 5 usec IIRC.  A lot of PCs still use clock
chips that were designed back when multiple microseconds to read the
clock wasn't unreasonable, but with CPU speeds in the GHz range this is
just sucky hardware.  It shows up on this example because most of the
node entry/exits are for the Materialize node, which can return the next
row from its internal array in about no time flat, so the clock readings
represent huge percentage overhead.

 But the peculiar behaviour of explain-analyse is really a distraction from
 the fact that the query is slow, especially when the limit value is large.

You need a hashed subplan for NOT IN to work reasonably fast.  The
fact you're not getting one suggests you either have to raise work_mem,
or you're using some weird datatype that doesn't support hashing.
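
For instance (illustrative only; the 32768 kB figure is just a guess, to be
sized to fit the subquery's hash table):

-- Session-local change, then re-check whether the plan switches to a
-- hashed subplan.
SET work_mem = 32768;
EXPLAIN ANALYZE
SELECT msg_id FROM messages
WHERE msg_id NOT IN (SELECT msg_id FROM part_tsearch)
LIMIT 10;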

regards, tom lane



Re: [GENERAL] Password issue revisited

2007-01-28 Thread Michael Schmidt
Mr. Lane and Mr. Momjian,
Well, I asked and I got an answer.  So be it.  Regarding how I concluded that 
PGPASSFILE was deprecated for pg_dump, I offer the following.

1.  The documentation for pg_dump in the manual (Section VI) includes a section 
labeled Environment.  This lists PGDATABASE, PGHOST, PGPORT, and PGUSER.  It 
also says default connection parameters but there is no hyperlink or 
reference to another manual section to explain/define this term.  

2.  Neither the Index nor the Table of Contents has an entry for connection 
parameters or default connection parameters.  

3.  The application help (pg_dump --help) provides no further information.

4.  The pg_dump documentation in Section VI has a See Also section, with a 
link to 29.12.  In 29.12 there is a list of variables that can be used to 
select default connection parameters for PQconnectdb, PQsetdbLogin, and 
PQsetdb (no mention of pg_dump).  The list includes those mentioned in the 
pg_dump documentation (see 1 above) along with several others (including
PGPASSFILE).  I was left wondering: why would some of these be mentioned in the
pg_dump section and not the others?  Perhaps those not mentioned can't be used
by pg_dump?

Given the above ambiguities, I was not able to conclude that pg_dump would 
accept the PGPASSFILE environment variable.

Michael Schmidt


Re: [GENERAL] counting query

2007-01-28 Thread Joris Dobbelsteen
-Original Message-
From: Douglas McNaught [mailto:[EMAIL PROTECTED] 
Sent: zondag 28 januari 2007 16:29
To: Joris Dobbelsteen
Cc: John Meyer; pgsql-general@postgresql.org
Subject: Re: [GENERAL] counting query

Joris Dobbelsteen [EMAIL PROTECTED] writes:

 What would have been better without surrogate keys all-over:
 * Easier to write complex queries with much fewer tables to 
be queried.
 * Much faster query performance, as fewer tables need to be 
referenced.
 * Better integrity enforcement with simple foreign key constraints.

Not this debate again.  ;)

Surrogate vs. natural keys shouldn't make a difference in how
many tables you have--that depends on the degree of
normalization.  Sounds like you denormalized your database and
happened to eliminate surrogate keys at the same time.  Using
that to say surrogate keys are bad is kind of misleading.

I have perhaps formulated it quite extremely. It was not intended as
such an extreme stance. I apologize if it leads to such a
discussion. I'll try to be a bit more careful next time.

My point is only: be careful with surrogate keys and try not to use
them for everything. In my sole opinion I see them getting used too
much. But then again, what is good and wrong will always be subjective.

So perhaps "What would have been better without surrogate keys
all-over" should have been "My database where I extremely overdid
it with surrogate keys".

Let's leave it at that.

- Joris



Re: [GENERAL] counting query

2007-01-28 Thread Douglas McNaught
Joris Dobbelsteen [EMAIL PROTECTED] writes:

 So perhaps "What would have been better without surrogate keys
 all-over" should have been "My database where I extremely overdid
 it with surrogate keys".

Fair enough.  It's generally true that going to extremes with anything
causes problems.  :)

-Doug



Re: [GENERAL] Predicted lifespan of different PostgreSQL branches

2007-01-28 Thread Peter Eisentraut
Dave Page wrote:
 Also, three just seems like a sensible number to maintain. I kinda
 like Magnus' idea to put older releases into a sort of 'retired' mode
 though, and build only the binaries for PostgreSQL itself.

But would that give people who have previously used the full installer 
an upgrade path (that doesn't break everything around it)?
-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [GENERAL] Password issue revisited

2007-01-28 Thread Tom Lane
Michael Schmidt [EMAIL PROTECTED] writes:
 ... Regarding how I concluded 
 that PGPASSFILE was deprecated for pg_dump, I offer the following.

 1.  The documentation for pg_dump in the manual (Section VI) includes a 
 section labeled Environment.  This lists PGDATABASE, PGHOST, PGPORT, 
 and PGUSER.  It also says default connection parameters but there is 
 no hyperlink or reference to another manual section to explain/define 
 this term.

Yeah.  There is a link down in See Also but the incomplete
Environment section of these man pages seems misleading.

Rather than try to maintain complete lists in each of the
client-application man pages, I propose we remove those sections
completely, and just rely on the See Also links to section 29.12.

regards, tom lane



[GENERAL] Firing triggers based on certain Insert conditions

2007-01-28 Thread Harpreet Dhaliwal

Hi
I have a table in which I have a field named 'source'.
A trigger is written on this table.
I want this trigger to fire after INSERT only when this field 'source'
has the value 'from'; otherwise the trigger should not be fired at all.
Just wondering if it's really possible?

Thanks in advance.
Harpreet


Re: [GENERAL] Questions about horizontal partitioning

2007-01-28 Thread Anton Melser

However,  if the primary key is entirely within those six columns,  there
will have to be an index on it in both tables to enforce the primary key
constraint.  In that case,  an inner join could be performed with an index
lookup or an index scan plus hash join,  for a query that didn't use any
other columns.  Whether that translates into a significant I/O reduction
depends on how wide and how frequently non-NULL those other columns are.




... if someone is feeling pedagogical (and the answer isn't that
complicated), could they explain why a simple index on the desired
columns wouldn't be the best solution?
Cheers
Antoine



Re: [GENERAL] Firing triggers based on certain Insert conditions

2007-01-28 Thread Tom Lane
Harpreet Dhaliwal [EMAIL PROTECTED] writes:
 I want this trigger to fire after INSERT only when this field 'source'
 has the value 'from'; otherwise the trigger should not be fired at all.
 Just wondering if it's really possible?

No, and it seems pretty silly as a feature request.  Why don't you just
put the test in the trigger, and have it do nothing when you don't want
it to do anything?

regards, tom lane



Re: [GENERAL] Limit on number of users in postgresql?

2007-01-28 Thread Bill Moran
Furface [EMAIL PROTECTED] wrote:

 Thanks Tom.  You know I thought about this approach a little more.  I 
 don't think there's a simple answer to this security problem short of 
 placing a proxy server application between the clients and the 
 database.  The problem with giving database role accounts to each and 
 every user is that the users now have uncontrolled access to the 
 database.

Ummm ... huh?

PostgreSQL has a pretty nice security model that gives you a great deal
of control over what users have access to:
http://www.postgresql.org/docs/8.2/static/user-manag.html

The only thing that's missing is row-level granularity.  There's at least
one project out there supporting that, and you can also simulate it with
clever usage of stored procedures and the ability to run them with the
permissions of the definer instead of the executer.
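
As a rough illustration of that last approach (hypothetical table, column and
function names, not a complete design):

-- Users cannot touch the table directly; a SECURITY DEFINER function
-- returns only the caller's own rows.
REVOKE ALL ON timesheet FROM PUBLIC;

CREATE FUNCTION my_timesheet() RETURNS SETOF timesheet AS $$
    SELECT * FROM timesheet WHERE username = current_user;
$$ LANGUAGE sql SECURITY DEFINER;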

-Bill



Re: [GENERAL] Firing triggers based on certain Insert conditions

2007-01-28 Thread Brendan Jurd

On 1/29/07, Harpreet Dhaliwal [EMAIL PROTECTED] wrote:


Hi
I have a table in which I have a field named 'source'.
A trigger is written on this table.
I want this trigger to fire after INSERT only when this field 'source'
has the value 'from'; otherwise the trigger should not be fired at all.
Just wondering if it's really possible?


AFAIK you can't prevent the trigger from firing, but you can make the
trigger behave differently based on what's in the fields of the new
row:

CREATE FUNCTION source_insert() RETURNS trigger AS $$
BEGIN
 IF NEW.source = 'from' THEN

  /* do stuff */

 END IF;

 RETURN NULL;
END;
$$ LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER post_insert AFTER INSERT ON [your table name] FOR EACH
ROW EXECUTE PROCEDURE source_insert();

Regards,
BJ



Re: [GENERAL] Firing triggers based on certain Insert conditions

2007-01-28 Thread Harpreet Dhaliwal

I never said I don't want my trigger to do anything.
My subject line made it pretty clear that I want to fire my trigger
based on certain conditions.
I know it's kind of a silly feature request, but you really can't help it
when you are working with stupid advisors :)
Thanks for your response anyway.
Harpreet


On 1/28/07, Tom Lane [EMAIL PROTECTED] wrote:


Harpreet Dhaliwal [EMAIL PROTECTED] writes:
 I want this trigger to fire after INSERT only when this field 'source'
 has the value 'from'; otherwise the trigger should not be fired at all.
 Just wondering if it's really possible?

No, and it seems pretty silly as a feature request.  Why don't you just
put the test in the trigger, and have it do nothing when you don't want
it to do anything?

   regards, tom lane



Re: [GENERAL] Predicted lifespan of different PostgreSQL branches

2007-01-28 Thread Dave Page


 --- Original Message ---
 From: Peter Eisentraut [EMAIL PROTECTED]
 To: pgsql-general@postgresql.org
 Sent: 28/01/07, 17:39:00
 Subject: Re: [GENERAL] Predicted lifespan of different PostgreSQL branches
 
 Dave Page wrote:
  Also, three just seems like a sensible number to maintain. I kinda
  like Magnus' idea to put older releases into a sort of 'retired' mode
  though, and build only the binaries for PostgreSQL itself.
 
 But would that give people who have previously used the full installer 
 an upgrade path (that doesn't break everything around it)?

Yes - they'd just unpack the archive over their install directory. Might screw 
up the permissions though, and wouldn't include the docs :-( We'd certainly 
need to try it out thoroughly first...

/D



Re: [GENERAL] Limit on number of users in postgresql?

2007-01-28 Thread Mark Walker
OK, let me think.  In my situation, I'm writing an accounting app.  A
typical situation would be that a standard user can update data in a
timesheet while an administrator approves the time sheet.  If I gave the
standard user access to the timesheet header table, they would also have
access to the approval field, so I'd need something like column-level
security.  I could put in a trigger where I check the permissions of
someone attempting to set the approval column.  Yes, I think that would
probably work.
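
A rough sketch of such a trigger (hypothetical table, column and role names):

-- Only members of the 'approver' role may change the approved flag;
-- anyone else gets an error.
CREATE FUNCTION check_approval() RETURNS trigger AS $$
BEGIN
    IF NEW.approved IS DISTINCT FROM OLD.approved
       AND NOT pg_has_role(current_user, 'approver', 'MEMBER') THEN
        RAISE EXCEPTION 'only approvers may change the approved flag';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER timesheet_approval_check
    BEFORE UPDATE ON timesheet_header
    FOR EACH ROW EXECUTE PROCEDURE check_approval();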


I think you're correct.  I can do pretty much what I want without 
developing a proxy server.  That's what I think I'll do, my original 
plan of adding users with different roles.  It still gives me the 
creeps, allowing lots of people direct access to my server.


Thanks.


Bill Moran wrote:

Furface [EMAIL PROTECTED] wrote:
  
Thanks Tom.  You know I thought about this approach a little more.  I 
don't think there's a simple answer to this security problem short of 
placing a proxy server application between the clients and the 
database.  The problem with giving database role accounts to each and 
every user is that the users now have uncontrolled access to the 
database.



Ummm ... huh?

PostgreSQL has a pretty nice security model that gives you a great deal
of control over what users have access to:
http://www.postgresql.org/docs/8.2/static/user-manag.html

The only thing that's missing is row-level granularity.  There's at least
one project out there supporting that, and you can also simulate it with
clever usage of stored procedures and the ability to run them with the
permissions of the definer instead of the executer.

-Bill

  





Re: [GENERAL] explain analyse much slower than actual query

2007-01-28 Thread Phil Endecott
Thanks for the quick reply Tom.

Tom Lane wrote:
Phil Endecott spam_from_postgresql_general ( at ) chezphil ( dot ) org 
writes:
 I was not patient enough to wait for the remaining explain-analyse results,
 but I feel that there is a linear slowdown of about 60x between the raw 
 query and the explain-analyse version.

 Slow gettimeofday() ... fairly common on desktop-grade PC hardware :-(.

It's actually a virtual machine, and I seem to recall reading something about 
the
virtualised gettimeofday() being slow.  OK, that explains it.  Thanks.

 But the peculiar behaviour of explain-analyse is really a distraction from
 the fact that the query is slow, especially when the limit value is large.
 
 You need a hashed subplan for NOT IN to work reasonably fast.  The
 fact you're not getting one suggests you either have to raise work_mem,
 or you're using some weird datatype that doesn't support hashing.

It's an int, and yes, increasing work_mem makes it use a hashed subplan:

QUERY PLAN  
  
--
 Limit  (cost=4209.76..4213.61 rows=10 width=4) (actual time=5432.840..5461.518 rows=10 loops=1)
   ->  Seq Scan on messages  (cost=4209.76..11608.23 rows=19218 width=4) (actual time=5432.776..5460.859 rows=10 loops=1)
         Filter: (NOT (hashed subplan))
         SubPlan
           ->  Seq Scan on part_tsearch  (cost=0.00..4115.01 rows=37901 width=4) (actual time=0.390..2984.783 rows=37907 loops=1)
 Total runtime: 5468.817 ms

So presumably work_mem must be greater than some function of the size of the 
table in the subquery.  Is there some way to work that out?  This (virtual) 
machine doesn't have an enormous amount of RAM so I like to keep settings 
like this as high as necessary but no higher.

If I understand it correctly, it is still doing a sequential scan on 
part_tsearch that does not terminate early due to the limit clause.  So 
I'm still seeing run times that are rather worse than I think should be 
possible.  Can it not step through the indexes in the way that it does 
for a Merge Join until it has got enough results to satisfy the limit, 
and then terminate?


Thanks,

Phil.





Re: [GENERAL] counting query

2007-01-28 Thread garry saddington
On Sun, 2007-01-28 at 09:57 -0600, Ron Johnson wrote:
 
 On 01/28/07 07:05, garry saddington wrote:
  I have a table definition such as:
  
  CREATE TABLE attendance
  (
attendanceid serial primary key,
entered date DEFAULT current_date NOT NULL,
absent boolean,
authorization text default 'N',
timeperiod char(2) check(timeperiod in('AM','PM')),
days varchar(10),
studentid int,
unique(entered,timeperiod,studentid)
  )
  
  Which is used to record school attendance data. I am now trying to write
  a query to identify trends in absences by counting the days column and
  returning any student that has repeated absences on certain days. I am
  struggling to return anything that does not need further manipulation in
  Python before being useful.
  Does anyone have any ideas?
 
 When you say "certain days", do you mean days of the week?
 
 If so, create a view like:
 CREATE VIEW V_DAY_ABSENCES AS
 SELECT ENTERED,
AUTHORIZATION,
TIMEPERIOD,
DAYS,
STUDENTID,
    EXTRACT(DOW FROM ENTERED) AS WEEKDAY
 FROM ATTENDANCE
 WHERE ABSENT = TRUE;
 
 Then, this query should do what you want:
 SELECT STUDENTID,
TIMEPERIOD,
WEEKDAY,
COUNT(*)
 FROM V_DAY_ABSENCES
 GROUP BY STUDENTID,
  TIMEPERIOD,
  WEEKDAY
 HAVING COUNT(*) > 3;
Thank you, this works great. But I have another problem: is it possible
to identify absences in consecutive weeks on the same day? E.g. if a
pupil has a pattern of having every Monday AM off school, how could that
be identified?
Regards
Garry




Re: [GENERAL] explain analyse much slower than actual query

2007-01-28 Thread Tom Lane
Phil Endecott [EMAIL PROTECTED] writes:
 If I understand it correctly, it is still doing a sequential scan on 
 part_tsearch that does not terminate early due to the limit clause.  So 
 I'm still seeing run times that are rather worse than I think should be 
 possible.  Can it not step through the indexes in the way that it does 
 for a Merge Join until it has got enough results to satisfy the limit, 
 and then terminate?

Nope, there is not that much intelligence about NOT IN.

You could possibly manually rewrite the thing as a LEFT JOIN
with a WHERE inner-join-key IS NULL clause.  This would probably
lose if most of the outer relation's rows join to many inner rows,
though.
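
For reference, that rewrite would look something like this (a sketch only,
using the table and column names from earlier in the thread):

-- Anti-join form of the original query: keep the messages rows that
-- have no matching msg_id in part_tsearch.
SELECT m.msg_id
FROM messages m
LEFT JOIN part_tsearch p ON p.msg_id = m.msg_id
WHERE p.msg_id IS NULL
LIMIT 10;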

regards, tom lane



Re: [GENERAL] PostgreSQL data loss

2007-01-28 Thread Merlin Moncure

On 1/26/07, BluDes [EMAIL PROTECTED] wrote:

Hi everyone,
  I have a problem with one of my customers.
I made a program that uses a PostgreSQL (win32) database to save its data.
My customer claims that he lost lots of data regarding his own clients
and that those data had surely been saved on the database.
My first guess is that he is the one who deleted the data but wants to
blame someone else, obviously I can't prove it.


I've been working with PostgreSQL since early 7.1 on dozens of
projects and I've had maybe two or three cases of data corruption that
were not explained by hardware failure or something like that (and
even these cases were debatable since I was not in direct control of
the server).  Both of those cases had side effects...the corruption
busted something else which sent immediate red flags that something
was wrong.

I think your customer is CYA.

merlin



Re: [GENERAL] counting query

2007-01-28 Thread Ron Johnson

On 01/28/07 15:18, garry saddington wrote:
 On Sun, 2007-01-28 at 09:57 -0600, Ron Johnson wrote:
 On 01/28/07 07:05, garry saddington wrote:
[snip]
 When you say certain days, you mean days of the week?

 If so, create a view like:
 CREATE VIEW V_DAY_ABSENCES AS
 SELECT ENTERED,
AUTHORIZATION,
TIMEPERIOD,
DAYS,
STUDENTID,
DOW(CAST(ENTERED AS TIMESTAMP)) AS WEEKDAY
 FROM ATTENDANCE
 WHERE ABSENT = TRUE;

 Then, this query should do what you want:
 SELECT STUDENTID,
TIMEPERIOD,
WEEKDAY,
COUNT(*)
 FROM V_DAY_ABSENSES
 GROUP BY STUDENTID,
  TIMEPERIOD,
  WEEKDAY
 HAVING COUNT(*)  3;

 Thank you, this works great. But I have another problem: Is it possible
 to identify absences in consecutive weeks on the same day. EG. If a
 pupil has a pattern of having every monday AM off school, how could that
 be identified?

I'd use the T_CALENDAR table, modified for your purposes.  (It's a
static table that we create in every database.)  We populate it with 22
years of dates.  You'll have to write a small procedure to do it.

CREATE TABLE T_CALENDAR (
DATE_ANSI     DATE,
YEAR_NUM      SMALLINT,
MONTH_NUM     SMALLINT,
DAY_OF_MONTH  SMALLINT,
DAY_OF_WEEK   SMALLINT,
JULIAN_DAY    SMALLINT,
IS_SCHOOL_DAY BOOL,
SCHOOL_YEAR   SMALLINT,  -- 2006 for the 2006/07 school year
SCHOOL_MONTH  SMALLINT); -- 1 for August, 2 for September, etc.

Then, join T_CALENDAR to ATTENDANCE, WHERE DATE_ANSI = ENTERED
AND DAY_OF_WEEK = 1
AND IS_SCHOOL_DAY = TRUE
AND SCHOOL_YEAR = 2006;

Make that join into a view and then, as Joris suggested, connect
it to a spreadsheet.
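
A sketch of that view (the view name is made up; the day-of-week numbering
follows however T_CALENDAR is populated):

CREATE VIEW V_MONDAY_AM_ABSENCES AS
SELECT A.STUDENTID,
       A.ENTERED,
       A.TIMEPERIOD,
       C.SCHOOL_YEAR,
       C.SCHOOL_MONTH
FROM ATTENDANCE A
JOIN T_CALENDAR C ON C.DATE_ANSI = A.ENTERED
WHERE A.ABSENT = TRUE
  AND A.TIMEPERIOD = 'AM'
  AND C.DAY_OF_WEEK = 1
  AND C.IS_SCHOOL_DAY = TRUE
  AND C.SCHOOL_YEAR = 2006;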




[GENERAL] text storage and parsing errors

2007-01-28 Thread tom
For what may be really strange reasons I am trying to store emails
into a database table.  The idea is to take the entire original
message and store it into a table with two columns, a serial primary
key column and a column for the message.


Originally I thought I would just use column type text but changed  
that to bytea based on the fact that the dbmail project uses bytea  
for reasons of character support.


I'm running into problems with Perl throwing an error of: 'invalid
input syntax for type bytea'.
I have not yet been able to capture a message for any further testing
because the error tends to destroy the message in process...


The way that I'm using Perl is to do full prepare and execute
statements, which, as I understand Perl, will do all the character
escaping necessary to store the message.  Meaning, if I have
characters like (') or (`) they should be escaped when they are
entered into the SQL parameter.  I am wondering if this is indeed the
case.


But I'm looking for any suggestions or considerations when trying to
do this approach, to avoid this kind of text parsing problem.





[GENERAL] MULE_INTERNAL translation to win1250

2007-01-28 Thread NTPT

Hi.

I have a strange problem in postgres 8.1.4 (Gentoo 64-bit on the AMD64
platform).

My database is created with LATIN2 encoding for correct viewing of
national specific characters (Czech language).

Inside the code of my PHP application I am setting the client encoding to
win1250, because I need the output of the query in this encoding.

On some parts of the data I got an error:

Query failed: ERROR: character 0x829a of encoding MULE_INTERNAL has no
equivalent in WIN1250

Without setting client_encoding to win1250 the query works. I am curious why
MULE_INTERNAL is mentioned at all, when \l+ says that the corresponding
database (and even the whole cluster) is created with LATIN2 encoding.

Strangely enough, ALL INSERTS are done with the WIN1250 client encoding too.
Maybe a bug in the charset translation routines of postgres?

And how can I repair it, preferably in the whole database?

Thanks for the help.





Re: [GENERAL] MULE_INTERNAL translation to win1250

2007-01-28 Thread Tom Lane
NTPT [EMAIL PROTECTED] writes:
 Without setting client_encoding to win1250 the query works. I am curious why
 MULE_INTERNAL is mentioned at all, when \l+ says that the corresponding
 database (and even the whole cluster) is created with LATIN2 encoding.

The conversions between LATIN2 and WIN1250 go by way of MULE_INTERNAL to
reduce duplication of code.  It shouldn't make any difference to the end
result though.  Are you sure that the characters you're using are
supposed to have representations in both character sets?

 May be a bug in charset translation routines of postgres ?

If you think that, you need to provide us with the exact codes that are
being mistranslated and what you think they should translate to.

regards, tom lane



Re: [GENERAL] Limit on number of users in postgresql?

2007-01-28 Thread Ron Peterson
On Sun, Jan 28, 2007 at 01:21:09PM -0500, Bill Moran wrote:

 The only thing that's missing is row-level granularity.  There's at least
 one project out there supporting that, and you can also simulate it with
 clever usage of stored procedures and the ability to run them with the
 permissions of the definer instead of the executer.

You can also use rules to protect rows.  E.g.

CREATE RULE atable__lock_user_insert
AS ON INSERT TO atable
WHERE
  CURRENT_USER != 'mysuper'
AND
  new.username != CURRENT_USER
DO INSTEAD nothing;

CREATE RULE atable__lock_user_update
AS ON UPDATE TO atable
WHERE
  CURRENT_USER != 'mysuper'
AND
  old.username != CURRENT_USER
DO INSTEAD nothing;

CREATE RULE atable__lock_user_delete
AS ON DELETE TO atable
WHERE
  CURRENT_USER != 'mysuper'
AND
  old.username != CURRENT_USER
DO INSTEAD nothing;

-- 
Ron Peterson
https://www.yellowbank.com/



Re: [GENERAL] text storage and parsing errors

2007-01-28 Thread Daniel Verite
tom wrote:

 The way that I'm using Perl is to do full prepare and execute
 statements, which, as I understand Perl, will do all the character
 escaping necessary to store the message.  Meaning, if I have
 characters like (') or (`) they should be escaped when they are
 entered into the SQL parameter.  I am wondering if this is indeed the
 case.

 But I'm looking for any suggestions or considerations when trying to
 do this approach, to avoid this kind of text parsing problem.

Assuming you're using DBD::Pg and a placeholder for the bytea parameter of your
query, you must specify its type explicitly, like this:

use DBD::Pg qw(:pg_types);
[...]
$s = $dbh->prepare("INSERT INTO the_table(contents) VALUES(?)");
$s->bind_param(1, $the_data, { pg_type => DBD::Pg::PG_BYTEA });
$s->execute;

Hope this helps,

-- 
 Daniel
 PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org




Re: [GENERAL] PostgreSQL data loss

2007-01-28 Thread Harpreet Dhaliwal

While making a POC (proof of concept) for any project, we clearly mention at
the end of the document that loss of data is not going to be our
responsibility, and that's how we save ourselves right at the beginning.
What happened with you has happened with us many times, but our bold and
italicized lines about data loss have always saved us. I suggest
something like this for your future projects.
Hope this helps.
Regards


On 1/28/07, Merlin Moncure [EMAIL PROTECTED] wrote:


On 1/26/07, BluDes [EMAIL PROTECTED] wrote:
 Hi everyone,
   I have a problem with one of my customers.
 I made a program that uses a PostgreSQL (win32) database to save its
data.
 My customer claims that he lost lots of data regarding his own clients
 and that those data had surely been saved on the database.
 My first guess is that he is the one who deleted the data but wants to
 blame someone else, obviously I can't prove it.

I've been working with PostgreSQL since early 7.1 on dozens of
projects and I've had maybe two or three cases of data corruption that
were not explained by hardware failure or something like that (and
even these cases were debatable since I was not in direct control of
the server).  Both of those cases had side effects...the corruption
busted something else which sent immediate red flags that something
was wrong.

I think your customer is CYA.

merlin




Re: [GENERAL] MULE_INTERNAL translation to win1250

2007-01-28 Thread Tatsuo Ishii
MULE_INTERNAL is used as an intermediate encoding between LATIN2 and
WIN1250. The error message indicates that 0x9a of LATIN2 cannot be
mapped to WIN1250.

You can see 0x00 in the position for 0x9a (between 0x99 and 0x9b) in
the encoding map in
src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c,
which indicates that nothing corresponds to LATIN2 0x9a.  If you know
what LATIN2 0x9a should be mapped to, please let us know.

static const unsigned char iso88592_2_win1250[] = {
0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
0x88, 0x89, 0x00, 0x8B, 0x00, 0x00, 0x00, 0x00,
0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
0x98, 0x99, 0x00, 0x9B, 0x00, 0x00, 0x00, 0x00,
0xA0, 0xA5, 0xA2, 0xA3, 0xA4, 0xBC, 0x8C, 0xA7,
0xA8, 0x8A, 0xAA, 0x8D, 0x8F, 0xAD, 0x8E, 0xAF,
0xB0, 0xB9, 0xB2, 0xB3, 0xB4, 0xBE, 0x9C, 0xA1,
0xB8, 0x9A, 0xBA, 0x9D, 0x9F, 0xBD, 0x9E, 0xBF,
0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7,
0xC8, 0xC9, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF,
0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, 0xD7,
0xD8, 0xD9, 0xDA, 0xDB, 0xDC, 0xDD, 0xDE, 0xDF,
0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
0xE8, 0xE9, 0xEA, 0xEB, 0xEC, 0xED, 0xEE, 0xEF,
0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
};
--
Tatsuo Ishii
SRA OSS, Inc. Japan

 Hi.

 I have a strange problem in postgres 8.1.4 (Gentoo 64-bit on the AMD64
 platform).

 My database is created with LATIN2 encoding for correct viewing of
 national specific characters (Czech language).

 Inside the code of my PHP application I am setting the client encoding to
 win1250, because I need the output of the query in this encoding.

 On some parts of the data I got an error:

 Query failed: ERROR: character 0x829a of encoding MULE_INTERNAL has no
 equivalent in WIN1250

 Without setting client_encoding to win1250 the query works. I am curious why
 MULE_INTERNAL is mentioned at all, when \l+ says that the corresponding
 database (and even the whole cluster) is created with LATIN2 encoding.

 Strangely enough, ALL INSERTS are done with the WIN1250 client encoding too.
 Maybe a bug in the charset translation routines of postgres?

 And how can I repair it, preferably in the whole database?

 Thanks for the help.
 
 
 


Re: [GENERAL] tsearch2, locale, UTF-8 and Windows

2007-01-28 Thread Pierre Thibaudeau

Just to pinpoint the source of my dismay, let me add one comment to my
previous post.

In the What'sNew document for tsearch2 with 8.2
http://www.sai.msu.su/~megera/wiki/Tsearch2WhatsNew
we read:


Don't forget to initdb cluster with correct utf8-locale !
initdb -D /usr/local/pgsql-dev/data.el_utf8 --locale=el_GR.utf8


I have never seen any detail of what was meant by the correct locale.
Clearly, if I were using a database with French content on a French system, I
would intuitively choose fr_FR.utf8 as the locale, but if my database
contains texts in several languages (see the quoted post below), I don't know
what factor to base my choice of initdb locale on.

Suggestions?

On Jan 28, 3:28 am, [EMAIL PROTECTED] (Pierre Thibaudeau) wrote:

[...] I cannot figure out what the latest status is concerning the
default locale on a Windows UTF-8 database under PostgreSQL 8.2.

[...] I have a UTF-8 database containing information in five
different European languages (English, French, Spanish, German and
Italian).  I am coding on a Windows system with locale French_Canada.1252.
The server that will soon run the database will likely have locale en_US...


Re: [GENERAL] MULE_INTERNAL translation to win1250

2007-01-28 Thread Michael Fuhr
On Sun, Jan 28, 2007 at 06:33:16PM -0500, Tom Lane wrote:
 NTPT [EMAIL PROTECTED] writes:
  Maybe a bug in the charset translation routines of Postgres?
 
 If you think that, you need to provide us with the exact codes that are
 being mistranslated and what you think they should translate to.

I wonder if the OP is doing something like this:

test= SELECT getdatabaseencoding();
 getdatabaseencoding 
---------------------
 LATIN2
(1 row)

test= SHOW client_encoding;
 client_encoding 
-----------------
 win1250
(1 row)

test= CREATE TABLE test (t text);
CREATE TABLE
test= INSERT INTO test VALUES (E'\202\232'); -- \202=0x82, \232=0x9a
INSERT 0 1
test= SELECT * FROM test;
ERROR:  character 0x829a of encoding MULE_INTERNAL has no equivalent in WIN1250

The intent might be that E'\202\232' is a string in the client's
encoding, where it would represent the same characters as Unicode
U+201A SINGLE LOW-9 QUOTATION MARK, U+0161 LATIN SMALL LETTER S
WITH CARON (I'm using Unicode as the pivot for convenience).  But
the backend is handling the string in the database's encoding, where
it represents U+0082,U+009A, which are control characters that
don't have mappings in win1250; hence the conversion error when the
client tries to read the data.

Just a guess.
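
As a rough sketch of the intended round trip (table name test2 and the single
byte 0x9a are chosen only for illustration; win1250 0x82 has no LATIN2
equivalent at all, so it is left out): explicitly converting the
client-encoded bytes to the database encoding before storing them makes the
later read succeed.

test= CREATE TABLE test2 (t text);
CREATE TABLE
test= INSERT INTO test2 VALUES (convert(E'\232', 'WIN1250', 'LATIN2'));
INSERT 0 1
test= SELECT * FROM test2;  -- stored LATIN2 0xb9 converts back to win1250 0x9a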

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Load balancing across disks

2007-01-28 Thread Paul Lambert
I'm in the process of finalising my conversion from M$ SQL Server to 
Postgres - all of which I'm very happy about so far.


The database I work with has 37 tables, 5 of which run into the order of 
tens of millions of records and approximately another 10 can run into 
millions, depending on the size of the customer's main system. The DB is 
going to be expanded to have another 15 or so tables when I step up 
replication from my primary OpenVMS-based application; some of these 
will also have extremely high usage - again depending on the size of the 
customer's main system.


In order to balance disk load and ensure faster data access, my current 
SQL Server setup has the data spread across 3 physical disk devices. One 
thing I would like to know, which I can't find in the documentation 
I've been reading, is whether Postgres has any similar data distribution 
abilities.


I.e. can I create a data file on D drive which holds tables a, b and e, 
and a data file on E drive which holds tables c, d and f.


If this is possible, could someone point me to some documentation so I 
can experiment a little.


If not possible, I guess I'll have to upgrade to some faster hardware... 
if they'll be willing to give me money for that ;)


BTW, I'm using 8.2 on M$ Weenblows (Yes, I know weenblows sucks, but I 
don't have sufficient experience with unix/linux/any other platform 
Postgres runs on to run the db on another server. I'll wait for someone to 
port it to OpenVMS ;) and then use it on that (I know, I'm dreaming) - 
OpenVMS makes unixish systems look like they have the reliability of 
weenblows, I'll tell you that much :D)


Cheers.

--
Paul Lambert
Database Administrator
AutoLedgers


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Load balancing across disks

2007-01-28 Thread Douglas McNaught
Paul Lambert [EMAIL PROTECTED] writes:

 I.e. can I create a data file on D drive which holds tables a, b and
 e, and a data file on E drive which holds tables c, d and f.

 If this is possible, could someone point me to some documentation so I
 can experiment a little.

Read the doc section on tablespaces.

-Doug

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] MULE_INTERNAL translation to win1250

2007-01-28 Thread NTPT
I made some further investigation. I found and identified the exact line in 
the database, and the exact column that causes the problem. I am able to 
select the column into a test table, and in the test table it retains its 
bad behavior. Fortunately, this row does not contain vital
data, so I can drop it without much of a problem, but I would like to 
know why.


I am able to identify a single character that causes the problem in the real 
data and in the test table too (rather, a character combination, found with 
the substring function - it seems that at a certain point two characters are 
taken as a single 16-bit one), but I am not able to reproduce this behavior 
on a fresh table using INSERT and SELECT statements. Please give me some 
tips on where to search and what other information to provide.
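
For concreteness, a rough sketch of that kind of per-character inspection
(testtable and column t are placeholder names; reading with the server's own
encoding avoids the failing WIN1250 conversion):

SET client_encoding TO 'LATIN2';
SELECT i,
       substring(t FROM i FOR 1)        AS ch,
       ascii(substring(t FROM i FOR 1)) AS code
FROM   testtable, generate_series(1, 200) AS i   -- 200 is an arbitrary cap on string length
WHERE  i <= length(t)
ORDER  BY i;
RESET client_encoding;

In a single-byte database ascii() returns the byte value, which should make
the offending position easy to spot.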


Thank you.

- Original Message - 
From: Tom Lane [EMAIL PROTECTED]

To: NTPT [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Sent: Monday, January 29, 2007 12:33 AM
Subject: Re: [GENERAL] MULE_INTERNAL translation to win1250



NTPT [EMAIL PROTECTED] writes:
Without setting client_encoding to win1250 the query works. I am curious why
MULE_INTERNAL is mentioned at all, when \l+ says that the corresponding
database (and indeed the whole cluster) is created with LATIN2 encoding.


The conversions between LATIN2 and WIN1250 go by way of MULE_INTERNAL to
reduce duplication of code.  It shouldn't make any difference to the end
result though.  Are you sure that the characters you're using are
supposed to have representations in both character sets?


Maybe a bug in the charset translation routines of Postgres?


If you think that, you need to provide us with the exact codes that are
being mistranslated and what you think they should translate to.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Load balancing across disks

2007-01-28 Thread Joshua D. Drake
Paul Lambert wrote:
 I'm in the process of finalising my conversion from M$ SQL Server to
 Postgres - all of which I'm very happy about so far.
 
 The database I work with has 37 tables, 5 of which run into the order of
 tens of millions of records and approximately another 10 can run into
 millions, depending on the size of the customer's main system. The DB is
 going to be expanded to have another 15 or so tables when I step up
 replication from my primary OpenVMS-based application; some of these
 will also have extremely high usage - again depending on the size of the
 customer's main system.
 
 In order to balance disk load and ensure faster data access, my current
 SQL Server setup has the data spread across 3 physical disk devices. One
 thing I would like to know, which I can't find in the documentation
 I've been reading, is whether Postgres has any similar data distribution
 abilities.
 
 I.e. can I create a data file on D drive which holds tables a, b and e,
 and a data file on E drive which holds tables c, d and f.

You are looking for tablespaces :).


 
 If this is possible, could someone point me to some documentation so I
 can experiment a little.

http://www.postgresql.org/docs/8.0/interactive/sql-createtablespace.html
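
For instance, a rough sketch of the idea (the directory paths and table
definitions are only placeholders; the directories must already exist and be
writable by the account the PostgreSQL service runs as):

CREATE TABLESPACE disk_d LOCATION 'D:/pgdata';
CREATE TABLESPACE disk_e LOCATION 'E:/pgdata';

CREATE TABLE a (id serial PRIMARY KEY, payload text) TABLESPACE disk_d;
ALTER TABLE c SET TABLESPACE disk_e;   -- existing tables can be moved as well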


 
 If not possible, I guess I'll have to upgrade to some faster hardware...
 if they'll be willing to give me money for that ;)
 
 BTW, I'm using 8.2 on M$ Weenblows (Yes, I know weenblows sucks, but I
 don't have sufficient experience with unix/linux/any other platform
 Postgres runs on to run the db on another server. I'll wait for someone to
 port it to OpenVMS ;) and then use it on that (I know, I'm dreaming) -
 OpenVMS makes unixish systems look like they have the reliability of
 weenblows, I'll tell you that much :D)
 
 Cheers.
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] MULE_INTERNAL translation to win1250

2007-01-28 Thread Michael Fuhr
On Sun, Jan 28, 2007 at 07:27:12PM -0700, Michael Fuhr wrote:
 I wonder if the OP is doing something like this:
[...]
 test= INSERT INTO test VALUES (E'\202\232'); -- \202=0x82, \232=0x9a

Another possibility, perhaps more likely, is that some connection
didn't set client_encoding to win1250 before it inserted win1250-encoded
data; in that case the data was probably treated as LATIN2 and
stored without conversion.  When a connection with client_encoding
set to win1250 tries to fetch the data, conversion is attempted and
fails because some LATIN2 values don't have win1250 mappings.
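
A rough way to test that hypothesis (test and column t are placeholder names):
read the suspect rows with client_encoding set to the server encoding, which
skips the failing conversion and shows the stored bytes as-is.

SET client_encoding TO 'LATIN2';       -- same as the database encoding: no conversion
SELECT t, octet_length(t) FROM test;   -- the stored bytes come through unchanged
RESET client_encoding;

If those bytes turn out to be win1250 rather than LATIN2, the row was inserted
by a connection that had not set client_encoding.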

-- 
Michael Fuhr

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] convert(USING utf8_to_iso_8859_15) on Windows

2007-01-28 Thread Pierre Thibaudeau

Is this a documented phenomenon with the convert function?  The first
result is what's expected:

SELECT convert('Gregoire' USING utf8_to_iso_8859_15);
Gregoire

But I don't understand the next result, when I put an acute accent over the
first e:

SELECT convert('Grégoire' USING utf8_to_iso_8859_15);


(The output is an empty string.)

Likewise, whenever I enter a string containing non-ASCII characters, the
convert function outputs an empty string.  Same results when I change the
conversion type from UTF8 to any other encoding which accepts those
non-ASCII characters...  (When I try a conversion to an encoding that
doesn't accept the characters, I get an error message, and that's normal.)

My setup is as follows:
PostgreSQL 8.2.1 on Windows XP
The database has UTF8 encoding.
SHOW lc_ctype; gives: French_Canada.1252

Is my problem related to Windows' lack of UTF8 support?  I thought those
problems were solved with version 8.2...
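
One rough way to check whether the conversion itself works and only the
display is failing is to compare byte counts; octet_length counts bytes
regardless of whether the client can render them (the expected values in the
comment are an assumption based on the encodings involved):

SELECT octet_length(convert('Grégoire' USING utf8_to_iso_8859_15)) AS converted_bytes,
       octet_length('Grégoire')                                    AS original_utf8_bytes;
-- 8 vs 9 would suggest the conversion worked and é simply shrank from two bytes to one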