Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

2005-05-08 Thread Tino Wildenhain
Am Sonntag, den 08.05.2005, 14:30 +0900 schrieb Tatsuo Ishii:
...
 Actually I myself thought as you are before. Later I found that it was
 not so good idea. People already have invalid encoded data in their
 precious database and have very hard time to migrate to newer version
 of PostgreSQL because of encoding validation.
...
 Now I need to upgrade to newer PostgreSQL within 1 day. I know I have
 to fix wrong encoded field but it will take more than 1 day. So I
 would like to import the data first then fix wrong encoded field on
 running database since I can reach members by phone or email even with
 wrong encoded address field...

Well, if you are so in a hurry you better not migrate. Postgres is
proud of validating the input and to have no invalid data. So if
you have invalid data, better fix it.

 I saw this kind of situation in the real world and that's why we
 developed the patches.

Why not developing a helper for contrib to help reencoding the
database instead?

Regards
Tino


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

2005-05-08 Thread John Hansen
Tatsuo Ishii wrote:
 Sent: Sunday, May 08, 2005 3:31 PM
 To: John Hansen
 Cc: [EMAIL PROTECTED]; pgsql-general@postgresql.org; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem
 
  Tatsuo Ishii wrote:
   Sent: Sunday, May 08, 2005 12:01 PM
   To: [EMAIL PROTECTED]
   Cc: pgsql-general@postgresql.org; pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem
   
   We have developed patches which relaxes the character 
 validation so 
   that PostgreSQL accepts invalid characters. It works like this:
  
  That is just plain 100% wrong!!
 
  Under no circumstances should there be invalid data in a database.
  And if you're trying to make a database of invalid data, 
 then at least 
  encode it using a valid encoding.
  
  In fact, I've proposed strengthening the validation 
 routines for UTF-8.
 
 Actually I myself thought as you are before. Later I found 
 that it was not so good idea. People already have invalid 
 encoded data in their precious database and have very hard 
 time to migrate to newer version of PostgreSQL because of 
 encoding validation.
 
 Think about this kind of situation:
 
 There is a table t1(member_id integer primary key, 
 member_name text, address text, phone text, email text). I 
 have to reach each member by either adress, phone or email. 
 Unfortunately some of address field have wrong encoded data. 
 In this case I will use phone or email to reach them. 
 
 Now I need to upgrade to newer PostgreSQL within 1 day. I 
 know I have to fix wrong encoded field but it will take more 
 than 1 day. So I would like to import the data first then fix 
 wrong encoded field on running database since I can reach 
 members by phone or email even with wrong encoded address field...

Actually would be very simple, create function isvalidutf8(text) in your
preferred language.
C source is available from unicode.org.
Create function converttoutf8(text) using whatever code is required to
transform the _wrong_ encoding (SQL_ASCII - UTF8 for instance) to
utf-8.
Update table set field=converttoutf8(field) where !isvalidutf8(field);

Now sit back and relax while your invalid data is converted to utf-8.

When done, pg_dump the database, upgrade, and reload.

This should take less than a day.

 I saw this kind of situation in the real world and that's why 
 we developed the patches.
 --
 Tatsuo Ishii
 
 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

2005-05-08 Thread Karsten Hilbert
   Is there a way to store the name in raw binary? If so, would this not 
 be safe because to postgresql it should no longer matter what data is or 
 represents, right? Maybe there is a third option I am not yet concidering?
In the backup rename the file and add another file
 file-name-with-invalids-set-to-?.README
which explains the issue, details the steps taken (eg
renaming) and offers some sort of raw binary value of the
original file name.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

2005-05-08 Thread Karsten Hilbert
 There is a table t1(member_id integer primary key, member_name text,
 address text, phone text, email text). I have to reach each member by
 either adress, phone or email. Unfortunately some of address field
 have wrong encoded data. In this case I will use phone or email to
 reach them. 
 
 Now I need to upgrade to newer PostgreSQL within 1 day. I know I have
 to fix wrong encoded field but it will take more than 1 day. So I
 would like to import the data first then fix wrong encoded field on
 running database since I can reach members by phone or email even with
 wrong encoded address field...
You could add a bytea field in the migration process which
holds the original data and put some sloppily corrected data
inside the real field for the time being. Then, once corrected
set the additional bytea field to NULL. Pretty much the
concept of staging tables for data import.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] SQL History

2005-05-08 Thread Michael Ben-Nes
You can set postgresql.conf to log all queries to a file.
Sadly you cant distinguish between the databases.
Hrishikesh Deshmukh wrote:
Hi All,
Does Postgresql have a sql history of queries run capability?
Any ideas/pointers will be a great help. Could one capture the order
and SQL queries run?!
Thanks in advance and your help is appreciated.
Hrishi
---(end of broadcast)---
TIP 8: explain analyze is your friend
 

--
--
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Cel: 972-52-8555757
Fax: 972-4-6990098
http://www.canaan.net.il
--
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[GENERAL] Shorthand for foreign key indices

2005-05-08 Thread Brendan Jurd
Does anybody else think it would be cool if you could use a shorthand
expression for creating an index on a foreign key?  I think it's fair
to say that in the majority of cases, if you're using a foreign key,
you're going to want an index on it.

I know that it was decided a fair few releases ago to stop creating an
implicit index for each foreign key, and that's cool, but should the
creation syntax really require a separate statement for every single
FK you want to index?

Something like ...

CREATE TABLE foo (
 foo int NOT NULL REFERENCES bar INDEX
);

... would be marvellous

My apologies if this has been answered before, but a search of the
-general mailing list was not fruitful.

BJ

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] SQL History

2005-05-08 Thread Tom Lane
Michael Ben-Nes [EMAIL PROTECTED] writes:
 You can set postgresql.conf to log all queries to a file.
 Sadly you cant distinguish between the databases.

See log_line_prefix in 8.0.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Shorthand for foreign key indices

2005-05-08 Thread Jim C. Nasby
On Mon, May 09, 2005 at 02:05:14AM +1000, Brendan Jurd wrote:
 CREATE TABLE foo (
  foo int NOT NULL REFERENCES bar INDEX
 );
 
 ... would be marvellous

I agree that it would be handy. Another possibility is throwing a NOTICE
or even WARNING if you create a foreign key that isn't covered by an
index.

 My apologies if this has been answered before, but a search of the
 -general mailing list was not fruitful.

FYI, often times new ideas are only discussed on -hackers, so you should
search there as well.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[GENERAL] starting

2005-05-08 Thread wayne schlemitz
I am interested in seting up postgres 7.4 or 8.0
which is best on SuSe 8.0 professional and where
should
I put the tar.gz for unzip and install?

Thank you.



Yahoo! Mail
Stay connected, organized, and protected. Take the tour:
http://tour.mail.yahoo.com/mailtour.html


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Howto prevent write based on date

2005-05-08 Thread Fmiser
rumor has it that Richard wrote:

 Franco Bruno Borghesi wrote:
  You could write a trigger like this:
  
  
  CREATE OR REPLACE FUNCTION checkDate() RETURNS TRIGGER LANGUAGE
  'plpgsql' AS ' DECLARE
  limitDate DATE DEFAULT current_date-''1 year''::INTERVAL;
  BEGIN
  IF (OLD.date=limitDate) THEN
  RAISE EXCEPTION ''Cannot change record.'';
  END IF;
  
  RETURN NEW;
  END;
  ';
  
  CREATE TRIGGER _tg1 BEFORE UPDATE OR DELETE ON  FOR EACH ROW
  EXECUTE PROCEDURE checkDate();
  
  This should do the job :)

I feel like I'm 1 meter tall and the wave on the beach are more than 3
meters high...

Thank you for the code.

It looks like it would need to be a part of any access to the database,
so I imagine I would have to figure out where to put it into the
front-end code. Is this correct?

 Franco's trigger function should do the job just fine, but speaking
 from  experience you'll want to take further steps.
 
 Take a backup of the database, restore it to another system and also 
 burn a copy to a CD.
 
 If the auditors come round it's simple to explain what you've done and
 
 demonstrate the data on the CD and backup system match. It also means 
 that should any changes occur to your historical data despite your 
 precautions you can prove that this happened.

Ahh, that is a good idea! A database dump is a part of my daily backup.

I guess I could also make a read-only copy of the year-end as a second
database on the same system. That could make it easy to keep an eye on
the main database so I (hopefully) spot any ripples that reach back to
last year.

Thanks for the help!   Philip

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

2005-05-08 Thread Oliver Jowett
Madison Kelly wrote:
  Is there a way to store the name in raw binary? 
Yes: bytea.
-O
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq