[HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Pavel Stehule
Hello,

combination functions to_ascii and convert_to is broken now. Problem
is in convert_to function. It doesn't support 8bit output encoding.

Current workaround:

CREATE FUNCTION to_ascii(bytea, name)
RETURNS text AS 'to_ascii_encname' LANGUAGE internal;

SELECT to_ascii(convert_to('Příliš žlutý kůň', 'latin2'),'latin2');

I don't expect column collate for 8.4, so we need to have workable
to_ascii function.

I propose function to_ascii(text, name) that internally convert text
from utf8 encoding when it's necessary.

Regards
Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Heikki Linnakangas

Zdenek Kotala wrote:
Current content of control file is insufficient to check if database is 
compatible with postgres server. 


It is? Do you have an example of where it's insufficient?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question regarding the database page layout.

2008-08-11 Thread Ryan Bradetich
Hello all,

I have been digging into the database page layout (specifically the tuples)
to ensure the unsigned integer types were consuming the proper storage.
While digging around, I found one thing surprising:

It appears the heap tuples are padded at the end to the MAXALIGN distance.

Below is my data that I used to come to this conclusion.
(This test was performed on a 64-bit system with --with-blocksize=32).

The goal was to compare data from comparable type sizes.
The first column indicates the type (char, uint1, int2, uint2, int4, and
uint4),
the number in () indicates the number of columns in the table.

The Length is from the .lp_off field in the ItemId structure.
The Offset is from the .lp_len field in the ItemId structure.
The Size is the offset difference.

char (1)Length  Offset  Sizechar (9)
Length   Offset   Size
25  32736
32 33  32728  40
25  32704
32 33  32688  40
25  32672
32 33  32648  40
25
32640  33  32608

uint1 (1)   Length   Offset   Sizeuint1 (9)
Length   Offset  Size
 25  32736
3233  32728 40
 25  32704
3233  32688 40
 25  32672
3233  32648 40
 25  32640
33  32608

int2 (1) Length   Offset   Sizeint2 (5)
Length   Offset  Size
 26  32736
3234  32728 40
 26  32704
3234  32688 40
 26  32672
3234  32648 40
 26
32640  34  32608

uint2 (1)Length  Offset   Sizeunt2 (5)
Length   Offset  Size
 26  32736
3234  32728 40
 26  32704
3234  32688 40
 26  32672
3234  32648 40
 26
32640  34  32608

int4 (1)   Length  Offset  Sizeint4 (3)
Length Offset  Size
 28  32736
32  36 32728 40
 28  32704
32  36 32688 40
 28  32672
32  36 32648 40
 28
32640   36 32608

uint4 (1) Length   Offset  Sizeuint4 (3)
Length  Offset  Size
  28  32736
32 36 32728 40
  28  32704
32 36 32688 40
  28  32672
32 36 32648 40
  28
32640  36 32608

From the documentation at:
http://www.postgresql.org/docs/8.3/static/storage-page-layout.html
and from the comments in src/include/access/htup.h I understand the user
data (indicated by t_hoff)
must by a multiple of MAXALIGN distance, but I did not find anything
suggesting the heap tuple itself
had this requirement.

After a cursory glance at the HeapTupleHeaderData structure, it appears it
could be aligned with
INTALIGN instead of MAXALIGN.  The one structure I was worried about was the
6 byte t_ctid
structure.  The comments in src/include/storage/itemptr.h file indicate the
ItemPointerData structure
is composed of 3 int16 fields.  So everthing in the HeapTupleHeaderData
structure is 32-bits or less.

I am interested in attempting to generate a patch if this idea appears
feasible.   The current data
set I am playing with it would save over 3GB of disk space.  (Back of the
envelope calculations
indicate that 5% of my current storage is consumed by this padding.   My
tuple length is 44 bytes.)

Thanks,

- Ryan


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-11 Thread Heikki Linnakangas

Jan Urbański wrote:

Heikki Linnakangas wrote:

Jan Urbański wrote:
Another thing are cstring_to_text_with_len calls. I'm doing them so I 
can use bttextcmp in bsearch(). I think I could come up with a 
dedicated function to return text Datums and WordEntries (read: 
non-NULL terminated strings with a given length).


Just keep them as cstrings and use strcmp. We're only keeping the 
array sorted so that we can binary search it, so we don't need proper 
locale-dependent collation. Note that we already assume that two 
strings ('text's) are equal if and only if their binary 
representations are equal (texteq() uses strcmp).


OK, I got rid of cstring-text calls and memory contexts as I went 
through it. The only tiny ugliness is that there's one function used for 
qsort() and another for bsearch(), because I'm sorting an array of texts 
(from pg_statistic) and I'm binary searching for a lexeme (non-NULL 
terminated string with length).


It would be nice to clean that up a bit. I think you could convert the 
lexeme to a TextFreq, or make the TextFreq.element a text * instead of 
Datum (ie., detoast it with PG_DETOAST_DATUM while you build the array 
for qsort).



My medicore gprof skills got me:
0.000.22   5/5   OidFunctionCall4 [37]
[38]28.40.000.22   5 tssel [38]
0.000.17   5/5 get_restriction_variable [40]
0.030.01   5/10  pg_qsort [60]
0.000.00   5/5   get_attstatsslot [139]

Hopefully that says that the qsort() overhead is small compared to 
munging through the planner Node.


I'd like to see a little bit more testing of that. I can't read gprof 
myself, so the above doesn't give me much confidence. I use oprofile, 
which I find is much simpler to use.


I think the worst case scenario is with statistics_target set to 
maximum, with a simplest possible query and simplest possible tsquery.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies

2008-08-11 Thread Magnus Hagander
Stephen Frost wrote:
 Magnus,
 
 * Magnus Hagander ([EMAIL PROTECTED]) wrote:
 Yeah. I think the question there is just - how likely is it that the
 same installation actually uses 1 authentication method. Personally, I
 think it's not very uncommon at all, but fact remains that as long as
 you only use one of them at a time, using a shared file doesn't matter.
 
 We use multiple authentication types *alot*..  ident, md5, kerberos, and
 gssapi are all currently in use on our systems.  ident for local unix
 logins, md5 for 'role' accounts and software the doesn't understand
 kerberos, kerberos/gssapi depending on the age of the client library
 connecting.  Oh, and we use pam too..  We use some mappings now with
 ident, which I'd expect to continue to do, and I've got definite plans
 for mappings under Kerberos/GSSAPI once it's supported..

Ok. Good to know - if you want to use it, there are bound to be a number
of others who would like it as well :)


 It wouldn't be very easy/clean to do that w/o breaking the existing
 structure of pg_ident though, which makes me feel like using seperate
 files is probably the way to go.
 Yeah, thats my feeling as well. Now, can someone figure out a way to do
 that without parsing the file in the postmaster? (And if we do parse it,
 there's no point in not storing the parsed version, IMHO). And if not,
 the question it comes down to is which is most important - keeping the
 parsing away, or being able to do this ;-)
 
 I don't have an answer wrt the parsing issue, but I definitely want to
 be able to do this. :)

Right.

I guess one option would be to load the map file at runtime in the
backend, and not pre-load/cache it from the postmaster. But that seems
rahter sub-optimal to me. Other thoughts?

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question regarding the database page layout.

2008-08-11 Thread Gregory Stark
Ryan Bradetich [EMAIL PROTECTED] writes:

 After a cursory glance at the HeapTupleHeaderData structure, it appears it
 could be aligned with INTALIGN instead of MAXALIGN. The one structure I was
 worried about was the 6 byte t_ctid structure. The comments in
 src/include/storage/itemptr.h file indicate the ItemPointerData structure is
 composed of 3 int16 fields. So everthing in the HeapTupleHeaderData
 structure is 32-bits or less.

Sure, but the tuple itself could contain something with double alignment. If
you have a bigint or double in the tuple then heap_form_tuple needs to know
where to put it so it ends up at right alignment.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Multiple anyelement types

2008-08-11 Thread ITAGAKI Takahiro
I'm working on improvements of orafce.
http://pgfoundry.org/projects/orafce

I found postgres supports only one type of anyelement at one time
when I added nvl2() and decode(). I'd like to use multiple types of
anyelement something like:

  template  typename Expr, typename Ret 
  CREATE FUNCTION nvl(Expr, Ret, Ret) RETURNS Ret

  template  typename Expr, typename Ret 
  CREATE FUNCTION decode(Expr, Expr, Ret, ..., Ret) RETURNS Ret

I don't mean to propose the above C++-like syntax, but such feature
is important to develop a generic porting tool. Two independent
anyelements are enough for me, but three or more might be better
for general use.

What syntax is suitable for postgres? Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:
Current content of control file is insufficient to check if database 
is compatible with postgres server. 


It is? Do you have an example of where it's insufficient?



Current control file contain following information (related to page layout):

maxAlign
blcksz
toast_max_chunk_size

But you don't have control how aligned is each member of data structure.

By my opinion -fipa-struct-reorg GCC option could break structure. And maybe 
there are more compiler magic switches and optimization on different platforms 
which can modify structure alignment or member order. It probably does not 
happen often but footprint should protect people to shot himself.


Zdenek

PS: And of course toast_max_chunk_size is not insufficient as well. There are 
more constants like MaxHeapTupleSize and so on, but it is different story.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multiple anyelement types

2008-08-11 Thread Pavel Stehule
He

2008/8/11 ITAGAKI Takahiro [EMAIL PROTECTED]:
 I'm working on improvements of orafce.
 http://pgfoundry.org/projects/orafce

 I found postgres supports only one type of anyelement at one time
 when I added nvl2() and decode(). I'd like to use multiple types of
 anyelement something like:

  template  typename Expr, typename Ret 
  CREATE FUNCTION nvl(Expr, Ret, Ret) RETURNS Ret

  template  typename Expr, typename Ret 
  CREATE FUNCTION decode(Expr, Expr, Ret, ..., Ret) RETURNS Ret

 I don't mean to propose the above C++-like syntax, but such feature
 is important to develop a generic porting tool. Two independent
 anyelements are enough for me, but three or more might be better
 for general use.


it's good idea - I though so 2 independent types are enough:
anyelement2, enyarray2. If you are C coder, you should use any type.

regards
Pavel Stehule

 What syntax is suitable for postgres? Comments welcome.

 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Heikki Linnakangas

Zdenek Kotala wrote:

By my opinion -fipa-struct-reorg GCC option could break structure.


That option would probably break a lot of things. Like our 
variable-sized array as last field of a struct hacks.


And 
maybe there are more compiler magic switches and optimization on 
different platforms which can modify structure alignment or member 
order. It probably does not happen often but footprint should protect 
people to shot himself.


We depend on a certain member order and alignment rules. If we're 
worried about that, we should add checks in configure instead, to barf 
if you try to use such options.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Gregory Stark
Zdenek Kotala [EMAIL PROTECTED] writes:

 By my opinion -fipa-struct-reorg GCC option could break structure. And maybe
 there are more compiler magic switches and optimization on different platforms
 which can modify structure alignment or member order. It probably does not
 happen often but footprint should protect people to shot himself.

My version of GCC doesn't have that option, what does it do?

If structure members aren't in the order they're defined and padded to the
alignment they're declared to have in pg_type then Postgres catalogs won't
work anyways.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Heikki Linnakangas

Gregory Stark wrote:

Zdenek Kotala [EMAIL PROTECTED] writes:

By my opinion -fipa-struct-reorg GCC option could break structure. And maybe
there are more compiler magic switches and optimization on different platforms
which can modify structure alignment or member order. It probably does not
happen often but footprint should protect people to shot himself.


My version of GCC doesn't have that option, what does it do?


From gcc man page:


   -fipa-struct-reorg
   Perform structure reorganization optimization, that change C-like
   structures layout in order to better utilize spatial locality.
   This transformation is affective for programs containing arrays of
   structures.  Available in two compilation modes: profile-based
   (enabled with -fprofile-generate) or static (which uses built-in
   heuristics).  Require -fipa-type-escape to provide the safety of
   this transformation.  It works only in whole program mode, so it
   requires -fwhole-program and -combine to be enabled.  Structures
   considered cold by this transformation are not affected (see
   --param struct-reorg-cold-struct-ratio=value).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Andrew Dunstan



Pavel Stehule wrote:

Hello,

combination functions to_ascii and convert_to is broken now. Problem
is in convert_to function. It doesn't support 8bit output encoding.

Current workaround:

CREATE FUNCTION to_ascii(bytea, name)
RETURNS text AS 'to_ascii_encname' LANGUAGE internal;

SELECT to_ascii(convert_to('Příliš žlutý kůň', 'latin2'),'latin2');

I don't expect column collate for 8.4, so we need to have workable
to_ascii function.

I propose function to_ascii(text, name) that internally convert text
from utf8 encoding when it's necessary.cheers


  


convert_to is not broken. It returns a bytea, and it is up to you to 
de-escape it if you get the text representation.


We are surely not going to go back to a situation where we have 
functions returning text in any encoding other than the database 
encoding. That becomes a vehicle for storing wrongly encoded data in the 
database, and we have just gone through the exercise of plugging those 
holes. I privately predicted when we did this work that it might 
motivate people who had been abusing convert_to to get proper support 
for multiple encodings done. That is the right way to go, not re-opening 
holes we have just very deliberately plugged.




cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:

By my opinion -fipa-struct-reorg GCC option could break structure.


That option would probably break a lot of things. Like our 
variable-sized array as last field of a struct hacks.


Yes, it is extreme case.

And maybe there are more compiler magic switches and optimization on 
different platforms which can modify structure alignment or member 
order. It probably does not happen often but footprint should protect 
people to shot himself.


We depend on a certain member order and alignment rules. If we're 
worried about that, we should add checks in configure instead, to barf 
if you try to use such options.




You are able to check order, but you cannot complain about structure member 
alignment during configure time. But if you have two binaries which you get from 
two sources then you need to verify that both binaries has same structure footprint.


Similar is 32/64 bit compilation. It is handled on x86 by MAXALIGN but MAXALIGN 
is same on SPARC for both binaries, but I'm not sure if it works correctly.


Any other usage is to protect developers to make a mistake and break silently 
compatibility, but it should be caught by --footprint switch.


Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Skimming icc warnings on mongoose

2008-08-11 Thread Gregory Stark

I happened to be skimming the compile warnings on mongoose, an icc buildfarm
member and noticed two interesting warnings.


icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic 
-I../../../src/include/snowball -I../../../src/include/snowball/libstemmer 
-I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  
-c -o stem_UTF_8_swedish.o ./libstemmer/stem_UTF_8_swedish.c
icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic 
-I../../../src/include/snowball -I../../../src/include/snowball/libstemmer 
-I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  
-c -o stem_UTF_8_turkish.o ./libstemmer/stem_UTF_8_turkish.c
icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic -shared  
dict_snowball.o api.o utilities.o stem_ISO_8859_1_danish.o 
stem_ISO_8859_1_dutch.o stem_ISO_8859_1_english.o stem_ISO_8859_1_finnish.o 
stem_ISO_8859_1_french.o stem_ISO_8859_1_german.o stem_ISO_8859_1_hungarian.o 
stem_ISO_8859_1_italian.o stem_ISO_8859_1_norwegian.o stem_ISO_8859_1_porter.o 
stem_ISO_8859_1_portuguese.o stem_ISO_8859_1_spanish.o 
stem_ISO_8859_1_swedish.o stem_ISO_8859_2_romanian.o stem_KOI8_R_russian.o 
stem_UTF_8_danish.o stem_UTF_8_dutch.o stem_UTF_8_english.o 
stem_UTF_8_finnish.o stem_UTF_8_french.o stem_UTF_8_german.o 
stem_UTF_8_hungarian.o stem_UTF_8_italian.o stem_UTF_8_norwegian.o 
stem_UTF_8_porter.o stem_UTF_8_portuguese.o stem_UTF_8_romanian.o 
stem_UTF_8_russian.o stem_UTF_8_spanish.o stem_UTF_8_swedish.o 
stem_UTF_8_turkish.o   -L../../../src/port   -o dict_snowball.so
ld: warning: creating a DT_TEXTREL in object.


There are a few instances of this. I think it indicates we are either spelling
-fpic wrong or including some .o object that was not compiled with -fpic
(perhaps something from the ports directory?).

This can result in a performance drain when the .so is linked in since the
text segment can't be mapped directly from the file and instead needs to be
read in and adjusted to the base address to which it was loaded.

It isn't unique to tsearch stuff either, plpgsql has the same warning:


make[4]: Entering directory 
`/home/data/local/jeremyd/postgres/buildfarm/root/HEAD/pgsql.4376/src/pl/plpgsql/src'
bison -y -d  gram.y
mv -f y.tab.c ./pl_gram.c
mv -f y.tab.h ./pl.tab.h
LC_CTYPE=C /usr/bin/flex  -o'pl_scan.c' scan.l
icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic -I. 
-I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/et  -c -o pl_gram.o pl_gram.c
pl_scan.c(1944) : (col. 2) remark: LOOP WAS VECTORIZED.
pl_scan.c(2495) : (col. 2) remark: LOOP WAS VECTORIZED.
icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic -I. 
-I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/et  -c -o pl_handler.o pl_handler.c
pl_handler.c(199) : (col. 3) remark: LOOP WAS VECTORIZED.
pl_handler.c(200) : (col. 3) remark: LOOP WAS VECTORIZED.
pl_handler.c(206) : (col. 4) remark: LOOP WAS VECTORIZED.
icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic -I. 
-I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/et  -c -o pl_comp.o pl_comp.c
pl_comp.c(157) : (col. 3) remark: LOOP WAS VECTORIZED.
pl_comp.c(211) : (col. 4) remark: LOOP WAS VECTORIZED.
icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic -I. 
-I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/et  -c -o pl_exec.o pl_exec.c
icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic -I. 
-I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2  
-I/usr/include/et  -c -o pl_funcs.o pl_funcs.c
icc -O3 -xN -parallel -ip -mp1 -fno-strict-aliasing -g -fpic -shared  pl_gram.o 
pl_handler.o pl_comp.o pl_exec.o pl_funcs.o   -L../../../../src/port   -o 
plpgsql.so
ld: warning: creating a DT_TEXTREL in object.



And then there's this -- plen is a size_t...

plpython.c(1420): warning #186: pointless comparison of unsigned integer with 
zero
Assert(plen = 0  plen  mlen);


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Pavel Stehule
Hello

2008/8/11 Andrew Dunstan [EMAIL PROTECTED]:


 Pavel Stehule wrote:

 Hello,

 combination functions to_ascii and convert_to is broken now. Problem
 is in convert_to function. It doesn't support 8bit output encoding.

 Current workaround:

 CREATE FUNCTION to_ascii(bytea, name)
 RETURNS text AS 'to_ascii_encname' LANGUAGE internal;

 SELECT to_ascii(convert_to('Příliš žlutý kůň', 'latin2'),'latin2');

 I don't expect column collate for 8.4, so we need to have workable
 to_ascii function.

 I propose function to_ascii(text, name) that internally convert text
 from utf8 encoding when it's necessary.cheers




 convert_to is not broken. It returns a bytea, and it is up to you to
 de-escape it if you get the text representation.


ok, I talked about combination convert_to and to_ascii. to_ascii
doesn't support bytea, what is probably correct. We cannot use
descape, because it remove 8bit. This issue was noticed more times -
http://archives.postgresql.org/pgsql-general/2008-06/msg00495.php


 We are surely not going to go back to a situation where we have functions
 returning text in any encoding other than the database encoding. That
 becomes a vehicle for storing wrongly encoded data in the database, and we
 have just gone through the exercise of plugging those holes. I privately
 predicted when we did this work that it might motivate people who had been
 abusing convert_to to get proper support for multiple encodings done. That
 is the right way to go, not re-opening holes we have just very deliberately
 plugged.


to_ascii isn't related to multiple encodings. And actually there is
only one man who works on it. We will be happy for database collation
in 8.4. So without any change this feature will be broken more than
two years.

Regards
Pavel



 cheers

 andrew

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Skimming icc warnings on mongoose

2008-08-11 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I happened to be skimming the compile warnings on mongoose, an icc buildfarm
 member and noticed two interesting warnings.
 ld: warning: creating a DT_TEXTREL in object.

 There are a few instances of this. I think it indicates we are either spelling
 -fpic wrong or including some .o object that was not compiled with -fpic
 (perhaps something from the ports directory?).

If it were the latter, quite a number of platforms would just fail
outright, I believe.  As for spelling -fpic wrong, that seems unlikely
too considering that icc generally tries to pretend it's gcc.
Any other theories?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Heikki Linnakangas

Zdenek Kotala wrote:
Similar is 32/64 bit compilation. It is handled on x86 by MAXALIGN but 
MAXALIGN is same on SPARC for both binaries, but I'm not sure if it 
works correctly.


Are 32/64-bit binaries on Sparc incompatible, then? Does our control 
file check catch it? If not, what's causing it, and would the 
--footprint switch catch it?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Pavel Stehule
2008/8/11 Andrew Dunstan [EMAIL PROTECTED]:


 Pavel Stehule wrote:

 Hello,

 combination functions to_ascii and convert_to is broken now. Problem
 is in convert_to function. It doesn't support 8bit output encoding.

 Current workaround:

 CREATE FUNCTION to_ascii(bytea, name)
 RETURNS text AS 'to_ascii_encname' LANGUAGE internal;

 SELECT to_ascii(convert_to('Příliš žlutý kůň', 'latin2'),'latin2');

 I don't expect column collate for 8.4, so we need to have workable
 to_ascii function.

 I propose function to_ascii(text, name) that internally convert text
 from utf8 encoding when it's necessary.cheers




 convert_to is not broken. It returns a bytea, and it is up to you to
 de-escape it if you get the text representation.

One note - convert_to is correct. But we have to use to_ascii without
decode functions. It has same behave - convert from bytea to text.
Text in incorrect encoding is dafacto bytea. So correct to_ascii
function prototypes are:

to_ascii(text)
to_ascii(bytea, integer);
to_ascii(bytea, name);

Regards
Pavel Stehule

 We are surely not going to go back to a situation where we have functions
 returning text in any encoding other than the database encoding. That
 becomes a vehicle for storing wrongly encoded data in the database, and we
 have just gone through the exercise of plugging those holes. I privately
 predicted when we did this work that it might motivate people who had been
 abusing convert_to to get proper support for multiple encodings done. That
 is the right way to go, not re-opening holes we have just very deliberately
 plugged.



 cheers

 andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Andrew Dunstan



Pavel Stehule wrote:



One note - convert_to is correct. But we have to use to_ascii without
decode functions. It has same behave - convert from bytea to text.
Text in incorrect encoding is dafacto bytea. So correct to_ascii
function prototypes are:

to_ascii(text)
to_ascii(bytea, integer);
to_ascii(bytea, name);

  



What you have not said is how you propose to convert UTF8 to ASCII.

Currently to_ascii() converts a small number of single byte charsets to 
ASCII by folding the chars with high bits set, so what we get is a pure 
ASCII result which is safe in any server encoding, as they are all ASCII 
supersets.


But what conversion rule will you use for the gazillions of Unicode 
characters?


I honestly do not understand the use case for this at all.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Jan Urbański

Andrew Dunstan wrote:



Pavel Stehule wrote:



One note - convert_to is correct. But we have to use to_ascii without
decode functions. It has same behave - convert from bytea to text.
Text in incorrect encoding is dafacto bytea. So correct to_ascii
function prototypes are:

to_ascii(text)
to_ascii(bytea, integer);
to_ascii(bytea, name);

 



What you have not said is how you propose to convert UTF8 to ASCII.

Currently to_ascii() converts a small number of single byte charsets to 
ASCII by folding the chars with high bits set, so what we get is a pure 
ASCII result which is safe in any server encoding, as they are all ASCII 
supersets.


But what conversion rule will you use for the gazillions of Unicode 
characters?


I honestly do not understand the use case for this at all.


I do. Often clients want their searches to be 
accented-or-language-specific letters insensitive. So searching for 
'łódź' returns 'lodz'. So the use case is there (in fact, the lack of 
such facility made me consider not upgrading particular client to 8.3...).

Or maybe there's a better way to do it?

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Andrew Dunstan



Jan Urbański wrote:

Andrew Dunstan wrote:



Pavel Stehule wrote:



One note - convert_to is correct. But we have to use to_ascii without
decode functions. It has same behave - convert from bytea to text.
Text in incorrect encoding is dafacto bytea. So correct to_ascii
function prototypes are:

to_ascii(text)
to_ascii(bytea, integer);
to_ascii(bytea, name);

 



What you have not said is how you propose to convert UTF8 to ASCII.

Currently to_ascii() converts a small number of single byte charsets 
to ASCII by folding the chars with high bits set, so what we get is a 
pure ASCII result which is safe in any server encoding, as they are 
all ASCII supersets.


But what conversion rule will you use for the gazillions of Unicode 
characters?


I honestly do not understand the use case for this at all.


I do. Often clients want their searches to be 
accented-or-language-specific letters insensitive. So searching for 
'łódź' returns 'lodz'. So the use case is there (in fact, the lack of 
such facility made me consider not upgrading particular client to 
8.3...).

Or maybe there's a better way to do it?


Well, my first question would be Why aren't you using a database 
encoding that supports to_ascii()?


However, I suppose that your use case would support this signature:

   to_ascii(bytea, name)

where it would just error out if the encoding name were something other 
than LATIN1, LATIN2, LATIN9, or WIN1250.


But what would be the meaning of this?:

   to_ascii(bytea, integer)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:
Similar is 32/64 bit compilation. It is handled on x86 by MAXALIGN but 
MAXALIGN is same on SPARC for both binaries, but I'm not sure if it 
works correctly.


Are 32/64-bit binaries on Sparc incompatible, then? Does our control 
file check catch it? If not, what's causing it, and would the 
--footprint switch catch it?




It is what I don't know. controlfile only says  incorrect checksum in control 
file. It seems to me that CRC computing does not work correctly. I check 
pg_control footprint and it is same for 32/64. It seems to me a false positive 
complain, but ...


Footprint switch should help show you why it is incompatible and also protect 
you during development to break some structure.



Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Heikki Linnakangas

Zdenek Kotala wrote:
It is what I don't know. controlfile only says  incorrect checksum in 
control file. It seems to me that CRC computing does not work 
correctly. I check pg_control footprint and it is same for 32/64. It 
seems to me a false positive complain, but ...


Endianness perhaps? Per comment in ControlFileData:


 * This data is used to check for hardware-architecture compatibility of
 * the database and the backend executable.  We need not check 
endianness
 * explicitly, since the pg_control version will surely look wrong to a
 * machine of different endianness, but we do need to worry about 
MAXALIGN
 * and floating-point format.  (Note: storage layout nominally also
 * depends on SHORTALIGN and INTALIGN, but in practice these are the 
same
 * on all architectures of interest.)



Footprint switch should help show you why it is incompatible and also 
protect you during development to break some structure.


Apparently the footprint switch didn't provide any insight into the 
Sparc 32/64-bit issue, so I'm not too impressed..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Jan Urbański

Andrew Dunstan wrote:



Jan Urbański wrote:

Andrew Dunstan wrote:



Pavel Stehule wrote:
What you have not said is how you propose to convert UTF8 to ASCII.

Currently to_ascii() converts a small number of single byte charsets 
to ASCII by folding the chars with high bits set, so what we get is a 
pure ASCII result which is safe in any server encoding, as they are 
all ASCII supersets.


But what conversion rule will you use for the gazillions of Unicode 
characters?


I honestly do not understand the use case for this at all.


I do. Often clients want their searches to be 
accented-or-language-specific letters insensitive. So searching for 
'łódź' returns 'lodz'. So the use case is there (in fact, the lack of 
such facility made me consider not upgrading particular client to 
8.3...).

Or maybe there's a better way to do it?


Well, my first question would be Why aren't you using a database 
encoding that supports to_ascii()?


Because I want UTF-8 in it ;) It's mostly LATIN2, but clients sometimes 
input Cyrillic, Greek or Hebrew letters, and sometimes use Unicode 
characters like (U+2026)  HORIZONTAL ELLIPSIS.


I'd like to have
to_ascii(text, [error_handling]) returns text

So no bytea, to_ascii would accept text that's legal in my current 
database encoding and return text in that encoding. And error_handling 
would be something like:
- 'error' (the default, throw an error if a character is untranslable to 
ASCII)

- 'ignore' (omit untranslable characters)
- 'transliterate' (do your best to transliterate the character, or leave 
it as it is if impossible).


Examples would include (assuming UTF-8 database)
to_ascii('łódź') - 'lodz'
to_ascii('china is written 中國') - ERROR
to_ascii('china is written 中國', 'ignore') - 'china is written '
to_ascii('china is written 中國', 'transliterate') - 'china is written 
zhong guo' (in an ideal world)
to_ascii('china is written 中國', 'transliterate') - 'china is written 
中國' (in reality)\


These would have the property, that:
to_ascii(X, 'ignore') is always pure ASCII data and never throws an error
to_ascii(X, 'transliterate') is sometimes non-ASCII data and never 
throws an error

to_ascii(X) is sometimes non-ASCII data and sometimes throws an error

It's something like PHP's iconv that can have //TRANSLIT or somesuch 
(forgive me for giving PHP as an example...). Now I'd love to hear 
people punch holes in my daydreaming design ;)


Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: PageLayout footprint

2008-08-11 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:
It is what I don't know. controlfile only says  incorrect checksum in 
control file. It seems to me that CRC computing does not work 
correctly. I check pg_control footprint and it is same for 32/64. It 
seems to me a false positive complain, but ...


Endianness perhaps? Per comment in ControlFileData:


No. It is same Endianness. SPARC allows to map memory with different endian, but 
it needs extra magic.



Zdenek



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Pavel Stehule
2008/8/11 Andrew Dunstan [EMAIL PROTECTED]:


 Pavel Stehule wrote:


 One note - convert_to is correct. But we have to use to_ascii without
 decode functions. It has same behave - convert from bytea to text.
 Text in incorrect encoding is dafacto bytea. So correct to_ascii
 function prototypes are:

 to_ascii(text)
 to_ascii(bytea, integer);
 to_ascii(bytea, name);





 What you have not said is how you propose to convert UTF8 to ASCII.

 Currently to_ascii() converts a small number of single byte charsets to
 ASCII by folding the chars with high bits set, so what we get is a pure
 ASCII result which is safe in any server encoding, as they are all ASCII
 supersets.

 But what conversion rule will you use for the gazillions of Unicode
 characters?

 I honestly do not understand the use case for this at all.


It's typical case in czech language, where some searchings are accents
insensitive - Stěhule, Stehule, Novotný, Novotny.

 cheers

 andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Pavel Stehule
2008/8/11 Andrew Dunstan [EMAIL PROTECTED]:


 Jan Urbański wrote:

 Andrew Dunstan wrote:


 Pavel Stehule wrote:


 One note - convert_to is correct. But we have to use to_ascii without
 decode functions. It has same behave - convert from bytea to text.
 Text in incorrect encoding is dafacto bytea. So correct to_ascii
 function prototypes are:

 to_ascii(text)
 to_ascii(bytea, integer);
 to_ascii(bytea, name);





 What you have not said is how you propose to convert UTF8 to ASCII.

 Currently to_ascii() converts a small number of single byte charsets to
 ASCII by folding the chars with high bits set, so what we get is a pure
 ASCII result which is safe in any server encoding, as they are all ASCII
 supersets.

 But what conversion rule will you use for the gazillions of Unicode
 characters?

 I honestly do not understand the use case for this at all.

 I do. Often clients want their searches to be
 accented-or-language-specific letters insensitive. So searching for 'łódź'
 returns 'lodz'. So the use case is there (in fact, the lack of such facility
 made me consider not upgrading particular client to 8.3...).
 Or maybe there's a better way to do it?

 Well, my first question would be Why aren't you using a database encoding
 that supports to_ascii()?

 However, I suppose that your use case would support this signature:

   to_ascii(bytea, name)

 where it would just error out if the encoding name were something other than
 LATIN1, LATIN2, LATIN9, or WIN1250.

 But what would be the meaning of this?:

   to_ascii(bytea, integer)


it's symmetric. Nothing more.


 cheers

 andrew



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Andrew Dunstan



Pavel Stehule wrote:


But what would be the meaning of this?:

  to_ascii(bytea, integer)




it's symmetric. Nothing more.

  


Symmetric to what? What is the second argument supposed to be?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Pavel Stehule
2008/8/11 Andrew Dunstan [EMAIL PROTECTED]:


 Pavel Stehule wrote:

 But what would be the meaning of this?:

  to_ascii(bytea, integer)



 it's symmetric. Nothing more.



 Symmetric to what? What is the second argument supposed to be?


postgres=# \df to_ascii
   List of functions
   Schema   |   Name   | Result data type | Argument data types
+--+--+-
 pg_catalog | to_ascii | text | text
 pg_catalog | to_ascii | text | text, integer
 pg_catalog | to_ascii | text | text, name

postgres=# select to_ascii('pavel',8);
 to_ascii
--
 pavel
(1 row)


 cheers

 andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Andrew Dunstan



Pavel Stehule wrote:

2008/8/11 Andrew Dunstan [EMAIL PROTECTED]:
  

Pavel Stehule wrote:


But what would be the meaning of this?:

 to_ascii(bytea, integer)




it's symmetric. Nothing more.


  

Symmetric to what? What is the second argument supposed to be?




postgres=# \df to_ascii
   List of functions
   Schema   |   Name   | Result data type | Argument data types
+--+--+-
 pg_catalog | to_ascii | text | text
 pg_catalog | to_ascii | text | text, integer
 pg_catalog | to_ascii | text | text, name

postgres=# select to_ascii('pavel',8);
 to_ascii
--
 pavel
(1 row)


  



Hmm. That's not documented, and I suspect shouldn't be there. Everywhere 
else pretty much that I can think of we pass the encoding as a name, and 
I think we should be consistent about it.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question regarding the database page layout.

2008-08-11 Thread Ryan Bradetich
Hello Greg,

On Mon, Aug 11, 2008 at 2:24 AM, Gregory Stark [EMAIL PROTECTED]wrote:

 Ryan Bradetich [EMAIL PROTECTED] writes:

  After a cursory glance at the HeapTupleHeaderData structure, it appears
 it
  could be aligned with INTALIGN instead of MAXALIGN. The one structure I
 was
  worried about was the 6 byte t_ctid structure. The comments in
  src/include/storage/itemptr.h file indicate the ItemPointerData structure
 is
  composed of 3 int16 fields. So everthing in the HeapTupleHeaderData
  structure is 32-bits or less.

 Sure, but the tuple itself could contain something with double alignment.
 If
 you have a bigint or double in the tuple then heap_form_tuple needs to know
 where to put it so it ends up at right alignment.


My first thought was we can still figure this out because the user data is
already
forced to be MAXALIGN.   Then I realized oh, since that is true then I am
still going
to eat the padding anyhow.  The padding would just move to one of two
places:
1. To MAXALIGN the user data or
2. To MAXALIGN the heap tuple header.

Thanks for the sanity check.  I was wrapped up in looking for alignment
requirements
in the HeapTupleHeaderData structure, I overlooked the significance of the
MAXALIGN
on the user data.

Thanks,

- Ryan

P.S.  Just for the archive, it seems this idea still may be workable (need
to look at the
heap_form_tuple in significant more detail) if/when someone implements the
proposal
to separate the physical storage from the column order.   That solution is a
bit more
than I am ready to tackle at the moment :)  Maybe in the future.


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Pavel Stehule
2008/8/11 Andrew Dunstan [EMAIL PROTECTED]:


 Pavel Stehule wrote:

 2008/8/11 Andrew Dunstan [EMAIL PROTECTED]:


 Pavel Stehule wrote:


 But what would be the meaning of this?:

  to_ascii(bytea, integer)




 it's symmetric. Nothing more.




 Symmetric to what? What is the second argument supposed to be?



 postgres=# \df to_ascii
   List of functions
   Schema   |   Name   | Result data type | Argument data types
 +--+--+-
  pg_catalog | to_ascii | text | text
  pg_catalog | to_ascii | text | text, integer
  pg_catalog | to_ascii | text | text, name

 postgres=# select to_ascii('pavel',8);
  to_ascii
 --
  pavel
 (1 row)




 Hmm. That's not documented, and I suspect shouldn't be there. Everywhere
 else pretty much that I can think of we pass the encoding as a name, and I
 think we should be consistent about it.


I don't need it

regards
Pavel

 cheers

 andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP: New Page API

2008-08-11 Thread Zdenek Kotala

I finished first prototype of new page API. It contains several new functions 
like:

Pointer PageGetUpperPointer(Page page);

void PageClearPrunable(Page page);
bool PageIsComprimable(Page page);

void PageReserveLinp(Page page);
void PageReleaseLinp(Page page);

LocationIndex PageGetLower(Page page);
LocationIndex PageGetUpper(Page page);
LocationIndex PageGetSpecial(Page page);
void PageSetLower(Page page, LocationIndex lower);
void PageSetUpper(Page page, LocationIndex lower);

Page PageGetTempPage(Page page, bool copy);
Page PageGetTempPageCopySpecial(Page page);
void PageRestoreTempPage(Page tempPage, Page oldPage);

Size PageGetSpecialSize(Page page);
Size PageGetDataSize(Page page);

bool PageLayoutIsValid(Page page);


The main point of the new API is to handle multi page layout versions. The 
current implementation don't uses any speed optimization and performance gap is 
about 5% (big thanks to Paul van den Bogaard for benchmarking). Finally I plan 
to implement hottest function like macro (or inline fn ;-) ) and because most 
structure members are located on same place I will remove extra switch in the code.


I also grab number of calls by DTrace and I got following result:

snip
  PageGetHeapFreeSpace984
  PageGetSpecialSize 1170
  PageGetFreeSpace   1200
  PageGetExactFreeSpace  1399
  PageGetUpper   1419
  PageGetLower   1618
  PageGetLSN 2936
  PageGetMaxOffsetNumber 5504
  PageGetSpecialPointer 13534
  PageGetItemId 71074
  PageGetItem   76629

I plan to remove PageGetItemId and replace it with any other function like 
PageGetHeapTuple and so on. The reason is that ItemId flags has been changed 
between version 3 and 4. And on many times it is called like


itemId = PageGetItemId();
PageGetItem(itemId);

I'm also thinking about add following function:

PageSetXLOG(page,TLI,LSN) - it should replace PageSetLSN();PageSetTLI(); 
sequence in code


I'm not happy with PageSetLower() function which is used in nbtree, but no idea 
yet how to improve it.


Please, let me know your comments. I attached prototype patch.

Thanks Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

diff -r 879ef18ad9cb -r 3e71cf34dced src/backend/access/gin/gindatapage.c
--- a/src/backend/access/gin/gindatapage.c	Tue Aug 05 21:28:29 2008 +
+++ b/src/backend/access/gin/gindatapage.c	Mon Aug 11 15:58:27 2008 +0200
@@ -445,7 +445,7 @@
 	char	   *ptr;
 	OffsetNumber separator;
 	ItemPointer bound;
-	Page		lpage = GinPageGetCopyPage(BufferGetPage(lbuf));
+	Page		lpage = PageGetTempPage(BufferGetPage(lbuf), true);
 	ItemPointerData oldbound = *GinDataPageGetRightBound(lpage);
 	int			sizeofitem = GinSizeOfItem(lpage);
 	OffsetNumber maxoff = GinPageGetOpaque(lpage)-maxoff;
diff -r 879ef18ad9cb -r 3e71cf34dced src/backend/access/gin/ginentrypage.c
--- a/src/backend/access/gin/ginentrypage.c	Tue Aug 05 21:28:29 2008 +
+++ b/src/backend/access/gin/ginentrypage.c	Mon Aug 11 15:58:27 2008 +0200
@@ -458,7 +458,7 @@
 leftrightmost = NULL;
 	static ginxlogSplit data;
 	Page		page;
-	Page		lpage = GinPageGetCopyPage(BufferGetPage(lbuf));
+	Page		lpage = PageGetTempPage(BufferGetPage(lbuf), true);
 	Page		rpage = BufferGetPage(rbuf);
 	Size		pageSize = PageGetPageSize(lpage);
 
diff -r 879ef18ad9cb -r 3e71cf34dced src/backend/access/gin/ginutil.c
--- a/src/backend/access/gin/ginutil.c	Tue Aug 05 21:28:29 2008 +
+++ b/src/backend/access/gin/ginutil.c	Mon Aug 11 15:58:27 2008 +0200
@@ -309,21 +309,6 @@
 	return entries;
 }
 
-/*
- * It's analog of PageGetTempPage(), but copies whole page
- */
-Page
-GinPageGetCopyPage(Page page)
-{
-	Size		pageSize = PageGetPageSize(page);
-	Page		tmppage;
-
-	tmppage = (Page) palloc(pageSize);
-	memcpy(tmppage, page, pageSize);
-
-	return tmppage;
-}
-
 Datum
 ginoptions(PG_FUNCTION_ARGS)
 {
diff -r 879ef18ad9cb -r 3e71cf34dced src/backend/access/gin/ginvacuum.c
--- a/src/backend/access/gin/ginvacuum.c	Tue Aug 05 21:28:29 2008 +
+++ b/src/backend/access/gin/ginvacuum.c	Mon Aug 11 15:58:27 2008 +0200
@@ -529,7 +529,7 @@
 	 * On first difference we create temporary page in memory
 	 * and copies content in to it.
 	 */
-	tmppage = GinPageGetCopyPage(origpage);
+	tmppage = PageGetTempPage(origpage, true);
 
 	if (newN  0)
 	{
diff -r 879ef18ad9cb -r 3e71cf34dced src/backend/access/gist/gist.c
--- a/src/backend/access/gist/gist.c	Tue Aug 05 21:28:29 2008 +
+++ b/src/backend/access/gist/gist.c	Mon Aug 

Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote: 
 I believe that the optimizable cases for EXISTS are those where the
 EXISTS() is either at the top level of WHERE, or just underneath a
 NOT,
 
 The rest of the plan makes sense to me, but this part seems narrow. 
 There's probably a good reason for that which is beyond my depth, but
 attached is a view that is used for calculating statistics for a
 database which is primarily used for case management purposes.  If
 EXISTS could also be optimized in the contexts used there, it would be
 great.
 
I chewed on that for awhile.  We can certainly optimize EXISTS that's
appearing in the ON-condition of a regular JOIN, because that's not
really semantically different from a WHERE-condition.  But I don't think
it's going to be reasonable to improve EXISTS in outer-JOIN ON
conditions.  There are a couple of problems.  Consider

t1 LEFT JOIN t2
  ON (t1.f1 = t2.f2 AND
  EXISTS(SELECT 1 FROM t3 WHERE t3.f3 = t1.fx AND t3.f4 = t2.fy))

To implement this with the correct semantics, we'd have to have the
t1/t2 join and the t1/t2/t3 join going on in the same execution node,
with two different join behaviors (LEFT and SEMI).  There isn't any
way AFAICS to factor it into two separate steps.  That's unreasonably
complicated, and it's not clear that you'd get any better performance
anyway than the current implementation (which'd treat the EXISTS as
a subplan).

Even worse, let the EXISTS be a degenerate case:

t1 LEFT JOIN t2
  ON (t1.f1 = t2.f2 AND
  EXISTS(SELECT 1 FROM t3 WHERE t3.f3 = t1.fx));

We can't actually treat this EXISTS as a semijoin between t1 and t3,
either before or after the LEFT JOIN; because then the behavior would be
to drop t1 rows that have no t3 match, which is not what this query
specifies.

(Note: the other degenerate case, where the EXISTS depends only on
t2, *could* be optimized since we could just let the semijoin be
performed within the RHS of the LEFT JOIN.)

So this is not something I'm going to tackle; at least not this
devel cycle.

One small step we can take in this direction, though, is to improve the
planner's internal handling of the qual conditions for IN and EXISTS.
Right now the process is just to throw the sub-select into the main
range table and put the IN join conditions into the same place in WHERE
that the IN-clause was to start with.  The trouble with this is that the
distribute_quals_to_rels processing has no idea that there's anything
special about the IN join conditions.  We got away with that for the
limited case of IN clauses at the top level of WHERE, but it's become
clear to me over the weekend that this has no hope of working for NOT
EXISTS --- since that's effectively an outer join, it absolutely has to
have the same kinds of qual-scheduling constraints as ordinary outer
joins do.  So we need a data structure that distribute_quals_to_rels can
work with.  What I think needs to happen is that the initial pass that
pulls up optimizable IN/EXISTS sub-selects should not merge the
SubLink's replacement qual clauses seamlessly, but put them underneath a
new node type, say FlattenedSubLink, that retains knowledge of the
join it's representing.  The FlattenedSubLink would survive only as far
as distribute_quals_to_rels, which would distribute out the contained
qual conditions instead of the FlattenedSubLink itself --- but only
after marking them properly for the outer-join restrictions.  This
representation would make it feasible to work with IN/EXISTS that are
inside JOIN ON conditions, which the present representation using a
single in_info_list really can't do.  The semantic issues are still
there but at least the representation isn't getting in the way ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Type Categories for User-Defined Types

2008-08-11 Thread David E. Wheeler

On Aug 10, 2008, at 20:58, David E. Wheeler wrote:

Just realized that I forgot to add the DROP FUNCTION statements to  
the uninstall script. New patch attached.


And this one also includes the casts I added. :-)

Best,

David


citext_casting3.patch.gz
Description: GNU Zip compressed data



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Peter Eisentraut
On Monday 11 August 2008 16:23:29 Jan Urbański wrote:
 Often clients want their searches to be
 accented-or-language-specific letters insensitive. So searching for
 'łódź' returns 'lodz'. So the use case is there (in fact, the lack of
 such facility made me consider not upgrading particular client to 8.3...).

These are valid ideas, but then please design a new function that addresses 
your use case in a well-defined way, and don't overload questionable old 
interfaces for new purposes.

In the Unicode standard you can find well-defined methods to decompose 
characters into diacritic marks, and then you could strip them off.  But this 
has nothing to do with ASCII or UTF8 or encodings.  Cyrillic characters can 
have diacritic marks as well, for example.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Decibel!

On Aug 8, 2008, at 3:23 PM, Tom Lane wrote:

* has no set operations (UNION etc), grouping, set-returning functions
in the SELECT list, LIMIT, or a few other funny cases



Couldn't union/union all be treated as

EXISTS(a)
OR EXISTS(b)
...

Or am I missing some detail with NULLS?

Personally, I'd rather write it as separate EXISTS clauses rather  
than using UNION, but perhaps others have a different preference...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] Visibility Groups

2008-08-11 Thread Decibel!

On Aug 7, 2008, at 9:49 AM, Robert Haas wrote:

This proposal sounds like it would target batch jobs, because those
are the kinds of jobs that where you can predict in advance what
tables will be needed.  I don't know whether my personal set of
problems with MVCC syncs up with anyone else's, but this is rarely how
I get bitten.  Usually, what happens is that a user session (psql or
web server connection) gets left in a transaction for days or weeks.
Now the batch jobs (which are doing lots of updates) start creating
tons of bloat, but it's not their snapshot that is causing the
problem.



We have some cron'd scripts that check for long-running queries, idle  
in transaction and just plain idle. The scripts will kill sessions  
when the sit in different states for too long. It would be nice if  
the database could handle this (no, statement_timeout won't work,  
because the user can just change it to whatever they want), but I  
don't know how many other people have this need. I guess I could at  
least put the scripts up on pgFoundry...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-11 Thread Decibel!

On Aug 3, 2008, at 9:57 PM, Robert Treat wrote:
I think a variation on this could be very useful in development  
and test
environments. Suppose it raised a warning or notice if the cost  
was over
the limit. Then one could set a limit of a few million on the  
development

and test servers and developers would at least have a clue that they
needed to look at explain for that query. As it is now, one can  
exhort
them to run explain, but it has no effect.  Instead we later see  
queries
killed by a 24 hour timeout with estimated costs ranging from  
until they

unplug the machine and dump it to until the sun turns into a red
giant.


Great argument. So that's 4 in favour at least.



Not such a great argument. Cost models on development servers can  
and often
are quite different from those on production, so you might be  
putting an

artifical limit on top of your developers.



We should have an approved API for dumping stats from one database  
and loading them into another. pg_dump needs this as well, IMO.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-11 Thread Decibel!

On Aug 4, 2008, at 3:49 PM, Simon Riggs wrote:

On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote:

On Monday 04 August 2008 03:50:40 daveg wrote:



And you'll note, I specifically said that a crude tool is better than
nothing. But your completely ignoring that a crude tool can often
end-up as a foot-gun once relased into the wild.


The proposal is for an option with no consequences when turned off. We
respect your right not to use it. What is the danger exactly?

If we cancel stupid queries before people run them, everybody is a
winner. Even the person who submitted the stupid query, since they  
find

out faster.


I could *really* use this. Unfortunately, we have a lot of folks  
writing some horrible queries and killing our slave databases. I'd  
*love* to be able to throw out any queries that had insane limits...



We'll have to do something with enable_seqscan, BTW, chaps.


My thought would be to back the cost penalty out if we end up with a  
seqscan anyway.


Speaking of which, there is a semi-related issue... if you have a  
large enough table the fixed-size cost we add to a seqscan won't be  
enough to make an alternative plan come out cheaper. Instead of  
adding a fixed cost, I think we should multiply by the estimated  
number of rows.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Kevin Grittner
Our Internet connectivity failed as this was being sent.  It looks
like at least the list didn't get it, so here goes another try. 
Apologies for any duplication.
 
-Kevin
 
 
 Tom Lane [EMAIL PROTECTED] wrote: 
 
 I chewed on that for awhile.  We can certainly optimize EXISTS
that's
 appearing in the ON-condition of a regular JOIN, because that's not
 really semantically different from a WHERE-condition.
 
Good to hear.  I thought that might be doable.  :-)
 
 But I don't think
 it's going to be reasonable to improve EXISTS in outer-JOIN ON
 conditions.  There are a couple of problems.  Consider
 
The discussion did make the difficulties clear.
 
 So this is not something I'm going to tackle; at least not this
 devel cycle.
 
Fair enough.
 
 One small step we can take in this direction, though, is to improve
the
 planner's internal handling of the qual conditions for IN and
EXISTS.
 Right now the process is just to throw the sub-select into the main
 range table and put the IN join conditions into the same place in
WHERE
 that the IN-clause was to start with.  The trouble with this is that
the
 distribute_quals_to_rels processing has no idea that there's
anything
 special about the IN join conditions.  We got away with that for the
 limited case of IN clauses at the top level of WHERE, but it's
become
 clear to me over the weekend that this has no hope of working for
NOT
 EXISTS --- since that's effectively an outer join, it absolutely has
to
 have the same kinds of qual-scheduling constraints as ordinary outer
 joins do.  So we need a data structure that distribute_quals_to_rels
can
 work with.  What I think needs to happen is that the initial pass
that
 pulls up optimizable IN/EXISTS sub-selects should not merge the
 SubLink's replacement qual clauses seamlessly, but put them
underneath a
 new node type, say FlattenedSubLink, that retains knowledge of the
 join it's representing.  The FlattenedSubLink would survive only as
far
 as distribute_quals_to_rels, which would distribute out the
contained
 qual conditions instead of the FlattenedSubLink itself --- but only
 after marking them properly for the outer-join restrictions.  This
 representation would make it feasible to work with IN/EXISTS that
are
 inside JOIN ON conditions, which the present representation using a
 single in_info_list really can't do.  The semantic issues are still
 there but at least the representation isn't getting in the way ...
 
Just curious, is that something for this cycle, or a TODO item?
 
Thanks for looking at this.  The one part I'm not sure about is where
the CASE/EXISTS in the SELECT value list fits into this discussion. 
It seems conceptually similar to the OUTER JOIN, but sort of a special
case, so I'm not sure what you had in mind there.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Column level privileges was:(Re: [HACKERS] Extending grant insert on tables to sequences)

2008-08-11 Thread Jaime Casanova
On 7/25/08, Stephen Frost [EMAIL PROTECTED] wrote:
 * Jaime Casanova ([EMAIL PROTECTED]) wrote:
  ok, seems this is the last one for column level patch
  http://archives.postgresql.org/pgsql-patches/2008-04/msg00417.php
 
  any one working it...

 Yes, I'm working on it

hi, any work on it? may i help?

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Guayaquil - Ecuador
Cel. (593) 87171157

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Gregory Stark
Decibel! [EMAIL PROTECTED] writes:

 On Aug 8, 2008, at 3:23 PM, Tom Lane wrote:
 * has no set operations (UNION etc), grouping, set-returning functions
 in the SELECT list, LIMIT, or a few other funny cases


 Couldn't union/union all be treated as

 EXISTS(a)
 OR EXISTS(b)

Kind of confused by what you mean here. Can you give an example?

The usual transformation to consider with UNION is to transform

SELECT ... WHERE x OR y

into

SELECT ...
 WHERE x
UNION ALL
SELECT ...
 WHERE y AND NOT x

(modulo handling NULLs properly)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum and TOAST tables

2008-08-11 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Note that this patch allows a toast table to be vacuumed by the user:
  I don't have a problem with that, but if anyone thinks this is not a
  good idea, please speak up.
 
 The permissions on pg_toast will prevent anyone but a superuser from
 doing that anyway, so it's no big deal.
 
 Possibly more interesting is what happens if someone drops the parent
 table while VACUUM is working independently on the toast table.  Does
 DROP take exclusive lock on a toast table?  Probably, but it needs
 to be checked.

Yes, it does.  So the autovacuum process working on the TOAST table
would get cancelled by the DROP TABLE, TRUNCATE, CLUSTER.  The one ALTER
TABLE variant that I think needs to handle the TOAST table is ALTER
TYPE, but I think it should work that it is being vacuumed concurrently.
REINDEX TABLE should perhaps also be concerned because it does reindex
the toast table, but it grabs the lock before actually doing the
reindexing so I don't think there's a problem here.


BTW only now I notice that CLUSTER leaves the toast table name in bad
shape: if you create a table with OID X its TOAST table is named
pg_toast_X.  If you then cluster this table, a new transient table gets
created with OID Y; the TOAST table for Y is named pg_toast_Y, and then
this new TOAST table is used as the new TOAST table for the original
table X.  So you end up with table OID X having TOAST table pg_toast_Y.

This is not a concern from the system standpoint because it doesn't use
this name for anything, but people looking at the catalogs manually may
be taken by surprise.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum and TOAST tables

2008-08-11 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 BTW only now I notice that CLUSTER leaves the toast table name in bad
 shape: if you create a table with OID X its TOAST table is named
 pg_toast_X.  If you then cluster this table, a new transient table gets
 created with OID Y; the TOAST table for Y is named pg_toast_Y, and then
 this new TOAST table is used as the new TOAST table for the original
 table X.  So you end up with table OID X having TOAST table pg_toast_Y.

Hmm, we could probably fix that if we made the cluster operation swap
the physical storage of the two toast tables, rather than swapping the
tables altogether.  I agree it's not critical but it could be confusing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum and TOAST tables

2008-08-11 Thread Tom Lane
I wrote:
 Hmm, we could probably fix that if we made the cluster operation swap
 the physical storage of the two toast tables, rather than swapping the
 tables altogether.  I agree it's not critical but it could be confusing.

On second thought, I think it *could* lead to a visible failure.
Suppose the OID counter wraps around and the OID that had been used for
the temporary CLUSTER table gets assigned to a new table.  If that table
needs a toast table, it'll try to create one using the name that is
already in use.  We have defenses against picking an OID that's in use,
but none for toast table names.  So I think it's indeed worth fixing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum and TOAST tables

2008-08-11 Thread Alvaro Herrera
Tom Lane wrote:

 On second thought, I think it *could* lead to a visible failure.
 Suppose the OID counter wraps around and the OID that had been used for
 the temporary CLUSTER table gets assigned to a new table.  If that table
 needs a toast table, it'll try to create one using the name that is
 already in use.  We have defenses against picking an OID that's in use,
 but none for toast table names.  So I think it's indeed worth fixing.

Okay, I'll see to it after committing this autovacuum stuff.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-11 Thread Casey Duncan

On Aug 4, 2008, at 1:04 PM, daveg wrote:

Ok, that is a different use case where an error seems very useful.  
What
about slightly extending the proposal to have the severity of  
exceeding

the limit configurable too. Something like:

  costestimate_limit = 10 # default 0 to ignore limit
  costestimate_limit_severity = error # debug, notice, warning,  
error


I very much like this idea, and I would definitely use something like  
this on our production oltp app. We had a case recently where a query  
joining two large tables was very fast 99.9% of the time (i.e., a few  
ms), but for particular, rare key combinations the planner would make  
a poor choice turning into a multi-minute monster. It ran longer than  
the web server timeout, and the client was programmed to retry on  
error, essentially causing a database DoS.


The monster version of the plan had an outrageous cost estimate, many  
orders of magnitude higher than any regular app query, and would be  
easy to peg using even a crudely chosen limit value.


The problem was first mitigated by setting a query timeout a little  
longer than the web server timeout (since the query results are  
discarded for anything running longer), but even this was not a  
solution, since the client would retry on timeout, still keeping the  
db too busy. The real solution was to not do the query, but it would  
have been better to identify this via ERRORs in the logs than by the  
database becoming saturated in the middle of the day.


For our application it is far better for an expensive query to be  
rejected outright than to attempt to run it in vain. Just thought I'd  
throw that out as anecdotal support.


-Casey


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IN vs EXISTS equivalence

2008-08-11 Thread Tom Lane
Decibel! [EMAIL PROTECTED] writes:
 On Aug 8, 2008, at 3:23 PM, Tom Lane wrote:
 * has no set operations (UNION etc), grouping, set-returning functions
 in the SELECT list, LIMIT, or a few other funny cases

 Couldn't union/union all be treated as
 EXISTS(a)
 OR EXISTS(b)

Perhaps, but that would end up defeating the optimization anyway,
because as soon as the EXISTS is underneath an OR, it's no longer
representing a potential join clause.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Plans for 8.4

2008-08-11 Thread Bruce Momjian

Added to TODO:

* Add 'hostgss' pg_hba.conf option to allow GSS link-level encryption

  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01454.php


---

Henry B. Hotz wrote:
 What's the time frame for 8.4?
 
 I'm making no promises, but what would people think of a hostgss hba  
 option?
 
 Using it would imply the gssapi/sspi authentication option.  It would  
 be mutually exclusive of the ssl link-encryption option.  It would  
 support strong encryption of the whole connection without the need to  
 get X509 certs deployed (which would be a big win if you're using  
 gssapi/sspi authentication anyway).
 
 The thing that prevented me from including it in the gssapi patches I  
 did for 8.3 was that I couldn't disentangle the program logic to the  
 point of inserting the gssapi security layer code above the SSL code  
 and below everything else.  I'm thinking that doing both is pretty  
 much an edge case, so I propose to do gssapi security layers instead  
 of SSL.  The mods are a lot more obvious.
 
 I'm *NOT* proposing to make build support of gssapi security layers  
 exclusive of SSL.  You might, for example, configure a server to  
 support username/password over SSL for intra-net addresses, but  
 support gssapi for Internet addresses.
 
 --
 The opinions expressed in this message are mine,
 not those of Caltech, JPL, NASA, or the US Government.
 [EMAIL PROTECTED], or [EMAIL PROTECTED]
 
 
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Plugin system like Firefox

2008-08-11 Thread Bruce Momjian
Matthew T. O'Connor wrote:
 A few random thoughts...
 
 The application that comes to mind first for me when you talk plugins is 
 Firefox.  They make it very easy to browse for plugins and to install, 
 update, remove them.  Their plug-in system also tries to account for 
 Firefox version and OS platform which we would need to do also.
 
 Perhaps one thing that would help PostgreSQL plug-ins is a nice GUI 
 plug-in browser and management application.  The logical place to add 
 this IMHO is PGAdmin since it is GUI, already talks to the DB and is 
 cross platform.  I'm not saying a GUI should be required to manage 
 plug-ins, a fully CLI option should be made available too.

I feel the above comment about Firefox is the crux of the plugin issue. 
Having per-operating system options for each plugin is never going to
scale.

Having users compile/install these plugins works, but the effort
required discourages their use, and the fewer people that try them the
fewer people use them and contribute back to the Postgres plugin
ecosystem.

This is similar to the problem of installing Postgres before we had
per-OS installs and Postgres software bundles like the Win32 installer
and Postgres Plus.

The Firefox plugins are successful because it is easy to install stuff.
I run Ubuntu on my laptop so I can easily install software --- I can
compile/install software from source, but I prefer to use the Ubuntu
software install system so I can concentrate on my major task.

So, ideally, if we do a plug-in system, I think we need some way to have
these plugins be very easily installed, perhaps by choosing object files
pre-compile by the build farm for each operating system.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-08-11 Thread Bruce Momjian
Zdenek Kotala wrote:
 Tom Lane napsal(a):
  Zdenek Kotala [EMAIL PROTECTED] writes:
  I understand. However I have another dumb idea/question - It seems to me 
  that it 
  is client code. I think that it should be integrated into psql
  command.
  
  That doesn't seem like a particularly appropriate thing to do ... nor
  do I see the argument for calling it client-side code.
  
 
 I think that best thing at this moment is to add item to the TODO list about 
 cleanup.

I can add a TODO item but I am unsure anyone really cares --- seeing how
long it took us to realize the poor state of the code.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: UTF8 to_ascii function

2008-08-11 Thread Pavel Stehule
2008/8/11 Peter Eisentraut [EMAIL PROTECTED]:
 On Monday 11 August 2008 16:23:29 Jan Urbański wrote:
 Often clients want their searches to be
 accented-or-language-specific letters insensitive. So searching for
 'łódź' returns 'lodz'. So the use case is there (in fact, the lack of
 such facility made me consider not upgrading particular client to 8.3...).

 These are valid ideas, but then please design a new function that addresses
 your use case in a well-defined way, and don't overload questionable old
 interfaces for new purposes.

 In the Unicode standard you can find well-defined methods to decompose
 characters into diacritic marks, and then you could strip them off.  But this
 has nothing to do with ASCII or UTF8 or encodings.  Cyrillic characters can
 have diacritic marks as well, for example.


Hi Peter,

changes to_ascii from text to bytea is more bugfix than new feature
and should be done immediately. Correct conversions are related to
colum collation - is not necessary repeat same work and same code from
some unicode libs.

Regards
Pavel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers