Re: [HACKERS] Proposed patch to clean up signed-ness warnings

2005-09-24 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 Are you proposing that we change all the char * to unsigned char *?

 No, I suggest we change all char * to unsigned char * only where
 it points a string which could hold non ASCII character strings.

Which is pretty nearly all of them...

 To support multiple charsets/collataions, I think we need to change
 the way to represent character strings from the unstructured char *
 to more intelligent structure (I know it's hard to implement that
 without significant performance loss, but I know we should do it in
 the future).

Yeah, it's still not clear where we are going to end up, but in the
meantime we've got a lot of warnings cluttering the code and making
it hard to spot real problems.

 So unsigned char* is not enough for the goal anyway, I'm not against
 your patches.

OK.  No one else objected, so I'll go ahead and apply before the code
drifts to the point of breaking the patch.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Proposed patch to clean up signed-ness warnings

2005-09-22 Thread Tatsuo Ishii
 With gcc 4 spreading, it seems like it's past time to do something about
 all those signed-vs-unsigned-char warnings that it emits.  (Translation:
 now that I have to use gcc 4 regularly, I got annoyed enough to fix it
 ;-))
 
 I looked into it a little and determined that nearly all the warnings
 were associated with the multibyte code.  Outside the mb subsystem,
 our code pretty much uses char * for strings, but inside mb it's
 mostly unsigned char *, which is needed because there are lots of
 inequality comparisons in there.  It seemed to me that the cleanest
 fix was to change the external API of the mb subsystem to take and
 return char *, while still using unsigned char * internally.
 The attached patch eliminates all signed-ness warnings in CVS tip
 using this approach.  It's kinda long and tedious, but straightforward,
 and quite a lot of the changes simplify existing code by removing
 casts that aren't needed anymore.
 
 Two questions for the list:
 
 1. Can anyone think of a cleaner way to do this?
 
 2. Is there objection to applying this patch now (ie, before beta3)?
 It's not quite a bug fix, but I think it'll make it easier to find
 bugs going forward.

For me, your patche seems to be a retrogression. In my understanding,
the reason why PostgreSQL uses char * in many places is just it was
designed in the old days when ASCII was the only charset in the world.
--
SRA OSS, Inc. Japan
Tatsuo Ishii

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposed patch to clean up signed-ness warnings

2005-09-22 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 1. Can anyone think of a cleaner way to do this?

 For me, your patche seems to be a retrogression. In my understanding,
 the reason why PostgreSQL uses char * in many places is just it was
 designed in the old days when ASCII was the only charset in the world.

Are you proposing that we change all the char * to unsigned char *?
I looked at that briefly but it seems like a huge loss, both in
notational ugliness and in the amount of code that would have to be
touched.  Also, it would force us to add a bunch of explicit casts to
avoid warnings with standard library functions like strlen().  To me the
bottom line is that 99% of the code only needs to know that a character
string is a character string.  As this patch demonstrates, there is only
a tiny fraction that needs to have the unsigned declaration.  I don't
think we should allow that fraction to dictate a notational burden for
all the rest.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Proposed patch to clean up signed-ness warnings

2005-09-22 Thread Tatsuo Ishii
  For me, your patche seems to be a retrogression. In my understanding,
  the reason why PostgreSQL uses char * in many places is just it was
  designed in the old days when ASCII was the only charset in the world.
 
 Are you proposing that we change all the char * to unsigned char *?

No, I suggest we change all char * to unsigned char * only where
it points a string which could hold non ASCII character strings. I
thought we learned the danger of 1) comparing chars with signed bit
on, 2) passing chars with sign bit on to functions which expect int
etc...

 I looked at that briefly but it seems like a huge loss, both in
 notational ugliness and in the amount of code that would have to be
 touched. 

If you are just care the amount of effort, why don't you leave as it
is and use pre v4 gcc? :-)

  Also, it would force us to add a bunch of explicit casts to
 avoid warnings with standard library functions like strlen().

Counter examples could be easily found in isalpha(), toupper() etc.

 To me the
 bottom line is that 99% of the code only needs to know that a character
 string is a character string.  As this patch demonstrates, there is only
 a tiny fraction that needs to have the unsigned declaration.  I don't
 think we should allow that fraction to dictate a notational burden for
 all the rest.

To support multiple charsets/collataions, I think we need to change
the way to represent character strings from the unstructured char *
to more intelligent structure (I know it's hard to implement that
without significant performance loss, but I know we should do it in
the future).

So unsigned char* is not enough for the goal anyway, I'm not against
your patches.
--
SRA OSS, Inc. Japan
Tatsuo Ishii

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings