Re: [PATCHES] [HACKERS] Text - C string

2008-05-04 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 Well ... if somebody does want to change the representation of xml
 down the road, he's going to have to touch all the sites where the
 code converts to and from cstring anyway, right?

True.

 With that in mind, please find attached my followup patch.  It cleans
 up another 21 sites manually copying between cstring and varlena, for
 a net reduction of 115 lines of code.

I applied most of this, but not the parts that were dependent on the
assumption that bytea and text are the same.  That is unlikely to remain
true if we ever get around to putting locale/encoding information into
text values.  Furthermore it's a horrid type pun, because bytea can
contain embedded nulls whereas cstrings can't; you were making
unwarranted assumptions about whether, say, cstring_to_text_with_len
would allow embedded nulls to go by.  And lastly, quite a few of those
changes were just plain broken, eg several places in selfuncs.c where
you allowed strlen() to be applied to a bytea converted to cstring.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-04-16 Thread Brendan Jurd
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

- -BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Sat, Mar 29, 2008 at 5:40 AM, Brendan Jurd  wrote:
 On 29/03/2008, Tom Lane  wrote:
   I intentionally didn't touch xml.c, nor anyplace that is not dealing
in text, even if it happens to be binary-compatible with text.
  

  Hmm, okay.  My original submission did include a few such changes; for
  example, in xml_in and xml_out_internal I saw that the conversion
  between xmltype and cstring was identical to the text conversion, so I
  went ahead and used the text functions.  Feedback upthread suggested
  that it was okay to go ahead with casting in identical cases. [1]

  I saw that these changes made it into the commit, so I assumed that it
  was the right call.

  If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
  have to revert those changes, and I'll have to seriously scale back
  the cleanup patch I was about to post.

Still not sure where we stand on the above.  To cast, or not to cast?

Cheers,
BJ
- -BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBaFy5YBsbHkuyV0RAsMmAKDHaEb7aMudKJgVxcf5RRcOtAJ+bwCgivLI
5B3xJze46NGWjEyOpq/TSaY=
=BObd
- -END PGP SIGNATURE-

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBaIl5YBsbHkuyV0RArDDAJ0QThLXAhzy+NX+2YsF+q4z/sIy1QCeJPiW
s/rVns3FFQVP5g9DTOshDfQ=
=4Tdh
-END PGP SIGNATURE-

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-04-16 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
 have to revert those changes, and I'll have to seriously scale back
 the cleanup patch I was about to post.

 Still not sure where we stand on the above.  To cast, or not to cast?

I dunno.  I know there was previously some handwaving about representing
XML values in some more intelligent fashion than a plain text string,
but I have no idea if anyone is likely to do anything about it in the
foreseeable future.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-04-16 Thread Andrew Dunstan



Tom Lane wrote:

Brendan Jurd [EMAIL PROTECTED] writes:
  

If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
have to revert those changes, and I'll have to seriously scale back
the cleanup patch I was about to post.
  


  

Still not sure where we stand on the above.  To cast, or not to cast?



I dunno.  I know there was previously some handwaving about representing
XML values in some more intelligent fashion than a plain text string,
but I have no idea if anyone is likely to do anything about it in the
foreseeable future.


  


It seems very unlikely to me.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-04-16 Thread Brendan Jurd
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Thu, Apr 17, 2008 at 1:16 AM, Tom Lane  wrote:
 Brendan Jurd  writes:

  If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
   have to revert those changes, and I'll have to seriously scale back
   the cleanup patch I was about to post.

   Still not sure where we stand on the above.  To cast, or not to cast?

  I dunno.  I know there was previously some handwaving about representing
  XML values in some more intelligent fashion than a plain text string,
  but I have no idea if anyone is likely to do anything about it in the
  foreseeable future.


Well ... if somebody does want to change the representation of xml
down the road, he's going to have to touch all the sites where the
code converts to and from cstring anyway, right?

So the only real difference this will make is that, instead of having
to replace four lines of VARDATA/memcpy per site, he'll have to
replace a single function call.  That doesn't seem like a negative.

With that in mind, please find attached my followup patch.  It cleans
up another 21 sites manually copying between cstring and varlena, for
a net reduction of 115 lines of code.

I didn't attempt to work through every reference to VARDATA, but I did
look at every hit from a  `grep -Rn VARDATA . | grep memcpy`.

All regression tests passed (contrib tests included) on gentoo.

Patch added to commitfest queue.

Cheers,
BJ
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBm105YBsbHkuyV0RAmqfAKCfyNyFdciqX4QV81sG9MhPt+KXuACfe694
3d/ICZF6yqV6K20X3TVX+So=
=CvRM
-END PGP SIGNATURE-


text-cstring-followup.diff.bz2
Description: BZip2 compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-03-28 Thread Brendan Jurd
On 26/03/2008, Tom Lane [EMAIL PROTECTED] wrote:
  There are no textout/textin calls left, but I may have missed some
  places that were doing it the hard way with direct palloc/memcpy
  manipulations.  It might be worth trolling all the VARDATA() references
  to see if any more are easily replaceable.


I think there are perhaps a dozen such sites, and I hope to submit a
cleanup patch that gets rid of these soon.

I did come across one site I'm not sure about in utils/adt/xml.c:291

/*
 * We need a null-terminated string to pass to parse_xml_decl().  Rather
 * than make a separate copy, make the temporary result one byte bigger
 * than it needs to be.
 */
result = palloc(nbytes + 1 + VARHDRSZ);
SET_VARSIZE(result, nbytes + VARHDRSZ);
memcpy(VARDATA(result), str, nbytes);
str = VARDATA(result);
str[nbytes] = '\0';

The author seemed pretty sure he didn't want to make a separate copy
of the string, so I'm thinking there's not a whole lot I can do to
improve this site.

  I notice in particular that xfunc.sgml contains sample C functions to
  copy and concatenate text.  While these aren't directly replaceable
  with the new functions, I wonder whether we ought to change the examples
  to make them less certain to break if we ever change text's
  representation.


Yes, these sample C functions are shown in tutorial/funcs.c and
funcs_new.c as well.  I agree that the examples could do with
changing, but don't have any keen insight on what to change them to.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-03-28 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 On 26/03/2008, Tom Lane [EMAIL PROTECTED] wrote:
 There are no textout/textin calls left, but I may have missed some
 places that were doing it the hard way with direct palloc/memcpy
 manipulations.  It might be worth trolling all the VARDATA() references
 to see if any more are easily replaceable.

 I think there are perhaps a dozen such sites, and I hope to submit a
 cleanup patch that gets rid of these soon.
 I did come across one site I'm not sure about in utils/adt/xml.c:291

I intentionally didn't touch xml.c, nor anyplace that is not dealing
in text, even if it happens to be binary-compatible with text.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-03-28 Thread Brendan Jurd
On 29/03/2008, Tom Lane [EMAIL PROTECTED] wrote:
 I intentionally didn't touch xml.c, nor anyplace that is not dealing
  in text, even if it happens to be binary-compatible with text.


Hmm, okay.  My original submission did include a few such changes; for
example, in xml_in and xml_out_internal I saw that the conversion
between xmltype and cstring was identical to the text conversion, so I
went ahead and used the text functions.  Feedback upthread suggested
that it was okay to go ahead with casting in identical cases. [1]

I saw that these changes made it into the commit, so I assumed that it
was the right call.

If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
have to revert those changes, and I'll have to seriously scale back
the cleanup patch I was about to post.

Regards,
BJ

[1] http://archives.postgresql.org/pgsql-hackers/2007-09/msg01094.php

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-03-25 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 As discussed on -hackers, I'm trying to get rid of some redundant code
 by creating a widely useful set of functions to convert between text
 and C string in the backend.

Applied with revisions --- the functions were modified as per recent
discussion, and I fixed a lot more potential call sites.

There are no textout/textin calls left, but I may have missed some
places that were doing it the hard way with direct palloc/memcpy
manipulations.  It might be worth trolling all the VARDATA() references
to see if any more are easily replaceable.

I notice in particular that xfunc.sgml contains sample C functions to
copy and concatenate text.  While these aren't directly replaceable
with the new functions, I wonder whether we ought to change the examples
to make them less certain to break if we ever change text's
representation.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-03-19 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 As discussed on -hackers, I'm trying to get rid of some redundant code
 by creating a widely useful set of functions to convert between text
 and C string in the backend.

 The new extern functions, declared in include/utils/builtins.h and
 defined in backend/utils/adt/varlena.c, are:

 char * text_cstring(const text *t)
 char * text_cstring_limit(const text *t, int len)
 text * cstring_text(const char *s)
 text * cstring_text_limit(const char *s, int len)

I started to look at applying this patch and immediately decided that
I didn't like these names --- it's exceeding un-obvious which direction
of conversion any one of the functions performs.  Seems like every time
you wanted to call one, you'd be going back to look at the source code
to remember which to use.

What do people think of text_to_cstring?  Or should we go with
TextPGetCString for consistency with the Datum-whacking macros?  (I seem
to recall having argued against the latter idea, but am reconsidering.)
Or something else?

BTW, I suspect that the _limit functions are mostly useless ---
a quick look through the patch did not suggest that any of the places
using them really needed a limit.  The point of, for instance,
TZ_STRLEN_MAX is to be able to use fixed-size local buffers, and
if you're going to pay for a palloc anyway then you might as well
forget it.  (There might be some value in a strlcpy equivalent that
copies from a text datum into a limited-size caller-supplied buffer,
but that's not what we've got here.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-03-19 Thread Sam Mason
On Wed, Mar 19, 2008 at 12:51:35PM -0400, Tom Lane wrote:
 Brendan Jurd [EMAIL PROTECTED] writes:
  char * text_cstring(const text *t)
 
 What do people think of text_to_cstring?

I tend to put things the other way around in my code, i.e:

  char * cstring_of_text(const text *t)

mainly because things read more easily---type definitions of new
variables are next to the first part of the word.

  char * str = cstring_of_text(src_text);
vs.
  char * str = text_to_cstring(src_text);

I think I got my original inspiration for doing it this way around from
the Caml language.


  Sam

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-03-19 Thread Brendan Jurd
On 20/03/2008, Tom Lane [EMAIL PROTECTED] wrote:

 I started to look at applying this patch and immediately decided that
  I didn't like these names --- it's exceeding un-obvious which direction
  of conversion any one of the functions performs.  Seems like every time
  you wanted to call one, you'd be going back to look at the source code
  to remember which to use.


That's a fair criticism.  I wanted to make the function names as
compact as possible, but your comment about the directionality of the
functions rings true.

  What do people think of text_to_cstring?  Or should we go with
  TextPGetCString for consistency with the Datum-whacking macros?  (I seem
  to recall having argued against the latter idea, but am reconsidering.)
  Or something else?


Your original argument against FooGetBar was that it would be *too*
consistent with the Datum macros, leading people to think that these
functions actually were macros.

As long as we don't want people getting confused about the
function/macro distinction, that argument still makes sense to me, and
I'd be more inclined towards the foo_to_bar() convention.

  BTW, I suspect that the _limit functions are mostly useless ---
  a quick look through the patch did not suggest that any of the places
  using them really needed a limit.  The point of, for instance,
  TZ_STRLEN_MAX is to be able to use fixed-size local buffers, and
  if you're going to pay for a palloc anyway then you might as well
  forget it.

What about callers like dotrim() in oracle_compat.c, which only want
to copy characters from the source string up to a particular length?
Doesn't that indicate a legitimate requirement for a
cstring_to_text_limit() (the call site was palloc'ing the text value
anyway)?

On the other hand, we do have those call sites (TZ_STRLEN_MAX is a
good example) where the caller just wanted to use a local buffer.  In
which case your strlcpy-equivalent function would probably be the
right thing to offer.

 (There might be some value in a strlcpy equivalent that
  copies from a text datum into a limited-size caller-supplied buffer,
  but that's not what we've got here.)


Perhaps we keep cstring_to_text_limit(), but make
text_to_cstring_limit() behave like strlcpy() instead?

One of the questions in the original patch submission was whether it
would be worth changing all those DirectFunctionCall(textin) and
(textout) calls to use the new functions.  Is it worthwhile avoiding
the fmgr overhead?

Thanks for taking the time to review my patch and provide this
feedback.  I'm happy to send in an updated version once we settle on
the naming convention for the functions.

Last time I looked, the codebase had shifted quite a bit since I
originally wrote the patch.  So it probably needs some work to apply
cleanly on the latest sources anyway.

Regards,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2008-03-19 Thread Tom Lane
Brendan Jurd [EMAIL PROTECTED] writes:
 One of the questions in the original patch submission was whether it
 would be worth changing all those DirectFunctionCall(textin) and
 (textout) calls to use the new functions.  Is it worthwhile avoiding
 the fmgr overhead?

I think that's worth doing just on notational clarity grounds.
The small cycle savings doesn't excite me, but understanding
DirectFunctionCall1(textin, CStringGetDatum(foo)) just involves
more different bits of trivia than cstring_to_text(foo).

 Last time I looked, the codebase had shifted quite a bit since I
 originally wrote the patch.  So it probably needs some work to apply
 cleanly on the latest sources anyway.

Yeah, with wide-impact patches like this you are always going to have
that problem.  One point though is that we don't have to improve every
call site at the same time.  I'd be inclined to put in the new functions
and hit some representative sample of utils/adt/ files in the first
commit, and then incrementally fix other stuff.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Text - C string

2007-11-04 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Brendan Jurd wrote:
 As discussed on -hackers, I'm trying to get rid of some redundant code
 by creating a widely useful set of functions to convert between text
 and C string in the backend.
 
 The new extern functions, declared in include/utils/builtins.h and
 defined in backend/utils/adt/varlena.c, are:
 
 char * text_cstring(const text *t)
 char * text_cstring_limit(const text *t, int len)
 text * cstring_text(const char *s)
 text * cstring_text_limit(const char *s, int len)
 
 Within varlena.c, the actual conversions are performed by:
 
 char * do_text_cstring(const text *t, const int len)
 text * do_cstring_text(const char *s, const int len)
 
 These functions now do the work for the fmgr functions textin and
 textout, as well as being directly accessible by backend code.
 
 I've searched through the backend for any code which converted between
 text and C string manually (with memcpy and VARDATA), replacing with
 calls to one of the four new functions as appropriate.
 
 I came across some areas which were using the same, or similar,
 conversion technique on other varlena data types, such as bytea or
 xmltype.  In cases where the conversion was completely identical I
 used the new functions.  In cases with any differences (even if they
 seemed minor) I played it safe and left them alone.
 
 I'd now like to submit my work so far for review.  This patch compiled
 cleanly on Linux and passed all parallel regression tests.  It appears
 to be performance-neutral based on a few rough tests; I haven't tried
 to profile the changes in detail.
 
 There is still a lot of code out there using DirectFunctionCall1 to
 call text(in|out)).  I've decided to wait for some community feedback
 on the patch as it stands before replacing those calls.  There are a
 great many, and it would be a shame to have to go through them more
 than once.
 
 I would naively expect that replacing fmgr calls with direct calls
 would lead to a performance gain (no fmgr overhead), but honestly I'm
 not sure whether that would actually make a difference.
 
 Thanks for your time,
 BJ

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 7: You can help support the PostgreSQL project by donating at
 
 http://www.postgresql.org/about/donate

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings