Re: [PATCH] pg_convert improvement
Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [PATCH] pg_convert improvement
On Mon, Nov 27, 2023 at 08:11:06AM +0100, Drouvot, Bertrand wrote: > + PG_RETURN_BYTEA_P(string); I looked around to see whether there was some sort of project policy about returning arguments without copying them, but the only strict rule I see is to avoid scribbling on argument data without first copying it. However, I do see functions that return unmodified arguments both with and without copying. For example, unaccent_dict() is careful to copy the argument before returning it: PG_RETURN_TEXT_P(PG_GETARG_TEXT_P_COPY(strArg)); But replace_text() is not: /* Return unmodified source string if empty source or pattern */ if (src_text_len < 1 || from_sub_text_len < 1) { PG_RETURN_TEXT_P(src_text); } I don't have any specific concerns about doing this, though. Otherwise, the patch looks pretty good to me, so I will plan on committing it shortly. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [PATCH] pg_convert improvement
Hi, On 11/28/23 2:16 AM, Yurii Rashkovskii wrote: Hi Bertrand, Did some minor changes in the attached: - Started the multi-line comment with an upper case and finished it with a "." and re-worded a bit. - Ran pgindent What do you think about the attached? It looks great! Also, might be good to create a CF entry [1] so that the patch proposal does not get lost and gets visibility. Just submitted it to SF. Thank you for the review! Thanks! Just marked it as "Ready for Committer". Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PATCH] pg_convert improvement
Hi Bertrand, > Did some minor changes in the attached: > > - Started the multi-line comment with an upper case and finished > it with a "." and re-worded a bit. > - Ran pgindent > > What do you think about the attached? > It looks great! > > Also, might be good to create a CF entry [1] so that the patch proposal > does not get lost > and gets visibility. > Just submitted it to SF. Thank you for the review! -- Y.
Re: [PATCH] pg_convert improvement
Hi, On 11/24/23 3:32 PM, Yurii Rashkovskii wrote: Hi Bertrand, On Fri, Nov 24, 2023 at 6:26 AM Drouvot, Bertrand mailto:bertranddrouvot...@gmail.com>> wrote: The patch is pretty straightforward, I just have one remark: + /* if no actual conversion happened, return the original string */ + /* (we are checking pointers to strings instead of encodings because + `pg_do_encoding_conversion` above covers more cases than just + encoding equality) */ I think this could be done in one single comment and follow the preferred style for multi-line comment, see [1]. Thank you for your feedback. I've attached a revised patch. Did some minor changes in the attached: - Started the multi-line comment with an upper case and finished it with a "." and re-worded a bit. - Ran pgindent What do you think about the attached? Also, might be good to create a CF entry [1] so that the patch proposal does not get lost and gets visibility. [1]: https://commitfest.postgresql.org/46/ Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom 5eb1f5d9323b8d270411bf156ec065fe7c69b7b8 Mon Sep 17 00:00:00 2001 From: Yurii Rashkovskii Date: Fri, 24 Nov 2023 06:32:05 -0800 Subject: [PATCH v3] Don't copy bytea/text in convert_from/to unless converted. `pg_convert` allocates a new bytea whether actual conversion occurred or not, followed by copying the result which (by definition) is the same as the original value. In the case where conversion has not occurred, it will now simply return the original value, this avoiding unnecessary allocation and copying. --- src/backend/utils/mb/mbutils.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) 100.0% src/backend/utils/mb/ diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c index 67a1ab2ab2..7666d166a5 100644 --- a/src/backend/utils/mb/mbutils.c +++ b/src/backend/utils/mb/mbutils.c @@ -585,19 +585,26 @@ pg_convert(PG_FUNCTION_ARGS) src_encoding, dest_encoding); - /* update len if conversion actually happened */ - if (dest_str != src_str) - len = strlen(dest_str); /* -* build bytea data type structure. +* If no actual conversion happened, return the original string (we are +* checking pointers to strings instead of encodings because +* pg_do_encoding_conversion() above covers more cases than just encoding +* equality). */ + if (dest_str == src_str) + { + PG_RETURN_BYTEA_P(string); + } + + /* if conversion happened, build a new bytea of a correct length */ + len = strlen(dest_str); retval = (bytea *) palloc(len + VARHDRSZ); SET_VARSIZE(retval, len + VARHDRSZ); memcpy(VARDATA(retval), dest_str, len); - if (dest_str != src_str) - pfree(dest_str); + /* free the original result of conversion */ + pfree(dest_str); /* free memory if allocated by the toaster */ PG_FREE_IF_COPY(string, 0); -- 2.34.1
Re: [PATCH] pg_convert improvement
Hi Bertrand, On Fri, Nov 24, 2023 at 6:26 AM Drouvot, Bertrand < bertranddrouvot...@gmail.com> wrote: > > The patch is pretty straightforward, I just have one remark: > > + /* if no actual conversion happened, return the original string */ > + /* (we are checking pointers to strings instead of encodings > because > + `pg_do_encoding_conversion` above covers more cases than just > + encoding equality) */ > > I think this could be done in one single comment and follow the preferred > style > for multi-line comment, see [1]. > Thank you for your feedback. I've attached a revised patch. -- Y. v2-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-conv.patch Description: Binary data
Re: [PATCH] pg_convert improvement
Hi, On 11/24/23 3:05 PM, Yurii Rashkovskii wrote: Hi, I propose a patch that ensures `pg_convert` doesn't allocate and copy data when no conversion is done. It is an unnecessary overhead, especially when such conversions are done frequently and for large values. +1 for the patch, I think the less is done the better. Happy to hear any feedback! The patch is pretty straightforward, I just have one remark: + /* if no actual conversion happened, return the original string */ + /* (we are checking pointers to strings instead of encodings because + `pg_do_encoding_conversion` above covers more cases than just + encoding equality) */ I think this could be done in one single comment and follow the preferred style for multi-line comment, see [1]. [1]: https://www.postgresql.org/docs/current/source-format.html Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
[PATCH] pg_convert improvement
Hi, I propose a patch that ensures `pg_convert` doesn't allocate and copy data when no conversion is done. It is an unnecessary overhead, especially when such conversions are done frequently and for large values. I've tried measuring the performance impact, and the patched version has a small but non-zero gain. The patch builds against `master` and `make check` succeeds. Happy to hear any feedback! -- Y. v1-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-conv.patch Description: Binary data