Re: replace strtok()

2024-07-23 Thread Peter Eisentraut

On 08.07.24 07:45, David Steele wrote:

On 6/24/24 19:57, Peter Eisentraut wrote:

On 24.06.24 02:34, Michael Paquier wrote:

On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote:

Peter Eisentraut  writes:

On 18.06.24 13:43, Ranier Vilela wrote:

I found another implementation of strsep, it seems lighter to me.
I will attach it for consideration, however, I have not done any 
testing.



Yeah, surely there are many possible implementations.  I'm thinking,
since we already took other str*() functions from OpenBSD, it makes
sense to do this here as well, so we have only one source to deal 
with.


Why not use strpbrk?  That's equally thread-safe, it's been there
since C89, and it doesn't have the problem that you can't find out
which of the delimiter characters was found.


Yeah, strpbrk() has been used in the tree as far as 2003 without any
port/ implementation.


The existing uses of strpbrk() are really just checking whether some 
characters exist in a string, more like an enhanced strchr().  I don't 
see any uses for tokenizing a string like strtok() or strsep() would 
do.   I think that would look quite cumbersome.  So I think a simpler 
and more convenient abstraction like strsep() would still be worthwhile.


I agree that using strsep() in these cases seems more natural. Since 
this patch provides a default implementation compatibility does not seem 
like a big issue.


I've also reviewed the rest of the patch and it looks good to me.


This has been committed.  Thanks.





Re: replace strtok()

2024-07-07 Thread David Steele

On 6/24/24 19:57, Peter Eisentraut wrote:

On 24.06.24 02:34, Michael Paquier wrote:

On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote:

Peter Eisentraut  writes:

On 18.06.24 13:43, Ranier Vilela wrote:

I found another implementation of strsep, it seems lighter to me.
I will attach it for consideration, however, I have not done any 
testing.



Yeah, surely there are many possible implementations.  I'm thinking,
since we already took other str*() functions from OpenBSD, it makes
sense to do this here as well, so we have only one source to deal with.


Why not use strpbrk?  That's equally thread-safe, it's been there
since C89, and it doesn't have the problem that you can't find out
which of the delimiter characters was found.


Yeah, strpbrk() has been used in the tree as far as 2003 without any
port/ implementation.


The existing uses of strpbrk() are really just checking whether some 
characters exist in a string, more like an enhanced strchr().  I don't 
see any uses for tokenizing a string like strtok() or strsep() would do. 
  I think that would look quite cumbersome.  So I think a simpler and 
more convenient abstraction like strsep() would still be worthwhile.


I agree that using strsep() in these cases seems more natural. Since 
this patch provides a default implementation compatibility does not seem 
like a big issue.


I've also reviewed the rest of the patch and it looks good to me.

Regards,
-David




Re: replace strtok()

2024-06-24 Thread Peter Eisentraut

On 24.06.24 02:34, Michael Paquier wrote:

On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote:

Peter Eisentraut  writes:

On 18.06.24 13:43, Ranier Vilela wrote:

I found another implementation of strsep, it seems lighter to me.
I will attach it for consideration, however, I have not done any testing.



Yeah, surely there are many possible implementations.  I'm thinking,
since we already took other str*() functions from OpenBSD, it makes
sense to do this here as well, so we have only one source to deal with.


Why not use strpbrk?  That's equally thread-safe, it's been there
since C89, and it doesn't have the problem that you can't find out
which of the delimiter characters was found.


Yeah, strpbrk() has been used in the tree as far as 2003 without any
port/ implementation.


The existing uses of strpbrk() are really just checking whether some 
characters exist in a string, more like an enhanced strchr().  I don't 
see any uses for tokenizing a string like strtok() or strsep() would do. 
 I think that would look quite cumbersome.  So I think a simpler and 
more convenient abstraction like strsep() would still be worthwhile.





Re: replace strtok()

2024-06-23 Thread Michael Paquier
On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 18.06.24 13:43, Ranier Vilela wrote:
> >> I found another implementation of strsep, it seems lighter to me.
> >> I will attach it for consideration, however, I have not done any testing.
> 
> > Yeah, surely there are many possible implementations.  I'm thinking, 
> > since we already took other str*() functions from OpenBSD, it makes 
> > sense to do this here as well, so we have only one source to deal with.
> 
> Why not use strpbrk?  That's equally thread-safe, it's been there
> since C89, and it doesn't have the problem that you can't find out
> which of the delimiter characters was found.

Yeah, strpbrk() has been used in the tree as far as 2003 without any 
port/ implementation.
--
Michael


signature.asc
Description: PGP signature


Re: replace strtok()

2024-06-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 18.06.24 13:43, Ranier Vilela wrote:
>> I found another implementation of strsep, it seems lighter to me.
>> I will attach it for consideration, however, I have not done any testing.

> Yeah, surely there are many possible implementations.  I'm thinking, 
> since we already took other str*() functions from OpenBSD, it makes 
> sense to do this here as well, so we have only one source to deal with.

Why not use strpbrk?  That's equally thread-safe, it's been there
since C89, and it doesn't have the problem that you can't find out
which of the delimiter characters was found.

regards, tom lane




Re: replace strtok()

2024-06-22 Thread Peter Eisentraut

On 18.06.24 13:43, Ranier Vilela wrote:

But I would like to see more const char * where this is possible.

For example, in pg_locale.c
IMO, the token variable can be const char *.

At least strchr expects a const char * as the first parameter.


This would not be future-proof.  In C23, if you pass a const char * into 
strchr(), you also get a const char * as a result.  And in this case, we 
do write into the area pointed to by the result.  So with a const char 
*token, this whole thing would not compile cleanly under C23.



I found another implementation of strsep, it seems lighter to me.
I will attach it for consideration, however, I have not done any testing.


Yeah, surely there are many possible implementations.  I'm thinking, 
since we already took other str*() functions from OpenBSD, it makes 
sense to do this here as well, so we have only one source to deal with.






Re: replace strtok()

2024-06-19 Thread Kyotaro Horiguchi
At Tue, 18 Jun 2024 09:18:28 +0200, Peter Eisentraut  
wrote in 
> Under the topic of getting rid of thread-unsafe functions in the
> backend [0], here is a patch series to deal with strtok().
> 
> Of course, strtok() is famously not thread-safe and can be replaced by
> strtok_r().  But it also has the wrong semantics in some cases,
> because it considers adjacent delimiters to be one delimiter.  So if
> you parse
> 
> SCRAM-SHA-256$:$:
> 
> with strtok(), then
> 
> SCRAM-SHA-256$$::$$::
> 
> parses just the same.  In many cases, this is arguably wrong and could
> hide mistakes.
> 
> So I'm suggesting to use strsep() in those places.  strsep() is
> nonstandard but widely available.
> 
> There are a few places where strtok() has the right semantics, such as
> parsing tokens separated by whitespace.  For those, I'm using
> strtok_r().

I agree with the distinction.

> A reviewer job here would be to check whether I made that distinction
> correctly in each case.

0001 and 0002 look correct to me regarding that distinction. They
applied correctly to the master HEAD and all tests passed on Linux.

> On the portability side, I'm including a port/ replacement for
> strsep() and some workaround to get strtok_r() for Windows.  I have
> included these here as separate patches for clarity.

0003 looks fine and successfully built and seems working on an MSVC
build.

About 0004, Cygwin seems to have its own strtok_r, but I haven't
checked how that fact affects the build.

> [0]:
> https://www.postgresql.org/message-id/856e5ec3-879f-42ee-8258-8bcc6ec9b...@eisentraut.org

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: replace strtok()

2024-06-18 Thread Ranier Vilela
Em ter., 18 de jun. de 2024 às 04:18, Peter Eisentraut 
escreveu:

> Under the topic of getting rid of thread-unsafe functions in the backend
> [0], here is a patch series to deal with strtok().
>
> Of course, strtok() is famously not thread-safe and can be replaced by
> strtok_r().  But it also has the wrong semantics in some cases, because
> it considers adjacent delimiters to be one delimiter.  So if you parse
>
>  SCRAM-SHA-256$:$:
>
> with strtok(), then
>
>  SCRAM-SHA-256$$::$$::
>
> parses just the same.  In many cases, this is arguably wrong and could
> hide mistakes.
>
> So I'm suggesting to use strsep() in those places.  strsep() is
> nonstandard but widely available.
>
> There are a few places where strtok() has the right semantics, such as
> parsing tokens separated by whitespace.  For those, I'm using strtok_r().
>
> A reviewer job here would be to check whether I made that distinction
> correctly in each case.
>
> On the portability side, I'm including a port/ replacement for strsep()
> and some workaround to get strtok_r() for Windows.  I have included
> these here as separate patches for clarity.
>
+1 For making the code thread-safe.
But I would like to see more const char * where this is possible.

For example, in pg_locale.c
IMO, the token variable can be const char *.

At least strchr expects a const char * as the first parameter.

I found another implementation of strsep, it seems lighter to me.
I will attach it for consideration, however, I have not done any testing.

best regards,
Ranier Vilela
/* strsep.h
 *
 *  Provides the 4.4BSD strsep(3) function for those that don't have it.
 *
 *  Copyright 2011 Michael Thomas Greer
 *  Distributed under the Boost Software License, Version 1.0.
 *  ( See accompanying file LICENSE_1_0.txt or copy at
 *   http://www.boost.org/LICENSE_1_0.txt )
 *
 *  Including this file modifies the std namespace in C++.
 *
 *  Don't include this file if your compiler provides the strsep function in 
.
 *  Make sure your build process tests for this and behaves accordingly!
 *
 */

#include 

char *
strsep(char **stringp, const char *delim)
{
char *result;

if ((stringp == NULL) || (*stringp == NULL))
return NULL;

result = *stringp;

while(**stringp && !(strchr(delim, **stringp)))
++*stringp;

if (**stringp)
*(*stringp)++ = '\0';
else
*stringp = NULL;

return result;
}


replace strtok()

2024-06-18 Thread Peter Eisentraut
Under the topic of getting rid of thread-unsafe functions in the backend 
[0], here is a patch series to deal with strtok().


Of course, strtok() is famously not thread-safe and can be replaced by 
strtok_r().  But it also has the wrong semantics in some cases, because 
it considers adjacent delimiters to be one delimiter.  So if you parse


SCRAM-SHA-256$:$:

with strtok(), then

SCRAM-SHA-256$$::$$::

parses just the same.  In many cases, this is arguably wrong and could 
hide mistakes.


So I'm suggesting to use strsep() in those places.  strsep() is 
nonstandard but widely available.


There are a few places where strtok() has the right semantics, such as 
parsing tokens separated by whitespace.  For those, I'm using strtok_r().


A reviewer job here would be to check whether I made that distinction 
correctly in each case.


On the portability side, I'm including a port/ replacement for strsep() 
and some workaround to get strtok_r() for Windows.  I have included 
these here as separate patches for clarity.



[0]: 
https://www.postgresql.org/message-id/856e5ec3-879f-42ee-8258-8bcc6ec9b...@eisentraut.orgFrom 8ab537885f007d8bed58f839ca91108ef20422a6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 10 Jun 2024 16:08:08 +0200
Subject: [PATCH v1 1/4] Replace some strtok() with strsep()

strtok() considers adjacent delimiters to be one delimiter, which is
arguably the wrong behavior in some cases.  Replace with strsep(),
which has the right behavior: Adjacent delimiters create an empty
token.
---
 src/backend/libpq/auth-scram.c| 11 +--
 src/backend/utils/adt/pg_locale.c |  3 ++-
 src/common/logging.c  |  4 +++-
 src/test/regress/pg_regress.c |  5 ++---
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/backend/libpq/auth-scram.c b/src/backend/libpq/auth-scram.c
index 41619599148..03c3c27 100644
--- a/src/backend/libpq/auth-scram.c
+++ b/src/backend/libpq/auth-scram.c
@@ -608,16 +608,15 @@ parse_scram_secret(const char *secret, int *iterations,
 * SCRAM-SHA-256$:$:
 */
v = pstrdup(secret);
-   if ((scheme_str = strtok(v, "$")) == NULL)
+   if ((scheme_str = strsep(&v, "$")) == NULL)
goto invalid_secret;
-   if ((iterations_str = strtok(NULL, ":")) == NULL)
+   if ((iterations_str = strsep(&v, ":")) == NULL)
goto invalid_secret;
-   if ((salt_str = strtok(NULL, "$")) == NULL)
+   if ((salt_str = strsep(&v, "$")) == NULL)
goto invalid_secret;
-   if ((storedkey_str = strtok(NULL, ":")) == NULL)
-   goto invalid_secret;
-   if ((serverkey_str = strtok(NULL, "")) == NULL)
+   if ((storedkey_str = strsep(&v, ":")) == NULL)
goto invalid_secret;
+   serverkey_str = v;
 
/* Parse the fields */
if (strcmp(scheme_str, "SCRAM-SHA-256") != 0)
diff --git a/src/backend/utils/adt/pg_locale.c 
b/src/backend/utils/adt/pg_locale.c
index 7e5bb2b703a..21924d89877 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2809,6 +2809,7 @@ icu_set_collation_attributes(UCollator *collator, const 
char *loc,
char   *icu_locale_id;
char   *lower_str;
char   *str;
+   char   *token;
 
/*
 * The input locale may be a BCP 47 language tag, e.g.
@@ -2834,7 +2835,7 @@ icu_set_collation_attributes(UCollator *collator, const 
char *loc,
return;
str++;
 
-   for (char *token = strtok(str, ";"); token; token = strtok(NULL, ";"))
+   while ((token = strsep(&str, ";")))
{
char   *e = strchr(token, '=');
 
diff --git a/src/common/logging.c b/src/common/logging.c
index e9a02e3e46a..aedd1ae2d8c 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -119,7 +119,9 @@ pg_logging_init(const char *argv0)
 
if (colors)
{
-   for (char *token = strtok(colors, ":"); token; 
token = strtok(NULL, ":"))
+   char   *token;
+
+   while ((token = strsep(&colors, ":")))
{
char   *e = strchr(token, '=');
 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 06f6775fc65..1abfe6c4a5c 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -234,12 +234,11 @@ static void
 split_to_stringlist(const char *s, const char *delim, _stringlist **listhead)
 {
char   *sc = pg_strdup(s);
-   char   *token = strtok(sc, delim);
+   char   *token;
 
-   while (token)
+   while ((token = strsep(&sc, delim)))
{
add_stringlist_item(listhead, token);
-   token = strtok(NULL, delim);
}
free(sc);
 }

base-commit: ae482a7ec521e09bb0dbfc261da6e6170a5ac29f
-