Re: [PATCH] wildmatch: correct isprint and isspace

2012-11-15 Thread Nguyen Thai Ngoc Duy
On Fri, Nov 16, 2012 at 12:13 AM, "Jan H. Schönherr"
 wrote:
>>  #define isprint(x) (sane_istest(x, GIT_ALPHA | GIT_DIGIT | GIT_SPACE | \
>>   GIT_PUNCT | GIT_REGEX_SPECIAL | GIT_GLOB_SPECIAL | \
>> - GIT_PATHSPEC_MAGIC))
>> + GIT_PATHSPEC_MAGIC) && \
>> + (x) >= 32)
>
> May I suggest the current is_print() implementation in master:
>
> #define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)
>
>
> To summarize my opinion:
>
> I no longer see a reason to correct isspace() (unless somebody with an actual
> use case complains), and a more POSIXly isprint() is already in master.
>
> => Nothing to do. :)


Yeah. I remember to remind myself to check "the implementation in
master" you mentioned but I probably failed at that. Just checked that
isprint() is already in master, and your comment about isspace() use
in wildmatch.c makes sense too. So I'm all for doing nothing.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] wildmatch: correct isprint and isspace

2012-11-15 Thread Jan H. Schönherr
Am 15.11.2012 13:19, schrieb Nguyễn Thái Ngọc Duy:
>  On Thu, Nov 15, 2012 at 2:30 AM, René Scharfe  
> wrote:
>  > Nevertheless, it's unfortunate that we have an isspace() that *almost* does
>  > what the widely known thing of the same name does.  I'd shy away from
>  > changing git's version directly, because it's used more than a hundred 
> times
>  > in the code, and estimating the impact of adding \v and \f to it.
>  > Perhaps renaming it to isgitspace() is a good first step, followed by
>  > adding a "standard" version of isspace() for wildmatch?
> 
>  There are just too many call sites of isspace() and there is a risk
>  of new call sites coming in independently. So I think keeping isspace()
>  as-is and using a different name for the standard version is probably
>  a better choice.

After having a closer look, where wildmatch is actually used -- matching
filenames -- and I've not yet seen \v or \f in a filename, it's possibly
unnecessary to do anything about isspace() right now.

(It's probably more an issue that filenames can be localized, and we only
support unlocalized character classes.)

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 02f48f6..d4c3fda 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -486,6 +486,7 @@ extern const unsigned char sane_ctype[256];
>  #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
>  #define isascii(x) (((x) & ~0x7f) == 0)
>  #define isspace(x) sane_istest(x,GIT_SPACE)
> +#define isspace_posix(x) (((x) >= 9 && (x) <= 13) || (x) == 32)
>  #define isdigit(x) sane_istest(x,GIT_DIGIT)
>  #define isalpha(x) sane_istest(x,GIT_ALPHA)
>  #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
> @@ -499,7 +500,8 @@ extern const unsigned char sane_ctype[256];
>  #define isxdigit(x) (hexval_table[x] != -1)

This was from a previous patch, but maybe: "hexval_table[(unsigned char)x]"

>  #define isprint(x) (sane_istest(x, GIT_ALPHA | GIT_DIGIT | GIT_SPACE | \
>   GIT_PUNCT | GIT_REGEX_SPECIAL | GIT_GLOB_SPECIAL | \
> - GIT_PATHSPEC_MAGIC))
> + GIT_PATHSPEC_MAGIC) && \
> + (x) >= 32)

May I suggest the current is_print() implementation in master:

#define isprint(x) ((x) >= 0x20 && (x) <= 0x7e)


To summarize my opinion:

I no longer see a reason to correct isspace() (unless somebody with an actual
use case complains), and a more POSIXly isprint() is already in master.

=> Nothing to do. :)

Regards
Jan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] wildmatch: correct isprint and isspace

2012-11-15 Thread Nguyễn Thái Ngọc Duy
Current isprint() incorrectly includes control characters 9, 10 and
13, which is fixed by this patch.

Current isspace() lacks 11 and 12. But Git's isspace() has been
designed this way since the beginning and has over 100 call sites
relying on this. Instead of updating isspace() behavior (which could be
tricky as patches from other topics may come in parallel that assume
the old isspace()), a new isspace_posix() is introduced and used by
wildmatch.c. Other part of Git can be converted to use this new
function if it seems appropriate.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Sorry for the late response. I'll reply to everybody in one mail.

 On Wed, Nov 14, 2012 at 1:58 AM, "Jan H. Schönherr"  
wrote:
 > An alternative to switching from 1-byte to 4-byte values (don't we have
 > a 2-byte datatype?), would be to free up GIT_CNTRL and simply do:
 >
 > #define iscntrl(x) ((x) < 0x20)
 
 No. 127 is also a control character.

 On Wed, Nov 14, 2012 at 2:41 AM, Johannes Sixt  wrote:
 > So we have two properties that overlap:
 >
 >   SS
 >
 >
 > You seem to generate partions:
 >
 >XXXYZ
 >
 > then assign individual bits to each partition. Now each entry in the
 > lookup table has only one bit set. Then you define isxxx() to check for
 > one of the two possible bits:
 >
 >iscntrl is X or Y
 >isspace is Y or Z
 >
 > But shouldn't you just assign one bit for S and another one for C, have
 > entries in the lookup table with more than one bit set, and check for
 > only one bit in the isxxx macro?
 >
 > That way you don't run out of bits as easily as you do with this patch.

 I need three sets of characters actually: control, spaces and
 printable (which contains non-control spaces). Making it
 (isspace(x) && (x) >= 32) is simpler and because isprint() is only used in
 wildmatch, I don't need to think about performance penalty (yet).

 On Thu, Nov 15, 2012 at 2:30 AM, René Scharfe  
wrote:
 > Nevertheless, it's unfortunate that we have an isspace() that *almost* does
 > what the widely known thing of the same name does.  I'd shy away from
 > changing git's version directly, because it's used more than a hundred times
 > in the code, and estimating the impact of adding \v and \f to it.
 > Perhaps renaming it to isgitspace() is a good first step, followed by
 > adding a "standard" version of isspace() for wildmatch?

 There are just too many call sites of isspace() and there is a risk
 of new call sites coming in independently. So I think keeping isspace()
 as-is and using a different name for the standard version is probably
 a better choice.

 As the new isspace_posix() is only used by wildmatch, its performance
 as of now is not critical and a simple macro like in this patch is
 probably enough. We can optimize it later if we need to.

 git-compat-util.h | 4 +++-
 wildmatch.c   | 8 ++--
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 02f48f6..d4c3fda 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -486,6 +486,7 @@ extern const unsigned char sane_ctype[256];
 #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
 #define isascii(x) (((x) & ~0x7f) == 0)
 #define isspace(x) sane_istest(x,GIT_SPACE)
+#define isspace_posix(x) (((x) >= 9 && (x) <= 13) || (x) == 32)
 #define isdigit(x) sane_istest(x,GIT_DIGIT)
 #define isalpha(x) sane_istest(x,GIT_ALPHA)
 #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT)
@@ -499,7 +500,8 @@ extern const unsigned char sane_ctype[256];
 #define isxdigit(x) (hexval_table[x] != -1)
 #define isprint(x) (sane_istest(x, GIT_ALPHA | GIT_DIGIT | GIT_SPACE | \
GIT_PUNCT | GIT_REGEX_SPECIAL | GIT_GLOB_SPECIAL | \
-   GIT_PATHSPEC_MAGIC))
+   GIT_PATHSPEC_MAGIC) && \
+   (x) >= 32)
 #define tolower(x) sane_case((unsigned char)(x), 0x20)
 #define toupper(x) sane_case((unsigned char)(x), 0)
 #define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC)
diff --git a/wildmatch.c b/wildmatch.c
index 3972e26..fd74efd 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -37,11 +37,7 @@ typedef unsigned char uchar;
 # define ISBLANK(c) ((c) == ' ' || (c) == '\t')
 #endif
 
-#ifdef isgraph
-# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
-#else
-# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
-#endif
+#define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace_posix(c))
 
 #define ISPRINT(c) (ISASCII(c) && isprint(c))
 #define ISDIGIT(c) (ISASCII(c) && isdigit(c))
@@ -50,7 +46,7 @@ typedef unsigned char uchar;
 #define ISCNTRL(c) (ISASCII(c) && iscntrl(c))
 #define ISLOWER(c) (ISASCII(c) && islower(c))
 #define ISPUNCT(c) (ISASCII(c) && ispunct(c))
-#define ISSPACE(c) (ISASCII(c) && isspace(c))
+#define ISSPACE(c) (ISASCII(c) && isspace_posix(c))
 #define ISUPPER(c) (ISASCII(c) && isupper(c))
 #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
 
-- 
1.8.0.rc2.23.g1fb49df

--
To unsubscribe from this list: send th