Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-07-08 Thread Kris Kennaway

Gábor Kövesdán wrote:
>>> Well, it seems you have missed the first bits of the discussion. GNU
>>> grep has some regression tests, which it doesn't pass completely
>>> itself either. :) I've mentioned here that I used those tests to find
>>> out which options are incompatible. Unfortunately, I have to say that
>>> BSD grep won't pass all of those, because GNU allows some non-standard
>>> regexes, which are rejected by our libc regex library; for example,
>>> (a|) is not standard because it has an empty subexpression. First, I
>>> tried to pre-edit such expressions in the code. It was ugly enough,
>>> but I thought: OK, this code is pretty ugly, but compatibility is
>>> important; maybe we can later revise and/or change our regexp library
>>> and get rid of these snippets. Later, when Andrey pointed it out, I
>>> realized that my workarounds addressed those incompatibilities but
>>> didn't work completely; they broke compatibility in other places, so
>>> I just removed them, because it was not that easy to fix. The version
>>> that I sent you for the portbuild test doesn't have those workarounds.
>>> The regression tests did help to fix other compatibility issues, like
>>> return values. All of these trivial things are supposed to be
>>> compatible now; the only exceptions are the non-standard regexes.
>>> That's why I'm so curious about the results. If they are unacceptable,
>>> we can try to build BSD grep with the GNU regexp lib (it's in the
>>> tree, as Pedro F. Giffuni pointed out). It doesn't work by just
>>> linking with that library, so it will need more work and
>>> investigation, not to mention that GNU regex should go one day...
>>
>> OK, yes I did miss the start of the thread, but I was trying to
>> suggest that grep doesn't seem to be functional enough yet and this is
>> a way to work on identifying what needs to be fixed.
>
> Could you please send me some logs of ports which build with GNU grep
> but not with BSD grep? That would help me to identify the problems and
> find out whether they come from non-standard regexes or from something
> else.

No, because every port build fails because egrep -v is failing to work
properly in the management scripts :)  I sent you mail about this already.


Kris

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-07-08 Thread Gábor Kövesdán


>> Well, it seems you have missed the first bits of the discussion. GNU
>> grep has some regression tests, which it doesn't pass completely
>> itself either. :) I've mentioned here that I used those tests to find
>> out which options are incompatible. Unfortunately, I have to say that
>> BSD grep won't pass all of those, because GNU allows some non-standard
>> regexes, which are rejected by our libc regex library; for example,
>> (a|) is not standard because it has an empty subexpression. First, I
>> tried to pre-edit such expressions in the code. It was ugly enough,
>> but I thought: OK, this code is pretty ugly, but compatibility is
>> important; maybe we can later revise and/or change our regexp library
>> and get rid of these snippets. Later, when Andrey pointed it out, I
>> realized that my workarounds addressed those incompatibilities but
>> didn't work completely; they broke compatibility in other places, so
>> I just removed them, because it was not that easy to fix. The version
>> that I sent you for the portbuild test doesn't have those workarounds.
>> The regression tests did help to fix other compatibility issues, like
>> return values. All of these trivial things are supposed to be
>> compatible now; the only exceptions are the non-standard regexes.
>> That's why I'm so curious about the results. If they are unacceptable,
>> we can try to build BSD grep with the GNU regexp lib (it's in the
>> tree, as Pedro F. Giffuni pointed out). It doesn't work by just
>> linking with that library, so it will need more work and
>> investigation, not to mention that GNU regex should go one day...
>
> OK, yes I did miss the start of the thread, but I was trying to
> suggest that grep doesn't seem to be functional enough yet and this is
> a way to work on identifying what needs to be fixed.

Could you please send me some logs of ports which build with GNU grep
but not with BSD grep? That would help me to identify the problems and
find out whether they come from non-standard regexes or from something
else. I've looked at our regex library and it is written by Henry
Spencer. He has a slightly newer version, but he seems to be consistent
and the implementation choices are the same; those non-standard regexes
are still rejected by his library. I've also looked at PCRE, which was
mentioned on this list. In fact, PCRE has a POSIX-compliant interface,
but it's just the interface; the interpreted regexes are still
Perl-like.


--
Gabor Kovesdan

EMAIL: [EMAIL PROTECTED]
WWW:   http://www.kovesdan.org



Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-07-07 Thread Kris Kennaway

Maxim Sobolev wrote:
> Dag-Erling Smørgrav wrote:
>> Andrey Chernov [EMAIL PROTECTED] writes:
>>> BSD sort as an idea would be a good project indeed, but the BSD sort
>>> implementation we currently have at hand is totally misleading and
>>> should be rewritten from scratch; I realized this a long time ago
>>> when I tried to localize it for single-byte locales.
>>
>> I think part of the problem is that there aren't enough people who truly
>> understand localization.  I think I understand most of it, but I'm
>> pretty sure I *don't* understand how collation works, or is supposed to
>> work.  Amongst other things, I don't understand how (or whether) it
>> handles cases like aa and å, which are considered the same letter in
>> Norwegian.
>>
>> Perhaps you could create a Localization page on wiki.freebsd.org which
>> addresses these issues, or at least points to relevant resources?
>
> A good regression test suite, which would include cases in different
> single- and multi-byte locales for grep/sort/etc., could also be a big
> help.


What regression suites do other implementations have?  e.g. the GNU 
textutils.


Kris


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-07-07 Thread Andrey Chernov
On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
> What regression suites do other implementations have?  e.g. the GNU
> textutils.

They basically have regex tests, but nothing locale-specific, since locale
ordering differs from platform to platform (until the Unicode Collation
Algorithm wins).

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-07-07 Thread Kris Kennaway

Andrey Chernov wrote:
> On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
>> What regression suites do other implementations have?  e.g. the GNU
>> textutils.
>
> They basically have regex tests, but nothing locale-specific, since locale
> ordering differs from platform to platform (until the Unicode Collation
> Algorithm wins).

OK.  Well at least it is a start - passing those existing regression 
tests should be a goal.


Kris


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-07-07 Thread Gabor Kovesdan

Kris Kennaway wrote:
> Andrey Chernov wrote:
>> On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
>>> What regression suites do other implementations have?  e.g. the GNU
>>> textutils.
>>
>> They basically have regex tests, but nothing locale-specific, since
>> locale ordering differs from platform to platform (until the Unicode
>> Collation Algorithm wins).
>
> OK.  Well at least it is a start - passing those existing regression
> tests should be a goal.

Well, it seems you have missed the first bits of the discussion. GNU
grep has some regression tests, which it doesn't pass completely itself
either. :) I've mentioned here that I used those tests to find out which
options are incompatible. Unfortunately, I have to say that BSD grep
won't pass all of those, because GNU allows some non-standard regexes,
which are rejected by our libc regex library; for example, (a|) is not
standard because it has an empty subexpression. First, I tried to
pre-edit such expressions in the code. It was ugly enough, but I
thought: OK, this code is pretty ugly, but compatibility is important;
maybe we can later revise and/or change our regexp library and get rid
of these snippets. Later, when Andrey pointed it out, I realized that my
workarounds addressed those incompatibilities but didn't work
completely; they broke compatibility in other places, so I just removed
them, because it was not that easy to fix. The version that I sent you
for the portbuild test doesn't have those workarounds. The regression
tests did help to fix other compatibility issues, like return values.
All of these trivial things are supposed to be compatible now; the only
exceptions are the non-standard regexes. That's why I'm so curious about
the results. If they are unacceptable, we can try to build BSD grep with
the GNU regexp lib (it's in the tree, as Pedro F. Giffuni pointed out).
It doesn't work by just linking with that library, so it will need more
work and investigation, not to mention that GNU regex should go one
day...
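
A quick way to see how a given regex library treats a pattern such as
(a|) is to hand it to regcomp() and print what regerror() reports. The
sketch below assumes only the POSIX <regex.h> interface; try_ere is a
made-up helper name, and whether (a|) is accepted depends on the library
the program is linked against, so no particular outcome is claimed here.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Try to compile an extended regular expression. On failure, copy the
 * library's error message into errbuf. Returns the regcomp() result
 * (0 on success). try_ere is a hypothetical helper, not part of grep.
 */
int
try_ere(const char *pattern, char *errbuf, size_t errlen)
{
	regex_t re;
	int rc;

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (rc != 0) {
		regerror(rc, &re, errbuf, errlen);
		return (rc);
	}
	regfree(&re);
	if (errlen > 0)
		errbuf[0] = '\0';	/* no error message on success */
	return (0);
}
```

Calling try_ere("(a|)", buf, sizeof(buf)) once against the libc regex and
once against GNU regex should make the difference described above visible.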


Regards,
Gábor


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-24 Thread Gabor Kovesdan




> 1) You can't just convert the whole buffer after fread(), since it can
> end in the middle of a multibyte sequence at the BUFSIZ edge. Look at
> how the GNU utils do it.

OK, I hadn't thought of this aspect. What about this?

#define iswbinary(ch)   (!iswspace((ch)) && iswcntrl((ch)))

int
bin_file(FILE *f)
{
        wint_t   ch = L'\0';
        size_t   i;
        int      ret = 0;

        if (fseek(f, 0L, SEEK_SET) == -1)
                return (0);

        for (i = 0; (i <= BUFSIZ) && (ch != WEOF); i++) {
                ch = fgetwc(f);
                if (iswbinary(ch)) {
                        ret = 1;
                        break;
                }
        }

        rewind(f);
        return (ret);
}

int
mmbin_file(struct mmfile *f)
{
        int      i;
        wchar_t *wbuf;
        size_t   s;

        if ((s = mbstowcs(NULL, f->base, 0)) == -1)
                return (0);

        wbuf = grep_malloc((s + 1) * sizeof(wchar_t));

        if (mbstowcs(wbuf, f->base, s) == -1)
                return (0);

        /* XXX knows too much about mmf internals */
        for (i = 0; i < BUFSIZ && i < f->len; i++)
                if (iswbinary(wbuf[i])) {
                        free(wbuf);
                        return (1);
                }
        free(wbuf);
        return (0);
}

This should be ok, right?


> 2) Better use iswspace and iswcntrl instead of iswctype.

OK, changed, thanks. I had been looking for such functions, but man
wctype doesn't mention them.

> 3) util.c needs to be fixed in several places too.

Yes, I know, I'm just advancing step by step. The next item will be to
fix the word boundary handling.


Regards,
Gabor


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-24 Thread Andrey Chernov
On Tue, Jun 24, 2008 at 10:32:17PM +0200, Gabor Kovesdan wrote:

> ch = fgetwc(f);

You must clear errno before the call and somehow handle a possible EILSEQ
after fgetwc(). Perhaps by returning ret = 1 (binary), I am not sure.
fgetwc() returns WEOF in that case, which is not a true end of file.

> if ((s = mbstowcs(NULL, f->base, 0)) == -1)
>         return (0);

The same here. Check EILSEQ and return 1.
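
The errno handling suggested above could look roughly like this. It is
only a sketch under stated assumptions: the enum and the read_wide name
are invented for illustration, and what the caller does with an invalid
sequence (for example, treating the file as binary) is left open.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical result codes, not taken from the grep sources. */
enum wread { WREAD_CHAR, WREAD_EOF, WREAD_ILSEQ, WREAD_ERROR };

/*
 * Read one wide character and, when fgetwc() returns WEOF, report why:
 * an invalid multibyte sequence, a real end of file, or an I/O error.
 */
enum wread
read_wide(FILE *f, wint_t *out)
{
	errno = 0;		/* so a stale EILSEQ cannot confuse us */
	*out = fgetwc(f);
	if (*out != WEOF)
		return (WREAD_CHAR);
	if (errno == EILSEQ)
		return (WREAD_ILSEQ);
	if (feof(f))
		return (WREAD_EOF);
	return (WREAD_ERROR);
}
```

A loop such as the one in bin_file() could then stop on WREAD_EOF and
decide separately what WREAD_ILSEQ should mean.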

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-24 Thread Andrey Chernov
On Wed, Jun 25, 2008 at 01:04:20AM +0400, Andrey Chernov wrote:
>> if ((s = mbstowcs(NULL, f->base, 0)) == -1)
>>         return (0);
>
> The same here. Check EILSEQ and return 1.

BTW, do you realize that this code malloc()s the _whole_file_ into memory
(which will not fit for very big files)?
The old non-localized code uses mmap, so it doesn't actually malloc() it.
Due to that, perhaps the whole of mmfile.c should not be used and should be
removed.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-22 Thread Gabor Kovesdan

Andrey Chernov wrote:
> On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
>> For grep, I believe it should simply be a matter of calling setlocale(),
>> using wide strings, and using a multibyte regex engine (for appropriate
>> values of simply).
>
> See my previous reply for more details. Using wide strings is not so
> easy: e.g. all the ctype calls BSD grep uses now would have to be
> converted to wctype, input conversion added, etc.

I've started to work on this big change; the first step:
http://kovesdan.org/patches/grep-i18n.diff

It doesn't work though: every file is recognized as binary with this
change. Do you have any idea why this happens? What am I doing wrong?


Regards,
Gabor


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-22 Thread Andrey Chernov
On Sun, Jun 22, 2008 at 02:58:17PM +0200, Gabor Kovesdan wrote:
> Andrey Chernov wrote:
>> On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
>>> For grep, I believe it should simply be a matter of calling setlocale(),
>>> using wide strings, and using a multibyte regex engine (for appropriate
>>> values of simply).
>>
>> See my previous reply for more details. Using wide strings is not so
>> easy: e.g. all the ctype calls BSD grep uses now would have to be
>> converted to wctype, input conversion added, etc.
>
> I've started to work on this big change; the first step:
> http://kovesdan.org/patches/grep-i18n.diff

1) You can't just convert the whole buffer after fread(), since it can
end in the middle of a multibyte sequence at the BUFSIZ edge. Look at
how the GNU utils do it.

2) Better use iswspace and iswcntrl instead of iswctype.

3) util.c needs to be fixed in several places too.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-19 Thread Konrad Jankowski

Maxim Sobolev wrote:
> A good regression test suite, which would include cases in different
> single- and multi-byte locales for grep/sort/etc., could also be a big
> help.

I will implement test cases for sort in UTF-8 as part of my project.


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-19 Thread Dag-Erling Smørgrav
Konrad Jankowski [EMAIL PROTECTED] writes:
> BOMs should be handled at the program level.

Yeah, that makes sense; libc has no way of knowing whether the start of
the string you're processing is actually the start of the file.
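
At the program level, that can be as small as checking the first buffer
read from a file. A sketch for the UTF-8 case only; utf8_bom_len is a
hypothetical name, and handling of UTF-16/UTF-32 marks is left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return how many leading bytes of buf form a UTF-8 byte order mark
 * (EF BB BF): 3 if present, 0 otherwise. Only meaningful for the very
 * first buffer read from a file, which is why the program, not libc,
 * has to make this call.
 */
size_t
utf8_bom_len(const char *buf, size_t len)
{
	static const char bom[] = "\xEF\xBB\xBF";

	if (len >= 3 && memcmp(buf, bom, 3) == 0)
		return (3);
	return (0);
}
```

The caller would skip that many bytes before handing the data to the
multibyte conversion functions.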

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Dag-Erling Smørgrav
Andrey Chernov [EMAIL PROTECTED] writes:
> BSD sort as an idea would be a good project indeed, but the BSD sort
> implementation we currently have at hand is totally misleading and should
> be rewritten from scratch; I realized this a long time ago when I tried
> to localize it for single-byte locales.

I think part of the problem is that there aren't enough people who truly
understand localization.  I think I understand most of it, but I'm
pretty sure I *don't* understand how collation works, or is supposed to
work.  Amongst other things, I don't understand how (or whether) it
handles cases like aa and å, which are considered the same letter in
Norwegian.

Perhaps you could create a Localization page on wiki.freebsd.org which
addresses these issues, or at least points to relevant resources?

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Andrey Chernov
On Tue, Jun 17, 2008 at 12:58:12PM +0200, Gabor Kovesdan wrote:
>> Yes, and once this is done, sort will work out of the box, if it uses
>> strcoll. Already tried on a prototype.
>>
>> Only GNU sort for multibyte chars. BSD sort is programmed too badly and
>> can't be fixed even for single byte sorting.
>
> BSD sort was going to be the next item of my SoC project. As it is so
> badly constructed would it be reasonable to give more priority to BSD
> diff and continue with that one?

BSD sort as an idea would be a good project indeed, but the BSD sort
implementation we currently have at hand is totally misleading and should
be rewritten from scratch; I realized this a long time ago when I tried
to localize it for single-byte locales.

The next nice idea in that area would be updating our regexp engine to the
most recent public code, both for speed and minor compatibility reasons, as
des@ mentions.

I don't have an opinion for BSD diff.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Andrey Chernov
On Wed, Jun 18, 2008 at 10:22:31AM +0200, Dag-Erling Smørgrav wrote:
> I think part of the problem is that there aren't enough people who truly
> understand localization.  I think I understand most of it, but I'm
> pretty sure I *don't* understand how collation works, or is supposed to
> work.  Amongst other things, I don't understand how (or whether) it
> handles cases like aa and å, which are considered the same letter in
> Norwegian.

Single-byte locale collation works through strcoll() via chains, i.e. it
seeks all chains starting with the given letter. Multibyte locale
collation is currently not implemented and can't be properly implemented
under the existing single-byte framework (it would consume resources badly
in that case). I know of semi-hack attempts to implement multibyte
collation via the single-byte one, but they all cover only a small ASCII +
national alphabet subset; the rest of Unicode is left unsorted.

> Perhaps you could create a Localization page on wiki.freebsd.org which
> addresses these issues, or at least points to relevant resources?

IMHO single-byte collation will be obsolete soon, once Unicode collation
is implemented as a SoC project. We need something like the ICU library,
which performs as described below, i.e. unified sorting for all possible
chars:
http://unicode.org/reports/tr10/

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Dag-Erling Smørgrav
Andrey Chernov [EMAIL PROTECTED] writes:
> Single-byte locale collation works through strcoll() via chains, i.e. it
> seeks all chains starting with the given letter. Multibyte locale
> collation is currently not implemented and can't be properly implemented
> under the existing single-byte framework (it would consume resources badly
> in that case). I know of semi-hack attempts to implement multibyte
> collation via the single-byte one, but they all cover only a small ASCII +
> national alphabet subset; the rest of Unicode is left unsorted.

Does that mean our wcsxfrm() doesn't work?  IIUC, it should convert
wide strings to strings that can be compared directly with strcmp()?

In any case, this is a libc issue, right?  As long as sort / grep uses
the API correctly, they will work fine once libc is fixed?

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Dag-Erling Smørgrav
Konrad Jankowski [EMAIL PROTECTED] writes:
> Dag-Erling Smørgrav [EMAIL PROTECTED] writes:
>> In any case, this is a libc issue, right?  As long as sort / grep
>> uses the API correctly, they will work fine once libc is fixed?
> Correct.  Given sort uses strcoll()/wcscoll()/strxfrm()/wcsxfrm() and
> calls setlocale().  I don't know about grep.

For grep, I believe it should simply be a matter of calling setlocale(),
using wide strings, and using a multibyte regex engine (for appropriate
values of simply).

Another thing I'm unsure about is the matter of input and output.  Do
mbstowcs() / mbtowc() simply trust the input to conform to LC_CTYPE and
convert accordingly?  When reading UTF, do they recognize and handle
BOMs, or simply treat them as zero-width non-breaking space?  In the
absence of a BOM, do they assume that the input follows the system's
native byte order?

(IMHO, the API is broken, since there is no way for the same program to
simultaneously handle streams with different encodings, but I guess it's
too late to fix that)

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Andrey Chernov
On Wed, Jun 18, 2008 at 11:39:10AM +0200, Dag-Erling Smørgrav wrote:
> Does that mean our wcsxfrm() doesn't work?  IIUC, it should convert
> wide strings to strings that can be compared directly with strcmp()?

(directly with wcscmp())
For single-byte locales wcsxfrm() and wcscoll() work, but for multibyte
ones they just do a raw binary comparison.
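
The relationship in question, spelled out as code: transforming two wide
strings with wcsxfrm() and comparing the results with wcscmp() is meant to
give the same ordering as wcscoll() on the originals. This is only an
illustration of the API contract; xfrm_compare is a hypothetical helper,
and how good the resulting order is depends entirely on the libc collation
data discussed here.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Compare two wide strings by transforming both with wcsxfrm() and
 * comparing the transformed strings with wcscmp(). Returns <0, 0 or >0
 * like wcscoll(). On allocation failure, returns 0 and sets *err to 1.
 */
int
xfrm_compare(const wchar_t *a, const wchar_t *b, int *err)
{
	/* With n == 0 the destination may be NULL; the length is returned. */
	size_t la = wcsxfrm(NULL, a, 0) + 1;
	size_t lb = wcsxfrm(NULL, b, 0) + 1;
	wchar_t *ta = malloc(la * sizeof(*ta));
	wchar_t *tb = malloc(lb * sizeof(*tb));
	int r = 0;

	*err = (ta == NULL || tb == NULL);
	if (!*err) {
		wcsxfrm(ta, a, la);
		wcsxfrm(tb, b, lb);
		r = wcscmp(ta, tb);
	}
	free(ta);
	free(tb);
	return (r);
}
```

A sort implementation that compares the same keys many times would
transform each key once and keep the result, which is the usual reason to
prefer wcsxfrm() over repeated wcscoll() calls.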

> In any case, this is a libc issue, right?  As long as sort / grep uses
> the API correctly, they will work fine once libc is fixed?

GNU grep and sort will work just fine.

BSD grep does not call setlocale(), but even if that is added, BSD grep
has other places where multibyte is not handled properly. I have already
noticed two of them: ignore-case comparison and word boundary sensing.
Perhaps other places exist; I have not studied the code enough to catch
them all.

BSD sort uses the upper half of the 256-char table for its own purposes,
which badly damages both single-byte and multibyte locales, and of course
it does not use wcscoll() at all, etc.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Andrey Chernov
On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
> For grep, I believe it should simply be a matter of calling setlocale(),
> using wide strings, and using a multibyte regex engine (for appropriate
> values of simply).

See my previous reply for more details. Using wide strings is not so easy:
e.g. all the ctype calls BSD grep uses now would have to be converted to
wctype, input conversion added, etc.

> Another thing I'm unsure about is the matter of input and output.  Do
> mbstowcs() / mbtowc() simply trust the input to conform to LC_CTYPE and
> convert accordingly?  When reading UTF, do they recognize and handle

They return EILSEQ on wrong sequence.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Andrey Chernov
On Wed, Jun 18, 2008 at 11:14:16AM +0200, Konrad Jankowski wrote:
> I think the best place for this type of information is currently my SoC
> wiki.
> http://wiki.freebsd.org/KonradJankowski/Collation
> I know currently it has very little information, however.
> I can also create another page dedicated to explaining the workings of
> collation in UCA, given enough interest.

Please look at the ICU library. Among other things, they implement Unicode
collation:
http://icu-project.org/userguide/Collate_Intro.html

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Sean C. Farley

On Mon, 16 Jun 2008, Dag-Erling Smørgrav wrote:
> Doug Barton [EMAIL PROTECTED] writes:
>> Andrey Chernov [EMAIL PROTECTED] writes:
>>> Please note that BSD grep is not localized (and can't be, by design)
>>> and works only with the standard C locale. That may not affect ports
>>> system processing, but it surely affects handling of real texts.
>>
>> That is very troubling. In this day and age localization is a
>> requirement. I cannot imagine being supportive of adding something to
>> the base that does not have this capability.
>
> We don't have a locale-aware regex implementation.  Henry Spencer wrote
> one for Tcl 8, and it seems to be under an MIT-equivalent license, but
> I'm not sure how hard it would be to extirpate.  It might be easier to
> lift it from PostgreSQL, which also uses it.


Other BSD-license-friendly regex libraries:
1. PCRE (http://www.pcre.org/) (has a POSIX compliant interface too)
2. Oniguruma (http://www.geocities.jp/kosako3/oniguruma/) (from Ruby)
3. Lrexlib (http://lrexlib.luaforge.net/) (no apparent POSIX interface)

Sean
--
[EMAIL PROTECTED]

Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-18 Thread Maxim Sobolev

Dag-Erling Smørgrav wrote:
> Andrey Chernov [EMAIL PROTECTED] writes:
>> BSD sort as an idea would be a good project indeed, but the BSD sort
>> implementation we currently have at hand is totally misleading and
>> should be rewritten from scratch; I realized this a long time ago when
>> I tried to localize it for single-byte locales.
>
> I think part of the problem is that there aren't enough people who truly
> understand localization.  I think I understand most of it, but I'm
> pretty sure I *don't* understand how collation works, or is supposed to
> work.  Amongst other things, I don't understand how (or whether) it
> handles cases like aa and å, which are considered the same letter in
> Norwegian.
>
> Perhaps you could create a Localization page on wiki.freebsd.org which
> addresses these issues, or at least points to relevant resources?

A good regression test suite, which would include cases in different
single- and multi-byte locales for grep/sort/etc., could also be a big
help.


Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Gabor Kovesdan

Andrey Chernov wrote:
> On Tue, Jun 17, 2008 at 04:28:10AM +0400, Andrey Chernov wrote:
>> BSD grep doesn't even bother to call setlocale(). I can't say whether
>> it can be healed simply by adding that call; some test suite run is
>> needed.
>
> Quick source inspection reveals that BSD grep operates on single bytes
> only (util.c), so a big rewrite with mbrtowc() is needed. Adding
> setlocale() alone will make it usable only with single-byte locales, in
> the successful case.

Sorry for the possibly silly question, but what do we mean by localization
here in the case of grep? As far as I can see, it works with wide chars,
because the regex library is aware of those. What other aspects need to
be taken into account? In the case of sort, I understand that it should
explicitly handle wide characters due to the different alphabets of the
different languages, and yes, that seems to be a difficult task...


Gábor


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Dag-Erling Smørgrav
Andrey Chernov [EMAIL PROTECTED] writes:
> Dag-Erling Smørgrav [EMAIL PROTECTED] writes:
>> We don't have a locale-aware regex implementation.  Henry Spencer
>> wrote one for Tcl 8, and it seems to be under an MIT-equivalent
>> license, but I'm not sure how hard it would be to extirpate.  It
>> might be easier to lift it from PostgreSQL, which also uses it.
> No, we have it already for many years (libc/regexec).

I hadn't noticed...  ISTR it was an issue back when jphoward wrote his
BSD-licensed grep.

However, it's not the same engine - it's Spencer's old engine with
multibyte support added.  IIRC, it performs very poorly compared to the
GNU regexp engine; it would be interesting to see how well the Tcl
engine performs in comparison.

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Andrey Chernov
On Tue, Jun 17, 2008 at 09:21:52AM +0200, Gabor Kovesdan wrote:
> Sorry for the possibly silly question, but what do we mean by
> localization here in the case of grep? As far as I can see, it works
> with wide chars, because the regex library is aware of those. What other
> aspects need to be taken into account?

See how word boundaries are handled in util.c, for example. They treat the
buffer as single chars only. wctype should be used instead of ctype in all
places in the code, with the corresponding mbrtowc conversion.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Andrey Chernov
On Tue, Jun 17, 2008 at 11:46:07AM +0400, Andrey Chernov wrote:
 On Tue, Jun 17, 2008 at 09:21:52AM +0200, Gabor Kovesdan wrote:
  Sorry for the possibly silly question, but what do we mean by localization
  here in the case of grep? As far as I can see, it works with wide chars,
  because the regex library is aware of those. What other aspect needs to
  be taken into account?
 
 See how word boundaries are handled in util.c, for example. The code treats
 the buffer as single-byte characters only. wctype should be used instead of
 ctype everywhere in the code, with the corresponding mbrtowc conversion.

Moreover, case-insensitive matching there is single-byte only too and needs
the same treatment.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Diomidis Spinellis

Gabor Kovesdan wrote:
In case of sort, I understand that it should 
explicitly handle wide characters due to the different alphabet of the 
different languages and yes, that seems to be a difficult task...


Note that Konrad Jankowski in another SoC project is adding to our C 
library support for the Unicode collation algorithm, and importing the 
corresponding language-specific collation tables.




Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Andrey Chernov
On Tue, Jun 17, 2008 at 12:08:38PM +0200, Dag-Erling Smørgrav wrote:
 I hadn't noticed...  ISTR it was an issue back when jphoward wrote his
 BSD-licensed grep.

BSD grep has enough problems (though not fatal ones, unlike BSD sort's) even
with the single-byte locales that our regex supported initially (the old
pre-multibyte versions), so worrying about its lack of multibyte support is
looking too far ahead.

 However, it's not the same engine - it's Spencer's old engine with
 multibyte support added.  IIRC, it performs very poorly compared to the
 GNU regexp engine; it would be interesting to see how well the Tcl
 engine performs in comparison.

Yes. Upgrading to the most recent engine would be nice.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Konrad Jankowski

Diomidis Spinellis wrote:

Gabor Kovesdan wrote:
In case of sort, I understand that it should explicitly handle wide 
characters due to the different alphabet of the different languages 
and yes, that seems to be a difficult task...


Note that Konrad Jankowski in another SoC project is adding to our C 
library support for the Unicode collation algorithm, and importing the 
corresponding language-specific collation tables.



Yes, and once this is done, sort will work out of the box, if it uses 
strcoll. Already tried on a prototype.


--
Konrad Jankowski
System Network Administrator

Blue Media Sp. z o. o.
+48 58  312
http://www.bluemedia.pl






Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Andrey Chernov
On Tue, Jun 17, 2008 at 10:54:42AM +0200, Konrad Jankowski wrote:
 Diomidis Spinellis wrote:
  Gabor Kovesdan wrote:
  In case of sort, I understand that it should explicitly handle wide 
  characters due to the different alphabet of the different languages 
  and yes, that seems to be a difficult task...
 
  Note that Konrad Jankowski in another SoC project is adding to our C 
  library support for the Unicode collation algorithm, and importing the 
  corresponding language-specific collation tables.
 
 
 Yes, and once this is done, sort will work out of the box, if it uses 
 strcoll. Already tried on a prototype.

Only GNU sort will, for multibyte characters. BSD sort is too badly written
and can't be fixed even for single-byte sorting.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Gabor Kovesdan

Andrey Chernov wrote:

On Tue, Jun 17, 2008 at 10:54:42AM +0200, Konrad Jankowski wrote:
  

Diomidis Spinellis wrote:


Gabor Kovesdan wrote:
  
In case of sort, I understand that it should explicitly handle wide 
characters due to the different alphabet of the different languages 
and yes, that seems to be a difficult task...

Note that Konrad Jankowski in another SoC project is adding to our C 
library support for the Unicode collation algorithm, and importing the 
corresponding language-specific collation tables.



  
Yes, and once this is done, sort will work out of the box, if it uses 
strcoll. Already tried on a prototype.



Only GNU sort will, for multibyte characters. BSD sort is too badly written
and can't be fixed even for single-byte sorting.

BSD sort was going to be the next item of my SoC project. As it is so 
badly constructed, would it be reasonable to give more priority to BSD 
diff and continue with that one?


Gábor



Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Gabor Kovesdan

Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg, 
origin is set to the origin of a given port, and ro_opd is usually 
empty, but can be another origin directory or the same one. To 
guarantee that you should get some kind of results you can test with 
origin=devel/gettext.


egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is 
unset with the bsd grep I get:


egrep: empty (sub)expression

I've looked at this and I have a patch with a workaround: 
http://kovesdan.org/patches/grep.dougb.diff
Could you please try it if you have some time? I suppose that it will 
fix your case as it has fixed the 77th Spencer test of the GNU 
regression test suite, which comes with GNU grep.
I'm afraid there isn't a better solution as this regression is coming 
from the different regex interpretations between the GNU regex library 
and our libc regex library. regex(3) says that the RE standard has some 
ambiguities and the particular implementation should make a decision how 
to handle these cases.


Regards,
Gábor

P.S.: Thanks for the WITHOUT_GNU_GREP knob, I hope we will make use of 
it soon, I'm trying to eliminate the remaining regressions.



Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Gabor Kovesdan

Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg, 
origin is set to the origin of a given port, and ro_opd is usually 
empty, but can be another origin directory or the same one. To 
guarantee that you should get some kind of results you can test with 
origin=devel/gettext.


egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is 
unset with the bsd grep I get:


egrep: empty (sub)expression

I've looked at this and I have a patch with a workaround: 
http://kovesdan.org/patches/grep.dougb.diff
Could you please try it if you have some time? I suppose that it will 
fix your case as it has fixed the 77th Spencer test of the GNU 
regression test suite, which comes with GNU grep.
I'm afraid there isn't a better solution as this regression is coming 
from the different regex interpretations between the GNU regex library 
and our libc regex library. regex(3) says that the RE standard has some 
ambiguities and the particular implementation should make a decision how 
to handle these cases.


Regards,
Gábor

P.S.: Thanks for the WITHOUT_GNU_GREP knob, I hope we will make use of 
it soon, I've already eliminated some more regressions and I'm fighting 
with the remaining ones.



Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-17 Thread Jaakko Heinonen
On 2008-06-17, Gabor Kovesdan wrote:
  egrep: empty (sub)expression
 
 I've looked at this and I have a patch with a workaround: 
 http://kovesdan.org/patches/grep.dougb.diff

Unfortunately this breaks things. For example:

$ grep -E '(test||test)' /dev/null
grep: parentheses not balanced
$ grep -E '(test|\|)' /dev/null
grep: parentheses not balanced
$ grep -E '\(|test)' /dev/null
(should give an error but it hangs)

-- 
Jaakko


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-16 Thread Joerg Sonnenberger
On Sun, Jun 15, 2008 at 09:11:36PM -0700, Garrett Cooper wrote:
 Now all we need to do is write / import a BSD compatible less(1) into
 FreeBSD =).

less is dual licensed.

Joerg


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-16 Thread Dag-Erling Smørgrav
Doug Barton [EMAIL PROTECTED] writes:
 Andrey Chernov [EMAIL PROTECTED] writes:
  Please note that BSD grep is not localized (and can't be, given its
  design) and works only with the standard C locale. It may not affect
  ports system processing, but it surely affects the handling of real text.
 That is very troubling. In this day and age localization is a
 requirement. I cannot imagine being supportive of adding something to
 the base that does not have this capability.

We don't have a locale-aware regex implementation.  Henry Spencer wrote
one for Tcl 8, and it seems to be under an MIT-equivalent license, but
I'm not sure how hard it would be to extirpate.  It might be easier to
lift it from PostgreSQL, which also uses it.

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-16 Thread Doug Barton

Dag-Erling Smørgrav wrote:

Doug Barton [EMAIL PROTECTED] writes:

Andrey Chernov [EMAIL PROTECTED] writes:

Please note that BSD grep is not localized (and can't be, given its
design) and works only with the standard C locale. It may not affect
ports system processing, but it surely affects the handling of real text.

That is very troubling. In this day and age localization is a
requirement. I cannot imagine being supportive of adding something to
the base that does not have this capability.


We don't have a locale-aware regex implementation.  Henry Spencer wrote
one for Tcl 8, and it seems to be under an MIT-equivalent license, but
I'm not sure how hard it would be to extirpate.  It might be easier to
lift it from PostgreSQL, which also uses it.


Ok, that's a slightly different situation, thanks for clarifying that. 
Sounds like that would be a good project for GSOC next year. :)


Meanwhile, for those who didn't notice last night (*cough*) I added 
the WITHOUT_GNU_GREP knob for src.conf to make it easier for folks to 
test this in HEAD.


hth,

Doug

--

This .signature sanitized for your protection



Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-16 Thread Andrey Chernov
On Mon, Jun 16, 2008 at 02:36:23PM +0200, Dag-Erling Smørgrav wrote:
   Please note that BSD grep is not localized (and can't be, given its
   design) and works only with the standard C locale. It may not affect
   ports system processing, but it surely affects the handling of real text.
  That is very troubling. In this day and age localization is a
  requirement. I cannot imagine being supportive of adding something to
  the base that does not have this capability.
 
 We don't have a locale-aware regex implementation.  Henry Spencer wrote
 one for Tcl 8, and it seems to be under an MIT-equivalent license, but
 I'm not sure how hard it would be to extirpate.  It might be easier to
 lift it from PostgreSQL, which also uses it.

No, we have had one for many years already (libc/regexec).
BSD grep's problem is a different one: it uses the upper half of the
256-character table on its own.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-16 Thread Andrey Chernov
On Tue, Jun 17, 2008 at 04:22:25AM +0400, Andrey Chernov wrote:
 On Mon, Jun 16, 2008 at 02:36:23PM +0200, Dag-Erling Smørgrav wrote:
Please note that BSD grep is not localized (and can't be, given its
design) and works only with the standard C locale. It may not affect
ports system processing, but it surely affects the handling of real text.
   That is very troubling. In this day and age localization is a
   requirement. I cannot imagine being supportive of adding something to
   the base that does not have this capability.
  
  We don't have a locale-aware regex implementation.  Henry Spencer wrote
  one for Tcl 8, and it seems to be under an MIT-equivalent license, but
  I'm not sure how hard it would be to extirpate.  It might be easier to
  lift it from PostgreSQL, which also uses it.
 
 No, we have had one for many years already (libc/regexec).


 BSD grep's problem is a different one: it uses the upper half of the
 256-character table on its own.

Oops, sorry, I was thinking of BSD _sort_ when I wrote that last statement.
BSD grep doesn't even bother to call setlocale(). I can't say whether it can
be fixed simply by adding that call; a test suite run is needed.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-16 Thread Andrey Chernov
On Tue, Jun 17, 2008 at 04:28:10AM +0400, Andrey Chernov wrote:
 BSD grep doesn't even bother to call setlocale(). I can't say whether it can
 be fixed simply by adding that call; a test suite run is needed.

A quick source inspection reveals that BSD grep operates on single bytes
only (util.c), so a big rewrite using mbrtowc() is needed. Adding
setlocale() alone would, at best, make it usable only with single-byte
locales.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-15 Thread Doug Barton
I use the following construct in portmaster, where pdb=/var/db/pkg, 
origin is set to the origin of a given port, and ro_opd is usually 
empty, but can be another origin directory or the same one. To 
guarantee that you should get some kind of results you can test with 
origin=devel/gettext.


egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is 
unset with the bsd grep I get:


egrep: empty (sub)expression

If I set ro_opd to something, it works.
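
Whichever grep is installed, a caller can sidestep the empty alternative by adding the second branch only when ro_opd is non-empty. The following is a minimal sketch under that assumption; the helper name dep_pattern is hypothetical and is not taken from portmaster, and the origins are inserted into the pattern verbatim, as in the construct above.

```shell
#!/bin/sh
# dep_pattern ORIGIN [RO_OPD]
# Print a DEPORIGIN pattern that never contains an empty alternative.
# Hypothetical helper for illustration; not part of portmaster.
dep_pattern() {
    if [ -n "${2-}" ]; then
        # Both origins are known: keep the two-branch alternation.
        printf 'DEPORIGIN:(%s|%s)$\n' "$1" "$2"
    else
        # Second origin empty or unset: emit a single branch only.
        printf 'DEPORIGIN:(%s)$\n' "$1"
    fi
}

# Intended use, with pdb, origin and ro_opd set as described above:
#   egrep -l "$(dep_pattern "$origin" "$ro_opd")" "$pdb"/*/+CONTENTS
```

With ro_opd unset, the pattern handed to egrep has no empty subexpression, so it does not depend on how the regex library treats that case.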


hth,

Doug

--

This .signature sanitized for your protection



Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-15 Thread Diomidis Spinellis

Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg, 
origin is set to the origin of a given port, and ro_opd is usually 
empty, but can be another origin directory or the same one. To guarantee 
that you should get some kind of results you can test with 
origin=devel/gettext.


egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is 
unset with the bsd grep I get:


egrep: empty (sub)expression


To avoid these problems I had proposed instrumenting getopt to write the 
options passed through argv to a file, building all our ports, and looking 
at the options used.
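
A rough approximation of this survey can also be made without touching getopt, by routing invocations through a small logging wrapper. This is a different technique from the instrumentation proposed above, shown only as a sketch; the function name log_and_run and any log file path are hypothetical.

```shell
#!/bin/sh
# log_and_run LOGFILE COMMAND [ARGS...]
# Append the command line to LOGFILE (one invocation per line), then run
# the command unchanged and return its exit status.
# Wrapper-based stand-in for the getopt idea, not the instrumentation itself.
log_and_run() {
    _lar_log=$1
    shift
    # "$*" joins the arguments with spaces; argument boundaries are lost,
    # which is acceptable when only surveying which options are used.
    printf '%s\n' "$*" >> "$_lar_log"
    "$@"
}
```

A script placed ahead of the real grep in PATH could then call something like `log_and_run /tmp/grep-args.log /usr/bin/grep "$@"`, where both paths are assumptions made for the example.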



Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-15 Thread Gabor Kovesdan

Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg, 
origin is set to the origin of a given port, and ro_opd is usually 
empty, but can be another origin directory or the same one. To 
guarantee that you should get some kind of results you can test with 
origin=devel/gettext.


egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is 
unset with the bsd grep I get:


egrep: empty (sub)expression

If I set ro_opd to something, it works.

Hello Doug,

thanks a lot for your response! I'll look at this issue.

Regards,

--
Gabor Kovesdan

EMAIL: [EMAIL PROTECTED]
WWW:   http://www.kovesdan.org



Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-15 Thread Kövesdán Gábor

Diomidis Spinellis wrote:

Doug Barton wrote:
I use the following construct in portmaster, where pdb=/var/db/pkg, 
origin is set to the origin of a given port, and ro_opd is usually 
empty, but can be another origin directory or the same one. To 
guarantee that you should get some kind of results you can test with 
origin=devel/gettext.


egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd 
is unset with the bsd grep I get:


egrep: empty (sub)expression


To avoid these problems I had proposed instrumenting getopt to write the 
options passed through argv to a file, building all our ports, and looking 
at the options used.


Yes, of course, I haven't forgotten about your suggestion. First, I'd
like to process the trivial errors, which come up like this one and make
some tests myself. Then I'll think about this idea and ask portmgr to do
an exp-run with BSD grep.

Regards,

--
Gabor Kovesdan

EMAIL: [EMAIL PROTECTED]
WWW:   http://www.kovesdan.org




Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-15 Thread Andrey Chernov
On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:
 
 Yes, of course, I haven't forgotten about your suggestion. First, I'd
 like to process the trivial errors, which come up like this one and make
 some tests myself. Then I'll think about this idea and ask portmgr to do
 an exp-run with BSD grep.

Please note that BSD grep is not localized (and can't be, given its design)
and works only with the standard C locale. It may not affect ports system
processing, but it surely affects the handling of real text.

-- 
http://ache.pp.ru/


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-15 Thread Garrett Cooper
On Sun, Jun 15, 2008 at 2:26 PM, Andrey Chernov [EMAIL PROTECTED] wrote:
 On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:

 Yes, of course, I haven't forgotten about your suggestion. First, I'd
 like to process the trivial errors, which come up like this one and make
 some tests myself. Then I'll think about this idea and ask portmgr to do
 an exp-run with BSD grep.

 Please note that BSD grep is not localized (and can't be, given its design)
 and works only with the standard C locale. It may not affect ports system
 processing, but it surely affects the handling of real text.

Kudos on the hard work Gabor.

Now all we need to do is write / import a BSD compatible less(1) into
FreeBSD =).

-Garrett


Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]

2008-06-15 Thread Doug Barton

Andrey Chernov wrote:

On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:

Yes, of course, I haven't forgotten about your suggestion. First, I'd
like to process the trivial errors, which come up like this one and make
some tests myself. Then I'll think about this idea and ask portmgr to do
an exp-run with BSD grep.


I think that would be very valuable.

Please note that BSD grep is not localized (and can't be, given its design)
and works only with the standard C locale. It may not affect ports system
processing, but it surely affects the handling of real text.


That is very troubling. In this day and age localization is a 
requirement. I cannot imagine being supportive of adding something to 
the base that does not have this capability.


I also found another gratuitous difference in behavior tonight, again 
from portmaster (which uses grep a LOT, which is why I thought to try 
it out in the first place). I do this type of thing in lots of places:


pkg=/var/db/pkg/p5-Net-DNS-0.63
if grep -ql '[EMAIL PROTECTED] ' $pkg/+CONTENTS 2>/dev/null; then
	do something
fi

With gnu grep I get no output, and if there is a match the if 
statement just runs as I'd expect. With bsd grep I'm getting the name 
of the file as output.
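
Until the two implementations agree, a script can avoid depending on how -q and -l interact by passing -q alone and relying only on the exit status. The following is a minimal sketch; has_match is a hypothetical helper name, and the temporary file and its sample line exist only for the demonstration.

```shell
#!/bin/sh
# has_match PATTERN FILE
# Succeed if FILE contains a line matching PATTERN, printing nothing.
# Only -q is passed and stdout is discarded as well, so no file name
# appears even with a grep that mishandles the -q/-l combination.
has_match() {
    grep -q -- "$1" "$2" >/dev/null 2>&1
}

# Demonstration on a throwaway file with a made-up dependency line.
tmp=$(mktemp "${TMPDIR:-/tmp}/grepq.XXXXXX") || exit 1
printf '@pkgdep gettext-0.17\n' > "$tmp"
if has_match '^@pkgdep ' "$tmp"; then
    echo "dependency line found"
fi
rm -f "$tmp"
```

A missing or unreadable file makes grep exit with a status greater than 1, which the if statement treats the same as no match.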


That's 3 strikes and you're out as far as I'm concerned. I think this 
project needs to come a lot closer to feature compatibility with gnu 
grep (including the ability to be localized) before it's ready for a 
wider audience. Of course, that's just my opinion.


Doug

--

This .signature sanitized for your protection
