Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Gábor Kövesdán wrote:
> > > Well, it seems you have missed the first bits of the discussion. GNU grep has a regression test suite, which it doesn't pass completely itself either. :) I've mentioned here that I used those tests to find out which options are incompatible. Unfortunately, I have to say that BSD grep won't pass all of them, because GNU allows some non-standard regexes that are rejected by our libc regex library; for example, (a|) is not standard because it has an empty subexpression. First, I tried to pre-edit such expressions in the code. It was ugly enough, but I thought: OK, this code is pretty ugly, but compatibility is important; maybe we can later revise and/or change our regexp library and get rid of these snippets. Later, when Andrey pointed it out, I realized that my workarounds addressed those incompatibilities but didn't work completely; they broke compatibility in other places, so I just removed them, because it was not that easy to fix. The version that I sent you for the portbuild test doesn't have those workarounds. The regression tests did help to fix other compatibility issues, like return values. All of these trivial things are supposed to be compatible now; the only exceptions are the non-standard regexes. That's why I'm so curious about the results. If they are unacceptable, we can try to build BSD grep with the GNU regexp lib (it's in the tree, as Pedro F. Giffuni pointed out). It doesn't work by just linking with that library, so it will need more work and investigation, not to mention that GNU regex should go one day...
> > OK, yes I did miss the start of the thread, but I was trying to suggest that grep doesn't seem to be functional enough yet and this is a way to work on identifying what needs to be fixed.
> Could you please send me some logs of ports which build with GNU grep but not with BSD grep? That would help me to identify the problems and find out whether they come from non-standard regexes or from something else.

No, because every port build fails because egrep -v is failing to work properly in the management scripts :) I sent you mail about this already.

Kris
___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
> > Well, it seems you have missed the first bits of the discussion. GNU grep has a regression test suite, which it doesn't pass completely itself either. :) I've mentioned here that I used those tests to find out which options are incompatible. Unfortunately, I have to say that BSD grep won't pass all of them, because GNU allows some non-standard regexes that are rejected by our libc regex library; for example, (a|) is not standard because it has an empty subexpression. First, I tried to pre-edit such expressions in the code. It was ugly enough, but I thought: OK, this code is pretty ugly, but compatibility is important; maybe we can later revise and/or change our regexp library and get rid of these snippets. Later, when Andrey pointed it out, I realized that my workarounds addressed those incompatibilities but didn't work completely; they broke compatibility in other places, so I just removed them, because it was not that easy to fix. The version that I sent you for the portbuild test doesn't have those workarounds. The regression tests did help to fix other compatibility issues, like return values. All of these trivial things are supposed to be compatible now; the only exceptions are the non-standard regexes. That's why I'm so curious about the results. If they are unacceptable, we can try to build BSD grep with the GNU regexp lib (it's in the tree, as Pedro F. Giffuni pointed out). It doesn't work by just linking with that library, so it will need more work and investigation, not to mention that GNU regex should go one day...
> OK, yes I did miss the start of the thread, but I was trying to suggest that grep doesn't seem to be functional enough yet and this is a way to work on identifying what needs to be fixed.

Could you please send me some logs of ports which build with GNU grep but not with BSD grep? That would help me to identify the problems and find out whether they come from non-standard regexes or from something else.

I've looked at our regex library and it was written by Henry Spencer. He has a slightly newer version, but he seems to be consistent and the implementation choices are the same; those non-standard regexes are still rejected by his library. I've also looked at PCRE, which was mentioned on this list. PCRE does have a POSIX-compliant interface, but it's just the interface; the regexes it interprets are still Perl-like.

--
Gabor Kovesdan
EMAIL: [EMAIL PROTECTED] WWW: http://www.kovesdan.org
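To check how a particular libc treats patterns such as (a|), a small probe along these lines may help. This is only a sketch: try_compile() is a hypothetical helper name, not code from BSD grep, and whether a given pattern is accepted depends entirely on the regex library it is linked against.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Try to compile an extended regex with the system's POSIX regcomp().
 * Returns 0 on success, otherwise the regcomp() error code; in that
 * case the human-readable message is copied into errbuf.
 * try_compile() is a hypothetical helper, not part of BSD grep.
 */
int
try_compile(const char *pattern, char *errbuf, size_t errlen)
{
	regex_t re;
	int rc;

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (rc != 0) {
		regerror(rc, &re, errbuf, errlen);
		return (rc);
	}
	if (errlen > 0)
		errbuf[0] = '\0';
	regfree(&re);
	return (0);
}
```

Calling try_compile("(a|)", buf, sizeof(buf)) on the system in question shows whether the empty subexpression is rejected there, and with which message.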
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Maxim Sobolev wrote:
> Dag-Erling Smørgrav wrote:
> > Andrey Chernov [EMAIL PROTECTED] writes:
> > > BSD sort as an idea would be a good project indeed, but the BSD sort implementation we currently have at hand is totally misleading and should be rewritten from scratch; I realized that long ago, when I tried to localize it for single-byte locales.
> > I think part of the problem is that there aren't enough people who truly understand localization. I think I understand most of it, but I'm pretty sure I *don't* understand how collation works, or is supposed to work. Amongst other things, I don't understand how (or whether) it handles cases like aa and å, which are considered the same letter in Norwegian. Perhaps you could create a Localization page on wiki.freebsd.org which addresses these issues, or at least points to relevant resources?
> A good regression test suite that includes cases in different single- and multibyte locales for grep/sort/etc. could also be a big help.

What regression suites do other implementations have? E.g. the GNU textutils.

Kris
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
> What regression suites do other implementations have? E.g. the GNU textutils.

They basically have regex tests, but nothing locale-specific, since locale ordering differs from platform to platform (until the Unicode Collation Algorithm wins).

--
http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Andrey Chernov wrote:
> On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
> > What regression suites do other implementations have? E.g. the GNU textutils.
> They basically have regex tests, but nothing locale-specific, since locale ordering differs from platform to platform (until the Unicode Collation Algorithm wins).

OK. Well, at least it is a start - passing those existing regression tests should be a goal.

Kris
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Kris Kennaway wrote:
> Andrey Chernov wrote:
> > On Mon, Jul 07, 2008 at 10:06:31PM +0200, Kris Kennaway wrote:
> > > What regression suites do other implementations have? E.g. the GNU textutils.
> > They basically have regex tests, but nothing locale-specific, since locale ordering differs from platform to platform (until the Unicode Collation Algorithm wins).
> OK. Well, at least it is a start - passing those existing regression tests should be a goal.

Well, it seems you have missed the first bits of the discussion. GNU grep has a regression test suite, which it doesn't pass completely itself either. :) I've mentioned here that I used those tests to find out which options are incompatible. Unfortunately, I have to say that BSD grep won't pass all of them, because GNU allows some non-standard regexes that are rejected by our libc regex library; for example, (a|) is not standard because it has an empty subexpression.

First, I tried to pre-edit such expressions in the code. It was ugly enough, but I thought: OK, this code is pretty ugly, but compatibility is important; maybe we can later revise and/or change our regexp library and get rid of these snippets. Later, when Andrey pointed it out, I realized that my workarounds addressed those incompatibilities but didn't work completely; they broke compatibility in other places, so I just removed them, because it was not that easy to fix. The version that I sent you for the portbuild test doesn't have those workarounds.

The regression tests did help to fix other compatibility issues, like return values. All of these trivial things are supposed to be compatible now; the only exceptions are the non-standard regexes. That's why I'm so curious about the results. If they are unacceptable, we can try to build BSD grep with the GNU regexp lib (it's in the tree, as Pedro F. Giffuni pointed out). It doesn't work by just linking with that library, so it will need more work and investigation, not to mention that GNU regex should go one day...

Regards,
Gábor
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
> 1) You can't just convert the whole buffer after fread(), since it can end in the middle of a multibyte sequence at the BUFSIZ edge. Look at how the GNU utils do it.

OK, I hadn't thought of this aspect. What about this?

#define iswbinary(ch) (!iswspace((ch)) && iswcntrl((ch)))

int
bin_file(FILE *f)
{
	wint_t ch = L'\0';
	size_t i;
	int ret = 0;

	if (fseek(f, 0L, SEEK_SET) == -1)
		return (0);

	for (i = 0; (i <= BUFSIZ) && (ch != WEOF); i++) {
		ch = fgetwc(f);
		if (iswbinary(ch)) {
			ret = 1;
			break;
		}
	}

	rewind(f);
	return (ret);
}

int
mmbin_file(struct mmfile *f)
{
	int i;
	wchar_t *wbuf;
	size_t s;

	if ((s = mbstowcs(NULL, f->base, 0)) == -1)
		return (0);

	wbuf = grep_malloc((s + 1) * sizeof(wchar_t));

	if (mbstowcs(wbuf, f->base, s) == -1)
		return (0);

	/* XXX knows too much about mmf internals */
	for (i = 0; i < BUFSIZ && i < f->len; i++)
		if (iswbinary(wbuf[i])) {
			free(wbuf);
			return (1);
		}
	free(wbuf);
	return (0);
}

This should be OK, right?

> 2) Better use iswspace and iswcntrl instead of iswctype.

OK, changed, thanks. I had also been looking for such functions, but man wctype doesn't mention them.

> 3) util.c needs to be fixed in several places too.

Yes, I know, I'm just advancing step by step. The next item will be to fix the word boundary handling.

Regards,
Gabor
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Tue, Jun 24, 2008 at 10:32:17PM +0200, Gabor Kovesdan wrote:
> ch = fgetwc(f);

You must clear errno before the call and somehow handle the EILSEQ that may come back from fgetwc(), perhaps by returning ret = 1 (binary); I am not sure. In that case fgetwc() returns WEOF, which is not a true end of file.

> if ((s = mbstowcs(NULL, f->base, 0)) == -1) return (0);

The same here. Check for EILSEQ and return 1.

--
http://ache.pp.ru/
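The errno handling suggested above could look roughly like this. It is a sketch, not the committed fix: bin_file_checked() is a hypothetical name, the scan limit is passed in as a parameter, and treating EILSEQ as "binary" is the assumption floated in the message rather than a settled decision.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Same test as in the patch: a control character that is not whitespace. */
#define iswbinary(ch)	(!iswspace((ch)) && iswcntrl((ch)))

/*
 * Scan up to `limit` characters from the start of f.
 * Returns 1 if the file looks binary, 0 otherwise.
 * An undecodable sequence (EILSEQ) is treated as binary here;
 * that choice is an assumption.
 */
int
bin_file_checked(FILE *f, size_t limit)
{
	wint_t ch;
	size_t i;
	int ret = 0;

	if (fseek(f, 0L, SEEK_SET) == -1)
		return (0);
	for (i = 0; i < limit; i++) {
		errno = 0;
		ch = fgetwc(f);
		if (ch == WEOF) {
			/* WEOF is either a real end of file or an error. */
			if (errno == EILSEQ)
				ret = 1;
			break;
		}
		if (iswbinary(ch)) {
			ret = 1;
			break;
		}
	}
	rewind(f);	/* also clears the stream's error indicator */
	return (ret);
}
```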
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Wed, Jun 25, 2008 at 01:04:20AM +0400, Andrey Chernov wrote:
> > if ((s = mbstowcs(NULL, f->base, 0)) == -1) return (0);
> The same here. Check for EILSEQ and return 1.

BTW, do you realize that this code malloc()s the _whole_file_ into memory (which won't fit for very big files)? The old non-localized code uses mmap, so it doesn't actually malloc() the file. Because of that, perhaps the whole of mmfile.c should be dropped and removed.

--
http://ache.pp.ru/
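One way to avoid converting (and allocating) the whole mapped file is to decode only the leading bytes in place, one character at a time. The sketch below assumes that is acceptable for the binary check; region_is_binary() is a hypothetical stand-in for mmbin_file() and takes a plain pointer and length instead of struct mmfile.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Inspect at most `limit` bytes of the region [base, base + len)
 * without allocating anything proportional to the file size.
 * Returns 1 if the region looks binary, 0 otherwise.
 * Treating an invalid sequence or a NUL byte as binary is an
 * assumption of this sketch.
 */
int
region_is_binary(const char *base, size_t len, size_t limit)
{
	mbstate_t st;
	wchar_t wc;
	size_t n;

	memset(&st, 0, sizeof(st));
	if (len > limit)
		len = limit;
	while (len > 0) {
		n = mbrtowc(&wc, base, len, &st);
		if (n == (size_t)-1)
			return (1);	/* undecodable bytes */
		if (n == (size_t)-2)
			break;		/* character cut off at the limit */
		if (n == 0)
			return (1);	/* embedded NUL */
		if (!iswspace((wint_t)wc) && iswcntrl((wint_t)wc))
			return (1);
		base += n;
		len -= n;
	}
	return (0);
}
```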
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Andrey Chernov wrote:
> On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
> > For grep, I believe it should simply be a matter of calling setlocale(), using wide strings, and using a multibyte regex engine (for appropriate values of "simply").
> See my previous reply for more details. Using wide strings is not so easy; e.g. all the ctype calls BSD grep now uses would have to be converted to wctype, input conversion added, etc.

I've started to work on this big change; the first step: http://kovesdan.org/patches/grep-i18n.diff

It doesn't work though; every file is recognized as binary with this change. Do you have any idea why this happens? What am I doing wrong?

Regards,
Gabor
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Sun, Jun 22, 2008 at 02:58:17PM +0200, Gabor Kovesdan wrote:
> Andrey Chernov wrote:
> > On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
> > > For grep, I believe it should simply be a matter of calling setlocale(), using wide strings, and using a multibyte regex engine (for appropriate values of "simply").
> > See my previous reply for more details. Using wide strings is not so easy; e.g. all the ctype calls BSD grep now uses would have to be converted to wctype, input conversion added, etc.
> I've started to work on this big change; the first step: http://kovesdan.org/patches/grep-i18n.diff

1) You can't just convert the whole buffer after fread(), since it can end in the middle of a multibyte sequence at the BUFSIZ edge. Look at how the GNU utils do it.
2) Better use iswspace and iswcntrl instead of iswctype.
3) util.c needs to be fixed in several places too.

--
http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Maxim Sobolev wrote:
> A good regression test suite that includes cases in different single- and multibyte locales for grep/sort/etc. could also be a big help.

I will implement test cases for sort in UTF-8 as part of my project.
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Konrad Jankowski [EMAIL PROTECTED] writes:
> BOMs should be handled at the program level.

Yeah, that makes sense; libc has no way of knowing whether the start of the string you're processing is actually the start of the file.

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
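Handling the BOM in the program can be as small as the following sketch, which covers only the UTF-8 encoded mark (bytes EF BB BF). utf8_bom_len() is a hypothetical helper name; the caller is assumed to pass it the first bytes of the file and nothing else.

```c
#include <stddef.h>

/*
 * Return the number of leading bytes to skip if buf starts with the
 * UTF-8 encoded byte order mark (EF BB BF), otherwise 0.
 * Only the program knows that buf is the very start of the file,
 * which is why this check cannot live in libc's conversion routines.
 * utf8_bom_len() is a hypothetical helper name.
 */
size_t
utf8_bom_len(const unsigned char *buf, size_t len)
{
	if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
		return (3);
	return (0);
}
```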
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Andrey Chernov [EMAIL PROTECTED] writes:
> BSD sort as an idea would be a good project indeed, but the BSD sort implementation we currently have at hand is totally misleading and should be rewritten from scratch; I realized that long ago, when I tried to localize it for single-byte locales.

I think part of the problem is that there aren't enough people who truly understand localization. I think I understand most of it, but I'm pretty sure I *don't* understand how collation works, or is supposed to work. Amongst other things, I don't understand how (or whether) it handles cases like aa and å, which are considered the same letter in Norwegian.

Perhaps you could create a Localization page on wiki.freebsd.org which addresses these issues, or at least points to relevant resources?

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Tue, Jun 17, 2008 at 12:58:12PM +0200, Gabor Kovesdan wrote:
> > > Yes, and once this is done, sort will work out of the box, if it uses strcoll. Already tried on a prototype.
> > Only GNU sort for multibyte chars. BSD sort is programmed too badly and can't be fixed even for single-byte sorting.
> BSD sort was going to be the next item of my SoC project. As it is so badly constructed, would it be reasonable to give more priority to BSD diff and continue with that one?

BSD sort as an idea would be a good project indeed, but the BSD sort implementation we currently have at hand is totally misleading and should be rewritten from scratch; I realized that long ago, when I tried to localize it for single-byte locales.

The next nice idea in that area would be updating our regexp engine to the most recent public code, both for speed and for minor compatibility reasons, as des@ mentions.

I don't have an opinion on BSD diff.

--
http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Wed, Jun 18, 2008 at 10:22:31AM +0200, Dag-Erling Smørgrav wrote:
> I think part of the problem is that there aren't enough people who truly understand localization. I think I understand most of it, but I'm pretty sure I *don't* understand how collation works, or is supposed to work. Amongst other things, I don't understand how (or whether) it handles cases like aa and å, which are considered the same letter in Norwegian.

Single-byte locale collation works through strcoll() via chains, i.e. it seeks all chains starting with the given letter. Multibyte locale collation is currently not implemented and can't be properly implemented under the existing single-byte framework (it would consume resources badly in that case). I know of semi-hackish attempts to implement multibyte collation via the single-byte one, but all of them cover only a small ASCII + national alphabet subset; the rest of Unicode is left unsorted.

> Perhaps you could create a Localization page on wiki.freebsd.org which addresses these issues, or at least points to relevant resources?

IMHO single-byte collation will be obsolete soon, once Unicode collation is implemented as a SoC project. We need something like the ICU library, which performs as described below, i.e. unified sorting for all possible chars: http://unicode.org/reports/tr10/

--
http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Andrey Chernov [EMAIL PROTECTED] writes:
> Single-byte locale collation works through strcoll() via chains, i.e. it seeks all chains starting with the given letter. Multibyte locale collation is currently not implemented and can't be properly implemented under the existing single-byte framework (it would consume resources badly in that case). I know of semi-hackish attempts to implement multibyte collation via the single-byte one, but all of them cover only a small ASCII + national alphabet subset; the rest of Unicode is left unsorted.

Does that mean our wcsxfrm() doesn't work? IIUC, it should convert wide strings to strings that can be compared directly with strcmp()?

In any case, this is a libc issue, right? As long as sort / grep use the API correctly, they will work fine once libc is fixed?

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Konrad Jankowski [EMAIL PROTECTED] writes:
> Dag-Erling Smørgrav [EMAIL PROTECTED] writes:
> > In any case, this is a libc issue, right? As long as sort / grep use the API correctly, they will work fine once libc is fixed?
> Correct, provided sort uses strcoll()/wcscoll()/strxfrm()/wcsxfrm() and calls setlocale(). I don't know about grep.

For grep, I believe it should simply be a matter of calling setlocale(), using wide strings, and using a multibyte regex engine (for appropriate values of "simply").

Another thing I'm unsure about is the matter of input and output. Do mbstowcs() / mbtowc() simply trust the input to conform to LC_CTYPE and convert accordingly? When reading UTF, do they recognize and handle BOMs, or simply treat them as zero-width non-breaking spaces? In the absence of a BOM, do they assume that the input follows the system's native byte order?

(IMHO, the API is broken, since there is no way for the same program to simultaneously handle streams with different encodings, but I guess it's too late to fix that.)

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Wed, Jun 18, 2008 at 11:39:10AM +0200, Dag-Erling Smørgrav wrote:
> Does that mean our wcsxfrm() doesn't work? IIUC, it should convert wide strings to strings that can be compared directly with strcmp()?

(directly with wcscmp())

For single-byte locales wcsxfrm() and wcscoll() work, but for multibyte locales they just do a raw binary comparison.

> In any case, this is a libc issue, right? As long as sort / grep use the API correctly, they will work fine once libc is fixed?

GNU grep and sort will work just fine. BSD grep does not call setlocale(), but even if that call is added, BSD grep has other places where multibyte is not handled properly. I have already noticed two of them: case-insensitive comparison and word boundary detection. Perhaps other places exist; I have not studied the code enough to catch them all. BSD sort uses the upper half of the 256-char table for its own purposes, which badly damages both single-byte and multibyte locales, and of course it does not use wcscoll() at all, etc.

--
http://ache.pp.ru/
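The wcsxfrm()/wcscmp() pairing mentioned above can be sketched as follows. make_sort_key() is a hypothetical helper name; how meaningful the resulting order is depends on the libc's collation support for the active locale, which is exactly the limitation discussed in this message.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Build a collation key for s with wcsxfrm(). Two keys compared with
 * wcscmp() are meant to order the same way wcscoll() orders the
 * original strings under the current LC_COLLATE locale.
 * The caller frees the result. Returns NULL if allocation fails.
 * make_sort_key() is a hypothetical helper name.
 */
wchar_t *
make_sort_key(const wchar_t *s)
{
	size_t need;
	wchar_t *key;

	/* With a zero size, wcsxfrm() only reports the key length. */
	need = wcsxfrm(NULL, s, 0);
	key = malloc((need + 1) * sizeof(wchar_t));
	if (key == NULL)
		return (NULL);
	wcsxfrm(key, s, need + 1);
	return (key);
}
```

Precomputing keys this way pays off when the same strings are compared many times, as in a sort; for a single comparison wcscoll() alone is enough.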
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Wed, Jun 18, 2008 at 12:40:24PM +0200, Dag-Erling Smørgrav wrote:
> For grep, I believe it should simply be a matter of calling setlocale(), using wide strings, and using a multibyte regex engine (for appropriate values of "simply").

See my previous reply for more details. Using wide strings is not so easy; e.g. all the ctype calls BSD grep now uses would have to be converted to wctype, input conversion added, etc.

> Another thing I'm unsure about is the matter of input and output. Do mbstowcs() / mbtowc() simply trust the input to conform to LC_CTYPE and convert accordingly? When reading UTF, do they recognize and handle

They return EILSEQ on a wrong sequence.

--
http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Wed, Jun 18, 2008 at 11:14:16AM +0200, Konrad Jankowski wrote:
> I think the best place for this type of information is currently my SoC wiki. http://wiki.freebsd.org/KonradJankowski/Collation I know it currently has very little information, however. I can also create another page dedicated to explaining the workings of collation in UCA, given enough interest.

Please look at the ICU library. Among other things, they implement Unicode collation: http://icu-project.org/userguide/Collate_Intro.html

--
http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Mon, 16 Jun 2008, Dag-Erling Smørgrav wrote:
> Doug Barton [EMAIL PROTECTED] writes:
> > Andrey Chernov [EMAIL PROTECTED] writes:
> > > Please note that BSD grep is not localized (and can't be, by design) and works only with the standard C locale. It may not affect ports system processing, but it surely affects the handling of real texts.
> > That is very troubling. In this day and age localization is a requirement. I cannot imagine being supportive of adding something to the base that does not have this capability.
> We don't have a locale-aware regex implementation. Henry Spencer wrote one for Tcl 8, and it seems to be under an MIT-equivalent license, but I'm not sure how hard it would be to extirpate. It might be easier to lift it from PostgreSQL, which also uses it.

Other BSD-license-friendly regex libraries:

1. PCRE (http://www.pcre.org/) (has a POSIX compliant interface too)
2. Oniguruma (http://www.geocities.jp/kosako3/oniguruma/) (from Ruby)
3. Lrexlib (http://lrexlib.luaforge.net/) (no apparent POSIX interface)

Sean
--
[EMAIL PROTECTED]
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Dag-Erling Smørgrav wrote:
> Andrey Chernov [EMAIL PROTECTED] writes:
> > BSD sort as an idea would be a good project indeed, but the BSD sort implementation we currently have at hand is totally misleading and should be rewritten from scratch; I realized that long ago, when I tried to localize it for single-byte locales.
> I think part of the problem is that there aren't enough people who truly understand localization. I think I understand most of it, but I'm pretty sure I *don't* understand how collation works, or is supposed to work. Amongst other things, I don't understand how (or whether) it handles cases like aa and å, which are considered the same letter in Norwegian. Perhaps you could create a Localization page on wiki.freebsd.org which addresses these issues, or at least points to relevant resources?

A good regression test suite that includes cases in different single- and multibyte locales for grep/sort/etc. could also be a big help.

Regards,
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Andrey Chernov wrote:
> On Tue, Jun 17, 2008 at 04:28:10AM +0400, Andrey Chernov wrote:
> > BSD grep does not even bother to call setlocale(). I can't say whether it can be healed simply by adding that call; some test suite run is needed.
> Quick source inspection reveals that BSD grep operates on single bytes only (util.c), so a big rewrite with mbrtowc() is needed. Adding setlocale() alone will make it usable with single-byte locales only, and that is the best case.

Sorry for the possibly silly question, but what do we mean by localization here in the case of grep? As far as I can see, it works with wide chars, because the regex library is aware of those. What other aspects need to be taken into account? In the case of sort, I understand that it should explicitly handle wide characters because of the different alphabets of the different languages, and yes, that seems to be a difficult task...

Gábor
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Andrey Chernov [EMAIL PROTECTED] writes:
> Dag-Erling Smørgrav [EMAIL PROTECTED] writes:
> > We don't have a locale-aware regex implementation. Henry Spencer wrote one for Tcl 8, and it seems to be under an MIT-equivalent license, but I'm not sure how hard it would be to extirpate. It might be easier to lift it from PostgreSQL, which also uses it.
> No, we have had it for many years already (libc/regexec).

I hadn't noticed... ISTR it was an issue back when jphoward wrote his BSD-licensed grep.

However, it's not the same engine - it's Spencer's old engine with multibyte support added. IIRC, it performs very poorly compared to the GNU regexp engine; it would be interesting to see how well the Tcl engine performs in comparison.

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Tue, Jun 17, 2008 at 09:21:52AM +0200, Gabor Kovesdan wrote:
> Sorry for the possibly silly question, but what do we mean by localization here in the case of grep? As far as I can see, it works with wide chars, because the regex library is aware of those. What other aspects need to be taken into account?

See how word boundaries are handled in util.c, for example. The code there treats the buffer as single-byte chars only. wctype should be used instead of ctype everywhere in the code, with the corresponding mbrtowc conversion.

--
http://ache.pp.ru/
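A word-boundary test built on mbrtowc() and wctype, as suggested above, might look like the sketch below. It is not the util.c code: word_starts_at() is a hypothetical helper, and defining a word character as "alphanumeric or underscore" is an assumption about what grep's -w should mean.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Return 1 if a word starts at byte offset `off` of buf: the character
 * there is a word character and the character before it (if any) is not.
 * The buffer is decoded from the beginning so that `off` is checked
 * against real character boundaries. Returns 0 on invalid input.
 * word_starts_at() is a hypothetical helper name.
 */
int
word_starts_at(const char *buf, size_t len, size_t off)
{
	mbstate_t st;
	wchar_t wc;
	size_t pos = 0, n;
	int prev_word = 0, cur_word;

	memset(&st, 0, sizeof(st));
	while (pos < len) {
		n = mbrtowc(&wc, buf + pos, len - pos, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return (0);	/* invalid or truncated sequence */
		if (n == 0)
			n = 1;		/* embedded NUL */
		cur_word = (iswalnum((wint_t)wc) || wc == L'_');
		if (pos == off)
			return (cur_word && !prev_word);
		if (pos > off)
			return (0);	/* off was inside a character */
		prev_word = cur_word;
		pos += n;
	}
	return (0);
}
```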
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Tue, Jun 17, 2008 at 11:46:07AM +0400, Andrey Chernov wrote:
> On Tue, Jun 17, 2008 at 09:21:52AM +0200, Gabor Kovesdan wrote:
> > Sorry for the possibly silly question, but what do we mean by localization here in the case of grep? As far as I can see, it works with wide chars, because the regex library is aware of those. What other aspects need to be taken into account?
> See how word boundaries are handled in util.c, for example. The code there treats the buffer as single-byte chars only. wctype should be used instead of ctype everywhere in the code, with the corresponding mbrtowc conversion.

Moreover, case-insensitive matching there is single-byte only too and needs the same treatment.

--
http://ache.pp.ru/
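For the case-insensitive part, the same decode-then-classify pattern applies, with towlower() in place of tolower(). This is a minimal sketch assuming simple one-to-one case folding is good enough; mb_equal_nocase() is a hypothetical helper, not code from BSD grep.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Compare two multibyte strings ignoring case, decoding one character
 * at a time with mbrtowc() and folding with towlower().
 * Returns 1 if they match, 0 if they differ, -1 on an invalid or
 * truncated multibyte sequence.
 * mb_equal_nocase() is a hypothetical helper name.
 */
int
mb_equal_nocase(const char *a, const char *b)
{
	mbstate_t sa, sb;
	wchar_t wa, wb;
	size_t la = strlen(a), lb = strlen(b), na, nb;

	memset(&sa, 0, sizeof(sa));
	memset(&sb, 0, sizeof(sb));
	while (la > 0 && lb > 0) {
		na = mbrtowc(&wa, a, la, &sa);
		nb = mbrtowc(&wb, b, lb, &sb);
		if (na == (size_t)-1 || na == (size_t)-2 ||
		    nb == (size_t)-1 || nb == (size_t)-2)
			return (-1);
		if (towlower((wint_t)wa) != towlower((wint_t)wb))
			return (0);
		a += na;
		la -= na;
		b += nb;
		lb -= nb;
	}
	/* Equal only if both strings ended at the same time. */
	return (la == 0 && lb == 0);
}
```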
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Gabor Kovesdan wrote:
> In the case of sort, I understand that it should explicitly handle wide characters because of the different alphabets of the different languages, and yes, that seems to be a difficult task...

Note that Konrad Jankowski, in another SoC project, is adding support for the Unicode collation algorithm to our C library and importing the corresponding language-specific collation tables.
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Tue, Jun 17, 2008 at 12:08:38PM +0200, Dag-Erling Smørgrav wrote:
> I hadn't noticed... ISTR it was an issue back when jphoward wrote his BSD-licensed grep.

BSD grep has enough problems (though not fatal ones, as in BSD sort) even with the single-byte locales that our regex supported initially (the old pre-multibyte versions), so the lack of multibyte support there is a rather far-off concern.

> However, it's not the same engine - it's Spencer's old engine with multibyte support added. IIRC, it performs very poorly compared to the GNU regexp engine; it would be interesting to see how well the Tcl engine performs in comparison.

Yes. Upgrading to the most recent engine would be nice.

--
http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Diomidis Spinellis wrote:
Gabor Kovesdan wrote:

In case of sort, I understand that it should explicitly handle wide characters due to the different alphabets of the different languages and yes, that seems to be a difficult task...

Note that Konrad Jankowski, in another SoC project, is adding support for the Unicode collation algorithm to our C library, and importing the corresponding language-specific collation tables.

Yes, and once this is done, sort will work out of the box if it uses strcoll. I have already tried this on a prototype.

-- Konrad Jankowski, System Network Administrator, Blue Media Sp. z o. o., http://www.bluemedia.pl
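As a rough illustration of why collation matters for sort: POSIX specifies that sort(1) orders lines according to LC_COLLATE, which is the same category strcoll(3) consults. The sketch below only contrasts byte order with locale order; the function names sort_bytes and sort_locale are made up for this example, and what sort_locale prints depends on which locales are installed.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# Sort stdin by raw byte value, regardless of the user's locale.
sort_bytes() {
    LC_ALL=C sort
}

# Sort stdin using the collation rules of the current locale (LC_COLLATE).
# The result depends on the locale in effect, so no output is promised here.
sort_locale() {
    sort
}

# In the C locale, uppercase ASCII sorts before lowercase ASCII,
# so "B" comes out ahead of "a".
printf 'a\nB\n' | sort_bytes
```

With collation tables in place, a strcoll-based sort would be expected to produce the language-specific order from sort_locale without any change to the caller.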
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Tue, Jun 17, 2008 at 10:54:42AM +0200, Konrad Jankowski wrote:
Diomidis Spinellis wrote:
Gabor Kovesdan wrote:

In case of sort, I understand that it should explicitly handle wide characters due to the different alphabets of the different languages and yes, that seems to be a difficult task...

Note that Konrad Jankowski, in another SoC project, is adding support for the Unicode collation algorithm to our C library, and importing the corresponding language-specific collation tables.

Yes, and once this is done, sort will work out of the box if it uses strcoll. I have already tried this on a prototype.

Only GNU sort, for multibyte chars. BSD sort is programmed too badly and can't be fixed even for single-byte sorting.

-- http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Andrey Chernov wrote:
On Tue, Jun 17, 2008 at 10:54:42AM +0200, Konrad Jankowski wrote:
Diomidis Spinellis wrote:
Gabor Kovesdan wrote:

In case of sort, I understand that it should explicitly handle wide characters due to the different alphabets of the different languages and yes, that seems to be a difficult task...

Note that Konrad Jankowski, in another SoC project, is adding support for the Unicode collation algorithm to our C library, and importing the corresponding language-specific collation tables.

Yes, and once this is done, sort will work out of the box if it uses strcoll. I have already tried this on a prototype.

Only GNU sort, for multibyte chars. BSD sort is programmed too badly and can't be fixed even for single-byte sorting.

BSD sort was going to be the next item of my SoC project. As it is so badly constructed, would it be reasonable to give more priority to BSD diff and continue with that one?

Gábor
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Doug Barton wrote:

I use the following construct in portmaster, where pdb=/var/db/pkg, origin is set to the origin of a given port, and ro_opd is usually empty, but can be another origin directory or the same one. To guarantee that you get some kind of results you can test with origin=devel/gettext.

    egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is unset with the bsd grep I get:

    egrep: empty (sub)expression

I've looked at this and I have a patch with a workaround: http://kovesdan.org/patches/grep.dougb.diff

Could you please try it if you have some time? I suppose that it will fix your case, as it has fixed the 77th Spencer test of the GNU regression test suite, which comes with GNU grep. I'm afraid there isn't a better solution, as this regression comes from the different regex interpretations of the GNU regex library and our libc regex library. regex(3) says that the RE standard has some ambiguities and that each implementation must decide how to handle these cases.

Regards, Gábor

P.S.: Thanks for the WITHOUT_GNU_GREP knob. I hope we will make use of it soon; I'm trying to eliminate the remaining regressions.
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Doug Barton wrote:

I use the following construct in portmaster, where pdb=/var/db/pkg, origin is set to the origin of a given port, and ro_opd is usually empty, but can be another origin directory or the same one. To guarantee that you get some kind of results you can test with origin=devel/gettext.

    egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is unset with the bsd grep I get:

    egrep: empty (sub)expression

I've looked at this and I have a patch with a workaround: http://kovesdan.org/patches/grep.dougb.diff

Could you please try it if you have some time? I suppose that it will fix your case, as it has fixed the 77th Spencer test of the GNU regression test suite, which comes with GNU grep. I'm afraid there isn't a better solution, as this regression comes from the different regex interpretations of the GNU regex library and our libc regex library. regex(3) says that the RE standard has some ambiguities and that each implementation must decide how to handle these cases.

Regards, Gábor

P.S.: Thanks for the WITHOUT_GNU_GREP knob. I hope we will make use of it soon; I've already eliminated some more regressions and I'm fighting with the remaining ones.
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On 2008-06-17, Gabor Kovesdan wrote:

    egrep: empty (sub)expression

I've looked at this and I have a patch with a workaround: http://kovesdan.org/patches/grep.dougb.diff

Unfortunately this breaks things. For example:

    $ grep -E '(test||test)' /dev/null
    grep: parentheses not balanced
    $ grep -E '(test|\|)' /dev/null
    grep: parentheses not balanced
    $ grep -E '\(|test)' /dev/null
    (should give an error but it hangs)

-- Jaakko
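Cases like the ones above can be collected into a small check that asks only whether a given grep accepts or rejects a pattern. This is a sketch, not the GNU regression suite: check_pattern is a hypothetical helper, it relies on the POSIX convention that grep exits with 0 or 1 when the pattern compiled and with a value greater than 1 on error, and it has no guard against the hang reported for the third pattern.

```shell
#!/bin/sh
# check_pattern PATTERN
# Prints "ok" if `grep -E` accepts the pattern, "error" if it rejects it.
# Searching /dev/null means the pattern is compiled but can never match.
check_pattern() {
    status=0
    grep -E -- "$1" /dev/null >/dev/null 2>&1 || status=$?
    case $status in
        0|1) echo ok ;;      # pattern compiled (match / no match)
        *)   echo error ;;   # grep reported a problem with the pattern
    esac
}

# Possible use against the patterns from the message above (results depend
# on which grep is installed, so none are claimed here):
#   for p in '(test||test)' '(test|\|)' '\(|test)'; do
#       printf '%s\t%s\n' "$p" "$(check_pattern "$p")"
#   done
```

Running the same loop once with GNU grep and once with BSD grep on the PATH would give a quick side-by-side list of the patterns the two disagree on.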
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Sun, Jun 15, 2008 at 09:11:36PM -0700, Garrett Cooper wrote:

Now all we need to do is write / import a BSD compatible less(1) into FreeBSD =).

less is dual licensed.

Joerg
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Doug Barton [EMAIL PROTECTED] writes:
Andrey Chernov [EMAIL PROTECTED] writes:

Please note that BSD grep is not localized (and can't be, by design) and works only with the standard C locale. It may not affect ports system processing, but it surely affects real text handling.

That is very troubling. In this day and age localization is a requirement. I cannot imagine being supportive of adding something to the base that does not have this capability.

We don't have a locale-aware regex implementation. Henry Spencer wrote one for Tcl 8, and it seems to be under an MIT-equivalent license, but I'm not sure how hard it would be to extirpate. It might be easier to lift it from PostgreSQL, which also uses it.

DES
-- Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Dag-Erling Smørgrav wrote:
Doug Barton [EMAIL PROTECTED] writes:
Andrey Chernov [EMAIL PROTECTED] writes:

Please note that BSD grep is not localized (and can't be, by design) and works only with the standard C locale. It may not affect ports system processing, but it surely affects real text handling.

That is very troubling. In this day and age localization is a requirement. I cannot imagine being supportive of adding something to the base that does not have this capability.

We don't have a locale-aware regex implementation. Henry Spencer wrote one for Tcl 8, and it seems to be under an MIT-equivalent license, but I'm not sure how hard it would be to extirpate. It might be easier to lift it from PostgreSQL, which also uses it.

Ok, that's a slightly different situation, thanks for clarifying that. Sounds like that would be a good project for GSOC next year. :)

Meanwhile, for those who didn't notice last night (*cough*) I added the WITHOUT_GNU_GREP knob for src.conf to make it easier for folks to test this in HEAD.

hth, Doug
-- This .signature sanitized for your protection
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Mon, Jun 16, 2008 at 02:36:23PM +0200, Dag-Erling Smørgrav wrote:

Please note that BSD grep is not localized (and can't be, by design) and works only with the standard C locale. It may not affect ports system processing, but it surely affects real text handling.

That is very troubling. In this day and age localization is a requirement. I cannot imagine being supportive of adding something to the base that does not have this capability.

We don't have a locale-aware regex implementation. Henry Spencer wrote one for Tcl 8, and it seems to be under an MIT-equivalent license, but I'm not sure how hard it would be to extirpate. It might be easier to lift it from PostgreSQL, which also uses it.

No, we have had it for many years already (libc/regexec). BSD grep's problem is a different one: they use the upper half of the 256-char table on their own.

-- http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Tue, Jun 17, 2008 at 04:22:25AM +0400, Andrey Chernov wrote:
On Mon, Jun 16, 2008 at 02:36:23PM +0200, Dag-Erling Smørgrav wrote:

Please note that BSD grep is not localized (and can't be, by design) and works only with the standard C locale. It may not affect ports system processing, but it surely affects real text handling.

That is very troubling. In this day and age localization is a requirement. I cannot imagine being supportive of adding something to the base that does not have this capability.

We don't have a locale-aware regex implementation. Henry Spencer wrote one for Tcl 8, and it seems to be under an MIT-equivalent license, but I'm not sure how hard it would be to extirpate. It might be easier to lift it from PostgreSQL, which also uses it.

No, we have had it for many years already (libc/regexec). BSD grep's problem is a different one: they use the upper half of the 256-char table on their own.

Oops, sorry, I was thinking about BSD _sort_ when writing the last statement. BSD grep does not even bother to call setlocale(). I can't say whether it can be healed simply by adding that call; a test suite run is needed.

-- http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Tue, Jun 17, 2008 at 04:28:10AM +0400, Andrey Chernov wrote:

BSD grep does not even bother to call setlocale(). I can't say whether it can be healed simply by adding that call; a test suite run is needed.

A quick source inspection reveals that BSD grep operates on single bytes only (util.c), so a big rewrite with mbrtowc() is needed. Adding setlocale() alone will make it usable only with single-byte locales, at best.

-- http://ache.pp.ru/
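To make the single-byte point concrete, here is a minimal shell sketch, assuming UTF-8 as the text encoding. It only shows that one non-ASCII character can occupy more than one byte, which is why code that walks a buffer byte by byte cannot classify or case-fold it correctly. byte_count is a hypothetical helper name.

```shell
#!/bin/sh
# byte_count STRING: print the number of bytes in STRING.
# The tr call strips the padding some wc implementations add.
byte_count() {
    printf '%s' "$1" | wc -c | tr -cd '0-9'
    echo
}

# "e with acute accent" (U+00E9) encoded as UTF-8 is the two bytes 0xC3 0xA9.
e_acute=$(printf '\303\251')

byte_count "$e_acute"   # prints 2: one character, two bytes
byte_count "e"          # prints 1
```

A byte-oriented word-boundary or ignore-case routine sees the two bytes above as two separate "characters", neither of which is a letter in the C locale.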
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
I use the following construct in portmaster, where pdb=/var/db/pkg, origin is set to the origin of a given port, and ro_opd is usually empty, but can be another origin directory or the same one. To guarantee that you get some kind of results you can test with origin=devel/gettext.

    egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is unset with the bsd grep I get:

    egrep: empty (sub)expression

If I set ro_opd to something, it works.

hth, Doug
-- This .signature sanitized for your protection
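Independently of how grep ends up treating an empty alternative, a caller can avoid generating one. The following is a sketch of that idea, not portmaster's actual code: dep_pattern is a hypothetical helper that leaves out the second branch when ro_opd is empty.

```shell
#!/bin/sh
# dep_pattern ORIGIN [RO_OPD]
# Print an extended regex for the DEPORIGIN line. The alternation is only
# emitted when a second origin is given, so "(origin|)" is never produced.
dep_pattern() {
    if [ -n "${2:-}" ]; then
        printf 'DEPORIGIN:(%s|%s)$\n' "$1" "$2"
    else
        printf 'DEPORIGIN:(%s)$\n' "$1"
    fi
}

# Possible use, with pdb, origin and ro_opd set as described above:
#   egrep -l "$(dep_pattern "$origin" "$ro_opd")" "$pdb"/*/+CONTENTS
```

Because the pattern no longer contains an empty subexpression, it should be accepted by both the GNU and the libc regex implementations discussed in this thread.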
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Doug Barton wrote:

I use the following construct in portmaster, where pdb=/var/db/pkg, origin is set to the origin of a given port, and ro_opd is usually empty, but can be another origin directory or the same one. To guarantee that you get some kind of results you can test with origin=devel/gettext.

    egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is unset with the bsd grep I get:

    egrep: empty (sub)expression

To avoid these problems I had proposed to instrument getopt to write the options passed through argv to a file, build all our ports, and look at the options used.
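The proposal above instruments getopt itself. A lighter-weight approximation, shown here only as a sketch and not as what was proposed, is a shell wrapper that records each invocation's arguments before running the real grep. The names logged_grep, GREP_ARG_LOG and REAL_GREP are invented for this example and are not existing knobs.

```shell
#!/bin/sh
# logged_grep ARGS...
# Append the argument list to a log file, then run grep with the same
# arguments. GREP_ARG_LOG and REAL_GREP are illustrative variable names.
logged_grep() {
    log=${GREP_ARG_LOG:-/tmp/grep-args.log}
    real=${REAL_GREP:-grep}
    # One line per invocation, arguments joined by single spaces.
    printf '%s\n' "$*" >> "$log"
    "$real" "$@"
}

# After a build, the options actually used could be tallied with e.g.:
#   tr ' ' '\n' < "$GREP_ARG_LOG" | grep '^-' | sort | uniq -c
```

Unlike instrumenting getopt, this only sees invocations that go through the wrapper, and joining with spaces loses the boundaries of arguments that themselves contain spaces.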
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Doug Barton wrote:

I use the following construct in portmaster, where pdb=/var/db/pkg, origin is set to the origin of a given port, and ro_opd is usually empty, but can be another origin directory or the same one. To guarantee that you get some kind of results you can test with origin=devel/gettext.

    egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is unset with the bsd grep I get:

    egrep: empty (sub)expression

If I set ro_opd to something, it works.

Hello Doug, thanks a lot for your response! I'll look at this issue.

Regards,
-- Gabor Kovesdan EMAIL: [EMAIL PROTECTED] WWW: http://www.kovesdan.org
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Diomidis Spinellis wrote:
Doug Barton wrote:

I use the following construct in portmaster, where pdb=/var/db/pkg, origin is set to the origin of a given port, and ro_opd is usually empty, but can be another origin directory or the same one. To guarantee that you get some kind of results you can test with origin=devel/gettext.

    egrep -l "DEPORIGIN:($origin|$ro_opd)$" $pdb/*/+CONTENTS

Obviously this works in portmaster with the gnu grep, but if ro_opd is unset with the bsd grep I get:

    egrep: empty (sub)expression

To avoid these problems I had proposed to instrument getopt to write the options passed through argv to a file, build all our ports, and look at the options used.

Yes, of course, I haven't forgotten about your suggestion. First, I'd like to deal with the trivial errors that come up, like this one, and run some tests myself. Then I'll think about this idea and ask portmgr to do an exp-run with BSD grep.

Regards,
-- Gabor Kovesdan EMAIL: [EMAIL PROTECTED] WWW: http://www.kovesdan.org
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:

Yes, of course, I haven't forgotten about your suggestion. First, I'd like to deal with the trivial errors that come up, like this one, and run some tests myself. Then I'll think about this idea and ask portmgr to do an exp-run with BSD grep.

Please note that BSD grep is not localized (and can't be, by design) and works only with the standard C locale. It may not affect ports system processing, but it surely affects real text handling.

-- http://ache.pp.ru/
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
On Sun, Jun 15, 2008 at 2:26 PM, Andrey Chernov [EMAIL PROTECTED] wrote:
On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:

Yes, of course, I haven't forgotten about your suggestion. First, I'd like to deal with the trivial errors that come up, like this one, and run some tests myself. Then I'll think about this idea and ask portmgr to do an exp-run with BSD grep.

Please note that BSD grep is not localized (and can't be, by design) and works only with the standard C locale. It may not affect ports system processing, but it surely affects real text handling.

Kudos on the hard work, Gabor. Now all we need to do is write / import a BSD compatible less(1) into FreeBSD =).

-Garrett
Re: CFT: BSD-licensed grep [Fwd: cvs commit: ports/textproc/bsdgrep Makefile distinfo]
Andrey Chernov wrote:
On Sun, Jun 15, 2008 at 09:17:01PM +0200, Kövesdán Gábor wrote:

Yes, of course, I haven't forgotten about your suggestion. First, I'd like to deal with the trivial errors that come up, like this one, and run some tests myself. Then I'll think about this idea and ask portmgr to do an exp-run with BSD grep.

I think that would be very valuable.

Please note that BSD grep is not localized (and can't be, by design) and works only with the standard C locale. It may not affect ports system processing, but it surely affects real text handling.

That is very troubling. In this day and age localization is a requirement. I cannot imagine being supportive of adding something to the base that does not have this capability.

I also found another gratuitous difference in behavior tonight, again from portmaster (which uses grep a LOT, which is why I thought to try it out in the first place). I do this type of thing in lots of places:

    pkg=/var/db/pkg/p5-Net-DNS-0.63
    if grep -ql '[EMAIL PROTECTED] ' $pkg/+CONTENTS 2>/dev/null; then
        do something
    fi

With gnu grep I get no output, and if there is a match the if statement just runs as I'd expect. With bsd grep I'm getting the name of the file as output.

That's 3 strikes and you're out as far as I'm concerned. I think this project needs to come a lot closer to feature compatibility with gnu grep (including the ability to be localized) before it's ready for a wider audience. Of course, that's just my opinion.

Doug
-- This .signature sanitized for your protection
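Until the two greps agree on how -q and -l combine, a script that only needs a yes/no answer can rely on -q and the exit status alone. This is a sketch of that approach, not a change anyone in the thread proposed; has_match is a hypothetical helper name.

```shell
#!/bin/sh
# has_match FILE PATTERN
# Succeeds (exit 0) if FILE contains a line matching PATTERN and prints
# nothing either way. Only -q is used, so the result does not depend on
# how a particular grep treats -q together with -l.
has_match() {
    grep -q -- "$2" "$1" 2>/dev/null
}

# Possible use:
#   if has_match "$pkg/+CONTENTS" "$pattern"; then
#       do something
#   fi
```

POSIX describes -q as suppressing all normal output and reporting the result through the exit status, which is the behavior the if statement above needs.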