Re: manpage searches "^\s+keyword\s" vs. ???

2019-01-30 Thread Brian Inglis
On 2019-01-30 11:40, Andrey Repin wrote:
>> I've always used "^\s+keyword\s" as a way to search for some keyword
>> starting a section.
> Welcome to the club.
>> On linux it still works, but on cygwin it doesn't like '\s' as symbol for
>> white space.
>> Any idea why there might be a difference?
>> I note an option that could do similar in less -- '&pattern' turns OFF
>> single special characters, I tried that on linux and it turned off the '\s'
>> matching space.  That's nice..um how about other way?
>> Well didn't know if there might be some other op to go the other way, but
>> didn't see anything. any ideas?
> I've been puzzled by this since… forever, it seems.
> This is something in less, but all the `man less` says is "regular expression
> library provided by your system".
> I guess this is down to compilation options at this point.

The full class [[:space:]] works as expected.
Probably the configure options pick the BSD POSIX ERE library, which lacks the
character-class escape shortcuts, rather than allowing a Glib, ICU, or PCRE
library that provides them.
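
For illustration, a minimal Python sketch of building the portable form of the
original search, assuming the keyword should be matched literally. The names
escape_ere and posix_section_pattern are hypothetical helpers, not part of
less, man, or Cygwin.

```python
# Characters that are special in a POSIX extended regular expression.
ERE_SPECIAL = set(r".[]\^$*+?(){}|")


def escape_ere(text: str) -> str:
    # Backslash-escape every ERE metacharacter so the text matches literally.
    return "".join("\\" + ch if ch in ERE_SPECIAL else ch for ch in text)


def posix_section_pattern(keyword: str) -> str:
    # Portable spelling of the search from the original post, with the
    # non-standard whitespace shortcut replaced by the POSIX class.
    return "^[[:space:]]+" + escape_ere(keyword) + "[[:space:]]"


if __name__ == "__main__":
    # Paste the printed pattern after "/" in less or man.
    print(posix_section_pattern("OPTIONS"))
```

The printed string is longer to type than the shortcut form, but it does not
depend on which regex library less was built against.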

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.

--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: manpage searches "^\s+keyword\s" vs. ???

2019-01-30 Thread Wayne Davison
On Wed, Jan 30, 2019 at 11:09 AM Eric Blake wrote:
> Not so much compilation options of man and less, but rather the code
> used in Cygwin itself for handling regex.

The configuration of less supports many different regex libraries.  I
downloaded the source and ran "./configure --with-regex=pcre"  and
built a nice version of less that fully supports \b and the various
other perl regex extensions.  The output of cygwin's standard "less
--version" indicates it was compiled with posix regex, while linux
suppliers seem to all use gnu regex (which also supports various
perl-isms these days).
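
A rough Python sketch of checking which regex library a given less binary
reports. It assumes the first line of "less --version" names the library in
parentheses, for example "less 530 (POSIX regular expressions)"; that exact
format and the helper names regex_flavor and installed_less_flavor are
assumptions, so treat this as a starting point only.

```python
import re
import subprocess


def regex_flavor(version_text: str):
    # Look only at the first line, where the library name is assumed to be.
    first_line = version_text.splitlines()[0] if version_text else ""
    match = re.search(r"\(([^)]*regular expressions[^)]*)\)", first_line)
    return match.group(1) if match else None


def installed_less_flavor():
    # Run the less found on PATH and parse whatever it prints.
    result = subprocess.run(
        ["less", "--version"], capture_output=True, text=True, check=False
    )
    return regex_flavor(result.stdout)


if __name__ == "__main__":
    print(installed_less_flavor())
```

If the function returns None, the version banner did not match the assumed
format and has to be read by eye.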

I think it would be nice to tweak the less package to be compiled with
pcre regex.

..wayne..




Re: manpage searches "^\s+keyword\s" vs. ???

2019-01-30 Thread Eric Blake
On 1/30/19 1:09 PM, Eric Blake wrote:

> \s is a non-standard regex extension - glibc provides it, Cygwin has not
> (at least, historically).  POSIX provides [[:space:]] as a portable
> alternative (although not all libc have implemented all of POSIX yet),
> but is annoyingly long to type.
> 
> Similarly, BSD regex (which is where Cygwin derives its regex from)
> supports the non-standard regex extension [[:<:]] as a word boundary,
> while glibc has the same feature but spelled \<.  I also seem to recall
> a patch in the past to teach Cygwin to respect \< by expanding it to
> [[:<:]] before calling into the BSD-derived code (although I couldn't
> actually find one in a quick search); a similar patch to expand \s into
> [[:space:]] would be a reasonable idea.

Found it:
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/regex/regcomp.c;h=180f599c#l425

and indeed, Cygwin fakes \< and \> but NOT \s or \b (for those, you'd
have to submit a patch to that spot in regcomp.c).
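
To make the idea concrete, here is a small Python sketch of the kind of
expansion described above: rewrite the shortcuts into their BSD/POSIX
spellings before the pattern reaches the regex compiler. This is not the code
in regcomp.c. The table and the names SHORTCUTS, expand_shortcuts and
_bracket_end are hypothetical, and only \< and \> are confirmed above as
already handled by Cygwin. \b is left out because it would need both
[[:<:]] and [[:>:]].

```python
# Hypothetical shortcut table: escape letter -> BSD/POSIX replacement.
SHORTCUTS = {
    "s": "[[:space:]]",
    "S": "[^[:space:]]",
    "<": "[[:<:]]",
    ">": "[[:>:]]",
}


def _bracket_end(pattern: str, start: int) -> int:
    # Return the index just past the bracket expression opening at start.
    i, n = start + 1, len(pattern)
    if i < n and pattern[i] == "^":
        i += 1
    if i < n and pattern[i] == "]":
        i += 1  # a leading ] is a literal member, not the terminator
    while i < n:
        if pattern[i] == "[" and i + 1 < n and pattern[i + 1] in ":.=":
            # Skip a nested [:class:], [.coll.] or [=equiv=] element.
            close = pattern.find(pattern[i + 1] + "]", i + 2)
            if close == -1:
                return n
            i = close + 2
        elif pattern[i] == "]":
            return i + 1
        else:
            i += 1
    return n


def expand_shortcuts(pattern: str) -> str:
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "[":
            # Backslash is literal inside brackets, so copy them verbatim.
            end = _bracket_end(pattern, i)
            out.append(pattern[i:end])
            i = end
        elif ch == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            # Expand known shortcuts; keep any other escape pair unchanged.
            out.append(SHORTCUTS.get(nxt, ch + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Bracket expressions are copied through untouched on purpose, so an existing
[[:space:]] in the pattern is not rewritten a second time.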

> 
>> I guess this is down to compilation options at this point.
> 
> Not so much compilation options of man and less, but rather the code
> used in Cygwin itself for handling regex.

Also a good read:

https://stackoverflow.com/questions/9792702/does-bash-support-word-boundary-regular-expressions

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org





Re: manpage searches "^\s+keyword\s" vs. ???

2019-01-30 Thread Corinna Vinschen
On Jan 30 13:09, Eric Blake wrote:
> On 1/30/19 12:40 PM, Andrey Repin wrote:
> 
> > 
> > I've been puzzled by this since… forever, it seems.
> > This is something in less, but all the `man less` says is "regular
> > expression library provided by your system".
> 
> \s is a non-standard regex extension - glibc provides it, Cygwin has not
> (at least, historically).  POSIX provides [[:space:]] as a portable
> alternative (although not all libc have implemented all of POSIX yet),
> but is annoyingly long to type.
> 
> Similarly, BSD regex (which is where Cygwin derives its regex from)
> supports the non-standard regex extension [[:<:]] as a word boundary,
> while glibc has the same feature but spelled \<.  I also seem to recall
> a patch in the past to teach Cygwin to respect \< by expanding it to
> [[:<:]] before calling into the BSD-derived code (although I couldn't
> actually find one in a quick search); a similar patch to expand \s into
> [[:space:]] would be a reasonable idea.
> 
> > I guess this is down to compilation options at this point.
> 
> Not so much compilation options of man and less, but rather the code
> used in Cygwin itself for handling regex.

FreeBSD code since we can't use glibc code for licensing reasons.

As usual: Patches welcome!  (Even a complete replacement wouldn't hurt
as long as licensing is no issue)


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer




Re: manpage searches "^\s+keyword\s" vs. ???

2019-01-30 Thread Eric Blake
On 1/30/19 12:40 PM, Andrey Repin wrote:

> 
> I've been puzzled by this since… forever, it seems.
> This is something in less, but all the `man less` says is "regular expression
> library provided by your system".

\s is a non-standard regex extension - glibc provides it, Cygwin has not
(at least, historically).  POSIX provides [[:space:]] as a portable
alternative (although not all libc have implemented all of POSIX yet),
but is annoyingly long to type.

Similarly, BSD regex (which is where Cygwin derives its regex from)
supports the non-standard regex extension [[:<:]] as a word boundary,
while glibc has the same feature but spelled \<.  I also seem to recall
a patch in the past to teach Cygwin to respect \< by expanding it to
[[:<:]] before calling into the BSD-derived code (although I couldn't
actually find one in a quick search); a similar patch to expand \s into
[[:space:]] would be a reasonable idea.

> I guess this is down to compilation options at this point.

Not so much compilation options of man and less, but rather the code
used in Cygwin itself for handling regex.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org





Re: manpage searches "^\s+keyword\s" vs. ???

2019-01-30 Thread Andrey Repin
Greetings, L A Walsh!

> I've always used "^\s+keyword\s" as a way to search for some
> keyword starting a section. 

Welcome to the club.

> On linux it still works, but on cygwin it doesn't like '\s' as
> symbol for white space.

> Any idea why there might be a difference?

> I note an option that could do similar in less -- '&pattern'
> turns OFF single special characters, I tried that on linux
> and it turned off the '\s' matching space.  That's nice..um
> how about other way?

> Well didn't know if there might be some other op to go the
> other way, but didn't see anything.

> any ideas?

I've been puzzled by this since… forever, it seems.
This is something in less, but all the `man less` says is "regular expression
library provided by your system".
I guess this is down to compilation options at this point.


-- 
With best regards,
Andrey Repin
Wednesday, January 30, 2019 21:36:27

Sorry for my terrible english...



manpage searches "^\s+keyword\s" vs. ???

2019-01-27 Thread L A Walsh
I've always used "^\s+keyword\s" as a way to search for some
keyword starting a section. 

On linux it still works, but on cygwin it doesn't like '\s' as
symbol for white space.

Any idea why there might be a difference?

I note an option that could do similar in less -- '&pattern'
turns OFF single special characters, I tried that on linux
and it turned off the '\s' matching space.  That's nice..um
how about other way?

Well didn't know if there might be some other op to go the
other way, but didn't see anything.

any ideas?

thanks...


