Here

https://github.com/pharo-project/pharo/wiki/Contribute-a-fix-to-Pharo

Sebastian

________________________________
From: Steffen Märcker <merk...@web.de>
Sent: Tuesday, June 29, 2021 6:13:00 PM
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Subject: [Pharo-users] Re: Fwd: [vwnc] Exception in Regex11 1.4.6

Dear all!

I just checked and found that Regex in Pharo is indeed based on Regex11 and 
suffers from the same bug as the original. I'd like to bring the fix to Pharo. 
As a first-timer, where can I read about the procedure for contributing code?

Kind regards,
Steffen

Steffen Märcker wrote on Thursday, 24 June 2021 18:52:16 (+02:00):

Hi!

Does Pharo use the Regex11 package? If yes, has it already diverged from the 
version shipped with VisualWorks?

The reason I am asking is that I just pushed an update to the public store. It 
addresses a bug that prevented $[ from being used in a character class. For 
details, see the excerpt below. Furthermore, perhaps you have an opinion on 
allowing more escape sequences in character classes?

Kind regards,
Steffen


----- Forwarded message -----
From: Steffen Märcker <merk...@web.de>
To: 'VWNC' <v...@cs.uiuc.edu>
Subject: Re: [vwnc] Exception in Regex11 1.4.6
Date: Thu Jun 24 2021 18:47:01 GMT+0200 (Central European Summer Time)


Hi!

I just published Regex11 version 1.4.7 with the following changes:

1. Fix: Character sets could not contain an opening bracket $[.
2. Fix: Character sets could not contain the character sequence '[:', e.g. as in 
'[[:something]' asRegex.

I also provided additional tests for the improved functionality. I might tidy 
the code a bit later in a minor version bump.


> Just to note that Regex11 uses [[:xxx:]] as a special syntax, which might 
> interfere with attempts to allow [[] and []].

Indeed. Unless I made a mistake, the new version does not break this.
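To make the two fixes concrete, here is a minimal sketch in Python (not the actual Smalltalk code of Regex11 or of Pharo's port) of how a character class could be scanned so that '[' is an ordinary member unless it opens a complete '[:name:]' class. The function name `parse_char_class` and the rule that a class name must consist of letters only are assumptions made for this illustration.

```python
from __future__ import annotations


def parse_char_class(pattern: str, start: int) -> tuple[list[str], int]:
    """Scan a character class that begins at pattern[start] == '['.

    Hypothetical sketch, not Regex11's implementation. Returns
    (members, end): members holds single characters or named classes
    written as '[:name:]', and end is the index just past the closing ']'.
    """
    if pattern[start] != "[":
        raise ValueError("character class must start with '['")
    i = start + 1
    members: list[str] = []
    while i < len(pattern):
        ch = pattern[i]
        if ch == "]":
            return members, i + 1
        if pattern.startswith("[:", i):
            close = pattern.find(":]", i + 2)
            name = pattern[i + 2:close] if close != -1 else ""
            # Assumption: a named class needs a run of letters followed by ':]'.
            if name.isalpha():
                members.append("[:" + name + ":]")
                i = close + 2
                continue
        # Otherwise '[' and ':' are just ordinary members of the set.
        members.append(ch)
        i += 1
    raise ValueError("unterminated character class")


print(parse_char_class("[[]", 0))           # (['['], 3)
print(parse_char_class("[[:alpha:]x]", 0))  # (['[:alpha:]', 'x'], 12)
```

Under this rule '[[]' yields the single member '[', '[[:something]' yields '[', ':' and the letters of 'something', and '[[:alpha:]x]' still yields the named class plus 'x', so the [[:xxx:]] syntax keeps working alongside the fix.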


> I agree with the idea of allowing backslash escaping in character classes too, 
> with the default being that a backslash followed by any character is parsed as 
> that character.

I also like the idea of allowing more backslash escaping in character classes. 
However, I still have a nagging feeling that this might change the semantics of 
existing code. Hence, I have refrained from implementing it right away until I 
am more confident that it does not break other people's code.
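The proposed default rule for backslashes inside a character class could look like the following Python sketch. It is illustrative only: the function name `expand_class_members` and the `KNOWN_ESCAPES` table are hypothetical placeholders, since the thread does not list which escapes Regex11 currently recognizes.

```python
from __future__ import annotations

# Hypothetical placeholder table; the real set of escapes Regex11
# recognizes inside character classes is not given in this thread.
KNOWN_ESCAPES = {"n": "\n", "t": "\t"}


def expand_class_members(body: str) -> list[str]:
    """Return the members of a character-class body (the text between
    '[' and ']') under the proposed rule: a known escape maps to its
    character, and a backslash followed by any other character stands
    for that character itself."""
    members: list[str] = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            escaped = body[i + 1]
            members.append(KNOWN_ESCAPES.get(escaped, escaped))
            i += 2
        else:
            # Ordinary member, including a trailing lone backslash.
            members.append(ch)
            i += 1
    return members


print(expand_class_members(r"a\]b"))  # ['a', ']', 'b']
print(expand_class_members(r"\[x"))   # ['[', 'x']
```

With this rule '\]' and '\[' become plain brackets, so users no longer have to remember which characters are safe inside a class. The flip side is the compatibility concern above: any existing pattern in which a backslash pair inside a class is currently read some other way would change meaning.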


> Currently only a few explicitly defined backslash escapes are recognized, 
> forcing the user to remember whether a given character can be used as-is in a 
> given context, or must be escaped.



> A couple of gotchas (probably not applicable in a character set?):
>
> \<    an empty string at the beginning of a word
> \>    an empty string at the end of a word

Thanks, I'll keep them in mind and consider them when I decide to implement the 
changes.

OT: I also noticed that repetition, e.g. '.{5}', behaves strangely. For 
instance, '.{{5}}' should match 'a{{{{{}' but it doesn't. Does anyone have an 
opinion on that one?
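For comparison only, Python's `re` module (a different engine, not Regex11) reads '.{{5}}' the way described above: any character, then a literal '{' repeated five times by the bound {5}, then a literal '}'. This is a data point for the discussion, not a statement of how Regex11 is specified.

```python
import re

# '.' = any character; the first '{' is not a valid bound, so it is a
# literal '{'; '{5}' repeats that literal five times; '}' is a literal.
pattern = re.compile(r".{{5}}")

print(pattern.fullmatch("a{{{{{}") is not None)  # True
print(pattern.fullmatch("a{{5}}") is not None)   # False
```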

Best regards, Steffen

