luigi scarso <[email protected]> wrote:
> On Mon, Jun 17, 2013 at 4:54 PM, Paul Isambert <[email protected]> wrote:
>
> > luigi scarso <[email protected]> wrote:
> > > On Mon, Jun 17, 2013 at 2:43 PM, Paul Isambert <[email protected]> wrote:
> > >
> > > > Hello all,
> > > >
> > > > This is not really a LuaTeX question, but I ask it here anyway since a
> > > > lot of knowledgeable people read this list.
> > > >
> > > > I’ve been surprised to discover that
> > > >
> > > > print(string.gsub('abc', '.*', '(%0)'))
> > > >
> > > > returns
> > > >
> > > > (abc)()
> > > >
> > > > (similarly, “string.gmatch('abc', '.*')” returns two matches). I’d
> > > > expect
> > > >
> > > > (abc)
> > >
> > > maybe this can help
> > >
> > > print(string.gsub("abc","%s*","(%0)"))
> > > ()a()b()c() 4
> > >
> > > print(string.gsub("abc","%S*","(%0)"))
> > > (abc)() 2
> > >
> > > """
> > > A pattern item can be
> > >
> > > a single character class followed by '*', which matches 0 or more
> > > repetitions of characters in the class. These repetition items will
> > > always match the longest possible sequence;
> > > """
> >
> > Thank you Luigi, but “*” has the same definition in other languages,
> > including those where there is no match on a final empty string.
> >
> > As for your first example, all languages behave the same as far as I
> > can tell, as expected.
> >
> > Best,
> > Paul
>
> $ perl -e '$x="abc"; @w=($x=~ /(.*)/g); print "tot. matches:", scalar(@w), " matches:($w[0])($w[1])\n"'
> tot. matches:2 matches:(abc)()
>
> $ perl -e '$x="abc"; @w=($x=~ /(.*)/); print "tot. matches:", scalar(@w), " matches:($w[0])($w[1])\n"'
> tot. matches:1 matches:(abc)()
>
> in perl
> "the modifier //g stands for global matching and allows the matching
> operator to match within a string as many times as possible"
> and I think it corresponds to
> "These repetition items will always match the longest possible sequence;"
> of pattern.
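The quoted Lua outputs can be reproduced with a minimal Python sketch of a left-to-right global-substitution loop. This is an assumption about how such a loop is typically written, not Lua's actual source: `gsub_like` is a hypothetical name, Python's `re` stands in for Lua patterns, and `\s`/`\S` stand in for `%s`/`%S`. The key point it shows is that the end of the string is itself a valid starting position, and `*` accepts zero characters there.

```python
import re
from typing import Tuple


def gsub_like(pattern: str, text: str) -> Tuple[str, int]:
    """Wrap every match of `pattern` in parentheses, like gsub(text, pattern, "(%0)").

    Hypothetical helper: a sketch of a gsub-style scanning loop, using
    Python's `re` in place of Lua patterns.
    """
    rx = re.compile(pattern)
    out = []
    count = 0
    pos = 0
    while True:
        # Try to match starting exactly at `pos` (the end of the string included).
        m = rx.match(text, pos)
        if m is not None:
            count += 1
            out.append("(" + m.group(0) + ")")
        if m is not None and m.end() > pos:
            # Non-empty match: resume right after it.
            pos = m.end()
        elif pos < len(text):
            # No match, or an empty one: copy one character and move on.
            out.append(text[pos])
            pos += 1
        else:
            # Empty or failed match at the very end of the string: stop.
            break
    return "".join(out), count


print(gsub_like(r".*", "abc"))   # ('(abc)()', 2)
print(gsub_like(r"\s*", "abc"))  # ('()a()b()c()', 4)
print(gsub_like(r"\S*", "abc"))  # ('(abc)()', 2)
```

The trailing `()` comes from the last iteration: once `abc` has been consumed, `pos` equals `len(text)`, `rx.match` still succeeds there with an empty match, and only after recording it does the loop stop.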
Thanks again, Luigi, but that still doesn’t explain the problem away. Actually, I don’t think “g” has anything to do with matching the longest possible sequence: it simply means matching as many times as possible instead of only once. In any case, a similar “g” flag was included in my Vim and sed commands, and Python’s “re.sub()” replaces every occurrence by default, like Lua’s “string.gsub()”.

As far as I can tell, all my code snippets were equivalent, meaning “replace X with Y as many times as possible”. So the real question is: why do some languages consider that there is “one more time” (a match on the empty string) once the input string has apparently been completely consumed?

Best,
Paul
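One convention that yields `(abc)` rather than `(abc)()` is to discard an empty match that starts exactly where the previous match ended. The sketch below applies that rule as a filter over `re.finditer`; `matches_skipping_adjacent_empty` is a hypothetical name, and the sketch assumes Python 3.7 or later, whose `re.finditer` reports such adjacent empty matches. (Python's own `re.sub` skipped them before 3.7; since 3.7 it replaces them too, so it now behaves like the Lua examples above.)

```python
import re
from typing import List, Optional


def matches_skipping_adjacent_empty(pattern: str, text: str) -> List[str]:
    """Return the matches of `pattern` in `text`, dropping any empty match
    that starts exactly where the previously kept match ended.

    Hypothetical helper; assumes Python 3.7+ `re.finditer` semantics.
    """
    kept: List[str] = []
    last_end: Optional[int] = None
    for m in re.finditer(pattern, text):
        # Empty match glued to the end of the previous match: skip it.
        if m.start() == m.end() == last_end:
            continue
        kept.append(m.group(0))
        last_end = m.end()
    return kept


print([m.group(0) for m in re.finditer(r".*", "abc")])  # ['abc', '']
print(matches_skipping_adjacent_empty(r".*", "abc"))     # ['abc']
print(matches_skipping_adjacent_empty(r"\s*", "abc"))    # ['', '', '', '']
```

Both conventions use the same greedy definition of `*`; they differ only in whether the position where a previous match ended may also serve as the start of an empty match.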
