On 7/10/19 1:21 PM, astian wrote:

> Bash Version: 5.0
> Patch Level: 3
> Release Status: release
>
> Description:
>
> I discovered a curious performance degradation in the combined usage of the
> constructs "eval set -- ..." and new-style command substitution.  In short,
> setting the positional arguments via eval and then iterating over each one
> while performing $() command substitution(s) is significantly slower than
> not using eval, or not making command substitution, or using `` instead.
>
> I include below a reduced test script that illustrates the issue.  A few
> notes:
>   - The pathological case is "1 1 0".
>   - I did not observe performance difference in unoptimised builds (-O0).
>
> --------------------------
> case 1 1 0
> eval set
> real 0m0.002s
> user 0m0.000s
> sys 0m0.000s
> for loop cmdsubst-currency
> real 0m0.968s
> user 0m0.432s
> sys 0m0.148s
> --------------------------
>
> Observations:
>   - The pathological case "1 1 0" spends about 10 times more time doing
>     something in userspace during the loop, relative to the comparable cases
>     "0 1 0", "0 1 1", and "1 1 1".
>   - $() seems generally slightly slower than ``, but becomes pathologically
>     so when preceded with "eval set -- ...".

It is slightly slower -- POSIX requires that the shell parse the contents of
$(...) to determine that it's a valid script as part of finding the closing
`)'. The rules for finding the closing "`" don't have that requirement.

>   - "eval set -- ..." itself doesn't seem slow at all, but obviously it has
>     side-effects not captured by the "time" measurement tool.

What happens is you end up with a 4900-character command string that you
have to parse multiple times. But that's not the worst of it.

The gprof output provides a clue.

> case 1 1 0 (pathological):
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  38.89      0.21     0.21    28890     7.27     7.27  set_line_mbstate

set_line_mbstate() runs through each command line before parsing, creating
a bitmap that indicates whether each element is a single-byte character or
part of a multi-byte character. The scanner uses this to determine whether a
shell metacharacter should act as a delimiter or get skipped over as part of
a multibyte character. For a single run with args `1 1 0', it gets called
around 7300 times, with around 2400 of them for the 4900-character string
with all the arguments.

When you're in a multibyte locale (en_US.UTF-8 is one such), each one of
those characters requires a call to mbrlen/mbrtowc. So that ends up being
2400 * 4900 calls to mbrlen.
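To put a number on that last product, here is a rough back-of-the-envelope sketch in shell. It is only a cost model, not anything taken from the bash sources: the function name estimate_mbrlen_calls and its two parameters are hypothetical, and it assumes every pass scans the whole string with one mbrlen call per character.

```shell
#!/bin/sh
# Rough cost model for the repeated line scans described above.
# estimate_mbrlen_calls is a hypothetical helper, not part of bash.
#
# Usage: estimate_mbrlen_calls PASSES LENGTH
#   PASSES  number of times the line is rescanned
#   LENGTH  number of characters in the line
# Prints PASSES * LENGTH, assuming one mbrlen call per character per pass.
estimate_mbrlen_calls() {
    passes=$1
    length=$2
    printf '%d\n' "$(( passes * length ))"
}

# Figures quoted in this message: about 2400 scans of a 4900-character string.
estimate_mbrlen_calls 2400 4900   # prints 11760000
```

Under those assumptions the long argument string alone accounts for nearly 12 million mbrlen calls in a single run, which fits set_line_mbstate sitting at the top of the profile.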
There is something happening here -- there's no way there should be that
many calls to set_line_mbstate(), even when you have to save and restore
the input line because you have to parse the contents of $(). There must
be some combination of the effect of `eval' on the line bitmap and the long
string. I'll see what I can figure out.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/