Re: Request to a new feature on read
Any reason to justify this instead of using a simple loop? -- Eduardo Bustamante https://dualbus.me/
Re: Request to a new feature on read
While I was developing a small script, I thought about how to use the -N flag to a greater extent. Although -N on its own is very limited. It does serve a purpose, but not what I need. I also discussed this in #bash on freenode, and got some ideas like:

  pgas: while read -n1 d; do case $d in '') break;; [0-9]) var+=$d;; *) echo error;; esac; done

A for loop is probably not such a bad idea either. I'll try and see if I can figure out something. Thanks

On Thu, Apr 16, 2015 at 3:55 PM, Eduardo A. Bustamante López dual...@gmail.com wrote:
> Any reason to justify this instead of using a simple loop?

-- Met vriendelijke groet, Valentin Bajrami
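A runnable version of pgas's suggestion might look like the sketch below (`read_digits` is a name chosen here, not anything from the thread). `IFS=` keeps `read -n1` from stripping whitespace characters, and the empty-string case catches the newline delimiter:

```shell
# Read one character at a time, accept digits, stop at newline
# (read -n1 yields an empty string for the newline), reject
# anything else.
read_digits() {
  local d var=
  while IFS= read -rn1 d; do
    case $d in
      '')    break ;;                          # newline: input complete
      [0-9]) var+=$d ;;
      *)     echo "error: '$d'" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$var"
}

printf '123\n' | read_digits   # prints 123
```

Reading character-by-character is slow for bulk input, but for short interactive validation like this it is the idiomatic pure-bash approach.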
Re: Request to a new feature on read
On Thu, Apr 16, 2015 at 09:39:08AM -0500, Dan Douglas wrote:
> On Thu, Apr 16, 2015 at 9:32 AM, Greg Wooledge wool...@eeg.ccf.org wrote:
>> On Thu, Apr 16, 2015 at 09:29:56AM -0500, Dan Douglas wrote:
>>> I find myself in need of something along the lines of Python's `re.split` and `re.findall` all the time. E.g. splitting an ip into an array of octets.
>>
>> IFS=. read -ra octets <<< "$ip"
>
> Sure, but validation is then separate if needed. There are plenty of applications where you want either a multi-character or non-static delimiter, possibly with pattern matching on the data at the same time.

I don't see why such features should be compiled into bash's read builtin. I'd have no problem with adding better splitting/joining/parsing features in a more general context, probably operating on a string variable, but certainly not operating on a file descriptor.

Doesn't the underlying C library only guarantee you a single character of lookahead when reading? (Or maybe a single byte. I'm way out of date. My knowledge of C comes from the days when char = byte.) You can't do all this fancy perl-RE-style lookahead stuff on a stream with only a single byte/char of lookahead.
Re: Request to a new feature on read
On Thu, Apr 16, 2015 at 9:50 AM, Greg Wooledge wool...@eeg.ccf.org wrote:
> I don't see why such features should be compiled into bash's read builtin. I'd have no problem with adding better splitting/joining/parsing features in a more general context, probably operating on a string variable, but certainly not operating on a file descriptor.

I don't think they should be part of `read` either. Some way to extend the BASH_REMATCH mechanism would be better.

> Doesn't the underlying C library only guarantee you a single character of lookahead when reading? (Or maybe a single byte. I'm way out of date. My knowledge of C comes from the days when char = byte.) You can't do all this fancy perl-RE-style lookahead stuff on a stream with only a single byte/char of lookahead.

Hm, maybe you're referring to ungetc? IIRC one byte is the only guarantee when dealing with pipes. I don't really care about having it pattern match while reading a stream. To make that work well would probably involve mmap (and even then, only on regular files). Probably the most portable way to support fancier regex is to call into std::regex. Any system with a modern C++ compiler should support ECMAScript regex, which is close to a superset of ERE.
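The "extend the BASH_REMATCH mechanism" idea can already be approximated on a string variable with today's `[[ =~ ]]`: match repeatedly and consume past each match. A minimal sketch (`findall` is a hypothetical helper name; requires bash 4.3+ for `local -n`, and the regex must never match the empty string or the loop would not advance):

```shell
# Emulate Python's re.findall with ERE: repeatedly match, collect
# the whole match, then strip the string through that match.
findall() {  # findall ERE STRING ARRAYNAME
  local re=$1 s=$2
  local -n _out=$3
  _out=()
  while [[ $s =~ $re ]]; do
    _out+=("${BASH_REMATCH[0]}")
    s=${s#*"${BASH_REMATCH[0]}"}   # consume up through this match
  done
}

findall '[0-9]+' 'a12b345c6' nums
printf '%s\n' "${nums[@]}"   # 12, 345, 6 on separate lines
```

Since `[[ =~ ]]` finds the leftmost match, stripping the shortest prefix containing the matched text consumes exactly through the match, so each iteration resumes where the previous one ended.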
Re: [Help-bash] make function local
On 4/12/15 5:56 PM, Eduardo A. Bustamante López wrote:
> Oh, you already have lots of things to do to bother with this :-) Anyways, I'll expand them.
> On Fri, Apr 10, 2015 at 04:35:25PM -0400, Chet Ramey wrote:
>> On 4/10/15 10:13 AM, Eduardo A. Bustamante López wrote:
>>> - a faster implementation of the variable lookup code
>> What does this mean, exactly? Optimizing the existing code paths? (Have at it.) Different semantics? Static as opposed to dynamic scoping?
> Yes. I've been using gprof to study the code paths of some basic functions, and it seems like it spends quite some time in find_variable() and related functions (IIRC, there was an mt_hash or something function taking up some precious time).

I knew that rang a bell somewhere. mt_hash is a function in the bash malloc library that keeps track of all allocations and deallocations in a table. It's part of the debugging that is enabled when you build from the devel code. It's been well known for a long time that the debugging code in malloc slows bash down considerably; that's why it's not enabled as part of bash releases.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: [Help-bash] make function local
On 4/12/15 5:56 PM, Eduardo A. Bustamante López wrote:
> Oh, you already have lots of things to do to bother with this :-) Anyways, I'll expand them.
> On Fri, Apr 10, 2015 at 04:35:25PM -0400, Chet Ramey wrote:
>> On 4/10/15 10:13 AM, Eduardo A. Bustamante López wrote:
>>> - a faster implementation of the variable lookup code
>> What does this mean, exactly? Optimizing the existing code paths? (Have at it.) Different semantics? Static as opposed to dynamic scoping?
> Yes. I've been using gprof to study the code paths of some basic functions, and it seems like it spends quite some time in find_variable() and related functions (IIRC, there was an mt_hash or something function taking up some precious time). I'm not sure if it might be better to have another kind of data structure for this. TBH, I'm not sure if there's even enough justification for this, other than making bash startup faster.
>
>>> - a shopt to disable evaluation of shell code in places like arithmetic expansion
>> Remember this thread? http://lists.gnu.org/archive/html/bug-bash/2014-12/msg00158.html
> Sure, of course. Here's how I summarized the concern: assignment statements in arithmetic expressions that contain array references are also word expanded, almost as if they were executed in an assignment statement context. At one point, this was brought up:
>
>   dualbus@hp ~/t % bash -c 'var=a[\$(ls)]; a=(); a[var]=x; declare -p a'
>   bash: bar baz foo: syntax error in expression (error token is "baz foo")
>
> I understand the reasons behind it. This time I don't want to debate that :-) But wouldn't it be nice to have an `arith_expand' or something shopt that, when turned off, made this happen:

OK, but you're going to have to specify it more tightly than that. The first question is how bash treats tokens that look like identifiers in arithmetic expression contexts: do you treat them as variables whose values may specify expressions, or do you treat them as variables whose values must be integer constants?

Then you have to specify which word expansions you'd like expressions to undergo, and which word expansions you'd like array subscripts to undergo in case they're different, and in which contexts you'd like that to happen. The answer to the first question should determine whether and why a[var]=x and a[a[\$(ls)]]=x from your example should behave differently.

Or is it some middle ground you want: that identifiers are expanded and the expanded values are treated as expressions, but those expressions don't undergo any word expansions. That still leaves the question of what to do about array subscripts in these expressions.

That should be enough to get a discussion started.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
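The recursion Chet describes is easy to demonstrate with harmless values: an identifier inside an arithmetic context is not required to hold an integer constant; its string value is itself parsed as an expression, and the same happens to indexed-array subscripts (which is exactly what makes `var='a[$(ls)]'` dangerous):

```shell
# The string value of x is itself evaluated as an expression,
# not treated as an integer constant.
x='1 + 1'
echo $(( x * 2 ))   # prints 4, i.e. (1 + 1) * 2

# Indexed-array subscripts go through the same arithmetic
# evaluation, so the value of i is evaluated as an expression too.
i='1 + 2'
a=()
a[i]=x              # subscript 'i' -> '1 + 2' -> 3
declare -p a        # declare -a a=([3]="x")
```

With a command substitution in the value instead of `1 + 2`, that substitution runs during subscript evaluation, which is the behavior the proposed shopt would disable.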
Bash performance when declaring variables (was: Re: [Help-bash] make function local)
On Thu, Apr 16, 2015 at 11:07:34AM -0400, Chet Ramey wrote: [...]
> I knew that rang a bell somewhere. mt_hash is a function in the bash malloc library that keeps track of all allocations and deallocations in a table. It's part of the debugging that is enabled when you build from the devel code. It's been well known for a long time that the debugging code in malloc slows bash down considerably; that's why it's not enabled as part of bash releases.

Actually, this is the post that motivated me to look into this (yes, the conclusion is idiotic, but I guess the rest of the post is pretty okay): http://spencertipping.com/posts/2013.0814.bash-is-irrecoverably-broken.html

Now, there is some truth to what he says:

  dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++ < 1000)); do declare a$RANDOM$RANDOM=1; done'
  0.01s user 0.06s system 93% cpu 0.077 total
  dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++ < 10000)); do declare a$RANDOM$RANDOM=1; done'
  0.16s user 0.48s system 98% cpu 0.643 total
  dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++ < 100000)); do declare a$RANDOM$RANDOM=1; done'
  15.44s user 6.51s system 99% cpu 21.959 total

I built bash like this, to make sure the malloc debugging code doesn't interfere:

  CFLAGS='-pg -g -O0' ./configure --silent
  make -sj4 DEBUG= MALLOC_DEBUG=

I got a gprof profile with that last run, which gave:

  Each sample counts as 0.01 seconds.
    %   cumulative   self               self    total
   time   seconds   seconds    calls   s/call   s/call  name
  71.42     12.07     12.07  1100104     0.00     0.00  hash_search
  21.18     15.65      3.58   275435     0.00     0.00  morecore
   1.63     15.93      0.28  6800525     0.00     0.00  internal_malloc
   0.71     16.05      0.12  6200116     0.00     0.00  internal_free
   0.59     16.15      0.10       31     0.00     0.00  expand_word_internal
   0.24     16.19      0.04  6800474     0.00     0.00  sh_xmalloc
   0.18     16.22      0.03  7203779     0.00     0.00  is_basic
   0.18     16.25      0.03  1932530     0.00     0.00  is_basic
   0.18     16.28      0.03       22     0.00     0.00  subexpr
   0.18     16.31      0.03       18     0.00     0.00  find_special_var
   0.15     16.33      0.03                             pagealign

Notice how it spends most of the time in the top two functions. Yeah, it's not mt_* like I said, because I did this a while ago and forgot to take notes.

Does this matter much? I don't know. Having 100,000 variables declared does seem like something stupid. Still, it shouldn't show that quadratic increase in running time (I didn't even try 1,000,000 because it was very slow), because it is a hash table.

-- Eduardo Bustamante https://dualbus.me/
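The measurement above can be reproduced with a small driver; the counts and the `$RANDOM$RANDOM` naming trick (which only approximates unique names, as in the original test) follow the transcript, and `TIMEFORMAT` is used so the output works in bash rather than the zsh prompt shown above:

```shell
# Time variable creation at increasing counts to expose the
# super-linear growth; reduce the counts if this is too slow.
for n in 1000 10000 100000; do
  TIMEFORMAT="n=$n: %R s"
  time bash -c 'i=0; while (( i++ < '"$n"' )); do
                  declare "a$RANDOM$RANDOM=1"
                done'
done
```

Substitute `./bash` built with profiling flags for `bash` to regenerate the gprof data.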
Re: Request to a new feature on read
On Thu, Apr 16, 2015 at 9:32 AM, Greg Wooledge wool...@eeg.ccf.org wrote:
> On Thu, Apr 16, 2015 at 09:29:56AM -0500, Dan Douglas wrote:
>> I find myself in need of something along the lines of Python's `re.split` and `re.findall` all the time. E.g. splitting an ip into an array of octets.
>
> IFS=. read -ra octets <<< "$ip"

Sure, but validation is then separate if needed. There are plenty of applications where you want either a multi-character or non-static delimiter, possibly with pattern matching on the data at the same time.
Re: Request to a new feature on read
On Thu, Apr 16, 2015 at 09:29:56AM -0500, Dan Douglas wrote:
> I find myself in need of something along the lines of Python's `re.split` and `re.findall` all the time. E.g. splitting an ip into an array of octets.

IFS=. read -ra octets <<< "$ip"
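Spelled out, the split-then-validate approach Dan is comparing against looks like this (the validation step is the "separate if needed" part from his follow-up; the bounds check on each octet is an addition here, not from the thread):

```shell
# Split an IPv4 address into octets with IFS + read, then validate
# separately: exactly four fields, each numeric and in 0..255.
ip=192.168.0.1
IFS=. read -ra octets <<< "$ip"

valid=1
(( ${#octets[@]} == 4 )) || valid=0
for o in "${octets[@]}"; do
  [[ $o =~ ^[0-9]+$ ]] && (( 10#$o <= 255 )) || { valid=0; break; }
done
echo "valid=$valid octets=${octets[*]}"
```

The `10#$o` forces base-10 interpretation so octets with leading zeros are not parsed as octal inside the arithmetic comparison.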
Request to a new feature on read
Hi,

According to ''help read'' we can specify -N [nchars] to trigger a return automatically. Is it possible to approach read differently? For example:

  # $re is some regular expression
  read -N "$re" -p "Enter two or three digits to continue" getInput

The above is very much pseudo-code, but I hope you get the idea. -N in this case should be able to handle a range of 2 or 3 chars. If the regex is satisfied, then the return should be triggered after 2 chars; otherwise, wait for the third char.

Thanks in advance!

-- Met vriendelijke groet / Kind regards, Valentin
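The requested behavior can be sketched today with a character-at-a-time loop: accumulate input and return as soon as the buffer matches an anchored ERE. `read_until_match` is a hypothetical helper name, not a bash builtin:

```shell
# Read one character at a time and stop as soon as the accumulated
# input matches the (anchored) regular expression.
read_until_match() {  # read_until_match ERE VARNAME
  local re=$1 buf= c
  while IFS= read -rn1 c; do
    buf+=$c
    [[ $buf =~ $re ]] && break
  done
  printf -v "$2" '%s' "$buf"
}

read_until_match '^[0-9]{2,3}$' input <<< 12345
echo "got: $input"   # got: 12
```

With `^[0-9]{2,3}$` the match succeeds as soon as two digits are in the buffer, so it returns after two characters; a regex that cannot be satisfied by two characters would keep the loop waiting for the third.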
Re: Request to a new feature on read
On Thu, Apr 16, 2015 at 8:55 AM, Eduardo A. Bustamante López dual...@gmail.com wrote:
> Any reason to justify this instead of using a simple loop?

I find myself in need of something along the lines of Python's `re.split` and `re.findall` all the time. E.g. splitting an ip into an array of octets.

On Thu, Apr 16, 2015 at 5:49 AM, Valentin Bajrami valentin.bajr...@gmail.com wrote:
> Hi, According to ''help read'' we can specify -N[chars] to trigger return automatically. Is it possible to approach read differently? For example: $re is some regular expression

FWIW, ksh has two redirect operators that can be used together with `read` to get something like this. They're somewhat difficult to use IMO:

  <#pattern   Seeks forward to the beginning of the next line containing pattern.
  <##pattern  The same as <# except that the portion of the file that is skipped is copied to standard output.

-- Dan Douglas
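Those operators are ksh93-specific, but for pipe input a rough bash approximation of the seek-to-pattern behavior is a line loop that discards until a glob match and then hands off the rest of the stream (`skip_to` is a name invented here):

```shell
# Rough bash stand-in for ksh93's <#pattern on a stream: discard
# lines until one matches the glob, print it, then pass the rest
# of the input through untouched.
skip_to() {  # skip_to GLOB
  local pat=$1 line
  while IFS= read -r line; do
    [[ $line == $pat ]] && { printf '%s\n' "$line"; break; }
  done
  cat  # remainder of the stream
}

printf 'a\nb\nstart here\nrest\n' | skip_to 'start*'
# start here
# rest
```

Unlike the ksh operator this cannot seek backwards or reposition a regular file's offset; it only works forward on a stream, which is usually what pipelines need anyway.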
Re: [Help-bash] make function local
On 4/16/15 11:43 AM, Dan Douglas wrote:
> I thought Bash always first splits the identifier from the subscript, then checks which attributes the variable has set. If it has the associative array attribute plus a subscript then the subscript is only processed for expansions and the resulting string is used as the key. If the associative array attribute is not set then the subscript is processed for expansions and the resulting string is passed on to arithmetic evaluation. Am I following the discussion correctly? i.e. if you have `a[b[text]]`, the treatment of `text` is entirely determined by b's attributes.

Yes, that's correct. In the case I'm talking about, we're only concerned with indexed arrays and the consequent arithmetic evaluation.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
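The attribute-dependent treatment Dan describes is easy to see side by side: the same subscript text is a literal string key for an associative array, but goes through arithmetic evaluation for an indexed array:

```shell
# Identical subscript 'i', different attributes, different treatment.
declare -A assoc
declare -a indexed
i=5
assoc[i]=A      # key is the literal string 'i'
indexed[i]=B    # subscript is evaluated arithmetically: i -> 5
echo "${!assoc[@]} ${!indexed[@]}"   # i 5
```

This is why the thread's concern is confined to indexed arrays: only their subscripts reach the arithmetic evaluator at all.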
Re: [Help-bash] make function local
Pierre Gaston wrote:
> Is there a particular problem you're trying to solve for which local functions would be the appropriate solution?

Cleanliness. Not polluting the global namespace. Ensuring the function can't be called from outside a function. It's a trite example, but I do something like:

  sub gvim () {
      array orig_args=($@)
      gv_files=() gv_ops=()
      int use_tab=0 look_for_ops=1

      sub _exec_gvim() {
          array args
          ((use_tab)) && args=(-p)
          (( ${#gv_ops[@]:-0} )) && args+=(${gv_ops[@]})
          (( $# )) && args+=($@)
          command gvim ${args[@]}
          unset -f _exec_gvim
      }

AFAIK, _exec_gvim can only be called from within function gvim, no?
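The `sub`/`array`/`int` keywords above are not stock bash. In plain bash the same pattern looks like the sketch below; note the caveat that bash has no truly local functions, so the helper *is* globally visible while the outer function runs and is only scrubbed afterwards by `unset -f`. The actual `command gvim` call is replaced by a `printf` stand-in so the sketch runs anywhere:

```shell
# Define the helper inside the outer function, use it, then remove it
# so it does not linger in the global namespace.
gvim_wrapper() {
  _exec_gvim() {
    local -a args=()
    (( use_tab ))       && args+=(-p)
    (( ${#gv_ops[@]} )) && args+=("${gv_ops[@]}")
    (( $# ))            && args+=("$@")
    printf 'would run: gvim %s\n' "${args[*]}"  # stand-in for: command gvim "${args[@]}"
  }
  local use_tab=1
  local -a gv_ops=()
  _exec_gvim "$@"
  unset -f _exec_gvim   # scrub the helper from the namespace
}

gvim_wrapper file.txt   # would run: gvim -p file.txt
```

So the answer to "can only be called from within gvim, no?" is strictly no: anything that runs between the definition and the `unset -f` (including subshells and traps) can call the helper; the pattern limits pollution but does not enforce scoping.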