Re: Request to a new feature on read

2015-04-16 Thread Eduardo A. Bustamante López
Any reason to justify this instead of using a simple loop?

-- 
Eduardo Bustamante
https://dualbus.me/



Re: Request to a new feature on read

2015-04-16 Thread Valentin Bajrami
While I was developing a small script, I thought about how to use the -N flag
to a greater extent. Although -N on its own is very limited, it does serve a
purpose, just not the one I need. I also discussed this in #bash on freenode
and got some ideas, like:

pgas:   while read -n1 d; do case $d in '') break;; [0-9]) var+=$d;; *) echo error;; esac; done

A for loop is probably not such a bad idea either. I'll try and see if I
can figure out something.
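For the record, a rough sketch of that loop approach (the three-character limit and the variable names are purely illustrative):

  var=
  for ((i = 0; i < 3; i++)); do     # read at most three characters
    read -rn1 d || break
    case $d in
      '')    break ;;               # Enter pressed before the limit
      [0-9]) var+=$d ;;
      *)     echo 'error' >&2; break ;;
    esac
  done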

Thanks

On Thu, Apr 16, 2015 at 3:55 PM, Eduardo A. Bustamante López <dual...@gmail.com> wrote:

 Any reason to justify this instead of using a simple loop?

 --
 Eduardo Bustamante
 https://dualbus.me/




-- 
Met vriendelijke groet,

Valentin Bajrami


Re: Request to a new feature on read

2015-04-16 Thread Greg Wooledge
On Thu, Apr 16, 2015 at 09:39:08AM -0500, Dan Douglas wrote:
 On Thu, Apr 16, 2015 at 9:32 AM, Greg Wooledge <wool...@eeg.ccf.org> wrote:
  On Thu, Apr 16, 2015 at 09:29:56AM -0500, Dan Douglas wrote:
  I find myself in need of something along the lines of Python's
  `re.split` and `re.findall` all the time. E.g. splitting an ip into an
  array of octets.
 
  IFS=. read -ra octets <<< "$ip"
 
 Sure, but validation is then separate if needed. There are plenty of
 applications where you want either a multi-character or non-static
 delimiter, possibly with pattern matching on the data at the same
 time.

I don't see why such features should be compiled into bash's read builtin.
I'd have no problem with adding better splitting/joining/parsing features
in a more general context, probably operating on a string variable, but
certainly not operating on a file descriptor.

Doesn't the underlying C library only guarantee you a single character of
lookahead when reading?  (Or maybe a single byte.  I'm way out of date.
My knowledge of C comes from the days when char = byte.)  You can't do
all this fancy perl-RE-style lookahead stuff on a stream with only a
single byte/char of lookahead.



Re: Request to a new feature on read

2015-04-16 Thread Dan Douglas
On Thu, Apr 16, 2015 at 9:50 AM, Greg Wooledge <wool...@eeg.ccf.org> wrote:
 I don't see why such features should be compiled into bash's read builtin.
 I'd have no problem with adding better splitting/joining/parsing features
 in a more general context, probably operating on a string variable, but
 certainly not operating on a file descriptor.

I don't think they should be part of `read` either. Some way to extend
the BASH_REMATCH mechanism would be better.
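For comparison, here is a minimal sketch of what the existing BASH_REMATCH mechanism already covers for the octet case, splitting and validating in one step (the pattern is only illustrative, and $ip is assumed to hold the address):

  re='^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$'
  if [[ $ip =~ $re ]]; then
      octets=("${BASH_REMATCH[@]:1}")    # captured groups 1-4
  else
      echo "not a dotted quad: $ip" >&2
  fi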

 Doesn't the underlying C library only guarantee you a single character of
 lookahead when reading?  (Or maybe a single byte.  I'm way out of date.
 My knowledge of C comes from the days when char = byte.)  You can't do
 all this fancy perl-RE-style lookahead stuff on a stream with only a
 single byte/char of lookahead.

Hm, maybe you're referring to ungetc? IIRC one byte is the only
guarantee when dealing with pipes. I don't really care about having it
pattern match while reading a stream. To make that work well would
probably involve mmap (and even then, only on regular files).

Probably the most portable way to support fancier regex is to call
into std::regex. Any system with a modern C++ compiler should support
ECMAScript regex, which is close to a superset of ERE.



Re: [Help-bash] make function local

2015-04-16 Thread Chet Ramey
On 4/12/15 5:56 PM, Eduardo A. Bustamante López wrote:
 Oh, you already have lots of things to do to bother with this :-)
 
 Anyways, I'll expand them.
 
 On Fri, Apr 10, 2015 at 04:35:25PM -0400, Chet Ramey wrote:
 On 4/10/15 10:13 AM, Eduardo A. Bustamante López wrote:

 - a faster implementation of the variable lookup code

 What does this mean, exactly?  Optimizing the existing code paths? (Have at
 it.)  Different semantics?  Static as opposed to dynamic scoping?
 
 Yes. I've been using gprof to study the code paths of some basic functions,
 and it seems like it spends quite some time in the find_variable() and
 related functions (IIRC, there was an mt_hash or something function taking
 up some precious time).

I knew that rang a bell somewhere.  mt_hash is a function in the bash
malloc library that keeps track of all allocations and deallocations in
a table.  It's part of the debugging that is enabled when you build from
the devel code.  It's been well-known for a long time that the debugging
code in malloc slows bash down considerably; that's why it's not enabled
as part of bash releases.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: [Help-bash] make function local

2015-04-16 Thread Chet Ramey
On 4/12/15 5:56 PM, Eduardo A. Bustamante López wrote:
 Oh, you already have lots of things to do to bother with this :-)
 
 Anyways, I'll expand them.
 
 On Fri, Apr 10, 2015 at 04:35:25PM -0400, Chet Ramey wrote:
 On 4/10/15 10:13 AM, Eduardo A. Bustamante López wrote:

 - a faster implementation of the variable lookup code

 What does this mean, exactly?  Optimizing the existing code paths? (Have at
 it.)  Different semantics?  Static as opposed to dynamic scoping?
 
 Yes. I've been using gprof to study the code paths of some basic functions,
 and it seems like it spends quite some time in the find_variable() and
 related functions (IIRC, there was an mt_hash or something function taking
 up some precious time). I'm not sure whether another kind of data structure
 would be better for this. TBH, I'm not sure there's even enough
 justification for this, other than making bash start up faster.
 
 - a shopt to disable evaluation of shell code in places like arithmetic
 expansion
 
 Remember this thread?
 http://lists.gnu.org/archive/html/bug-bash/2014-12/msg00158.html

Sure, of course.  Here's how I summarized the concern:

assignment statements in arithmetic expressions
that contain array references are also word expanded, almost as if they
were executed in an assignment statement context



 
 At one point, this was brought up:
 
 dualbus@hp ~/t % bash -c 'var=a[\$(ls)]; a=(); a[var]=x; declare -p a' 
 bash: bar baz foo: syntax error in expression (error token is baz foo)
 
 I understand the reasons behind it. This time I don't want to debate that :-)
 But wouldn't it be nice to have an `arith_expand' (or similar) shopt such
 that, when it is turned off, this happened:

OK, but you're going to have to specify it more tightly than that.  The
first question is how bash treats tokens that look like identifiers in
arithmetic expression contexts: do you treat them as variables that may
specify expressions, or do you treat them as variables whose values must
be integer constants?  Then you have to specify which word expansions
you'd like expressions to undergo, which word expansions you'd like
array subscripts to undergo in case they're different, and in which
contexts you'd like that to happen.

The answer to the first question should determine whether and why

a[var]=x

and

a[a[\$(ls)]]=x

from your example should behave differently.
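To make the current behaviour concrete, here is a small illustration (the values are made up; the second case reproduces the expansion from the example above):

  var='1+1'
  a=(); a[var]=x; declare -p a    # declare -a a=([2]="x") -- var's value is evaluated as an expression
  var='a[$(ls)]'
  a=(); a[var]=x                  # the recursive evaluation expands the inner subscript and runs ls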

Or is it some middle ground you want: identifiers are expanded and the
expanded values are treated as expressions, but those expressions don't
undergo any word expansions?  That still leaves the question of what to
do about array subscripts in these expressions.

That should be enough to get a discussion started.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Bash performance when declaring variables (was: Re: [Help-bash] make function local)

2015-04-16 Thread Eduardo A. Bustamante López
On Thu, Apr 16, 2015 at 11:07:34AM -0400, Chet Ramey wrote:
[...]
 I knew that rang a bell somewhere.  mt_hash is a function in the bash
 malloc library that keeps track of all allocations and deallocations in
 a table.  It's part of the debugging that is enabled when you build from
 the devel code.  It's been well-known for a long time that the debugging
 code in malloc slows bash down considerably; that's why it's not enabled
 as part of bash releases.

Actually, this is the post that motivated me to look into this:
(yes, the conclusion is idiotic, but I guess the rest of the post is pretty
okay).
http://spencertipping.com/posts/2013.0814.bash-is-irrecoverably-broken.html

Now, there is some truth to what he says:

dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++<1000)); do declare a$RANDOM$RANDOM=1; done'
./bash -c 'i=0; while ((i++<1000)); do declare a$RANDOM$RANDOM=1; done'  0.01s user 0.06s system 93% cpu 0.077 total
dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++<10000)); do declare a$RANDOM$RANDOM=1; done'
./bash -c 'i=0; while ((i++<10000)); do declare a$RANDOM$RANDOM=1; done'  0.16s user 0.48s system 98% cpu 0.643 total
dualbus@yaqui ...src/gnu/bash % time ./bash -c 'i=0; while ((i++<100000)); do declare a$RANDOM$RANDOM=1; done'
./bash -c 'i=0; while ((i++<100000)); do declare a$RANDOM$RANDOM=1; done'  15.44s user 6.51s system 99% cpu 21.959 total

I built bash like this:

CFLAGS='-pg -g -O0' ./configure --silent && make -sj4 DEBUG= MALLOC_DEBUG=

To make sure the malloc debugging code doesn't interfere.

I got a gprof profile with that last run, which gave:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
71.42     12.07    12.07  1100104     0.00     0.00  hash_search
21.18     15.65     3.58   275435     0.00     0.00  morecore
 1.63     15.93     0.28  6800525     0.00     0.00  internal_malloc
 0.71     16.05     0.12  6200116     0.00     0.00  internal_free
 0.59     16.15     0.10       31     0.00     0.00  expand_word_internal
 0.24     16.19     0.04  6800474     0.00     0.00  sh_xmalloc
 0.18     16.22     0.03  7203779     0.00     0.00  is_basic
 0.18     16.25     0.03  1932530     0.00     0.00  is_basic
 0.18     16.28     0.03       22     0.00     0.00  subexpr
 0.18     16.31     0.03       18     0.00     0.00  find_special_var
 0.15     16.33     0.03                             pagealign

Notice how it spends most of the time in these two functions. Yeah, it's not
mt_* like I said earlier; I did this a while ago and forgot to take notes.


Does this matter much? I don't know. Declaring 100,000 variables does seem
like a silly thing to do. Still, since it's a hash table, it shouldn't show
that quadratic growth in runtime (I didn't even try 1,000,000 because it was
already so slow).
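For anyone who wants to reproduce the scaling directly, a rough sketch (the loop bounds and the ./bash path are simply the ones used above):

  for n in 1000 10000 100000; do
      echo "n=$n"
      time ./bash -c 'i=0; while ((i++ < '"$n"')); do declare a$RANDOM$RANDOM=1; done'
  done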

-- 
Eduardo Bustamante
https://dualbus.me/



Re: Request to a new feature on read

2015-04-16 Thread Dan Douglas
On Thu, Apr 16, 2015 at 9:32 AM, Greg Wooledge <wool...@eeg.ccf.org> wrote:
 On Thu, Apr 16, 2015 at 09:29:56AM -0500, Dan Douglas wrote:
 I find myself in need of something along the lines of Python's
 `re.split` and `re.findall` all the time. E.g. splitting an ip into an
 array of octets.

 IFS=. read -ra octets <<< "$ip"

Sure, but validation is then separate if needed. There are plenty of
applications where you want either a multi-character or non-static
delimiter, possibly with pattern matching on the data at the same
time.
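As one hedged example of what that currently takes by hand (the '::' delimiter and the names are made up), splitting on a multi-character delimiter:

  s='foo::bar::baz' fields=()
  while [[ $s == *::* ]]; do
      fields+=("${s%%::*}")
      s=${s#*::}
  done
  fields+=("$s")
  declare -p fields    # declare -a fields=([0]="foo" [1]="bar" [2]="baz")

Per-field validation still has to happen in a separate [[ ... =~ ... ]] pass.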



Re: Request to a new feature on read

2015-04-16 Thread Greg Wooledge
On Thu, Apr 16, 2015 at 09:29:56AM -0500, Dan Douglas wrote:
 I find myself in need of something along the lines of Python's
 `re.split` and `re.findall` all the time. E.g. splitting an ip into an
 array of octets.

IFS=. read -ra octets <<< "$ip"
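For example, with a made-up address:

  ip=192.168.0.1
  IFS=. read -ra octets <<< "$ip"
  declare -p octets    # declare -a octets=([0]="192" [1]="168" [2]="0" [3]="1")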



Request to a new feature on read

2015-04-16 Thread Valentin Bajrami
Hi,

According to ''help read'' we can specify -N [nchars] to trigger return
automatically.  Is it possible to approach read differently?

For example:  $re is some regular expression

read -N$re -p 'Enter two or three digits to continue' getInput

The above is mostly pseudo-code, but I hope you get the idea.

 -N in this case should be able to handle a range of 2 or 3 chars: if the
regex is already satisfied after 2 chars, return should be triggered then;
otherwise read should wait for the third char.
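A rough sketch of how that behaviour can be approximated today with the existing options (the two-digit pattern and the names are only illustrative):

  re='^[0-9]{2}$'    # 'already complete after two characters'
  read -r -N 2 -p 'Enter two or three digits to continue ' getInput
  if ! [[ $getInput =~ $re ]]; then
      read -r -N 1 extra && getInput+=$extra
  fi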

Thanks in advance!

-- 
Met vriendelijke groet / Kind regards,

Valentin


Re: Request to a new feature on read

2015-04-16 Thread Dan Douglas
On Thu, Apr 16, 2015 at 8:55 AM, Eduardo A. Bustamante López <dual...@gmail.com> wrote:
 Any reason to justify this instead of using a simple loop?

I find myself in need of something along the lines of Python's
`re.split` and `re.findall` all the time. E.g. splitting an ip into an
array of octets.

On Thu, Apr 16, 2015 at 5:49 AM, Valentin Bajrami <valentin.bajr...@gmail.com> wrote:
 Hi,

 According to ''help read'' we can specify  -N[chars]  to trigger return
 automatically.  Is it possible to approach read differently?

 For example:  $re is some regular expression

FWIW, ksh has two redirect operators that can be used together with
`read` to get something like this. They're somewhat difficult to use
IMO:

<#pattern    Seeks forward to the beginning of the next line
             containing pattern.

<##pattern   The same as <# except that the portion of the
             file that is skipped is copied to standard output.

-- 
Dan Douglas



Re: [Help-bash] make function local

2015-04-16 Thread Chet Ramey
On 4/16/15 11:43 AM, Dan Douglas wrote:
 I thought Bash always first splits the identifier from the subscript,
 then checks which attributes the variable has set. If it has the
 associative array attribute plus a subscript then the subscript is
 only processed for expansions and the resulting string is used as the
 key. If the associative array attribute is not set then the subscript
 is processed for expansions and the resulting string is passed on to
 arithmetic evaluation.
 
 Am I following the discussion correctly? i.e. if you have
 `a[b[text]]`, the treatment of `text` is entirely determined by b's
 attributes.

Yes, that's correct.  In the case I'm talking about, we're only concerned
with indexed arrays and the consequent arithmetic evaluation.
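A small illustration of that split (the array names and values are made up):

  declare -A b=([text]=7)
  a=(); a[b[text]]=x; declare -p a    # declare -a a=([7]="x") -- 'text' is used as a literal key
  unset a b
  declare -a b=(5 6)
  a=(); a[b[text]]=x; declare -p a    # declare -a a=([5]="x") -- 'text' arithmetically evaluates to 0, so b[0] is used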

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: [Help-bash] make function local

2015-04-16 Thread Linda Walsh



Pierre Gaston wrote:

  Is there a particular problem you're trying to solve for which local
  functions would be the appropriate solution?


Cleanliness.

Not polluting the global namespace.

Ensuring the function can't be called from outside a function.

It's a trite example, but I do something like:


sub gvim () {
 array orig_args=($@)  gv_files=() gv_ops=()
 int use_tab=0  look_for_ops=1

   sub _exec_gvim() {
 array args
 ((use_tab)) && args=(-p)
 (( ${#gv_ops[@]:-0} )) && args+=(${gv_ops[@]})
 (( $# )) && args+=($@)
 command gvim ${args[@]}
 unset -f _exec_gvim
   }
   


AFAIK, _exec_gvim can only be called from within function gvim, no?