Re: [gentoo-dev] RFC: sed script redundancy

2013-06-09 Thread Jeroen Roovers
On Fri, 20 May 2011 17:39:22 +0200
Jeroen Roovers j...@gentoo.org wrote:

 for a while now I've been wondering if all those sed scripts in all
 those ebuilds are really effective.

This took rather long to find some spare time for.

Plonk the attached bash script into /etc/portage/bashrc.d (which you
source in /etc/portage/bashrc or something) and you should probably be
on your merry way.

The script probably doesn't catch all cases, but sed use in most
ebuilds is pretty standard.


 jer#!/bin/bash
SED=$(type -P sed)

sed() {
case ${FUNCNAME[1]} in
# we only care about direct calls in some ebuild phases
src_prepare|src_configure|src_compile|src_install)
local arg args file files scripts
while [[ $# -gt 0 ]]; do
arg=${1}
if [ -f ${arg} ]; then
files+= ${arg}
else
case ${arg} in
-e) : ;;
--expression=*) scripts+=( 
${arg/--expression=} ) ;;
-i*|--in-place=*) : ;;
-f)
local 
scriptfile=${arg}
shift
scriptfile+= ${arg}
scripts+=( 
${scriptfile} )
;;
--file=*) scripts+=( 
${arg/--file=} ) ;;
-*) args+= ${arg} ;;
*) scripts+=( ${arg} ) ;;
esac
fi
shift
done

for file in ${files}; do
for idx in ${!scripts[@]}; do
${SED} ${args} -e 
${scripts[@]:${idx}:1} ${file}  ${file}.jer-qa-sed
if cmp -s ${file}{,.jer-qa-sed}; then
echo -e \033[01;33m!!! JeR-QA: 
sed expression ${scripts[@]:${idx}:1} did not change ${file}\033[00;00m \
| tee -a ${T}/sed.log  
/dev/stderr
else
mv ${file}{.jer-qa-sed,}
fi
done
done
;;
*) ${SED} ${@}
;;
esac
}


Re: [gentoo-dev] RFC: sed script redundancy

2011-05-29 Thread Christopher Schwan
Thank you for that script. I experimented a bit with it and have a number of 
corrections and suggestions:

- alias does not work because my_sed is not declared at this stage. I removed 
the whole alias line because I want to selectively enable my_sed
- oargs must be an array in order to make quoting work:

   local oargs=( ${@} )

- In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
- is it correct to treat -e and -f alike ? I am not sure about that, because 
the latter expects a file
- If no -e is given, the first non-option argument is treated as the sed-
script-expression, therefore I added hade=yes in the if-branch

The new function now reads:

my_sed() {
local oargs=( $@ )
local arg
local nargs=()
local hadi=
local hade=

while [[ -n $1 ]] ; do
case $1 in
-i|--in-place)
# ignore this flag
hadi=yes
;;
-e|--expression)
shift
nargs+=( -e $1 )
hade=yes
;;
-f|--file)
shift
nargs+=( -f $1 )
hade=yes
;;
-*)
nargs+=( $1 )
;;
*)
if [[ -z ${hade} ]] ; then
nargs+=( -e $1 )
hade=yes
elif [[ -z ${hadi} ]] ; then
# there is no inline replacing, not much we can 
do
break
else
sed ${nargs[@]} $1 | diff -q $1 -  
/dev/null  \
ewarn sed ${nargs[@]} has no effect on 
$1 
fi
;;
esac
shift
done

sed ${oargs[@]}
}

As you can see, I added support for long-options. However, testing the 
individual sed commands remains to be done. This could be especially difficult 
if input is taken from stdin (e.g. in cat foo | sed s:a:b:g).

I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed 
commands and I was able to spot one useless sed.



[1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-
mathematics/sage/sage-4.7.ebuild


On Sunday 22 May 2011 12:50:43 Fabian Groffen wrote:
 On 21-05-2011 19:34:34 +0200, Jeroen Roovers wrote:
  On Fri, 20 May 2011 17:56:00 +0200
  
  Fabian Groffen grob...@gentoo.org wrote:
 sed -e pattern ${file} | diff ${file} -
   
   followed by the actual sed -i -e ...
   
   This way I didn't need to write an intermediate file.
  
  The problem there is that sed might be called just once on any one file,
  but in the tree it is often invoked with multiple scripts, so this
  simple implementation lacks a way to evaluate which sed scripts are
  useful.
  
  Also, how do I ensure the sed replacement works only on invocations
  inside the ebuild, and not, say, in portage's internals?
 
 (not tested, but as proof of concept)
 
 alias sed my_sed
 my_sed() {
   local oargs=${@}
   local arg
   local nargs=()
   local hadi=
   local hade=
   while [[ -n $1 ]] ; do
   case $1 in
   -i)
   # ignore this flag
   hadi=yes
   ;;
   -e|-f)
   shift
   nargs+=( -e$1 )
   hade=yes
   ;;
   -*)
   nargs+=( $1 )
   hade=yes
   ;;
   *)
   if [[ -z ${hade} ]] ; then
   nargs+=( $1 )
   elif [[ -z ${hadi} ]] ; then
   # there is no inline replacing, not 
 much we can do
   break
   else
   sed ${nargs[@]} $1 | diff -q $1 - 
  /dev/null \
ewarn sed ${oargs} has no 
 effect on $1
   fi
   ;;
   esac
   shift
   done
 
   \sed ${oargs}
 }



Re: [gentoo-dev] RFC: sed script redundancy

2011-05-29 Thread Fabian Groffen
On 29-05-2011 12:44:46 +0200, Christopher Schwan wrote:
 Thank you for that script. I experimented a bit with it and have a number of 
 corrections and suggestions:
 
 - alias does not work because my_sed is not declared at this stage. I removed 
 the whole alias line because I want to selectively enable my_sed
 - oargs must be an array in order to make quoting work:
 
local oargs=( ${@} )
 
 - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
 - is it correct to treat -e and -f alike ? I am not sure about that, because 
 the latter expects a file

Yes, because (also in your function) you always shift, and assume the
next argument is there.  Hence, you have two identical cases in your
script now.  I only distinguised between 1) being able to do something
(-i) and 2) having a pattern to work with (-e/-f or first non-option
argument as string pattern).

 - If no -e is given, the first non-option argument is treated as the sed-
 script-expression, therefore I added hade=yes in the if-branch

That one was missing indeed.  I just quickly wrote the proof of concept
:)

 The new function now reads:
 
[snip improved function]
 
 As you can see, I added support for long-options. However, testing the 
 individual sed commands remains to be done. This could be especially 
 difficult 
 if input is taken from stdin (e.g. in cat foo | sed s:a:b:g).

You might be able to detect input is a pipe, and temporarily
write the input to some file, then perform the sed without the -i
requirement and remove the temp file after the real sed.

 I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed 
 commands and I was able to spot one useless sed.

Cool, nice to see you've made it into something useful!

 [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-
 mathematics/sage/sage-4.7.ebuild


-- 
Fabian Groffen
Gentoo on a different level



Re: [gentoo-dev] RFC: sed script redundancy

2011-05-29 Thread Christopher Schwan
On Sunday 29 May 2011 13:00:32 Fabian Groffen wrote:
 On 29-05-2011 12:44:46 +0200, Christopher Schwan wrote:
  Thank you for that script. I experimented a bit with it and have a number
  of corrections and suggestions:
  
  - alias does not work because my_sed is not declared at this stage. I
  removed the whole alias line because I want to selectively enable my_sed
  
  - oargs must be an array in order to make quoting work:
 local oargs=( ${@} )
  
  - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?)
  - is it correct to treat -e and -f alike ? I am not sure about that,
  because the latter expects a file
 
 Yes, because (also in your function) you always shift, and assume the
 next argument is there.  Hence, you have two identical cases in your
 script now.  I only distinguised between 1) being able to do something
 (-i) and 2) having a pattern to work with (-e/-f or first non-option
 argument as string pattern).

Ok sorry - I did not express myself clearly. Your script is replacing -f 
with -e and I wasnt sure if this is correct. Buf of course, they can be 
treated alike:

-e|--expression|-f|--file)
arg=$1
shift
nargs+=( ${arg} $1 )
hade=yes
;;

 
  - If no -e is given, the first non-option argument is treated as the
  sed- script-expression, therefore I added hade=yes in the if-branch
 
 That one was missing indeed.  I just quickly wrote the proof of concept
 
 :)
 :
  The new function now reads:
 [snip improved function]
 
  As you can see, I added support for long-options. However, testing the
  individual sed commands remains to be done. This could be especially
  difficult if input is taken from stdin (e.g. in cat foo | sed
  s:a:b:g).
 
 You might be able to detect input is a pipe, and temporarily
 write the input to some file, then perform the sed without the -i
 requirement and remove the temp file after the real sed.

Good idea, thanks!

 
  I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed
  commands and I was able to spot one useless sed.
 
 Cool, nice to see you've made it into something useful!
 
  [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci-
  mathematics/sage/sage-4.7.ebuild



Re: [gentoo-dev] RFC: sed script redundancy

2011-05-22 Thread Fabian Groffen
On 21-05-2011 19:34:34 +0200, Jeroen Roovers wrote:
 On Fri, 20 May 2011 17:56:00 +0200
 Fabian Groffen grob...@gentoo.org wrote:
 
sed -e pattern ${file} | diff ${file} -
  
  followed by the actual sed -i -e ...
  
  This way I didn't need to write an intermediate file.
 
 The problem there is that sed might be called just once on any one file,
 but in the tree it is often invoked with multiple scripts, so this
 simple implementation lacks a way to evaluate which sed scripts are
 useful.
 
 Also, how do I ensure the sed replacement works only on invocations
 inside the ebuild, and not, say, in portage's internals?

(not tested, but as proof of concept)

alias sed my_sed
my_sed() {
local oargs=${@}
local arg
local nargs=()
local hadi=
local hade=
while [[ -n $1 ]] ; do
case $1 in
-i)
# ignore this flag
hadi=yes
;;
-e|-f)
shift
nargs+=( -e$1 )
hade=yes
;;
-*)
nargs+=( $1 )
hade=yes
;;
*)
if [[ -z ${hade} ]] ; then
nargs+=( $1 )
elif [[ -z ${hadi} ]] ; then
# there is no inline replacing, not 
much we can do
break
else
sed ${nargs[@]} $1 | diff -q $1 - 
 /dev/null \
 ewarn sed ${oargs} has no 
effect on $1 
fi
;;
esac
shift
done

\sed ${oargs}
}


-- 
Fabian Groffen
Gentoo on a different level



Re: [gentoo-dev] RFC: sed script redundancy

2011-05-21 Thread Jeroen Roovers
On Fri, 20 May 2011 17:56:00 +0200
Fabian Groffen grob...@gentoo.org wrote:

   sed -e pattern ${file} | diff ${file} -
 
 followed by the actual sed -i -e ...
 
 This way I didn't need to write an intermediate file.

The problem there is that sed might be called just once on any one file,
but in the tree it is often invoked with multiple scripts, so this
simple implementation lacks a way to evaluate which sed scripts are
useful.

Also, how do I ensure the sed replacement works only on invocations
inside the ebuild, and not, say, in portage's internals?


 jer



[gentoo-dev] RFC: sed script redundancy

2011-05-20 Thread Jeroen Roovers
  Hullo developers,


for a while now I've been wondering if all those sed scripts in all
those ebuilds are really effective.

To find out, I've tried a couple of angles on a sed hook that basically
dissects the sed command line provided, divides everything up into sed
scripts, files being processed and other options, and runs everything
through diff to get some meaningful QA output as to the effective use
of the sed scripts invoked.

Of course some of the time a sed script falsely seems to be ineffective,
but could be, when it uses some variable or output that varies depending
on the platform you run it on, like with the likes of $(get_libdir).

I've looked into sed's internal solutions to no avail, but something
like -i[SUFFIX] might help, since it gives you a backup file to compare
with the file that's being streamed.

The idea is to pass the result to
  | diff -u $file $file[SUFFIX]
to figure out what was changed, and what sed script changed it.

Any help?


 jer


PS: Because the outcome may depend on the platform you run the scripts
on, this probably shouldn't make it into a QA test in portage, but it
could still help developers evaluate how effective their ebuilds' and
eclasses' sed scripts are.



Re: [gentoo-dev] RFC: sed script redundancy

2011-05-20 Thread Fabian Groffen
On 20-05-2011 17:39:22 +0200, Jeroen Roovers wrote:
 I've looked into sed's internal solutions to no avail, but something
 like -i[SUFFIX] might help, since it gives you a backup file to compare
 with the file that's being streamed.
 
 The idea is to pass the result to
   | diff -u $file $file[SUFFIX]
 to figure out what was changed, and what sed script changed it.

I like your idea a lot.

I had to do a similar thing once in eapify, and back then I came up with
the following hack:

  sed -e pattern ${file} | diff ${file} -

followed by the actual sed -i -e ...

This way I didn't need to write an intermediate file.


-- 
Fabian Groffen
Gentoo on a different level