Re: [gentoo-dev] RFC: sed script redundancy
On Fri, 20 May 2011 17:39:22 +0200 Jeroen Roovers j...@gentoo.org wrote: for a while now I've been wondering if all those sed scripts in all those ebuilds are really effective. This took rather long to find some spare time for. Plonk the attached bash script into /etc/portage/bashrc.d (which you source in /etc/portage/bashrc or something) and you should probably be on your merry way. The script probably doesn't catch all cases, but sed use in most ebuilds is pretty standard. jer#!/bin/bash SED=$(type -P sed) sed() { case ${FUNCNAME[1]} in # we only care about direct calls in some ebuild phases src_prepare|src_configure|src_compile|src_install) local arg args file files scripts while [[ $# -gt 0 ]]; do arg=${1} if [ -f ${arg} ]; then files+= ${arg} else case ${arg} in -e) : ;; --expression=*) scripts+=( ${arg/--expression=} ) ;; -i*|--in-place=*) : ;; -f) local scriptfile=${arg} shift scriptfile+= ${arg} scripts+=( ${scriptfile} ) ;; --file=*) scripts+=( ${arg/--file=} ) ;; -*) args+= ${arg} ;; *) scripts+=( ${arg} ) ;; esac fi shift done for file in ${files}; do for idx in ${!scripts[@]}; do ${SED} ${args} -e ${scripts[@]:${idx}:1} ${file} ${file}.jer-qa-sed if cmp -s ${file}{,.jer-qa-sed}; then echo -e \033[01;33m!!! JeR-QA: sed expression ${scripts[@]:${idx}:1} did not change ${file}\033[00;00m \ | tee -a ${T}/sed.log /dev/stderr else mv ${file}{.jer-qa-sed,} fi done done ;; *) ${SED} ${@} ;; esac }
Re: [gentoo-dev] RFC: sed script redundancy
Thank you for that script. I experimented a bit with it and have a number of corrections and suggestions: - alias does not work because my_sed is not declared at this stage. I removed the whole alias line because I want to selectively enable my_sed - oargs must be an array in order to make quoting work: local oargs=( ${@} ) - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?) - is it correct to treat -e and -f alike ? I am not sure about that, because the latter expects a file - If no -e is given, the first non-option argument is treated as the sed- script-expression, therefore I added hade=yes in the if-branch The new function now reads: my_sed() { local oargs=( $@ ) local arg local nargs=() local hadi= local hade= while [[ -n $1 ]] ; do case $1 in -i|--in-place) # ignore this flag hadi=yes ;; -e|--expression) shift nargs+=( -e $1 ) hade=yes ;; -f|--file) shift nargs+=( -f $1 ) hade=yes ;; -*) nargs+=( $1 ) ;; *) if [[ -z ${hade} ]] ; then nargs+=( -e $1 ) hade=yes elif [[ -z ${hadi} ]] ; then # there is no inline replacing, not much we can do break else sed ${nargs[@]} $1 | diff -q $1 - /dev/null \ ewarn sed ${nargs[@]} has no effect on $1 fi ;; esac shift done sed ${oargs[@]} } As you can see, I added support for long-options. However, testing the individual sed commands remains to be done. This could be especially difficult if input is taken from stdin (e.g. in cat foo | sed s:a:b:g). I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed commands and I was able to spot one useless sed. [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci- mathematics/sage/sage-4.7.ebuild On Sunday 22 May 2011 12:50:43 Fabian Groffen wrote: On 21-05-2011 19:34:34 +0200, Jeroen Roovers wrote: On Fri, 20 May 2011 17:56:00 +0200 Fabian Groffen grob...@gentoo.org wrote: sed -e pattern ${file} | diff ${file} - followed by the actual sed -i -e ... This way I didn't need to write an intermediate file. The problem there is that sed might be called just once on any one file, but in the tree it is often invoked with multiple scripts, so this simple implementation lacks a way to evaluate which sed scripts are useful. Also, how do I ensure the sed replacement works only on invocations inside the ebuild, and not, say, in portage's internals? (not tested, but as proof of concept) alias sed my_sed my_sed() { local oargs=${@} local arg local nargs=() local hadi= local hade= while [[ -n $1 ]] ; do case $1 in -i) # ignore this flag hadi=yes ;; -e|-f) shift nargs+=( -e$1 ) hade=yes ;; -*) nargs+=( $1 ) hade=yes ;; *) if [[ -z ${hade} ]] ; then nargs+=( $1 ) elif [[ -z ${hadi} ]] ; then # there is no inline replacing, not much we can do break else sed ${nargs[@]} $1 | diff -q $1 - /dev/null \ ewarn sed ${oargs} has no effect on $1 fi ;; esac shift done \sed ${oargs} }
Re: [gentoo-dev] RFC: sed script redundancy
On 29-05-2011 12:44:46 +0200, Christopher Schwan wrote: Thank you for that script. I experimented a bit with it and have a number of corrections and suggestions: - alias does not work because my_sed is not declared at this stage. I removed the whole alias line because I want to selectively enable my_sed - oargs must be an array in order to make quoting work: local oargs=( ${@} ) - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?) - is it correct to treat -e and -f alike ? I am not sure about that, because the latter expects a file Yes, because (also in your function) you always shift, and assume the next argument is there. Hence, you have two identical cases in your script now. I only distinguised between 1) being able to do something (-i) and 2) having a pattern to work with (-e/-f or first non-option argument as string pattern). - If no -e is given, the first non-option argument is treated as the sed- script-expression, therefore I added hade=yes in the if-branch That one was missing indeed. I just quickly wrote the proof of concept :) The new function now reads: [snip improved function] As you can see, I added support for long-options. However, testing the individual sed commands remains to be done. This could be especially difficult if input is taken from stdin (e.g. in cat foo | sed s:a:b:g). You might be able to detect input is a pipe, and temporarily write the input to some file, then perform the sed without the -i requirement and remove the temp file after the real sed. I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed commands and I was able to spot one useless sed. Cool, nice to see you've made it into something useful! [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci- mathematics/sage/sage-4.7.ebuild -- Fabian Groffen Gentoo on a different level
Re: [gentoo-dev] RFC: sed script redundancy
On Sunday 29 May 2011 13:00:32 Fabian Groffen wrote: On 29-05-2011 12:44:46 +0200, Christopher Schwan wrote: Thank you for that script. I experimented a bit with it and have a number of corrections and suggestions: - alias does not work because my_sed is not declared at this stage. I removed the whole alias line because I want to selectively enable my_sed - oargs must be an array in order to make quoting work: local oargs=( ${@} ) - In the ewarn line ${oargs} should be changed to ${nargs[@]} (!?) - is it correct to treat -e and -f alike ? I am not sure about that, because the latter expects a file Yes, because (also in your function) you always shift, and assume the next argument is there. Hence, you have two identical cases in your script now. I only distinguised between 1) being able to do something (-i) and 2) having a pattern to work with (-e/-f or first non-option argument as string pattern). Ok sorry - I did not express myself clearly. Your script is replacing -f with -e and I wasnt sure if this is correct. Buf of course, they can be treated alike: -e|--expression|-f|--file) arg=$1 shift nargs+=( ${arg} $1 ) hade=yes ;; - If no -e is given, the first non-option argument is treated as the sed- script-expression, therefore I added hade=yes in the if-branch That one was missing indeed. I just quickly wrote the proof of concept :) : The new function now reads: [snip improved function] As you can see, I added support for long-options. However, testing the individual sed commands remains to be done. This could be especially difficult if input is taken from stdin (e.g. in cat foo | sed s:a:b:g). You might be able to detect input is a pipe, and temporarily write the input to some file, then perform the sed without the -i requirement and remove the temp file after the real sed. Good idea, thanks! I tested my_sed within our sage ebuild[1]. This ebuild contains 39 sed commands and I was able to spot one useless sed. Cool, nice to see you've made it into something useful! [1] https://github.com/cschwan/sage-on-gentoo/blob/master/sci- mathematics/sage/sage-4.7.ebuild
Re: [gentoo-dev] RFC: sed script redundancy
On 21-05-2011 19:34:34 +0200, Jeroen Roovers wrote: On Fri, 20 May 2011 17:56:00 +0200 Fabian Groffen grob...@gentoo.org wrote: sed -e pattern ${file} | diff ${file} - followed by the actual sed -i -e ... This way I didn't need to write an intermediate file. The problem there is that sed might be called just once on any one file, but in the tree it is often invoked with multiple scripts, so this simple implementation lacks a way to evaluate which sed scripts are useful. Also, how do I ensure the sed replacement works only on invocations inside the ebuild, and not, say, in portage's internals? (not tested, but as proof of concept) alias sed my_sed my_sed() { local oargs=${@} local arg local nargs=() local hadi= local hade= while [[ -n $1 ]] ; do case $1 in -i) # ignore this flag hadi=yes ;; -e|-f) shift nargs+=( -e$1 ) hade=yes ;; -*) nargs+=( $1 ) hade=yes ;; *) if [[ -z ${hade} ]] ; then nargs+=( $1 ) elif [[ -z ${hadi} ]] ; then # there is no inline replacing, not much we can do break else sed ${nargs[@]} $1 | diff -q $1 - /dev/null \ ewarn sed ${oargs} has no effect on $1 fi ;; esac shift done \sed ${oargs} } -- Fabian Groffen Gentoo on a different level
Re: [gentoo-dev] RFC: sed script redundancy
On Fri, 20 May 2011 17:56:00 +0200 Fabian Groffen grob...@gentoo.org wrote: sed -e pattern ${file} | diff ${file} - followed by the actual sed -i -e ... This way I didn't need to write an intermediate file. The problem there is that sed might be called just once on any one file, but in the tree it is often invoked with multiple scripts, so this simple implementation lacks a way to evaluate which sed scripts are useful. Also, how do I ensure the sed replacement works only on invocations inside the ebuild, and not, say, in portage's internals? jer
[gentoo-dev] RFC: sed script redundancy
Hullo developers, for a while now I've been wondering if all those sed scripts in all those ebuilds are really effective. To find out, I've tried a couple of angles on a sed hook that basically dissects the sed command line provided, divides everything up into sed scripts, files being processed and other options, and runs everything through diff to get some meaningful QA output as to the effective use of the sed scripts invoked. Of course some of the time a sed script falsely seems to be ineffective, but could be, when it uses some variable or output that varies depending on the platform you run it on, like with the likes of $(get_libdir). I've looked into sed's internal solutions to no avail, but something like -i[SUFFIX] might help, since it gives you a backup file to compare with the file that's being streamed. The idea is to pass the result to | diff -u $file $file[SUFFIX] to figure out what was changed, and what sed script changed it. Any help? jer PS: Because the outcome may depend on the platform you run the scripts on, this probably shouldn't make it into a QA test in portage, but it could still help developers evaluate how effective their ebuilds' and eclasses' sed scripts are.
Re: [gentoo-dev] RFC: sed script redundancy
On 20-05-2011 17:39:22 +0200, Jeroen Roovers wrote: I've looked into sed's internal solutions to no avail, but something like -i[SUFFIX] might help, since it gives you a backup file to compare with the file that's being streamed. The idea is to pass the result to | diff -u $file $file[SUFFIX] to figure out what was changed, and what sed script changed it. I like your idea a lot. I had to do a similar thing once in eapify, and back then I came up with the following hack: sed -e pattern ${file} | diff ${file} - followed by the actual sed -i -e ... This way I didn't need to write an intermediate file. -- Fabian Groffen Gentoo on a different level