On Wednesday 29 January 2003 6:47 pm, Michael Schmitt wrote:
> Hi Angus,
>
> > Rewrite postats.sh to:
> > 1. Not need the existence of a Makefile in order to run;
> > 2. Not result in php warnings about uninitialised variables if any of the
> > translated, fuzzy or untranslated strings are missing.
>
> Why did you have to rewrite the script completely for the two reasons
> above??? We could have replaced the makefile invocation by a one-line
> patch. And there are already checks for uninitialized variables in the
> code. If I missed a place, a local change of a few lines would have been
> sufficient.
>
> The new code is _much_ longer. I prefer short and simple solutions as
> they can be maintained more easily. IMHO functions like "dump_tail" only
> introduce overhead.

Oh, come on; dumphead and dumptail are an irrelevance to the process of
extracting the info. Much better to shove them out of the way.

Let's move on to the actual process of extracting info from the msgfmt run. It
is more interesting and was quite challenging. You put a lot of effort into
it and I didn't mean to stand on your toes. If you feel hurt, I apologise
profusely.

That said, perhaps if I explain why I made my changes, my motivation will be
clearer.

If you've got the code that I submitted this afternoon, then you'll see that
the main code is all in one single function that fills a single variable
$output. Given that run_msgfmt fills $output, what is confusing about this
code?

while [ $# -ne 0 ]
do
    run_msgfmt $1
    shift
    if [ $# -eq 0 ]; then
        echo "${output});"
        echo '?>'
    else
        echo "${output},"
        echo
    fi
done

It also has the pretty-printing advantage that the ',' now appears on the same
line as the array element.
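
To see the shape of the output without running msgfmt at all, here is a minimal sketch. The run_msgfmt below is a hypothetical stand-in that only fills $output with a made-up placeholder entry, and print_entries is just a wrapper name invented for this illustration; neither is the code from the real script.

```shell
#!/bin/sh
# Hypothetical stand-in for the real run_msgfmt: it only fills $output
# with a made-up placeholder entry.
run_msgfmt () {
    output="array ( 'po_file' => '$1' )"
}

# Same loop as above, wrapped in a function (name invented for this demo)
# so that its output can be captured.
print_entries () {
    while [ $# -ne 0 ]
    do
        run_msgfmt "$1"
        shift
        if [ $# -eq 0 ]; then
            # Last entry: close the enclosing array and the php block.
            echo "${output});"
            echo '?>'
        else
            # Any other entry: the ',' stays on the same line as the entry.
            echo "${output},"
            echo
        fi
    done
}

result=`print_entries de.po fr.po`
printf '%s\n' "$result"
```

With two arguments, the first entry ends in ',' and the second in ');', followed by the closing '?>'.
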
Moving into run_msgfmt itself and the extraction of strings from the output of
msgfmt.
Point 1. The .* RE operator is EXTREMELY greedy. Try and avoid it if at all
possible.
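
A small sketch of what "greedy" means in practice, using a made-up sample string with two bracketed addresses. The first sed uses .* and runs on to the last '>'; the second uses [^>]* and stops at the first one.

```shell
#!/bin/sh
# Made-up sample containing two <...> groups.
sample='Jo <jo@home> and Al <al@work>'

# Greedy: .* swallows everything up to the LAST '>' on the line.
greedy=`printf '%s\n' "$sample" | sed 's/<\(.*\)>/[\1]/'`

# Bounded: [^>]* cannot cross a '>', so it stops at the first one.
bounded=`printf '%s\n' "$sample" | sed 's/<\([^>]*\)>/[\1]/'`

printf '%s\n' "$greedy"     # Jo [jo@home> and Al <al@work]
printf '%s\n' "$bounded"    # Jo [jo@home] and Al <al@work>
```
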
Point 2. Sometimes 'cut' is a better tool than 'sed'. I respectfully submit
that this:

input=`grep "Last-Translator" $pofile` && {
    # Remove 'Last-Translator: ' from the front of the string.
    # (Two spaces in the RE, so that it matches one or more spaces.)
    input=`echo $input | sed 's/  */ /g' | cut -d ' ' -f 2-`
    # The string now consists of "Jo Bloggs <jo@home>..."
    # Use the < and > to extract the two parts.
    translator=`echo $input | cut -d '<' -f 1 | sed 's/ *$//'`
    email=`echo $input | cut -d '<' -f 2 | cut -d '>' -f 1`
}

is cheaper and more robust than this:

grep "Last-Translator" $x |
sed -e 's/"Last-Translator: \(.*\)\( *\)<\(.*\)>\\n"/"translator" => "\1", "email" => "\3", /'

(Because .* is extremely greedy and multiple instances /can/ do weird things.
I'm not saying that they did here, but you get my point.)
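
For anyone who wants to try the cut-based extraction without a .po file to hand, here is a minimal sketch run against a single made-up header line. It uses printf rather than echo so that the trailing \n in the header is not interpreted by shells whose echo expands escapes; that substitution is mine, not part of the script.

```shell
#!/bin/sh
# Made-up header line, in the form it takes inside a .po file.
line='"Last-Translator: Jo  Bloggs   <jo@home>\n"'

# Collapse runs of spaces (note the two spaces in the RE: one or more),
# then drop the first field, '"Last-Translator:'.
input=`printf '%s\n' "$line" | sed 's/  */ /g' | cut -d ' ' -f 2-`

# Everything before '<' is the name; strip the trailing space.
translator=`printf '%s\n' "$input" | cut -d '<' -f 1 | sed 's/ *$//'`

# Everything between '<' and '>' is the address.
email=`printf '%s\n' "$input" | cut -d '<' -f 2 | cut -d '>' -f 1`

printf 'translator=%s\n' "$translator"    # translator=Jo Bloggs
printf 'email=%s\n' "$email"              # email=jo@home
```
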
Point 3. This is very neat and compact:
make 2>&1 $y.gmo | grep "^[1-9]" |
sed -e 's/\([0-9]*\) translated m[a-z]*[.,]/"msg_tr" => \1,/' |
sed -e 's/\([0-9]*\) fuzzy t[a-z]*[.,]/"msg_fu" => \1,/' |
sed -e 's/\([0-9]*\) untranslated m[a-z]*./"msg_nt" => \1,/'
But it fails totally if there are no untranslated or fuzzy strings. (msgfmt
simply doesn't output anything about them.)
Remember, last night I generated a php3 page and found it chock full of
warnings. I had no idea why, so had to go hunting. I tried to modify the code
above and failed. In fact, I think it's impossible to get right like this
because it presupposes the existence of a string "xxx fuzzy messages".
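
The problem can be reproduced without make or msgfmt by feeding made-up summary lines through the same three sed commands. In this sketch, summarise is a name invented for the demo; it is not part of the script.

```shell
#!/bin/sh
# Demo wrapper (hypothetical name) around the three sed commands above,
# reading a summary line from $1 instead of from make.
summarise () {
    printf '%s\n' "$1" | grep "^[1-9]" |
    sed -e 's/\([0-9]*\) translated m[a-z]*[.,]/"msg_tr" => \1,/' |
    sed -e 's/\([0-9]*\) fuzzy t[a-z]*[.,]/"msg_fu" => \1,/' |
    sed -e 's/\([0-9]*\) untranslated m[a-z]*./"msg_nt" => \1,/'
}

# All three counts present: all three keys are emitted.
full=`summarise '588 translated messages, 1248 fuzzy translations, 2 untranslated messages.'`

# Only the translated count present: msg_fu and msg_nt never appear,
# so whatever reads the result sees them as uninitialised.
partial=`summarise '588 translated messages.'`

printf '%s\n' "$full"       # "msg_tr" => 588, "msg_fu" => 1248, "msg_nt" => 2,
printf '%s\n' "$partial"    # "msg_tr" => 588,
```
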
That said, it is quite simple to have an integer variable initialised to zero
and to fill it with xxx if the string exists. That's all I did. Granted, I
used a single function for all three extractions because I, personally,
prefer functions ;-)
# $1 is a string like
# '588 translated messages, 1248 fuzzy translations, 2 untranslated messages.'
# Any one of these substrings may not appear if the associated number is 0.
#
# $2 is the word following the number to be extracted,
# ie, 'translated', 'fuzzy', or 'untranslated'.
#
# extract_number fills var $number with this number, or sets it to zero if the
# word is not found in the string.
extract_number () {
    test $# -eq 2 || error 'extract_number expects 2 args'
    number=0
    echo $1 | grep $2 >/dev/null || return
    # It /is/ safe to use 'Z' as a delimiter here.
    number=`echo $1 | sed "s/\([0-9]*\)[ ]*$2/Z\1Z/" | cut -d 'Z' -f 2`
}

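
A short usage sketch, run against a made-up summary line that has no fuzzy entry. The error helper here is a hypothetical stand-in, since the script's own version is not shown in this mail, and this copy of extract_number returns 0 explicitly in the "not found" case so that it also behaves when called under 'set -e'.

```shell
#!/bin/sh
# Hypothetical stand-in for the script's error function.
error () {
    echo "$*" >&2
    exit 1
}

extract_number () {
    test $# -eq 2 || error 'extract_number expects 2 args'
    number=0
    # Word not present: leave number at 0 and return success.
    echo $1 | grep $2 >/dev/null || return 0
    # Wrap the number in Z...Z, then pick it out as the second field.
    number=`echo $1 | sed "s/\([0-9]*\)[ ]*$2/Z\1Z/" | cut -d 'Z' -f 2`
}

# Made-up summary line with the fuzzy part missing.
stats='588 translated messages, 2 untranslated messages.'

extract_number "$stats" translated;   msg_tr=$number
extract_number "$stats" fuzzy;        msg_fu=$number
extract_number "$stats" untranslated; msg_nt=$number

echo "msg_tr=$msg_tr msg_fu=$msg_fu msg_nt=$msg_nt"
# msg_tr=588 msg_fu=0 msg_nt=2
```

The point of the design is that every variable always ends up holding a number, whether or not the corresponding phrase appeared in the summary line.
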
Best regards,
Angus
ps I'm willing to admit that 'unset' is a little excessive ;-)
A