Dear Friends,

An interim post on what happens under the hood here...

An operator to be added to the "parsing script" is the =~ operator.
The script below uses =~ to evaluate text from the xml files in raw mode,
whenever the <userinput> </userinput> tags are met as plain strings in a
given xml file. The way the script is written, it could even be extended to produce a single valid XML file dumped from ALL the xml files of the LFS book, which could then be fed to the x2sh parser for regular processing, boosting the performance of the x2sh parser _several_ times.
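
For those who have not used it before, here is a minimal sketch of what the =~
test does in this context (the sample line is made up; with a quoted right-hand
side, bash 3.2 and later treat it as a plain substring test rather than as a
regular expression):

line='<screen><userinput>make install</userinput></screen>'
if [[ "$line" =~ '<userinput>' ]] && [[ "$line" =~ '</userinput>' ]]
then
        echo "inline <userinput> ... </userinput> found"
fi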

Of note:

1. I have restructured and debugged the x2sh parser script, hopefully cutting
any remaining lockups down to zero. There was a problem with certain characters
like '*' when met in an xml file, which ended in a corrupt result (_ch5_ glibc, for example). This has been taken care of and it now works correctly (see the quoting sketch after this list).

2. I made a script based on the x2sh approach and used it to parse the entire
xml source of the LFS book, and I checked the output thoroughly. Parsing the
whole of the XML sources without optimization takes about _30 - 40_ min under
normal working load on my boxes. That is parsing only, with the resulting bash
array exceeding 40000 entries; dumped to disk, that array is a nearly 750 KB file (big). A sketch of such an array dump follows this list.

3. When the script below is used to parse the entire chapter collection of the
book in xml, the total duration is nearly 40 _sec_ and the resulting dump to
disk nearly 60 KB, _UNPARSED_. With parsing, this "dump" is even smaller, while
the total time for extracting, dereferencing and redumping to a pure script file
should be about the time x2sh would take to parse a 50 - 70 KB complex, valid
and well-formed XML file (and probably even less; this is a rough estimate,
of course!). Also note that the hardcoded <userinput> </userinput> part can become "uninformed" as well, i.e. not bound to one specific tag (see the sketch after this list).

4. All of the above eventually lead to a structure capable of handling the totality of the books in a reasonable amount of time for a pure bash-based script: by lowering the number of array entries and the processing time demand, it significantly reduces the effect of O(n)-like phenomena where these are inevitably met. Note that going from 40 min to 40 sec of parsing is not just a 10x decrease in execution time, it is (40 x 60) / 40 = _60x!_
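
Regarding point 1, the kind of trouble a literal '*' can cause in bash (not
necessarily the exact bug that was in x2sh, just an illustration of the general
pitfall) is an unquoted expansion being glob-expanded against the current
directory:

line='rm -fv *.la'        # a command line as it might appear inside <userinput>
echo $line                # unquoted: '*.la' may be glob-expanded against the cwd
echo "$line"              # quoted: the line comes out verbatim
printf "%s\n" "$line"     # the form used throughout the script below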
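
Regarding point 2, for anyone who wants to reproduce the dump-to-disk figure,
one way of serialising a bash array (not necessarily how x2sh does it, and the
file name is only an example) is:

declare -a result=( "first entry" "second entry" )
declare -p result > result.dump    # write the array definition to disk
# later, possibly from another shell:
source ./result.dump               # restores the 'result' array
printf "%s\n" "${result[@]}"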
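
Regarding point 3, making the tag "uninformed" can be as simple as taking the
tag name from a variable instead of hardcoding it; a rough sketch (the variable
names are only illustrative, this is not the released x2sh code):

tag="userinput"            # could come from a command line argument
opentag="<${tag}>"
closetag="</${tag}>"
line='<screen><userinput>make check</userinput></screen>'
[[ "$line" =~ "${opentag}" ]] && [[ "$line" =~ "${closetag}" ]] && printf "%s\n" "$line"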

The current debugged "uninformed" version of x2sh I have is fused with a script just like the one below, for the version to be released this weekend! Run the script below in the root of your LFS book sources and see what happens :) (a short usage example follows the script).

Cheers,

George Makrydakis

PS: feedback / testing results are _always_ appreciated. Thank you for hosting my posts.

#--------------------------------CUT FROM HERE---------------------------------

#!/bin/bash

# script: XML pseudoparsing booster

declare -a filearray
declare -a filestore
declare -i linecounter=0
declare -i collectSTART=0
declare -i collectSTOP=0

declare -a chapterlist=(chapter01 \
                        chapter02 \
                        chapter03 \
                        chapter04 \
                        chapter05 \
                        chapter06 \
                        chapter07 \
                        chapter09);

for selectchapter in "${chapterlist[@]}"
do
        cd "$selectchapter" || continue
        echo "$selectchapter"
        for filenameinput in *.xml
        do
                echo "-------------------------------------------------"
                echo "x2sh:parsing file: $filenameinput"
                echo "-------------------------------------------------"
        # read the whole file into filearray, one line per entry
        while IFS= read -r filearray[linecounter]
        do
                let "linecounter++"
        done < "$filenameinput"

# case scenarios:
# <userinput> ... </userinput> inline definition
# <userinput> opens a span over multiple lines
# </userinput> ends the span
#

        for ((linecounter=0; linecounter < ${#filearray[@]}; linecounter++));
        do
        if [[ "${filearray[linecounter]}" =~ '<userinput>' ]] && \
           [[ "${filearray[linecounter]}" =~ '</userinput>' ]] ; #scenario 1
        then
                        printf "%s\n" "${filearray[linecounter]}"
        elif [[ "${filearray[linecounter]}" =~ '<userinput>' ]] && \
             [[ ! "${filearray[linecounter]}" =~ '</userinput>' ]] ; #scenario 2
        then
                        printf "%s\n" "${filearray[linecounter]}"
                        let "collectSTART= linecounter + 1"
        elif [[ ! "${filearray[linecounter]}" =~ '<userinput>' ]] && \
             [[  "${filearray[linecounter]}" =~ '</userinput>' ]] ; #scenario 3
        then
        let "collectSTOP=linecounter"
        let "linecounter++"
        if [ $((collectSTOP - collectSTART)) -ge 1 ] ;
        then
        for ((addthis=$collectSTART; addthis < $collectSTOP; addthis++));
        do
                printf "%s\n" "${filearray[addthis]}"
        done
        fi
                        printf "%s\n" "${filearray[linecounter]}"
                        let "collectSTOP=0"
                        let "collectSTART=0"
        fi
        done
        filearray=();
        linecounter=0
        done
        cd ..
done


#--------------------------------ENDS HERE------------------------------------
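
To try it out, something along these lines should do (the file name booster.sh
and the path are only examples):

cd /path/to/the/lfs-book/xml/sources    # the directory holding the chapterNN/ subdirectories
bash booster.sh > extracted-userinput.txt
wc -c extracted-userinput.txt           # compare with the ~60 KB figure from point 3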


