Dear Friends,

An interim post on what happens under the hood here...

An operator to be added to the "parsing script" is the =~ operator.
The script below uses =~ to evaluate text from the xml files in raw mode,
whenever the <userinput> </userinput> tags are met as plain strings in a
given xml file. The way the script is written, it could even be extended to produce a single valid XML file dumped from ALL the xml files of the LFS book, which could then be fed to the x2sh parser for regular processing, boosting the performance of the x2sh parser _several_ times.
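
For those who have not used it before, here is a minimal sketch of what the =~
test does in this context (the sample line is made up; with a quoted right-hand
side, bash 3.2 and later treat it as a plain substring test rather than as a
regular expression):

line='<screen><userinput>make install</userinput></screen>'
if [[ "$line" =~ '<userinput>' ]] && [[ "$line" =~ '</userinput>' ]]
then
        echo "inline <userinput> ... </userinput> found"
fi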

Of note:

1. I have restructured and debugged the x2sh parser script, hopefully cutting
any remaining lockups down to zero. There was a problem with certain characters
like '*' when met in an xml file, which ended in a corrupt result (_ch5_ glibc, for example). This has been taken care of and it now works correctly (see the quoting sketch after this list).

2. I made a script based on the x2sh approach and used it to parse the entire
xml source of the LFS book, and I checked the output thoroughly. Parsing the
whole of the XML sources without optimization takes about _30 - 40_ min under
normal working load on my boxes. That is parsing only, with the resulting bash
array exceeding 40000 entries; dumped to disk, that array is a nearly 750 KB file (big). A sketch of such an array dump follows this list.

3. When the script below is used to parse the entire chapter collection of the
book in xml, the total duration is nearly 40 _sec_ and the resulting dump to
disk nearly 60 KB, _UNPARSED_. With parsing, this "dump" is even smaller, while
the total time for extracting, dereferencing and redumping to a pure script file
should be about the time x2sh would take to parse a 50 - 70 KB complex, valid
and well-formed XML file (and probably even less; this is a rough estimate,
of course!). Also note that the hardcoded <userinput> </userinput> part can become "uninformed" as well, i.e. not bound to one specific tag (see the sketch after this list).

4. All of the above eventually lead to a structure capable of handling the totality of the books in a reasonable amount of time for a pure bash-based script: by lowering the number of array entries and the processing time demand, it significantly reduces the effect of O(n)-like phenomena where these are inevitably met. Note that going from 40 min to 40 sec of parsing is not just a 10x decrease in execution time, it is (40 x 60) / 40 = _60x!_
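
Regarding point 1, the kind of trouble a literal '*' can cause in bash (not
necessarily the exact bug that was in x2sh, just an illustration of the general
pitfall) is an unquoted expansion being glob-expanded against the current
directory:

line='rm -fv *.la'        # a command line as it might appear inside <userinput>
echo $line                # unquoted: '*.la' may be glob-expanded against the cwd
echo "$line"              # quoted: the line comes out verbatim
printf "%s\n" "$line"     # the form used throughout the script below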
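
Regarding point 2, for anyone who wants to reproduce the dump-to-disk figure,
one way of serialising a bash array (not necessarily how x2sh does it, and the
file name is only an example) is:

declare -a result=( "first entry" "second entry" )
declare -p result > result.dump    # write the array definition to disk
# later, possibly from another shell:
source ./result.dump               # restores the 'result' array
printf "%s\n" "${result[@]}"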
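
Regarding point 3, making the tag "uninformed" can be as simple as taking the
tag name from a variable instead of hardcoding it; a rough sketch (the variable
names are only illustrative, this is not the released x2sh code):

tag="userinput"            # could come from a command line argument
opentag="<${tag}>"
closetag="</${tag}>"
line='<screen><userinput>make check</userinput></screen>'
[[ "$line" =~ "${opentag}" ]] && [[ "$line" =~ "${closetag}" ]] && printf "%s\n" "$line"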

The current debugged "uninformed" version of x2sh I have is fused with a script just like the one below, for the version to be released this weekend! Run the script below in the root of your LFS book sources and see what happens :) (a short usage example follows the script).

Cheers,

George Makrydakis

PS: feedback / testing results are _always_ appreciated. Thank you for hosting my posts.

#--------------------------------CUT FROM HERE---------------------------------

#!/bin/bash

# script: XML pseudoparsing booster

declare -a filearray
declare -a filestore
declare -i linecounter=0
declare -i collectSTART=0
declare -i collectSTOP=0

declare -a chapterlist=(chapter01 \
                        chapter02 \
                        chapter03 \
                        chapter04 \
                        chapter05 \
                        chapter06 \
                        chapter07 \
                        chapter09);

for selectchapter in "${chapterlist[@]}"
do
        cd "$selectchapter" || continue
        echo "$selectchapter"
        for filenameinput in *.xml
        do
                echo "-------------------------------------------------"
                echo "x2sh:parsing file: $filenameinput"
                echo "-------------------------------------------------"
        # read the whole file into filearray, one line per entry
        while IFS= read -r filearray[linecounter]
        do
                let "linecounter++"
        done < "$filenameinput"

# case scenarios:
# <userinput> ... </userinput> inline definition
# <userinput> opens a span over multiple lines
# </userinput> ends the span
#

        for ((linecounter=0; linecounter < ${#filearray[@]}; linecounter++));
        do
        if [[ "${filearray[linecounter]}" =~ '<userinput>' ]] && \
           [[ "${filearray[linecounter]}" =~ '</userinput>' ]] ; #scenario 1
        then
                        printf "%s\n" "${filearray[linecounter]}"
        elif [[ "${filearray[linecounter]}" =~ '<userinput>' ]] && \
             [[ ! "${filearray[linecounter]}" =~ '</userinput>' ]] ; #scenario 2
        then
                        printf "%s\n" "${filearray[linecounter]}"
                        let "collectSTART= linecounter + 1"
        elif [[ ! "${filearray[linecounter]}" =~ '<userinput>' ]] && \
             [[  "${filearray[linecounter]}" =~ '</userinput>' ]] ; #scenario 3
        then
        let "collectSTOP=linecounter"
        let "linecounter++"
        if [ $((collectSTOP - collectSTART)) -ge 1 ] ;
        then
        for ((addthis=$collectSTART; addthis < $collectSTOP; addthis++));
        do
                printf "%s\n" "${filearray[addthis]}"
        done
        fi
                        printf "%s\n" "${filearray[linecounter]}"
                        let "collectSTOP=0"
                        let "collectSTART=0"
        fi
        done
        filearray=();
        linecounter=0
        done
        cd ..
done


#--------------------------------ENDS HERE------------------------------------
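
To try it out, something along these lines should do (the file name booster.sh
and the path are only examples):

cd /path/to/the/lfs-book/xml/sources    # the directory holding the chapterNN/ subdirectories
bash booster.sh > extracted-userinput.txt
wc -c extracted-userinput.txt           # compare with the ~60 KB figure from point 3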


