Hi Joel,

I couldn't run your benchmark (a "not enough memory" error occurred),
so I modified it a bit to read the file line by line:

Rebol []

file: home/passwords.txt

shells: copy make block! 5
countshell: func [sh [string!] /local shr] [
    either none? shr: find shells sh [
        append shells reduce [sh 1]
    ][
        change shr: next shr 1 + shr/1
    ]
]

t1: now/time

p: open/read/lines/direct file

while [not error? try [line: first p]] [
    countshell any [pick parse/all line ":" 7  "(none)"]
]

foreach [sh n] sort/skip shells 2 [
    n: to-string n
    while [3 > length? n] [insert n " "]
    print [n sh]
]

prin "Time1: " print now/time - t1
close p

The results were:

>> include/mult %smallad.r
Script: "Untitled" (none)
196608 (none)
98304 /bin/bash
16384 /bin/false
16384 /bin/sync
16384 /sbin/halt
16384 /sbin/shutdown
Time1: 0:00:11

Here is my second attempt, which works with the original file:

REBOL []

include %ads.r

file: home/smallad.txt

shells: make-ads
countshell: func [sh [string!] /local shr] [
    associate shells sh 1 + associated/or shells sh [0]
]

p: open/read/lines/direct file
while [not error? try [line: first p]] [
    countshell any [pick parse/all line ":" 7  "(none)"]
]

shells2: sort to block! first shells

foreach sh shells2 [
    n: associated shells sh
    n: to-string n
    while [3 > length? n] [insert n " "]
    print [n sh]
]

close p

The results are:

 12 (none)
  6 /bin/bash
  1 /bin/false
  1 /bin/sync
  1 /sbin/halt
  1 /sbin/shutdown

That is for the small input file. The problem is that for the huge file
(%passwords.txt) an error (a GC fault?) occurs. Would anybody have time to
test it?

Regards
    Ladislav

Here is %ads.r:

Hi Rebols,

here is my attempt to satisfy
those who don't like objects for an ADS implementation,
those who would like to have a fast searching capability,
those who would like to store any-type Rebol values, and
those who want to use only strings for keys.

Rebol [
    Title: "ADS"
    Name: 'Ads
    File: %Ads.r
    Author: "Ladislav Mecir"
    Email: [EMAIL PROTECTED]
    Date: 19/September/2000
]

make-ads: does [reduce [make hash! 0 make block! 0]]

associate: function [
    ads [block!]
    key [string!]
    value [any-type!]
] [index] [
    either index: find first ads key [
        index: index? index
        change at second ads index head insert/only copy [] get/any 'value
    ] [
        insert tail first ads key
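        ; wrap the value in a block (as in the change branch) so that a
        ; block! value is stored as one element instead of being spliced
        insert tail second ads head insert/only copy [] get/any 'value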
    ]
    ads
]

deassociate: function [
    ads [block!]
    key [string!]
] [index] [
    if index: find first ads key [
        index: index? index
        remove at first ads index
        remove at second ads index
    ]
    ads
]

associated: function [
    ads [block!]
    key [string!]
    /or
    or-block [block!]
] [index] [
    either index: find first ads key [
        return pick second ads index? index
    ] [
        if or [do or-block]
    ]
]
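
A minimal usage sketch of the functions above, assuming the listing has been saved as %ads.r in the current directory and is loaded with plain `do` instead of my `include`. The name `counts` and the sample keys are only illustrative.

```rebol
REBOL []

; Assumption: the listing above is saved as %ads.r in the current directory.
do %ads.r

counts: make-ads                           ; a block: [hash-of-keys block-of-values]

associate counts "/bin/bash" 1             ; first sighting: store 1
associate counts "/bin/bash" 1 + associated counts "/bin/bash"   ; bump to 2

print associated counts "/bin/bash"        ; 2
print associated/or counts "/bin/csh" [0]  ; missing key, so the /or block yields 0

deassociate counts "/bin/bash"
print none? associated counts "/bin/bash"  ; true: the key is gone
```

Keys live in the hash! and values in a parallel block!, so a lookup is a `find` on the hash followed by a `pick` at the same index.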

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, September 18, 2000 6:16 AM
Subject: [REBOL] Small admin/report benchmark


> Here's a small benchmark based on a fairly typical kind
> of sysadmin, file processing, or data reduction task.
> The timing results are given after the description of
> the problem and the code I used for timing.
>
> I'm including a sample data file and output so that
> anyone who wants to improve on my code can test his/her
> solution with the same data.
>
>
> Problem:
>
> Read through the colon-delimited file below (a copy of
> an /etc/passwd file, mangled for security purposes),
> and print a report tallying the distinct values in the
> 7th field (in this case, the field that identifies the
> default shell for each userID).  Sort the results by
> the value of the field being tallied, and print the
> results in neat columns.
>
> ========== begin sample data file ==========
> ayxa:a:824:277:Zmgxoy "Uucl" Tmaam:/ibmg/rsrn:/bin/bash
> ciyp:x:8:72:jksi:/zfp/qnffy/drgn:
> grnfk:p:115:383:SNMKW hwxyzry:/evfk/tsmgt:/bin/bash
> guqkvtwn:o:2:2:blvbnjsg:/lbld:/sbin/shutdown
> kpsbwt:z:85:98:frwbqf:/zeu/bml/egtyin-mexi:
> kst:y:1:3:gsi:/opj:
> kvik:f:3:9:clnd:/bfje:/bin/sync
> lw:b:518:941:Nxpn "VAXVAzjzw" Muhxp:/esaq/jy:/bin/bash
> mmlmgxep:g:69:4:lnuytrat:/vvot:
> nnd:g:46:89:WQT Pcpu:/shrs/vzq:
> ospax:t:707:92:Lpojk bro Gokzfe:/qaqf/gaedx:/bin/bash
> pbubex:v:7:9:eworcq:/khuc:
> qpdd:l:39:19:qckl:/omg/clghd/szdr:
> rkuu:p:2:1:rusj:/ltnm:/sbin/halt
> sft:q:045:235:H Buya Gtdtre:/stj/W75/ot:/bin/false
> srimrg:c:19:41:Ybltzu:/:
> vgonz:n:86:051:oqufv:/hgo/awkxv:
> vqn:n:7:0:ant:/npy/hpm:
> wbci:s:2:5:pyiy:/xngl:/bin/bash
> wlzr:o:7:00:rqwe:/mxy/rkchz/lsfu:
> xao:c:21:50::/fgqu/orw:/bin/bash
> yf:d:1:6:ay:/xzb/njwes/kvd:
> =========== end sample data file ===========
>
> ========= begin sample output list =========
>  12 (none)
>   6 /bin/bash
>   1 /bin/false
>   1 /bin/sync
>   1 /sbin/halt
>   1 /sbin/shutdown
> ========== end sample output list ==========
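
The field extraction at the heart of the task can be tried on single lines of the sample data. This is only a sketch: `shell-of` is a hypothetical helper name, not part of either script, and it wraps the same `parse/all` expression that the REBOL script further down uses.

```rebol
REBOL []

; Hypothetical helper (not in the original scripts): returns the 7th
; colon-delimited field of a passwd-style line, or "(none)" when the
; line has no 7th field.
shell-of: func [line [string!]] [
    any [pick parse/all line ":" 7  "(none)"]
]

print shell-of "kvik:f:3:9:clnd:/bfje:/bin/sync"
print shell-of "kst:y:1:3:gsi:/opj:"
```

If `parse/all` behaves as the sample output implies, the first line should give /bin/sync and the second should fall through to (none), since the trailing empty field yields no 7th element.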
>
>
> Perl solution:
>
> A fairly typical Perl script to perform this task is
> given below.  I don't claim any particular brilliance
> here, but it does use some fairly Perlish idioms.
>
> ============= begin Perl script =============
> #!/usr/bin/perl -w
>
> my ($line, $shell, %shells) = ("", "");
> open (PW, "<passwords.txt") or die "can't read file\n";
> while ($line = <PW>) {
>     ($shell = (split /:/, $line, 7) [6]) =~ s/\s+$//;
>     ++$shells{$shell or "(none)"};
> }
> close (PW);
>
> foreach $shell (sort keys %shells) {
>     printf "%3d %s\n", $shells{$shell}, $shell;
> }
> ============== end Perl script ==============
>
>
> REBOL solution:
>
> My effort at producing a comparable REBOL script is
> given next.  I tried to use appropriate REBOLish
> idioms to accomplish equivalent results.
>
> ============ begin REBOL script ============
> #!/usr/local/bin/rebol -sq
>
> REBOL []
>
> shells: copy make block! 5
> countshell: func [sh [string!] /local shr] [
>     either none? shr: find shells sh [
>         append shells reduce [sh 1]
>     ][
>         change shr: next shr 1 + shr/1
>     ]
> ]
>
> foreach line read/lines %passwords.txt [
>     countshell any [pick parse/all line ":" 7  "(none)"]
> ]
>
> foreach [sh n] sort/skip shells 2 [
>     n: to-string n
>     while [3 > length? n] [insert n " "]
>     print [n sh]
> ]
> ============= end REBOL script =============
>
>
> Remarks:
>
> Note in particular the need in REBOL for:
>
> 1)  A function (or other hand-written code) to handle
>     the separate cases of updating a counter for a
>     key value already present versus initializing a counter
>     for the first time a key is encountered.
>
> 2)  The slightly awkward phrase that updates the
>     counter (the "change shr: ..." line).  Can anyone
>     suggest a tidier way to do this?  Bear in mind that
>     the keys are strings coming from a data file, so we
>     don't know in advance what values may occur.
>
> 3)  The explicit code to pad the numeric value of the
>     counter with leading spaces (the "while..." inside
>     the last "foreach ...").  Again, any suggestions for
>     improvement are welcome.
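
One possible answer to remarks 2) and 3), offered as a sketch and not as the definitive idiom. It assumes a REBOL/Core 2.x interpreter where set-paths with integer indexes work on blocks. `pad-left` is a hypothetical helper name, and the three sample keys are made up for the demonstration.

```rebol
REBOL []

shells: make block! 5

; Remark 2: find/skip treats the block as [key count] records, and the
; set-path shr/2: updates the count in place.
countshell: func [sh [string!] /local shr] [
    either shr: find/skip shells sh 2 [
        shr/2: shr/2 + 1
    ][
        append shells reduce [sh 1]
    ]
]

; Remark 3: hypothetical helper that left-pads a value with spaces
; up to the given width (wider values are returned unchanged).
pad-left: func [value width [integer!] /local s] [
    s: to-string value
    head insert/dup s " " max 0 width - length? s
]

foreach sh ["/bin/bash" "(none)" "/bin/bash"] [countshell sh]
foreach [sh n] sort/skip shells 2 [print [pad-left n 3 sh]]
```

The hand-written two-case function of remark 1) is still there. Only the update phrase and the padding loop get shorter.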
>
>
> Benchmark results:
>
> Both scripts were run from the command line using the
> "time" command to accumulate statistics.  In order to
> scale the run times up to expose significant differences,
> I concatenated 16k copies of the above sample data file
> into a single data file of 360,448 lines (13,287,424
> bytes).  The output from the benchmark runs (with lines
> slightly rewrapped to fit in email) follows below:
>
> =========== begin benchmark output ===========
> $ time ./nsh.pl
> 196608 (none)
> 98304 /bin/bash
> 16384 /bin/false
> 16384 /bin/sync
> 16384 /sbin/halt
> 16384 /sbin/shutdown
> 34.98user 0.18system 0:35.22elapsed 99%CPU
> (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (259major+48minor)pagefaults 0swaps
> $ time ./nsh.r
> include done
> 196608 (none)
> 98304 /bin/bash
> 16384 /bin/false
> 16384 /bin/sync
> 16384 /sbin/halt
> 16384 /sbin/shutdown
> 70.28user 3.85system 1:17.72elapsed 95%CPU
> (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (4911major+13977minor)pagefaults 3045swaps
> ============ end benchmark output ============
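
For anyone who wants to rebuild the big input file, here is a sketch of the concatenation step. The file names are assumptions: %smallad.txt is taken to hold the 22-line sample and %passwords.txt to receive the result. 16384 is the "16k" from the description, which matches the counts in the output above.

```rebol
REBOL []

; Assumed file names: %smallad.txt holds the 22-line sample,
; %passwords.txt receives the concatenated benchmark input.
sample: read %smallad.txt          ; the sample must end with a newline
write %passwords.txt ""            ; start from an empty file
loop 16384 [write/append %passwords.txt sample]
```

Appending to the file on each pass is slow, but it avoids holding the whole 13 MB string in memory at once.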
>
>
> Interpretation:
>
> 1)  The Perl version was approximately twice as fast
>     as the REBOL version -- not too bad, considering
>     Perl's reputation for a mature, reasonably well-
>     optimized interpreter.  However...
>
> 2)  Note the CONSIDERABLY larger number of page faults.
>     This led me to wonder how much of the total run
>     time for REBOL was due to the fact that the entire file
>     was slurped into memory at once, instead of being dealt
>     with a line at a time.  OTOH, isn't that how most of us
>     would code up a small QAD task similar to this?
>
> (If anyone wants to code up a buffered version, I'll be
> glad to rerun the timings for all three versions.)
>
> I reran the test with  top  going in another terminal
> window, and saw that the Perl version ran in about 1/20th
> the memory of the REBOL version.  Both nearly saturated
> the CPU, with Perl slightly higher, in both the original
> benchmark run given above and the second run (times not
> reported due to the degradation imposed by running  top
> concurrently with the processing).
>
> Comments welcome.
>
> -jn-
>
>
