"Paul J. Lucas" <[EMAIL PROTECTED]> writes:

> On Fri, 28 Jul 2000, Gerald Richter wrote:
> 
> > As far as I understand you, you use mmap to read in the source file; is this
> > correct?
> 
>       Yes.
> 
> > If this is true, then it will not make much difference, because reading in
> > the source is only a very small piece of all the time that it takes to
> > generate the output from a dynamic page.
> 
>       I suggest you do some benchmarks.  I have, albeit many months
>       ago.  If I recall correctly, I took Yahoo's home page and ran
>       it through my HTML Tree and that of Gisle Aas: HTML was about
>       7-8 times faster.

That does not show that mmap is superior.

I have not been able to build your module on my system, so I have not
been able to set up any benchmarks myself.  My guess is that you
compared your module's parsing speed with that of the HTML::TreeBuilder
module?  Could you please say exactly which versions of the modules
you compared?  Perhaps also post the benchmark code you used.  Also
tell me which version of HTML::Parser you had installed.  Did you use
HTML-Parser-3?

If your module is that much faster than the basic HTML::Parser then I
must be doing something very wrong.

> > My point was that the C implementation of parsing and DOM tree
> > storage/caching is much faster (and uses much less memory) than doing the same
> > in Perl.
> 
>       ...and faster still with mmap(2).

I don't believe mmap buys you anything significant.  And it has the
drawback that you can't parse from pipes or sockets.
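To make the pipe/socket point concrete: mmap(2) only works on regular
files, so a parser that accepts arbitrary descriptors needs a read(2)
path anyway.  A minimal sketch, assuming a POSIX system; `checksum_fd`
and `sum_bytes` are hypothetical names, not code from either module:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Add the bytes p[0..n-1] to a running sum. */
static unsigned int sum_bytes(const unsigned char *p, size_t n, unsigned int sum)
{
    while (n--)
        sum += *p++;
    return sum;
}

/* Hypothetical helper: byte-sum everything readable from FD.
 * Regular, non-empty files are mapped with mmap(2) from offset 0.
 * Pipes, sockets and other non-regular descriptors cannot be mapped,
 * so they are consumed with read(2) until end of file instead.
 * Returns 0 on success, -1 on error. */
int checksum_fd(int fd, unsigned int *out)
{
    struct stat sb;
    unsigned int sum = 0;
    unsigned char buf[32 * 1024];
    ssize_t n;

    if (fstat(fd, &sb) == -1)
        return -1;

    if (S_ISREG(sb.st_mode) && sb.st_size > 0) {
        size_t len = (size_t)sb.st_size;
        void *area = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

        if (area == MAP_FAILED)
            return -1;
        sum = sum_bytes(area, len, 0);
        munmap(area, len);
        *out = sum;
        return 0;
    }

    /* Non-regular (or empty) descriptor: stream it with read(2). */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        sum = sum_bytes(buf, (size_t)n, sum);
    if (n == -1)
        return -1;
    *out = sum;
    return 0;
}
```

Note that the mmap branch always starts at offset 0, while the read(2)
branch starts wherever the descriptor currently is; a real parser would
have to settle on one convention.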

I made this little test program.  I have not been able to measure mmap
as any faster than fread on my system.  I am testing with Yahoo's home
page.  Do you get different numbers?

Regards,
Gisle

----------------------------------------------------------->8------------
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>     /* strcmp() */

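/* Sum every byte of FILE by mapping it into memory with mmap(2). */
void
with_mmap(char *file)
{
    struct stat sbuf;
    int fd;
    void *area;
    const unsigned char *c;   /* unsigned, so the sum is the same on every platform */
    size_t size;              /* st_size can exceed INT_MAX */
    unsigned int checksum;

    fd = open(file, O_RDONLY);
    if (fd == -1 || fstat(fd, &sbuf) == -1) {
        perror(file);
        if (fd != -1)
            close(fd);
        return;
    }
    size = (size_t)sbuf.st_size;

    checksum = 0;
    if (size > 0) {           /* mmap(2) rejects zero-length mappings */
        area = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
        if (area == MAP_FAILED) {
            perror(file);
            close(fd);
            return;
        }

        /* read the mapped area */
        c = area;
        while (size--) {
            checksum += *c++;
        }
        munmap(area, (size_t)sbuf.st_size);
    }
    close(fd);                /* release the descriptor */

    printf("%s: sum=%x\n", file, checksum);
}

/* Sum every byte of FILE by reading it through stdio in 32 KB chunks. */
void
with_fread(char *file)
{
    FILE *f = fopen(file, "rb");
    unsigned char buf[32*1024];
    unsigned int checksum;
    size_t n;

    if (f == NULL) {
        perror(file);
        return;
    }

    checksum = 0;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
        const unsigned char *c = buf;
        size_t n_orig = n;
        while (n--)
            checksum += *c++;
        if (n_orig < sizeof(buf))   /* short read: end of file (or error) */
            break;
    }
    fclose(f);

    printf("%s: sum=%x\n", file, checksum);
}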



int
main(int argc, char** argv)
{
    int i;
    void (*f)(char*);

    if (argc <= 1) {
        fprintf(stderr, "Missing type\n");
        return -1;
    }

    if (strcmp(argv[1], "mmap") == 0)
        f = with_mmap;
    else if (strcmp(argv[1], "fread") == 0)
        f = with_fread;
    else {
        fprintf(stderr, "Bad type '%s'\n", argv[1]);
        return -1;
    }

    for (i = 2; i < argc; i++) {
        f(argv[i]);
    }
    return 0;
}
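The program above only prints checksums, so presumably the timing comes
from running it under time(1).  With a page the size of Yahoo's, a
single pass is likely swamped by process startup, and after the first
run the file sits in the page cache for both methods anyway.  A sketch
of an in-process harness that repeats each method many times, assuming
POSIX clock_gettime(CLOCK_MONOTONIC) is available; `sum_mmap`,
`sum_fread` and `time_method` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: byte-sum PATH via mmap(2). Returns 0 or -1. */
static int sum_mmap(const char *path, unsigned int *out)
{
    struct stat sb;
    unsigned int sum = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return -1;
    }
    if (sb.st_size > 0) {                 /* zero-length maps are invalid */
        size_t len = (size_t)sb.st_size;
        void *area = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (area == MAP_FAILED) {
            close(fd);
            return -1;
        }
        const unsigned char *p = area;
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        munmap(area, len);
    }
    close(fd);
    *out = sum;
    return 0;
}

/* Hypothetical helper: byte-sum PATH via fread(3) in 32 KB chunks. */
static int sum_fread(const char *path, unsigned int *out)
{
    unsigned char buf[32 * 1024];
    unsigned int sum = 0;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *out = sum;
    return 0;
}

/* Run FN on PATH REPS times; return elapsed wall-clock seconds, or -1.0. */
static double time_method(int (*fn)(const char *, unsigned int *),
                          const char *path, int reps, unsigned int *sum)
{
    struct timespec t0, t1;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) == -1)
        return -1.0;
    for (int r = 0; r < reps; r++)
        if (fn(path, sum) == -1)
            return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t1) == -1)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_method(sum_mmap, path, 10000, &s) and
time_method(sum_fread, path, 10000, &s) on a saved copy of the page
gives two totals to compare; the two sums should also come out equal,
which confirms both paths read the same bytes.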
