"Paul J. Lucas" <[EMAIL PROTECTED]> writes:
> On Fri, 28 Jul 2000, Gerald Richter wrote:
>
> > As far as I understand you you use mmap to read in the source file, is this
> > correct?
>
> Yes.
>
> > If this is true, then it will not make much difference, because reading in
> > the source is only a very small piece of all the time that it takes to
> > generate the output from a dynamic page.
>
> I suggest you do some benchmarks. I have, albeit many months
> ago. If I recall correctly, I took Yahoo's home page and ran
> it through my HTML Tree and that of Gisle Aas: HTML was about
> 7-8 times faster.

That does not show that mmap is superior.

I have not been able to build your module on my system, so I have not
been able to set up any benchmarks myself. My guess is that you
compared the parsing speed of your module with that of the
HTML::TreeBuilder module? Could you please say exactly which versions
of the modules you compared? Perhaps also post the benchmark code you
used. Also tell me which HTML::Parser you had installed. Did you use
HTML-Parser-3?

If your module is that much faster than the basic HTML::Parser then I
must be doing something very wrong.

> > My point was, that the C implementation of parsing and DOM tree
> > storage/caching, is much faster (uses much less memory) than doing the same
> > in Perl.
>
> ...and faster still with mmap(2).

I don't believe mmap buys you anything significant. And it has the
drawback that you can't parse from pipes or sockets.

I made this little test program. I cannot measure mmap being any
faster than fread on my system. I am testing with Yahoo's home page.
Do you get different numbers?

Regards,
Gisle
----------------------------------------------------------->8------------
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>   /* strcmp */

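/* Checksum a file by mapping it into memory and walking the mapping. */
void
with_mmap(char *file)
{
    struct stat sbuf;
    int fd;
    void* area;
    char* c;
    size_t size;
    unsigned int checksum;

    fd = open(file, O_RDONLY);
    if (fd < 0 || fstat(fd, &sbuf) < 0) {
        perror(file);
        if (fd >= 0)
            close(fd);
        return;
    }
    size = sbuf.st_size;

    area = mmap(0, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (area == MAP_FAILED) {
        perror(file);
        return;
    }

    /* read the mapped area */
    checksum = 0;
    c = area;
    while (size--) {
        checksum += *c++;
    }

    munmap(area, sbuf.st_size);
    printf("%s: sum=%x\n", file, checksum);
}
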
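/* Checksum a file by reading it through stdio in 32 KB chunks. */
void
with_fread(char *file)
{
    FILE* f = fopen(file, "r");
    char buf[32*1024];
    unsigned int checksum;
    size_t n;

    if (f == NULL) {
        perror(file);
        return;
    }

    checksum = 0;
    while ( (n = fread(buf, 1, sizeof(buf), f))) {
        char *c = buf;
        size_t n_orig = n;
        while (n--)
            checksum += *c++;
        /* a short read means end of file (or an error): skip the extra fread */
        if (n_orig < sizeof(buf))
            break;
    }

    fclose(f);
    printf("%s: sum=%x\n", file, checksum);
}
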
/* Usage: prog mmap|fread file... */
int
main(int argc, char** argv)
{
    int i;
    void (*f)(char*);

    if (argc <= 1) {
        fprintf(stderr, "Missing type (expected 'mmap' or 'fread')\n");
        return 1;
    }

    /* pick the reader to exercise */
    if (strcmp(argv[1], "mmap") == 0)
        f = with_mmap;
    else if (strcmp(argv[1], "fread") == 0)
        f = with_fread;
    else {
        fprintf(stderr, "Bad type '%s'\n", argv[1]);
        return 1;
    }

    /* checksum every file named on the command line */
    for (i = 2; i < argc; i++) {
        f(argv[i]);
    }
    return 0;
}
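
The remark above that mmap cannot be used on pipes or sockets can be
checked with a minimal sketch like the following, assuming a POSIX
system. The helper names try_mmap and pipe_is_mappable are made up for
this illustration and are not part of the test program above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to map `len` bytes of `fd` read-only.
 * Returns 0 on success, or the errno value reported by mmap. */
int
try_mmap(int fd, size_t len)
{
    void *area = mmap(0, len, PROT_READ, MAP_SHARED, fd, 0);
    if (area == MAP_FAILED)
        return errno;
    munmap(area, len);
    return 0;
}

/* Hypothetical helper: returns 1 if the read end of a fresh pipe can be
 * mapped, 0 if mmap refuses it, and -1 if the pipe could not be set up. */
int
pipe_is_mappable(void)
{
    int fds[2];
    int err;

    if (pipe(fds) < 0)
        return -1;

    /* put a few bytes in the pipe so there is data a parser would want */
    if (write(fds[1], "<html>", 6) != 6) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    err = try_mmap(fds[0], 6);

    close(fds[0]);
    close(fds[1]);
    return err == 0;
}
```

Calling pipe_is_mappable() from a small main() and printing the result
should show the mapping being refused, whereas read(2) or fread(3) on
the same descriptor would return the six bytes. The exact errno value
may differ between systems, so the sketch only reports success or
failure.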