Thanks for your help. We're parsing on a Windows XP box. The strange thing is that
there are no errors in the log file. The error only appears in POSE (also on a
physical Tungsten T) when I click on a link that points to any of the pages that were
plucked last in the process (Unhandled Exception - Error Code = 514). I'd love to
send you the zipped files, but the content is proprietary/sensitive to my employer, so
I'll have to clear that first.
We may also try parsing on a Sun Solaris box so we can make sure the resources aren't
an issue. Our production environment will utilize that box anyway, most likely.
Thanks again,
Steve
>>> [EMAIL PROTECTED] 09/25/03 12:02PM >>>
> Our site is very large (the number of files parsed is 16,492), so my
> question is, is it possible that there is some limit in the number of
> pages plucked that would be causing this error?
I don't think there is an upper limit with the Python distiller. I
know that Mike, Bill Janssen and I have all plucked multiple-thousand-page
sites in the past without any issues that I can recall. I was doing it to
see how large we could get the content before it actually breaks and to see
if there were any problems with sites that large. Someone else a few months
back reported a similar problem, and it ended up that his machine ran out of
resources to parse all of the pages he was spidering. I think the largest
number of individual pages I've personally plucked is somewhere in the 20k
range. It takes forever to spider them all, but it does actually work.
I doubt that there's an upper limit on the physical number of pages
you can pluck, and certainly not 16,000 of them. Maybe in the 200,000 page
range there might be problems with the amount of RAM and CPU required to
manage all of those links (just a wild assertion), but I don't think your
website is reaching any upper limits.
Would it be possible to have those pages zipped up and sent to me
directly, so I can give it a try here on my machine(s)? It would be good to
validate it against two independent installs of the Python distiller
codebase.
Also, what platform are you parsing these on? Mac? Windows? Linux?
d.
_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list