-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

        I've been debugging the Python parser a bit more, using some local
files for a medical reference guide that a user referred me to. There are
26522 plain .html files, I'm not doing anything exotic with them, and all
of them reference local files. --maxdepth is set so that only those local
files are included. The process has been running for 3 days on an extremely
fast box with lots of memory, and it has only reached...

        ---- 6924 collected, 17293 to do ----
        Processing file:/tmp/biam/biam/Spe105.html...
          Retrieved ok.
          Parsed ok.
        ---- 6925 collected, 17292 to do ----
        Processing file:/tmp/biam/biam/Spe2586.html...
          Retrieved ok.
          Parsed ok.

        ..so far. I just attached gdb to the process, and found that it is
sitting on the following:

        #0  0x080bbf10 in PyList_SetItem ()
        #1  0x080bdab7 in PyList_AsTuple ()
        #2  0x08078026 in PyEval_CallObjectWithKeywords ()

        At this point, it parses 1 page every 5-6 hours or so. In the
beginning, it ripped through the first few thousand pages no problem, but as
the hours wore on, it slowed down considerably.

        I just cancelled it, and will be installing oprofile and trace on
the box to see if I can get deeper into the reasons why this is happening.
The Python process is currently using 995 megs of RAM and 396 megs of swap,
but the processor load is well under 1%.
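
        High memory use with almost no processor load usually means a
process is waiting rather than computing. A minimal sketch of how that could
be checked per page from inside Python follows. It assumes a modern Python 3
interpreter (not the 2.1-era one discussed here), and fake_parse is a
hypothetical stand-in for the real per-page parse step, not a function from
the Plucker parser.

```python
import time


def timed_call(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def fake_parse(url):
    # Hypothetical stand-in for parsing one page.
    # sleep() models waiting (swap, disk): it adds wall time but no CPU time.
    time.sleep(0.05)
    return len(url)


if __name__ == "__main__":
    url = "file:/tmp/biam/biam/Spe105.html"
    result, wall, cpu = timed_call(fake_parse, url)
    print("wall %.3fs, cpu %.3fs" % (wall, cpu))
```

If wall time per page keeps climbing while CPU time stays flat, the time is
going to waiting (most likely swap) and not to the parsing code itself.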

        Once the new Plucker website is complete and launched (a week or two
more, rough estimate), I'm diving back into the Perl parser full-time.
There are some places where Perl's string, text, and associative array
handling may be a bit more robust than the current (2.1) Python stuff.
Note: This is not a dig at the Python parser or the work Bill and Holger
have done in writing it. I just can't debug these problems because the
language is very unfamiliar to me.

        Bill, can you point me to some places to peek/poke some code to try
to expose this a bit more?


d.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8+7kmkRQERnB1rkoRApgYAJ9oZbIq5rTXNZfIFBWfSy34K7jizQCcCKqb
hbmoYUnichBsGWuWmIilECA=
=goks
-----END PGP SIGNATURE-----
