[issue9958] (c)elementTree missing children
New submission from Valentin Kuznetsov:

Hi,
I found that parsing an XML file whose elements all have an identical structure leads to a missing children item at some point. In my attached test case it happens at id=183. Basically, I have XML with a bunch of elements of the following structure:

When I parse them recursively, all except the one with id=183 are parsed identically. The one with id=183 does not contain children.

I tried the code with Python 2.6 and Python 2.7 on Mac and Linux. The bug exists in both ElementTree and cElementTree. The code and testbed XML are attached. Please untar and run as python test_bug.py.

Thanks,
Valentin

--
files: em_bug.tar
messages: 117440
nosy: vkuznet
priority: normal
severity: normal
status: open
title: (c)elementTree missing children
type: behavior
versions: Python 2.6, Python 2.7
Added file: http://bugs.python.org/file19028/em_bug.tar

___
Python tracker <http://bugs.python.org/issue9958>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
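For readers without the attachment, here is a minimal sketch of the kind of recursive walk the report describes. The sample XML, the tag names and the to_dict helper are all hypothetical (the contents of em_bug.tar are not reproduced in this message), and the sketch uses the Python 3 spelling of the module rather than the 2.6/2.7 cElementTree import.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real testbed XML is in em_bug.tar.
SAMPLE = """<root>
  <item id="182"><child name="a"/><child name="b"/></item>
  <item id="183"><child name="c"/><child name="d"/></item>
</root>"""


def to_dict(elem):
    """Recursively convert an element and all of its children to a dict."""
    return {
        "tag": elem.tag,
        "attrib": dict(elem.attrib),
        "children": [to_dict(child) for child in elem],
    }


# Parse the whole document first, then walk it, so every element
# already has its complete list of children when it is visited.
root = ET.fromstring(SAMPLE)
items = [to_dict(item) for item in root]

# Map each item id to its number of children to spot an element
# that unexpectedly comes back empty.
counts = {item["attrib"]["id"]: len(item["children"]) for item in items}
print(counts)
```

Comparing the per-id child counts this way is one quick check for whether a single element, such as the one with id=183, really comes back without children.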
[issue7451] improve json decoding performance
Valentin Kuznetsov added the comment:

I wonder if you could make a patch for the Python 2.6 branch.

--
___
Python tracker <http://bugs.python.org/issue7451>
___
[issue6594] json C serializer performance tied to structure depth on some systems
Valentin Kuznetsov added the comment:

I made data local, but adding del shows the same behavior. This is the test:

    import json
    import time

    def test():
        source = open('mangled.json', 'r')
        data = json.load(source)
        source.close()
        del data

    test()
    time.sleep(20)

--
___
Python tracker <http://bugs.python.org/issue6594>
___
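One way to separate "Python still holds the objects" from "the allocator has not returned freed pages to the system" is to look at Python-level allocations instead of resident size. The sketch below assumes a modern interpreter: tracemalloc exists only in Python 3.4 and later, not in the 2.6 discussed here, and the generated document is a small hypothetical stand-in for mangled.json.

```python
import json
import tracemalloc

# Hypothetical stand-in for mangled.json: a small generated document.
doc = json.dumps([{"id": i, "values": list(range(10))} for i in range(10000)])


def load_and_drop(text):
    """Decode the document, then drop the only reference to the result."""
    data = json.loads(text)
    n = len(data)
    del data
    return n


tracemalloc.start()
n = load_and_drop(doc)
# current: bytes still allocated by Python now; peak: maximum since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(n, current, peak)
```

If current falls well below peak while the resident size shown by top stays high, that would suggest the decoded objects were freed and the memory is being retained by the allocator rather than leaked by the json module.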
[issue6594] json C serializer performance tied to structure depth on some systems
Valentin Kuznetsov added the comment:

Nope, none of the three JSON implementations releases the memory. I used your patched one, the one shipped with 2.6, and cjson. The one that comes with 2.6 reaches 2GB, then releases 200MB and stays at 1.8GB during the sleep. cjson reaches the 1.5GB mark and stays there. But all three release another 100-200MB just before exit (one top cycle before the process disappears). I used a sleep of 20 seconds, so I'm pretty sure the memory was not released during that time, since I watched the process while the CPU was idle.

--
___
Python tracker <http://bugs.python.org/issue6594>
___
[issue6594] json C serializer performance tied to structure depth on some systems
Valentin Kuznetsov added the comment:

Antoine,
indeed, both patches improved the time and memory footprint. The latest patch shows only 1.1GB of RAM usage and is very fast. What worries me, though, is that the memory is not released back to the system. Is this the case? I just added time.sleep after json.load and saw that once decoding is done, the resident size still remains the same.

--
___
Python tracker <http://bugs.python.org/issue6594>
___
[issue6594] json C serializer performance tied to structure depth on some systems
Valentin Kuznetsov added the comment:

Oops, that explains why I saw such small memory usage with cjson. I constructed the tests on the fly.

Regarding the data structure: unfortunately, it's out of my hands. The data comes from a data service, so I can't do much and can only report it to the developers.

I'll try your patch tomorrow. Obviously it's a huge gain, both in memory footprint and CPU usage.

Thanks.
Valentin.

--
___
Python tracker <http://bugs.python.org/issue6594>
___
[issue6594] json C serializer performance tied to structure depth on some systems
Valentin Kuznetsov added the comment:

Hi,
I'm sorry for the delay, I was busy. Here is a test data file:
http://www.lns.cornell.edu/~vk/files/mangled.json

Its size is 150MB, 50MB less than the original, because I was forced to scramble the values.

The test with the stock json module in Python 2.6.2 uses 2GB of RAM:

    source = open('mangled.json', 'r')
    data = json.load(source)

Using simplejson 2.0.9 from PyPI I saw the same performance; please note that the _speedups.so C module was compiled.

Using the cjson module, I observed 180MB of RAM utilization:

    source = open('mangled.json', 'r')
    data = cjson.encode(source.read())

cjson is about 10 times faster!

I refactored the code which deals with the XML version of the same data and was able to process it with cElementTree using only 20MB (!) of RAM.

--
___
Python tracker <http://bugs.python.org/issue6594>
___
[issue6594] json C serializer performance tied to structure depth on some systems
Valentin Kuznetsov added the comment:

Hi,
I just found this bug and would like to add my experience with the performance of large JSON docs. I have a few JSON docs of about 180MB in size which I read from data services. I use Python 2.6 on Linux, on a 64-bit node with 16GB of RAM and an 8-core CPU (Intel Xeon, 2.33GHz each). I used both the json and cjson modules to parse my documents. My observation is that the amount of RAM used to parse such docs is about 2GB, which is way too much. The total time spent is about 30 seconds (using cjson). The content of my docs is very mixed: lists, strings, other dicts. I can provide them if required, but it's 200MB :)

For comparison, I got the same data in XML, and using cElementTree.iterparse I stay at 300MB of RAM usage per doc, which is really reasonable to me.

I can provide some benchmarks and perform such tests if required.

--
nosy: +vkuznet
___
Python tracker <http://bugs.python.org/issue6594>
___
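The low memory use reported for cElementTree.iterparse typically comes from handling each element as soon as it is complete and then discarding it. The following is a minimal sketch of that pattern, not the code used for the measurements above: the generated XML, the tag names and the count_children helper are hypothetical, and it uses the Python 3 module name xml.etree.ElementTree.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the large XML document: 1000 small items.
SAMPLE = b"<root>" + b"".join(
    b'<item id="%d"><child/><child/></item>' % i for i in range(1000)
) + b"</root>"


def count_children(stream, tag="item"):
    """Stream through the document and count the children of each `tag`."""
    total = 0
    # Only "end" events are used: an element's children are guaranteed
    # to be complete once its end tag has been seen.
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            total += len(elem)
            # Drop the element's children, text and attributes so the
            # tree does not keep growing. The emptied element itself
            # stays attached to the root.
            elem.clear()
    return total


total = count_children(io.BytesIO(SAMPLE))
print(total)
```

Because each item is cleared right after it is processed, peak memory depends mostly on the size of a single item rather than on the size of the whole document.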