[issue9958] (c)elementTree missing children

2010-09-27 Thread Valentin Kuznetsov

New submission from Valentin Kuznetsov:

Hi, I found that parsing an XML file whose elements all have an identical 
structure leads to missing children on one item at some point. In my test case, 
which I attach, it happens at id=183. Basically I have an XML file with a bunch 
of elements of the following structure:

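A minimal sketch, with made-up tag and attribute names, of that kind of 
structure and of the recursive walk described:

import xml.etree.cElementTree as ET

xml_doc = """
<root>
  <item id="182"><child/><child/></item>
  <item id="183"><child/><child/></item>
</root>
"""

def walk(elem, depth=0):
    # Print each element with its child count; on the attached data the
    # element with id=183 unexpectedly reports no children.
    print("%s%s %s children=%d" % ("  " * depth, elem.tag, elem.attrib, len(elem)))
    for child in elem:
        walk(child, depth + 1)

walk(ET.fromstring(xml_doc))
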
When I parse them recursively, all except the one with id=183 are parsed 
identically; the one with id=183 does not contain children. I tried the code 
with Python 2.6 and Python 2.7 on Mac and Linux. The bug exists in both 
ElementTree and cElementTree. Code and testbed XML are attached; please untar 
and run as: python test_bug.py
Thanks,
Valentin

--
files: em_bug.tar
messages: 117440
nosy: vkuznet
priority: normal
severity: normal
status: open
title: (c)elementTree missing children
type: behavior
versions: Python 2.6, Python 2.7
Added file: http://bugs.python.org/file19028/em_bug.tar

Python tracker <http://bugs.python.org/issue9958>



[issue7451] improve json decoding performance

2009-12-09 Thread Valentin Kuznetsov

Valentin Kuznetsov added the comment:

I wonder if you could make a patch for the Python 2.6 branch.

--

Python tracker <http://bugs.python.org/issue7451>



[issue6594] json C serializer performance tied to structure depth on some systems

2009-12-07 Thread Valentin Kuznetsov

Valentin Kuznetsov added the comment:

I made data local, but adding del shows the same behavior.
This is the test:

import json
import time

def test():
    # Load the whole document, close the file, then drop the only reference.
    source = open('mangled.json', 'r')
    data = json.load(source)
    source.close()
    del data

test()
time.sleep(20)  # watch the process's resident size while it idles

--

Python tracker <http://bugs.python.org/issue6594>



[issue6594] json C serializer performance tied to structure depth on some systems

2009-12-07 Thread Valentin Kuznetsov

Valentin Kuznetsov added the comment:

Nope, none of the three json implementations releases the memory. I used 
your patched one, the one shipped with 2.6, and cjson. The one that comes 
with 2.6 reaches 2GB, then releases 200MB and stays at 1.8GB during the 
sleep. cjson reaches the 1.5GB mark and stays there. But all three 
release another 100-200MB just before exit (one top cycle before the 
process disappears). I used a sleep of 20 seconds, so I'm pretty sure the 
memory was not released during that time, since I watched the process 
with an idle CPU.
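
For reference, a small sketch (not from the thread) of how the resident 
size could be sampled from inside the process instead of watching top; it 
assumes Linux, since it reads /proc/self/status:

import json
import time

def current_rss_mb():
    # Parse VmRSS out of /proc/self/status; Linux-only.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1]) / 1024.0  # value is in kB

source = open('mangled.json', 'r')
data = json.load(source)
source.close()
print("RSS after load: %.0f MB" % current_rss_mb())
del data
time.sleep(20)
print("RSS after del + sleep: %.0f MB" % current_rss_mb())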

--

Python tracker <http://bugs.python.org/issue6594>



[issue6594] json C serializer performance tied to structure depth on some systems

2009-12-07 Thread Valentin Kuznetsov

Valentin Kuznetsov added the comment:

Antoine,
indeed, both patches improved the time and memory footprint. The latest 
patch shows only 1.1GB RAM usage and is very fast. What worries me, 
though, is that the memory is not released back to the system. Is that 
the case? I just added a time.sleep after json.load and saw that once 
decoding is done, the resident size still remains the same.

--

Python tracker <http://bugs.python.org/issue6594>



[issue6594] json C serializer performance tied to structure depth on some systems

2009-12-02 Thread Valentin Kuznetsov

Valentin Kuznetsov added the comment:

Oops, that explains why I saw such small memory usage with cjson. I 
constructed the tests on the fly.

Regarding the data structure: unfortunately, it's out of my hands. The 
data comes from a data-service, so I can't do much and can only report 
to the developers.

I'll try your patch tomorrow. Obviously it's a huge gain, both in memory 
footprint and CPU usage.

Thanks.
Valentin.

--

Python tracker <http://bugs.python.org/issue6594>



[issue6594] json C serializer performance tied to structure depth on some systems

2009-12-02 Thread Valentin Kuznetsov

Valentin Kuznetsov added the comment:

Hi,
I'm sorry for the delay, I was busy. Here is a test data file:
http://www.lns.cornell.edu/~vk/files/mangled.json

Its size is 150MB, 50MB less than the original, due to the value 
scrambling I was forced to do.

The test with the stock json module in Python 2.6.2 uses 2GB of RAM:

source = open('mangled.json', 'r')
data = json.load(source)

Using simplejson 2.0.9 from PyPI I saw the same performance; please note 
that the _speedups.so C module was compiled.

Using the cjson module, I observed 180MB of RAM utilization:

source = open('mangled.json', 'r')
data = cjson.encode(source.read())

cjson is about 10 times faster!
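
(Side note on the snippet above: cjson.encode re-serializes the string 
returned by source.read() instead of parsing it; a decoding test would 
read data = cjson.decode(source.read()). This is the mix-up acknowledged 
in the "Oops" follow-up above, and it explains the unusually small memory 
footprint observed for cjson here.)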

I refactored the code which deals with the XML version of the same data, 
and I was able to process it with cElementTree using only 20MB (!) of RAM.

--

Python tracker <http://bugs.python.org/issue6594>



[issue6594] json C serializer performance tied to structure depth on some systems

2009-11-19 Thread Valentin Kuznetsov

Valentin Kuznetsov added the comment:

Hi,
I just found this bug and would like to add my experience with the 
performance of large JSON docs. I have a few JSON docs, about 180MB in 
size, which I read from data-services. I use Python 2.6 on a 64-bit 
Linux node with 16GB of RAM and an 8-core Intel Xeon CPU at 2.33GHz. I 
used both the json and cjson modules to parse my documents. My 
observation is that the amount of RAM used to parse such docs is about 
2GB, which is way too much. The total time spent is about 30 seconds 
(using cjson). The content of my docs is very mixed: lists, strings, 
nested dicts. I can provide them if required, but it's 200MB :)

For comparison, I got the same data in XML, and using 
cElementTree.iterparse I stay at about 300MB of RAM usage per doc, which 
is really reasonable to me.
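
The 300MB figure presumably comes from incremental parsing; a minimal 
sketch of that iterparse idiom (the file name, tag, and handler here are 
hypothetical, not from the original report):

import xml.etree.cElementTree as ET

def handle(elem):
    # Hypothetical per-record processing.
    pass

for event, elem in ET.iterparse('data.xml', events=('end',)):
    if elem.tag == 'record':   # hypothetical tag of interest
        handle(elem)
        elem.clear()           # drop the finished subtree to keep memory flat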

I can provide some benchmarks and perform such tests if required.

--
nosy: +vkuznet

Python tracker <http://bugs.python.org/issue6594>