On Apr 4, 2009, at 9:02 , George King wrote:

I hit a stumbling block when passing large files (multi-GB) to NSXMLParser.

Are you doing this in 64-bit?

It appears that NSXMLParser's initWithContentsOfURL: method loads the entire file into memory, which causes virtual-memory thrashing at file sizes approaching my physical RAM (2 GB in this case, so I start seeing performance issues at around 1.3 GB). After reading the CFXMLParser documentation, I suspect that Core Foundation does the same thing.

Yes, probably. Have you tried initializing it with a memory-mapped NSData instead of an NSURL?
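A minimal sketch of that suggestion, written with manual retain/release as was usual in 2009-era Cocoa code. The names ElementCounter and CountElementsInMappedFile are hypothetical, and the NSDataReadingMappedIfSafe option name assumes a Mac OS X 10.7 or later SDK. Mapping only avoids reading the whole file into the heap up front; whether NSXMLParser then copies the bytes internally is not established in this thread, so it is worth measuring before relying on it.

```objc
#import <Foundation/Foundation.h>

// Counts start-element events only, so it holds on to nothing per element.
@interface ElementCounter : NSObject <NSXMLParserDelegate> {
    NSUInteger count;
}
- (NSUInteger)count;
@end

@implementation ElementCounter
- (NSUInteger)count { return count; }

- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    count++;
}
@end

// Hypothetical helper: parses the file at `path` from NSData that Foundation
// is asked to memory-map, instead of using -initWithContentsOfURL:.
// Returns the number of elements seen, or -1 if reading or parsing fails.
NSInteger CountElementsInMappedFile(NSString *path) {
    NSError *error = nil;
    // NSDataReadingMappedIfSafe requests mmap()-backed data when the file's
    // volume allows it; otherwise Foundation reads the file normally.
    NSData *data = [NSData dataWithContentsOfFile:path
                                          options:NSDataReadingMappedIfSafe
                                            error:&error];
    if (data == nil) {
        return -1;
    }
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    ElementCounter *counter = [[ElementCounter alloc] init];
    [parser setDelegate:counter];  // the parser does not retain its delegate
    NSInteger result = [parser parse] ? (NSInteger)[counter count] : -1;
    [parser release];
    [counter release];
    return result;
}
```

The delegate is deliberately trivial so that any memory growth you observe comes from the parser and the data, not from the delegate.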

Can somebody suggest an alternative API for parsing XML whose memory requirements during initialization do not grow linearly with file size? Given the event-driven design, I originally imagined that the parser would read through a file incrementally, without loading it all into memory.

My Objective-XML might help, though I haven't yet tried it on files quite that large; the largest I tried was a couple of hundred MB, which worked fine. For one, it uses significantly less memory than NSXMLParser (and is faster), trying very hard to touch as little memory as possible and to keep as little of it around as possible. It also actually does incremental loading of URLs, although it detects file URLs and loads them directly (using dataWithContentsOfURL:, which will likely also do a read() rather than an mmap()).

It has an NSXMLParser-compatible SAX API as well as a more convenient MAX API.

Current download is at:

        http://www.metaobject.com/downloads/Objective-C/Objective-XML-5.1.tgz

I just tried it with a 190 MB XML file, which took around 7 s to parse on my MacBook Pro. RPRVT stayed at 600 KB the whole time, and RSHRD was also unaffected. RSIZE did grow to 191 MB, reflecting the fact that more and more of the mapped file's memory gets mapped into the process in question.


Note that once the NSXMLParser delegate starts receiving messages, I am able to keep memory usage under control by giving the delegate its own autorelease pool and draining/replacing it once every x calls to parser:didStartElement:...
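A minimal sketch of the autorelease-pool idea, again with manual retain/release and hypothetical names (PooledElementHandler, ParseWithPooledHandler). One deliberate change from the approach described above: instead of keeping one pool alive across x callbacks, each callback creates and drains its own pool, because Apple's NSAutoreleasePool documentation says a pool should be drained in the same context (method, function, or loop body) in which it was created. The attribute-walking code is placeholder work that merely produces autoreleased temporaries.

```objc
#import <Foundation/Foundation.h>

// Delegate whose per-element work runs inside its own autorelease pool,
// so temporaries created for one element are freed before the next one.
@interface PooledElementHandler : NSObject <NSXMLParserDelegate> {
    NSUInteger elementCount;
    NSUInteger attributeCount;
}
- (NSUInteger)elementCount;
- (NSUInteger)attributeCount;
@end

@implementation PooledElementHandler
- (NSUInteger)elementCount { return elementCount; }
- (NSUInteger)attributeCount { return attributeCount; }

- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    // Created and drained within this one callback.
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    elementCount++;
    // Placeholder work: -allKeys and -lowercaseString return autoreleased
    // objects, which are released when the pool drains below.
    for (NSString *key in [attributeDict allKeys]) {
        if ([[key lowercaseString] length] > 0) {
            attributeCount++;
        }
    }
    [pool drain];
}
@end

// Hypothetical helper: parses in-memory XML with the given handler.
// Returns NO if the parser reports an error.
BOOL ParseWithPooledHandler(NSData *xml, PooledElementHandler *handler) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xml];
    [parser setDelegate:handler];  // the parser does not retain its delegate
    BOOL ok = [parser parse];
    [parser release];
    return ok;
}
```

A per-callback pool costs one push and drain per element; the batched variant saves that overhead but keeps a pool open across calls that the parser, not the delegate, controls.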




Marcel

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
