I hit a stumbling block when passing large files (multi-GB) to NSXMLParser.

Are you doing this in 64-bit?

Yes, I switched to building for x86_64 because NSXMLParser was refusing files over 4 GB.

It appears that NSXMLParser's initWithContentsOfURL: method loads the contents of the entire file into memory, which causes virtual-memory thrashing at file sizes approaching my physical RAM (2 GB in this case, so I start seeing performance problems at around 1.3 GB). After reading the CFXMLParser documentation, I suspect that Core Foundation does the same thing.

Yes, probably. Have you tried initializing it with a memory-mapped NSData instead of an NSURL?

Thank you for the suggestion; I was unaware of initWithContentsOfMappedFile:. This worked to a certain extent, in that it kept memory consumption within the bounds of available physical memory, but it still consumed all of the memory available. This caused a good deal of thrashing when I tried running the test and working at the same time.
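The behaviour described above is what a plain file mapping tends to do: pages are read lazily, but every page the parser touches stays resident until the kernel reclaims it. The C sketch below shows the mechanism with POSIX `mmap` and adds two advisory `posix_madvise` hints: sequential access for the whole mapping, and "don't need" for each window once it has been scanned. The function name `mapped_count_byte` and the 256-page window are hypothetical choices, and counting one byte value stands in for real parsing work. This is not a claim about how `NSData` or `NSXMLParser` are implemented, and the OS is free to ignore either hint.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: counts bytes equal to `needle` in the file at `path`
   through a read-only memory mapping. Returns -1 on error. */
long mapped_count_byte(const char *path, unsigned char needle) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {          /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);               /* the mapping stays valid after close() */
    if (base == MAP_FAILED)
        return -1;

    /* Advisory: pages will be touched front to back. */
    posix_madvise(base, len, POSIX_MADV_SEQUENTIAL);

    /* Window size is a multiple of the page size, so every window start
       (base + off) stays page-aligned as posix_madvise requires. */
    size_t window = (size_t)sysconf(_SC_PAGESIZE) * 256;
    long count = 0;
    for (size_t off = 0; off < len; off += window) {
        size_t n = (len - off < window) ? len - off : window;
        for (size_t i = 0; i < n; i++)
            if (base[off + i] == needle)
                count++;
        /* Advisory: this window will not be needed again. Some systems
           ignore it (glibc treats POSIX_MADV_DONTNEED as a no-op). */
        posix_madvise(base + off, n, POSIX_MADV_DONTNEED);
    }

    munmap(base, len);
    return count;
}
```

Even with the hints, a mapping only lets the kernel drop pages more readily; it does not make the parser itself incremental.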

Can somebody suggest an alternative API for parsing XML whose memory requirements for initialization do not grow linearly with file size? Given the event-driven design, I had originally assumed that the parser would read through the file incrementally, without loading it all into memory.
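For comparison, here is a toy C sketch of what reading incrementally means for an event-driven parser: the caller feeds arbitrary chunks (even one byte at a time), and the scanner carries only a few bytes of state between calls, so its memory use does not depend on file size. Everything here (`push_scanner`, `scanner_init`, `scanner_feed`, the 64-byte name limit) is hypothetical. It only reports start-tag names and does not handle comments, CDATA sections, or entities, so it is not a substitute for a real XML parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal push scanner: reports each start-tag name while
   keeping only a small fixed-size state between calls. */
enum scan_state { IN_TEXT, AFTER_LT, IN_NAME, IN_TAG };

typedef void (*start_tag_fn)(const char *name, void *ctx);

typedef struct {
    enum scan_state state;
    char quote;        /* active quote character inside a tag, or 0 */
    char name[64];     /* current start-tag name; longer names are truncated */
    size_t name_len;
    start_tag_fn on_start;
    void *ctx;
} push_scanner;

void scanner_init(push_scanner *s, start_tag_fn on_start, void *ctx) {
    memset(s, 0, sizeof *s);
    s->state = IN_TEXT;
    s->on_start = on_start;
    s->ctx = ctx;
}

static void emit_name(push_scanner *s) {
    s->name[s->name_len] = '\0';
    s->on_start(s->name, s->ctx);
}

/* Processes one chunk; a tag may be split across any number of calls. */
void scanner_feed(push_scanner *s, const char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        switch (s->state) {
        case IN_TEXT:
            if (c == '<')
                s->state = AFTER_LT;
            break;
        case AFTER_LT:
            if (c == '/' || c == '?' || c == '!') {
                s->state = IN_TAG;   /* end tag, PI, or declaration: skip */
            } else {
                s->name_len = 0;
                s->name[s->name_len++] = c;
                s->state = IN_NAME;
            }
            break;
        case IN_NAME:
            if (c == '>' || c == '/' || c == ' ' || c == '\t' ||
                c == '\n' || c == '\r') {
                emit_name(s);
                s->state = (c == '>') ? IN_TEXT : IN_TAG;
            } else if (s->name_len < sizeof s->name - 1) {
                s->name[s->name_len++] = c;
            }
            break;
        case IN_TAG:
            /* '>' inside a quoted attribute value does not close the tag. */
            if (s->quote) {
                if (c == s->quote)
                    s->quote = 0;
            } else if (c == '"' || c == '\'') {
                s->quote = c;
            } else if (c == '>') {
                s->state = IN_TEXT;
            }
            break;
        }
    }
}
```

A caller would `fread` the file into one fixed-size buffer and pass each chunk to `scanner_feed`. libxml2 offers the same feed-a-chunk pattern for real XML through `xmlCreatePushParserCtxt` and `xmlParseChunk`.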

My Objective-XML might help, though I haven't tried it with files quite that large yet; the largest I have tried was a couple of hundred MB, which worked fine. For one, it uses significantly less memory than NSXMLParser (and is faster), trying very hard to touch as little memory as possible and to keep as little of it around as possible. It also does incremental loading of URLs, although it detects file URLs and loads them directly (using dataWithContentsOfURL:, which will likely also do a read() rather than an mmap()).

It has an NSXMLParser-compatible SAX API as well as a more convenient MAX API.

Current download is at:

        http://www.metaobject.com/downloads/Objective-C/Objective-XML-5.1.tgz

I just tried it with a 190 MB XML file, which took around 7 s to parse on my MacBook Pro. RPRVT stayed at 600 KB the whole time, and RSHRD was also unaffected. RSIZE did grow to 191 MB, reflecting the fact that more and more of the mapped file's pages get mapped into the process.


Thanks for the link; I will investigate. Yesterday I dug into libxml2, and its xmlReader API provides functionality equivalent to NSXMLParser's without the memory consumption.

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
