Hi All,

My goal is to read the RDF catalog from www.gutenberg.org, parse it into a
Python structure, and pull out the data for each record.

The catalog is a Dublin Core RDF/XML file, divided into one section per
book, each holding the details for that book.

I have done a very large amount of research on this problem.

I've tried pyrple, SAX, DOM/minidom, and several other tools, both inside
and outside the standard Python library.

None of these tools has been able to read the file successfully, and those
that get as far as seeing the data can take up to half an hour to load it
with 2 GB of RAM.

So you all know what I'm talking about, the file is located at:

http://www.gutenberg.org/feeds/catalog.rdf.bz2

Does anyone have suggestions for a parser or converter that would let me
view this file and extract data from it?
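
To make the goal concrete, here is a minimal sketch of the kind of
record-by-record extraction I am after. It streams the compressed file with
the standard library's bz2 module and xml.etree.ElementTree.iterparse, so the
whole tree is never held in memory. The pgterms namespace URI, the
pgterms:etext record element, and the dc:title and dc:creator children are
assumptions about the catalog's layout, and iter_records and open_catalog are
illustrative names only. I have not confirmed that this copes with the real
file.

```python
import bz2
import xml.etree.ElementTree as ET

# Namespace URIs. The pgterms one is an assumption about the catalog;
# check all three against the header of the real file.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"
PGTERMS = "{http://www.gutenberg.org/rdfterms/}"


def open_catalog(path):
    """Open the bz2-compressed catalog as a binary stream (hypothetical helper)."""
    return bz2.open(path, "rb")


def iter_records(stream, record_tag=PGTERMS + "etext"):
    """Yield one dict per record element without building the whole tree."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            title = elem.find(DC + "title")
            yield {
                "id": elem.get(RDF + "ID"),
                "title": "".join(title.itertext()).strip() if title is not None else None,
                "creators": ["".join(c.itertext()).strip()
                             for c in elem.iter(DC + "creator")],
            }
            root.clear()  # drop records already processed to keep memory flat
```

With the file downloaded, the idea would be to loop over
iter_records(open_catalog("catalog.rdf.bz2")) and handle one record at a time.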

Any help is appreciated.

Thanks,

Brandon McGinty

[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list
