Re: [SLUG] Python, XML, and Splitting a 750M XML File?
I was a bit bored, and this works for me...
http://pastebin.com/srPxwvSm

Chris

On Thu, Jan 6, 2011 at 4:12 PM, Peter Miller wrote:
> Or, use awk(1) and split on lines containing /<.Datum>/
> using awk's ability to write to more than one file.
> I suppose much the same could be done in Perl, too, but I'm older than
> such new-fangled things as Perl.

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Python, XML, and Splitting a 750M XML File?
On Thu, 2011-01-06 at 15:50 +1100, Peter Miller wrote:
> > 'Datum' are non-trivial, containing extensive subtrees.
> > ...etc...
> > blah
>
> XML is plain text, use a text tool.
> If the line breaks are as indicated, use split(1)
> and then hand edit the headers and footers.

Or, use awk(1) and split on lines containing /<.Datum>/
using awk's ability to write to more than one file.
I suppose much the same could be done in Perl, too, but I'm older than
such new-fangled things as Perl.

--
Regards
Peter Miller
/\/\* http://miller.emu.id.au/pmiller/

PGP public key ID: 1024D/D0EDB64D
fingerprint = AD0A C5DF C426 4F03 5D53 2BDB 18D8 A4E2 D0ED B64D
See http://www.keyserver.net or any PGP keyserver for public key.

"A data structure is just a stupid programming language." -- R. Wm. Gosper
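Since the rest of Tom's processing is in Python, here is a minimal Python sketch of the same line-oriented idea. It assumes, as Peter does, that every `</Datum>` closing tag ends its own line, that everything before the first `<Datum` line is header, and that everything after the last `</Datum>` line is footer. The function name `split_by_lines` and the `partNNN.xml` output names are made up for illustration. Instead of hand-editing, it copies the header into each chunk as it goes and appends the footer to every chunk at end of file, so memory use stays at roughly one record.

```python
import re

# "Datum" comes from the thread; the file's other tag names were lost, so the
# header/footer are simply whatever surrounds the records (an assumption).
OPEN_TAG = re.compile(r"<Datum[\s>]")
CLOSE_TAG = re.compile(r"</Datum>")


def split_by_lines(path, per_file, prefix="part"):
    """Start a new output file after every `per_file` lines containing </Datum>."""
    header, pending, names = [], [], []
    out, count, in_header = None, 0, True
    with open(path, encoding="utf-8") as src:
        for line in src:
            if in_header:
                if not OPEN_TAG.search(line):
                    header.append(line)  # preamble before the first record
                    continue
                in_header = False
            pending.append(line)  # lines of the record being read
            if CLOSE_TAG.search(line):
                if out is None:  # open the next chunk and repeat the header
                    names.append(f"{prefix}{len(names):03d}.xml")
                    out = open(names[-1], "w", encoding="utf-8")
                    out.writelines(header)
                out.writelines(pending)
                pending, count = [], count + 1
                if count == per_file:
                    out.close()
                    out, count = None, 0
    if out is not None:
        out.close()
    # Whatever followed the last </Datum> is the footer; append it to every chunk.
    for name in names:
        with open(name, "a", encoding="utf-8") as f:
            f.writelines(pending)
    return names
```

For example, `split_by_lines("BigFile.xml", 20000)` would write roughly ten files for a couple hundred thousand records. Like the awk approach, it breaks if a closing tag shares a line with the next record.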
Re: [SLUG] Python, XML, and Splitting a 750M XML File?
On Thu, 2011-01-06 at 13:51 +1100, Tom Deckert wrote:
> Original file looks like:
>
> blah
> A couple hundred thousand Datum elements.
> 'Datum' are non-trivial, containing extensive subtrees.
> ...etc...
> blah

XML is plain text, use a text tool.
If the line breaks are as indicated, use split(1)
and then hand edit the headers and footers.

--
Regards
Peter Miller

"As we said in the preface to the first edition, C 'wears well as one's experience with it grows.' With a decade more experience, we still feel that way." -- Brian Kernighan and Dennis Ritchie
Re: [SLUG] Python, XML, and Splitting a 750M XML File?
Sorry, I misread your email. Have you tried SAX parsing?
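Along the lines of the SAX suggestion, here is a rough standard-library sketch. `xml.sax` streams parse events without building a tree, and `xml.sax.saxutils.XMLGenerator` writes events back out as XML. The `DatumSplitter` class and `split_with_sax` helper are hypothetical. The sketch assumes the `Datum` records are direct children of the root element, replays whatever comes before the first record (the header) into every output file, and simply drops non-record content that arrives between files.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator


class DatumSplitter(xml.sax.ContentHandler):
    """Hypothetical SAX handler writing `per_file` <Datum> records per file.

    Assumes records are direct children of the root and that everything
    before the first record is a header to repeat in each file.
    """

    def __init__(self, per_file, prefix="part", record="Datum"):
        super().__init__()
        self.per_file, self.prefix, self.record = per_file, prefix, record
        self.depth = 0
        self.root = None        # (tag, attributes) of the document element
        self.header = []        # recorded (method, args) events to replay
        self.in_header = True   # True until the first record starts
        self.count = 0          # records written to the current file
        self.out = self.gen = None
        self.names = []

    def _open(self):
        self.names.append(f"{self.prefix}{len(self.names):03d}.xml")
        self.out = open(self.names[-1], "w", encoding="utf-8")
        self.gen = XMLGenerator(self.out, encoding="utf-8")
        self.gen.startDocument()
        self.gen.startElement(*self.root)
        for method, args in self.header:  # replay the header events
            getattr(self.gen, method)(*args)

    def _close(self):
        self.gen.endElement(self.root[0])
        self.gen.endDocument()
        self.out.close()
        self.out = self.gen = None
        self.count = 0

    def _emit(self, method, *args):
        if self.in_header:
            self.header.append((method, args))
        elif self.gen is not None:
            getattr(self.gen, method)(*args)
        # otherwise we are between files; this sketch drops that content

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 1:
            self.root = (name, dict(attrs))
            return
        if self.depth == 2 and name == self.record:
            self.in_header = False
            if self.gen is None:
                self._open()
        self._emit("startElement", name, dict(attrs))

    def endElement(self, name):
        self.depth -= 1
        if self.depth == 0:  # end of the root element
            if self.gen is not None:
                self._close()
            return
        self._emit("endElement", name)
        if self.depth == 1 and name == self.record:
            self.count += 1
            if self.count == self.per_file:
                self._close()

    def characters(self, content):
        # Text directly under the root after the header is only whitespace
        # between records, so keep it just while recording the header.
        if self.depth >= 2 or self.in_header:
            self._emit("characters", content)


def split_with_sax(path, per_file, prefix="part"):
    handler = DatumSplitter(per_file, prefix)
    xml.sax.parse(path, handler)  # streams the file; no tree is built
    return handler.names
```

Memory use is bounded by the header plus the parser's buffer, because each record is written out as its events arrive.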
Re: [SLUG] Python, XML, and Splitting a 750M XML File?
On 6 January 2011 13:51, Tom Deckert wrote:
> Any easy XML (Python or otherwise) tools for splitting a 750M
> XML file down into smaller portions?
> [...]
> The problem is that on my 4Gig laptop, I run out of memory
> for any tool which tries to read in the whole tree at
> one time. In my case, Python's ElementTree fails, ala:
>
>> fin = open("BigFile.xml", "r")
>> tree = xml.etree.ElementTree.parse(fin) --> Out of Memory

Out of interest, is it just one large XML file, or multiple XML files within one file?

Also, have you tried lxml? [0]

[0] - http://codespeak.net/lxml/
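lxml's `iterparse()` is the usual tool for this. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse()`, which works in much the same way, so it runs without installing lxml. It assumes the `Datum` records are direct children of the root, that there are no namespaces, and that the root's non-`Datum` children are header elements to copy into each file. The function name `split_with_iterparse` is hypothetical. Unlike `ElementTree.parse()`, it removes each record from the root once it has been written, so the in-memory tree never grows to the size of the file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def split_with_iterparse(path, per_file, prefix="part", record="Datum"):
    """Write `per_file` <record> children of the root into each output file.

    Hypothetical helper: assumes records are direct children of the root,
    no namespaces, and that the root's other children form a header.
    """
    names, out, count = [], None, 0
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element

    def close_chunk(f):
        f.write(f"</{root.tag}>\n")
        f.close()

    for event, elem in context:
        if event != "end" or elem.tag != record:
            continue
        if out is None:  # start a new chunk with a copy of the root tag
            names.append(f"{prefix}{len(names):03d}.xml")
            out = open(names[-1], "w", encoding="utf-8")
            attrs = "".join(f" {k}={quoteattr(v)}" for k, v in root.attrib.items())
            out.write(f'<?xml version="1.0" encoding="utf-8"?>\n<{root.tag}{attrs}>\n')
            for child in root:  # header elements parsed so far
                if child.tag != record:
                    out.write(ET.tostring(child, encoding="unicode"))
        elem.tail = None  # serialise the record without trailing whitespace
        out.write(ET.tostring(elem, encoding="unicode") + "\n")
        root.remove(elem)  # drop the finished record so memory stays bounded
        count += 1
        if count == per_file:
            close_chunk(out)
            out, count = None, 0
    if out is not None:
        close_chunk(out)
    return names
```

The `root.remove(elem)` call is the important design choice. Calling `elem.clear()` alone would still leave one empty element per record attached to the root.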
[SLUG] Python, XML, and Splitting a 750M XML File?
G'Day,

Any easy XML (Python or otherwise) tools for splitting a 750M XML file down into smaller portions?

Because the file is so large and exceeds memory size, I think the tool needs to be a 'streaming' tool. On the IBM DeveloperWorks site, I found an article detailing the use of XSLT, but other places state that XSLT tools usually aren't streaming, so I'm guessing none of the XSLT processors (Xalan, Saxon) will succeed. (Not to mention it's been more than 10 years since I last worked with XSLT.)

Original file looks like:

blah
A couple hundred thousand Datum elements.
'Datum' are non-trivial, containing extensive subtrees.
...etc...
blah

I'd like a tool to split that into maybe 10 different, valid XML files, all of which have the , and tags, but 1/10th as many Datum elements per file.

The problem is that on my 4Gig laptop, I run out of memory for any tool which tries to read in the whole tree at one time. In my case, Python's ElementTree fails, à la:

> fin = open("BigFile.xml", "r")
> tree = xml.etree.ElementTree.parse(fin) --> Out of Memory

The solution doesn't have to be Python, but it would be nicest if it were, as the rest of the processing is all done in a Python script.

Cheers,
Tom
Re: [SLUG] Value of Red Hat certification ?
I did my RHCE last year. I did RH253 (Networking and Admin) and RH302 (RHCE exam), which I paid for out of my own pocket (about $4,000 all up). If you have good experience with Linux (whichever distro), it's only a matter of learning how to do things the RH-specific way, and you'll get through the exam fairly easily.

It did help me land a job (in an Ubuntu shop!), and certification lets prospective employers see that you have a particular level of knowledge of Linux. My experience dealing with Red Hat was very positive; they were very helpful throughout the process. I believe it was worth doing for me, as it gave me 'the edge'.

cheers
Darrin

On Thu, Jan 6, 2011 at 1:33 AM, Rod Butcher wrote:
> [...]
> How do employers view this - do they assume that serious admins make sure
> they are familiar with multiple distroes, and see RHEL certification as a
> bonus (i..e. the person knows More), or do they assume that Red Hat cert
> means a person knows Less ?
Re: [SLUG] Value of Red Hat certification ?
I had considered that; my plan is to actually train myself to be vendor-neutral, i.e. familiarise myself with the major distros (RHEL, SUSE, Ubuntu) so that I can administer them all, but to add the RHEL specialisation on top of that, mainly because RHEL is apparently viewed as Number 1. But I think somebody who can only make a single distro work is pretty useless.

I think Red Hat certification will inevitably include a degree of advertising/brainwashing to try to get people to do things their way purely to differentiate their brand, but I'm old enough to see through FUD.

How do employers view this? Do they assume that serious admins make sure they are familiar with multiple distros, and see RHEL certification as a bonus (i.e. the person knows More), or do they assume that Red Hat cert means a person knows Less?

thanks
Rod

On 05/01/11 12:22, onlyjob wrote:
> Why Get a Vendor/Distribution *Neutral* Linux Certification?
> http://www.youtube.com/watch?v=ZaGjgdYB1vI