Devon <dshur...@gmail.com> writes:

> I must quickly and efficiently parse some data contained in multiple
> XML files in order to perform some learning algorithms on the data.
> Info:
>
> I have thousands of files, each file corresponds to a single song.
> Each XML file contains information extracted from the song (called
> features). Examples include tempo, time signature, pitch classes, etc.
> An example from the beginning of one of these files looks like:
>
> <analysis decoder="Quicktime" version="0x7608000">
>   <track duration="29.12331" endOfFadeIn="0.00000"
>          startOfFadeOut="29.12331" loudness="-12.097" tempo="71.031"
>          tempoConfidence="0.386" timeSignature="4"
>          timeSignatureConfidence="0.974" key="11" keyConfidence="1.000"
>          mode="0" modeConfidence="1.000">
>     <sections>
>       <section start="0.00000" duration="7.35887"/>
>       <section start="7.35887" duration="13.03414"/>
>       <section start="20.39301" duration="8.73030"/>
>     </sections>
>     <segments>
>       <segment start="0.00000" duration="0.56000">
>         <loudness>
>           <dB time="0">-60.000</dB>
>           <dB time="0.45279" type="max">-59.897</dB>
>         </loudness>
>         <pitches>
>           <pitch class="0">0.589</pitch>
>           <pitch class="1">0.446</pitch>
>           <pitch class="2">0.518</pitch>
>           <pitch class="3">1.000</pitch>
>           <pitch class="4">0.850</pitch>
>           <pitch class="5">0.414</pitch>
>           <pitch class="6">0.326</pitch>
>           <pitch class="7">0.304</pitch>
>           <pitch class="8">0.415</pitch>
>           <pitch class="9">0.566</pitch>
>           <pitch class="10">0.353</pitch>
>           <pitch class="11">0.350</pitch>
You could use XSLT to get the data. For example, this XSLT stylesheet
extracts the duration, tempo and time signature of each track into a
comma-separated line:

<xsl:stylesheet version="1.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/analysis/track">
    <!-- one CSV line per track: duration,tempo,timeSignature -->
    <xsl:value-of select="concat(@duration, ',', @tempo, ',', @timeSignature)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>

With

    xsltproc song.xsl song*.xml

you would get your output, one line per track (here, one per file). No
Python necessary.

Or, if you would like to use it inside a Python program, use lxml to run
the XSLT processor (lxml.etree.XSLT), or just use XPath to extract the
values and format them with Python.

-- 
http://mail.python.org/mailman/listinfo/python-list
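A minimal sketch of the "just XPath and format with Python" route, assuming every file has the layout of the excerpt quoted above (a `<track>` directly under `<analysis>`, with `<segments>` nested inside `<track>`). It uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without third-party packages; ElementTree understands only a subset of XPath, but simple child paths like these are within it. The helper names `track_summary` and `segment_pitches` and the `song*.xml` pattern are hypothetical, chosen for illustration.

```python
import csv
import glob
import sys
import xml.etree.ElementTree as ET


def track_summary(root):
    """Return (duration, tempo, timeSignature) for a parsed <analysis> root.

    Hypothetical helper; mirrors what the XSLT template extracts.
    """
    track = root.find("track")  # same node as the XPath /analysis/track
    if track is None:
        raise ValueError("no <track> element under <analysis>")
    return (
        float(track.get("duration")),
        float(track.get("tempo")),
        int(track.get("timeSignature")),
    )


def segment_pitches(root):
    """Return one pitch-class vector per <segment> (hypothetical helper)."""
    vectors = []
    for segment in root.iterfind("track/segments/segment"):
        pitches = segment.findall("pitches/pitch")
        # Order by the class attribute (0..11) instead of trusting document order.
        pitches.sort(key=lambda p: int(p.get("class")))
        vectors.append([float(p.text) for p in pitches])
    return vectors


def main(pattern="song*.xml"):
    # One CSV row per file, similar to `xsltproc song.xsl song*.xml`.
    writer = csv.writer(sys.stdout, lineterminator="\n")
    for path in sorted(glob.glob(pattern)):
        root = ET.parse(path).getroot()  # parse each file once
        writer.writerow(track_summary(root))


if __name__ == "__main__":
    main()
```

Both helpers take an already-parsed root, so each file is read once even if you need the summary and the per-segment pitch vectors together; `ET.fromstring(text)` gives you the same kind of root for in-memory data.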