2008/12/10 Ramon Buckland <[EMAIL PROTECTED]>:
> Hi Peoples,
>
> I have just about finished the proof of concept of using TagSoup as a
> DataFormat and as a component.
>
> For those not familiar with TagSoup, it is a Java library (Apache 2.0
> License) which converts poorly formatted HTML such as
>
> <html> <p> something
>
> into well-formed (XML) HTML (not XHTML), i.e.:
>
> <html>
>   <body>
>     <p>something</p>
>   </body>
> </html>
>
> This is very helpful for the following reason: HTML fetched over HTTP
> can then go straight into XSLT.
>
> <camelContext xmlns="http://activemq.apache.org/camel/schema/spring">
>   <route>
>     <from uri="direct:start"/>
>     <to uri="http://myserver.com/somequery?foo=1"/>
>     <unmarshal><wellFormedHtml/></unmarshal>
>     <to uri="xslt:file:///foo/bar.xsl"/>
>     <to .../>
>   </route>
> </camelContext>
>
> Questions:
> * Is this component helpful? (Should I finish? I have not seen anything
>   like it in the toolkit yet.)
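Why well-formedness matters for the XSLT step in the route above can be checked with nothing but the JDK's javax.xml APIs. The sketch below does not call TagSoup; it assumes the tidy output looks exactly like the example in the message. The class name, the helper methods and the inline stylesheet (a stand-in for file:///foo/bar.xsl) are hypothetical. It shows a strict XML parser rejecting the raw `<html> <p> something` input, while the tidied markup parses and transforms. Real TagSoup output may differ in detail (for example, elements may be placed in the XHTML namespace), in which case the XPath in the stylesheet would need a namespace prefix.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks well-formedness and applies XSLT using only the JDK.
class WellFormedHtmlDemo {
    // Raw input from the example: unclosed <html> and <p>, so not well-formed XML.
    static final String RAW = "<html> <p> something";

    // Assumed tidy output, copied from the example (no namespace).
    static final String TIDY = "<html><body><p>something</p></body></html>";

    // Hypothetical stand-in for file:///foo/bar.xsl: outputs the paragraph text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='/html/body/p'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Returns true if a strict XML parser accepts the markup.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Applies the inline stylesheet to the given XML and returns the text result.
    static String applyXslt(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(RAW));   // false
        System.out.println(isWellFormed(TIDY));  // true
        System.out.println(applyXslt(TIDY));     // something
    }
}
```

In a route, the tidy step sits exactly where `isWellFormed(RAW)` would otherwise fail: everything after it only ever sees markup that an XML parser accepts.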
Definitely! Being able to format HTML nicely as XML so you can do XPath and
whatnot is *very* useful!

> * If continuing is a good idea, what should the "dataFormat" be called?
>   i.e. the <wellFormedHtml/>

Oooh, that's a tricky one; naming is so hard! Maybe <tagSoup/>? We might one
day have a few different mechanisms (e.g. JTidy?), though maybe tagSoup is a
bit vague :). How about tidyHtml or tidyMarkup?

> * Am I unmarshalling or marshalling? (We of course won't support going
>   the other way, as good-to-bad HTML is just hard(er).)
> I figured it is <unmarshal>, as the <csv/> dataformat is similar: CSV
> --> List<..> is unmarshalling.

Yeah. What's the output, btw: is it a DOM? Or can it be converted to a Source,
so the endpoint could take DOM/SAX/StAX etc.?

--
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://fusesource.com/
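On the closing question about the output type: one way to keep the consuming endpoint flexible is to hand it a `javax.xml.transform.Source`. The sketch below is JDK-only and assumes the tidied markup from the example; it does not show what TagSoup or Camel actually return. The class and method names are hypothetical. It parses the markup into a DOM, runs an XPath over it, and then feeds the same content to an identity `Transformer` as a `DOMSource`, a `SAXSource` and a `StreamSource`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: exposes tidy markup as a DOM and as several Source types.
class TidyHtmlAsSource {
    // Assumed tidy output, copied from the example in the thread.
    static final String TIDY = "<html><body><p>something</p></body></html>";

    // Parses the markup into a DOM Document.
    static Document toDom(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates an XPath expression against the DOM and returns its string value.
    static String xpath(Document doc, String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Serializes any Source (DOM, SAX or stream) with an identity transform.
    static String serialize(Source source) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Force the XML method; an <html> root could otherwise select HTML output.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDom(TIDY);
        System.out.println(xpath(doc, "/html/body/p")); // something

        // The same content, handed over as three different Source types.
        Source[] sources = {
            new DOMSource(doc),
            new SAXSource(new InputSource(new StringReader(TIDY))),
            new StreamSource(new StringReader(TIDY))
        };
        for (Source s : sources) {
            System.out.println(serialize(s));
        }
    }
}
```

Because `serialize` only depends on the `Source` interface, the caller can pick whichever representation is cheapest; StAX is left out of this sketch.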
