Apologies if this is a duplicate; the original appears to have gone astray.

On Wednesday 01 November 2006 10:57, Albert Lai wrote:
> Daniel McAllansmith <[EMAIL PROTECTED]> writes:
> > Hello.
> >
> > I have some html from which I want to extract records.
> > Each record is represented within a number of <tr> nodes, and all records
> > <tr> nodes are contained by the same parent node.
>
> This is very poorly written HTML. The original structure of the data
> is destroyed - the parse tree no longer reflects the data structure.
> (If a record is to be displayed in several rows, there are proper
> ways.) It is syntactically incorrect: nested <tr>, and color in <hr>.
> (Just ask http://validator.w3.org/ .)

Indeed. The original is even worse, with overlapping nodes and other such treasures which make navigation in HXT tricky at times.

> I trust that you are parsing
> this because you realize it is all wrong and you want to
> programmatically convert it to proper markup.

Yep! I sure wouldn't be doing this if I had control of the original HTML.

>
> Since the file is unstructured, I choose not to sweat over restoring
> the structure in an HXT arrow. The HXT arrow will return a flat list,
> just as the file is a flat ensemble.

I was about to write a follow-up just as your mail came in... I've ended up with the same solution as you've kindly suggested.

Another option I came across is Control.Arrow.ArrowTree.changeChildren, which could be used to restore a more normalised structure ready for more processing.

Thanks
Daniel
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
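
For illustration, the step of restoring structure from the flat list could look roughly like the following in plain Haskell, outside of any HXT arrow. This is only a sketch under an assumption the thread does not state: that the first row of each record can be recognised by some predicate. The names groupRecords, rows and records, and the sample strings, are hypothetical.

```haskell
import Data.List (isPrefixOf)

-- | Split a flat list into groups, starting a new group at every element
-- for which the predicate holds. Elements before the first match are dropped.
groupRecords :: (a -> Bool) -> [a] -> [[a]]
groupRecords startsRecord xs =
  case dropWhile (not . startsRecord) xs of
    []         -> []
    (r : rest) ->
      let (body, remaining) = break startsRecord rest
      in  (r : body) : groupRecords startsRecord remaining

-- Hypothetical flat rows, standing in for the text extracted from each <tr>.
rows :: [String]
rows = ["Name: A", "Age: 1", "Name: B", "Age: 2", "City: X"]

-- Assumption: a row beginning with "Name:" opens a new record.
records :: [[String]]
records = groupRecords ("Name:" `isPrefixOf`) rows
```

A function of the same shape, from a list to a list, is what changeChildren takes for rewriting a node's children, so a similar regrouping could in principle be done inside the tree as well.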