Hello Cafe,

I have this HTML structure:
...
<table>
    ...
    <tr>
        <th>Caption</th>
        <td>
            <a href="...">Want this</a>
            <a href="...">And this</a>
        </td>
    </tr>
    <tr>
        <th>Another caption</th>
        <td>
            ....
    <tr>
        <th>Yet another caption</th>
        ...
</table>
...

I'd like to extract the text of the A elements from the row with the header "Caption", and have come up with this:

runX $ doc
    >>> ( deep (hasName "tr")                     -- filter only TRs
          >>> withTraceLevel 5 traceTree          -- shows correct TR
              `when`
              deep ( hasName "th"                 -- filter THs with specified text
                     >>> getChildren >>> hasText (== "Caption")
                   )                              -- inner deep
          >>> getChildren >>> hasName "td"        -- shouldn't there be only one TR here?
          >>> getChildren
        )
    >>> getName &&& (getChildren >>> getText)     -- list has TDs from all three TRs

I also tried `guards` instead of `when`, but got the same result.
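
In case it helps anyone reproduce this, here is a minimal self-contained sketch of the same pipeline. The sample document (`sampleHtml`), the link texts in the second and third rows, and building `doc` with `readString` and `withParseHTML` are assumptions made for illustration; only the arrow pipeline is taken from the code above.

```haskell
import Text.XML.HXT.Core

-- Hypothetical sample page standing in for the real document.
sampleHtml :: String
sampleHtml = concat
  [ "<html><body><table>"
  , "<tr><th>Caption</th><td>"
  , "<a href=\"#1\">Want this</a><a href=\"#2\">And this</a>"
  , "</td></tr>"
  , "<tr><th>Another caption</th><td><a href=\"#3\">Not this</a></td></tr>"
  , "<tr><th>Yet another caption</th><td><a href=\"#4\">Nor this</a></td></tr>"
  , "</table></body></html>"
  ]

-- Assumed way of building `doc`; the post does not show how it is read.
doc :: IOStateArrow s b XmlTree
doc = readString [withParseHTML yes, withWarnings no] sampleHtml

main :: IO ()
main = do
  -- Same pipeline as in the question, unchanged apart from layout.
  result <- runX $ doc
    >>> ( deep (hasName "tr")
          >>> withTraceLevel 5 traceTree
              `when`
              deep (hasName "th" >>> getChildren >>> hasText (== "Caption"))
          >>> getChildren >>> hasName "td"
          >>> getChildren
        )
    >>> getName &&& (getChildren >>> getText)
  -- Print every (element name, text) pair the pipeline produces.
  mapM_ print result
```

It needs only the hxt package; the trace output goes to stderr and the resulting pairs to stdout.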


I know there are other packages that might solve this in another way, but I'd like to understand what is going on here.

br,

vlatko



_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
