Hello again :) I'm looking into the existing options for scraping web pages. I've come across this post:
http://re-factor.blogspot.nl/2014/04/scraping-re-factor.html

but that one works with JSON output, and the pages I'm interested in only have HTML.

I see there are parse-html and scrape-html, which parse HTML from a URL into a vector. That looks like an HTML tree flattened into an (event) stream. The choice seems unusual to me, but I found that html.parser.analyzer has a bunch of words for working with that output. I've fiddled around with it and managed to extract the components I was looking for.

So now I'm wondering whether there's anything else I've missed. Is there something that parses HTML into a tree structure? Is there a simpler DSL for extracting data? The common tools I run into are XPath and CSS selectors, which are short and to the point, but I'm fine with whatever is easy enough and has the same power.

Basically, I'm just looking for more tips or options in case I missed something. You guys have a lot of vocabs :)

-- 
Peter Nagy

Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk