2013/9/5 John Porubek <jporu...@gmail.com>:
> I'm re-posting this message since I didn't get an answer the first
> time and I'm nothing if not persistent!
>
> Let me recast the original question differently. Is there a fairly
> easy way, using Factor, to scrape a blog website for a list of blog
> titles? This seems like it would be a really useful tool for finding
> information, assuming the author chose fairly meaningful titles.
>
> I'm no expert in web technology, but it occurs to me, as I think about
> this problem, that it might be kind of difficult in the general case.
> I have no idea how similar different blogs might be. For now, I'm most
> interested in the special case of John Benediktsson's "Re:Factor"
> blog.

Hi John,

Yours truly has written a tutorial on scraping with Factor, available
here:

https://github.com/bjourne/playground-factor/wiki/Parsing-gmane-with-factor

It's very much a rough work in progress, though, and I might never
finish it because I get bored quickly. Maybe you can salvage some
information from it. The general strategy for scraping is:

    IN: USE: html.parser.analyzer
    IN: "http://www.factorcode.org/" scrape-html
    ! Get stuff from the tag seq, e.g.
    IN: "title" find-between-first first text>>
    "Factor programming language"

-- 
mvh/best regards Björn Lindqvist
http://www.bjornlindqvist.se/
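
To go from the single page title above to a list of titles, one option is to collect the text of every element with a given tag name. This is only a sketch under a few assumptions: the word name element-texts is made up for illustration, it relies on find-between-all from html.parser.analyzer, and which tag a particular blog uses for its post titles (h3 below is a guess) has to be checked against the page source.

```factor
USING: accessors html.parser.analyzer kernel sequences ;
IN: scratchpad

! Hypothetical helper, not part of html.parser.analyzer.
! Takes the tag vector produced by scrape-html (or parse-html) and a
! tag name, and returns the text found inside each matching element.
: element-texts ( vector name -- seq )
    ! Build a predicate that matches tags with the given name;
    ! find-between-all itself skips closing tags.
    [ swap name>> = ] curry find-between-all
    ! Each result runs from the opening tag to its closing tag.
    ! Only text tags have a non-f text>> slot, so keep those and
    ! glue them together.
    [ [ text>> ] map sift "" join ] map ;

! Possible use in the listener (the "h3" tag name is an assumption):
! "http://www.factorcode.org/" scrape-html nip "h3" element-texts
```

The helper takes an already parsed tag vector rather than a URL, so it can be tried on a small HTML string with parse-html before being pointed at a live page.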