Hi all,

I'm playing with the TagSoup library, trying to extract links to the
original-size pictures from my Flickr Sets page. The program first
loads the Sets page, opens the link to each set, gets the links to its
pictures, and then searches each picture page for the original-size
link (see the steps in the main function).

It does the job, but for testing I wanted to take only 10 links to
reduce the time the program runs, hoping that Haskell's laziness would
magically fetch only the minimum amount of data required to get the
first 10 links out of this set of pages.

I did this by replacing:
  (putStrLn . unlines . concat) origLinks
with
  (putStrLn . unlines . take 10 . concat) origLinks
in the main function.

With the latter version of that line I do get only 10 links, but the
runtime is exactly the same for both versions of main.
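
In case it helps, here's a smaller example, outside of TagSoup, of the
behaviour I think I'm seeing (the fetch action below is made up and
just stands in for the HTTP call): every action given to mapM seems to
run, even though only part of the result list is consumed afterwards:

module Main where

-- each fetch prints when it runs; all five run even though
-- only the first two results are consumed afterwards
main :: IO ()
main = do
    xs <- mapM fetch [1 .. 5 :: Int]
    print (take 2 xs)
  where
    fetch n = do
        putStrLn ("fetching page " ++ show n)
        return (n * 10)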

As I'm a newbie Haskell programmer, I'm certainly missing something.

By the way, I know Flickr has an API I could use, but the purpose here
was to play with TagSoup.

Thanks for any advice.

Olivier.

Here's the code:

module Main where

import Data.Html.TagSoup
import Control.Monad (liftM)
import Data.List (isPrefixOf, groupBy)
import Data.Maybe (mapMaybe)
import System (getArgs)
import System.Time
import IO (hPutStrLn, stderr)

base    = "http://www.flickr.com"
setsUrl name = "/photos/" ++ name ++ "/sets/"

main :: IO ()
main = do
   args      <- getArgs
   tStart    <- getClockTime
   setLinks  <- getLinksByAttr ("class", "Seta") (base ++ setsUrl (args !! 0))
   picLinks  <- mapM (getLinksByAttr ("class", "image_link")) setLinks
   origLinks <- mapM (getLinksAfterImgByAttr
                  ("src", "http://l.yimg.com/www.flickr.com/images/icon_download.gif")) $
                (mapMaybe linkToOrigSize . concat) picLinks
   (putStrLn . unlines . concat) origLinks
   tEnd      <- getClockTime
   hPutStrLn stderr ( timeDiffToString $ diffClockTimes tEnd tStart )
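
-- For reference, I run this with the Flickr user name as the only
-- argument (e.g. "runhaskell Main.hs someusername", assuming the file
-- is saved as Main.hs); the links go to stdout and the timing to stderr.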

-- | extract all links from "a" tags having the given attribute
getLinksByAttr :: (String, String) -> String -> IO [String]
getLinksByAttr attr url = do
   sects <- getSectionsByTypeAndAttr "a" attr url
   return $ hrefs sects

-- | get the "a" tags following "img" tags that have a given attribute
getLinksAfterImgByAttr :: (String, String) -> String -> IO [String]
getLinksAfterImgByAttr attr url = do
   sects <- getSectionsByTypeAndAttr "img" attr url
   return $ hrefs $ map (dropWhile (not . isTagOpen) . drop 1) sects

-- | create sections from tag type and attribute
getSectionsByTypeAndAttr :: String -> (String, String) -> String -> IO [[Tag]]
getSectionsByTypeAndAttr tagType attr url = do
   tags <- liftM parseTags $ openURL $ url
   (return . filterByTypeAndAttr tagType attr) tags
 where
   filterByTypeAndAttr :: String -> (String, String) -> [Tag] -> [[Tag]]
   filterByTypeAndAttr t a = sections (~== TagOpen t [a])

-- | extract href values from sections of "a" tags
hrefs :: [[Tag]] -> [String]
hrefs = map (addBase . fromAttrib "href" . head)
 where
   addBase :: String -> String
   addBase s
     | "http://" `isPrefixOf` s = s
     | otherwise                = base ++ s

-- | transform a link to a picture into a link to the original size picture
linkToOrigSize :: String -> Maybe String
linkToOrigSize link =
   if parts !! 3 == "photos"
       then Just $ newUrl parts
       else Nothing
 where
   parts = map tail $ groupBy (const (/= '/')) link
   newUrl p = "http://www.flickr.com/photo_zoom.gne?id=" ++ p !! 5 ++
              "&size=o&context=" ++ p !! 7