On Sunday 28 January 2007 09:14, Neil Mitchell wrote:
> Hi Alistair,
>
> > > Is there a simple way to get the contents of a webpage using Haskell on
> > > a Windows box?
> >
> > This isn't exactly what you want, but it gets you partway there. Not
> > sure if LineBuffering or NoBuffering is the best option. Line
> > buffering should be fine for just text output, but if you request a
> > binary object (like an image) then you have to read exactly the number
> > of bytes specified, and no more.
>
> This works great for haskell.org, unfortunately it doesn't work as
> well with the rest of the web universe.
>
> With www.google.com I get: Program error: <handle>: IO.hGetChar:
> illegal operation
>
> With www.slashdot.org I get: 501 Not Implemented returned
>
> www.msnbc.msn.com works fine.
>
> Any ideas why? 

At the very least it's missing the HTTP version on the request line, and you 
almost always need to send a Host header.

For a start you could try changing client to:

client server port page = do
  h <- connectTo server (PortNumber port)
  hSetBuffering h NoBuffering
  putStrLn "send request"
  -- hPutStrLn appends "\n", so each line ends with the CRLF that HTTP expects
  hPutStrLn h ("GET " ++ page ++ " HTTP/1.1\r")
  hPutStrLn h ("Host: " ++ server ++ "\r")
  -- a single empty line terminates the request headers
  hPutStrLn h "\r"
  putStrLn "wait for response"
  readResponse h
  putStrLn ""

Note that I haven't tried this, or the rest of Alistair's code, at all, so the 
usual 30-day money-back guarantee doesn't apply.  It certainly won't handle 
redirects.
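
If you would rather not depend on how hPutStrLn ends its lines, the request 
can be built as a single string with explicit CRLFs and sent with hPutStr.  
A minimal, untested sketch: buildRequest is a name I made up (it is not part 
of Alistair's code), and the "Connection: close" header is my assumption that 
the response is read until the server closes the connection.

```haskell
-- Build a minimal HTTP/1.1 GET request with explicit CRLF line endings.
-- buildRequest is a hypothetical helper, not part of the original client.
buildRequest :: String -> String -> String
buildRequest server page = concatMap (++ "\r\n")
  [ "GET " ++ page ++ " HTTP/1.1"  -- request line, including the HTTP version
  , "Host: " ++ server             -- required by HTTP/1.1
  , "Connection: close"            -- assumption: we read until the server closes
  , ""                             -- empty line ends the headers
  ]
```

In client this would be used as hPutStr h (buildRequest server page) in place 
of the individual hPutStrLn calls.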


> Are there any alternatives to read in a file off the 
> internet (i.e. wget but as a library)

The http library sort of works most of the time, but there are several bugs 
that cause it to fail on many 'in the wild' webservers.

HXT has a wrapper around a command line invocation of cURL.  It works better.  
There is still a problem with redirects, but that's an easy enough fix.
I doubt that it would be very easy to extract it from the surrounding HXT 
framework though.
It would be nice to have a binding to libcurl.
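
In the meantime, shelling out to curl yourself is not much code.  The sketch 
below is not HXT's wrapper; it assumes a curl executable on the PATH and a 
version of the process package that provides readProcess.  The names curlArgs 
and fetchUrl are made up for illustration.  The -L flag asks curl to follow 
redirects and -s silences its progress meter.

```haskell
import System.Process (readProcess)

-- Arguments passed to curl: -s silences the progress meter,
-- -L follows redirects.
curlArgs :: String -> [String]
curlArgs url = ["-s", "-L", url]

-- Fetch a URL by running the curl command-line tool (assumed to be
-- on the PATH) and returning whatever it writes to stdout.
-- readProcess throws an exception if curl exits with a non-zero status.
fetchUrl :: String -> IO String
fetchUrl url = readProcess "curl" (curlArgs url) ""
```

For example, fetchUrl "http://www.haskell.org/" >>= putStr would print the 
page, provided curl can reach it.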

Daniel
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
