Re: [Haskell-cafe] TagSoup 0.9
Hi,

From what I can tell of your example, you've managed to get the raw HTTP response in Unicode, which isn't suitable for sending to tagsoup. I've not used the Network.HTTP library for downloading much, but when I did I thought it stripped the headers automatically.

Can you just print the first few lines of the output you get from the HTTP library, without passing them through tagsoup? That should show the problem independent of tagsoup.

Thanks, Neil

On Mon, May 24, 2010 at 3:24 AM, Ralph Hodgson rhodg...@topquadrant.com wrote:
> [snip]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
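Neil's suggestion of looking at the raw output before tagsoup touches it could be sketched roughly as follows. This assumes the HTTP package's `simpleHTTP`, `getRequest` and `getResponseBody`; the helper names `preview` and `previewURL` are made up for illustration and are not part of any library.

```haskell
import Network.HTTP (simpleHTTP, getRequest, getResponseBody)

-- | Render the first n lines of a body with 'show', so that control
-- characters (\r, \NUL, a byte-order mark) become visible.
-- Hypothetical helper, not part of the HTTP package.
preview :: Int -> String -> [String]
preview n = map show . take n . lines

-- | Download a page and print the start of the body, untouched by tagsoup.
-- Hypothetical helper; 'getRequest' only accepts syntactically valid URLs.
previewURL :: String -> IO ()
previewURL url = do
    body <- getResponseBody =<< simpleHTTP (getRequest url)
    mapM_ putStrLn (preview 5 body)
```

If headers or `\NUL` bytes already show up here, the problem lies in how the page was fetched or decoded rather than in tagsoup.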
RE: [Haskell-cafe] TagSoup 0.9
Thanks Neil,

Using Network.HTTP worked. However something else I have just run into concerns some web pages that start with:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

I get the following bad result:

    TagText "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nLast-Modified: Tue, 27 Oct 2009 19:30:40 GMT\r\nETag: \"6f248cf73b57ca1:25e2\"\r\nDate: Sun, 23 May 2010 22:46:41 GMT\r\nTransfer-Encoding: chunked\r\nConnection: close\r\nConnection: Transfer-Encoding\r\n\r\n4000\r\n\255\254<\NUL?\NULx\NULm\NULl\NUL \NULv\NULe\NULr\NULs\NULi\NULo\NULn\NUL=\NUL\"\NUL1\NUL.\NUL0\NUL\"\NUL \NULe\NULn\NULc\NULo\NULd\NULi\NULn\NULg\NUL=\NUL\"\NULi\NULs\NULo\NUL-\NUL8\NUL8\NUL5\NUL9\NUL-\NUL1\NUL\"\NUL

etc etc

Is this an easy thing to fix? I've started to look over the code.

-Original Message-
From: Neil Mitchell [mailto:ndmitch...@gmail.com]
Sent: Wednesday, May 19, 2010 12:19 PM
To: Ralph Hodgson
Cc: Daniel Fischer; haskell-cafe@haskell.org; Don Stewart
Subject: Re: [Haskell-cafe] TagSoup 0.9

[snip]
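The `\255\254` followed by characters interleaved with `\NUL` in the output above is what a UTF-16 little-endian byte-order mark and payload look like when each byte is read as one Char. A minimal sketch of detecting and decoding that, using only base, might look like this. The names `utf16leBOM`, `decodeUtf16LE` and `decodeBody` are invented for illustration, and the sketch assumes the headers and the chunked-transfer framing (the `4000\r\n` above) have already been stripped from the body.

```haskell
import Data.Char (chr, ord)
import Data.List (stripPrefix)

-- | The UTF-16LE byte-order mark (bytes 0xFF 0xFE) with one byte per Char;
-- GHC shows it as "\255\254".
utf16leBOM :: String
utf16leBOM = "\255\254"

-- | Decode a String that holds one byte per Char as UTF-16LE.
-- Assumption: only code points below U+10000 occur; surrogate pairs are
-- not combined, and a trailing odd byte is dropped.
decodeUtf16LE :: String -> String
decodeUtf16LE (lo : hi : rest) = chr (ord lo + 256 * ord hi) : decodeUtf16LE rest
decodeUtf16LE _                = []

-- | Decode the body if it starts with the BOM, otherwise return it unchanged.
decodeBody :: String -> String
decodeBody body = maybe body decodeUtf16LE (stripPrefix utf16leBOM body)
```

For real use, a proper decoding library would be a safer choice than this hand-rolled version; the sketch is only meant to show why the text looks the way it does.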
[Haskell-cafe] TagSoup 0.9
Hello Neil,

I was using TagSoup 0.8 with great success. On upgrading to 0.9 I have this error:

    TQ\TagSoup\TagSoupExtensions.lhs:29:17:
        `Tag' is not applied to enough type arguments
        Expected kind `*', but `Tag' has kind `* -> *'
        In the type synonym declaration for `Bundle'
    Failed, modules loaded: TQ.Common.TextAndListHandling.

where line 29 is the type declaration for 'Bundle' in the following code:

    module TQ.TagSoup.TagSoupExtensions where

    import TQ.Common.TextAndListHandling
    import Text.HTML.TagSoup
    import Text.HTML.Download
    import Control.Monad
    import Data.List
    import Data.Char

    type Bundle = [Tag]

    [snip]

    tagsOnPage :: String -> IO(String)
    tagsOnPage url = do
        tags <- liftM parseTags $ openURL url
        let results = unlines $ map(show) $ tags
        return (results)

    extractTags :: Tag -> Tag -> [Tag] -> [Tag]
    extractTags fromTag toTag tags =
        takeWhile (~/= toTag ) $ dropWhile (~/= fromTag ) tags

    extractTagsBetween :: Tag -> [Tag] -> [Tag]
    extractTagsBetween _ [] = []
    extractTagsBetween markerTag tags =
        if startTags == []
            then []
            else [head startTags] ++ (takeWhile (~/= markerTag ) $ tail startTags)
        where startTags = dropWhile (~/= markerTag ) tags

I need to repair this code quickly. I am hoping you can quickly help me resolve this.

Thanks.

Ralph Hodgson, @ralphtq http://twitter.com/ralphtq
Re: [Haskell-cafe] TagSoup 0.9
Neil says that the API of TagSoup changed in 0.9. All usages of the type Tag should now take a type argument, e.g. Tag String.

Regards, Malcolm

On Wednesday, May 19, 2010, at 08:05AM, Ralph Hodgson rhodg...@topquadrant.com wrote:
> [snip]
RE: [Haskell-cafe] TagSoup 0.9
Thanks Malcolm,

Providing a 'String' type argument worked:

    type Bundle = [Tag String]

    extractTags :: Tag String -> Tag String -> Bundle -> Bundle
    extractTags fromTag toTag tags =
        takeWhile (~/= toTag ) $ dropWhile (~/= fromTag ) tags

From: Malcolm Wallace [mailto:malcolm.wall...@me.com]
Sent: Wednesday, May 19, 2010 1:48 AM
To: rhodg...@topquadrant.com
Cc: haskell-cafe@haskell.org
Subject: Re: [Haskell-cafe] TagSoup 0.9

[snip]
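The same change would presumably apply to `extractTagsBetween` from the original module. The sketch below is one way it might look, assuming a tagsoup version where `Tag` takes a type argument. It keeps the original logic but swaps the `head`/`tail` calls for a pattern match, which is a stylistic choice made here and not something from the thread.

```haskell
import Text.HTML.TagSoup (Tag(..), parseTags, (~/=))

type Bundle = [Tag String]

-- | Return the first marker tag and everything up to (not including) the
-- next tag matching the marker. Same logic as the original function,
-- written with a pattern match instead of head/tail.
extractTagsBetween :: Tag String -> Bundle -> Bundle
extractTagsBetween markerTag tags =
    case dropWhile (~/= markerTag) tags of
        []             -> []
        (start : rest) -> start : takeWhile (~/= markerTag) rest
```

For example, `extractTagsBetween (TagOpen "h2" []) (parseTags html)` would give the first `h2` section of `html`, where `html` is a hypothetical String holding the page source.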
RE: [Haskell-cafe] TagSoup 0.9
Forgot to add: I now need to understand the following warnings on this line "import Text.HTML.Download":

    TagSoupExtensions.lhs:24:2:
        Warning: In the use of `openItem'
                 (imported from Text.HTML.Download):
                 Deprecated: Use package HTTP, module Network.HTTP, getResponseBody =<< simpleHTTP (getRequest url)

    TagSoupExtensions.lhs:24:2:
        Warning: In the use of `openURL'
                 (imported from Text.HTML.Download):
                 Deprecated: Use package HTTP, module Network.HTTP, getResponseBody =<< simpleHTTP (getRequest url)
    Ok, modules loaded: TQ.TagSoup.TagSoupExtensions.
    *TQ.TagSoup.TagSoupExtensions>

From: haskell-cafe-boun...@haskell.org [mailto:haskell-cafe-boun...@haskell.org] On Behalf Of Ralph Hodgson
Sent: Wednesday, May 19, 2010 10:30 AM
To: 'Malcolm Wallace'
Cc: haskell-cafe@haskell.org
Subject: RE: [Haskell-cafe] TagSoup 0.9

[snip]
Re: [Haskell-cafe] TagSoup 0.9
On Wednesday 19 May 2010 19:46:57, Ralph Hodgson wrote:
> Forgot to add: I now need to understand the following warnings on this
> line import Text.HTML.Download:

In Text.HTML.Download, there's the following:

    {-|
        /DEPRECATED/: Use the HTTP package instead:

        import Network.HTTP
        openURL x = getResponseBody =<< simpleHTTP (getRequest x)

        This module simply downloads a page off the internet. It is very restricted,
        and it not intended for proper use.

        The original version was by Alistair Bayley, with additional help from
        Daniel McAllansmith. It is taken from the Haskell-Cafe mailing list
        \"Simple HTTP lib for Windows?\", 18 Jan 2007.
        http://thread.gmane.org/gmane.comp.lang.haskell.cafe/18443/
    -}

and

    {-# DEPRECATED openItem, openURL "Use package HTTP, module Network.HTTP, getResponseBody =<< simpleHTTP (getRequest url)" #-}

So, don't use Text.HTML.Download anymore, instead use the functions from the HTTP package. Deprecated stuff will probably be removed in one of the next releases.
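Applied to the `tagsOnPage` function from the original module, the suggested replacement might look roughly like this. It is a sketch that assumes the HTTP and tagsoup packages are installed; `showTags` is a helper name invented here to keep the pure part separate from the download.

```haskell
import Network.HTTP (simpleHTTP, getRequest, getResponseBody)
import Text.HTML.TagSoup (Tag, parseTags)

-- | Drop-in replacement for the deprecated Text.HTML.Download.openURL,
-- as given in the deprecation message.
openURL :: String -> IO String
openURL x = getResponseBody =<< simpleHTTP (getRequest x)

-- | Pure part: parse the page and show one tag per line.
-- Hypothetical helper, not part of tagsoup.
showTags :: String -> String
showTags = unlines . map show . (parseTags :: String -> [Tag String])

-- | Same result as the original tagsOnPage, without Text.HTML.Download.
tagsOnPage :: String -> IO String
tagsOnPage url = fmap showTags (openURL url)
```

Note that `getRequest` raises an error for a syntactically invalid URL, so callers may want to validate input first.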
Re: [Haskell-cafe] TagSoup 0.9
Or use things from the download-curl package, which provides a nice openURL function.

daniel.is.fischer:
> [snip]
Re: [Haskell-cafe] TagSoup 0.9
Hi Ralph,

> I was using TagSoup 0.8 with great success. On upgrading to 0.9 I have
> this error:
>
> TQ\TagSoup\TagSoupExtensions.lhs:29:17:
>     `Tag' is not applied to enough type arguments
>     Expected kind `*', but `Tag' has kind `* -> *'
>     In the type synonym declaration for `Bundle'
> Failed, modules loaded: TQ.Common.TextAndListHandling.

My change notes have this being a change between 0.6 and 0.8. As Malcolm says, any old uses of Tag should become Tag String. The reason is that Tag is now parameterised, and you can use Tag ByteString etc. However, I should point out that Tag ByteString won't be any faster than Tag String in this version (it's in the future work pile).

> Forgot to add: I now need to understand the following warnings on this
> line import Text.HTML.Download:

Everyone's comments have been right. I previously included Text.HTML.Download so that it was easy to test tagsoup against the web. Since I first wrote that snippet the HTTP downloading libraries have improved substantially, so people should use those in favour of the version in tagsoup - you'll be able to connect to more websites in more reliable ways, go through proxies etc. I don't intend to remove the Download module any time soon, but I will do eventually.

Thanks, Neil
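To illustrate what the parameterisation allows, here is a small sketch that parses the same markup as `Tag String` and as `Tag ByteString`. It assumes a tagsoup version that provides a `StringLike` instance for strict `ByteString`; the names `tagsFromString`, `tagsFromBytes` and `countOpenTags` are made up for this example.

```haskell
import qualified Data.ByteString.Char8 as BS
import Text.HTML.TagSoup (Tag(..), parseTags)

-- | parseTags specialised to String.
tagsFromString :: String -> [Tag String]
tagsFromString = parseTags

-- | parseTags specialised to strict ByteString (assumes the StringLike
-- instance for ByteString is available in the installed tagsoup).
tagsFromBytes :: BS.ByteString -> [Tag BS.ByteString]
tagsFromBytes = parseTags

-- | A function that does not inspect the text can stay polymorphic
-- in the string type.
countOpenTags :: [Tag str] -> Int
countOpenTags tags = length [() | TagOpen _ _ <- tags]
```

A type synonym such as `Bundle` has to pick one concrete argument (`[Tag String]`) or take a parameter itself, which is why the bare `[Tag]` no longer kind-checks.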
Re: [Haskell-cafe] TagSoup 0.9
Don Stewart wrote:
> Or use things from the download-curl package, which provides a nice
> openURL function.

The openURL function from TagSoup is lazy, which the proposed replacement 'getResponseBody =<< simpleHTTP (getRequest x)' is not. Is the openURL function from download-curl lazy?
Re: [Haskell-cafe] TagSoup 0.9
schlepptop:
> The openURL function from TagSoup is lazy, which the proposed replacement
> 'getResponseBody =<< simpleHTTP (getRequest x)' is not. Is the openURL
> function from download-curl lazy?

Yes, see:

    Network.Curl.Download.Lazy.openLazyURI

though I think it is possible that I strictified the code. Have a play around with it if it doesn't meet your needs -- should be /trivial/ to ensure it is chunk-wise lazy.