Re: HttpKit, Enlive (html retrieval and parsing)
This is exactly what I do and it works great!

On Saturday, January 11, 2014 7:00:22 PM UTC-5, Jan Herich wrote:
I don't recommend using Java's built-in HTTP retrieval (by passing a java.net.URL object to enlive's html-resource function). Not only is it significantly slower than using clj-http (which uses Apache HttpClient under the hood), but it's also unreliable when issuing multiple parallel requests. [...]

-- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
HttpKit, Enlive (html retrieval and parsing)
I'm just playing around with toolkits to retrieve and parse HTML from web pages and from files that I already have on disk (such as the JDK API documentation).

Based on too little time, it looks like [http-kit 2.1.16] will retrieve but not parse HTML, and [enlive 1.1.5] will retrieve AND parse HTML. Or is there a whole built-in parse capability I'm missing in http-kit? Also, http-kit doesn't seem to want to retrieve content from a file:/// URL, whereas enlive is happy with both local and remote content.

I'm just messing around; I wanted to have some REPL javadoc logic that didn't fire up a browser or use the Swing app (whose fonts are unreadable for me, and half a day spent trying to change them was not fruitful).

Any tips or suggestions? I just want to make sure I'm not missing obvious things.

Thanks!
Re: HttpKit, Enlive (html retrieval and parsing)
I was using net.cgrand.enlive-html/html-resource and org.httpkit.client/get for the page retrievals.
Re: HttpKit, Enlive (html retrieval and parsing)
Java has HTTP retrieval built in. Clojure's core functions can use file or http URLs:

    user> (slurp "http://google.com")
    user> (slurp "file:///etc/passwd")

Parsing HTML, on the other hand, is a question of not just science but also art. Doesn't enlive use TagSoup?
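For the local javadoc use case, the slurp approach can be combined with enlive's parser. What follows is only a rough sketch, assuming enlive is on the classpath; the namespace example.local-docs and the helpers parse-html and page-title are made-up names for illustration, not part of any library.

```clojure
(ns example.local-docs
  (:require [net.cgrand.enlive-html :as html])
  (:import (java.io StringReader)))

(defn parse-html
  "Parse a string of HTML into enlive nodes. The string is wrapped in a
  StringReader because html-resource treats a bare string as the name of
  a classpath resource, not as HTML content."
  [s]
  (html/html-resource (StringReader. s)))

(defn page-title
  "Slurp `location` (a file path, a file:// URL, or an http:// URL given as
  a string) and return the text of its <title> element, or nil if none."
  [location]
  (some-> (parse-html (slurp location))
          (html/select [:title])
          first
          html/text))
```

At the REPL this would be called as (page-title "file:///path/to/docs/api/index.html"), where the path is a placeholder for wherever the JDK docs live on disk. Since slurp reads the whole document into memory first, this is probably fine for single javadoc pages but not a good fit for bulk scraping.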
Re: HttpKit, Enlive (html retrieval and parsing)
I don't recommend using Java's built-in HTTP retrieval (by passing a java.net.URL object to enlive's html-resource function). Not only is it significantly slower than using clj-http (which uses Apache HttpClient under the hood), but it's also unreliable when issuing multiple parallel requests.

The current enlive library supports pluggable parsers. The default one is TagSoup, but you can switch it very easily, for example to JSoup, by setting the *parser* dynamic var.

You can have a look at one of my little projects where I used enlive for HTML scraping here: https://github.com/janherich/lazada-quest/blob/master/src/lazada_quest/scrapper.clj
In this case, I used clj-http as the HTTP client:

    (ns lazada-quest.scrapper
      (:require [clojure.string :as string]
                [clj-http.client :as client]
                [net.cgrand.enlive-html :as html]))

    (defn fetch-url
      "Given some url string, fetch html content of the resource served under url address
      and return it in the form of enlive nodes"
      [url]
      (html/html-resource (:body (client/get url {:as :stream}))))

It would be straightforward to replace the use of clj-http with the http-kit synchronous API, or with the asynchronous API with some changes.
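To illustrate the last point, here is a rough sketch of what the http-kit synchronous variant of fetch-url might look like. It is not taken from the lazada-quest project; the namespace example.scraper and the helper response->nodes are hypothetical names, and the error handling is just one possible choice.

```clojure
(ns example.scraper
  (:require [org.httpkit.client :as http]
            [net.cgrand.enlive-html :as html]))

(defn response->nodes
  "Turn an http-kit response map into enlive nodes.
  Throws if the request failed or the status is not 200."
  [{:keys [status body error]}]
  (if (or error (not= 200 status))
    (throw (ex-info "HTTP fetch failed" {:status status :error error}))
    ;; body is an InputStream when the request was made with {:as :stream}
    (html/html-resource body)))

(defn fetch-url
  "Fetch url with http-kit and return the parsed enlive nodes.
  http/get returns a promise; dereferencing it blocks until the
  response arrives, which gives the synchronous behaviour."
  [url]
  (response->nodes @(http/get url {:as :stream})))
```

Keeping the parsing step in its own function means an asynchronous version could presumably pass response->nodes as the callback argument to http/get instead of dereferencing the promise right away.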