I second the suggestion of JSoup. Alternatively, you could use BeautifulSoup. 
If you want the data in-process in Julia, you can call either of these 
libraries from Julia using JavaCall or PyCall, respectively. 
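
To give a rough idea of what "use a real HTML parser" looks like on the Python side, here is a minimal sketch. It uses Python's built-in html.parser module as a stand-in rather than BeautifulSoup itself (BeautifulSoup is a separate install and would be shorter). The names LinkCollector, extract_links and sample are made up for this illustration, and the sample markup is invented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every (href, text) pair found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented, deliberately sloppy markup: unclosed <p>, unquoted attribute.
sample = (
    "<html><body><p>See <a href=/docs>the <b>docs</b></a>"
    "<p>or <a href='http://example.com'>this site</a></body></html>"
)
print(extract_links(sample))
```

The point is that the parser copes with the unclosed tags and the unquoted attribute, which is where hand-rolled approaches tend to fall over. The same function could in principle be called from Julia through PyCall.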

Regards
-
Avik

On Thursday, 5 June 2014 11:18:48 UTC+1, Yuuki Soho wrote:
>
> So, you want to parse a web page and get the content out of it.
>
> I'm not sure there's a very good way of doing it currently, because HTML 
> pages are often messy (
> http://programmers.stackexchange.com/questions/151739/getting-data-from-a-webpage-in-a-stable-and-efficient-way
> ). What you want is some kind of HTML parser like jsoup. I don't think any 
> are available in Julia right now. You could try an XML parser (
> https://github.com/lindahua/LightXML.jl), but I'm not sure that will 
> work very well.
>
> Otherwise you can take the dirty road and use regular expressions to 
> extract what you want.
>
>>
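
For reference, the "dirty road" with regular expressions mentioned above might look something like the sketch below. The pattern HREF_RE, the function find_hrefs and the page string are all invented for illustration. It only handles quoted href values, which shows why this approach is fragile on messy pages.

```python
import re

# Hypothetical pattern: finds href values on <a> tags, but only when the
# value is wrapped in single or double quotes.
HREF_RE = re.compile(
    r"""<a\s[^>]*?href\s*=\s*(["'])(.*?)\1""",
    re.IGNORECASE | re.DOTALL,
)


def find_hrefs(html):
    """Return the quoted href values of all <a> tags in an HTML string."""
    return [m.group(2) for m in HREF_RE.finditer(html)]


# Invented sample: the third link has an unquoted href and is silently missed.
page = (
    '<p><a class="x" href="http://example.com/a">A</a> '
    "<A HREF='/b'>B</A> <a href=/c>C</a></p>"
)
print(find_hrefs(page))
```

Note that the unquoted link is skipped without any error, so a regex like this needs checking against the actual pages you intend to scrape.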
