On Wed, Oct 31, 2012 at 6:20 AM, Soichi Ishida <[email protected]> wrote:
> Rails 1.9.3
>
> For http://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A
> I would like to make a list of airport names using Nokogiri.
>
> The following code seems to work but it does not insert "\n" as I wish.
>
> Can you tell me why?
>
>
>
> require 'open-uri'
> require 'nokogiri'
>
> test_url =
> "http://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A"
>
> url_list_file = "list_page_url.txt"
> test_xpath = "//tr"
> output_file = "list_airport_names_wiki_url.txt"
>
> test = Nokogiri::HTML(open(test_url))
> File.open(output_file, "a") {|f|
> test.xpath(test_xpath).each do |e|
> f.write e.xpath("//td[3]/a").text + "\n" #### HERE!!! ####
> end
> }
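First of all, the XPath looks suspicious: you almost certainly want only
the "td" elements nested below the current "tr". So you should use one of
td[3]/a
.//td[3]/a
Otherwise iterating over the rows is useless, because //td[3]/a is
evaluated from the document root and selects the "a" children of every
"td" that is the third "td" of its parent, anywhere in the document, no
matter which row e is. Your "\n" is written, but only once per row, after
all the names have been run together. Also, e.xpath returns a NodeSet,
and calling #text on a NodeSet concatenates the text of all its nodes
without any separator, which leads to surprising results: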
irb(main):026:0> puts dom
<?xml version="1.0"?>
<table>
<td>abc</td>
</table>
=> nil
irb(main):027:0> dom.xpath('//*')
=> [#<Nokogiri::XML::Element:0x..fc00768e6 name="table"
children=[#<Nokogiri::XML::Element:0x..fc00766d4 name="td"
children=[#<Nokogiri::XML::Text:0x..fc0076526 "abc">]>]>,
#<Nokogiri::XML::Element:0x..fc00766d4 name="td"
children=[#<Nokogiri::XML::Text:0x..fc0076526 "abc">]>]
irb(main):028:0> dom.xpath('//*').text
=> "abcabc"
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/