On Saturday, November 10, 2018 at 10:35:03 AM UTC-5, Walter Lee Davis wrote:
>
> > On Nov 9, 2018, at 6:22 PM, fugee ohu <fuge...@gmail.com> wrote:
> >
> > On Wednesday, November 7, 2018 at 12:28:05 PM UTC-5, Jake Niemiec wrote:
> > The ui-box class would indicate that it is a React component: https://github.com/segmentio/ui-box
> >
> > React components are run client-side, meaning the text you are looking for is inserted into the document after the page runs <script> tags. I would take a look at the Sources tab in Chrome; you can find all the loaded scripts there.
> >
> > On Wed, Nov 7, 2018 at 10:17 AM fugee ohu <fuge...@gmail.com> wrote:
> >
> > On Wednesday, November 7, 2018 at 11:01:32 AM UTC-5, Colin Law wrote:
> > I should think that JavaScript is involved. I am sure you asked a similar question before when you were trying to scrape a website and couldn't find the text in the HTML.
> >
> > Colin
> > On Wed, 7 Nov 2018 at 15:35, fugee ohu <fuge...@gmail.com> wrote:
> > >
> > > I'm not very good with the consoles in Chrome and Firefox, but I couldn't find the text I was looking for in the source even though it's seemingly displayed as text (the cursor changes to a vertical line on mouse-over). I found this HTML below in the source. How does this HTML create the text that displays?
> > >
> > > <div class="ui-box product-description-main" id="j-product-description">
> > >   <div class="ui-box-title">Product Description</div>
> > >   <div class="ui-box-body">
> > >
> > >     <div class="description-content" data-role="description" data-spm="1000023">
> > >       <div class="loading32"></div>
> > >     </div>
> > >
> > >   </div>
> > > </div>
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-ta...@googlegroups.com.
> > > To post to this group, send email to rubyonra...@googlegroups.com.
> > > To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/8e0eb26a-517a-4216-bb9c-8bd05e4412a5%40googlegroups.com.
> > > For more options, visit https://groups.google.com/d/optout.
> >
> > Yes, within that context, JavaScript, how does it happen that the text I'm viewing in the browser isn't visible in the source?
> >
> > So far I'm trying to get up to the table, the last element shown below. doc.at_css("div#j-product-description div.ui-box-body div.description-content") gets me back the div class="description-content" element, but doc.at_css("div#j-product-description div.ui-box-body div.description-content div.origin-part") returns nil. There's a lot inside kse:widget that I'm not including here.
> >
> > <div class="ui-box product-description-main" id="j-product-description" data-widget-cid="widget-27">
> > <div class="ui-box-title">Product Description</div>
> > <div class="ui-box-body">
> > <div class="description-content" data-role="description" data-spm="1000023"><div class="origin-part"><p> <br> <br> <br> </p>
> > <kse:widget data-widget-type="relatedProduct" id="24226336" title="TOP" type="relation">...</kse:widget>
> > <table border="2">
> >
>
> It seems to me that you are going to have to identify the data source that the in-page JavaScript is using to generate the dynamic table data, and query that rather than trying to work everything out from the HTML (which is just a template for the in-page script to fill). There's probably a JSON URL somewhere that is being loaded into the page, and the script is building from that. This entire approach is pretty fraught with peril, though, because (like any scraping project, only more so) any change to the scheme that the site's developer chooses to implement will break your scraper immediately.
>
> Following this path is going to force you to learn about how the site is working on a code level -- and to figure out how they go from data to presentation.
>
> Another approach might be to use a headless browser on the server to construct a "real" DOM of the page, and query that. To be clear -- I do not recommend you follow this path -- I am noting it here to illustrate how ridiculous this effort will be.
>
> One way to visualize this difference is to use the Web Inspector in Safari or Chrome to look at the differences between the raw HTML (Safari labels this tab "Resources") and the DOM (Safari calls this "Elements"). There is likely very little in common outside of the overall outline, if the page is changing as dramatically as you describe. If you hunt through the Resources tab (in Safari) you may find a link to a JSON file that is being required into the page. Loading that URL, rather than the HTML, may give you a much cleaner set of data (which you can parse directly using Ruby) rather than trying to execute JS on your server in order to construct an HTML DOM that you can parse with Nokogiri.
>
> Walter
>

It wasn't shown in the source, but when I expanded the element recursively in Chrome developer tools I saw the text I was looking for. So, what's that gonna be worth?
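
For anyone following along, Walter's suggestion of loading the page's data source directly, rather than parsing the HTML template, might look roughly like the sketch below. It is only an illustration under stated assumptions: the thread never identifies the actual URL or payload format, so the `fetch_description_json` and `description_rows` helpers, the sample payload, and the "rows", "name" and "value" keys are all hypothetical. The real endpoint would have to be found in the browser's Network (or Resources) tab first.

```ruby
require "json"
require "net/http"
require "uri"

# Download the raw payload from the data URL.
# Hypothetical helper: the real URL is not given anywhere in this thread,
# so this method is defined but never called in the sketch.
def fetch_description_json(url)
  response = Net::HTTP.get_response(URI(url))
  raise "request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Turn the payload into an array of [name, value] pairs.
# The "rows", "name" and "value" keys are invented for this example;
# the real structure depends on what the site actually serves.
def description_rows(json_text)
  data = JSON.parse(json_text)
  data.fetch("rows", []).map { |row| [row["name"], row["value"]] }
end

# Invented sample data standing in for whatever the site returns.
SAMPLE_PAYLOAD = <<~JSON
  {"rows": [{"name": "Material", "value": "Cotton"},
            {"name": "Color", "value": "Blue"}]}
JSON

rows = description_rows(SAMPLE_PAYLOAD)
rows.each { |name, value| puts "#{name}: #{value}" }
```

Keeping the download and the parsing in separate methods means the parsing can be tried against a saved copy of the response before pointing it at the live site. As Walter notes, any change to the site's scheme would still break a scraper like this.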