On Saturday, November 10, 2018 at 10:35:03 AM UTC-5, Walter Lee Davis wrote:
>
>
> > On Nov 9, 2018, at 6:22 PM, fugee ohu <fuge...@gmail.com> wrote: 
> > 
> > 
> > 
> > On Wednesday, November 7, 2018 at 12:28:05 PM UTC-5, Jake Niemiec wrote: 
> > The ui-box class would indicate that it is a React component: 
> > https://github.com/segmentio/ui-box 
> > 
> > React components are run client-side, meaning the text you are looking 
> > for is inserted into the document after the page runs its <script> tags. 
> > I would take a look at the Sources tab in Chrome; you can find all the 
> > loaded scripts there. 
> > 
> > On Wed, Nov 7, 2018 at 10:17 AM fugee ohu <fuge...@gmail.com> wrote: 
> > 
> > 
> > On Wednesday, November 7, 2018 at 11:01:32 AM UTC-5, Colin Law wrote: 
> > I should think that javascript is involved.  I am sure you asked a 
> > similar question before when you were trying to scrape a website and 
> > couldn't find the text in the html. 
> > 
> > Colin 
> > On Wed, 7 Nov 2018 at 15:35, fugee ohu <fuge...@gmail.com> wrote: 
> > > 
> > > I'm not very good with the consoles in Chrome and Firefox, but I 
> > > couldn't find the text I was looking for in the source, even though it 
> > > seemingly displays as text (the cursor changes to a vertical line on 
> > > mouse-over). I found the HTML below in the source. How does this HTML 
> > > create the text that displays? 
> > > 
> > > <div class="ui-box product-description-main" id="j-product-description"> 
> > >     <div class="ui-box-title">Product Description</div> 
> > >     <div class="ui-box-body"> 
> > > 
> > >         <div class="description-content" data-role="description" data-spm="1000023"> 
> > >             <div class="loading32"></div> 
> > >         </div> 
> > > 
> > >     </div> 
> > > </div> 
> > > 
> > > -- 
> > > You received this message because you are subscribed to the Google 
> Groups "Ruby on Rails: Talk" group. 
> > > To unsubscribe from this group and stop receiving emails from it, send 
> an email to rubyonrails-ta...@googlegroups.com. 
> > > To post to this group, send email to rubyonra...@googlegroups.com. 
> > > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/rubyonrails-talk/8e0eb26a-517a-4216-bb9c-8bd05e4412a5%40googlegroups.com.
>  
>
> > > For more options, visit https://groups.google.com/d/optout. 
> > 
> > Yes, but given that JavaScript is involved, how does it happen that the 
> > text I'm viewing in the browser isn't visible in the source? 
> > 
> > 
> > So far I'm trying to get up to the table, the last element shown below. 
> > doc.at_css("div#j-product-description div.ui-box-body div.description-content") 
> > gets me back the div class="description-content" element, but 
> > doc.at_css("div#j-product-description div.ui-box-body div.description-content div.origin-part") 
> > returns nil. There's a lot inside kse:widget that I'm not including here. 
> > 
> > <div class="ui-box product-description-main" id="j-product-description" data-widget-cid="widget-27"> 
> >         <div class="ui-box-title">Product Description</div> 
> >         <div class="ui-box-body"> 
> > <div class="description-content" data-role="description" data-spm="1000023"><div class="origin-part"><p> <br> <br> <br> &nbsp; </p> 
> > <kse:widget data-widget-type="relatedProduct" id="24226336" title="TOP" type="relation">...</kse:widget> 
> > <table border="2"> 
> > 
>
> It seems to me that you are going to have to identify the data source that 
> the in-page JavaScript is using to generate the dynamic table data, and 
> query that rather than trying to work everything out from the HTML (which 
> is just a template for the in-page script to fill). There's probably a JSON 
> URL somewhere that is being loaded into the page, and the script is 
> building from that. This entire approach is pretty fraught with peril, 
> though, because (like any scraping project, only more so) any change to the 
> scheme that the site's developer chooses to implement will break your 
> scraper immediately. 
>
> Following this path is going to force you to learn about how the site is 
> working on a code level -- and to figure out how they go from data to 
> presentation. 
>
> Another approach might be to use a headless browser on the server to 
> construct a "real" DOM of the page, and query that. To be clear -- I do not 
> recommend you follow this path -- I am noting it here to illustrate how 
> ridiculous this effort will be. 
>
> One way to visualize this difference is to use the Web Inspector in Safari 
> or Chrome to look at the differences between the raw HTML (Safari labels 
> this tab "Resources") and the DOM (Safari calls this "Elements"). There is 
> likely very little in common outside of the overall outline, if the page is 
> changing as dramatically as you describe. If you hunt through the Resources 
> tab (in Safari) you may find a link to a JSON file that is being required 
> into the page. Loading that URL, rather than the HTML, may give you a much 
> cleaner set of data (which you can parse directly using Ruby) rather than 
> trying to execute JS on your server in order to construct an HTML DOM that 
> you can parse with Nokogiri. 
>
> Walter 
>
>
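The suggestion quoted above, loading the JSON that the in-page script consumes and parsing it directly in Ruby, might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the payload shape, the field names, and the description_rows helper are hypothetical, since the thread never identifies the real data URL or its structure.

```ruby
require "json"

# Hypothetical payload. The real endpoint and its field names are not known
# from the thread, so this shape is invented purely for illustration.
SAMPLE_JSON = <<~PAYLOAD
  {
    "description": {
      "title": "Product Description",
      "rows": [
        { "label": "Material", "value": "Cotton" },
        { "label": "Color", "value": "Blue" }
      ]
    }
  }
PAYLOAD

# Turn the JSON text into an array of [label, value] pairs.
# Returns an empty array when the expected keys are missing.
def description_rows(json_text)
  data = JSON.parse(json_text)
  rows = data.dig("description", "rows") || []
  rows.map { |row| [row["label"], row["value"]] }
end

description_rows(SAMPLE_JSON).each do |label, value|
  puts "#{label}: #{value}"
end
```

In a real scraper the text would come from something like Net::HTTP.get(URI(url)) (after require "net/http"), where url is whatever JSON request the page turns out to make. The point of the sketch is only that no JavaScript has to run on the server once that request is known.
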
It wasn't shown in the source, but when I expanded the element recursively in 
Chrome Developer Tools I saw the text I was looking for. So, what's that 
gonna be worth?
