> On Feb 4, 2019, at 7:35 AM, fugee ohu <fugee...@gmail.com> wrote:
> 
> 
> 
> On Sunday, February 3, 2019 at 9:54:25 PM UTC-5, Walter Lee Davis wrote:
> 
> > On Feb 3, 2019, at 7:14 PM, fugee ohu <fuge...@gmail.com> wrote: 
> > 
> > 
> > 
> > On Wednesday, January 30, 2019 at 5:16:59 PM UTC-5, Colin Law wrote: 
> > On Wed, 30 Jan 2019 at 22:12, Colin Law <cla...@gmail.com> wrote: 
> > > 
> > > On Wed, 30 Jan 2019 at 22:09, fugee ohu <fuge...@gmail.com> wrote: 
> > > > 
> > > > 
> > > > 
> > > > On Wednesday, January 30, 2019 at 5:02:17 PM UTC-5, Colin Law wrote: 
> > > >> 
> > > >> On Wed, 30 Jan 2019 at 21:56, fugee ohu <fuge...@gmail.com> wrote: 
> > > >> > ... 
> > > >> > Everything in the unparsed response body that I want is between [ and 
> > > >> > ] I have to gsub it out 
> > > >> 
> > > >> 
> > > >> No you don't.  After you get parsed_obj["results"] (which is an array, 
> > > >> that's what the [] mean) then you can get the first product by 
> > > >> parsed_obj["results"][0]["productId"] 
> > > >> It is just an array.  You have met ruby arrays haven't you? 
> > > >> 
> > > >> I am rapidly losing the will to live. 
> > > >> 
> > > >> Colin 
> > > > 
> > > > 
> > > > The response body isn't JSON.parse parsable as is it has to be gsub'd 
> > > > and chomped first before I can run JSON.parse My original gsub wasn't 
> > > > right it wasn't removing the end that follows ] 
> > > > JSON::ParserError: 784: unexpected token at 
> > > > 'myscript.js({"success":true,"code 
> > > 
> > > You previously posted that you had got parsed_obj where 
> > > parsed_obj["results"]  was an array.  Go back to that. 
> > 
> > To quote your previous message 
> > 
> > >puts parsed_obj["results"]  shows the entire results but `puts 
> > >parsed_obj["results"]["productId"] gets me error no implicit 
> > > conversion of String into Integer 
> > 
> > The error is because it is an array, which is perfectly obvious if you 
> > look at the unparsed string. So if you use 
> > parsed_obj["results"][0] 
> > you will get the first element 
> > 
> > Colin 
> > 
> > There are scripts in the browser page source that pass a lot of useful 
> > values like this 
> > <script type="text/javascript"> 
> >                 if(!window.runParams) { 
> >                 window.runParams = {}; 
> >                 } 
> >                 window.runParams.minPrice="44.98"; 
> >                 window.runParams.maxPrice="44.98"; 
> >                ... 
> > And more within definitions in the same script like this 
> > var 
> > skuProducts=[{"skuAttr":"14:1052","skuPropIds":"1052","skuVal":{"actSkuCalPrice":"20.24","actSkuMultiCurrencyCalPrice":"20.24","actSkuMultiCurrencyDisplayPrice":"20.24","availQuantity":29,"inventory":30,"isActivity":true,"skuCalPrice":"44.98","skuMultiCurrencyCalPrice":"44.98","skuMultiCurrencyDisplayPrice":"44.98"}},{"skuAttr":"14:173","skuPropIds":"173","skuVal":{"actSkuCalPrice":"20.24","actSkuMultiCurrencyCalPrice":"20.24","actSkuMultiCurrencyDisplayPrice":"20.24","availQuantity":26,"inventory":30,"isActivity":true,"skuCalPrice":"44.98","skuMultiCurrencyCalPrice":"44.98","skuMultiCurrencyDisplayPrice":"44.98"}}];
> >  
> >                 var GaData = { 
> >         pageType: "product", 
> >         productIds: "en32837801078", 
> >         totalValue: "US $20.24" 
> >     }; 
> > 
> > Since it's in <script> containers in page source can I parse it? 
> 
> Since it's in a <script> tag, you can use Nokogiri or another HTML parser to 
> extract only that bit of the page. To be sure, you will have to do some work 
> on the script before you can access the parts you're interested in as JSON. 
> But JSON is the same whether it is being parsed by JavaScript or Ruby. You're 
> going to have to work out the best way to identify the parts you want. 
> There's no such thing as a JavaScript parser in Ruby, but if you can figure 
> out where to start, and how to get the offsets to trim your starting code, 
> the parts that look interesting above will be interesting to Ruby, too. 
> 
> I'm assuming you don't have control over this page, and that you are doing 
> some sort of scraping exercise here. So you'll need to have lots of tests 
> around whatever code you write, and keep checking often, because the owner of 
> this code may change its fundamental structure at a moment's notice. 
> 
> Walter 
> 
> 
> It's the 12th <script> on the page but doc.at_css("script:nth-child(12)") 
> returns nil

Are you sure that it's there in the original HTML, or is it being put there by 
a script after page load? Make sure that you are only looking at the original 
HTML, not the DOM. (Safari shows the HTML in the tab called Resources, and the 
DOM in the tab called Elements. If you're using a different browser, there may 
be a similar distinction with different names.) The DOM can change over the 
life of the page, in response to scripting. The HTML is fixed at the moment 
that the page is served to the client. Nokogiri and other HTML parsers can only 
read the HTML as served, not the DOM as mutated by a browser. 
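
A quick way to run that check without a browser is to search the raw response 
body for a string you expect to find. This is only a sketch: the helper name 
served_html_contains? and the sample markup are made up for illustration, and 
the URL in the closing comment is a placeholder, not the real page.

```ruby
# Returns true when `marker` appears in the HTML exactly as served,
# i.e. in the raw response body before any JavaScript has run.
# (Hypothetical helper name, for illustration only.)
def served_html_contains?(html, marker)
  html.include?(marker)
end

# Inline stand-in for a served page (made-up content).
sample_html = <<~HTML
  <html><body>
    <script type="text/javascript">var skuProducts=[{"skuAttr":"14:1052"}];</script>
  </body></html>
HTML

puts served_html_contains?(sample_html, "skuProducts")
puts served_html_contains?(sample_html, "notThere")

# Against a live page you would fetch the body first, for example
# (placeholder URL, assumed for illustration):
#   require "net/http"
#   html = Net::HTTP.get(URI("https://example.com/item.html"))
```

If the marker is missing from the served body, the data is arriving some other 
way, most likely through one of the scripts the page loads.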

You've already been around this tree a couple of times -- no, the answer is not 
to stand up a headless browser and read the DOM. The data is there, either in 
the HTML or the associated scripts served by the site. You just have to find it 
and isolate it from the rest of the visual page content.

Once you confirm that the data you want is there in the HTML, then instead of 
trying to just get the 12th script tag with at_css, use css('script').each to 
loop over all the scripts on the page, and see which one contains that target 
string. Once you figure out the correct offset, you can use the more targeted 
selector if you like, but because you don't own the page, you may want to 
continue using an enumerator to loop over the page contents -- that's likely to 
be more resilient in the face of change.

Walter

> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Ruby on Rails: Talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to rubyonrails-talk+unsubscr...@googlegroups.com.
> To post to this group, send email to rubyonrails-talk@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/rubyonrails-talk/86958eec-3d90-42e0-bfdb-30801a38a878%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

