On Monday, February 4, 2019 at 6:43:48 PM UTC-5, Walter Lee Davis wrote:
>
>
> > On Feb 4, 2019, at 12:27 PM, fugee ohu <fuge...@gmail.com> wrote:
> > 
> > 
> > 
> > On Monday, February 4, 2019 at 8:10:33 AM UTC-5, Walter Lee Davis wrote: 
> > 
> > > On Feb 4, 2019, at 7:35 AM, fugee ohu <fuge...@gmail.com> wrote: 
> > > 
> > > 
> > > 
> > > On Sunday, February 3, 2019 at 9:54:25 PM UTC-5, Walter Lee Davis wrote:
> > > 
> > > > On Feb 3, 2019, at 7:14 PM, fugee ohu <fuge...@gmail.com> wrote: 
> > > > 
> > > > 
> > > > 
> > > > On Wednesday, January 30, 2019 at 5:16:59 PM UTC-5, Colin Law wrote: 
> > > > On Wed, 30 Jan 2019 at 22:12, Colin Law <cla...@gmail.com> wrote: 
> > > > > 
> > > > > On Wed, 30 Jan 2019 at 22:09, fugee ohu <fuge...@gmail.com> wrote:
> > > > > > 
> > > > > > On Wednesday, January 30, 2019 at 5:02:17 PM UTC-5, Colin Law wrote:
> > > > > >> 
> > > > > >> On Wed, 30 Jan 2019 at 21:56, fugee ohu <fuge...@gmail.com> wrote:
> > > > > >> > ... 
> > > > > >> > Everything in the unparsed response body that I want is between [ and ], so I have to gsub it out
> > > > > >> 
> > > > > >> 
> > > > > >> No you don't. After you get parsed_obj["results"] (which is an
> > > > > >> array, that's what the [] mean) you can get the first product with
> > > > > >> parsed_obj["results"][0]["productId"]
> > > > > >> It is just an array.  You have met ruby arrays haven't you? 
> > > > > >> 
> > > > > >> I am rapidly losing the will to live. 
> > > > > >> 
> > > > > >> Colin 
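Colin's point can be sketched in plain Ruby. The JSON string below is a made-up stand-in for the real response (only the `results` and `productId` names come from the thread). After `JSON.parse`, `results` is an Array of Hashes, so you index it by position first and by key second:

```ruby
require "json"

# Made-up stand-in for the response; only the "results" and "productId"
# keys are taken from the thread.
raw = '{"success":true,"results":[{"productId":"111"},{"productId":"222"}]}'
parsed_obj = JSON.parse(raw)

results  = parsed_obj["results"]                 # an Array, because of the [ ... ]
first_id = results[0]["productId"]               # position first, then key
all_ids  = results.map { |r| r["productId"] }    # every productId, in order

puts first_id
p all_ids
```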
> > > > > > 
> > > > > > 
> > > > > > The response body isn't parsable by JSON.parse as is; it has to be gsub'd and chomped before I can run JSON.parse. My original gsub wasn't right: it wasn't removing the part that follows ].
> > > > > > JSON::ParserError: 784: unexpected token at 'myscript.js({"success":true,"code
> > > > > 
> > > > > You previously posted that you had got parsed_obj where
> > > > > parsed_obj["results"] was an array. Go back to that.
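The `myscript.js({...` prefix in that error suggests the body is a JSONP-style callback wrapping a JSON object. Assuming the body has the shape `callback({...})`, optionally followed by `;`, one alternative to chained gsub and chomp calls is to slice out the text between the first `(` and the last `)`. The sample body and its values are made up for illustration:

```ruby
require "json"

# Hypothetical body shaped like the one in the thread: a callback name,
# "(", a JSON object, ")" and an optional ";". Values are made up.
body = 'myscript.js({"success":true,"results":[{"productId":"111"}]});'

open_paren  = body.index("(")   # first "(" ends the callback name
close_paren = body.rindex(")")  # last ")" closes the wrapper
if open_paren.nil? || close_paren.nil? || close_paren < open_paren
  raise ArgumentError, "body does not look like callback(...)"
end

# Everything strictly between the two parentheses should be plain JSON.
json_text  = body[(open_paren + 1)...close_paren]
parsed_obj = JSON.parse(json_text)

puts parsed_obj["success"]
puts parsed_obj["results"].length
```

This assumes the callback name itself contains no `(`; if the real wrapper differs, the two index lookups are the part to adjust.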
> > > > 
> > > > To quote your previous message 
> > > > 
> > > > > puts parsed_obj["results"] shows the entire results, but `puts parsed_obj["results"]["productId"]` gets me the error "no implicit conversion of String into Integer"
> > > > 
> > > > The error is because it is an array, which is perfectly obvious if you
> > > > look at the unparsed string. So if you use
> > > > parsed_obj["results"][0]
> > > > you will get the first element
> > > > 
> > > > Colin 
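The error message itself points to the cause: `Array#[]` expects an Integer index, so passing it a String key raises `TypeError`. A small sketch with hypothetical data in the same shape as `parsed_obj["results"]`:

```ruby
require "json"

# Hypothetical data: an Array of Hashes, like parsed_obj["results"].
results = JSON.parse('[{"productId":"111"},{"productId":"222"}]')

# Indexing an Array with a String raises TypeError; capture its message.
error_message =
  begin
    results["productId"]
  rescue TypeError => e
    e.message
  end

puts error_message
puts results[0]["productId"] # index into the Array first, then the Hash
```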
> > > > 
> > > > There are scripts in the browser page source that pass a lot of useful values, like this:
> > > > <script type="text/javascript"> 
> > > >                 if(!window.runParams) { 
> > > >                 window.runParams = {}; 
> > > >                 } 
> > > >                 window.runParams.minPrice="44.98"; 
> > > >                 window.runParams.maxPrice="44.98"; 
> > > >                ... 
> > > > And more within definitions in the same script, like this:
> > > > var skuProducts=[{"skuAttr":"14:1052","skuPropIds":"1052","skuVal":{"actSkuCalPrice":"20.24","actSkuMultiCurrencyCalPrice":"20.24","actSkuMultiCurrencyDisplayPrice":"20.24","availQuantity":29,"inventory":30,"isActivity":true,"skuCalPrice":"44.98","skuMultiCurrencyCalPrice":"44.98","skuMultiCurrencyDisplayPrice":"44.98"}},{"skuAttr":"14:173","skuPropIds":"173","skuVal":{"actSkuCalPrice":"20.24","actSkuMultiCurrencyCalPrice":"20.24","actSkuMultiCurrencyDisplayPrice":"20.24","availQuantity":26,"inventory":30,"isActivity":true,"skuCalPrice":"44.98","skuMultiCurrencyCalPrice":"44.98","skuMultiCurrencyDisplayPrice":"44.98"}}];
> > > >                 var GaData = { 
> > > >         pageType: "product", 
> > > >         productIds: "en32837801078", 
> > > >         totalValue: "US $20.24" 
> > > >     }; 
> > > > 
> > > > Since it's in <script> containers in the page source, can I parse it?
> > > 
> > > Since it's in a <script> tag, you can use Nokogiri or another HTML parser to extract only that bit of the page. To be sure, you will have to do some work on the script before you can access the parts you're interested in as JSON. But JSON is the same whether it is being parsed by JavaScript or Ruby. You're going to have to work out the best way to identify the parts you want. There's no such thing as a JavaScript parser in Ruby, but if you can figure out where to start, and how to get the offsets to trim your starting code, the parts that look interesting above will be interesting to Ruby, too.
> > > 
> > > I'm assuming you don't have control over this page, and that you are doing some sort of scraping exercise here. So you'll need to have lots of tests around whatever code you write, and keep checking often, because the owner of this code may change its fundamental structure at a moment's notice.
> > > 
> > > Walter 
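Not all of that script is JSON. The `window.runParams.x="..."` lines are plain JavaScript assignments, and `GaData` is a JavaScript object literal with unquoted keys, which `JSON.parse` rejects. For simple string assignments such as `minPrice`, a regular expression over the script text is one option. This sketch assumes every value you want is a double-quoted string with no escaped quotes inside; the script text is a trimmed copy of what was quoted earlier:

```ruby
require "json"

# Trimmed copy of the script text quoted earlier in the thread.
script_text = <<~JS
  if(!window.runParams) {
  window.runParams = {};
  }
  window.runParams.minPrice="44.98";
  window.runParams.maxPrice="44.98";
  var GaData = {
      pageType: "product",
      productIds: "en32837801078",
      totalValue: "US $20.24"
  };
JS

# Collect window.runParams.<name>="<value>" pairs into a Hash.
# Assumes plain double-quoted values with no escaped quotes inside.
run_params = script_text
  .scan(/window\.runParams\.(\w+)\s*=\s*"([^"]*)"/)
  .to_h

p run_params

# The GaData literal is JavaScript, not JSON: its keys are unquoted.
ga_literal = script_text[/var GaData = (\{.*?\});/m, 1]
ga_is_json =
  begin
    JSON.parse(ga_literal)
    true
  rescue JSON::ParserError
    false
  end
puts ga_is_json
```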
> > > 
> > > 
> > > It's the 12th <script> on the page, but doc.at_css("script:nth-child(12)") returns nil
> > 
> > Are you sure that it's there in the original HTML, or is it being put there by a script after page load? Make sure that you are only looking at the original HTML, not the DOM. (Safari shows the HTML in the tab called Resources, and the DOM in the tab called Elements. If you're using a different browser, there may be a similar distinction with different names.) The DOM can change over the life of the page, in response to scripting. The HTML is fixed at the moment that the page is served to the client. Nokogiri and other HTML parsers can only read the HTML as served, not the DOM as mutated by a browser.
> > 
> > You've already been around this tree a couple of times. No, the answer is not to stand up a headless browser and read the DOM. The data is there, either in the HTML or the associated scripts served by the site. You just have to find it and isolate it from the rest of the visual page content.
> > 
> > Once you confirm that the data you want is there in the HTML, then instead of trying to just get the 12th script tag with at_css, use css('script').each to loop over all the scripts on the page, and see which one contains that target string. Once you figure out the correct offset, you can use the more targeted selector if you like, but because you don't own the page, you may want to continue using an enumerator to loop over the page contents; that's likely to be more resilient in the face of change.
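The loop-and-search approach can be sketched with plain strings so it runs without Nokogiri; in a real scraper the array would come from something like `doc.css("script").map(&:text)`. It may also explain the earlier `nil`: the CSS selector `script:nth-child(12)` matches a `<script>` that is the 12th child element of its parent, not the 12th script in the document, whereas `doc.css("script")[11]` is the positional form. The script strings and the `"skuProducts"` marker below are illustrative:

```ruby
# Stand-ins for the text of each <script> on the page; in practice this
# array could come from doc.css("script").map(&:text) with Nokogiri.
scripts = [
  'window.dataLayer = [];',
  'var skuProducts=[{"skuAttr":"14:1052"}];',
  'console.log("other");'
]

marker = "skuProducts" # text that only the script you want contains

# Search every script instead of relying on a fixed position, which can
# change whenever the site owner edits the page.
position = scripts.index { |text| text.include?(marker) }
target   = position && scripts[position]

puts position.inspect
puts target
```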
> > 
> > Walter 
> > 
> > > 
> > > -- 
> > > You received this message because you are subscribed to the Google 
> Groups "Ruby on Rails: Talk" group. 
> > > To unsubscribe from this group and stop receiving emails from it, send 
> an email to rubyonrails-ta...@googlegroups.com. 
> > > To post to this group, send email to rubyonra...@googlegroups.com. 
> > > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/rubyonrails-talk/86958eec-3d90-42e0-bfdb-30801a38a878%40googlegroups.com.
>  
>
> > > For more options, visit https://groups.google.com/d/optout. 
> > 
> >
> > Should doc=Nokogiri::HTML.parse(browser.html) return the HTML or the elements?
>
> This method returns a parsed document, which you can then explore using Nokogiri's very thorough API. This will be similar to a DOM, but it will only include the HTML part of the usual content tripod (HTML, CSS, and JS). Nokogiri does not execute JS, so nothing in the parsed document will be modified by scripts after the initial load. Any JS that was served as part of the HTML will be there in the document as Nokogiri elements whose text value you can read and then process further to get parseable data.
>
> It's important to note that the bits of the script that are not JSON will not be parseable by JSON.parse. That method really expects you to pass it a true JSON value. So where you see var skuProducts=[ bunch of objects ], the content of those square brackets (and the square brackets themselves) is actually the JSON part. The variable assignment part (var skuProducts = ) is NOT part of the JSON, and will just give you a parse error.
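Following that point, one way to drop the `var skuProducts=` assignment and the trailing `;` is a regular expression that captures only the bracketed array. This is a sketch assuming the assignment is written exactly as `var skuProducts=[...];` and that the sequence `];` does not occur inside the array; the sample is a trimmed copy of the data quoted earlier in the thread:

```ruby
require "json"

# Trimmed copy of the script text quoted earlier in the thread.
script_text = 'var skuProducts=[{"skuAttr":"14:1052","skuVal":{"actSkuCalPrice":"20.24","availQuantity":29}},' \
              '{"skuAttr":"14:173","skuVal":{"actSkuCalPrice":"20.24","availQuantity":26}}];' \
              ' var GaData = { pageType: "product" };'

# Capture from the opening "[" up to the first "];" after it, leaving
# the "var skuProducts=" assignment and the ";" behind.
json_text = script_text[/var\s+skuProducts\s*=\s*(\[.*?\]);/m, 1]
raise "skuProducts assignment not found" if json_text.nil?

sku_products = JSON.parse(json_text) # an Array of Hashes
puts sku_products.length
puts sku_products.first["skuAttr"]
```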
>
> Repeat after me: "I will not try to execute JavaScript in Ruby". 
>
> https://gist.github.com/walterdavis/72874606cf44f3c9f1fb27c56ce87ad2 
>
> Once you successfully parse some JSON into a Ruby hash, you can then use it as a data store, interrogate it, traverse it, etc.
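Once `skuProducts` is parsed, the result is ordinary Ruby data. A sketch of a few ways to traverse it, using two trimmed entries in the shape shown in the thread (the field selection is illustrative). Note that prices arrive as strings and need an explicit conversion before arithmetic:

```ruby
require "json"

# Two SKU entries in the shape shown in the thread (fields trimmed).
sku_products = JSON.parse(<<~SKUS)
  [
    {"skuAttr":"14:1052","skuVal":{"actSkuCalPrice":"20.24","skuCalPrice":"44.98","availQuantity":29}},
    {"skuAttr":"14:173","skuVal":{"actSkuCalPrice":"20.24","skuCalPrice":"44.98","availQuantity":26}}
  ]
SKUS

# dig walks nested Arrays/Hashes and returns nil if a step is missing.
first_price = sku_products.dig(0, "skuVal", "actSkuCalPrice")

# Map each SKU attribute to its available quantity (Integers in the JSON).
quantities = sku_products.to_h { |sku| [sku["skuAttr"], sku.dig("skuVal", "availQuantity")] }
total_available = quantities.values.sum
in_stock = sku_products.select { |sku| sku.dig("skuVal", "availQuantity").to_i.positive? }

# Prices are JSON strings ("44.98"), so convert them before subtracting.
first_val = sku_products.first["skuVal"]
saving = (first_val["skuCalPrice"].to_f - first_val["actSkuCalPrice"].to_f).round(2)

puts first_price
p quantities
puts total_available
puts in_stock.length
puts saving
```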
>
> Walter 
>
>
That found me the right script. Can you tell me how to gsub out everything in the script from the beginning up to the first [{, and everything after the next }]? There's a lot of stuff before and after.
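For that question, `String#index` and slicing can do the job without gsub: find the first `[{`, then the first `}]` after it, and keep everything in between, brackets included. This is a sketch assuming the array you want is the first `[{ ... }]` in the script and that no nested array of objects closes earlier with its own `}]`; the sample script text is made up:

```ruby
require "json"

# Made-up script text with unrelated code before and after the array.
script_text = 'var a = 1; var skuProducts=[{"skuAttr":"14:1052"},{"skuAttr":"14:173"}]; var b = 2;'

start_at = script_text.index("[{")
stop_at  = start_at && script_text.index("}]", start_at)
raise "no [{ ... }] found" if start_at.nil? || stop_at.nil?

# stop_at points at the "}", so add 2 to keep both "}" and "]".
json_text    = script_text[start_at...(stop_at + 2)]
sku_products = JSON.parse(json_text)

puts json_text
p sku_products.map { |sku| sku["skuAttr"] }
```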
