On Monday, February 4, 2019 at 6:43:48 PM UTC-5, Walter Lee Davis wrote:
> On Feb 4, 2019, at 12:27 PM, fugee ohu <fuge...@gmail.com> wrote:
> > On Monday, February 4, 2019 at 8:10:33 AM UTC-5, Walter Lee Davis wrote:
> > > On Feb 4, 2019, at 7:35 AM, fugee ohu <fuge...@gmail.com> wrote:
> > > > On Sunday, February 3, 2019 at 9:54:25 PM UTC-5, Walter Lee Davis wrote:
> > > > > On Feb 3, 2019, at 7:14 PM, fugee ohu <fuge...@gmail.com> wrote:
> > > > > > On Wednesday, January 30, 2019 at 5:16:59 PM UTC-5, Colin Law wrote:
> > > > > > > On Wed, 30 Jan 2019 at 22:12, Colin Law <cla...@gmail.com> wrote:
> > > > > > > > On Wed, 30 Jan 2019 at 22:09, fugee ohu <fuge...@gmail.com> wrote:
> > > > > > > > > On Wednesday, January 30, 2019 at 5:02:17 PM UTC-5, Colin Law wrote:
> > > > > > > > > > On Wed, 30 Jan 2019 at 21:56, fugee ohu <fuge...@gmail.com> wrote:
> > > > > > > > > > > ...
> > > > > > > > > > > Everything in the unparsed response body that I want is between [ and ]. I have to gsub it out.
> > > > > > > > > >
> > > > > > > > > > No you don't. After you get parsed_obj["results"] (which is an array, that's what the [] mean) then you can get the first product by
> > > > > > > > > > parsed_obj["results"][0]["productId"]
> > > > > > > > > > It is just an array. You have met ruby arrays haven't you?
> > > > > > > > > >
> > > > > > > > > > I am rapidly losing the will to live.
> > > > > > > > > >
> > > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > The response body isn't JSON.parse parsable as is; it has to be gsub'd and chomped first before I can run JSON.parse. My original gsub wasn't right; it wasn't removing the end that follows ].
> > > > > > > > > JSON::ParserError: 784: unexpected token at 'myscript.js({"success":true,"code
> > > > > > > >
> > > > > > > > You previously posted that you had got parsed_obj where parsed_obj["results"] was an array. Go back to that.
> > > > > > >
> > > > > > > To quote your previous message
> > > > > > >
> > > > > > > > puts parsed_obj["results"] shows the entire results but puts parsed_obj["results"]["productId"] gets me error no implicit conversion of String into Integer
> > > > > > >
> > > > > > > The error is because it is an array, which is perfectly obvious if you look at the unparsed string. So if you use
> > > > > > > parsed_obj["results"][0]
> > > > > > > you will get the first element
> > > > > > >
> > > > > > > Colin
> > > > > >
> > > > > > There are scripts in the browser page source that pass a lot of useful values like this
> > > > > > <script type="text/javascript">
> > > > > > if(!window.runParams) {
> > > > > > window.runParams = {};
> > > > > > }
> > > > > > window.runParams.minPrice="44.98";
> > > > > > window.runParams.maxPrice="44.98";
> > > > > > ...
> > > > > > And more within definitions in the same script like this
> > > > > > var skuProducts=[{"skuAttr":"14:1052","skuPropIds":"1052","skuVal":{"actSkuCalPrice":"20.24","actSkuMultiCurrencyCalPrice":"20.24","actSkuMultiCurrencyDisplayPrice":"20.24","availQuantity":29,"inventory":30,"isActivity":true,"skuCalPrice":"44.98","skuMultiCurrencyCalPrice":"44.98","skuMultiCurrencyDisplayPrice":"44.98"}},{"skuAttr":"14:173","skuPropIds":"173","skuVal":{"actSkuCalPrice":"20.24","actSkuMultiCurrencyCalPrice":"20.24","actSkuMultiCurrencyDisplayPrice":"20.24","availQuantity":26,"inventory":30,"isActivity":true,"skuCalPrice":"44.98","skuMultiCurrencyCalPrice":"44.98","skuMultiCurrencyDisplayPrice":"44.98"}}];
> > > > > >
> > > > > > var GaData = {
> > > > > > pageType: "product",
> > > > > > productIds: "en32837801078",
> > > > > > totalValue: "US $20.24"
> > > > > > };
> > > > > >
> > > > > > Since it's in <script> containers in page source can I parse it?
> > > > >
> > > > > Since it's in a <script> tag, you can use Nokogiri or another HTML parser to extract only that bit of the page. To be sure, you will have to do some work on the script before you can access the parts you're interested in as JSON.
> > > > > But JSON is the same whether it is being parsed by JavaScript or Ruby. You're going to have to work out the best way to identify the parts you want. There's no such thing as a JavaScript parser in Ruby, but if you can figure out where to start, and how to get the offsets to trim your starting code, the parts that look interesting above will be interesting to Ruby, too.
> > > > >
> > > > > I'm assuming you don't have control over this page, and that you are doing some sort of scraping exercise here. So you'll need to have lots of tests around whatever code you write, and keep checking often, because the owner of this code may change its fundamental structure at a moment's notice.
> > > > >
> > > > > Walter
> > > >
> > > > It's the 12th <script> on the page but doc.at_css("script:nth-child(12)") returns nil
> > >
> > > Are you sure that it's there in the original HTML, or is it being put there by a script after page load? Make sure that you are only looking at the original HTML, not the DOM. (Safari shows the HTML in the tab called Resources, and the DOM in the tab called Elements. If you're using a different browser, there may be a similar distinction with different names.) The DOM can change over the life of the page, in response to scripting. The HTML is fixed at the moment that the page is served to the client. Nokogiri and other HTML parsers can only read the HTML as served, not the DOM as mutated by a browser.
> > >
> > > You've already been around this tree a couple of times -- no, the answer is not to stand up a headless browser and read the DOM. The data is there, either in the HTML or the associated scripts served by the site. You just have to find it and isolate it from the rest of the visual page content.
> > >
> > > Once you confirm that the data you want is there in the HTML, then instead of trying to just get the 12th script tag with at_css, use css('script').each to loop over all the scripts on the page, and see which one contains that target string. Once you figure out the correct offset, you can use the more targeted selector if you like, but because you don't own the page, you may want to continue using an enumerator to loop over the page contents -- that's likely to be more resilient in the face of change.
> > >
> > > Walter
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-ta...@googlegroups.com.
> > > To post to this group, send email to rubyonra...@googlegroups.com.
> > > To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/86958eec-3d90-42e0-bfdb-30801a38a878%40googlegroups.com.
> > > For more options, visit https://groups.google.com/d/optout.
> >
> > doc=Nokogiri::HTML.parse(browser.html) should return the html or elements?
>
> This method returns a parsed document, which you can then explore using Nokogiri's very thorough API. This will be similar to a DOM, but it will only include the HTML part of the usual content tripod (HTML, CSS, and JS). Therefore there will not be any modification of the DOM from JS after the initial load. Any JS that was served as part of the HTML will be there in the code as Nokogiri elements that you can get the text value from, and then further treat to get parseable data.
>
> It's important to note that the bits of the script that are not JSON will not be parseable by JSON.parse. That method really expects that you pass it a true JSON value.
> So where you see var skuProducts=[ bunch of objects ], the content of those square brackets (and the square brackets themselves) is actually the JSON part. The variable assignment part (var skuProducts = ) is NOT part of the JSON, and will just give you a parse error.
>
> Repeat after me: "I will not try to execute JavaScript in Ruby".
>
> https://gist.github.com/walterdavis/72874606cf44f3c9f1fb27c56ce87ad2
>
> Once you successfully parse some JSON into a Ruby hash, you can then use it as a data store, interrogate it, traverse it, etc.
>
> Walter

That found me the right script. Can you tell me how to gsub out everything in the script from the beginning up to the first [{ and after the next }]? There's a lot of stuff before and after.
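
For reference, here is a minimal sketch of one way to isolate that array without gsub, assuming the script text is already in a string. Rather than deleting everything around the array, it captures only the part between `[{` and `}]` that follows the `var skuProducts=` assignment and hands that to JSON.parse. The sample `script_text`, the helper name `extract_sku_products`, and the regular expression are illustrative assumptions, not taken from the gist linked above, and the pattern may need adjusting if the page changes or if a nested array inside the data ever closes with `}];`.

```ruby
require "json"

# Hypothetical stand-in for the text of the <script> element found on the page
# (shortened from the sample quoted in this thread).
script_text = <<~'JS'
  if(!window.runParams) { window.runParams = {}; }
  window.runParams.minPrice="44.98";
  var skuProducts=[{"skuAttr":"14:1052","skuPropIds":"1052","skuVal":{"availQuantity":29,"skuCalPrice":"44.98"}},{"skuAttr":"14:173","skuPropIds":"173","skuVal":{"availQuantity":26,"skuCalPrice":"44.98"}}];
  var GaData = { pageType: "product" };
JS

# Hypothetical helper: returns the parsed array, or nil if the assignment is not found.
def extract_sku_products(script_text)
  # Capture from the first "[{" after the assignment up to the first "}]" that is
  # followed by ";". The /m flag lets "." match newlines in case the array spans lines.
  match = script_text.match(/var\s+skuProducts\s*=\s*(\[\{.*?\}\]);/m)
  return nil unless match

  # Only the captured array is JSON; the "var skuProducts=" part is left behind.
  JSON.parse(match[1])
end

sku_products = extract_sku_products(script_text)
puts sku_products[0]["skuAttr"]
puts sku_products[1]["skuVal"]["availQuantity"]
```

The result is an ordinary Ruby array of hashes, so individual entries can be read with the usual `[0]["skuAttr"]` style of indexing discussed earlier in the thread.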