Hey David.

Is this an open/public site? If so, can you list the basic steps a user would
follow in a browser to see what you're seeing?

Thanks





On Fri, May 27, 2016 at 4:35 PM, David Fishburn <[email protected]>
wrote:

> I have been struggling with this one for quite some time, finally giving
> up and asking here.
>
> I have a page that uses an iframe created entirely by JavaScript (no URL is
> used to create it; it uses SAPUI5).
>
> When I request the page, the body is:
>
> <body class="sapUiBody" role="application">
>     <div id="ctrRoot"></div>
> </body>
>
>
> First, JS executes and creates:
>
> <body class="sapUiBody" role="application" style="margin: 0px;">
>   <div id="ctrRoot" data-sap-ui-area="ctrRoot">
>     <div id="__shell0" data-sap-ui="__shell0" class="sapDkShell sapUiUx3Shell sapUiUx3ShellDesignStandard sapUiUx3ShellFullHeightContent sapUiUx3ShellHeadStandard sapUiUx3ShellNoContentPadding">
> ... Lots of crap here ...
>     </div>
>  </div>
> </body>
>
>
>
> Eventually, the following gets added inside the "... Lots of crap here ..."
> section, nested within many <div> tags:
>
>         <div id="demokitSplitter_secondPane" class="sapUiVSplitterSecondPane" style="overflow: hidden; width: 79.7396%;">
>           <iframe id="content" name="content" src="about:blank" frameborder="0" onload="sap.ui.demokit.DemokitApp.getInstance().onContentLoaded();" data-sap-ui-preserve="content">
>           </iframe>
>         </div>
>
>
>
> This is the part that has the iframe.
>
> Eventually, the iframe is filled with this document:
>
>         <div id="demokitSplitter_secondPane" class="sapUiVSplitterSecondPane" style="overflow: hidden; width: 79.7396%;">
>           <iframe id="content" name="content" src="about:blank" frameborder="0" onload="sap.ui.demokit.DemokitApp.getInstance().onContentLoaded();" data-sap-ui-preserve="content">
>
>
> <html xml:lang="en" lang="en" data-highlight-query-terms="pending">
>     <body>
>         <div id="main">
>             <div id="content">
>                 <div class="full-description">
>                 </div>
>                 <div class="summary section">
>                     <div class="sectionItems">
>                         <div class="sectionItem itemName namespace static">
>                             <b class="icon" title="Analysis Path Framework">
>                                 <a href="test.html">test</a>
>                             </b>
>                             <span class="description">Analysis Path Framework</span>
>                         </div>
>                         <div class="sectionItem itemName namespace static">
>                             <b class="icon" title="Test2">
>                                 <a href="test.html">test2</a>
>                             </b>
>                             <span class="description">Test2</span>
>                         </div>
>                     </div>
>                 </div>
>             </div>
>         </div>
>     </body>
> </html>
>
>
>
>
>           </iframe>
>         </div>
>
>
> What I need to get access to:
>                     <div class="sectionItems">
>
>
> And cycle through all these:
>                         <div class="sectionItem itemName namespace static">
>                         <div class="sectionItem itemName namespace static">
>
>
> I can't seem to get my PhantomJS downloader to work.
>
> I have tried all of the following to wait for that content:
>
>
> # Module-level imports used by the handler below:
> import codecs
> import time
>
> from selenium.common.exceptions import TimeoutException
> from selenium.webdriver.common.by import By
> from selenium.webdriver.support import expected_conditions as EC
> from selenium.webdriver.support.ui import WebDriverWait
>
>
>     def _response(self, _, driver, spider):
>         print('PhantomJSDownloadHandler _response writing first.html, '
>               'possibly empty html (due to AJAX) %s' % time.asctime())
>         # Top-level document only: the iframe's contents are not included.
>         with codecs.open('first.html', 'w', 'utf-8') as target:
>             target.write(driver.page_source)
>
>         max_time_to_wait_sec = 20
>         poll_frequency_sec = 0.2  # WebDriverWait polls in seconds, not milliseconds
>         wait = WebDriverWait(driver, max_time_to_wait_sec,
>                              poll_frequency=poll_frequency_sec)
>         try:
>             print('PhantomJSDownloadHandler waiting for sectionItems %s'
>                   % time.asctime())
>
>             # These time out: "sectionItems" lives inside the iframe, and
>             # lookups only search the document the driver is currently in.
>             #element = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "sectionItems")))
>             #element = driver.find_elements_by_xpath('//div[@class="sectionItems"]')
>
>             # Wait for the iframe and switch into it (Python uses By.ID, not By.id).
>             wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "content")))
>             # Then wait inside the frame; locators are one (By, value) tuple.
>             wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "sectionItems")))
>         except TimeoutException:
>             print('Timed out waiting for sectionItems %s' % time.asctime())
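The `sectionItems` markup lives in the iframe's own document, so `driver.page_source` taken on the top-level page will not contain it. A minimal sketch of grabbing the frame's HTML and then returning to the top-level document; the helper name `frame_page_source` is hypothetical, and `driver` is assumed to be a Selenium WebDriver on which the frame already exists:

```python
def frame_page_source(driver, frame_reference="content"):
    """Return the HTML of the document inside an iframe, then switch back.

    Hypothetical helper. `driver` is assumed to be a Selenium WebDriver
    that has already waited for the frame; `frame_reference` is the
    frame's id or name ("content" on this page).
    """
    driver.switch_to.frame(frame_reference)
    try:
        # In WebDriver, page_source is read from the frame the driver is
        # currently switched into, i.e. the iframe's document here.
        return driver.page_source
    finally:
        # Always return to the top-level document for later lookups.
        driver.switch_to.default_content()
```

If the wait already used `EC.frame_to_be_available_and_switch_to_it`, the driver is already inside the frame, so skip the `switch_to.frame` call and read `driver.page_source` directly (then call `driver.switch_to.default_content()`). The resulting string could then be wrapped in a Scrapy `HtmlResponse(url=..., body=html, encoding='utf-8')`, so the spider's XPath runs against the iframe's content rather than the outer shell.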
>
>
> Some posts on Stack Overflow suggest this approach:
>
> http://stackoverflow.com/questions/25057174/scrapy-crawl-in-order
>
>
>
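> from scrapy import Request
>
> def parse(self, response):
>     for link in response.xpath("//article/a/@href").extract():
>         # Make relative hrefs absolute before requesting them.
>         yield Request(response.urljoin(link), callback=self.parse_page,
>                       meta={'link': link})
>
>
> def parse_page(self, response):
>     # MyItem is the spider's own scrapy.Item with 'link' and 'frame' fields.
>     for frame in response.xpath("//iframe").extract():
>         item = MyItem()
>         item['link'] = response.meta['link']
>         item['frame'] = frame
>         yield item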
>
>
>
>
>
> But this appears to follow a link (URL), whereas my iframe's content is
> created by a JS function, not loaded from a URL.
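That matches what Scrapy itself sees: it does not run JavaScript, so the response body is the page as served, and the `//iframe` XPath in `parse_page` has nothing to match. A small Python 3 standard-library sketch (the helper `iframe_srcs` and the `RAW_BODY` string are illustrative, not part of either codebase) that collects `<iframe>` src values from static HTML, run against the body quoted at the top of the message:

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in static HTML."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and passes attrs as (name, value) pairs.
        if tag == "iframe":
            self.srcs.append(dict(attrs).get("src"))


def iframe_srcs(html):
    parser = IframeCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


# The body as served, before any JavaScript runs (from the message above).
RAW_BODY = """<body class="sapUiBody" role="application">
    <div id="ctrRoot"></div>
</body>"""

print(iframe_srcs(RAW_BODY))  # → []
```

Even in the JS-rendered DOM the only src is `about:blank`, so there is no URL to follow either way; the frame's HTML has to come from the browser driver.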
>
>
>
> Now, assuming someone can help me get the downloader to wait until the
> sectionItems div is available, I then need to iterate through those
> results in Scrapy. I have written this code:
>
>
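> # Working: finds the first sectionItems container.
> # `namespace` is assumed to be a Scrapy Selector over the iframe's document;
> # the iframe content is not in the top-level page, so the downloader must
> # hand Scrapy the frame's HTML for this to match anything.
> print('checking for <div class="sectionItems">')
> sectionItems = namespace.xpath(
>     ".//div[@class='summary section']/div[@class='sectionItems']")
> # Earlier attempts. @class='sectionItem' misses because @class holds the
> # whole string "sectionItem itemName namespace static" and = compares it exactly.
> #sections = hxs.xpath("//div[@class='sectionItem']")
> #sections = hxs.xpath("//div[contains(@class, 'sectionItem itemName namespace static')]")
> # (Not valid XPath: it mixes HTML markup into the expression.)
> #sections = hxs.xpath("//<div class="sectionTitle">Namespaces &amp; Classes</div>/div[@class='sectionItems']")
>
> print('xpath SectionItems:%s' % sectionItems)
>
> for sectionItem in sectionItems:
>     print('Found SectionItem:')
>     # Match "sectionItem" as a whole class token; a substring test such as
>     # re:test(@class, 'sectionItem') would also match "sectionItems".
>     sections = sectionItem.xpath(
>         "div[contains(concat(' ', normalize-space(@class), ' '), ' sectionItem ')]")
>
>     for section in sections:
>         print('Found Section:%s' % section.extract())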
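The same nested loop (containers, then their `sectionItem` children) can be exercised offline against the iframe document quoted earlier. A minimal sketch using Python 3's standard-library `xml.etree.ElementTree` as a stand-in for Scrapy selectors; it assumes the frame HTML is well-formed XML (real pages often are not), and the names `IFRAME_DOC`, `has_class` and `section_items` are hypothetical. Class matching splits the `class` attribute into tokens, the same idea as the common XPath idiom `contains(concat(' ', normalize-space(@class), ' '), ' sectionItem ')`:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the iframe document quoted above.
IFRAME_DOC = """<html xml:lang="en" lang="en">
  <body>
    <div id="main">
      <div id="content">
        <div class="summary section">
          <div class="sectionItems">
            <div class="sectionItem itemName namespace static">
              <b class="icon" title="Analysis Path Framework"><a href="test.html">test</a></b>
              <span class="description">Analysis Path Framework</span>
            </div>
            <div class="sectionItem itemName namespace static">
              <b class="icon" title="Test2"><a href="test.html">test2</a></b>
              <span class="description">Test2</span>
            </div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>"""


def has_class(elem, name):
    """True if `name` is one of the space-separated tokens in @class."""
    return name in (elem.get("class") or "").split()


def section_items(doc):
    """Return (link text, href, description) for each sectionItem div."""
    root = ET.fromstring(doc)
    results = []
    for container in root.iter("div"):
        if not has_class(container, "sectionItems"):
            continue
        # Only direct <div> children of the container, like "div[...]" in XPath.
        for item in container.findall("div"):
            if has_class(item, "sectionItem"):
                link = item.find("b/a")
                desc = item.find("span")
                results.append((link.text, link.get("href"), desc.text))
    return results


for text, href, description in section_items(IFRAME_DOC):
    print(text, href, description)
```

Token matching is the design choice that matters here: exact `@class=` comparison fails on multi-class elements, while plain substring matching would let `sectionItems` satisfy a test for `sectionItem`.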
>
>
>
>
> Any help is greatly appreciated.
> David
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>
