On May 12, 4:59 pm, 7stud <[EMAIL PROTECTED]> wrote:
> On May 12, 1:54 pm, Jetus <[EMAIL PROTECTED]> wrote:
>
> > I am able to download this page (enclosed code), but I then want to
> > download a pdf file that I can view in a regular browser by clicking
> > on the "view" link. I don't know how to automate this next part of my
> > script. It seems like it uses Javascript.
> > The line in the page source says
>
> > href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>
>
> 1) Use BeautifulSoup to extract the path:
>
> JCCOGetImage.jsp?refnum=DN2007036179
>
> from the html page.
>

BeautifulSoup will allow you to locate and extract the href attribute:

javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');

See: "The attributes of Tags" in the BS docs.
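A minimal sketch of that step, assuming the current bs4 package and Python's built-in "html.parser". The HTML fragment is hypothetical (only the href value comes from the page source quoted above), and matching on the "openimagewin(" prefix is just one way to pick out the right links:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment of the downloaded page; the real markup will differ.
html = """
<table><tr><td>
<a href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>view</a>
</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only <a> tags whose href attribute starts with the openimagewin(...) call.
pattern = re.compile(r"^javascript:openimagewin\(")
hrefs = [a["href"] for a in soup.find_all("a", href=pattern)]
print(hrefs)
```

Indexing a tag like a dict (a["href"]) is how BeautifulSoup exposes a tag's attributes.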

Then you can use string methods (preferable) or a regex to get
everything between the parentheses. Remove the quotes around the path,
too.
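Here is a rough sketch of both approaches, working on the href value shown above. The helper name extract_path is made up for illustration. The string version assumes the path itself contains no parentheses. The regex assumes the argument is wrapped in single quotes.

```python
import re

href = "javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');"

def extract_path(href):
    """Hypothetical helper: return the openimagewin(...) argument without quotes."""
    start = href.index("(") + 1  # first character after the opening parenthesis
    end = href.rindex(")")       # position of the closing parenthesis
    return href[start:end].strip("'\"")  # drop the surrounding quotes

path = extract_path(href)
print(path)  # JCCOGetImage.jsp?refnum=DN2007036179

# Regex alternative: capture whatever sits inside the single quotes.
m = re.search(r"openimagewin\('([^']*)'\)", href)
path_re = m.group(1) if m else None
```

The string-method version is easier to read and has no regex escaping to get wrong. That is one reason to prefer it for a fixed pattern like this.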
--
http://mail.python.org/mailman/listinfo/python-list
