On May 12, 4:59 pm, 7stud <[EMAIL PROTECTED]> wrote:
> On May 12, 1:54 pm, Jetus <[EMAIL PROTECTED]> wrote:
>
> > I am able to download this page (enclosed code), but I then want to
> > download a pdf file that I can view in a regular browser by clicking
> > on the "view" link. I don't know how to automate this next part of my
> > script. It seems like it uses Javascript.
> > The line in the page source says
>
> > href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>
>
> 1) Use BeautifulSoup to extract the path:
>
> JCCOGetImage.jsp?refnum=DN2007036179
>
> from the html page.
>
BeautifulSoup will let you locate the link and extract its href attribute:

javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');

See "The attributes of Tags" in the BS docs. Then you can use string functions (preferable) or a regex to get everything between the parentheses, and strip the quotes around the path, too.
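
A rough sketch of those two steps, in case it helps. It assumes the current bs4 package (the BeautifulSoup 3 API differs slightly), and the names extract_path and find_image_paths are made up for illustration, not anything from the original script:

```python
def extract_path(href):
    """Return whatever sits between the parentheses of a javascript: href,
    with surrounding whitespace and quotes removed (plain string methods)."""
    start = href.index("(") + 1   # first character after the opening paren
    end = href.rindex(")")        # position of the last closing paren
    return href[start:end].strip().strip("'\"")


def find_image_paths(html):
    """Collect the paths from every openimagewin(...) link in the page."""
    # Imported here so extract_path() works without BeautifulSoup installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    paths = []
    # href=True skips <a> tags that have no href attribute at all.
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.startswith("javascript:openimagewin("):
            paths.append(extract_path(href))
    return paths
```

Calling find_image_paths() on the downloaded page source should give you a list of paths like the one quoted above. The path is relative, so you would still need to join it with the page's own URL before requesting the pdf.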