On May 12, 6:59 pm, 7stud <[EMAIL PROTECTED]> wrote:
> On May 12, 1:54 pm, Jetus <[EMAIL PROTECTED]> wrote:
>
> > I am able to download this page (enclosed code), but I then want to
> > download a pdf file that I can view in a regular browser by clicking
> > on the "view" link. I don't know how to automate this next part of my
> > script. It seems like it uses Javascript.
> > The line in the page source says
>
> > href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>
>
> 1) Use BeautifulSoup to extract the path:
>
> JCCOGetImage.jsp?refnum=DN2007036179
>
> from the html page.
>
> 2) The path is relative to the current url, so if the current url is:
>
> http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp
>
> Then the url to the page you want is:
>
> http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2...
>
> You can use urlparse.urljoin() to join a relative path to the current
> url:
>
> import urlparse
>
> base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
> relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'
>
> target_url = urlparse.urljoin(base_url, relative_url)
> print target_url
>
> --output:--
> http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2...
>
> 3) Python has a webbrowser module that allows you to open urls in a
> browser:
>
> import webbrowser
>
> # Include the scheme; a bare "www.google.com" may be treated as a
> # local file name on some platforms.
> webbrowser.open("http://www.google.com")
>
> You could also use os.system() or os.startfile() [Windows only] to do
> the same thing:
>
> import os
>
> os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')
>
> # You don't have to worry about directory names
> # with spaces in them if you use startfile():
> os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')
>
> All the urls you posted give me errors when I try to open them in a
> browser, so you will have to sort out those problems first.
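
Steps 1 and 2 of the quoted message can be sketched roughly as below. This is only an illustration under some assumptions: it uses the standard-library re module in place of BeautifulSoup so that it runs without extra installs, the name extract_image_urls and the sample_html string are made up for the example (only the href value comes from the post), and the import falls back to the Python 2 urlparse module used in the quoted code. Whether the joined url is the one the site actually serves is a separate question.

```python
import re

try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2, as in the quoted post

# Matches the single-quoted argument of openimagewin('...') in a javascript: href.
OPENIMAGEWIN_RE = re.compile(r"javascript:openimagewin\('([^']+)'\)")


def extract_image_urls(html, base_url):
    """Return an absolute url for every openimagewin('...') link found in html.

    Hypothetical helper: a regex stands in for the BeautifulSoup step.
    """
    return [urljoin(base_url, path) for path in OPENIMAGEWIN_RE.findall(html)]


# Assumed markup: only the href value is taken from the post.
sample_html = (
    '<a href="javascript:openimagewin('
    "'JCCOGetImage.jsp?refnum=DN2007036179'"
    ');" tabindex=-1>view</a>'
)
base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'

urls = extract_image_urls(sample_html, base_url)
for url in urls:
    print(url)
```

On a real page, BeautifulSoup would be the more robust way to find the anchor tags; the regex only needs to pull the path out of each href value.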
7Stud,

Thanks for sharing your knowledge!

1) The proper url to the website is
http://www.landrecords.jcc.ky.gov/records/S0Search.html

2) The join won't work. I found that the request it sends is

http://206.196.0.195/cgi-bin/webview/SEND2.PGM?dispfmt=&itype=Q&authorization=&parm2=SDAAAA76070B

It looks like it generates a random code for parm2. I have two ways to
trigger this javascript: I can click on the "View" link, or, in the
form, if I put an "i" in the code and click on the option link, it will
send me the pdf file.

3) I was not sure why you suggested I use the webbrowser module, but I
am glad to find out about it.
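
To see which part of the observed request varies, it may help to split the url into its components. The sketch below only parses the single url given above; the variable names are made up for the example. One sample cannot show whether parm2 is random or derived from the record, so comparing several captured urls would be the next step.

```python
try:
    from urllib.parse import urlsplit, parse_qs   # Python 3
except ImportError:
    from urlparse import urlsplit, parse_qs       # Python 2.6+

# The request url reported above, copied as-is.
observed = ('http://206.196.0.195/cgi-bin/webview/SEND2.PGM'
            '?dispfmt=&itype=Q&authorization=&parm2=SDAAAA76070B')

parts = urlsplit(observed)
# keep_blank_values=True keeps the empty dispfmt and authorization fields.
params = parse_qs(parts.query, keep_blank_values=True)

print('host: %s' % parts.netloc)
print('path: %s' % parts.path)
for name in sorted(params):
    print('%s = %r' % (name, params[name]))
```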