Perhaps someone might know of a process that works. The college has a campus directory, and I used wget2 to download its 6 web pages in about 6 seconds to get the raw data. That worked great until the other day.
wget2 --restrict-file-names=windows --max-threads=2 -4 --secure-protocol=PFS -q --base="https://ssb-prod.ec.guamcc.edu/PROD/" -i testlist

Now it just returns "No valid route". Same with the old link to the page in a browser:

https://ssb-prod.ec.guamcc.edu/PROD/bwpkedir.P_DisplayDirectory

There is a link that does still work and seems to show the same data:

https://employeessb-prod.ec.guamcc.edu/EmployeeSelfService/ssb/campusDirectory#/lastName

But downloading that page with wget2 just gets a bunch of JavaScript and none of the data (see the endpoint guess further down):

 1309996 Oct 10 15:07 campusDirectory

Note: that is only the 1st page of data. I also tried saving the page from the browser and got similar output with no data.

The only process that works is to go to each of the pages and use print to PDF:

 57249 Oct 10 14:30 Campus Directory2.pdf
 57310 Oct 10 14:30 Campus Directory3.pdf
 57905 Oct 10 14:31 Campus Directory4.pdf
 57981 Oct 10 14:31 Campus Directory5.pdf
 40142 Oct 10 14:32 Campus Directory6.pdf
 59397 Oct 10 14:30 Campus Directory.pdf

Then I can use pdftotext to convert those to text files that have the raw data (a scripted version of that step is sketched below).

The old process with wget2 took about 6 seconds total. Manually opening the 6 pages, printing each to PDF, converting to text, and pulling out the useful data takes about 3 minutes 30 seconds. The process I use with the University of Guam still works fine with wget2; they have 69 web pages and it takes about 20 to 40 seconds.
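The pdftotext step at least scripts easily. A minimal sketch of that part, assuming the PDF names from the listing above and pdftotext from poppler-utils:

  # Convert each printed PDF to text, preserving the column layout.
  # Output names are derived automatically (Campus Directory2.txt, etc.)
  for f in "Campus Directory"*.pdf; do
      pdftotext -layout "$f"
  done

That still leaves the six manual print-to-PDF steps, which is where most of the 3 minutes 30 seconds goes.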
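On the JavaScript page: my guess is it fetches the directory data as JSON after the page loads, which would explain why the saved campusDirectory file is all script and no data. If someone can spot the real request in the browser's dev-tools Network tab, a single curl call might replace the whole PDF detour. The path and parameter below are only placeholders I made up, not a known endpoint:

  # Hypothetical endpoint -- replace the path and parameters with
  # whatever the Network tab shows the campusDirectory page requesting.
  curl -s -H "Accept: application/json" \
       "https://employeessb-prod.ec.guamcc.edu/EmployeeSelfService/ssb/api/campusDirectory?page=1" \
       -o directory-page1.json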
Thanks if anyone has an idea.

+------------------------------------------------------------+
 Michael D. Setzer II - Computer Science Instructor (Retired)
 mailto:[email protected]
 mailto:[email protected]
 mailto:[email protected]
 Guam - Where America's Day Begins
 G4L Disk Imaging Project maintainer
 http://sourceforge.net/projects/g4l/
+------------------------------------------------------------+