Perhaps someone knows of a process that would work.

The college has a campus directory, and I used wget2 to 
download its 6 web pages in about 6 seconds to get the raw 
data. It worked great until the other day.

wget2 --restrict-file-names=windows --max-threads=02 -4 \
  --secure-protocol=PFS -q \
  --base="https://ssb-prod.ec.guamcc.edu/PROD/" -i testlist

Now it just returns "No valid route".
The old link to the page gives the same result in a browser:
https://ssb-prod.ec.guamcc.edu/PROD/bwpkedir.P_DisplayDirectory

There is another link that still works and seems to show 
the same data:
https://employeessb-prod.ec.guamcc.edu/EmployeeSelfService/ssb/campusDirectory#/lastName

But when I download that page with wget2, the result is a 
bunch of JavaScript and none of the data:
1309996 Oct 10 15:07 campusDirectory
Note: that is only the 1st page of data.

I also tried saving the page from the browser and got a 
similar file with no data. The only process that works is to 
go to each of the pages and print it to a PDF file.
 57249 Oct 10 14:30 Campus Directory2.pdf
 57310 Oct 10 14:30 Campus Directory3.pdf
 57905 Oct 10 14:31 Campus Directory4.pdf
 57981 Oct 10 14:31 Campus Directory5.pdf
 40142 Oct 10 14:32 Campus Directory6.pdf
 59397 Oct 10 14:30 Campus Directory.pdf

I can then use pdftotext to convert them to text files that 
contain the raw data.
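
For reference, the conversion step can be sketched as a small shell function. This is only a minimal sketch, assuming the PDFs sit in the current directory under the names shown in the listing above. The function name convert_directory_pdfs is made up for illustration, and the -layout option (which asks pdftotext to keep the page's column layout) is a choice that may or may not suit the later data extraction.

```shell
#!/bin/sh
# Hypothetical helper: convert every "Campus Directory*.pdf" in the
# current directory to a .txt file with the same base name.
convert_directory_pdfs() {
    for pdf in "Campus Directory"*.pdf; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "$pdf" ] || continue
        # -layout keeps the physical column layout of the printed page.
        pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
    done
}

convert_directory_pdfs
```

Run in the directory holding the six PDFs, this should leave Campus Directory.txt through Campus Directory6.txt next to them.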

The old process with wget2 took a total of about 6 seconds to 
download.
Manually printing the 6 pages to PDF, converting them to 
text, and pulling out the useful data takes about 
3 minutes 30 seconds.

The process I use for the University of Guam still works 
fine with wget2, but they have 69 web pages and it takes 
about 20 to 40 seconds.

Thanks if anyone has an idea.


+------------------------------------------------------------+
 Michael D. Setzer II - Computer Science Instructor (Retired)
 mailto:[email protected]                            
 Guam - Where America's Day Begins                        
 G4L Disk Imaging Project maintainer 
 http://sourceforge.net/projects/g4l/
+------------------------------------------------------------+



