My goal: I want to create something similar to a phone directory. The page hosts a large number of documents in PDF format. I want to combine these documents and be able to filter them by first name, last name, and location. If I do this manually, I have to download each document, open it, and search for the person's name by hand.
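To make the goal concrete, here is a minimal sketch of the search step in Perl. It assumes the text of each PDF has already been extracted into a string (for example with an external tool such as pdftotext; that step is not shown). The helper names `gazette_date` and `find_names` are hypothetical, the file-name pattern is guessed from the single URL posted in this thread, and the sample text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the issue date out of a file name such as
# GACETA_17_08_2021.pdf (pattern assumed from the one URL in this thread).
# Returns 'YYYY-MM-DD', or undef when the name does not match.
sub gazette_date {
    my ($filename) = @_;
    return unless $filename =~ /GACETA_(\d{2})_(\d{2})_(\d{4})\.pdf\z/i;
    return "$3-$2-$1";
}

# Hypothetical helper: given { file name => extracted text } and a list of
# names, return { name => [ sorted file names that mention it ] }.
# Matching ignores case and accepts any whitespace (including line breaks)
# between the words of a name.
sub find_names {
    my ($texts, @names) = @_;
    my %found;
    for my $name (@names) {
        my $pattern = join '\s+', map { quotemeta($_) } split ' ', $name;
        $found{$name} = [
            grep { $texts->{$_} =~ /(?<!\w)$pattern(?!\w)/i }
            sort keys %$texts
        ];
    }
    return \%found;
}

# Made-up sample data standing in for text extracted from two PDFs.
my %texts = (
    'GACETA_17_08_2021.pdf' => "Se notifica a Juan\nPerez, de Managua.",
    'GACETA_18_08_2021.pdf' => "Se notifica a Maria Lopez, de Leon.",
);

my $hits = find_names(\%texts, 'Juan Perez', 'Maria Lopez', 'Ana Ruiz');
for my $name (sort keys %$hits) {
    my @files = @{ $hits->{$name} };
    if (@files) {
        print "$name: ",
            join(', ', map { $_ . ' (' . (gazette_date($_) // 'no date') . ')' } @files),
            "\n";
    }
    else {
        print "$name: not found in any file\n";
    }
}
```

A name that appears in no file still gets an entry with an empty list, so missing names can be reported. Filtering by location would need to know how each document lays out its entries, which probably differs between issues, so it is left out of this sketch.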
The documents uploaded to this page differ by year and date, and they contain different information.

On Wed, Jul 20, 2022 at 7:04 PM Mike <[email protected]> wrote:
>
> I'm going to be traveling, so will not be able to help much
> in the next 2 days.
>
> That is a PDF file you supplied. Is it fair to say you want to
> be able to search for all the names listed in a text file and be
> able to print out which file contains which name. And in some
> cases the name will not be in any of the files? Is that the goal?
>
> Define your goal and we will help you.
>
>
> The file below is a bit old, but maybe it works for your
> PDF files. I have not tested it on your url. I gather
> you don't have HTML tables, so maybe it is not for your case.
>
>
> Mike
>
>
> #!/usr/bin/perl -w
> #
> #
> # This program writes the results of the webpage listed in line 17
> # to $outfile. So basically it converts HTML to text.
> # It works reasonably well with HTML tables.
> #
> #
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use LWP::UserAgent;
> use HTML::FormatText::WithLinks::AndTables;
>
>
> my $page = 'http://www.mflan.com/crime.htm';
>
> my $outfile = 'output.txt';
>
> chdir '/home/mike/Documents/copy';
>
> open OUT, ">>$outfile" or die "Can't open '$outfile': $!";
>
> my ($sl, $request, $response, $html);
>
> $sl = LWP::UserAgent->new;
>
>
> $sl->proxy('http', ''); # enter proxy if needs be / and set it for Soap too ...
> $request = HTTP::Request->new('GET', $page);
> $response = $sl->request($request);
> $html = $response->as_string;
>
> print "Got it into \$html.\n";
>
>
>
> my $text = HTML::FormatText::WithLinks::AndTables->convert($html);
>
>
> print OUT "$text";
>
> print "\nAll done.\n";
>
> close OUT;
>
>
> __END__
>
>
>
>
> On 7/20/22 10:13, William Torrez Corea wrote:
> > The url of the page:
> >
> > https://www.pgr.gob.ni/PDF/2021/GACETA/GACETA_17_08_2021.pdf
> >
> > On 7/20/22, William Torrez Corea <[email protected]> wrote:
> >> Exist a page where you put info about the person but if you want to search
> >> a name you must search this manually. So, I want to automate this process
> >> with perl.
> >> --
> >>
> >> With kindest regards, William.
> >>
> >> ⢀⣴⠾⠻⢶⣦⠀
> >> ⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
> >> ⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
> >> ⠈⠳⣄⠀⠀⠀⠀
> >>
> >
>

--
With kindest regards, William.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀⠀⠀⠀
