Re: Checking if a website is up doesn't work correctly
On Sat, 17 Feb 2018 18:21:26 +0100 Manfred Lotz wrote:

> Thanks. The attached program does better, as https://notabug.org
> works. Only http://scripts.sil.org doesn't work. It seems there are
> special checks active on that site.

Yeah, some sites block user-agents recognised as robots, scripts, etc. You can, of course, tell LWP to send any other user-agent header you like, to pretend to be a normal browser, but that could be considered a bit shady and deceptive. Maybe talk to the owners of the site in question first?
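For example, a minimal, untested sketch of overriding the agent string with LWP::UserAgent; the "Mozilla/5.0" string is just a placeholder, any value you choose will be sent:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Override the default "libwww-perl/#.##" agent string.
my $ua  = LWP::UserAgent->new(agent => 'Mozilla/5.0');
my $res = $ua->get('http://scripts.sil.org/OFL');
print $res->is_success ? "up\n" : $res->status_line . "\n";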
Re: Checking if a website is up doesn't work correctly
On Thu, 15 Feb 2018 05:46:33 -0600 Mike Flannigan wrote:

> See if some version of the attached program
> gives the results you expect.
>
> Mike

Thanks. The attached program does better, as https://notabug.org works. Only http://scripts.sil.org doesn't work. It seems there are special checks active on that site.

-- Manfred

> On 2/13/2018 8:33 PM, beginners-digest-h...@perl.org wrote:
> > I tried WWW::Mechanize, and (of course) got also 403.
> >
> > Really strange.
> >
> > Is there another tool I could use for checking? I mean some tool in
> > the Perl universe?
> >
> > -- Manfred
Re: Checking if a website is up doesn't work correctly
See if some version of the attached program gives the results you expect.

Mike

On 2/13/2018 8:33 PM, beginners-digest-h...@perl.org wrote:
> I tried WWW::Mechanize, and (of course) got also 403.
>
> Really strange.
>
> Is there another tool I could use for checking? I mean some tool in
> the Perl universe?
>
> -- Manfred

#!/usr/bin/perl -w
#
# This script checks the links in the HTML in the __DATA__ section
# and reports if they are good links or bad links.
#
use strict;
use LWP::UserAgent;
use HTML::LinkExtor;
use HTTP::Request;

#-- i am being very lazy in the demo
#-- you should really localize it in a block
local $/;

# Parse the slurped __DATA__ section; hrefs() is called for each link found.
my $p = HTML::LinkExtor->new(\&hrefs)->parse(<DATA>);

sub hrefs {
    my ($tag, @links) = @_;
    return unless $tag =~ /^a$/i;                 # only <a href="..."> tags
    my $r = LWP::UserAgent->new->request(
        HTTP::Request->new(GET => $links[1]));    # $links[1] is the href value
    print $r->is_success ? "GOOD: $links[1]" : $r->status_line . " $links[1]", "\n";
}

__DATA__
<a href="http://scripts.sil.org/robots.txt"></a>
<a href="https://shlomif.github.io/"></a>
<a href="https://notabug.org"></a>
<a href="http://scripts.sil.org/OFL"></a>
__END__
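Run it as an ordinary Perl script (e.g. perl check_links.pl - the filename is up to you); for each <a href="..."> in the __DATA__ section it prints either "GOOD: <url>" or the HTTP status line followed by the URL.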
Re: Checking if a website is up doesn't work correctly
> Is there another tool I could use for checking? I mean some tool in the
> Perl universe?

Well, lwp-dump is a perl util - comes w/ LWP, I believe - and it uses LWP::UserAgent. sil.org, for one, just returns forbidden/403 for its own policy reasons, but as far as your "is it up?" question goes, that should be answer enough.

To play fair (though it doesn't help with sil.org) you should be looking at /robots.txt, as you're creating a robot - see the sketch at the end of this message.

Pretty sure there's a libcurl interface (Net::Curl and WWW::Curl, for two) which might have better luck impersonating a proper user to get around the policy. But your URLs so far have shown some odd responses using wget, so you may want to check them out first before your script has at them.

On Tue, Feb 13, 2018 at 2:34 PM, Manfred Lotz wrote:

> On Tue, 13 Feb 2018 13:50:55 -0600 Andy Bach wrote:
> > [wget and lwp-dump transcripts snipped]
>
> I tried WWW::Mechanize, and (of course) got also 403.
>
> Really strange.
>
> Is there another tool I could use for checking? I mean some tool in the
> Perl universe?
>
> -- Manfred
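For the /robots.txt angle, here's a rough, untested sketch using LWP::RobotUA, which ships with libwww-perl and fetches and honours robots.txt for you. The agent name and contact address below are placeholders:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# Placeholder identity -- substitute your own robot name and contact address.
my $ua = LWP::RobotUA->new('my-link-checker/0.1', 'you@example.com');
$ua->delay(1/60);    # delay() is in minutes; this waits ~1 second between requests

my $res = $ua->get('http://scripts.sil.org/robots.txt');
print $res->code, ' ', $res->message, "\n";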
Re: Checking if a website is up doesn't work correctly
On Tue, 13 Feb 2018 13:50:55 -0600 Andy Bach wrote:

> [wget transcript snipped]
>
> so it may not be following the 302s. I'm not sure you're using the
> correct tool here.
>
> [robots.txt and lwp-dump transcripts snipped]
>
> so it's up, but "forbidden" probably as the user agent isn't set or
> some other policy reason.

I tried WWW::Mechanize, and (of course) got also 403.

Really strange.

Is there another tool I could use for checking? I mean some tool in the Perl universe?

-- Manfred
Re: Checking if a website is up doesn't work correctly
$ wget http://scripts.sil.org/OFL
--2018-02-13 13:42:50--  http://scripts.sil.org/OFL
Resolving scripts.sil.org (scripts.sil.org)... 209.12.63.143
Connecting to scripts.sil.org (scripts.sil.org)|209.12.63.143|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL [following]
--2018-02-13 13:42:52--  http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL
Reusing existing connection to scripts.sil.org:80.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: /cms/scripts/page.php?site_id=nrsi&id=OFL&_sc=1 [following]
--2018-02-13 13:42:52--  http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL&_sc=1
Reusing existing connection to scripts.sil.org:80.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: /cms/scripts/page.php?site_id=nrsi&id=OFL [following]
--2018-02-13 13:42:53--  http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL
Reusing existing connection to scripts.sil.org:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘OFL’

    [ <=> ]  37,439  59.6KB/s  in 0.6s

2018-02-13 13:42:55 (59.6 KB/s) - ‘OFL’ saved [37439]

so it may not be following the 302s. I'm not sure you're using the correct tool here. A little more straightforward:

andy@wiwmb-md-afb-mint:~/spam$ wget http://scripts.sil.org/robots.txt
--2018-02-13 13:47:27--  http://scripts.sil.org/robots.txt
Resolving scripts.sil.org (scripts.sil.org)... 209.12.63.143
Connecting to scripts.sil.org (scripts.sil.org)|209.12.63.143|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36 [text/plain]
Saving to: ‘robots.txt’

100%[======================================>]  36  --.-K/s  in 0s

2018-02-13 13:47:27 (2.99 MB/s) - ‘robots.txt’ saved [36/36]

but

$ is_it_up.pl
http://scripts.sil.org/robots.txt is DOWN

You might look at more LWP tools:

$ lwp-dump https://www.sil.org
HTTP/1.1 403 Forbidden
Cache-Control: max-age=10
Connection: keep-alive
Date: Tue, 13 Feb 2018 19:49:47 GMT
Server: cloudflare
Content-Type: text/html; charset=UTF-8
Expires: Tue, 13 Feb 2018 19:49:57 GMT
CF-RAY: 3eca501a5d569895-LAX
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Set-Cookie: __cfduid=dd8038f4f2c995fa4b4c7fa8beb2b42f31518551387; expires=Wed, 13-Feb-19 19:49:47 GMT; path=/; domain=.sil.org; HttpOnly
X-Frame-Options: SAMEORIGIN

<html class="no-js" lang="en-US">
<title>Access denied | www.sil.org used Cloudflare to restrict access</title>
(+ 2770 more bytes not shown)

so it's up, but "forbidden" probably as the user agent isn't set or some other policy reason.

On Tue, Feb 13, 2018 at 11:33 AM, Manfred Lotz wrote:

> On Tue, 13 Feb 2018 10:47:42 -0600 Andy Bach wrote:
> > [script and output snipped - quoted in full elsewhere in the thread]
>
> You are right.
>
> But I am afraid this is not all of it. If I test
> http://scripts.sil.org/OFL then I get an error, but it is fine in
> Firefox.
>
> Very strange.
>
> -- Manfred

--
Andy Bach, afb...@gmail.com
608 658-1890 cell
608 261-5738 wk
Re: Checking if a website is up doesn't work correctly
On Tue, 13 Feb 2018 10:47:42 -0600 Andy Bach wrote:

> The site doesn't like 'head' requests? get works
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use LWP::Simple;
>
> # my $url="https://shlomif.github.io/";
> my $url="http://www.notabug.org/";
>
> print "$url is ", (
>     (! get($url)) ? "DOWN"
>                   : "up"
> ), "\n";
>
> $ is_it_up.pl
> http://www.notabug.org/ is up

You are right.

But I am afraid this is not all of it. If I test http://scripts.sil.org/OFL then I get an error, but it is fine in Firefox.

Very strange.

-- Manfred
Re: Checking if a website is up doesn't work correctly
The site doesn't like 'head' requests? get works

#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;

# my $url="https://shlomif.github.io/";
my $url="http://www.notabug.org/";

print "$url is ", (
    (! get($url)) ? "DOWN"
                  : "up"
), "\n";

$ is_it_up.pl
http://www.notabug.org/ is up

On Tue, Feb 13, 2018 at 5:25 AM, Manfred Lotz wrote:

> Hi there,
>
> Somewhere I found an example of how to check if a website is up. Here is
> my sample:
>
> #! /usr/bin/perl
>
> use strict;
>
> use LWP::Simple;
>
> my $url="https://notabug.org";
> if (! head($url)) {
>     die "$url is DOWN"
> }
>
> Running the above code I get:
>
> https://notabug.org is DOWN at ./check_url.pl line 8.
>
> However, firefox shows the site works ok.
>
> What am I doing wrong?
>
> --
> Thanks,
> Manfred

--
Andy Bach, afb...@gmail.com
608 658-1890 cell
608 261-5738 wk
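P.S. If you want to see why a check fails rather than just up/down, here's a rough, untested sketch using LWP::UserAgent (same libwww-perl distribution as LWP::Simple) that prints the actual status line, falling back from HEAD to GET for sites that reject HEAD:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
for my $url ('https://notabug.org', 'http://scripts.sil.org/OFL') {
    my $res = $ua->head($url);
    # Some sites reject HEAD but accept GET, so fall back.
    $res = $ua->get($url) unless $res->is_success;
    print $res->is_success ? 'up' : 'DOWN', ": $url (", $res->status_line, ")\n";
}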
Re: Checking if a website is up doesn't work correctly
Hi Manfred!

On Tue, 13 Feb 2018 12:25:31 +0100 Manfred Lotz wrote:

> Hi there,
>
> Somewhere I found an example of how to check if a website is up.
>
> [sample snipped]
>
> Running the above code I get:
>
> https://notabug.org is DOWN at ./check_url.pl line 8.

This code seems to work fine here:

#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;

my $url="https://shlomif.github.io/";
# my $url="https://notabug.org/";

if (! head($url)) {
    die "$url is DOWN";
}

Seems like notabug blocks libwww-perl.

> However, firefox shows the site works ok.

Same here.

Regards,

Shlomi

--
Shlomi Fish  http://www.shlomifish.org/
Chuck Norris/etc. Facts - http://www.shlomifish.org/humour/bits/facts/

“Hey, I have a flat tire. Can you help me change it with a can opener and a
pound of sesame seeds?” — talexb on parsing HTML or XML with regular
expressions.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
Checking if a website is up doesn't work correctly
Hi there,

Somewhere I found an example of how to check if a website is up. Here is my sample:

#! /usr/bin/perl

use strict;

use LWP::Simple;

my $url="https://notabug.org";
if (! head($url)) {
    die "$url is DOWN"
}

Running the above code I get:

https://notabug.org is DOWN at ./check_url.pl line 8.

However, firefox shows the site works ok.

What am I doing wrong?

--
Thanks,
Manfred