[Bug-wget] Suggestion

2013-11-03 Thread Mike Gold
I can't find a way to make wget show no warnings or any other text at all,
just the progress bar.
My file is on Dropbox, and every time it is downloaded wget wants to inform
me that it could not verify the certificate. I'd like it to be completely
quiet but still show the progress bar once the download starts. Is there
any set of switches to do this?

-- 
-
Never argue with an idiot. First he will bring you down to his level then
he will beat you with experience.


[Bug-wget] request for help with wget (crawling search results of a website)

2013-11-03 Thread Altug Tekin
Dear mailing list members,

According to the website http://www.gnu.org/software/wget/ it is OK to
send help requests to this mailing list. I have the following problem:

I am trying to crawl the search results of a news website using *wget*.

The name of the website is *www.voanews.com*.

After typing in my *search keyword* and clicking search on the website, it
proceeds to the results. Then I can specify a *"to" and a "from" date* and
hit search again.

After this the URL becomes:

http://www.voanews.com/search/?st=article&k=mykeyword&df=10%2F01%2F2013&dt=09%2F20%2F2013&ob=dt#article

and the actual content of the results is what I want to download.

To achieve this I created the following wget-command:

wget --reject=js,txt,gif,jpeg,jpg \
 --accept=html \
 --user-agent=My-Browser \
 --recursive --level=2 \
 www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article

Unfortunately, the crawler doesn't download the search results. It only
follows the upper link bar, which contains the "Home, USA, Africa, Asia, ..."
links, and saves the articles they link to.

*It seems like the crawler doesn't check the search result links at all.*

*What am I doing wrong, and how can I modify the wget command to download
only the search result links (and of course the sites they link to)?*

Thank you for any help...


Re: [Bug-wget] Suggestion

2013-11-03 Thread Darshit Shah
Maybe you want to use the --no-verbose flag?
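
For the certificate warning in particular, a rough sketch of what such an
invocation might look like (the URL is a placeholder and the flag choice is
an assumption, not something confirmed in this thread):

  # --no-check-certificate suppresses the certificate warning by skipping
  # verification entirely, so only use it if that risk is acceptable.
  # --no-verbose trims wget's output to roughly one line per file; note
  # that it also hides the progress bar, so you may prefer to drop it.
  wget --no-check-certificate --no-verbose "https://www.dropbox.com/s/example/file.zip"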


On Sun, Nov 3, 2013 at 8:43 PM, Mike Gold  wrote:

> I can't find a way to make wget show no warnings or any other text at all,
> just the progress bar.
> My file is on Dropbox, and every time it is downloaded wget wants to inform
> me that it could not verify the certificate. I'd like it to be completely
> quiet but still show the progress bar once the download starts. Is there
> any set of switches to do this?
>
> --
> -
> Never argue with an idiot. First he will bring you down to his level then
> he will beat you with experience.
>



-- 
Thanking You,
Darshit Shah


Re: [Bug-wget] wget alpha release 1.14.96-38327

2013-11-03 Thread Darshit Shah
They simply need to be added to the EXTRA_DIST variable in Makefile.am.

I've attached a patch for this.


On Sun, Nov 3, 2013 at 9:21 PM, Andrea Urbani  wrote:

> Hi Giuseppe,
>
> the following files are missing:
>   tests/Test-ftp-list-Multinet.px
>   tests/Test-ftp-list-Unknown.px
>   tests/Test-ftp-list-Unknown-a.px
>   tests/Test-ftp-list-Unknown-hidden.px
>   tests/Test-ftp-list-Unknown-list-a-fails.px
>   tests/Test-ftp-list-UNIX-hidden.px
>
> Please include them (they are also attached here).
>
> Bye
> Andrea
>
>
> - Original Message -
> From: Giuseppe Scrivano
> Sent: 11/02/13 01:32 PM
> To: bug-wget@gnu.org
> Subject: [Bug-wget] wget alpha release 1.14.96-38327
>  Hi,
>
> I have just uploaded an alpha release for wget. If no blocking errors
> are found, then I will make an official release in the coming weeks
> (that is, wget 1.15).
>
> Please test it! :-)
>
> http://alpha.gnu.org/gnu/wget/wget-1.14.96-38327.tar.gz
> http://alpha.gnu.org/gnu/wget/wget-1.14.96-38327.tar.xz
>
> signatures, using key C03363F4:
>
> http://alpha.gnu.org/gnu/wget/wget-1.14.96-38327.tar.gz.sig
> http://alpha.gnu.org/gnu/wget/wget-1.14.96-38327.tar.xz.sig
>
> Thanks,
> Giuseppe
>



-- 
Thanking You,
Darshit Shah
From febe1547001dc60e32177437ee9dfb25f4957962 Mon Sep 17 00:00:00 2001
From: Darshit Shah 
Date: Mon, 4 Nov 2013 06:15:17 +0530
Subject: [PATCH] Add tests to EXTRA_DIST variable for distribution packaging

---
 tests/ChangeLog   | 6 ++
 tests/Makefile.am | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index e1ef334..c8dc09d 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,9 @@
+2013-11-04  Darshit Shah  
+
+	* Makefile.am: Add new tests introduced in last commit to
+  EXTRA_DIST.
+  Reported by: Andrea Urbani  
+
 2013-10-17  Andrea Urbani  
 
 	* FTPServer.pm (GetBehavior): new routine.
diff --git a/tests/Makefile.am b/tests/Makefile.am
index a494787..bea262e 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -82,6 +82,12 @@ EXTRA_DIST = FTPServer.pm FTPTest.pm HTTPServer.pm HTTPTest.pm \
  Test-ftp-iri-fallback.px \
  Test-ftp-iri-recursive.px \
  Test-ftp-iri-disabled.px \
+ Test-ftp-list-Multinet.px \
+ Test-ftp-list-Unknown.px \
+ Test-ftp-list-Unknown-a.px \
+ Test-ftp-list-Unknown-hidden.px \
+ Test-ftp-list-Unknown-list-a-fails.px \
+ Test-ftp-list-UNIX-hidden.px \
  Test-HTTP-Content-Disposition-1.px \
  Test-HTTP-Content-Disposition-2.px \
  Test-HTTP-Content-Disposition.px \
-- 
1.8.4.2



Re: [Bug-wget] request for help with wget (crawling search results of a website)

2013-11-03 Thread Dagobert Michelsen
Hi,

On 03.11.2013, at 09:13, Altug Tekin wrote:
> I am trying to crawl the search results of a news website using *wget*.
> 
> The name of the website is *www.voanews.com*.
> 
> After typing in my *search keyword* and clicking search on the website, it
> proceeds to the results. Then I can specify a *"to" and a "from" date* and
> hit search again.
> 
> After this the URL becomes:
> 
> http://www.voanews.com/search/?st=article&k=mykeyword&df=10%2F01%2F2013&dt=09%2F20%2F2013&ob=dt#article
> 
> and the actual content of the results is what I want to download.
> 
> To achieve this I created the following wget-command:
> 
> wget --reject=js,txt,gif,jpeg,jpg \
> --accept=html \
> --user-agent=My-Browser \
> --recursive --level=2 \
> www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article
> 
> Unfortunately, the crawler doesn't download the search results. It only
> follows the upper link bar, which contains the "Home, USA, Africa, Asia, ..."
> links, and saves the articles they link to.
> 
> *It seems like the crawler doesn't check the search result links at all.*
> 
> *What am I doing wrong, and how can I modify the wget command to download
> only the search result links (and of course the sites they link to)?*


You need to inspect the URLs of the results and make sure to
download only those. Maybe --no-parent is enough.
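
For illustration only (a sketch, not tested against the site; the grep
pattern is a guess at how the result links look), one way to inspect the
result URLs first and then fetch only those:

  # Dump the search results page and list the links it contains.
  wget -qO- "http://www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt" \
    | grep -o 'href="[^"]*"' | sort -u
  # Put the interesting article URLs into a file and fetch just those:
  #   wget -i article-urls.txt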


Best regards

  -- Dago


-- 
"You don't become great by trying to be great, you become great by wanting to 
do something,
and then doing it so hard that you become great in the process." - xkcd #896





[Bug-wget] Keep copyright year always updated.

2013-11-03 Thread Trần Ngọc Quân
Hello,
In order to automatically update the copyright year, please:
1. In configure.ac, add:
# Defines
AC_DEFINE([COPYRIGHT_YEAR], [m4_esyscmd([date +%Y])], [Current year used in copyright message])

2.
#: src/main.c:959
  if (fputs (_("\
Copyright (C) 2011 Free Software Foundation, Inc.\n"), stdout) < 0)
    exit (3);

Change to:

  if (fprintf (stdout, _("\
Copyright (C) %d Free Software Foundation, Inc.\n"), COPYRIGHT_YEAR) < 0)
    exit (3);
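
A quick way to sanity-check this after regenerating the build system (a
sketch only; the location of the generated config.h depends on the project
layout):

  # Regenerate and configure, then confirm the macro expanded to the current year.
  autoreconf -i && ./configure
  grep COPYRIGHT_YEAR $(find . -name config.h)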

-- 
Trần Ngọc Quân.
Vietnamese team leader in TranslationProject.org




Re: [Bug-wget] request for help with wget (crawling search results of a website)

2013-11-03 Thread Tony Lewis
Altug Tekin wrote:

> To achieve this I created the following wget-command:
>
> wget --reject=js,txt,gif,jpeg,jpg \
>  --accept=html \
>  --user-agent=My-Browser \
>  --recursive --level=2 \
>  www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article

You need to quote the URL since it contains characters that are interpreted
by your command shell. (Most likely nothing after the "&" was sent to the
web server.)

I think you might run into problems with --accept since the URL does not end
with ".html" so you might need to delete that argument to get the results
you want.
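
For example, something along these lines might behave better (a sketch only,
untested; the URL is quoted so the shell leaves the "&" characters alone,
and --accept is dropped as suggested above):

  wget --reject=js,txt,gif,jpeg,jpg \
       --user-agent=My-Browser \
       --recursive --level=2 \
       "http://www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article"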

Tony