Re: [Bug-wget] grab complete download link

2014-07-20 Thread Yousong Zhou
Hi,

On 21 July 2014 09:38, bas smit  wrote:
> Dear Darshit Shah
> Thanks for your response.
>
> I tried with the following command:
> subprocess.call([wget, '--user', user, '--password', passw, '-P', download_dir,
>                  '--page-requisites', url, '-o', logfile, '--no-check-certificate'])
>

The URL you provided requires a login to access. But I guess a recursive
download is what you want. Try the options `--recursive --level=1`, or
their short equivalent `-r -l 1`.
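A minimal Python sketch of the command with those recursive options, reusing the variable names from the original script. The helper names `build_wget_command` and `run_wget` are hypothetical, and whether one level of recursion actually reaches the file behind this login-protected URL has not been verified.

```python
import subprocess


def build_wget_command(wget, user, passw, download_dir, logfile, url):
    """Return the argument list for a one-level recursive wget download (hypothetical helper)."""
    return [
        wget,
        "--user", user,
        "--password", passw,
        "-P", download_dir,   # directory prefix for saved files
        "-o", logfile,        # write wget's log messages to this file
        "-r", "-l", "1",      # same as --recursive --level=1
        url,
    ]


def run_wget(cmd):
    """Run wget and return its exit status."""
    return subprocess.call(cmd)
```

Usage: `run_wget(build_wget_command(wget, user, passw, download_dir, logfile, url))`. The `--no-check-certificate` option from the earlier attempt can be appended if needed; it stops wget from enforcing certificate verification, which is why the log reports a WARNING rather than an error.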

> However, I still could not download the required file.
>
> I also found the following in the log file:
>
> WARNING: Certificate verification error: unable to get local issuer
> certificate
>
>
> I hope you can help me.
>
> Bas
>
>
> On Thu, Jul 17, 2014 at 9:34 PM, Darshit Shah  wrote:
>
>> You want to use the --page-requisites option
>>
>> On Thu, Jul 17, 2014 at 2:22 PM, bas smit  wrote:
>> > I am looking for a command-line option that provides the same functionality
>> > as "Download All with Free Download Manager". It grabs the complete
>> > download links even though only partial links are shown in the HTML source.
>> > I tried the following code, but could not figure out which parameter is
>> > needed for that. The URL provided below is the only one known.
>> >
>> > import subprocess
>> >
>> > user, passw = 'user', 'passw'
>> >
>> > url = 'http://earthexplorer.usgs.gov/download/3120/LM10300301974324GDS05/STANDARD/BulkDownload'
>> >
>> > wget = "C:\\Users\\bas\\Downloads\\wget-1.10.2.exe"
>> > subprocess.call([wget, '--user', user, '--password', passw, url])
>>
>>
>>
>> --
>> Thanking You,
>> Darshit Shah
>>



Re: [Bug-wget] grab complete download link

2014-07-20 Thread bas smit
Dear Darshit Shah
Thanks for your response.

I tried with the following command:
subprocess.call([wget, '--user', user, '--password', passw, '-P', download_dir,
                 '--page-requisites', url, '-o', logfile, '--no-check-certificate'])

However, I still could not download the required file.

I also found the following in the log file:

WARNING: Certificate verification error: unable to get local issuer
certificate


I hope you can help me.

Bas



On Thu, Jul 17, 2014 at 9:34 PM, Darshit Shah  wrote:

> You want to use the --page-requisites option
>
> On Thu, Jul 17, 2014 at 2:22 PM, bas smit  wrote:
> > I am looking for a command-line option that provides the same functionality
> > as "Download All with Free Download Manager". It grabs the complete
> > download links even though only partial links are shown in the HTML source.
> > I tried the following code, but could not figure out which parameter is
> > needed for that. The URL provided below is the only one known.
> >
> > import subprocess
> >
> > user, passw = 'user', 'passw'
> >
> > url = 'http://earthexplorer.usgs.gov/download/3120/LM10300301974324GDS05/STANDARD/BulkDownload'
> >
> > wget = "C:\\Users\\bas\\Downloads\\wget-1.10.2.exe"
> > subprocess.call([wget, '--user', user, '--password', passw, url])
>
>
>
> --
> Thanking You,
> Darshit Shah
>


Re: [Bug-wget] [Bug-Wget] Misc. patches

2014-07-20 Thread Tim Rühsen
On Monday, 21 July 2014, 00:58:49, Darshit Shah wrote:
> On Mon, Jul 7, 2014 at 8:14 PM, Tim Ruehsen  wrote:
> > One more comment / idea.
> > 
> > The 'cookie_domain' comes from an HTTP Set-Cookie response header and thus
> > is (must be) toASCII() encoded (= punycode). Of course this has to be
> > checked when normalizing the incoming cookie data. A cookie domain
> > containing non-ASCII characters should simply be dropped.
> > 
> > The whole check only works when 'host' is also in toASCII() (punycode)
> > form.
> > 
> > Assuming this, psl_str_to_utf8lower() reduces to a plain ASCII lowercase
> > converter.
> > 
> > If Wget converted every domain name input to punycode + lowercase, many
> > conversions would fall away and case-insensitive functions would not be
> > needed (e.g. we could call strcmp instead of strcasecmp, there would be no
> > need to call psl_str_to_utf8lower(), etc.).
> > 
> > What do you think ?
> 
> Sounds like an interesting idea to me. But how do you suggest we
> go about converting the domain names to lowercase?
> I'm not sure about this, so let me confirm first: after running the input
> domain names through toASCII(), can we simply pass the string to
> tolower() to get the lowercase version?

That depends on the library you use.

libidn's toASCII() has a built-in lowercase conversion. So the input case does
not matter; the output is always lowercase ASCII.

With libidn2, you have to convert to lowercase yourself first (e.g. using
libunistring). The output is of course lowercase ASCII.

With libicu, you also have to convert to lowercase yourself first (but libicu
is able to do that). The output is of course lowercase ASCII.
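To make the cookie-domain point concrete, here is a minimal Python sketch (not Wget's C code) of the comparison once both names are in toASCII() form: a cookie domain containing non-ASCII characters is dropped, and an ASCII-only lowercase step stands in for psl_str_to_utf8lower(). The function names are hypothetical, and the suffix match is a simplified stand-in, not libpsl's psl_is_cookie_domain_acceptable().

```python
def ascii_lower(name: str) -> str:
    """Lowercase A-Z only, like C tolower() in the "C" locale."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def cookie_domain_ok(host: str, cookie_domain: str) -> bool:
    """Hypothetical check; 'host' is assumed to be toASCII()-encoded already."""
    # A Set-Cookie domain must already be punycode, so drop any non-ASCII value.
    if not cookie_domain.isascii():
        return False
    host, domain = ascii_lower(host), ascii_lower(cookie_domain)
    # After normalization, plain equality (strcmp) suffices; no strcasecmp needed.
    # Simplified domain match: exact host, or host ends in "." + domain.
    return host == domain or host.endswith("." + domain)
```

Note that this simplified version accepts `cookie_domain_ok("www.example.com", "com")`; rejecting such public suffixes is exactly what the libpsl call is for.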


What I had in mind (and what I did in Mget) is to 'normalize' every domain name
before any further processing or comparison. 'Normalizing' means trimming,
percent-decoding, charset transcoding to UTF-8, and toASCII() conversion (with
or without prior lowercasing, depending on the IDN library used).

With that in place, Wget's code just needs strcmp() to compare domains, and
$ wget übel.de Übel.de xn--bel-goa.de
should reduce to a download of a single file (xn--bel-goa.de/index.html)
(but maybe it is Wget's policy to explicitly download every URL given on the
command line, even if it is always the same!?)
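The normalization steps above can be sketched in Python as follows. This is an illustration under assumptions, not Mget's implementation: `normalize_domain` is a hypothetical name, the input is assumed to be already decoded to a Python `str` (so the charset-transcoding step is implicit), and Python's built-in `idna` codec (IDNA 2003) plays the role of toASCII(). That codec case-folds non-ASCII labels via nameprep but passes pure-ASCII labels through unchanged, so a final ASCII lowercase is still applied.

```python
from urllib.parse import unquote


def normalize_domain(raw: str) -> str:
    """Hypothetical helper: trim, percent-decode, toASCII(), lowercase."""
    name = raw.strip()                       # trimming
    name = unquote(name, encoding="utf-8")   # percent-decoding (UTF-8 assumed)
    # toASCII() via the 'idna' codec; non-ASCII labels are case-folded by nameprep.
    ascii_name = name.encode("idna").decode("ascii")
    # Pure-ASCII labels are not lowercased by the codec, so lowercase here;
    # the string is ASCII-only now, so a plain lowercase is safe.
    return ascii_name.lower()


print(sorted({normalize_domain(d) for d in ["übel.de", "Übel.de", "xn--bel-goa.de"]}))
# prints ['xn--bel-goa.de']
```

All three command-line spellings collapse to one key, so a plain string comparison is enough to detect the duplicates.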

There is domain name input from the command line (URLs and a few options like
-D/--domains), from local files (-i/--input-file) and from remote files.

But Darshit, maybe this should have low priority. It is more a kind of 'code
polishing'. I am looking forward to starting a Wget version based on a libwget
in the next 6-12 months. Most of the code is already working in the Mget
project, but everything needs polishing (e.g. API docs and more of Wget's
functionality; -k/--convert-links was implemented last week ;-). And then the
day comes to merge Wget and Mget... if that finds any friends ;-)

> 
> > Tim
> > 
> > On Monday 07 July 2014 17:08:48 Darshit Shah wrote:
> >> +  if (psl_str_to_utf8lower (cookie_domain, NULL, NULL, &cookie_domain_lower) == PSL_SUCCESS &&
> >> +      psl_str_to_utf8lower (host, NULL, NULL, &host_lower) == PSL_SUCCESS)
> >> +    {
> >> +      is_acceptable = psl_is_cookie_domain_acceptable (psl, host_lower, cookie_domain_lower);
> >> +    }
> >> +  else
> >> +    {
> >> +      DEBUGP (("libpsl unable to parse domain name. "
> >> +               "Falling back to simple heuristics.\n"));
> >> +      goto no_psl;
> >> +    }




Re: [Bug-wget] [Bug-Wget] Misc. patches

2014-07-20 Thread Darshit Shah
On Mon, Jul 7, 2014 at 8:14 PM, Tim Ruehsen  wrote:
> One more comment / idea.
>
> The 'cookie_domain' comes from an HTTP Set-Cookie response header and thus is
> (must be) toASCII() encoded (= punycode). Of course this has to be checked when
> normalizing the incoming cookie data. A cookie domain containing non-ASCII
> characters should simply be dropped.
>
> The whole check only works when 'host' is also in toASCII() (punycode) form.
>
> Assuming this, psl_str_to_utf8lower() reduces to a plain ASCII lowercase
> converter.
>
> If Wget converted every domain name input to punycode + lowercase, many
> conversions would fall away and case-insensitive functions would not be
> needed (e.g. we could call strcmp instead of strcasecmp, there would be no
> need to call psl_str_to_utf8lower(), etc.).
>
> What do you think ?
>
Sounds like an interesting idea to me. But how do you suggest we
go about converting the domain names to lowercase?
I'm not sure about this, so let me confirm first: after running the input
domain names through toASCII(), can we simply pass the string to
tolower() to get the lowercase version?

> Tim
>
> On Monday 07 July 2014 17:08:48 Darshit Shah wrote:
>> +  if (psl_str_to_utf8lower (cookie_domain, NULL, NULL, &cookie_domain_lower) == PSL_SUCCESS &&
>> +      psl_str_to_utf8lower (host, NULL, NULL, &host_lower) == PSL_SUCCESS)
>> +    {
>> +      is_acceptable = psl_is_cookie_domain_acceptable (psl, host_lower, cookie_domain_lower);
>> +    }
>> +  else
>> +    {
>> +      DEBUGP (("libpsl unable to parse domain name. "
>> +               "Falling back to simple heuristics.\n"));
>> +      goto no_psl;
>> +    }
>



-- 
Thanking You,
Darshit Shah