from:"Eli Zaretskii"

Re: [bug #64808] When I use wget to download some files from a web server, files with russian names do not get proper names

2023-11-17 Thread Eli Zaretskii

> Date: Fri, 17 Nov 2023 20:34:37 +0100
> From: grafgrim...@gmx.de
> 
> I use Linux and so not exe files. I use Gentoo Linux.
> 
> Command line example:
> One line (wget and the url):
> 
> wget
> http://releases.mozilla.org/pub/firefox/releases/119.0.1/source/firefox-119.0.1.source.tar.xz
> 
> result: a file with a wrong checksum.

But the above file name has no Russian characters, so why did you say
"files with russian names do not get proper names"?  What am I
missing?

[bug #60287] Windows recursive download escapes utf8 URLs twice

2021-03-28 Thread Eli Zaretskii

Follow-up Comment #12, bug #60287 (project wget):

You are welcome to send patches which would implement what you think should be
the correct behavior in Wget.  At the time, based on my study of the Wget
sources and its basic design of fetching Web pages, my conclusion was that the
only reliable way in Wget on Windows to deal with non-ASCII characters in URLs
specified by Web pages is to provide Wget with the remote and local encodings,
especially since UTF-8 support on Windows is rudimentary at best.  I thought I
was doing fine by helping you and others deal with these situations by
explaining how to use those options to your benefit...


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/

[bug #60287] Windows recursive download escapes utf8 URLs twice

2021-03-27 Thread Eli Zaretskii

Follow-up Comment #10, bug #60287 (project wget):

Without converting charsets, it would be difficult to rely on certain library
functions and support certain features.

For example, locale-dependent C library functions work only with the locale's
encoding, and will produce wrong results if presented with strings encoded
differently.  The IRI support needs to work in UTF-8 internally.  And when
writing Web pages to disk, Wget needs to encode the page name so that it would
be acceptable as a file name by the local filesystem.

That is why conversion to the locale's charset is rather necessary. Using the
original bytes might work for some operations, but not for others, so keeping
the original bytes would need some logic for where they can and cannot be
used, which is a complication.  It is better to convert once, and then forget
about it.

The 404 error is most probably because Wget does attempt to convert encoding,
but does it incorrectly when you don't tell it the actual encodings.  So the
re-encoded URL is garbled.


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/

[bug #60287] Windows recursive download escapes utf8 URLs twice

2021-03-26 Thread Eli Zaretskii

Follow-up Comment #8, bug #60287 (project wget):

> Is this because wget first downloads the html file and then reads the
contents off disk

No.  It's because Wget downloads the pages you told it to, and saves them as
disk files.  Any links in the downloaded pages that lead to other pages
produce additional disk files (e.g., if you told Wget to download
recursively).

IOW, the file-name encoding issue happens when a Web page needs to be saved to
a file for some reason.

> If the bytes were downloaded with the correct encoding, and written to the
file system with the correct encoding, I would expect it to be able to parse
the file with the correct encoding.

What is the "correct encoding", though?

> the file `wget-test.html` has no non-ascii characters in it

Of course, it doesn't: the non-ASCII characters appear when we decode the
hex-encoded bytes.



___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/

[bug #60287] Windows recursive download escapes utf8 URLs twice

2021-03-26 Thread Eli Zaretskii

Follow-up Comment #6, bug #60287 (project wget):

> Isn't the encoding specified in the HTTP header?

Not the local one.  (And not every page you download has these headers, so the
remote one isn't always known, either.)

You must specify the local encoding, especially on MS-Windows, because Windows
filesystems aren't agnostic about encoding file names, they don't allow
arbitrary byte sequences to be part of a file name.  The file names are
written on disk in UTF-16, and so the file I/O APIs on Windows must convert
file names to UTF-16, and for that they need to know its encoding.

> If feels like a bug because my browser handles the links just fine, without
the chatset specified by the server.

The browser just shows the page, it doesn't save it to a disk file.  So
encoding of the page's name isn't an issue for the browser, as it is for
Wget.


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/

[bug #60287] Windows recursive download escapes utf8 URLs twice

2021-03-26 Thread Eli Zaretskii

Follow-up Comment #4, bug #60287 (project wget):

Why does this feel like a bug to you?  How can Wget be expected to guess the
correct encoding, if you don't tell it?


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/

[bug #60287] Windows recursive download escapes utf8 URLs twice

2021-03-25 Thread Eli Zaretskii

Follow-up Comment #2, bug #60287 (project wget):

What was the locale on the GNU/Linux machine, where this "just works"?  I'm
guessing it was a UTF-8 locale, in which case I'd try the same with a
different locale.

I think you must use --remote-encoding=UTF-8 (and perhaps also a suitable
--local-encoding) to make this work correctly on MS-Windows.  Did you try
that?


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/

Re: Error SSL al ejecutar desde Windows

2021-02-13 Thread Eli Zaretskii

> From: Tim Rühsen 
> Date: Sat, 13 Feb 2021 18:23:31 +0100
> 
> Try without " or leave away the spaces between these and the URL. 
> (Possibly I am misguided but your mailers line breaks.)
> 
> wget --no-check-certificate 
> "https://www.datos.gov.co/api/views/gt2j-8ykr/rows.csv?accessType=DOWNLOAD";

I don't think this is the problem, as the image clearly shows the
program received the URL correctly.

Wget 1.17 I have here doesn't have any problems downloading that file
on Windows, FWIW.

Re: Confusing "Success" error message

2019-11-08 Thread Eli Zaretskii

> Date: Fri, 8 Nov 2019 17:29:21 +0100
> From: "Andries E. Brouwer" 
> Cc: "Andries E. Brouwer" , tim.rueh...@gmx.de,
> ftu...@fastmail.fm, bug-wget@gnu.org
> 
> Did you read the line "a function that succeeds is allowed to change errno"?

Yes, but that's against every library whose sources I've ever read.

Re: Confusing "Success" error message

2019-11-08 Thread Eli Zaretskii

> Date: Fri, 8 Nov 2019 16:47:30 +0100
> From: "Andries E. Brouwer" 
> Cc: "Andries E. Brouwer" , tim.rueh...@gmx.de,
> ftu...@fastmail.fm, bug-wget@gnu.org
> 
> On Fri, Nov 08, 2019 at 04:34:10PM +0200, Eli Zaretskii wrote:
> 
> > > Libc functions are free to call other functions internally,
> > > and such internal calls may fail where the outer level call
> > > does not fail. So even if a libc function does not return
> > > an error, errno can have changed.
> > 
> > That would be a bug in libc, I think.  Its functions should save and
> > restore errno if other functions they call error out without causing
> > the calling function to fail.
> 
> % man 3 errno
> ...
>A common mistake is to do
> 
>if (somecall() == -1) {
>printf("somecall() failed\n");
>if (errno == ...) { ... }
>}
> 
>where errno no longer needs to have the value it had upon  return  from
>somecall()  (i.e.,  it may have been changed by the printf(3)).  If the
>value of errno should be preserved across a library call,  it  must  be
>saved:
> 
>if (somecall() == -1) {
>int errsv = errno;
>printf("somecall() failed\n");
>if (errsv == ...) { ... }
>}
> 
> That was the Linux man page. Here is the POSIX man page:
> 
> ...
>The  value  in  errno  is significant only when the return value of the
>call indicated an error (i.e., -1 from most system calls;  -1  or  NULL
>from  most  library  functions); a function that succeeds is allowed to
>change errno.

Thanks, but AFAIU this says the same as I did: if a function succeeds,
it should not modify errno.

In the above example from a man page, the "may have been changed by
printf" part alludes to the possibility that printf fails in some way,
e.g. because the format is in error or stdout is closed or somesuch.

Re: Confusing "Success" error message

2019-11-08 Thread Eli Zaretskii

> Date: Fri, 8 Nov 2019 15:03:21 +0100
> From: "Andries E. Brouwer" 
> Cc: Francesco Turco , bug-wget@gnu.org,
>  "Andries E. Brouwer" 
> 
> > Libc functions only touch errno if there *is* an error
> 
> Libc functions are free to call other functions internally,
> and such internal calls may fail where the outer level call
> does not fail. So even if a libc function does not return
> an error, errno can have changed.

That would be a bug in libc, I think.  Its functions should save and
restore errno if other functions they call error out without causing
the calling function to fail.

IOW, if a libc function succeed, it should whatever it takes to
preserve errno.

Re: [Bug-wget] Problem downloading with RIGHT SINGLE QUOTATION MARK (U+2019) in filename

2019-10-11 Thread Eli Zaretskii

> From: Cameron Tacklind 
> Date: Thu, 10 Oct 2019 20:31:02 -0700
> 
> The error is pretty clearly an encoding conversion issue, going from UTF-8,
> assumed to be CP1252, converting into UTF-8, which becomes wrong.

I think you need to tell Wget that the page encoding is UTF-8, by
using the --remote-encoding switch.  Did you try that?

Re: [Bug-wget] Wget on Windows handling of wildcards

2018-06-06 Thread Eli Zaretskii

> From: Sam Habiel 
> Date: Wed, 6 Jun 2018 08:27:44 -0400
> Cc: bug-wget@gnu.org
> 
> Is there a valid argument to be made that some arguments for wget
> should not be expanded, like accept and reject?

Probably.  The problem is that wildcard expansion of the command line
doesn't understand the command, and so doesn't know what to expand and
what not.  Quoting was supposed to be the user's tool to control the
expansion, and it did work up until Windows Vista, when Microsoft in
their infinite wisdom changed the long-standing behavior of their
setargv code.

I came up with the *.[d]at and similar tricks because I frequently
need to invoke a port of Grep using the --include switch, where I have
the same problem on Vista and newer systems.

Re: [Bug-wget] Wget on Windows handling of wildcards

2018-06-05 Thread Eli Zaretskii

> From: Sam Habiel 
> Date: Tue, 5 Jun 2018 14:16:27 -0400
> 
> I have a wget command that has a -A flag that contains a wildcard.
> It's '*.DAT'. That works fine on Linux. I am trying to get the same
> thing to run on Windows, but *.DAT keeps getting expanded by wget (cmd
> does no expansion itself). There is no way that I found of suppressing
> that. I think I tried everything: single quotes, double quotes, escape
> * with ^ (cmd escape char), etc.

What version of Windows is that?

> For reference, here's the whole command:
> 
> wget -rNndp -A "*.DAT"
> "https://foia-vista.osehra.org:443/Patches_By_Application/PSN-NATIONAL
> DRUG FILE (NDF)/PPS_DATS/" -P .
> 
> Run it twice on Windows to see the problem.

Did you try using "*.[D]AT"?

The problem AFAIK is that C runtime on modern versions of Windows
expands wildcards even when quoted.  So either you need to build wget
with wildcard expansion disabled (using the appropriate global
variable whose details depend on whether you use MSVC or MinGW and
which version of MinGW), or you use the above trick (assuming that
wget can expand such wildcards).  Disabling expansions altogether is
usually not a good option in this case, since you probably need it
with other use cases.

HTH

[Bug-wget] Run-time issues with Wget2 1.99.1 built with MinGW

2018-05-12 Thread Eli Zaretskii

> From: Tim Rühsen 
> Date: Tue, 1 May 2018 15:15:26 +0200
> 
> GNU Wget2 is the successor of GNU Wget, a file and recursive website
> downloader.
> 
> Designed and written from scratch it wraps around libwget, that provides
> the basic functions needed by a web client.
> 
> Wget2 works multi-threaded and uses many features to allow fast operation.
> 
> In many cases Wget2 downloads much faster than Wget1.x due to HTTP zlib
> compression, parallel connections and use of If-Modified-Since HTTP header.

Thanks.  I've built this using mingw.org's MinGW GCC and runtime
support, and found the following run-time issues:

 . The help screen shows the command name as "wget", not "wget2".  is
   that deliberate?

 . Error message is displayed at startup about False Start, due to
   using GnuTLS 3.4.  Why not simply silently avoid using the False
   Start option by default on such systems?

 . Progress bar displays escape sequences, which are not converted to
   colors on MS-Windows.  I see there are functions in
   libwget/console.c that produce colors on Windows: would you prefer
   using them for progress bar, or would you rather have
   Windows-specific code in bar.c?

 . The tests that use libmicrohttpd all fail, and all pop up the
   Windows UAC elevation dialogue.  I don't yet know why that happens,
   and I'm looking into this issue.  To help me out, could you perhaps
   describe how libmicrohttpd is involved in wget2 tests that use it?
   What is the main idea, and how should I interpret the test logs
   (which includes quite a bit of text that doesn't really explain
   itself).

Thanks.

[Bug-wget] Build issues when building Wget2 1.99.1 with MinGW

2018-05-12 Thread Eli Zaretskii

> From: Tim Rühsen 
> Date: Tue, 1 May 2018 15:15:26 +0200
> 
> GNU Wget2 is the successor of GNU Wget, a file and recursive website
> downloader.
> 
> Designed and written from scratch it wraps around libwget, that provides
> the basic functions needed by a web client.
> 
> Wget2 works multi-threaded and uses many features to allow fast operation.
> 
> In many cases Wget2 downloads much faster than Wget1.x due to HTTP zlib
> compression, parallel connections and use of If-Modified-Since HTTP header.

Thanks.  I've built this using mingw.org's MinGW GCC and runtime
support, and found the following issues that affect the build:

 . Several issues with Gnulib headers and functions, already reported
   to Gnulib mailing list.

 . The README says libmicrohttpd is required for running the test
   suite, but it doesn't tell which optional libmicrohttpd features
   are expected/recommended for the testing.  For example, is HTTPS
   support by libmicrohttpd required?  I presume yes, because
   otherwise HTTPS cannot be tested.  Likewise for other options -- it
   would be good to know how to build libmicrohttpd for optimal
   coverage of the test suite.  (This is especially important on
   Windows, since the existing binary of libmicrohttpd distributed by
   its developers was built without dependencies, so no HTTPS support,
   for example; I needed to build my own port.)

 . The configure time test for external regexp seems to assume that no
   library needs to be added to the link command line to get that
   functionality.  In my case, I needed a -lregex added, but the only
   way to do that seems to set LIBS at configure time.

 . Compiling lib/thread.c produces a warning:

 In file included from thread.c:43:0:
 thread.c: In function 'wget_thread_self':
 ../lib/glthread/thread.h:353:5: warning: return makes integer from pointer 
without a cast [-Wint-conversion]
  gl_thread_self_func ()
  ^~
 thread.c:279:9: note: in expansion of macro 'gl_thread_self'
   return gl_thread_self();
  ^~

   This is because wget.h does this:

 typedef unsigned long wget_thread_id_t;

   which conflicts with gl_thread_t, which on Windows is a pointer to
   a structure.

   To fix this, we could either make wget_thread_id_t a wide enough
   type (unsigned long is not wide enough for 64-bit Windows, we need
   uintptr_t instead), and then use an explicit cast in
   wget_thread_self; or wget_thread_id_t should be an opaque data
   type, like gl_thread_t, but then the rest of the code shouldn't
   treat it as a simple scalar integral type.

 . Compiling programs in examples/ produces this warning from libtool:

   CCLD getstream.exe
 libtool: warning: '-no-install' is ignored for i686-pc-mingw32
 libtool: warning: assuming '-no-fast-install' instead

 . "make install" installs wget2_noinstall.exe, which is presumably a
   mistake.  It does NOT install wget2.info, perhaps because docs were
   not built (I have neither Doxygen nor Pandoc on that system).

Thanks.

Re: [Bug-wget] Patch: Make url_file_name also convert remote path to local encoded

2017-11-15 Thread Eli Zaretskii

> From: Tim Rühsen 
> Cc: lifenjoi...@163.com
> Date: Wed, 15 Nov 2017 20:28:17 +0100
> 
> I just wonder if we have the same problem on Linux console as well.
> I mean, *not* calling setlocale(LC_ALL, "") (when ENABLE_NLS is undefined) 
> would leave the program with the C locale, even if the console/environment 
> has 
> something else.

The problem is with the codeset, not with the locale itself.  Maybe
you are right, and the same problem exists on Posix hosts in some
corner cases, although I'm guessing that the widespread use of UTF-8
as the codeset should make it very rare.

> But no one complained so far... so my question:
> did you test the patch and does it work for you ?

I cannot test it: I have no access to a Windows system with a locale
where this problem raises its head.  But the change is very simple,
and it does TRT, so I think it should be applied.

Thanks.

Re: [Bug-wget] Bug?

2017-11-13 Thread Eli Zaretskii

> From: 
> Date: Mon, 13 Nov 2017 13:21:59 -0800
> 
> I am running WGET from a command line but the server folder has a pound sign
> in the name. It looks like Wget cannot parse the folder name and truncates
> it at the # sign and so the files don't get downloaded. I have the folder
> path in quotes. Is anyone aware of this problem or know of a fix?
> 
> "C:\Program Files (x86)\GnuWin32\bin\wget.exe" -m -nH -np
> "ftp://ftp.3gpp.org/tsg_cn/WG4_protocollars/TrFO_#01_stockholm/";

It works if you replace # with %23.

Re: [Bug-wget] Patch: Make url_file_name also convert remote path to local encoded

2017-11-13 Thread Eli Zaretskii

> Cc: bug-wget@gnu.org, lifenjoi...@163.com
> From: Tim Rühsen 
> Date: Mon, 13 Nov 2017 16:36:39 +0100
> 
> > I don't think it's a Gnulib issue.  The problem is that on Windows,
> > the implicit call at the beginning of Wget
> > 
> >   setlocale (LC_ALL, "C");
> 
> Why is there an explicit call with "C" ? There is an explicit call with "".

I said "implicit", not "explicit".  Such an implicit call is made at
the beginning of every C program, per ANSI C Standard.  Right?

The MSDN documentation says it clearly:

  At program startup, the equivalent of the following statement is executed:

setlocale( LC_ALL, "C" );

> From the man page:
> "If locale is an empty string, "", each part of the locale that should
> be modified is set according to the environment variables."

The call with a locale of "" is only done in a build that has
ENABLE_NLS defined.  I was talking about a build which didn't define
ENABLE_NLS.

> > is not good enough to work in multibyte locales of the Far East,
> > because the Windows runtime assumes a single-byte locale after that
> > call.  And since Wget happens to need to display text and create files
> > with non-ASCII characters, it gets hit more than other programs.
> 
> I (hopefully) can understand why this doesn't work. NTFS uses UTF-16 for
> the filenames. If your environment specifies a single-character encoding
> (e.g. C) and we use at some point a multi-character encoding (e.g.
> utf-8), then any automatic conversion to UTF-16 filenames are likely to
> fail. For me the question is: a) does wget has a bug (e.g. creating a
> filename with a wrong encoded name string or b) does the Windows API has
> a problem.
> 
> > The proposed solution is to add a special call to setlocale which gets
> > this right on Windows.
> 
> Why can't we just convert the filename string into the correct encoding
> and then create the file ? What do I miss ?

I guess you are missing a short introduction to the Windows l10n/i18n
mess.  Let me try.

First, the fact that NTFS uses UTF-16 is not really relevant.  Wget
uses 'char *' strings, not 'wchar *' strings to store file names and
call C library functions that accept file names.  So we cannot use the
UTF-16 encoding of non-ASCII file names directly.  Instead, we use the
locale's codepage (the C library and the OS APIs then convert to
UTF-16 before hitting the disk, but that's not important now).

Next, creating and opening file names is not the only problem: we need
also to display these file names and URLs, and that also needs to use
the encoding expected by the Windows console.

Now, in any locale which uses single-byte encoding of non-ASCII
characters, the C locale will support those characters, both for I/O
and for functions like strcmp, strlen, strcoll, etc.  But not in
double-byte locales of the Far East: there, you must explicitly call
setlocale with the correct codepage, to have the local character set
supported.  This support includes manipulating file names, calling C
library functions to access files, and displaying non-ASCII text, such
as file names and URLs, on the console.

IOW, this is a Windows runtime subtlety that unfortunately needs to be
fixed in the application code.

(UTF-8 is not relevant at all here, because Windows doesn't support
UTF-8 as the locale's codeset; if you try to call setlocale to set
UTF-8 as the codeset, setlocale will simply fail.  So if we have a
UTF-8 encoded URL or file name inside wget, we must convert it to the
current codepage by calling libiconv functions.)

Does the above make sense?  Let me know if I have to explain some
more.

Re: [Bug-wget] Patch: Make url_file_name also convert remote path to local encoded

2017-11-12 Thread Eli Zaretskii

> From: Tim Rühsen 
> Date: Sun, 12 Nov 2017 14:50:47 +0100
> Cc: YX Hao 
> 
> As I understand, the second patch is still in discussion with Eli. Since I do 
> not have Windows, I can't help you here. Though what I saw from the 
> discussion, you address a portability issue that likely should be solved 
> within gnulib. Maybe you could (in parallel) send a mail to 
> bug-gnu...@gnu.org 
> with a link to your discussion with Eli. There might be some people with 
> deeper knowledge.

I don't think it's a Gnulib issue.  The problem is that on Windows,
the implicit call at the beginning of Wget

  setlocale (LC_ALL, "C");

is not good enough to work in multibyte locales of the Far East,
because the Windows runtime assumes a single-byte locale after that
call.  And since Wget happens to need to display text and create files
with non-ASCII characters, it gets hit more than other programs.

The proposed solution is to add a special call to setlocale which gets
this right on Windows.

Re: [Bug-wget] Patch: Fix printing mutibyte characters as unprintable characters on Windows

2017-11-11 Thread Eli Zaretskii

> From: "YX Hao" 
> Cc: 
> Date: Sun, 5 Nov 2017 23:01:22 +0800
> 
> And I can tell you that 'GetConsoleOutputCP' returns the codepage as command
> 'chcp'. It is right. The gnu 'vsnprintf' doesn't work right with 'setlocale'
> omitted.

I guess this means wget needs to call 'setlocale' with the right
codepage even when NLS is not enabled, because the naïve belief that
the default C locale will show n on-ASCII characters correctly is
false on Windows, especially in multibyte locales.  The MSDN
documentation of 'setlocale' confirms that by saying:

  The C locale assumes that all char data types are 1 byte and that
  their value is always less than 256.

Re: [Bug-wget] Patch: Fix printing mutibyte characters as unprintable characters on Windows

2017-11-05 Thread Eli Zaretskii

> From: "YX Hao" 
> Cc: 
> Date: Sun, 5 Nov 2017 23:01:22 +0800
> 
> >> '_getmbcp' is used
> > Maybe the problem is that the codepage used for the console output is
> > different from the system's ANSI codepage?  What does GetConsoleOutputCP
> > return in the case you describe?
> >
> > What happens if ENABLE_NLS is defined?  Your patch only handles the
> > situation where ENABLE_NLS is NOT defined.
> 
> Yes, my patch only handles the situation I meet with, by using necessary
> predefined conditions. I leave the others untouched, because I don't have
> the needed libraries.
> I hope others who has the environments can test it and turn on the switches
> when necessary.
> 
> And I can tell you that 'GetConsoleOutputCP' returns the codepage as command
> 'chcp'. It is right. The gnu 'vsnprintf' doesn't work right with 'setlocale'
> omitted.

Sorry, I don't follow.  Does GetConsoleOutputCP return the same value
as _getmbcp, or does it return a different value?

Re: [Bug-wget] Patch: Fix printing mutibyte characters as unprintable characters on Windows

2017-11-04 Thread Eli Zaretskii

> From: "YX Hao" 
> Cc: "'Eli Zaretskii'" 
> Date: Fri, 3 Nov 2017 20:14:02 +0800
> 
> Second, as my test, 'setlocale' is needed for the gnu printf related
> functions to work correctly on mutibyte characters. You can see that as
> attached screensots:
> setlocale_936.png
> setlocale_empty.png
> setlocale_omitted-OCP-0.png, the 3 results are the same

What do you mean by "gnu printf related functions"?  If this is a
build that doesn't define ENABLE_NLS, then wget outputs the original
text using the MS runtime versions of printf.  And in a build that
does define ENABLE_NLS, the text is additionally processed by the GNU
gettext library.  So is the problem with the build which defines
ENABLE_NLS or the build that didn't define ENABLE_NLS?  Or is it with
both?
> > Your change calls setlocale with a different value, does that even when
> 
> One tricky situation: one PC is all set to United States, except the
> multibyte code page is 936, for example.
> So, '_getmbcp' is used.

Maybe the problem is that the codepage used for the console output is
different from the system's ANSI codepage?  What does
GetConsoleOutputCP return in the case you describe?

> static void
> i18n_initialize (void)
> {
> +#if defined(WINDOWS) && !defined(ENABLE_NLS)
> +  char MBCP[8] = "";
> +  int CP;
> +#endif
> +
>   /* ENABLE_NLS implies existence of functions invoked here.  */
> #ifdef ENABLE_NLS
>   /* Set the current locale.  */
>   setlocale (LC_ALL, "");
>   /* Set the text message domain.  */
>   bindtextdomain ("wget", LOCALEDIR);
>   textdomain ("wget");
> #endif /* ENABLE_NLS */
> +
> +#if defined(WINDOWS) && !defined(ENABLE_NLS)
> +  CP = _getmbcp(); /* Consider it's different from default. */
> +  if (CP > 0)
> +sprintf(MBCP, ".%d", CP);
> +  setlocale(LC_ALL, MBCP);
> +#endif }

What happens if ENABLE_NLS is defined?  Your patch only handles the
situation where ENABLE_NLS is NOT defined.

Re: [Bug-wget] Patch: Fix printing mutibyte characters as unprintable characters on Windows

2017-11-02 Thread Eli Zaretskii

> From: "YX Hao" 
> Date: Thu, 2 Nov 2017 21:09:31 +0800
> 
> During my daily use, I've found a few small bugs and made the patches.
> I will email them in standalone topics. Patch is attached.
> 
> I made the patch on Windows. I think it shouldn't break anything on other
> platforms. Please take a review :)

Thanks.

I'm not Tim, but I have a few questions about your patches.

> 1. setlocale

Can you explain why you needed this?  wget already calls setlocale:

  static void
  i18n_initialize (void)
  {
/* ENABLE_NLS implies existence of functions invoked here.  */
  #ifdef ENABLE_NLS
/* Set the current locale.  */
setlocale (LC_ALL, "");
/* Set the text message domain.  */
bindtextdomain ("wget", LOCALEDIR);
textdomain ("wget");
  #endif /* ENABLE_NLS */
  }

Your change calls setlocale with a different value, does that even
when ENABLE_NLS is not defined, and also runs the risk of using a
wrong codepage, if _getmbcp returns zero (as MSDN says it could).  Why
is that needed?

> +#ifdef WINDOWS
> +  CP = _getmbcp(); /* Consider it's different from default. */

Why would it be different from default, and if it is, why doesn't the
call to setlocale shown above do its job?

Re: [Bug-wget] Report of potential bug in wget-1.19.1-win64

2017-08-08 Thread Eli Zaretskii

> Date: Tue, 8 Aug 2017 16:35:50 +0200
> From: Darshit Shah 
> Cc: bug-wget@gnu.org
> 
> It is an interesting bug when it works on a 32-bit build but not a 
> 64-bit build. I tested all three URLs on my local Linux machine with 
> both a 32-bit and 64-bit build of Wget 1.19 and found no issues.

The 64-bit model used on Windows is different from what's on Posix
platforms, as you probably know very well: Windows is LLP64.  So some
bugs might only be visible on Windows.

> Sadly, I do not have a Windows machine to test this on. The debug logs 
> (with the -d switch) may be useful. Or someone else that understands 
> Windows better could try and help.

One thing to do is look in the Windows Event Log for any messages
related to the crash.  They might show some details about where the
crash happened.

Re: [Bug-wget] Wget keeps crashing on me

2017-05-14 Thread Eli Zaretskii

> From: William Higgs 
> Cc: ,
>   'Jernej Simončič' 
> Date: Sun, 14 May 2017 21:17:02 -0400
> 
> Hey guys.  So while I was doing some research, I found the following post
> located at
> https://stackoverflow.com/questions/35004832/wget-exe-for-windows-10/3796296
> 5#37962965
> :
> "eternallybored build will crash when you are downloading a large file.
> This can be avoided by disabling LFH (Low Fragmentation Heap) by GlobalFlag
> registry."

Makes absolutely no sense to me.  LFH is the default heap allocation
strategy on MS-Windows since Vista; disabling it is only justified
when running a program under a debugger.  Disabling LFH globally for
your entire system means you risk running out of heap memory in some
memory-intensive applications, utterly unrelated to wget.

If that particular build of wget crashes when LFH is in use, it most
probably means a subtle memory-allocation bug, which is simply swept
under the carpet by changing the algorithm for heap allocation.  So I
would suggest to simply switch to a different build of wget, instead
of compromising your entire system.

> However, after looking into how to do this, I cannot find an explanation as
> to how to do this.  Can someone please provide some assistance?

  
https://support.microsoft.com/en-us/help/929136/why-the-low-fragmentation-heap-lfh-mechanism-may-be-disabled-on-some-computers-that-are-running-windows-server-2003,-windows-xp,-or-windows-2000

But I'm not sure this will work on Windows 10, and I urge you not to
do this in the first place.

Re: [Bug-wget] Wget keeps crashing on me

2017-05-14 Thread Eli Zaretskii

> From: Tim Rühsen 
> Cc: William Higgs , 'Darshit Shah' 
> , Eli Zaretskii 
> Date: Sun, 14 May 2017 20:17:29 +0200
> 
> @Eli Mybe you could test latest master without a debugger as well !?

You mean Jernej, not Eli, right?

Re: [Bug-wget] Wget keeps crashing on me

2017-05-14 Thread Eli Zaretskii

> From: Darshit Shah 
> Date: Sun, 14 May 2017 19:17:40 +0200
> Cc: Bug-Wget 
> 
> I can't reproduce the issue on my Linux system, and as Eli says he can't
> reproduce it with his binaries either. Of course, he has linked to a much
> older version of Wget, but if I understand right, he even has newer builds
> available.

Actually, no, 1.16.1 is the last one I have built.  I could try
building 1.19 if we suspect it has some Windows specific problems.

Anyway, I see 2 important differences between my build and the one
William uses (apart of the version):

  . my build is 32-bit, William uses a 64-bit build
  . my build uses GnuTLS, William's uses openssl

Not sure any of these explain the crashes, but I thought I'd mention
them.

Re: [Bug-wget] Wget keeps crashing on me

2017-05-14 Thread Eli Zaretskii

> Date: Sun, 14 May 2017 19:30:04 +0200
> From: Jernej Simončič 
> 
> The wget binary I have on my site
> (https://eternallybored.org/misc/wget/) sometimes crashes when
> finishing download, but I haven't been able to reproduce the crash
> under a debugger (and I haven't had it crash since I installed DrMingw
> as JIT debugger either), so I'm not sure where the problem is.

Is that binary also of wget 1.19.1?  If so, perhaps there's a bug in
that particular version.

Also, is your build with libssl or libgnutls?  Does it support ipv6?

Re: [Bug-wget] Wget keeps crashing on me

2017-05-14 Thread Eli Zaretskii

> From: William Higgs 
> Date: Sun, 14 May 2017 12:13:58 -0400
> 
> So just to be clear, you want me to use an older release of wget?

No, I'm just saying that the version I built worked without crashing.
You may wish to try it; if it works on your system, it might mean the
problem is not with the OS, but with the wget build you have.

Re: [Bug-wget] Wget keeps crashing on me

2017-05-14 Thread Eli Zaretskii

> From: William Higgs 
> Date: Sun, 14 May 2017 10:27:12 -0400
> 
> And I saw that you had stated that it was working on Windows 7, which
> further convinces me that it is probably a windows 10 thing.  I originally
> thought this was the case because, while the faulting application is wget,
> the faulting module (module I assume to mean what actually caused the crash
> in the application), is ntdll.dll, which is a core system dll.  But sfc
> scans return no issues..

I'm not sure this is a Windows problem.  Crashes inside system DLLs
more often than not are caused by bugs in the applications.

Re: [Bug-wget] Wget keeps crashing on me

2017-05-14 Thread Eli Zaretskii

> From: William Higgs 
> Date: Sun, 14 May 2017 10:23:35 -0400
> 
> The txt file contains the output from the command "wget --version".

Maybe I'm missing something, but I don't see that.  All I see is this:

  Description   : Faulting application name: wget.exe, version: 0.0.0.0, 
time stamp: 0x003cc610

which doesn't show the wget version.

I tried with wget 1.16.1, which I built myself.  You can find its
binaries here:

  
https://sourceforge.net/projects/ezwinports/files/wget-1.16.1-w32-bin.zip/download

> As for the second question, I obtained the binaries by utilizing
> chocolatey's package management system.  If you are more familiar
> with Linux, you can think of chocolatey as the official unofficial
> (while not directly supported by Microsoft, Microsoft has
> incorporated its use into Powershell 5's package management cmdlets)
> "apt-get" for Windows (https://chocolatey.org/).

Thanks for the info.

P.S. Please keep the list address on the CC.

Re: [Bug-wget] Wget keeps crashing on me

2017-05-14 Thread Eli Zaretskii

> From: William Higgs 
> Date: Sat, 13 May 2017 17:37:13 -0400
> 
> So this may have nothing to do with wget (probably very likely, as Windows
> 10 creators update continues to be a very large thorn in my side), but wget
> keeps crashing when I run the attached bat file (converted to txt).  I saved
> the event logs associated with the crash, but again, it looks more like an
> os issue than wget.  Still, wanted to get your opinion on the matter.  Also,
> thanks for the awesome, quality, free software.

FWIW, this works for me, on Windows 7.

What version of wget did you use, and where did you get the binaries?

Re: [Bug-wget] GSoC Project | Design and Implementation of a Framework for Plugins

2017-03-20 Thread Eli Zaretskii

> Date: Tue, 21 Mar 2017 02:29:20 +0700
> From: Didik Setiawan 
> 
> > One way to implement plugins is via libdl (dlopen(), ...), and that is what 
> > I 
> > have in mind. That is not perfectly portable, but our first goal will be 
> > systems that support dlopen().
> 
> So, should I continue to use dlopen() or there is another better method? In 
> case 
> we need more portability.

I'd suggest to use libltdl (part of libtool), which will make these
features more portable.

Note that the GNU project's practice for plugins is to require that
any compatible plugin exports a symbol named plugin_is_GPL_compatible,
to signal that it's released under GPL or a compatible license.  I
think any framework should have the verification of this as its part.

Thanks.

Re: [Bug-wget] Vulnerability Report - CRLF Injection in Wget Host Part

2017-03-06 Thread Eli Zaretskii

> From: Tim Ruehsen 
> Date: Mon, 06 Mar 2017 10:17:25 +0100
> Cc: Orange Tsai 
> 
> Thanks, just pushed a commit, not allowing control chars in host part.

Hmm... is it really enough to reject only ASCII control characters?
Maybe we should also reject control characters from other Unicode
ranges?  Just a thought.

Re: [Bug-wget] Fwd: PATCH: bugs 20369 and 20389

2017-03-04 Thread Eli Zaretskii

> From: Vijo Cherian 
> Date: Fri, 3 Mar 2017 11:33:05 -0800
> Cc: bug-wget@gnu.org
> 
> bool
> file_exists_p (const char *filename, file_stats_t *fstats)
> {
>   struct stat buf;
> 
> #if defined(WINDOWS) || defined(__VMS)
> return stat (filename, &buf) >= 0;
> #else

This leaves fstats untouched on Windows.  At least access_err should
be set, I think.

Re: [Bug-wget] Fwd: PATCH: bugs 20369 and 20389

2017-03-03 Thread Eli Zaretskii

> From: Vijo Cherian 
> Date: Thu, 2 Mar 2017 18:47:11 -0800
> 
> Changes
>   - Bug #20369 - Safeguards against TOCTTOU
> Added safe_fopen() and safe_open() that checks to makes sure the file
> didn't change underneath us.
>   - Bug #20389 - Return error from file_exists_p()
> Added a way to return error from this file without major surgery to
> the callers.

Allow me a few comments to your patch.

> +  errno = 0;
> +  if (stat (filename, &buf) >= 0 && S_ISREG(buf.st_mode) &&

'stat' is documented to return 0 upon success, so I don't think a
positive return value should be considered a success.

> +  (((S_IRUSR & buf.st_mode) && (getuid() == buf.st_uid))  ||
> +   ((S_IRGRP & buf.st_mode) && group_member(buf.st_gid))  ||
> +(S_IROTH & buf.st_mode))) {

These tests assume Posix semantics, and will be too restrictive on
MS-Windows, for example.

> +if (fstats != NULL) {
> +  logprintf (LOG_VERBOSE, _("File %s exists, but NULL for fstats\n"), 
> filename);

The log message says fstats is NULL, but it isn't.

> +  fstats->access_err = 0;
> +  fstats->st_ino = buf.st_ino;
> +  fstats->st_dev = buf.st_dev;
> +}
> +logprintf (LOG_VERBOSE, _("%s exists!!\n"), filename);
> +return true;
> +  } else {
> +if (fstats != NULL) {
> +  fstats->access_err = (errno == 0 ? EACCES : errno);
> +  logprintf (LOG_VERBOSE, _("File %s is not accessible\n"), filename);
> +}
> +logprintf (LOG_VERBOSE, _("File %s doesn't exist\n"), filename);
> +errno = 0;
> +return false;

Do we really need such detailed log messages for such a trivial check?

Also, the name of the function and its commentary seem to no longer
describe what it actually does.  The commentary should also describe
the return value.

> +/* Safe_fopen assumes that file_exists_p() was called earlier. 

The name of the function doesn't describe what it does.

Also, instead of "assumes that file_exists_p() was called earlier",
I'd suggest to state that the FSTATS argument should be available,
e.g. by calling file_exists_p.

> +  if (fstats != NULL && 
> +  (fdstats.st_dev != fstats->st_dev ||
> +   fdstats.st_ino != fstats->st_ino)) {

These are Posix assumptions; on Windows you will get meaningless
results from such a test.  I suggest to have a function for this, with
different implementations on Posix and non-Posix platforms.

Same comments for safe_open.

Thanks for working on this.

Re: [Bug-wget] patch: Improve the rolling file name length for downloading progress image when without NLS

2017-02-17 Thread Eli Zaretskii

> Date: Fri, 17 Feb 2017 14:44:02 +0100
> From: "Andries E. Brouwer" 
> Cc: "Andries E. Brouwer" , tim.rueh...@gmx.de,
> bug-wget@gnu.org, lifenjoi...@163.com
> 
> On Fri, Feb 17, 2017 at 03:38:40PM +0200, Eli Zaretskii wrote:
> 
> > Fonts indeed can affect the visual width, but if we assume that the
> > terminal font is a fixed-pitch one, that problem is much less
> > significant, IME.
> 
> Experience shows that one may get a single-width replacement symbol
> when the actual double width symbol is not available.

That happens, yes.  But again, it's unrelated to the encoding of the
writes to the terminal.  It only depends on the terminal fonts used.

Re: [Bug-wget] patch: Improve the rolling file name length for downloading progress image when without NLS

2017-02-17 Thread Eli Zaretskii

> Date: Fri, 17 Feb 2017 13:41:38 +0100
> From: "Andries E. Brouwer" 
> Cc: bug-wget@gnu.org, "Andries E. Brouwer" ,
>   YX Hao 
> 
> So the unicode part is straightforward. Other encodings are messy.
> For SJIS and similar multi-byte character sets one can have
> (per character set) descriptions of which codes are single-byte
> and which double-byte, and how wide the resulting symbol is.
> For ISO 2022 type encoding, with embedded escape sequences,
> life is more difficult.

I'm probably missing something here, because I don't understand how
the encoding of what wget writes to the terminal can affect the width
of the characters displayed by that terminal.  That width is the
property of the character itself, and doesn't depend on how you tell
the terminal to display the character.

Fonts indeed can affect the visual width, but if we assume that the
terminal font is a fixed-pitch one, that problem is much less
significant, IME.

Re: [Bug-wget] Tilde issue with recursive download when IRI is enabled and a page uses Shift JIS

2017-02-17 Thread Eli Zaretskii

> From: William Prescott 
> Date: Fri, 17 Feb 2017 04:31:18 -0500
> 
> I would imagine that it could use the value that "--local-encoding"
> gets. I used UTF-8 here as an example, since that is what my terminal
> is set to.

So the example you show had --local-encoding=utf-8 specified on the
wget command line?  Then it indeed sounds like a bug.

Re: [Bug-wget] patch: Improve the rolling file name length for downloading progress image when without NLS

2017-02-17 Thread Eli Zaretskii

> From: Tim Ruehsen 
> Date: Fri, 17 Feb 2017 09:48:23 +0100
> Cc: "Andries E. Brouwer" , YX Hao 
> 
> Calculating the number of displayed columns from the number of bytes of a 
> string is non-trivial. It is trivial only for charsets/locales where each 
> byte 
> (or codepoint) will take exactly one column on the display.
> 
> With unicode you have to *at least* compose the string first (NFC I guess), 
> and 
> then count the codepoints. But I am not sure about exceptions.
> 
> @Andries Do you know an algorithm how to calculate the columns from a given 
> string + encoding ?

I'm not Andries, but AFAIK there's a file in the Unicode Character
Database (UCD) called EastAsianWidth.txt which provides the width
information.

There's also this (which is a derivative of the UCD data):

  https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

Re: [Bug-wget] Tilde issue with recursive download when IRI is enabled and a page uses Shift JIS

2017-02-17 Thread Eli Zaretskii

> From: William Prescott 
> Date: Fri, 17 Feb 2017 03:34:20 -0500
> 
> > I would also like to note that, even when the the document's links don't
> > contain a tilde, Wget will still fail to fetch the pages as long as there
> > is a tilde in the URL the Wget was called with.
> 
> Let's consider the (UTF-8) URL "http://example.com/~foo/bar.html";
> bar.html is Shift_JIS encoded and contains:
> 
> Baz
> 
> (this time, bar.html is perfectly valid Shift_JIS and doesn't have a tilde)
> 
> A recursive download will fail, because the relative URL appears to get
> processed as
> sjis_to_utf8(utf8_to_sjis("http://example.com/~foo/";) + sjis("baz.html"))
> resulting in
> http://example.com/‾foo/baz.html
> 
> I would have expected
> utf8("http://example.com/~foo/";) + sjis_to_utf8("baz.html")
> resulting in
> http://example.com/~foo/baz.html

How should wget know that "http://example.com/~foo/bar.html"; comes
from a UTF-8 encoding?  Where should that piece of information come
from?

Re: [Bug-wget] patch: Stored file name coversion logic correction

2017-02-15 Thread Eli Zaretskii

> Date: Thu, 16 Feb 2017 12:42:23 +0800 (CST)
> From: "YX Hao" 
> 
> I downloaded the 'mbox format' original, and found out the reason why you 
> can't reproduce the issue.
> The non-ASCII characters you use is encoded in "iso-8859-1" in your email, 
> and should be displayed correctly in your environment.
> So, your encoding is compatible with 'UTF8', which is the remote server's 
> default encoding. That won't cause iconv error :)
> Think about 'UFT8' incompatible encoding envrionments ...

Maybe I misunderstand, but ISO-8859-1 (a.k.a. "Latin-1") is NOT
compatible with UTF-8.  Trying to decode Latin-1 text as UTF-8 will
get you errors from the conversion routines, because Latin-1 byte
sequences are generally not valid UTF-8 sequences.

Re: [Bug-wget] [PATCH] utils: rename base64_{encode,decode}

2016-12-15 Thread Eli Zaretskii

> From: Rahul Bedarkar 
> Date: Thu, 15 Dec 2016 12:57:16 +0530
> 
> In case of static build, all symbols are visible. Since GnuTLS is static 
> library, which is just archive of object files, linking happens at 
> caller end i.e. wget, linker don't know what to (un)export. That's why 
> we see definition clash in static builds. Please correct me if I'm 
> missing something.

That's not so: static linking will only pull from a static library
symbols that are not already resolved by earlier object files the
linker processed.  In this case, since that symbol should have been
satisfied by wget's own function, the linker had no reason to use the
one in the library.  Unless, that is, you submitted the explicit
library file name to the linker command line, instead of using -lgnutls.

Re: [Bug-wget] Query about correcting for DST with Wget

2016-11-15 Thread Eli Zaretskii

> From: Tim Ruehsen 
> Date: Tue, 15 Nov 2016 10:41:40 +0100
> 
> > If we care about this, we could have a private implementation of
> > utimes in mswindows.c, which would DTRT in the use cases needed by
> > Wget.
> 
> Wget uses utime() if available. This function is not covered by gnulib and it 
> is obsoleted by POSIX 2008.
> 
> Instead we should use utimens which is covered by gnulib and circumvents 
> several issues. Currently, I can see no special Windows code in gnulib - but 
> if the issue persists, it should IMO be fixed in gnulib.
> 
> WDYT ?

I won't hold my breath for this to solve the issue.  At best, gnulib
will probably call utimes in its utimens implementation, which will
reset us back to square one.  At worse, they will ask us to provide
the missing implementation for MS-Windows.

This issue is not solvable without calling win32 APIs directly,
because the MS C runtime function all behave consistently -- and
wrongly -- in this case.

Re: [Bug-wget] Query about correcting for DST with Wget

2016-11-14 Thread Eli Zaretskii

> Date: Sun, 13 Nov 2016 21:39:32 +0100
> From: Jernej Simončič 
> 
> On Sunday, November 13, 2016, 19:53:06, Eli Zaretskii wrote:
> 
> > Does "while DST is in effect" mean that you download the file when DST
> > is in effect, or you examine the timestamp of the file when the DST is
> > in effect?
> 
> I download the file when DST is(n't) in effect (I download that
> specific URL quite often, on different computers).

Then yes, it's a known problem with how the MS-Windows implementation
of the utimes function works: it converts (internally) from local time
to UTC using the current setting of the DST flag, not its setting at
the time being converted.  The irony of this is that there should be
no need to go through local time in this case, because the timestamp
provided by Wget is in UTC to begin with, and the low-level Windows
APIs that timestamp files accept UTC values.

If we care about this, we could have a private implementation of
utimes in mswindows.c, which would DTRT in the use cases needed by
Wget.

> I just remembered that there may be a 3rd explanation: some msvcrt
> functions return different timestamps depending on whether DST is in
> effect or not - at least with GIMP you can observe that it'll rescan
> all fonts the first time it's run after DST change (I'm not sure if
> this applies only to msvcrt.dll [which MinGW uses by default], or also
> to the runtimes shipped with newer Visual Studio versions).

You are talking about 'stat', I believe.  Yes, they, too, have a
similar bug, but 'stat' is not involved in the code in Wget that sets
the timestamps of downloaded files.

Re: [Bug-wget] Query about correcting for DST with Wget

2016-11-13 Thread Eli Zaretskii

> Date: Sun, 13 Nov 2016 19:10:57 +0100
> From: Jernej Simončič 
> 
> I'm not sure if this is a problem with wget, Windows or the server
> hosting the file, but I observed this happening with
>  - while DST is in effect,
> the file gets timestamp of 22:19, and when it's not it's 23:19 (I'm in
> the CET timezone).

Does "while DST is in effect" mean that you download the file when DST
is in effect, or you examine the timestamp of the file when the DST is
in effect?

Also, how do you display the timestamp of the file? with what program?

Re: [Bug-wget] Query about correcting for DST with Wget

2016-11-10 Thread Eli Zaretskii

> From: "Tim" 
> Date: Thu, 10 Nov 2016 19:26:45 -
> 
> I would be very grateful for any help with an issue I am having with 
> downloading files from a website using Version 1.11.4.3287 of Wget on a 
> Windows XP computer.

That's a very old version of Wget.

> When I use Wget to download a file from a website, the timestamp is out by an 
> hour. I think this is because of Daylight Saving Time. Do any of you know how 
> I can correct this?

Can you tell more details, like the exact URL you downloaded and how
you see the 1-hour difference?  I'd like to try to reproduce this
here.

In general, Windows XP has a database of DST offsets and should use it
to avoid such problems, but maybe Wget doesn't DTRT in this matter,
somehow, at least in your case.

Re: [Bug-wget] Fwd: Re: [PATCH v3] bug #45790: wget prints it's progress even when background

2016-10-19 Thread Eli Zaretskii

> From: "Wajda, Piotr" 
> Date: Wed, 19 Oct 2016 12:18:13 +0200
>
> For CTRL+Break we could probably go to background on windows by forking
> process using current fake_fork method. Child process should be then
> started with -c and -b.

Could be, although it'd be a strange thing to do for Ctrl+BREAK,
IMO, because it's akin to Unix SIGQUIT.

Re: [Bug-wget] [PATCH v3] bug #45790: wget prints it's progress even when background

2016-10-19 Thread Eli Zaretskii

> From: "Wajda, Piotr" 
> Date: Wed, 19 Oct 2016 11:57:06 +0200
> 
> My only confusion was that during testing on windows, when sending 
> CTRL+C or CTRL+Break it immediately terminates, which is basically what 
> I think it should do for CTRL+C. Not sure about CTRL+Break.

What else is reasonable for CTRL+Break?  We can arrange for them to
produce different effects, if there are two alternative behaviors that
would make sense.

Thanks.

Re: [Bug-wget] strerror() on Win32

2016-10-13 Thread Eli Zaretskii

> From: Gisle Vanem 
> Date: Thu, 13 Oct 2016 22:42:03 +0200
> 
> I think I've mentioned earlier; the troubles with strerror()
> returning "Unknown error" for seemingly common 'errno' values.
> 
> I hit me today, when connection to my ftp-hosting service. From
> the Wsock-trace [1] of connect():
> 
>   * 49.163 sec: f:/MingW32/src/gnu/gnulib/lib/connect.c(43) (rpl_connect+64):
> connect (620, 46.30.213.77:21, fam AF_INET) --> WSAETIMEDOUT (10060).
> 
> failed: Unknown error.
> 
> I put some trace-code in Wget's connect.c and do see 'errno' is 138.
> Which is ETIMEDOUT as defined by Gnulib's . But I fail to
> understand why Gnulib's strerror(138) is incapable of handling it.
> 
> Looking at Gnulib's strerror-override.c, I see it should return
> "Connection timed out" there. But it doesn't. Any pointers?

Didn't we already have a similar discussion?  I think you told about
some connect attempt that times out, but the error message doesn't
mention timeout?  And I tried that with my MinGW-compiled Wget, and
couldn't reproduce the problem, because my Wget did report "Connection
timed out"?

My guess is that for some reason Wget calls the MS-Windows strerror,
not its Gnulib replacement.  But that's a guess, and I don't know how
to explain it.  Perhaps put a breakpoint both at the Gnulib strerror
and the MS runtime one, and see what happens in your scenario.

Failing that, if you can show a recipe for reproducing this, including
a URL to use, I could see what happens on my system, and maybe we will
see the light.

Re: [Bug-wget] [PATCH v3] bug #45790: wget prints it's progress even when background

2016-10-07 Thread Eli Zaretskii

> From: losgrandes 
> Date: Thu,  6 Oct 2016 09:47:01 +0200
> 
> Fortunately I tested wget.exe in normal mode and background mode (-b). Was ok.
> Unfortunately I haven't tested wget.exe with CTRL+Break/CTRL+C (is it really 
> works on windows?).

Yes, CTRL-C/CTRL-BREAK should work on Windows.  What didn't work in
your case?

> 1. Could you be so kind and test my wget.exe with CTRL+Break?

Send your test instructions, and I will try to build and test it here.

> 2. Advise me with error I get while compiling:
>   main.o:main.c:(.text+0x579): undefined reference to `pipe'

The Windows runtime doesn't have 'pipe', it has '_pipe' (with a
slightly different argument list).  I believe we need the Gnulib pipe
module to get this to compile.  However, just as a quick hack, replace
the call to 'pipe' with a corresponding call to '_pipe', you can find
its documentation here:

  https://msdn.microsoft.com/en-us/library/edze9h7e.aspx

(This problem is unrelated to your changes, the call to 'pipe' is
already in the repository.)

>   url.o:url.c:(.text+0x1e78): undefined reference to `libiconv_open'
>   url.o:url.c:(.text+0x1f25): undefined reference to `libiconv'
>   url.o:url.c:(.text+0x1f57): undefined reference to `libiconv'
>   url.o:url.c:(.text+0x1f7e): undefined reference to `libiconv_close'
>   url.o:url.c:(.text+0x20ee): undefined reference to `libiconv_close'
>   collect2: error: ld returned 1 exit status
> 
> This was generated by:
> ./configure --host=i686-w64-mingw32 --without-ssl --without-libidn 
> --without-metalink --with-gpgme-prefix=/dev/null 
> CFLAGS=-I$BUILDDIR/tmp/include LDFLAGS=-L$BUILDDIR/tmp/lib 
> --with-libiconv-prefix=$BUILDDIR/tmp CFLAGS=-liconv

I think you need to add -liconv to LIBS, not to CFLAGS.  GNU ld is a
one-pass linker, so it needs to see -liconv _after_ all the object
files, not before.

Re: [Bug-wget] [PATCH v2] bug #45790: wget prints it's progress even when background

2016-10-02 Thread Eli Zaretskii

> Cc: Eli Zaretskii 
> From: "pwa...@gmail.net.pl" 
> Date: Sun, 2 Oct 2016 21:54:58 +0200
> 
> Is there a instruction on how to compile current wget version on windows?

Nothing special, just "./configure && make", as you'd do on a Posix
system.

The trick is to have a development environment that supports the
above.  I use MSYS and MinGW from mingw.org:

  https://sourceforge.net/projects/mingw/files/

If you don't have the dependency libraries, you need to install them
first.  They are also built as above, but you can find precompiled
32-bit binaries and the corresponding header files here:

  https://sourceforge.net/projects/ezwinports/files

If you want to build a 64-bit version of Wget, I suggest to get the
dependencies from the MSYS2 project, starting with this page's
instructions:

  https://msys2.github.io/

The MSYS2/MinGW64 development environment can be installed from that
page as well (if you don't have it).

Re: [Bug-wget] wget for windows - current build?

2016-10-02 Thread Eli Zaretskii

> From: Tim Rühsen 
> Cc: ge...@mweb.co.za
> Date: Sat, 01 Oct 2016 20:12:28 +0200
> 
> If you like to create a README.windows maybe with (basic) explanations on how 
> to build wget on Windows plus pointers to your port(s), we include it into 
> the 
> project.

That's okay (will do when I have time), but I think it would be more
useful to have a link on the Wget Web page to the places where Windows
binaries can be found.

> BTW, meanwhile libidn fixed several security issues, as well as gnutls, 
> libpsl 
> and wget itself ;-)

Duly noted.

Re: [Bug-wget] wget for windows - current build?

2016-10-01 Thread Eli Zaretskii

> From: Tim Rühsen 
> Cc: ge...@mweb.co.za
> Date: Sat, 01 Oct 2016 18:10:26 +0200
> 
> > > It shouldn't be too hard to write a script that cross-compiles wget and
> > > some dependencies via mingw. But would such an .exe really work on a real
> > > Windows machine ?
> > 
> > I'm not sure I understand the question.  If cross-compiling works,
> > then why won't the result run as expected?
> 
> Well, some years ago I copied cross compiled executables (32bit) onto a WinXP 
> machine. Executing these didn't error, but they immediately returned without 
> doing anything. Even the first printf() line didn't do anything.
> While executing the same executables with wine on the machine that I used for 
> compilation, they worked fine.

Sounds like some incompatibility between the import libraries you had
in that cross-environment and the corresponding DLLs on the target
Windows XP machine.

> While it seems pretty easy to generate a wget.exe on Linux and even run it 
> through wine, it seems not to work out that easily on a real Windows. At 
> least 
> these questions for a recent Windows executable are pretty common - and the 
> Windows affine users here do not have a easy solution as it seems.

Building Wget on Windows is easy if you have an operational
development environment.  What's not easy is running the test suite
and figuring out what each failure means, then fixing the sources as
needed.

Anyway, in addition to my site, which offers a 32-bit build, the MSYS2
project offers both 32-bit and 64-bit builds (although I cannot vouch
for their thoroughness in running the test suite -- not that I know
they didn't, mind you).  People who ask about that must be doing that
out of ignorance; perhaps we should include pointers to those places
in the distribution.

Re: [Bug-wget] wget for windows - current build?

2016-10-01 Thread Eli Zaretskii

> From: Tim Rühsen 
> Cc: "ge...@mweb.co.za" 
> Date: Sat, 01 Oct 2016 13:04:25 +0200
> 
> It shouldn't be too hard to write a script that cross-compiles wget and some 
> dependencies via mingw. But would such an .exe really work on a real Windows 
> machine ?

I'm not sure I understand the question.  If cross-compiling works,
then why won't the result run as expected?

Re: [Bug-wget] wget for windows - current build?

2016-09-30 Thread Eli Zaretskii

> Date: Fri, 30 Sep 2016 16:52:55 +0200 (SAST)
> From: "ge...@mweb.co.za" 
> 
> So, is there a "secret" new place hosting a newer version for Windows? Or is 
> the 1.11 on sourceforge actually okay? And - while I am already asking all 
> these stupid questions - would that version actually handle larger file sizes 
> already? 

You can find a 32-bit Windows port of 1.16.1 here:

  https://sourceforge.net/projects/ezwinports/files/?source=navbar

Re: [Bug-wget] [PATCH v2] bug #45790: wget prints it's progress even when background

2016-09-30 Thread Eli Zaretskii

> From: Piotr Wajda 
> Date: Fri, 30 Sep 2016 09:51:37 +0200
> 
> Hi, Reworked recent patch to behave correctly on fg and bg. Now user can 
> switch from fg to bg and vice versa and wget will select fd accordingly.

Thanks.

> +  /* Initialize this values so we don't have to ask every time we print line 
> */
> +  shell_is_interactive = isatty (STDIN_FILENO);

The MS-Windows version of isatty returns non-zero when its argument
file descriptor is open on any character device.  Notably, this
includes the null device, which is definitely not what we want in this
case, I think.

So I think using this logic will need to import isatty from Gnulib, or
provide an alternative implementation in mswindows.c.

>  static void
>  check_redirect_output (void)
>  {
> -  if (redirect_request == RR_REQUESTED)
> +  /* If it was redirected already to log file by SIGHUP or SIGUSR1, it was 
> permanent */
> +  if(!redirect_request_signal_name && shell_is_interactive)
>  {
> -  redirect_request = RR_DONE;
> -  redirect_output ();
> +  if(tcgetpgrp(STDIN_FILENO) != getpgrp()) 

Neither tcgetpgrp nor getpgrp exist on MS-Windows.

AFAIU, this test is intended to check whether wget was backgrounded.
Since AFAIK that's not possible on MS-Windows, this test should always
return zero on Windows, so I suggest a separate predicate function
with 2 implementations: one on Windows, the other on Posix platforms.

Re: [Bug-wget] [PATCH 20/25] New option --metalink-index to process Metalink application/metalink4+xml

2016-09-16 Thread Eli Zaretskii

> From: Tim Ruehsen 
> Cc: mehw.is...@inventati.org, bug-wget@gnu.org
> Date: Fri, 16 Sep 2016 11:15:31 +0200
> 
> > So if wget needs to create or open such files, it needs to replace the
> > colon with some other character, like '!'.
> 
> That is what I meant with 'Wget has functions to percent escape special 
> characters...'. It is not only colons. And it depends on the OS (and/or file 
> system).

OK, so the problem should not exist, good.

> From https://en.wikipedia.org/wiki/Comparison_of_file_systems:
> "MS-DOS, Microsoft Windows, and OS/2 disallow the characters \ / : ? * " > < 
> | 
> and NUL in file and directory names across all filesystems. Unices and Linux 
> disallow the characters / and NUL in file and directory names across all 
> filesystems."

This is a better reference, JFYI:

  
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

Re: [Bug-wget] [PATCH 20/25] New option --metalink-index to process Metalink application/metalink4+xml

2016-09-16 Thread Eli Zaretskii

> From: Tim Ruehsen 
> Date: Fri, 16 Sep 2016 10:15:17 +0200
> Cc: bug-wget@gnu.org
> 
> >   *name  +  ref   -> result
> >   -
> >   NULL   + "foo/C:D:file" -> "file" [bare basename]
> >   "foobar"   + "foo/C:D:file" -> "file" [bare basename]
> >   "dir/old"  + "foo/C:D:file" -> "dir/C:D:file"
> >   "C:D:file/old" + "foo/E:F:new"  -> "C:D:file/E:F:new" [is this ok?]
> 
> Just make sure that no file name beginning with letter+colon is used for 
> system 
> calls on Windows (e.g. open("C:D:file/E:F:new", ...) is not a good idea). 
> Either you strip the 'C:D:', or percent escape ':' on Windows. Wget has 
> functions to percent escape special characters in file names, depending on 
> the 
> OS it is built on.

(I've lost track of this discussion, and don't understand the context
well enough to get back on track, so please bear with me.)

Windows filesystems will not allow file names that have embedded colon
characters, except if that colon is part of the drive specification at
the beginning of a file name, as in "D:/dir/file".  File names like
the 2 last results above are not allowed, and cannot be created or
opened.

So if wget needs to create or open such files, it needs to replace the
colon with some other character, like '!'.

Again, apologies if this comment makes no sense in the context of
whatever you've been discussing.

Re: [Bug-wget] [PATCH 09/25] Enforce Metalink file name verification, strip directory if necessary

2016-09-12 Thread Eli Zaretskii

> From: Tim Ruehsen 
> Date: Mon, 12 Sep 2016 13:00:32 +0200
> 
> > +  char *basename = name;
> > +
> > +  while ((name = strstr (basename, "/")))
> > +basename = name + 1;
> 
> Could you use strrchr() ? something like
> 
> char *basename = strrchr (name, '/');
> 
> if (basename)
>   basename += 1;
> else
>   basename = name;

I think we want to use ISSEP, no?  Otherwise Windows file names with
backslashes will misfire.

Re: [Bug-wget] Wget - acess list bypass / race condition PoC

2016-08-21 Thread Eli Zaretskii

> From: Giuseppe Scrivano 
> Date: Sun, 21 Aug 2016 15:26:58 +0200
> Cc: "bug-wget@gnu.org" ,
>   Dawid Golunski ,
>   "kseifr...@redhat.com" 
> 
>  #else /* def __VMS */
> -  *fp = fopen (hs->local_file, "wb");
> +  if (opt.delete_after
> +|| opt.spider /* opt.recursive is implicitely true */
> +|| !acceptable (hs->local_file))
> +{
> +  *fp = fdopen (open (hs->local_file, O_CREAT | O_TRUNC | 
> O_WRONLY, S_IRUSR | S_IWUSR), "wb");
> +}

For this to work on MS-Windows, the 'open' call should use O_BINARY,
in addition to the other flags.  Otherwise, the "b" in "wb" of
'fdopen' will be ignored by the MS runtime.

Thanks.

Re: [Bug-wget] [PATCH] wget: Add --ssh-askpass support

2016-07-23 Thread Eli Zaretskii

> From: j...@wxcvbn.org (Jeremie Courreges-Anglas)
> Cc: "Liam R. Howlett" , bug-wget@gnu.org
> Date: Sat, 23 Jul 2016 21:24:33 +0200
> 
> > This implementation is unnecessarily non-portable ('fork' doesn't
> > exist on some supported platforms).  I suggest to use a much more
> > portable 'popen' instead.
> 
> popen(3) may be more portable but is it subject to all the problems
> brought by "sh -c": the string may contain shell metacharacters, etc.

Nothing command-line quoting cannot handle, surely.

> What worries me is the use of strace(1), which is afaik available only
> on Linux. OpenBSD for example doesn't have it.  Why would strace(1) be
> needed here?

Right.

Re: [Bug-wget] [PATCH] wget: Add --ssh-askpass support

2016-07-23 Thread Eli Zaretskii

> From: "Liam R. Howlett" 
> Date: Fri, 22 Jul 2016 20:24:05 -0400
> Cc: liam.howl...@windriver.com
> 
> This adds the --ssh-askpass option which is disabled by default.

Thanks.

> +
> +/* Execute external application SSH_ASKPASS which is stored in 
> opt.ssh_askpass
> + */
> +void
> +run_ssh_askpass(const char *question, char **answer)
> +{
> +  char tmp[1024];
> +  pid_t pid;
> +  int com[2];
> +
> +  if (pipe(com) == -1)
> +  {
> +fprintf(stderr, _("Cannot create pipe"));
> +exit (WGET_EXIT_GENERIC_ERROR);
> +  }
> +
> +  pid = fork();
> +  if (pid == -1)
> +  {
> +fprintf(stderr, "Error forking SSH_ASKPASS");
> +exit (WGET_EXIT_GENERIC_ERROR);
> +  }
> +  else if (pid == 0)
> +  {
> +/* Child */
> +dup2(com[1], STDOUT_FILENO);
> +close(com[0]);
> +close(com[1]);
> +fprintf(stdout, "test");
> +execlp("/usr/bin/strace", "-s256", "-otest.out", opt.ssh_askpass, 
> question, (char*)NULL);
> +assert("Execlp failed!");
> +  }
> +  else
> +  {
> +close(com[1]);
> +unsigned int bytes = read(com[0], tmp, sizeof(tmp));
> +if (!bytes)
> +{
> +  fprintf(stderr,
> +_("Error reading response from SSH_ASKPASS %s %s\n"),
> +opt.ssh_askpass, question);
> +  exit (WGET_EXIT_GENERIC_ERROR);
> +}
> +else if (bytes > 1)
> +  *answer = strndup(tmp, bytes-1);
> +  }
> +}

This implementation is unnecessarily non-portable ('fork' doesn't
exist on some supported platforms).  I suggest to use a much more
portable 'popen' instead.

Re: [Bug-wget] [PATCH] Trivial changes in HSTS

2016-06-18 Thread Eli Zaretskii

> From: Gisle Vanem 
> Date: Fri, 17 Jun 2016 22:50:27 +0200
> 
> > +static bool
> > +hsts_file_access_valid (const char *filename)
> > +{
> > +  struct_stat st;
> > +
> > +  if (stat (filename, &st) == -1)
> > +return false;
> > +
> > +  return !(st.st_mode & S_IWOTH) && S_ISREG (st.st_mode);
> 
> Due to the above patch, the following output on Wget/Windows seems
> a bit paranoid; wget -d https://vortex.data.microsoft.com/collect/v1
>   ...
>   Reading HSTS entries from c:\Users\Gisle\AppData\Roaming/.wget-hsts
>   Will not apply HSTS. The HSTS database must be a regular and 
> non-world-writable file.
>   ERROR: could not open HSTS store at 
> 'c:\Users\Gisle\AppData\Roaming/.wget-hsts'. HSTS will be disabled.
> 
> On Windows this file is *not* "world-writeable" AFAICS (and yes, it does 
> exists).
> Hence this "paranoia" should be accounted for. I'm not so much into Posix,
> so I'll leave it to you experts to comment & patch.

IMO, this test should be bypassed on Windows.  The "world" part in
"world-writeable" is a Unix-centric notion, and its translation into
MS-Windows ACLs is non-trivial (read: "impossible").  (For example,
your "non-world-writeable" file is accessible to certain users and
groups of users on Windows, other than Administrator.)  So the sanest
solution for this is simply not to make this test on Windows.

Re: [Bug-wget] retrieval failure:Forbidden? for UTF-8-URL in wget that works on FF and IE

2016-06-08 Thread Eli Zaretskii

> Date: Wed, 08 Jun 2016 11:47:46 -0700
> From: "L. A. Walsh" 
> 
> I tried:
> 
> wget "http://translate.google.com/#ja/en/クイーンズブレイド・メインテーマＢ";
> 
> But get a an Error "403: Forbidden" (tried w/ and w/o proxy) -- same.

On what OS and with which version of wget?

Re: [Bug-wget] Progress bar on MS-Windows

2016-06-07 Thread Eli Zaretskii

> From: Gisle Vanem 
> Date: Tue, 7 Jun 2016 09:00:43 +0200
> 
> Compare the attached image wget-progress-1.png:
>   wget --show-progress --quiet -np -r www.watt-32.net/watt-doc/
> 
> VS wget-progress-2.png:
>   wget --show-progress --quiet --limit-rate=2k -np -r 
> www.watt-32.net/watt-doc/
> 
> I think it's a bit strange the final d/l speed isn't "sticky" in both cases.
> Is it because the speed is too high?

I think the download time is so short that wget doesn't have enough to
estimate the speed, before it's all over.

Re: [Bug-wget] Progress bar on MS-Windows

2016-06-06 Thread Eli Zaretskii

> Date: Sat, 04 Jun 2016 13:40:12 +0300
> From: Eli Zaretskii 
> 
> > Here's a build as of commit 7c0752c4cb6575c6720d6e2d4bf4eda61b63e0f1:
> > https://eternallybored.org/misc/wget/test/wget.exe
> 
> Thanks, will try it.

Finally had an opportunity to try this build (needed a very slow
connection and a large file, to see the longest ETA string
displayed).  Indeed, the problem I was trying to solve doesn't exists
in this build, so the patch I proposed is not needed.

Thanks.

Re: [Bug-wget] Progress bar on MS-Windows

2016-06-04 Thread Eli Zaretskii

> Date: Sat, 4 Jun 2016 11:08:18 +0200
> From: Jernej Simončič 
> 
> On Saturday, June 4, 2016, 10:27:56, Eli Zaretskii wrote:
> 
> > Sorry, no.  Not unless someone will be kind enough to produce a
> > complete tarball.  Building from Git requires all kinds of utilities
> > that are not simple to set up on Windows.
> 
> Here's a build as of commit 7c0752c4cb6575c6720d6e2d4bf4eda61b63e0f1:
> https://eternallybored.org/misc/wget/test/wget.exe

Thanks, will try it.

Re: [Bug-wget] Progress bar on MS-Windows

2016-06-04 Thread Eli Zaretskii

> Date: Wed, 1 Jun 2016 10:28:35 +0200
> From: Darshit Shah 
> Cc: bug-wget@gnu.org
> 
> Can you please try with the latest HEAD once? As far as I am aware, all 
> the off-by-one errors have been fixed.

Sorry, no.  Not unless someone will be kind enough to produce a
complete tarball.  Building from Git requires all kinds of utilities
that are not simple to set up on Windows.

In any case, I did look at the latest sources in Git, and I don't
think the patch I suggested is fixed, because the calculation in
determine_screen_width didn't change.  The problem here is that the
Windows cmd console moves to the next line as soon as you display the
last character that fits on the line.  So the last column must never
be occupied if we want a proper progress display.

> If there is something left over, we should fix it in the progress bar 
> output itself instead of a hack somewhere in the Windows specific code.  

It's not a hack.

> Unless the issue is in how the window size is reported in Windows.

It is.

Thanks.

[Bug-wget] Progress bar on MS-Windows

2016-05-28 Thread Eli Zaretskii

Running wget from the Windows cmd window, I see that we write 1 column
too many, when we display "eta XXm YYs" -- this causes the next
progress bar be displayed on the next screen line.  So I came up with
a small patch below, in the Windows specific portion of
determine_screen_width.

Does anyone else see this?  I see this in Wget 1.16.1, but I don't see
any changes in the related code in the current Git master.  Did I miss
something?  If not, OK to push this change?

--- src/utils.c~0   2014-11-23 18:49:06.0 +0200
+++ src/utils.c 2016-05-28 21:09:24.91675 +0300
@@ -1822,7 +1824,7 @@ determine_screen_width (void)
   CONSOLE_SCREEN_BUFFER_INFO csbi;
   if (!GetConsoleScreenBufferInfo (GetStdHandle (STD_ERROR_HANDLE), &csbi))
 return 0;
-  return csbi.dwSize.X;
+  return csbi.dwSize.X - 1;
 #else  /* neither TIOCGWINSZ nor WINDOWS */
   return 0;
 #endif /* neither TIOCGWINSZ nor WINDOWS */

Re: [Bug-wget] wget IRI test failures on Mac OS X

2016-05-19 Thread Eli Zaretskii

> From: Tim Ruehsen 
> Cc: Ryan Schmidt 
> Date: Thu, 19 May 2016 10:59:27 +0200
> 
> After looking closer to this issue... it is a more general issue with file 
> systems that change/mangle the filenames. The wget test suite will likely 
> break on many such file systems. On (v)fat (e.g. 8.3 mangling), on hpfs also 
> due to lowercasing (see 'man mount' on linux). Also ntfs with certain 
> settings. 

There's no mangling I know about on VFAT filesystems, AFAIK.

> Anyone wants to implement a list of filesystems to switch off certain tests ?
> 
> I personally see this a low priority issue.
> But I would happily accept any patches addressing it.

The usual way of handling this is with some kind of "file-truename"
system call.  E.g., the short 8+3 aliases on Windows can be run
through that to retrieve the original long file names.

Re: [Bug-wget] wget IRI test failures on Mac OS X

2016-05-18 Thread Eli Zaretskii

> From: Ryan Schmidt 
> Date: Wed, 18 May 2016 12:58:58 -0500
> Cc: bug-wget@gnu.org
> 
> > Does OS X have a function that can compare equal strings with composed
> > and decomposed characters that are equivalent sequences?
> 
> Maybe this helps: A quick search shows not a function to compare for Unicode 
> equivalence, but functions for converting an NSString to various forms:
> 
> https://developer.apple.com/library/ios/qa/qa1235/_index.html

Yes, but that's a nuisance, IME.

Thanks.

Re: [Bug-wget] wget IRI test failures on Mac OS X

2016-05-18 Thread Eli Zaretskii

> From: Ryan Schmidt 
> Date: Wed, 18 May 2016 02:39:56 -0500
> Cc: bug-wget@gnu.org
> 
> Thanks Eli. I tried the latest commit from April 2016, 
> 42cc84b6b6cceeb146a668797ceaafe60743ce6d, and the IRI tests still failed:

Does OS X have a function that can compare equal strings with composed
and decomposed characters that are equivalent sequences?

Re: [Bug-wget] Wget 1.17.1 bug?

2016-05-17 Thread Eli Zaretskii

> Date: Tue, 17 May 2016 11:22:25 +0300
> From: "Zeroes & Ones" 
> 
> I not compiled himself, i use binaries installed used setup-x86.exe (v2.874 
> 32 bit) 
> 
> chosen site:
> cygwin.mirror.constant.com
> 
> trouble reproduced 100%
> 
> wget -V
> 
> GNU Wget 1.17.1 built on cygwin.

That's a Cygwin build, so you need to use chmod to make the downloaded
file executable.  Cygwin programs emulate Posix permissions using NT
security features, so you need to play by Cygwin rules.

Re: [Bug-wget] wget IRI test failures on Mac OS X

2016-05-16 Thread Eli Zaretskii

> From: Ryan Schmidt 
> Date: Thu, 12 May 2016 23:52:08 -0500
> Cc: Micah Cowan 
> 
> Hello, just wanted to gently remind you that this bug in the wget test suite 
> running on OS X that I reported in 2009 with wget 1.12 still exists today 
> with wget 1.17.1.

Please try the latest Git master, some progress was made, although I'm
not quite sure those changes are enough for the HPFS+ canonical
decomposition of file names.  But it could.

Re: [Bug-wget] Wget 1.17.1 bug?

2016-05-16 Thread Eli Zaretskii

> Date: Mon, 16 May 2016 10:24:25 +0300
> From: "Zeroes & Ones" 
> 
> i update Wget 1.11.4 to latest 1.17.1 and i have troubles
> 
> output file have wrong permission on NTFS (checked on W2008R2, Win8.1)
> 
> 
> for example
> wget.exe http://www.nch.com.au/components/burnsetup.exe
> after complete downloading i see what i can't execute file
> 
> accesschk.exe burnsetup.exe
> 
> Accesschk v6.01 - Reports effective permissions for securable objects
> Copyright (C) 2006-2016 Mark Russinovich
> Sysinternals - www.sysinternals.com
> 
> Error: C:\6\burnsetup.exe has a non-canonical DACL:
>Explicit Deny after Explicit Allow
> C:\6\burnsetup.exe
>   RW GOMELENERGO\s.dindikov
>   R  GOMELENERGO\Domain Users
>   RW NT AUTHORITY\SYSTEM
>   RW BUILTIN\Administrators
>   R  BUILTIN\Users
>   R  Everyone
> 
> with older version Wget all OK:
> 
> Accesschk v6.01 - Reports effective permissions for securable objects
> Copyright (C) 2006-2016 Mark Russinovich
> Sysinternals - www.sysinternals.com
> 
> C:\5\burnsetup.exe
>   RW NT AUTHORITY\SYSTEM
>   RW BUILTIN\Administrators
>   R  BUILTIN\Users

I don't think the native Windows port of Wget uses any NT security
related system calls, so it's hard to believe what you see is due to
some changes in Wget code proper.

Did you build both versions of Wget yourself?  If not, where did you
get the binaries?  Could it be that the latter was built differently,
with some changes in the sources, or with some optional libraries
which could explain this?  E.g., could it be that Wget 1.17 is a
Cygwin build or an MSYS build?  (What does "wget --version" display?)

[Bug-wget] [bug #47701] wget 1.17.1 fails to convert from percent encoding to unicode correctly (mingw32)

2016-04-22 Thread Eli Zaretskii

Follow-up Comment #5, bug #47701 (project wget):

In order for you to see the files with non-ASCII names correctly named on your
Windows disk, all the non-ASCII characters in the file names must be supported
by the current system codepage. In addition, your wget must be built with
libiconv.  If any of these two conditions is not true, you will see mojibake
in the file names, because Windows doesn't support UTF-8 encoded file names.

A way to lift one of these limitations -- that the file names be expressible
in the system codepage -- was discussed, but no one has submitted a clean
patchset to fix it. (Doing so on Windows requires to replace/wrap C library
functions that deal with file names with versions that can accept UTF-8
encoded name, convert it to UTF-16, and then call the appropriate library
function, like call _wopen instead of open etc.)

One other thing: a few months back I submitted changes to make non-ASCII file
name support more correct, and I'm not sure that patch is in wget 1.17.1. 
Perhaps Tim or Giuseppe could tell.  If the patch is not in 1.17.1, I suggest
to build wget from the Git repository and see if some of the problems are
gone.


___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/

[Bug-wget] [bug #47701] wget 1.17.1 fails to convert from percent encoding to unicode correctly (mingw32)

2016-04-15 Thread Eli Zaretskii

Follow-up Comment #1, bug #47701 (project wget):

You need to give Wget the --local-encoding=UTF-8 command-line option, because
the URL you are trying to fetch is actually in UTF-8 encoding (and then each
byte of the UTF-8 sequence is encoded with percent escapes on top of that).

When I use that switch, the command works for me (with Wget 1.16.1 compiled
with MinGW on MS-Windows).


___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/

[Bug-wget] [bug #47689] Support for UTF-16 encoding.

2016-04-13 Thread Eli Zaretskii

Follow-up Comment #1, bug #47689 (project wget):

This site does work for me with Wget 1.16.1 on MS-Windows, with the exact
command you have shown.  The file index.html is downloaded and it is encoded
in UTF-16LE on my disk.

So I'm unsure why it doesn't work for you.


___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/

Re: [Bug-wget] HAVE_CARES on Windows

2016-04-11 Thread Eli Zaretskii

> From: Gisle Vanem 
> Date: Mon, 11 Apr 2016 11:30:45 +0200
> 
> Tim Rühsen wrote:
> 
> > As Eli, I would like to know a few more details.
> > Is it possible to make c-ares return the 'native' socket numbers to not get 
> > in 
> > conflict with gnulib ?
> 
> As Eli pointed out, it's vice-versa; C-ares *do* return 'native'
> socket numbers. While Gnulib's socket(), select() etc. creates and
> expects 'file descriptors'. Normally in the range >= 3 (?). (I assume
> this has something to POSIX compliance. Winsock's socket() never returns
> such low numbers).

Windows sockets are handles in disguise, not file descriptors.

> Eli> However, converting a handle into a
> Eli> file descriptor and vise versa involves using 2 simple functions,
> 
> I'm not sure what those functions are since I'm not so much into Gnulib.

It's not a Gnulib thing, it's a Windows runtime library thing:

  HANDLE h = _get_osfhandle (int fd);

will produce a handle underlying a file descriptor, while

  int fd = _open_osfhandle (HANDLE h, O_RDWR | O_BINARY);

will do the opposite.

> My intuition told me the 'rpl_select()' was the cause for the resolve-
> failure, hence this 'undef'. And since the host.c 'select()' is used only for
> 'HAVE_LIBCARES' code, I felt it won't hurt do '#undef select' in host.c.

Is it a good idea to have 2 different implementations of 'select' in
the same program?  Can it happen that Wget wants to wait on both the
libcares sockets and the other kind?

> But I'm open to alternatives. Eli, can you try building with
> 'HAVE_LIBCARES'?

Not right now, as I'm quite busy these days.

Re: [Bug-wget] HAVE_CARES on Windows

2016-04-10 Thread Eli Zaretskii

> From: Tim Rühsen 
> Date: Sun, 10 Apr 2016 20:29:36 +0200
> 
> > I have tried building latest Wget with '-DHAVE_LIBCARES'
> > and all resolve attempts failed due to Gnulib's select()
> > is not compatible with the socket-number(s) returned from
> > a normal C-ares library on Windows.
> 
> As Eli, I would like to know a few more details.
> Is it possible to make c-ares return the 'native' socket numbers to not get 
> in 
> conflict with gnulib ?

I should tell here what I wrote to Gisle privately: Gnulib attempts to
hide the Windows socket-related idiosyncrasies by returning a file
descriptor instead of the 'native' handle, and Gnulib's 'select'
expects such file descriptors.  However, converting a handle into a
file descriptor and vise versa involves using 2 simple functions, so I
hope a more elegant solution should be possible.

Admittedly, Gisle already built Wget with libcares, while I didn't, so
his inputs and opinions carry more weight than mine...

Re: [Bug-wget] HAVE_CARES on Windows

2016-04-09 Thread Eli Zaretskii

> From: Gisle Vanem 
> Date: Sat, 9 Apr 2016 21:58:18 +0200
> 
> I have tried building latest Wget with '-DHAVE_LIBCARES'
> and all resolve attempts failed due to Gnulib's select()
> is not compatible with the socket-number(s) returned from
> a normal C-ares library on Windows.

What is a "socket number" that libcares returns?  Is it a file
descriptor, a handle, or something else?

Re: [Bug-wget] --no-check-certificate does not work in 1.11.4.3287

2016-02-20 Thread Eli Zaretskii

> From: Tim Paulson 
> Date: Fri, 19 Feb 2016 16:32:21 +
> 
> The following command with -no-check-certificate works in wget 1.10.2 but 
> fails in 1.11.4 on Win7.
> wget -t 3 --http-user=admin --http-passwd=password --no-check-certificate 
> https://192.168.1.51/admin/backups/latest
> 
> Has the syntax changed in 1.11.4 for the ability to ignore self signed 
> certificate errors?
> 
> Sample output from wget 1.10.2 and wget 1.11.4 shown below.
> 
> 
> c:\Program Files (x86)\GnuWin32\bin>wget -t 3 --http-user=admin 
> --http-passwd=password --no-check-certificate 
> https://192.168.1.51/admin/backups/latest
> SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrcsyswgetrc = c:\Program Files 
> (x86)\GnuWin32/etc/wgetrc

What do you have in your wgetrc?  Could that be the cause?

FWIW, this option works for me in Wget 1.16.1.  I cannot verify that
with the command line you gave, as it uses an address on your private
network, but if you show a command using an address I can reach, I
will try that and see if it works.

Re: [Bug-wget] [Patch] Use -isystem in Makefile to suppress warnings from libraries

2016-01-29 Thread Eli Zaretskii

> From: Darshit Shah 
> Date: Fri, 29 Jan 2016 15:45:02 +0100
> Cc: Bug-Wget 
> 
> Most of them are actually false positives, probably due to us. Gnulib
> uses some more modern code extensions and the compiler keeps warning
> us about it since we set the C language to std=gnu89.

Does Gnulib assume C99?  Is that documented somewhere?

> I'm not happy about this fact, but this discussion has happened
> multiple times and I don't think we will be moving to a more modern
> setup anytime soon. I would personally prefer using *at least* C99,
> a more recent version like C11 would be even better, but not all
> compiler would support that.

If Gnulib requires C99, why not use -std=gnu99 if it is supported?
Many packages already do.

Re: [Bug-wget] [Patch] Use -isystem in Makefile to suppress warnings from libraries

2016-01-29 Thread Eli Zaretskii

> From: Darshit Shah 
> Date: Fri, 29 Jan 2016 15:18:57 +0100
> 
> A recent GCC / LLVM update has caused my setup to spew far too many
> warnings on compiling Wget. On a closer look, they all come from
> Gnulib code. I propose the attached patch to explicitly mark those
> files as libraries and have the compiler suppress warnings from them.
> This way we can focus on the warnings generated by Wget codebase
> alone.

If we do this, who will tell Gnulib people to get their act together
and fix those warnings?  I think the right solution to this is in
Gnulib, not in Wget.

Thanks.

[Bug-wget] [bug #46943] Crash on old CPU w/o SSE2

2016-01-21 Thread Eli Zaretskii

Follow-up Comment #2, bug #46943 (project wget):

Did you build wget yourself, or did you download its binary from somewhere? 
If you downloaded a precompiled binary, can you tell which site did you
download it from?


___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/

Re: [Bug-wget] Support non-ASCII URLs

2016-01-12 Thread Eli Zaretskii

> From: Giuseppe Scrivano 
> Cc: tim.rueh...@gmx.de, bug-wget@gnu.org
> Date: Tue, 12 Jan 2016 20:58:16 +0100
> 
> Eli Zaretskii  writes:
> 
> > This was fixed by Tim in the meantime.  Are you running the current
> > Git version?
> 
> sorry my mistake, I was using an outdated version.  All works now for me
> as well.

Great, thanks for testing.

Re: [Bug-wget] Support non-ASCII URLs

2016-01-12 Thread Eli Zaretskii

> From: Giuseppe Scrivano 
> Cc: Tim Rühsen , bug-wget@gnu.org
> Date: Tue, 12 Jan 2016 12:19:06 +0100
> 
> >> FAIL: Test-iri-forced-remote
> >> 
> >> My son has birthday tomorrow, so I am not sure how much time I can spend 
> >> on 
> >> the weekend on this issue. Maybe Eli or you could have a look ?
> >
> > I cannot bootstrap the Git repo (too many prerequisites I don't have).
> > Can you or someone else produce a distribution tarball out of Git that
> > I could then build "as usual"?
> >
> > Also, can you show me the log of the failed test?  Turkish locales
> > have "an issue" with certain upper/lower-case characters, maybe that's
> > the problem.  Or maybe it's something else; looking at the log might
> > give good clues.
> 
> sorry for taking so long, this is the log I get when I run
> 
> $ TESTS_ENVIRONMENT="LC_ALL=tr_TR.utf8 VALGRIND_TESTS=0" make check
> 
> ===
>wget 1.17.1.10-c78d: tests/test-suite.log
> ===
> 
> # TOTAL: 85
> # PASS:  84
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  1
> # XPASS: 0
> # ERROR: 0
> 
> .. contents:: :depth: 2
> 
> FAIL: Test-iri-forced-remote

This was fixed by Tim in the meantime.  Are you running the current
Git version?

Re: [Bug-wget] Support non-ASCII URLs

2015-12-20 Thread Eli Zaretskii

> From: Tim Rühsen 
> Date: Sun, 20 Dec 2015 21:34:18 +0100
> 
> Please review this patch.

Looks good to me, thanks.

Re: [Bug-wget] Support non-ASCII URLs

2015-12-20 Thread Eli Zaretskii

> From: Tim Rühsen 
> Date: Sun, 20 Dec 2015 16:26:20 +0100
> 
> > Tim sent me the tarball and the log off-list (thanks!).  I didn't yet
> > try to build Wget, but just looking at the test, I guess I don't
> > understand its idea.  It has an index.html page that's encoded in
> > ISO-8859-15, but Wget is invoked with --remote-encoding=iso-8859-1,
> > and the URLs themselves in "my %urls" are all encoded in UTF-8.  How's
> > this supposed to work?
> 
> Regarding the wget man page, --remote-encoding just sets the *default* server 
> encoding. This only comes into play when the HTTP header does not contain a 
> Content-type with charset set *and* the HTML page does not contain a  http-equiv="Content-Type" with 'content=... charset=...'.

Makes sense.

> 'index.html' in this test is correctly having a meta tag with charset=utf-8 
> and the URLs encoded in utf-8.

That's not what I see: index.html says

  "Content-type" => "text/html; charset=ISO-8859-15"

and its contents indeed has URLs encoded in ISO-8859-15.

> > Also, I'm not following the logic of overriding Content-type by the
> > remote encoding: p1_fran%C3%A7ais.html states "charset=UTF-8", but
> > includes a link encoded in ISO-8859-1, and the test seems to expect
> > Wget to use the remote encoding in preference to what "charset=" says.
> 
> Either the test is wrong here or the man page. I would say the man page 
> should 
> be correct here - it makes the most sense to me. In this case the test is 
> wrong, also the comment.

OK.

> > Does the remote encoding override the encoding for the _contents_ of
> > the URL, not just for the URL itself?  That seems to make little sense
> > to me: the contents and the name can legitimately be encoded
> > differently, I think.
> 
> The filenames in %expected_downloaded_files depend on --local-encoding.
> Since this is not given on the command line, this test will behave 
> differently 
> with different settings for LC_ALL ('make check' use LC_ALL=C, contrib/check-
> hard will also 'make check' with turkish UTF-8 locale).
> 
> To fix the test, we should use --local-encoding to some kind of UTF-8 locale 
> (or something else, but than we have to fix the filenames regarding that 
> locale).

But then what would be the point of repeating the test with the
turkish locale? verify that when given --local-encoding the locale is
ignored?

Re: [Bug-wget] Support non-ASCII URLs

2015-12-19 Thread Eli Zaretskii

> Date: Sat, 19 Dec 2015 10:15:03 +0200
> From: Eli Zaretskii 
> Cc: bug-wget@gnu.org
> 
> > 2. contrib/check-hard fails with
> > TESTS_ENVIRONMENT="LC_ALL=tr_TR.utf8 VALGRIND_TESTS=0" make check
> > 
> > FAIL: Test-iri-forced-remote
> > 
> > My son has birthday tomorrow, so I am not sure how much time I can spend on 
> > the weekend on this issue. Maybe Eli or you could have a look ?
> 
> I cannot bootstrap the Git repo (too many prerequisites I don't have).
> Can you or someone else produce a distribution tarball out of Git that
> I could then build "as usual"?
> 
> Also, can you show me the log of the failed test?  Turkish locales
> have "an issue" with certain upper/lower-case characters, maybe that's
> the problem.  Or maybe it's something else; looking at the log might
> give good clues.

Tim sent me the tarball and the log off-list (thanks!).  I didn't yet
try to build Wget, but just looking at the test, I guess I don't
understand its idea.  It has an index.html page that's encoded in
ISO-8859-15, but Wget is invoked with --remote-encoding=iso-8859-1,
and the URLs themselves in "my %urls" are all encoded in UTF-8.  How's
this supposed to work?

Also, I'm not following the logic of overriding Content-type by the
remote encoding: p1_fran%C3%A7ais.html states "charset=UTF-8", but
includes a link encoded in ISO-8859-1, and the test seems to expect
Wget to use the remote encoding in preference to what "charset=" says.
Does the remote encoding override the encoding for the _contents_ of
the URL, not just for the URL itself?  That seems to make little sense
to me: the contents and the name can legitimately be encoded
differently, I think.

I guess I lack some basic info about what Wget is supposed to do in
these tricky situations, and how.  Can you help me understand that?
The manual doesn't seem to be very details on what's expected here.

TIA

Re: [Bug-wget] Support non-ASCII URLs

2015-12-19 Thread Eli Zaretskii

> From: Tim Rühsen 
> Cc: Giuseppe Scrivano , Eli Zaretskii 
> Date: Fri, 18 Dec 2015 22:41:29 +0100
> 
> 1. Maybe do_conversion() should take a char * argument instead of const char 
> *. We avoid one ugly const -> non-const cast an also a warning about iconv.

I agree.

> 2. contrib/check-hard fails with
> TESTS_ENVIRONMENT="LC_ALL=tr_TR.utf8 VALGRIND_TESTS=0" make check
> 
> FAIL: Test-iri-forced-remote
> 
> My son has birthday tomorrow, so I am not sure how much time I can spend on 
> the weekend on this issue. Maybe Eli or you could have a look ?

I cannot bootstrap the Git repo (too many prerequisites I don't have).
Can you or someone else produce a distribution tarball out of Git that
I could then build "as usual"?

Also, can you show me the log of the failed test?  Turkish locales
have "an issue" with certain upper/lower-case characters, maybe that's
the problem.  Or maybe it's something else; looking at the log might
give good clues.

Re: [Bug-wget] Support non-ASCII URLs

2015-12-18 Thread Eli Zaretskii

> From: Giuseppe Scrivano 
> Cc: bug-wget@gnu.org, Eli Zaretskii 
> Date: Fri, 18 Dec 2015 11:31:17 +0100
> 
> >> Attached.
> >
> > Nice, thank you.
> >
> > There is just one test not passing: Test-ftp-iri.px
> >
> > Maybe the test is wrong (using --local-encoding=iso-8859-1, but writing to 
> > an 
> > UTF-8 filename). I am not very much into FTP. How do we know the remote 
> > encoding ?
> 
> the patch looks fine to me.  Eli, could you please modify the test the
> pass and add a note in NEWS?

Attached.

>From 8ce8fc66bd6d994194eabd2768aefccbe2090e43 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii 
Date: Fri, 18 Dec 2015 17:03:26 +0200
Subject: [PATCH] Support non-ASCII URLs

* src/url.c [HAVE_ICONV]: Include iconv.h and langinfo.h.
(convert_fname): New function.
[HAVE_ICONV]: Convert file name from remote encoding to local
encoding.
(url_file_name): Call convert_fname.
(filechr_table): Don't consider bytes in 128..159 as control
characters.

* tests/Test-ftp-iri.px: Fix the expected file name to match the
new file-name recoding.  State the remote encoding explicitly on
the Wget command line.

* NEWS: Mention the URI recoding when built with libiconv.
---
 NEWS  |  7 +
 src/url.c | 87 +--
 tests/Test-ftp-iri.px |  4 +--
 3 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index c8cebad..c63c678 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Please send GNU Wget bug reports to .
 
 * Changes in Wget X.Y.Z
 
+* When Wget is built with libiconv, it now converts non-ASCII URIs to
+  the locale's codeset when it creates files.  The encoding of the
+  remote files and URIs is taken from --remote-encoding, defaulting to
+  UTF-8.  The result is that non-ASCII URIs and files downloaded via
+  HTTP/HTTPS and FTP will have names on the local filesystem that
+  correspond to their remote names.
+
 * Changes in Wget 1.17.1
 
 * Fix compile error when IPv6 is disabled or SSL is not present.
diff --git a/src/url.c b/src/url.c
index c62867f..ca7fe29 100644
--- a/src/url.c
+++ b/src/url.c
@@ -43,6 +43,11 @@ as that of the covered work.  */
 #include "host.h"  /* for is_valid_ipv6_address */
 #include "c-strcase.h"
 
+#if HAVE_ICONV
+#include 
+#include 
+#endif
+
 #ifdef __VMS
 #include "vms.h"
 #endif /* def __VMS */
@@ -1399,8 +1404,8 @@ UVWC, VC, VC, VC,  VC, VC, VC, VC,   /* NUL SOH STX ETX  EOT ENQ ACK BEL */
0,  0,  0,  0,   0,  0,  0,  0,   /* p   q   r   st   u   v   w   */
0,  0,  0,  0,   W,  0,  0,  C,   /* x   y   z   {|   }   ~   DEL */
 
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 128-143 */
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 144-159 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 128-143 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 144-159 */
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
 
@@ -1531,6 +1536,82 @@ append_uri_pathel (const char *b, const char *e, bool escaped,
   append_null (dest);
 }
 
+static char *
+convert_fname (const char *fname)
+{
+  char *converted_fname = (char *)fname;
+#if HAVE_ICONV
+  const char *from_encoding = opt.encoding_remote;
+  const char *to_encoding = opt.locale;
+  iconv_t cd;
+  size_t len, done, inlen, outlen;
+  char *s;
+  const char *orig_fname = fname;;
+
+  /* Defaults for remote and local encodings.  */
+  if (!from_encoding)
+from_encoding = "UTF-8";
+  if (!to_encoding)
+to_encoding = nl_langinfo (CODESET);
+
+  cd = iconv_open (to_encoding, from_encoding);
+  if (cd == (iconv_t)(-1))
+logprintf (LOG_VERBOSE, _("Conversion from %s to %s isn't supported\n"),
+	   quote (from_encoding), quote (to_encoding));
+  else
+{
+  inlen = strlen (fname);
+  len = outlen = inlen * 2;
+  converted_fname = s = xmalloc (outlen + 1);
+  done = 0;
+
+  for (;;)
+	{
+	  if (iconv (cd, &fname, &inlen, &s, &outlen) != (size_t)(-1)
+	  && iconv (cd, NULL, NULL, &s, &outlen) != (size_t)(-1))
+	{
+	  *(converted_fname + len - outlen - done) = '\0';
+	  iconv_close(cd);
+	  DEBUGP (("Converted file name '%s' (%s) -> '%s' (%s)\n",
+		   orig_fname, from_encoding, converted_fname, to_encoding));
+	  xfree (orig_fname);
+	  return converted_fname;
+	}
+
+	  /* Incomplete or invalid multibyte sequence */
+	  if (errno == EINVAL || errno == EILSEQ)
+	{
+	  logprintf (LOG_VERBOSE,
+			 _("Incomplete or invalid multibyte sequence encountered\n"));
+	  xfree (converted_fname);
+	  converted_fname = (char *)orig_fname;
+	  break;
+	}
+	  else if (errno == E2BIG) /* Output buffer full */
+	{
+	  done = len;
+

Re: [Bug-wget] Support non-ASCII URLs

2015-12-17 Thread Eli Zaretskii

> From: Tim Rühsen 
> Cc: gscriv...@gnu.org
> Date: Thu, 17 Dec 2015 21:16:28 +0100
> 
> There is just one test not passing: Test-ftp-iri.px
> 
> Maybe the test is wrong (using --local-encoding=iso-8859-1, but writing to an 
> UTF-8 filename). I am not very much into FTP. How do we know the remote 
> encoding ?

>From --remote-encoding, and falling back to UTF-8.  It looks like the
file name is Latin-1 encoded in that test case, but you expect the
downloaded file name to be in UTF-8, although you use
"--local-encoding=iso-8859-1", is that right?  That should create a
file whose name is encoded in Latin-1, not UTF-8.

Re: [Bug-wget] Support non-ASCII URLs

2015-12-17 Thread Eli Zaretskii

> From: Tim Ruehsen 
> Cc: Giuseppe Scrivano 
> Date: Thu, 17 Dec 2015 17:50:47 +0100
> 
> @Eli: If my change is ok for Giuseppe, please apply the changes from iri.c to 
> your patch. If possible, make a local commit and create the attachment/patch 
> with 'git format -1' (or -2 for the latest two commits). That makes it easier 
> for us to apply the patch since author (you) and commit message are copied as 
> well.

Attached.

>From 197483b6c62dcea1a900d626c79ba7e65a0c1e67 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii 
Date: Thu, 17 Dec 2015 20:06:30 +0200
Subject: [PATCH] Support non-ASCII URLs

* src/url.c [HAVE_ICONV]: Include iconv.h and langinfo.h.
(convert_fname): New function.
[HAVE_ICONV]: Convert file name from remote encoding to local
encoding.
(url_file_name): Call convert_fname.
(filechr_table): Don't consider bytes in 128..159 as control
characters.
---
 src/url.c | 87 +--
 1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/src/url.c b/src/url.c
index c62867f..ca7fe29 100644
--- a/src/url.c
+++ b/src/url.c
@@ -43,6 +43,11 @@ as that of the covered work.  */
 #include "host.h"  /* for is_valid_ipv6_address */
 #include "c-strcase.h"
 
+#if HAVE_ICONV
+#include 
+#include 
+#endif
+
 #ifdef __VMS
 #include "vms.h"
 #endif /* def __VMS */
@@ -1399,8 +1404,8 @@ UVWC, VC, VC, VC,  VC, VC, VC, VC,   /* NUL SOH STX ETX  EOT ENQ ACK BEL */
0,  0,  0,  0,   0,  0,  0,  0,   /* p   q   r   st   u   v   w   */
0,  0,  0,  0,   W,  0,  0,  C,   /* x   y   z   {|   }   ~   DEL */
 
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 128-143 */
-  C, C, C, C,  C, C, C, C,  C, C, C, C,  C, C, C, C, /* 144-159 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 128-143 */
+  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0, /* 144-159 */
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
   0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
 
@@ -1531,6 +1536,82 @@ append_uri_pathel (const char *b, const char *e, bool escaped,
   append_null (dest);
 }
 
+static char *
+convert_fname (const char *fname)
+{
+  char *converted_fname = (char *)fname;
+#if HAVE_ICONV
+  const char *from_encoding = opt.encoding_remote;
+  const char *to_encoding = opt.locale;
+  iconv_t cd;
+  size_t len, done, inlen, outlen;
+  char *s;
+  const char *orig_fname = fname;;
+
+  /* Defaults for remote and local encodings.  */
+  if (!from_encoding)
+from_encoding = "UTF-8";
+  if (!to_encoding)
+to_encoding = nl_langinfo (CODESET);
+
+  cd = iconv_open (to_encoding, from_encoding);
+  if (cd == (iconv_t)(-1))
+logprintf (LOG_VERBOSE, _("Conversion from %s to %s isn't supported\n"),
+	   quote (from_encoding), quote (to_encoding));
+  else
+{
+  inlen = strlen (fname);
+  len = outlen = inlen * 2;
+  converted_fname = s = xmalloc (outlen + 1);
+  done = 0;
+
+  for (;;)
+	{
+	  if (iconv (cd, &fname, &inlen, &s, &outlen) != (size_t)(-1)
+	  && iconv (cd, NULL, NULL, &s, &outlen) != (size_t)(-1))
+	{
+	  *(converted_fname + len - outlen - done) = '\0';
+	  iconv_close(cd);
+	  DEBUGP (("Converted file name '%s' (%s) -> '%s' (%s)\n",
+		   orig_fname, from_encoding, converted_fname, to_encoding));
+	  xfree (orig_fname);
+	  return converted_fname;
+	}
+
+	  /* Incomplete or invalid multibyte sequence */
+	  if (errno == EINVAL || errno == EILSEQ)
+	{
+	  logprintf (LOG_VERBOSE,
+			 _("Incomplete or invalid multibyte sequence encountered\n"));
+	  xfree (converted_fname);
+	  converted_fname = (char *)orig_fname;
+	  break;
+	}
+	  else if (errno == E2BIG) /* Output buffer full */
+	{
+	  done = len;
+	  len = outlen = done + inlen * 2;
+	  converted_fname = xrealloc (converted_fname, outlen + 1);
+	  s = converted_fname + done;
+	}
+	  else /* Weird, we got an unspecified error */
+	{
+	  logprintf (LOG_VERBOSE, _("Unhandled errno %d\n"), errno);
+	  xfree (converted_fname);
+	  converted_fname = (char *)orig_fname;
+	  break;
+	}
+	}
+  DEBUGP (("Failed to convert file name '%s' (%s) -> '?' (%s)\n",
+	   orig_fname, from_encoding, to_encoding));
+}
+
+iconv_close(cd);
+#endif
+
+  return converted_fname;
+}
+
 /* Append to DEST the directory structure that corresponds the
directory part of URL's path.  For example, if the URL is
http://server/dir1/dir2/file, this appends "/dir1/dir2".
@@ -1706,6 +1787,8 @@ url_file_name (const struct url *u, char *replaced_filename)
 
   xfree (temp_fnres.base);
 
+  fname = convert_fname (fname);
+
   /* Check the cases in which the unique extensions are not used:
  1) Clobbering is turned off (-nc).
  2) Retrieval with regetting.
-- 
2.6.4.windows.1

Re: [Bug-wget] Marking Release v1.17.1?

2015-12-16 Thread Eli Zaretskii

> Cc: bug-wget@gnu.org
> From: Gisle Vanem 
> Date: Wed, 16 Dec 2015 14:11:27 +0100
> 
> Eli Zaretskii wrote:
> 
> > +  {
> > +#ifdef WIN32
> > +   /* If the connection timed out, fd_close will hang in Gnulib's
> > +  close_fd_maybe_socket, inside the call to WSAEnumNetworkEvents.  */
> > +   if (errno != ETIMEDOUT)
> > +#endif
> > + fd_close (sock);
> > +  }
> >  if (print)
> >logprintf (LOG_NOTQUIET, _("failed: %s.\n"), strerror (errno));
> >  errno = save_errno;
> 
> I assume fd_close() could cause 'errno' to be set again (for some
> strange reason?). So shouldn't 'save_errno' be printed instead?

You mean, when errno is ETIMEDOUT, or when it isn't?  If the former,
fd_close is not called (on MS-Windows), so both values are identical.
If the latter, then I think the code does want to print the value of
errno from fd_close.  So I don't see why a change would be needed; am
I missing something?

Re: [Bug-wget] Marking Release v1.17.1?

2015-12-16 Thread Eli Zaretskii

> From: Giuseppe Scrivano 
> Cc: Gisle Vanem , bug-wget@gnu.org
> Date: Wed, 16 Dec 2015 10:34:12 +0100
> 
> do you mind to send it in the git am format with a ChangeLog entry?

Attached.  (I presume by "ChangeLog entry" you meant a commit log
message formatted according to ChangeLog rules.)

>From 9a0c637b07be7b842b9be21488238d578f39d781 Mon Sep 17 00:00:00 2001
From: Eli Zaretskii 
Date: Wed, 16 Dec 2015 14:40:17 +0200
Subject: [PATCH] Avoid hanging on MS-Windows when invoked with
 --connect-timeout

* src/connect.c (connect_to_ip) [WIN32]: Don't call fd_close if
the connection timed out, to avoid hanging.
---
 src/connect.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/connect.c b/src/connect.c
index 024b231..0704000 100644
--- a/src/connect.c
+++ b/src/connect.c
@@ -369,7 +369,14 @@ connect_to_ip (const ip_address *ip, int port, const char *print)
logprintf.  */
 int save_errno = errno;
 if (sock >= 0)
-  fd_close (sock);
+  {
+#ifdef WIN32
+	/* If the connection timed out, fd_close will hang in Gnulib's
+	   close_fd_maybe_socket, inside the call to WSAEnumNetworkEvents.  */
+	if (errno != ETIMEDOUT)
+#endif
+	  fd_close (sock);
+  }
 if (print)
   logprintf (LOG_NOTQUIET, _("failed: %s.\n"), strerror (errno));
 errno = save_errno;
-- 
2.6.3.windows.1

Re: [Bug-wget] Support non-ASCII URLs

2015-12-16 Thread Eli Zaretskii

> From: Giuseppe Scrivano 
> Cc: bug-wget@gnu.org, andries.brou...@cwi.nl
> Date: Wed, 16 Dec 2015 10:53:51 +0100
> 
> > +  for (;;)
> > +   {
> > + if (iconv (cd, &fname, &inlen, &s, &outlen) != (size_t)(-1))
> > +   {
> > + /* Flush the last bytes.  */
> > + iconv (cd, NULL, NULL, &s, &outlen);
> 
> should not the return code be checked here?

We should probably simply copy what iri.c does in a similar function,
yes.

> > + else if (errno == E2BIG) /* Output buffer full */
> > +   {
> > + char *new;
> > +
> > + done = len;
> > + outlen = done + inlen * 2;
> > + new = xmalloc (outlen + 1);
> > + memcpy (new, converted_fname, done);
> > + xfree (converted_fname);
> 
> What would be the extra cost in terms of copied bytes if we just replace
> the three lines above with xrealloc?

I don't know, probably nothing.  This is simply copied (with trivial
changes) from do_conversion in iri.c, so if we want to make that
change, we should do it there as well.

Thanks.

Re: [Bug-wget] Support non-ASCII URLs

2015-12-15 Thread Eli Zaretskii

This second part is the main part of the change.  It uses 'iconv',
when available, to convert the file names to the local encoding,
before saving the files.  Note that the same function I modified is
used by ftp.c, so downloading via FTP should also work with non-ASCII
file names now; however, I didn't test that.

Thanks.

diff --git a/src/url.c b/src/url.c
index c62867f..d984bf7 100644
--- a/src/url.c
+++ b/src/url.c
@@ -43,6 +43,11 @@ as that of the covered work.  */
 #include "host.h"  /* for is_valid_ipv6_address */
 #include "c-strcase.h"
 
+#if HAVE_ICONV
+#include 
+#include 
+#endif
+
 #ifdef __VMS
 #include "vms.h"
 #endif /* def __VMS */
@@ -1531,6 +1536,90 @@ append_uri_pathel (const char *b, const char *e, bool 
escaped,
   append_null (dest);
 }
 
+static char *
+convert_fname (const char *fname)
+{
+  char *converted_fname = (char *)fname;
+#if HAVE_ICONV
+  const char *from_encoding = opt.encoding_remote;
+  const char *to_encoding = opt.locale;
+  iconv_t cd;
+  /* sXXXav : hummm hard to guess... */
+  size_t len, done, inlen, outlen;
+  char *s;
+  const char *orig_fname = fname;;
+
+  /* Defaults for remote and local encodings.  */
+  if (!from_encoding)
+from_encoding = "UTF-8";
+  if (!to_encoding)
+to_encoding = nl_langinfo (CODESET);
+
+  cd = iconv_open (to_encoding, from_encoding);
+  if (cd == (iconv_t)(-1))
+logprintf (LOG_VERBOSE, _("Conversion from %s to %s isn't supported\n"),
+  quote (from_encoding), quote (to_encoding));
+  else
+{
+  inlen = strlen (fname);
+  len = outlen = inlen * 2;
+  converted_fname = s = xmalloc (outlen + 1);
+  done = 0;
+
+  for (;;)
+   {
+ if (iconv (cd, &fname, &inlen, &s, &outlen) != (size_t)(-1))
+   {
+ /* Flush the last bytes.  */
+ iconv (cd, NULL, NULL, &s, &outlen);
+ *(converted_fname + len - outlen - done) = '\0';
+ iconv_close(cd);
+ DEBUGP (("Converted file name '%s' (%s) -> '%s' (%s)\n",
+  orig_fname, from_encoding, converted_fname, 
to_encoding));
+ return converted_fname;
+   }
+
+ /* Incomplete or invalid multibyte sequence */
+ if (errno == EINVAL || errno == EILSEQ)
+   {
+ logprintf (LOG_VERBOSE,
+_("Incomplete or invalid multibyte sequence 
encountered\n"));
+ xfree (converted_fname);
+ converted_fname = (char *)orig_fname;
+ break;
+   }
+ else if (errno == E2BIG) /* Output buffer full */
+   {
+ char *new;
+
+ done = len;
+ outlen = done + inlen * 2;
+ new = xmalloc (outlen + 1);
+ memcpy (new, converted_fname, done);
+ xfree (converted_fname);
+ converted_fname = new;
+ len = outlen;
+ s = converted_fname + done;
+   }
+ else /* Weird, we got an unspecified error */
+   {
+ logprintf (LOG_VERBOSE, _("Unhandled errno %d\n"), errno);
+ xfree (converted_fname);
+ converted_fname = (char *)orig_fname;
+ break;
+   }
+   }
+  DEBUGP (("Failed to convert file name '%s' (%s) -> '?' (%s)\n",
+  orig_fname, from_encoding, to_encoding));
+  xfree (fname);
+}
+
+iconv_close(cd);
+#endif
+
+  return converted_fname;
+}
+
 /* Append to DEST the directory structure that corresponds the
directory part of URL's path.  For example, if the URL is
http://server/dir1/dir2/file, this appends "/dir1/dir2".
@@ -1706,6 +1795,8 @@ url_file_name (const struct url *u, char 
*replaced_filename)
 
   xfree (temp_fnres.base);
 
+  fname = convert_fname (fname);
+
   /* Check the cases in which the unique extensions are not used:
  1) Clobbering is turned off (-nc).
  2) Retrieval with regetting.

1 2 >

1 - 100 of 158 matches

Mail list logo