Re: [Bug-wget] [Bug-Wget][BUG] Progress bar does not support multibyte characters

2014-08-30 Thread Tim Rühsen
On Saturday, 30 August 2014, 09:23:08, Darshit Shah wrote:
 Earlier this year, I implemented a new, more concise form of the
 progress bar. However, I've just been given a bug report regarding the
 same, which I was unable to fix.
 
 The currently implemented progress bar shows only up to 15 characters
 of the URL. In case of longer URLs, we scroll the filename like a
 ticker. For selecting the 15 characters, wget copies 15 bytes from the
 string into the progress bar. This method fails on URLs containing
 multibyte characters. In this scenario, the progress bar is very
 jittery, since the string lengths vary so much.
 
 I am trying to find a solution where we can select a substring which
 is n columns wide from a given string of potentially multibyte
 characters. If someone knows how to do this and could implement a fix,
 it would be truly great!

Hi Darshit,

you are talking about UTF-8 strings ('multibyte' could also be UCS2/4 or
something else).

UTF-8 strings can't be split at an arbitrary byte, only between so-called code
points. While you could use a library to handle that, writing your own function
is not complicated - UTF-8 is a very straightforward format. Of course you can
find tested (GPL) source code if you search; maybe even Gnulib contains
functions for that purpose (at least I wouldn't be surprised).
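
As a rough illustration of splitting only between code points, here is a minimal sketch in C. The helper name utf8_prefix_bytes is hypothetical (it is not taken from wget or Gnulib), and the code assumes the input is valid UTF-8. It relies on the fact that every continuation byte in UTF-8 has the bit pattern 10xxxxxx, so any other byte starts a new code point.

```c
#include <stddef.h>

/* Hypothetical helper, not part of wget: return the byte length of the
   longest prefix of s that contains at most max_cp code points.
   Assumes s is valid, NUL-terminated UTF-8. */
static size_t utf8_prefix_bytes(const char *s, size_t max_cp)
{
    size_t i = 0;    /* byte offset into s */
    size_t cps = 0;  /* code points accepted so far */

    while (s[i] != '\0') {
        /* Bytes of the form 10xxxxxx continue the current code point;
           anything else begins a new one. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (cps == max_cp)
                break;  /* the next code point would not fit */
            cps++;
        }
        i++;
    }
    return i;
}
```

Copying utf8_prefix_bytes(name, 15) bytes instead of a fixed 15 bytes would avoid cutting a character in half. Note that this counts code points, not terminal columns: East Asian wide characters usually occupy two columns and combining marks none, so a fixed-width display needs a width lookup on top of this.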

See http://en.wikipedia.org/wiki/UTF-8 for a description.

Tim




Re: [Bug-wget] [Bug-Wget][BUG] Progress bar does not support multibyte characters

2014-08-30 Thread Darshit Shah
On Sat, Aug 30, 2014 at 9:57 PM, Tim Rühsen tim.rueh...@gmx.de wrote:
 On Saturday, 30 August 2014, 09:23:08, Darshit Shah wrote:
 Earlier this year, I implemented a new, more concise form of the
 progress bar. However, I've just been given a bug report regarding the
 same, which I was unable to fix.

 The currently implemented progress bar shows only up to 15 characters
 of the URL. In case of longer URLs, we scroll the filename like a
 ticker. For selecting the 15 characters, wget copies 15 bytes from the
 string into the progress bar. This method fails on URLs containing
 multibyte characters. In this scenario, the progress bar is very
 jittery, since the string lengths vary so much.

 I am trying to find a solution where we can select a substring which
 is n columns wide from a given string of potentially multibyte
 characters. If someone knows how to do this and could implement a fix,
 it would be truly great!

 Hi Darshit,

 you are talking about UTF-8 strings ('multibyte' could also be UCS2/4 or
 something else).

 UTF-8 strings can't be split at an arbitrary byte, only between so-called code
 points. While you could use a library to handle that, writing your own function
 is not complicated - UTF-8 is a very straightforward format. Of course you can
 find tested (GPL) source code if you search; maybe even Gnulib contains
 functions for that purpose (at least I wouldn't be surprised).

Yes, I guess I'm looking for UTF-8 strings, because other character
encodings wouldn't create this problem (I think?).
I'll look at the Wiki page again and see if Gnulib has anything. Right
now, I'm trying to implement a solution based on wide characters
through wchar.h, but I don't like the code I've written. I'd prefer
something more elegant and efficient.
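
For reference, a wide-character approach along these lines might look roughly like the sketch below. It is not the code from wget; the helper name prefix_bytes_for_columns is hypothetical. It uses the standard mbrtowc() to decode one character at a time in the current locale and the POSIX wcwidth() to ask how many columns that character occupies. It assumes the caller has already called setlocale(LC_ALL, "") and that the locale's encoding matches the string.

```c
#define _XOPEN_SOURCE 700   /* needed for wcwidth() on some systems */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of wget: return the byte length of the
   longest prefix of s that fits into max_cols terminal columns.
   Decoding follows the current locale (see setlocale). */
static size_t prefix_bytes_for_columns(const char *s, int max_cols)
{
    mbstate_t st;
    size_t off = 0, len = strlen(s);
    int cols = 0;

    memset(&st, 0, sizeof st);
    while (off < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s + off, len - off, &st);
        int w;

        if (n == (size_t)-1 || n == (size_t)-2) {
            /* Invalid or truncated sequence: treat one byte as one
               column and reset the conversion state. */
            memset(&st, 0, sizeof st);
            n = 1;
            w = 1;
        } else {
            w = wcwidth(wc);
            if (w < 0)
                w = 1;  /* non-printable: assume one column */
        }
        if (cols + w > max_cols)
            break;      /* this character would overflow the field */
        cols += w;
        off += n;
    }
    return off;
}
```

Because zero-width characters add nothing to the column count, a combining mark that follows the last character that fits is still included in the prefix. If the selected prefix is narrower than the field (for example when a two-column character does not fit in the last free column), the caller would still need to pad with spaces to keep the bar width constant.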

 See http://en.wikipedia.org/wiki/UTF-8 for a description.

 Tim




-- 
Thanking You,
Darshit Shah