Re: Dealing with Autodecode

2016-06-01 Thread Jacob Carlborg via Digitalmars-d

On 2016-06-01 02:46, Walter Bright wrote:

It is not practical to just delete or deprecate autodecode - it is too
embedded into things. What we can do, however, is stop using it
ourselves and stop relying on it in the documentation, much like [] is
eschewed in favor of std::vector in C++.

The way to deal with it is to replace reliance on autodecode with
.byDchar


Don't you get the same behavior using byDchar as with autodecode?

--
/Jacob Carlborg


Re: Horrible DMD startup performance on Windows

2016-06-01 Thread Jacob Carlborg via Digitalmars-d

On 2016-05-31 20:44, Bruno Medeiros wrote:


Also, an important follow-up is: could this be affecting other
DMD-generated programs?


I would guess that's possible now, since DMD is written in D and 
built with DMD.


--
/Jacob Carlborg


Re: Dealing with Autodecode

2016-06-01 Thread Walter Bright via Digitalmars-d

On 5/31/2016 11:57 PM, Jacob Carlborg wrote:

The way to deal with it is to replace reliance on autodecode with
.byDchar

Don't you get the same behavior using byDchar as with autodecode?



Yes (except that byDchar returns the replacement char on invalid Unicode, while 
autodecode throws an exception). But the point is that byDchar is opt-in.
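
A minimal sketch of the difference (the invalid byte and the names 
here are ours for illustration, assuming a Phobos with 
std.utf.byDchar):

import std.exception : assertThrown;
import std.range.primitives : front;
import std.utf : byDchar, UTFException;

void main()
{
    string s = "abc\xFF"; // ends in an invalid UTF-8 code unit

    // Autodecoding range primitives throw on invalid UTF:
    auto bad = s[3 .. $];
    assertThrown!UTFException(bad.front);

    // The opt-in byDchar range substitutes U+FFFD instead:
    dchar last;
    foreach (c; s.byDchar)
        last = c;
    assert(last == '\uFFFD');
}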




Re: Dealing with Autodecode

2016-06-01 Thread Guillaume Chatelet via Digitalmars-d

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:

I have a better one, that we discussed on IRC last night:

1) put the string overloads for front and popFront on a version 
switch:


D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
QUALITY WITH A SIMPLE MIGRATION PATH


2) After a while, we swap the version conditions, so opting 
into it preserves the old behavior for a while.


3) A wee bit longer, we exterminate all this autodecoding crap 
and enjoy Phobos being a smaller, more efficient library.


+1


Re: faster splitter

2016-06-01 Thread Chris via Digitalmars-d
On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu 
wrote:


You may want to then try https://dpaste.dzfl.pl/392710b765a9, 
which generates inline code on all compilers. -- Andrei


I've added it as `Andrei3`. It runs faster with dmd, but it's not 
as good with ldc. Seems like ldc performs some extra optimization 
when moving `computeSkip` into the loop, something it doesn't 
bother to do when it's already there.


dmd -O -release -inline -noboundscheck *.d -ofbenchmark.dmd
./benchmark.dmd
Search in Alice in Wonderland
     std: 190 ±4
  manual: 115 ±4
    qznc: 106 ±2
   Chris: 160 ±4
  Andrei: 159 ±4
 Andrei2: 108 ±2
 Andrei3: 100 ±0
Search in random short strings
     std: 222 ±27
  manual: 193 ±49
    qznc: 120 ±12
   Chris: 224 ±57
  Andrei: 114 ±9
 Andrei2: 106 ±5
 Andrei3: 102 ±3
Mismatch in random long strings
     std: 186 ±28
  manual: 206 ±85
    qznc: 118 ±14
   Chris: 265 ±104
  Andrei: 194 ±85
 Andrei2: 116 ±18
 Andrei3: 102 ±4
Search random haystack with random needle
     std: 189 ±38
  manual: 171 ±45
    qznc: 118 ±11
   Chris: 225 ±52
  Andrei: 149 ±32
 Andrei2: 110 ±7
 Andrei3: 103 ±6
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

./benchmark.ldc
Search in Alice in Wonderland
     std: 170 ±1
  manual: 143 ±2
    qznc: 133 ±1
   Chris: 144 ±1
  Andrei: 196 ±10
 Andrei2: 100 ±0
 Andrei3: 111 ±1
Search in random short strings
     std: 223 ±30
  manual: 211 ±51
    qznc: 124 ±12
   Chris: 223 ±61
  Andrei: 115 ±10
 Andrei2: 102 ±3
 Andrei3: 105 ±4
Mismatch in random long strings
     std: 181 ±17
  manual: 253 ±109
    qznc: 146 ±24
   Chris: 248 ±108
  Andrei: 228 ±96
 Andrei2: 101 ±2
 Andrei3: 108 ±6
Search random haystack with random needle
     std: 187 ±22
  manual: 208 ±60
    qznc: 152 ±27
   Chris: 202 ±58
  Andrei: 173 ±35
 Andrei2: 102 ±4
 Andrei3: 110 ±8
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz



Improvement of error messages for failed overloads by attaching custom strings

2016-06-01 Thread poliklosio via Digitalmars-d
I have an idea which some of you may find good. I described it on 
the DMD issue tracker but no one responded, so I'm also posting it 
here.

Link to issue:
https://issues.dlang.org/show_bug.cgi?id=16059

tldr:
Compiler error messages for failed overloads are not helpful 
enough.
Some useful information that the messages **should** have cannot 
be trivially generated from the code.
Extend the language so that a library author can write something 
like this:


pragma(on_overload_resolution_error, "std.conv.toImpl", "<error message>")


Examples of "<error message>" that a library author may choose 
to write:
- "The most common reason for \"toImpl\" overload resolution 
error is using the \"to\" function incorrectly."
- "The \"to\" function is intended to be used for types <summary 
of acceptable types in English language>."

- "Some of the common fixes are: <list of common fixes>."

Does any of this make sense?
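
For comparison, here is roughly what is expressible today without 
the proposed pragma: a fallback overload whose static assert carries 
the custom message. The parse function below is hypothetical, not 
std.conv:

// Accepts string-like input; stand-in implementation.
int parse(S)(S s) if (is(S : const(char)[]))
{
    return cast(int) s.length;
}

// Fallback that fires a library-author-written message when no
// real overload matches.
int parse(S)(S s) if (!is(S : const(char)[]))
{
    static assert(0, "parse expects a string-like argument, not "
        ~ S.stringof ~ ". Convert the value to a string first.");
}

void main()
{
    assert(parse("abc") == 3);
    // parse(42); // error: parse expects a string-like argument, not int. ...
}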


Re: The Case Against Autodecode

2016-06-01 Thread Marc Schütz via Digitalmars-d

On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote:
UTF-8 is an antiquated hack that needs to be eradicated.  It 
forces all other languages than English to be twice as long, 
for no good reason, have fun with that when you're downloading 
text on a 2G connection in the developing world.


I assume you're talking about the web here. In this case, plain 
text makes up only a minor part of the entire traffic, the 
majority of which is images (binary data), javascript and 
stylesheets (almost pure ASCII), and HTML markup (ditto). It's 
likely not significant even without taking compression into 
account, which is ubiquitous.


It is unnecessarily inefficient, which is precisely why 
auto-decoding is a problem.


No, inefficiency is the least of the problems with auto-decoding.


It is only a matter of time till UTF-8 is ditched.


This is ridiculous, even if your other claims were true.



D devs should lead the way in getting rid of the UTF-8 
encoding, not bickering about how to make it more palatable.  I 
suggested a single-byte encoding for most languages, with 
double-byte for the ones which wouldn't fit in a byte.  Use 
some kind of header or other metadata to combine strings of 
different languages, _rather than encoding the language into 
every character!_


I think I remember that post, and - sorry to be so blunt - it was 
one of the worst things I've ever seen proposed regarding text 
encoding.




The common string-handling use case, by far, is strings with 
only one language, with a distant second some substrings in a 
second language, yet here we are putting the overhead into 
every character to allow inserting characters from an arbitrary 
language!  This is madness.


No. The common string-handling use case is code that is unaware 
which script (not language, btw) your text is in.


Re: The Case Against Autodecode

2016-06-01 Thread Marc Schütz via Digitalmars-d
On Tuesday, 31 May 2016 at 20:56:43 UTC, Andrei Alexandrescu 
wrote:
On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d 
wrote:
In the vast majority of cases what folks care about is full character


How are you so sure? -- Andrei


He doesn't need to be sure. You are the one advocating for code 
points, so the burden is on you to present evidence that it's the 
correct choice.


Re: The Case Against Autodecode

2016-06-01 Thread Marc Schütz via Digitalmars-d
On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu 
wrote:
On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d 
wrote:
Wasn't the whole point of operating at the code point level by 
default to make it so that code would be operating on full 
characters by default instead of chopping them up as is so easy 
to do when operating at the code unit level?


The point is to operate on representation-independent entities 
(Unicode code points) instead of low-level 
representation-specific artifacts (code units).


_Both_ are low-level representation-specific artifacts.


Re: Dealing with Autodecode

2016-06-01 Thread Andrea Fontana via Digitalmars-d
On Wednesday, 1 June 2016 at 08:21:36 UTC, Guillaume Chatelet 
wrote:

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:

I have a better one, that we discussed on IRC last night:

1) put the string overloads for front and popFront on a 
version switch:


D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
QUALITY WITH A SIMPLE MIGRATION PATH


2) After a while, we swap the version conditions, so opting 
into it preserves the old behavior for a while.


3) A wee bit longer, we exterminate all this autodecoding crap 
and enjoy Phobos being a smaller, more efficient library.


+1


+1


Re: The Case Against Autodecode

2016-06-01 Thread Marc Schütz via Digitalmars-d
On Wednesday, 1 June 2016 at 01:13:17 UTC, Steven Schveighoffer 
wrote:

On 5/31/16 4:38 PM, Timon Gehr wrote:

What about e.g. joiner?


Compiler error. Better than what it does now.


I believe everything that does only concatenation will work 
correctly. That's why joiner() is one of those algorithms that 
should accept strings directly without going through any decoding 
(but it may need to recode the joining element itself, of course).
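
A small illustration that joining is decode-agnostic (using today's 
joiner; the strings are arbitrary examples):

import std.algorithm.iteration : joiner;
import std.conv : to;

void main()
{
    // Joining only stitches the parts together; the code units pass
    // through unchanged, so no decoding is needed for correctness.
    auto parts = ["für", "Elise"];
    assert(parts.joiner(" ").to!string == "für Elise");
}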


Re: Dealing with Autodecode

2016-06-01 Thread poliklosio via Digitalmars-d

On Wednesday, 1 June 2016 at 05:46:29 UTC, Kirill Kryukov wrote:

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
QUALITY WITH A SIMPLE MIGRATION PATH


This.
(...)
I don't want to become an expert in avoiding language pitfalls 
(The reason I abandoned C++ years ago).


+1
If you have too many pitfalls in the language, it's not easier to 
learn than C++, just different (regardless of the maximum 
productivity you have when using the language, that's another 
issue).
The worst case is you just want to use ASCII text and suddenly 
you have to spend weeks reading a ton of confusing stuff about 
Unicode, D and autodecoding, just to know how to use char[] 
correctly in D.
Compare that to how trivial it is to process ASCII text in, say, 
C++.
And processing just plain ASCII is a very common case, e.g. 
processing textual logs from tools.


Re: Dealing with Autodecode

2016-06-01 Thread Guillaume Piolat via Digitalmars-d

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:

On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:

It is not practical to just delete or deprecate autodecode


Yes, it is.

We need to stop holding on to the mistakes of the past. 9 of 10 
dentists agree that autodecoding is a mistake. Not just WAS a 
mistake, IS a mistake. It has ongoing cost. If we don't fix our 
attitude about these problems, we are going to turn into that 
very demon we despise, yea, even the next C++!




Please, just remove auto-decoding, any way you want. I only ever 
used it once or twice voluntarily. It's a special case that must 
go.

Maybe with a flag like for -vtls.



Re: Dealing with Autodecode

2016-06-01 Thread Seb via Digitalmars-d

On Wednesday, 1 June 2016 at 02:39:55 UTC, Nick Sabalausky wrote:

On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:


version(string_migration)
deprecated void popFront(T)(ref T t) if(isSomeString!T) {
    static assert(0, "this is crap, fix your code.");
}
else
deprecated("use -version=string_migration to fix your buggy code, would you like to know more?")
/* existing popFront here */



I vote we use Adam's exact verbiage, too! :)



D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
QUALITY

WITH A SIMPLE MIGRATION PATH



Yes. This. If I wanted an endless bucket of baggage, I'd have 
stuck with C++.


3) A wee bit longer, we exterminate all this autodecoding crap 
and enjoy

Phobos being a smaller, more efficient library.



Yay! Profit!



How about a poll?

http://www.polljunkie.com/poll/ftmibx/remove-auto-decoding-in-d

Results are shown after casting a vote or here:
http://www.polljunkie.com/poll/aqzbwg/remove-auto-decoding-in-d/view


Re: Dealing with Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 08:46 PM, Walter Bright wrote:

It is not practical to just delete or deprecate autodecode - it is too
embedded into things. What we can do, however, is stop using it
ourselves and stop relying on it in the documentation, much like [] is
eschewed in favor of std::vector in C++.

The way to deal with it is to replace reliance on autodecode with
.byDchar (.byDchar has a bonus of not throwing an exception on invalid
UTF, but using the replacement dchar instead.)

To that end, and this will be an incremental process:

1. Temporarily break autodecode such that using it will cause a compile
error. Then, see what breaks in Phobos and fix those to use .byDchar

2. Change examples in the documentation and the Phobos examples to use
.byDchar

3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit
when dealing with ranges/arrays of characters to make it clear what is
happening.


(Shouldn't those be by!dchar, by!wchar, by!char? byCodeUnit and 
byCodePoint stay as they are.)


4. Rally behind RCStr as the preferred string type of the D language. 
RCStr manages its own memory, is fast, and has the right interface (i.e. 
offers several views for iteration without an implicit one, doesn't 
throw on invalid code points, etc).


This is the key component. We get rid of GC-backed strings, which is 
part of the crucial goal for D we need to achieve, and reap the benefit 
of a better design as a perk. Breaking existing code does not have the 
right benefit for the cost.


Let's keep the eyes on the ball, folks. We want to rid D of the GC. 
That's the prize.



Andrei



Re: faster splitter

2016-06-01 Thread Patrick Schluter via Digitalmars-d

On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote:
There is a special version of find for searching a single char 
in a string. Using a one-letter needle string is more like a 
user mistake than something to optimize for.


At compile time you may not know the length of the needle, like 
in the grep command.





Re: faster splitter

2016-06-01 Thread Seb via Digitalmars-d

On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote:

On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote:
There is a special version of find for searching a single char 
in a string. Using a one-letter needle string is more like a 
user mistake than something to optimize for.


At compile time you may not know the length of the needle, like 
in the grep command.


1) how about a CTFE find?

s.find!(needle, pred)

If we can initialize boyer-moore or KMP at compile time - it 
should be the fastest!


In ndslice such functions are called bifacial:

http://dlang.org/phobos/std_experimental_ndslice_iteration.html

Imho a lot more in std.algorithm should be able to profit from 
facts known at compile-time.


2) Even for a runtime one-letter needle I am pretty sure it's 
worth specializing
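
A sketch of that specialization under today's language (findCT is a 
hypothetical helper, not a Phobos API): dispatch to the cheaper 
single-char scan when the needle is a compile-time constant.

import std.algorithm.searching : find;

auto findCT(string needle, R)(R haystack)
{
    static if (needle.length == 1)
        return haystack.find(needle[0]); // single-char fast path
    else
        return haystack.find(needle);    // general path
}

unittest
{
    assert("haystack".findCT!"y"  == "ystack");
    assert("haystack".findCT!"st" == "stack");
}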


Re: Transient ranges

2016-06-01 Thread Joseph Rushton Wakeling via Digitalmars-d
On Tuesday, 31 May 2016 at 18:31:05 UTC, Steven Schveighoffer 
wrote:

On 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:
On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d 
wrote:
I'd argue that range-based generic code that assumes 
non-transience is inherently buggy, because generic code ought 
not to make any assumptions beyond what the range API guarantees. 
Currently, the range API does not guarantee non-transience, 
therefore code that assumes so is broken by definition. Just 
because they happen to work most of the time does not change the 
fact that they're written wrongly.


Technically, the range API doesn't even require that front return 
the same value every time that it's called, because isInputRange 
can't possibly test for it.


The API doesn't require it mechanically, but the API does 
require it semantically (what does popFront mean if front 
changes automatically?). If front returns different things, I'd 
say that's a bug in your range construction.


The `Generator` range is an eager violator of this requirement:
https://github.com/dlang/phobos/blob/ca292ff78cd825f642eb58d586e2723ba14ae448/std/range/package.d#L3075-L3080

... although I'd agree that's an implementation error.
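
In miniature, the kind of violation being discussed (a deliberately 
broken range of our own, not Phobos code):

// WRONG by the semantics above: front must return the same value
// until popFront is called.
struct UnstableFront
{
    int i;
    @property bool empty() const { return i >= 3; }
    @property int front() { return i++; } // mutates on every read
    void popFront() {}
}

unittest
{
    import std.range.primitives : isInputRange;
    static assert(isInputRange!UnstableFront); // the API can't catch it
}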


Re: faster splitter

2016-06-01 Thread Chris via Digitalmars-d

On Wednesday, 1 June 2016 at 12:41:19 UTC, Seb wrote:
On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter 
wrote:

On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote:
There is a special version of find for searching a single 
char in a string. Using a one-letter needle string is more 
like a user mistake than something to optimize for.


At compile time you may not know the length of the needle, 
like in the grep command.


1) how about a CTFE find?

s.find!(needle, pred)

If we can initialize boyer-moore or KMP at compile time - it 
should be the fastest!


In ndslice such functions are called bifacial:

http://dlang.org/phobos/std_experimental_ndslice_iteration.html

Imho a lot more in std.algorithm should be able to profit from 
facts known at compile-time.


2) Even for a runtime one-letter needle I am pretty sure it's 
worth specializing



That makes sense. I think that std.algorithm needs to be revised 
and optimized for speed. We cannot afford to have suboptimal 
algorithms in there.


Re: Dealing with Autodecode

2016-06-01 Thread Chris via Digitalmars-d
On Wednesday, 1 June 2016 at 12:14:06 UTC, Andrei Alexandrescu 
wrote:

On 05/31/2016 08:46 PM, Walter Bright wrote:

(Shouldn't those be by!dchar, by!wchar, by!char? byCodeUnit and 
byCodePoint stay as they are.)


4. Rally behind RCStr as the preferred string type of the D 
language. RCStr manages its own memory, is fast, and has the 
right interface (i.e. offers several views for iteration 
without an implicit one, doesn't throw on invalid code points, 
etc).


This is the key component. We get rid of GC-backed strings, 
which is part of the crucial goal for D we need to achieve, and 
reap the benefit of a better design as a perk. Breaking 
existing code does not have the right benefit for the cost.


Let's keep the eyes on the ball, folks. We want to rid D of the 
GC. That's the prize.



Andrei


What would the transition look like? How would it affect existing 
code, e.g. `countUntil`, `.length`, etc.?
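
A concrete instance of the question, with today's behavior:

import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

void main()
{
    string s = "café!";
    // countUntil walks autodecoded code points, .length counts code units:
    assert(s.countUntil('!') == 4); // 5th code point
    assert(s.length == 6);          // 'é' takes two UTF-8 code units
    // An explicit code-unit view makes the two agree:
    assert(s.byCodeUnit.countUntil('!') == 5);
}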


Re: faster splitter

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 08:41 AM, Seb wrote:

On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter wrote:

On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote:

There is a special version of find for searching a single char in a
string. Using a one-letter needle string is more like a user mistake
than something to optimize for.


At compile time you may not know the length of the needle, like in the
grep command.


1) how about a CTFE find?

s.find!(needle, pred)

If we can initialize boyer-moore or KMP at compile time - it should be
the fastest!


That would require partial evaluation, which sadly we don't have in D. 
-- Andrei




Re: faster splitter

2016-06-01 Thread Patrick Schluter via Digitalmars-d

On Wednesday, 1 June 2016 at 12:41:19 UTC, Seb wrote:
On Wednesday, 1 June 2016 at 12:14:07 UTC, Patrick Schluter 
wrote:

On Tuesday, 31 May 2016 at 17:54:34 UTC, qznc wrote:
There is a special version of find for searching a single 
char in a string. Using a one-letter needle string is more 
like a user mistake than something to optimize for.


At compile time you may not know the length of the needle, 
like in the grep command.


1) how about a CTFE find?

What I wanted to say is that in real life, the input of the 
search routine is very often run-time, user-provided data. Think 
of search boxes in browsers and apps, command line parameters à 
la grep, etc. The "string" search function should not 
catastrophically break down on special input, like 1-character 
strings, unusual Unicode, or when needle==haystack. I only said 
this so as not to lose focus on what is being attempted here.
It's a common danger of micro-optimization and unit-test-focused 
development that a lot of time and effort is spent on 
improvements that are completely irrelevant to what is really 
needed in the real world (i.e. we're deep in bike-shed territory 
here).


Re: The Case Against Autodecode

2016-06-01 Thread Joakim via Digitalmars-d

On Wednesday, 1 June 2016 at 10:04:42 UTC, Marc Schütz wrote:

On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote:
UTF-8 is an antiquated hack that needs to be eradicated.  It 
forces all other languages than English to be twice as long, 
for no good reason, have fun with that when you're downloading 
text on a 2G connection in the developing world.


I assume you're talking about the web here. In this case, plain 
text makes up only a minor part of the entire traffic, the 
majority of which is images (binary data), javascript and 
stylesheets (almost pure ASCII), and HTML markup (ditto). It's 
likely not significant even without taking compression into 
account, which is ubiquitous.


No, I explicitly said not the web in a subsequent post.  The 
ignorance here of what 2G speeds are like is mind-boggling.


It is unnecessarily inefficient, which is precisely why 
auto-decoding is a problem.


No, inefficiency is the least of the problems with 
auto-decoding.


Right... that's why this 200-post thread was spawned with that as 
the main reason.



It is only a matter of time till UTF-8 is ditched.


This is ridiculous, even if your other claims were true.


The UTF-8 encoding is what's ridiculous.



D devs should lead the way in getting rid of the UTF-8 
encoding, not bickering about how to make it more palatable.  
I suggested a single-byte encoding for most languages, with 
double-byte for the ones which wouldn't fit in a byte.  Use 
some kind of header or other metadata to combine strings of 
different languages, _rather than encoding the language into 
every character!_


I think I remember that post, and - sorry to be so blunt - it 
was one of the worst things I've ever seen proposed regarding 
text encoding.


Well, when you _like_ a ludicrous encoding like UTF-8, not sure 
your opinion matters.




The common string-handling use case, by far, is strings with 
only one language, with a distant second some substrings in a 
second language, yet here we are putting the overhead into 
every character to allow inserting characters from an 
arbitrary language!  This is madness.


No. The common string-handling use case is code that is unaware 
which script (not language, btw) your text is in.


Lol, this may be the dumbest argument put forth yet.

I don't think anyone here even understands what a good encoding 
is and what it's for, which is why there's no point in debating 
this.


Re: Transient ranges

2016-06-01 Thread Patrick Schluter via Digitalmars-d
On Tuesday, 31 May 2016 at 12:42:23 UTC, Steven Schveighoffer 
wrote:



There are 2 main issues with FILE *:

1) it does not provide buffer access, so you must rely on 
things like getline if they exist. But these have their own 
problems (i.e. do not support unicode, require C-malloc'd 
buffer)


You can cheat by using setvbuf() and imposing your own buffer on 
the FILE* routines. What the underlying implementation puts in 
that buffer, and how, is of course not documented, but it is not 
very difficult to guess (for example, fseek()/fread() will always 
fill the buffer from 0 to the end of the buffer or the file, 
whichever comes first).
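
A minimal sketch of that trick from D (the file name is a 
placeholder, and peeking into the buffer relies on undocumented 
behavior, as noted):

import core.stdc.stdio : FILE, fopen, fclose, fread, setvbuf, _IOFBF;

void main()
{
    FILE* f = fopen("data.bin", "rb");
    if (f is null) return;
    scope(exit) fclose(f);

    // Impose our own buffer on the FILE*; stdio now fills `buf`.
    static char[64 * 1024] buf;
    setvbuf(f, buf.ptr, _IOFBF, buf.length);

    char[1] probe;
    fread(probe.ptr, 1, 1, f); // triggers a buffered read into `buf`
}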


Re: Dealing with Autodecode

2016-06-01 Thread Jack Stouffer via Digitalmars-d

On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:

On 5/31/2016 6:36 PM, Adam D. Ruppe wrote:
Our preliminary investigation found about 130 places in Phobos 
that need to be changed. That's not hard to fix!


PRs please!


https://github.com/dlang/phobos/pull/4322


Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 06:25 AM, Marc Schütz wrote:

On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote:

On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote:

Wasn't the whole point of operating at the code point level by
default to make it so that code would be operating on full
characters by default instead of chopping them up as is so easy to
do when operating at the code unit level?


The point is to operate on representation-independent entities
(Unicode code points) instead of low-level representation-specific
artifacts (code units).


_Both_ are low-level representation-specific artifacts.


Maybe this is a misunderstanding. Representation = how things are laid 
out in memory. What does associating numbers with various Unicode 
symbols have to do with representation? -- Andrei




Re: faster splitter

2016-06-01 Thread Chris via Digitalmars-d

On Wednesday, 1 June 2016 at 13:47:10 UTC, Patrick Schluter wrote:


What I wanted to say is that in real life, the input of the 
search routine is very often run-time, user-provided data. Think 
of search boxes in browsers and apps, command line parameters à 
la grep, etc. The "string" search function should not 
catastrophically break down on special input, like 1-character 
strings, unusual Unicode, or when needle==haystack. I only said 
this so as not to lose focus on what is being attempted here.
It's a common danger of micro-optimization and unit-test-focused 
development that a lot of time and effort is spent on 
improvements that are completely irrelevant to what is really 
needed in the real world (i.e. we're deep in bike-shed territory 
here).


That's true. We should try to optimize for the worst case, i.e. 
random input. However, it is often the case that the needle is 
known at compile time, e.g. things like


if (input.canFind("foo"))

There might be ways to optimize those cases at compile time.


Re: Code security: "auto" / Reason for errors

2016-06-01 Thread John Nixon via Digitalmars-d
On Wednesday, 2 March 2016 at 21:37:56 UTC, Steven Schveighoffer 
wrote:


Pointer copying is inherent in D. Everything is done at the 
"head", deep copies are never implicit. This is a C-like 
language, so one must expect this kind of behavior and plan for 
it.


I sympathise with Ozan. What is the best reference you know that 
explains this fully?


Clearly from your comments, we have lost the argument as far as D 
is concerned. This leads me to ask whether a computer language 
has been considered that is similar to D, except that all 
variables of any type are treated as objects that own their own 
data. I would like to suggest the following:

1. Assignment would imply full (deep) copies
2. “dup” functions would not need to exist
3. I think the const system could be much simpler, perhaps more 
like it is in C++
4. Function parameters would be passed by reference by default 
(to avoid unnecessary copying, but with a reliable const system)


I realise that copying large objects might then happen in 
erroneous code when not intended, but it would be easy for the 
programmer to diagnose this.


I raised a similar issue with the following program, which was 
being discussed in the Learn forum for D and works correctly. 
Adding another writeln statement shows that the second line of 
test_fun calls CS.this first and then CS.opAssign. An alternative 
version using a .dup function also works.


import std.stdio;

struct CS {
    char[] t;
    this(const CS rhs) {
        this = rhs;
    }
    CS opAssign(const CS rhs) {
        writeln("CS.opAssign called");
        this.t = rhs.t.dup;
        return this;
    }
}

void test_fun(const ref CS rhs) {
    auto cs = CS(rhs);
    writeln("cs = ", cs);
}

void main() {
    CS rhs;
    rhs.t = "string".dup;
    test_fun(rhs);
    return;
}

I wanted to be able to write instead simply:

struct CS {
    char[] t;
}

void test_fun(const ref CS rhs) {
    auto cs = rhs;
    writeln("cs = ", cs);
}

void main() {
    CS rhs;
    rhs.t = "string".dup;
    test_fun(rhs);
    return;
}

i.e. the functions this and opAssign aren’t needed, and the 
simple definition/assignment “auto cs = rhs;” would carry out the 
full copying of the CS object (as is the case for the simplest of 
D’s fundamental types). This would guarantee that rhs could not 
be changed in test_fun and allow it to be declared const. It 
seems to me there would be great advantages in the simpler syntax 
(especially if CS had many other members).


I think shallow copying at any level should not be the default, 
especially as, in general, there would be a level of copying that 
needs specification, i.e. how many pointers to dereference, e.g. 
if some elements were obtained by multiple indirection, such as 
an array of arrays of char (char[][]) where each array is dynamic.
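
For what it's worth, D can already opt into deep-copy semantics per 
type with a postblit; a minimal sketch (const-to-mutable copies have 
extra wrinkles, elided here):

import std.stdio;

struct CS {
    char[] t;
    this(this) { t = t.dup; } // postblit: deep-copies after every copy
}

void main() {
    CS rhs;
    rhs.t = "string".dup;
    auto cs = rhs;   // postblit runs; cs.t is an independent copy
    cs.t[0] = 'S';
    writeln(rhs.t);  // string
    writeln(cs.t);   // String
}

With this, "auto cs = rhs;" performs the full copy without a 
hand-written this or opAssign.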


[OT] Effect of UTF-8 on 2G connections

2016-06-01 Thread Marco Leise via Digitalmars-d
On Wed, 01 Jun 2016 13:57:27 +0000, Joakim wrote:

> No, I explicitly said not the web in a subsequent post.  The 
> ignorance here of what 2G speeds are like is mind-boggling.

I've used 56k and had a phone conversation with my sister
while she was downloading a 800 MiB file over 2G. You just
learn to be patient (or you already are when the next major
city is hundreds of kilometers away) and load only what you
need. Your point about the costs convinced me more.

Here is one article spiced up with numbers and figures:
http://www.thequint.com/technology/2016/05/30/almost-every-indian-may-be-online-if-data-cost-cut-to-one-third

But even if you could prove with a study that UTF-8 caused a
notable bandwidth cost in real life, it would - I think - be a
matter for regional ISPs to provide special servers and apps
that reduce data volume. There is also the overhead of
key exchange when establishing a secure connection:
http://stackoverflow.com/a/20306907/4038614
Something every app should do, but will increase bandwidth use.
Then there is the overhead of using XML in applications
like WhatsApp, which I presume is quite popular around the
world. I'm just trying to broaden the view a bit here.
This note from the XMPP that WhatsApp and Jabber use will make
you cringe: https://tools.ietf.org/html/rfc6120#section-11.6

-- 
Marco



[OT] The Case Against... Unicode?

2016-06-01 Thread Wyatt via Digitalmars-d

On Wednesday, 1 June 2016 at 13:57:27 UTC, Joakim wrote:


No, I explicitly said not the web in a subsequent post.  The 
ignorance here of what 2G speeds are like is mind-boggling.


It's not hard.  I think a lot of us remember when a 14.4 modem 
was cutting-edge.  Codepages and incompatible encodings were 
terrible then, too.


Never again.

Well, when you _like_ a ludicrous encoding like UTF-8, not sure 
your opinion matters.


It _is_ kind of ludicrous, isn't it?  But it really is the 
least-bad option for the most text.  Sorry, bub.


No. The common string-handling use case is code that is 
unaware which script (not language, btw) your text is in.


Lol, this may be the dumbest argument put forth yet.


This just makes it feel like you're trolling.  You're not just 
trolling, right?


I don't think anyone here even understands what a good encoding 
is and what it's for, which is why there's no point in debating 
this.


And I don't think you realise how backwards you sound to people 
who had to live through the character encoding hell of the past.  
This has been an ongoing headache for the better part of a 
century (it still comes up in old files, sites, and systems) and 
you're literally the only person I've ever seen seriously suggest 
we turn back now that the madness has been somewhat tamed.


If you have to deal with delivering the fastest possible i18n at 
GSM data rates, well, that's a tough problem and it sounds like 
you might need to do something pretty special. Turning the entire 
ecosystem into your special case is not the answer.


-Wyatt


Re: Dealing with Autodecode

2016-06-01 Thread Jack Stouffer via Digitalmars-d

On Wednesday, 1 June 2016 at 02:28:04 UTC, Jonathan M Davis wrote:
The other critical thing is to make sure that Phobos in general 
works with byDChar, byCodeUnit, etc. For instance, pretty much 
as soon as I started trying to use byCodeUnit instead of naked 
strings, I ran into this:


https://issues.dlang.org/show_bug.cgi?id=15800


https://github.com/dlang/phobos/pull/4390




Re: Dealing with Autodecode

2016-06-01 Thread Kagamin via Digitalmars-d

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:

version(string_migration)
deprecated void popFront(T)(ref T t) if(isSomeString!T) {
  static assert(0, "this is crap, fix your code.");
}
else
deprecated("use -versionstring_migration to fix your buggy 
code, would you like to know more?")

/* existing popFront here */


version(autodecode_migration)
deprecated("autodecode attempted, use byDchar instead")
alias popFront=_d_popFront;
else
alias popFront=_d_popFront;

void _d_popFront(T)(ref T t) if(isSomeString!T) {
    /* existing popFront here */
}

The migration branch should compile and work, or the template 
constraints will silently fail. Then deprecation messages can be 
grepped. That said, does the compiler print deprecation messages 
triggered inside template constraints?


Re: Dealing with Autodecode

2016-06-01 Thread Jonathan M Davis via Digitalmars-d
On Wednesday, June 01, 2016 08:14:06 Andrei Alexandrescu via Digitalmars-d 
wrote:
> 4. Rally behind RCStr as the preferred string type of the D language.
> RCStr manages its own memory, is fast, and has the right interface (i.e.
> offers several views for iteration without an implicit one, doesn't
> throw on invalid code points, etc).
>
> This is the key component. We get rid of GC-backed strings, which is
> part of the crucial goal for D we need to achieve, and reap the benefit
> of a better design as a perk. Breaking existing code does not have the
> right benefit for the cost.
>
> Let's keep the eyes on the ball, folks. We want to rid D of the GC.
> That's the prize.

Since when has it been the goal to get rid of GC-allocated strings? We
definitely want an alternative to GC-allocated strings for code that can't
afford to use the GC, but auto-decoding issues aside, why would I want to
use RCString instead of string if the GC isn't a problem for my program?
Walter pointed out at dconf that using a GC is often faster than reference
counting; it's just that it can incur a large cost at once when a collection
is run, whereas the cost of ref-counting is amortized across the time that
the program is running.

I expect that RCString will be very important for us going forward, but I
don't see much reason to use it as the default string type in code over just
using string except for the fact that we have the auto-decoding mess to deal
with. It seems more like RCString is an optimization for certain types of
programs than what you'd want to use by default.

- Jonathan M Davis



Re: Dealing with Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:

It seems more like RCString is an optimization for certain types of
programs than what you'd want to use by default.


You'll always want to use it. The small string optimization will make it 
compelling for all applications. -- Andrei
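
RCStr's actual layout wasn't public at this point; for the idea of a 
small string optimization, an illustrative sketch of our own:

// Illustrative only -- not RCStr's design. Strings short enough to
// fit inline never touch the heap or the reference count.
struct SSOString
{
    private union
    {
        struct { char* ptr; size_t len; } // heap-backed view
        char[16] small;                   // inline storage
    }
    private ubyte tag; // discriminates inline vs heap
}

Copying a short string is then a plain bit copy: no allocation and 
no refcount traffic, which is what makes the optimization compelling.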


Re: Code security: "auto" / Reason for errors

2016-06-01 Thread deadalnix via Digitalmars-d

On Wednesday, 2 March 2016 at 19:42:02 UTC, Ozan wrote:

Hi

I despair of "auto var1 = var2" for arrays. Isn't it an open door 
for errors? Example:


import std.stdio;

void main()
{
    int[] a;
    foreach(i; 0..10) a ~= i;
    auto b = a; // correct dlang coding: auto b = a.dup;

    a[2] = 1;
    b[2] = 5;   // Overwrites assignment before
    writeln(a);
    writeln(b); // Always a == b but developer would like to have (a != b)
}

The behaviour is different from other non-container datatypes.
At first view it looks like a data copy, but it's only a pointer 
copy.


Regards, Ozan


Everything behaves as designed, auto changes nothing in the 
example and there is no security concern.


We have a bingo.



Re: Code security: "auto" / Reason for errors

2016-06-01 Thread Kagamin via Digitalmars-d

On Wednesday, 1 June 2016 at 14:52:29 UTC, John Nixon wrote:
Clearly from your comments, we have lost the argument as far as 
D is concerned. This leads me to ask whether a computer language 
has been considered that is similar to D, except that all 
variables of any type are treated as objects that own their own 
data. I would like to suggest the following:

1. Assignment would imply full (deep) copies
2. “dup” functions would not need to exist
3. I think the const system could be much simpler, perhaps more 
like it is in C++
4. Function parameters would be passed by reference by default 
(to avoid unnecessary copying, but with a reliable const system)


Value-type containers are planned for phobos, but not done yet. 
You can try 
https://github.com/economicmodeling/containers/blob/master/src/containers/dynamicarray.d - it's not copyable currently.


Re: [OT] The Case Against... Unicode?

2016-06-01 Thread Patrick Schluter via Digitalmars-d

On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:

On Wednesday, 1 June 2016 at 13:57:27 UTC, Joakim wrote:


No, I explicitly said not the web in a subsequent post.  The 
ignorance here of what 2G speeds are like is mind-boggling.


It's not hard.  I think a lot of us remember when a 14.4 modem 
was cutting-edge.  Codepages and incompatible encodings were 
terrible then, too.


Never again.

Well, when you _like_ a ludicrous encoding like UTF-8, not 
sure your opinion matters.


It _is_ kind of ludicrous, isn't it?  But it really is the 
least-bad option for the most text.  Sorry, bub.


No. The common string-handling use case is code that is 
unaware which script (not language, btw) your text is in.


Lol, this may be the dumbest argument put forth yet.


This just makes it feel like you're trolling.  You're not just 
trolling, right?


I don't think anyone here even understands what a good 
encoding is and what it's for, which is why there's no point 
in debating this.


And I don't think you realise how backwards you sound to people 
who had to live through the character encoding hell of the 
past.  This has been an ongoing headache for the better part of 
a century (it still comes up in old files, sites, and systems) 
and you're literally the only person I've ever seen seriously 
suggest we turn back now that the madness has been somewhat 
tamed.


Indeed, Joakim's proposal is so insane it beggars belief (why not 
go back to Baudot encoding, it's only 5 bits, hurray, it's so much 
faster when used with flag semaphores).


As a programmer in the European Commission translation unit, 
working for 14 years on what is probably the biggest translation 
memory in the world, I can attest that Unicode is a blessing. I 
remember the shit we had in our documents because of the code 
pages, before most programs could handle UTF-8 or UTF-16 (and 
before 2004 we only had 2 alphabets to take care of, Western and 
Greek). What Joakim does not understand is that there are huge, 
huge quantities of documents that are multilingual. Translators, 
of course, deal almost exclusively with at least bi-lingual 
documents. Any document encountered by a translator must at least 
be able to present the source and the target language. But even 
outside of that specific population, multilingual documents are 
very, very common.




If you have to deal with delivering the fastest possible i18n 
at GSM data rates, well, that's a tough problem and it sounds 
like you might need to do something pretty special. Turning the 
entire ecosystem into your special case is not the answer.








Re: [OT] The Case Against... Unicode?

2016-06-01 Thread deadalnix via Digitalmars-d

On Wednesday, 1 June 2016 at 16:15:15 UTC, Patrick Schluter wrote:
What Joakim does not understand, is that there are huge, huge 
quantities of documents that are multi-lingual.


That should be obvious to anyone living outside the USA.



Re: The Case Against Autodecode

2016-06-01 Thread Nick Sabalausky via Digitalmars-d

On 06/01/2016 10:29 AM, Andrei Alexandrescu wrote:

On 06/01/2016 06:25 AM, Marc Schütz wrote:

On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote:


The point is to operate on representation-independent entities
(Unicode code points) instead of low-level representation-specific
artifacts (code units).


_Both_ are low-level representation-specific artifacts.


Maybe this is a misunderstanding. Representation = how things are laid
out in memory. What does associating numbers with various Unicode
symbols have to do with representation? -- Andrei



As has been explained countless times already, code points are a non-1:1 
internal representation of graphemes. Code points don't exist for their 
own sake, their entire existence is purely as a way to encode graphemes. 
Whether that technically qualifies as "memory representation" or not is 
irrelevant: it's still a low-level implementation detail of text.




Re: [OT] Effect of UTF-8 on 2G connections

2016-06-01 Thread Joakim via Digitalmars-d

On Wednesday, 1 June 2016 at 14:58:47 UTC, Marco Leise wrote:

On Wed, 01 Jun 2016 13:57:27 +0000, Joakim wrote:

No, I explicitly said not the web in a subsequent post.  The 
ignorance here of what 2G speeds are like is mind-boggling.


I've used 56k and had a phone conversation with my sister while 
she was downloading a 800 MiB file over 2G. You just learn to 
be patient (or you already are when the next major city is 
hundreds of kilometers away) and load only what you need. Your 
point about the costs convinced me more.


I see that max 2G speeds are 100-200 kbits/s.  At that rate, it 
would have taken her more than 10 hours to download such a large 
file, that's nuts.  The worst part is when the download gets 
interrupted and you have to start over again because most 
download managers don't know how to resume, including the stock 
one on Android.


Also, people in these countries buy packs of around 100-200 MB 
for 30-60 US cents, so they would never download such a large 
file.  They use messaging apps like Whatsapp or WeChat, which 
nobody in the US uses, to avoid onerous SMS charges.


Here is one article spiced up with numbers and figures: 
http://www.thequint.com/technology/2016/05/30/almost-every-indian-may-be-online-if-data-cost-cut-to-one-third


Yes, only the middle class, which are at most 10-30% of the 
population in these developing countries, can even afford 2G.  
The way to get costs down even further is to make the tech as 
efficient as possible.  Of course, much of the rest of the 
population are illiterate, so there are bigger problems there.



But even if you could prove with a study that UTF-8 caused a
notable bandwith cost in real life, it would - I think - be a
matter of regional ISPs to provide special servers and apps
that reduce data volume.


Yes, by ditching UTF-8.


There is also the overhead of
key exchange when establishing a secure connection:
http://stackoverflow.com/a/20306907/4038614
Something every app should do, but will increase bandwidth use.


That's not going to happen, even HTTP/2 ditched that requirement. 
 Also, many of those countries' govts will not allow it: google 
how Blackberry had to give up their keys for "secure" BBM in many 
countries.  It's not just Canada and the US spying on their 
citizens.



Then there is the overhead of using XML in applications
like WhatsApp, which I presume is quite popular around the
world. I'm just trying to broaden the view a bit here.


I didn't know they used XML.  Googling it now, I see mention that 
they switched to an "internally developed protocol" at some 
point, so I doubt they're using XML now.



This note from the XMPP that WhatsApp and Jabber use will make
you cringe: https://tools.ietf.org/html/rfc6120#section-11.6


Haha, no wonder Jabber is dead. :) I jumped on Jabber for my own 
messages a decade ago, as it seemed like an open way out of that 
proprietary messaging mess, then I read that they're using XML 
and gave up on it.


On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:

On Wednesday, 1 June 2016 at 13:57:27 UTC, Joakim wrote:


No, I explicitly said not the web in a subsequent post.  The 
ignorance here of what 2G speeds are like is mind-boggling.


It's not hard.  I think a lot of us remember when a 14.4 modem 
was cutting-edge.


Well, then apparently you're unaware of how bloated web pages are 
nowadays.  It used to take me minutes to download popular web 
pages _back then_ at _top speed_, and those pages were a _lot_ 
smaller.



Codepages and incompatible encodings were terrible then, too.

Never again.


This only shows you probably don't know the difference between an 
encoding and a code page, which are orthogonal concepts in 
Unicode.  It's not surprising, as Walter and many others 
responding show the same ignorance.  I explained this repeatedly 
in the previous thread, but it depends on understanding the tech, 
and I can't spoon-feed that to everyone.


Well, when you _like_ a ludicrous encoding like UTF-8, not 
sure your opinion matters.


It _is_ kind of ludicrous, isn't it?  But it really is the 
least-bad option for the most text.  Sorry, bub.


I think we can do a lot better.

No. The common string-handling use case is code that is 
unaware which script (not language, btw) your text is in.


Lol, this may be the dumbest argument put forth yet.


This just makes it feel like you're trolling.  You're not just 
trolling, right?


Are you trolling?  Because I was just calling it like it is.

The vast majority of software is written for _one_ language, the 
local one.  You may think otherwise because the software that 
sells the most and makes the most money is internationalized 
software like Windows or iOS, because it can be resold into many 
markets.  But as a percentage of lines of code written, such 
international code is almost nothing.


I don't think anyone here even understands what a good 
encoding is and what it's for, which is why there's no point 
in debating this

Re: [OT] The Case Against... Unicode?

2016-06-01 Thread Kagamin via Digitalmars-d

On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:
If you have to deal with delivering the fastest possible i18n 
at GSM data rates, well, that's a tough problem and it sounds 
like you might need to do something pretty special. Turning the 
entire ecosystem into your special case is not the answer.


UTF-8 encoded SMS work fine for me in a GSM network; I didn't 
notice any problems.


Re: [OT] The Case Against... Unicode?

2016-06-01 Thread Nick Sabalausky via Digitalmars-d

On 06/01/2016 12:26 PM, deadalnix wrote:

On Wednesday, 1 June 2016 at 16:15:15 UTC, Patrick Schluter wrote:

What Joakim does not understand, is that there are huge, huge
quantities of documents that are multi-lingual.


That should be obvious to anyone living outside the USA.



Or anyone in the USA who's ever touched a product that includes a manual 
or a safety warning, or gone to high school (a foreign language class is 
pretty much universally mandatory, even in the US).




Re: [OT] The Case Against... Unicode?

2016-06-01 Thread Kagamin via Digitalmars-d

On Wednesday, 1 June 2016 at 16:26:36 UTC, deadalnix wrote:
On Wednesday, 1 June 2016 at 16:15:15 UTC, Patrick Schluter 
wrote:
What Joakim does not understand, is that there are huge, huge 
quantities of documents that are multi-lingual.


That should be obvious to anyone living outside the USA.


https://msdn.microsoft.com/th-th inside too :)


Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 12:41 PM, Nick Sabalausky wrote:

As has been explained countless times already, code points are a non-1:1
internal representation of graphemes. Code points don't exist for their
own sake, their entire existence is purely as a way to encode graphemes.


Of course, thank you.


Whether that technically qualifies as "memory representation" or not is
irrelevant: it's still a low-level implementation detail of text.


The relevance is meandering across the discussion, and it's good to have 
the same definitions for terms. Unicode code points are abstract notions 
with meanings attached to them, whereas UTF8/16/32 are concerned with 
their representation.



Andrei



Re: The Case Against Autodecode

2016-06-01 Thread ZombineDev via Digitalmars-d
On Tuesday, 31 May 2016 at 19:33:03 UTC, Andrei Alexandrescu 
wrote:

On 05/31/2016 02:46 PM, Timon Gehr wrote:

On 31.05.2016 20:30, Andrei Alexandrescu wrote:

D's


Phobos'


foreach, too. -- Andrei


Incorrect. https://dpaste.dzfl.pl/ba7a65d59534


Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 01:35 PM, ZombineDev wrote:

On Tuesday, 31 May 2016 at 19:33:03 UTC, Andrei Alexandrescu wrote:

On 05/31/2016 02:46 PM, Timon Gehr wrote:

On 31.05.2016 20:30, Andrei Alexandrescu wrote:

D's


Phobos'


foreach, too. -- Andrei


Incorrect. https://dpaste.dzfl.pl/ba7a65d59534


Try typing the iteration variable with "dchar". -- Andrei


Re: The Case Against Autodecode

2016-06-01 Thread Adam D. Ruppe via Digitalmars-d
On Wednesday, 1 June 2016 at 17:57:15 UTC, Andrei Alexandrescu 
wrote:

Try typing the iteration variable with "dchar". -- Andrei


Or you can type it as wchar...

But important to note: that's opt in, not automatic.


Re: [OT] Effect of UTF-8 on 2G connections

2016-06-01 Thread Wyatt via Digitalmars-d

On Wednesday, 1 June 2016 at 16:45:04 UTC, Joakim wrote:

On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:
It's not hard.  I think a lot of us remember when a 14.4 modem 
was cutting-edge.


Well, then apparently you're unaware of how bloated web pages 
are nowadays.  It used to take me minutes to download popular 
web pages _back then_ at _top speed_, and those pages were a 
_lot_ smaller.


It's telling that you think the encoding of the text is anything 
but the tiniest fraction of the problem.  You should look at 
where the actual weight of a "modern" web page comes from.



Codepages and incompatible encodings were terrible then, too.

Never again.


This only shows you probably don't know the difference between 
an encoding and a code page,


"I suggested a single-byte encoding for most languages, with 
double-byte for the ones which wouldn't fit in a byte. Use some 
kind of header or other metadata to combine strings of different 
languages, _rather than encoding the language into every 
character!_"


Yeah, that?  That's codepages.  And your exact proposal to put 
encodings in the header was ALSO tried around the time that 
Unicode was getting hashed out.  It sucked.  A lot.  (Not as bad 
as storing it in the directory metadata, though.)


Well, when you _like_ a ludicrous encoding like UTF-8, not 
sure your opinion matters.


It _is_ kind of ludicrous, isn't it?  But it really is the 
least-bad option for the most text.  Sorry, bub.


I think we can do a lot better.


Maybe.  But no one's done it yet.

The vast majority of software is written for _one_ language, 
the local one.  You may think otherwise because the software 
that sells the most and makes the most money is 
internationalized software like Windows or iOS, because it can 
be resold into many markets.  But as a percentage of lines of 
code written, such international code is almost nothing.


I'm surprised you think this even matters after talking about web 
pages.  The browser is your most common string processing 
situation.  Nothing else even comes close.


largely ignoring the possibilities of the header scheme I 
suggested.


"Possibilities" that were considered and discarded decades ago by 
people with way better credentials.  The era of single-byte 
encodings is gone, it won't come back, and good riddance to bad 
rubbish.


I could call that "trolling" by all of you, :) but I'll instead 
call it what it likely is, reactionary thinking, and move on.


It's not trolling to call you out for clearly not doing your 
homework.



I don't think you understand: _you_ are the special case.


Oh, I understand perfectly.  _We_ (whoever "we" are) can handle 
any sequence of glyphs and combining characters (correctly-formed 
or not) in any language at any time, so we're the special case...?


Yeah, it sounds funny to me, too.

The 5 billion people outside the US and EU are _not the special 
case_.


Fortunately, it works for them too.

The problem is all the rest, and those just below who cannot 
afford it at all, in part because the tech is not as efficient 
as it could be yet.  Ditching UTF-8 will be one way to make it 
more efficient.


All right, now you've found the special case; the case where the 
generic, unambiguous encoding may need to be lowered to something 
else: people for whom that encoding is suboptimal because of 
_current_ network constraints.


I fully acknowledge it's a couple billion people and that's 
nothing to sneeze at, but I also see that it's a situation that 
will become less relevant over time.


-Wyatt


Re: The Case Against Autodecode

2016-06-01 Thread ZombineDev via Digitalmars-d
On Wednesday, 1 June 2016 at 17:57:15 UTC, Andrei Alexandrescu 
wrote:

On 06/01/2016 01:35 PM, ZombineDev wrote:
On Tuesday, 31 May 2016 at 19:33:03 UTC, Andrei Alexandrescu 
wrote:

On 05/31/2016 02:46 PM, Timon Gehr wrote:

On 31.05.2016 20:30, Andrei Alexandrescu wrote:

D's


Phobos'


foreach, too. -- Andrei


Incorrect. https://dpaste.dzfl.pl/ba7a65d59534


Try typing the iteration variable with "dchar". -- Andrei


I think you are not getting my point. This is not autodecoding. 
There is nothing auto-magic w.r.t. strings in plain foreach. 
Typing char, wchar or dchar is the same as using byChar, byWchar 
or byDchar - it is opt-in. The only problems are the front, empty 
and popFront overloads for narrow strings.


Re: The Case Against Autodecode

2016-06-01 Thread ZombineDev via Digitalmars-d

On Wednesday, 1 June 2016 at 19:07:26 UTC, ZombineDev wrote:
On Wednesday, 1 June 2016 at 17:57:15 UTC, Andrei Alexandrescu 
wrote:

On 06/01/2016 01:35 PM, ZombineDev wrote:
On Tuesday, 31 May 2016 at 19:33:03 UTC, Andrei Alexandrescu 
wrote:

On 05/31/2016 02:46 PM, Timon Gehr wrote:

On 31.05.2016 20:30, Andrei Alexandrescu wrote:

D's


Phobos'


foreach, too. -- Andrei


Incorrect. https://dpaste.dzfl.pl/ba7a65d59534


Try typing the iteration variable with "dchar". -- Andrei


I think you are not getting my point. This is not autodecoding. 
There is nothing auto-magic w.r.t. strings in plain foreach. 
Typing char, wchar or dchar is the same as using byChar, byWchar 
or byDchar - it is opt-in. The only problems are the front, 
empty and popFront overloads for narrow strings...


in std.range.primitives.




Re: Interest in Paris area D meetups?

2016-06-01 Thread Claude via Digitalmars-d
On Wednesday, 18 May 2016 at 15:05:21 UTC, Guillaume Chatelet 
wrote:

I got inspired by Steven's thread :)
Anyone in Paris interested in D meetups?


Sorry for the late reply, but yes, I'd be interested in a meetup 
in Paris. Anyone else?


Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 03:07 PM, ZombineDev wrote:

This is not autodecoding. There is nothing auto-magic w.r.t. strings in
plain foreach.


I understand where you're coming from, but it actually is autodecoding. 
Consider:


byte[] a;
foreach (byte x; a) {}
foreach (short x; a) {}
foreach (int x; a) {}

Those work by means of integral conversions (byte to short, byte to int). However:

char[] a;
foreach (char x; a) {}
foreach (wchar x; a) {}
foreach (dchar x; a) {}

The latter two do autodecoding, not conversion like the rest of the language.
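
A short demonstration of the asymmetry (the string is an arbitrary 
example):

void main()
{
    char[] a = "héllo".dup; // 6 UTF-8 code units, 5 code points

    size_t units, points;
    foreach (char  x; a) ++units;  // plain element access
    foreach (dchar x; a) ++points; // decodes on the fly
    assert(units == 6 && points == 5);
}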


Andrei



Re: Dealing with Autodecode

2016-06-01 Thread Timon Gehr via Digitalmars-d

On 01.06.2016 17:30, Andrei Alexandrescu wrote:

On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:

It seems more like RCString is an optimization for certain types of
programs than what you'd want to use by default.


You'll always want to use it. The small string optimization will make it
compelling for all applications. -- Andrei



- Why is it dependent on the allocation strategy or on the type of the data?

- It seems to be a pessimization if I'm taking a lot of small slices.

- It is undesirable if I later want to reference-compare those slices.


Re: Interest in Paris area D meetups?

2016-06-01 Thread Guillaume Chatelet via Digitalmars-d

On Wednesday, 1 June 2016 at 19:25:13 UTC, Claude wrote:
On Wednesday, 18 May 2016 at 15:05:21 UTC, Guillaume Chatelet 
wrote:

I got inspired by Steven's thread :)
Anyone in Paris interested in D meetups?


Sorry for the late reply, but yes, I'd be interested in a 
meetup in Paris. Anyone else?


Two sounds like a good start ^__^
We can start with a beer somewhere :)


Re: Dealing with Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 04:28 PM, Timon Gehr wrote:

On 01.06.2016 17:30, Andrei Alexandrescu wrote:

On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:

It seems more like RCString is an optimization for certain types of
programs than what you'd want to use by default.


You'll always want to use it. The small string optimization will make it
compelling for all applications. -- Andrei



- Why is it dependent on the allocation strategy or on the type of the
data?


Not getting this.


- It seems to be a pessimization if I'm taking a lot of small slices.


I agree cases can be created in which straight arrays sometimes do 
better. They are few and far between - for strings, the small string 
optimization is the one to live by.
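
For the record, the idea in a nutshell - a hypothetical sketch, not 
the actual RCString layout: short strings live inside the struct 
itself, so they need no allocation and no indirection.

struct SSOString
{
    private static struct Large
    {
        immutable(char)* ptr;
        size_t len;
    }
    private union
    {
        Large large;
        char[Large.sizeof] small; // reuse the same bytes in-situ
    }
    private ubyte smallLen = 0xFF; // 0xFF flags the heap layout

    this(string s)
    {
        if (s.length < small.length)
        {
            small[0 .. s.length] = s[];
            smallLen = cast(ubyte) s.length;
        }
        else
            large = Large(s.ptr, s.length);
    }

    const(char)[] opSlice() const
    {
        return smallLen == 0xFF ? large.ptr[0 .. large.len]
                                : small[0 .. smallLen];
    }
}

void main()
{
    auto s = SSOString("hi"); // stored in-situ, no heap allocation
    assert(s[] == "hi");
}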



- It is undesirable if I later want to reference-compare those slices.


Arrays will still be usable.


Andrei



RCString

2016-06-01 Thread Timon Gehr via Digitalmars-d

On 01.06.2016 22:43, Andrei Alexandrescu wrote:

On 06/01/2016 04:28 PM, Timon Gehr wrote:

On 01.06.2016 17:30, Andrei Alexandrescu wrote:

On 06/01/2016 11:24 AM, Jonathan M Davis via Digitalmars-d wrote:

It seems more like RCString is an optimization for certain types of
programs than what you'd want to use by default.


You'll always want to use it. The small string optimization will make it
compelling for all applications. -- Andrei



- Why is it dependent on the allocation strategy or on the type of the
data?


Not getting this.
...


The small string optimization also works for GC-allocated strings. Why 
do I always want to use RCString instead of the corresponding GCString?
(Also, the same approach can be applied to other arrays with value 
semantics.)




Re: Code security: "auto" / Reason for errors

2016-06-01 Thread Timon Gehr via Digitalmars-d

On 01.06.2016 17:34, deadalnix wrote:

On Wednesday, 2 March 2016 at 19:42:02 UTC, Ozan wrote:

Hi

I despair of "auto var1 = var2" for arrays. Isn't it an open door for
errors? Example:

import std.stdio;

void main()
{
int[] a;
foreach(i; 0..10) a ~= i;
auto b = a; // correct dlang coding: auto b = a.dup;

a[2] = 1;
b[2] = 5; // Overwrites assignment before
writeln(a);
writeln(b); // Always a == b but developer would like to have (a
!= b)
}

The behaviour is different from other non-container datatypes.
At first view, it looks like a data copy, but it's only a pointer
copy.

Regards, Ozan


Everything behaves as designed, auto changes nothing in the example and
there is no security concern.

We have a bingo.



Mutable aliasing can be error prone if it is not what you need, because 
then it is essentially a form of manual memory management. Built-in 
slices are likely just too low-level for the OP.
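
When value semantics are wanted, the workaround today is an explicit 
copy at the point of assignment (a minimal sketch):

void main()
{
    int[] a = [1, 2, 3];

    auto b = a.idup; // independent, immutable copy: value semantics
    auto c = a.dup;  // independent, mutable copy
    c[0] = 42;

    assert(a[0] == 1); // the original is unaffected
    // b[0] = 0;       // would not compile: b's data is immutable
}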


Re: Code security: "auto" / Reason for errors

2016-06-01 Thread John Nixon via Digitalmars-d

On Wednesday, 1 June 2016 at 15:56:24 UTC, Kagamin wrote:


Value-type containers are planned for phobos, but not done yet.


Thank you for this info. This is probably what I want; meanwhile 
I’ll try to work round it. If you have any indication of the 
timing, that would be useful.





Re: The Case Against Autodecode

2016-06-01 Thread Jack Stouffer via Digitalmars-d
On Wednesday, 1 June 2016 at 19:52:01 UTC, Andrei Alexandrescu 
wrote:

foreach (dchar x; a) {}
The latter two do autodecoding, not conversion as in the rest 
of the language.


This seems to be a miscommunication about semantics. This is not 
auto-decoding at all; you're decoding, but there is nothing 
"auto" about it. This code is an explicit choice by the 
programmer to do something.


On the other hand, using std.range.primitives.front for narrow 
strings is auto-decoding because the programmer has not made a 
choice; the choice is made for the programmer.


Re: Code security: "auto" / Reason for errors

2016-06-01 Thread Alex Parrill via Digitalmars-d

On Wednesday, 1 June 2016 at 14:52:29 UTC, John Nixon wrote:
On Wednesday, 2 March 2016 at 21:37:56 UTC, Steven 
Schveighoffer wrote:


Pointer copying is inherent in D. Everything is done at the 
"head", deep copies are never implicit. This is a C-like 
language, so one must expect this kind of behavior and plan 
for it.


I sympathise with Ozan. What is the best reference you know 
that explains this fully?


Slices/dynamic arrays are literally just a pointer (arr.ptr) and 
a length (arr.length).


Assigning a slice simply copies the ptr and length fields, so the 
new slice refers to the same data, in its entirety. Slicing 
(arr[1..2]) returns a new slice with the ptr and length fields 
updated.


(This also means you can slice arbitrary pointers; ex. 
`(cast(ubyte*) malloc(1024))[0..1024]` to get a slice of memory 
backed by C malloc. Very useful.)


The only magic happens when increasing the size of the array, via 
appending or setting length, which usually allocates a new array 
from the GC heap, except when D determines that it can get away 
with not doing so (i.e. when the data points somewhere in a GC 
heap and there's no data in-use after the end of the array. 
capacity also looks at GC metadata).
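
Putting the above in one runnable sketch:

void main()
{
    int[] a = [0, 1, 2, 3];
    int[] b = a;        // copies only (ptr, length): b aliases a
    assert(b.ptr is a.ptr);

    auto s = a[1 .. 3]; // a new slice into the same data
    s[0] = 42;
    assert(a[1] == 42);

    a ~= 4;             // appending may reallocate, so a may stop
                        // aliasing b from this point on
}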


Re: RCString

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 05:03 PM, Timon Gehr wrote:

The small string optimization also works for GC-allocated strings. Why
do I always want to use RCString instead of the corresponding GCString?
(Also, the same approach can be applied to other arrays with value
semantics.)


Point taken, thanks. My point was that you can't (reasonably) use the 
SSO if you commit to representing strings as bare slices. -- Andrei


Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 05:30 PM, Jack Stouffer wrote:

On Wednesday, 1 June 2016 at 19:52:01 UTC, Andrei Alexandrescu wrote:

foreach (dchar x; a) {}
The latter two do autodecoding, not conversion as in the rest of the
language.


This seems to be a miscommunication about semantics. This is not
auto-decoding at all; you're decoding, but there is nothing "auto" about
it. This code is an explicit choice by the programmer to do something.


No, this is autodecoding pure and simple. We can't move the goalposts 
whenever we don't like where the ball lands. The usual language rules 
are not applied for strings - they are autodecoded by the foreach 
statement (i.e. code is generated that magically decodes UTF, 
surprisingly for beginners, in apparent violation of the language 
rules, and without any user-visible request). -- Andrei




Re: The Case Against Autodecode

2016-06-01 Thread ZombineDev via Digitalmars-d
On Wednesday, 1 June 2016 at 19:52:01 UTC, Andrei Alexandrescu 
wrote:

On 06/01/2016 03:07 PM, ZombineDev wrote:
This is not autodecoding. There is nothing auto-magic w.r.t. 
strings in

plain foreach.


I understand where you're coming from, but it actually is 
autodecoding. Consider:


byte[] a;
foreach (byte x; a) {}
foreach (short x; a) {}
foreach (int x; a) {}

That works by means of the usual implicit conversions (byte->short, byte->int). However:

char[] a;
foreach (char x; a) {}
foreach (wchar x; a) {}
foreach (dchar x; a) {}

The latter two do autodecoding, not conversion as in the rest 
of the language.



Andrei


Regardless of what different people may call it, it's not what 
this thread is about. Deprecating front, popFront and empty for 
narrow strings is what we are talking about here. This has little 
to do with explicit string transcoding in foreach. I don't think 
anyone has a problem with it, because it is **opt-in** and easy 
to change to get the desired behavior.
On the other hand, trying to prevent Phobos from autodecoding 
without type-system-defeating hacks like .representation is an 
uphill battle right now.


Removing range autodecoding will also be beneficial for library 
writers. For example, instead of writing find specializations for 
char, wchar and dchar needles, it would be much more productive 
to focus on optimising searching for T in T[] and specializing on 
element size and other type properties that generic code should 
care about. Having to specialize for all the char and string 
types, instead of for any type of that size that can be compared 
bitwise, is like programming in a language with no support for 
generic programming.
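
For illustration, a sketch of what that buys (rawFind is a 
hypothetical helper, not a Phobos API): one definition serves every 
element type that compares by value, char included.

T[] rawFind(T)(T[] haystack, T needle)
{
    foreach (i, e; haystack)
        if (e == needle)
            return haystack[i .. $];
    return haystack[$ .. $];
}

void main()
{
    assert(rawFind("hello".dup, 'l') == "llo");
    assert(rawFind([4, 8, 15], 8) == [8, 15]);
}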


And as many others have pointed out, it is also about correctness. 
Only the users can decide whether searching at the code unit, code 
point or grapheme level (or something else) is right for their 
needs. A library that pretends that a single interpretation (i.e. 
code point) is right for every case is a false friend.




Re: The Case Against Autodecode

2016-06-01 Thread Andrei Alexandrescu via Digitalmars-d

On 06/01/2016 06:09 PM, ZombineDev wrote:

Regardless of what different people may call it, it's not what this
thread is about.


Yes, definitely - but then again we can't, after each invalidated 
claim, go "yeah well, but that other point stands".



Deprecating front, popFront and empty for narrow
strings is what we are talking about here.


That will not happen. Walter and I consider the cost excessive and the 
benefit too small.



This has little to do with
explicit string transcoding in foreach.


It is implicit, not explicit.


I don't think anyone has a
problem with it, because it is **opt-in** and easy to change to get the
desired behavior.


It's not opt-in. There is no way to tell foreach "iterate this array by 
converting char to dchar by the usual language rules, no autodecoding". 
You can if you e.g. use uint for the iteration variable. Same deal as 
with .representation.
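
To illustrate that last remark (a minimal sketch; "é" is two UTF-8 
code units):

import std.string : representation;

void main()
{
    string s = "é";
    size_t n;
    // integral iteration variable: code units, no decoding
    // (per the remark above)
    foreach (uint c; s) ++n;
    assert(n == 2);

    assert(s.representation.length == 2); // raw immutable(ubyte)[]
}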



On the other hand, trying to prevent Phobos from autodecoding without
typesystem defeating hacks like .representation is an uphill battle
right now.


Characterizing .representation as a type-system-defeating hack is a 
stretch. What memory safety issues does it introduce?



Andrei



Re: Dealing with Autodecode

2016-06-01 Thread Seb via Digitalmars-d

On Wednesday, 1 June 2016 at 11:42:06 UTC, Seb wrote:
On Wednesday, 1 June 2016 at 02:39:55 UTC, Nick Sabalausky 
wrote:

On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:


version(string_migration)
deprecated void popFront(T)(ref T t) if(isSomeString!T) {
    static assert(0, "this is crap, fix your code.");
}
else
deprecated("use -version=string_migration to fix your buggy code, would you like to know more?")
/* existing popFront here */



I vote we use Adam's exact verbiage, too! :)



D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
QUALITY WITH A SIMPLE MIGRATION PATH



Yes. This. If I wanted an endless bucket of baggage, I'd have 
stuck with C++.


3) A wee bit longer, we exterminate all this autodecoding 
crap and enjoy

Phobos being a smaller, more efficient library.



Yay! Profit!



How about a poll?

http://www.polljunkie.com/poll/ftmibx/remove-auto-decoding-in-d

Results are shown after casting a vote or here:
http://www.polljunkie.com/poll/aqzbwg/remove-auto-decoding-in-d/view


Just FYI: after only ten hours we got the following 45 responses:

Yes, with fire! (hobby user)                                          77% (35)
Yeah remove that special behavior (professional user)                 35% (16)
Wait that is what auto decoding is? wah ugh...                         8%  (4)
I don't always decode codeunits, but when I do I use byDChar already   6%  (3)


Re: Transient ranges

2016-06-01 Thread Steven Schveighoffer via Digitalmars-d

On 6/1/16 8:49 AM, Joseph Rushton Wakeling wrote:

On Tuesday, 31 May 2016 at 18:31:05 UTC, Steven Schveighoffer wrote:

On 5/31/16 11:45 AM, Jonathan M Davis via Digitalmars-d wrote:

On Monday, May 30, 2016 09:57:29 H. S. Teoh via Digitalmars-d wrote:

I'd argue that range-based generic code that assumes non-transience is
inherently buggy, because generic code ought not to make any
assumptions beyond what the range API guarantees. Currently, the range
API does not guarantee non-transience, therefore code that assumes
so is
broken by definition.  Just because they happen to work most of the
time
does not change the fact that they're written wrongly.


Technically, the range API doesn't even require that front return the
same
value every time that it's called, because isInputRange can't
possibly test
for it.


The API doesn't require it mechanically, but the API does require it
semantically (what does popFront mean if front changes
automatically?). If front returns different things, I'd say that's a
bug in your range construction.


The `Generator` range is an eager violator of this requirement:
https://github.com/dlang/phobos/blob/ca292ff78cd825f642eb58d586e2723ba14ae448/std/range/package.d#L3075-L3080

...although I'd agree that's an implementation error.


Yeah, that seems like a bug.
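
For completeness, a contrived range showing why the trait can't 
enforce the semantic requirement (a sketch):

struct BadCounter
{
    int i;
    enum bool empty = false;
    int front() { return i++; } // mutates state in front: the bug
    void popFront() {}
}

void main()
{
    import std.range.primitives : isInputRange;
    static assert(isInputRange!BadCounter); // compiles all the same
    auto r = BadCounter();
    assert(r.front != r.front); // front is observably inconsistent
}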

-Steve


Re: Transient ranges

2016-06-01 Thread Steven Schveighoffer via Digitalmars-d

On 6/1/16 10:05 AM, Patrick Schluter wrote:

On Tuesday, 31 May 2016 at 12:42:23 UTC, Steven Schveighoffer wrote:


There are 2 main issues with FILE *:

1) it does not provide buffer access, so you must rely on things like
getline if they exist. But these have their own problems (i.e. do not
support unicode, require C-malloc'd buffer)


You can cheat by using setvbuf() and imposing your own buffer on the
FILE* routines. What the underlying implementation puts in that
buffer, and how, is of course not documented, but it is not very
difficult to guess (for example, fseek()/fread() will always fill the
buffer from 0 to the end of the buffer or of the file, whichever
comes first).


But there is no mechanism to determine where the current file pointer is 
inside the buffer. One could compare ftell on the stream with tell on 
the file descriptor, but that is going to perform quite poorly.
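
For reference, the setvbuf() trick looks roughly like this from D (a 
sketch; error handling and the buffer-inspection guesswork elided):

import core.stdc.stdio;

void main()
{
    FILE* f = fopen("data.txt", "rb");
    if (f is null) return;
    scope(exit) fclose(f);

    // Impose our own buffer on the stream before any I/O; what stdio
    // puts in it is implementation-defined, as noted above.
    static char[8192] buf;
    setvbuf(f, buf.ptr, _IOFBF, buf.length);

    char[256] line;
    while (fgets(line.ptr, cast(int) line.length, f) !is null)
    {
        // line[] now holds the next chunk, read through our buffer
    }
}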


-Steve


Re: Free the DMD backend

2016-06-01 Thread Matthias Klumpp via Digitalmars-d

On Wednesday, 1 June 2016 at 01:26:53 UTC, Eugene Wissner wrote:

On Tuesday, 31 May 2016 at 20:12:33 UTC, Russel Winder wrote:
On Tue, 2016-05-31 at 10:09 +, Atila Neves via 
Digitalmars-d wrote:

 […]

No, no, no, no. We had LDC be the default already on Arch 
Linux for a while and it was a royal pain. I want to choose 
to use LDC when and if I need performance. Otherwise, I want 
my projects to compile as fast as possible and be able to use 
all the shiny new features.


So write a new backend for DMD whose licence allows DMD 
to be in Debian and Fedora.


LDC shouldn't be the default compiler to be included in Debian 
or Fedora. The reference compiler and the default D compiler in 
a particular distribution are two independent things.


Exactly. But since we can't legally distribute DMD in e.g. Debian, 
and DMD is the reference compiler, we will build software in 
Debian with a compiler that upstream might not have tested.
Additionally, new people usually try out a language with the 
default compiler found in their Linux distribution, and there is 
a chance that the reference compiler and the default free compiler 
differ, which is just additional pain and plain weird in the 
Linux world.


E.g. think of Python. Everyone uses and tests with CPython, 
although there are other interpreters available. If CPython were 
non-free, distros would need to compile with a free compiler, 
e.g. PyPy, which is potentially not feature-complete, leading to 
a split in the Python ecosystem between what the reference 
compiler (CPython) does and what people actually use in Linux 
distributions (PyPy). Those compilers might use different 
language versions, or have a different standard library or 
runtime, making the issue worse.
Fortunately, CPython is completely free, so we don't really have 
that issue ;-)




Re: Free the DMD backend

2016-06-01 Thread Brad Anderson via Digitalmars-d

On Tuesday, 31 May 2016 at 20:18:34 UTC, default0 wrote:
I have no idea how licensing would work in that regard, but 
considering that DMD's backend is actively maintained and may 
eventually even be ported to D, wouldn't it at some point 
differ enough from Symantec's "original" backend to simply call 
the DMD backend its own thing?


Or are all the changes to the DMD backend simply changes to 
Symantec's backend, period?


Then again, even if that'd legally be fine after some point, 
someone would have to make the judgement call, and that seems 
like a potentially large legal risk, so I guess even if it'd 
work that way it would be an unrealistic step.


Copyright law's answer to the Ship of Theseus paradox is that 
it's the same ship (i.e. derivative works are still covered under 
the original copyright).


Re: Dealing with Autodecode

2016-06-01 Thread Adam D. Ruppe via Digitalmars-d

On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:

PRs please!


https://github.com/dlang/phobos/pull/4384

You'll notice it is closed.

Now, that one wasn't meant to be merged anyway, but Andrei seems 
to have zero interest in actually accepting the change. That 
doesn't encourage further work.


Re: Dealing with Autodecode

2016-06-01 Thread Walter Bright via Digitalmars-d

On 6/1/2016 8:51 PM, Adam D. Ruppe wrote:

On Wednesday, 1 June 2016 at 02:36:15 UTC, Walter Bright wrote:

PRs please!


https://github.com/dlang/phobos/pull/4384

You'll notice it is closed.

Now, that one wasn't meant to be merged anyway, but Andrei seems to have zero
interest in actually accepting the change. That doesn't encourage further work.


Andrei is in favor of fixing Phobos so it does not depend on autodecode. He is, 
however, rightfully concerned about the extent of breakage that would happen if 
autodecode were removed. So am I.


Interestingly, when I tried to remove autodecoding from path/file code a couple 
years ago, I received quite a bit of resistance. It seems there's been a 
tectonic shift in opinion on autodecode.


What I'd like to see, that we all agree on, is progress in removing autodecode 
reliance from Phobos. Let's see what it takes.


Re: [OT] Effect of UTF-8 on 2G connections

2016-06-01 Thread Joakim via Digitalmars-d

On Wednesday, 1 June 2016 at 18:30:25 UTC, Wyatt wrote:

On Wednesday, 1 June 2016 at 16:45:04 UTC, Joakim wrote:

On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:
It's not hard.  I think a lot of us remember when a 14.4 
modem was cutting-edge.


Well, then apparently you're unaware of how bloated web pages 
are nowadays.  It used to take me minutes to download popular 
web pages _back then_ at _top speed_, and those pages were a 
_lot_ smaller.


It's telling that you think the encoding of the text is 
anything but the tiniest fraction of the problem.  You should 
look at where the actual weight of a "modern" web page comes 
from.


I'm well aware that text is a small part of it.  My point is that 
they're not downloading those web pages, they're using mobile 
instead, as I explicitly said in a prior post.  My only point in 
mentioning the web bloat to you is that _your perception_ is off 
because you seem to think they're downloading _current_ web pages 
over 2G connections, and comparing it to your downloads of _past_ 
web pages with modems.  Not only did it take minutes for us back 
then, it takes _even longer_ now.


I know the text encoding won't help much with that.  Where it 
will help is the mobile apps they're actually using, not the 
bloated websites they don't use.



Codepages and incompatible encodings were terrible then, too.

Never again.


This only shows you probably don't know the difference between 
an encoding and a code page,


"I suggested a single-byte encoding for most languages, with 
double-byte for the ones which wouldn't fit in a byte. Use some 
kind of header or other metadata to combine strings of 
different languages, _rather than encoding the language into 
every character!_"


Yeah, that?  That's codepages.  And your exact proposal to put 
encodings in the header was ALSO tried around the time that 
Unicode was getting hashed out.  It sucked.  A lot.  (Not as 
bad as storing it in the directory metadata, though.)


You know what's also codepages?  Unicode.  The UCS is a 
standardized set of code pages for each language, often merely 
picking the most popular code page at that time.


I don't doubt that everything I'm saying has been tried in some 
form before.  The question is whether that alternate form would 
be better if designed and implemented properly, not whether a 
botched design/implementation has ever been attempted.


Well, when you _like_ a ludicrous encoding like UTF-8, not 
sure your opinion matters.


It _is_ kind of ludicrous, isn't it?  But it really is the 
least-bad option for the most text.  Sorry, bub.


I think we can do a lot better.


Maybe.  But no one's done it yet.


That's what people said about mobile devices for a long time, 
until about a decade ago.  It's time we got this right.


The vast majority of software is written for _one_ language, 
the local one.  You may think otherwise because the software 
that sells the most and makes the most money is 
internationalized software like Windows or iOS, because it can 
be resold into many markets.  But as a percentage of lines of 
code written, such international code is almost nothing.


I'm surprised you think this even matters after talking about 
web pages.  The browser is your most common string processing 
situation.  Nothing else even comes close.


No, it's certainly popular software, but at the scale we're 
talking about, ie all string processing in all software, it's 
fairly small.  And the vast majority of webapps that handle 
strings passed from a browser are written to only handle one 
language, the local one.


largely ignoring the possibilities of the header scheme I 
suggested.


"Possibilities" that were considered and discarded decades ago 
by people with way better credentials.  The era of single-byte 
encodings is gone, it won't come back, and good riddance to bad 
rubbish.


Lol, credentials. :D If you think that matters at all in the face 
of the blatant stupidity embodied by UTF-8, I don't know what to 
tell you.


I could call that "trolling" by all of you, :) but I'll 
instead call it what it likely is, reactionary thinking, and 
move on.


It's not trolling to call you out for clearly not doing your 
homework.


That's funny, because it's precisely you and others who haven't 
done your homework.  So are you all trolling me?  By your 
definition of trolling, which btw is not the standard one, _you_ 
are the one doing it.



I don't think you understand: _you_ are the special case.


Oh, I understand perfectly.  _We_ (whoever "we" are) can handle 
any sequence of glyphs and combining characters 
(correctly-formed or not) in any language at any time, so we're 
the special case...?


And you're doing so by mostly using a single-byte encoding for 
_your own_ Euro-centric languages, ie ASCII, while imposing 
unnecessary double-byte and triple-byte encodings on everyone 
else, despite their outnumbering you 10 to 1.  That is the very 
definition of a special case.
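
The byte counts in question, concretely (D string literals are 
UTF-8, so .length counts code units, i.e. bytes):

void main()
{
    assert("hello".length == 5);   // ASCII: 1 byte per character
    assert("héllo".length == 6);   // Latin accents: 2 bytes
    assert("привет".length == 12); // Cyrillic: 2 bytes per character
    assert("你好".length == 6);     // CJK: 3 bytes per character
}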



Yeah, i

Re: Dealing with Autodecode

2016-06-01 Thread poliklosio via Digitalmars-d

On Thursday, 2 June 2016 at 00:14:30 UTC, Seb wrote:

Just FYI: after only ten hours we got the following 45 responses:

Yes, with fire! (hobby user)                                          77% (35)
Yeah remove that special behavior (professional user)                 35% (16)
Wait that is what auto decoding is? wah ugh...                         8%  (4)
I don't always decode codeunits, but when I do I use byDChar already   6%  (3)


You failed to mention that there were additional answers:

Auto-decoding is great!                                                0%  (0)
No, please don't break my code.                                        0%  (0)

I think those zeroes are actually the most important part of the 
results. :)