On Wednesday, May 16, 2018 08:57:10 Dennis via Digitalmars-d-learn wrote:
> I thought it wouldn't be hard to crudely split this file using
> D's range functions and basic string manipulation, but the
> combination of being too large for a string and having invalid
> encoding seems to defeat most simple solutions.
D is designed with the idea that a string is valid UTF-8, a wstring is valid UTF-16, and a dstring is valid UTF-32. For various reasons, that doesn't always hold true like it should, but pretty much all of Phobos is written with that assumption and will generally throw an exception if it isn't.

If you're ever dealing with a different encoding (or with invalid Unicode), you really need to use integral types like ubyte (e.g. by using std.string.representation or by reading the data in as ubytes rather than as a string) and not try to use character types like char or string. If you try to use char or string with invalid UTF-8 and expect it not to throw, you're pretty much guaranteed to fail.

- Jonathan M Davis
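A minimal sketch of the ubyte approach described above, assuming the goal is a crude split that never interprets the data as text. `splitLinesRaw` and `splitFileRaw` are hypothetical helpers written for illustration; `std.string.representation`, `std.algorithm.iteration.splitter`, `File.byChunk`, `File.rawWrite` and `std.utf.validate` are Phobos. Because the element type is ubyte rather than char, no UTF decoding happens anywhere, so invalid sequences can't trigger a UTFException. Note that `splitFileRaw` splits on byte counts, so a piece boundary can fall in the middle of a line.

```d
// Sketch: treat possibly-invalid UTF-8 as raw bytes (ubyte), not string.
import std.algorithm.iteration : splitter;
import std.array : array;
import std.conv : to;
import std.stdio : File, writeln;
import std.string : representation;

/// Hypothetical helper: split a byte buffer on '\n' without any UTF-8
/// decoding, so invalid sequences never cause a UTFException.
const(ubyte)[][] splitLinesRaw(const(ubyte)[] data)
{
    return data.splitter(cast(ubyte) '\n').array;
}

/// Hypothetical helper: copy `path` into files named `path.part0`,
/// `path.part1`, ... of at most `pieceSize` bytes each. byChunk yields
/// ubyte[] buffers, so the content is never treated as text and the
/// whole file never has to fit in memory. Returns the files written.
string[] splitFileRaw(string path, size_t pieceSize)
{
    assert(pieceSize > 0, "pieceSize must be positive");
    string[] parts;
    foreach (chunk; File(path, "rb").byChunk(pieceSize))
    {
        auto name = path ~ ".part" ~ parts.length.to!string;
        // Write the chunk immediately: byChunk reuses its buffer.
        File(name, "wb").rawWrite(chunk);
        parts ~= name;
    }
    return parts;
}

void main()
{
    import std.utf : UTFException, validate;

    // A lone 0xE9 ("caf\xE9" is Latin-1 for "café") is not valid UTF-8.
    string text = "line one\ncaf\xE9\nline three";
    try
    {
        validate(text);
    }
    catch (UTFException e)
    {
        writeln("as string: UTFException: ", e.msg);
    }

    // Viewed as bytes, the same data splits without complaint.
    foreach (line; splitLinesRaw(text.representation))
        writeln(line.length, " bytes");
}
```

Working on slices of the original buffer (or on reused chunks) keeps this cheap; if you need actual text later, you can decide per line how to decode or replace the bad bytes.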