On 22 November 2015 at 01:46, <ele...@gmail.com> wrote:
>
> On Sunday, November 22, 2015 at 10:02:03 AM UTC+10, James Gilbert wrote:
>>
>> The spaces in your string are '\u3000' the ideographic space.
>> isspace('\u3000') returns true, and split(s) is supposed to split on all
>> space characters, so I think this might be a julia bug.
>
> Or a documentation bug, the actual default is only the ASCII spaces
> https://github.com/JuliaLang/julia/blob/master/base/strings/util.jl#L62

It should probably be pointed out that at least Python 3 (but not Python 2)
gets it "right".

> python3
Python 3.4.3+ (default, Oct 14 2015, 16:03:50)
[GCC 5.2.1 20151010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "Time flies like an arrow".split()
['Time', 'flies', 'like', 'an', 'arrow']

I would argue that Unicode is a first-class citizen and that Julia should
also get this "right". This would require some fairly straightforward, yet
not trivial, tinkering and would be an excellent first contribution if
someone wants to take a stab at it.

    Pontus
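
For anyone who wants to reproduce the difference without relying on how their mail client renders the separators, here is a minimal Python 3 sketch. The sample string uses explicit `\u3000` escapes, and `split_ascii` is a hypothetical helper written only for contrast; it is an assumption about what "ASCII spaces only" splitting looks like, not a port of the Julia code linked above.

```python
import re

# Sample text separated by U+3000 (ideographic space), written with escapes
# so the separators stay visible. The string itself is only an illustration.
text = "Time\u3000flies\u3000like\u3000an\u3000arrow"

# With no arguments, Python 3's str.split() splits on runs of Unicode
# whitespace, which includes U+3000.
unicode_split = text.split()

# Hypothetical ASCII-only splitter, for contrast: only space, tab, newline,
# carriage return, form feed and vertical tab count as separators.
ASCII_WHITESPACE = re.compile(r"[ \t\n\r\f\v]+")


def split_ascii(s):
    """Split on runs of ASCII whitespace only, dropping empty fields."""
    return [field for field in ASCII_WHITESPACE.split(s) if field]


ascii_split = split_ascii(text)

print(unicode_split)  # five separate words
print(ascii_split)    # one element: the whole string, left unsplit
```

The ASCII-only version leaves the string in one piece because U+3000 is not in its separator set, which matches the behaviour reported at the start of this thread.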