On 22 November 2015 at 01:46,  <ele...@gmail.com> wrote:
>
> On Sunday, November 22, 2015 at 10:02:03 AM UTC+10, James Gilbert wrote:
>>
>> The spaces in your string are '\u3000' the ideographic space.
>> isspace('\u3000') returns true, and split(s) is supposed to split on all
>> space characters, so I think this might be a julia bug.
>
> Or a documentation bug, the actual default is only the ASCII spaces
> https://github.com/JuliaLang/julia/blob/master/base/strings/util.jl#L62

It should probably be pointed out that at least Python 3 (but not
Python 2) gets it "right".

    > python3
    Python 3.4.3+ (default, Oct 14 2015, 16:03:50)
    [GCC 5.2.1 20151010] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "Time flies like an arrow".split()
    ['Time', 'flies', 'like', 'an', 'arrow']
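
To make the difference visible even where the mail client normalises
whitespace, here is a minimal sketch that writes the ideographic space
as an explicit escape and compares Python 3's Unicode-aware split with
an ASCII-only split. The ASCII delimiter set below is my assumption of
what an "ASCII spaces only" default looks like; it is not copied from
the Julia source linked above.

```python
import re

# The ideographic space U+3000 written as an escape so it survives copy/paste.
s = "Time\u3000flies\u3000like\u3000an\u3000arrow"

# Python 3: str.split() with no arguments splits on any Unicode whitespace.
unicode_words = s.split()

# ASCII-only splitting (assumed delimiter set: space, \t, \n, \v, \f, \r).
ascii_words = [w for w in re.split(r"[ \t\n\v\f\r]+", s) if w]

print(unicode_words)      # five separate words
print(len(ascii_words))   # 1: the string is not split at all
```

With the ASCII-only delimiters the whole string comes back as a single
element, which matches the behaviour reported at the top of this thread.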

I would argue that Unicode is a first-class citizen and that Julia
should also get this "right".  This would require some fairly
straightforward, yet not trivial, tinkering and would be an excellent
first contribution if someone wants to take a stab at it.

    Pontus
