Hi David,
I forgot to comment when you originally posted the idea, so now seems like a
good time to add my 2 cents.

Regarding the structure:
* Would it not be better to put the flags bit field immediately after the
isa pointer? That way the flags can be checked first if different versions
of the structure ever exist. This is important for CoreBase since it does
not have the luxury of real classes.
* Would it be possible to make the hash variable an NSUInteger? The output
of -hash is an NSUInteger, and that would allow the value to be expanded
in the future.
* Why have both count and length? Would it not make more sense to keep a
single variable called count and define it as "the number of code units"?
For ASCII and UTF-8 this would be the number of bytes, and for UTF-16 it
would be the number of 16-bit code units. The Apple documentation for
-length states "The number of UTF-16 code units in the receiver", so this
definition gives the correct value at least for ASCII and UTF-16 strings.
As I understand the current implementation, length would return the number
of UTF-32 characters, which is inconsistent with the docs.
* I would also think that it makes more sense to have the length/count
variable before the data pointer. I don't have a strong opinion about this
one, but it just makes more sense in my head.

Regarding the hash function:
Why are we using the Murmur3 hash? I know it is significantly more efficient
than our current one-at-a-time approach, but how much better is it than
competing hash functions? Is there a benchmark out there comparing some of
the major ones? For example, how does it compare with lookup3 or
SpookyHash? If we are storing the hash in the string structure, it is
computed once, by the compiler, so the speed of calculating it is not as
important as the spread. Additionally, Murmur3 seems ill-suited if
NSUInteger is used to store the hash value since, as far as I could tell,
it only outputs 32-bit and 128-bit hashes. Lookup3 and SpookyHash, for
example, can also output 64-bit values (two 32-bit words in the case of
lookup3).

I'm late for work, so I have to wrap up.

Stefan

On Thu, Apr 5, 2018 at 11:24 AM, David Chisnall <gnus...@theravensnest.org>
wrote:

> On 1 Apr 2018, at 14:06, Richard Frith-Macdonald <richard.frith-macdonald@theengagehub.com> wrote:
> >
> >
> > I wasn't aware of that ... it would make sense for your new ABI, when using
> individual bits, to have them specified as particular bits rather than as a
> bitfield, avoiding the possibility of problems with different compilers.
> >
> > I don't think you should feel constrained to follow the current layout
> ... IMO the current one is good for years yet but probably not for decades.
> > However, I do think that it's more sensible to have pointer, count,
> hash, and flags similar to the current GNUstep layout than to follow Apple
> (and to bear in mind that it's convenient for mutable strings to share a
> layout with constant ones).
>
> How about this:
>
> struct {
>         // Class pointer
>         id isa;
>         // Pointer to the buffer.  ro_data section, so immutable.
>         // NULL-terminated
>         const char *data;
>         // Number of characters, not including the null terminator
>         long count;
>         // Number of bytes in the encoding, not including the null
>         // terminator.
>         long length;
>         // Murmur 3 hash
>         uint32_t hash;
>         // Flags bitfield:
>         // Low 2 bits, enum with values:
>         //   0: ASCII string
>         //   1: UTF-8 but not ASCII string
>         //   2: UTF-16 string
>         //   3: Reserved for future encodings
>         // (1<<2): has murmur3 hash
>         // (1<<3) to (1<<15): Reserved for future compiler-defined flags
>         // (1<<16) to (1<<31): Reserved for use by the constant string
>         // class
>         uint32_t flags;
> };
>
> I think that this should give everything that we need, plus room for easy
> future expansion.
>
> David
>
>
> _______________________________________________
> Gnustep-dev mailing list
> Gnustep-dev@gnu.org
> https://lists.gnu.org/mailman/listinfo/gnustep-dev
>