On 1 Apr 2018, at 11:36, Fred Kiefer <fredkie...@gmx.de> wrote:
>
> Wouldn’t the most useful structure be the one we already use for GSString?

That’s certainly a good starting point!

>
> @interface GSString : NSString
> {
> @public
>   GSCharPtr _contents;
>   unsigned int _count;

Is this the number of bytes or the number of characters?  I imagine that
both are useful.

>   struct {
>     unsigned int wide: 1;   // 16-bit characters in string?
>     unsigned int owned: 1;  // Set if the instance owns the
>                             // _contents buffer

Owned is presumably redundant for constant strings.

>     unsigned int unused: 2;
>     unsigned int hash: 28;
>   } _flags;
> }
> @end
>
> Of course constant strings won’t require the hidden reference count that
> come with all ObjC objects. But apart from that it seems to be a more useful
> structure. Storing the length with the string should speed up some common
> operations and 28 bit of hash should still be enough. There are even two
> unused bits in the flags that could encode the specific hash function.

I’d like to have more than 2 bits spare for future expansion.  The current
NXConstantString structure is now 30 years old, and I think there have been
several times in the past when it would have been nice to add other things
to it if we’d had a good way of maintaining compatibility.  This structure
does have the advantage that it doesn’t need padding on any 32- or 64-bit
architectures.

Do we have any measurements to tell us that 28 bits is enough for the hash?
The -hash method returns an NSUInteger, which is 64 bits on most platforms,
so we’re not using much of the available range.  At some point, I’d like to
move the hash implementation for NSString to MurmurHash3, which should give
better distribution and is very fast on modern hardware.

I’m also a bit nervous about using C bitfields in static data structures,
because their layout is ABI dependent (and on some platforms can change
between compiler versions).

I’m also tempted to teach the compiler about GSTinyString for 64-bit
platforms, though so far that’s not been part of the ABI.  That gives us
strings of up to eight 7-bit ASCII characters, plus a 5-bit length.
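
To make the arithmetic concrete: 8 x 7 = 56 bits of characters plus 5 bits
of length leaves 3 bits of a 64-bit word for a tag.  A minimal sketch in C
of one way such a packing could look follows.  The bit positions, the tag
value and all of the tiny_* names are assumptions made up for illustration;
this is not claimed to be the layout GNUstep actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, chosen for illustration only:
 *   bits 0-2  : tag marking the word as a tiny string (3 bits)
 *   bits 3-7  : length, 0..8 (5 bits)
 *   bits 8-63 : up to eight 7-bit ASCII characters (56 bits)
 */
enum {
  TINY_TAG        = 0x4,   /* assumed tag value */
  TINY_TAG_MASK   = 0x7,
  TINY_LEN_SHIFT  = 3,
  TINY_LEN_MASK   = 0x1f,
  TINY_CHAR_SHIFT = 8,
  TINY_CHAR_BITS  = 7,
  TINY_CHAR_MASK  = 0x7f,
  TINY_MAX_LEN    = 8
};

/* Pack s into a 64-bit word.  Returns 0 if s is too long or contains a
 * byte outside 7-bit ASCII; a valid word is never 0 because the tag is set. */
static uint64_t tiny_pack(const char *s)
{
  size_t len = strlen(s);
  if (len > TINY_MAX_LEN)
    return 0;

  uint64_t word = TINY_TAG | ((uint64_t)len << TINY_LEN_SHIFT);
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char)s[i];
      if (c > TINY_CHAR_MASK)
        return 0;
      word |= (uint64_t)c << (TINY_CHAR_SHIFT + TINY_CHAR_BITS * i);
    }
  return word;
}

/* Read the 5-bit length field back out of a packed word. */
static size_t tiny_length(uint64_t word)
{
  return (size_t)((word >> TINY_LEN_SHIFT) & TINY_LEN_MASK);
}

/* Read the i-th character (0-based) back out of a packed word. */
static char tiny_char_at(uint64_t word, size_t i)
{
  return (char)((word >> (TINY_CHAR_SHIFT + TINY_CHAR_BITS * i))
                & TINY_CHAR_MASK);
}
```

With this assumed layout, tiny_pack("hello") gives a word whose length
field reads 5, while a nine-character string or one containing a non-ASCII
byte is rejected and would have to fall back to an ordinary string object.
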
The hash for them needs computing dynamically, but they fit into a 64-bit
pointer directly.

David

_______________________________________________
Gnustep-dev mailing list
Gnustep-dev@gnu.org
https://lists.gnu.org/mailman/listinfo/gnustep-dev