The string header will be 16 bytes on 64bit word size machines.
If most of the words are much shorter, interning won’t buy you
much. For applications where you *know* all the words are short,
and the total string space won’t exceed 4GB, you can try other
alternatives. For instance if the max length of any word is 16 bytes,
create 16 string tables. Then the ‘interned string’ type is a 4 byte
index, with the bottom 4 bits indicating the length as well as which
table to index! A table for N byte words stores one copy of each
unique word and in consecutive slots on N bytes. Given that on modern
machines even though memory can be plentiful, memory access is slow
& computing is fast, this can win. You will also need a possibly unsafe
String func. that doesn’t allocate string data space. You can even allow
an overflow table if longer words are not common. And you can allow
many more unique words by extending the interned string type to 8
bytes.

> On Jan 9, 2022, at 3:07 PM, burak serdar <bser...@computer.org> wrote:
> 
> 
> Note that a with a map[string]string, the code:
> 
> m[s]=s
> 
> The contents of the string s are not duplicated, only the string header s is. 
> 
>> On Sun, Jan 9, 2022 at 3:52 PM jlfo...@berkeley.edu <jlforr...@berkeley.edu> 
>> wrote:
>> I'm aware of Artem Krylysov's idea for string interning published on
>> https://artem.krylysov.com/blog/2018/12/12/string-interning-in-go/
>> 
>> If I understand it correctly, each string is stored twice in a map, once as
>> a key and once as a value. That means that words that only appear once in
>> his example of string interning the words in the novel 1984 are 100% 
>> inefficient
>> in terms of storage. Words that appear twice are neutral. The advantage of 
>> his
>> approach only appears for words that appear three or more times.
>> 
>> I'm wondering if there's a map-like data structure that would store a string
>> as the key, and the address of the key as the value. I'm aware that a 
>> standard
>> Go map can't be used for this because its components might be moved around
>> while a program is running so taking the address of the key would be 
>> dangerous,
>> which is why it isn't allowed.
>> 
>> This came up because I profiled a run of an app I'm working on that
>> processed 12518 strings, of which 9810 appeared 2 or fewer times. Krylysov's
>> approach would only be marginally useful for this app.
>> 
>> The data structure I'm looking for would always be at least neutral, and 
>> would start
>> showing a benefit when a word appears twice.
>> 
>> Does anybody know of such a data structure?
>> 
>> Cordially,
>> Jon Forrest
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to golang-nuts+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/golang-nuts/36684584-11eb-4bc0-a436-4906d523c8ban%40googlegroups.com.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/golang-nuts/CAMV2RqodPp7kye6%3D5PH0NHha%2BwQC4mqfotipoutm4aeL1EK0Fw%40mail.gmail.com.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/F45BFAB2-315B-479A-A7BA-04BC61657AB7%40iitbombay.org.

Reply via email to