Thank you Serhat :) On Monday, October 7, 2019 at 10:04:05 AM UTC+2, Serhat Şevki Dinçer wrote: > > Turns out it did not really need uintptr so I switched to unsafe.Pointer. > Also added more tests. > > Thanks Francis.. > > On Friday, October 4, 2019 at 12:38:22 PM UTC+3, fra...@adeven.com wrote: >> >> Serhat, >> >> That implementation looks very tidy. But it still uses uintptr. So it >> doesn't solve the GC problems discussed above. >> >> On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer >> wrote: >>> >>> Hi, >>> >>> I wrote a string utility library <https://github.com/jfcg/sixb> with >>> many tests to ensure the conversion assumptions are correct. It could be >>> helpful. >>> >>> Cheers.. >>> >>> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote: >>>> >>>> I am looking at the correct way to convert from a byte slice to a >>>> string and back with no allocations. All very unsafe. >>>> >>>> I think these two cases are fairly symmetrical. So to simplify the >>>> discussion below I will only talk about converting from a string to []byte. >>>> >>>> func StringToBytes(s string) (b []byte) >>>> >>>> From what I have read it is currently not clear how to perform this >>>> correctly. >>>> >>>> When I say correctly I mean that the function returns a `[]byte` which >>>> contains all of and only the bytes in the string and never confuses the >>>> garbage collector. We fully expect that the `[]byte` returned will contain >>>> the same underlying memory as the string and modifying its contents will >>>> modify the string, making the string dangerously mutable. We are >>>> comfortable with the dangerously mutable string. >>>> >>>> Following the directions in unsafe you _might_ think that this would be >>>> a good solution. >>>> >>>> func StringToBytes(s string) []byte { >>>> return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{ >>>> Data: >>>> (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data, >>>> Len: len(s), >>>> Cap: len(s), >>>> })) >>>> } >>>> >>>> The line >>>> >>>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data, >>>> >>>> here is a really carefully commented example of this approach from >>>> github >>>> >>>> seems to satisfy unsafe <https://golang.org/pkg/unsafe/> rule 5 about >>>> converting uintptr to and from unsafe.Pointer in a singe expression. >>>> >>>> However, it clearly violates rule 6 which states `...SliceHeader and >>>> StringHeader are only valid when interpreting the content of an actual >>>> slice or string value.`. The `[]byte` we are returning here is built from >>>> a >>>> `&reflect.SliceHeader` and not based on an existing `[]byte`. >>>> >>>> So we can switch to >>>> >>>> func StringToBytes(s string) (b []byte) { >>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>>> sliceHeader.Data = stringHeader.Data >>>> sliceHeader.Len = stringHeader.Len >>>> sliceHeader.Cap = stringHeader.Len >>>> return b >>>> } >>>> >>>> Now we are using an existing []byte to build `sliceHeader` which is >>>> good. But we end up with a new problem. sliceHeader.Data and >>>> stringHeader.Data are both uintptr. So by creating them in one expression >>>> and then writing them in another expression we violate the rule that >>>> `uintptr cannot be stored in variable`. >>>> >>>> There is a possible sense that we are protected because both of our >>>> `uinptr`s are actually real pointers inside a real string and []byte. This >>>> seems to be indicated by the line `In this usage hdr.Data is really an >>>> alternate way to refer to the underlying pointer in the string header, not >>>> a uintptr variable itself.` >>>> >>>> This feels very unclear to me. >>>> >>>> In particular the code example in the unsafe package >>>> >>>> var s string >>>> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1 >>>> hdr.Data = uintptr(unsafe.Pointer(p)) // case 6 (this case) >>>> hdr.Len = n >>>> >>>> >>>> is not the same as the case we are dealing with here. Specifically in >>>> the unsafe package documentation we are writing from a uintpr stored in a >>>> separate variable to another uinptr. They are probably very similar in >>>> practice, but it isn't obvious and in my experience subtly incorrect code >>>> often comes from relying on vague understandings of important documents. >>>> >>>> If we assume that our uinptrs are safe because they are backed by real >>>> pointers then there is another issue with our string being garbage >>>> collected. >>>> >>>> func StringToBytes(s string) (b []byte) { >>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>>> // Our string is no longer referenced anywhere and could >>>> potentially be garbage collected >>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>>> sliceHeader.Data = stringHeader.Data >>>> sliceHeader.Len = stringHeader.Len >>>> sliceHeader.Cap = stringHeader.Len >>>> return b >>>> } >>>> >>>> >>>> There is a discussion where this potential problem is raised >>>> >>>> https://github.com/golang/go/issues/25484 >>>> >>>> we also see this issue mentioned in >>>> >>>> >>>> https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ >>>> >>>> The solution of >>>> >>>> func StringToBytes(s string) (b []byte) { >>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>>> // Our string is no longer referenced anywhere and could >>>> potentially be garbage collected >>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>>> sliceHeader.Data = stringHeader.Data >>>> sliceHeader.Len = stringHeader.Len >>>> sliceHeader.Cap = stringHeader.Len >>>> runtime.KeepAlive(&s) >>>> return b >>>> } >>>> >>>> >>>> is proposed. This _probably_ works. But a survey of the implementations >>>> of unsafe string/[]byte conversions in Go projects that we depend on at >>>> work (this operation is very common), didn't show a single example of >>>> anyone using the KeepAlive trick. >>>> >>>> In particular the person who initiated the conversation in golang-nuts >>>> where the KeepAlive was suggested has implemented this conversion without >>>> it >>>> >>>> https://github.com/cespare/xxhash/blob/v2.1.0/xxhash_unsafe.go#L28 >>>> >>>> A workaround of writing >>>> >>>> func StringToBytes(s string) (b []byte) { >>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>>> sliceHeader.Data = stringHeader.Data >>>> sliceHeader.Len = len(s) >>>> sliceHeader.Cap = len(s) >>>> // Maybe we managed to hold onto s until here? >>>> return b >>>> } >>>> >>>> >>>> was proposed. I think the reasoning here is that the references to >>>> `len(s)` keep the string alive. I am not totally convinced because I think >>>> the compiler is free to write this as >>>> >>>> func StringToBytes(s string) (b []byte) { >>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>>> sliceHeader.Len = len(s) >>>> sliceHeader.Cap = len(s) >>>> // Compiler has reordered our code, and s might be garbage >>>> collected >>>> sliceHeader.Data = stringHeader.Data >>>> return b >>>> } >>>> >>>> but, maybe this modification can never happen. >>>> >>>> At this point I don't think we have any clear answers about how to >>>> write this code correctly. >>>> >>>> If we look inside the Go codebase we see a few interesting approaches >>>> >>>> going from []byte to string we can just >>>> >>>> return *(*string)(unsafe.Pointer(&b)) >>>> >>>> >>>> This approach is used in >>>> >>>> https://golang.org/src/strings/builder.go#L45 >>>> https://github.com/golang/go/blob/master/src/runtime/string.go#L152 >>>> >>>> So we can see that the Go std-lib isn't using Slice/StringHeader to >>>> perform these conversions, which seems like a shame. >>>> >>>> Looking at how this is done on github reveals a variety of different >>>> approaches and discussion on the topic don't seem to ever be conclusive. >>>> >>>> >>>> >>>> In general it seems that >>>> >>>> func StringToBytes(s string) (b []byte) { >>>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>>> sliceHeader.Data = stringHeader.Data >>>> sliceHeader.Len = len(s) >>>> sliceHeader.Cap = len(s) >>>> return b >>>> } >>>> >>>> >>>> is probably pretty good, and here is a very carefully commented example >>>> of it from github. >>>> >>>> https://github.com/m3db/m3x/blob/master/unsafe/string.go#L62 >>>> >>>> Is there an authoritative correct way to do this? >>>> >>>
-- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/d61622df-8a4f-45ed-90f4-4b1c0700c079%40googlegroups.com.