String created from buffer has wrong length and strip() result is incorrect
When creating a string from a ubyte[], I get an unexpected length, and string.strip() doesn't strip off all the whitespace. I'm new to the language. Is this a compiler issue?

    import std.string : strip;
    import std.stdio : writefln;

    int main()
    {
        const string ATA_STR = " ATA ";

        // this works fine
        {
            ubyte[] buffer = [' ', 'A', 'T', 'A', ' '];
            string test = strip(cast(string)(buffer));
            assert(test == strip(ATA_STR));
        }

        // This is where things break
        {
            ubyte[] buff = new ubyte[16];
            buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);

            // read the string back from the buffer, stripping whitespace
            string stringFromBuffer = strip(cast(string)(buff[0..16]));

            // this shows strip() doesn't remove all whitespace
            writefln("StrFromBuff is '%s'; length %d", stringFromBuffer, stringFromBuffer.length);

            // !! FAILS. stringFromBuffer is length 15, not 3.
            assert(stringFromBuffer.length == strip(ATA_STR).length);
        }

        return 0;
    }
Re: String created from buffer has wrong length and strip() result is incorrect
On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:
> // !! FAILS. stringFromBuffer is length 15, not 3.

Unlike C, strings in D are not zero-terminated by default; they are just arrays, i.e. a pair of pointer and length. You create an array of 16 bytes and cast it to string, so now you have a 16-char string. You fill the first few chars with data from ATA_STR, but the remaining 10 bytes of the array are still part of the string. They were never written, so they hold zeroes (new ubyte[16] is zero-initialized). Since this tail of zeroes is not whitespace (spaces, tabs, etc.), strip doesn't remove it.
Re: String created from buffer has wrong length and strip() result is incorrect
> You fill first few chars with data from ATA_STR but the rest 10 bytes of the array are still part of the string

Edit: you fill the first 5 chars and have 11 bytes of zeroes in the tail. My counting skills are too bad. ;)
Re: String created from buffer has wrong length and strip() result is incorrect
On 17/10/14 09:29, thedeemon via Digitalmars-d-learn wrote:
> Since this tail of zeroes is not whitespace (tabs or spaces etc.) 'strip' doesn't remove it.

Side note: since your string has those zeroes at the end, strip only removes the space at the start (hence the final length of 15) instead of the spaces at both ends.

d
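A minimal sketch of the failing case, written as a D unittest (run it with something like dmd -unittest -main -run file.d). The helper name castWholeBuffer is made up for illustration, and unlike the original code it copies the buffer with .idup before casting to string. The assertions spell out the arithmetic: 16 bytes, 5 written, 11 NULs left over, and strip removing only the leading space, for 16 - 1 = 15.

```d
import std.algorithm.searching : all;
import std.string : strip;

/// Hypothetical helper rebuilding the failing case from the question:
/// " ATA " copied into a zero-filled 16-byte buffer, then the whole
/// buffer read back as a string.
string castWholeBuffer()
{
    enum ATA_STR = " ATA ";
    auto buff = new ubyte[16];                 // new arrays start out zeroed
    buff[0 .. ATA_STR.length] = cast(const(ubyte)[]) ATA_STR;
    // .idup copies into immutable memory, so the cast to string is sound;
    // all 16 bytes, including the NULs, become part of the string.
    return strip(cast(string) buff.idup);
}

unittest
{
    auto s = castWholeBuffer();
    assert(s.length == 15);                    // only the leading space was removed
    assert(s[0 .. 4] == "ATA ");               // the trailing space survives...
    assert(s[4 .. $].length == 11);            // ...because 11 NULs follow it
    assert(s[4 .. $].all!(c => c == '\0'));    // and NUL is not whitespace
}
```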
Re: String created from buffer has wrong length and strip() result is incorrect
On Friday, 17 October 2014 at 08:31:04 UTC, spir via Digitalmars-d-learn wrote:
> Side-note: since your string has those zeroes at the end, strip only removes the space at start (thus, final size=15), instead of at both ends.

Okay, things are becoming clearer. The cast to string is nothing like the C++ string constructor; I made a bad assumption. So, given the buffer below, would I use fromStringz (is this in the stdlib?) to convert a null-terminated buffer to a proper string? Shouldn't the compiler give a warning about casting a buffer to a string without using fromStringz?

    Buffer = [ 0x20, 0x41, 0x54, 0x41, 0x20, 0x00, 0x00, ...]
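For reference, current Phobos does provide fromStringz in std.string. Below is a rough sketch of using it on a buffer like the one above. It assumes a '\0' really is present inside the array, because fromStringz only scans for the terminator and knows nothing about the array's length. fromNulTerminated is a hypothetical helper name.

```d
import std.string : fromStringz, strip;

/// Hypothetical helper: read a NUL-terminated ASCII string out of `buff`.
/// ASSUMPTION: a '\0' exists inside `buff`; fromStringz scans memory
/// until it finds one and does not know where the array ends.
string fromNulTerminated(const(ubyte)[] buff)
{
    // fromStringz returns a slice of the same memory (no copy),
    // so .idup produces an independent immutable string before stripping.
    return fromStringz(cast(const(char)*) buff.ptr).idup.strip;
}

unittest
{
    ubyte[] buff = [0x20, 0x41, 0x54, 0x41, 0x20, 0x00, 0x00, 0x00];
    assert(fromNulTerminated(buff) == "ATA");
}
```

Because fromStringz returns a view of the original bytes, the .idup is what keeps the result valid if the buffer is later reused for another ioctl.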
Re: String created from buffer has wrong length and strip() result is incorrect
On Fri, 17 Oct 2014 15:24:21 +0000 Lucas Burson via Digitalmars-d-learn wrote:
> So given the below buffer would I use fromStringz (is this in the
> stdlib?) to cast it from a null-terminated buffer to a good
> string? Shouldn't the compiler give a warning about casting a
> buffer to a string without using fromStringz?

if you are really-really sure that your buffer is null-terminated, you can use this trick:

    import std.conv;
    string s = to!string(cast(char*)buff.ptr);

please note that this is NOT SAFE. you'd better double-check that your buffer is not empty and is null-terminated.
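One way to do the double-check suggested above, sketched under the assumption that throwing an exception is acceptable for a malformed buffer: confirm with std.algorithm.searching.canFind that a 0 byte lies inside the array before handing its pointer to to!string. checkedCString is a hypothetical name.

```d
import std.algorithm.searching : canFind;
import std.conv : to;

/// Hypothetical checked version of the to!string(char*) trick: only scan
/// for the terminator after confirming it lies inside the array bounds.
string checkedCString(const(ubyte)[] buff)
{
    // An empty buffer cannot contain a 0, so this one test covers both
    // "not empty" and "null-terminated".
    if (!buff.canFind(0))
        throw new Exception("buffer is empty or not NUL-terminated");
    return to!string(cast(const(char)*) buff.ptr);
}

unittest
{
    import std.exception : assertThrown;

    ubyte[] ok = [' ', 'A', 'T', 'A', ' ', 0, 0];
    assert(checkedCString(ok) == " ATA ");

    ubyte[] noNul = [' ', 'A', 'T', 'A', ' '];
    assertThrown(checkedCString(noNul));
    assertThrown(checkedCString(null));
}
```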
Re: String created from buffer has wrong length and strip() result is incorrect
On Fri, 17 Oct 2014 18:30:43 +0300 ketmar via Digitalmars-d-learn wrote:
> > Shouldn't the compiler give a warning about casting a
> > buffer to a string without using fromStringz?

nope. such casting is perfectly legal, as D strings can contain embedded '\0's.
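A tiny illustration of the point about embedded NULs: the length of a D string counts a '\0' like any other char, and the data after it stays reachable. countNuls is only an illustrative helper, not a library function.

```d
/// Illustrative helper: a D string is (pointer, length), so '\0' is an
/// ordinary character that does not end the string.
size_t countNuls(string s)
{
    size_t n;
    foreach (char c; s)      // iterate code units, no decoding needed
        if (c == '\0')
            ++n;
    return n;
}

unittest
{
    string s = "AB\0CD";
    assert(s.length == 5);       // the NUL is counted like any other char
    assert(countNuls(s) == 1);
    assert(s[3 .. $] == "CD");   // data after the NUL is still there
}
```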
Re: String created from buffer has wrong length and strip() result is incorrect
On Friday, 17 October 2014 at 15:30:52 UTC, ketmar via Digitalmars-d-learn wrote:
> please note, that this is NOT SAFE. you'd better doublecheck that your buffer is not empty and is null-terminated.

The buffer is populated from a SCSI ioctl, so it "should" contain only ASCII and be null-terminated, but it's a good idea to harden the code a bit. Thank you for your help!
Re: String created from buffer has wrong length and strip() result is incorrect
On Fri, 17 Oct 2014 16:08:04 +0000 Lucas Burson via Digitalmars-d-learn wrote:
> The buffer is populated from a scsi ioctl so it "should" be only
> ascii and null-terminated but it's a good idea to harden the code
> a bit.
> Thank you for your help!

i developed a habit of making such buffers one byte bigger than necessary and just setting the last byte to 0 before converting. this way it's guaranteed to be 0-terminated.
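A sketch of that habit, assuming the producer (for example the SCSI ioctl) is handed only the payload part of the buffer. Here the producer is modelled by a hypothetical fill delegate, and readPaddedBuffer is a made-up name. The spare last byte is forced to 0, so the C-string conversion stops inside the array even in the worst case where the payload contains no NUL at all.

```d
import std.conv : to;

/// Hypothetical helper: `fill` stands in for the producer (e.g. an ioctl)
/// and only ever sees the first `payloadLen` bytes of the buffer.
string readPaddedBuffer(size_t payloadLen, scope void delegate(ubyte[]) fill)
{
    auto buff = new ubyte[payloadLen + 1];  // one spare byte beyond the payload
    fill(buff[0 .. payloadLen]);            // producer cannot touch the spare byte
    buff[$ - 1] = 0;                        // guaranteed terminator
    return to!string(cast(const(char)*) buff.ptr);
}

unittest
{
    // Worst case: the producer fills every byte and writes no NUL at all.
    auto s = readPaddedBuffer(16, (ubyte[] dst) { dst[] = cast(ubyte) 'A'; });
    assert(s.length == 16);

    // Normal case: " ATA " followed by zero padding.
    auto t = readPaddedBuffer(16, (ubyte[] dst) {
        dst[0 .. 5] = cast(const(ubyte)[]) " ATA ";
    });
    assert(t == " ATA ");
}
```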
Re: String created from buffer has wrong length and strip() result is incorrect
On Friday, 17 October 2014 at 17:40:09 UTC, ketmar via Digitalmars-d-learn wrote:
> i developed a habit of making such buffers one byte bigger than necessary and just setting the last byte to 0 before converting. this way it's guaranteed to be 0-terminated.

Perfect, great idea. Below is my utility method to pull strings out of a buffer.

    import std.conv : to;
    import std.string : strip;

    /**
     * Get a string from buffer where the string spans [offset_start, offset_end).
     * Params:
     *     buffer = Buffer with an ASCII string to obtain.
     *     offset_start = Beginning byte offset within the buffer where the string starts.
     *     offset_end = Ending byte offset which is not included in the string.
     */
    string bufferGetString(ubyte[] buffer, ulong offset_start, ulong offset_end)
    in
    {
        assert(buffer != null);
        assert(offset_start < offset_end);
        assert(offset_end <= buffer.length);
    }
    body
    {
        ulong bufflen = offset_end - offset_start;

        // add one to the length for null-termination
        ubyte[] temp = new ubyte[bufflen+1];
        temp[0..bufflen] = buffer[offset_start..offset_end];
        temp[bufflen] = '\0';

        return strip(to!string(cast(const char*) temp.ptr));
    }

    unittest
    {
        ubyte[] no_null = [' ', 'A', 'B', 'C', ' '];
        assert("ABC" == bufferGetString(no_null, 0, no_null.length));
        assert("ABC" == bufferGetString(no_null, 1, no_null.length-1));
        assert("A" == bufferGetString(no_null, 1, 2));
    }
Re: String created from buffer has wrong length and strip() result is incorrect
On Sat, 18 Oct 2014 00:32:09 +0000 Lucas Burson via Digitalmars-d-learn wrote:
> Perfect, great idea. Below is my utility method to pull strings
> out of a buffer.

note that you can make your code slightly simpler (and more correct):

    // size_t, not ulong: array lengths and indices are size_t, so ulong
    // would not compile as a slice index on 32-bit targets
    size_t bufflen = offset_end-offset_start;
    // add one to the length for null-termination
    auto temp = new ubyte[bufflen+1]; // compiler knows the type ;-)
    temp[0..$-1] = buffer[offset_start..offset_end];
    // this is not necessary, as 'temp' is initialized with zeroes
    //temp[$-1] = '\0';
    return strip(to!string(cast(const char*) temp.ptr));

also note that this allocates like crazy (a temporary copy plus the string that to!string creates). ;-) this can be tolerable, but good to remember anyway.
besides, slices rock, so you can just pass a slice there. so:

    string bufferGetString (const(ubyte)[] buffer) {
      import std.conv : to;
      import std.string : strip;
      if (buffer.length == 0) return null; // or ""
      if (buffer[$-1] == 0) return to!string(cast(const(char)*)buffer.ptr).strip;
      auto temp = new ubyte[](buffer.length+1);
      temp[0..$-1] = buffer[];
      return to!string(cast(const(char)*)temp.ptr).strip;
    }

    unittest {
      ubyte[] no_null = [' ', 'A', 'B', 'C', ' '];
      immutable ubyte[] no_nullI = [' ', 'A', 'B', 'C', ' '];
      assert("ABC" == bufferGetString(no_null[0..$]));
      assert("ABC" == bufferGetString(no_null[1..$-1]));
      // look, we can use const/immutable buffers too!
      assert("A" == bufferGetString(no_nullI[1..2]));
    }

slices are cheap, and you'll get range checking at the call site.
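If the extra copy matters, one alternative (a swapped-in technique, not what was posted in the thread) is to look for the NUL inside the slice with std.algorithm.searching.countUntil, strip the resulting view, and copy only the final result with .idup. This never reads past the slice and allocates once. bufferGetStringBounded is a hypothetical name, and the sketch assumes ASCII bytes as in the SCSI case.

```d
import std.algorithm.searching : countUntil;
import std.string : strip;

/// Hypothetical bounded variant: stops at the first NUL *within* `buffer`
/// (or at its end if there is none), strips the view, then copies once.
/// ASSUMPTION: the bytes are ASCII.
string bufferGetStringBounded(const(ubyte)[] buffer)
{
    const n = buffer.countUntil(0);   // -1 when no NUL is present
    const size_t end = n < 0 ? buffer.length : cast(size_t) n;
    auto view = cast(const(char)[]) buffer[0 .. end];  // no copy yet
    return view.strip.idup;           // the only allocation
}

unittest
{
    ubyte[] no_null = [' ', 'A', 'B', 'C', ' '];
    assert(bufferGetStringBounded(no_null) == "ABC");
    assert(bufferGetStringBounded(no_null[1 .. 2]) == "A");

    ubyte[] padded = [' ', 'A', 'T', 'A', ' ', 0, 0, 0];
    assert(bufferGetStringBounded(padded) == "ATA");
    assert(bufferGetStringBounded(null) == "");
}
```

Unlike the to!string(char*) route, a buffer without any NUL simply yields everything up to the end of the slice.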
Re: String created from buffer has wrong length and strip() result is incorrect
On Sat, 18 Oct 2014 00:32:09 +0000 Lucas Burson via Digitalmars-d-learn wrote:

p.s. it's ok to take '.length' of a 'null' array: a null array is just an empty slice, so its length is simply 0.
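To illustrate the p.s.: a slice is a (length, pointer) pair, so reading .length of a null slice just yields 0 without touching memory. hasData is a hypothetical helper showing the kind of length check used at the top of the slice-based bufferGetString.

```d
/// Illustrative helper: reading .length of a null slice is always safe,
/// because .length is stored in the slice itself and never dereferences ptr.
bool hasData(const(ubyte)[] buffer)
{
    return buffer.length != 0;
}

unittest
{
    ubyte[] nothing = null;
    assert(nothing.ptr is null);
    assert(!hasData(nothing));        // no crash on a null slice

    ubyte[] some = [1, 2, 3];
    assert(hasData(some));
    assert(!hasData(some[0 .. 0]));   // empty but non-null slice
}
```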
Re: String created from buffer has wrong length and strip() result is incorrect
On Saturday, 18 October 2014 at 00:53:57 UTC, ketmar via Digitalmars-d-learn wrote:

Wow, your changes made it much simpler. Thank you for the suggestions and expertise, ketmar :)
Re: String created from buffer has wrong length and strip() result is incorrect
On Sat, 18 Oct 2014 16:56:09 +0000 Lucas Burson via Digitalmars-d-learn wrote:
> Wow, your changes made it much simpler. Thank you for the
> suggestions and expertise ketmar :)

you're welcome.