String created from buffer has wrong length and strip() result is incorrect

2014-10-16 Thread Lucas Burson via Digitalmars-d-learn
When creating a string from a ubyte[], I have an invalid length 
and string.strip() doesn't strip off all whitespace. I'm new to 
the language. Is this a compiler issue?



import std.string : strip;
import std.stdio  : writefln;

int main()
{
   const string ATA_STR = " ATA ";

   // this works fine
   {
      ubyte[] buffer = [' ', 'A', 'T', 'A', ' ' ];
      string test = strip(cast(string)(buffer));
      assert(test == strip(ATA_STR));
   }

   // This is where things break
   {
      ubyte[] buff = new ubyte[16];
      buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);

      // read the string back from the buffer, stripping whitespace
      string stringFromBuffer = strip(cast(string)(buff[0..16]));
      // this shows strip() doesn't remove all whitespace
      writefln("StrFromBuff is '%s'; length %d",
               stringFromBuffer, stringFromBuffer.length);

      // !! FAILS. stringFromBuffer is length 15, not 3.
      assert(stringFromBuffer.length == strip(ATA_STR).length);
   }

   return 0;
}


Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread thedeemon via Digitalmars-d-learn

On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:


>    // This is where things breaks
>    {
>       ubyte[] buff = new ubyte[16];
>       buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);
>
>       // read the string back from the buffer, stripping whitespace
>       string stringFromBuffer = strip(cast(string)(buff[0..16]));
>
>       // this shows strip() doesn't remove all whitespace
>       writefln("StrFromBuff is '%s'; length %d",
>                stringFromBuffer, stringFromBuffer.length);
>
>       // !! FAILS. stringFromBuffer is length 15, not 3.
>       assert(stringFromBuffer.length == strip(ATA_STR).length);


Unlike C, strings in D are not zero-terminated by default; they 
are just arrays, i.e. a pair of pointer and length. You create an 
array of 16 bytes and cast it to string, so now you have a 16-char 
string. You fill the first few chars with data from ATA_STR, but 
the remaining 10 bytes of the array are still part of the string. 
They were never overwritten, so they still hold zeroes. Since this 
tail of zeroes is not whitespace (tabs, spaces, etc.), 'strip' 
doesn't remove it.
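
A minimal sketch of the behaviour described above, for anyone who wants to reproduce it. The helper name `stripWholeBuffer` is made up for illustration and is not from the original post; it assumes the text fits in the buffer.

```d
import std.string : strip;

// Hypothetical helper: copy `text` into a zero-filled buffer of `size` bytes,
// then reinterpret the whole buffer as characters and strip it.
const(char)[] stripWholeBuffer(string text, size_t size)
{
    auto buff = new ubyte[size];                         // every byte starts as 0
    buff[0 .. text.length] = cast(const(ubyte)[]) text;  // assumes text.length <= size
    // The cast keeps the array length; it does not look for a terminator.
    return strip(cast(const(char)[]) buff);
}

unittest
{
    auto s = stripWholeBuffer(" ATA ", 16);
    assert(s.length == 15);       // only the leading space was removed
    assert(s[0 .. 4] == "ATA ");  // the space before the zeroes survives
    assert(s[4] == '\0');         // the zero tail is still part of the string
}
```

Compiling with `-unittest` runs the checks. The point is that the cast changes only the type, never the length.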




Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread thedeemon via Digitalmars-d-learn

> You fill first few chars with data from ATA_STR but the rest
> 10 bytes of the array are still part of the string

Edit: you fill the first 5 chars and have 11 bytes of zeroes in 
the tail. My counting skills are too bad. ;)


Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread spir via Digitalmars-d-learn

On 17/10/14 09:29, thedeemon via Digitalmars-d-learn wrote:

> On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:
>
> >    // This is where things breaks
> >    {
> >       ubyte[] buff = new ubyte[16];
> >       buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);
> >
> >       // read the string back from the buffer, stripping whitespace
> >       string stringFromBuffer = strip(cast(string)(buff[0..16]));
> >       // this shows strip() doesn't remove all whitespace
> >       writefln("StrFromBuff is '%s'; length %d", stringFromBuffer,
> >                stringFromBuffer.length);
> >
> >       // !! FAILS. stringFromBuffer is length 15, not 3.
> >       assert(stringFromBuffer.length == strip(ATA_STR).length);
>
> Unlike C, strings in D are not zero-terminated by default, they are just arrays,
> i.e. a pair of pointer and size. You create an array of 16 bytes and cast it to
> string, now you have a 16-chars string. You fill first few chars with data from
> ATA_STR but the rest 10 bytes of the array are still part of the string, not
> initialized with data, so having zeroes. Since this tail of zeroes is not
> whitespace (tabs or spaces etc.) 'strip' doesn't remove it.


Side-note: since your string has those zeroes at the end, strip only removes the 
space at the start (thus, final size=15), instead of at both ends.


d



Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread Lucas Burson via Digitalmars-d-learn
On Friday, 17 October 2014 at 08:31:04 UTC, spir via 
Digitalmars-d-learn wrote:

> On 17/10/14 09:29, thedeemon via Digitalmars-d-learn wrote:
>
> > On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:
> >
> > >   // This is where things breaks
> > >   {
> > >      ubyte[] buff = new ubyte[16];
> > >      buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);
> > >
> > >      // read the string back from the buffer, stripping whitespace
> > >      string stringFromBuffer = strip(cast(string)(buff[0..16]));
> > >
> > >      // this shows strip() doesn't remove all whitespace
> > >      writefln("StrFromBuff is '%s'; length %d", stringFromBuffer,
> > >               stringFromBuffer.length);
> > >
> > >      // !! FAILS. stringFromBuffer is length 15, not 3.
> > >      assert(stringFromBuffer.length == strip(ATA_STR).length);
> >
> > Unlike C, strings in D are not zero-terminated by default, they are just
> > arrays, i.e. a pair of pointer and size. You create an array of 16 bytes
> > and cast it to string, now you have a 16-chars string. You fill first few
> > chars with data from ATA_STR but the rest 10 bytes of the array are still
> > part of the string, not initialized with data, so having zeroes. Since
> > this tail of zeroes is not whitespace (tabs or spaces etc.) 'strip'
> > doesn't remove it.
>
> Side-note: since your string has those zeroes at the end, strip 
> only removes the space at start (thus, final size=15), instead 
> of at both ends.
>
> d


Okay things are becoming more clear. The cast to string is 
nothing like the C++ string ctor, I made a bad assumption.


So given the below buffer would I use fromStringz (is this in the 
stdlib?) to cast it from a null-terminated buffer to a good 
string? Shouldn't the compiler give a warning about casting a 
buffer to a string without using fromStringz?


Buffer = [ 0x20, 0x41, 0x54, 0x41, 0x20, 0x00, 0x00, ...]?
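
For a buffer like the one above, one bounded option is to find the first zero byte inside the slice and convert only what comes before it. This is a sketch, not something proposed in the thread: the helper name `zeroTerminatedToString` is hypothetical, and it assumes the payload is ASCII (or at least valid UTF-8), since `strip` decodes the text.

```d
import std.algorithm : countUntil;
import std.string : strip;

// Hypothetical helper: take the bytes before the first 0 (or the whole slice
// if there is no 0), strip whitespace, and copy the result into a new string.
string zeroTerminatedToString(const(ubyte)[] buff)
{
    auto end = buff.countUntil(0);     // -1 when no zero byte is present
    if (end < 0) end = buff.length;
    auto chars = cast(const(char)[]) buff[0 .. end];
    return strip(chars).idup;          // idup makes an immutable copy
}

unittest
{
    auto buff = new ubyte[16];
    string text = " ATA ";
    buff[0 .. text.length] = cast(const(ubyte)[]) text;

    assert(zeroTerminatedToString(buff) == "ATA");
    // No zero byte in this slice, so the whole slice is used.
    assert(zeroTerminatedToString(buff[0 .. 4]) == "ATA");
}
```

Because the search never leaves the slice, a missing terminator cannot cause a read past the end of the buffer. Newer Phobos releases also provide `std.string.fromStringz`, which takes a `char*` and therefore does rely on a terminator being present.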


Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread ketmar via Digitalmars-d-learn
On Fri, 17 Oct 2014 15:24:21 +
Lucas Burson via Digitalmars-d-learn
 wrote:

> So given the below buffer would I use fromStringz (is this in the 
> stdlib?) to cast it from a null-terminated buffer to a good 
> string? Shouldn't the compiler give a warning about casting a 
> buffer to a string without using fromStringz?
if you are really-really sure that your buffer is null-terminated, you
can use this trick:

  import std.conv;
  string s = to!string(cast(char*)buff.ptr);

please note, that this is NOT SAFE. you'd better doublecheck that your
buffer is not empty and is null-terminated.




Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread ketmar via Digitalmars-d-learn
On Fri, 17 Oct 2014 18:30:43 +0300
ketmar via Digitalmars-d-learn 
wrote:

> > Shouldn't the compiler give a warning about casting a 
> > buffer to a string without using fromStringz?
nope. such casting is perfectly legal, as D strings can contain
embedded '\0's.
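
A small illustration of the point about embedded zeroes (a sketch only; the literal is an arbitrary example):

```d
unittest
{
    // A zero byte is an ordinary character in a D string: it counts toward
    // .length and does not end the string.
    string s = "AB\0CD";
    assert(s.length == 5);
    assert(s[2] == '\0');
    assert(s[3 .. $] == "CD");
}
```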




Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread Lucas Burson via Digitalmars-d-learn
On Friday, 17 October 2014 at 15:30:52 UTC, ketmar via 
Digitalmars-d-learn wrote:

> On Fri, 17 Oct 2014 15:24:21 +
> Lucas Burson via Digitalmars-d-learn wrote:
>
> > So given the below buffer would I use fromStringz (is this in 
> > the stdlib?) to cast it from a null-terminated buffer to a 
> > good string? Shouldn't the compiler give a warning about 
> > casting a buffer to a string without using fromStringz?
>
> if you are really-really sure that your buffer is null-terminated, you
> can use this trick:
>
>   import std.conv;
>   string s = to!string(cast(char*)buff.ptr);
>
> please note, that this is NOT SAFE. you'd better doublecheck that your
> buffer is not empty and is null-terminated.


The buffer is populated from a scsi ioctl so it "should" be only 
ascii and null-terminated but it's a good idea to harden the code 
a bit.

Thank you for your help!


Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread ketmar via Digitalmars-d-learn
On Fri, 17 Oct 2014 16:08:04 +
Lucas Burson via Digitalmars-d-learn
 wrote:

> The buffer is populated from a scsi ioctl so it "should" be only 
> ascii and null-terminated but it's a good idea to harden the code 
> a bit.
> Thank you for your help!
i developed a habit of making such buffers one byte bigger than
necessary and just setting the last byte to 0 before converting. this
way it's guaranteed to be 0-terminated.




Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread Lucas Burson via Digitalmars-d-learn
On Friday, 17 October 2014 at 17:40:09 UTC, ketmar via 
Digitalmars-d-learn wrote:



> i developed a habit of making such buffers one byte bigger than
> necessary and just setting the last byte to 0 before converting. this
> way it's guaranteed to be 0-terminated.


Perfect, great idea. Below is my utility method to pull strings 
out of a buffer.



import std.conv   : to;
import std.string : strip;

/**
 * Get a string from buffer where the string spans [offset_start, offset_end).
 * Params:
 *    buffer = Buffer with an ASCII string to obtain.
 *    offset_start = Beginning byte offset within the buffer where the string starts.
 *    offset_end = Ending byte offset which is not included in the string.
 */
string bufferGetString(ubyte[] buffer, ulong offset_start, ulong offset_end)
in
{
   assert(buffer != null);
   assert(offset_start < offset_end);
   assert(offset_end <= buffer.length);
}
body
{
   ulong bufflen = offset_end - offset_start;

   // add one to the length for null-termination
   ubyte[] temp = new ubyte[bufflen+1];
   temp[0..bufflen] = buffer[offset_start..offset_end];
   temp[bufflen] = '\0';

   return strip(to!string(cast(const char*) temp.ptr));
}

unittest
{
   ubyte[] no_null = [' ', 'A', 'B', 'C', ' '];
   assert("ABC" == bufferGetString(no_null, 0, no_null.length));
   assert("ABC" == bufferGetString(no_null, 1, no_null.length-1));
   assert("A" == bufferGetString(no_null, 1, 2));
}


Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread ketmar via Digitalmars-d-learn
On Sat, 18 Oct 2014 00:32:09 +
Lucas Burson via Digitalmars-d-learn
 wrote:

> On Friday, 17 October 2014 at 17:40:09 UTC, ketmar via 
> Digitalmars-d-learn wrote:
> 
> > i developed a habit of making such buffers one byte bigger than
> > necessary and just setting the last byte to 0 before 
> > converting. this
> > way it's guaranteed to be 0-terminated.
> 
> Perfect, great idea. Below is my utility method to pull strings 
> out of a buffer.
> 
> 
> /**
>   * Get a string from buffer where the string spans [offset_start, 
> offset_end).
>   * Params:
>   *buffer = Buffer with an ASCII string to obtain.
>   *offset_start = Beginning byte offset within the buffer 
> where the string starts.
>   *offset_end = Ending byte offset which is not included in 
> the string.
>   */
> string bufferGetString(ubyte[] buffer, ulong offset_start, ulong 
> offset_end)
> in
> {
> assert(buffer != null);
> assert(offset_start < offset_end);
> assert(offset_end <= buffer.length);
> }
> body
> {
> ulong bufflen = offset_end - offset_start;
> 
> // add one to the lenth for null-termination
> ubyte[] temp = new ubyte[bufflen+1];
> temp[0..bufflen] = buffer[offset_start..offset_end];
> temp[bufflen] = '\0';
> 
> return strip(to!string(cast(const char*) temp.ptr));
> }
> 
> unittest
> {
> ubyte[] no_null = [' ', 'A', 'B', 'C', ' '];
> assert("ABC" == bufferGetString(no_null, 0, no_null.length));
> assert("ABC" == bufferGetString(no_null, 1, no_null.length-1));
> assert("A" == bufferGetString(no_null, 1, 2));
> }

note that you can make your code slightly simpler (and more correct):

  size_t bufflen = offset_end-offset_start;

  // add one to the length for null-termination
  auto temp = new ubyte[bufflen+1]; // compiler knows the type ;-)
  temp[0..$-1] = buffer[offset_start..offset_end];
  // this is not necessary, as 'temp' is initialized with zeroes
  //temp[$-1] = '\0';

  return strip(to!string(cast(const char*) temp.ptr));

also note that this allocates like crazy. ;-) this can be tolerable,
but good to remember anyway.

besides, slices rock, so you can just pass a slice there. so:

  string bufferGetString (const(ubyte)[] buffer) {
    import std.conv : to;
    import std.string : strip;
    if (buffer.length == 0) return null; // or ""
    // already 0-terminated: convert in place
    if (buffer[$-1] == 0) return to!string(cast(const(char)*)buffer.ptr).strip;
    // otherwise copy into a buffer one byte longer; 'new' zero-fills it
    auto temp = new ubyte[](buffer.length+1);
    temp[0..$-1] = buffer[];
    return to!string(cast(const(char)*)temp.ptr).strip;
  }

  unittest {
    ubyte[] no_null = [' ', 'A', 'B', 'C', ' '];
    immutable ubyte[] no_nullI = [' ', 'A', 'B', 'C', ' '];
    assert("ABC" == bufferGetString(no_null[0..$]));
    assert("ABC" == bufferGetString(no_null[1..$-1]));
    // look, we can use const/immutable buffers too!
    assert("A" == bufferGetString(no_nullI[1..2]));
  }

slices are cheap, and you'll get range checking at the call site.




Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-17 Thread ketmar via Digitalmars-d-learn
On Sat, 18 Oct 2014 00:32:09 +
Lucas Burson via Digitalmars-d-learn
 wrote:

p.s. it's ok to take '.length' from 'null' array. compiler is smart
enough.
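
A quick check of that remark, as a sketch (the variable name is arbitrary):

```d
unittest
{
    // A null array is just an array with no pointer and no elements,
    // so asking for its length is fine and yields 0.
    ubyte[] nothing = null;
    assert(nothing.length == 0);
    assert(nothing is null);
}
```

So a length check such as `buffer.length == 0` covers the null case as well.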




Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-18 Thread Lucas Burson via Digitalmars-d-learn
On Saturday, 18 October 2014 at 00:53:57 UTC, ketmar via 
Digitalmars-d-learn wrote:

On Sat, 18 Oct 2014 00:32:09 +
Lucas Burson via Digitalmars-d-learn
 wrote:




Wow, your changes made it much simpler. Thank you for the 
suggestions and expertise ketmar :)


Re: String created from buffer has wrong length and strip() result is incorrect

2014-10-18 Thread ketmar via Digitalmars-d-learn
On Sat, 18 Oct 2014 16:56:09 +
Lucas Burson via Digitalmars-d-learn
 wrote:

> Wow, your changes made it much simpler. Thank you for the 
> suggestions and expertise ketmar :)
you're welcome.

