Re: SBuf review at r9370

2009-03-05 Thread Kinkie
On Wed, Mar 4, 2009 at 9:45 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 03/04/2009 06:09 AM, Kinkie wrote:
 Who'd be in charge of managing the passed memory block?
 I see two choices:
 - the caller is in charge
   then absorb becomes an alias of assign() and has no reason to exist
 except create confusion
 - absorb frees
   this would make the behaviour more consistent with the eventual
 impementation, and a mismanaged pointer will bomb immediately.

 Do you have a path you'd prefer to see?

 Only absorb frees path makes sense, as you pointed out yourself. This
 is why the proposed method is called absorb :-).

Agreed, and implemented.

 Done. Some resizing may still occur, because:
 1- we're not copying the whole MemBlock, just the portion that interests us
 2- rounding to memory boundaries due to (eventually) MemPools

 Sure, but all those are under-the-hood things that cow() should not
 worry about.

Yup.

 estimateCapacity() which right now statically suggests a 20% increase in 
 sizes

 I expect 50% or 100% increase would work better, but this is pure
 speculation on my part and not a big deal. Let's see what others suggest.

Only measuring it once live will make sense, but then I would expect
more sophisticated heuristics will serve us better.

 MemPools integration will see memAllocationPolicy be replaced by a
 call to Mem::memAllocateString which will choose the right pool and
 feed the new size back

 You may want to add the above plan as a source comment.

Already there. Clarified it a bit more. Also, the mempools code is
still in the source (the whole 1 line of it), just disabled via #if
0/#else/#endif

 But who supplies the file and line values to the constructor if we just
 call throw?

 The same way TextException does:
 throw someException(__FILE__,__LINE__)

 Ah, I see now. Not sure why I missed that before, sorry. BTW, __FILE__
 should be replaced with the new DebugBaseName() or whatever we called
 that thing, I guess. One more reason to enhance Throw() so that it can
 correctly throw custom exceptions.

This is IMO out of scope for this project. Exception code cleanup
should be performed in a specific branch.
If it's fine with everyone, I'd leave as-is for now as it's consistent
with general practice and maybe schedule said cleanup project post-3.2

 Hm.. How about
 - slice
 - shorten
 - chop
 - range
 - cut

 Let's use chop() or, if you prefer, zoom().

chop sounds just right to me. Renamed.

I'd call it a wrap, and move on to review and actual use.

Thanks.


-- 
/kinkie


Re: SBuf review at r9370

2009-03-04 Thread Kinkie
 * Remove isImported. Copy and then free the buffer when importing
 instead. Same motivation as in the isLiteral comment above.

 This too has a IMO a very practical use: it allows us an easy path to
 get into the SBuf world i/o buffer which have been created by the i/o
 layer. If you're sure, I'll remove it.

 I understand that it can be a useful optimization. I am sure we should
 replace that optimization with copying (in absorb()) for now because we
 already have enough bugs to worry about without this special case.
 Special cases like this significantly increase the probability of a bug
 left unnoticed by a reviewer, IMO.

Who'd be in charge of managing the passed memory block?
I see two choices:
- the caller is in charge
  then absorb becomes an alias of assign() and has no reason to exist
except create confusion
- absorb frees
  this would make the behaviour more consistent with the eventual
impementation, and a mismanaged pointer will bomb immediately.

Do you have a path you'd prefer to see?

 * cow() logic feels wrong because grow() does not just grow(). Grow is
 discussed below. The cow() code should be something very simple like:
 This approach is both clean and efficient.

 The copy constructor can optimize, but please do keep it simple.
 My suggestion: reintroduce reAlloc() which gets called by cow() and
 does the low-level work, and have both cow() and grow() perform
 checks, heuristics and call reAlloc(). How would you like this?

 I believe the Copy approach above is superior as far as cow() is
 concerned. I will have to revisit grow() separately.


Done. Some resizing may still occur, because:
1- we're not copying the whole MemBlock, just the portion that interests us
2- rounding to memory boundaries due to (eventually) MemPools


 * estimateCapacity() is too complex to understand its performance
 effects, especially since nreallocs is a very strange animal. I suggest
 that we keep it very simple and add optimizations later, if needed.
 Something like

    granularity = desired  PageSize ? 16 : PageSize;
    return roundto(desired, granularity)

 is much easier to comprehend, analyze, and optimize. Let's postpone
 complex optimizations.

 I agree, with a SLIGHTLY more complex design to match the current
 standard MemPool String sizes.
 Stuff like:
 desired *= constant_growth_factor;
 if (desired ShortStringsSize)
   return ShortStringsSize;
 if (desired  MediumStringsSize)
   return ShortStringsSize;
 if (desired  BigStringsSize)
   return BigStringsSize;
 if (desired  2kBuffersSize)
   return BigStringsSize;

 Can you add some static(?) SuggestMemSize() method to the String pool
 and call it from SBuf? We should not duplicate the logic of
 string-related memory size calculations, IMO.

Drastically simplified.
New approach:
grow() calls {
  estimateCapacity() which right now statically suggests a 20% increase in sizes
  and then reAlloc which calls {
MemBlob::memAlloc which calls {
  MemBlob::memAllocationPolicy which {
adopts a stepping factor to avoid fragmentation
(128-byte chunks up to 1kb, then 512-byte chunks up to 4kb,
then round to nearest 4kb
  }
  and allocates using xmalloc
}
  }
}
MemPools integration will see memAllocationPolicy be replaced by a
call to Mem::memAllocateString which will choose the right pool and
feed the new size back

 return roundto(desired,PageSize);

 We may want to discuss whether squidSystemPageSize should be static to
 this module (as it is now) or belongs to the global namespace.

 Does any other Squid code use it? If yes, it would be nice to
 store/calculate it in a single, shared location.

Done. Now in Mem.h/mem.cc

 * startsWith() is buggy -- it ignores the contents of its parameter's
 buffer.


 Does it?

 It does.  s/*this/S/.

Fixed and implemented unit tests.

 * SBuf::consume() returns a copy of SBuf. It should return a reference
 to this or void.

 Er.. no.
 Consume shortens the SBuf by chopping off the head and returns the chopped 
 part.

 Ah, I see, sorry. I would change the implementation to

    const SBuf rv(substr(0, actual));
    ... // chop the beginning from this buffer as you do now
    return rv;

Yes. Done.

 then.
 * Exceptions thrown by SBuf will lack source code location info, right?
 I think we need to add a ThrowHere(ex) macro that supplies that to the
 ex parameter.


 No, their constructor passes it through.

 But who supplies the file and line values to the constructor if we just
 call throw?

The same way TextException does:
throw someException(__FILE__,__LINE__)

 * Please rename substrSelf() as discussed.

 I don't recall reaching a decision about the target name.
 We can decide to drop it entirely (make it private), but the last
 agreement I recall is that since it's not part of the std::string API,
 we're on our own in defining it, and we haven't really defined it.

 I do not remember the two(?) replacement names that were proposed, but
 they are all better than substrSelf. Give 

Re: SBuf review at r9370

2009-03-04 Thread Alex Rousskov
On 03/04/2009 06:09 AM, Kinkie wrote:
 Who'd be in charge of managing the passed memory block?
 I see two choices:
 - the caller is in charge
   then absorb becomes an alias of assign() and has no reason to exist
 except create confusion
 - absorb frees
   this would make the behaviour more consistent with the eventual
 impementation, and a mismanaged pointer will bomb immediately.

 Do you have a path you'd prefer to see?
   
Only absorb frees path makes sense, as you pointed out yourself. This
is why the proposed method is called absorb :-).

 Done. Some resizing may still occur, because:
 1- we're not copying the whole MemBlock, just the portion that interests us
 2- rounding to memory boundaries due to (eventually) MemPools
   
Sure, but all those are under-the-hood things that cow() should not
worry about.
 estimateCapacity() which right now statically suggests a 20% increase in sizes
   
I expect 50% or 100% increase would work better, but this is pure
speculation on my part and not a big deal. Let's see what others suggest.

 MemPools integration will see memAllocationPolicy be replaced by a
 call to Mem::memAllocateString which will choose the right pool and
 feed the new size back
   
You may want to add the above plan as a source comment.
 But who supplies the file and line values to the constructor if we just
 call throw?
 
 The same way TextException does:
 throw someException(__FILE__,__LINE__)
   
Ah, I see now. Not sure why I missed that before, sorry. BTW, __FILE__
should be replaced with the new DebugBaseName() or whatever we called
that thing, I guess. One more reason to enhance Throw() so that it can
correctly throw custom exceptions.
 Hm.. How about
 - slice
 - shorten
 - chop
 - range
 - cut
   
Let's use chop() or, if you prefer, zoom().

HTH,

Alex.



Re: SBuf review at r9370

2009-03-04 Thread Amos Jeffries
 * Remove isImported. Copy and then free the buffer when importing
 instead. Same motivation as in the isLiteral comment above.

 This too has a IMO a very practical use: it allows us an easy path to
 get into the SBuf world i/o buffer which have been created by the i/o
 layer. If you're sure, I'll remove it.

 I understand that it can be a useful optimization. I am sure we should
 replace that optimization with copying (in absorb()) for now because we
 already have enough bugs to worry about without this special case.
 Special cases like this significantly increase the probability of a bug
 left unnoticed by a reviewer, IMO.

 Who'd be in charge of managing the passed memory block?
 I see two choices:
 - the caller is in charge
   then absorb becomes an alias of assign() and has no reason to exist
 except create confusion
 - absorb frees
   this would make the behaviour more consistent with the eventual
 impementation, and a mismanaged pointer will bomb immediately.

 Do you have a path you'd prefer to see?

Path #2.

In A.absorb(B)  A takes full control over the buffer of B.
Leaves B with NULL MemBlob pointer. Then A releases it's new temporary ptr
when finished doing the absorption. If MemBlob ref-counting does free or
not is irrelevant to both A and B, they no longer care or matter.


 * cow() logic feels wrong because grow() does not just grow(). Grow is
 discussed below. The cow() code should be something very simple like:
 This approach is both clean and efficient.

 The copy constructor can optimize, but please do keep it simple.
 My suggestion: reintroduce reAlloc() which gets called by cow() and
 does the low-level work, and have both cow() and grow() perform
 checks, heuristics and call reAlloc(). How would you like this?

 I believe the Copy approach above is superior as far as cow() is
 concerned. I will have to revisit grow() separately.


 Done. Some resizing may still occur, because:
 1- we're not copying the whole MemBlock, just the portion that interests
 us
 2- rounding to memory boundaries due to (eventually) MemPools


 * estimateCapacity() is too complex to understand its performance
 effects, especially since nreallocs is a very strange animal. I
 suggest
 that we keep it very simple and add optimizations later, if needed.
 Something like

 Â  Â granularity = desired  PageSize ? 16 : PageSize;
 Â  Â return roundto(desired, granularity)

 is much easier to comprehend, analyze, and optimize. Let's postpone
 complex optimizations.

 I agree, with a SLIGHTLY more complex design to match the current
 standard MemPool String sizes.
 Stuff like:
 desired *= constant_growth_factor;
 if (desired ShortStringsSize)
 Â  return ShortStringsSize;
 if (desired  MediumStringsSize)
 Â  return ShortStringsSize;
 if (desired  BigStringsSize)
 Â  return BigStringsSize;
 if (desired  2kBuffersSize)
 Â  return BigStringsSize;

 Can you add some static(?) SuggestMemSize() method to the String pool
 and call it from SBuf? We should not duplicate the logic of
 string-related memory size calculations, IMO.

 Drastically simplified.
 New approach:
 grow() calls {
   estimateCapacity() which right now statically suggests a 20% increase in
 sizes
   and then reAlloc which calls {
 MemBlob::memAlloc which calls {
   MemBlob::memAllocationPolicy which {
 adopts a stepping factor to avoid fragmentation
 (128-byte chunks up to 1kb, then 512-byte chunks up to 4kb,
 then round to nearest 4kb
   }
   and allocates using xmalloc
 }
   }
 }
 MemPools integration will see memAllocationPolicy be replaced by a
 call to Mem::memAllocateString which will choose the right pool and
 feed the new size back

 return roundto(desired,PageSize);

 We may want to discuss whether squidSystemPageSize should be static to
 this module (as it is now) or belongs to the global namespace.

 Does any other Squid code use it? If yes, it would be nice to
 store/calculate it in a single, shared location.

 Done. Now in Mem.h/mem.cc

 * startsWith() is buggy -- it ignores the contents of its parameter's
 buffer.


 Does it?

 It does. Â s/*this/S/.

 Fixed and implemented unit tests.

 * SBuf::consume() returns a copy of SBuf. It should return a reference
 to this or void.

 Er.. no.
 Consume shortens the SBuf by chopping off the head and returns the
 chopped part.

 Ah, I see, sorry. I would change the implementation to

 Â  Â const SBuf rv(substr(0, actual));
 Â  Â ... // chop the beginning from this buffer as you do now
 Â  Â return rv;

 Yes. Done.

 then.
 * Exceptions thrown by SBuf will lack source code location info,
 right?
 I think we need to add a ThrowHere(ex) macro that supplies that to the
 ex parameter.


 No, their constructor passes it through.

 But who supplies the file and line values to the constructor if we just
 call throw?

 The same way TextException does:
 throw someException(__FILE__,__LINE__)

 * Please rename substrSelf() as discussed.

 I don't recall reaching a decision about 

Re: SBuf review at r9370

2009-03-02 Thread Kinkie
On Thu, Feb 26, 2009 at 7:08 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 02/26/2009 10:24 AM, Kinkie wrote:
 On Thu, Feb 26, 2009 at 1:46 AM, Alex Rousskov
 rouss...@measurement-factory.com wrote:

 http://bazaar.launchpad.net/~kinkie/squid/stringng at r9370.

 * Remove isLiteral and the corresponding code/logic. We might add this
 very minor performance optimization as a non-public Blob member once the
 code is known to work well. For now, the code is confusing enough
 without it.

 The Literal thing was actually added to answer to a very specific
 need. IIRC it was because without it valgrind complained about leaked
 memory in case of a global static SBuf.
 If you're sure, I'll remove it.

 I am even more sure now that I know the reason you added it. :-)

I've removed it, and it's causing me to have headaches to no end, due
to interactions with MemPools.
The scenario is this:

I've extended mempools to add more String sizes, and I've offloaded
part of the logic in estimateCapacity there. Basically, what happens
is that estimateCapacity does not try to round to the pool size
anymore, since the pools are better suited to handle that.

BUT MemPools are initialized AFTER global variables, and thus a global
definition such as:
SBuf foo(some initializer);
causes the SBuf to grow()
which calls memAllocString
which tries to locate a pool, and fails because the pools are not yet ready.
(run...)
at program exit()
~SBuf
calls ~MemBlob
which calls memFreeString
which asserts unless by sheer chance the MemBlob has the exact same
size of a pool, and if it does the results are unpredicatable anyways.

Hacking my way around this is painful, and is causing me to touch
things I'd really rather not touch as much (mem.cc)
isLiteral/isImported had the dubious advantage of flagging in the
object themselves whether they were obtained from pools or from some
form of malloc or string constant. I can't really imagine a different
solution which doesn't break layering.

Do you have any suggestion?

 I agree, with a SLIGHTLY more complex design to match the current
 standard MemPool String sizes.
 Stuff like:
 desired *= constant_growth_factor;
 if (desired ShortStringsSize)
   return ShortStringsSize;
 if (desired  MediumStringsSize)
   return ShortStringsSize;
 if (desired  BigStringsSize)
   return BigStringsSize;
 if (desired  2kBuffersSize)
   return BigStringsSize;

Now removed (but not committed due to the above issues), it's better
done in memAllocString anyways.



-- 
/kinkie


Re: SBuf review at r9370

2009-02-27 Thread Kinkie
 Um, IMO its better to be liberal with length() and very,very conservative
 with len_.

I think I reached a sensible compromise, however subjective.
I ask you not to be too strict now, but to feel free to change it
after the branch lands into trunk.

 Dumping stuff and pointer maths particularly is one place its often
 better/safer to use const length() const instead of len_, unless the math
 is actually intended to change len_.

Sure. I _think_ that the only leftover read-access to len_ is in
dump() which prints the low-level details of the SBuf.

 now using xmin() and xmax() as per Amos' suggestion.

 !! no 'x' :)

Those are the only ones I could find in util.h...


-- 
/kinkie


Re: SBuf review at r9370

2009-02-27 Thread Amos Jeffries

Kinkie wrote:

Um, IMO its better to be liberal with length() and very,very conservative
with len_.


I think I reached a sensible compromise, however subjective.
I ask you not to be too strict now, but to feel free to change it
after the branch lands into trunk.


Dumping stuff and pointer maths particularly is one place its often
better/safer to use const length() const instead of len_, unless the math
is actually intended to change len_.


Sure. I _think_ that the only leftover read-access to len_ is in
dump() which prints the low-level details of the SBuf.


now using xmin() and xmax() as per Amos' suggestion.

!! no 'x' :)


Those are the only ones I could find in util.h...



Ah. The ones I'm on about are in compat through config.h

Thanks for the hint, looks like I'll have to kill the util versions in 
my cleanup.


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE13
  Current Beta Squid 3.1.0.5


Re: SBuf review at r9370

2009-02-26 Thread Alex Rousskov
On 02/26/2009 10:24 AM, Kinkie wrote:
 On Thu, Feb 26, 2009 at 1:46 AM, Alex Rousskov
 rouss...@measurement-factory.com wrote:
   
 http://bazaar.launchpad.net/~kinkie/squid/stringng at r9370.

 * Remove isLiteral and the corresponding code/logic. We might add this
 very minor performance optimization as a non-public Blob member once the
 code is known to work well. For now, the code is confusing enough
 without it.
 
 The Literal thing was actually added to answer to a very specific
 need. IIRC it was because without it valgrind complained about leaked
 memory in case of a global static SBuf.
 If you're sure, I'll remove it.
   
I am even more sure now that I know the reason you added it. :-)
 * Remove isImported. Copy and then free the buffer when importing
 instead. Same motivation as in the isLiteral comment above.
 
 This too has a IMO a very practical use: it allows us an easy path to
 get into the SBuf world i/o buffer which have been created by the i/o
 layer. If you're sure, I'll remove it.
   
I understand that it can be a useful optimization. I am sure we should
replace that optimization with copying (in absorb()) for now because we
already have enough bugs to worry about without this special case.
Special cases like this significantly increase the probability of a bug
left unnoticed by a reviewer, IMO.
 * Some throw() declarations are still left in the code. Grep for them.
 
 Are you sure? There's only one that I could find, in the virtual
 destructor for OutOfBoundsException, and in that case it's needed to
 match the signature of TextException.
   
I forgot that standard exception class forces us to use throw(). I guess
this one will have to stay then.
 * cow() logic feels wrong because grow() does not just grow(). Grow is
 discussed below. The cow() code should be something very simple like:

if (we are alone)
return false; // no need to copy
store_ = new MemBuf(*store_); // get ourselves an exclusive copy of
 the blob
return false;
 No grow() magic -- we are getting an exclusive copy, not growing something.
 
 MemBlob has disabled copy-constructor and assignment operator by
 choice, as it gets handled through (refcount) pointers. (BTW: is this
 a violation of the coding guidelines?)
 It's certainly possible to add it, it's just a different style.

 The only extra operation performed by the current implementation is to
 run a re-evaluation of the blob store sizing heuristics: at the end
 it's a cheap operation compared to the copying, and since we're
 copying we may as well check it.
 I'll change if you confirm me that you have also considered this side
 of the argument.
   
Here, I am not complaining about what happens under the hood. I am
complaining about the on-the-surface logic of cow(). It is broken.
Whether the implementation actually works is irrelevant. There should be
no grow() in cow().

So you need to replace grow() with copying. If we do not want to expose
a MemBuf copy constructor, you can:

- Implement MemBuf copy constructor as a private member.
- Add MemBuf::Copy(const MemBuf mb) that returns a Pointer to new
MemBuf (mb).

This approach is both clean and efficient.

The copy constructor can optimize, but please do keep it simple.
 My suggestion: reintroduce reAlloc() which gets called by cow() and
 does the low-level work, and have both cow() and grow() perform
 checks, heuristics and call reAlloc(). How would you like this?
   
I believe the Copy approach above is superior as far as cow() is
concerned. I will have to revisit grow() separately.
 * estimateCapacity() is too complex to understand its performance
 effects, especially since nreallocs is a very strange animal. I suggest
 that we keep it very simple and add optimizations later, if needed.
 Something like

granularity = desired  PageSize ? 16 : PageSize;
return roundto(desired, granularity)

 is much easier to comprehend, analyze, and optimize. Let's postpone
 complex optimizations.
 
 I agree, with a SLIGHTLY more complex design to match the current
 standard MemPool String sizes.
 Stuff like:
 desired *= constant_growth_factor;
 if (desired ShortStringsSize)
   return ShortStringsSize;
 if (desired  MediumStringsSize)
   return ShortStringsSize;
 if (desired  BigStringsSize)
   return BigStringsSize;
 if (desired  2kBuffersSize)
   return BigStringsSize;
   
Can you add some static(?) SuggestMemSize() method to the String pool
and call it from SBuf? We should not duplicate the logic of
string-related memory size calculations, IMO.
 return roundto(desired,PageSize);

 We may want to discuss whether squidSystemPageSize should be static to
 this module (as it is now) or belongs to the global namespace.
   
Does any other Squid code use it? If yes, it would be nice to
store/calculate it in a single, shared location.
 * startsWith() is buggy -- it ignores the contents of its parameter's
 buffer.
 

 Does it?
   
It does.  s/*this/S/.

 * SBuf::consume() 

Re: SBuf review at r9370

2009-02-26 Thread Amos Jeffries

Kinkie wrote:

On Thu, Feb 26, 2009 at 1:46 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:

Hi Kinkie,

   Here are a few corrections for
http://bazaar.launchpad.net/~kinkie/squid/stringng at r9370.

* Remove isLiteral and the corresponding code/logic. We might add this
very minor performance optimization as a non-public Blob member once the
code is known to work well. For now, the code is confusing enough
without it.


The Literal thing was actually added to answer to a very specific
need. IIRC it was because without it valgrind complained about leaked
memory in case of a global static SBuf.
If you're sure, I'll remove it.


Once isLiteral is removed, compare with the prototype pointer instead of
testing isLiteral in cow().

* Remove isImported. Copy and then free the buffer when importing
instead. Same motivation as in the isLiteral comment above.


This too has a IMO a very practical use: it allows us an easy path to
get into the SBuf world i/o buffer which have been created by the i/o
layer.
If you're sure, I'll remove it.


* SBuf copy constructor and assignment operator still initialize members
differently.


Fixed. nreallocs is now imported by copy-constructor.


* Some throw() declarations are still left in the code. Grep for them.


Are you sure? There's only one that I could find, in the virtual
destructor for OutOfBoundsException, and in that case it's needed to
match the signature of TextException.


* cow() logic feels wrong because grow() does not just grow(). Grow is
discussed below. The cow() code should be something very simple like:

   if (we are alone)
   return false; // no need to copy


The only difference I could find is an isLiteral check..


   store_ = new MemBuf(*store_); // get ourselves an exclusive copy of
the blob
   return false;
No grow() magic -- we are getting an exclusive copy, not growing something.



MemBlob has disabled copy-constructor and assignment operator by
choice, as it gets handled through (refcount) pointers. (BTW: is this
a violation of the coding guidelines?)
It's certainly possible to add it, it's just a different style.

The only extra operation performed by the current implementation is to
run a re-evaluation of the blob store sizing heuristics: at the end
it's a cheap operation compared to the copying, and since we're
copying we may as well check it.
I'll change if you confirm me that you have also considered this side
of the argument.

I however agree with you that the current cow()/grow() logic is flawed
in that a certain amout of growing is always performed, regardless the
fact that the MemBlob is used or not. Which means that a SBuf which
gets aliased and cow()ed a lot will grow and waste memory.

My suggestion: reintroduce reAlloc() which gets called by cow() and
does the low-level work, and have both cow() and grow() perform
checks, heuristics and call reAlloc(). How would you like this?


* roundto(5,5) should return 5 but returns 10. The comment and
implementation should be fixed. The method probably does not belong here
though.


Reimplemented and re-checked. Now works as advertised.
Moved to libmiscutil.


* estimateCapacity() is too complex to understand its performance
effects, especially since nreallocs is a very strange animal. I suggest
that we keep it very simple and add optimizations later, if needed.
Something like

   granularity = desired  PageSize ? 16 : PageSize;
   return roundto(desired, granularity)

is much easier to comprehend, analyze, and optimize. Let's postpone
complex optimizations.


I agree, with a SLIGHTLY more complex design to match the current
standard MemPool String sizes.
Stuff like:
desired *= constant_growth_factor;
if (desired ShortStringsSize)
  return ShortStringsSize;
if (desired  MediumStringsSize)
  return ShortStringsSize;
if (desired  BigStringsSize)
  return BigStringsSize;
if (desired  2kBuffersSize)
  return BigStringsSize;
return roundto(desired,PageSize);

We may want to discuss whether squidSystemPageSize should be static to
this module (as it is now) or belongs to the global namespace.


* To me, compare() returns the opposite of what strcmp() does. This
should be treated as the first parameter of strcmp() and not the second
one, IMO.


Gah!
Fixed and added explicit cross-checks to unit tests.


* startsWith() is buggy -- it ignores the contents of its parameter's
buffer.


Does it?
Its implementation reads: if *this is too short to match, then it
can't match. Otherwise does a SBuf equality test on a a substr() of
*this long just as the parameter..


* SBuf::consume() returns a copy of SBuf. It should return a reference
to this or void.


Er.. no.
Consume shortens the SBuf by chopping off the head and returns the chopped part.
See how it's used in the tokenizer.

Given a:
SBuf data=some very long string;

then:
SBuf chopped=data.consume(5);

is equivalent to:
SBuf chopped=data.substr(0,5);
data.substrSelf(5,SBuf::npos);

or to:
SBuf chopped=data.substr(0,5);

Re: SBuf review at r9370

2009-02-26 Thread Alex Rousskov
On 02/26/2009 08:46 PM, Amos Jeffries wrote:
 Kinkie wrote:
 * Please use .length() everywhere you can. Do not mix .len_ and
 .length().
 Ok, checked.
 I've left len_ where it's being assigned to another object's len_ or
 in very low-level stuff (e.g. object dumping, stuff doing pointer
 maths).
 Um, IMO its better to be liberal with length() and very,very
 conservative with len_.

 Dumping stuff and pointer maths particularly is one place its often
 better/safer to use const length() const instead of len_, unless the
 math is actually intended to change len_.
Yeah. For zero-overhead wrapper methods like length(), the rule of thumb
is very simple: use the wrapper method unless you modify the data member
(or take its address).

Cheers,

Alex.



SBuf review at r9370

2009-02-25 Thread Alex Rousskov
Hi Kinkie,

Here are a few corrections for
http://bazaar.launchpad.net/~kinkie/squid/stringng at r9370.

* Remove isLiteral and the corresponding code/logic. We might add this
very minor performance optimization as a non-public Blob member once the
code is known to work well. For now, the code is confusing enough
without it.

Once isLiteral is removed, compare with the prototype pointer instead of
testing isLiteral in cow().

* Remove isImported. Copy and then free the buffer when importing
instead. Same motivation as in the isLiteral comment above.

* SBuf copy constructor and assignment operator still initialize members
differently.

* Some throw() declarations are still left in the code. Grep for them.

* cow() logic feels wrong because grow() does not just grow(). Grow is
discussed below. The cow() code should be something very simple like:

if (we are alone)
return false; // no need to copy

store_ = new MemBuf(*store_); // get ourselves an exclusive copy of
the blob
return false;

No grow() magic -- we are getting an exclusive copy, not growing something.


* roundto(5,5) should return 5 but returns 10. The comment and
implementation should be fixed. The method probably does not belong here
though.

* estimateCapacity() is too complex to understand its performance
effects, especially since nreallocs is a very strange animal. I suggest
that we keep it very simple and add optimizations later, if needed.
Something like
   
granularity = desired  PageSize ? 16 : PageSize;
return roundto(desired, granularity)

is much easier to comprehend, analyze, and optimize. Let's postpone
complex optimizations.


* To me, compare() returns the opposite of what strcmp() does. This
should be treated as the first parameter of strcmp() and not the second
one, IMO.

* startsWith() is buggy -- it ignores the contents of its parameter's
buffer.

* SBuf::consume() returns a copy of SBuf. It should return a reference
to this or void.

* Move if !canAppend then grow logic to a single method. Call it
reserve(). More on grow() below.

* grow() comments are wrong. The method does not always allocate a new
buffer.

* grow() implementation is odd. It ignores the remaining blob space
size. Am I missing something? I have a feeling you implemented something
other than grow() so that you can rely on its strange combination of
do-not-grow and grow-even-if-you-do-not-have-to logic in the rest of the
code. This needs to be fixed in grow() and in the callers().

* I wonder if SBuf::grow() should be moved to MemBlob that may
(eventually) have a low-level knowledge on how to do that efficiently.
What do you and others think?

* Please use .length() everywhere you can. Do not mix .len_ and .length().

* s/shortcut: same length and same store/shortcut: same store, offset,
and length/

* SBuf::consume() comments are stale, talking about NULL.

* actual in SBuf::consume() should be const.

* getStats should return a const reference not a copy of, potentially
large, stats object.

* Please use XMIN or XMAX instead of manually comparing values with a
?-operator. It is much easier to see and check the logic that way.

* Exceptions thrown by SBuf will lack source code location info, right?
I think we need to add a ThrowHere(ex) macro that supplies that to the
ex parameter.

* bufEnd() should return the pointer to the byte after the last one. You
can call it end(). If you find yourself accessing the last byte often,
add last() that returns a char.

* s/prototype/starter/ or s/prototype/primer/
And no need to prefix the corresponding member with store, IMO.

* s/A SBuf is GUARRANTEED to be defined.//
Any C++ object is defined.

* cow() declaration comments are stale

* Please rename substrSelf() as discussed.

* Please capitalize static members.

* s/roundto/RoundTo/

* s/importCString/absorb/ or s/importCString/absorbCString/
because you are taking full control over the string memory.

* roundto is declared as a global static function in a .cci file. It
should probably be declared as an inline.

* Use static_cast in SBuf::plength() and elsewhere. It would be much
easier to find and tune casting later if needed.

* s/store/blob/ or s/store/mem/
We have enough store* stuff in squid.

* exportCopy comments are wrong or stale. It does not return NULL.  It
should throw on memory allocation errors (but probably does not yet).
The comment should say that the returned string is zero terminated.

* Remove comments after #includes. Some tools break on C++ comments
there and you are partially lying and partially not supplying any new
info there.

#include config.h //various typedefs
#include Debug.h  //debugs etc


At this point, I got Launchpad is offline for scheduled maintenance. We
should be back soon. error so I will have to look at MemBlob later.

HTH,

Alex.




Re: SBuf review at r9370

2009-02-25 Thread Amos Jeffries

Alex Rousskov wrote:

Hi Kinkie,

Here are a few corrections for
http://bazaar.launchpad.net/~kinkie/squid/stringng at r9370.

* Remove isLiteral and the corresponding code/logic. We might add this
very minor performance optimization as a non-public Blob member once the
code is known to work well. For now, the code is confusing enough
without it.

Once isLiteral is removed, compare with the prototype pointer instead of
testing isLiteral in cow().

* Remove isImported. Copy and then free the buffer when importing
instead. Same motivation as in the isLiteral comment above.

* SBuf copy constructor and assignment operator still initialize members
differently.

* Some throw() declarations are still left in the code. Grep for them.

* cow() logic feels wrong because grow() does not just grow(). Grow is
discussed below. The cow() code should be something very simple like:

if (we are alone)
return false; // no need to copy

store_ = new MemBuf(*store_); // get ourselves an exclusive copy of
the blob
return false;

No grow() magic -- we are getting an exclusive copy, not growing something.


* roundto(5,5) should return 5 but returns 10. The comment and
implementation should be fixed. The method probably does not belong here
though.


Aye. we have util.h and libmisc for generic utility functions like this.
(even though that may change later/sooner its worth making them outside 
string stuff to start with).




* estimateCapacity() is too complex to understand its performance
effects, especially since nreallocs is a very strange animal. I suggest
that we keep it very simple and add optimizations later, if needed.
Something like
   
granularity = desired  PageSize ? 16 : PageSize;

return roundto(desired, granularity)

is much easier to comprehend, analyze, and optimize. Let's postpone
complex optimizations.


* To me, compare() returns the opposite of what strcmp() does. This
should be treated as the first parameter of strcmp() and not the second
one, IMO.

* startsWith() is buggy -- it ignores the contents of its parameter's
buffer.

* SBuf::consume() returns a copy of SBuf. It should return a reference
to this or void.

* Move if !canAppend then grow logic to a single method. Call it
reserve(). More on grow() below.

* grow() comments are wrong. The method does not always allocate a new
buffer.

* grow() implementation is odd. It ignores the remaining blob space
size. Am I missing something? I have a feeling you implemented something
other than grow() so that you can rely on its strange combination of
do-not-grow and grow-even-if-you-do-not-have-to logic in the rest of the
code. This needs to be fixed in grow() and in the callers().

* I wonder if SBuf::grow() should be moved to MemBlob that may
(eventually) have a low-level knowledge on how to do that efficiently.
What do you and others think?


As long as MemBlob::grow() returns a pointer/reference to larger self or 
larger copy that SBuf can change to use.
AND as long as SBuf::grow() is only needed internally. Yet another 
wrapper indirection is the last thing we need for String.




* Please use .length() everywhere you can. Do not mix .len_ and .length().

* s/shortcut: same length and same store/shortcut: same store, offset,
and length/

* SBuf::consume() comments are stale, talking about NULL.

* actual in SBuf::consume() should be const.

* getStats should return a const reference not a copy of, potentially
large, stats object.

* Please use XMIN or XMAX instead of manually comparing values with a
?-operator. It is much easier to see and check the logic that way.



max() and min() are themselves provided in the compat layer right next 
to #define XMAX(a,b) (max(a,b)) and XMIN likewise.


Does anyone know of a reason why we can't simply use min() and max() 
directly?



* Exceptions thrown by SBuf will lack source code location info, right?
I think we need to add a ThrowHere(ex) macro that supplies that to the
ex parameter.

* bufEnd() should return the pointer to the byte after the last one. You
can call it end(). If you find yourself accessing the last byte often,
add last() that returns a char.

* s/prototype/starter/ or s/prototype/primer/
And no need to prefix the corresponding member with store, IMO.

* s/A SBuf is GUARRANTEED to be defined.//
Any C++ object is defined.

* cow() declaration comments are stale

* Please rename substrSelf() as discussed.

* Please capitalize static members.

* s/roundto/RoundTo/

* s/importCString/absorb/ or s/importCString/absorbCString/
because you are taking full control over the string memory.

* roundto is declared as a global static function in a .cci file. It
should probably be declared as an inline.

* Use static_cast in SBuf::plength() and elsewhere. It would be much
easier to find and tune casting later if needed.

* s/store/blob/ or s/store/mem/
We have enough store* stuff in squid.

* exportCopy comments are wrong or stale. It does not return NULL.  It
should throw on