Re: StringNg review: MemBlob and BodyPipe

2011-03-29 Thread Alex Rousskov
On 03/26/2011 03:44 AM, Kinkie wrote:
 IMVHO body pipelining should go through SBuf, not MemBlob, but that's
 for body pipelining to work out.

 Currently, MemBlob's append-only design prevents efficient message body
 pipelining. Since SBuf relies on MemBlob, SBuf inherits the same problem
 (and adds more problems due to poor used area tracking). We will need to
 decide whether to enhance MemBlob/SBuf to improve pipelining support or
 switch to a pipeline that uses a bunch of buffers instead of a single
 one. This is a topic for another thread though.

 Yes. Unfortunately I don't understand what is the use case you have in
 mind :(

 See BodyPipe and BodyPipe::theBuf use specifically.

 My original thought was to use SBuf's append/consume pair for this.

 Unlike MemBuf used by BodyPipe, SBuf::consume() does not free any space and,
 hence, cannot be used for efficient body pipelining (i.e., using a
 relatively small buffer to transfer a possibly large body using asynchronous
 producer and consumer).
 
 I understand now.
 We could probably define some heuristics to handle this case.
 Something like (on top of my mind):
 if after consume() we have
 - a large-ish backing MemBlob (e.g.  8kb)
 - with a large unused prefix space (e.g. 4kb)
 
 Then we cow().
 Would this enable SBuf to be used for BodyPipes in your opinion?

Yes, probably, but the question is not just how to make SBuf usable for
BodyPipe, but which use case we want to optimize SBuf for.

If we follow the above consume-causes-memcpy algorithm, we are
virtually guaranteed to memcpy bytes of any message body that exceeds
I/O buffer size. And copy about N times where N is the content length
divided by the large unused prefix size. Thus, the
consume-causes-memcpy algorithm is bad for body processing
performance. It is debatable whether it is worse than the current
BodyPipe code, but it is unlikely to be much better.

If we optimize for body forwarding needs instead of preserving the
current implementation, we would need to implement a circular buffer so
that consume frees space without memmove or memcpy. This would be a lot
faster and a lot more complex than the current BodyPipe and SBuf code.

Both approaches would have to cow() in consume(), of course, but the
hope is that cow() would be a no-op after the headers are processed.
Still, you are likely to do at least one large memcpy per message just
to keep all the strings pointing to the headers happy.


A third approach is to use a series of buffers instead of a single
buffer. This approach avoids memcpy and avoids cow() in consume(), so it
is the most efficient one. It can be implemented on top of the existing
SBuf so that we do not make SBuf more complex and so that BodyPipe is
usually compatible with other SBuf users (usually avoiding extra
conversion copies).

This third approach requires custom (or inefficient) processing when a
string crosses buffer boundaries though. In practice, such crossings
should be very rare, but we would still need to write code to handle
them (I would opt for inefficient but simple code in this case).


To summarize, we have four options:

current: StringNG is unusable for BodyPipe; must copy to convert to/from
consume-causes-memcpy: lots of memcpy overhead; few StringNG changes
circular-buffer: at least one memcpy overhead; many StringNG changes
many-buffers: usually no memcpy overhead; many StringNG/core additions

We can move to consume-causes-memcpy now, but it adds more SBuf
complexity and heuristic headaches without an obvious performance
advantage. I am not sure, but it may even throw Squid performance back,
and we already have many performance regressions to deal with. It feels
like consume-causes-memcpy may not be worth it.

I think the long-term goal should be the many-buffers design, but I am
not sure. If others agree, we can just accept the fact that BodyPipe and
SBuf are not compatible for now and move on, until the many-buffers
approach is implemented. If StringNG becomes a performance degradation
point because of this, it will have to be introduced after v3.2 becomes
stable.


Thank you,

Alex.


Re: StringNg review: MemBlob

2010-10-19 Thread Kinkie
On Mon, Oct 18, 2010 at 7:56 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 Hi Kinkie,

    Attached is am untested patch with some final polishing changes for
 MemBlob, based on lp:~kinkie/squid/stringng revision 9517. I think the
 MemBlob class should be committed to trunk after these changes are tested
 and polished (I may have missed a few minor problems when moving or renaming
 stuff). The rest of the branch code would need to be synced as well.

revno 9518 covers one of the changes you requested: it turns
MemBlob::calcAllocSize into a nonmember static inline call in
MemBlob.cc.


 The patch preamble documents the major changes.

 While working on this, I realized that the MemBlob::size member is only an
 approximation, and that MemBlob class is probably not usable for efficient
 message body pipelining (BodyPipe and friends). We will probably change
 MemBlob API later to accommodate users other than strings, but that can
 wait. I adjusted the append call to remove the use of pointers which may
 make future changes less disruptive.

 Once MemBlob is committed, please consider re-integrating it with memory
 pools. This is the biggest remaining XXX, IMO.

I'd postpone that to after merging SBuf in. The biggest issues I have
met when trying to do that were related to initialization (IIRC):
- static variables can't use mempools
- some nonstatic MemBlobs get allocated before mempools are
initialized. While this can IIRC be detected at initialization time,
what happens at destruction time? Do the blobs need to remember
whether they were pool-allocated or not? Or do mempools handle
gracefully an attempt to free something which was not obtained from a
pool?

The fact that mempools have only been partly converted to c++ doesn't
really help :(

More to come as I check the patch(es).

Thanks for taking the time Alex; I'm really happy to see things progressing.

-- 
    /kinkie


Re: StringNg review: MemBlob

2010-10-19 Thread Kinkie
On Mon, Oct 18, 2010 at 7:56 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 Hi Kinkie,

    Attached is am untested patch with some final polishing changes for
 MemBlob, based on lp:~kinkie/squid/stringng revision 9517. I think the
 MemBlob class should be committed to trunk after these changes are tested
 and polished (I may have missed a few minor problems when moving or renaming
 stuff). The rest of the branch code would need to be synced as well.

Polished and build-tested.
Unless someone objects, I'm going to merge MemBlob to trunk this evening (GMT).
It lacks unit-tests (SBuf's unit tests provide them indirectly).


 The patch preamble documents the major changes.

 While working on this, I realized that the MemBlob::size member is only an
 approximation, and that MemBlob class is probably not usable for efficient
 message body pipelining (BodyPipe and friends). We will probably change
 MemBlob API later to accommodate users other than strings, but that can
 wait. I adjusted the append call to remove the use of pointers which may
 make future changes less disruptive.

Fixed SBuf accordingly.

 Once MemBlob is committed, please consider re-integrating it with memory
 pools. This is the biggest remaining XXX, IMO.

I've checked and confirmed why MemPools integration was disabled: it's
due to an inherent asymmetry in memAllocString/memFreeString: if
mempools are not initialized, memAllocString will happily allocate the
string using xcalloc. When the time comes to deallocate such strings,
it will fail to find a pool to dealloc from and fail an assertion.

The only way I can think of to cleanly resolve this is quite
disruptive, as it means making MemPools a proper singleton obect to
make sure they are initialized before being used: it's a complex
project on its own which I'd list as a dependency before integrating
MemPools back in.

IMVHO body pipelining should go through SBuf, not MemBlob, but that's
for body pipelining to work out.

-- 
    /kinkie


Re: StringNg review: MemBlob

2010-10-19 Thread Alex Rousskov

On 10/19/2010 08:08 AM, Kinkie wrote:

On Mon, Oct 18, 2010 at 7:56 PM, Alex Rousskov wrote:


Once MemBlob is committed, please consider re-integrating it with memory
pools. This is the biggest remaining XXX, IMO.


I've checked and confirmed why MemPools integration was disabled: it's
due to an inherent asymmetry in memAllocString/memFreeString: if
mempools are not initialized, memAllocString will happily allocate the
string using xcalloc. When the time comes to deallocate such strings,
it will fail to find a pool to dealloc from and fail an assertion.

The only way I can think of to cleanly resolve this is quite
disruptive, as it means making MemPools a proper singleton obect to
make sure they are initialized before being used: it's a complex
project on its own which I'd list as a dependency before integrating
MemPools back in.


What happens if you call Mem::Init() from memAllocString()? I am sure a 
few things will need to be fixed (e.g., Mem::Init() should exit if 
called twice), but it does not sound disruptive to me.




IMVHO body pipelining should go through SBuf, not MemBlob, but that's
for body pipelining to work out.


Currently, MemBlob's append-only design prevents efficient message body 
pipelining. Since SBuf relies on MemBlob, SBuf inherits the same problem 
(and adds more problems due to poor used area tracking). We will need to 
decide whether to enhance MemBlob/SBuf to improve pipelining support or 
switch to a pipeline that uses a bunch of buffers instead of a single 
one. This is a topic for another thread though.


Cheers,

Alex.





Re: StringNg review: MemBlob

2010-10-19 Thread Kinkie
On Tue, Oct 19, 2010 at 4:35 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 10/19/2010 08:08 AM, Kinkie wrote:

 On Mon, Oct 18, 2010 at 7:56 PM, Alex Rousskov wrote:

 Once MemBlob is committed, please consider re-integrating it with memory
 pools. This is the biggest remaining XXX, IMO.

 I've checked and confirmed why MemPools integration was disabled: it's
 due to an inherent asymmetry in memAllocString/memFreeString: if
 mempools are not initialized, memAllocString will happily allocate the
 string using xcalloc. When the time comes to deallocate such strings,
 it will fail to find a pool to dealloc from and fail an assertion.

 The only way I can think of to cleanly resolve this is quite
 disruptive, as it means making MemPools a proper singleton obect to
 make sure they are initialized before being used: it's a complex
 project on its own which I'd list as a dependency before integrating
 MemPools back in.

 What happens if you call Mem::Init() from memAllocString()? I am sure a few
 things will need to be fixed (e.g., Mem::Init() should exit if called
 twice), but it does not sound disruptive to me.

Tried that, using a string-pool as singleton-like initialization
indicator. There are some prerequisites to be called in turn. I have
not investigated further; I can investigate further if you wish.



 IMVHO body pipelining should go through SBuf, not MemBlob, but that's
 for body pipelining to work out.

 Currently, MemBlob's append-only design prevents efficient message body
 pipelining. Since SBuf relies on MemBlob, SBuf inherits the same problem
 (and adds more problems due to poor used area tracking). We will need to
 decide whether to enhance MemBlob/SBuf to improve pipelining support or
 switch to a pipeline that uses a bunch of buffers instead of a single one.
 This is a topic for another thread though.

Yes. Unfortunately I don't understand what is the use case you have in mind :(
My original thought was to use SBuf's append/consume pair for this.

Thanks again!

-- 
    /kinkie


Re: StringNg review: MemBlob

2010-10-19 Thread Alex Rousskov

On 10/19/2010 08:40 AM, Kinkie wrote:

On Tue, Oct 19, 2010 at 4:35 PM, Alex Rousskov
rouss...@measurement-factory.com  wrote:

On 10/19/2010 08:08 AM, Kinkie wrote:


On Mon, Oct 18, 2010 at 7:56 PM, Alex Rousskov wrote:


Once MemBlob is committed, please consider re-integrating it with memory
pools. This is the biggest remaining XXX, IMO.


I've checked and confirmed why MemPools integration was disabled: it's
due to an inherent asymmetry in memAllocString/memFreeString: if
mempools are not initialized, memAllocString will happily allocate the
string using xcalloc. When the time comes to deallocate such strings,
it will fail to find a pool to dealloc from and fail an assertion.

The only way I can think of to cleanly resolve this is quite
disruptive, as it means making MemPools a proper singleton obect to
make sure they are initialized before being used: it's a complex
project on its own which I'd list as a dependency before integrating
MemPools back in.


What happens if you call Mem::Init() from memAllocString()? I am sure a few
things will need to be fixed (e.g., Mem::Init() should exit if called
twice), but it does not sound disruptive to me.


Tried that, using a string-pool as singleton-like initialization
indicator. There are some prerequisites to be called in turn. I have
not investigated further; I can investigate further if you wish.


I think this must be fixed, but I do not think this has to be fixed 
before MemBlob is committed.




IMVHO body pipelining should go through SBuf, not MemBlob, but that's
for body pipelining to work out.


Currently, MemBlob's append-only design prevents efficient message body
pipelining. Since SBuf relies on MemBlob, SBuf inherits the same problem
(and adds more problems due to poor used area tracking). We will need to
decide whether to enhance MemBlob/SBuf to improve pipelining support or
switch to a pipeline that uses a bunch of buffers instead of a single one.
This is a topic for another thread though.


Yes. Unfortunately I don't understand what is the use case you have in mind :(


See BodyPipe and BodyPipe::theBuf use specifically.


My original thought was to use SBuf's append/consume pair for this.


Unlike MemBuf used by BodyPipe, SBuf::consume() does not free any space 
and, hence, cannot be used for efficient body pipelining (i.e., using a 
relatively small buffer to transfer a possibly large body using 
asynchronous producer and consumer).


Hope this clarifies,

Alex.


Re: StringNg review: MemBlob

2010-10-18 Thread Alex Rousskov

Hi Kinkie,

Attached is am untested patch with some final polishing changes for 
MemBlob, based on lp:~kinkie/squid/stringng revision 9517. I think the 
MemBlob class should be committed to trunk after these changes are 
tested and polished (I may have missed a few minor problems when moving 
or renaming stuff). The rest of the branch code would need to be synced 
as well.


The patch preamble documents the major changes.

While working on this, I realized that the MemBlob::size member is 
only an approximation, and that MemBlob class is probably not usable for 
efficient message body pipelining (BodyPipe and friends). We will 
probably change MemBlob API later to accommodate users other than 
strings, but that can wait. I adjusted the append call to remove the use 
of pointers which may make future changes less disruptive.


Once MemBlob is committed, please consider re-integrating it with memory 
pools. This is the biggest remaining XXX, IMO.



Thank you,

Alex.

Got rid of MemBlob.cci because one member should not have been inlined (it is
complex code, with complex linking dependencies, that is used as a part of an
expensive call) and the other members are one-liners that are better
inlined directly (less time spent finding and understanding the code).

Replaced raw pointers with offsets in canAppend(). Current MemBlob users
already use offsets to compute those pointers anyway. And future MemBlobs may
support sliding use window via ever-incrementing offsets, which will make raw
pointer use even less appropriate. This change effectively removes the
implicit am I using the blob I think I am using? check. I think it is OK to
remove that because if some user is so seriously confused, we should not try
to mask the problem at the canAppend() level.

Added clear() method so that users can reset the blob without modifying size
member directly. Users still need access to reference count to know when
clearing is safe, unfortunately. We may add something like clearIfLonely()
later.

Documented the two blob areas and explained how they change. Documented lack
of sharing protections beyond blob refcounting. Documented that size member
is only approximate because it is not decreased when users leave.

Polished creation/destruction messages to match find-alive patterns.

Polished method and method argument names.

Polished comments, made comments more consistent, and applied a few
Source Formatting rules.


=== modified file 'src/Makefile.am'
--- src/Makefile.am	2010-10-15 10:22:29 +
+++ src/Makefile.am	2010-10-18 17:05:43 +
@@ -543,8 +543,7 @@
 	String.cci \
 	SquidString.h \
 	SquidTime.h \
-	SBuf.cci \
-	MemBlob.cci
+	SBuf.cci
 
 BUILT_SOURCES = \
 	cf_gen_defines.cci \

=== modified file 'src/MemBlob.cc'
--- src/MemBlob.cc	2010-10-14 22:31:46 +
+++ src/MemBlob.cc	2010-10-18 17:17:18 +
@@ -37,68 +37,55 @@
 #include iostream
 #endif
 
-MemBlobStats MemBlob::stats;
-InstanceIdDefinitions(MemBlob, MemBlob);
-
-MemBlobStats::MemBlobStats() : alloc(0), live(0), append(0)
+#define MEMBLOB_USES_MEM_POOLS 0
+
+MemBlobStats MemBlob::Stats;
+InstanceIdDefinitions(MemBlob, blob);
+
+
+/* MemBlobStats */
+
+MemBlobStats::MemBlobStats(): alloc(0), live(0), append(0)
 {}
 
-const MemBlobStats
-MemBlob::GetStats()
-{
-return stats;
-}
-
 std::ostream
-MemBlobStats::dump(std::ostream os) const
+MemBlobStats::dump(std::ostream os) const
 {
 os 
-MemBlob allocations:   alloc 
-\nMemBlob live references:   live 
-\nMemBlob append operations:   append 
-\nMemBlob total allocated volume:   liveBytes 
-\nlive MemBlobs mean size:  
+MemBlob created:   alloc 
+\nMemBlob alive:   live 
+\nMemBlob append calls:   append 
+\nMemBlob currently allocated size:   liveBytes 
+\nlive MemBlob mean current allocation size:  
 (static_castdouble(liveBytes)/(live?live:1))  std::endl;
 return os;
 }
 
-std::ostream
-MemBlob::dump (std::ostream os) const
-{
-os  id @  (void *)this
- mem:  (void*) mem
- ,size:  capacity
- ,used:  size
- ,refs:  RefCountCount()  ; ;
-return os;
-}
 
+/* MemBlob */
 
 MemBlob::MemBlob(const MemBlob::size_type reserveSize) :
 mem(NULL), capacity(0), size(0) // will be set by memAlloc
 {
-debugs(MEMBLOB_DEBUGSECTION,9, id  constructed, @
-static_castvoid*(this)
-, reserveSize=  reserveSize);
+debugs(MEMBLOB_DEBUGSECTION,9, HERE  constructed, this=
+static_castvoid*(this)   id=  id 
+ reserveSize=  reserveSize);
 memAlloc(reserveSize);
 }
 
-MemBlob::MemBlob(const char * memBlock, const MemBlob::size_type memSize) :
+MemBlob::MemBlob(const char *buffer, const MemBlob::size_type bufSize) :
 mem(NULL), capacity(0), size(0) // will be set by memAlloc
 {
-debugs(MEMBLOB_DEBUGSECTION,9, id  constructed, @
-static_castvoid*(this)
-, memBlock=  static_castconst void*(memBlock)
-, memSize=  memSize);
-

Re: StringNg review: MemBlob

2010-10-08 Thread Kinkie
On Thu, Oct 7, 2010 at 12:49 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 Hi Kinkie,

Hi alex!


    Here is another review you have requested. This review is based on
 lp:~kinkie/squid/stringng revision 9504.

Thanks

 #include base/TextException.h

 Remove if not needed in MemBlob.h

Moved.

 class MemBlobStats {

 ...

 };

 Not reviewed per Kinkie's request.

Should now be OK. I've removed the reallocs stat, which doesn't really
belong) to MemBlob, as it's a SBuf construct.

 /**
  * Refcounted memory buffer, content-agnostic. Meant to only be used
  * by SBuf.
  */

 Remove Meant to only be used by SBuf. This class should be generally
 usable.

Removed. Removed all references to SBuf in MemBlob.*

 class MemBlob: public RefCountable {
    public:

 Still missing source formatting.

Done.

    char *mem;         /// Raw memory block
    size_type capacity;/// Size of the memory block
    size_type size; /// How much of the memory is actually used

 Start comment with a small letter or end with a period. I recommend the
 former for short comments that are usually not full sentences.

Done.

    //do not meddle with. Use getStats to obtain a copy for dumping
    static MemBlobStats stats;
    static const MemBlobStats getStats();

 Both missing documentation. What not to do instructions are not enough.

With the removal of nreallocs stats can be (and was) made private.

 I do not recall whether we have discussed this already, but if you can make
 the stats member private, please do.

 getStats() name should start with a capital letter since it is static.

Done.

    /**
     * Create a new MemBlob, containing a copy of the
     * contents of the buffer of size memSize
     * pointed to by memBlock. It is up to the caller to
     * manage the memBlock
     */
    MemBlob(const char * memBlock, const size_type memSize);

 Remove It is up to ... because const pointer already implies that.

ok


    /** check whether the supplied pointer is the first unused byte in the
 MemBlob
     */

     _SQUID_INLINE_ bool isAtEnd(const char *ptr) const;


 nitpick: the supplied pointer is not a byte.

 Please document whether this returns true when there are no unused MemBblob
 bytes. I still do not like this interface itself though. There must be a
 cleaner way to do what you want. How about this:

    /// whether writing to the [start, start + n) area is currently OK
    bool writable(const char *start, size_type n) const;

 This would both solve the problem with the lack of unused bytes and check
 for exclusive control of the area. Would that work for the current callers?

This is what canAppend does.
What I'll do is make private the isAtEnd and willFit methods (they're
only really used by canAppend).
I would prefer not to hand-inline them tho, for readability reasons.

 Moreover, do we need this method, willFit(), and canAppend()? They all do
 pretty much the same kind of checks, from different angles, with different
 bugs, under very different names. This needs to be polished, but I do not
 know which end to start from. Can you reduce the problem space?

Done, I hope.


     /** check whether we can append data at the tail of the MemBlob
      *
      * \param toAppend number of bytes to be appended
      * \return true there is at least toAppend free space available in the
 MemBlob
      */
     _SQUID_INLINE_ bool willFit(const size_type toAppend) const;

 .

    /** return the number of bytes available at the tail of the MemBlob
     *
     */

 /// the number of allocated but unused bytes at the end of the blob

    _SQUID_INLINE_ size_type getAvailableSpace() const;

 Wrong name. The method does not give the caller the available space, only
 its size. Here are some combinations that work better:

   size(), spaceSize(), capacity()
   contentSize(), spaceSize(), capacity()
   usedSize() and availableSize(), capacity()

Fixed. availableSize() is the winner.


    /** check whether we can append data at the tail of the MemBlob
     *
     * \param toAppend number of bytes to be appended
     * \return true there is at least toAppend free space available in the
 MemBlob
     */
    _SQUID_INLINE_ bool willFit(const size_type toAppend) const;

 .

    /** check if there is enough unused space at the tail of the MemBlob
     *
     * \return true the append would fit
     * \param bufEnd first unused byte after the end of the SBuf being
 appended to
     * \param toAppend how many bytes would be nice to append
     */
    _SQUID_INLINE_ bool canAppend(char const * bufEnd, const size_type
 toAppend) const;

 MemBlob should not know about SBuf; please remove references to SBuf.

 If we are really _appending_ to MemBlob, we should not have to specify the
 end of the used buffer area because MemBlob knows where it ends. See above
 for general comments and suggestions to fix the will/can/is* mess: We need
 fewer append-related members with a consistent naming scheme.

canAppend can probably get a better 

Re: StringNg review: MemBlob

2010-10-07 Thread Kinkie
On Wed, Sep 22, 2010 at 5:31 PM, Amos Jeffries squ...@treenet.co.nz wrote:
 On 22/09/10 21:30, Kinkie wrote:

 On Tue, Sep 21, 2010 at 4:18 AM, Alex Rousskov
 rouss...@measurement-factory.com  wrote:

 On 09/03/2010 09:21 AM, Kinkie wrote:

 You'll find the branch at lp:~kinkie/squid/stringng

 I decided to take a look at MemBlob because it is used by SBuf which was
 reviewed a few days ago. It would be nice to get a fixed SBuf committed,
 but
 that would require committing the classes it uses.

 Committing smaller self-contained parts may be an overall better strategy
 than trying to fix everything first and then do one huge commit.

 So here is a MemBlob review:

 snip

 Remove the buf prefix until you have some other size and capacity
 members
 to distinction from. Besides, this is not a buf. This is a blob :-)

 Yup. Relics... I've been working on this for more than two years now.

 I would not leave these data members public, but it is your call.

 This class originally was private to SBuf. I've now clarified in the
 class definition that it's not meant for general use; its only client
 should be SBuf.

 If still relevant that would be reason for protected: and friend
 declarations in the class definition instead of public:.

I'm fine either ways, the only client I'm concerned with is SBuf.

     _SQUID_INLINE_ void deleteSelf();

 I hope this can be removed. If not, it must be documented. The
 corresponding
 comments in .cci do not make sense to me and do not match RefCount.h in
 trunk, as far as I can tell.

 http://old.squid-cache.org/Doc/Prog-Guide/prog-guide-24.html
 http://markmail.org/message/kt7357pwttmzrey3

 Maybe the documentaiton is outdated?


 Seems to be. The RefCountable classes no longer have deleteSelf.

removed.

     _SQUID_INLINE_ bool isAtEnd(const char *ptr) const;

 Missing documentation.
 Why is this needed? Some transitional call? If yes, document as such.

 It is needed to know whether a the tail of a SBuf is at the tail of a
 shared MemBlob. Such a SBuf can append without needing a cow, if
 there's enough headroom in the MemBlob.

 So, a char* way of saying:

  if (SBuf::off_+Sbuf::len_ == MemBlob::size 
     SBuf::off_+Sbuf::len_  MemBlob::capacity) {
  ...

 with canAppend being a duplicate as above but = MemBlob::capacity+N

canAppend is exactly isAtEnd  willFit (isAtEnd is always true if
RefCount==1)

 or rather isAtEnd being a case of canAppend(0)

Well, yes: willFit(0) is always true by definition (you can always
append nothing).

 This reads like a command to free some space.
 Consider: spaceSize()
 spaceSize() would be more consistent with other Squid code as well.


     _SQUID_INLINE_ bool canGrow(const size_type howmuch) const;

 Missing documentation.
 Usually unsafe and wastefull. See similar comments for SBuf.

 It'd be unsafe in a concurrent (multithreaded) environment; how would
 it be unsafe in a single-threaded env?


 Squid won't be single-threaded for much longer. At least on the timescale
 for string-level changes.

Most String/Buf implementations avoid locking, there's too much
overhead to be spent for such a low-level function, it's usually
better dealt with on an higher level.

     _SQUID_INLINE_ bool canAppend(char const * bufEnd, const size_type
 howmuch) const;

 Missing documentation.
 Especially since it is not clear why canAppend() needs a pointer.
 Looking at the .cci implementation, I question why would the caller
 keep the bufEnd pointer if MemBlob manages it already? I do not think
 this
 call or duplicated bufEnd pointer is needed.

 if many SBufs reference the same MemBlob, an append operation to one
 of them does not need a cow if the appended-to SBuf is at the tail of
 the MemBlob and there is enough free space at the end of the MemBlob
 to hold the data being appended.
 This canAppend call checks exactly that. the SBuf knows its own tail,
 while the MemBlob knows the tail of the used portion of itself.


 see isAtEnd() comment above.

 /**
  * constructor, creates an empty MemBlob of size= requested
  * \param size the minimum size to be guarranteed from the MemBlob
  */
 MemBlob::MemBlob(const MemBlob::size_type size)
 {
     debugs(MEMBLOB_DEBUGSECTION,9,HERE  size=  size);
     memAlloc(size);
 }

 I would recommend initializing mem and sizes to NULL/0s explicitly in all
 the constructors.  It does not cost much and may save a few hours of
 debugging if things go wrong.

 Isn't setting them in memAlloc enough? Besides the actual size may be
 different (or would you mean to set them twice?)

 In the constructor init list.

 You can then enforce the MUST and ONLY requirements with assert(mem==NULL)
 etc at the start of memAlloc.

Done.
Thanks.


-- 
    /kinkie


Re: StringNg review: MemBlob

2010-10-07 Thread Kinkie
On Wed, Sep 22, 2010 at 5:42 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 09/22/2010 03:30 AM, Kinkie wrote:

 On Tue, Sep 21, 2010 at 4:18 AM, Alex Rousskov wrote:

 On 09/03/2010 09:21 AM, Kinkie wrote:

 You'll find the branch at lp:~kinkie/squid/stringng


 /**
  * Low-level buffer. It's responsible for allocating and managing
  * the blocks of memory. It's refcounted, but it doesn't know
  * which SBufs hold references to it.
  */


 Consider:
 /// A refcounted fixed-size buffer. Tracks the portion used by opaque
 content.

 How about:
 Refcounted memory buffer, content-agnostic. Meant to only be used by
 SBuf. ?

 Fixed-size is an key property unless you think it may change soon.
 Content-agnostic is fine, I guess.

Done.

 As for meant to only be used by Foo, I am still against such restrictions
 on use in general and in this context in particular. We have discussed this
 before. Use-cases may determine the initial design sketch and optimizations,
 but the final APIs should be usable in any context that needs them rather
 than restricted to one specific user. MemBlob is no exception IMO.

Removed the comment.
I hope to avoid overdesign.

 class MemBlob: public RefCountable {
     public:
     typedef RefCountMemBlob  Pointer;
     typedef unsigned int size_type;

 Would be nice to guarantee support for 5GB and larger blobs.

 Do we need them now? It'll be trivial to extend when needed.

 I may need them soon. Based on past experience, I am not convinced that
 replacing 'int' with some other type will be trivial.

I just tried it: changed the typedef in MemBlob and SBuf to int64_t
and adapted SQUIDSBUFPRINT, found one glitch but after fixing it build
and test-suite worked without a hitch.
Until the need arises, I'd prefer to stick to int32_t in order to
avoid overdesign, performance penalties (64-bit math is said to be bad
on many 32-bit platforms) and memory waste.

 I would not leave these data members public, but it is your call.

 This class originally was private to SBuf. I've now clarified in the
 class definition that it's not meant for general use; its only client
 should be SBuf.

 Which is a wrong restriction made worse by wrongly guessed implications. As
 SBuf review shows, we are already paying the price for the lack of proper
 boundaries between the two classes. It makes some areas of the SBuf code
 confusing at best.

Removed all such implications; I agree that the boundary betweeh
MemBlob and SBuf is blurry in some places; the decisions were not
taken lightly and in my very humble opinion pay off in terms of
performance gains. I hope that the documentation improvements you're
suggesting will clarify the situation. Once the API is finalized, I'll
further document in the Wiki (there is already a design blueprint, but
it is out of date)


     _SQUID_INLINE_ void deleteSelf();

 I hope this can be removed. If not, it must be documented. The
 corresponding
 comments in .cci do not make sense to me and do not match RefCount.h in
 trunk, as far as I can tell.

 http://old.squid-cache.org/Doc/Prog-Guide/prog-guide-24.html
 http://markmail.org/message/kt7357pwttmzrey3

 Maybe the documentaiton is outdated?

 That old documentation is wrong. Use the source code. Remove deleteSelf().

Done.The only references to such construct right now are in BasicUser,
NegotiateUser and NTLMUser (out of scope for me)

     _SQUID_INLINE_ size_type freeSpace() const;

 Missing documentation.

 Renamed to getAvailableSpace. Documentation was in the .cci, moved it

 Wrong name. The method does not give the caller the available space, only
 its size. Here are some combinations that work better:

   size(), spaceSize(), capacity()
   contentSize(), spaceSize(), capacity()
   usedSize() and availableSize(), capacity()

availableSize() it is then :)

 MemBlobStats MemBlob::stats;

 If some static code wants to allocate MemBlob, will its stats not get
 counted, depending on static initialization order?

 Yes. I don't think that's much of an issue;

 It becomes an issue when we get bug reports about negative or too-big
 counters. Or when somebody adds an assert() that the counter is correct.
 Both have happened before.

all counters are unsigned. I'm adding documentation telling not to
rely on them. The only way I can think of to disenfranchise from the
static allocation trap is to use a singleton, which would make
bookkeeping a relatively complex operation (much more complex than
incrementing an integer, anyways). Once we put this in context that
I'd remove most counters once SBuf is fully in...

 stats are mostly meant to

 help us diagnose the effectiveness of the solution, and in fact I
 wouldn't be against #ifdef-ing them out in production code.

 Sure, nobody cares about effectiveness of the code in production as long as
 it was effective in the lab :-).

lol!
We have invented Schroedinger's effectiveness. We should get a patent
on it; maybe we can apply in time for next Apr 1st..

 /**
  * 

Re: StringNg review: MemBlob

2010-10-06 Thread Alex Rousskov

Hi Kinkie,

Here is another review you have requested. This review is based on 
lp:~kinkie/squid/stringng revision 9504.



#include base/TextException.h


Remove if not needed in MemBlob.h


class MemBlobStats {

...

};


Not reviewed per Kinkie's request.


/**
 * Refcounted memory buffer, content-agnostic. Meant to only be used
 * by SBuf.
 */


Remove Meant to only be used by SBuf. This class should be generally 
usable.




class MemBlob: public RefCountable {
public:


Still missing source formatting.



char *mem; /// Raw memory block
size_type capacity;/// Size of the memory block
size_type size; /// How much of the memory is actually used


Start comment with a small letter or end with a period. I recommend the 
former for short comments that are usually not full sentences.




//do not meddle with. Use getStats to obtain a copy for dumping
static MemBlobStats stats;
static const MemBlobStats getStats();


Both missing documentation. What not to do instructions are not enough.

I do not recall whether we have discussed this already, but if you can 
make the stats member private, please do.


getStats() name should start with a capital letter since it is static.



/**
 * Create a new MemBlob, containing a copy of the
 * contents of the buffer of size memSize
 * pointed to by memBlock. It is up to the caller to
 * manage the memBlock
 */
MemBlob(const char * memBlock, const size_type memSize);


Remove It is up to ... because const pointer already implies that.



/** check whether the supplied pointer is the first unused byte in the 
MemBlob
 */

 _SQUID_INLINE_ bool isAtEnd(const char *ptr) const;


nitpick: the supplied pointer is not a byte.

Please document whether this returns true when there are no unused 
MemBblob bytes. I still do not like this interface itself though. There 
must be a cleaner way to do what you want. How about this:


/// whether writing to the [start, start + n) area is currently OK
bool writable(const char *start, size_type n) const;

This would both solve the problem with the lack of unused bytes and 
check for exclusive control of the area. Would that work for the current 
callers?


Moreover, do we need this method, willFit(), and canAppend()? They all 
do pretty much the same kind of checks, from different angles, with 
different bugs, under very different names. This needs to be polished, 
but I do not know which end to start from. Can you reduce the problem space?



 /** check whether we can append data at the tail of the MemBlob
  *
  * \param toAppend number of bytes to be appended
  * \return true there is at least toAppend free space available 
in the MemBlob

  */
 _SQUID_INLINE_ bool willFit(const size_type toAppend) const;

.


/** return the number of bytes available at the tail of the MemBlob
 *
 */


/// the number of allocated but unused bytes at the end of the blob


_SQUID_INLINE_ size_type getAvailableSpace() const;


Wrong name. The method does not give the caller the available space, 
only its size. Here are some combinations that work better:


   size(), spaceSize(), capacity()
   contentSize(), spaceSize(), capacity()
   usedSize() and availableSize(), capacity()


/** check whether we can append data at the tail of the MemBlob
 *
 * \param toAppend number of bytes to be appended
 * \return true there is at least toAppend free space available in the 
MemBlob
 */
_SQUID_INLINE_ bool willFit(const size_type toAppend) const;


.


/** check if there is enough unused space at the tail of the MemBlob
 *
 * \return true the append would fit
 * \param bufEnd first unused byte after the end of the SBuf being appended 
to
 * \param toAppend how many bytes would be nice to append
 */
_SQUID_INLINE_ bool canAppend(char const * bufEnd, const size_type 
toAppend) const;


MemBlob should not know about SBuf; please remove references to SBuf.

If we are really _appending_ to MemBlob, we should not have to specify 
the end of the used buffer area because MemBlob knows where it ends. See 
above for general comments and suggestions to fix the will/can/is* mess: 
We need fewer append-related members with a consistent naming scheme.




/** append a c-string by copy and explicit size


Consider:
* Appends exactly n bytes starting with S. Does not check for
* null-termination of S. Throws if there is not enough space.



 * It is responsibility of the caller to make sure that there is
 * enough space


What happens if there is not? I think we should throw. This needs to be 
documented. See above for a suggestion.



 * \param S the char* to be appended at the end of MemBlob


at the end of MemBlob can be interpreted as the end of allocated area. 
Use after the used area or similar.




 * \param Ssize the length of the array to be appended
 */
_SQUID_INLINE_ 

Re: StringNg review: MemBlob

2010-09-22 Thread Amos Jeffries

On 22/09/10 16:31, Alex Rousskov wrote:

On 09/20/2010 08:18 PM, Alex Rousskov wrote:

On 09/03/2010 09:21 AM, Kinkie wrote:


You'll find the branch at lp:~kinkie/squid/stringng

...

Comments for MemBlob.cci r9472:


Found one more:


* \note memcpy is used as the copying function. This is safe,
* because it is guarranteed by design that the copied-to
* memory area is only owned by the appended-to string,
* and thus doesn't overlap with the appended-from area
*/
void
MemBlob::append(const char *S, const MemBlob::size_type Ssize)
{
memcpy(mem+bufUsed,S,Ssize);
bufUsed+=Ssize;
++stats.append;
}


The \note is correct,


No. As you pointed out to me earlier with SBuf the S here may be the 
result of:

  a.clear(); a.append(a.mem, 1);

...



but we should not mention string in MemBlob.cci.
MemBlob does not (or should not) know what its users are. There may be a
better way to express the same thought. Consider:

\none memcpy() is safe because we copy to an unused area


*that* is better. There is still no guarantee it wont overflow on the 
destination. memcpy() makes no mention about when happens when one of 
the pointers is NULL, but experience shows a lot of segfaults.


memcpy() docs state always copies exactly num bytes and that the 
buffers should not overlap.


memmove() is the safe one which should be used if there is any doubt 
about overlapping memory. But even that can't validate against overflows.




The note belongs to the append() implementation, not its description.



Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.8
  Beta testers wanted for 3.2.0.2


Re: StringNg review: MemBlob

2010-09-22 Thread Kinkie
On Tue, Sep 21, 2010 at 4:18 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 09/03/2010 09:21 AM, Kinkie wrote:

 You'll find the branch at lp:~kinkie/squid/stringng

 I decided to take a look at MemBlob because it is used by SBuf which was
 reviewed a few days ago. It would be nice to get a fixed SBuf committed, but
 that would require committing the classes it uses.

 Committing smaller self-contained parts may be an overall better strategy
 than trying to fix everything first and then do one huge commit.

 So here is a MemBlob review:

MemBlob and SBuf are meant to be used together. They split the burden
of managing memory correcly.

 Comments for src/MemBlob.h r9472:

 #include MemPool.h

Needed for MEMPROXY_CLASS etc

 Remove if not needed.

 #include RefCount.h

 /**
  * A container for various MemBlob class-wide statistics
  */
 class MemBlobStats {
     public:
     u_int64_t alloc; ///number of allocation operations

 Class allocations or buffer allocations?

Clarified. MemBlob allocations

     u_int64_t realloc; ///number of re-allocation operations

 .

     u_int64_t live; ///number of live references to bufs

 How can you know the number of references? Do you mean the number of MemBob
 instances in existence?

Number of MemBlob instances. Clarified.


     u_int64_t append; ///number of append operations (whtout grow)

 spelling
 Besides, MemBob does not grow!

Fixed

     u_int64_t allocatedBytes; ///the total size of the allocated storage

 Conflicts with the previous use of allocation. If that is not changed,
 use liveBytes here.

Done.

     /**
      * Dumps statistics to an ostream. Use an ostringstream to
      * capture them to a string
      */
     std::ostream dump(std::ostream os) const;

 Consider removing reference to ostringstream or moving it to some .dox file.
 It does not belong to interface documentation.

Fixed.

     MemBlobStats();
 };

 .

 /**
  * Low-level buffer. It's responsible for allocating and managing
  * the blocks of memory. It's refcounted, but it doesn't know
  * which SBufs hold references to it.
  */

 MemBlob does not manage blocks, it manages one block. It does not know many
 things (do not document what is not there).

Fixed


 Consider:
 /// A refcounted fixed-size buffer. Tracks the portion used by opaque
 content.

How about:
Refcounted memory buffer, content-agnostic. Meant to only be used by SBuf. ?


 class MemBlob: public RefCountable {
     public:
     typedef RefCountMemBlob Pointer;
     typedef unsigned int size_type;

 Would be nice to guarantee support for 5GB and larger blobs.

Do we need them now? It'll be trivial to extend when needed.

     char *mem;         /// The block of memory

 Missing  in Doxygen comments?

Yes. I wasn't aware of the construct.


 Consider: /// raw allocated memory

     size_type bufSize; /// Size of the allocated memory
     size_type bufUsed; /// How much of the memory is actually used

 Other Squid code is using capacity and size for this. Std::string also uses
 size() for user-facing buffer length. Please be consistent.

Changed, even though I find the name size confusing in this case.

 Remove the buf prefix until you have some other size and capacity members
 to distinction from. Besides, this is not a buf. This is a blob :-)

Yup. Relics... I've been working on this for more than two years now.

 I would not leave these data members public, but it is your call.

This class originally was private to SBuf. I've now clarified in the
class definition that it's not meant for general use; its only client
should be SBuf.

     _SQUID_INLINE_ void deleteSelf();

 I hope this can be removed. If not, it must be documented. The corresponding
 comments in .cci do not make sense to me and do not match RefCount.h in
 trunk, as far as I can tell.

http://old.squid-cache.org/Doc/Prog-Guide/prog-guide-24.html
http://markmail.org/message/kt7357pwttmzrey3

Maybe the documentaiton is outdated?

     MemBlob(const size_type size);

 Missing explicit. It is better to document whether it allocates anything.

Fixed.

     MemBlob(char * memBlock, const size_type memSize);

 Not sure why this is needed, but needs to be documented, especially with
 regard to memBlock ownership and destruction rules.

Documented.

 Please keep in mind that without a default constructor (and reserve()), it
 would be impossible to have containers of blobs.

Not needed. We want containers of SBufs, a MemBlob is more akin to a
smart struct.

     _SQUID_INLINE_ bool isAtEnd(const char *ptr) const;

 Missing documentation.
 Why is this needed? Some transitional call? If yes, document as such.

It is needed to know whether a the tail of a SBuf is at the tail of a
shared MemBlob. Such a SBuf can append without needing a cow, if
there's enough headroom in the MemBlob.

     _SQUID_INLINE_ size_type freeSpace() const;

 Missing documentation.

Renamed to getAvailableSpace. Documentation was in the .cci, moved it

 This reads like a command 

Re: StringNg review: MemBlob

2010-09-22 Thread Kinkie
On Wed, Sep 22, 2010 at 8:47 AM, Amos Jeffries squ...@treenet.co.nz wrote:
 On 22/09/10 16:31, Alex Rousskov wrote:

 On 09/20/2010 08:18 PM, Alex Rousskov wrote:

 On 09/03/2010 09:21 AM, Kinkie wrote:

 You'll find the branch at lp:~kinkie/squid/stringng

 ...

 Comments for MemBlob.cci r9472:

 Found one more:

 * \note memcpy is used as the copying function. This is safe,
 * because it is guarranteed by design that the copied-to
 * memory area is only owned by the appended-to string,
 * and thus doesn't overlap with the appended-from area
 */
 void
 MemBlob::append(const char *S, const MemBlob::size_type Ssize)
 {
 memcpy(mem+bufUsed,S,Ssize);
 bufUsed+=Ssize;
 ++stats.append;
 }

 The \note is correct,

 No. As you pointed out to me earlier with SBuf the S here may be the result
 of:
  a.clear(); a.append(a.mem, 1);

Using SBuf's raw memory access functions is dangerous, and this is
well documented.
It gives users rope, can't be blamed for how they use it..


 ...


 but we should not mention string in MemBlob.cci.
 MemBlob does not (or should not) know what its users are. There may be a
 better way to express the same thought. Consider:

 \none memcpy() is safe because we copy to an unused area

 *that* is better. There is still no guarantee it wont overflow on the
 destination. memcpy() makes no mention about when happens when one of the
 pointers is NULL, but experience shows a lot of segfaults.

Changed to reflect this.
MemBlob.mem can never be NULL; we could add a guard if the user passes
a NULL char*, and  treat it as a NOP.

 memcpy() docs state always copies exactly num bytes and that the buffers
 should not overlap.
 memmove() is the safe one which should be used if there is any doubt about
 overlapping memory. But even that can't validate against overflows.

Can change to memmove; is there a performance hit?


-- 
    /kinkie


Re: StringNg review: MemBlob

2010-09-22 Thread Amos Jeffries

On 22/09/10 21:30, Kinkie wrote:

On Tue, Sep 21, 2010 at 4:18 AM, Alex Rousskov
rouss...@measurement-factory.com  wrote:

On 09/03/2010 09:21 AM, Kinkie wrote:


You'll find the branch at lp:~kinkie/squid/stringng


I decided to take a look at MemBlob because it is used by SBuf which was
reviewed a few days ago. It would be nice to get a fixed SBuf committed, but
that would require committing the classes it uses.

Committing smaller self-contained parts may be an overall better strategy
than trying to fix everything first and then do one huge commit.

So here is a MemBlob review:



snip

Remove the buf prefix until you have some other size and capacity members
to distinction from. Besides, this is not a buf. This is a blob :-)


Yup. Relics... I've been working on this for more than two years now.


I would not leave these data members public, but it is your call.


This class originally was private to SBuf. I've now clarified in the
class definition that it's not meant for general use; its only client
should be SBuf.


If still relevant that would be reason for protected: and friend 
declarations in the class definition instead of public:.





 _SQUID_INLINE_ void deleteSelf();


I hope this can be removed. If not, it must be documented. The corresponding
comments in .cci do not make sense to me and do not match RefCount.h in
trunk, as far as I can tell.


http://old.squid-cache.org/Doc/Prog-Guide/prog-guide-24.html
http://markmail.org/message/kt7357pwttmzrey3

Maybe the documentaiton is outdated?



Seems to be. The RefCountable classes no longer have deleteSelf.



 _SQUID_INLINE_ bool isAtEnd(const char *ptr) const;


Missing documentation.
Why is this needed? Some transitional call? If yes, document as such.


It is needed to know whether a the tail of a SBuf is at the tail of a
shared MemBlob. Such a SBuf can append without needing a cow, if
there's enough headroom in the MemBlob.


So, a char* way of saying:

 if (SBuf::off_+Sbuf::len_ == MemBlob::size 
 SBuf::off_+Sbuf::len_  MemBlob::capacity) {
 ...

with canAppend being a duplicate as above but = MemBlob::capacity+N

or rather isAtEnd being a case of canAppend(0)



This reads like a command to free some space.
Consider: spaceSize()
spaceSize() would be more consistent with other Squid code as well.



 _SQUID_INLINE_ bool canGrow(const size_type howmuch) const;


Missing documentation.
Usually unsafe and wastefull. See similar comments for SBuf.


It'd be unsafe in a concurrent (multithreaded) environment; how would
it be unsafe in a single-threaded env?



Squid won't be single-threaded for much longer. At least on the 
timescale for string-level changes.




 _SQUID_INLINE_ bool canAppend(char const * bufEnd, const size_type
howmuch) const;


Missing documentation.
Especially since it is not clear why canAppend() needs a pointer.
Looking at the .cci implementation, I question why would the caller
keep the bufEnd pointer if MemBlob manages it already? I do not think this
call or duplicated bufEnd pointer is needed.


if many SBufs reference the same MemBlob, an append operation to one
of them does not need a cow if the appended-to SBuf is at the tail of
the MemBlob and there is enough free space at the end of the MemBlob
to hold the data being appended.
This canAppend call checks exactly that. the SBuf knows its own tail,
while the MemBlob knows the tail of the used portion of itself.



see isAtEnd() comment above.


/**
  * constructor, creates an empty MemBlob of size= requested
  * \param size the minimum size to be guarranteed from the MemBlob
  */
MemBlob::MemBlob(const MemBlob::size_type size)
{
 debugs(MEMBLOB_DEBUGSECTION,9,HERE  size=  size);
 memAlloc(size);
}


I would recommend initializing mem and sizes to NULL/0s explicitly in all
the constructors.  It does not cost much and may save a few hours of
debugging if things go wrong.


Isn't setting them in memAlloc enough? Besides the actual size may be
different (or would you mean to set them twice?)


In the constructor init list.

You can then enforce the MUST and ONLY requirements with 
assert(mem==NULL) etc at the start of memAlloc.



Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.8
  Beta testers wanted for 3.2.0.2


Re: StringNg review: MemBlob

2010-09-22 Thread Alex Rousskov

On 09/22/2010 03:30 AM, Kinkie wrote:

On Tue, Sep 21, 2010 at 4:18 AM, Alex Rousskov wrote:

On 09/03/2010 09:21 AM, Kinkie wrote:


You'll find the branch at lp:~kinkie/squid/stringng




/**
  * Low-level buffer. It's responsible for allocating and managing
  * the blocks of memory. It's refcounted, but it doesn't know
  * which SBufs hold references to it.
  */



Consider:
/// A refcounted fixed-size buffer. Tracks the portion used by opaque
content.


How about:
Refcounted memory buffer, content-agnostic. Meant to only be used by SBuf. ?


Fixed-size is an key property unless you think it may change soon.
Content-agnostic is fine, I guess.

As for meant to only be used by Foo, I am still against such 
restrictions on use in general and in this context in particular. We 
have discussed this before. Use-cases may determine the initial design 
sketch and optimizations, but the final APIs should be usable in any 
context that needs them rather than restricted to one specific user. 
MemBlob is no exception IMO.




class MemBlob: public RefCountable {
 public:
 typedef RefCountMemBlob  Pointer;
 typedef unsigned int size_type;


Would be nice to guarantee support for 5GB and larger blobs.


Do we need them now? It'll be trivial to extend when needed.


I may need them soon. Based on past experience, I am not convinced that 
replacing 'int' with some other type will be trivial.




I would not leave these data members public, but it is your call.


This class originally was private to SBuf. I've now clarified in the
class definition that it's not meant for general use; its only client
should be SBuf.


Which is a wrong restriction made worse by wrongly guessed implications. 
As SBuf review shows, we are already paying the price for the lack of 
proper boundaries between the two classes. It makes some areas of the 
SBuf code confusing at best.




 _SQUID_INLINE_ void deleteSelf();


I hope this can be removed. If not, it must be documented. The corresponding
comments in .cci do not make sense to me and do not match RefCount.h in
trunk, as far as I can tell.


http://old.squid-cache.org/Doc/Prog-Guide/prog-guide-24.html
http://markmail.org/message/kt7357pwttmzrey3

Maybe the documentaiton is outdated?


That old documentation is wrong. Use the source code. Remove deleteSelf().



 _SQUID_INLINE_ size_type freeSpace() const;


Missing documentation.


Renamed to getAvailableSpace. Documentation was in the .cci, moved it


Wrong name. The method does not give the caller the available space, 
only its size. Here are some combinations that work better:


   size(), spaceSize(), capacity()
   contentSize(), spaceSize(), capacity()
   usedSize() and availableSize(), capacity()




MemBlobStats MemBlob::stats;


If some static code wants to allocate MemBlob, will its stats not get
counted, depending on static initialization order?


Yes. I don't think that's much of an issue;


It becomes an issue when we get bug reports about negative or too-big 
counters. Or when somebody adds an assert() that the counter is correct. 
Both have happened before.



 stats are mostly meant to

help us diagnose the effectiveness of the solution, and in fact I
wouldn't be against #ifdef-ing them out in production code.


Sure, nobody cares about effectiveness of the code in production as long 
as it was effective in the lab :-).




/**
  * constructor, creates an empty MemBlob of size= requested
  * \param size the minimum size to be guarranteed from the MemBlob
  */
MemBlob::MemBlob(const MemBlob::size_type size)
{
 debugs(MEMBLOB_DEBUGSECTION,9,HERE  size=  size);
 memAlloc(size);
}


I would recommend initializing mem and sizes to NULL/0s explicitly in all
the constructors.  It does not cost much and may save a few hours of
debugging if things go wrong.


Isn't setting them in memAlloc enough? Besides the actual size may be
different (or would you mean to set them twice?)


Initialize them to NULLs and 0s before calling memAlloc. Such 
initializations cost little and avoid having an object with 
uninitialized members.



Thank you,

Alex.


Re: StringNg review: MemBlob

2010-09-22 Thread Alex Rousskov

On 09/22/2010 12:47 AM, Amos Jeffries wrote:

On 22/09/10 16:31, Alex Rousskov wrote:

On 09/20/2010 08:18 PM, Alex Rousskov wrote:

On 09/03/2010 09:21 AM, Kinkie wrote:


You'll find the branch at lp:~kinkie/squid/stringng

...

Comments for MemBlob.cci r9472:


Found one more:


* \note memcpy is used as the copying function. This is safe,
* because it is guarranteed by design that the copied-to
* memory area is only owned by the appended-to string,
* and thus doesn't overlap with the appended-from area
*/
void
MemBlob::append(const char *S, const MemBlob::size_type Ssize)
{
memcpy(mem+bufUsed,S,Ssize);
bufUsed+=Ssize;
++stats.append;
}


The \note is correct,


No. As you pointed out to me earlier with SBuf the S here may be the
result of:
a.clear(); a.append(a.mem, 1);


The situation here is different, I believe, but I do not understand your 
example. What is a? MemBlob does not have clear(). SBuf does not have 
mem...


If s* are SBufs and you meant

  s1.clear(); s1.append(s1.buf(), s1.size())

then it is fine because s1.size() will be zero.

If you meant

  s1.clear(); s1.append(s2.buf(), s2.size())

then it is fine because s1.append will COW if s1 and s2 share MemBlob 
and s2.size() is positive.


If you mean

 blob.clear; blob.append(blob.mem, blob.size())

then it is fine because blob.size() will be zero.




...



but we should not mention string in MemBlob.cci.
MemBlob does not (or should not) know what its users are. There may be a
better way to express the same thought. Consider:

\none memcpy() is safe because we copy to an unused area


*that* is better. There is still no guarantee it wont overflow on the
destination. memcpy() makes no mention about when happens when one of
the pointers is NULL, but experience shows a lot of segfaults.


The MemBlob append() code should not call memcpy() with a NULL 
destination pointer, of course, but that is not related to the \note. My 
understanding is that blob.mem is never NULL.


The MemBlob append() code must check for overflows. If I did not comment 
on that before, please consider this a please fix comment.


Kinkie, while you are at it, please also avoid calling memcpy/move() 
when size-to-append is zero, just in case.




memcpy() docs state always copies exactly num bytes and that the
buffers should not overlap.

memmove() is the safe one which should be used if there is any doubt
about overlapping memory. But even that can't validate against overflows.


Per \note, I think we are fine with memcpy() here but counter-examples 
are more than welcome. I assume memcpy() is significantly faster in some 
environments we care about.


On the other hand, if memmove() implementations we care about always 
quickly check for overlaps and actually call memcpy() when possible, 
then we should just use memmove()!


Thank you,

Alex.


Re: StringNg review: MemBlob

2010-09-22 Thread Alex Rousskov

On 09/22/2010 03:44 AM, Kinkie wrote:

On Wed, Sep 22, 2010 at 8:47 AM, Amos Jeffriessqu...@treenet.co.nz  wrote:

On 22/09/10 16:31, Alex Rousskov wrote:


On 09/20/2010 08:18 PM, Alex Rousskov wrote:


On 09/03/2010 09:21 AM, Kinkie wrote:


You'll find the branch at lp:~kinkie/squid/stringng


...


Comments for MemBlob.cci r9472:


Found one more:


* \note memcpy is used as the copying function. This is safe,
* because it is guarranteed by design that the copied-to
* memory area is only owned by the appended-to string,
* and thus doesn't overlap with the appended-from area
*/
void
MemBlob::append(const char *S, const MemBlob::size_type Ssize)
{
memcpy(mem+bufUsed,S,Ssize);
bufUsed+=Ssize;
++stats.append;
}


The \note is correct,


No. As you pointed out to me earlier with SBuf the S here may be the result
of:
  a.clear(); a.append(a.mem, 1);


Using SBuf's raw memory access functions is dangerous, and this is
well documented.
It gives users rope, can't be blamed for how they use it..


This has little to do with raw memory access, actually.

If a is SBuf and a.mem is a.buf(), then

 a.clear(); a.append(a.buf(), 1);

is wrong not because some areas overlap (they actually do not, see my 
earlier response) but because a.size() is less than 1 after a.clear() 
and so the caller should not be using 1 as the size-to-append 
parameter. In other words, there is a bug, but the bug is not in the 
append(), but in the caller code.


I just realized that SBuf uses length() instead of standard size(). I 
hope you got the idea though. :-(


Alex.


Re: StringNg review: MemBlob

2010-09-21 Thread Alex Rousskov

On 09/20/2010 08:18 PM, Alex Rousskov wrote:

On 09/03/2010 09:21 AM, Kinkie wrote:


You'll find the branch at lp:~kinkie/squid/stringng

...

Comments for MemBlob.cci r9472:


Found one more:


 * \note memcpy is used as the copying function. This is safe,
 *   because it is guarranteed by design that the copied-to
 *   memory area is only owned by the appended-to string,
 *   and thus doesn't overlap with the appended-from area
 */
void
MemBlob::append(const char *S, const MemBlob::size_type Ssize)
{
memcpy(mem+bufUsed,S,Ssize);
bufUsed+=Ssize;
++stats.append;
}


The \note is correct, but we should not mention string in MemBlob.cci. 
MemBlob does not (or should not) know what its users are. There may be a 
better way to express the same thought. Consider:


\none memcpy() is safe because we copy to an unused area

The note belongs to the append() implementation, not its description.


Thank you,

Alex.


Re: StringNg review: MemBlob

2010-09-20 Thread Alex Rousskov

On 09/03/2010 09:21 AM, Kinkie wrote:


You'll find the branch at lp:~kinkie/squid/stringng


I decided to take a look at MemBlob because it is used by SBuf which was 
reviewed a few days ago. It would be nice to get a fixed SBuf committed, 
but that would require committing the classes it uses.


Committing smaller self-contained parts may be an overall better 
strategy than trying to fix everything first and then do one huge commit.


So here is a MemBlob review:


Comments for src/MemBlob.h r9472:

 #include MemPool.h

Remove if not needed.

 #include RefCount.h

 /**
  * A container for various MemBlob class-wide statistics
  */
 class MemBlobStats {
 public:
 u_int64_t alloc; ///number of allocation operations

Class allocations or buffer allocations?

 u_int64_t realloc; ///number of re-allocation operations

.

 u_int64_t live; ///number of live references to bufs

How can you know the number of references? Do you mean the number of 
MemBob instances in existence?


 u_int64_t append; ///number of append operations (whtout grow)

spelling
Besides, MemBob does not grow!

 u_int64_t allocatedBytes; ///the total size of the allocated storage

Conflicts with the previous use of allocation. If that is not changed,
use liveBytes here.


 /**
  * Dumps statistics to an ostream. Use an ostringstream to
  * capture them to a string
  */
 std::ostream dump(std::ostream os) const;

Consider removing reference to ostringstream or moving it to some .dox 
file. It does not belong to interface documentation.


 MemBlobStats();
 };

.

 /**
  * Low-level buffer. It's responsible for allocating and managing
  * the blocks of memory. It's refcounted, but it doesn't know
  * which SBufs hold references to it.
  */

MemBlob does not manage blocks, it manages one block. It does not know 
many things (do not document what is not there).


Consider:
/// A refcounted fixed-size buffer. Tracks the portion used by opaque 
content.


 class MemBlob: public RefCountable {
 public:
 typedef RefCountMemBlob Pointer;
 typedef unsigned int size_type;

Would be nice to guarantee support for 5GB and larger blobs.

 char *mem; /// The block of memory

Missing  in Doxygen comments?

Consider: /// raw allocated memory

 size_type bufSize; /// Size of the allocated memory
 size_type bufUsed; /// How much of the memory is actually used

Other Squid code is using capacity and size for this. Std::string also 
uses size() for user-facing buffer length. Please be consistent.


Remove the buf prefix until you have some other size and capacity 
members to distinction from. Besides, this is not a buf. This is a 
blob :-)


I would not leave these data members public, but it is your call.


 _SQUID_INLINE_ void deleteSelf();

I hope this can be removed. If not, it must be documented. The 
corresponding comments in .cci do not make sense to me and do not match 
RefCount.h in trunk, as far as I can tell.



 MemBlob(const size_type size);

Missing explicit. It is better to document whether it allocates anything.


 MemBlob(char * memBlock, const size_type memSize);

Not sure why this is needed, but needs to be documented, especially with
regard to memBlock ownership and destruction rules.


Please keep in mind that without a default constructor (and reserve()), 
it would be impossible to have containers of blobs.



 _SQUID_INLINE_ bool isAtEnd(const char *ptr) const;

Missing documentation.
Why is this needed? Some transitional call? If yes, document as such.


 _SQUID_INLINE_ size_type freeSpace() const;

Missing documentation.

This reads like a command to free some space.
Consider: spaceSize()
spaceSize() would be more consistent with other Squid code as well.


 _SQUID_INLINE_ bool canGrow(const size_type howmuch) const;

Missing documentation.
Usually unsafe and wastefull. See similar comments for SBuf.

After reading MemBuf.cc: MemBlob cannot grow, apparently! Replace this 
with something like willFit(sizeToAppend) or canAppend(sizeToAppend).


s/howmuch/byHowMuch/ or s/howmuch/toHowMuch/, depending on what this does.


 _SQUID_INLINE_ bool canAppend(char const * bufEnd, const 
size_type howmuch) const;


Missing documentation.
Especially since it is not clear why canAppend() needs a pointer.
Looking at the .cci implementation, I question why would the caller
keep the bufEnd pointer if MemBlob manages it already? I do not think 
this call or duplicated bufEnd pointer is needed.



 _SQUID_INLINE_ void append(const char *S, const size_type Ssize);

Missing documentation.



Comments for MemBlob.cci r9472:

* Please move methods longer than one line to .cc.

* Please move public method comments to .h per current documentation rules.

* MemBlob::append() should check boundaries and throw.




Comments for MemBlob.cc r9472:

 #include config.h
 #include MemBlob.h
 #include Debug.h

M and D are out of order per current #include