Re: [squid-users] squid smp fails -k reconfigure

2014-06-06 Thread Amos Jeffries
On 6/06/2014 8:44 p.m., Amos Jeffries wrote:
> Hi Fernando
>  The answer to your repeated "why" questions is that Squid is a huge
> piece of software and the SMP changes are relatively new and incomplete.
> 
> Thank you for finding this bug. I'm forwarding to squid-dev where
> someone working on SMP may be able to help.
> 
> Amos
> 
> On 6/06/2014 4:54 a.m., Fernando Lozano wrote:
>> Hi there,
>>
>> Since I enabled SMP mode on my Squid 3.4.3 server, reconfiguring has not
>> been working consistently. Here are the relevant log entries:
>>
>> --
>> 2014/06/02 11:35:37| Set Current Directory to /cache
>> 2014/06/02 11:35:37 kid6| Reconfiguring Squid Cache (version 3.4.3)...
>> 2014/06/02 11:35:37 kid6| Logfile: closing log
>> stdio:/var/log/squid/access.log
>> 2014/06/02 11:35:37 kid5| Reconfiguring Squid Cache (version 3.4.3)...
>> ...
>> 2014/06/02 11:35:37 kid6| ERROR opening swap log
>> /cache/worker6/swap.state: (2) No such file or directory
>> 2014/06/02 11:35:37 kid5| ERROR opening swap log
>> /cache/worker5/swap.state: (2) No such file or directory
>> 2014/06/02 11:35:37 kid5| storeDirWriteCleanLogs: Starting...
>> 2014/06/02 11:35:37 kid5| log.clean.start() failed for dir #1
>> 2014/06/02 11:35:37 kid5|   Finished.  Wrote 0 entries.
>> 2014/06/02 11:35:37 kid5|   Took 0.00 seconds (  0.00 entries/sec).
>> FATAL: UFSSwapDir::openLog: Failed to open swap log.
>> Squid Cache (Version 3.4.3): Terminated abnormally.
>> FATAL: UFSSwapDir::openLog: Failed to open swap log.
>> Squid Cache (Version 3.4.3): Terminated abnormally.
>> --
>>
>> I find it very strange that kids 6 and 5 try to open aufs cache stores.
>> They are supposed to be the rock store disker and the coordinator! My
>> squid.conf has:
>>
>> workers 4
>> cache_mem 6144 MB
>> cache_dir rock /cache/shared 3 min-size=1 max-size=31000
>> max-swap-rate=250 swap-timeout=350
>> cache_dir aufs /cache/worker${process_number} 25000 16 256
>> min-size=31001 max-size=346030080
>> logfile_rotate 4
>>
>> Could Squid be having trouble with my large cache_mem and cache_dir sizes?
>>
>> Is squid -k reconfigure working well for everyone else with SMP?
>>
>> Other strange entries, from earlier in the cache.log:
>> -
>> 2014/06/01 03:13:05 kid5| Set Current Directory to /cache
>> 2014/06/01 03:13:05 kid5| Starting Squid Cache version 3.4.3 for
>> x86_64-redhat-linux-gnu...
>> 2014/06/01 03:13:05 kid5| Process ID 23990
>> 2014/06/01 03:13:05 kid5| Process Roles: disker
>> 2014/06/01 03:13:05 kid5| With 65536 file descriptors available
>> 2014/06/01 03:13:05 kid5| Initializing IP Cache...
>> 2014/06/01 03:13:05 kid5| DNS Socket created at 0.0.0.0, FD 7
>> 2014/06/01 03:13:05 kid5| Adding nameserver 200.20.212.75 from
>> /etc/resolv.conf
>> 2014/06/01 03:13:05 kid5| Adding nameserver 200.20.212.99 from
>> /etc/resolv.conf
>> 2014/06/01 03:13:05 kid5| Adding domain inmetro.gov.br from
>> /etc/resolv.conf
>> 2014/06/01 03:13:05 kid5| Adding domain inmetro.gov.br from
>> /etc/resolv.conf
>> 2014/06/01 03:13:05 kid5| helperOpenServers: Starting 10/100
>> 'basic_ldap_auth' processes
>> -
>>
>> If kid5 is a disker, why does it set up the DNS resolver and LDAP auth
>> helpers? It looks like the disker and coordinator try to process all
>> squid.conf directives, even though they are not supposed to do any
>> network-related work.
>>
>> Should I try to "hide" those directives from them?
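[Editor's note: squid.conf does support conditional sections, which might be one way to limit such directives to specific kids. A hedged sketch only; exact conditional-operator and macro support varies by Squid version, and this assumes 4 workers so kids 1-4 are the worker processes:]

```
# Sketch: give the per-worker aufs dir to worker kid 1 only, and repeat
# one such block per worker kid; diskers/coordinator then never see it.
if ${process_number} = 1
cache_dir aufs /cache/worker1 25000 16 256 min-size=31001 max-size=346030080
endif
```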
>>
>> I also got something strange on shutdown:
>>
>> 
>> 2014/06/02 14:36:47| Set Current Directory to /cache
>> 2014/06/02 14:36:47 kid6| Preparing for shutdown after 0 requests
>> 2014/06/02 14:36:47 kid6| Waiting 5 seconds for active connections to
>> finish
>> ...
>> 2014/06/02 14:36:53 kid6| Shutting down...
>> 2014/06/02 14:36:53 kid6| Not currently OK to rewrite swap log.
>> 2014/06/02 14:36:53 kid6| storeDirWriteCleanLogs: Operation aborted.
>> -
>>
>> What does "Not currently OK to rewrite swap log" mean? kid6 is the
>> coordinator; it should not touch the cache dirs!
>>
>>
>> []s, Fernando Lozano
>>
> 



Re: Squid SMP on MacOS

2013-02-24 Thread Robert Collins
On 25 February 2013 18:24, Alex Rousskov
 wrote:
> On 02/24/2013 10:02 PM, Amos Jeffries wrote:
>
>> I'm trying to get the MacOS builds of Squid going again but having some
>> problems with shm_open() in the Rock storage unit-tests.
>>
>> 1) MacOS defines the max name length we can pass to shm_open() at 30
>> bytes. "/squid-testRock__testRockSearch" being 35 or so bytes.
>>   Cutting the definition in testRock.cc down so it becomes
>> "/squid-testRock_Search" resolves that, but then we hit (2).
>
> That TESTDIR name is wrong because it is used for more than just
> "search" testing. I bet the Rock name mimicked the UFS test name, but
> the UFS name is wrong too, for the same reason. We should use
> "cppUnitTestRock" and "cppUnitTestUfs" or something similarly unique and
> short, I guess.

We should use a random name; squidtest-10-bytes-of-entropy should do
it. Random because we don't want tests running in parallel to step on
each other on jenkins slaves.
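A minimal sketch of that suggestion; the "/squidtest-" prefix and hex encoding are illustrative choices, not what Squid actually adopted:

```cpp
#include <cassert>
#include <random>
#include <string>

// Build a shm segment name with ~10 characters of entropy while staying
// under the ~30-byte name limit MacOS imposes on shm_open() names.
std::string makeTestShmName()
{
    static const char hex[] = "0123456789abcdef";
    std::random_device rd;               // per-run entropy source
    std::string name = "/squidtest-";    // 11 bytes of fixed prefix
    for (int i = 0; i < 10; ++i)
        name += hex[rd() % 16];          // 10 random hex digits
    assert(name.size() <= 30);           // inside the MacOS limit
    return name;
}
```

Random names also keep parallel Jenkins runs from colliding, since each test process picks a fresh segment name.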

>
>> 2) With the short string above and the current settings sent to
>> shm_open() in src/ipc/mem/Segment.cc line 73 MacOS shm_open() starts
>> responding with EINVAL.
>
>> theFD = shm_open(theName.termedBuf(), O_CREAT | O_RDWR | O_TRUNC,
>>  S_IRUSR | S_IWUSR);
>
>
> Sounds like some of the five shm_open() flags we are using successfully
> elsewhere do not work on MacOS. I do not know which flag(s) do not work,
> and we have no MacOS boxes in the lab, so we cannot experiment or read
> documentation.
>
> I assume shared segment opening fails with similar symptoms when used
> outside of unit tests (e.g., with a shared memory cache)? If so, please
> feel free to disable shared memory support on MacOS (do not define
> HAVE_SHM?) until somebody who needs it can find the right combination of
> flags.

+1

-Rob
-- 
Robert Collins 
Distinguished Technologist
HP Cloud Services


Re: Squid SMP on MacOS

2013-02-24 Thread Alex Rousskov
On 02/24/2013 10:02 PM, Amos Jeffries wrote:

> I'm trying to get the MacOS builds of Squid going again but having some
> problems with shm_open() in the Rock storage unit-tests.
> 
> 1) MacOS defines the max name length we can pass to shm_open() at 30
> bytes. "/squid-testRock__testRockSearch" being 35 or so bytes.
>   Cutting the definition in testRock.cc down so it becomes
> "/squid-testRock_Search" resolves that, but then we hit (2).

That TESTDIR name is wrong because it is used for more than just
"search" testing. I bet the Rock name mimicked the UFS test name, but
the UFS name is wrong too, for the same reason. We should use
"cppUnitTestRock" and "cppUnitTestUfs" or something similarly unique and
short, I guess.


> 2) With the short string above and the current settings sent to
> shm_open() in src/ipc/mem/Segment.cc line 73 MacOS shm_open() starts
> responding with EINVAL.

> theFD = shm_open(theName.termedBuf(), O_CREAT | O_RDWR | O_TRUNC,
>  S_IRUSR | S_IWUSR);


Sounds like some of the five shm_open() flags we are using successfully
elsewhere do not work on MacOS. I do not know which flag(s) do not work,
and we have no MacOS boxes in the lab, so we cannot experiment or read
documentation.

I assume shared segment opening fails with similar symptoms when used
outside of unit tests (e.g., with a shared memory cache)? If so, please
feel free to disable shared memory support on MacOS (do not define
HAVE_SHM?) until somebody who needs it can find the right combination of
flags.
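For experimenting, a hypothetical probe along these lines could help narrow down the offending flag. Whether O_TRUNC is actually the flag MacOS rejects is a guess here, and this is not a fix Squid adopted:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Try the flag set Squid uses; if the platform reports EINVAL, retry
// without O_TRUNC and truncate explicitly instead. Purely a diagnostic
// sketch for finding which flag combination a platform accepts.
int probeShmOpen(const char *name)
{
    int fd = shm_open(name, O_CREAT | O_RDWR | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0 && errno == EINVAL) {
        // guess: some platforms reject O_TRUNC in shm_open()
        fd = shm_open(name, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
        if (fd >= 0)
            ftruncate(fd, 0); // emulate O_TRUNC after the fact
    }
    return fd;
}
```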


Thank you,

Alex.



Squid SMP on MacOS

2013-02-24 Thread Amos Jeffries
I'm trying to get the MacOS builds of Squid going again but having some 
problems with shm_open() in the Rock storage unit-tests.


1) MacOS defines the max name length we can pass to shm_open() at 30 
bytes. "/squid-testRock__testRockSearch" being 35 or so bytes.
  Cutting the definition in testRock.cc down so it becomes 
"/squid-testRock_Search" resolves that, but then we hit (2).


2) With the short string above and the current settings sent to 
shm_open() in src/ipc/mem/Segment.cc line 73 MacOS shm_open() starts 
responding with EINVAL.


  Any ideas?


Amos


Re: Squid-SMP problems in current trunk

2011-08-08 Thread Amos Jeffries

On Mon, 08 Aug 2011 12:50:15 -0600, Alex Rousskov wrote:

On 07/15/2011 10:11 PM, Amos Jeffries wrote:
I'm not very certain about SMP listening sockets: which process(es) are
safe to close() on reconfigure/shutdown? The unsafe ones must do fd=-1
to abandon the FD information explicitly before the conn object
destructs.


Just for the record, all listening processes can close their FDs in SMP
mode. The underlying descriptor will be really closed when the last user
closes it. Thus, there should be no need to coordinate closing of
individual descriptors among SMP kids.

HTH,

Alex.


Excellent thank you. That simplifies things a lot.

Amos


Re: Squid-SMP problems in current trunk

2011-08-08 Thread Alex Rousskov
On 07/15/2011 10:11 PM, Amos Jeffries wrote:
> I'm not very certain about SMP listening sockets, which process(es) are
> safe to close() on reconfigure/shutdown? the unsafe ones must do fd=-1
> to abandon the FD information explicitly before the conn object destructs.

Just for the record, all listening processes can close their FDs in SMP
mode. The underlying descriptor will be really closed when the last user
closes it. Thus, there should be no need to coordinate closing of
individual descriptors among SMP kids.
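This behaviour can be demonstrated in miniature, with dup() standing in for a descriptor shared across forked kids (the same last-close semantics apply to the underlying open file description):

```cpp
#include <cassert>
#include <cstring>
#include <unistd.h>

// Two holders of one open file description (here via dup() in a single
// process): closing the first holder's FD does not close the pipe; it
// stays usable until the last reference is closed.
int demoSharedClose()
{
    int pfd[2];
    if (pipe(pfd) != 0)
        return -1;
    int writerCopy = dup(pfd[1]);  // second holder, like a second SMP kid
    close(pfd[1]);                 // first holder closes its FD
    // the pipe is still writable through the surviving copy
    ssize_t n = write(writerCopy, "ok", 2);
    close(writerCopy);
    char buf[3] = {0};
    read(pfd[0], buf, 2);
    close(pfd[0]);
    return (n == 2 && strcmp(buf, "ok") == 0) ? 0 : -1;
}
```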

HTH,

Alex.



Re: Squid-SMP problems in current trunk

2011-07-20 Thread Tsantilas Christos

On 07/20/2011 06:52 AM, Amos Jeffries wrote:

yay!

I think this is the correct and complete fix for bug 3264. If you are
happy that it works without bringing up more issues it can go in as the
fix for that bug ASAP.


OK, I committed it to trunk, but forgot the --fixes argument in "bzr commit".




Amos



Re: Squid-SMP problems in current trunk

2011-07-19 Thread Amos Jeffries

On Tue, 19 Jul 2011 20:53:54 +0300, Tsantilas Christos wrote:

On 07/19/2011 01:52 PM, Amos Jeffries wrote:




see attached patch. I've no idea if this will even build, but it's what I
am hoping will work. It assumes the SharedListenResponse::conn -->
SharedListenResponse::fd change from bug 3264 has also been made.


This patch is based on your patch. It also includes the
SharedListenResponse::conn --> SharedListenResponse::fd change from
bug 3264.

The change over your patch is that, in the SMP case, it sets the
StartListeningCb::conn member to listenConn inside the
Ipc::StartListening() function.


oops. Fine.



This patch looks like it solves the SMP-related problems.



yay!

I think this is the correct and complete fix for bug 3264. If you are 
happy that it works without bringing up more issues it can go in as the 
fix for that bug ASAP.


Amos



Re: Squid-SMP problems in current trunk

2011-07-19 Thread Tsantilas Christos


The change over your patch is that, in the SMP case, it sets the
StartListeningCb::conn member to listenConn inside the
Ipc::StartListening() function.


This patch looks like it solves the SMP-related problems.




Amos


=== modified file 'src/ipc/Coordinator.cc'
--- src/ipc/Coordinator.cc	2011-06-18 00:12:51 +
+++ src/ipc/Coordinator.cc	2011-07-15 14:07:47 +
@@ -128,7 +128,7 @@
request.params.addr << " to kid" << request.requestorId <<
" mapId=" << request.mapId);
 
-SharedListenResponse response(c, errNo, request.mapId);
+SharedListenResponse response(c->fd, errNo, request.mapId);
 TypedMsgHdr message;
 response.pack(message);
 SendMessage(MakeAddr(strandAddrPfx, request.requestorId), message);

=== modified file 'src/ipc/SharedListen.cc'
--- src/ipc/SharedListen.cc	2011-05-13 08:13:01 +
+++ src/ipc/SharedListen.cc	2011-07-19 16:15:23 +
@@ -82,18 +82,17 @@
 }
 
 
-Ipc::SharedListenResponse::SharedListenResponse(const Comm::ConnectionPointer &c, int anErrNo, int aMapId):
-conn(c), errNo(anErrNo), mapId(aMapId)
+Ipc::SharedListenResponse::SharedListenResponse(int aFd, int anErrNo, int aMapId):
+fd(aFd), errNo(anErrNo), mapId(aMapId)
 {
 }
 
 Ipc::SharedListenResponse::SharedListenResponse(const TypedMsgHdr &hdrMsg):
-conn(NULL), errNo(0), mapId(-1)
+fd(-1), errNo(0), mapId(-1)
 {
 hdrMsg.checkType(mtSharedListenResponse);
 hdrMsg.getPod(*this);
-conn = new Comm::Connection;
-conn->fd = hdrMsg.getFd();
+fd = hdrMsg.getFd();
 // other conn details are passed in OpenListenerParams and filled out by SharedListenJoin()
 }
 
@@ -101,7 +100,7 @@
 {
 hdrMsg.setType(mtSharedListenResponse);
 hdrMsg.putPod(*this);
-hdrMsg.putFd(conn->fd);
+hdrMsg.putFd(fd);
 }
 
 
@@ -127,10 +126,8 @@
 
 void Ipc::SharedListenJoined(const SharedListenResponse &response)
 {
-Comm::ConnectionPointer c = response.conn;
-
 // Dont debugs c fully since only FD is filled right now.
-debugs(54, 3, HERE << "got listening FD " << c->fd << " errNo=" <<
+debugs(54, 3, HERE << "got listening FD " << response.fd << " errNo=" <<
response.errNo << " mapId=" << response.mapId);
 
 Must(TheSharedListenRequestMap.find(response.mapId) != TheSharedListenRequestMap.end());
@@ -138,22 +135,24 @@
 Must(por.callback != NULL);
 TheSharedListenRequestMap.erase(response.mapId);
 
-if (Comm::IsConnOpen(c)) {
+StartListeningCb *cbd = dynamic_cast<StartListeningCb*>(por.callback->getDialer());
+assert(cbd && cbd->conn != NULL);
+Must(cbd && cbd->conn != NULL);
+cbd->conn->fd = response.fd;
+
+if (Comm::IsConnOpen(cbd->conn)) {
 OpenListenerParams &p = por.params;
-c->local = p.addr;
-c->flags = p.flags;
+cbd->conn->local = p.addr;
+cbd->conn->flags = p.flags;
 // XXX: leave the comm AI stuff to comm_import_opened()?
 struct addrinfo *AI = NULL;
 p.addr.GetAddrInfo(AI);
 AI->ai_socktype = p.sock_type;
 AI->ai_protocol = p.proto;
-comm_import_opened(c, FdNote(p.fdNote), AI);
+comm_import_opened(cbd->conn, FdNote(p.fdNote), AI);
 p.addr.FreeAddrInfo(AI);
 }
 
-StartListeningCb *cbd = dynamic_cast<StartListeningCb*>(por.callback->getDialer());
-Must(cbd);
-cbd->conn = c;
 cbd->errNo = response.errNo;
 cbd->handlerSubscription = por.params.handlerSubscription;
 ScheduleCallHere(por.callback);

=== modified file 'src/ipc/SharedListen.h'
--- src/ipc/SharedListen.h	2010-11-26 09:57:06 +
+++ src/ipc/SharedListen.h	2011-07-15 14:04:24 +
@@ -60,12 +60,12 @@
 class SharedListenResponse
 {
 public:
-SharedListenResponse(const Comm::ConnectionPointer &c, int errNo, int mapId);
+SharedListenResponse(int fd, int errNo, int mapId);
 explicit SharedListenResponse(const TypedMsgHdr &hdrMsg); ///< from recvmsg()
 void pack(TypedMsgHdr &hdrMsg) const; ///< prepare for sendmsg()
 
 public:
-Comm::ConnectionPointer conn; ///< opened listening socket or -1
+int fd; ///< opened listening socket or -1
 int errNo; ///< errno value from comm_open_sharedListen() call
 int mapId; ///< to map future response to the requestor's callback
 };

=== modified file 'src/ipc/StartListening.cc'
--- src/ipc/StartListening.cc	2011-01-31 11:50:28 +
+++ src/ipc/StartListening.cc	2011-07-19 16:16:19 +
@@ -30,6 +30,10 @@
 Ipc::StartListening(int sock_type, int proto, const Comm::ConnectionPointer &listenConn,
 FdNoteId fdNote, AsyncCall::Pointer &callback)
 {
+StartListeningCb *cbd = dynamic_cast(call
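[Editor's note: the putFd()/getFd() calls in the patch above presumably wrap the standard SCM_RIGHTS ancillary-data mechanism, sketched standalone below. Only the descriptor itself crosses the process boundary, which is why the response carries a plain int fd and the remaining connection details must travel as ordinary data:]

```cpp
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Send one FD over a Unix-domain socket as SCM_RIGHTS ancillary data.
ssize_t sendFd(int sock, int fdToSend)
{
    char data = 'F';
    struct iovec iov = { &data, 1 };        // one byte of real payload
    char ctl[CMSG_SPACE(sizeof(int))];
    memset(ctl, 0, sizeof(ctl));
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl;
    msg.msg_controllen = sizeof(ctl);
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;             // kernel dups the FD across
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fdToSend, sizeof(int));
    return sendmsg(sock, &msg, 0);
}

// Receive the FD; returns -1 on failure.
int recvFd(int sock)
{
    char data;
    struct iovec iov = { &data, 1 };
    char ctl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl;
    msg.msg_controllen = sizeof(ctl);
    if (recvmsg(sock, &msg, 0) < 0)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    int fd = -1;
    if (cm && cm->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```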

Re: Squid-SMP problems in current trunk

2011-07-19 Thread Amos Jeffries

On 19/07/11 22:04, Tsantilas Christos wrote:

On 07/19/2011 05:51 AM, Amos Jeffries wrote:

On Mon, 18 Jul 2011 20:03:22 +0300, Christos Tsantilas wrote:

On 07/18/2011 01:04 PM, Tsantilas Christos wrote:

On 07/16/2011 07:11 AM, Amos Jeffries wrote:

On 16/07/11 07:43, Tsantilas Christos wrote:

But now I am hitting the following assertion:
assertion failed: Connection.cc:29: "fd < 0"
The latter problem looks like it has to do with the file descriptors of
the listening sockets.


"fd < 0" indicates something is failing to call conn->close() when
abandoning an open socket.

NP: close() is reentrant. So components can and should always close()
when they are sure the FD/socket must no longer be used.

I'm not very certain about SMP listening sockets: which process(es) are
safe to close() on reconfigure/shutdown? The unsafe ones must do fd=-1
to abandon the FD information explicitly before the conn object
destructs.

What situations are you hitting "fd < 0" Christos?


I am hitting this assertion on kids immediately after start.
It looks like the connection loses all references to itself and is
deleted. The socket of the connection is a listening socket.


When a kid starts, it gets the file descriptors of the listening sockets
from the parent and creates Comm::Connection objects for them.

What is needed here is to assign the created Comm::Connection objects to
the related http_port_list object (to increase the refcount and keep
the connection open).

A way to implement the above is to use the ListeningStartedDialer
class implemented in the client_side.cc file.

I am attaching a patch which solves this problem and allows SMP Squid to
start and serve HTTP requests, but there are similar or related bugs
in ICP and SNMP. When I define icp_port and snmp_port in squid.conf,
SMP Squid does not start.


In this patch the dialer has no "conn" object to assign.


There is an Ipc::StartListeningCb::conn object. The
ListeningStartedDialer is a child class of StartListeningCb. The conn
object exists and is valid.
The patch works but is of course incomplete; I posted it just to
point out the problem.

Currently the ListeningStartedDialer dialer is the only object we have
which maps the connection (or rather, the FD) to a listening port.



NOTE:
clientHttpConnectionsOpen() does "s->listenConn = new
Comm::Connection()" before starting IPC.
When dialling post-IPC happens
ListeningStartedDialer::portCfg->listenConn is still a ref of that
object.

I'm thinking:
If OpenListenerParams adds a member to hold the listenConn created by
clientHttpConnectionsOpen() it can be set with the response.fd in
Ipc::SharedListenJoined().


Just take a look inside Ipc::SharedListenJoined(): in the SMP case we
just send the kids the local address, flags, and socket type, and
return. This is information which can be sent through IPC, pipes, etc.
Unfortunately the OpenListenerParams is copied with memcpy into a
message sent through IPC. How can we send a refcounted object like
listenConn through a pipe? That would cause bugs similar to bug 3264.
We can send only the FD.

I believe the only thing we can do is to add a new parameter of type
Comm::Connection to the Ipc::JoinSharedListen function to pass the
listenConn object and add it to the related PendingOpenRequest.



We may or may not then have to do the c=new Comm::Connection() in IPC.


OK, maybe we can just set the FD and other information on the existing
listenConn...




The best point to set anything coming back anyway seems to be in
Ipc::SharedListenJoined() where the SharedListenResponse,
OpenListenerParams, and StartListeningCb are all available.


It is the best, because it looks like it will solve the related problems
in SNMP and ICP too...
But at this time we do not have the listenConn object inside
Ipc::SharedListenJoined(), and we do not have enough information to
retrieve it.
I will try to implement what I am proposing above (passing listenConn to
JoinSharedListen...); it looks simple.

>


SharedListenResponse needs to use getInt(this->fd) or similar instead of
putPod/getPod(*this) anyway. Or at the very least give it a sub-struct
that can be documented as raw socket bytes and get*(&this->data_).
memcpy re-init of a whole class with methods straight from the socket
does not seem like a good idea.


Please see the patch in bug 3264:
http://bugs.squid-cache.org/show_bug.cgi?id=3264
The bug described there is different from the bugs we are discussing
here, but they are related. I believe just applying the patch I posted,
or a similar one, is enough.



see attached patch. I've no idea if this will even build, but it's what I
am hoping will work. It assumes the SharedListenResponse::conn --> 
SharedListenResponse::fd change from bug 3264 has also been made.



Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.14
  Beta testers wanted for 3.2.0.9
=== modified file 'src/ipc/SharedListen.cc'
--- src/ipc/SharedListen.cc	2011-05-13 08:13:01 +
+++ src/ipc/SharedListen.cc	2011-07-19 10:46:58 +
@@ -127,10 +127,8 @@
 
 void 

Re: Squid-SMP problems in current trunk

2011-07-19 Thread Tsantilas Christos

On 07/19/2011 05:51 AM, Amos Jeffries wrote:

On Mon, 18 Jul 2011 20:03:22 +0300, Christos Tsantilas wrote:

On 07/18/2011 01:04 PM, Tsantilas Christos wrote:

On 07/16/2011 07:11 AM, Amos Jeffries wrote:

On 16/07/11 07:43, Tsantilas Christos wrote:

But now I am hitting the following assertion:
assertion failed: Connection.cc:29: "fd < 0"
The latter problem looks like it has to do with the file descriptors of
the listening sockets.


"fd < 0" indicates something is failing to call conn->close() when
abandoning an open socket.

NP: close() is reentrant. So components can and should always close()
when they are sure the FD/socket must no longer be used.

I'm not very certain about SMP listening sockets, which process(es) are
safe to close() on reconfigure/shutdown? the unsafe ones must do fd=-1
to abandon the FD information explicitly before the conn object
destructs.

What situations are you hitting "fd < 0" Christos?


I am hitting this assertion on kids immediately after start.
It looks like the connection loses all references to itself and is deleted.
The socket of the connection is a listening socket.


When a kid starts, it gets the file descriptors of the listening sockets
from the parent and creates Comm::Connection objects for them.

What is needed here is to assign the created Comm::Connection objects to
the related http_port_list object (to increase the refcount and keep
the connection open).

A way to implement the above is to use the ListeningStartedDialer
class implemented in the client_side.cc file.

I am attaching a patch which solves this problem and allows SMP Squid to
start and serve HTTP requests, but there are similar or related bugs
in ICP and SNMP. When I define icp_port and snmp_port in squid.conf,
SMP Squid does not start.


In this patch the dialer has no "conn" object to assign.


There is an Ipc::StartListeningCb::conn object. The
ListeningStartedDialer is a child class of StartListeningCb. The conn
object exists and is valid.
The patch works but is of course incomplete; I posted it just to
point out the problem.


Currently the ListeningStartedDialer dialer is the only object we have
which maps the connection (or rather, the FD) to a listening port.





NOTE:
clientHttpConnectionsOpen() does "s->listenConn = new
Comm::Connection()" before starting IPC.
When dialling post-IPC happens
ListeningStartedDialer::portCfg->listenConn is still a ref of that object.

I'm thinking:
If OpenListenerParams adds a member to hold the listenConn created by
clientHttpConnectionsOpen() it can be set with the response.fd in
Ipc::SharedListenJoined().


Just take a look inside Ipc::SharedListenJoined(): in the SMP case we
just send the kids the local address, flags, and socket type, and
return. This is information which can be sent through IPC, pipes, etc.
Unfortunately the OpenListenerParams is copied with memcpy into a
message sent through IPC. How can we send a refcounted object like
listenConn through a pipe? That would cause bugs similar to bug 3264.

We can send only the FD.

I believe the only thing we can do is to add a new parameter of type
Comm::Connection to the Ipc::JoinSharedListen function to pass the
listenConn object and add it to the related PendingOpenRequest.




We may or may not then have to do the c=new Comm::Connection() in IPC.


OK, maybe we can just set the FD and other information on the existing
listenConn...





The best point to set anything coming back anyway seems to be in
Ipc::SharedListenJoined() where the SharedListenResponse,
OpenListenerParams, and StartListeningCb are all available.


It is the best, because it looks like it will solve the related problems
in SNMP and ICP too...
But at this time we do not have the listenConn object inside
Ipc::SharedListenJoined(), and we do not have enough information to
retrieve it.
I will try to implement what I am proposing above (passing listenConn to
JoinSharedListen...); it looks simple.





SharedListenResponse needs to use getInt(this->fd) or similar instead of
putPod/getPod(*this) anyway. Or at the very least give it a sub-struct
that can be documented as raw socket bytes and get*(&this->data_).
memcpy re-init of a whole class with methods straight from the socket
does not seem like a good idea.


Please see the patch in the bug 3264:
   http://bugs.squid-cache.org/show_bug.cgi?id=3264
The bug described there is different from the bugs we are discussing
here, but they are related. I believe just applying the patch I posted,
or a similar one, is enough.




Amos





Re: Squid-SMP problems in current trunk

2011-07-18 Thread Mohsen Pahlevanzadeh
On Tue, 2011-07-19 at 14:51 +1200, Amos Jeffries wrote:
> On Mon, 18 Jul 2011 20:03:22 +0300, Christos Tsantilas wrote:
> > On 07/18/2011 01:04 PM, Tsantilas Christos wrote:
> >> On 07/16/2011 07:11 AM, Amos Jeffries wrote:
> >>> On 16/07/11 07:43, Tsantilas Christos wrote:
>  But now I am hitting the following assertion:
>  assertion failed: Connection.cc:29: "fd < 0"
>  The latter problem looks like it has to do with the file descriptors
>  of the listening sockets.
> >>>
> >>> "fd < 0" indicates something is failing to call conn->close() when
> >>> abandoning an open socket.
> >>>
> >>> NP: close() is reentrant. So components can and should always 
> >>> close()
> >>> when they are sure the FD/socket must no longer be used.
> >>>
> >>> I'm not very certain about SMP listening sockets, which process(es) 
> >>> are
> >>> safe to close() on reconfigure/shutdown? the unsafe ones must do 
> >>> fd=-1
> >>> to abandon the FD information explicitly before the conn object
> >>> destructs.
> >>>
> >>> What situations are you hitting "fd < 0" Christos?
> >>
> >> I am hitting this assertion on kids immediately after start.
> >> It looks like the connection loses all references to itself and is
> >> deleted.
> >> The socket of the connection is a listening socket.
> >
> > When a kid starts, it gets the file descriptors of the listening
> > sockets from the parent and creates Comm::Connection objects for them.
> >
> > What is needed here is to assign the created Comm::Connection objects
> > to the related http_port_list object (to increase the refcount and
> > keep the connection open)
> >
> > A way to implement the above is to use the ListeningStartedDialer
> > class implemented in client_side.cc file.
> >
> > I am attaching a patch which solves this problem and allows SMP Squid
> > to start and serve HTTP requests, but there are similar or related
> > bugs in ICP and SNMP. When I define icp_port and snmp_port in
> > squid.conf, SMP Squid does not start.
> 
>  In this patch the dialer has no "conn" object to assign.
> 
>  NOTE:
>   clientHttpConnectionsOpen() does "s->listenConn = new 
>  Comm::Connection()" before starting IPC.
>   When dialling post-IPC happens 
>  ListeningStartedDialer::portCfg->listenConn is still a ref of that 
>  object.
> 
>  I'm thinking:
>   If OpenListenerParams adds a member to hold the listenConn created by 
>  clientHttpConnectionsOpen() it can be set with the response.fd in 
>  Ipc::SharedListenJoined().
>   We may or may not then have to do the c=new Comm::Connection() in IPC.
> 
> 
>  The best point to set anything coming back anyway seems to be in 
>  Ipc::SharedListenJoined() where the SharedListenResponse,
>  OpenListenerParams, and StartListeningCb are all available.
> 
> 
>  SharedListenResponse needs to use getInt(this->fd) or similar instead
>  of putPod/getPod(*this) anyway. Or at the very least give it a
>  sub-struct that can be documented as raw socket bytes and
>  get*(&this->data_). memcpy re-init of a whole class with methods
>  straight from the socket does not seem like a good idea.
> 
>  Amos
Thank you very much, dear Amos.
--mohsen




Re: Squid-SMP problems in current trunk

2011-07-18 Thread Amos Jeffries

On Mon, 18 Jul 2011 20:03:22 +0300, Christos Tsantilas wrote:

On 07/18/2011 01:04 PM, Tsantilas Christos wrote:

On 07/16/2011 07:11 AM, Amos Jeffries wrote:

On 16/07/11 07:43, Tsantilas Christos wrote:

But now I am hitting the following assertion:
assertion failed: Connection.cc:29: "fd < 0"
The latter problem looks like it has to do with the file descriptors of
the listening sockets.


"fd < 0" indicates something is failing to call conn->close() when
abandoning an open socket.

NP: close() is reentrant. So components can and should always close()
when they are sure the FD/socket must no longer be used.

I'm not very certain about SMP listening sockets: which process(es) are
safe to close() on reconfigure/shutdown? The unsafe ones must do fd=-1
to abandon the FD information explicitly before the conn object
destructs.

What situations are you hitting "fd < 0" Christos?


I am hitting this assertion on kids immediately after start.
It looks like the connection loses all references to itself and is
deleted.

The socket of the connection is a listening socket.


When a kid starts, it gets the file descriptors of the listening sockets
from the parent and creates Comm::Connection objects for them.

What is needed here is to assign the created Comm::Connection objects to
the related http_port_list object (to increase the refcount and keep
the connection open).

A way to implement the above is to use the ListeningStartedDialer
class implemented in the client_side.cc file.

I am attaching a patch which solves this problem and allows SMP Squid to
start and serve HTTP requests, but there are similar or related bugs
in ICP and SNMP. When I define icp_port and snmp_port in squid.conf,
SMP Squid does not start.


In this patch the dialer has no "conn" object to assign.

NOTE:
 clientHttpConnectionsOpen() does "s->listenConn = new 
Comm::Connection()" before starting IPC.
 When dialling post-IPC happens 
ListeningStartedDialer::portCfg->listenConn is still a ref of that 
object.


I'm thinking:
 If OpenListenerParams adds a member to hold the listenConn created by 
clientHttpConnectionsOpen() it can be set with the response.fd in 
Ipc::SharedListenJoined().

 We may or may not then have to do the c=new Comm::Connection() in IPC.


The best point to set anything coming back anyway seems to be in
Ipc::SharedListenJoined() where the SharedListenResponse,
OpenListenerParams, and StartListeningCb are all available.



SharedListenResponse needs to use getInt(this->fd) or similar instead of
putPod/getPod(*this) anyway. Or at the very least give it a sub-struct
that can be documented as raw socket bytes and get*(&this->data_).
memcpy re-init of a whole class with methods straight from the socket
does not seem like a good idea.


Amos
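[Editor's note: the memcpy-transport concern above can be stated as a compile-time check; the struct names below are illustrative, not Squid's:]

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// A plain struct of ints is trivially copyable and safe to ship across a
// process boundary as raw bytes, as putPod/getPod effectively do.
struct WireResponse {
    int fd;
    int errNo;
    int mapId;
};

// A struct holding a smart pointer (standing in for the old refcounted
// Comm::ConnectionPointer member) is not: memcpy transport would bypass
// the refcounting and copy an address meaningless in another process.
struct BadWireResponse {
    std::shared_ptr<int> conn;
    int errNo;
};

static_assert(std::is_trivially_copyable<WireResponse>::value,
              "plain ints: safe to memcpy across processes");
static_assert(!std::is_trivially_copyable<BadWireResponse>::value,
              "smart-pointer member: memcpy transport is unsafe");
```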


Re: Squid-SMP problems in current trunk

2011-07-18 Thread Christos Tsantilas

On 07/18/2011 01:04 PM, Tsantilas Christos wrote:

On 07/16/2011 07:11 AM, Amos Jeffries wrote:

On 16/07/11 07:43, Tsantilas Christos wrote:

But now I am hitting the following assertion:
assertion failed: Connection.cc:29: "fd < 0"
The latter problem looks like it has to do with the file descriptors of
the listening sockets.


"fd < 0" indicates something is failing to call conn->close() when
abandoning an open socket.

NP: close() is reentrant. So components can and should always close()
when they are sure the FD/socket must no longer be used.

I'm not very certain about SMP listening sockets, which process(es) are
safe to close() on reconfigure/shutdown? the unsafe ones must do fd=-1
to abandon the FD information explicitly before the conn object
destructs.

What situations are you hitting "fd < 0" Christos?


I am hitting this assertion on kids immediately after start.
It looks like the connection loses all references to itself and is deleted.
The socket of the connection is a listening socket.


When a kid starts it gets the file descriptors of the listening sockets 
from the parent, and creates Comm::Connection objects for these 
file descriptors.


What is needed here is to assign the created Comm::Connection objects to 
the related http_port_list object (to increase the refcount and keep the 
connection open).


A way to implement the above is to use the ListeningStartedDialer class 
implemented in the client_side.cc file.


I am attaching a patch which solves this problem and allows smp-squid to 
start and serve HTTP requests, but there are similar or related bugs in 
ICP and SNMP. When I define icp_port and snmp_port in squid.conf, 
smp-squid does not start.





This is a backtrace:
#0 0x7f0c3210ea75 in *__GI_raise (sig=&lt;optimized out&gt;)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x7f0c321125c0 in *__GI_abort () at abort.c:92
#2 0x005128f5 in xassert (msg=0x4955 &lt;Address 0x4955 out of bounds&gt;,
file=0x1208120 "\230\236E2\f\177", line=29) at debug.cc:567
#3 0x00656239 in ~Connection (this=0x10fb530,
__in_chrg=&lt;optimized out&gt;, __vtt_parm=&lt;optimized out&gt;)
at Connection.cc:29
#4 0x004fbb04 in ~ListeningStartedDialer (this=0x10e6fd0,
__in_chrg=&lt;optimized out&gt;, __vtt_parm=&lt;optimized out&gt;)
at client_side.cc:144
#5 ~AsyncCallT (this=0x10e6fd0, __in_chrg=&lt;optimized out&gt;,
__vtt_parm=&lt;optimized out&gt;) at ../src/base/AsyncCall.h:133
#6 0x00608dcf in RefCount::dereference (
this=&lt;optimized out&gt;) at ../../include/RefCount.h:96
#7 ~RefCount (this=&lt;optimized out&gt;) at ../../include/RefCount.h:52
#8 AsyncCallQueue::fireNext (this=&lt;optimized out&gt;)
at AsyncCallQueue.cc:55
#9 0x00608ef0 in AsyncCallQueue::fire (this=0xed3d90)
at AsyncCallQueue.cc:40
#10 0x005204bc in EventLoop::runOnce (this=0x7fffc79e0080)
at EventLoop.cc:131
#11 0x00520598 in EventLoop::run (this=0x7fffc79e0080)
at EventLoop.cc:95
#12 0x005798a5 in SquidMain (argc=&lt;optimized out&gt;,
argv=0x7fffc79e0248) at main.cc:1506
#13 0x0057a0a6 in SquidMainSafe (argc=18773, argv=0x4955)
at main.cc:1239





Amos






--
Tsantilas Christos
Network and Systems Engineer
email:chris...@chtsanti.net
  web:http://www.chtsanti.net
Phone:+30 6977678842
=== modified file 'src/client_side.cc'
--- src/client_side.cc	2011-06-23 08:31:56 +
+++ src/client_side.cc	2011-07-18 16:24:16 +
@@ -136,41 +136,44 @@
 #endif
 
 #if LINGERING_CLOSE
 #define comm_close comm_lingering_close
 #endif
 
 /// dials clientListenerConnectionOpened call
 class ListeningStartedDialer: public CallDialer, public Ipc::StartListeningCb
 {
 public:
 typedef void (*Handler)(http_port_list *portCfg, const Ipc::FdNoteId note, const Subscription::Pointer &sub);
 ListeningStartedDialer(Handler aHandler, http_port_list *aPortCfg, const Ipc::FdNoteId note, const Subscription::Pointer &aSub):
 handler(aHandler), portCfg(aPortCfg), portTypeNote(note), sub(aSub) {}
 
 virtual void print(std::ostream &os) const {
 startPrint(os) <<
 ", " << FdNote(portTypeNote) << " port=" << (void*)portCfg << ')';
 }
 
 virtual bool canDial(AsyncCall &) const { return true; }
-virtual void dial(AsyncCall &) { (handler)(portCfg, portTypeNote, sub); }
+virtual void dial(AsyncCall &) {
+portCfg->listenConn = conn;
+(handler)(portCfg, portTypeNote, sub);
+}
 
 public:
 Handler handler;
 
 private:
 http_port_list *portCfg;   ///< from Config.Sockaddr.http
 Ipc::FdNoteId portTypeNote;///< Type of IPC socket being opened
 Subscription::Pointer sub; ///< The handler to be subscribed for this connetion listener
 };
 
 static void clientListenerConnectionOpened(http_port_list *s, const Ipc::FdNoteId portTypeNote, const Subscription::Pointer &sub);
 
 /* our socket-related context */
 
 
 CBDATA_CLASS_INIT(ClientSocketContext);
 
 void *
 ClientSocketContext::operator new (size_t byteCount)
 {



Re: Squid-SMP problems in current trunk

2011-07-18 Thread Tsantilas Christos

On 07/16/2011 07:11 AM, Amos Jeffries wrote:

On 16/07/11 07:43, Tsantilas Christos wrote:

But now I am hitting the following assertion:
assertion failed: Connection.cc:29: "fd < 0"
The latter problem looks like it has to do with the file descriptors of
the listening sockets.


"fd < 0" indicates something is failing to call conn->close() when
abandoning an open socket.

NP: close() is reentrant. So components can and should always close()
when they are sure the FD/socket must no longer be used.

I'm not very certain about SMP listening sockets: which process(es) are
safe to close() on reconfigure/shutdown? The unsafe ones must set fd = -1
to abandon the FD information explicitly before the conn object destructs.

In what situations are you hitting "fd < 0", Christos?


I am hitting this assertion on kids immediately after start.
It looks like the connection loses all references to itself and is deleted.
The socket of the connection is a listening socket.

This is a backtrace:
#0  0x7f0c3210ea75 in *__GI_raise (sig=&lt;optimized out&gt;)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x7f0c321125c0 in *__GI_abort () at abort.c:92
#2  0x005128f5 in xassert (msg=0x4955 &lt;Address 0x4955 out of bounds&gt;,
file=0x1208120 "\230\236E2\f\177", line=29) at debug.cc:567
#3  0x00656239 in ~Connection (this=0x10fb530,
__in_chrg=&lt;optimized out&gt;, __vtt_parm=&lt;optimized out&gt;)
at Connection.cc:29
#4  0x004fbb04 in ~ListeningStartedDialer (this=0x10e6fd0,
__in_chrg=&lt;optimized out&gt;, __vtt_parm=&lt;optimized out&gt;)
at client_side.cc:144
#5  ~AsyncCallT (this=0x10e6fd0, __in_chrg=&lt;optimized out&gt;,
__vtt_parm=&lt;optimized out&gt;) at ../src/base/AsyncCall.h:133
#6  0x00608dcf in RefCount::dereference (
this=&lt;optimized out&gt;) at ../../include/RefCount.h:96
#7  ~RefCount (this=&lt;optimized out&gt;) at ../../include/RefCount.h:52
#8  AsyncCallQueue::fireNext (this=&lt;optimized out&gt;)
at AsyncCallQueue.cc:55
#9  0x00608ef0 in AsyncCallQueue::fire (this=0xed3d90)
at AsyncCallQueue.cc:40
#10 0x005204bc in EventLoop::runOnce (this=0x7fffc79e0080)
at EventLoop.cc:131
#11 0x00520598 in EventLoop::run (this=0x7fffc79e0080)
at EventLoop.cc:95
#12 0x005798a5 in SquidMain (argc=&lt;optimized out&gt;,
argv=0x7fffc79e0248) at main.cc:1506
#13 0x0057a0a6 in SquidMainSafe (argc=18773, argv=0x4955)
at main.cc:1239





Amos




Re: Squid-SMP problems in current trunk

2011-07-15 Thread Amos Jeffries

On 16/07/11 07:43, Tsantilas Christos wrote:

Hi all,
currently the squid-smp is broken in squid-trunk.
The squid kids crash immediately after starting. The first problem is
the "assertion failed: mem.cc:516: "MemPools[t]"" assertion, which is
fixed by Amos' patch "[PATCH] Allow MemPool late initialization". This
patch looks like it solves the problem and should be applied.


Okay. Thanks for the verify. Applied.



After applying this patch one will hit bug 3264:
http://bugs.squid-cache.org/show_bug.cgi?id=3264
This bug also prevents kid processes from starting to process requests.
There is a patch attached here which (I believe) solves the problem.


Thank you.


But now I am hitting the following assertion:
assertion failed: Connection.cc:29: "fd < 0"
The latter problem looks like it has to do with the file descriptors of
the listening sockets.


"fd < 0" indicates something is failing to call conn->close() when 
abandoning an open socket.


NP: close() is reentrant. So components can and should always close() 
when they are sure the FD/socket must no longer be used.


I'm not very certain about SMP listening sockets: which process(es) are 
safe to close() on reconfigure/shutdown? The unsafe ones must set fd = -1 
to abandon the FD information explicitly before the conn object destructs.


In what situations are you hitting "fd < 0", Christos?

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.14
  Beta testers wanted for 3.2.0.9


Squid-SMP problems in current trunk

2011-07-15 Thread Tsantilas Christos

Hi all,
  currently the squid-smp is broken in squid-trunk.
The squid kids crash immediately after starting. The first problem is 
the "assertion failed: mem.cc:516: "MemPools[t]"" assertion, which is 
fixed by Amos' patch "[PATCH] Allow MemPool late initialization". This 
patch looks like it solves the problem and should be applied.


After applying this patch one will hit bug 3264:
  http://bugs.squid-cache.org/show_bug.cgi?id=3264
This bug also prevents kid processes from starting to process requests.
There is a patch attached here which (I believe) solves the problem.

But now I am hitting the following assertion:
   assertion failed: Connection.cc:29: "fd < 0"
The latter problem looks like it has to do with the file descriptors of 
the listening sockets.


Regards,
   Christos


Re: How should I test squid-smp ??

2010-01-12 Thread Alex Rousskov
On 01/11/2010 04:49 AM, Sachin Malave wrote:

> I have a multi-core (32-core) server here. Now squid-smp is running
> with two threads (schedule and dispatch); some bugs may be there. I
> want to test squid here in my lab. Please tell me a mechanism or
> strategy that should be used for rigorous testing, and any tool that
> is available, because I know I have not locked everything against
> simultaneous accesses...

If you are talking about serious performance testing, I would recommend
Web Polygraph.  One set of tests that we are running now is a
multi-instance versus SMP comparison. You can configure Polygraph to
send requests to multiple HTTP proxies (e.g., multiple http_ports in
Squid) which should give you something like an upper bound for SMP
performance in a non-caching environment.

There is no single comprehensive functionality test suite for Squid,
unfortunately. "Make check" is a start but it is not going to get you
very far. Heavy performance tests, with feature-rich workloads (see
above) often expose many functionality bugs. Co-Advisor tests can be
added to probe deeper into HTTP handling stack.

When the code is relatively stable, it can be tried live on IRCache.

HTH,

Alex.



How should I test squid-smp ??

2010-01-11 Thread Sachin Malave
Hello,

I have a multi-core (32-core) server here. Now squid-smp is running
with two threads (schedule and dispatch); some bugs may be there. I
want to test squid here in my lab. Please tell me a mechanism or
strategy that should be used for rigorous testing, and any tool that
is available, because I know I have not locked everything against
simultaneous accesses...

--
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Sachin Malave
On Wed, Nov 25, 2009 at 7:48 AM, Amos Jeffries  wrote:
> On Tue, 24 Nov 2009 16:13:37 -0700, Alex Rousskov
>  wrote:
>> On 11/20/2009 10:59 PM, Robert Collins wrote:
>>> On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:
>> Q1. What are the major areas or units of asynchronous code
> execution?
>> Some of us may prefer large areas such as "http_port acceptor" or
>> "cache" or "server side". Others may root for AsyncJob as the
> largest
>> asynchronous unit of execution. These two approaches and their
>> implications differ a lot. There may be other designs worth
>> considering.
>>
>>> I'd like to let people start writing (and perf testing!) patches. To
>>> unblock people. I think the primary questions are:
>>>  - do we permit multiple approaches inside the same code base. E.g.
>>> OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
>>> queues' or some such abstraction elsewhere ?
>>>     (I vote yes, but with caution: someone trying something we don't
>>> already do should keep it on a branch and really measure it well until
>>> its got plenty of buy in).
>>
>> I vote for multiple approaches at lower levels of the architecture and
>> against multiple approaches at highest level of the architecture. My Q1
>> was only about the highest levels, BTW.
>>
>> For example, I do not think it is a good idea to allow a combination of
>> OpenMP, ACE, and something else as a top-level design. Understanding,
>> supporting, and tuning such a mix would be a nightmare, IMO.
>>
>> On the other hand, using threads within some disk storage schemes while
>> using processes for things like "cache" may make a lot of sense, and we
>> already have examples of some of that working.
>>
>
> OpenMP seems an almost unanimous negative among the people who know it.
>

OK


>>
>> This is why I believe that the decision of processes versus threads *at
>> the highest level* of the architecture is so important. Yes, we are,
>> can, and will use threads at lower levels. There is no argument there.
>> The question is whether we can also use threads to split Squid into
>> several instances of "major areas" like client side(s), cache(s), and
>> server side(s).
>>
>> See Henrik's email on why it is difficult to use threads at highest
>> levels. I am not convinced yet, but I do see Henrik's point, and I
>> consider the dangers he cites critical for the right Q1 answer.
>>
>>
>>>  - If we do *not* permit multiple approaches, then what approach do we
>>> want for parallelisation. E.g. a number of long lived threads that take
>>> on work, or many transient threads as particular bits of the code need
>>> threads. I favour the former (long lived 'worker' threads).
>>
>> For highest-level models, I do not think that "one job per
>> thread/process", "one call per thread/process", or any other "one little
>> short-lived something per thread/process" is a good idea. I do believe
>> we have to parallelize "major areas", and I think we should support
>> multiple instances of some of those "areas" (e.g., multiple client
>> sides). Each "major area" would be long-lived process/thread, of course.
>
> Agreed. mostly.
>
> As Rob points out the idea is for one small'ish pathway of the code to be
> run N times with different state data each time by a single thread.
>
> Sachin's initial AcceptFD thread proposal would perhaps be an exemplar for
> this type of thread. Where one thread does the comm layer; accept() through
> to the scheduling call hand-off to handlers outside comm. Then goes back
> for the next accept().
>
> The only performance issue brought up was by you that its particular case
> might flood the slower main process if done first. Not all code can be done
> this way.
>
> Overheads are simply moving the state data in/out of the thread. IMO
> starting/stopping threads too often is a fairly bad idea. Most events will
> end up being grouped together into types (perhaps categorized by
> component, perhaps by client request, perhaps by pathway) with a small
> thread dedicated to handling that type of call.
>
>>
>> Again for higher-level models, I am also skeptical that it is a good
>> idea to just split Squid into N mostly non-cooperating nearly identical
>> instances. It may be the right first step, but I would like to offer
>> more than that in terms of overall performance and tunability.
>
> The answer to that is: of all the SMP models we theorize, that one is the
> only proven model so far.
> Administrators are already doing it with all the instance management
> manually handled on quad+ core machines. With a lot of performance success.
>
> In last night's discussion on IRC we covered what issues are outstanding
> from making this automatic, and all are resolvable except the cache index.
> It's not easily shareable between instances.
>
>>
>> I hope the above explains why I consider Q1 critical for the meant
>> "highest level" scope and why "we already use processes and threads" is
>> certainly true but irrelevant within that scope.

Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Amos Jeffries
On Tue, 24 Nov 2009 16:13:37 -0700, Alex Rousskov
 wrote:
> On 11/20/2009 10:59 PM, Robert Collins wrote:
>> On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:
> Q1. What are the major areas or units of asynchronous code
execution?
> Some of us may prefer large areas such as "http_port acceptor" or
> "cache" or "server side". Others may root for AsyncJob as the
largest
> asynchronous unit of execution. These two approaches and their
> implications differ a lot. There may be other designs worth
> considering.
> 
>> I'd like to let people start writing (and perf testing!) patches. To
>> unblock people. I think the primary questions are:
>>  - do we permit multiple approaches inside the same code base. E.g.
>> OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
>> queues' or some such abstraction elsewhere ?
>> (I vote yes, but with caution: someone trying something we don't
>> already do should keep it on a branch and really measure it well until
>> its got plenty of buy in).
> 
> I vote for multiple approaches at lower levels of the architecture and
> against multiple approaches at highest level of the architecture. My Q1
> was only about the highest levels, BTW.
> 
> For example, I do not think it is a good idea to allow a combination of
> OpenMP, ACE, and something else as a top-level design. Understanding,
> supporting, and tuning such a mix would be a nightmare, IMO.
> 
> On the other hand, using threads within some disk storage schemes while
> using processes for things like "cache" may make a lot of sense, and we
> already have examples of some of that working.
> 

OpenMP seems an almost unanimous negative among the people who know it.

> 
> This is why I believe that the decision of processes versus threads *at
> the highest level* of the architecture is so important. Yes, we are,
> can, and will use threads at lower levels. There is no argument there.
> The question is whether we can also use threads to split Squid into
> several instances of "major areas" like client side(s), cache(s), and
> server side(s).
> 
> See Henrik's email on why it is difficult to use threads at highest
> levels. I am not convinced yet, but I do see Henrik's point, and I
> consider the dangers he cites critical for the right Q1 answer.
> 
> 
>>  - If we do *not* permit multiple approaches, then what approach do we
>> want for parallelisation. E.g. a number of long lived threads that take
>> on work, or many transient threads as particular bits of the code need
>> threads. I favour the former (long lived 'worker' threads).
> 
> For highest-level models, I do not think that "one job per
> thread/process", "one call per thread/process", or any other "one little
> short-lived something per thread/process" is a good idea. I do believe
> we have to parallelize "major areas", and I think we should support
> multiple instances of some of those "areas" (e.g., multiple client
> sides). Each "major area" would be long-lived process/thread, of course.

Agreed, mostly.

As Rob points out the idea is for one small'ish pathway of the code to be
run N times with different state data each time by a single thread.

Sachin's initial AcceptFD thread proposal would perhaps be an exemplar for
this type of thread. Where one thread does the comm layer; accept() through
to the scheduling call hand-off to handlers outside comm. Then goes back
for the next accept().

The only performance issue brought up was by you that its particular case
might flood the slower main process if done first. Not all code can be done
this way.

Overheads are simply moving the state data in/out of the thread. IMO
starting/stopping threads too often is a fairly bad idea. Most events will
end up being grouped together into types (perhaps categorized by
component, perhaps by client request, perhaps by pathway) with a small
thread dedicated to handling that type of call.

> 
> Again for higher-level models, I am also skeptical that it is a good
> idea to just split Squid into N mostly non-cooperating nearly identical
> instances. It may be the right first step, but I would like to offer
> more than that in terms of overall performance and tunability.

The answer to that is: of all the SMP models we theorize, that one is the
only proven model so far.
Administrators are already doing it with all the instance management
manually handled on quad+ core machines. With a lot of performance success.

In last night's discussion on IRC we covered what issues are outstanding
from making this automatic, and all are resolvable except the cache index.
It's not easily shareable between instances.

> 
> I hope the above explains why I consider Q1 critical for the meant
> "highest level" scope and why "we already use processes and threads" is
> certainly true but irrelevant within that scope.
> 
> 
> Thank you,
> 
> Alex.

Thank you for clarifying that. I now think we are all more or less headed
in the same direction(s). With three models proposed for t

Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Robert Collins
On Tue, 2009-11-24 at 16:13 -0700, Alex Rousskov wrote:

> For example, I do not think it is a good idea to allow a combination of
> OpenMP, ACE, and something else as a top-level design. Understanding,
> supporting, and tuning such a mix would be a nightmare, IMO.

I think that would be hard, yes.

> See Henrik's email on why it is difficult to use threads at highest
> levels. I am not convinced yet, but I do see Henrik's point, and I
> consider the dangers he cites critical for the right Q1 answer.

> >  - If we do *not* permit multiple approaches, then what approach do we
> > want for parallelisation. E.g. a number of long lived threads that take
> > on work, or many transient threads as particular bits of the code need
> > threads. I favour the former (long lived 'worker' threads).
> 
> For highest-level models, I do not think that "one job per
> thread/process", "one call per thread/process", or any other "one little
> short-lived something per thread/process" is a good idea.

Neither do I. Short lived things have a high overhead. But consider that
a queue of tasks in a single long lived thread doesn't have the high
overhead of making a new thread or process per item in the queue. Using
ACLs as an example, ACL checking is callback based nearly everywhere; we
could have a thread that does ACL checking and free up the main thread
to continue doing work. Later on, with more auditing we could have
multiple concurrent ACL checking threads.

-Rob
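Rob's long-lived ACL-checking worker could be sketched roughly like this (AclWorker and its queue are hypothetical; Squid's real ACL code is not structured this way today):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Sketch of the long-lived worker model: the main thread enqueues an ACL
// check plus its callback as one task; a single dedicated thread drains the
// queue, freeing the main thread to continue doing other work. All names
// here are illustrative, not Squid's real ACL API.
class AclWorker {
public:
    AclWorker() : thread_([this] { run(); }) {}
    ~AclWorker() {
        {
            std::lock_guard<std::mutex> g(mx_);
            done_ = true;
        }
        cv_.notify_one();
        thread_.join();  // worker drains remaining tasks before exiting
    }
    void enqueue(std::function<void()> check) {
        {
            std::lock_guard<std::mutex> g(mx_);
            q_.push(std::move(check));
        }
        cv_.notify_one();
    }
private:
    void run() {
        std::unique_lock<std::mutex> lk(mx_);
        for (;;) {
            cv_.wait(lk, [this] { return done_ || !q_.empty(); });
            if (q_.empty())
                return;  // done_ was set and nothing is left to run
            auto task = std::move(q_.front());
            q_.pop();
            lk.unlock();
            task();      // run the ACL check off the main thread
            lk.lock();
        }
    }
    std::mutex mx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool done_ = false;
    std::thread thread_;  // declared last: started after the other members
};
```

Later, as Rob notes, the single worker could become a pool of concurrent checkers once the ACL data accesses are audited.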




Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Alex Rousskov
On 11/20/2009 10:59 PM, Robert Collins wrote:
> On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:
 Q1. What are the major areas or units of asynchronous code execution?
 Some of us may prefer large areas such as "http_port acceptor" or
 "cache" or "server side". Others may root for AsyncJob as the largest
 asynchronous unit of execution. These two approaches and their
 implications differ a lot. There may be other designs worth considering.

> I'd like to let people start writing (and perf testing!) patches. To
> unblock people. I think the primary questions are:
>  - do we permit multiple approaches inside the same code base. E.g.
> OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
> queues' or some such abstraction elsewhere ?
> (I vote yes, but with caution: someone trying something we don't
> already do should keep it on a branch and really measure it well until
> its got plenty of buy in).

I vote for multiple approaches at lower levels of the architecture and
against multiple approaches at highest level of the architecture. My Q1
was only about the highest levels, BTW.

For example, I do not think it is a good idea to allow a combination of
OpenMP, ACE, and something else as a top-level design. Understanding,
supporting, and tuning such a mix would be a nightmare, IMO.

On the other hand, using threads within some disk storage schemes while
using processes for things like "cache" may make a lot of sense, and we
already have examples of some of that working.


This is why I believe that the decision of processes versus threads *at
the highest level* of the architecture is so important. Yes, we are,
can, and will use threads at lower levels. There is no argument there.
The question is whether we can also use threads to split Squid into
several instances of "major areas" like client side(s), cache(s), and
server side(s).

See Henrik's email on why it is difficult to use threads at highest
levels. I am not convinced yet, but I do see Henrik's point, and I
consider the dangers he cites critical for the right Q1 answer.


>  - If we do *not* permit multiple approaches, then what approach do we
> want for parallelisation. E.g. a number of long lived threads that take
> on work, or many transient threads as particular bits of the code need
> threads. I favour the former (long lived 'worker' threads).

For highest-level models, I do not think that "one job per
thread/process", "one call per thread/process", or any other "one little
short-lived something per thread/process" is a good idea. I do believe
we have to parallelize "major areas", and I think we should support
multiple instances of some of those "areas" (e.g., multiple client
sides). Each "major area" would be long-lived process/thread, of course.

Again for higher-level models, I am also skeptical that it is a good
idea to just split Squid into N mostly non-cooperating nearly identical
instances. It may be the right first step, but I would like to offer
more than that in terms of overall performance and tunability.

I hope the above explains why I consider Q1 critical for the meant
"highest level" scope and why "we already use processes and threads" is
certainly true but irrelevant within that scope.


Thank you,

Alex.


Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Sachin Malave
On Tue, Nov 24, 2009 at 6:08 PM, Henrik Nordstrom
 wrote:
> ons 2009-11-25 klockan 00:55 +1300 skrev Amos Jeffries:
>
>> I kind of mean that by the "smaller units". I'm thinking primarily here
>> of the internal DNS. It's API is very isolated from the work.
>
> And also a good example of where the CPU usage is negligible.
>
> And no, it's not really that isolated. It's allocating data for the
> response which is then handed to the caller, and modified in other parts
> of the code via ipcache..
>
> But yes, it's a good example of where one can try scheduling the
> processing on a separate thread to experiment with such model.

It's not only about how much CPU usage we are distributing among
threads; we also have to consider that a thread works only inside
its own memory if shared data is small (and it must be). If we can let a
thread work inside its own private memory most of the time, it is worth
creating the thread: a thread scheduled on a core and accessing its own
cache will definitely speed up our squid.

Yes, we have to consider how the OS does read/write operations, because
all write operations must be done serially when using a WRITE-THROUGH
policy to update all levels of memory (cache or main); otherwise there
are no issues.



> Regards
> Henrik
>
>



-- 
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Henrik Nordstrom
ons 2009-11-25 klockan 00:55 +1300 skrev Amos Jeffries:

> I kind of mean that by the "smaller units". I'm thinking primarily here 
> of the internal DNS. Its API is very isolated from the work.

And also a good example of where the CPU usage is negligible.

And no, it's not really that isolated. It's allocating data for the
response which is then handed to the caller, and modified in other parts
of the code via ipcache..

But yes, it's a good example of where one can try scheduling the
processing on a separate thread to experiment with such model.

Regards
Henrik



Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Amos Jeffries

Henrik Nordstrom wrote:

sön 2009-11-22 klockan 00:12 +1300 skrev Amos Jeffries:

I think we can open the doors earlier than after that. I'm happy with an 
approach that would see the smaller units of Squid growing in 
parallelism to encompass two full cores.


And I have a more careful opinion.

Introducing threads in the current Squid core processing is very
non-trivial, due to the relatively high amount of shared data with
no access protection. We already have sufficient nightmares from data
access synchronization issues in the current non-threaded design, and
trying to synchronize access in threaded operation is many orders of
magnitude more complex.

The day the code base is cleaned up to the level that one can actually
assess what data is being accessed where, threads may be a viable
discussion; but as things are today it's almost impossible to judge what
data will be directly or indirectly accessed by any larger operation.


I kind of mean that by the "smaller units". I'm thinking primarily here 
of the internal DNS. Its API is very isolated from the work.





Using threads for micro operations will not help us. The overhead
involved in scheduling an operation to a thread is comparably large to
most operations we are performing, and if we add to this the amount of
synchronization needed to shield the data accessed by that operation,
then the overhead will in nearly all cases far outweigh the actual
processing time of the micro operations, resulting only in a net loss of
performance. There are some isolated cases I can think of, like SSL
handshake negotiation, where the actual processing may be significant,
but at the general level I don't see many operations which would be
candidates for micro threading.


These are the ones I can see without really looking ...

 * receive DNS packet,
 * validate
 * add to cache
 * schedule event
 * repeat
::shared: call event data, IP memory block (copy?), queue access, any 
stats counted


or the one Sachin found:
 * accept connection
 * perform NAT if needed
 * perform SSL handshakes if needed
 * generate connection state objects
 * schedule
 * repeat
::shared: state data object (write), SSL context (read-only?), call 
event data, call queue access, any stats counted


or the request body pump is a dead-end for handling:
 * read data chunk
 * compress/decompress
 * write to disk
 * write data chunk to client
 * repeat
::shared: state data object (read-only, if thread provides its own data 
buffer), 2N FD data (read-only), any stats counted



Yes, this last is overkill unless the concurrency is bunched up a 
little/lot in each thread, so the request body data pump can pull/push 
up to N active client connections at once.
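The shape of such a dedicated pathway thread, with the "::shared:" items as the only locked state, might be sketched as follows (all names hypothetical; this is not Squid's real DNS code):

```cpp
#include <mutex>
#include <queue>
#include <string>

// Sketch of the pattern in the lists above: a dedicated thread runs one
// pathway (receive DNS packet -> validate -> schedule event) using only
// thread-private data; the sole shared state is the hand-off queue,
// touched under a short lock. Everything here is illustrative.
struct DnsAnswer {
    std::string name;
    std::string ip;
};

std::mutex handoffMx;                // ::shared: queue access
std::queue<DnsAnswer> handoffQueue;  // ::shared: call event data

// Runs on the dedicated thread: parsing/validation is private work; only
// the finished result crosses the thread boundary.
void dnsPipelineStep(const std::string &rawPacket) {
    DnsAnswer a{rawPacket, "192.0.2.1"};       // "validate" step, stubbed
    std::lock_guard<std::mutex> g(handoffMx);  // "schedule event" step
    handoffQueue.push(a);
}

// Runs on the main loop: drain the scheduled events.
bool nextDnsAnswer(DnsAnswer &out) {
    std::lock_guard<std::mutex> g(handoffMx);
    if (handoffQueue.empty())
        return false;
    out = handoffQueue.front();
    handoffQueue.pop();
    return true;
}
```

The design point is that the lock is held only for the queue push/pop, never across the per-packet work, so the shared surface stays as small as the lists above suggest.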




Using threads for isolated things like disk I/O is one thing. The code
running in those threads is very, very isolated and limited in what it's
allowed to do (it may only access the data given to it, and may NOT
allocate new data or look up any other global data), but is still heavily
penalized by synchronization overhead. Further, the only reason why we
have the threaded I/O model is because POSIX AIO does not provide a rich
enough interface, missing open/close operations which may both block for
a significant amount of time. So we had to implement our own alternative
having open/close operations. If you look closely at the threads I/O
code you will see that it goes to quite great lengths to isolate the
threads from the main code, with obvious performance drawbacks. The
initial code went even further in isolation, but core changes have
over time provided a somewhat more suitable environment for some of
those operations.


For the same reasons I don't see OpenMP as fitting the problem scope
we have. The strength of OpenMP is to parallelize CPU-intensive regions
of the code that are well defined in what data they access, not to deal
with a large number of concurrent operations with access to unknown
amounts of shared data.



Trying to thread the Squid core engine is in many ways similar to the
problems kernel developers have had to fight in making OS kernels
multithreaded, except that we don't even have threads of execution (the
OS developers at least had processes). If trying to do the same with the
Squid code then we would need an approach like the following:

1. Create a big Squid main lock, always held except for audited regions
known to use more fine grained locking.

2. Set up N threads of executing, all initially fighting for that big
main lock in each operation.

3. Gradually work over the code identify areas where that big lock is
not needed to be held, transition over to more fine grained locking.
Starting at the main loops and work down from there.

This is not a path I favor for the Squid code. It's a transition which
is larger than the Squid-3 transition, and which have even bigger
negative impacts on performance until most of the work have been
completed.



Another alternative is to start on Squid-4, rewri

Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Henrik Nordstrom
sön 2009-11-22 klockan 00:12 +1300 skrev Amos Jeffries:

> I think we can open the doors earlier than after that. I'm happy with an 
> approach that would see the smaller units of Squid growing in 
> parallelism to encompass two full cores.

And I have a more careful opinion.

Introducing threads in the current Squid core processing is very
non-trivial. This is due to the relatively high amount of shared data with
no access protection. We already have sufficient nightmares from data
access synchronization issues in the current non-threaded design, and
trying to synchronize access in threaded operation is many orders of
magnitude more complex.

The day the code base is cleaned up to the level that one can actually
assess what data is being accessed where, threads may be a viable
discussion, but as things are today it's almost impossible to judge what
data will be directly or indirectly accessed by any larger operation.

Using threads for micro operations will not help us. The overhead
involved in scheduling an operation to a thread is comparable to the cost
of most operations we are performing, and if we add to this the amount of
synchronization needed to shield the data accessed by that operation,
then the overhead will in nearly all cases far outweigh the actual
processing time of the micro operation, resulting only in a net loss of
performance. There are some isolated cases I can think of, like SSL
handshake negotiation, where the actual processing may be significant, but
at the general level I don't see many operations which would be candidates
for micro threading.

Using threads for isolated things like disk I/O is one thing. The code
running in those threads is very isolated and limited in what it is
allowed to do (it may only access the data given to it, and may NOT
allocate new data or look up any other global data), but is still heavily
penalized by synchronization overhead. Further, the only reason we
have the threaded I/O model is that POSIX AIO does not provide a rich
enough interface: it is missing open/close operations, which may both
block for significant amounts of time. So we had to implement our own
alternative with open/close operations. If you look closely at the
threads I/O code you will see that it goes to great lengths to isolate
the threads from the main code, with obvious performance drawbacks. The
initial code went even further in isolation, but core changes have
over time provided a somewhat more suitable environment for some of
those operations.
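
A minimal sketch of that pattern (illustrative code, not Squid's actual
aiops.cc; names are invented): a dedicated long-lived I/O worker thread
that accepts queued jobs, so that blocking open()/read()/close() calls
never stall the caller, at the cost of a lock round-trip per request.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// A single long-lived I/O worker. The main thread only touches the queue;
// the worker only touches the jobs handed to it -- mirroring the isolation
// (and the per-request synchronization overhead) described above.
class DiskWorker {
public:
    DiskWorker() : worker_([this] { run(); }) {}

    ~DiskWorker() {
        post(nullptr);            // empty job acts as a shutdown sentinel
        worker_.join();
    }

    // Queue any potentially blocking operation: open, read, write, close...
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wakeup_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return !jobs_.empty(); });
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            if (!job)
                return;           // sentinel: shut down
            job();                // may block without stalling the caller
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::function<void()>> jobs_;
    std::thread worker_;
};

// Hand one "blocking" job to the worker; the destructor posts the sentinel
// after it, so FIFO order guarantees the job completes before the join.
int demo() {
    std::atomic<int> result{0};
    {
        DiskWorker disk;
        disk.post([&result] { result = 42; });
    }
    return result.load();
}
```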


For the same reasons I don't see OpenMP as a fit for the problem scope
we have. The strength of OpenMP is parallelizing CPU-intensive regions
of the code that are well defined in what data they access, not dealing
with a large number of concurrent operations with access to unknown
amounts of shared data.
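
For contrast, a sketch of the kind of region OpenMP is built for
(illustrative code, not from Squid): a CPU-bound loop whose data access
is fully known. Compiled without OpenMP support, the pragma is simply
ignored and the result is unchanged.

```cpp
#include <cstddef>
#include <vector>

// Sum a large vector. The parallel region touches exactly two things: the
// read-only vector and the reduction variable -- unlike Squid's core, where
// the data reachable from any larger operation is hard to even enumerate.
long parallel_sum(std::size_t n) {
    std::vector<long> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<long>(i);

    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < static_cast<long>(n); ++i)
        total += v[i];
    return total;
}
```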



Trying to thread the Squid core engine is in many ways similar to the
problems kernel developers have had to fight in making OS kernels
multithreaded, except that we don't even have threads of execution (the
OS developers at least had processes). To do the same with the
Squid code we would need an approach like the following:

1. Create a big Squid main lock, always held except for audited regions
known to use more fine-grained locking.

2. Set up N threads of execution, all initially fighting for that big
main lock in each operation.

3. Gradually work over the code, identifying areas where the big lock
does not need to be held, and transition to more fine-grained locking,
starting at the main loops and working down from there.
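
In miniature, steps 1-3 look like this (names invented for illustration;
this shows the pattern, not actual Squid code):

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex big_squid_lock;   // step 1: coarse lock guarding all unaudited code
std::mutex counters_lock;    // step 3: fine-grained lock for one audited area
long requests_handled = 0;

void handle_request() {
    {
        // step 2: every worker thread contends for the big lock here
        std::lock_guard<std::mutex> big(big_squid_lock);
        // ... unaudited core code runs fully serialized ...
    }
    // step 3: this region has been audited and needs only its own lock
    std::lock_guard<std::mutex> fine(counters_lock);
    ++requests_handled;
}

long run_workers(int nthreads, int per_thread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i)
                handle_request();
        });
    for (auto &th : pool)
        th.join();
    return requests_handled;
}
```

Until most code has migrated out from under `big_squid_lock`, the N
threads simply serialize on it, which is exactly the transition-period
performance cost described here.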

This is not a path I favor for the Squid code. It's a transition
larger than the Squid-3 transition, one which has even bigger
negative impacts on performance until most of the work has been
completed.



Another alternative is to start on Squid-4, rewriting the code base
completely from scratch starting at a parallel design and then plug in
any pieces that can be rescued from earlier Squid generations if any.
But for obvious staffing reasons this is an approach I do not recommend
in this project. It's effectively starting another project, with very
little shared with the Squid we have today.


For these reasons I am more in favor of multi-process approaches. The
amount of work needed for making Squid multi-process capable is fairly
limited and mainly centers on the cache index and a couple of
other areas that need to be shared for proper operation. We can fully
parallelize Squid today at the process level by disabling the persistent
shared cache + digest auth, and this is done by many users already.
Squid-2 can even do it on the same http_port, letting the OS schedule
connections to the available Squid processes.
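
In later Squid releases this multi-process direction became the SMP
`workers` feature (introduced in Squid 3.2, after this discussion). A
minimal squid.conf sketch, with directive names real and values purely
illustrative:

```
# N kid processes all accepting on the same http_port;
# the OS distributes incoming connections among them.
workers 4
http_port 3128

# A rock cache_dir can be shared between workers, restoring a
# persistent shared cache in the multi-process model.
cache_dir rock /cache/shared 3000 min-size=1 max-size=31000
```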


Regards
Henrik



Re: squid-smp: synchronization issue & solutions

2009-11-21 Thread Amos Jeffries

Robert Collins wrote:

On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:



Important features of  OPENMP, you might be interested in...

** If your compiler is not supporting OPENMP then you dont have to do
any special thing, Compiler simply ignores these #pragmas..
and runs codes as if they are in sequential single thread program,
without affecting the end goal.


I don't think this is useful to us: all the platforms we consider
important have threading libraries. Support does seem widespread.
 

** Programmers need not to create any locking mechanism and worry
about critical sections,


We have to worry about this because:
 - OpenMP is designed for large data set manipulation
 - few of our datasets are large except for:
   - some ACL's
   - the main hash table

So we'll need separate threads created around large constructs like
'process a request' (unless we take a thread-per-CPU approach and a
queue of jobs). Either approach will require careful synchronisation on
the 20 or so shared data structures.


** By default it creates number threads equals to processors( * cores
per processor) in your system.

All of the above make me think that OPENMP-enabled Squid may be
significantly slower than multi-instance Squid. I doubt OPENMP is so
smart that it can correctly and efficiently orchestrate the work of
Squid "threads" that are often not even visible/identifiable in the
current code.


I think it could, if we had a shared-nothing model under the hood so
that we could 'simply' parallelise the front end dispatch and let
everything run. However, that doesn't really fit our problem.


- Designed for parallelizing computation-intensive programs such as
various math models running on massively parallel computers. AFAICT, the
OpenMP steering group is comprised of folks that deal with such models
in such environments. Our environment and performance goals are rather
different.


But that doesnt mean that we can not have independent threads,

It means that there is a high probability that it will not work well for
other, very different, problem areas. It may work, but not work well enough.


I agree. From my reading openMP isn't really suitable to our domain.
I've asked around a little and noone has said 'Yes! you should Do It'.
The similar servers I know of like drizzle(Mysql) do not do it.


I think our first questions should instead include:

Q1. What are the major areas or units of asynchronous code execution?
Some of us may prefer large areas such as "http_port acceptor" or
"cache" or "server side". Others may root for AsyncJob as the largest
asynchronous unit of execution. These two approaches and their
implications differ a lot. There may be other designs worth considering.


I'd like to let people start writing (and perf testing!) patches. To
unblock people. I think the primary questions are:
 - do we permit multiple approaches inside the same code base. E.g.
OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
queues' or some such abstraction elsewhere ?
(I vote yes, but with caution: someone trying something we don't
already do should keep it on a branch and really measure it well until
its got plenty of buy in).


I'm also in favor of the mixed approach, with care that the particular 
approach taken at each point is appropriate for the operation being done.
 For example, I wouldn't place each Call into a process, but a thread 
each might be arguable; whereas a Job might be a process with multiple 
threads, or a thread with async hops in time.




 - If we do *not* permit multiple approaches, then what approach do we
want for parallelisation. E.g. a number of long lived threads that take
on work, or many transient threads as particular bits of the code need
threads. I favour the former (long lived 'worker' threads).

If we can reach either a 'yes' on the first of these two questions or a
decision on the second, then folk can start working on their favourite
part of the code base. As long as its well tested and delivered with
appropriate synchronisation, I think the benefit of letting folk scratch
itches will be considerable.

I know you have processes vs threads as a key question, but I don't
actually think it is.


I don't think so either. Sounds like a good question, but it's a choice of 
two alternatives where the best alternative is number 3: both.


We _already_ have a mixed environment. The helpers and diskd/unlinkd are 
perfect examples of having chosen the process model for some small 
internal units of Squid, and idns vs. dnsserver is an example of 
the other choice being made.


We are not deciding on how to make Squid parallel, but how to make it 
massively _more_ parallel than it already is.




We *already* have significant experience with threads (threaded disk io
engine) and multiple processes (diskd io engine, helpers). We shouldn't
require a single answer for breaking squid up, rather good analysis by
the person doing the work on breaking a particular bit of it up.



Re: squid-smp: synchronization issue & solutions

2009-11-20 Thread Robert Collins
On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:


> > Important features of  OPENMP, you might be interested in...
> > 
> > ** If your compiler is not supporting OPENMP then you dont have to do
> > any special thing, Compiler simply ignores these #pragmas..
> > and runs codes as if they are in sequential single thread program,
> > without affecting the end goal.

I don't think this is useful to us: all the platforms we consider
important have threading libraries. Support does seem widespread.
 
> > ** Programmers need not to create any locking mechanism and worry
> > about critical sections,

We have to worry about this because:
 - OpenMP is designed for large data set manipulation
 - few of our datasets are large except for:
   - some ACL's
   - the main hash table

So we'll need separate threads created around large constructs like
'process a request' (unless we take a thread-per-CPU approach and a
queue of jobs). Either approach will require careful synchronisation on
the 20 or so shared data structures.
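
A sketch of the thread-per-CPU-plus-job-queue alternative mentioned here
(illustrative, not Squid code): N long-lived workers pull whole jobs
('process a request') from one synchronized queue. Only the queue itself
needs locking in this toy; real jobs would still have to synchronize on
those shared data structures.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// N long-lived worker threads draining a single locked job queue.
class JobQueue {
public:
    explicit JobQueue(unsigned nworkers) {
        for (unsigned i = 0; i < nworkers; ++i)
            workers_.emplace_back([this] { work(); });
    }

    ~JobQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_all();
        for (auto &w : workers_)
            w.join();
    }

    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wakeup_.notify_one();
    }

private:
    void work() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty())
                    return;       // stopping and fully drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                // 'process a request'
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

int demo() {
    std::atomic<int> handled{0};
    {
        JobQueue pool(4);         // one worker per (assumed) CPU
        for (int i = 0; i < 100; ++i)
            pool.post([&handled] { ++handled; });
    }                             // destructor drains the queue, then joins
    return handled.load();
}
```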

> > ** By default it creates number threads equals to processors( * cores
> > per processor) in your system.
> 
> All of the above make me think that OPENMP-enabled Squid may be
> significantly slower than multi-instance Squid. I doubt OPENMP is so
> smart that it can correctly and efficiently orchestrate the work of
> Squid "threads" that are often not even visible/identifiable in the
> current code.

I think it could, if we had a shared-nothing model under the hood so
that we could 'simply' parallelise the front end dispatch and let
everything run. However, that doesn't really fit our problem.

> >> - Designed for parallelizing computation-intensive programs such as
> >> various math models running on massively parallel computers. AFAICT, the
> >> OpenMP steering group is comprised of folks that deal with such models
> >> in such environments. Our environment and performance goals are rather
> >> different.
> >>
> > 
> > But that doesnt mean that we can not have independent threads,
> 
> It means that there is a high probability that it will not work well for
> other, very different, problem areas. It may work, but not work well enough.

I agree. From my reading openMP isn't really suitable to our domain.
I've asked around a little and noone has said 'Yes! you should Do It'.
The similar servers I know of like drizzle(Mysql) do not do it.

> >> I think our first questions should instead include:
> >>
> >> Q1. What are the major areas or units of asynchronous code execution?
> >> Some of us may prefer large areas such as "http_port acceptor" or
> >> "cache" or "server side". Others may root for AsyncJob as the largest
> >> asynchronous unit of execution. These two approaches and their
> >> implications differ a lot. There may be other designs worth considering.

I'd like to let people start writing (and perf testing!) patches. To
unblock people. I think the primary questions are:
 - do we permit multiple approaches inside the same code base. E.g.
OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
queues' or some such abstraction elsewhere ?
(I vote yes, but with caution: someone trying something we don't
already do should keep it on a branch and really measure it well until
its got plenty of buy in).

 - If we do *not* permit multiple approaches, then what approach do we
want for parallelisation. E.g. a number of long lived threads that take
on work, or many transient threads as particular bits of the code need
threads. I favour the former (long lived 'worker' threads).

If we can reach either a 'yes' on the first of these two questions or a
decision on the second, then folk can start working on their favourite
part of the code base. As long as its well tested and delivered with
appropriate synchronisation, I think the benefit of letting folk scratch
itches will be considerable.

I know you have processes vs threads as a key question, but I don't
actually think it is.

We *already* have significant experience with threads (threaded disk io
engine) and multiple processes (diskd io engine, helpers). We shouldn't
require a single answer for breaking squid up, rather good analysis by
the person doing the work on breaking a particular bit of it up.


> > I AM THINKING ABOUT HYBRID OF BOTH...
> > 
> > Somebody might implement process model, Then we would merge both
> > process and thread models .. together we could have a better squid..
> > :)
> > What do u think? 
> 
> I doubt we have the resources to do a generic process model so I would
> rather decide on a single primary direction (processes or threads) and
> try to generalize that later if needed. However, a process (if we decide
> to go down that route) may still have lower-level threads, but that is a
> secondary question/decision.

We could simply adopt ACE wholesale and focus on the squid unique bits
of the stack. Squid is a pretty typical 'all in one' bundle at the
moment; I'd like to see us focus and reuse/split ou

Re: squid-smp: synchronization issue & solutions

2009-11-19 Thread Adrian Chadd
Right. That's the easy bit. I could even do that in Squid-2 with a
little bit of luck. The hard bit is rewriting the relevant code which
relies on cbdata-style reference counting behaviour. That is the
tricky bit.



Adrian

2009/11/20 Robert Collins :
> On Wed, 2009-11-18 at 10:46 +0800, Adrian Chadd wrote:
>> Plenty of kernels nowdays do a bit of TCP and socket process in
>> process/thread context; so you need to do your socket TX/RX in
>> different processes/threads to get parallelism in the networking side
>> of things.
>
> Very good point.
>
>> You could fake it somewhat by pushing socket IO into different threads
>> but then you have all the overhead of shuffling IO and completed IO
>> between threads. This may be .. complicated.
>
> The event loop I put together for -3 should be able to do that without
> changing the loop - just extending the modules that hook into it.
>
> -Rob
>


Re: squid-smp: synchronization issue & solutions

2009-11-19 Thread Robert Collins
On Wed, 2009-11-18 at 10:46 +0800, Adrian Chadd wrote:
> Plenty of kernels nowdays do a bit of TCP and socket process in
> process/thread context; so you need to do your socket TX/RX in
> different processes/threads to get parallelism in the networking side
> of things.

Very good point.

> You could fake it somewhat by pushing socket IO into different threads
> but then you have all the overhead of shuffling IO and completed IO
> between threads. This may be .. complicated.

The event loop I put together for -3 should be able to do that without
changing the loop - just extending the modules that hook into it.

-Rob




Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Adrian Chadd
Plenty of kernels nowadays do a bit of TCP and socket processing in
process/thread context, so you need to do your socket TX/RX in
different processes/threads to get parallelism in the networking side
of things.

You could fake it somewhat by pushing socket I/O into different threads,
but then you have all the overhead of shuffling I/O and completed I/O
between threads. This may be .. complicated.


Adrian

2009/11/18 Gonzalo Arana :
> On Tue, Nov 17, 2009 at 12:45 PM, Alex Rousskov
>  wrote:
>> On 11/17/2009 04:09 AM, Sachin Malave wrote:
>>
>> 
>>
>>> I AM THINKING ABOUT HYBRID OF BOTH...
>>>
>>> Somebody might implement process model, Then we would merge both
>>> process and thread models .. together we could have a better squid..
>>> :)
>>> What do u think? 
>
> In my limited squid expierence, cpu usage is hardly a bottleneck.  So,
> why not just use smp for the cpu/disk-intensive parts?
>
> The candidates I can think of are:
>  * evaluating regular expressions (url_regex acls).
>  * aufs/diskd (squid already has support for this).
>
> Best regards,
>
> --
> Gonzalo A. Arana
>
>


Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Robert Collins
On Tue, 2009-11-17 at 15:49 -0300, Gonzalo Arana wrote:


> In my limited squid expierence, cpu usage is hardly a bottleneck.  So,
> why not just use smp for the cpu/disk-intensive parts?
> 
> The candidates I can think of are:
>   * evaluating regular expressions (url_regex acls).
>   * aufs/diskd (squid already has support for this).

So, we can drive squid to 100% CPU in production high load environments.
To scale further we need:
 - more cpus
 - more performance from the cpu's we have

Adrian is working on the latter, and the SMP discussion is about the
former. Simply putting each request in its own thread would go a long
way towards getting much more bang for buck - but that's not actually
trivial to do :)

-Rob




Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Sachin Malave
On Tue, Nov 17, 2009 at 9:15 PM, Alex Rousskov
 wrote:
> On 11/17/2009 04:09 AM, Sachin Malave wrote:
>
>>> After spending 2 minutes on openmp.org, I am not very excited about
>>> using OpenMP. Please correct me if I am wrong, but OpenMP seems to be:
>>>
>>> - An "approach" or "model" requiring compiler support and language
>>> extensions. It is _not_ a library. You examples with #pragmas is a good
>>> illustration.
>
>> Important features of  OPENMP, you might be interested in...
>>
>> ** If your compiler is not supporting OPENMP then you dont have to do
>> any special thing, Compiler simply ignores these #pragmas..
>> and runs codes as if they are in sequential single thread program,
>> without affecting the end goal.
>>
>> ** Programmers need not to create any locking mechanism and worry
>> about critical sections,
>>
>> ** By default it creates number threads equals to processors( * cores
>> per processor) in your system.
>
> All of the above make me think that OPENMP-enabled Squid may be
> significantly slower than multi-instance Squid. I doubt OPENMP is so
> smart that it can correctly and efficiently orchestrate the work of
> Squid "threads" that are often not even visible/identifiable in the
> current code.
>
>>> - Designed for parallelizing computation-intensive programs such as
>>> various math models running on massively parallel computers. AFAICT, the
>>> OpenMP steering group is comprised of folks that deal with such models
>>> in such environments. Our environment and performance goals are rather
>>> different.
>>>
>>
>> But that doesnt mean that we can not have independent threads,
>
> It means that there is a high probability that it will not work well for
> other, very different, problem areas. It may work, but not work well enough.
>
>>> I think our first questions should instead include:
>>>
>>> Q1. What are the major areas or units of asynchronous code execution?
>>> Some of us may prefer large areas such as "http_port acceptor" or
>>> "cache" or "server side". Others may root for AsyncJob as the largest
>>> asynchronous unit of execution. These two approaches and their
>>> implications differ a lot. There may be other designs worth considering.
>>>
>>
>> See my sample codes, I sent in last mail.. There i have separated out
>> the schedule() and dial()  functions, Where one thread is registering
>> calls in AsyncCallQueue and another is dispatching them..
>> Well, We can concentrate on other areas also
>
> schedule() and dial() are low level routines that are irrelevant for Q1.
>
>>> Q2. Threads versus processes. Depending on Q1, we may have a choice. The
>>> choice will affect the required locking mechanism and other key decisions.
>>>
>>
>> If you are planning to use processes then it is as good as running
>> multiple squids on single machine..,
>
> I am not planning to use processes yet, but if they are indeed as good
> as running multiple Squids, that is a plus. Hopefully, we can do better
> than multi-instance Squid, but we should be at least as bad/good.
>
>
>>  Only thing is they must be
>> accepting requests on different ports... But if we want distribute
>> single squid's work then i feel threading is the best choice..
>
> You can have a process accepting a request and then forwarding the work
> to another process or receiving a cache hit from another process.
> Inter-process communication is slower than inter-thread communication,
> but it is not impossible.
>
>
>> I AM THINKING ABOUT HYBRID OF BOTH...
>>
>> Somebody might implement process model, Then we would merge both
>> process and thread models .. together we could have a better squid..
>> :)
>> What do u think? 
>
> I doubt we have the resources to do a generic process model so I would
> rather decide on a single primary direction (processes or threads) and
> try to generalize that later if needed. However, a process (if we decide
> to go down that route) may still have lower-level threads, but that is a
> secondary question/decision.
>

OK then, please be precise:
What exactly are you thinking?
Tell me the areas where I should concentrate.
I want to know what exactly is going on in your mind so that I can
start working and experimenting in that direction... :)

Meanwhile I will also keep experimenting with threading, as I am doing
right now; it will help me when we start actual development. Is that
OK?


Thanx..



> Cheers,
>
> Alex.
>



-- 
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Gonzalo Arana
On Tue, Nov 17, 2009 at 12:45 PM, Alex Rousskov
 wrote:
> On 11/17/2009 04:09 AM, Sachin Malave wrote:
>
> 
>
>> I AM THINKING ABOUT HYBRID OF BOTH...
>>
>> Somebody might implement process model, Then we would merge both
>> process and thread models .. together we could have a better squid..
>> :)
>> What do u think? 

In my limited squid experience, CPU usage is hardly a bottleneck.  So,
why not just use SMP for the cpu/disk-intensive parts?

The candidates I can think of are:
  * evaluating regular expressions (url_regex acls).
  * aufs/diskd (squid already has support for this).

Best regards,

-- 
Gonzalo A. Arana


Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Alex Rousskov
On 11/17/2009 04:09 AM, Sachin Malave wrote:

>> After spending 2 minutes on openmp.org, I am not very excited about
>> using OpenMP. Please correct me if I am wrong, but OpenMP seems to be:
>>
>> - An "approach" or "model" requiring compiler support and language
>> extensions. It is _not_ a library. You examples with #pragmas is a good
>> illustration.

> Important features of  OPENMP, you might be interested in...
> 
> ** If your compiler is not supporting OPENMP then you dont have to do
> any special thing, Compiler simply ignores these #pragmas..
> and runs codes as if they are in sequential single thread program,
> without affecting the end goal.
> 
> ** Programmers need not to create any locking mechanism and worry
> about critical sections,
> 
> ** By default it creates number threads equals to processors( * cores
> per processor) in your system.

All of the above make me think that OPENMP-enabled Squid may be
significantly slower than multi-instance Squid. I doubt OPENMP is so
smart that it can correctly and efficiently orchestrate the work of
Squid "threads" that are often not even visible/identifiable in the
current code.

>> - Designed for parallelizing computation-intensive programs such as
>> various math models running on massively parallel computers. AFAICT, the
>> OpenMP steering group is comprised of folks that deal with such models
>> in such environments. Our environment and performance goals are rather
>> different.
>>
> 
> But that doesnt mean that we can not have independent threads,

It means that there is a high probability that it will not work well for
other, very different, problem areas. It may work, but not work well enough.

>> I think our first questions should instead include:
>>
>> Q1. What are the major areas or units of asynchronous code execution?
>> Some of us may prefer large areas such as "http_port acceptor" or
>> "cache" or "server side". Others may root for AsyncJob as the largest
>> asynchronous unit of execution. These two approaches and their
>> implications differ a lot. There may be other designs worth considering.
>>
> 
> See my sample codes, I sent in last mail.. There i have separated out
> the schedule() and dial()  functions, Where one thread is registering
> calls in AsyncCallQueue and another is dispatching them..
> Well, We can concentrate on other areas also

schedule() and dial() are low level routines that are irrelevant for Q1.

>> Q2. Threads versus processes. Depending on Q1, we may have a choice. The
>> choice will affect the required locking mechanism and other key decisions.
>>
> 
> If you are planning to use processes then it is as good as running
> multiple squids on single machine..,

I am not planning to use processes yet, but if they are indeed as good
as running multiple Squids, that is a plus. Hopefully, we can do better
than multi-instance Squid, but we should be at least as bad/good.


>  Only thing is they must be
> accepting requests on different ports... But if we want distribute
> single squid's work then i feel threading is the best choice..

You can have a process accepting a request and then forwarding the work
to another process or receiving a cache hit from another process.
Inter-process communication is slower than inter-thread communication,
but it is not impossible.
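
A sketch of that cost difference (illustrative POSIX code, not Squid's
actual IPC): handing a request to another process means copying it
through the kernel over pipes, where a thread handoff would be a pointer
move.

```cpp
#include <cctype>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Parent sends a request down one pipe; the child processes it (here:
// uppercases it, standing in for "serve a cache hit") and sends the reply
// back up another. Two kernel round trips per exchange.
std::string ask_child(const std::string &request) {
    int to_child[2], to_parent[2];
    if (pipe(to_child) != 0 || pipe(to_parent) != 0)
        return "";

    pid_t pid = fork();
    if (pid < 0)
        return "";
    if (pid == 0) {                              // child process
        close(to_child[1]);
        close(to_parent[0]);
        char buf[256];
        ssize_t n = read(to_child[0], buf, sizeof(buf));
        for (ssize_t i = 0; i < n; ++i)
            buf[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(buf[i])));
        if (n > 0)
            (void)write(to_parent[1], buf, static_cast<size_t>(n));
        _exit(0);
    }

    close(to_child[0]);                          // parent process
    close(to_parent[1]);
    (void)write(to_child[1], request.data(), request.size());
    close(to_child[1]);

    char buf[256];
    ssize_t n = read(to_parent[0], buf, sizeof(buf));
    close(to_parent[0]);
    waitpid(pid, nullptr, 0);
    return std::string(buf, n > 0 ? static_cast<size_t>(n) : 0);
}
```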


> I AM THINKING ABOUT HYBRID OF BOTH...
> 
> Somebody might implement process model, Then we would merge both
> process and thread models .. together we could have a better squid..
> :)
> What do u think? 

I doubt we have the resources to do a generic process model so I would
rather decide on a single primary direction (processes or threads) and
try to generalize that later if needed. However, a process (if we decide
to go down that route) may still have lower-level threads, but that is a
secondary question/decision.

Cheers,

Alex.


Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Sachin Malave
On Mon, Nov 16, 2009 at 9:43 PM, Alex Rousskov
 wrote:
> On 11/15/2009 11:59 AM, Sachin Malave wrote:
>
>> Since last few days i am analyzing squid code for smp support, I found
>> one big issue regarding debugs() function, It is very hard get rid of
>> this issue as it is appearing at almost everywhere in the code. So for
>> testing purpose i have disable the debug option in squid.conf as
>> follows
>>
>> ---
>> debug_options 0,0
>> ---
>>
>> Well this was only way, as did not want to spend time on this issue.
>
> You can certainly disable any feature as an intermediate step as long as
> the overall approach allows for the later efficient support of the
> temporary disabled feature. Debugging is probably the worst feature to
> disable though because without it we do not know much about Squid operation.
>
I agree, we should find a way to re-enable this feature. It is
temporarily disabled...
Of course locking debugs() was not the solution; that's why it is disabled...


>
>> Now concentrating on locking mechanism...
>
> I would not recommend starting with such low-level decisions as locking
> mechanisms. We need to decide what needs to be locked first. AFAIK,
> there is currently no consensus whether we start with processes or
> threads, for example. The locking mechanism would depend on that.
>


>
>> As OpenMP library is widely supported by almost all platforms and
>> compilers, I am inheriting locking mechanism from the same
>> Just include omp.h & compile code with -fopenmp option if using gcc,
>> Other may use similar thing on their platform, Well that is not a big
>> issue..


>
> After spending 2 minutes on openmp.org, I am not very excited about
> using OpenMP. Please correct me if I am wrong, but OpenMP seems to be:
>
> - An "approach" or "model" requiring compiler support and language
> extensions. It is _not_ a library. You examples with #pragmas is a good
> illustration.
>

We have to use something to create and manage threads. There are some
other libraries and models too, but I feel we need something that will
work on all platforms.
Important features of OPENMP you might be interested in...

** If your compiler does not support OPENMP then you don't have to do
anything special; the compiler simply ignores these #pragmas
and runs the code as if it were a sequential single-threaded program,
without affecting the end goal.

** Programmers need not create any locking mechanism or worry
about critical sections.

** By default it creates a number of threads equal to the processors (* cores
per processor) in your system.

** Its fork-and-join model is scalable. (Of course we must find
such areas in the existing code.)

** OPENMP is OLD but still growing, providing new features with new
releases. Think about other threading libraries: I think their
development has stopped, some of them are not freely available, and some
of them are available only on WINDOWS.

** IT IS FREE and OPEN-SOURCE like us.

** INTEL has just released TBB (Threading Building Blocks), but I
doubt its performance on AMD (non-Intel) hardware.

** You might be thinking about old Pthreads, but I think OPENMP is
very safe and better than pthreads for programmers,

ESPECIALLY ONE WHO IS MAKING CHANGES IN EXISTING CODE, and easier to debug.

 Please think about my last point... :)






> - Designed for parallelizing computation-intensive programs such as
> various math models running on massively parallel computers. AFAICT, the
> OpenMP steering group is comprised of folks that deal with such models
> in such environments. Our environment and performance goals are rather
> different.
>

But that doesn't mean we cannot have independent threads. The only
catch is that we have to start these threads in main(), because main
never returns until shutdown; otherwise those independent threads
would die when the calling function returns.



>
>> 1. hash_link  LOCKED
>>
>> 2. dlink_list  LOCKED
>>
>> 3. ipcache, fqdncache   LOCKED,
>>
>> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then
>> please discuss.
>>
>> 5. statistic counters --- NOT LOCKED ( I know this is very important,
>> But these are scattered all around squid code, Write now they may be
>> holding wrong values)
>>
>> 6. memory manager --- DID NOT FOLLOW
>>
>> 7. configuration objects --- DID NOT FOLLOW
>
> I worry that the end result of this exercise would produce a slow and
> buggy Squid for several reasons:
>
> - Globally locking low-level but interdependent objects is likely to
> create deadlocks when two or more locked objects need to lock other
> locked objects in a circular fashion.
>

Is there any other option? As discussed, Amos is trying to make these
areas as independent as possible, so that we would have less locking
in the code.

> - Locking low-level objects without an overall performance-aware plan is
> likely to result in performance-killing competition for critical locks.
> I believe that with the right design, many locks can be avoided.

Re: squid-smp: synchronization issue & solutions

2009-11-16 Thread Alex Rousskov
On 11/15/2009 11:59 AM, Sachin Malave wrote:

> Since last few days i am analyzing squid code for smp support, I found
> one big issue regarding debugs() function, It is very hard get rid of
> this issue as it is appearing at almost everywhere in the code. So for
> testing purpose i have disable the debug option in squid.conf as
> follows
> 
> ---
> debug_options 0,0
> ---
> 
> Well this was only way, as did not want to spend time on this issue.

You can certainly disable any feature as an intermediate step, as long as
the overall approach allows for later efficient support of the
temporarily disabled feature. Debugging is probably the worst feature to
disable, though, because without it we do not know much about Squid's operation.


> Now concentrating on locking mechanism...

I would not recommend starting with such low-level decisions as locking
mechanisms. We need to decide what needs to be locked first. AFAIK,
there is currently no consensus whether we start with processes or
threads, for example. The locking mechanism would depend on that.


> As OpenMP library is widely supported by almost all platforms and
> compilers, I am inheriting locking mechanism from the same
> Just include omp.h & compile code with -fopenmp option if using gcc,
> Other may use similar thing on their platform, Well that is not a big
> issue..

After spending 2 minutes on openmp.org, I am not very excited about
using OpenMP. Please correct me if I am wrong, but OpenMP seems to be:

- An "approach" or "model" requiring compiler support and language
extensions. It is _not_ a library. You examples with #pragmas is a good
illustration.

- Designed for parallelizing computation-intensive programs such as
various math models running on massively parallel computers. AFAICT, the
OpenMP steering group is comprised of folks that deal with such models
in such environments. Our environment and performance goals are rather
different.


> 1. hash_link  LOCKED
> 
> 2. dlink_list  LOCKED
> 
> 3. ipcache, fqdncache   LOCKED,
> 
> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then
> please discuss.
> 
> 5. statistic counters --- NOT LOCKED ( I know this is very important,
> But these are scattered all around squid code, Write now they may be
> holding wrong values)
> 
> 6. memory manager --- DID NOT FOLLOW
> 
> 7. configuration objects --- DID NOT FOLLOW

I worry that the end result of this exercise would produce a slow and
buggy Squid for several reasons:

- Globally locking low-level but interdependent objects is likely to
create deadlocks when two or more locked objects need to lock other
locked objects in a circular fashion.

- Locking low-level objects without an overall performance-aware plan is
likely to result in performance-killing competition for critical locks.
I believe that with the right design, many locks can be avoided.


I think our first questions should instead include:

Q1. What are the major areas or units of asynchronous code execution?
Some of us may prefer large areas such as "http_port acceptor" or
"cache" or "server side". Others may root for AsyncJob as the largest
asynchronous unit of execution. These two approaches and their
implications differ a lot. There may be other designs worth considering.

Q2. Threads versus processes. Depending on Q1, we may have a choice. The
choice will affect the required locking mechanism and other key decisions.


Thank you,

Alex.



Re: squid-smp: synchronization issue & solutions

2009-11-15 Thread Amos Jeffries
[NP: eliding recipients I know are getting these mails through squid-dev
anyway]

On Mon, 16 Nov 2009 12:52:15 +1100, Robert Collins
 wrote:
> On Mon, 2009-11-16 at 00:29 +0530, Sachin Malave wrote:
>> Hello,
>> 
>> Since last few days i am analyzing squid code for smp support, I found
>> one big issue regarding debugs() function, It is very hard get rid of
>> this issue as it is appearing at almost everywhere in the code. So for
>> testing purpose i have disable the debug option in squid.conf as
>> follows
>> 
>> ---
>> debug_options 0,0
>> ---
>> 
>> Well this was only way, as did not want to spend time on this
issue.
> 
> Its very important that debugs works.

What exactly were the problems identified?

> 
> 
>> 1. hash_link  LOCKED
> 
> Bad idea, not all hashes will be cross-thread, so making the primitive
> lock incurs massive overhead for all threads.
> 
>> 2. dlink_list  LOCKED
> 
> Ditto.
> 

Aye. These two need to be checked for thread-safe implementations and any
locking done in the caller code per the distinctly named hash/dlink.

>> 3. ipcache, fqdncache   LOCKED,
> 
> Probably important.
> 
>> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then
>> please discuss.
> 
>  we need analysis and proof, not 'seems to work'.

Aye. NP: this is one of the critical data stores in Squid. I wouldn't be
too far off in generalizing that "everything" up and down the
request-handling path uses it semi-'random access', directly or indirectly.

> 
>> 5. statistic counters --- NOT LOCKED ( I know this is very important,
>> But these are scattered all around squid code, Write now they may be
>> holding wrong values)
> 
> Will need to be fixed.
> 
>> 6. memory manager --- DID NOT FOLLOW
> 
> Will need attention, e.g. per thread allocators.
> 
>> 7. configuration objects --- DID NOT FOLLOW
> 
> ACL's are not threadsafe.
> 
>> AND FINALLY, Two sections in EventLoop.cc are separated and executed
>> in two threads simultaneously
>> as follows (#pragma lines added in existing code, no other changes)
> 
> I'm not at all sure that splitting the event loop like that is sensible.
> 
> Better to have the dispatcher dispatch to threads.
> 
> -Rob

Amos


Re: squid-smp: synchronization issue & solutions

2009-11-15 Thread Robert Collins
On Mon, 2009-11-16 at 00:29 +0530, Sachin Malave wrote:
> Hello,
> 
> Since last few days i am analyzing squid code for smp support, I found
> one big issue regarding debugs() function, It is very hard get rid of
> this issue as it is appearing at almost everywhere in the code. So for
> testing purpose i have disable the debug option in squid.conf as
> follows
> 
> ---
> debug_options 0,0
> ---
> 
> Well this was only way, as did not want to spend time on this issue.

It's very important that debugs works.


> 1. hash_link  LOCKED

Bad idea, not all hashes will be cross-thread, so making the primitive
lock incurs massive overhead for all threads.

> 2. dlink_list  LOCKED

Ditto.

> 3. ipcache, fqdncache   LOCKED,

Probably important.

> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then
> please discuss.

 we need analysis and proof, not 'seems to work'.

> 5. statistic counters --- NOT LOCKED ( I know this is very important,
> But these are scattered all around squid code, Write now they may be
> holding wrong values)

Will need to be fixed.

> 6. memory manager --- DID NOT FOLLOW

Will need attention, e.g. per thread allocators.

> 7. configuration objects --- DID NOT FOLLOW

ACLs are not thread-safe.

> AND FINALLY, Two sections in EventLoop.cc are separated and executed
> in two threads simultaneously
> as follows (#pragma lines added in existing code, no other changes)

I'm not at all sure that splitting the event loop like that is sensible.

Better to have the dispatcher dispatch to threads.

-Rob




squid-smp: synchronization issue & solutions

2009-11-15 Thread Sachin Malave
Hello,

For the last few days I have been analyzing the Squid code for SMP
support, and I found one big issue regarding the debugs() function. It
is very hard to get rid of, as it appears almost everywhere in the
code. So for testing purposes I have disabled the debug option in
squid.conf as follows:

---
debug_options 0,0
---

Well, this was the only way, as I did not want to spend time on this issue.

Now concentrating on locking mechanism...

As the OpenMP library is widely supported by almost all platforms and
compilers, I am borrowing the locking mechanism from it.
Just include omp.h and compile the code with the -fopenmp option if
using gcc; others may use the equivalent on their platform. That is
not a big issue.

BUT, is it wise to take support from this library? Please discuss
this issue. I feel it is really easy to manage threads and
critical sections if we use OpenMP.

As discussed before (details are available at
http://wiki.squid-cache.org/Features/SmpScale),
I think I have solved SOME critical-section problems in the existing Squid code.

*AsyncCallQueue.cc***

void AsyncCallQueue::schedule(AsyncCall::Pointer &call)
{
#pragma omp critical (AsyncCallQueueLock_c) // HERE IS THE LOCK
    {
        if (theHead != NULL) { // append
            assert(!theTail->theNext);
            theTail->theNext = call;
            theTail = call;
        } else { // create queue from scratch
            theHead = theTail = call;
        }
    }
}

//AND THEN

void AsyncCallQueue::fireNext()
{
    AsyncCall::Pointer call;
#pragma omp critical (AsyncCallQueueLock_c) // SAME LOCK
    {
        call = theHead;
        theHead = call->theNext;
        call->theNext = NULL;
        if (theTail == call)
            theTail = NULL;
    }
    // ... the call is then fired outside the critical section
}

IT'S WORKING, as two critical sections with the same name
(i.e. AsyncCallQueueLock_c) cannot execute simultaneously.
**

In the same way, the following things, as they appear on
/Features/SmpScale, are also locked (maybe incompletely):

1. hash_link  LOCKED

2. dlink_list  LOCKED

3. ipcache, fqdncache   LOCKED,

4. FD / fde handling --- WELL, it seems not to be creating problems;
if any, please discuss.

5. statistic counters --- NOT LOCKED (I know this is very important,
but these are scattered all around the Squid code; right now they may
be holding wrong values)

6. memory manager --- DID NOT FOLLOW

7. configuration objects --- DID NOT FOLLOW

AND FINALLY, two sections in EventLoop.cc are separated and executed
in two threads simultaneously,
as follows (#pragma lines added to existing code; no other changes):

**EventLoop.cc

#pragma omp parallel sections // PARALLEL SECTIONS
{
#pragma omp section // THREAD-1
    {
        if (waitingEngine != NULL)
            checkEngine(waitingEngine, true);
        if (timeService != NULL)
            timeService->tick();
        checked = true;
    }

#pragma omp section // THREAD-2
    {
        while (1) {
            if (lastRound == true)
                break;
            sawActivity = dispatchCalls();
            if (sawActivity)
                runOnceResult = false;
            if (checked == true)
                lastRound = true;
        }
    }
}


It may need deep testing, but it is working.
Am I on the right path?

Thank you,


--
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp

2009-10-27 Thread Sachin Malave
On Tue, Oct 27, 2009 at 8:09 AM, Amos Jeffries  wrote:
> Alex Rousskov wrote:
>>
>> Hello,
>>
>>    I will be working on SMP support in the coming months. I caught up
>> on this and the previous SMP-related squid-dev thread and it looks like
>>  the approach I currently favor has been discussed (again) and did not
>> cause any violent objections, although some ideas were much more
>> ambitious. I am not sure how we can reach consensus on this topic, but I
>> will through in some specifics in hope to better identify competing
>> approaches.
>
> Okay. Before you get buried in this work might I request that you look at
> the stale-while-revalidate / stale-if-error code that was nearly finished?
> http://bugs.squid-cache.org/show_bug.cgi?id=2255
>
> Yahoo! named those as two of their requirements before it would be possible
> for them to assist with performance testing Squid-3. They might not be
> adverse to helping test SMP support if the other requirements are available.
>
> Collapsed forwarding is also a requirement, but I suspect it is too close to
> the request handling and needs a re-designed code architecture to fit with
> whatever SMP threading model is taken anyways.
>
>>
>> My short-term focus would be on the following three areas:
>>
>> A) Identifying a few "large", "rarely-interacting" threads that would
>> work reasonably well on an 8-core 2-CPU machine with 8 http_ports. This
>> should take the lessons learned from existing SMP designs into account,
>> with Squid specifics in mind. Henrik, Amos, and Adrian started
>> discussing this already.
>>
>> B) Making commonly used primitives thread-safe (mostly not in terms of
>> locking their shared state but in terms of not using static/shared data
>> that needs locking). Many posts on this subject, starting with Roberts
>> advice to desynchronize.
>>
>> C) Posting performance benchmarking results for single- and
>> multi-instance Squids on mutli-core systems as a baseline.
>>
>>
>> My mid-term focus will probably be on sharing http_port, memory cache,
>> disk cache and possibly loggin/stats among a "few large threads".
>>
>>
>> My overall goal is to at least approach the performance of a
>> multi-instance caching Squid on 8-core hardware.
>>
>>
>> I am not excited by the "one thread per message", "one thread per
>> AsyncJob", or similar "many tiny threads" designs because, IMO, they
>> would require too much rewriting to be implemented properly. This may
>> need to be re-evaluated as the world moves towards 1000-core systems,
>> but a lot of improvements necessary for the "few large threads" design
>> will not be wasted anyway.
>>
>> I hope that by focusing on a "few large threads" design and fixing
>> primitives we can gain "enough" SMP benefits in a few months of active
>> development. If you think there is a better way to get SMP benefits in
>> the foreseeable future, please post.
>>
>> Thank you,
>>
>> Alex.
>
>
> --
> Please be using
>  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
>  Current Beta Squid 3.1.0.14
>

That's very good, Alex. I am also working on the same; I have not
explored everything yet, but with all you guys hopefully we will
complete this project as soon as possible. The issues on my mind will
be discussed soon.



-- 
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp

2009-10-27 Thread Amos Jeffries

Alex Rousskov wrote:

Hello,

I will be working on SMP support in the coming months. I caught up
on this and the previous SMP-related squid-dev thread and it looks like
 the approach I currently favor has been discussed (again) and did not
cause any violent objections, although some ideas were much more
ambitious. I am not sure how we can reach consensus on this topic, but I
will through in some specifics in hope to better identify competing
approaches.


Okay. Before you get buried in this work might I request that you look 
at the stale-while-revalidate / stale-if-error code that was nearly 
finished?

http://bugs.squid-cache.org/show_bug.cgi?id=2255

Yahoo! named those as two of their requirements before it would be 
possible for them to assist with performance testing Squid-3. They might 
not be averse to helping test SMP support if the other requirements are 
available.


Collapsed forwarding is also a requirement, but I suspect it is too 
close to the request handling and needs a re-designed code architecture 
to fit with whatever SMP threading model is taken anyways.




My short-term focus would be on the following three areas:

A) Identifying a few "large", "rarely-interacting" threads that would
work reasonably well on an 8-core 2-CPU machine with 8 http_ports. This
should take the lessons learned from existing SMP designs into account,
with Squid specifics in mind. Henrik, Amos, and Adrian started
discussing this already.

B) Making commonly used primitives thread-safe (mostly not in terms of
locking their shared state but in terms of not using static/shared data
that needs locking). Many posts on this subject, starting with Roberts
advice to desynchronize.

C) Posting performance benchmarking results for single- and
multi-instance Squids on mutli-core systems as a baseline.


My mid-term focus will probably be on sharing http_port, memory cache,
disk cache and possibly loggin/stats among a "few large threads".


My overall goal is to at least approach the performance of a
multi-instance caching Squid on 8-core hardware.


I am not excited by the "one thread per message", "one thread per
AsyncJob", or similar "many tiny threads" designs because, IMO, they
would require too much rewriting to be implemented properly. This may
need to be re-evaluated as the world moves towards 1000-core systems,
but a lot of improvements necessary for the "few large threads" design
will not be wasted anyway.

I hope that by focusing on a "few large threads" design and fixing
primitives we can gain "enough" SMP benefits in a few months of active
development. If you think there is a better way to get SMP benefits in
the foreseeable future, please post.

Thank you,

Alex.



--
Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
  Current Beta Squid 3.1.0.14


Re: squid-smp

2009-10-27 Thread Alex Rousskov
Hello,

I will be working on SMP support in the coming months. I caught up
on this and the previous SMP-related squid-dev thread and it looks like
 the approach I currently favor has been discussed (again) and did not
cause any violent objections, although some ideas were much more
ambitious. I am not sure how we can reach consensus on this topic, but I
will throw in some specifics in the hope of better identifying competing
approaches.

My short-term focus would be on the following three areas:

A) Identifying a few "large", "rarely-interacting" threads that would
work reasonably well on an 8-core 2-CPU machine with 8 http_ports. This
should take the lessons learned from existing SMP designs into account,
with Squid specifics in mind. Henrik, Amos, and Adrian started
discussing this already.

B) Making commonly used primitives thread-safe (mostly not in terms of
locking their shared state but in terms of not using static/shared data
that needs locking). Many posts on this subject, starting with Robert's
advice to desynchronize.

C) Posting performance benchmarking results for single- and
multi-instance Squids on multi-core systems as a baseline.


My mid-term focus will probably be on sharing http_port, memory cache,
disk cache and possibly logging/stats among a "few large threads".


My overall goal is to at least approach the performance of a
multi-instance caching Squid on 8-core hardware.


I am not excited by the "one thread per message", "one thread per
AsyncJob", or similar "many tiny threads" designs because, IMO, they
would require too much rewriting to be implemented properly. This may
need to be re-evaluated as the world moves towards 1000-core systems,
but a lot of improvements necessary for the "few large threads" design
will not be wasted anyway.

I hope that by focusing on a "few large threads" design and fixing
primitives we can gain "enough" SMP benefits in a few months of active
development. If you think there is a better way to get SMP benefits in
the foreseeable future, please post.

Thank you,

Alex.


Happy Diwali to all : squid-smp

2009-10-15 Thread Sachin Malave
Hello,

Here in India we are celebrating DIWALI festival.
So I am on vacation for next 5 days. Will continue squid-smp
discussion soon

Happy Diwali to all..

-- 
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp

2009-10-15 Thread Sachin Malave
On Thu, Oct 15, 2009 at 5:56 AM, Adrian Chadd  wrote:
> Oh, I can absolutely give you guys food for thought. I was just hoping
> someone else would already try to do a bit of legwork.
>
> Things to think about:
>
> * Do you really, -really- want to reinvent the malloc wheel? This is
> separate from caching results and pre-made class instances. There's
> been a lot of work in well-performing, thread-aware malloc libraries
> * Do you want to run things in multiple processes or multiple threads?
> Or support both?
> * How much of the application do you want to push out into separate
> threads? run lots of "copies" of Squid concurrently, with some locking
> going on? Break up individual parts of the processing pipeline into
> threads? (eg, what I'm going to be experimenting with soon - handling
> ICP/HTCP in a separate thread for some basic testing)
> * Survey the current codebase and figure out what depends upon what -
> in a way that you can use for figuring out what needs to be made
> re-entrant and what may need locking. Think about how to achieve all
> of this. Best example of this - you're going to need to figure out how
> to do concurrent debug logging and memory allocation - so see what
> that code uses, what that codes' code uses, etc
> * 10GE cards are dumping individual PCIe channels to CPUs; which means
> that the "most efficient" way of pumping data around will be to
> somehow throw individual connections onto specific CPUs, and keep them
> there. There's no OS support for this yet, but OSes may be "magical"
> (ie, handing you sockets in specific threads via accept() and hoping
> that the NIC doesn't reorganise its connection->PCIe channel hash)
> * Do you think its worth being able to migrate specific connections
> between threads? Or once they're in a thread they're there for good
> * If you split up squid into "lots of threads running the whole app",
> what and where would you envisage locking and blocking? What about
> data sharing? How would that scale given a handful of example
> workloads? What about in abnormal situations? How well will things
> degrade?
> * What about using message passing and message queues? Where would it
> be appropriate? Where wouldn't it be appropriate? Why?
>
> Here's an example:
>
> * Imagine you're doing store lookups using message passing with your
> "store" being a separate thread with a message queue. Think about how
> you'd handle say, ICP peering between two caches doing > 10,000
> requests a second. What repercussions does that have for the locking
> of the message queues between other threads. What are the other
> threads doing?
>
> With that in mind, survey the kinds of ways that current network apps
> "do" threading:
>
> * look at the various ways apache does it - eg, the per-connection
> thread+process hybrid model, the event-worker thread model, etc

Yes, I think that if we have a loosely coupled architecture
(distributed or multiprocessor) then it is better to use processes;
otherwise, on a multi-core platform, a threading model can be used
(I am targeting multi-core).

> * look at memcached - one thread doing accept'ing, farming requests
> off to other threads that just run a squid-like event loop. Minimal
> inter-thread communication for the most part
> * investigate what the concurrency hooks for various frameworks do -
> eg, the boost asio library stuff has "colours" which you mark thread
> events with. These colours dictate which events need to be run
> sequentially and which can run in parallel
> * look at all of the random blogs written by windows networking coders
> - they're further ahead of the massively-concurrent network
> application stack because Windows has had it for a number of years.
>

OK! The questions you have raised will be considered while
creating threads or processes.

> Now. You've mentioned you've looked at the others and you think major
> replumbing is going to be needed. Here's a hint - its going to be
> needed. Thinking you can avoid it is silly. Figuring out what you can
> do right now that doesn't lock you into a specific trajectory is -not-
> silly. For example, figuring out what APIs need to be changed to make
> them re-enterant is not silly. Most of the stuff in lib/ with static
> char buffers that they return need to be changed. That can be done
> -now- without having to lock yourself into a particular concurrency
> model.
>
> 2c,


thank you :)

>
>
>
> Adrian
>
> 2009/10/15 Amos Jeffries :
>> Adrian Chadd wrote:
>>>
>>> 2009/10/15 Sachin Malave :
>>>
 Its not like we want to make project bad. Squid was not deployed on
 smp before because we did not have shared memory architectures
 (multi-cores), also the library support for multi-threading was like
 nightmare for people. Now things are changed, it is very easy to
 manage threads, people have multi-core machines at their desktops, and
 as hardware is available now or later somebody has to try and build
 

Re: squid-smp

2009-10-15 Thread Adrian Chadd
Oh, I can absolutely give you guys food for thought. I was just hoping
someone else would already try to do a bit of legwork.

Things to think about:

* Do you really, -really- want to reinvent the malloc wheel? This is
separate from caching results and pre-made class instances. There's
been a lot of work in well-performing, thread-aware malloc libraries
* Do you want to run things in multiple processes or multiple threads?
Or support both?
* How much of the application do you want to push out into separate
threads? run lots of "copies" of Squid concurrently, with some locking
going on? Break up individual parts of the processing pipeline into
threads? (eg, what I'm going to be experimenting with soon - handling
ICP/HTCP in a separate thread for some basic testing)
* Survey the current codebase and figure out what depends upon what -
in a way that you can use for figuring out what needs to be made
re-entrant and what may need locking. Think about how to achieve all
of this. Best example of this - you're going to need to figure out how
to do concurrent debug logging and memory allocation - so see what
that code uses, what that codes' code uses, etc
* 10GE cards are dumping individual PCIe channels to CPUs; which means
that the "most efficient" way of pumping data around will be to
somehow throw individual connections onto specific CPUs, and keep them
there. There's no OS support for this yet, but OSes may be "magical"
(ie, handing you sockets in specific threads via accept() and hoping
that the NIC doesn't reorganise its connection->PCIe channel hash)
* Do you think its worth being able to migrate specific connections
between threads? Or once they're in a thread they're there for good
* If you split up squid into "lots of threads running the whole app",
what and where would you envisage locking and blocking? What about
data sharing? How would that scale given a handful of example
workloads? What about in abnormal situations? How well will things
degrade?
* What about using message passing and message queues? Where would it
be appropriate? Where wouldn't it be appropriate? Why?

Here's an example:

* Imagine you're doing store lookups using message passing with your
"store" being a separate thread with a message queue. Think about how
you'd handle say, ICP peering between two caches doing > 10,000
requests a second. What repercussions does that have for the locking
of the message queues between other threads. What are the other
threads doing?

With that in mind, survey the kinds of ways that current network apps
"do" threading:

* look at the various ways apache does it - eg, the per-connection
thread+process hybrid model, the event-worker thread model, etc
* look at memcached - one thread doing accept'ing, farming requests
off to other threads that just run a squid-like event loop. Minimal
inter-thread communication for the most part
* investigate what the concurrency hooks for various frameworks do -
eg, the boost asio library stuff has "colours" which you mark thread
events with. These colours dictate which events need to be run
sequentially and which can run in parallel
* look at all of the random blogs written by windows networking coders
- they're further ahead of the massively-concurrent network
application stack because Windows has had it for a number of years.

Now. You've mentioned you've looked at the others and you think major
replumbing is going to be needed. Here's a hint - it's going to be
needed. Thinking you can avoid it is silly. Figuring out what you can
do right now that doesn't lock you into a specific trajectory is -not-
silly. For example, figuring out which APIs need to be changed to make
them re-entrant is not silly. Most of the stuff in lib/ with static
char buffers that they return needs to be changed. That can be done
-now- without having to lock yourself into a particular concurrency
model.

2c,



Adrian

2009/10/15 Amos Jeffries :
> Adrian Chadd wrote:
>>
>> 2009/10/15 Sachin Malave :
>>
>>> Its not like we want to make project bad. Squid was not deployed on
>>> smp before because we did not have shared memory architectures
>>> (multi-cores), also the library support for multi-threading was like
>>> nightmare for people. Now things are changed, it is very easy to
>>> manage threads, people have multi-core machines at their desktops, and
>>> as hardware is available now or later somebody has to try and build
>>> SMP support. think about future...
>>>
>>> To cop with internet speed & increase in number of users, Squid must
>>> use multi-core architecture and distribute its work
>>
>> I 100% agree with your comments. I agree 100% that Squid needs to be
>> made scalable on multi-core boxes.
>>
>> Writing threaded code may be easier now than in the past, but the ways
>> of screwing stability, debuggability, performance and such -haven't-
>> changed.. This is what I'm trying to get across. :)
>
> Aye, understood. Which is why I've made sure all this discussion is done in squid-dev.

Re: squid-smp

2009-10-15 Thread Amos Jeffries

Adrian Chadd wrote:

2009/10/15 Sachin Malave :


Its not like we want to make project bad. Squid was not deployed on
smp before because we did not have shared memory architectures
(multi-cores), also the library support for multi-threading was like
nightmare for people. Now things are changed, it is very easy to
manage threads, people have multi-core machines at their desktops, and
as hardware is available now or later somebody has to try and build
SMP support. think about future...

To cop with internet speed & increase in number of users, Squid must
use multi-core architecture and distribute its work


I 100% agree with your comments. I agree 100% that Squid needs to be
made scalable on multi-core boxes.

Writing threaded code may be easier now than in the past, but the ways
of screwing stability, debuggability, performance and such -haven't-
changed.. This is what I'm trying to get across. :)


Aye, understood. Which is why I've made sure all this discussion is done 
in squid-dev. So those like yourself who might have anything to point at 
as good/bad examples can do so.


Sure, Squid can be re-written from the ground up yet again. But none of 
us want the ten-year delay that would cause. The alternatives are to drop 
eight years of improvements and use the Squid-2 code, or to go ahead with 
a somewhat incompletely upgraded Squid-3 code, leveraging some of the SMP 
work to further upgrade the remaining sections while slipping SMP 
into the already-upgraded components.


Do you actually have any relevant implementations that you, in your 
infinite wisdom and foresight, want to point us at? Or are you just 
dissing us for not knowing enough?


I'm already aware of the overall models that Varnish, Oops, Apache, 
Polipo, and Nginx are documented as using. Without looking at the code 
it's clear that their approaches are not beneficial to Squid without 
major re-plumbing.


The solution we have to use is a mix, possibly unique to Squid, which 
retains Squid's features and niche coverage: the right mix of tools for 
each task to be performed (child processes, IPC, and events), now adding 
threads for the pieces where they are applicable. There is order in the 
chaos.


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
  Current Beta Squid 3.1.0.14


Re: squid-smp

2009-10-15 Thread Adrian Chadd
2009/10/15 Sachin Malave :

> It's not like we want to make the project look bad. Squid was not deployed
> on SMP before because we did not have shared-memory architectures
> (multi-cores); also, the library support for multi-threading was like a
> nightmare for people. Now things have changed: it is very easy to
> manage threads, people have multi-core machines on their desktops, and
> as the hardware is available, sooner or later somebody has to try and
> build SMP support. Think about the future...
>
> To cope with Internet speed and the increase in the number of users,
> Squid must use multi-core architectures and distribute its work

I 100% agree with your comments. I agree 100% that Squid needs to be
made scalable on multi-core boxes.

Writing threaded code may be easier now than in the past, but the ways
of screwing up stability, debuggability, performance and such -haven't-
changed. This is what I'm trying to get across. :)



Adrian


Re: squid-smp

2009-10-15 Thread Sachin Malave
On Wed, Oct 14, 2009 at 11:17 PM, Adrian Chadd  wrote:
> 2009/10/14 Amos Jeffries :
>
> [snip]
>
> I still find it very amusing that no one else has sat down and talked
> about the last 20+ years of writing threaded, concurrent code and
> what the pros/cons of them would be here; nor what other projects are
> doing.
>
> Please don't sit down and talk about how to shoehorn SMP into some
> existing Squid-3 "thing" (be it AsyncCalls, or anything really) before
> doing this. You'll just be re-inventing the same mistakes made in the
> past and it will make the project look bad.
>
>
>
> Adrian
>

It's not like we want to make the project look bad. Squid was not deployed
on SMP before because we did not have shared-memory architectures
(multi-cores); also, the library support for multi-threading was like a
nightmare for people. Now things have changed: it is very easy to
manage threads, people have multi-core machines on their desktops, and
as the hardware is available, sooner or later somebody has to try and
build SMP support. Think about the future...

To cope with Internet speed and the increase in the number of users,
Squid must use multi-core architectures and distribute its work



-- 
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp

2009-10-14 Thread Adrian Chadd
2009/10/14 Amos Jeffries :

[snip]

I still find it very amusing that no one else has sat down and talked
about the last 20+ years of writing threaded, concurrent code and
what the pros/cons of them would be here; nor what other projects are
doing.

Please don't sit down and talk about how to shoehorn SMP into some
existing Squid-3 "thing" (be it AsyncCalls, or anything really) before
doing this. You'll just be re-inventing the same mistakes made in the
past and it will make the project look bad.



Adrian


Re: squid-smp

2009-10-13 Thread Amos Jeffries
On Tue, 13 Oct 2009 10:54:14 -0400, Sachin Malave 
wrote:
> On Tue, Oct 13, 2009 at 8:12 AM, Amos Jeffries 
> wrote:
>> Sachin Malave wrote:
>>>
>>>
>>> On Mon, Oct 12, 2009 at 8:33 PM, Amos Jeffries >> > wrote:
>>>
>>>On Tue, 13 Oct 2009 00:29:56 +0200, Henrik Nordstrom
>>>mailto:hen...@henriknordstrom.net>>
>>> wrote:
>>> > fre 2009-10-09 klockan 01:50 -0400 skrev Sachin Malave:
>>> >
>>> >> I think it is possible to have a thread , which will be
watching
>>> >> AsyncCallQueue, if it finds an entry there then it will execute
>>> the
>>> >> dial() function.
>>> >
>>> > Except that none of the dialed AsyncCall handlers is currently
>>> thread
>>> > safe.. all expect to be running in the main thread all alone.
>>>
>>>Which raises the issue of whether to add a second main queue loop for
>>>thread-safe calls.  Then schedule calls which have been audited and
>>>found safe to that queue instead of the current main queue. Usage would
>>>be low to start with but would allow ongoing incremental SMP
>>>improvements by gradually migrating chunks of code to be thread-safe.
>>>
>>>An alternative would be to make the queue thread-safe and add a flag
>>>to say particular calls are thread-safe. That would mean walking the
>>>queue repeatedly looking for them, which is perhaps less desirable at
>>>the start of conversion when few calls are threaded, but gains in
>>>utility relative to the thread-safety progress.
>>>
>>>This involves a small amount of extra code in schedule() to flag which
>>>queue the call is sent to, and a chunk of extra memory for duplicate
>>>queue management objects.
>>>
>>>
>>>
---
>>> Ok if that is possible then would like to make those changes,  Either
>>> of
>>> them will be tried...
>>>
>>> One more thing...
>>>
>>> Are you thinking about spawning multiple threads or single thread
>>> separated from main is sufficient for handling all scheduled calls.
>>> Here multiple threads means, we could have threads all trying to dial
>>> entries in AsyncCallQueue simultaneously.
>>
>> That would be up to you.
>>
>> I had not thought more than one dialer thread per CPU necessary at this
>> stage. Though with both verified thread-safe calls and a thread-safe
>> queue,
>> multiple dialer threads should not be an issue. Doing more than
>> necessary would merely be a waste of resources.
> 
> Yeah ! I am also thinking the same...
> 
>>
>> Squid would essentially segment into multiple 'main' threads / dialers
>> running one to a CPU and sharing minimal amounts of state. Slightly
>> more efficient and far easier to configure than current setups of multiple
>> interlinked Squid instances.
> 
> OK, this could be done... it will give good performance results. But I
> want to know more about this; please come back with a more precise
> definition.
> Are you talking about multi-instance squid ?
> http://wiki.squid-cache.org/MultipleInstances.

Yes. That is the current way of handling SMP. Rather nasty from the admin
viewpoint.
I'm just looking far ahead to the end-result of adding multiple dialer
threads. It's a happier place :).

>>
>> Without knowing too much, I'm assuming the Job ID can be used to
>> identify calls a particular thread/job runs.
>>
>> Amos
>>
> 
> 
> 
> 
>>>
>>>
---
>>>
>>> >
>>> >> can we separate dispatchCalls() in EventLoop.cc for that
>>> purpose?
>>> We
>>> >> can have a thread executing dispatchCalls() continuously
>>>
>>>This is an end-goal. Jumping straight there for everything is
>>> usually a
>>>mistake. But good to re-state it anyway.
>>>
>>> >> and if error
>>> >> condition occurs it is written  in "error" shared variable.
>>>which
>>> >> is then read by main thread executing mainLoop... in the
>>>same way
>>> >> returned dispatchedSome can also be passed to main thread...
>>>
>>>I think I follow. You mean something like the way errno works in the
>>> OS?
>>>Doing that would be a major crutch in Squid. I'd rather have an
>>> error
>>>object per-job (field in the job descriptor object) which the job
>>>handlers
>>>can use according to the job needs.
>>>Some will result in data sent back to the client, some in a
>>> completely
>>>altered handling pathway.
>>>
>>>Amos
>>>
>>>
>>>
>>>
>>> --
>>> Mr. S. H. Malave
>>> Computer Science & Engineering Department,
>>> Walchand College of Engineering,Sangli.
>>> sachinmal...@wce.org.in 
>>

Amos


Re: squid-smp

2009-10-13 Thread Sachin Malave
On Tue, Oct 13, 2009 at 8:12 AM, Amos Jeffries  wrote:
> Sachin Malave wrote:
>>
>>
>> On Mon, Oct 12, 2009 at 8:33 PM, Amos Jeffries > > wrote:
>>
>>    On Tue, 13 Oct 2009 00:29:56 +0200, Henrik Nordstrom
>>    mailto:hen...@henriknordstrom.net>> wrote:
>>     > fre 2009-10-09 klockan 01:50 -0400 skrev Sachin Malave:
>>     >
>>     >> I think it is possible to have a thread , which will be watching
>>     >> AsyncCallQueue, if it finds an entry there then it will execute the
>>     >> dial() function.
>>     >
>>     > Except that none of the dialed AsyncCall handlers is currently
>> thread
>>     > safe.. all expect to be running in the main thread all alone.
>>
>>    Which raises the issue of whether to add a second main queue loop for
>>    thread-safe calls.  Then schedule calls which have been audited and
>>    found
>>    safe to that queue instead of the current main queue. Usage would be
>>    low to
>>    start with but would allow ongoing incremental SMP improvements by
>>    gradually migrating chunks of code to be thread-safe.
>>
>>    An alternative would be to make the queue thread-safe and add a flag
>>    to say particular calls are thread-safe. That would mean walking the
>>    queue repeatedly looking for them, which is perhaps less desirable at
>>    the start of conversion when few calls are threaded, but gains in
>>    utility relative to the thread-safety progress.
>>
>>    This involves a small amount of extra code in schedule() to flag which
>>    queue the call is sent to, and a chunk of extra memory for duplicate
>>    queue management objects.
>>
>>
>> ---
>> Ok if that is possible then would like to make those changes,  Either of
>> them will be tried...
>>
>> One more thing...
>>
>> Are you thinking about spawning multiple threads or single thread
>> separated from main is sufficient for handling all scheduled calls.
>> Here multiple threads means, we could have threads all trying to dial
>> entries in AsyncCallQueue simultaneously.
>
> That would be up to you.
>
> I had not thought more than one dialer thread per CPU necessary at this
> stage. Though with both verified thread-safe calls and a thread-safe queue,
> multiple dialer threads should not be an issue. Doing more than necessary
> would merely be a waste of resources.

Yeah ! I am also thinking the same...

>
> Squid would essentially segment into multiple 'main' threads / dialers
> running one to a CPU and sharing minimal amounts of state. Slightly more
> efficient and far easier to configure than current setups of multiple
> interlinked Squid instances.

OK, this could be done... it will give good performance results. But I
want to know more about this; please come back with a more precise
definition.
Are you talking about multi-instance squid ?
http://wiki.squid-cache.org/MultipleInstances.


>
> Without knowing too much, I'm assuming the Job ID can be used to identify
> calls a particular thread/job runs.
>
> Amos
>




>>
>> ---
>>
>>     >
>>     >> can we separate dispatchCalls() in EventLoop.cc for that purpose?
>> We
>>     >> can have a thread executing dispatchCalls() continuously
>>
>>    This is an end-goal. Jumping straight there for everything is usually a
>>    mistake. But good to re-state it anyway.
>>
>>     >> and if error
>>     >> condition occurs it is written  in "error" shared variable.
>>    which
>>     >> is then read by main thread executing mainLoop... in the
>>    same way
>>     >> returned dispatchedSome can also be passed to main thread...
>>
>>    I think I follow. You mean something like the way errno works in the
>> OS?
>>    Doing that would be a major crutch in Squid. I'd rather have an error
>>    object per-job (field in the job descriptor object) which the job
>>    handlers
>>    can use according to the job needs.
>>    Some will result in data sent back to the client, some in a completely
>>    altered handling pathway.
>>
>>    Amos
>>
>>
>>
>>
>> --
>> Mr. S. H. Malave
>> Computer Science & Engineering Department,
>> Walchand College of Engineering,Sangli.
>> sachinmal...@wce.org.in 
>
>
> --
> Please be using
>  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
>  Current Beta Squid 3.1.0.14
>



-- 
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp

2009-10-13 Thread Amos Jeffries

Sachin Malave wrote:



On Mon, Oct 12, 2009 at 8:33 PM, Amos Jeffries > wrote:


On Tue, 13 Oct 2009 00:29:56 +0200, Henrik Nordstrom
mailto:hen...@henriknordstrom.net>> wrote:
 > fre 2009-10-09 klockan 01:50 -0400 skrev Sachin Malave:
 >
 >> I think it is possible to have a thread , which will be watching
 >> AsyncCallQueue, if it finds an entry there then it will execute the
 >> dial() function.
 >
 > Except that none of the dialed AsyncCall handlers is currently thread
 > safe.. all expect to be running in the main thread all alone.

Which raises the issue of whether to add a second main queue loop for
thread-safe calls.  Then schedule calls which have been audited and
found
safe to that queue instead of the current main queue. Usage would be
low to
start with but would allow ongoing incremental SMP improvements by
gradually migrating chunks of code to be thread-safe.

An alternative would be to make the queue thread-safe and add a flag to
say particular calls are thread-safe. That would mean walking the queue
repeatedly looking for them, which is perhaps less desirable at the
start of conversion when few calls are threaded, but gains in utility
relative to the thread-safety progress.

This involves a small amount of extra code in schedule() to flag which
queue the call is sent to, and a chunk of extra memory for duplicate
queue management objects.

---
Ok if that is possible then would like to make those changes,  Either of 
them will be tried...


One more thing...

Are you thinking about spawning multiple threads or single thread 
separated from main is sufficient for handling all scheduled calls.
Here multiple threads means, we could have threads all trying to dial 
entries in AsyncCallQueue simultaneously.


That would be up to you.

I had not thought more than one dialer thread per CPU necessary at this 
stage. Though with both verified thread-safe calls and a thread-safe 
queue, multiple dialer threads should not be an issue. Doing more than 
necessary would merely be a waste of resources.


Squid would essentially segment into multiple 'main' threads / dialers 
running one to a CPU and sharing minimal amounts of state. Slightly more 
efficient and far easier to configure than current setups of multiple 
interlinked Squid instances.
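A minimal sketch of that one-dialer-per-CPU sizing, using portable C++ threads (startDialers() is a hypothetical helper for illustration, not Squid code):

```cpp
#include <functional>
#include <thread>
#include <vector>

// Spawn one dialer thread per CPU core; more than that "would merely be
// a waste of resources". dialerLoop is whatever loop drains the queue.
std::vector<std::thread> startDialers(std::function<void()> dialerLoop) {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0)
        n = 1;                      // hardware_concurrency() may report 0
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i)
        pool.emplace_back(dialerLoop);
    return pool;                    // caller joins these at shutdown
}
```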


Without knowing too much, I'm assuming the Job ID can be used to 
identify calls a particular thread/job runs.


Amos


---
 


 >
 >> can we separate dispatchCalls() in EventLoop.cc for that purpose? We
 >> can have a thread executing dispatchCalls() continuously

This is an end-goal. Jumping straight there for everything is usually a
mistake. But good to re-state it anyway.

 >> and if error
 >> condition occurs it is written  in "error" shared variable.
which
 >> is then read by main thread executing mainLoop... in the
same way
 >> returned dispatchedSome can also be passed to main thread...

I think I follow. You mean something like the way errno works in the OS?
Doing that would be a major crutch in Squid. I'd rather have an error
object per-job (field in the job descriptor object) which the job
handlers
can use according to the job needs.
Some will result in data sent back to the client, some in a completely
altered handling pathway.

Amos




--
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in 



--
Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
  Current Beta Squid 3.1.0.14


squid-smp

2009-10-13 Thread Sachin Malave
-- Forwarded message --
From: Sachin Malave 
Date: Tue, Oct 13, 2009 at 7:56 AM
Subject: Re: squid-smp
To: Amos Jeffries 
Cc: Henrik Nordstrom , Squid Developers





On Mon, Oct 12, 2009 at 8:33 PM, Amos Jeffries  wrote:
>
> On Tue, 13 Oct 2009 00:29:56 +0200, Henrik Nordstrom
>  wrote:
> > fre 2009-10-09 klockan 01:50 -0400 skrev Sachin Malave:
> >
> >> I think it is possible to have a thread , which will be watching
> >> AsyncCallQueue, if it finds an entry there then it will execute the
> >> dial() function.
> >
> > Except that none of the dialed AsyncCall handlers is currently thread
> > safe.. all expect to be running in the main thread all alone.
>
> Which raises the issue of whether to add a second main queue loop for
> thread-safe calls.  Then schedule calls which have been audited and found
> safe to that queue instead of the current main queue. Usage would be low to
> start with but would allow ongoing incremental SMP improvements by
> gradually migrating chunks of code to be thread-safe.
>
> An alternative would be to make the queue thread-safe and add a flag to
> say particular calls are thread-safe. That would mean walking the queue
> repeatedly looking for them, which is perhaps less desirable at the start
> of conversion when few calls are threaded, but gains in utility relative
> to the thread-safety progress.
>
> This involves a small amount of extra code in schedule() to flag which
> queue the call is sent to, and a chunk of extra memory for duplicate queue
> management objects.
>


---
Ok if that is possible then would like to make those changes,  Either
of them will be tried...

One more thing...

Are you thinking about spawning multiple threads or single thread
separated from main is sufficient for handling all scheduled calls.
Here multiple threads means, we could have threads all trying to dial
entries in AsyncCallQueue simultaneously.
---



>
> >
> >> can we separate dispatchCalls() in EventLoop.cc for that purpose? We
> >> can have a thread executing dispatchCalls() continuously
>
> This is an end-goal. Jumping straight there for everything is usually a
> mistake. But good to re-state it anyway.
>
> >> and if error
> >> condition occurs it is written  in "error" shared variable. which
> >> is then read by main thread executing mainLoop... in the same way
> >> returned dispatchedSome can also be passed to main thread...
>
> I think I follow. You mean something like the way errno works in the OS?
> Doing that would be a major crutch in Squid. I'd rather have an error
> object per-job (field in the job descriptor object) which the job handlers
> can use according to the job needs.
> Some will result in data sent back to the client, some in a completely
> altered handling pathway.
>
> Amos
>



--
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp

2009-10-12 Thread Amos Jeffries
On Tue, 13 Oct 2009 00:29:56 +0200, Henrik Nordstrom
 wrote:
> fre 2009-10-09 klockan 01:50 -0400 skrev Sachin Malave:
> 
>> I think it is possible to have a thread , which will be watching
>> AsyncCallQueue, if it finds an entry there then it will execute the
>> dial() function.
> 
> Except that none of the dialed AsyncCall handlers is currently thread
> safe.. all expect to be running in the main thread all alone.

Which raises the issue of whether to add a second main queue loop for
thread-safe calls.  Then schedule calls which have been audited and found
safe to that queue instead of the current main queue. Usage would be low to
start with but would allow ongoing incremental SMP improvements by
gradually migrating chunks of code to be thread-safe.

An alternative would be to make the queue thread-safe and add a flag to
say particular calls are thread-safe. That would mean walking the queue
repeatedly looking for them, which is perhaps less desirable at the start
of conversion when few calls are threaded, but gains in utility relative
to the thread-safety progress.

This involves a small amount of extra code in schedule() to flag which
queue the call is sent to, and a chunk of extra memory for duplicate queue
management objects.
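The second option, a mutex-guarded queue whose entries carry a thread-safe flag, might look roughly like this (hypothetical types for illustration, not Squid's real AsyncCallQueue; assumes C++17):

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

struct Call {
    std::function<void()> dial;   // the handler to invoke
    bool threadSafe = false;      // set only for audited handlers
};

class CallQueue {
    std::deque<Call> calls;
    std::mutex mtx;               // schedule() and pop() contend on this
public:
    void schedule(Call c) {
        std::lock_guard<std::mutex> g(mtx);
        calls.push_back(std::move(c));
    }
    // Pop the next call this caller may run. Worker threads pass
    // requireSafe=true; the main thread passes false and takes anything.
    std::optional<Call> pop(bool requireSafe) {
        std::lock_guard<std::mutex> g(mtx);
        for (auto it = calls.begin(); it != calls.end(); ++it) {
            if (!requireSafe || it->threadSafe) {
                Call c = std::move(*it);
                calls.erase(it);
                return c;
            }
        }
        return std::nullopt;      // nothing runnable for this caller
    }
};
```

The linear scan in pop() is the "walking the queue repeatedly" cost noted above; it shrinks in relative terms as more calls are flagged thread-safe.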

> 
>> can we separate dispatchCalls() in EventLoop.cc for that purpose? We
>> can have a thread executing dispatchCalls() continuously

This is an end-goal. Jumping straight there for everything is usually a
mistake. But good to re-state it anyway.

>> and if error
>> condition occurs it is written  in "error" shared variable. which
>> is then read by main thread executing mainLoop... in the same way
>> returned dispatchedSome can also be passed to main thread...

I think I follow. You mean something like the way errno works in the OS?
Doing that would be a major crutch in Squid. I'd rather have an error
object per-job (field in the job descriptor object) which the job handlers
can use according to the job needs.
Some will result in data sent back to the client, some in a completely
altered handling pathway.
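A sketch of that per-job error object (illustrative names only, not Squid's actual job descriptor):

```cpp
#include <string>
#include <utility>

// Error state travels inside the job descriptor instead of in a global,
// errno-style variable, so concurrent dialers cannot clobber each other.
struct JobError {
    int code = 0;                 // 0 means "no error"
    std::string detail;
    explicit operator bool() const { return code != 0; }
};

struct Job {
    int id = 0;
    JobError error;               // per-job, inspected by this job's handlers
    void fail(int code, std::string why) {
        error = JobError{code, std::move(why)};
    }
};
```

Each handler then branches on its own job's error field: some send data back to the client, some switch to a completely altered handling pathway.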

Amos



Re: squid-smp

2009-10-12 Thread Henrik Nordstrom
fre 2009-10-09 klockan 01:50 -0400 skrev Sachin Malave:

> I think it is possible to have a thread , which will be watching
> AsyncCallQueue, if it finds an entry there then it will execute the
> dial() function.

Except that none of the dialed AsyncCall handlers is currently thread
safe.. all expect to be running in the main thread all alone..

> can we separate dispatchCalls() in EventLoop.cc for that purpose? We
> can have a thread executing dispatchCalls() continuously and if error
> condition occurs it is written  in "error" shared variable. which
> is then read by main thread executing mainLoop... in the same way
> returned dispatchedSome can also be passed to main thread...

Not sure I follow.

Regards
Henrik



Re: Squid-smp : Please discuss

2009-10-08 Thread Sachin Malave
On Mon, Sep 14, 2009 at 6:43 AM, Amos Jeffries  wrote:
>
> I think we should take this on-list so the others with more detailed 
> knowledge can give advice in case I have my facts wrong about AsyncCalls...
>
> I mean some mutex/lock on AsyncCallQueue so the multiple threads can do  
> AsyncCallQueue::schedule(call) without pointer collisions with theTail and 
> theHead, when they setup the first read of an accepted()'d FD.
>
> For example something like this...
>
> thread #1:
>   while (  ) {
>      queuedEventX.dial()
>   }


I think it is possible to have a thread , which will be watching
AsyncCallQueue, if it finds an entry there then it will execute the
dial() function.
can we separate dispatchCalls() in EventLoop.cc for that purpose? We
can have a thread executing dispatchCalls() continuously and if error
condition occurs it is written  in "error" shared variable. which
is then read by main thread executing mainLoop... in the same way
returned dispatchedSome can also be passed to main thread...

we need to lock the AsyncCallQueue using some locking mechanism. If
that is done, is there any other variable that also must be considered
?


And there is no need to create a separate thread for schedule(), as it is
not doing much computation; the main thread can handle it easily...

>
> thread #2:
>   while(  ) {
>     newFD = accept()
>     readFromNewFd = new AsyncCallPointer...
>     AsyncCallQueue::schedule(readFromNewFd);
>   }
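Fleshed out, the two loops above need a mutex (and, in practice, a condition variable) around the queue so schedule() and dial() cannot collide on theHead/theTail. A runnable sketch using hypothetical free functions, not the real AsyncCallQueue API:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Shared state, guarded by one mutex instead of bare theHead/theTail.
static std::queue<std::function<void()>> calls;
static std::mutex mtx;
static std::condition_variable cv;
static bool closing = false;

void schedule(std::function<void()> call) {        // thread #2's side
    {
        std::lock_guard<std::mutex> g(mtx);
        calls.push(std::move(call));
    }
    cv.notify_one();
}

void dialerLoop() {                                // thread #1's side
    for (;;) {
        std::unique_lock<std::mutex> g(mtx);
        cv.wait(g, [] { return !calls.empty() || closing; });
        if (calls.empty())
            return;                                // closing and drained
        auto call = std::move(calls.front());
        calls.pop();
        g.unlock();
        call();                                    // dial() outside the lock
    }
}

// Demo: schedule 100 calls, shut down, return how many were dialed.
int runDemo() {
    int dialed = 0;                // touched only by the dialer thread
    std::thread dialer(dialerLoop);
    for (int i = 0; i < 100; ++i)
        schedule([&dialed] { ++dialed; });
    {
        std::lock_guard<std::mutex> g(mtx);
        closing = true;
    }
    cv.notify_one();
    dialer.join();                 // join() makes reading 'dialed' safe
    return dialed;
}
```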
>
>
>>
>> Need some time for further analysis...
>>
>> Thank you so much..
>>
>
> Amos
> --
> Please be using
>  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE19
>  Current Beta Squid 3.1.0.13



--
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


squid-smp

2009-10-08 Thread Sachin Malave
As discussed before...




On Mon, Sep 14, 2009 at 6:43 AM, Amos Jeffries  wrote:
>
> I think we should take this on-list so the others with more detailed 
> knowledge can give advice in case I have my facts wrong about AsyncCalls...
>
> I mean some mutex/lock on AsyncCallQueue so the multiple threads can do  
> AsyncCallQueue::schedule(call) without pointer collisions with theTail and 
> theHead, when they setup the first read of an accepted()'d FD.
>
> For example something like this...
>
> thread #1:
>   while (  ) {
>  queuedEventX.dial()
>   }


I think it is possible to have a thread , which will be watching
AsyncCallQueue, if it finds an entry there then it will execute the
dial() function.
can we separate dispatchCalls() in EventLoop.cc for that purpose? We
can have a thread executing dispatchCalls() continuously and if error
condition occurs it is written  in "error" shared variable. which
is then read by main thread executing mainLoop... in the same way
returned dispatchedSome can also be passed to main thread...

we need to lock the AsyncCallQueue using some locking mechanism. If
that is done, is there any other variable that also must be considered
?


And there is no need to create a separate thread for schedule(), as it is
not doing much computation; the main thread can handle it easily...

>
> thread #2:
>   while(  ) {
> newFD = accept()
> readFromNewFd = new AsyncCallPointer...
> AsyncCallQueue::schedule(readFromNewFd);
>   }
>
>
>>
>> Need some time for further analysis...
>>
>> Thank you so much..
>>
>
> Amos
> --
> Please be using
>  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE19
>  Current Beta Squid 3.1.0.13



--
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


R: R: Squid-smp : Please discuss

2009-09-16 Thread Guido Serassio
Hi Henrik,

> -Messaggio originale-
> Da: Henrik Nordstrom [mailto:hen...@henriknordstrom.net]
> Inviato: martedì 15 settembre 2009 21.24
> A: Guido Serassio
> Cc: Sachin Malave; Adrian Chadd; Robert Collins; Amos Jeffries; Alex
> Rousskov; Squid Developers
> Oggetto: Re: R: Squid-smp : Please discuss
> 
> tis 2009-09-15 klockan 08:39 +0200 skrev Guido Serassio:
> 
> > But MSYS + MinGW provides gcc 3.4.5 and the Squid 3 Visual Studio
> > Project is based on Visual Studio 2005.
> 
> There is GCC-4.x for MinGW as well; that is what I have in my
> installations. It is just not classified as the current production
> release, for some reason which more and more people are ignoring today.
> 
> Regards
> Henrik

Yes, I know this, but on principle I use STABLE development tools. 

Regards

Guido Serassio
Acme Consulting S.r.l.
Microsoft Gold Certified Partner
Via Lucia Savarino, 1                10098 - Rivoli (TO) - ITALY
Tel. : +39.011.9530135   Fax. : +39.011.9781115
Email: guido.seras...@acmeconsulting.it
WWW: http://www.acmeconsulting.it



Re: R: Squid-smp : Please discuss

2009-09-15 Thread Henrik Nordstrom
tis 2009-09-15 klockan 10:58 -0400 skrev Sachin Malave:

> But if we use external library like openmp then i dont think we need
> to make much changes in actual codes. But that is only possible if we
> find blocks  in existing codes which could be executed by different
> threads...

Unfortunately the execution blocks in Squid are very small, since it is
designed for non-blocking I/O operation (which can be compared to a form
of cooperative multitasking if you like).

Regards
Henrik



Re: R: Squid-smp : Please discuss

2009-09-15 Thread Henrik Nordstrom
tis 2009-09-15 klockan 08:39 +0200 skrev Guido Serassio:

> But MSYS + MinGW provides gcc 3.4.5 and the Squid 3 Visual Studio
> Project is based on Visual Studio 2005.

There is GCC-4.x for MinGW as well; that is what I have in my
installations. It is just not classified as the current production
release, for some reason which more and more people are ignoring today.

Regards
Henrik



Re: Squid-smp : Please discuss

2009-09-15 Thread Henrik Nordstrom
tis 2009-09-15 klockan 05:27 +0200 skrev Kinkie:

> I'm going to kick-start a new round then. If the approach has already
> been discussed, please forgive me and ignore this post.
> The idea is.. but what if we tried using a shared-nothing approach?

Yes, that was my preference in the previous round as well, and then move
from there to add back shared aspects. Using one process per CPU core,
non-blocking within that process and maybe internal offloading to
threads in things like eCAP.

Having requests bounce between threads is generally a bad idea from a
performance perspective, and should only be used when there is obvious
offload benefits where the operation to be performed is considerably
heavier than the transition between threads. Most operations are not..

Within the process I have been toying with the idea of using a message
based design rather than async calls with callbacks to further break up
and isolate components, especially in areas where adding back sharedness
is desired. But that's a side track, and same goals can be accomplished
with asynccall interfaces.

> Quick run-down: there is farm of processes, each with its own
> cache_mem and cache_dir(s). When a process receives a request, it
> parses it, hashes it somehow (CARP or a variation thereof) and defines
> if should handle it or if some other process should handle it. If it's
> some other process, it uses a Unix socket and some simple
> serialization protocol to pass around the parsed request and the file
> descriptor, so that the receiving process can pick up and continue
> servicing the request.
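Passing the accepted socket between processes is the hairiest mechanical piece of that proposal; on POSIX it is done with an SCM_RIGHTS control message over the Unix socket. A minimal sketch (assumes a POSIX system; send_fd/recv_fd are illustrative helper names):

```cpp
#include <cstring>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Send an open descriptor across a connected Unix-domain socket.
int send_fd(int sock, int fd) {
    char byte = 'F';                        // must send at least one data byte
    struct iovec iov = { &byte, 1 };
    alignas(struct cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))];
    std::memset(ctrl, 0, sizeof(ctrl));
    struct msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;             // the kernel duplicates the fd
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

// Receive a descriptor sent with send_fd(); returns it, or -1 on error.
int recv_fd(int sock) {
    char byte;
    struct iovec iov = { &byte, 1 };
    alignas(struct cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (!cm || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    std::memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;                              // a fresh descriptor in this process
}
```

The serialized parsed request would ride alongside in the ordinary data bytes; only the descriptor itself needs the control message.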

Just doing an internal CARP type forwarding is probably preferable, even
if it adds another internal hop. Things like SSL complicates fully
moving the request.
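The CARP selection itself is cheap and deterministic, which is what makes the internal hop tolerable: every process computes the same owner for a given URL. A toy sketch (the scoring is a simplified FNV-style stand-in, not Squid's real carp.cc):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Combine the request key with one worker's seed into a score.
static uint32_t carpHash(const std::string &s, uint32_t seed) {
    uint32_t h = seed;
    for (unsigned char c : s)
        h = h * 0x01000193u ^ c;      // FNV-style mix
    return h;
}

// Return the index of the worker that owns this URL: highest score wins,
// so all processes independently agree on the owner.
size_t ownerOf(const std::string &url, const std::vector<uint32_t> &workerSeeds) {
    size_t best = 0;
    uint32_t bestScore = 0;
    for (size_t i = 0; i < workerSeeds.size(); ++i) {
        uint32_t score = carpHash(url, workerSeeds[i]);
        if (i == 0 || score > bestScore) {
            best = i;
            bestScore = score;
        }
    }
    return best;
}
```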

> There are some hairy bits (management, accounting, reconfiguration..)
> and some less hairy bits (hashing algorithm to use, whether there is a
> "master" process and a workers farm, or whether workers compete on
> accept()ing), but on a first sight it would seem a simpler approach
> than the extensive threading and locking we're talking about, AND it's
> completely orthogonal to it (so it could be viewed as a medium-term
> solution while AsyncCalls-ification remains active as a long-term
> refactoring activity, which will eventually lead to a true MT-squid)

I do not think we will see Squid ever become a true MT-squid without a
complete ground-up rewrite. Moving from a single thread, with all data
shared and single-access without locking, to multi-threading is a very
complex path.

Regards
Henrik



Re: R: Squid-smp : Please discuss

2009-09-15 Thread Sachin Malave
On Tue, Sep 15, 2009 at 9:47 AM, Matt W. Benjamin  wrote:
> Hi all,
>
> I'm quite sure that you're going to do what you always do, but my belief is, 
> it's fruitless to look to external libraries and (especially) more complex 
> language infrastructure as a substitute for the old-fashioned work of working 
> through the code, subsystem by subsystem, to establish MP safety.
>
> The result of this sort of approach, as I think has been the result of 
> the transition to C++ as a whole, could very likely be to:
>
> a. increase the size of
> b. potentially, -reduce- the profiled performance of
> c. and I think very likely, reduce the overall value of
>
> the codebase as a whole.

Yes, that is also one of the issues. The Squid 3 code (C++) is very
complicated compared to the original Squid written in C, and that has
already increased the size of the source code.
But if we use an external library like OpenMP then I don't think we need
to make many changes to the actual code. That is only possible, though,
if we find blocks in the existing code which could be executed by
different threads...

I am trying hard to analyze it, and would like to spend some time on it
before coming to a conclusion. I will be sure not to disturb everything,
and I also want to keep the code as simple as possible.

Will come up with new ideas soon

Thanks
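For readers unfamiliar with the OpenMP style under discussion: the "automatic locks" amount to a single pragma, e.g. a critical section, with no hand-written mutex. A minimal illustration (compile with -fopenmp; without it the pragmas are ignored and the loop simply runs sequentially, with the same result):

```cpp
// countShared() increments a shared counter from a parallel loop;
// the critical pragma serializes the update -- "just a statement".
int countShared(int iterations) {
    int shared = 0;
    #pragma omp parallel for
    for (int i = 0; i < iterations; ++i) {
        #pragma omp critical
        ++shared;
    }
    return shared;                // equals iterations, with or without OpenMP
}
```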


>
> Matt
>
> - "Sachin Malave"  wrote:
>
>> On Tue, Sep 15, 2009 at 2:39 AM, Guido Serassio
>>  wrote:
>> > Hi,
>> >
>> >>
>> >> And current-generation libraries are also far better than older
>> >> ones, like OpenMP; creating threads and handling synchronization
>> >> issues in OpenMP is very easy...
>> >>
>> >> Automatic locks are provided; you need not design your own locking
>> >> mechanisms. Just a statement and you can lock the shared
>> >> variable...
>> >> Then the major work remaining is to identify the shared accesses.
>> >>
>> >> I WANT TO USE OPENMP library.
>> >>
>> >> ANY suggestions.
>> >
>> > Just a multi platform consideration:
>> >
>> > Don't forget that such libraries like OpenMP could not be available
>> on
>> > all Squid supported platforms.
>> > As example, on Windows OpenMP is available only using gcc 4.3.2 and
>> > later or MS Visual Studio 2008.
>> > But MSYS + MinGW provides gcc 3.4.5 and the Squid 3 Visual Studio
>> > Project is based on Visual Studio 2005.
>> >
>> > So, please, we should be very careful when thinking about Squid MP.
>>
>> Well, if that is so. We will use OpenMP 2.0 standard which is
>> available on Visual Studio 2005 also
>> Tasking model is added in 3.0 version, which has changed whole OpenMP
>> programming style but it is not needed in our case, I guess..
>> Thanks for pointing out this..
>>
>> Will try to use 2.0 version. Please tell me if you have different
>> library in your mind.
>>
>>
>>
>> >
>> > Regards
>> >
>> > Guido Serassio
>> > Acme Consulting S.r.l.
>> > Microsoft Gold Certified Partner
>> > Via Lucia Savarino, 1                10098 - Rivoli (TO) - ITALY
>> > Tel. : +39.011.9530135               Fax. : +39.011.9781115
>> > Email: guido.seras...@acmeconsulting.it
>> > WWW: http://www.acmeconsulting.it
>> >
>> >
>> >
>
> --
>
> Matt Benjamin
>
> The Linux Box
> 206 South Fifth Ave. Suite 150
> Ann Arbor, MI  48104
>
> http://linuxbox.com
>
> tel. 734-761-4689
> fax. 734-769-8938
> cel. 734-216-5309
>



-- 
Mr. S. H. Malave
MTech,
Computer Science & Engineering Dept.,
Walchand College of Engineering,
Sangli.
Mob. 9860470739
sachinmal...@wce.org.in


Re: R: Squid-smp : Please discuss

2009-09-15 Thread Matt W. Benjamin
Hi all,

I'm quite sure that you're going to do what you always do, but my belief is, 
it's fruitless to look to external libraries and (especially) more complex 
language infrastructure as a substitute for the old-fashioned work of working 
through the code, subsystem by subsystem, to establish MP safety.

The result of this sort of approach (as, I think, the result of the
transition to C++ as a whole has been) could very likely be to:

a. increase the size of
b. potentially, -reduce- the profiled performance of
c. and I think very likely, reduce the overall value of

the codebase as a whole.

Matt

- "Sachin Malave"  wrote:

> On Tue, Sep 15, 2009 at 2:39 AM, Guido Serassio
>  wrote:
> > Hi,
> >
> >>
> >> And  current generation libraries are also far better than older,
> like
> >> OpenMP, creating threads and handling synchronization issues in
> OpenMP
> >> is very easy...
> >>
> >> Automatic locks are provided, u need not to design your own
> locking
> >> mechanisms Just a statement and u can lock the shared
> >> variable...
> >> Then the major work remains is to identify the shared access.
> >>
> >> I WANT TO USE OPENMP library.
> >>
> >> ANY suggestions.
> >
> > Just a multi platform consideration:
> >
> > Don't forget that such libraries like OpenMP could not be available
> on
> > all Squid supported platforms.
> > As example, on Windows OpenMP is available only using gcc 4.3.2 and
> > later or MS Visual Studio 2008.
> > But MSYS + MinGW provides gcc 3.4.5 and the Squid 3 Visual Studio
> > Project is based on Visual Studio 2005.
> >
> > So, please, we should be very careful when thinking about Squid MP.
> 
> Well, if that is so. We will use OpenMP 2.0 standard which is
> available on Visual Studio 2005 also
> Tasking model is added in 3.0 version, which has changed whole OpenMP
> programming style but it is not needed in our case, I guess..
> Thanks for pointing out this..
> 
> Will try to use 2.0 version. Please tell me if you have different
> library in your mind.
> 
> 
> 
> >
> > Regards
> >
> > Guido Serassio
> > Acme Consulting S.r.l.
> > Microsoft Gold Certified Partner
> > Via Lucia Savarino, 1                10098 - Rivoli (TO) - ITALY
> > Tel. : +39.011.9530135               Fax. : +39.011.9781115
> > Email: guido.seras...@acmeconsulting.it
> > WWW: http://www.acmeconsulting.it
> >
> >
> >

-- 

Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309


Re: R: Squid-smp : Please discuss

2009-09-15 Thread Sachin Malave
On Tue, Sep 15, 2009 at 2:39 AM, Guido Serassio
 wrote:
> Hi,
>
>>
>> And  current generation libraries are also far better than older, like
>> OpenMP, creating threads and handling synchronization issues in OpenMP
>> is very easy...
>>
>> Automatic locks are provided, u need not to design your own locking
>> mechanisms Just a statement and u can lock the shared
>> variable...
>> Then the major work remains is to identify the shared access.
>>
>> I WANT TO USE OPENMP library.
>>
>> ANY suggestions.
>
> Just a multi platform consideration:
>
> Don't forget that such libraries like OpenMP could not be available on
> all Squid supported platforms.
> As example, on Windows OpenMP is available only using gcc 4.3.2 and
> later or MS Visual Studio 2008.
> But MSYS + MinGW provides gcc 3.4.5 and the Squid 3 Visual Studio
> Project is based on Visual Studio 2005.
>
> So, please, we should be very careful when thinking about Squid MP.

Well, if that is so, we will use the OpenMP 2.0 standard, which is also
available in Visual Studio 2005. The tasking model was added in version 3.0
and changed the whole OpenMP programming style, but I guess it is not needed
in our case. Thanks for pointing this out.

We will try to use version 2.0. Please tell me if you have a different
library in mind.



>
> Regards
>
> Guido Serassio
> Acme Consulting S.r.l.
> Microsoft Gold Certified Partner
> Via Lucia Savarino, 1                10098 - Rivoli (TO) - ITALY
> Tel. : +39.011.9530135               Fax. : +39.011.9781115
> Email: guido.seras...@acmeconsulting.it
> WWW: http://www.acmeconsulting.it
>
>
>


R: Squid-smp : Please discuss

2009-09-14 Thread Guido Serassio
Hi,

> 
> And  current generation libraries are also far better than older, like
> OpenMP, creating threads and handling synchronization issues in OpenMP
> is very easy...
> 
> Automatic locks are provided, u need not to design your own locking
> mechanisms Just a statement and u can lock the shared
> variable...
> Then the major work remains is to identify the shared access.
> 
> I WANT TO USE OPENMP library.
> 
> ANY suggestions.

Just a multi-platform consideration:

Don't forget that libraries like OpenMP may not be available on all
Squid-supported platforms. For example, on Windows, OpenMP is available only
with gcc 4.3.2 and later or MS Visual Studio 2008, yet MSYS + MinGW provides
gcc 3.4.5 and the Squid 3 Visual Studio project is based on Visual Studio 2005.

So, please, we should be very careful when thinking about Squid MP.

Regards

Guido Serassio
Acme Consulting S.r.l.
Microsoft Gold Certified Partner
Via Lucia Savarino, 1                10098 - Rivoli (TO) - ITALY
Tel. : +39.011.9530135               Fax. : +39.011.9781115
Email: guido.seras...@acmeconsulting.it
WWW: http://www.acmeconsulting.it




Re: Squid-smp : Please discuss

2009-09-14 Thread Adrian Chadd
If you want to start looking at -threading- inside Squid, I'd suggest
thinking first how you'd create a generic thread "helper" framework
that allows Squid to run multiple internal threads that can do
"stuff", and then implement some message/data queues and handle
notification between threads.

You can then push some "stuff" into these worker threads as an
experiment and see exactly what the issues are.

Building worker threads into Squid is easy. Making them do anything?
Not so easy :)


Adrian

2009/9/15 Sachin Malave :
> On Tue, Sep 15, 2009 at 1:38 AM, Adrian Chadd  wrote:
>> 2009/9/15 Sachin Malave :
>>> On Tue, Sep 15, 2009 at 1:18 AM, Adrian Chadd  
>>> wrote:
 Guys,

 Please look at what other multi-CPU network applications do, how they
 work and don't work well, before continuing this kind of discussion.

 Everything that has been discussed has already been done to death
 elsewhere. Please don't re-invent the wheel, badly.
>>
>>> Yes synchronization is always expensive . So we must target only those
>>> areas where shared data is updated infrequently. Also if we are making
>>> thread then the amount of work done must be more as compared to
>>> overheads required in thread creation, synchronization & scheduling.
>>
>> Current generation CPUs are a lot, lot better at the thread-style sync
>> primitives than older CPUs.
>>
>> There's other things to think about, such as lockless queues,
>> transactional memory hackery, atomic instructions in general, etc,
>> etc, which depend entirely upon the type of hardware being targetted.
>>
>>> If we try to provide locks to existing data structures then
>>> synchronization factor will definitely affect to our design.
>>
>>> Redesigning of such structures and there behavior is time consuming
>>> and may change whole design of the Squid.
>>
>>
>> Adrian
>>
>
>
>
> And  current generation libraries are also far better than older, like
> OpenMP, creating threads and handling synchronization issues in OpenMP
> is very easy...
>
> Automatic locks are provided, u need not to design your own locking
> mechanisms Just a statement and u can lock the shared
> variable...
> Then the major work remains is to identify the shared access.
>
> I WANT TO USE OPENMP library.
>
> ANY suggestions.
>
>


Re: Squid-smp : Please discuss

2009-09-14 Thread Sachin Malave
On Mon, Sep 14, 2009 at 6:14 PM, Alex Rousskov
 wrote:
> On Mon, 14 Sep 2009, Amos Jeffries wrote:
>
>> I think we should take this on-list so the others with more detailed
>> knowledge can give advice in case I have my facts wrong about AsyncCalls...
>
> I am afraid this discussion focuses on a small part of a much bigger problem
> so finalizing the design decisions here may be counter productive until we
> have an agreement on how to split Squid into threads in general.
>
> There are a few different ways to partition Squid and most of them have been
> discussed in the past. I am not sure whether the discussions have ever
> reached a consensus point. I am also not sure there is consensus whether we
> should design for 2 cores, 8, cores, 16 cores, 32 cores, or more, or all of
> the above?
>

That's correct, but it is manageable: depending on the hardware, we can
change Squid's behavior. I know it is not easy, but it is not impossible.
Thanks for pointing out this issue.

> There is also a question on how configurable everything should be. For
> example, if the box has only two cores, will the user be able to specify
> which threads are created and which created threads run on which core? Also,
> will the user be able to specify whether memory or disk cache is shared?
>
> I also agree that the major difficulty here is not implementing the
> threading code itself, but making sure that no shared data is accessed
> unlocked. This is easy when you start from scratch, but we have a lot of
> globals and static objects accessed all around the code. Protecting each of
> them individually by hand would require a lot of coding and risk. IMO, we
> need to add a few classes that would make sharing global data simple and
> safe. This is where C++ would help a lot.
>

Yes, we have to think about this; there is risk. New structures could be
added, but I don't want to comment on this issue yet; that would only be
possible after in-depth analysis.


> And even the protection of globals requires a high-level design plan: do we
> protect global objects like Config or individual data types like
> SquidString?
>
> Finally, I agree that making accepting code into a thread may lead to "too
> aggressive" incoming stream of requests that would overwhelm the other parts
> of the system. I have recently observed that kind of behavior in another
> threaded performance-focused application. This does not mean that accept
> must not be a thread, but it means that we should think how the overall
> design would prevent one thread from overloading others with work.
>
> Cheers,
>
> Alex.
>
>
>> Sachin Malave wrote:
>>>
>>> On Sun, Sep 13, 2009 at 7:52 PM, Amos Jeffries 
>>> wrote:

 On Sun, 13 Sep 2009 07:12:56 -0400, Sachin Malave
 
 wrote:
>
> Hello Amos,
>
>
> As discussed before, I am analyzing codes for the changes as on
> http://wiki.squid-cache.org/Features/SmpScale, and have concentrated
> on epoll ( select ) implementations in squid. It is found that epoll
> is polling all the descriptors & processing them one by one. There is
> an important FD used by http port which is always busy, but has to
> wait for other descriptors in queue to be processed.
>
> Then, I also found that it is possible to separate the working of all FD
> handlers, e.g. the FD used by http_port (tried).
> This can be done by making some changes in the code.
> i have been trying to code & test these changes since last few days,
> of course this may not be correct or need some improvements to meet
> our requirements, Please give me feedback and tell me dependencies i
> might have not considered,
>
> Again one important issue, I know that, doing changes as mentioned
> below will create and kill thread after each timeout but we can extend
> it further, and make a separate thread that will never exit, we will
> discuss on this issue later, before everything, please check proposed
> changes so that we can move further.

 You mean the main http_port listener (port 3128 etc)?
 This is currently not handled specially, due to there being more than
 one
 listener FD in many Squid setups (multiple http_port and https_port then
 other protocols like HTTPS, ICP, HTCP, DNS), any threading solution
 needs
 to handle the listeners agnostic of what they do. Though splitting
 listener
 FD accepts into a separate loop from other FD does seem sound.

 Special pseudo-thread handling is already hacked up in a pseudo-thread
 poller for DNS replies. Which is complicating the FD handling there.
 What
 I'd like to see is resource-locking added to the Async queue when adding
 new queue entries.

 That allows making the whole select loop(s) happen in parallel to the
 rest
 of Squid. Simply accepts and spawns AsyncJob/AsyncCall entries into the
 main squid processing queue.

Re: Squid-smp : Please discuss

2009-09-14 Thread Adrian Chadd
2009/9/15 Sachin Malave :
> On Tue, Sep 15, 2009 at 1:18 AM, Adrian Chadd  wrote:
>> Guys,
>>
>> Please look at what other multi-CPU network applications do, how they
>> work and don't work well, before continuing this kind of discussion.
>>
>> Everything that has been discussed has already been done to death
>> elsewhere. Please don't re-invent the wheel, badly.

> Yes synchronization is always expensive . So we must target only those
> areas where shared data is updated infrequently. Also if we are making
> thread then the amount of work done must be more as compared to
> overheads required in thread creation, synchronization & scheduling.

Current generation CPUs are a lot, lot better at the thread-style sync
primitives than older CPUs.

There are other things to think about, such as lockless queues,
transactional memory hackery, atomic instructions in general, etc.,
which depend entirely upon the type of hardware being targeted.

> If we try to provide locks to existing data structures then
> synchronization factor will definitely affect to our design.

> Redesigning of such structures and there behavior is time consuming
> and may change whole design of the Squid.


Adrian


Re: Squid-smp : Please discuss

2009-09-14 Thread Sachin Malave
On Tue, Sep 15, 2009 at 1:38 AM, Adrian Chadd  wrote:
> 2009/9/15 Sachin Malave :
>> On Tue, Sep 15, 2009 at 1:18 AM, Adrian Chadd  wrote:
>>> Guys,
>>>
>>> Please look at what other multi-CPU network applications do, how they
>>> work and don't work well, before continuing this kind of discussion.
>>>
>>> Everything that has been discussed has already been done to death
>>> elsewhere. Please don't re-invent the wheel, badly.
>
>> Yes synchronization is always expensive . So we must target only those
>> areas where shared data is updated infrequently. Also if we are making
>> thread then the amount of work done must be more as compared to
>> overheads required in thread creation, synchronization & scheduling.
>
> Current generation CPUs are a lot, lot better at the thread-style sync
> primitives than older CPUs.
>
> There's other things to think about, such as lockless queues,
> transactional memory hackery, atomic instructions in general, etc,
> etc, which depend entirely upon the type of hardware being targetted.
>
>> If we try to provide locks to existing data structures then
>> synchronization factor will definitely affect to our design.
>
>> Redesigning of such structures and there behavior is time consuming
>> and may change whole design of the Squid.
>
>
> Adrian
>



And current-generation libraries are also far better than older ones. With
OpenMP, for example, creating threads and handling synchronization issues is
very easy.

Automatic locks are provided, so you need not design your own locking
mechanisms: a single directive lets you lock a shared variable. The major
remaining work is to identify the shared accesses.

I want to use the OpenMP library.

Any suggestions?


Re: Squid-smp : Please discuss

2009-09-14 Thread Adrian Chadd
Guys,

Please look at what other multi-CPU network applications do, how they
work and don't work well, before continuing this kind of discussion.

Everything that has been discussed has already been done to death
elsewhere. Please don't re-invent the wheel, badly.



Adrian

2009/9/15 Robert Collins :
> On Tue, 2009-09-15 at 14:27 +1200, Amos Jeffries wrote:
>>
>>
>> RefCounting done properly forms a lock on certain read-only types like
>> Config. Though we are currently handling that for Config by leaking
>> the
>> memory out every gap.
>>
>> SquidString is not thread-safe. But StringNG with its separate
>> refcounted
>> buffers is almost there. Each thread having a copy of StringNG sharing
>> a
>> SBuf equates to a lock with copy-on-write possibly causing issues we
>> need
>> to look at if/when we get to that scope.
>
> General rule: you do /not/ want thread safe objectse for high usage
> objects like RefCount and StringNG.
>
> synchronisation is expensive; design to avoid synchronisation and hand
> offs as much as possible.
>
> -Rob
>
>


Re: Squid-smp : Please discuss

2009-09-14 Thread Sachin Malave
On Tue, Sep 15, 2009 at 1:18 AM, Adrian Chadd  wrote:
> Guys,
>
> Please look at what other multi-CPU network applications do, how they
> work and don't work well, before continuing this kind of discussion.
>
> Everything that has been discussed has already been done to death
> elsewhere. Please don't re-invent the wheel, badly.
>
>
>
> Adrian
>
> 2009/9/15 Robert Collins :
>> On Tue, 2009-09-15 at 14:27 +1200, Amos Jeffries wrote:
>>>
>>>
>>> RefCounting done properly forms a lock on certain read-only types like
>>> Config. Though we are currently handling that for Config by leaking
>>> the
>>> memory out every gap.
>>>
>>> SquidString is not thread-safe. But StringNG with its separate
>>> refcounted
>>> buffers is almost there. Each thread having a copy of StringNG sharing
>>> a
>>> SBuf equates to a lock with copy-on-write possibly causing issues we
>>> need
>>> to look at if/when we get to that scope.
>>
>> General rule: you do /not/ want thread safe objectse for high usage
>> objects like RefCount and StringNG.
>>
>> synchronisation is expensive; design to avoid synchronisation and hand
>> offs as much as possible.
>>
>> -Rob
>>
>>
>


Yes, synchronization is always expensive, so we must target only those
areas where shared data is updated infrequently. Also, if we create a
thread, the work it does must outweigh the overhead of thread creation,
synchronization, and scheduling.

If we try to add locks to the existing data structures, the synchronization
cost will definitely affect our design. Redesigning such structures and
their behavior is time-consuming and may change the whole design of Squid.

Whatever it takes, we have to move Squid to multi-core, because the future
is multi-core.

Anyway, there are still many questions in my mind, but as I am new here I
would like to spend some time analyzing the current design first. Please
bear with my pace.


-- 
Mr. S. H. Malave
MTech,
Computer Science & Engineering Dept.,
Walchand College of Engineering,
Sangli.
Mob. 9860470739
sachinmal...@wce.org.in


Re: Squid-smp : Please discuss

2009-09-14 Thread Robert Collins
On Tue, 2009-09-15 at 14:27 +1200, Amos Jeffries wrote:
> 
> 
> RefCounting done properly forms a lock on certain read-only types like
> Config. Though we are currently handling that for Config by leaking
> the
> memory out every gap.
> 
> SquidString is not thread-safe. But StringNG with its separate
> refcounted
> buffers is almost there. Each thread having a copy of StringNG sharing
> a
> SBuf equates to a lock with copy-on-write possibly causing issues we
> need
> to look at if/when we get to that scope. 

General rule: you do /not/ want thread-safe objects for high-usage
objects like RefCount and StringNG.

Synchronisation is expensive; design to avoid synchronisation and
hand-offs as much as possible.

-Rob



signature.asc
Description: This is a digitally signed message part


Re: Squid-smp : Please discuss

2009-09-14 Thread Kinkie
>>> I think we should take this on-list so the others with more detailed
>>> knowledge can give advice in case I have my facts wrong about
>>> AsyncCalls...
>>
>> I am afraid this discussion focuses on a small part of a much bigger
>> problem so finalizing the design decisions here may be counter
>> productive until we have an agreement on how to split Squid into
>> threads in general.
>
> Aye. Thats the idea.
>
>>
>> There are a few different ways to partition Squid and most of them
>> have been discussed in the past. I am not sure whether the discussions
>> have ever reached a consensus point. I am also not sure there is
>> consensus whether we should design for 2 cores, 8, cores, 16 cores, 32
>> cores, or more, or all of the above?
>
> The partitioning discussion must have happened well before my time. The
> last few years its been consensus that the components get partitioned into
> SourceLayout components and AsyncCalls codepaths.
> Further partitioning we have not discussed recently.

I'm going to kick-start a new round then. If the approach has already
been discussed, please forgive me and ignore this post.
The idea is: what if we tried using a shared-nothing approach?

Quick run-down: there is a farm of processes, each with its own cache_mem
and cache_dir(s). When a process receives a request, it parses it, hashes it
somehow (CARP or a variation thereof), and determines whether it should
handle it or whether some other process should. If it's some other process,
it uses a Unix socket and a simple serialization protocol to pass the parsed
request and the file descriptor along, so that the receiving process can pick
up and continue servicing the request.

There are some hairy bits (management, accounting, reconfiguration..)
and some less hairy bits (hashing algorithm to use, whether there is a
"master" process and a workers farm, or whether workers compete on
accept()ing), but on a first sight it would seem a simpler approach
than the extensive threading and locking we're talking about, AND it's
completely orthogonal to it (so it could be viewed as a medium-term
solution while AsyncCalls-ification remains active as a long-term
refactoring activity, which will eventually lead to a true MT-squid)

Please forgive me if it's a 5AM sleep-deprivation-induced brain-crap,
or if this approach was already discussed and discarded.

-- 
/kinkie


Re: Squid-smp : Please discuss

2009-09-14 Thread Amos Jeffries
On Mon, 14 Sep 2009 16:14:12 -0600 (MDT), Alex Rousskov
 wrote:
> On Mon, 14 Sep 2009, Amos Jeffries wrote:
> 
>> I think we should take this on-list so the others with more detailed 
>> knowledge can give advice in case I have my facts wrong about
>> AsyncCalls...
> 
> I am afraid this discussion focuses on a small part of a much bigger 
> problem so finalizing the design decisions here may be counter 
> productive until we have an agreement on how to split Squid into 
> threads in general.

Aye. Thats the idea.

> 
> There are a few different ways to partition Squid and most of them 
> have been discussed in the past. I am not sure whether the discussions 
> have ever reached a consensus point. I am also not sure there is 
> consensus whether we should design for 2 cores, 8, cores, 16 cores, 32 
> cores, or more, or all of the above?

The partitioning discussion must have happened well before my time. For the
last few years the consensus has been that the components get partitioned
into SourceLayout components and AsyncCalls codepaths. Further partitioning
we have not discussed recently.

> 
> There is also a question on how configurable everything should be. For 
> example, if the box has only two cores, will the user be able to 
> specify which threads are created and which created threads run on 
> which core? Also, will the user be able to specify whether memory or 
> disk cache is shared?

This can be decided and added after the initial experiments and locking
tests yes? When squid is starting to push CPU intensive stuff into threads
other than the current main one.

> 
> I also agree that the major difficulty here is not implementing the 
> threading code itself, but making sure that no shared data is accessed 
> unlocked. This is easy when you start from scratch, but we have a lot 
> of globals and static objects accessed all around the code. Protecting 
> each of them individually by hand would require a lot of coding and 
> risk. IMO, we need to add a few classes that would make sharing global 
> data simple and safe. This is where C++ would help a lot.
> 
> And even the protection of globals requires a high-level design plan: 
> do we protect global objects like Config or individual data types like 
> SquidString?

IMHO, objects/types which may be written to or deleted while a thread is
trying to read from them.

RefCounting done properly forms a lock on certain read-only types like
Config. Though we are currently handling that for Config by leaking the
memory out every gap.

SquidString is not thread-safe. But StringNG with its separate refcounted
buffers is almost there. Each thread having a copy of StringNG sharing a
SBuf equates to a lock with copy-on-write possibly causing issues we need
to look at if/when we get to that scope.

> 
> Finally, I agree that making accepting code into a thread may lead to 
> "too aggressive" incoming stream of requests that would overwhelm the 
> other parts of the system. I have recently observed that kind of 
> behavior in another threaded performance-focused application. This 
> does not mean that accept must not be a thread, but it means that we 
> should think how the overall design would prevent one thread from 
> overloading others with work.
> 
> Cheers,
> 
> Alex.
> 

The biggest question underlying SMP before we can even look at locking and
resources is whether AsyncCalls is a suitable interface between threads
(thread A schedules call, thread B runs it...) or do we need yet another
queuing mechanism.

Locking can be done iteratively/incrementally as things are converted. Each
resource will have its own challenges and requirements. We can't face them
all now and expect to get it right. When the async question is answered we
can look at exactly what the best way to lock the queue is.

This is a small scope, though, suitable for experimenting with some easily
adapted area of the code as a starting point, with a clearly measurable
works/fails outcome. Its only locking dependence is on the immediate data
and on which queue to add the next event/call/whatever to.

Then, as you point out, there is the question of how to prevent the accept()s
overloading the main queue as multiple threads funnel down to the old central
one. The 'safe' approach is to convert the hard way, from server-facing code
out to client-facing. Unfortunately that approach does involve a lot more
locking-design problems very early on.

Amos

> 
>> Sachin Malave wrote:
>>> On Sun, Sep 13, 2009 at 7:52 PM, Amos Jeffries  
>>> wrote:
 On Sun, 13 Sep 2009 07:12:56 -0400, Sachin Malave
 
 wrote:
> Hello Amos,
> 
> 
> As discussed before, I am analyzing codes for the changes as on
> http://wiki.squid-cache.org/Features/SmpScale, and have concentrated
> on epoll ( select ) implementations in squid. It is found that epoll
> is polling all the descriptors & processing them one by one. There is
> an important FD used by http port which is always busy, but has to
> wait for other descriptors in queue 

Re: Squid-smp : Please discuss

2009-09-14 Thread Alex Rousskov

On Mon, 14 Sep 2009, Amos Jeffries wrote:

I think we should take this on-list so the others with more detailed 
knowledge can give advice in case I have my facts wrong about AsyncCalls...


I am afraid this discussion focuses on a small part of a much bigger 
problem so finalizing the design decisions here may be counter 
productive until we have an agreement on how to split Squid into 
threads in general.


There are a few different ways to partition Squid and most of them 
have been discussed in the past. I am not sure whether the discussions 
have ever reached a consensus point. I am also not sure there is 
consensus whether we should design for 2 cores, 8, cores, 16 cores, 32 
cores, or more, or all of the above?


There is also a question on how configurable everything should be. For 
example, if the box has only two cores, will the user be able to 
specify which threads are created and which created threads run on 
which core? Also, will the user be able to specify whether memory or 
disk cache is shared?


I also agree that the major difficulty here is not implementing the 
threading code itself, but making sure that no shared data is accessed 
unlocked. This is easy when you start from scratch, but we have a lot 
of globals and static objects accessed all around the code. Protecting 
each of them individually by hand would require a lot of coding and 
risk. IMO, we need to add a few classes that would make sharing global 
data simple and safe. This is where C++ would help a lot.


And even the protection of globals requires a high-level design plan: 
do we protect global objects like Config or individual data types like 
SquidString?


Finally, I agree that making accepting code into a thread may lead to 
"too aggressive" incoming stream of requests that would overwhelm the 
other parts of the system. I have recently observed that kind of 
behavior in another threaded performance-focused application. This 
does not mean that accept must not be a thread, but it means that we 
should think how the overall design would prevent one thread from 
overloading others with work.


Cheers,

Alex.



Sachin Malave wrote:
On Sun, Sep 13, 2009 at 7:52 PM, Amos Jeffries  
wrote:

On Sun, 13 Sep 2009 07:12:56 -0400, Sachin Malave 
wrote:

Hello Amos,


As discussed before, I am analyzing codes for the changes as on
http://wiki.squid-cache.org/Features/SmpScale, and have concentrated
on epoll ( select ) implementations in squid. It is found that epoll
is polling all the descriptors & processing them one by one. There is
an important FD used by http port which is always busy, but has to
wait for other descriptors in queue to be processed.

Then, I also found that it is possible to separate the working of all FD
handlers, e.g. the FD used by http_port (tried).
This can be done by making some changes in the code.
I have been trying to code and test these changes for the last few days. Of
course this may not be correct or may need some improvements to meet our
requirements. Please give me feedback and tell me about dependencies I
might not have considered.

Again, one important issue: I know that making the changes mentioned below
will create and kill a thread after each timeout, but we can extend it
further and make a separate thread that never exits. We will discuss this
issue later; before everything else, please check the proposed changes so
that we can move further.


You mean the main http_port listener (port 3128 etc)?
This is currently not handled specially, due to there being more than one
listener FD in many Squid setups (multiple http_port and https_port then
other protocols like HTTPS, ICP, HTCP, DNS), any threading solution needs
to handle the listeners agnostic of what they do. Though splitting 
listener

FD accepts into a separate loop from other FD does seem sound.

Special pseudo-thread handling is already hacked up in a pseudo-thread
poller for DNS replies. Which is complicating the FD handling there. What
I'd like to see is resource-locking added to the Async queue when adding
new queue entries.

That allows making the whole select loop(s) happen in parallel to the rest
of Squid. Simply accepts and spawns AsyncJob/AsyncCall entries into the
main squid processing queue.

Workable?



*Changes are tagged with "NEW"

1.> inside client_side.cc

  void clientHttpConnectionsOpen(void)
{
 .
 httpfd=fd; //httfd now holding http file descriptor (NEW)
 .
 .
 .
 comm_accept(fd, httpAccept, s);

 }


2.>  inside comm_epoll.cc

int kdpfdHttp;
int useHttpThread = 1;

void comm_select_init(void)
 {

   peventsHttp = (struct epoll_event *) xmalloc(1 *
sizeof(struct epoll_event)); //NEW

   kdpfdHttp = epoll_create(1);  //NEW


}

void commSetSelect(int fd, unsigned int type, PF * handler,  void
*client_data, time_t timeout)
 {



 if (!F->flags.open) {
 if (useHttpThread)   //NEW

Re: Squid-smp : Please discuss

2009-09-14 Thread Henrik Nordstrom
On Mon, 2009-09-14 at 22:43 +1200, Amos Jeffries wrote:

> >>> on epoll ( select ) implementations in squid. It is found that epoll
> >>> is polling all the descriptors & processing them one by one. There is
> >>> an important FD used by http port which is always busy, but has to
> >>> wait for other descriptors in queue to be processed.

The http_port is far from that busy. Unless you are under heavy overload
the http_port is not ready on most poll loops. The only traffic seen on
that file descriptor is the accepting of new connections. Additionally,
given the nature of HTTP data flow, the timing between connection
establishment and processing the new connection is not very critical,
as long as it gets done within a reasonable time. There is a large margin
thanks to the time it takes for the client to send the first request.

> >>> Then, I also found that it is possible to separateworking of all fd
> >>> handlers , e.g fd used by http port.(tried)
> >>> This can be done by making some changes in codes.

The issue preventing this is how to deal with access to all the shared
data. Quite a few locks will need to be introduced if we start
introducing threads.

> >> Special pseudo-thread handling is already hacked up in a pseudo-thread
> >> poller for DNS replies. Which is complicating the FD handling there.

Hmm.. is there? Where?

Checking the state of epoll in Squid-3, I notice it's in the same shape
2.5 was in, missing the cleanups and generalizations done in Squid-2. In
Squid-2 we do have a set of "incoming file descriptors" which are polled
more frequently than the others if there is too much work in the select
loop. But users of high-traffic servers have found that for epoll this
makes accepting new connections too aggressive, and Squid runs better
(more smoothly) with this disabled.

> >> That allows making the whole select loop(s) happen in parallel to the rest
> >> of Squid. Simply accepts and spawns AsyncJob/AsyncCall entries into the
> >> main squid processing queue.
> >>
> >> Workable?

Yes, but I kind of doubt it will give any benefit at all.

When Squid is running under normal load the epoll queue length is in the
range of 10-40, not more, and only occasionally is the http_port FD
marked ready (far from always). The epoll call in itself is not a very
heavy operation; the heavy part is reading and processing the
request/response data.

But the upside is that this is an easy spot to attack to try adding and
experimenting with threading. The accept() is very isolated in the
amount of data it needs access to.

Regards
Henrik



Re: Squid-smp : Please discuss

2009-09-14 Thread Amos Jeffries


I think we should take this on-list so the others with more detailed 
knowledge can give advice in case I have my facts wrong about AsyncCalls...



Sachin Malave wrote:

On Sun, Sep 13, 2009 at 7:52 PM, Amos Jeffries  wrote:

On Sun, 13 Sep 2009 07:12:56 -0400, Sachin Malave wrote:

Hello Amos,


As discussed before, I am analyzing the code for the changes described on
http://wiki.squid-cache.org/Features/SmpScale, and have concentrated
on the epoll (select) implementation in Squid. I found that epoll
polls all the descriptors and processes them one by one. There is
an important FD used by http_port which is always busy, but has to
wait for the other descriptors in the queue to be processed.

Then I also found that it is possible to separate the handling of the
individual fd handlers, e.g. the fd used by http_port (tried).
This can be done by making some changes in the code.
I have been trying to code and test these changes for the last few days;
of course this may not be correct, or may need some improvements to meet
our requirements. Please give me feedback and tell me about dependencies
I might not have considered.

Again, one important issue: I know that making the changes mentioned
below will create and kill a thread after each timeout, but we can extend
this further and make a separate thread that never exits. We will
discuss this issue later; before everything else, please check the proposed
changes so that we can move further.


You mean the main http_port listener (port 3128 etc)?
This is currently not handled specially, due to there being more than one
listener FD in many Squid setups (multiple http_port and https_port, then
other protocols like ICP, HTCP, DNS); any threading solution needs
to handle the listeners agnostic of what they do. Though splitting listener
FD accepts into a separate loop from the other FDs does seem sound.

Special pseudo-thread handling is already hacked up in a pseudo-thread
poller for DNS replies. Which is complicating the FD handling there. What
I'd like to see is resource-locking added to the Async queue when adding
new queue entries.

That allows making the whole select loop(s) happen in parallel to the rest
of Squid. Simply accepts and spawns AsyncJob/AsyncCall entries into the
main squid processing queue.

Workable?



*Changes are tagged with "NEW"

1.> inside client_side.cc

void clientHttpConnectionsOpen(void)
{
    ...
    httpfd = fd; // httpfd now holds the http file descriptor (NEW)
    ...
    comm_accept(fd, httpAccept, s);
}


2.>  inside comm_epoll.cc

int kdpfdHttp;
int useHttpThread = 1;
struct epoll_event *peventsHttp;  //NEW

void comm_select_init(void)
{
    peventsHttp = (struct epoll_event *) xmalloc(1 * sizeof(struct epoll_event)); //NEW

    kdpfdHttp = epoll_create(1);  //NEW
}

void commSetSelect(int fd, unsigned int type, PF * handler, void *client_data, time_t timeout)
{
    ...
    if (!F->flags.open) {
        if (useHttpThread)   //NEW
            epoll_ctl(kdpfdHttp, EPOLL_CTL_DEL, fd, &ev);  //NEW
        else
            epoll_ctl(kdpfd, EPOLL_CTL_DEL, fd, &ev);
        return;
    }



    if (fd == getHttpfd()) {   //NEW
        printf("Setting epoll_ctl for httpfd=%d\n", getHttpfd());
        if (epoll_ctl(kdpfdHttp, epoll_ctl_type, fd, &ev) < 0) {
            debugs(5, DEBUG_EPOLL ? 0 : 8, "commSetSelect: epoll_ctl(," <<
                   epolltype_atoi(epoll_ctl_type) << ",,): failed on FD " << fd << ": " << xstrerror());
        }
    }


comm_err_t
comm_select(int msec)
{
    int num, i, fd = -1;
    fde *F;
    PF *hdl;

    //SHM: num2
    int num2 = -1;  //NEW
    //SHM: End

    struct epoll_event *cevents;
    struct epoll_event *ceventsHttp;
    printf("Inside comm_select:comm_epoll.cc\n");
    //PROF_start(comm_check_incoming);

    if (msec > max_poll_time)
        msec = max_poll_time;

    for (;;) {
        printf("(for(;;):Inside comm_select:comm_epoll.cc\n");
        num = epoll_wait(kdpfd, pevents, 1, msec);

        //SHM: epoll_wait for kdpfdHttp
        if (useHttpThread) {  //NEW
            printf("(for(;;):USEHTTP:Inside comm_select:comm_epoll.cc\n");
            num2 = epoll_wait(kdpfdHttp, peventsHttp, 1, msec);  //NEW
            printf("\n\n\n num2=%d\n\n\n", num2);
        }
        //SHM: End

        ++statCounter.select_loops;

        if (num >= 0 || num2 >= 0)  //NEW
            break;

        if (ignoreErrno(errno))
            break;

        getCurrentTime();

        //PROF_stop(comm_check_incoming);

        return COMM_ERROR;
    }

    //PROF_stop(comm_check_incoming);  //PLEASE DISCUSS THIS...

The PROF_* bits are rarely used. Removing them from here is acceptable as