tl;dr: I am searching for a pattern (later code) to apply expiration to 
operations.



Introduction:

One nice aspect of MongoDB is that it has built-in data distribution[1] and 
configurable retention[2]. The upstream project has a document called "Server 
Discovery and Monitoring (SDAM)", defining how a client should behave. Martin 
Dias is currently implementing SDAM in MongoTalk/Voyage and I took it on a test 
drive.


Behavior:

My software stack uses Zinc, Zinc-REST, Voyage and Mongo. When a new REST 
request arrives I use Voyage (e.g. >>#selectOne:), which in turn uses 
MongoTalk. The MongoTalk code needs to select the right server. It currently 
does so by waiting for a result.

Next I started to simulate database outages. The REST clients retried when not 
receiving a result within two seconds (no back-off/jitter). What happened was 
roughly the following:


[
        1.) ZnServer accepts a new connection
        2.) MongoTalk waits for a server longer than 2s
        "nothing.. the above waits..."
] repeat.




Problem:

What happened next surprised me. I expected to have a bad time when the 
database recovers and all the stale (remember the REST clients already gave up 
and closed the socket) requests will be answered. Instead my image crashed 
early in my test as the ExternalSemaphoreTable was full.

Let's focus on the timeout behavior and discuss the existence of the 
ExternalSemaphoreTable and the number of entries separately at a different time.




The two main problems I see are:


1.) Lack of back-pressure for ZnManagingMultiThreadedServer

2.) A disconnect between how long the application layer handling REST is 
allowed to take and, further down the stack, how long MongoTalk may sleep while 
waiting for a server.


The first item is difficult. Even answering HTTP 500 when we are out of space 
in the ExternalSemaphoreTable is difficult... Let's ignore this for now as well.






What I look for:


1.) Voluntary Timeout

Inside my application code I would like to tag an operation with a timeout. 
This means everything that is done inside it should complete within X seconds. 
It can be used on a voluntary basis.


>>#lookupPerson

   "We expect all database operations to complete within two seconds"
   person := ComputeContext current withTimeout: 2 seconds during: [
        repository selectOne: Person where: [:each | ...]
   ].
  


MongoTalk>>stuff
  "See if the outer context timeout has expired and signal. E.g. before writing
  something into the socket to keep consistency."
  ComputeContext current checkExpired.


MongoTalk>>other
  "Sleep for up to the remaining timeout and signal if it expired."
  (someSemaphore waitTimeoutContext: ComputeContext current) ifFalse: [
     SomethingExpired signal.
  ]



2.) Cancellation


More difficult to write in pseudo code (without TaskIt?). In my case above we 
are waiting for the database to be ready while the client has already closed 
the file descriptor. We are not able to notice this until much later.

The idea is that, in addition to the timeout, we can pass a block that is 
called when an operation should be cancelled, and that code further down the 
stack can ask the ComputeContext whether it has been cancelled?




The above takes inspiration from Go's context package[3]. In Go the context 
should be passed as a parameter, but we could make it a Process variable?





Question:

How do you handle this in your systems? Is this something we can consider for 
Pharo9? 



thanks
        holger








[1] It has the concept of a "replica set" and works by having a primary, 
secondaries and arbiters running.
[2] For every write one can configure if the write should succeed immediately 
(before it is even on disk) or when it has been written to multiple stores 
(e.g. majority, US and EMEA)
[3] https://golang.org/pkg/context/


