Re: [Twisted-Python] spawnProcess - reapProcess not retrying on failures

2014-09-04 Thread Adi Roiban
On 3 September 2014 18:55,  exar...@twistedmatrix.com wrote:
 On 03:27 pm, a...@roiban.ro wrote:

 On 3 September 2014 14:39,  exar...@twistedmatrix.com wrote:
[snip]
 Do you have any suggestion for how the calls should be made?

 reactor.run(installSignalHandlers=True,  installStopHandlers=False)


 Perhaps.
[snip]

 It might be nice to try to be somewhat flexible - in case there's some
 reason to change what signals the reactor wants to handle in the future.
 Perhaps:

reactor.run(installSignalHandlers={SIGCHLD})

 An entirely different direction could be to make this bit of configuration
 into initialization for the reactor.

from twisted.internet.epollreactor import install
install(installSignalHandlers={SIGCHLD})

from twisted.internet import reactor
...
reactor.run()

 By keeping these details away from `IReactorCore.run`, that method remains
 maximally useful.  For example, if you could set up the reactor this way, a
 normal `twistd` plugin would still be able to benefit from your choice, even
 with twistd's naive call of `reactor.run()` with no extra arguments.

 Application code calling these `install` functions is already supported
 (it's how you select a specific reactor, after all).  Some of the install
 functions even accept arguments already.

 This would actually eliminate another existing issue - `IReactorCore.run` is
 actually defined to take no arguments.  The implementations ignore this
 because someone thought it was important to be able to disable installation
 of signal handlers.

I am happy to have a simple reactor.run() and move
installSignalHandlers somewhere else.

Working with install(installSignalHandlers={SIGCHLD}) seems a bit complicated,
as I assume that many developers rely on the automatic reactor installation.

At the same time, I assume that the 'installSignalHandlers' argument would
be supported by all reactors, which is why maybe we could
have something like:

from twisted.internet import reactor

def customHandler(signum, frame):
    pass

reactor.installSignalHandlers(
    SIGCHLD=True,          # Install default handler.
    SIGTERM=None,          # Don't install a handler.
    SIGINT=customHandler,  # Install custom handler.
    # SIGBREAK is not requested, so the default handler is installed.
    )
# reactor.installSignalHandlers() installs all default handlers.
reactor.run()



reactor.run(installSignalHandlers=True|False) would be deprecated.

In case reactor.installSignalHandlers is not called before run(), all
default handlers will be installed.
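To make the proposed semantics concrete, here is a rough standalone sketch using only the stdlib signal module. This is not real Twisted API; the function name and the mapping convention are just the proposal above (True installs the reactor's default handler, None skips the signal entirely, a callable installs a custom handler, and unmentioned signals get the default):

```python
import signal

def installSignalHandlers(defaults, **requested):
    """Sketch of the proposed mapping semantics (hypothetical API).

    ``defaults`` maps signal names (e.g. 'SIGINT') to the handler the
    reactor would normally install for that signal.
    """
    installed = {}
    for name, defaultHandler in defaults.items():
        choice = requested.get(name, True)  # unmentioned -> default handler
        if choice is None:
            continue  # explicitly skip installing a handler for this signal
        handler = defaultHandler if choice is True else choice
        signal.signal(getattr(signal, name), handler)
        installed[name] = handler
    return installed

def defaultInt(signum, frame):
    raise KeyboardInterrupt

def customInt(signum, frame):
    pass

# Custom handler for SIGINT; no handler at all for SIGTERM.
result = installSignalHandlers(
    {'SIGINT': defaultInt, 'SIGTERM': defaultInt},
    SIGINT=customInt,
    SIGTERM=None,
)
print(sorted(result))  # ['SIGINT']
```

The mapping-based call keeps the per-signal decisions in one place, which is the main attraction of the proposal over a single boolean flag.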

[snip]

 The sidecar process is an example of a general fix, though.  The idea there
 is that Twisted itself runs a private child process (perhaps only when the
 first call to spawnProcess is made).  It talks to that process using a file
 descriptor.  That process can install a SIGCHLD handler (because Twisted
 owns it, application developers don't get to say they don't want one
 installed) or use another more invasive strategy for child process
 management.  When you want to spawn a process, the main process tells the
 sidecar to do it.  The sidecar relays traffic between the child and the
 original parent (or does something involving passing file descriptors across
 processes).

 This removes the need to ever install a SIGCHLD handler in the main process.
 It also probably enables some optimizations (reapProcesses is O(N!) on the
 number of child processes right now) that are very tricky or impossible
 otherwise.

 Jean-Paul

Thanks for the details regarding the side-process dedicated to child
process management.

Not sure if we need a separate ticket for that, or add it as a comment
to https://twistedmatrix.com/trac/ticket/5710

Thanks!

-- 
Adi Roiban

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] spawnProcess - reapProcess not retrying on failures

2014-09-04 Thread exarkun

On 07:26 am, a...@roiban.ro wrote:

On 3 September 2014 18:55,  exar...@twistedmatrix.com wrote:

On 03:27 pm, a...@roiban.ro wrote:


On 3 September 2014 14:39,  exar...@twistedmatrix.com wrote:

[snip]

Do you have any suggestion for how the calls should be made?

reactor.run(installSignalHandlers=True,  installStopHandlers=False)




Also note there is an old, widely scoped ticket:

 https://twistedmatrix.com/trac/ticket/2415

with some more stuff (not necessarily directly related to your comments 
on signal handling) on it.


What would be really nice is if someone collected *all* of the 
complaints about `spawnProcess` into one place and integrated solutions 
to them into a design for a replacement. :)


Jean-Paul



Re: [Twisted-Python] Graceful shutdown of twistd application

2014-09-04 Thread exarkun

On 10:52 am, sangiova...@nweb.it wrote:

Hello list,

I need to implement a graceful shutdown procedure for a twistd 
application.

The application is made up of two services: an internet.TCPClient and
an internet.TCPServer. They're glued together with a MultiService
instance, which is in turn set to have 'application' as parent.
The server and the client work together, making a proxy (SMTP server
and AMQP client).

My goal is the following:
- intercept a SIGTERM signal
- 'block' on the server side: since it's SMTP I get this by setting a
variable that makes the server return tempfails (4xx) for new messages,
while keeping current sessions active
- wait until current requests are satisfied (I keep a dictionary of
current pending messages)
- shut the whole thing down


This is exactly what 'before' 'shutdown' triggers are for.  Alternatively, 
use the higher-level API and implement `stopService` on one of your 
services.


Either way, return a `Deferred` that only fires when you're satisfied it 
is time for shutdown to proceed.


You said before shutdown triggers are too late but you didn't say why. 
I think that's based on a misunderstanding - but if not, then explain 
why it doesn't work for your scenario.


Jean-Paul
What is the best solution for this use case? It's not really clear to me
how to catch SIGTERM and handle pending requests *before* the underlying
services start to shut down (i.e. even addSystemEventTrigger('before',
'shutdown', callable) is called too late for my needs).

Thank you very much for your help!

Fabio




Re: [Twisted-Python] Graceful shutdown of twistd application

2014-09-04 Thread Fabio Sangiovanni
On Thu, Sep 4, 2014 at 2:02 PM, exar...@twistedmatrix.com wrote:



 You said before shutdown triggers are too late but you didn't say why. I
 think that's based on a misunderstanding - but if not, then explain why it
 doesn't work for your scenario.


Hi, thanks for your reply.

I've tried the following:

from twisted.internet import defer, reactor
from twisted.python import log

def sleep(secs):
    log.msg('from within trigger')
    d = defer.Deferred()
    reactor.callLater(secs, d.callback, None)
    return d

reactor.addSystemEventTrigger('before', 'shutdown', sleep, 10)


This is what I can see in the logs:

Sep  4 14:25:06 prepyproxy01 proxy [4924]: [-] Received SIGTERM, shutting down.
Sep  4 14:25:06 prepyproxy01 proxy [4924]: [-] from within trigger
Sep  4 14:25:06 prepyproxy01 proxy [4924]: [TwistedProtocolConnection,client] twisted.internet.tcp.Connector instance at 0x05717be0 will retry in 2 seconds
Sep  4 14:25:06 prepyproxy01 proxy [4924]: [TwistedProtocolConnection,client] Stopping factory __builtin__.RabbitMQClientFactory instance at 0x057172c0
Sep  4 14:25:06 prepyproxy01 proxy [4924]: [-] (TCP Port 10025 Closed)
Sep  4 14:25:06 prepyproxy01 proxy [4924]: [-] Stopping factory __builtin__.TempfailingESMTPFactory instance at 0x057172a0
Sep  4 14:25:09 prepyproxy01 proxy [4924]: [-] Starting factory __builtin__.RabbitMQClientFactory instance at 0x057172c0
Sep  4 14:25:09 prepyproxy01 proxy [4924]: [TwistedProtocolConnection,client] [rmq01] RabbitMQ connection established
Sep  4 14:25:16 prepyproxy01 proxy [4924]: [TwistedProtocolConnection,client] twisted.internet.tcp.Connector instance at 0x05717be0 will retry in 2 seconds
Sep  4 14:25:16 prepyproxy01 proxy [4924]: [TwistedProtocolConnection,client] Stopping factory __builtin__.RabbitMQClientFactory instance at 0x057172c0
Sep  4 14:25:16 prepyproxy01 proxy [4924]: [-] Main loop terminated.
Sep  4 14:25:16 prepyproxy01 proxy [4924]: [-] Server Shut Down.

It seems to me that the shutdown phase doesn't wait for the deferred to
fire before stopping my client and server.
To be clear: my expected result is:
- SIGTERM
- pause 10s
- client/server shutdown

I am surely missing something, but I really can't figure out what.

Oh, for the record: I'm using Twisted 13.2.0 on PyPy.

Thanks!


[Twisted-Python] High throughput database logger

2014-09-04 Thread Adi Libotean

Hi,

I'm looking at various options for implementing a high throughput 
database logger that will work with Twisted.


My requirements, listed by importance:

1) small memory footprint
2) high speed
3) low garbage generation

The application I'm working on runs continuously (24/7). I've 
experimented a bit with pysqlite and Twisted to see which approach is 
better suited (see attached example).




Question 1: I noticed that all of the Twisted based versions are very 
slow compared to the plain sqlite3 test. This seems to be caused by 
atomic transaction management, namely a commit after each insert.


Would be interested to know if there is a simple way to avoid this and 
do my own transaction management (aka batch commit).


One other thing is the greatly varying amounts of garbage generated 
(peak memory) and memory usage between the Twisted variants.




Question 2: I would have expected B (Twisted ADBAPI) to behave very 
similar to C/E since I'm using a connection pool of size 1 and all 
requests are queued and handled sequentially.


Could any of you please give me some pointers as to why this is happening?



Question 3: Even though objgraph lists the exact same reference count 
once the code has run, the amount of used memory greatly differs. Any 
ideas what might be causing this?


Any suggestions and/or pointers on how to improve/do this are more than 
welcome.


Thank you for your time,
Adrian
import gc
import objgraph
import os
import sqlite3
import sys

from time import sleep
from twisted.enterprise.adbapi import ConnectionPool
from twisted.internet import defer, task, reactor


def _removeFile(path):
    try:
        os.unlink(path)
    except OSError:
        pass


def plain_sqlite3(conn, rows):
    query = 'INSERT INTO t (value) VALUES (1)'

    cursor = conn.cursor()
    for row in range(rows):
        cursor.execute(query)

    cursor.close()
    conn.commit()


def adbapi(pool, rows):
    query = 'INSERT INTO tw (value) VALUES (2)'

    last = None
    for row in range(rows):
        last = pool.runOperation(query)
        last.addCallback(lambda _: None)

    return last


def inline_callbacks(pool, rows):
    query = 'INSERT INTO tw (value) VALUES (3)'

    @defer.inlineCallbacks
    def do_insert():
        for row in range(rows):
            deferred = pool.runOperation(query)
            deferred.addCallback(lambda _: None)
            yield deferred

    return do_insert()


def semaphore(pool, rows):
    query = 'INSERT INTO tw (value) VALUES (4)'

    semaphore = defer.DeferredSemaphore(1)
    last = None
    for row in range(rows):
        last = semaphore.run(pool.runOperation, query)
        last.addCallback(lambda _: None)

    return last


def cooperator(pool, rows):
    query = 'INSERT INTO tw (value) VALUES (5)'

    def generator():
        for row in range(rows):
            deferred = pool.runOperation(query)
            deferred.addCallback(lambda _: None)
            yield deferred

    cooperator = task.Cooperator()
    return cooperator.coiterate(generator())


def run(callable, repeats):
    _removeFile('test-sq3.db3')
    conn = sqlite3.connect('./test-sq3.db3')
    cursor = conn.cursor()
    cursor.execute('CREATE TABLE t (id ROWID, value INTEGER)')

    for step in range(repeats):
        print "Run #%d %s..." % (step, callable)
        callable(conn, 2000)

    conn.close()

    cursor = None
    conn = None


def run_twisted(callable, repeats):
    _removeFile('test-twisted.db3')
    pool = ConnectionPool('sqlite3', cp_min=1, cp_max=1,
                          database='test-twisted.db3', check_same_thread=False)
    pool.runOperation('CREATE TABLE tw (id ROWID, value INTEGER)')

    last = None

    @defer.inlineCallbacks
    def execute():
        for step in range(repeats):
            print "Run #%d %s..." % (step, callable)
            last = callable(pool, 2000)
            yield last

        last.addCallback(lambda _: pool.close())
        last.addCallback(lambda _: reactor.stop())

    reactor.callWhenRunning(execute)
    reactor.run()

    last = None
    pool = None


gc.collect()
objgraph.show_growth()

#run(plain_sqlite3, 100)
#run_twisted(adbapi, 100)
#run_twisted(inline_callbacks, 100)
#run_twisted(semaphore, 100)
run_twisted(cooperator, 100)

print "Press ENTER to exit..."
sys.stdin.read(1)

gc.collect()
objgraph.show_growth()
A. Plain SQLite3


Memory: 17 Mb
Peak memory: 19 Mb


B. Twisted ADBAPI
-

Memory: 36 Mb
Peak memory: 240 Mb

wrapper_descriptor 1326   +15
function   2716   +13
dict   1895+8
getset_descriptor   444+5
weakref1067+4
member_descriptor   374+3
list331+3
method_descriptor   700+1
classobj108+1
module  165+1

C. Twisted Inline Callbacks
---

Memory: 21 Mb
Peak memory: 23 Mb


Re: [Twisted-Python] Graceful shutdown of twistd application

2014-09-04 Thread exarkun

On 12:36 pm, sangiova...@nweb.it wrote:

On Thu, Sep 4, 2014 at 2:02 PM, exar...@twistedmatrix.com wrote:



You said before shutdown triggers are too late but you didn't say why.
I think that's based on a misunderstanding - but if not, then explain
why it doesn't work for your scenario.


Hi, thanks for your reply.

I've tried the following:

def sleep(secs):
    log.msg('from within trigger')
    d = defer.Deferred()
    reactor.callLater(secs, d.callback, None)
    return d

reactor.addSystemEventTrigger('before', 'shutdown', sleep, 10)


All 'before' triggers are run concurrently.  If you're using 
`Application` then your `sleep` trigger runs concurrently with the 
application's `stopService` trigger (because `Application` has its 
stopService added as another 'before' 'shutdown' trigger alongside 
yours).


If you want to delay your application shutdown, you need to cooperate a 
little more closely with it.  Either attach your application shutdown 
code as a callback to the sleep Deferred or move the sleep into the 
stopService implementation of one of the services on your application 
and trigger the remaining stopService calls (eg the stopService call on 
the MultiService you mentioned) when the sleep Deferred fires there.


Jean-Paul



Re: [Twisted-Python] High throughput database logger

2014-09-04 Thread exarkun

On 12:51 pm, adi.libot...@proatria.com wrote:

Hi,

I'm looking at various options for implementing a high throughput 
database logger that will work with Twisted.


My requirements, listed by importance:

1) small memory footprint
2) high speed
3) low garbage generation

The application I'm working on runs continuously (24/7). I've 
experimented a bit with pysqlite and Twisted to see which approach is 
better suited (see attached example).




Question 1: I noticed that all of the Twisted based versions are very 
slow compared to the plain sqlite3 test. This seems to be caused by 
atomic transaction management, namely a commit after each insert.


Not only this but in some of the Twisted versions you've introduced a 
round-trip communication from the reactor thread to a worker thread 
between each operation.  This will greatly impact throughput by adding 
lots of latency to each insert.
Would be interested to know if there is a simple way to avoid this and 
do my own transaction management (aka batch commit).


Using twisted.enterprise.adbapi?  You could probably hack something 
horrible together but it would definitely be a hack.  I suggest you take 
a look at adbapi2 instead - http://trac.calendarserver.org/wiki/twext.
One other thing is the greatly varying amounts of garbage generated 
(peak memory) and memory usage between the Twisted variants.


Garbage and peak memory are different things.  The Twisted-using 
versions do a lot more - and some of your Twisted-using versions put 
the *entire* data set into memory (in a vastly expanded form, where each 
insert is represented by multiple large objects including Deferreds). 
So it's not too surprising the memory usage is greater.



Question 2: I would have expected B (Twisted ADBAPI) to behave very 
similar to C/E since I'm using a connection pool of size 1 and all 
requests are queued and handled sequentially.


Could any of you please give me some pointers as to why this is 
happening?


You didn't actually label the code with these letters. :)  I'm guessing 
B is the `adbapi` function, C is `inline_callbacks`, and E is 
`cooperator`.


Also you didn't say in what respect you expected them to behave 
similarly.  You expected their memory usage to be the same?  You 
expected their runtime to be the same?  You expected them to put the 
same data into the database?


As far as memory usage goes, B uses lots of memory for the same reason 
`semaphore` (D?) uses lots of memory.  You queue up the entire dataset 
in memory as piles of tuples, lists, Deferreds, etc.


adbapi might be executing the operations one at a time, but the *loop* 
inside `adbapi` runs all the way to the end all in one go.  It starts 
every one of those `runOperation`s before any of them (probably) has a 
chance to execute.



Question 3: Even though objgraph lists the exact same reference count 
once the code has run, the amount of used memory greatly differs. Any 
ideas what might be causing this?


Hopefully the above helps explain this.

Something else you might consider is batching up your inserts (not 
just committing after a batch of inserts).  Since SQLite3 can only 
write from a single thread at a time, you're effectively limited to 
serialized inserts - so it doesn't make sense to try to start a second 
insert before the first has finished.


When the first finishes, if 50 more data points have arrived, you 
should do one insert for all 50 of those - not 50 inserts each for one 
piece of data.  This cuts off a bunch of your overhead - Python 
objects, round-trip latency for inter-thread communication, function 
calls, etc.
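That batching idea can be sketched with the stdlib sqlite3 module alone (the table and column names are made up for the example):

```python
import sqlite3

def flushBatch(conn, batch):
    # One executemany plus one commit per batch, instead of one
    # execute/commit round-trip per data point.
    conn.executemany('INSERT INTO samples (value) VALUES (?)',
                     [(v,) for v in batch])
    conn.commit()

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE samples (id INTEGER PRIMARY KEY, value INTEGER)')

# Suppose 50 data points arrived while the previous insert was running:
pending = list(range(50))
flushBatch(conn, pending)
pending = []  # the whole backlog went in as a single statement

count = conn.execute('SELECT COUNT(*) FROM samples').fetchone()[0]
print(count)  # 50
```

Under Twisted you would run the equivalent through the connection pool (e.g. runInteraction, which manages the commit for you), starting each new batch only when the previous batch's Deferred has fired.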


Jean-Paul
Any suggestions and/or pointers on how to improve/do this are more than 
welcome.


Thank you for your time,
Adrian

