Hi, I would like to share a short story with you about what we have accomplished with PyPy and its friends so far.
The company I have worked for over the last 7 months (intentionally unnamed) gave me complete freedom to pick the technologies on which we based our solution. What we do is: crawl for PDFs and newspaper articles, download them, translate them if needed, OCR them if needed, run extensive analysis on the downloaded PDFs and articles, store them in more organized structures for faster querying, search them, and generate a bunch of complex reports.

From the very beginning I decided to go with PyPy no matter what. What we picked is the following:

* Flask as the web framework, plus a few of its extensions such as Flask-Login, Flask-Principal, Flask-WTF, Flask-Mail, etc.
* Cassandra as the database, because of its features and our great experience with it. PyCassa is used as the client to talk to the Cassandra server.
* ElasticSearch as a distributed search engine, with its client library pyes.
* Whoosh as a search engine, but with some modifications to support Cassandra as storage and distributed locking.
* Redis, with its client library redis-py, for caching and to speed up common auto-completion patterns.
* ZooKeeper, with its client library Kazoo, for distributed locking, which plays an essential role in the system by providing transaction-like behavior across many services at once.
* Celery in conjunction with RabbitMQ for task distribution.
* Sentry for error logging.

What we have developed on our own are wrappers and clients for:

* Moses, which is a language translator
* Tesseract, which is an OCR engine
* a Cassandra store for Whoosh
* wkhtmltopdf and wkhtmltoimage, which are used to convert HTML to PDF/images
* etc.

Now that the product is finished and in its final testing phase, I can say that we do not regret using PyPy and the stack around it. The typical speed improvement is 2x-3x over CPython in our case, but we are mostly IO and memory bound anyway, except for the Celery workers, where we do analysis that consists of many small CPU-intensive tasks exchanged via RabbitMQ.
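A minimal sketch of a common prefix auto-completion pattern, assuming terms are kept in lexicographic order (the way a Redis sorted set queried with ZRANGEBYLEX would keep them). The email does not say which pattern was actually used, so this is a plain-Python stand-in with no Redis dependency; `PrefixIndex`, `add` and `complete` are hypothetical names. Because the terms are sorted, every completion for a prefix sits in one contiguous run, so a binary search finds the start and a short scan collects the matches.

```python
import bisect


class PrefixIndex:
    """Hypothetical in-memory stand-in for a lexicographically sorted term set.

    Terms are stored sorted and de-duplicated, so all terms sharing a prefix
    occupy one contiguous slice of the list.
    """

    def __init__(self, terms=()):
        self._terms = sorted(set(terms))

    def add(self, term):
        # Insert while keeping the list sorted; skip duplicates (set semantics).
        i = bisect.bisect_left(self._terms, term)
        if i == len(self._terms) or self._terms[i] != term:
            self._terms.insert(i, term)

    def complete(self, prefix, limit=10):
        # Binary search for the first term that is >= prefix.
        start = bisect.bisect_left(self._terms, prefix)
        results = []
        # Matches are contiguous, so stop at the first non-matching term.
        for term in self._terms[start:start + limit]:
            if not term.startswith(prefix):
                break
            results.append(term)
        return results
```

With the terms "cassandra", "celery", "cell", "elasticsearch" and "redis" loaded, `complete("ce")` returns `["celery", "cell"]`.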
Another reason why we don't see a bigger speedup is that we depend on external software (servers) written in Erlang and Java. I'm already planning to write Python ports of Cassandra (as a distributed key/value-only database without index features), ZooKeeper, Redis and ElasticSearch for upcoming projects, and hopefully open-source them.

Regards,
Marko Tasic

_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev