On Thu, Feb 7, 2013 at 1:55 PM, Marko Tasic <mtasi...@gmail.com> wrote:
> Hi,
>
> I would like to share a short story with you about what we have
> accomplished with PyPy and its friends so far.
>
> The company I have worked for over the last 7 months (intentionally
> unnamed) gave me complete freedom to pick the technologies on which we
> based our solution. What we do is: crawl for PDFs and newspaper
> articles, download them, translate them if needed, OCR them if needed,
> do extensive analysis of the downloaded PDFs and articles, store them
> in more organized structures for faster querying, search for them and
> generate a bunch of complex reports.
>
> From the very beginning I decided to go with PyPy no matter what. What
> we picked is the following:
> * Flask as the web framework, and a few of its extensions such as
> Flask-Login, Flask-Principal, Flask-WTF, Flask-Mail, etc.
> * Cassandra as the database, because of its features and our great
> experience with it. PyCassa is used as the client to talk to the
> Cassandra server.
> * ElasticSearch as a distributed search engine, and its client
> library pyes.
> * Whoosh as a search engine, but with some modifications to support
> Cassandra as storage and distributed locking.
> * Redis, and its client library redis-py, for caching and to speed up
> common auto-completion patterns.
> * ZooKeeper, and its client library Kazoo, for distributed locking,
> which plays an essential role in the system for transaction-like
> behavior over many services at once (see the sketch below the quoted
> message).
> * Celery in conjunction with RabbitMQ for task distribution.
> * Sentry for error logging.
>
> What we have developed on our own are wrappers and clients for:
> * Moses, which is a language translator
> * Tesseract, which is an OCR engine
> * a Cassandra store for Whoosh
> * wkhtmltopdf and wkhtmltoimage, which are used for conversion of
> HTML to PDF/image
> * etc.
>
> Now that the product is finished and in the final testing phase, I can
> say that we did not regret going with PyPy and the stack around it.
> The typical speed improvement is 2x-3x over CPython in our case, but
> we are mostly IO and memory bound anyway, except for the Celery
> workers, where we do analysis, which again consists of many small
> CPU-intensive tasks exchanged via RabbitMQ. Another reason why we
> don't see a bigger speedup is that we are dependent on external
> software (servers) written in Erlang and Java.
>
> I'm already planning to do Cassandra (a distributed key/value-only
> database without index features), ZooKeeper, Redis and ElasticSearch
> ports in Python for the next projects, and hopefully open-source them.
>
> Regards,
> Marko Tasic
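Below is a minimal sketch, not from the original post, of the kind of
Kazoo-based distributed lock described in the quoted message: services
that want to touch the same document all acquire the same ZooKeeper lock
path first, which gives the transaction-like behavior across services.
The ZooKeeper host, lock path, and the store/index stubs are
illustrative assumptions, not the poster's actual code.

    # Sketch only: assumes a reachable ZooKeeper at 127.0.0.1:2181.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()

    def store_document(doc_id, data):
        # Placeholder standing in for the real Cassandra write.
        print("storing %s in Cassandra (stub)" % doc_id)

    def index_document(doc_id, data):
        # Placeholder standing in for the real ElasticSearch/Whoosh indexing.
        print("indexing %s in ElasticSearch (stub)" % doc_id)

    def update_document(doc_id, data):
        # One lock node per document; every service uses the same path,
        # so the write + index pair below runs under mutual exclusion.
        lock = zk.Lock("/locks/documents/" + doc_id, identifier="worker-1")
        with lock:  # blocks until acquired, released on exit
            store_document(doc_id, data)
            index_document(doc_id, data)

    update_document("article-42", {"title": "example"})
    zk.stop()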
Awesome! I'm glad people can make PyPy work for non-trivial tasks that require a lot of dependencies. We're trying to lower the bar, but it takes time.

Cheers,
fijal