Re: taking python enterprise level?...
On Wed, 2010-03-03 at 16:23 -0500, D'Arcy J.M. Cain wrote:
> On Wed, 03 Mar 2010 20:39:35 +0100 mk wrote:
> > > If you denormalise the table, and update the first index to be on
> > > (client_id, project_id, date) it can end up running far more quickly -
>
> Maybe. Don't start with denormalization. Write it properly and only
> consider changing if profiling suggests that that is your bottleneck.

Quite - and I'd add: cache reads on front-end machines as much as your use case permits before considering denormalisation.

> With a decent database engine and proper design it will hardly ever be.

I completely agree - I'm simply responding to the request for an example where denormalisation may be a good idea.

Tim

--
http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
On Wed, 2010-03-03 at 20:39 +0100, mk wrote:
> Hello Tim,
>
> Pardon the questions but I haven't had the need to use denormalization
> yet, so:
> IOW you basically merged the tables like follows?
>
> CREATE TABLE projects (
>     client_id BIGINT NOT NULL,
>     project_id BIGINT NOT NULL,
>     cost INT,
>     date DATETIME,
>     INDEX(client_id, project_id, date)
> );

Yup

> From what you write further in the mail I conclude that you have not
> eliminated the first table, just made table projects look like I wrote
> above, right? (and used stored procedures to make sure that both tables
> contain the relevant data for client_id and project_id columns in both
> tables)

Yup

> Have you had some other joins on denormalized keys? i.e. in example how
> the join of hypothetical TableB with projects on projects.client_id
> behave with such big tables? (bc I assume that you obviously can't
> denormalize absolutely everything, so this implies the need of doing
> some joins on denormalized columns like client_id).

For these joins (for SELECT statements) this _can_ end up running faster - of course all of this depends on what kind of queries you normally end up getting and the distribution of data in the indexes. I've never written anything that started out with a schema like this, but several have ended up getting denormalised as the projects have matured and query behaviour has been tested.

> > assuming you can access the first mapping anyway -
>
> ? I'm not clear on what you mean here.

I'm referring to not eliminating the first table, as you concluded.

> Regards,
> mk
Re: taking python enterprise level?...
Philip Semanchuk wrote:
> Well OK, but that's a very different argument. Yes, joins can be
> expensive. They're often still the best option, though. The first step
> people usually take to get away from joins is denormalization which can
> improve SELECT performance at the expense of slowing down INSERTs,
> UPDATEs, and DELETEs, not to mention complicating one's code and data
> model. Is that a worthwhile trade?

I'd say that in more than 99% of situations: NO. More than that: if I hadn't normalized my data as it should have been normalized, I wouldn't be able to do the complicated querying that I really, really have to be able to do due to business logic. A few of my queries have a few hundred lines each, with many sub-queries and multiple many-to-many joins: I *dread the thought* of what would happen if I had to do that reliably in a denormalized db and still ensure data integrity across all the business logic contexts. And performance is still more than good enough: so there's no point for me, in the contexts I normally work in, to denormalize data at all. It's just interesting for me to see what happens in that <1% of situations.

> Depends on the application. As I said, sometimes the cure is worse than
> the disease. Don't worry about joins until you know they're a problem.
> As Knuth said, premature optimization is the root of all evil.

Sure -- the cost of joins is just interesting to me as a 'corner case'. I don't have datasets large enough for this to matter in the first place (and I probably won't have them that huge).

> PS - Looks like you're using Postgres -- excellent choice. I miss
> using it.

If you can, I'd recommend using an SQLAlchemy layer on top of Oracle/MySQL/SQLite, if that's what you have to use: this *largely* insulates you from lower-level DB problems, and it does the job of translating into a peculiar dialect very well.
For my purposes, SQLAlchemy worked wonderfully: it's very flexible; it has a middle-level SQL expression language if normal querying is not flexible enough (and normal querying is VERY flexible); it has a ton of nifty features like autoloading; it rarely fails because of some lower-level DB quirk; AND its high-level object syntax is so similar to SQL that you quickly & intuitively grasp it. (And if you have to/prefer writing some query in "low-level" SQL, as I have done a few times, it's still easy to make SQLAlchemy slurp the result into objects, provided you ensure all of the necessary columns are in the query result.)

Regards,
mk
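The "slurp a raw SQL result into objects" pattern mk describes is not SQLAlchemy-specific; a minimal stdlib-only sketch of the same idea uses sqlite3 with a namedtuple row factory (table and column names here are invented for illustration, not taken from mk's schema):

```python
import sqlite3
from collections import namedtuple

def namedtuple_factory(cursor, row):
    # Build a lightweight object from whatever columns the query returned,
    # so attribute access works much like an ORM-mapped object.
    Row = namedtuple("Row", [col[0] for col in cursor.description])
    return Row(*row)

conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
conn.execute("CREATE TABLE hosts (id INTEGER PRIMARY KEY, ip TEXT)")
conn.execute("INSERT INTO hosts (ip) VALUES ('10.0.0.1')")

host = conn.execute("SELECT id, ip FROM hosts").fetchone()
print(host.ip)
```

The key point matches mk's caveat: the factory can only expose columns that actually appear in the query result.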
Re: taking python enterprise level?...
till i think i absolutely need to trade off easier and less complicated code, better db structure (from a relational perspective) and generally fewer "headaches" for speed, i think i'll stick with the joins for now!...;) the thought of denormalization really doesn't appeal to me...
Re: taking python enterprise level?...
On Mar 3, 2010, at 3:58 PM, mk wrote:
> Philip Semanchuk wrote:
>>> So there *may* be some evidence that joins are indeed bad in
>>> practice. If someone has smth specific/interesting on the subject,
>>> please post.
>> It's an unprovable assertion, or a meaningless one depending on how
>> one defines the terms. You could also say "there *may* be some
>> evidence that Python lists are bad in practice". Python lists and SQL
>> JOINs are like any part of a language or toolkit. They're good tools
>> for solving certain classes of problems. They can also be misapplied
>> to problems that they're not so good at. Sometimes they're a
>> performance bottleneck, even when solving the problems for which
>> they're best. Sometimes the best way to solve a performance
>> bottleneck is to redesign your app/system so you don't need to solve
>> that kind of problem anymore (hence the join-less databases). Other
>> times, the cure is worse than the disease and you're better off
>> throwing hardware at the problem.
>
> Look, I completely agree with what you're saying, but: that doesn't
> change the possibility that joins may be expensive in comparison to
> other SQL operations. This is the phrase I should have used perhaps:
> 'expensive in comparison with other SQL operations' instead of 'bad'.

Well OK, but that's a very different argument. Yes, joins can be expensive. They're often still the best option, though. The first step people usually take to get away from joins is denormalization, which can improve SELECT performance at the expense of slowing down INSERTs, UPDATEs, and DELETEs, not to mention complicating one's code and data model. Is that a worthwhile trade? Depends on the application. As I said, sometimes the cure is worse than the disease. Don't worry about joins until you know they're a problem. As Knuth said, premature optimization is the root of all evil.

Good luck
Philip

PS - Looks like you're using Postgres -- excellent choice. I miss using it.
Re: taking python enterprise level?...
On Wed, 03 Mar 2010 20:39:35 +0100 mk wrote:
> > If you denormalise the table, and update the first index to be on
> > (client_id, project_id, date) it can end up running far more quickly -

Maybe. Don't start with denormalization. Write it properly and only consider changing if profiling suggests that that is your bottleneck. With a decent database engine and proper design it will hardly ever be.

> From what you write further in the mail I conclude that you have not
> eliminated the first table, just made table projects look like I wrote
> above, right? (and used stored procedures to make sure that both tables
> contain the relevant data for client_id and project_id columns in both
> tables)

Note that rather than speeding things up this could actually slow things down, depending on your usage. If you do lots of updates and you have to write extra information every time then that's worse than a few extra reads, especially since read data can be cached but written data must be pushed to disk immediately in an ACID database.

--
D'Arcy J.M. Cain                 | Democracy is three wolves
http://www.druid.net/darcy/      | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
Re: taking python enterprise level?...
Philip Semanchuk wrote:
>> So there *may* be some evidence that joins are indeed bad in
>> practice. If someone has smth specific/interesting on the subject,
>> please post.
> It's an unprovable assertion, or a meaningless one depending on how one
> defines the terms. You could also say "there *may* be some evidence
> that Python lists are bad in practice". Python lists and SQL JOINs are
> like any part of a language or toolkit. They're good tools for solving
> certain classes of problems. They can also be misapplied to problems
> that they're not so good at. Sometimes they're a performance
> bottleneck, even when solving the problems for which they're best.
> Sometimes the best way to solve a performance bottleneck is to redesign
> your app/system so you don't need to solve that kind of problem anymore
> (hence the join-less databases). Other times, the cure is worse than
> the disease and you're better off throwing hardware at the problem.

Look, I completely agree with what you're saying, but: that doesn't change the possibility that joins may be expensive in comparison to other SQL operations. This is the phrase I should have used perhaps: 'expensive in comparison with other SQL operations' instead of 'bad'.

Example from my app, where I behaved "by the book" (I hope) and normalized my data:

$ time echo "\c hrs;
SELECT hosts.ip, reservation.start_date, architecture.architecture,
  os_kind.os_kind, os_rel.os_rel, os_version.os_version,
  project.project, email.email
FROM hosts
INNER JOIN project ON project.id = hosts.project_id
INNER JOIN architecture ON hosts.architecture_id = architecture.id
INNER JOIN os_kind ON os_kind.id = hosts.os_kind_id
INNER JOIN os_rel ON hosts.os_rel_id = os_rel.id
INNER JOIN os_version ON hosts.os_version_id = os_version.id
INNER JOIN reservation_hosts ON hosts.id = reservation_hosts.host_id
INNER JOIN reservation ON reservation.id = reservation_hosts.reservation_id
INNER JOIN email ON reservation.email_id = email.id
;" | psql > /dev/null

real    0m0.099s
user    0m0.015s
sys     0m0.005s

$ time echo "\c hrs;
> SELECT hosts.ip FROM hosts;
> SELECT reservation.start_date FROM reservation;
> SELECT architecture.architecture FROM architecture;
> SELECT os_rel.os_rel FROM os_rel;
> SELECT os_version.os_version FROM os_version;
> SELECT project.project FROM project;
> SELECT email.email FROM email;
> " | psql > /dev/null

real    0m0.046s
user    0m0.008s
sys     0m0.004s

Note: I've created indexes on those tables, both on data columns like hosts.ip and on .id columns.

So yes, joins apparently are at least twice as expensive as simple selects without joins, on a small dataset. Not a drastic increase in cost, but something definitely shows. It would be interesting to see what happens when row numbers increase to large numbers, but I have no such data.

Regards,
mk
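mk's psql timing comparison can be re-run in miniature from Python (sqlite3 in memory rather than Postgres, with an invented two-table subset of the schema, so the absolute numbers mean little; only the shape of the measurement carries over):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, project TEXT);
CREATE TABLE hosts (id INTEGER PRIMARY KEY, ip TEXT,
                    project_id INTEGER REFERENCES project(id));
""")
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(i, "proj%d" % i) for i in range(1000)])
conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)",
                 [(i, "10.0.%d.%d" % (i // 256, i % 256), i % 1000)
                  for i in range(10000)])

def timed(sql):
    # Return (elapsed seconds, row count) for a query.
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, len(rows)

join_t, join_n = timed("SELECT hosts.ip, project.project FROM hosts "
                       "INNER JOIN project ON project.id = hosts.project_id")
flat_t, flat_n = timed("SELECT ip FROM hosts")
print(join_n, flat_n)  # timings vary run to run, so only counts are printed
```

As mk notes, a toy dataset like this only shows that the join costs *something* extra; it says nothing about behaviour at 10^8 rows.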
Re: taking python enterprise level?...
Hello Tim,

Pardon the questions but I haven't had the need to use denormalization yet, so:

Tim Wintle wrote:
> /* Table A */
> CREATE TABLE TableA (
>     project_id BIGINT NOT NULL,
>     cost INT,
>     date DATETIME,
>     PRIMARY KEY (project_id, date)
> );
>
> /* Table projects */
> CREATE TABLE projects (
>     client_id BIGINT NOT NULL,
>     project_id BIGINT NOT NULL,
>     INDEX(client_id)
> );
>
> now the index on TableA has been optimised for queries against date
> ranges on specific project ids which should more or less be sequential
> (under a load of other assumptions) - but that reduces the efficiency
> of the query under a join with the table "projects".
>
> If you denormalise the table, and update the first index to be on
> (client_id, project_id, date) it can end up running far more quickly -

IOW you basically merged the tables like follows?

CREATE TABLE projects (
    client_id BIGINT NOT NULL,
    project_id BIGINT NOT NULL,
    cost INT,
    date DATETIME,
    INDEX(client_id, project_id, date)
);

From what you write further in the mail I conclude that you have not eliminated the first table, just made table projects look like I wrote above, right? (and used stored procedures to make sure that both tables contain the relevant data for client_id and project_id columns in both tables)

Have you had some other joins on denormalized keys? I.e., in example, how does the join of hypothetical TableB with projects on projects.client_id behave with such big tables? (bc I assume that you obviously can't denormalize absolutely everything, so this implies the need of doing some joins on denormalized columns like client_id).

> assuming you can access the first mapping anyway -

? I'm not clear on what you mean here.

> so you're still storing the first table, with stored procedures to
> ensure you still have correct data in all tables.

Regards,
mk
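The merged-table reading of Tim's schema can be sketched end to end. This is a toy reconstruction (sqlite3 stands in for MySQL, so the inline `INDEX(...)` syntax becomes separate `CREATE INDEX` statements, and the stored procedures that would keep the two copies consistent are glossed over by inserting into both by hand):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- normalized pair, as in Tim's example
CREATE TABLE TableA (project_id INTEGER NOT NULL, cost INTEGER,
                     date TEXT, PRIMARY KEY (project_id, date));
CREATE TABLE projects (client_id INTEGER NOT NULL,
                       project_id INTEGER NOT NULL);
CREATE INDEX idx_client ON projects(client_id);

-- denormalised merge carrying the composite index
CREATE TABLE projects_flat (client_id INTEGER NOT NULL,
                            project_id INTEGER NOT NULL,
                            cost INTEGER, date TEXT);
CREATE INDEX idx_flat ON projects_flat(client_id, project_id, date);
""")
conn.execute("INSERT INTO TableA VALUES (1, 100, '2010-03-01')")
conn.execute("INSERT INTO projects VALUES (7, 1)")
conn.execute("INSERT INTO projects_flat VALUES (7, 1, 100, '2010-03-01')")

# Same question answered twice: once via the join, once from the flat table.
joined = conn.execute("""SELECT p.client_id, a.cost FROM projects p
                         JOIN TableA a ON a.project_id = p.project_id
                         WHERE p.client_id = 7""").fetchall()
flat = conn.execute("""SELECT client_id, cost FROM projects_flat
                       WHERE client_id = 7""").fetchall()
print(joined == flat)
```

The flat query answers directly from one index, which is the whole point of the trade; the cost, per the thread, is every write now touching two tables.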
Re: taking python enterprise level?...
On Mar 3, 2010, at 11:26 AM, mk wrote:
> D'Arcy J.M. Cain wrote:
>> I keep seeing this statement but nothing to back it up. I have created
>> many apps that run on Python with a PostgreSQL database with a fully
>> normalized schema and I can assure you that database joins were never
>> my problem unless I made a badly constructed query or left off a
>> critical index.
>
> I too have done that (Python/PGSQL), even adding a complicated layer of
> SQLAlchemy on top of it and have not had issue with this: when I
> profiled one of my apps, it turned out that it spent most of its
> computation time... rendering HTML. Completely unexpected: I expected
> DB to be bottleneck (although it might be that with huge datasets this
> might change).
>
> Having said that, re evidence that joins are bad: from what I've
> *heard* about Hibernate in Java from people who used it (I haven't used
> Hibernate apart from "hello world"), in case of complicated object
> hierarchies it supposedly generates a lot of JOINs and that supposedly
> kills DB performance.
>
> So there *may* be some evidence that joins are indeed bad in practice.
> If someone has smth specific/interesting on the subject, please post.

It's an unprovable assertion, or a meaningless one depending on how one defines the terms. You could also say "there *may* be some evidence that Python lists are bad in practice". Python lists and SQL JOINs are like any part of a language or toolkit. They're good tools for solving certain classes of problems. They can also be misapplied to problems that they're not so good at. Sometimes they're a performance bottleneck, even when solving the problems for which they're best. Sometimes the best way to solve a performance bottleneck is to redesign your app/system so you don't need to solve that kind of problem anymore (hence the join-less databases). Other times, the cure is worse than the disease and you're better off throwing hardware at the problem.

My $.02

Philip
Re: taking python enterprise level?...
mk wrote:
> D'Arcy J.M. Cain wrote:
>> I keep seeing this statement but nothing to back it up. I have created
>> many apps that run on Python with a PostgreSQL database with a fully
>> normalized schema and I can assure you that database joins were never
>> my problem unless I made a badly constructed query or left off a
>> critical index.
>
> I too have done that (Python/PGSQL), even adding a complicated layer of
> SQLAlchemy on top of it and have not had issue with this: when I
> profiled one of my apps, it turned out that it spent most of its
> computation time... rendering HTML. Completely unexpected: I expected DB
> to be bottleneck (although it might be that with huge datasets this
> might change).
>
> Having said that, re evidence that joins are bad: from what I've *heard*
> about Hibernate in Java from people who used it (I haven't used
> Hibernate apart from "hello world"), in case of complicated object
> hierarchies it supposedly generates a lot of JOINs and that supposedly
> kills DB performance.
>
> So there *may* be some evidence that joins are indeed bad in practice.
> If someone has smth specific/interesting on the subject, please post.

I suspect that this myth is propagated from the distributed database world: joining tables across two different servers can indeed be problematic from a performance point of view. However, the classic advice in database design is to start with a normalized design and then vary it only if you need to for performance reasons (which will also involve taking a hit on the coding side, especially if updates are involved).

regards
Steve

--
Steve Holden        +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010    http://us.pycon.org/
Holden Web LLC      http://www.holdenweb.com/
UPCOMING EVENTS:    http://holdenweb.eventbrite.com/
Re: taking python enterprise level?...
On Wed, 2010-03-03 at 17:26 +0100, mk wrote:
> So there *may* be some evidence that joins are indeed bad in
> practice.
> If someone has smth specific/interesting on the subject, please post.

I have found joins to cause problems in a few cases - I'm talking about relatively large tables though - roughly order 10^8 rows.

I'm on Mysql normally, but that shouldn't make any difference - I've seen almost the same situation on Oracle.

Consider this simple example:

/* Table A */
CREATE TABLE TableA (
    project_id BIGINT NOT NULL,
    cost INT,
    date DATETIME,
    PRIMARY KEY (project_id, date)
);

/* Table projects */
CREATE TABLE projects (
    client_id BIGINT NOT NULL,
    project_id BIGINT NOT NULL,
    INDEX(client_id)
);

... now the index on TableA has been optimised for queries against date ranges on specific project ids, which should more or less be sequential (under a load of other assumptions) - but that reduces the efficiency of the query under a join with the table "projects".

If you denormalise the table, and update the first index to be on (client_id, project_id, date), it can end up running far more quickly - assuming you can access the first mapping anyway - so you're still storing the first table, with stored procedures to ensure you still have correct data in all tables.

I'm definitely glossing over the details - but I've definitely got situations where I've had to choose denormalisation over purity of data.

Rolled-up data tables are another situation - where you know half your queries are grouping by field "A" it's sometimes a requirement to store that.

Tim
Re: taking python enterprise level?...
D'Arcy J.M. Cain wrote:
> I keep seeing this statement but nothing to back it up. I have created
> many apps that run on Python with a PostgreSQL database with a fully
> normalized schema and I can assure you that database joins were never
> my problem unless I made a badly constructed query or left off a
> critical index.

I too have done that (Python/PGSQL), even adding a complicated layer of SQLAlchemy on top of it, and have not had an issue with this: when I profiled one of my apps, it turned out that it spent most of its computation time... rendering HTML. Completely unexpected: I expected the DB to be the bottleneck (although it might be that with huge datasets this might change).

Having said that, re evidence that joins are bad: from what I've *heard* about Hibernate in Java from people who used it (I haven't used Hibernate apart from "hello world"), in case of complicated object hierarchies it supposedly generates a lot of JOINs and that supposedly kills DB performance.

So there *may* be some evidence that joins are indeed bad in practice. If someone has smth specific/interesting on the subject, please post.

Regards,
mk
Re: taking python enterprise level?...
In article , D'Arcy J.M. Cain wrote:
> Put as much memory as you can afford/fit into your database server.
> It's the cheapest performance boost you can get. If you have a serious
> application put at least 4GB into your dedicated database server.
> Swapping is your enemy.

Also, put your log/journal files on a different spindle from the database files. That makes a *huge* difference.

--
Aahz (a...@pythoncraft.com)  <*>  http://www.pythoncraft.com/
"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important."  --Henry Spencer
Re: taking python enterprise level?...
On Mon, 1 Mar 2010 16:20:06 -0800 (PST) mdipierro wrote:
> Joins are the bottleneck of most web apps that rely on relational
> databases. That is why non-relational databases such as Google App
> Engine, CouchDB, MongoDB do not even support joins. You have to try to
> minimize joins as much as possible by using tricks such as
> de-normalization and caching.

I keep seeing this statement but nothing to back it up. I have created many apps that run on Python with a PostgreSQL database with a fully normalized schema and I can assure you that database joins were never my problem unless I made a badly constructed query or left off a critical index.

> I meant 512MB. The point is you need a lot of ram because you want to
> run multiple python instances, cache in ram as much as possible and
> also allow the database to buffer in ram as much as possible. You will
> see Ram usage tends to spike when you have lots of concurrent
> requests.

Put as much memory as you can afford/fit into your database server. It's the cheapest performance boost you can get. If you have a serious application put at least 4GB into your dedicated database server. Swapping is your enemy.

--
D'Arcy J.M. Cain                 | Democracy is three wolves
http://www.druid.net/darcy/      | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
Re: taking python enterprise level?...
On Mar 1, 6:32 am, simn_stv wrote:
...
> > > You have to follow some tricks:
> > > 1) have the web server serve static pages directly and set the
> > >    pragma cache expire to one month
> > > 2) cache all pages that do not have forms for at least a few minutes
> > > 3) avoid database joins
>
> but this would probably be to the detriment of my database design,
> which is a no-no as far as im concerned. The way the tables would be
> structured requires 'joins' when querying the db; or could you
> elaborate a little??

Joins are the bottleneck of most web apps that rely on relational databases. That is why non-relational databases such as Google App Engine, CouchDB, MongoDB do not even support joins. You have to try to minimize joins as much as possible by using tricks such as de-normalization and caching.

> > > 4) use a server with at least 512KB Ram.
>
> hmmm...!, still thinking about what you mean by this statement also.

I meant 512MB. The point is you need a lot of RAM because you want to run multiple python instances, cache in RAM as much as possible and also allow the database to buffer in RAM as much as possible. You will see RAM usage tends to spike when you have lots of concurrent requests.
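Tricks (1) and (2) above amount to attaching long-lived cache headers to responses that carry no user-specific state, so browsers and front-end proxies answer repeat hits instead of Python. A minimal WSGI sketch (the header value and page body are illustrative, not from the thread):

```python
from wsgiref.util import setup_testing_defaults

ONE_MONTH = 30 * 24 * 3600  # seconds

def app(environ, start_response):
    # A page with no forms: safe to let clients and proxies keep it.
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Cache-Control", "public, max-age=%d" % ONE_MONTH),
    ])
    return [b"<html>static-ish page</html>"]

# Exercise the app without a real server.
environ = {}
setup_testing_defaults(environ)
captured = {}
def start_response(status, headers):
    captured["status"], captured["headers"] = status, headers
body = b"".join(app(environ, start_response))
print(captured["status"])
```

In practice the web server (trick 1) would serve truly static files itself and never reach this code; the header only matters for pages Python generates.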
Re: taking python enterprise level?...
On Mon, 1 Mar 2010 06:42:28 -0800 (PST) simn_stv wrote:
> On Feb 26, 10:19 am, "Diez B. Roggisch" wrote:
> > So when you talk about ACKs, you don't mean these on the TCP-level
> > (darn, whatever iso-level that is...), but on some higher level?
>
> i think its on the TCP that he's referring to or is it?...

No, I mean in our own application layer.

> if it is, that means he's doing some 'mean' network level scripting,
> impressive...but i never thought python could go that deep in network
> programming!...

What I meant was that we just keep sending packets which TCP/IP keeps in order for us by reassembling out-of-order and retransmitted packets. Asynchronously we sent back to our own application an ACK that our app-level packet was finally received. It's a sliding window protocol. http://en.wikipedia.org/wiki/Sliding_Window_Protocol

--
D'Arcy J.M. Cain                 | Democracy is three wolves
http://www.druid.net/darcy/      | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
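The application-layer bookkeeping D'Arcy describes - keep sending, track outstanding IDs, retry when an ACK is late, ignore duplicate ACKs - can be sketched with no networking at all. Every name here is invented for illustration; the NDA'd system surely looked nothing like this:

```python
import time

class UpdateSender:
    """Fire-and-forget sender with a separate ACK-tracking path."""

    def __init__(self, retry_after=5.0):
        self.retry_after = retry_after
        self.pending = {}      # update_id -> (payload, last_sent_at)
        self.acked = set()
        self.next_id = 0

    def send(self, payload):
        # Hand the update to the transport immediately; don't wait for an ACK.
        uid = self.next_id
        self.next_id += 1
        self.pending[uid] = (payload, time.monotonic())
        return uid

    def handle_ack(self, uid):
        # Run by the second process: ignore ACKs we've already processed.
        if uid in self.acked:
            return
        self.acked.add(uid)
        self.pending.pop(uid, None)

    def due_for_retry(self, now=None):
        # Updates whose ACK hasn't arrived soon enough get resent.
        now = time.monotonic() if now is None else now
        return [uid for uid, (_, sent) in self.pending.items()
                if now - sent >= self.retry_after]

sender = UpdateSender(retry_after=5.0)
a = sender.send("UPDATE accounts ...")
b = sender.send("UPDATE balances ...")
sender.handle_ack(a)
sender.handle_ack(a)           # duplicate ACK: harmless
print(sorted(sender.pending))  # only the un-ACKed update remains
```

The sliding-window character comes from `pending`: the sender never blocks on any single ACK, it just bounds and tracks what is in flight.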
Re: taking python enterprise level?...
On Feb 26, 10:19 am, "Diez B. Roggisch" wrote:
> Am 26.02.10 05:01, schrieb D'Arcy J.M. Cain:
> > On Fri, 26 Feb 2010 01:12:00 +0100 "Diez B. Roggisch" wrote:
> >>> That better way turned out to be asynchronous update transactions.
> >>> All we did was keep feeding updates to the remote site and forget
> >>> about ACKS. We then had a second process which handled ACKS and
> >>> tracked which packets had been properly transferred. The system had
> >>> IDs on each update and retries happened if ACKS didn't happen soon
> >>> enough. Naturally we ignored ACKS that we had already processed.
> >>
> >> sounds like using UDP to me, of course with a protocol on top (namely
> >> the one you implemented).
> >>
> >> Any reason you stuck to TCP instead?
> >
> > TCP does a great job of delivering a stream of data in order and
> > handling the retries. The app really was connection oriented and we
> > saw no reason to emulate that over an unconnected protocol. There were
> > other wheels to reinvent that were more important.
>
> So when you talk about ACKs, you don't mean these on the TCP-level
> (darn, whatever iso-level that is...), but on some higher level?
>
> Diez

i think its on the TCP that he's referring to or is it?... if it is, that means he's doing some 'mean' network level scripting, impressive...but i never thought python could go that deep in network programming!...
Re: taking python enterprise level?...
On Mon, 1 Mar 2010 04:11:07 -0800 (PST) simn_stv wrote:
> > All of the above (and much more complexity not even discussed here)
> > was handled by Python code and database manipulation. There were a few
> > bumps along the way but overall it worked fine. If we were using C or
> > even assembler we would not have sped up anything and the solution we
> > came up with would have been horrendous to code. As it was I and my
> > chief programmer locked ourselves in the boardroom and had a working
> > solution before the day was out.
>
> sure it wouldnt have sped it up a bit, even a bit?; probably the
> development and maintenance time would be a nightmare but it should
> speed the app up a bit...

What do you mean by "even a bit?" The bulk of the time is in sending bits on the wire. Computer time was always negligible in this situation. Yes, I can write all of my applications in assembler to get a 0.01% increase in speed but who cares? If you have decent algorithms in place then 99% of the time I/O will be your bottleneck, and if it isn't then you have a compute-heavy problem that assembler isn't going to fix. And even if I get a 100% increase in speed, I still lose. Computer time is cheaper than programmer time by so many orders of magnitude that it isn't even worth factoring in the speedup.

--
D'Arcy J.M. Cain                 | Democracy is three wolves
http://www.druid.net/darcy/      | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
Re: taking python enterprise level?...
On Feb 26, 10:32 am, mdipierro wrote:
> 100,000 hits a day is not a lot. I get that some days on my web server
> without problem and without one request dropped.
>
> Most frameworks - web2py, Django, Pylons - can handle that kind of load
> since Python is not the bottleneck.

taking a look at django right now, doesnt look too bad from where im standing, maybe when i get into the code i'd run into some issues that would cause some headaches!!

> You have to follow some tricks:
>
> 1) have the web server serve static pages directly and set the pragma
>    cache expire to one month
> 2) cache all pages that do not have forms for at least a few minutes
> 3) avoid database joins

but this would probably be to the detriment of my database design, which is a no-no as far as im concerned. The way the tables would be structured requires 'joins' when querying the db; or could you elaborate a little??

> 4) use a server with at least 512KB Ram.

hmmm...!, still thinking about what you mean by this statement also.

> 5) if your pages are large, use gzip compression
>
> If you develop your app with the web2py framework, you always have the
> option to deploy on the Google App Engine. If you can live with their
> constraints you should have no scalability problems.
>
> Massimo
>
> On Feb 25, 4:26 am, simn_stv wrote:
> > hello people, i have been reading posts on this group for quite some
> > time now and many, if not all (actually not all!), seem quite
> > interesting.
> > i plan to build an application, a network based application that i
> > estimate (and seriously hope) would get as many as 100,000 hits a day
> > (hehe,...my dad always told me to 'AIM HIGH' ;0), not some 'facebook'
> > or anything like it, its mainly for financial transactions which
> > gets pretty busy...
> > so my question is this: would anyone have anything that would make
> > python a little less of a serious candidate (cos it already is) and
> > the options may be to use some other languages (maybe java, C (oh
> > God))...i am into a bit of php and building APIs in php would not be
> > the hard part, what i am concerned about is scalability and
> > efficiency, well, as far as the 'core' is concerned.
> >
> > would python be able to manage giving me a solid 'core' and will i be
> > able to use python to provide any API i would like to implement?...
> >
> > im sorry if my subject was not as clear as probably should be!.
> > i guess this should be the best place to ask this sort of thing, hope
> > im so right.
> >
> > Thanks

thanks for the feedback...
Re: taking python enterprise level?...
On Feb 25, 5:18 pm, "D'Arcy J.M. Cain" wrote:
> On Thu, 25 Feb 2010 15:29:34 + "Martin P. Hellwig" wrote:
> > On 02/25/10 13:58, D'Arcy J.M. Cain wrote:
> > > On Thu, 25 Feb 2010 02:26:18 -0800 (PST)
> > > Our biggest problem was in a network heavy element of the app and
> > > that was low level TCP/IP stuff that rather than being Python's
> > > problem was something we used Python to fix.
> >
> > Out of interest, could you elaborate on that?
>
> Somewhat - there is an NDA so I can't give exact details. It was
> crucial to our app that we sync up databases in Canada and the US
> (later Britain, Europe and Japan) in real time with those transactions.
> Our problem was that even though our two server systems were on the
> backbone, indeed with the same major carrier, we could not keep them in
> sync. We were taking way too long to transact an update across the
> network.
>
> The problem had to do with the way TCP/IP works, especially closer to
> the core. Our provider was collecting data and sending it only after
> filling a buffer or after a timeout. The timeout was short so it
> wouldn't normally be noticed and in most cases (web pages e.g.) the
> connection is opened, data is pushed and the connection is closed so
> the buffer is flushed immediately. Our patterns were different so we
> were hitting the timeout on every single transaction and there was no
> way we would have been able to keep up.
>
> Our first crack at fixing this was to simply add garbage to the packet
> we were sending. Making the packets an order of magnitude bigger sped
> up the processing dramatically. That wasn't a very clean solution
> though so we looked for a better way.
>
> That better way turned out to be asynchronous update transactions. All
> we did was keep feeding updates to the remote site and forget about
> ACKS. We then had a second process which handled ACKS and tracked which
> packets had been properly transferred. The system had IDs on each
> update and retries happened if ACKS didn't happen soon enough.
> Naturally we ignored ACKS that we had already processed.
>
> All of the above (and much more complexity not even discussed here) was
> handled by Python code and database manipulation. There were a few
> bumps along the way but overall it worked fine. If we were using C or
> even assembler we would not have sped up anything and the solution we
> came up with would have been horrendous to code. As it was I and my
> chief programmer locked ourselves in the boardroom and had a working
> solution before the day was out.

sure it wouldnt have sped it up a bit, even a bit?; probably the development and maintenance time would be a nightmare but it should speed the app up a bit...

> Python wins again.
>
> --
> D'Arcy J.M. Cain                 | Democracy is three wolves
> http://www.druid.net/darcy/      | and a sheep voting on
> +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.

seriously added to the reputation of python, from my own perspective...kudos python!
Re: taking python enterprise level?...
On Feb 25, 12:21 pm, "Martin P. Hellwig" wrote: > On 02/25/10 10:26, simn_stv wrote: > > what i am concerned about is scalability and > > efficiency, well, as far as the 'core' is concerned. > > > would python be able to manage giving me a solid 'core' and will i be > > able to use python provide any API i would like to implement?... > > > Python isn't the most efficient language, the assembler provided by the > maker of your CPU probably is the best you can get, LOL...;), yeah right, the mere thought of writing assembler instructions is SCARY!! >everything after > that is a trade-off between performance and flexibility (flexible in the > most flexible sense of the word :-)). > > That being said, for me, Python (well actually any turing complete > programming language), is more like a box of lego with infinite amount > of pieces. > Scalability and API issues are the same as the shape and function of the > model your making with lego. > > Sure some type of pieces might be more suited than other types but since > you can simulate any type of piece with the ones that are already > present, you are more limited by your imagination than by the language. > > So in short, I don't see any serious problems using Python, I have used > it in Enterprise environments without any problems but than again I was > aware not to use it for numerical intensive parts without the use of 3rd > party libraries like numpy. Which for me resulted in not doing the > compression of a database delta's in pure python but to offload that to > a more suitable external program, still controlled from Python though. > > -- > mph -- http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
100,000 hits a day is not a lot. I get that some days on my web server without problem and without one request dropped. Most frameworks (web2py, Django, Pylons) can handle that kind of load since Python is not the bottleneck. You have to follow some tricks:

1) have the web server serve static pages directly and set the cache expiry to one month
2) cache all pages that do not have forms for at least a few minutes
3) avoid database joins
4) use a server with at least 512MB of RAM
5) if your pages are large, use gzip compression

If you develop your app with the web2py framework, you always have the option to deploy on the Google App Engine. If you can live with their constraints you should have no scalability problems.

Massimo

On Feb 25, 4:26 am, simn_stv wrote: > hello people, i have been reading posts on this group for quite some > time now and many, if not all (actually not all!), seem quite > interesting. > i plan to build an application, a network based application that i > estimate (and seriously hope) would get as many as 100, 000 hits a day > (hehe,...my dad always told me to 'AIM HIGH' ;0), not some 'facebook' > or anything like it, its mainly for a financial transactions which > gets pretty busy... > so my question is this would anyone have anything that would make > python a little less of a serious candidate (cos it already is) and > the options may be to use some other languages (maybe java, C (oh > God))...i am into a bit of php and building API's in php would not be > the hard part, what i am concerned about is scalability and > efficiency, well, as far as the 'core' is concerned. > > would python be able to manage giving me a solid 'core' and will i be > able to use python provide any API i would like to implement?... > > im sorry if my subject was not as clear as probably should be!. > i guess this should be the best place to ask this sort of thing, hope > im so right. > > Thanks -- http://mail.python.org/mailman/listinfo/python-list
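[Editorial aside: trick 5 above (gzip compression) is normally switched on in the web server or framework configuration rather than coded by hand, but the effect is easy to see with the standard library. A minimal sketch, not taken from any poster's code:]

```python
import gzip

# Sketch of trick 5: gzip-compress a large, repetitive response body
# before sending it over the wire (servers/frameworks usually do this
# for you via configuration).
body = b"<html>" + b"hello world " * 1000 + b"</html>"
compressed = gzip.compress(body)
print(len(compressed) < len(body))          # True: the page shrinks a lot
print(gzip.decompress(compressed) == body)  # True: the round trip is lossless
```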
Re: taking python enterprise level?...
Am 26.02.10 05:01, schrieb D'Arcy J.M. Cain:
> On Fri, 26 Feb 2010 01:12:00 +0100
> "Diez B. Roggisch" wrote:
> > > That better way turned out to be asynchronous update transactions. All we
> > > did was keep feeding updates to the remote site and forget about ACKS.
> > > We then had a second process which handled ACKS and tracked which
> > > packets had been properly transferred. The system had IDs on each
> > > update and retries happened if ACKS didn't happen soon enough.
> > > Naturally we ignored ACKS that we had already processed.
> >
> > sounds like using UDP to me, of course with a protocol on top (namely
> > the one you implemented).
> >
> > Any reason you stuck with TCP instead?
>
> TCP does a great job of delivering a stream of data in order and
> handling the retries. The app really was connection oriented and we
> saw no reason to emulate that over an unconnected protocol. There were
> other wheels to reinvent that were more important.

So when you talk about ACKs, you don't mean these on the TCP level (darn, whatever ISO layer that is...), but on some higher level?

Diez
--
http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
On Fri, 26 Feb 2010 01:12:00 +0100 "Diez B. Roggisch" wrote:
> > That better way turned out to be asynchronous update transactions. All we
> > did was keep feeding updates to the remote site and forget about ACKS.
> > We then had a second process which handled ACKS and tracked which
> > packets had been properly transferred. The system had IDs on each
> > update and retries happened if ACKS didn't happen soon enough.
> > Naturally we ignored ACKS that we had already processed.
>
> sounds like using UDP to me, of course with a protocol on top (namely
> the one you implemented).
>
> Any reason you stuck with TCP instead?

TCP does a great job of delivering a stream of data in order and handling the retries. The app really was connection oriented and we saw no reason to emulate that over an unconnected protocol. There were other wheels to reinvent that were more important.

--
D'Arcy J.M. Cain            | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
--
http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
> That better way turned out to be asynchronous update transactions. All we
> did was keep feeding updates to the remote site and forget about ACKS.
> We then had a second process which handled ACKS and tracked which
> packets had been properly transferred. The system had IDs on each
> update and retries happened if ACKS didn't happen soon enough.
> Naturally we ignored ACKS that we had already processed.

sounds like using UDP to me, of course with a protocol on top (namely the one you implemented).

Any reason you stuck with TCP instead?

Diez
--
http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
In article , "D'Arcy J.M. Cain" wrote: > The problem had to do with the way TCP/IP works, especially closer to > the core. Our provider was collecting data and sending it only after > filling a buffer or after a timeout. The timeout was short so it > wouldn't normally be noticed and in most cases (web pages e.g.) the > connection is opened, data is pushed and the connection is closed so > the buffer is flushed immediately. Our patterns were different so we > were hitting the timeout on every single transaction and there was no > way we would have been able to keep up. > > Our first crack at fixing this was to simply add garbage to the packet > we were sending. Making the packets an order of magnitude bigger sped > up the proccessing dramatically. That wasn't a very clean solution > though so we looked for a better way. Interesting, the system I'm working with now has a similar problem. We've got a request/ack protocol over TCP which often sends lots of small packets and can have all sorts of performance issues because of this. In fact, we break completely on Solaris-10 with TCP Fusion enabled. We've gone back and forth with Sun on this (they claim what we're doing is broken, we claim TCP Fusion is broken). In the end, we just tell all of our Solaris-10 customers to disable TCP Fusion. -- http://mail.python.org/mailman/listinfo/python-list
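[Editorial aside: a protocol that exchanges lots of small request/ack packets over TCP often runs into the same kind of coalescing at the OS level too, from Nagle's algorithm. Disabling it per socket is a common first experiment for latency-sensitive traffic; note this only addresses kernel-side buffering, not buffering done by a provider in the network, and the snippet is a sketch rather than anything from the thread:]

```python
import socket

# Sketch: turn off Nagle's algorithm so small writes go out immediately
# instead of being coalesced into larger segments by the kernel.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```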
Re: taking python enterprise level?...
On 02/25/10 16:18, D'Arcy J.M. Cain wrote:

Very interesting, I had a similar kind of problem (a network balancer that doesn't balance small TCP packets too well) and solved it by wrapping the TCP packets in UDP. UDP was treated differently: although in the overall switch and router management it has a lower priority compared to TCP packets, in normal usage it was faster. Probably because UDP gives the equipment less to inspect and so can be processed faster by all the network gear in between, but to be honest it worked for me, the client wasn't interested in academic explanations, and since this was a working solution I didn't investigate it any further.

Oh, and a big thank you for PyGreSQL! It has proven to be an extremely useful module for me (especially since I used to hop a lot between different Unixes and Windows).

--
mph
--
http://mail.python.org/mailman/listinfo/python-list
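[Editorial aside: wrapping traffic in UDP means doing your own framing and reliability on top, but the basic datagram mechanics are short. A minimal loopback sketch, assuming a working loopback interface and carrying none of the real balancer setup:]

```python
import socket

# Sketch: one datagram round trip on the loopback interface. A real
# TCP-in-UDP wrapper would add framing, sequence numbers and retries.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))       # let the OS pick a free port
receiver.settimeout(5.0)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"wrapped-payload", receiver.getsockname())
data, _ = receiver.recvfrom(1024)
print(data.decode())
sender.close()
receiver.close()
```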
Re: taking python enterprise level?...
In article <5cd38064-34d6-40d3-b3dc-2c853fc86...@i39g2000yqm.googlegroups.com>, simn_stv wrote: > >i plan to build an application, a network based application that i >estimate (and seriously hope) would get as many as 100, 000 hits a day >(hehe,...my dad always told me to 'AIM HIGH' ;0), not some 'facebook' >or anything like it, its mainly for a financial transactions which >gets pretty busy... Remember that YouTube runs on Python. -- Aahz (a...@pythoncraft.com) <*> http://www.pythoncraft.com/ "Many customs in this life persist because they ease friction and promote productivity as a result of universal agreement, and whether they are precisely the optimal choices is much less important." --Henry Spencer -- http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
On Thu, 25 Feb 2010 15:29:34 + "Martin P. Hellwig" wrote:
> On 02/25/10 13:58, D'Arcy J.M. Cain wrote:
> > On Thu, 25 Feb 2010 02:26:18 -0800 (PST)
> > > Our biggest problem was in a network heavy element of the app and
> > > that was low level TCP/IP stuff that rather than being Python's
> > > problem was something we used Python to fix.
>
> Out of interest, could you elaborate on that?

Somewhat - there is an NDA so I can't give exact details. It was crucial to our app that we sync up databases in Canada and the US (later Britain, Europe and Japan) in real time with those transactions. Our problem was that even though our two server systems were on the backbone, indeed with the same major carrier, we could not keep them in sync. We were taking way too long to transact an update across the network.

The problem had to do with the way TCP/IP works, especially closer to the core. Our provider was collecting data and sending it only after filling a buffer or after a timeout. The timeout was short so it wouldn't normally be noticed and in most cases (web pages e.g.) the connection is opened, data is pushed and the connection is closed so the buffer is flushed immediately. Our patterns were different so we were hitting the timeout on every single transaction and there was no way we would have been able to keep up.

Our first crack at fixing this was to simply add garbage to the packet we were sending. Making the packets an order of magnitude bigger sped up the processing dramatically. That wasn't a very clean solution though so we looked for a better way.

That better way turned out to be asynchronous update transactions. All we did was keep feeding updates to the remote site and forget about ACKS. We then had a second process which handled ACKS and tracked which packets had been properly transferred. The system had IDs on each update and retries happened if ACKS didn't happen soon enough. Naturally we ignored ACKS that we had already processed.
All of the above (and much more complexity not even discussed here) was handled by Python code and database manipulation. There were a few bumps along the way but overall it worked fine. If we were using C or even assembler we would not have sped up anything and the solution we came up with would have been horrendous to code. As it was I and my chief programmer locked ourselves in the boardroom and had a working solution before the day was out. Python wins again. -- D'Arcy J.M. Cain | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- http://mail.python.org/mailman/listinfo/python-list
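[Editorial aside: the real code is under NDA, but the send-and-forget bookkeeping described above — an ID per update, a separate ACK handler, retries on timeout, duplicate ACKs ignored — can be sketched roughly like this. Every name here is hypothetical:]

```python
import time

class UpdateTracker:
    """Hypothetical sketch of the ACK/retry bookkeeping described above:
    updates are sent optimistically; a separate process records ACKs and
    re-queues anything unacknowledged after a timeout."""

    def __init__(self, retry_after=5.0):
        self.retry_after = retry_after
        self.pending = {}   # update_id -> time of last send
        self.acked = set()

    def sent(self, update_id, now=None):
        self.pending[update_id] = now if now is not None else time.time()

    def ack(self, update_id):
        # Ignore ACKs we have already processed, as the post notes.
        if update_id in self.acked:
            return
        self.acked.add(update_id)
        self.pending.pop(update_id, None)

    def due_for_retry(self, now=None):
        now = now if now is not None else time.time()
        return [uid for uid, t in self.pending.items()
                if now - t >= self.retry_after]

tracker = UpdateTracker(retry_after=5.0)
tracker.sent("upd-1", now=0.0)
tracker.sent("upd-2", now=0.0)
tracker.ack("upd-1")
print(tracker.due_for_retry(now=6.0))  # only the unacknowledged update
```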
Re: taking python enterprise level?...
On 02/25/10 13:58, D'Arcy J.M. Cain wrote:
> On Thu, 25 Feb 2010 02:26:18 -0800 (PST)
> Our biggest problem was in a network heavy element of the app and that
> was low level TCP/IP stuff that rather than being Python's problem was
> something we used Python to fix.

Out of interest, could you elaborate on that?

Thanks
--
mph
--
http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
On Thu, 25 Feb 2010 02:26:18 -0800 (PST) simn_stv wrote:
> i plan to build an application, a network based application that i
> estimate (and seriously hope) would get as many as 100,000 hits a day

That's nothing. I ran a financial type app on Python that sometimes hit 100,000 transactions an hour. We kept looking for bottlenecks that we could convert to C but never found any. Our biggest problem was in a network heavy element of the app and that was low level TCP/IP stuff that rather than being Python's problem was something we used Python to fix.

As others have pointed out, you will want some kind of enterprise database that will do a lot of the heavy lifting. I suggest PostgreSQL. It is the best open source database engine around. That will take the biggest load off your app. There are lots of decisions to make in the days ahead but I think that choosing Python as your base language is a good first one.

> so my question is this would anyone have anything that would make
> python a little less of a serious candidate (cos it already is) and
> the options may be to use some other languages (maybe java, C (oh
> God))...i am into a bit of php and building API's in php would not be
> the hard part, what i am concerned about is scalability and
> efficiency, well, as far as the 'core' is concerned.

Scalability and efficiency won't be your issues. Speed of development and clarity of code will be. Python wins.

--
D'Arcy J.M. Cain            | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
--
http://mail.python.org/mailman/listinfo/python-list
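[Editorial aside: "we kept looking for bottlenecks that we could convert to C but never found any" is the usual outcome of measuring before rewriting. A minimal profiling sketch with a made-up workload, nothing from the actual app:]

```python
import cProfile
import io
import pstats

# Sketch: profile first, and only consider porting hot spots to C if the
# profile actually shows a Python-level bottleneck (it often doesn't).
def work():
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print("work" in report.getvalue())  # True: the hot function shows up
```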
Re: taking python enterprise level?...
simn_stv wrote:
> i plan to build an application, a network based application that i
> estimate (and seriously hope) would get as many as 100, 000 hits a day
> (hehe,...my dad always told me to 'AIM HIGH' ;0), not some 'facebook'
> or anything like it, its mainly for a financial transactions which
> gets pretty busy...
> so my question is this would anyone have anything that would make
> python a little less of a serious candidate (cos it already is) and
> the options may be to use some other languages (maybe java, C (oh
> God))...i am into a bit of php and building API's in php would not be
> the hard part, what i am concerned about is scalability and
> efficiency, well, as far as the 'core' is concerned.

Python is as "enterprise" as the developer who wields it. Scalability revolves entirely around application design & implementation.

Or you could use Erlang (or haskell, etc ;-)

-tkc
--
http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
On Feb 25, 12:13 pm, Steve Holden wrote: > simn_stv wrote: > > hello people, i have been reading posts on this group for quite some > > time now and many, if not all (actually not all!), seem quite > > interesting. > > i plan to build an application, a network based application that i > > estimate (and seriously hope) would get as many as 100, 000 hits a day > > (hehe,...my dad always told me to 'AIM HIGH' ;0), not some 'facebook' > > or anything like it, its mainly for a financial transactions which > > gets pretty busy... > > so my question is this would anyone have anything that would make > > python a little less of a serious candidate (cos it already is) and > > the options may be to use some other languages (maybe java, C (oh > > God))...i am into a bit of php and building API's in php would not be > > the hard part, what i am concerned about is scalability and > > efficiency, well, as far as the 'core' is concerned. > > > would python be able to manage giving me a solid 'core' and will i be > > able to use python provide any API i would like to implement?... > > > im sorry if my subject was not as clear as probably should be!. > > i guess this should be the best place to ask this sort of thing, hope > > im so right. > > > Thanks > > I'd suggest that if you are running an operation that gets 100,000 hits > a day then your problems won't be with Python but with organizational > aspects of your operation. > > regards > Steve > -- very well noted steve, i'd be careful (which is a very relative word) with the organizational aspects... i'm sure ure quite rooted in that aspect, hey u need a job??;) -- http://mail.python.org/mailman/listinfo/python-list
Re: taking python enterprise level?...
On Thu, 2010-02-25 at 02:26 -0800, simn_stv wrote: > i plan to build an application, a network based application that i > estimate (and seriously hope) would get as many as 100, 000 hits a day > (hehe,...my dad always told me to 'AIM HIGH' ;0), not some 'facebook' > or anything like it, its mainly for a financial transactions which > gets pretty busy... I've got apps running that handle *well* over 100,000 hits / process / day using Python - although some of the heavy lifting is off-loaded to C and MySql - obviously without actually looking at your requirements that doesn't mean much as I don't know how much work each hit requires. Regarding financial transactions - you'll almost certainly want to integrate with something that already has transactional support (sql etc) - so I expect that will bear the brunt of the load > so my question is this would anyone have anything that would make > python a little less of a serious candidate (cos it already is) and > the options may be to use some other languages (maybe java, C (oh > God)) I've avoided integrating java with my python (I'm not a big fan of java) - but I've integrated quite a bit of C - it's fairly easy to do, and you can just port the inner loops if you see the need arise. > ...i am into a bit of php and building API's in php would not be > the hard part, what i am concerned about is scalability and > efficiency, well, as far as the 'core' is concerned. I've heard that php can be well scaled (by compiling it to bytecode/C++) - but my preference would always be to python. Tim -- http://mail.python.org/mailman/listinfo/python-list
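[Editorial aside: integrating a bit of C as described above doesn't even require writing an extension module; ctypes can load an existing shared library directly. A minimal sketch, assuming a Unix-like system where the C math library is available — the library name is the only platform-specific part:]

```python
import ctypes
import ctypes.util

# Sketch: call a C function (libm's sqrt) straight from Python via ctypes.
# Porting a hot inner loop to your own .so follows the same pattern;
# "libm.so.6" is a glibc fallback in case find_library comes up empty.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.restype = ctypes.c_double       # tell ctypes the C return type
libm.sqrt.argtypes = [ctypes.c_double]    # and the argument types
print(libm.sqrt(2.0))
```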
Re: taking python enterprise level?...
On 02/25/10 10:26, simn_stv wrote:
> what i am concerned about is scalability and efficiency, well, as far
> as the 'core' is concerned.
>
> would python be able to manage giving me a solid 'core' and will i be
> able to use python provide any API i would like to implement?...

Python isn't the most efficient language; the assembler provided by the maker of your CPU probably is the best you can get, and everything after that is a trade-off between performance and flexibility (flexible in the most flexible sense of the word :-)).

That being said, for me, Python (well, actually any Turing complete programming language) is more like a box of Lego with an infinite number of pieces. Scalability and API issues are the same as the shape and function of the model you're making with Lego.

Sure, some types of pieces might be more suited than other types, but since you can simulate any type of piece with the ones that are already present, you are more limited by your imagination than by the language.

So in short, I don't see any serious problems using Python. I have used it in Enterprise environments without any problems, but then again I was aware not to use it for numerically intensive parts without the use of 3rd party libraries like numpy. Which for me resulted in not doing the compression of database deltas in pure Python, but offloading that to a more suitable external program, still controlled from Python though.

--
mph
--
http://mail.python.org/mailman/listinfo/python-list
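[Editorial aside: the "offload the heavy part to an external program, still controlled from Python" approach in the last paragraph is typically a few lines of subprocess. A sketch with a stand-in child command — in the real case it would be the compression tool:]

```python
import subprocess
import sys

# Sketch: hand heavy work to an external process and collect its output.
# The child here is a trivial stand-in for a real compression program.
result = subprocess.run(
    [sys.executable, "-c", "print(sum(range(1000)))"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # prints 499500
```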
Re: taking python enterprise level?...
simn_stv wrote: > hello people, i have been reading posts on this group for quite some > time now and many, if not all (actually not all!), seem quite > interesting. > i plan to build an application, a network based application that i > estimate (and seriously hope) would get as many as 100, 000 hits a day > (hehe,...my dad always told me to 'AIM HIGH' ;0), not some 'facebook' > or anything like it, its mainly for a financial transactions which > gets pretty busy... > so my question is this would anyone have anything that would make > python a little less of a serious candidate (cos it already is) and > the options may be to use some other languages (maybe java, C (oh > God))...i am into a bit of php and building API's in php would not be > the hard part, what i am concerned about is scalability and > efficiency, well, as far as the 'core' is concerned. > > would python be able to manage giving me a solid 'core' and will i be > able to use python provide any API i would like to implement?... > > im sorry if my subject was not as clear as probably should be!. > i guess this should be the best place to ask this sort of thing, hope > im so right. > > Thanks I'd suggest that if you are running an operation that gets 100,000 hits a day then your problems won't be with Python but with organizational aspects of your operation. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS:http://holdenweb.eventbrite.com/ -- http://mail.python.org/mailman/listinfo/python-list
taking python enterprise level?...
hello people, i have been reading posts on this group for quite some time now and many, if not all (actually not all!), seem quite interesting.

i plan to build an application, a network based application that i estimate (and seriously hope) would get as many as 100,000 hits a day (hehe,...my dad always told me to 'AIM HIGH' ;0), not some 'facebook' or anything like it, it's mainly for financial transactions which get pretty busy...

so my question is: would anyone have anything that would make python a little less of a serious candidate (cos it already is)? the options may be to use some other languages (maybe java, C (oh God))...i am into a bit of php and building API's in php would not be the hard part; what i am concerned about is scalability and efficiency, well, as far as the 'core' is concerned.

would python be able to manage giving me a solid 'core' and will i be able to use python to provide any API i would like to implement?...

im sorry if my subject was not as clear as it probably should be! i guess this should be the best place to ask this sort of thing, hope im so right.

Thanks
--
http://mail.python.org/mailman/listinfo/python-list