Re: Scaling Django

2016-02-07 Thread Sergiy Khohlov

Normalization is something like this:

http://www.studytonight.com/dbms/database-normalization.php


The hardware for this MySQL server was:
serg@anomehost:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          4049       3920        129          0        338       2016
-/+ buffers/cache:       1565       2484
Swap:         2863        516       2347

serg@anomehost:~$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Xeon(R) CPU   E5310  @ 1.60GHz
stepping    : 11
microcode   : 0xb7
cpu MHz     : 1596.170



MySQL has a problem:
the DB file grows steadily and running a vacuum is really hard. Only
partitioning helps in this case.

Many thanks,

Serge


+380 636150445
skype: skhohlov


Re: Scaling Django

2016-02-06 Thread Dexter T.

Hi Sergiy, are you referring to my post or to the OP?

On Sunday, February 7, 2016 at 6:03:11 AM UTC+8, Sergiy Khohlov wrote:
>
> Print the database structure.
> Check whether the DB can be normalized.
>

You might have meant "denormalization" here (?), especially when operating
at such scale. We do use denormalization for some of our larger tables.
 

> 100 GB (my "record" is 452 GB) is not that large, but this size requires
> some attention. (It looks like your MySQL uses only one DB file: try setting
> one file per table. Check the index sizes, and verify that the indexes are
> working correctly.)
>

We are using innodb_file_per_table. But note that, as I mentioned, all of this
100 GB of data fits on a lowly 8 GB RAM VM, 50% of which is allocated to InnoDB
buffers. With so few resources, but with intimate knowledge of your database
workload, it is still possible to handle a database of this size. And yes, our
indexes are used well, as most queries were EXPLAINed and optimized accordingly.
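
To illustrate the EXPLAIN step, here is a minimal sketch that uses the sqlite3 module from the Python standard library as a stand-in for MySQL. The `event` table, its columns, and the index name are invented for the example, and SQLite's `EXPLAIN QUERY PLAN` output looks different from MySQL's `EXPLAIN`. The idea is the same, though: compare the plan before and after adding an index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO event (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)

sql = "SELECT payload FROM event WHERE user_id = ?"

# Without an index on user_id the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX event_user_id_idx ON event (user_id)")

# With the index in place the plan should name it.
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

On MySQL the equivalent check is to run `EXPLAIN` on the SQL that the ORM generates and look at the `key` and `rows` columns.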

What hardware are you running your 452 GB DB on?

> Review your project:
> try to avoid many-to-many fields.
> Is it possible to switch from hard-coded SQL to stored functions and
> procedures?
>

See my post above about denormalization. And stored procedures are arguably
even harder to manage, both code-wise and deployment-wise.
 

> It looks like this issue is not connected to Django only.
>

Again, if you are referring to my post, I am not the OP. Our system is not
perfect, but we're not the ones with scaling problems. I was in fact sharing
the scaling practices that worked for us. See the OP's post for the problems
they're facing (organizational / political / methodological).

Cheers!
 

> On Sat, Feb 6, 2016 at 7:09 PM, Dexter T. wrote:
>
>> Lots of great replies already.
>> I also want to add a few (random) things ...
>>
>> - Have you clearly defined and isolated the issue(s) you are facing?
>> - You mentioned using DRF in a service, with a large JSON response taking
>> seconds to finish. How did you troubleshoot/profile this? Seconds to
>> process server-side? Seconds to download client-side? Where specifically?
>> If you don't know, then find out!
>> - Your system will have many moving parts. Have you made an effort to
>> instrument, measure, and isolate which parts are slow and why?
>> - You mentioned using the debug toolbar. Have you proven that your
>> database schema is optimal? Any queries in your slow query log? Are indexes
>> used, and are they optimal? For your workload, can read caching help? Would
>> DB replicas help?
>> - How are your server resources utilized? Are you sure you are not
>> bottlenecked by thrashing disk I/O? Overcommitted CPU? Low memory/swapping?
>> File descriptor count?
>> - Have you checked whether clients are bottlenecked? A complex nested JSON
>> object downloaded via an AJAX call is costly both to serialize (CPU) and to
>> transfer (bandwidth). Gzip can help here, if applicable.
>> - For more context, can you share some numbers, like HTTP- and DB-level
>> requests/sec and row counts for the most heavily used tables? How about
>> server infrastructure specs?
>>
>> Note that these are basic questions and basic problem-solving steps;
>> I'm assuming your teams are aware of them and are already taking steps
>> like these.
>>
>> In one project of mine, we run a 100 GB MySQL DB with some tables above
>> 100 million records and growing rapidly. Properly indexed and optimized, it
>> works OK on a lowly single VPS instance with 8 GB of RAM. The workload is
>> clearly OLTP: we're throwing more sustained writes (100s/sec) than reads.
>> All queries were scrutinized; almost all use the ORM, some are handwritten
>> SQL, and other complex queries were rewritten to be done at the application
>> level. Joins are harder at this scale and are therefore preferably avoided
>> (a major architectural decision that we anticipated). But we can still
>> easily throw hardware at it if needed.
>>
>> For us, scaling is a continuous commitment to measure and refactor.
>>
>> And one very important lesson from my years of writing software:
>> rewriting is very, very, very, very costly.
>>
>> These new engineers/other colleagues coming in, are they familiar with
>> the problem domain, the existing codebase, and the scale at which you
>> operate now and expect to in the future? Are they experienced in doing
>> similar scaling work? And even if you think you can throw away your old
>> work, now that you think you know better, be very careful of the
>> second-system effect.
>>
>> I hope you succeed.
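
The Gzip suggestion in the checklist above is easy to check before changing any server configuration. The sketch below uses only the Python standard library; the payload is a hypothetical stand-in for a large nested API response, so the exact ratio will differ for real data.

```python
import gzip
import json

# Hypothetical nested payload standing in for a large API response.
payload = {
    "users": [
        {
            "id": i,
            "name": f"user{i}",
            "tags": ["a", "b", "c"],
            "profile": {"active": True, "score": i % 10},
        }
        for i in range(2000)
    ]
}

raw = json.dumps(payload).encode("utf-8")
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)

print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")

# Round trip to confirm that nothing is lost.
restored = json.loads(gzip.decompress(compressed))
```

Repetitive JSON like this usually compresses very well. In a Django project, compression is typically enabled at the web server or with `django.middleware.gzip.GZipMiddleware` rather than by hand.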

Re: Scaling Django

2016-02-06 Thread Sergiy Khohlov

Print the database structure.
Check whether the DB can be normalized.
100 GB (my "record" is 452 GB) is not that large, but this size requires
some attention. (It looks like your MySQL uses only one DB file: try setting
one file per table. Check the index sizes, and verify that the indexes are
working correctly.)
Review your project:
 try to avoid many-to-many fields.
Is it possible to switch from hard-coded SQL to stored functions and procedures?
It looks like this issue is not connected to Django only.


Many thanks,

Serge


+380 636150445
skype: skhohlov


-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/CADTRxJN1yiVH1oviDiJRKCL2dN5%2BKpTwxyU7SksrF%3DwV6sWPXA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Scaling Django

2016-02-04 Thread Luis Zárate

" It's hard to hire Django engineers "

I don't think this is a problem, because a good software developer can
learn Django faster than other frameworks. For example, I have a Costa Rican
startup that develops in Django. As a small company in a small country, we
don't have investors who would let us hire a big team, so when we need to
build a solution that is bigger than our capacity, we hire temporary
developers without Django knowledge and train them. Within a few days they
are developing basic functions that help the project and help them
gain experience.

I think Costa Rica is an excellent place to invest in training
people, because for a US minimum salary you can pay one developer and
the training of another here, so in a few months you will have highly
qualified engineers working remotely with only one hour of time difference.

So I suggest you make a training plan for your future developers, with
a hackathon included :)




-- 
"Utopia is for walking" Fernando Birri


Re: Scaling Django

2016-02-04 Thread bobhaugen

This is a sidelight to the OP, but he did mention Django forms in one
message. They are a dog. I have profiled a couple of slow pages with a lot
of small forms, and that's where all the time was spent (rendering forms on
the server). We're moving those pages to DRF serving JSON to a client-side
JavaScript framework. We're not done yet, but serving the same data from DRF
is way faster.

I would still be interested in some tips for speeding up Django forms,
though, because they are really great for speed of development, and they do
work.
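
For anyone who wants to repeat this kind of measurement, here is a minimal profiling sketch using cProfile from the standard library. `render_field` and `render_page` are hypothetical stand-ins for form rendering, not Django code. In a real project the same profiler would wrap the view or the template render call.

```python
import cProfile
import io
import pstats


def render_field(name, value):
    # Hypothetical stand-in for rendering one form field to HTML.
    return f'<input name="{name}" value="{value}">'


def render_page(n_forms=200, fields_per_form=10):
    # Hypothetical stand-in for a view that renders many small forms.
    parts = []
    for f in range(n_forms):
        for i in range(fields_per_form):
            parts.append(render_field(f"form{f}-field{i}", i))
    return "".join(parts)


profiler = cProfile.Profile()
profiler.enable()
html = render_page()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

The call counts in the report show quickly whether the time goes into a few expensive calls or into thousands of cheap ones, which is the pattern described above for pages with many small forms.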


Re: Scaling Django

2016-02-03 Thread Erik Cederstrand


> On 3 Feb 2016, at 22:30, Joshua Pokotilow wrote:
> 
> At the startup where I work, we've written a lot of our server code in 
> Django. So far, we've adopted a "build it fast" mentality, so we invested 
> very little time in optimizing our code. A small amount of load testing has 
> revealed our codebase / infrastructure as it stands today needs to run faster 
> and support more users.
> 
> We recently hired some new engineers who are extremely skeptical that we 
> should optimize our existing code.

I was in a startup like that. We *had* a working solution, and we *had* 
customers. Not enough to pay our salaries, but enough to keep us and our 
investors hopeful.

Someone decided we needed a rewrite, because Django, because blog posts, 
because WebScale(TM), because in one year we might have 1000-fold users if our 
wildest startup dreams came true. So we started to rewrite. It was supposed to 
take one month. Our working, legacy solution started to deteriorate because we 
were busy rewriting. Two months. Bug reports piled up in the tracker, but we 
didn't care because the rewrite was just around the corner and would solve 
everything, and we couldn't possibly work on two systems at the same time. Some 
customers left, but it was okay because our WebScale solution would make us 
filthy rich. Three months. Everyone was overworked, tired and the WebScale 
solution was still just around the corner... Four months, and our investors 
decided we were not part of the 1%.

In short, don't rewrite. Refactor. And know *exactly* why you are refactoring. 
As in, "We have profiled, discussed architecture, hardware, algorithms, etc, 
etc. The GIL is killing us and Guido doesn't care", not "Django doesn't scale". 
The year "monolithic" is an argument in itself is the year of the HURD desktop.

Except if you're programming in VBScript. Then by all means, rewrite.

Erik


Re: Scaling Django

2016-02-03 Thread Russell Keith-Magee

On Wed, Feb 3, 2016 at 11:30 PM, Joshua Pokotilow wrote:

> At the startup where I work, we've written a lot of our server code in
> Django. So far, we've adopted a "build it fast" mentality, so we invested
> very little time in optimizing our code. A small amount of load testing has
> revealed our codebase / infrastructure as it stands today needs to run
> faster and support more users.
>
> We recently hired some new engineers who are extremely skeptical that we
> should optimize our existing code. Their main concerns are:
>
> - We need to move to a service-oriented infrastructure because Django is
> too monolithic (monolithic = technology lock-in & difficult to troubleshoot)
> - It's too easy to write slow queries using the Django ORM
> - It's hard to hire Django engineers
> - While Instagram and DISQUS use Django to service large numbers of
> people, they don't use it for any serious backend work
>
> After having worked with Django for the last 3 years, I'm a big believer
> in it, and I believe it would scale. To defend my position, I've pointed
> out to my colleagues that it's easy to identify bottlenecks with tools like
> the Django Debug Toolbar and Yet Another Django Profiler. With my
> colleagues present, I've isolated and fixed significant speed problems
> inside of a few hours. I don't believe the Django ORM is inherently bad,
> although I do think that coders who use it should Know What They're Doing.
> Finally, I've referenced blog entries that talk about how Instagram and
> Disqus use Django on the backend for backend-y tasks.
>
> Despite my best efforts, my colleagues are still pushing to have us
> rewrite large portions of our infrastructure as separate services before we
> try to fix them. For example, we have one slow REST endpoint that returns a
> boatload of user data, and so there's talk about using a new microservice
> for users in lieu of our existing Django models. Even if we are able to fix
> bottlenecks we encounter in a timely fashion, my colleagues fear that
> Django won't scale with the business.
>

My immediate reaction, knowing nothing about the site or its codebase:

1) There’s nothing they’re proposing that excludes Django from the mix.
2) From an engineering management perspective, the solution they’re
proposing is much more concerning than the problems you’re describing.

My suggestion for convincing management:

Tell them that you can write Microservices in Django. Because you can.
Build a minimal Django stack - something that just returns a static Hello
World - and do some load testing. This will prove that Django can serve
high load - or at least as much load as whatever technology they’re
proposing.

Tell them that Microservices is just a new word for something software
engineers have been calling “High cohesion, low coupling” since the 1960s.
The only difference is that this time, instead of using the low latency,
high speed interface of a function call, we’re using the slow, unreliable
transfer of HTTP. If you’re actually focussing on performance, it’s trivial
to build a high cohesion, low coupling stack in *any* technology. All that
Webservices do is enforce this by making the inter-module barrier obvious.

Tell them that Microservices aren’t magic fairy sauce for speed. If the
issue with your existing codebase is the speed of database queries, that
problem isn’t going to go away by putting your code behind microservices.
You’re just going to add the cost of inter-service HTTP transfer to the
overhead of making a query. And if you’re putting something essential -
like the user database - behind a service, then you’d better be prepared to
add the round-trip time of a HTTP lookup onto Every. Single. Page. (Tell me
again how this is good for performance?)
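
A back-of-the-envelope version of that argument, as a sketch. All numbers here are assumptions chosen for illustration, not measurements from any real system.

```python
def page_latency_ms(query_ms, service_calls, round_trip_ms):
    """Time for one page: database work plus one HTTP round trip per service call."""
    return query_ms + service_calls * round_trip_ms


# Assumed numbers, for illustration only.
slow_query_ms = 800   # the slow query that motivated the rewrite
round_trip_ms = 5     # one HTTP request/response between services

# Same query, called in-process from the monolith.
monolith = page_latency_ms(slow_query_ms, service_calls=0, round_trip_ms=round_trip_ms)

# Same query, now sitting behind a user service that every page has to call.
with_user_service = page_latency_ms(slow_query_ms, service_calls=1, round_trip_ms=round_trip_ms)

print(monolith, with_user_service)
```

Under these assumptions the slow query still dominates, and the service boundary only adds to the total. Fixing the query changes the first term; moving it behind HTTP does not.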

Teach them about Second Systems Syndrome [1] [2].

[1]
https://en.wikipedia.org/wiki/The_Mythical_Man-Month#The_second-system_effect
[2] http://coliveira.net/software/what-is-second-system-syndrome/

Tell them that while Django engineers might be hard to hire, they’re also
relatively easy to grow from scratch. DjangoGirls proves you can take
people with no experience in programming and make them competent Django
developers. Take someone with a history in *any* programming language, and
you can teach them Django; hire one or two Django experts to provide
internal knowledge and review, and you’re set.

Lastly, tell them that despite their protestations, your site isn’t
Instagram, Disqus, or anything like it. 99% of web sites are not in the top
1% of websites by traffic. Your website is *not* in the top 1%. It might be
one day. But it isn’t now. And if you’re *ever* in a position where you
might end up in the top 1% - I can *guarantee* that it will be accompanied
by a metric buttload of engineers and money who will have a lot more
experience in scaling large scale services than any of the people who are
proposing microservices as a silver bullet.

Now - I’m saying all this without having

Re: Scaling Django

2016-02-03 Thread Daniel Chimeno

Since you said the project uses DRF for an API, some blog posts I've read
about it came to mind:

   - http://ses4j.github.io/2015/11/23/optimizing-slow-django-rest-framework-performance/
   - https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/
   - https://docs.djangoproject.com/es/1.9/ref/models/querysets/#prefetch-related

I'm sure that with a few little tricks (that shouldn't be tricks after all)
you'll get past that situation.
As others said, first look at the problem, then search for the solution.

In the specific case where you are getting thousands of results from the
database, you can dig further into:
- SQL
- Caching
- Serialization
- Pagination
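
To make the SQL point concrete, here is a minimal sketch of the "N+1 queries" pattern and the batched alternative, which is roughly what `prefetch_related` does: one extra query per relation, joined in Python. It uses sqlite3 from the standard library rather than Django, and the `author` and `book` tables are invented for the example.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                     [(i, f"author{i}") for i in range(1, 51)])
    conn.executemany("INSERT INTO book (author_id, title) VALUES (?, ?)",
                     [(a, f"book{a}-{n}") for a in range(1, 51) for n in range(3)])
    return CountingDB(conn)


def books_naive(db):
    # One query for the authors, then one more query per author (N+1).
    result = {}
    for author_id, name in db.query("SELECT id, name FROM author ORDER BY id"):
        rows = db.query(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def books_batched(db):
    # One query for the authors, one query for all their books, joined in Python.
    authors = db.query("SELECT id, name FROM author ORDER BY id")
    ids = [author_id for author_id, _ in authors]
    placeholders = ",".join("?" for _ in ids)
    rows = db.query(
        f"SELECT author_id, title FROM book "
        f"WHERE author_id IN ({placeholders}) ORDER BY id",
        ids,
    )
    by_author = {}
    for author_id, title in rows:
        by_author.setdefault(author_id, []).append(title)
    return {name: by_author.get(author_id, []) for author_id, name in authors}


db = make_db()
naive = books_naive(db)
naive_queries = db.count

db = make_db()
batched = books_batched(db)
batched_queries = db.count

print(naive_queries, batched_queries)  # 51 2
```

Both functions return the same data. The difference is only in how many round trips to the database it takes to build it.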

Hope it helps.


On Wednesday, February 3, 2016 at 16:30:05 (UTC+1), Joshua Pokotilow wrote:
>
> At the startup where I work, we've written a lot of our server code in 
> Django. So far, we've adopted a "build it fast" mentality, so we invested 
> very little time in optimizing our code. A small amount of load testing has 
> revealed our codebase / infrastructure as it stands today needs to run 
> faster and support more users.
>
> We recently hired some new engineers who are extremely skeptical that we 
> should optimize our existing code. Their main concerns are:
>
> - We need to move to a service-oriented infrastructure because Django is 
> too monolithic (monolithic = technology lock-in & difficult to troubleshoot)
> - It's too easy to write slow queries using the Django ORM
> - It's hard to hire Django engineers
> - While Instagram and DISQUS use Django to service large numbers of 
> people, they don't use it for any serious backend work
>
> After having worked with Django for the last 3 years, I'm a big believer 
> in it, and I believe it would scale. To defend my position, I've pointed 
> out to my colleagues that it's easy to identify bottlenecks with tools like 
> the Django Debug Toolbar and Yet Another Django Profiler. With my 
> colleagues present, I've isolated and fixed significant speed problems 
> inside of a few hours. I don't believe the Django ORM is inherently bad, 
> although I do think that coders who use it should Know What They're Doing. 
> Finally, I've referenced blog entries that talk about how Instagram and 
> Disqus use Django on the backend for backend-y tasks.
>
> Despite my best efforts, my colleagues are still pushing to have us 
> rewrite large portions of our infrastructure as separate services before we 
> try to fix them. For example, we have one slow REST endpoint that returns a 
> boatload of user data, and so there's talk about using a new microservice 
> for users in lieu of our existing Django models. Even if we are able to fix 
> bottlenecks we encounter in a timely fashion, my colleagues fear that 
> Django won't scale with the business.
>
> I'm writing this post to garner additional evidence that Django will 
> scale. Anything compelling (and preferably not obvious) that would help 
> shed some light on Django's ability to scale would be *greatly* 
> appreciated, as it's very difficult for me to defend my position that 
> Django is a viable long-term solution without solid evidence to back up my 
> claims. It certainly doesn't help that I don't have any experience scaling 
> Django myself!
>
> Thank you.
>


Re: Scaling Django

2016-02-03 Thread orzodk


While optimizing the code will bring you improvements and you shouldn't
stop doing this, for the most part (as noted from Rafael's resources)
you should update your architecture to support Django in scaling.

As you mentioned, instead of hitting the DB for every multi-second API
call you scale it by caching results so they aren't recalculated on demand.
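
A minimal sketch of that caching idea, using only the standard library. `TTLCache` and `expensive_report` are hypothetical names for illustration. In a Django project the cache framework (`django.core.cache`) with a backend such as Memcached would normally play this role.

```python
import time


class TTLCache:
    """Keep computed values for `ttl_seconds`, then recompute on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive work
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


calls = {"n": 0}


def expensive_report():
    # Hypothetical stand-in for a multi-second database query.
    calls["n"] += 1
    return {"users": 1234}


# A fake clock makes the expiry behaviour easy to demonstrate.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

first = cache.get_or_compute("report", expensive_report)   # computed
second = cache.get_or_compute("report", expensive_report)  # served from cache
now[0] = 61.0
third = cache.get_or_compute("report", expensive_report)   # expired, recomputed

print(calls["n"])  # 2
```

The trade-off is staleness: results can be up to `ttl_seconds` old, so the TTL has to be chosen per endpoint.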


Microservices are something you should work toward by refactoring rather
than rewriting, as rewriting is a really great way to kill a startup.

http://steveblank.com/2011/01/25/startup-suicide-%E2%80%93-rewriting-the-code/

"Rafael E. Ferrero"  writes:

> Maybe I don't understand you very well, and for sure you have a very specific
> problem to solve... but... have you read any of these?
>
> http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
> https://www.digitalocean.com/community/tutorials/how-to-scale-django-beyond-the-basics
> https://highperformancedjango.com/
> http://talks.caktusgroup.com/djangocon/2013/scaling/#slide17
> https://docs.djangoproject.com/en/1.8/faq/general/#does-django-scale
>
> Rafael E. Ferrero
>
> 2016-02-03 12:30 GMT-03:00 Joshua Pokotilow :
>
> At the startup where I work, we've written a lot of our server code in
> Django. So far, we've adopted a "build it fast" mentality, so we invested
> very little time in optimizing our code. A small amount of load testing has
> revealed our codebase / infrastructure as it stands today needs to run
> faster and support more users.
>
> We recently hired some new engineers who are extremely skeptical that we
> should optimize our existing code. Their main concerns are:
>
> - We need to move to a service-oriented infrastructure because Django is
> too monolithic (monolithic = technology lock-in & difficult to troubleshoot)
> - It's too easy to write slow queries using the Django ORM
> - It's hard to hire Django engineers
> - While Instagram and DISQUS use Django to service large numbers of
> people, they don't use it for any serious backend work
>
> After having worked with Django for the last 3 years, I'm a big believer
> in it, and I believe it would scale. To defend my position, I've pointed out
> to my colleagues that it's easy to identify bottlenecks with tools like the
> Django Debug Toolbar and Yet Another Django Profiler. With my colleagues
> present, I've isolated and fixed significant speed problems inside of a few
> hours. I don't believe the Django ORM is inherently bad, although I do think
> that coders who use it should Know What They're Doing. Finally, I've
> referenced blog entries that talk about how Instagram and Disqus use Django
> on the backend for backend-y tasks.
>
> Despite my best efforts, my colleagues are still pushing to have us rewrite
> large portions of our infrastructure as separate services before we try to
> fix them. For example, we have one slow REST endpoint that returns a
> boatload of user data, and so there's talk about using a new microservice
> for users in lieu of our existing Django models. Even if we are able to fix
> bottlenecks we encounter in a timely fashion, my colleagues fear that Django
> won't scale with the business.
>
> I'm writing this post to garner additional evidence that Django will scale.
> Anything compelling (and preferably not obvious) that would help shed some
> light on Django's ability to scale would be *greatly* appreciated, as it's
> very difficult for me to defend my position that Django is a viable
> long-term solution without solid evidence to back up my claims. It certainly
> doesn't help that I don't have any experience scaling Django myself!
>
> Thank you.
>
> -- 
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-users+unsubscr...@googlegroups.com.
> To post to this group, send email to django-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/83968c41-d415-4189-b33b-9f99b10b1c41%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Re: Scaling Django

2016-02-03 Thread Fred Stluka


  
  
Joshua,

My team is producing a Django app with a small number of 
users, so we haven't worried too much about performance yet,
but we know we may have to some day, so I've accumulated a 
list of ways to improve performance in a JIRA ticket for if/when 
it becomes a priority.  

We've done some of them already, with a quick and easy 
thousand-fold increase in speed.

Here's a cut/paste of our ideas.  I hope you find it useful.



Optimize the site for speed

I/O speed
- Prime candidate #3. May be costing us at least a couple seconds.
  - Easy fix
  - Especially to phones with slow Internet connections
- Minimize the size of the HTML, JS, CSS and image files
  - Combine CSS into a single file, and JS into a single file
  - Minify the HTML, JS, and CSS
  - Compress (zip) the HTML, JS, and CSS
    - May want to use gulp to minify and compress. See:
      http://revsys.com/blog/2014/oct/21/ultimate-front-end-development-setup/
  - Move CSS to the top and JS to the bottom of HTML files
    - See "How to improve web page performance from RevSys" below
  - http://www.revsys.com/12days/front-end-performance/
  - New HTML5 "Picture" element ("art direction" technique) and CSS calc()
    function to optimize image download speed:
    - http://arstechnica.com/information-technology/2014/09/how-a-new-html-element-will-make-the-web-faster/
    - https://longhandpixels.net/blog/2014/02/complete-guide-picture-element
    - https://developer.mozilla.org/en-US/docs/Web/CSS/calc
    - https://www.youtube.com/watch?v=QINlm3vjnaY
    - http://alistapart.com/article/responsive-images-in-practice
  - Limit the size of image files uploaded by the patients. Scale them
    smaller as they are uploaded if necessary.
- Optimize caching by browsers
  - Far future expires headers
    - See "How to improve web page performance from RevSys" below
    - http://www.revsys.com/12days/front-end-performance/
  - Version number embedded in resource names to force re-load at new
    release
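
The last two caching items (far-future expiry plus a version embedded in the resource name) can be sketched as follows. This is only an illustration under stated assumptions: `versioned_name`, the file paths, and the header values are hypothetical, and it uses a content hash rather than a release number as the "version". Django's `ManifestStaticFilesStorage` applies a similar hashed-filename idea to static files.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content):
    """Return `path` with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


css_v1 = b"body { color: black; }"
css_v2 = b"body { color: navy; }"

# Different content yields a different file name, so browsers fetch it anew.
name_v1 = versioned_name("css/site.css", css_v1)
name_v2 = versioned_name("css/site.css", css_v2)

# Headers a server could attach to such files (one year, in seconds).
FAR_FUTURE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}
```

Because the name changes whenever the content does, the long cache lifetime never serves a stale file after a release.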
  

  

  
  
CPU speed
- Do as much as possible in JS on the local device, not in Python/Django
  on the shared server

Disk speed
- Use the AWS Console to change the disk drives of the server from
  magnetic disks to SSDs.
- Set up a RAID volume

DB speed
- Do all selects/joins/filtering in the DB. Do not query lots of data
  from the DB and then iterate over it or otherwise filter it in Python.
  Especially, do not query lots of data from the DB, pass it all to the
  browser, and iterate over it or otherwise filter it in JavaScript.
- Create DB indexes for all fields used frequently in SELECTs
  - But don't over-index
- Move the DB server to a different Linux server than the Web server, to
  split the load between 2 servers, if the DB is the bottleneck. Use
  Amazon RDS
- http://www.revsys.com/12days/finding-sources-of-slowness/
  - EXPLAIN PLAN
  - EXPLAIN ANALYZE
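
A small sketch of the "filter in the DB" and "index, then check the plan" items above. It uses the standard library's `sqlite3` so it is runnable as-is; the `users` table, the index name, and the sample data are assumptions made for illustration. SQLite's `EXPLAIN QUERY PLAN` stands in here for the `EXPLAIN PLAN` / `EXPLAIN ANALYZE` commands of other databases, whose output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (email, active) VALUES (?, ?)",
    [(f"user{i}@example.com", i % 2) for i in range(1000)],
)

# Slow pattern: pull every row into Python, then filter there.
in_python = [
    row
    for row in conn.execute("SELECT id, email, active FROM users")
    if row[1] == "user42@example.com"
]

# Better: let the database do the filtering and return only what is needed.
query = "SELECT id, email, active FROM users WHERE email = ?"
in_db = conn.execute(query, ("user42@example.com",)).fetchall()


def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query, ("user42@example.com",))   # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(query, ("user42@example.com",))    # index lookup
```

Comparing `plan_before` and `plan_after` is the point: the plan, not intuition, shows whether an index is actually being used.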
  

  

  
  
Django DB speed:
- Monitoring DB access:
  - http://stackoverflow.com/questions/2133627/using-django-db-connection-queries
- select_related
  - https://docs.djangoproject.com/en/1.7/ref/models/querysets/#select-related
- prefetch_related
  - https://docs.djangoproject.com/en/1.7/ref/models/querysets/#django.db.models.query.QuerySet.prefetch_related
- Optimize Django's ORM access to DB:
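
To make the `select_related` item above concrete, here is a hedged, database-level sketch of the problem it addresses: one query per related row versus a single JOIN. It uses plain `sqlite3` rather than the Django ORM so it runs without a project; the `author`/`book` tables and the `run` helper are hypothetical. In Django, a call such as `Book.objects.select_related("author")` asks the ORM to issue the JOIN form, while `prefetch_related` issues one extra query per relation and joins the results in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
""")
conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                 [(i, f"Author {i}") for i in range(1, 51)])
conn.executemany("INSERT INTO book (title, author_id) VALUES (?, ?)",
                 [(f"Book {i}", i) for i in range(1, 51)])

queries = []  # every SQL statement sent through run()


def run(sql, params=()):
    """Execute a query, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the books, then one more per book for its author.
naive = []
for title, author_id in run("SELECT title, author_id FROM book ORDER BY id"):
    name = run("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
    naive.append((title, name))
naive_queries = len(queries)

queries.clear()

# JOIN pattern: the same data in a single round trip.
joined = run("""
    SELECT book.title, author.name
    FROM book JOIN author ON author.id = book.author_id
    ORDER BY book.id
""")
joined_queries = len(queries)
```

With 50 books the naive loop sends 51 statements and the JOIN sends one; on a real network connection each extra statement also pays a round trip.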

Re: Scaling Django

2016-02-03 Thread Joshua Pokotilow

Thank you! I agree that we need to investigate before coming up with a 
solution.

On Wednesday, February 3, 2016 at 11:11:04 AM UTC-5, Avraham Serour wrote:
>
> If your problem is the DB, the network, or an underpowered processor, 
> rewriting the application won't help.
> The first step is to investigate the problem; then you can find a solution. 
> Sometimes people have a solution and then go looking for a problem. In your 
> case, they want to leave Python and Django and are looking for problems 
> with them.
>
> More than the language, library, and tech stack you use, the system design 
> has a big impact on overall speed, among other things. For example, I've 
> worked with systems that had too many indirections, where performance was 
> hardly affected by which libraries we were using.
>
> Sometimes a rewrite can be a good thing for this reason: it gives you a 
> chance to make a new design.
>
> Avraham
>
> On Wed, Feb 3, 2016 at 6:02 PM, Joshua Pokotilow wrote:
>
>> The service uses the Django REST Framework and takes multiple seconds to 
>> return a response. The response is a JSON array with thousands of 
>> dictionaries. We haven't yet investigated why it's slow, nor have we tried 
>> to cache / memoize anything to speed it up.
>>
>> On Wednesday, February 3, 2016 at 10:46:25 AM UTC-5, Avraham Serour wrote:
>>>
>>> what do you mean by slow? can you measure in ms?

Re: Scaling Django

2016-02-03 Thread Joshua Pokotilow

Thanks Remco. I'll look at High Performance Django.

On Wednesday, February 3, 2016 at 11:05:11 AM UTC-5, Remco Gerlich wrote:
>
> There is a book (ebook) named "High Performance Django" that has many 
> useful tips.
>
> Also, new software developers are _always_ skeptical when they start a new 
> job. They have an old way of doing things from their previous job, "that's 
> not how we work here!" reflexes also from that old job, and they take a 
> while to adjust to the new tools.
>
>  First let them get productive with what exists, _then_ let them gather 
> real statistics about real problems when they want to change something.
>
> You can have scalability issues and solve them with whatever framework, 
> Django included.
>
> Remco Gerlich

Re: Scaling Django

2016-02-03 Thread Joshua Pokotilow

Thank you Sergiy! I agree that the code needs to be fixed.

We don't have a Tomcat endpoint to compare with, although I did scare my 
coworkers a bit when we profiled a Django endpoint that took 300 - 400ms to 
return a response due (ostensibly) in large part to form object 
instantiation. Specifically, the bottleneck seemed to be that forms 
deep-copy their fields on instantiation.
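
A rough sketch of why per-instance deep copies can show up in a profile. Django forms copy their declared fields in `__init__` so that each instance can be modified safely; the classes below are hypothetical stand-ins, not Django's code, and the field count and nested state are assumptions chosen only to make the cost visible.

```python
import copy
import timeit


class Field:
    """Hypothetical stand-in for a form field with some nested state."""
    def __init__(self, name):
        self.name = name
        self.validators = [f"validator_{i}" for i in range(5)]
        self.attrs = {"class": "input", "data": list(range(20))}


BASE_FIELDS = {f"field_{i}": Field(f"field_{i}") for i in range(50)}


class CopyingForm:
    """Each instance gets its own deep copy, so per-instance edits are safe."""
    def __init__(self):
        self.fields = copy.deepcopy(BASE_FIELDS)


class SharingForm:
    """Each instance reuses the shared definitions (cheap, but not isolated)."""
    def __init__(self):
        self.fields = BASE_FIELDS


# Time 200 instantiations of each variant.
copy_seconds = timeit.timeit(CopyingForm, number=200)
share_seconds = timeit.timeit(SharingForm, number=200)

form = CopyingForm()
form.fields["field_0"].attrs["class"] = "changed"   # does not leak into BASE_FIELDS
```

The copy buys isolation between instances, so the usual remedy is to instantiate fewer or smaller forms per request rather than to remove the copy.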

On Wednesday, February 3, 2016 at 11:01:43 AM UTC-5, Sergiy Khohlov wrote:
>
> Hello, 
> Your first words contain the answer: coding quickly always produces 
> performance problems. This is expected. It looks like the new engineers 
> use a different technology and would rather not use Django; that is the 
> reason for their criticism. Low performance is mostly related to DB 
> performance. I prefer to avoid ManyToMany fields because of their high 
> resource usage at the DB level. Writing correct models and DB functions 
> helps in most cases. I have no idea about the proposed solution, but I am 
> quite sure that the code produces the bottleneck, not the programming 
> language. Ruby on Rails has the same problem as Django, for example. Have 
> you got good performance with Java+Tomcat+Apache? I would like to see this 
> high-performance ASP. 
> Half a year ago I rewrote a GTS service from C# to Python. As a result, CPU 
> usage dropped from 45% to 7-9% and memory usage from 1.5Gb to 300kb. 
> A wrong solution, or mistakes at the project requirements stage, produces 
> more problems than the choice of programming language.
>
> Many thanks,
>
> Serge
>
>
> +380 636150445
> skype: skhohlov

Re: Scaling Django

2016-02-03 Thread Joshua Pokotilow

Thanks Bill. This was helpful. I understand that it's difficult to offer 
advice without too many specifics, so I'm hoping to get some high-level 
advice / examples of Django at scale, which you provided.

Thank you!

On Wednesday, February 3, 2016 at 10:49:51 AM UTC-5, Bill Blanchard wrote:
>
> Let's try to address some of their concerns:
>
>> - We need to move to a service-oriented infrastructure because Django is 
>> too monolithic 
>
>
> It depends on what your application does and what you're planning to do 
> with it in the future.  People are quick to prescribe SOA as the end-all way 
> to scale, but they tend to ignore the added complexity that comes with 
> building out and integrating smaller services.
>
>  - It's too easy to write slow queries using the Django ORM
>
>
> It's just as easy (arguably easier) to write slow queries using pure SQL 
> or any other ORM.  The ORM makes a lot of good decisions for mediocre 
> programmers (I'd put myself in that category).  If you're a great 
> programmer and have great programmers who really understand SQL, then 
> you're just as likely to get your ORM queries right as you are straight 
> SQL. 
>
>>
>> - It's hard to hire Django engineers
>
>
> Compared to what?  .NET or Java engineers?  Probably.  Harder than the 
> newest shiny javascript framework engineers? Probably not.  Django has 
> about as robust an  engineering population as Ruby/Rails does.  I don't 
> know what you'd convert to in order to make hiring easier.  All engineers 
> (especially good ones) are really hard to come by these days.  If you're 
> looking at outsourcing to Southwest Asia, then yes, the Django population 
> isn't as high as .NET/Java/PHP.  However, hiring challenges are most 
> typically defined by your location and your ability and/or willingness to 
> explore remote workers.
>
> While Instagram and DISQUS use Django to service large numbers of people, 
>> they don't use it for any serious backend work
>
>
> Reddit is also a large Django user.  All engineering decisions should be 
> made around what your particular needs are and what skills your team 
> possesses or is able to acquire.  Needs of an organization evolve over time 
> and the organizations adjust as they need to.
>
> Many organizations start with a Python/Django or Ruby/Rails application to 
> build a product *quickly*, which is what those stacks excel at.  A mantra 
> typically heard in the community is "don't optimize prematurely".  If 
> you're saying "man, we're going to hit a wall at 100,000 users", well you 
> need to get to 90,000 users first before worrying about 100,000.  Getting 
> the 90k users is the real hard part.
>
> All this being said, your colleagues could be right to want to move off 
> Django.  We don't know much about your particular circumstances.
>
> For more information on optimizing Django for scale, check out this book.
> https://highperformancedjango.com/
>
> Best of luck.
>
> Bill

Re: Scaling Django

2016-02-03 Thread Larry Martell

I don't think there is a silver bullet that will fix all issues, nor
any one technology stack that will. I have a fairly good size django
app I built, and I did not consider performance all that much during
initial development. As the user base and dataset started to grow I
did see performance issues and I dealt with each one as they arose by
profiling the issue and seeing where the bottlenecks were. Often it
was not at all where I thought it would be. Each case had a different
solution, e.g.: database tuning, adding memory, writing custom queries
using temp tables, minimizing js code, optimizing jQuery code and DOM
manipulation, and so on.
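
As a hedged illustration of the "profile first, then fix" approach described above, the standard library's `cProfile` and `pstats` can show where time actually goes. The functions `handle_request` and `slow_serialize` are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_serialize(rows):
    """Hypothetical hot spot: builds a string by repeated concatenation."""
    out = ""
    for row in rows:
        out += str(row) + ","
    return out


def handle_request():
    """Hypothetical request handler that calls the slow helper."""
    rows = list(range(20000))
    return slow_serialize(rows)


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)   # five most expensive entries
report = buffer.getvalue()
print(report)
```

In a Django project, tools such as the Django Debug Toolbar mentioned in this thread give a similar breakdown per request, including the SQL that was run.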

On Wed, Feb 3, 2016 at 11:02 AM, Joshua Pokotilow  wrote:
> The service uses the Django REST Framework and takes multiple seconds to
> return a response. The response is a JSON array with thousands of
> dictionaries. We haven't yet investigated why it's slow, nor have we tried
> to cache / memoize anything to speed it up.
>
> On Wednesday, February 3, 2016 at 10:46:25 AM UTC-5, Avraham Serour wrote:
>>
>> what do you mean by slow? can you measure in ms?

Re: Scaling Django

2016-02-03 Thread Avraham Serour

If your problem is the DB, the network, or an underpowered processor,
rewriting the application won't help.
The first step is to investigate the problem; then you can find a solution.
Sometimes people have a solution and then go looking for a problem. In your
case, they want to leave Python and Django and are looking for problems
with them.

More than the language, library, and tech stack you use, the system design
has a big impact on overall speed, among other things. For example, I've
worked with systems that had too many indirections, where performance was
hardly affected by which libraries we were using.

Sometimes a rewrite can be a good thing for this reason: it gives you a
chance to make a new design.

Avraham

On Wed, Feb 3, 2016 at 6:02 PM, Joshua Pokotilow 
wrote:

> The service uses the Django REST Framework and takes multiple seconds to
> return a response. The response is a JSON array with thousands of
> dictionaries. We haven't yet investigated why it's slow, nor have we tried
> to cache / memoize anything to speed it up.
>
> On Wednesday, February 3, 2016 at 10:46:25 AM UTC-5, Avraham Serour wrote:
>>
>> what do you mean by slow? can you measure in ms?
>>
>> On Wed, Feb 3, 2016 at 5:30 PM, Joshua Pokotilow 
>> wrote:
>>
>>> At the startup where I work, we've written a lot of our server code in
>>> Django. So far, we've adopted a "build it fast" mentality, so we invested
>>> very little time in optimizing our code. A small amount of load testing has
>>> revealed our codebase / infrastructure as it stands today needs to run
>>> faster and support more users.
>>>
>>> We recently hired some new engineers who are extremely skeptical that we
>>> should optimize our existing code. Their main concerns are:
>>>
>>> - We need to move to a service-oriented infrastructure because Django is
>>> too monolithic (monolithic = technology lock-in & difficult to troubleshoot)
>>> - It's too easy to write slow queries using the Django ORM
>>> - It's hard to hire Django engineers
>>> - While Instagram and DISQUS use Django to service large numbers of
>>> people, they don't use it for any serious backend work
>>>
>>> After having worked with Django for the last 3 years, I'm a big believer
>>> in it, and I believe it would scale. To defend my position, I've pointed
>>> out to my colleagues that it's easy to identify bottlenecks with tools like
>>> the Django Debug Toolbar and Yet Another Django Profiler. With my
>>> colleagues present, I've isolated and fixed significant speed problems
>>> inside of a few hours. I don't believe the Django ORM is inherently bad,
>>> although I do think that coders who use it should Know What They're Doing.
>>> Finally, I've referenced blog entries that talk about how Instagram and
>>> Disqus use Django on the backend for backend-y tasks.
>>>
>>> Despite my best efforts, my colleagues are still pushing to have us
>>> rewrite large portions of our infrastructure as separate services before we
>>> try to fix them. For example, we have one slow REST endpoint that returns a
>>> boatload of user data, and so there's talk about using a new microservice
>>> for users in lieu of our existing Django models. Even if we are able to fix
>>> bottlenecks we encounter in a timely fashion, my colleagues fear that
>>> Django won't scale with the business.
>>>
>>> I'm writing this post to garner additional evidence that Django will
>>> scale. Anything compelling (and preferably not obvious) that would help
>>> shed some light on Django's ability to scale would be *greatly*
>>> appreciated, as it's very difficult for me to defend my position that
>>> Django is a viable long-term solution without solid evidence to back up my
>>> claims. It certainly doesn't help that I don't have any experience scaling
>>> Django myself!
>>>
>>> Thank you.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Django users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-users...@googlegroups.com.
>>> To post to this group, send email to django...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/django-users.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/django-users/83968c41-d415-4189-b33b-9f99b10b1c41%40googlegroups.com
>>> 
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

Re: Scaling Django

2016-02-03 Thread Remco Gerlich

There is a book (ebook) named "High Performance Django" that has many
useful tips.

Also, new software developers are _always_ skeptical when they start a new
job. They have an old way of doing things from their previous job, "that's
not how we work here!" reflexes also from that old job, and they take a
while to adjust to the new tools.

 First let them get productive with what exists, _then_ let them gather
real statistics about real problems when they want to change something.

You can have scalability issues, and solve them, with whatever framework,
Django included.

Remco Gerlich



-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/CAFAGLK3wfsrqeDLcHX%2B3iaLVGoJb5ypXjoUdKKcjpm_UYakj1A%40mail.gmail.com.

Re: Scaling Django

2016-02-03 Thread Joshua Pokotilow

The service uses the Django REST Framework and takes multiple seconds to 
return a response. The response is a JSON array with thousands of 
dictionaries. We haven't yet investigated why it's slow, nor have we tried 
to cache / memoize anything to speed it up.
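
For what it's worth, here is a minimal sketch of the two things not yet tried
above: measuring one call in milliseconds, and memoizing the result for a
short time. It uses only the standard library. `build_user_payload` is a
hypothetical stand-in for the slow endpoint's work, not code from the real
service, and the in-process dictionary cache is an assumption (a shared cache
would be needed with more than one worker process).

```python
import time
from functools import wraps


def timed_ms(func):
    """Wrap func so each call returns (result, elapsed milliseconds)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        return result, (time.perf_counter() - start) * 1000.0
    return wrapper


def memoize_ttl(seconds):
    """Cache results per argument tuple for `seconds` seconds (in-process only)."""
    def decorator(func):
        cache = {}  # args -> (expiry timestamp, value)

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            cache[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator


CALLS = {"count": 0}  # counts how often the expensive body really runs


@memoize_ttl(seconds=60)
def build_user_payload(n):
    # Hypothetical stand-in for the slow endpoint: builds n dictionaries.
    CALLS["count"] += 1
    return [{"id": i, "name": "user%d" % i} for i in range(n)]
```

Calling `timed_ms(build_user_payload)(5000)` twice gives a number in ms for
the cold call and for the cached call, which is the kind of measurement asked
for earlier in the thread.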

On Wednesday, February 3, 2016 at 10:46:25 AM UTC-5, Avraham Serour wrote:
>
> what do you mean by slow? can you measure in ms?

-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/38b96b56-064a-4933-abb5-e6085cd24425%40googlegroups.com.

Re: Scaling Django

2016-02-03 Thread Sergiy Khohlov

Hello,
 Your first words contain the answer: quick coding always produces performance
problems. This is expected. It looks like the new engineers use another
technology and would prefer not to use Django; that is the reason for their
criticism. Low performance is mostly related to DB performance. I prefer
to avoid ManyToMany fields because of their high resource usage at the
DB level. Writing correct models and DB functions helps in most cases.
I have no idea about the proposed solution, but I am definitely sure that it
is the code that produces the bottleneck, not the programming language. Ruby
on Rails has the same problem as Django, for example. Have you got good
performance with Java+Tomcat+Apache? I'm ready to see this high-performance ASP.
Half a year ago I rewrote a GTS service from C# to Python. As a result, CPU
usage dropped from 45% to 7-9% and memory usage from 1.5Gb to 300kb.
A wrong solution, or mistakes made at the requirements stage of the project,
produce more problems than the choice of programming language.

Many thanks,

Serge


+380 636150445
skype: skhohlov


-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/CADTRxJPCNO2r%3D2-Vog0DGiXjmQof%3DQgH_RNjgySX1ENOkwUOrw%40mail.gmail.com.

Re: Scaling Django

2016-02-03 Thread Bill Blanchard

Let's try to address some of their concerns:

> - We need to move to a service-oriented infrastructure because Django is
> too monolithic

It depends on what your application does and what you're planning to do
with it in the future.  People are quick to prescribe SOA as the end-all way
to scale, but they tend to ignore the added complexity that comes with
building out and integrating smaller services.

> - It's too easy to write slow queries using the Django ORM

It's just as easy (arguably easier) to write slow queries using pure SQL or
any other ORM.  The ORM makes a lot of good decisions for mediocre
programmers (I'd put myself in that category).  If you're a great
programmer and have great programmers who really understand SQL, then
you're just as likely to get your ORM queries right as you are straight
SQL.
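
To make the "slow queries" point concrete, here is a small sketch of the
classic N+1 pattern, which is easy to write with any ORM or by hand. It uses
the standard library's sqlite3 rather than the Django ORM so that it runs on
its own; the `author` and `book` tables are made up for illustration. In
Django the usual fix for the same pattern is `select_related()` or
`prefetch_related()` on the queryset.

```python
import sqlite3


def make_db(n_authors=50):
    """Build a throwaway in-memory database with one book per author."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, "
        "author_id INTEGER REFERENCES author(id))"
    )
    conn.executemany(
        "INSERT INTO author (id, name) VALUES (?, ?)",
        [(i, "author%d" % i) for i in range(n_authors)],
    )
    conn.executemany(
        "INSERT INTO book (id, title, author_id) VALUES (?, ?, ?)",
        [(i, "book%d" % i, i) for i in range(n_authors)],
    )
    return conn


def titles_n_plus_one(conn):
    """One query for the books, then one extra query per book for its author."""
    queries = 1
    rows = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    out = []
    for title, author_id in rows:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        out.append((title, name))
    return out, queries


def titles_joined(conn):
    """Same result with a single JOIN query."""
    rows = conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()
    return rows, 1
```

Both functions return the same rows; the only difference is the number of
round trips to the database, which is what tends to dominate on a real
network connection.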

> - It's hard to hire Django engineers

Compared to what?  .NET or Java engineers?  Probably.  Harder than the
newest shiny javascript framework engineers? Probably not.  Django has
about as robust an  engineering population as Ruby/Rails does.  I don't
know what you'd convert to in order to make hiring easier.  All engineers
(especially good ones) are really hard to come by these days.  If you're
looking at outsourcing to Southwest Asia, then yes, the Django population
isn't as high as .NET/Java/PHP.  However, hiring challenges are most
typically defined by your location and your ability and/or willingness to
explore remote workers.

> - While Instagram and DISQUS use Django to service large numbers of people,
> they don't use it for any serious backend work

Reddit is also a large Python user (built on Pylons rather than Django, as
far as I know).  All engineering decisions should be
made around what your particular needs are and what skills your team
possesses or is able to acquire.  Needs of an organization evolve over time
and the organizations adjust as they need to.

Many organizations start with a Python/Django or Ruby/Rails application to
build a product *quickly*, which is what those stacks excel at.  A mantra
typically heard in the community is "don't optimize prematurely".  If
you're saying "man, we're going to hit a wall at 100,000 users", well you
need to get to 90,000 users first before worrying about 100,000.  Getting
the 90k users is the real hard part.

All this being said, your colleagues could be right to want to move off
Django.  We don't know much about your particular circumstances.

For more information on optimizing Django for scale, check out this book.
https://highperformancedjango.com/

Best of luck.

Bill


Re: Scaling Django

2016-02-03 Thread Rafael E. Ferrero

Maybe I don't understand you very well, and for sure you have a very
specific problem to solve... but have you read any of these?

http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
https://www.digitalocean.com/community/tutorials/how-to-scale-django-beyond-the-basics
https://highperformancedjango.com/
http://talks.caktusgroup.com/djangocon/2013/scaling/#slide17
https://docs.djangoproject.com/en/1.8/faq/general/#does-django-scale


Rafael E. Ferrero


-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/CAJJc_8Un4amXafG61xzt%2BbwY3bu4jOQ5LY-5MsxcXRMVm6Apvw%40mail.gmail.com.

Re: Scaling Django

2016-02-03 Thread Avraham Serour

what do you mean by slow? can you measure in ms?


-- 
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/CAFWa6t%2BBkFpyKAQy8excUssKYpao%2BfGU8wJPVcQFT517X833vQ%40mail.gmail.com.

Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-30 Thread Cal Leeming [Simplicity Media Ltd]

Just to chime in on this..

In terms of commercial options, we have sometimes gone with ZXTM (now known
as StingRay Traffic Manager). It has some truly amazing features and you
should definitely check it out. I believe that Riverbed have since started
issuing free commercial licences for up to a certain traffic rate, and it's
a downloadable package/virtual appliance.

We also often use the load balancers that come with Rackspace Cloud; they
have proven to be quite efficient. When using this route, we also tend to
put everything behind CloudFlare too (if you haven't seen this
already, check it out. It is free for non-SSL usage too!)

In a nutshell:

* ZXTM - commercial (free licence to a certain amount), amazing traffic
script language, self hosted
* Rackspace Cloud - does what it says on the tin, no traffic scripting
* F5 - commercial, not had any personal experience with it (but one of our
providers uses it as a shared load balancer for their customers and it's
been stable)
* haproxy - works, but it can be a pita!
* CloudFlare - this isn't a load balancer, but does give you much better
control over DNS (it proxies your site, and effectively makes 'DNS changes'
instant)

I believe uWSGI has some really good load balancing features too, but I
haven't used them in too much depth yet (despite being an avid user of
uWSGI for 2-3 years!)
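
To illustrate what "distributing requests over a list of backends" means in
the options above, here is a toy sketch of two common selection policies,
round robin and least connections. It is not how nginx, uWSGI, haproxy or any
of the listed products implement them, and the backend addresses in the usage
note are placeholders.

```python
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Hand out the backend with the fewest requests currently in flight."""

    def __init__(self, backends):
        # Insertion order is kept, so ties go to the earliest-listed backend.
        self.active = {backend: 0 for backend in backends}
        if not self.active:
            raise ValueError("at least one backend is required")

    def acquire(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the request finishes so the backend becomes eligible again.
        self.active[backend] -= 1
```

With `RoundRobin(["10.0.0.1", "10.0.0.2"])`, repeated `pick()` calls simply
alternate between the two addresses; `LeastConnections` needs a matching
`release()` for every `acquire()`.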

Hope this helps

Cal

On Tue, Oct 30, 2012 at 8:14 AM, Kurtis Mullins wrote:

> The easiest thing I've found to use is simply uWSGI with Nginx. It's easy
> to just create new Django servers on the fly. You simply include the IPs in
> a list and it will use various algorithms (optional) to distribute the
> requests appropriately.
>
> As a lot of applications are IO bound, you could also use a distributed
> database system to help with your scalability. I don't have much experience
> in that area, though.
>
> This still leaves a point of failure: Nginx (or whatever load balancer or
> reverse proxy you use). Maybe someone else here will know more about load
> balancing Nginx itself ... that might require specialized hardware. I know
> a lot of cloud services offer load balancers (e.g. rackspace) so you could
> possibly use that with multiple nginx servers and further multiple django
> servers.
>
>
> On Tue, Oct 30, 2012 at 3:42 AM, Isaac XXX  wrote:
>
>> Hi there,
>>
>> maybe you're right, but I'm not really worried about RAM footprint, or
>> resources consumption. I'm concerned now on architecture, setting a right
>> scalable system, and a right cluster of systems, without lacks of
>> communications between them.
>>
>> Underlaying technologies can be easily replaced (say apache-mod_wsgi for
>> gunicorn or uwsgi), and some performance improvements can be made, but this
>> is not what I'm looking for. I'm looking for the tools to generate a robust
>> system, balancing requests through several systems, and allowing increase
>> the size of this system (adding more servers) without trouble.
>>
>> Cheers,
>>
>> Isaac
>>
>>
>> On 10/29/2012 05:18 PM, Some Developer wrote:
>>
>>> On 29/10/2012 16:03, Isaac XXX wrote:
>>>
>>>> Hi there,
>>>>
>>>> thank you for response Tom.
>>>>
>>>> Actually, I've a complete idea at how to build this system, but I lack
>>>> the exact information about how to join systems, and what I was looking for
>>>> was a source of cohesive information on all systems. At least, when I
>>>> finish to build that system, I will write this tutorial.
>>>>
>>>> For someone who can help me, I will describe here what I thought it can
>>>> be this structure:
>>>>
>>>> - 1 nginx, as a reverse proxy on frontend, serving static/media and
>>>> redirecting content to apache clusters
>>>> - n apache servers, with mod_wsgi, serving dynamic data
>>>> - m postgresql servers, in a master-slave flavour
>>>>
>>>> Cheers,
>>>>
>>>> Isaac

>>>
>>> Why not just ditch Apache entirely and just use Nginx for serving all
>>> media (both static and dynamic)? You can then save quite a few resources as
>>> you only need to run one HTTP server rather than two.
>>>
>>> Using Nginx to serve Django content works well. Just serve your Django
>>> application via FastCGI or uWSGI and you'll significantly simplify your
>>> configuration and reduce RAM usage on your servers as well.
>>>
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Django users" group.
>> To post to this group, send email to django-users@googlegroups.com.
>> To unsubscribe from this group, send email to django-users+unsubscribe@**
>> googlegroups.com .
>> For more options, visit this group at http://groups.google.com/**
>> group/django-users?hl=en
>> .
>>
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-users@googlegroups.com.
> To

Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-30 Thread Tom Evans

On Tue, Oct 30, 2012 at 7:35 AM, Isaac XXX  wrote:
> Hi Tom,
>
> you're right, I was not really explicit about what information I was
> missing. Right now, these are the points I can't find a howto for in the
> desired deployment:
>
> - Creating a master-slave system on postgresql, keeping all systems up to
> date, distributing reads, and centralizing writes
> - How to configure a cluster of reverse proxies (a single reverse proxy may
> not be enough, and I need to plan to deploy more than one load balancer)
>
> The rest (configuring apache with mod_wsgi, configuring nginx to serve
> static content and so on) is already solved in my current deployments, so it
> should not be a problem in a distributed environment.
>
> Cheers,
>
> Isaac
>

Hi Isaac

We use a similar setup at $JOB. We run a pair of apache httpd servers,
which serve static content and reverse proxy to other http servers
(usually apache again, sometimes not) for dynamic content. The
requests are distributed evenly between the two proxies by our
routers, which round robin connections between the two of them. This
is basically your setup, but we use Apache, because we know it and can
tune it to give nginx-like performance anyway.

Actually, that's only half of it - each proxy has all of the public
IPs we serve allocated on lo0 (loopback), and the requests are round
robin routed via a pair of high availability addresses. Therefore, on
both boxes, apache listens on the 'right' IPs. If we ever want to run
with just one proxy, we can 'down' one of the HA addresses on the
server we wish to update, which moves that HA address to be active on
the other proxy, which then serves all the requests.

Now, did we need to do any of this? Probably not! We serve in the
region of 3-5 million requests a day, with peaks of around 200
concurrent requests/s going through the proxies. Apache uses in total
500MB of RAM, pre-tuned to serve up to 768 concurrent req/s without
requiring extra resources. Load average on the boxes never goes above
0.05, even if we put all the requests through one machine. It's nice
having a spare for such a critical part of infrastructure though.
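
For a rough sense of scale, the figures above can be turned into average
rates. This is only a back-of-the-envelope sketch in Python: the function
names are made up for illustration, and it treats the quoted peak and the
tuned limit as the same unit, as the message does.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(requests_per_day):
    """Average requests per second for a given daily request count."""
    return requests_per_day / SECONDS_PER_DAY


def headroom(peak_rate, tuned_capacity):
    """How many times the observed peak fits into the tuned capacity."""
    return tuned_capacity / peak_rate


# Figures quoted in the message: 3-5 million requests a day,
# peaks of around 200, and a tuned limit of 768.
low = average_rate(3_000_000)    # about 34.7 requests/s
high = average_rate(5_000_000)   # about 57.9 requests/s
spare = headroom(200, 768)       # 3.84 times the observed peak

print(f"average: {low:.1f}-{high:.1f} req/s, headroom: {spare:.2f}x")
```

The peak is several times the daily average, which is why sizing for the
average alone would be misleading.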

pgsql scaling is a little more involved than MySQL, which is what
we've always used here - usually because its replication systems are
so good. You will typically need to use some external software to
manage the replication, e.g. Slony, but don't take my word for it, as I
have very limited pgsql experience.

Cheers

Tom

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.

Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-30 Thread Kurtis Mullins

The easiest thing I've found to use is simply uWSGI with Nginx. It's easy
to just create new Django servers on the fly. You simply list the backend
IPs in the Nginx configuration and it will distribute the requests between
them, optionally using one of several balancing algorithms.
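
To illustrate what "distributing requests over a list of IPs" means, here
is a minimal Python sketch of round robin, which is the default method for
an nginx upstream group. Nginx does this itself from its configuration; the
class name and the backend addresses below are hypothetical and only show
the idea.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends from a fixed list, one after another, forever."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))

    def next_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)


# Hypothetical Django/uWSGI servers; adding capacity means adding an entry.
balancer = RoundRobinBalancer(["10.0.0.1:8000", "10.0.0.2:8000"])
picks = [balancer.next_backend() for _ in range(4)]
print(picks)
```

Each backend receives every second request here; with three entries it
would be every third, and so on.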

As a lot of applications are IO bound, you could also use a distributed
database system to help with your scalability. I don't have much experience
in that area, though.

This still leaves a single point of failure: Nginx (or whatever load
balancer or reverse proxy you use). Maybe someone else here will know more
about load balancing Nginx itself ... that might require specialized
hardware. I know a lot of cloud services offer load balancers (e.g.
Rackspace), so you could possibly put one of those in front of multiple
Nginx servers, which in turn front multiple Django servers.


Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-30 Thread Isaac XXX


Hi there,

maybe you're right, but I'm not really worried about RAM footprint or
resource consumption. Right now I'm concerned with architecture: setting
up a properly scalable system, and a proper cluster of systems, with no
gaps in communication between them.

Underlying technologies can be easily replaced (say, apache-mod_wsgi with
gunicorn or uwsgi), and some performance improvements can be made, but
this is not what I'm looking for. I'm looking for the tools to build
a robust system that balances requests across several machines and can
grow (by adding more servers) without trouble.


Cheers,

Isaac


Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-30 Thread Isaac XXX

Thank you so much for the tips. I will keep them in mind when I start
testing the environment for performance.


Cheers

Isaac


Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-30 Thread Isaac XXX


Hi Tom,

you're right, I was not really explicit about what information I was
missing. Right now, these are the points I can't find a howto for in the
desired deployment:

- Creating a master-slave system on postgresql, keeping all systems up
to date, distributing reads, and centralizing writes
- How to configure a cluster of reverse proxies (a single reverse proxy
may not be enough, and I need to plan to deploy more than one load balancer)

The rest (configuring apache with mod_wsgi, configuring nginx to serve
static content and so on) is already solved in my current deployments, so
it should not be a problem in a distributed environment.


Cheers,

Isaac


Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-29 Thread Kurtis Mullins

The old rule of thumb is to avoid premature optimization. I'd build,
profile, then scale or otherwise optimize as needed. This isn't a tutorial
or really a discussion on all points of scaling a system; but identifying
bottlenecks will do wonders when it comes to deciding what, where, and how
to scale. Good luck!


Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-29 Thread Cal Leeming [Simplicity Media Ltd]

Hi Isaac,

If there is one thing I have learnt about scaling apps, it's about trying
things out for yourself.

Sure there are some best practice guidelines (e.g. serving files from
nginx, or using apache's X-SendFile rather than streaming out via the
webapp), but if someone comes along and tells you to use X instead of Y,
then you don't get the advantages of learning "the hard way".

Another important note is that scaling rarely goes up on a 1:1 ratio, i.e.
the configuration and resources required to handle X number of
requests/sec may be completely different if you need to handle Y number of
requests/sec.

And often what works for one person won't work for another (scaling is
entirely dependent on your application, despite what any of these cloud
providers might tell you!)

In my own experience, I've found that:

* SSDs with Percona MySQL resolve a LOT of performance problems - but
don't abuse it
* Lots and lots and lots of query tuning and InnoDB tuning
* New Relic to identify bottlenecks
* IO contention is a big thing
* Snowball prevention (e.g. you set max clients to X, your backend can't
handle it, your requests stack up, the load balancer forces a timeout, and
your database gets smashed - or you set max memory too high, the server
goes into swap etc).
* uWSGI + nginx is amazing
* Identify where your bottlenecks are (in my own experience, IO/memory
tends to come up more often than CPU)
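
The "snowball prevention" point in the list above can be sketched as load
shedding: cap the number of requests in flight and refuse the excess
quickly instead of letting it queue up. This is a minimal Python sketch,
not anything from the setup described here; the class name, the status
codes and the limit are assumptions chosen for illustration.

```python
import threading


class LoadShedder:
    """Allow at most max_in_flight handlers at once; reject the rest."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_handle(self, handler):
        # Non-blocking acquire: if no slot is free, fail fast rather than
        # letting requests stack up behind a struggling backend.
        if not self._slots.acquire(blocking=False):
            return 503, "overloaded, retry later"
        try:
            return 200, handler()
        finally:
            self._slots.release()


shedder = LoadShedder(max_in_flight=1)
print(shedder.try_handle(lambda: "ok"))
```

The limit itself has to come from measuring what the backend and database
can actually sustain, which is the point of the list item.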

Sadly I haven't tried PSQL so I can't offer any advice on this - Percona
are dragging MySQL kicking and screaming into the 21st century and really
doing some amazing things, but it's by no means perfect!

The above has helped us grow past 8k-12k requests/minute; the largest
database we manage is around 1.1 billion rows weighing in at 160GB+, and we
maintain around 60+ servers.

I should reiterate, the above is purely based on my own experience and use
cases - I am by no means an expert on the subject and I'm still learning
approaches on a daily basis - so this is really meant as "food for thought"
rather than a "this is how you should do things".

Hope this helps a bit!

Cal


Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-29 Thread Some Developer


On 29/10/2012 16:03, Isaac XXX wrote:

> Hi there,
>
> thank you for the response, Tom.
>
> Actually, I have a complete idea of how to build this system, but I lack
> the exact information about how to join the systems together, and what I
> was looking for was a source of cohesive information covering all of them.
> At the least, when I finish building that system, I will write this
> tutorial myself.
>
> For anyone who can help me, I will describe here the structure I have in
> mind:
>
> - 1 nginx, as a reverse proxy on the frontend, serving static/media and
> redirecting dynamic requests to the apache cluster
> - n apache servers, with mod_wsgi, serving dynamic data
> - m postgresql servers, in a master-slave flavour
>
> Cheers,
>
> Isaac


Why not just ditch Apache entirely and just use Nginx for serving all 
media (both static and dynamic)? You can then save quite a few resources 
as you only need to run one HTTP server rather than two.


Using Nginx to serve Django content works well. Just serve your Django 
application via FastCGI or uWSGI and you'll significantly simplify your 
configuration and reduce RAM usage on your servers as well.



Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-29 Thread Tom Evans

On Mon, Oct 29, 2012 at 4:03 PM, Isaac XXX  wrote:
> Hi there,
>
> thank you for the response, Tom.
>
> Actually, I have a complete idea of how to build this system, but I lack
> the exact information about how to join the systems together, and what I
> was looking for was a source of cohesive information covering all of them.
> At the least, when I finish building that system, I will write this
> tutorial myself.
>
> For anyone who can help me, I will describe here the structure I have in
> mind:
>
> - 1 nginx, as a reverse proxy on the frontend, serving static/media and
> redirecting dynamic requests to the apache cluster
> - n apache servers, with mod_wsgi, serving dynamic data
> - m postgresql servers, in a master-slave flavour
>
> Cheers,
>
> Isaac
>

I'm confused about what you are confused about - you seem to grasp
precisely what is required.

I.e., which of the following questions are you stuck on:

How to configure nginx to reverse proxy and balance to other http servers?
How to configure apache, mod_wsgi and django?
How to configure pgsql in a master/multiple slave environment?
How to configure django to issue write requests to the write master,
and distribute reads to read-only slaves?
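
For the last question, Django's multiple-database support lets a router
class decide which connection each query uses. The following is a minimal
sketch, assuming a "default" alias for the write master and two
hypothetical read-only aliases, "replica1" and "replica2", defined in
DATABASES. The method names follow the database router API of current
Django; releases from the time of this thread used allow_syncdb where
allow_migrate appears below.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the master and spread reads over the replicas."""

    primary = "default"                  # assumed alias of the write master
    replicas = ["replica1", "replica2"]  # hypothetical read-only aliases

    def db_for_read(self, model, **hints):
        # Pick any replica; replication lag means a read may be stale.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # Every write goes to the single master.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so relations are always fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes run on the master only and reach the replicas
        # through replication.
        return db == self.primary
```

The router would be enabled by listing its dotted path in the
DATABASE_ROUTERS setting, for example
"myproject.routers.PrimaryReplicaRouter" (a made-up path). Code that must
read its own writes immediately can still pin a query to the master with
.using("default").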


Cheers

Tom


Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-29 Thread Isaac XXX


Hi there,

thank you for the response, Tom.

Actually, I have a complete idea of how to build this system, but I lack
the exact information about how to join the systems together, and what I
was looking for was a source of cohesive information covering all of them.
At the least, when I finish building that system, I will write this
tutorial myself.

For anyone who can help me, I will describe here the structure I have in
mind:

- 1 nginx, as a reverse proxy on the frontend, serving static/media and
redirecting dynamic requests to the apache cluster
- n apache servers, with mod_wsgi, serving dynamic data
- m postgresql servers, in a master-slave flavour

Cheers,

Isaac


Re: Scaling django (nginx + apache + mod_wsgi + postgresql)

2012-10-29 Thread Tom Evans

On Mon, Oct 29, 2012 at 2:42 PM, Isaac XXX  wrote:
> Hi folks,
>
> I'm developing a new application that should get high traffic. Right now,
> I have other projects with the following architecture:
>
> Nginx on front: serving static content and redirecting to apache for dynamic
> data
> Apache+mod_wsgi: serving dynamic pages
> PostgreSQL: backend for data storage (RDBMS)
> Memcache: for caching purposes :)
>
> All my deployments use a single server, with a single frontend/backend (1
> nginx, 1 apache, 1 postgresql). The requirements for this new project are
> really large, and I think I will need to scale the whole system. Can anyone
> suggest an all-in-one tutorial discussing the main points of scaling a
> system?
>
> I know there are different alternatives for the DB (master-slave,
> clustering...), nginx can serve as a reverse proxy or not... and I need to
> merge all this information into a single scalable system, but I can't find
> a unified source of information.
>
> Can anyone help me with it?
>

There is unlikely to be one authoritative source that will explain
precisely how to scale an app - part of the problem is that what "scale"
and "app" mean is domain specific!

So first off, "scale". You can scale up, or out. Scaling up means
running everything on faster hardware. Due to how IT progresses, roughly
every 18 months you can replace your server with something twice as fast.
Scaling out means running everything over more boxes. Scaling up is
trivial, just spend more money; scaling out can be harder.

Most parts of the stack are easy to scale, because HTTP itself is
stateless and therefore easy to scale. E.g., if you have an nginx
frontend, serving static files and proxying to backend servers for
dynamic content, and the nginx server is overloaded, it is easy to add
a balancer to route HTTP requests to multiple nginx servers.

The same is true for dynamic content - need more workers, add more
machines, and tell nginx to talk to more machines.

The only tricky aspect of scaling out is the database. Most databases do
not have a simple 'scale out' option. With postgres, you can set up
master-slave trees, but this only expands your read capacity; all
writes have to go through one server (and then to all the slaves, making
it more expensive the more slaves you add).

The only true way of scaling out with database servers is to shard
your data, splitting it up by some arbitrary algorithm (usually on
user), but sharding isn't easy; you will have to design your database
and app around it. There are some very good videos, docs and talks
from the likes of Facebook on this. Sharding is not a panacea and
requires a lot of work.
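
As a small illustration of "splitting it up by some arbitrary algorithm
(usually on user)", here is a Python sketch that maps a user id to one of a
fixed set of shards with a stable hash. The shard aliases and the function
name are hypothetical, and real sharding schemes are usually more involved
than a plain modulo.

```python
import zlib

# Hypothetical database aliases, one per shard.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for_user(user_id, shards=SHARDS):
    """Return the shard alias that holds this user's rows."""
    # crc32 is stable across processes and machines, unlike the built-in
    # hash() for strings, so every app server agrees on the mapping.
    key = str(user_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]


print(shard_for_user(42), shard_for_user("alice"))
```

The catch with a plain modulo is that changing the number of shards remaps
most users, which is one reason the database and app have to be designed
around sharding from the start.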

Cheers

Tom


Re: Scaling django installation

2012-06-04 Thread vivek



> I guess it was 16.

Sounds good.

> Separation of django and static content is part of the deployment/setup
> change anyway.

Yes, that would definitely help. As mentioned before, look at Varnish.

rgds
vivek


Re: Scaling django installation

2012-06-04 Thread Subhranath Chunder

On Mon, Jun 4, 2012 at 12:15 PM, vivek  wrote:

>
>
> >
> > That's aggregated load time, and not a single page loading time. The test
> > comprised of navigating to multiple pages to generate more real life
> > scenario.
> >
>
> How many pages?
>
I guess it was 16.


>
> >
> > > 3. text/html , which is the output of django app, is taking 62.74 %
> > > time.
> >
> > This number might not be bad actually, taking into consideration that I
> aim
> > to reduce the number of http connections per page to something pretty
> low.
> >
>
> Number of connections/page will not bring down this figure.
>
As I said, I plan to reduce the number of connections per page. I don't
plan to reduce this figure.


>
>
> > > What is the payload of your html page ?
> >
> > 5- 10 Kb (compressed) on avg depending upon page content
> >
>
> > Since you thought the aggregated load time to be of a single page, I
> guess
> > your perspectives need to change accordingly. :)
>
> Possibly, but that would depend on the number of pages in testing. E.g. if
> the number of pages is about 10+ it seems logical, but if it's 2-3 pages
> then it's still on the high side.


> Also the load time increases nearly linearly with the number of users, which
> doesn't sound logical, e.g. at peak it's almost 3 minutes.
>
The current single-server setup has to handle a high number of connections
per page, around 50 - 80 on average.
Active connections thus rose linearly to around 2.3k. And the apache webserver
currently sits in front of both django and static content, so you see this effect.

Separation of django and static content is part of the deployment/setup
change anyway.


>
> rgds
> vivek
>
>
>


-- 
Thanks,
Subhranath Chunder.
www.subhranath.com


Re: Scaling django installation

2012-06-04 Thread vivek



>
> That's aggregated load time, and not a single page loading time. The test
> comprised of navigating to multiple pages to generate more real life
> scenario.
>

How many pages?

>
> > 3. text/html , which is the output of django app, is taking 62.74 %
> > time.
>
> This number might not be bad actually, taking into consideration that I aim
> to reduce the number of http connections per page to something pretty low.
>

Number of connections/page will not bring down this figure.


> > What is the payload of your html page ?
>
> 5- 10 Kb (compressed) on avg depending upon page content
>

> Since you thought the aggregated load time to be of a single page, I guess
> your perspectives need to change accordingly. :)

Possibly, but that would depend on the number of pages in testing. E.g. if
the number of pages is about 10+ it seems logical, but if it's 2-3 pages
then it's still on the high side.

Also the load time increases nearly linearly with the number of users, which
doesn't sound logical, e.g. at peak it's almost 3 minutes.

rgds
vivek




Re: Scaling django installation

2012-06-03 Thread Subhranath Chunder

On Monday, June 4, 2012, vivek wrote:

> Hi,
>
> > To load test I used loadimpact.com and the results of which can be
> found on:
> http://loadimpact.com/load-test/www.reviews42.com-18774e46e8f562a6eb4...
> > The test configuration consisted of 600 VUs with 10 mins step duration.
> > Got around .1 millions requests and around 200+ requests/sec max. Is this
> > good, bad, or at par?
>
> Some quick observations from results:
> 1. The user load times start from 30+ seconds

That's the aggregated load time, not a single page loading time. The test
consisted of navigating to multiple pages to generate a more real-life
scenario.


> 2. The error rate also increases with higher requests/sec.

Yes. With high reqs/sec it starts to come in. A major client-side
code optimization is in the pipeline. A drastic change in reqs/sec is
expected.


> 3. text/html , which is the output of django app, is taking 62.74 %
> time.

This number might not be bad actually, taking into consideration that I aim
to reduce the number of http connections per page to something pretty low.


> Overall things don't seem optimal. If you are testing based on load
> times of 30+ sec then the test approach may not be practical, you dont
> expect your visitor to wait 30+ sec. Your application itself may need
> more fine tuning. text/html generation time seem on higher side.
> Also be mindful that generally database performance on Amazon EC2 is
> lower compared to bare metal servers but you have mentioned that the
> queries are minimal so not sure how much would that have any impact.
>
As I said above the 30+ secs is the aggregated number. That too for
un-optimized browser code.


>
> What is the payload of your html page ?
>
5- 10 Kb (compressed) on avg depending upon page content


>
> Without knowing much about the application its usually difficult to
> say much but based on the results, there seems to be scope of
> improvement in html generation itself.
>
Once the major client side optimization comes through, and some
deployment/setup changes, I expect the request/sec handling capability to
easily shoot the sky.

Since you thought the aggregated load time to be of a single page, I guess
your perspectives need to change accordingly. :)


>
> rgds
> vivek
>
>


Re: Scaling django installation

2012-06-03 Thread vivek

Hi,

> To load test I used loadimpact.com and the results of which can be found
> on: http://loadimpact.com/load-test/www.reviews42.com-18774e46e8f562a6eb4...
> The test configuration consisted of 600 VUs with 10 mins step duration.
> Got around 0.1 million requests and around 200+ requests/sec max. Is this
> good, bad, or at par?

Some quick observations from results:
1. The user load times start from 30+ seconds
2. The error rate also increases with higher requests/sec.
3. text/html , which is the output of django app, is taking 62.74 %
time.

Overall things don't seem optimal. If you are testing based on load
times of 30+ sec then the test approach may not be practical; you don't
expect your visitor to wait 30+ sec. Your application itself may need
more fine tuning. text/html generation time seems on the higher side.
Also be mindful that database performance on Amazon EC2 is generally
lower compared to bare metal servers, but you have mentioned that the
queries are minimal, so I'm not sure how much impact that would have.

What is the payload of your html page ?

Without knowing much about the application it's usually difficult to
say much, but based on the results, there seems to be scope for
improvement in html generation itself.

rgds
vivek



Re: Scaling django installation

2012-06-02 Thread Subhranath Chunder

On Sat, Jun 2, 2012 at 7:14 AM, Tim Chase wrote:

> On 06/01/12 09:17, Subhranath Chunder wrote:
> > (Given the fact that the server is deployed in Amazon EC2
> > Singapore location, as m1.xlarge with all it's network, memory
> > constrains in place)
>
> A couple of the other aspects that occurred to me:
>
> Is there geographical separation between your Django/web server and
> its backing database?  If your web server is serving Django
> pages/apps out of Singapore, but the database serving each of those
> requests is in the USA, it's asking for trouble.
>
Right you are. But NO in my case. :)


>
> Alternatively, if they're on the same (virtual?) server, are they
> competing for resources?  Most scalable sites have Django and
> database processes running on separate servers but ensuring that
> they're on the same local low-latency network
>
As I said, yes, currently they are deployed as a single server setup, so
they are competing for resources.
I'll probably change the setup to a low-latency network cluster in the
future, once traffic starts to increase and horizontal scaling is required.


>
> I presume your database queries have established indexes for the
> types of data queries you're executing.
>
YES.


>
>
> None of this precludes actually profiling your application to see
> where the slowness is actually happening, but it might be helpful to
> have in mind as you got chasing things down.
>
Sure. I'm chasing things down to keep improving them. It's better to
know/find possible bottlenecks than to get badly caught on one.


>
> -tkc
>
>
>


-- 
Thanks,
Subhranath Chunder.
www.subhranath.com


Re: Scaling django installation

2012-06-01 Thread Tim Chase

On 06/01/12 09:17, Subhranath Chunder wrote:
> (Given the fact that the server is deployed in Amazon EC2
> Singapore location, as m1.xlarge with all it's network, memory
> constrains in place)

A couple of the other aspects that occurred to me:

Is there geographical separation between your Django/web server and
its backing database?  If your web server is serving Django
pages/apps out of Singapore, but the database serving each of those
requests is in the USA, it's asking for trouble.

Alternatively, if they're on the same (virtual?) server, are they
competing for resources?  Most scalable sites run the Django and
database processes on separate servers while ensuring that
they're on the same local low-latency network.

I presume your database queries have established indexes for the
types of data queries you're executing.

None of this precludes actually profiling your application to see
where the slowness is actually happening, but it might be helpful to
have in mind as you go chasing things down.

-tkc


Re: Scaling django installation

2012-06-01 Thread Subhranath Chunder

On Fri, Jun 1, 2012 at 8:20 PM, Kurtis Mullins wrote:

> To me, the biggest bottleneck in a "Django Application Installation" (not
> application) is not going to be Django at all. It's going to be I/O --
> typically to the database and/or file system.

Yup.


> Another large part are the templates (and rendering system) which
> typically read from the filesystem.
>
Okay. I might have overlooked this all the while.


>
> Write good code with a focus on minimizing DB queries.
>
Done, with a zero-or-one query policy. And we're still improving.


Re: Scaling django installation

2012-06-01 Thread Kurtis Mullins

To me, the biggest bottleneck in a "Django Application Installation" (not
application) is not going to be Django at all. It's going to be I/O --
typically to the database and/or file system. These are used heavily (from
my personal experience) by all sorts of django functions. As for the
database -- it's used by every model, which comprises a huge chunk of the
Django Application Installation. Another large part are the templates (and
rendering system) which typically read from the filesystem.

Write good code with a focus on minimizing DB queries.
Save your templates in a RAM disk.
Save your DB to a RAM disk with hard-disk persistence on Writes.
Cache everything as much as possible (ORM-Level, Template-Level,
View-Level, Sessions, etc..) in memory (memcache)
Let a completely different server serve all static and user media (e.g. S3)
over CDN
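
As a rough illustration of the "cache everything in memory" advice above,
here is a minimal cache-aside helper. A plain dict stands in for
memcached, and the names cached, expensive_fragment and the 60-second
TTL are made up for the example.

```python
import time

_cache = {}  # key -> (expires_at, value); a stand-in for memcached


def cached(key, ttl_seconds, compute):
    """Return the cached value for key, computing it on a miss."""
    now = time.monotonic()
    entry = _cache.get(key)
    if entry is not None and entry[0] > now:
        return entry[1]  # hit: skip the expensive work
    value = compute()
    _cache[key] = (now + ttl_seconds, value)
    return value


calls = []


def expensive_fragment():
    calls.append(1)  # record each real computation
    return "<ul>...</ul>"


cached("sidebar", 60, expensive_fragment)
cached("sidebar", 60, expensive_fragment)
print(len(calls))  # 1: the second call is served from the cache
```

In a Django project the same pattern would normally go through the
configured cache backend rather than a module-level dict.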

Then, I'd imagine the Installation would be about as optimized as possible
without going through the code and carefully analyzing every class, method,
and data structure to see what more you can squeeze out of it. Of course
you should spend time *before* you even start writing a single line of code
to do some careful software "engineering". Some sites use an
open-up-everything-with-APIs approach (think Amazon) -- but that's not
exactly a tight code-base.

Of course the problem with this is -- developers are expensive! (or
underpaid in some cases). It's a lot easier to just rely on distributed
systems so that you can easily throw more resources at the problem until
you reach some critical point where optimizations are truly required. If
you pay a developer 60k/year to do something like this and it takes 3 times
as long (180k) -- why not just pay 60k for a year and throw whatever is
needed money-wise at the problem (probably much less than the remaining
120k)?

On Fri, Jun 1, 2012 at 10:38 AM, Subhranath Chunder wrote:

> Yup. the application performance has been kept in mind from the ground
> work. Zero or one query, with extensive use of cache is what we try to
> achieve at the app level.
>
> Just to keep the thread a bit more focused on it's purpose, I would like
> to remind ourselves that, the discussion is on "Scaling django
> installation" and not "Scaling django application".
> For e.g. (with random number representations)
> - Setup 1: Single server setup:
>  x1 Computation Units, x2 GB memory, n Geographical location, Max Serves
> 2000 requests/sec
> - Setup 2: 2 servers cluster setup (1 server serving django, other media):
>  y1 Computation Units, y2 GB memory, n Geographical location, Max Serves
> 3000 requests/sec
> - Setup 3: 2 servers cluster setup (with a single load balancer):
>  z1 Computation Units, z2 GB memory, n Geographical location, Max Serves
> 2800 requests/sec
>
> Hope, I was able to put things in the right direction.
>
>
> On Fri, Jun 1, 2012 at 7:43 PM, Javier Guerra Giraldez wrote:
>
>> On Fri, Jun 1, 2012 at 3:56 AM, Subhranath Chunder 
>> wrote:
>> > how should we measure response complexity?
>>
>> a simple first approximation is the number of DB queries per page.
>
>
>> the debug toolbar nicely gives that figure while developing.
>
>
>> --
>> Javier
>>
>
>
> --
> Thanks,
> Subhranath Chunder.
> www.subhranath.com
>


Re: Scaling django installation

2012-06-01 Thread Subhranath Chunder

Yup. Application performance has been kept in mind from the ground
up. Zero or one query, with extensive use of cache, is what we try to
achieve at the app level.

Just to keep the thread a bit more focused on its purpose, I would like to
remind ourselves that the discussion is on "Scaling django installation"
and not "Scaling django application".
E.g. (with made-up numbers):
- Setup 1: Single server setup:
  x1 Computation Units, x2 GB memory, n Geographical location,
  max serves 2000 requests/sec
- Setup 2: 2 servers cluster setup (1 server serving django, other media):
  y1 Computation Units, y2 GB memory, n Geographical location,
  max serves 3000 requests/sec
- Setup 3: 2 servers cluster setup (with a single load balancer):
  z1 Computation Units, z2 GB memory, n Geographical location,
  max serves 2800 requests/sec

Hope I was able to put things in the right direction.


On Fri, Jun 1, 2012 at 7:43 PM, Javier Guerra Giraldez
wrote:

> On Fri, Jun 1, 2012 at 3:56 AM, Subhranath Chunder 
> wrote:
> > how should we measure response complexity?
>
> a simple first approximation is the number of DB queries per page.


> the debug toolbar nicely gives that figure while developing.


> --
> Javier
>


-- 
Thanks,
Subhranath Chunder.
www.subhranath.com


Re: Scaling django installation

2012-06-01 Thread Kurtis Mullins

Check out django-cache-machine. It uses memcache to cache your ORM queries
and updates (invalidates) that cache when they change.

On Fri, Jun 1, 2012 at 10:35 AM, Tim Chase
wrote:

> On 06/01/12 09:17, Subhranath Chunder wrote:
> > On Fri, Jun 1, 2012 at 6:57 PM, Tim Chase <
> django.us...@tim.thechases.com>wrote:
> >> 2) I/O
> >> 2a) disk
> >> 2b) network
> >> 2c) memory
> >>
> > Don't think these might be creating much bottleneck in my scenario. But
> > still, nothing like getting to exact figures. Again, how would you
> measure
> > it?
>
> usually in I/O operations-per-second.  Additionally, if you use
> blocking I/O, the request will have to wait until the I/O has
> completed before the request-processing can complete.  If you can
> pipeline your high-latency I/O operations, it can produce large
> gains.  As for actually measuring the operations, it can be as
> simple as noting datetime.datetime.now() before and after the window
> of interest (and possibly passing those into your template for a
> debugging render).  Without measuring, there's no way to know where
> it's slow.
>
> I'm also rashly assuming that you've disabled DEBUG in your settings.
>
> > The focus of the application has been to reduce bottlenecks as much as
> > possible.
> > Zero or one query, extensive use of memcache, async tasks(via celery),
> etc.
> > it's all there in application layer to reduce the bottlenecks.
>
> Without further details about the particular views that are slowing
> things down, it's hard to tell.  Do you have some middleware that's
> performing queries?  Are certain views slower than others?  How are
> you authenticating (against DB tables or LDAP requests to a remote
> server)?
>
> >> There's no single number to measure the complexity
> >
> > Are we sure. The round-trip response time for a request to the server,
> > can't that be used as a single number to measure the complexity?
>
> Well, a view of
>
>   def slowview(request):
>       time.sleep(2000)
>       return render_to_response(...)
>
> is a slow view, but not terribly complex.
>
> -tkc
>
>


Re: Scaling django installation

2012-06-01 Thread Tim Chase

On 06/01/12 09:17, Subhranath Chunder wrote:
> On Fri, Jun 1, 2012 at 6:57 PM, Tim Chase 
> wrote:
>> 2) I/O
>> 2a) disk
>> 2b) network
>> 2c) memory
>>
> Don't think these might be creating much bottleneck in my scenario. But
> still, nothing like getting to exact figures. Again, how would you measure
> it?

Usually in I/O operations-per-second.  Additionally, if you use
blocking I/O, the request will have to wait until the I/O has
completed before the request-processing can complete.  If you can
pipeline your high-latency I/O operations, it can produce large
gains.  As for actually measuring the operations, it can be as
simple as noting datetime.datetime.now() before and after the window
of interest (and possibly passing those into your template for a
debugging render).  Without measuring, there's no way to know where
it's slow.
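
A minimal sketch of that before-and-after measurement, written as a
reusable context manager. It swaps in time.perf_counter() for
datetime.datetime.now() because it is meant for measuring intervals; the
names timed and timings are made up for the example, and the sleep
stands in for a real database or network call.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


with timed("fake_io"):
    time.sleep(0.05)  # stands in for a database or network call

print(timings)
```

Wrapping a few suspect sections of a view this way, and logging or
rendering the resulting dict, is usually enough to see which window
dominates.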

I'm also rashly assuming that you've disabled DEBUG in your settings.

> The focus of the application has been to reduce bottlenecks as much as
> possible.
> Zero or one query, extensive use of memcache, async tasks(via celery), etc.
> it's all there in application layer to reduce the bottlenecks.

Without further details about the particular views that are slowing
things down, it's hard to tell.  Do you have some middleware that's
performing queries?  Are certain views slower than others?  How are
you authenticating (against DB tables or LDAP requests to a remote
server)?

>> There's no single number to measure the complexity
>
> Are we sure. The round-trip response time for a request to the server,
> can't that be used as a single number to measure the complexity?

Well, a view of

  import time

  def slowview(request):
      time.sleep(2000)  # blocks for 2000 seconds while doing no real work
      return render_to_response(...)

is a slow view, but not terribly complex.

-tkc


Re: Scaling django installation

2012-06-01 Thread Subhranath Chunder

On Fri, Jun 1, 2012 at 6:57 PM, Tim Chase wrote:

> On 06/01/12 03:56, Subhranath Chunder wrote:
> > With that in mind, how should we measure response complexity?
> > Any particular parameter, scale? Probably I can measure against
> > that, and share the numbers to shed more light on how many
> > requests can be handled in with a particular hardware config.
>
> There are a pretty small number of bottlenecks:
>
> 1) computation (a CPU-bound problem)
> 1b) algorithm (usually manifests as a CPU problem)
>
Not much computation involved in my case as far as I see. In most cases
it's all about zero or one db hit. memcache is used to keep the data already
prepared. Still, nothing like being able to measure the computation. How
would you measure it?


> 2) I/O
> 2a) disk
> 2b) network
> 2c) memory
>
I don't think these are creating much of a bottleneck in my scenario. But
still, nothing like getting to exact figures. Again, how would you measure
it?


>
> Most of them can be mitigated through an improvement in algorithm or
> by adding an appropriate caching layer.  There's also perceived
> performance, so spawning off asynchronous tasks (such as via celery)
> can give the user the feel of "I've accepted your request and I'm
> letting you know it will take a little while to complete".
>
> In any testing that I do, I try to determine where the bottleneck
> is, and if that can be improved:  Did I choose a bad algorithm?  Am
> I making thousands of queries (and waiting for them to round-trip to
> the database) when a handful would suffice?  Am I transmitting data
> that I don't need to or that I could have stashed in a cache
> somewhere?  Am I limited by the CPU/pipe/disks on my machine?
>
The focus of the application has been to reduce bottlenecks as much as
possible.
Zero or one query, extensive use of memcache, async tasks(via celery), etc.
it's all there in application layer to reduce the bottlenecks.


>
> There's no single number to measure the complexity, but often
> there's an overriding factor that can be found addressed, at which
> point another issue may surface.
>
Are we sure? The round-trip response time for a request to the server,
can't that be used as a single number to measure the complexity?
(Given the fact that the server is deployed in Amazon EC2 Singapore
location, as m1.xlarge with all its network and memory constraints in place)


>
> -tkc
>
>
>
>


-- 
Thanks,
Subhranath Chunder.
www.subhranath.com


Re: Scaling django installation

2012-06-01 Thread Javier Guerra Giraldez

On Fri, Jun 1, 2012 at 3:56 AM, Subhranath Chunder  wrote:
> how should we measure response complexity?

a simple first approximation is the number of DB queries per page.

the debug toolbar nicely gives that figure while developing.
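
Outside the debug toolbar, the same "queries per page" figure can be
approximated by counting calls on the connection. The sketch below is
not Django code: it uses the standard library's sqlite3 with a small
hypothetical wrapper (CountingConnection) and a made-up review table,
purely to show how one page can cost N+1 queries or a single query.

```python
import sqlite3


class CountingConnection:
    """Wrap a DB-API connection and count execute() calls."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE review (id INTEGER PRIMARY KEY, author TEXT)")
raw.executemany("INSERT INTO review (author) VALUES (?)",
                [("ann",), ("bob",), ("cy",)])

db = CountingConnection(raw)

# One query for the ids, then one more per row: the "N+1" pattern.
ids = [row[0] for row in db.execute("SELECT id FROM review")]
for review_id in ids:
    db.execute("SELECT author FROM review WHERE id = ?",
               (review_id,)).fetchone()
n_plus_one = db.count

# The same data fetched in a single query.
db.count = 0
rows = db.execute("SELECT id, author FROM review").fetchall()
single = db.count

print(n_plus_one, single)  # 4 1
```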

-- 
Javier

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.

Re: Scaling django installation

2012-06-01 Thread Tim Chase

On 06/01/12 03:56, Subhranath Chunder wrote:
> With that in mind, how should we measure response complexity?
> Any particular parameter, scale? Probably I can measure against
> that, and share the numbers to shed more light on how many
> requests can be handled in with a particular hardware config.

There are a pretty small number of bottlenecks:

1) computation (a CPU-bound problem)
1b) algorithm (usually manifests as a CPU problem)
2) I/O
2a) disk
2b) network
2c) memory

Most of them can be mitigated through an improvement in algorithm or
by adding an appropriate caching layer.  There's also perceived
performance, so spawning off asynchronous tasks (such as via celery)
can give the user the feel of "I've accepted your request and I'm
letting you know it will take a little while to complete".
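
The "accepted, it will take a while" pattern can be sketched without
celery. The example below uses the standard library's
ThreadPoolExecutor as a stand-in for a task queue; build_report and
handle_request are hypothetical names, and the sleep stands in for the
slow work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)


def build_report(n):
    time.sleep(0.1)  # stands in for slow work
    return sum(range(n))


def handle_request(n):
    """Queue the slow work and answer immediately."""
    future = executor.submit(build_report, n)
    return {"status": "accepted"}, future


response, future = handle_request(1000)
print(response)         # returned without waiting for the report
print(future.result())  # 499500, available once the worker finishes
executor.shutdown(wait=True)
```

A real task queue such as celery adds what this sketch lacks: the work
survives a process restart and can run on other machines.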

In any testing that I do, I try to determine where the bottleneck
is, and if that can be improved:  Did I choose a bad algorithm?  Am
I making thousands of queries (and waiting for them to round-trip to
the database) when a handful would suffice?  Am I transmitting data
that I don't need to or that I could have stashed in a cache
somewhere?  Am I limited by the CPU/pipe/disks on my machine?

There's no single number to measure the complexity, but often
there's an overriding factor that can be found and addressed, at which
point another issue may surface.

-tkc


Re: Scaling django installation

2012-06-01 Thread Subhranath Chunder

On Fri, Jun 1, 2012 at 1:39 AM, Doug Ballance  wrote:

> I don't think anyone will be able to give you a good evaluation
> without knowing more about the requests.  Django itself could probably
> handle 10k requests per second returning a simple "hello world"
> response, or less than 10 if you are returning very large/difficult to
> generate responses.  It is what your app does that is going to make
> all the difference.
>
Sure, I'm willing to share more info, as long as people are interested and
ready to discuss.
Just need some more traction on the thread to keep the discussion alive.

With that in mind, how should we measure response complexity? Any
particular parameter, scale?
Probably I can measure against that, and share the numbers to shed more
light on how many requests can be handled with a particular hardware
config.

Any suggestion on how to load test/performance test a django server
installation? (Comparing performance with some high-traffic sites, and
scaling the application up to or beyond that level, is always desired :) )


>
> The djangobook.com site has some good info on scaling, despite being
> for a much older version of django.  Ignore the code, and skip down
> to the section on scaling:
> http://www.djangobook.com/en/1.0/chapter20/

Yeah. I'm very much aware of this. But what I always feel is that this
section is missing some numbers to show how well the different setups
perform on similar hardware.

Right now, I myself am using the single server setup, as mentioned before.


>
>
> From my own experience, caching/memcache can make all the difference
> in the world.  Find out what is taking the time, and cache it.
> Different approaches to your page design can help too.  If the page is
> 95% identical for all users, cache the 95% and pull in the 5% with
> javascript to personalize.  Allowing something like varnish to sit in
> front of those expensive to generate, but cachable pages is another
> way to speed things up but it requires a bit of application specific
> configuration to be useful (ignoring cookies for certain urls, making
> sure you are setting the vary header correctly in your app, etc).
>
Memcache, etc. all in place. And we're still improving things as much as we
can.
Haven't really tried varnish till now, but will surely try it as well.

As for most of the other queries, they're still quite open. Starting with
the most important one:
> Got around 0.1 million requests and around 200+ requests/sec max. Is this
> good, bad, or at par?




-- 
Thanks,
Subhranath Chunder.
www.subhranath.com


Re: Scaling django installation

2012-05-31 Thread Doug Ballance

I don't think anyone will be able to give you a good evaluation
without knowing more about the requests.  Django itself could probably
handle 10k requests per second returning a simple "hello world"
response, or less than 10 if you are returning very large/difficult to
generate responses.  It is what your app does that is going to make
all the difference.

The djangobook.com site has some good info on scaling, despite being
for a much older version of django.  Ignore the code, and skip down
to the section on scaling:
http://www.djangobook.com/en/1.0/chapter20/

From my own experience, caching/memcache can make all the difference
in the world.  Find out what is taking the time, and cache it.
Different approaches to your page design can help too.  If the page is
95% identical for all users, cache the 95% and pull in the 5% with
javascript to personalize.  Allowing something like varnish to sit in
front of those expensive-to-generate but cacheable pages is another
way to speed things up, but it requires a bit of application-specific
configuration to be useful (ignoring cookies for certain urls, making
sure you are setting the vary header correctly in your app, etc).


Re: Scaling django installation

2012-05-31 Thread Subhranath Chunder

Bump!

On Wednesday, May 30, 2012, Subhranath Chunder wrote:

> As the subject suggests, I wanted to discuss, acquire and share some
> knowledge on scaling a django installation.
>
> Firstly, my current project is a product Reviews platform, and I wanted to
> benchmark or load test the current deployment.
> Currently the deployment/installation stands on a single server setup.
> - Single Amazon EC2 instance m1.xlarge
> - Apache Webserver (serving django and static)
>
> To load test I used loadimpact.com and the results of which can be found
> on:
>
> http://loadimpact.com/load-test/www.reviews42.com-18774e46e8f562a6eb4009495cf9d752
> The test configuration consisted of 600 VUs with 10 mins step duration.
> Got around 0.1 million requests and around 200+ requests/sec max. Is this
> good, bad, or at par?
>
> - What are some possible suggestions for scaling this installation?
> - Does separating out the media server help much, and to what extent?
> - Can a single server setup handle 1k to 10k requests/sec?
> - What are some tools for benchmarking and performance testing?
> - Are there other cost effective ways to scale up the installation?
> - What sort of django installation is needed to handle 10k
> requests/sec and 0.1 million parallel users at all times?
> - Are there any good reads on scalable django deployment or installation,
> some sort of guide?
>
> --
> Thanks,
> Subhranath Chunder.
> www.subhranath.com
>
>

-- 
Thanks,
Subhranath Chunder.
www.subhranath.com

