J, amazing feedback, this is great!
I think memcached is great. I haven't had time to play with it yet, but I have read pretty much everything about it and am prepped to try it once I have a chance.

I personally think storing images in the DB is the best place to start, because if better solutions become available later you can migrate very easily. If you start out with the filesystem, migration is a bit more kludgy in my opinion: you have to traverse directories and copy/move/delete, or whatever else the migration requires.

We have been using MySQL on some pretty big internal projects here and it has been working satisfactorily. However, there are issues with it that make me less confident in the big claims about large sites using it. Mainly, the scale-out paradigm is not very clear with MySQL. We tried master/slave replication and the replication speed was way, way too slow. The whole clustering approach with MySQL also seems very confusing and not well documented, as far as I have poked around. The only really solid scaling approaches I have seen with MySQL are either using VMware to cluster hardware at the hardware/OS/VM layer into one big virtual machine, or using third-party hardware/software bundles with MySQL, like the ones from NetApp or similar. I wish clustering with MySQL were as simple as adding a node to the cluster and gaining 0.7 of a machine's worth of performance for each node added.

Another very intriguing thing with super large sites is the actual schema design. You have to be very smart about design, data segregation, indexes, etc. I don't know for sure, but I am pretty sure sites like MySpace don't just have one huge users table with user_id, email, sha1_password. I would imagine they have segregated users into separate schemas, which would scale far better than MySQL replication or clustering would: something like allocating every 10,000 users to a new MySQL server.
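The "every 10,000 users on a new MySQL server" idea is range-based sharding. A minimal sketch, assuming user IDs are sequential positive integers; the host names in SHARD_HOSTS and the helper names are hypothetical, and a real system would likely keep the shard map in a directory table rather than hard-code it.

```python
USERS_PER_SHARD = 10_000

# Hypothetical shard hosts; a real deployment would load this map from config
# or a directory table instead of hard-coding it.
SHARD_HOSTS = ["db-users-0", "db-users-1", "db-users-2"]


def shard_index(user_id: int) -> int:
    """Return the shard number for a user under range-based sharding."""
    if user_id < 1:
        raise ValueError("user_id must be a positive integer")
    # IDs 1..10,000 -> shard 0, 10,001..20,000 -> shard 1, and so on.
    return (user_id - 1) // USERS_PER_SHARD


def shard_host(user_id: int) -> str:
    """Map a user to the (hypothetical) MySQL host that stores that user's rows."""
    idx = shard_index(user_id)
    if idx >= len(SHARD_HOSTS):
        # With range sharding, the next server must be provisioned before
        # new user IDs run past the last range.
        raise LookupError(f"no shard provisioned for user {user_id} (shard {idx})")
    return SHARD_HOSTS[idx]


if __name__ == "__main__":
    for uid in (1, 10_000, 10_001, 25_000):
        print(uid, shard_host(uid))
```

Range sharding keeps the lookup trivial and makes adding capacity a matter of provisioning the next range, but every new signup lands on the newest shard. Hashing user IDs across shards spreads that load instead, at the cost of harder rebalancing when servers are added.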
Thanks,
------------------------------------------
Ali Mesdaq
Security Researcher II
Websense Security Labs
http://www.WebsenseSecurityLabs.com
------------------------------------------

-----Original Message-----
From: J. Shirley [mailto:[EMAIL PROTECTED]
Sent: Friday, October 26, 2007 12:31 PM
To: The elegant MVC web framework
Subject: Re: [Catalyst] Hypothetical Site and Scalability Planning

On 10/26/07, Mesdaq, Ali <[EMAIL PROTECTED]> wrote:
> Hey All,
>
> Just wanted to start a thread about scalability planning and design. I was thinking we could gather people's opinions, ideas, and best practices for large-scale sites and use a hypothetical or existing site as the model to plan for. Not everything discussed needs to be Catalyst-only; it could be general web server configs or something similar.
>
> For example, how would you approach a project where you needed to create a site like myspace.com with 0 current users that could surpass 1 million users in 1 month, then 100 million in 1 year? I am interested to see the opinions and designs people would have to deal with that type of scalability. Even simple issues become very complex with those numbers, like where and how to store photos. Should they be stored on the filesystem, in the DB, or on external services like Akamai? What web server should be used? Apache? Should it be the threaded version? How does that affect Catalyst and its modules: are they all thread-safe, or is threaded Apache not even the way to go?

Here are my opinions on the matter:

1) Start out with memcached in place. It scales well, so use it. Use PageCache where you can.

2) Store images in something that is meant for storing data, not files. Storing images as files means you are stuck with some filesystem format that binds you unnecessarily. Things like S3, Akamai, or your own homegrown MogileFS cluster give you an API into the data.
Granted, you could do the same with NFS or whatever and write a good compatibility API, but you would largely be duplicating the work of the existing tech. If you use S3, set up your image servers to cache for a very long time (on disk): pull from S3, and store it for as long as you reasonably can. This is an area a lot of people get wrong, and then they get stuck with costly migrations.

3) Use database replication strategies where you can. In the F/OSS world, MySQL is outshining PostgreSQL in this respect. InnoDB removes a lot of the complaints folks have about MySQL, but there is always evangelism against MySQL. If it works for you, just take it in stride: a LOT of high-traffic sites use MySQL, and you can usually get some insight from them. MySQL allows InnoDB on the master and MyISAM on the slaves, which gets you faster read times and tends not to block as badly on inserts. Then, as you grow, it is easier to grow into a full-blown MySQL cluster... but at that point, you have enough money to thoroughly explore every option available.

4) You'll have to tune Apache, or whatever web server you use, to your specific app. Every app has different usage patterns, and you'll have to customize your web server accordingly. This is where starting from scratch pays off: you can experiment and see what improves performance.

Another piece of advice: don't treat requests per second as the measure of web server scalability. Sure, you want efficient code, but that is a measurement of code efficiency, not scalability. Look at it this way: how many web servers do I need to add to my cluster to double traffic? If the answer is more than two, start looking at bottlenecks. If it is two and you are still near peak usage, look at bottlenecks. If you add two and everything is running smoothly, then you are probably in good shape. Now start worrying about your databases :)

Hope this helps; it is an area I have some experience in and find fun.

-J

-- 
J. Shirley :: [EMAIL PROTECTED] :: Killing two stones with one bird...
http://www.toeat.com
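J's point 1 (start out with memcached in place) usually means the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. A minimal Python sketch, assuming a cache client with get and set-with-TTL semantics; DictCache is a hypothetical in-process stand-in so the example runs without a memcached server, and the key name "user:42" is made up for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class DictCache:
    """In-process stand-in for a memcached client (hypothetical, for illustration).

    Only get and set-with-TTL are modeled; a real memcached client talks to a
    server over the network and may evict entries at any time.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl)


def cached(cache: DictCache, key: str, ttl: float, load: Callable[[], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the loader, then populate."""
    value = cache.get(key)
    if value is None:
        value = load()  # e.g. the expensive database query
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    db_calls = []

    def load_profile():
        db_calls.append(1)  # stands in for an expensive SQL query
        return {"user_id": 42, "name": "example"}

    cache = DictCache()
    cached(cache, "user:42", 300, load_profile)
    cached(cache, "user:42", 300, load_profile)
    print("database hits:", len(db_calls))  # 1: the second call is served from cache
```

Because a miss is signalled by None, this sketch cannot cache a result that is legitimately None. In a Catalyst app the same pattern would typically live in the model layer, while PageCache applies the caching idea to entire rendered pages.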
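J's point 2 advice for S3, caching on the image servers' own disks for as long as you reasonably can, amounts to a read-through cache in front of the origin store. A minimal sketch, assuming a hypothetical fetch_origin callable that stands in for the real S3 (or MogileFS) download; the 30-day default max age is an arbitrary illustrative value, not a figure from the thread.

```python
import hashlib
import os
import tempfile
import time
from pathlib import Path
from typing import Callable


class DiskImageCache:
    """Read-through disk cache in front of an origin store such as S3.

    `fetch_origin` is a hypothetical callable standing in for the real S3 (or
    MogileFS) download; it takes an object key and returns the object's bytes.
    """

    def __init__(self, root: Path, fetch_origin: Callable[[str], bytes],
                 max_age_seconds: float = 30 * 24 * 3600) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.fetch_origin = fetch_origin
        self.max_age = max_age_seconds  # keep images as long as you reasonably can

    def _path_for(self, key: str) -> Path:
        # Hash the key so arbitrary object names map to safe, flat file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.root / digest

    def get(self, key: str) -> bytes:
        path = self._path_for(key)
        try:
            age = time.time() - path.stat().st_mtime
            if age < self.max_age:
                return path.read_bytes()  # hit: no round trip to the origin
        except FileNotFoundError:
            pass
        data = self.fetch_origin(key)  # miss or stale copy: pull from the origin
        # Write to a temp file, then rename, so readers never see a partial image.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
        return data
```

Only the first request for a given image pays the origin round trip; later requests are served from local disk until the copy ages out. Writing to a temporary file and renaming it means a concurrent reader sees either no file or the complete image, never a half-written one.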
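J's point 3 (InnoDB on the master, MyISAM on the slaves) implies that the application sends writes to the master and spreads reads over the slaves. A minimal sketch of that routing decision; the host names are hypothetical and the regular expression is a deliberately crude statement classifier. The need_fresh flag exists because of the replication lag Ali describes: a read that must see a row just written should go to the master.

```python
import itertools
import re
from typing import Sequence

# Statements that must go to the master; everything else is treated as a read.
_WRITE_RE = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|REPLACE|CREATE|ALTER|DROP)\b", re.IGNORECASE
)


class ReplicationRouter:
    """Pick a database host per statement: writes to the master, reads to slaves.

    Host names are hypothetical; a real setup would hand back connection handles.
    """

    def __init__(self, master: str, slaves: Sequence[str]) -> None:
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(list(slaves) or [master])

    def host_for(self, sql: str, need_fresh: bool = False) -> str:
        if need_fresh or _WRITE_RE.match(sql):
            # A slave may not have replicated a just-written row yet, so writes
            # and reads that must see the latest data both go to the master.
            return self.master
        return next(self._slaves)
```

A production router would also pin everything inside a transaction, and locking reads such as SELECT ... FOR UPDATE, to the master; this sketch does not handle those cases.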
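J's "how many servers to double traffic" test, and Ali's wish for 0.7 of a machine per added node, can be made concrete with a simple model. Assume n servers handle throughput T, and each added server contributes a fixed fraction e of one existing server's share (e = 1 is perfectly linear scaling). Doubling requires T + k·e·T/n = 2T, so k = n/e servers, rounded up. Reading J's "two" as a two-server starting cluster is an assumption; under that reading, e = 0.7 would mean three extra servers, which his rule of thumb flags for a bottleneck hunt. The function names are hypothetical.

```python
import math


def servers_to_double(current_servers: int, efficiency: float) -> int:
    """Extra servers needed to double throughput under a fixed per-node efficiency.

    Assumes each added server contributes `efficiency` times the throughput of an
    existing server (1.0 = perfectly linear scaling). Solving
        T + k * efficiency * T / n = 2T
    gives k = n / efficiency, rounded up to a whole server.
    """
    if current_servers < 1:
        raise ValueError("current_servers must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Round first to absorb float noise that lands a hair above a whole number.
    return math.ceil(round(current_servers / efficiency, 9))


def measured_efficiency(n_before: int, rps_before: float,
                        n_after: int, rps_after: float) -> float:
    """Per-node efficiency of the servers actually added, from load-test numbers."""
    per_server = rps_before / n_before
    added = n_after - n_before
    return (rps_after - rps_before) / (added * per_server)
```

The second function turns two load-test measurements into an observed e: for example, going from 2 servers at 1000 requests/s to 4 servers at 1700 requests/s gives e = 0.7. Requests per second still appear here, but only as a ratio across cluster sizes, which is the distinction J draws between code efficiency and scalability.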