Re: [GENERAL] how good is PostgreSQL
> Whenever a query is executed (not found in cache, etc.), the caching
> system would simply store the query, the results, and a list of tables
> queried. When a new query came in, it would do a quick lookup in the
> query hash to see if it already had the results. If so, whammo.
> Whenever an insert/delete/update was sensed, it would look at the
> tables being affected, and the caching mechanism would clear out the
> entries depending on those tables.

It seems to me that tracking the list of cached queries and watching for queries that might invalidate them adds a lot of complexity to the back end, and the front end still has to establish the connection and wait for the data to transfer over the socket.

On a more practical level, a backend solution would require someone with fairly detailed knowledge of the internals of the backend. A front-end solution is more likely to be implemented by someone not as knowledgeable.

One of the big advantages of your technique is that there is no code change at the application level. This means less database lock-in. Maybe that is a disadvantage too. ;-)
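The quoted proposal above — cache each query's results together with the tables it touched, and flush entries when one of those tables is written — can be sketched in a few lines. This is a minimal illustration, not anything that exists in PostgreSQL; the caller is assumed to know which tables each query reads.

```python
class QueryCache:
    """Table-aware query cache, as described in the quoted proposal."""

    def __init__(self):
        self._cache = {}  # query string -> (result, frozenset of tables)

    def get(self, query):
        """Return the cached result for this query, or None on a miss."""
        entry = self._cache.get(query)
        return entry[0] if entry else None

    def put(self, query, result, tables):
        """Store a result along with the tables the query depends on."""
        self._cache[query] = (result, frozenset(tables))

    def invalidate(self, table):
        """An insert/update/delete touched `table`: drop every entry
        whose result depends on it."""
        stale = [q for q, (_, deps) in self._cache.items() if table in deps]
        for q in stale:
            del self._cache[q]
```

The hard part, as noted above, is not this lookup structure but reliably noticing every write that should trigger `invalidate()` — which is why doing it in the backend, where the writes are visible, is attractive.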
Re: [GENERAL] how good is PostgreSQL
> As programmers, we naturally want to throw things into databases for
> three reasons. First, it's easy to get data in. Second, it's easy to
> get relevant data out. And third, it's "cool". We don't want to work
> with flat files, now do we? ; )

Kiddin', eh? :) Actually, the third reason seems to dominate the younger developers' minds. People often tend to keep everything in a poor DBMS until it begins to kick back. And this has an impact on the customers. "Does your system use a database?" "No, why should it?" "You mean you'll keep our dearly beloved banner ads as flat files?" "Yes, this is where they belong." "Sorry, we'll look for someone more advanced." "Good luck." Of course, hardware vendors jump for joy :)

Maybe I don't get it, but IMHO there's no reason to put into a DB something that can't be indexed and used in a WHERE clause.

> It would *not* allow the database to do joins, it would grab entire
> tables, then try to do the joins itself, in Perl.

Umh... Yeah... Well... To keep compatibility with other Open Source Databases and ESR/RMS, you know :)

> (Incidentally, we've toyed around with developing a query-caching
> system that would sit between PostgreSQL and our DB libraries.

Sounds amazing, but requires some research, I guess. However, in many cases one would be more than happy with cached connections. Of course, cached query results can be naturally added to that, but just connections are OK to start with.

Security -- contaminated fish and microchips
huge supertankers on Arabian trips
oily propaganda from the leaders' lips
all about the future
there's people over here, people over there
everybody's looking for a little more air
crossing all the borders just to take their share
planning for the future

        Rainbow, Difficult to Cure
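"Cached connections" — keeping database connections open and handing them back out instead of reconnecting per request — is the simpler half of the idea, and it can be sketched independently of any query caching. This is an illustrative sketch only; `connect_fn` is a placeholder for whatever actually opens a connection in your driver.

```python
import queue

class ConnectionPool:
    """Minimal connection cache: reuse idle connections, only open
    new ones when the pool runs dry."""

    def __init__(self, connect_fn, size=4):
        self._connect = connect_fn
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect_fn())  # pre-open the cached connections

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse a cached connection
        except queue.Empty:
            return self._connect()          # pool exhausted: open a fresh one

    def release(self, conn):
        self._idle.put(conn)                # back into the cache for reuse
```

For a CGI-style workload this alone removes a connection setup and teardown from every page view, before any result caching enters the picture.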
Re: [GENERAL] how good is PostgreSQL
Steve Wolfe wrote:
> > Even after that, you have a long way to go before you will hit 1000
> > transactions per second from any SQL database.
>
> I guess they could always buy a few Sun E1's on the backend, and a
> large room of rack-mountable PC's for web/CGI serving. Nothing like
> plopping down ten or twenty million dollars on hardware. : )

Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on that. Probably would cost less, and be more reliable. And they can always load another Linux/390 VM -- an S/390 can run something like 41,000 virtual machines, each running Linux/390 and Apache.

However, if you want to see the architecture of a _large_ database-backed website, see the story behind Digital City at www.aolserver.com. While they're using Sybase instead of PostgreSQL, the architecture is the same.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Re: [GENERAL] how good is PostgreSQL
> Even after that, you have a long way to go before you will hit 1000
> transactions per second from any SQL database.

I guess they could always buy a few Sun E1's on the backend, and a large room of rack-mountable PC's for web/CGI serving. Nothing like plopping down ten or twenty million dollars on hardware. : )

steve
Re: [GENERAL] how good is PostgreSQL
> Even after that, you have a long way to go before you will hit 1000
> transactions per second from any SQL database.

Since my last post probably wasn't too useful, here's some information that might be a little more help. It's a little long, I know, but hopefully it will be of use to someone.

As programmers, we naturally want to throw things into databases for three reasons. First, it's easy to get data in. Second, it's easy to get relevant data out. And third, it's "cool". We don't want to work with flat files, now do we? ; )

However, in some cases, using the database to get data out ends up costing us a lot of time and money. Sometimes we run the same nasty query so often that we end up purchasing bigger hardware to make the system perform reasonably. Why? Because it was easier for us to write a program that did:

   GetDataFromDatabase();
   PrepareData();
   PrintData();

Each time, the database server does the work. But it doesn't necessarily have to be that way. In our company, we've found two trends that have enabled us to save a LOT of processing power on our machines (read: increase the capacity of our servers by 30% or more, with fairly minor changes).

The first case is that of rarely-changing data. Some of our datasets probably see around 50,000 to 1,000,000 views (selects) for each update (insert/delete). Having the database repeat the query every time is a waste. So, we began writing our programs such that they grab the data from the database once, and generate the HTML for every page, and the indexes. Then, when an update is made to the database (via the administrative tools), it simply rewrites *the relevant HTML files*, and changes the indices pointing to them. (There are also some other very large advantages to this sort of thing, but I'm not allowed to say them. ; ) )

The second case is that of often-repeated queries. One of the offerings on our site is an online directory, which gets a pretty fair amount of traffic.
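The first case above — pre-render rarely-changing data as static HTML, and re-render only the affected files when the administrative tools write an update — can be sketched like this. The record layout and `render_page` template are invented for illustration; the point is only the publish-on-write shape.

```python
import os
import tempfile

def render_page(record):
    # Stand-in template; a real site would render the full page here.
    return "<html><body><h1>%s</h1></body></html>" % record["title"]

def publish(records, out_dir):
    # Rewrite only the given records' HTML files. The web server then
    # serves them as cheap static pages instead of querying the database
    # on every one of the 50,000+ views per update.
    for rec in records:
        path = os.path.join(out_dir, "%s.html" % rec["id"])
        with open(path, "w") as f:
            f.write(render_page(rec))

def update_record(records, out_dir, rec):
    # Called from the administrative tools: change the data, then
    # regenerate just the page that depends on it.
    records[rec["id"]] = rec
    publish([rec], out_dir)
```

Reads never touch the database at all; the select-to-update ratio quoted above is what makes the extra write-time work pay off.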
Unfortunately, it uses a proprietary program that was purchased by management before they spoke with us. Grr... It was the most utterly inefficient program I've ever seen. It would *not* allow the database to do joins; it would grab entire tables, then try to do the joins itself, in Perl. We rewrote the program to let PostgreSQL do the joins, and that sped it up. Then we realized that a very small number of queries (those for the first one or two levels of pages) accounted for a huge portion of the usage. So, we replaced the front page with a static HTML page (the front page doesn't change...), and saw another terrific drop in our system loads.

Overall, by only modifying a couple of our more heavily-used programs, our server loads dropped by about 30%-40%. If we went to the trouble to modify some others, it would drop even more. But we're going to rewrite them completely for other reasons. : )

In any event, there are ways like this to save a LOT of CPU and disk I/O. Most web servers can serve out several hundred static pages with the resources that would otherwise deliver one dynamically-created, database-driven page. It also allows you to cluster the web servers with cheap commodity hardware, instead of using big iron on the database. And if you have a big-iron machine running the back end, this can severely lighten the load on it, keeping you from dropping a few hundred grand on the next step up. ; )

(Incidentally, we've toyed around with developing a query-caching system that would sit between PostgreSQL and our DB libraries. However, it seems like it could be done *much* more efficiently in PostgreSQL itself, as it would be much easier to keep track of which tables have changed, etc. Anybody know if this sort of functionality is planned? It would be terrific to simply give the machine another 256 megs of RAM, and tell it to use it as a DB cache...)

steve
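The "do the joins in the database" rewrite described above is worth seeing side by side. The sketch below uses sqlite3 purely as a self-contained stand-in for PostgreSQL, and the directory schema is invented; the contrast between the two access patterns is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE listings (id INTEGER PRIMARY KEY, cat_id INTEGER, title TEXT);
    INSERT INTO categories VALUES (1, 'Restaurants');
    INSERT INTO listings VALUES (10, 1, 'Joe''s Diner');
""")

# What the proprietary program did: drag both entire tables across the
# connection, then stitch them together in application code.
cats = {cid: name for cid, name in conn.execute("SELECT id, name FROM categories")}
slow = [(title, cats[cid])
        for _, cid, title in conn.execute("SELECT * FROM listings")]

# The rewrite: one query, and the database performs the join.
fast = list(conn.execute(
    "SELECT l.title, c.name FROM listings l "
    "JOIN categories c ON l.cat_id = c.id"))

assert slow == fast  # same rows, but far less data pulled per request
```

With real table sizes, the first form transfers every row of every table on every request; the second transfers only the joined result.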
Re: [GENERAL] how good is PostgreSQL
> Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on
> that. Probably would cost less, and be more reliable. And they can
> always load another Linux/390 VM -- an S/390 can run something like
> 41,000 virtual machines each running Linux/390 and Apache.

Yeah... I'm very optimistic about IBM's new chips that are coming out next year. Each "processor module" will have 4 processors, but each processor will have 2 cores - so in effect, each "processor module" has 8 processors on it. All processors will have copper interconnects, and depending on the source, will debut at anywhere from 1.3 to 2 gigahertz. I think that will certainly help them get a larger share of the high-end market!

steve
Re: [GENERAL] how good is PostgreSQL
Arnold Gamboa wrote:
> Hi,
>
> For users of large PostgreSQL and PostgreSQL builders, this is for you.
>
> I'm having a terrible time deciding now. :(
>
> We're about to build a "huge" website now. I got tied up in signing the
> contract without really getting enough information about PgSQL, since this
> is what we plan to implement with PHP (normally we use mySQL but I guess it
> does not fit for huge databases like that).
>
> Here's my problem. We're about to build a site like hitbox.com where there
> is a large amount of database required. If, say, there are 100,000 users
> with 1000 page hits per day for each, and everything will be logged, you
> could imagine how huge this will be. I'm just so "nervous" (really, that's
> the term) if we implement this and later on experience a slowdown or,
> worse than that, a crash in the server.

That is a LOT of work for any system. That is over 1100 page views a second, or under 900us each. A standard Pentium III system serving static pages would have problems with that. If you look at search engines, to get that performance with read-only data, they usually cluster multiple systems and load-balance across them. You may need to segment your data and have multiple SQL servers perform different functions.

Also, that 1100 page views per second assumes an even distribution of traffic, which does not happen on a web server. If you average that much, chances are there will be periods of twice that. Look into a "local director," "Alteon," or even LVS.

> My questions are:
> 1. What is the limit for number of records in a table WITHOUT SUFFERING
> SLOW DOWN.
> 2. limit in number of tables per database
> 3. ... limit in number of database.

There are a couple of factors involved; it is more complex than a simple response. Use multiple databases and put each on a separate disk, with its own controller.
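The 1100-per-second and 900-microsecond figures above fall straight out of the numbers in the original question, assuming (unrealistically, as noted) an even spread over the day:

```python
users = 100_000
hits_per_user_per_day = 1_000

hits_per_day = users * hits_per_user_per_day   # 100 million page views/day
per_second = hits_per_day / 86_400             # seconds in a day -> ~1157/sec
budget_us = 1_000_000 / per_second             # time budget per page -> ~864 us

print(round(per_second), round(budget_us))
```

So every page view, including logging the hit, has well under a millisecond of average budget — and peak traffic can easily double the rate.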
Better yet, have multiple load-balanced web boxes do a lot of processing in PHP, offload much of the CPU-bound SQL work to the "cheap" web boxes, and have multiple SQL databases in the back handling various independent tasks.

In a web site I worked on, we had multiple front-end web servers, load-balanced with an Alteon. Each web server had its own SQL database which provided SQL access to "static" data which was updated each week. We had an additional single SQL database backend which all the web servers accessed for synchronized dynamic data.

If you are serious about the load you expect to put on this system, you must be careful:

- Do not create any indexes you do not need.
- Do not use the "foreign key" constraint, as it forces a trigger for each insert.
- Make sure you index the keys by which you will access data.
- Avoid searching by strings; try to use keys.

Even after that, you have a long way to go before you will hit 1000 transactions per second from any SQL database. If you are betting your business on this implementation, you have a lot of homework to do.

> Thanks for your comments. I would really appreciate every comment that
> I'll receive regarding this.
>
> Arnold
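The indexing advice above boils down to: index exactly the access path your queries use, nothing more, and look rows up by key rather than by string matching. A small sketch, again with sqlite3 standing in for PostgreSQL and an invented hit-log schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hits (user_id INTEGER, url TEXT, ts INTEGER);
    -- Index only the key the queries actually search by. Every extra
    -- index (and every foreign-key trigger) is paid for on each of the
    -- ~1100 inserts per second this site expects.
    CREATE INDEX hits_user_idx ON hits (user_id);
""")
conn.execute("INSERT INTO hits VALUES (42, '/front', 0)")

# Keyed lookup: can be satisfied through hits_user_idx.
rows = list(conn.execute("SELECT url FROM hits WHERE user_id = 42"))

# By contrast, a search like  WHERE url LIKE '%front%'  cannot use a
# plain index and forces a scan of the whole log.
```

The same `CREATE INDEX` statement works unchanged in PostgreSQL.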