"Lauri Pietarinen" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Anthony W. Youngman wrote: > > >In article <[EMAIL PROTECTED]>, Lauri Pietarinen > ><[EMAIL PROTECTED]> writes > > > > > >>So in your opinion, is the problem > >> > >>1) SQL is so hard that the average programmer will not know how to use it > >>efficiently > >> > >> > > > >Nope > > > >>or > >>2) Relational (or SQL-) DBMS'es are just too slow > >> > >Yes. > > > >>If 2) then why don't we get a bit more concrete. Could you give > >>an example of a query that in your experience would be too slow using > >>a standard SQL database (e.g. Oracle, or MySQL). We could then > >>actually try it out on some machine and compare. I suggest using > >>the customer-order-order_detail-product database > > > >Okay. Give me a FORMULA that returns a time in seconds for your query. > > > >Let's assume I want to print a statement of how many invoices were sent > >to a customer, along with various details of those invoices. My invoice > >file is indexed by company/month, and we can reasonably assume that the > >time taken to produce the statement is infinitesimal compared to the > >time taken to retrieve the invoice data from disk. For MV > > > >T = (2 + N) * ST * 1.05 > > > >Where T is the time taken to produce the report, N is the number of > >invoices, and ST is the hard disk seek time. > > > First of all it is important to note that an important component of all > modern SQL-DBMS's is > the buffer pool (or cache) meaning that in a reasonably well tuned > database you get very few > disk I/O's, even when *writing* data into tables. > > SQL-DBMS's also are very clever at using indexes, i.e. if they can find > all necessary data > from an index it will not even look at the table, so to speak. > > And, even when presuming conservatively that there is no data in cache, > depending on how > the data is clustered, you will get more than one row/disk read (= 8K in > most(?) systems). 
>
> So, assuming the (simplified) example
>
> Customer(cust_id, ...)
> Order(order_id, cust_id, ...)
> OrderDetail(order_id, prod_id, ...)
> Product(prod_id, ...)
>
> if you created a clustering index on
>
> Customer(cust_id)
> Order(cust_id)
> OrderDetail(order_id)
>
> and presumed that the average row lengths were
>
> customer = 1K
> order = 500 bytes
> orderDetail = 300 bytes
>
> you would get, with 3 I/Os:
> - 8 customer rows
> - 16 order rows
> - 24 order detail rows (which would only apply to one order)
>
> So, granted, that would result in one I/O per order, which is more than
> in your example.
>
> I could now denormalise OrderDetail so that it also contains cust_id
> and cluster by cust_id (which might cause you trouble down the road if
> you can change the customer of an order), in which case, with 3 I/Os,
> I would get:
> - 8 customer rows
> - 16 order rows
> - 24 order detail rows (which would all apply to one customer)
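Lauri's rows-per-I/O figures follow from dividing the block size by the average row length. A back-of-the-envelope sketch, assuming 8K (8192-byte) blocks and ignoring per-page overhead:

```python
BLOCK = 8192  # assumed 8K page size, header overhead ignored

# Average row lengths from the post.
for table, row_len in [("customer", 1024), ("order", 500), ("orderDetail", 300)]:
    print(table, BLOCK // row_len, "rows per block")
```

This yields 8 customer rows and 16 order rows per block, matching the post; the naive figure for orderDetail is 27 rather than 24, so the post's number presumably allows some room for page headers and row overhead.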
Depending on block size, by clustering the three tables together one might get all of those rows in a single read, potentially improving on Wol's numbers by a factor of eight or more for this one query. Of course, doing so would increase the cost of a table scan on the customer table.
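The single-read claim is easy to sanity-check with arithmetic. A sketch using the row lengths from the thread and illustrative volumes (a customer with 2 orders of 3 detail lines each, values chosen here, not taken from the post):

```python
BLOCK = 8192                               # assumed 8K block
customer, order, detail = 1024, 500, 300   # average row lengths from the thread
orders, details_per_order = 2, 3           # illustrative volumes

# Total bytes for one customer's entire cluster of rows.
total = customer + orders * order + orders * details_per_order * detail
print(total, "bytes")  # 3824 bytes -- the whole cluster fits in one block
```

Since 3824 bytes fit comfortably in one 8K block, the entire statement for such a customer costs one read instead of one seek per invoice, which is where the factor-of-eight-or-more improvement over Wol's (2 + N) * ST formula comes from.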