"Lauri Pietarinen" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Anthony W. Youngman wrote: > > >In article <[EMAIL PROTECTED]>, Lauri Pietarinen > ><[EMAIL PROTECTED]> writes > > > > > >>So in your opinion, is the problem > >> > >>1) SQL is so hard that the average programmer will not know how to use it > >>efficiently > >> > >> > > > >Nope > > > >>or > >>2) Relational (or SQL-) DBMS'es are just too slow > >> > >Yes. > > > >>If 2) then why don't we get a bit more concrete. Could you give > >>an example of a query that in your experience would be too slow using > >>a standard SQL database (e.g. Oracle, or MySQL). We could then > >>actually try it out on some machine and compare. I suggest using > >>the customer-order-order_detail-product database > > > >Okay. Give me a FORMULA that returns a time in seconds for your query. > > > >Let's assume I want to print a statement of how many invoices were sent > >to a customer, along with various details of those invoices. My invoice > >file is indexed by company/month, and we can reasonably assume that the > >time taken to produce the statement is infinitesimal compared to the > >time taken to retrieve the invoice data from disk. For MV > > > >T = (2 + N) * ST * 1.05 > > > >Where T is the time taken to produce the report, N is the number of > >invoices, and ST is the hard disk seek time. > > > First of all it is important to note that an important component of all > modern SQL-DBMS's is > the buffer pool (or cache) meaning that in a reasonably well tuned > database you get very few > disk I/O's, even when *writing* data into tables. > > SQL-DBMS's also are very clever at using indexes, i.e. if they can find > all necessary data > from an index it will not even look at the table, so to speak. > > And, even when presuming conservatively that there is no data in cache, > depending on how > the data is clustered, you will get more than one row/disk read (= 8K in > most(?) systems). 
>
> So, assuming the (simplified) example
>
> Customer(cust_id, ...)
> Order(order_id, cust_id, ...)
> OrderDetail(order_id, prod_id, ...)
> Product(prod_id, ...)
>
> if you created a clustering index on
>
> Customer(cust_id)
> Order(cust_id)
> OrderDetail(order_id)
>
> and presumed that the average row lengths were
>
> customer = 1K
> order = 500 bytes
> orderDetail = 300 bytes
>
> you would get, with 3 I/Os:
> - 8 customer rows
> - 16 order rows
> - 24 order detail rows (which would only apply to one order)
>
> So, granted, that would result in one I/O per order, which is more than
> in your example.
>
> I could now denormalise OrderDetail so that it also contains cust_id
> and cluster by cust_id (which might cause you trouble down the road if
> you can change the customer of an order), in which case, with 3 I/Os,
> I would get:
> - 8 customer rows
> - 16 order rows
> - 24 order detail rows (which would all apply to one customer)
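Lauri's rows-per-I/O figures follow from dividing the block size by the average row length. A back-of-the-envelope sketch, assuming 8K (8192-byte) blocks and ignoring per-page overhead:

```python
BLOCK = 8192  # assumed 8K page size, header overhead ignored

# Average row lengths from the post.
for table, row_len in [("customer", 1024), ("order", 500), ("orderDetail", 300)]:
    print(table, BLOCK // row_len, "rows per block")
```

This yields 8 customer rows and 16 order rows per block, matching the post; the naive figure for orderDetail is 27 rather than 24, so the post's number presumably allows some room for page headers and row overhead.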
Depending on block size, by clustering the three tables together one might get all of those rows in a single read, potentially improving on Wol's numbers by a factor of eight or more for this one query. Of course, doing so would increase the cost of a table scan on the customer table.
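The single-read claim is easy to sanity-check with arithmetic. A sketch using the row lengths from the thread and illustrative volumes (a customer with 2 orders of 3 detail lines each, values chosen here, not taken from the post):

```python
BLOCK = 8192                               # assumed 8K block
customer, order, detail = 1024, 500, 300   # average row lengths from the thread
orders, details_per_order = 2, 3           # illustrative volumes

# Total bytes for one customer's entire cluster of rows.
total = customer + orders * order + orders * details_per_order * detail
print(total, "bytes")  # 3824 bytes -- the whole cluster fits in one block
```

Since 3824 bytes fit comfortably in one 8K block, the entire statement for such a customer costs one read instead of one seek per invoice, which is where the factor-of-eight-or-more improvement over Wol's (2 + N) * ST formula comes from.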