Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman wrote on Wednesday, 05.11.2003 at 01:15:

1) Your database might change over time: a table that originally had only a few rows could suddenly grow considerably. Now an optimiser would insulate you from these changes, or in the worst case all that would need to be done would be to create an index (and, yes, check that the DBMS starts using it).

Except that an optimiser is *irrelevant* to MV. What do we need to be insulated from? MV doesn't care whether a FILE is 4Kb or 40Gb; the cost of accessing a single record, AT RANDOM, from within that FILE is almost identical. Where would we gain from an optimiser? In practice, it would get in the way and slow us down!

Getting a single record from any DB, AT RANDOM, follows the same rules ;)

2) You might have a product that runs in a number of sites: large ones and small ones. Now you would not have to reoptimise the programs for each type of site.

BUT WE DON'T NEED AN OPTIMISER. IT'S A WASTE OF CPU TIME!!! WE *D*O*N*'*T* *N*E*E*D* ONE!!!

On slashdot this would be tagged *funny* ;)

3) Complex SQL queries do quite a lot of things, and it might not be very obvious to the programmer how best to optimise.

But a large chunk of SQL's complexity is reassembling a view of an entity.

Perhaps a large chunk of the initially perceived complexity of SQL is reassembling a view of an entity. You will get over it in a day or two ;) That is, *if* the thing you are after *is* an entity.

MV doesn't have that complexity. An MV program views the database the same way as a programmer views the real world.

You mean screenfuls of weird green glowing letters running down the screen, leaving slowly fading tracks?

So it's pretty obvious to an MV programmer how to optimise things.

I've never been very good at optimising the real world - the obvious optimisations have very limited scope.

4) Depending on input from the user (say, a search screen) the optimal access path may be different.
An optimiser could generate a different path depending on this input.

Again, MV views the entity as a whole, so probably we don't need to generate a different path - it's just get me this entity regardless of what we need to know about it. Not what we need to know about it, but what we already know about it.

So it is always a SEQUENTIAL SCAN, non? Or is there some magic by which you have all entities automatically hashed by each and every attribute (or combination of attributes)?

We're not interested in being able to improve the speed at which the db can find data to respond to an app request - with an access factor of 1.05 (actually, it's nearer 1.02 or 1.03) we consider any effort there to be a waste of time ...

But isn't it better to have NO disk reads than one? I thought disk I/O was rather expensive? With that mentality you will always be disk bound.

I'm assuming we don't have sufficient RAM to cache stuff ... Our mentality is to leave disk caching to the OS. The app says get me X. The database knows *exactly* where to look, and asks the OS to get me disk sector Y.

How does the database map X to Y, without any extra info (meaning extra disk accesses)? If you can always predict your data needs that well, you don't need a database; all you need is a file system.

Any OS worth its salt will have that cached if it's been asked for recently.

Were you not talking about databases with substantially more data than fits into RAM?

That way, we're only caching stuff that's been accessed recently. But because for us the atomic chunk is an entity, there's a good chance that stuff has been accessed and is in cache.

Depending on your point of view, anything can be an entity (or atomic chunk) ;)

SQL optimisation *seems* to be more efficient because it tries to predict what you're going to want next.

Where do you get your weird ideas about SQL optimisation from?
But whereas SQL *guesses* that because you've accessed one order detail, you're likely to want other order details from the same invoice (a sensible guess), you cannot compare this to MV, because it gives you those order details as a side effect. In order for MV optimisation to be of any use, it would need to guess which INVOICE I'm going to access next, and frankly a random number generator is probably as good an optimiser as any!

So you claim that MV is good for problems you already know the best way to solve?

Basically, the only way you can beat us in the real world is to throw hardware at the problem - and like I said with linux and macro/micro kernels, we can do the same :-)

Well, please do!

We do. Which is why we can smoke any relational db for speed unless the hardware is big enough to store the entire database in RAM (and even then we'd beat it for speed :-) (just not that much in absolute terms, although probably a fair bit in percentages :-)

I guess this is the same as some ASM programmer claiming he can beat a C
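The access-factor claim traded back and forth above (a random single-record fetch costing about 1.05 bucket reads, independent of file size) can be illustrated with a toy model of a hashed file with overflow chains. This is a sketch only: the bucket capacity, expected load, and record counts are invented for illustration, not taken from any Pick implementation.

```python
import random

def avg_bucket_reads(n_records, expected_load=8, bucket_capacity=16, seed=42):
    """Toy model of a Pick-style hashed FILE.

    Records are hashed into roughly n_records/expected_load buckets; a
    bucket overflows into a chain of extra frames once it holds more
    than bucket_capacity records.  Returns the mean number of frame
    reads needed to fetch one record chosen at random.
    """
    rng = random.Random(seed)
    n_buckets = max(1, n_records // expected_load)
    counts = [0] * n_buckets
    for _ in range(n_records):
        counts[rng.randrange(n_buckets)] += 1
    total_reads = 0
    for c in counts:
        # the i-th record in a bucket sits in overflow frame i // capacity,
        # so it costs 1 read for the primary frame plus one per hop
        total_reads += sum(1 + i // bucket_capacity for i in range(c))
    return total_reads / n_records

small = avg_bucket_reads(1_000)
large = avg_bucket_reads(1_000_000)
print(small, large)
```

Because the bucket count grows with the file, the mean reads per lookup stays essentially constant whether the file holds a thousand records or a million; this is the property being claimed, and also why no optimiser enters the picture for single-key fetches.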
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen lauri.pie [EMAIL PROTECTED] writes Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes

Well, if it is normalised, how easy is it for you to change the customer_id of an order? Anyway,

Incredibly easy. Just update the customer_id field of the invoice record. A single change to a single row.

And I presume the system will automatically move all related stuff (order details etc.) into the same block as the new customer? How long will that take? What if there is no room for it there?

Well, I'd view an order as an entity. As such, I would give it its own FILE, and your question doesn't make sense.

But then your formula for disk head movements does not make sense either!

Why not? The order is a real-world thing, and as such I would have an ORDERS file, in which each order is a single entry, with customer_id as one of its attributes. Order detail is an attribute of order, so if I change customer_id it's the relational equivalent of just changing one cell in one row. The chances of me having to move the record are pretty near nil, and if I do it won't change bucket, so at most it involves two frames (or disk blocks, if that's what you want to call them). But if the system did move the stuff, it would be four disk accesses - read/write to delete the old entry, read/write to save the new. As for enough room - well - it'll fall over if we have a disk full (or it might not).

Not enough room here means not enough room in the block of the customer (from which you were supposed to get all data in one read, or disk head movement). That would mean that your order information would be moved, perhaps to another block, resulting in an extra head movement, or am I right?
Which I've taken into account - if there isn't enough room in the original bucket, I need to either overflow into the next bucket, which might exist, or create it if it doesn't. I.e. two head movements to delete from the first bucket, and two head movements to add to the second. And it will only fall over if I need to create a new bucket and there's no space left on the disk (or if (and this is very unlikely in this scenario) it triggers a split which again needs space and there's none left on disk). Or have you not sussed that we view order detail as an attribute of order (which is therefore stored as part of the same thing), but customer is separate from order, is stored separately, and is linked by a relationship? (Whereas order detail is NOT related to order, because they are part of the same thing :-)

Well, it does result in data being stored multiple times ;-)

What on earth is wrong with that? Do you know how much 160GB of disk costs today?

I could ask: does your system work in, say, 4KB? That's how much memory the first computer I used (a Wang 2000) had.

Probably it would not work at all. In the 50's they did amazing things with hardly any compilers and very little memory. I am referring to Whirlwind. See http://www.cedmagic.com/history/whirlwind-computer.html. Could you have done that with MV? My point? Why are we discussing the memory and CPU restrictions of the 70's and 80's? If an SQL DBMS uses more memory and disk, and it is available, why complain about *that*? I'm not implying that you cannot complain about other matters, e.g. ease of development etc., and you might even be right. Be that as it may, I am not trying to make you abandon your MV database.

As always, you're relying on hardware to help :-) You know what I think of that :-) And 160Gb of disk is only cheap if you're using IDE on a desktop PC - it costs a hell of a lot more for a laptop, or SCSI for a server. And if it's embedded, it may be that the *room* is expensive, not the capacity ...
And: what if I was just reading customer data. Would the same formula apply (= (2+N)*ST*1.05)?

Nope. If I understand you correctly, you want attributes that belong to the entity customer, not the entity invoice. T = ST * 1.05. (By the way, billing and/or invoice address (for example) are invoice attributes, not company attributes.)

No, I want you to give me a list of all your customers. How many disk reads?

T = N * 1.05 where N is the number of customers. What do you want to know about those customers? Address? Phone number*s*? Anything else? That's *all* at no extra cost.

Well, no thanks. I just wanted their names this time. The relational alternative, with an index on customer_name, would be again an order of magnitude fewer disk reads.

Well, if you let me use an index here, I'm sorry, GAME OVER! The best you can do would be a photo finish. Assuming an overhead of, say, 4 bytes per index entry, the entire index would be

Size = 4 * N + sigma(name_length) + sigma(key_length)

Okay, I've probably got some padding there as well, but so will you. And note I
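The two quantities traded at the end of this message (the full-file read cost T = N * 1.05 and the size of a name index) can be written out directly. A sketch under the thread's own assumption of 4 bytes of overhead per index entry; the sample customer names and keys are invented:

```python
def mv_customer_list_reads(n_customers, access_factor=1.05):
    """Wol's formula: one record read per customer, with ~5% overflow."""
    return n_customers * access_factor

def index_size_bytes(entries, overhead_per_entry=4):
    """Size = 4*N + sigma(name_length) + sigma(key_length) for an
    index mapping customer_name -> record key."""
    return sum(overhead_per_entry + len(name) + len(key)
               for name, key in entries)

entries = [("Acme Ltd", "C0001"), ("Bolt & Sons", "C0002")]
reads = mv_customer_list_reads(10_000)   # 10500.0 reads for 10,000 customers
size = index_size_bytes(entries)
print(reads, size)
```

The point of the exchange is visible in the numbers: listing only names via the MV route costs one read per customer, while a compact name index can be scanned in far fewer block reads.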
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED], Marshall Spight [EMAIL PROTECTED] writes

Unless one has data independence, one does not have this option; one will be locked into a particular performance model. This is why I found the MV guy's obvious pleasure at being able to precisely describe the performance model for his DB as odd: I thought it a deficit to be able to say what it was; he thought it an asset.

When you park your car, do you put the chassis on the drive, the engine in the garage, and the wheels in the front garden?

When I park my car, I don't particularly _care_ whether it runs on propane, diesel, gasoline, ethanol, or batteries. (Well, at home, they don't allow propane cars in the parking garage, but that's a case where details HAVE to emerge.) I don't need to care whether the car uses a 4 cylinder engine, 6, 8, 12, or perhaps evades having cylinders at all. I frankly have NO IDEA how many RPMs the engine gets to, nor do I know how many times the wheels turn in the average minute. These are all details I don't NEED to know in order to park the car, and they are pretty much irrelevant to the average need to drive an automobile.

I consider it a Good Thing that my database has a query optimizer that makes it unnecessary for me to worry about the details of how indexes will be used. Occasionally some anomaly comes up that requires that I dig into details, but most of the time the abstractions allow me to ignore those details, and to spend my time worrying about optimizing the things that actually need it, as opposed to chasing after irrelevant improvements.

--
select 'cbbrowne' || '@' || 'cbbrowne.com'; http://cbbrowne.com/info/linux.html
ASSEMBLER is a language. Any language that can take a half-dozen keystrokes and compile it down to one byte of code is all right in my books. Though for the REAL programmer, assembler is a waste of time. Why use a compiler when you can code directly into memory through a front panel.
---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Anthony W. Youngman [EMAIL PROTECTED] writes

Really, however you calculate it, it is an order of magnitude less than your alternative. And please don't tell me that using indexes is not fair, or not in the spirit of the relational model ;-)

Well, it does result in data being stored multiple times ;-) And while it maybe doesn't affect the result that much, you wanted the value? Where has that come from? What if the price changed half way through the period you're calculating? :-) You've failed to answer your own question, so maybe I could match you ... Whoops - sorry - I did notice after I wrote this that you included price in your index. OK! But it does seem strange indexing on a composite field like that ...

But why does it seem strange?

regards, Lauri
Re: [HACKERS] Dreaming About Redesigning SQL
Christopher Browne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

Anthony W. Youngman [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED], Marshall Spight [EMAIL PROTECTED] writes

Unless one has data independence, one does not have this option; one will be locked into a particular performance model. This is why I found the MV guy's obvious pleasure at being able to precisely describe the performance model for his DB as odd: I thought it a deficit to be able to say what it was; he thought it an asset.

When you park your car, do you put the chassis on the drive, the engine in the garage, and the wheels in the front garden?

When I park my car, I don't particularly _care_ whether it runs on propane, diesel, gasoline, ethanol, or batteries.

Christopher, you have to remember who you are talking to; Wol is ignorant and stupid. A car is a physical artifact just as the physical representation of a datum is a physical artifact. Physical independence is the equivalent of having a door from the hallway to the garage, a door from the kitchen to the garage, a door from the back yard to the garage, and car access to the driveway--and an identical car parked in the back alley just for convenience. Wol's analogies are dumb because they reflect his intelligence.
Re: [HACKERS] Dreaming About Redesigning SQL
Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Anthony W. Youngman [EMAIL PROTECTED] writes

But it does seem strange indexing on a composite field like that ...

But why does it seem strange?

He only knows one product and only a handful of recipes for using that product. Everything else seems strange because it lies outside the tightly confined cognitive box from which he views the world.
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Anthony W. Youngman [EMAIL PROTECTED] writes

Really, however you calculate it, it is an order of magnitude less than your alternative. And please don't tell me that using indexes is not fair, or not in the spirit of the relational model ;-)

Well, it does result in data being stored multiple times ;-) And while it maybe doesn't affect the result that much, you wanted the value? Where has that come from? What if the price changed half way through the period you're calculating? :-) You've failed to answer your own question, so maybe I could match you ...

Whoops - sorry - I did notice after I wrote this that you included price in your index. But it does seem strange indexing on a composite field like that ...

Cheers, Wol
--
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
Witches are curious by definition and inquisitive by nature. She moved in. Let me through. I'm a nosey person., she said, employing both elbows. Maskerade : (c) 1995 Terry Pratchett
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Marshall Spight [EMAIL PROTECTED] writes

Bob Badour [EMAIL PROTECTED] wrote in message news:W46dnf4tbfF1DwiiU- [EMAIL PROTECTED]

All physical structures will bias performance for some operations and against others.

This strikes me as a succinct statement of the value of data independence. One has the option (but not the requirement) to adjust the physical structures the DBMS uses while keeping the logical model (and therefore all application code and queries, etc.) unchanged. Unless one has data independence, one does not have this option; one will be locked into a particular performance model. This is why I found the MV guy's obvious pleasure at being able to precisely describe the performance model for his DB as odd: I thought it a deficit to be able to say what it was; he thought it an asset.

When you park your car, do you put the chassis on the drive, the engine in the garage, and the wheels in the front garden? You may find my approach of keeping data together strange; I just find it extremely weird that you think it is an IMPROVEMENT to disassemble what is in the real world a single thing. I'm sure you would not be happy if I tried to disassemble YOU and store your head in one place, your legs and arms in another, etc etc.

Can I refer you to something called emergent complexity? A scientific theory of how the whole can be greater than the sum of its parts? Harking back to something else, I can't remember who said the tuple is the fundamental unit of data. Apart from the fact that such a statement is not worth arguing with, I would compare that to the quark in physics. A strange beast that is known to exist, but can never be found in isolation. And as a chemist, it is totally and utterly irrelevant to me. It pays to know it's there just in case in some strange circumstance it should be useful, but for the most part I can ignore it as just not part of my reality.
Oh - and do you know why I was so pleased to describe the performance model for my db? For the same reason as I mentioned Huffman compression. It's impossible to prove that Huffman gives the best possible compression - and indeed I pointed out that it doesn't - but it is possible to prove that no more efficient symbol-by-symbol (prefix) code can exist. I'm TOTALLY happy to be locked into a performance model, if I can PROVE that there are no other models that are more efficient. My ability with stats isn't good enough, but the figure bandied about is that there is room for about 5% improvement before we hit that mathematical limit. SQL has a HELL of a long way to go to catch up :-)

Cheers, Wol
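The Huffman analogy can be made concrete. Huffman coding is provably optimal among prefix (symbol-by-symbol) codes, and its average code length can be compared against the source entropy, which is the mathematical limit Wol is gesturing at. A minimal sketch; the symbol frequencies are invented for illustration:

```python
import heapq
from math import log2

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built
    over the given symbol -> weight mapping."""
    # each heap item: (subtree weight, tie-breaker, symbols in subtree)
    heap = [(w, i, (s,)) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    depth = {s: 0 for s in freqs}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, g1 = heapq.heappop(heap)
        w2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:   # merging pushes every symbol one level deeper
            depth[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, g1 + g2))
        tie += 1
    return depth

freqs = {"a": 1, "b": 1, "c": 2}
lengths = huffman_code_lengths(freqs)
total = sum(freqs.values())
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / total
entropy = -sum((w / total) * log2(w / total) for w in freqs.values())
print(lengths, avg_bits, entropy)
```

For these frequencies the Huffman code actually reaches the entropy bound exactly; in general the average code length sits within one bit of it, and no prefix code can do better.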
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Lauri Pietarinen lauri.pie [EMAIL PROTECTED] writes Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Anthony W. Youngman wrote:

Fine. But MV *doesn't* *need* much of a cache. Let's assume both SQL and MV have the same amount of RAM to cache in - i.e. *not* *much*. I did say the spec said extract maximum performance from the hardware available.

So what's wrong with getting a machine with lots of memory? How much does 2G of memory for an Intel box cost nowadays? Is this some kind of new ultimate sport, trying to get along with as little memory as possible?

I presume you didn't read the bit below ... what if you have SEVERAL tables, and EACH of them is a gigabyte or two in size?

OK, I get your point.

Using technology to get you out of a hole is fine. Assuming it will be there if you need it is not. And actually, this is one of the factors hammering the MV model :-( Technology is now powerful enough to solve a lot of problems simply by using brute force.

Well, if it is normalised, how easy is it for you to change the customer_id of an order? Anyway,

Incredibly easy. Just update the customer_id field of the invoice record. A single change to a single row.

And I presume the system will automatically move all related stuff (order details etc.) into the same block as the new customer? How long will that take? What if there is no room for it there?

Well, I'd view an order as an entity. As such, I would give it its own FILE, and your question doesn't make sense. But if the system did move the stuff, it would be four disk accesses - read/write to delete the old entry, read/write to save the new. As for enough room - well - it'll fall over if we have a disk full (or it might not).

If we stick to your example, and even if we don't normalise, using e.g. the clustering features of Oracle, as Bob pointed out, we are getting at most the same number of I/O's.
So, answer to your question: our formula is at least as good as yours.

Except I think Bob said we could optimise to favour *certain* transactions. I think actually ANY transaction benefits. You're relying on stuff that's outwith your theory, we're relying on stuff that's inherent to our model.

That certainly is not true. The theory says NOTHING about how data should be arranged on disk. You are talking about how modern SQL databases behave. The DBMS is at liberty to do whatever it pleases with the data, even save it in a PICK database. Hey, wadda you think? Would that be a good idea? We get to keep our SQL but with the speed of PICK ;-)

That would be nice ;-) But I think our two paragraphs don't connect. I was talking about MV ... We let the hardware help us out if it can. There's a big difference. If you can't get the hardware, you're stuffed. We don't need it, so while we may have a hard time of it, it's nowhere near as bad for us.

And again, relational separates the physical from the logical. You're being hypocritical if you call upon the physical representation to help out with the (speed of the) logical presentation.

My goodness, no I'm not! It's the same as claiming that if you have a drawing for a house, you have to make that house out of paper?!?

I want a list with all products with corresponding total sales, read from order detail, e.g.

Hammer 1$
Nail 5000$
Screw 1200$

How many disk reads (or head movements)?

Actually, probably the same as you here. If we're indexed on order detail. If Hammer appears in N invoices, then T = (1+N) * ST * 1.05 for hammers, and the same for all the other products. Theory favours us, in that if a product appears X times in one invoice, that's one read for us and X for you, but hardware will probably help you more than us (that is, assuming thrashing cuts in) in that you stand a marginally higher chance of getting multiple instances of a product in any given read. So for each product you get T = (1+N) * ST * 1.05.
Now, for our SQL-DBMS, presuming that we build indexes for detail and product:

order_detail(product_id, qty, unit_price) = 20 bytes/row
product(product_id, product_name) = 50 bytes/row

With 2 disk reads I would get 8K/20 = 400 order detail rows and 8K/50 = 160 product rows. Since all rows are in product_id order, there is no need for random disk reads, so

T = 1 + N/400 + P/160 (N = number of details, P = number of products)

for ALL products and details. And, because of sequential prefetch, we probably would not have to wait for I/O's at all. Really, however you calculate it, it is an order of magnitude less than your alternative. And please don't tell me that using indexes is not fair, or not in the spirit of the relational model ;-)

Well, it does result in data being stored multiple times ;-) And while it maybe doesn't affect the result that much, you wanted the value? Where has that come from? What if the price changed half way
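The two cost models being traded in this exchange can be put side by side in a short sketch. The constants (400 detail rows and 160 product rows per 8K block, the 1.05 access factor) are the ones quoted in the thread; the workload numbers below are invented for illustration:

```python
def mv_reads(details_per_product, n_products, access_factor=1.05):
    """Wol's model: per product, one index read plus one read per
    invoice line, each subject to the ~5% overflow factor."""
    return n_products * (1 + details_per_product) * access_factor

def sql_reads(n_details, n_products,
              detail_rows_per_block=400, product_rows_per_block=160):
    """Lauri's model: T = 1 + N/400 + P/160 sequential block reads,
    assuming both tables are stored in product_id order."""
    return (1 + n_details / detail_rows_per_block
              + n_products / product_rows_per_block)

# e.g. 100 products with 400 order-detail rows each
mv = mv_reads(400, 100)        # ~42,105 random reads
sql = sql_reads(40_000, 100)   # 101.625 sequential reads
print(mv, sql)
```

Under these (contested) assumptions, the sequential-scan model needs a few hundred times fewer reads for the full report, which is exactly Lauri's "order of magnitude" claim; the MV side's rejoinder is that this relies on indexes, i.e. on storing the data twice.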
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen lauri.pie [EMAIL PROTECTED] writes Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Anthony W. Youngman wrote:

Well, if it is normalised, how easy is it for you to change the customer_id of an order? Anyway,

Incredibly easy. Just update the customer_id field of the invoice record. A single change to a single row.

And I presume the system will automatically move all related stuff (order details etc.) into the same block as the new customer? How long will that take? What if there is no room for it there?

Well, I'd view an order as an entity. As such, I would give it its own FILE, and your question doesn't make sense.

But then your formula for disk head movements does not make sense either!

But if the system did move the stuff, it would be four disk accesses - read/write to delete the old entry, read/write to save the new. As for enough room - well - it'll fall over if we have a disk full (or it might not).

Not enough room here means not enough room in the block of the customer (from which you were supposed to get all data in one read, or disk head movement). That would mean that your order information would be moved, perhaps to another block, resulting in an extra head movement, or am I right?

If we're indexed on order detail. If Hammer appears in N invoices, then T = (1+N) * ST * 1.05 for hammers, and the same for all the other products. Theory favours us, in that if a product appears X times in one invoice, that's one read for us and X for you, but hardware will probably help you more than us (that is, assuming thrashing cuts in) in that you stand a marginally higher chance of getting multiple instances of a product in any given read. So for each product you get T = (1+N) * ST * 1.05.
Now, for our SQL-DBMS, presuming that we build indexes for detail and product:

order_detail(product_id, qty, unit_price) = 20 bytes/row
product(product_id, product_name) = 50 bytes/row

With 2 disk reads I would get 8K/20 = 400 order detail rows and 8K/50 = 160 product rows. Since all rows are in product_id order, there is no need for random disk reads, so

T = 1 + N/400 + P/160 (N = number of details, P = number of products)

for ALL products and details. And, because of sequential prefetch, we probably would not have to wait for I/O's at all. Really, however you calculate it, it is an order of magnitude less than your alternative. And please don't tell me that using indexes is not fair or not in the spirit of the relational model ;-)

Well, it does result in data being stored multiple times ;-)

What on earth is wrong with that? Do you know how much 160GB of disk costs today?

I could ask: does your system work in, say, 4KB? That's how much memory the first computer I used (a Wang 2000) had.

Probably it would not work at all. In the 50's they did amazing things with hardly any compilers and very little memory. I am referring to Whirlwind. See http://www.cedmagic.com/history/whirlwind-computer.html. Could you have done that with MV? My point? Why are we discussing the memory and CPU restrictions of the 70's and 80's? If an SQL DBMS uses more memory and disk, and it is available, why complain about *that*? I'm not implying that you cannot complain about other matters, e.g. ease of development etc., and you might even be right. Be that as it may, I am not trying to make you abandon your MV database.

And while it maybe doesn't affect the result that much, you wanted the value? Where has that come from?

From e.g.
select p.product_id, product_name, sum(qty*unit_price)
from product p, order_detail od
where p.product_id = od.product_id
group by p.product_id, product_name

This is the SQL statement that will result in 1 + N/400 + P/160 disk reads (if the rows are not found in cache).

What if the price changed half way through the period you're calculating?

Which price? The price that has already been paid by the customer? :-)

You've failed to answer your own question, so maybe I could match you ...

How have I failed?

And: what if I was just reading customer data. Would the same formula apply (= (2+N)*ST*1.05)?

Nope. If I understand you correctly, you want attributes that belong to the entity customer, not the entity invoice. T = ST * 1.05. (By the way, billing and/or invoice address (for example) are invoice attributes, not company attributes.)

No, I want you to give me a list of all your customers. How many disk reads?

T = N * 1.05 where N is the number of customers. What do you want to know about those customers? Address? Phone number*s*? Anything else? That's *all* at no extra cost.

Well, no thanks. I just wanted their names this time. The relational alternative, with an index on customer_name, would be again an order of magnitude fewer disk reads.

Well, if
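The aggregate query quoted in this message runs unchanged on any SQL engine. Here is a self-contained sketch using SQLite, with made-up rows chosen so that the totals reproduce the Hammer 1$ / Nail 5000$ / Screw 1200$ example from earlier in the thread:

```python
import sqlite3

# in-memory database with the thread's (simplified) schema and
# invented sample rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_detail (order_id INTEGER, product_id INTEGER,
                           qty INTEGER, unit_price REAL);
INSERT INTO product VALUES (1, 'Hammer'), (2, 'Nail'), (3, 'Screw');
INSERT INTO order_detail VALUES
    (100, 1, 1, 1.0),      -- one hammer on invoice 100
    (100, 2, 400, 5.0),    -- nails split across two invoices
    (101, 2, 600, 5.0),
    (101, 3, 600, 2.0);
""")

# total sales per product, as in the thread's example
rows = conn.execute("""
    SELECT p.product_id, p.product_name, SUM(od.qty * od.unit_price)
    FROM product p
    JOIN order_detail od ON p.product_id = od.product_id
    GROUP BY p.product_id, p.product_name
    ORDER BY p.product_id
""").fetchall()

for product_id, name, total in rows:
    print(name, total)
```

Whether the engine answers this with sequential index scans (Lauri's model) or per-row random reads (Wol's model) is exactly what the optimiser decides; the query itself says nothing about access paths.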
Re: [HACKERS] Dreaming About Redesigning SQL
Marsh Ray wrote: Lauri Pietarinen wrote:

The theory, indeed, does not say anything about buffer pools, but by decoupling logic from implementation we leave the implementor (DBMS) free to do as it sees fit. As DBMS technology advances, we get faster systems without having to change our programs.

I think you've identified why relational systems have been the overwhelming winner in the business environment. They allow vendors to provide an optimized but fairly general solution to the interesting problem of efficiently accessing and storing data on rotating magnetic storage, while at the same time presenting a programming model that's at just the right level for the business applications programmer. Relational theory or no, linked tables are typically conceptualized as a slight formalization of the spreadsheet, or (in earlier times) stacks of punched cards. As business computers evolved from more specific machines that could perform some relational operations on punched cards (sort, select, etc.), I think it's still sort of stuck in the collective unconscious of business to want to model their data this way.

I agree with you on that one. The punched-card history is clearly visible in the fact that on IBM mainframes many files have a width of 80 chars, which just happens to be the number of characters you could store on a punch card. And, yes, tables are often thought of as a deck of index cards, something you might have had in the past.

I think relational theory is useful primarily to database implementers, students, and those few application developers who are after a deeply theoretical understanding of their tools. They're probably the ones reading this list. I suppose MV and other non-SQL data stores have their place in certain niches (embedded systems, etc.), but the business world has already voted with its feet.

What I sense is a longing for a unified environment, something that SQL + [your app programming environment] does not provide at the moment.
Hence the affection for Pick and other niche environments?

Lauri
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Paul Vernon [EMAIL PROTECTED] writes

No, I think Anthony is just saying that he doesn't believe in science/the scientific method. Or maybe he believes that engineering is not based on scientific knowledge!

Actually, I *DO* believe in the Scientific Method. I just fail to see the connection between the Scientific Method and Relational. The former is Science, the latter is Maths. Please tell me how I can use relational theory to predict the future. Without that, relational is unprovable, and hence unscientific. Note I didn't say relational is *incorrect* - the ideas of mathematically correct and scientifically provable are orthogonal, and have nothing to say about each other.

Cheers, Wol
Re: [HACKERS] Dreaming About Redesigning SQL
Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Bob Badour wrote: Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] I could now denormalise OrderDetail so that it contains cust_id also and cluster by cust_id (might cause you trouble down the road, if you can change the customer of an order), in which case, with 3 I/O's I would get - 8 customer rows - 16 order rows - 24 order detail rows (which would all apply to one customer) Depending on block size, by clustering the three tables together, one might get all of those rows for a single read, potentially improving on Wol's numbers by a factor of eight or more for this one query. Of course, doing so would increase the cost of a table scan on the customer table. Which DBMS'es support clustering of multiple tables except for Oracle? I don't know. Why would it matter? Is this feature really used any more? If one has a hard performance requirement that only clustering can meet, one will use it. I thought it was more trouble than it's worth. All physical structures will bias performance for some operations and against others. In general, increasing the cost of customer scans will be sufficiently unpleasant to make clustering customers with orders undesirable. However, if one chooses to consider only one physical arrangement and one operation, as Wol is wont to do, I observe we can outperform his product by a factor of eight.
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes So in your opinion, is the problem 1) SQL is so hard that the average programmer will not know how to use it efficiently Nope or 2) Relational (or SQL-) DBMS'es are just too slow Yes. If 2) then why don't we get a bit more concrete. Could you give an example of a query that in your experience would be too slow using a standard SQL database (e.g. Oracle, or MySQL)? We could then actually try it out on some machine and compare. I suggest using the customer-order-order_detail-product database Okay. Give me a FORMULA that returns a time in seconds for your query. Let's assume I want to print a statement of how many invoices were sent to a customer, along with various details of those invoices. My invoice file is indexed by company/month, and we can reasonably assume that the time taken to produce the statement is infinitesimal compared to the time taken to retrieve the invoice data from disk. For MV T = (2 + N) * ST * 1.05 Where T is the time taken to produce the report, N is the number of invoices, and ST is the hard disk seek time. First of all it is important to note that an important component of all modern SQL-DBMS's is the buffer pool (or cache), meaning that in a reasonably well-tuned database you get very few disk I/O's, even when *writing* data into tables. SQL-DBMS's are also very clever at using indexes, i.e. if they can find all necessary data from an index they will not even look at the table, so to speak. And, even when presuming conservatively that there is no data in cache, depending on how the data is clustered, you will get more than one row per disk read (= 8K in most(?) systems). So, assuming the (simplified) example Customer(cust_id, ...) Order(order_id, cust_id, ...) OrderDetail(order_id, prod_id, ...) 
Product(prod_id, ...) If you created a clustering index on Customer(cust_id) Order(cust_id) OrderDetail(order_id) And presumed that the average length of customer = 1K order=500 orderDetail=300 You would get, with 3 I/O's - 8 customer rows - 16 order rows - 24 order detail rows (which would only apply to one order) so, granted, that would result in one I/O per order which is more than in your example. I could now denormalise OrderDetail so that it contains cust_id also and cluster by cust_id (might cause you trouble down the road, if you can change the customer of an order), in which case, with 3 I/O's I would get - 8 customer rows - 16 order rows - 24 order detail rows (which would all apply to one customer) Now the amount of I/O's would depend on how many detail rows we have per customer. And, of course, because we are using sequential prefetch, we would be getting more than one I/O block (8?, 16?) per disk seek, so it's a hard comparison to make, but I suspect that it would about equal your example. Now, that was a *conservative* estimate, and we assumed that we did not have any rows lying around in the (global!) cache. As the size of the cache grows in proportion to the size of the total database we can assume less and less disk I/O. Note also that the cache can be configured many ways: you can put different tables (or indexes) in different caches, and even change the size of the cache on the fly (you might want a bigger cache during evening and night when your batch programs are running), so you can rig your system to favour certain types of queries. I haven't even gone into the topic of using thick indexes so table access can be totally avoided (= we are reading into memory only interesting columns). Now, in your example, what if the product department comes along and wants to make a report with sales per product? What would be your formula in that case? And: what if I was just reading customer data? Would the same formula apply (= (2+N)*ST*1.05)? 
But as I understand relational theory, such a question is completely outside the scope of the theory. Seeing as it tries to divorce the database logic from the practical implementation ... The theory, indeed, does not say anything about buffer pools, but by decoupling logic from implementation we leave the implementor (DBMS) to do as it sees fit. As DBMS technology advances, we get faster systems without having to change our programs. When we design databases we can decouple logical planning from performance considerations, which, you must agree, are two separate issues. And you know it's been proven that Huffman coding is the most efficient compression algorithm? (Actually, it isn't - it's been proven it can't be improved upon, which isn't the same thing...). Can you improve on the formula I've just given you? Given that if we could change the 1.05 to 1 then we can prove it can't be improved upon ... again - I've taken the liberty of assuming that an MV FILE is equivalent to an entity if we assume the relational designer has been thinking in an entity-attribute-relation sort of way. My
Re: [HACKERS] Dreaming About Redesigning SQL
Bob Badour wrote: Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] I could now denormalise OrderDetail so that it contains cust_id also and cluster by cust_id (might cause you trouble down the road, if you can change the customer of an order), in which case, with 3 I/O's I would get - 8 customer rows - 16 order rows - 24 order detail rows (which would all apply to one customer) Depending on block size, by clustering the three tables together, one might get all of those rows for a single read, potentially improving on Wol's numbers by a factor of eight or more for this one query. Of course, doing so would increase the cost of a table scan on the customer table. Which DBMS'es support clustering of multiple tables except for Oracle? Is this feature really used any more? I thought it was more trouble than it's worth. regards, Lauri
Re: [HACKERS] Dreaming About Redesigning SQL
Bob Badour wrote: Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Bob Badour wrote: Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] I could now denormalise OrderDetail so that it contains cust_id also and cluster by cust_id (might cause you trouble down the road, if you can change the customer of an order), in which case, with 3 I/O's I would get - 8 customer rows - 16 order rows - 24 order detail rows (which would all apply to one customer) Depending on block size, by clustering the three tables together, one might get all of those rows for a single read, potentially improving on Wol's numbers by a factor of eight or more for this one query. Of course, doing so would increase the cost of a table scan on the customer table. Which DBMS'es support clustering of multiple tables except for Oracle? I don't know. Why would it matter? Just curious... Is this feature really used any more? If one has a hard performance requirement that only clustering can meet, one will use it. OK I thought it was more trouble than it's worth. All physical structures will bias performance for some operations and against others. In general, increasing the cost of customer scans will be sufficiently unpleasant to make clustering customers with orders undesirable. However, if one chooses to consider only one physical arrangement and one operation, as Wol is wont to do, I observe we can outperform his product by a factor of eight. OK, right... Lauri
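Bob's factor-of-eight point above is just arithmetic, and it can be made concrete with a short sketch. This is illustrative only, not taken from either product: it compares a strict one-seek-per-record access pattern with reads from a hypothetical co-clustered file where eight rows share one block.

```python
# Illustrative sketch: one-seek-per-record access versus reads from a
# hypothetical co-clustered file where 8 rows share a single block.

def seeks_record_at_a_time(n_rows):
    """One disk seek per record fetched."""
    return n_rows

def seeks_clustered(n_rows, rows_per_block=8):
    """Whole blocks needed to cover n_rows (ceiling division)."""
    return -(-n_rows // rows_per_block)

# The 24 order-detail rows for one customer from the example:
print(seeks_record_at_a_time(24))  # 24 seeks
print(seeks_clustered(24))         # 3 seeks, i.e. 8x fewer
```

The ratio is exactly the rows-per-block figure, which is why the claimed improvement tracks the block size.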
Re: [HACKERS] Dreaming About Redesigning SQL
Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes So in your opinion, is the problem 1) SQL is so hard that the average programmer will not know how to use it efficiently Nope or 2) Relational (or SQL-) DBMS'es are just too slow Yes. If 2) then why don't we get a bit more concrete. Could you give an example of a query that in your experience would be too slow using a standard SQL database (e.g. Oracle, or MySQL)? We could then actually try it out on some machine and compare. I suggest using the customer-order-order_detail-product database Okay. Give me a FORMULA that returns a time in seconds for your query. Let's assume I want to print a statement of how many invoices were sent to a customer, along with various details of those invoices. My invoice file is indexed by company/month, and we can reasonably assume that the time taken to produce the statement is infinitesimal compared to the time taken to retrieve the invoice data from disk. For MV T = (2 + N) * ST * 1.05 Where T is the time taken to produce the report, N is the number of invoices, and ST is the hard disk seek time. First of all it is important to note that an important component of all modern SQL-DBMS's is the buffer pool (or cache), meaning that in a reasonably well-tuned database you get very few disk I/O's, even when *writing* data into tables. SQL-DBMS's are also very clever at using indexes, i.e. if they can find all necessary data from an index they will not even look at the table, so to speak. And, even when presuming conservatively that there is no data in cache, depending on how the data is clustered, you will get more than one row per disk read (= 8K in most(?) systems). So, assuming the (simplified) example Customer(cust_id, ...) Order(order_id, cust_id, ...) OrderDetail(order_id, prod_id, ...) 
Product(prod_id, ...) If you created a clustering index on Customer(cust_id) Order(cust_id) OrderDetail(order_id) And presumed that the average length of customer = 1K order=500 orderDetail=300 You would get, with 3 I/O's - 8 customer rows - 16 order rows - 24 order detail rows (which would only apply to one order) so, granted, that would result in one I/O per order which is more than in your example. I could now denormalise OrderDetail so that it contains cust_id also and cluster by cust_id (might cause you trouble down the road, if you can change the customer of an order), in which case, with 3 I/O's I would get - 8 customer rows - 16 order rows - 24 order detail rows (which would all apply to one customer) Depending on block size, by clustering the three tables together, one might get all of those rows for a single read, potentially improving on Wol's numbers by a factor of eight or more for this one query. Of course, doing so would increase the cost of a table scan on the customer table.
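The rows-per-I/O figures quoted in the exchange above follow from simple division. A minimal sketch, assuming an 8K page and the average row lengths given in the post (note the post's 24 detail rows per read is a slightly conservative rounding of 8K/300):

```python
# Rows returned per page read, assuming an 8K page (as in the post)
# and the quoted average row lengths: customer=1K, order=500, detail=300.
PAGE_SIZE = 8 * 1024

def rows_per_page(avg_row_len):
    """How many rows one page read (one I/O) brings into memory."""
    return PAGE_SIZE // avg_row_len

print(rows_per_page(1024))  # customer rows per I/O -> 8
print(rows_per_page(500))   # order rows per I/O -> 16
print(rows_per_page(300))   # detail rows per I/O -> 27 (post rounds to 24)
```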
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Anthony W. Youngman wrote: Well, as far as we MV'ers are concerned, performance IS a problem with the relational approach. The attitude (as far as I can tell) with relational is to hide the actual DB implementation from the programmers. So it is a design flaw that it is extremely easy for a programmer to do something stupid. And you need a DBA to try and protect the database from the programmers! As soon as a requirement for a database specifies extraction of the maximum power from the box, it OUGHT to rule out all the current relational databases. MV flattens it for performance. As an MV programmer, I *KNOW* that I can find anything I'm looking for (or find out it doesn't exist) with just ONE disk seek. A relational programmer has to ask the db "does this exist" and hope the db is optimised to be able to return the result quickly. To quote the Pick FAQ: "SQL optimises the easy task of finding stuff in memory. Pick optimises the hard task of getting it into memory in the first place." So in your opinion, is the problem 1) SQL is so hard that the average programmer will not know how to use it efficiently Nope or 2) Relational (or SQL-) DBMS'es are just too slow Yes. If 2) then why don't we get a bit more concrete. Could you give an example of a query that in your experience would be too slow using a standard SQL database (e.g. Oracle, or MySQL)? We could then actually try it out on some machine and compare. I suggest using the customer-order-order_detail-product database Okay. Give me a FORMULA that returns a time in seconds for your query. Let's assume I want to print a statement of how many invoices were sent to a customer, along with various details of those invoices. My invoice file is indexed by company/month, and we can reasonably assume that the time taken to produce the statement is infinitesimal compared to the time taken to retrieve the invoice data from disk. 
For MV T = (2 + N) * ST * 1.05 Where T is the time taken to produce the report, N is the number of invoices, and ST is the hard disk seek time. I've assumed I have to access the company details as well, hence the 2 (1 for company, 1 for the index). I've also assumed that the data isn't cached in RAM, which I think is reasonable if we assume the hardware is being stressed. If 1) I would like to hear some concrete examples. It's 2, so ... But as I understand relational theory, such a question is completely outside the scope of the theory. Seeing as it tries to divorce the database logic from the practical implementation ... And you know it's been proven that Huffman coding is the most efficient compression algorithm? (Actually, it isn't - it's been proven it can't be improved upon, which isn't the same thing...). Can you improve on the formula I've just given you? Given that if we could change the 1.05 to 1 then we can prove it can't be improved upon ... again - I've taken the liberty of assuming that an MV FILE is equivalent to an entity if we assume the relational designer has been thinking in an entity-attribute-relation sort of way. My maths isn't good enough to prove it, but I think it would be pretty easy to prove that accessing data as one and only one complete entity at a time is the most efficient way. best regards, Lauri Pietarinen Looking forward to you coming up with maths that can prove relational can even EQUAL MV :-) Cheers, Wol -- Anthony W. Youngman - wol at thewolery dot demon dot co dot uk Witches are curious by definition and inquisitive by nature. She moved in. "Let me through. I'm a nosey person," she said, employing both elbows. Maskerade : (c) 1995 Terry Pratchett
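Wol's cost formula is simple enough to state as code. A minimal sketch, taking the terms exactly as he defines them above (the 2 being one seek for the company record plus one for the index, and 1.05 his assumed overflow overhead):

```python
# Wol's MV cost model, as stated in the post:
#   T = (2 + N) * ST * 1.05
# N  = number of invoices on the statement
# ST = average hard disk seek time in seconds
# 2  = one seek for the company record, one for the index
# 1.05 = his assumed ~5% overhead for records that overflow a group

def mv_report_time(n_invoices, seek_time):
    """Estimated report time in seconds under Wol's model."""
    return (2 + n_invoices) * seek_time * 1.05

# e.g. 100 invoices at a 10 ms average seek:
print(mv_report_time(100, 0.010))  # about 1.07 seconds
```

The model's interesting property, and the crux of the thread, is that it is linear in N with no dependence on total database size.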
Re: [HACKERS] Dreaming About Redesigning SQL
Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Anthony W. Youngman wrote: Well, as far as we MV'ers are concerned, performance IS a problem with the relational approach. The attitude (as far as I can tell) with relational is to hide the actual DB implementation from the programmers. So it is a design flaw that it is extremely easy for a programmer to do something stupid. And you need a DBA to try and protect the database from the programmers! As soon as a requirement for a database specifies extraction of the maximum power from the box, it OUGHT to rule out all the current relational databases. MV flattens it for performance. As an MV programmer, I *KNOW* that I can find anything I'm looking for (or find out it doesn't exist) with just ONE disk seek. A relational programmer has to ask the db "does this exist" and hope the db is optimised to be able to return the result quickly. To quote the Pick FAQ: "SQL optimises the easy task of finding stuff in memory. Pick optimises the hard task of getting it into memory in the first place." So in your opinion, is the problem 1) SQL is so hard that the average programmer will not know how to use it efficiently or 2) Relational (or SQL-) DBMS'es are just too slow No, I think Anthony is just saying that he doesn't believe in science/the scientific method. Or maybe he believes that engineering is not based on scientific knowledge! Think different. Think Engineering, not Maths. And for $DEITY's sake stop going on about science. Unless you can use set theory to predict the future, relational has nothing to do with science ... Regards Paul Vernon Business Intelligence, IBM Global Services
Re: [HACKERS] Dreaming About Redesigning SQL
Bob Badour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... [snip] Actually, Bob pointed out ... [snip] Why don't you go and bang your heads together, Bob.
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Okay. Give me a FORMULA that returns a time in seconds for your query. Let's assume I want to print a statement of how many invoices were sent to a customer, along with various details of those invoices. My invoice file is indexed by company/month, and we can reasonably assume that the time taken to produce the statement is infinitesimal compared to the time taken to retrieve the invoice data from disk. For MV T = (2 + N) * ST * 1.05 Where T is the time taken to produce the report, N is the number of invoices, and ST is the hard disk seek time. First of all it is important to note that an important component of all modern SQL-DBMS's is the buffer pool (or cache), meaning that in a reasonably well-tuned database you get very few disk I/O's, even when *writing* data into tables. Fine. But MV *doesn't* *need* much of a cache. Let's assume both SQL and MV have the same amount of RAM to cache in - i.e. *not* *much*. I did say the spec said "extract maximum performance from the hardware available". You're assuming that you can throw hardware at the problem - fine, but that's not always possible. You might have already maxed out the RAM, you might have a huge database, you might be sharing your db server with other programs (BIND really likes to chew up every available drop of RAM, doesn't it :-). I'm not saying that you shouldn't throw hardware at it, but what if you can't? SQL-DBMS's are also very clever at using indexes, i.e. if they can find all necessary data from an index they will not even look at the table, so to speak. Same with MV And, even when presuming conservatively that there is no data in cache, depending on how the data is clustered, you will get more than one row per disk read (= 8K in most(?) systems). Same with MV So, assuming the (simplified) example Customer(cust_id, ...) Order(order_id, cust_id, ...) OrderDetail(order_id, prod_id, ...) 
Product(prod_id, ...) If you created a clustering index on Customer(cust_id) Order(cust_id) OrderDetail(order_id) And presumed that the average length of customer = 1K order=500 orderDetail=300 You would get, with 3 I/O's - 8 customer rows - 16 order rows - 24 order detail rows (which would only apply to one order) so, granted, that would result in one I/O per order which is more than in your example. I could now denormalise OrderDetail so that it contains cust_id also and cluster by cust_id (might cause you trouble down the road, if you can change the customer of an order), in which case, with 3 I/O's I would get - 8 customer rows - 16 order rows - 24 order detail rows (which would all apply to one customer) Now the amount of I/O's would depend on how many detail rows we have per customer. And, of course, because we are using sequential prefetch, we would be getting more than one I/O block (8?, 16?) per disk seek, so it's a hard comparison to make, but I suspect that it would about equal your example. Except my example was an *average* case, and yours is a *best* case. Oh, and my data is still normalised - I haven't had to denormalise it! AND I haven't run an optimiser over it :-) Now, that was a *conservative* estimate, and we assumed that we did not have any rows lying around in the (global!) cache. As the size of the cache grows in proportion to the size of the total database we can assume less and less disk I/O. You're relying on the hardware to bail you out :-) We can do the same! Note also that the cache can be configured many ways: you can put different tables (or indexes) in different caches, and even change the size of the cache on the fly (you might want a bigger cache during evening and night when your batch programs are running), so you can rig your system to favour certain types of queries. I haven't even gone into the topic of using thick indexes so table access can be totally avoided (= we are reading into memory only interesting columns). 
Now, in your example, what if the product department comes along and wants to make a report with sales per product? What would be your formula in that case? I'm not quite sure what you're trying to do. I'll assume you want a report of all invoices which refer to a given product. Assuming I've got the relevant indices defined, I can simply read a list of invoices from the product code index, a second list of invoices from the month index, and do an intersection of the two lists. So again, T = (2+N) * ST * 1.05 where N is the number of invoices that reference that product. And now ALL the invoice data has been retrieved from disk to RAM ... And: what if I was just reading customer data? Would the same formula apply (= (2+N)*ST*1.05)? Nope. If I understand you correctly, you want attributes that belong to the entity customer, not the entity invoice. T = ST * 1.05. (By the way, billing and/or invoice address (for example) are invoice attributes, not company attributes.) But as I understand relational theory, such a
Re: [HACKERS] Dreaming About Redesigning SQL
Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Okay. Give me a FORMULA that returns a time in seconds for your query. Let's assume I want to print a statement of how many invoices were sent to a customer, along with various details of those invoices. My invoice file is indexed by company/month, and we can reasonably assume that the time taken to produce the statement is infinitesimal compared to the time taken to retrieve the invoice data from disk. For MV T = (2 + N) * ST * 1.05 Where T is the time taken to produce the report, N is the number of invoices, and ST is the hard disk seek time. First of all it is important to note that an important component of all modern SQL-DBMS's is the buffer pool (or cache), meaning that in a reasonably well-tuned database you get very few disk I/O's, even when *writing* data into tables. Fine. But MV *doesn't* *need* much of a cache. Let's assume both SQL and MV have the same amount of RAM to cache in - i.e. *not* *much*. I did say the spec said "extract maximum performance from the hardware available". So what's wrong with getting a machine with lots of memory? How much does 2G of memory for an Intel box cost nowadays? Is this some kind of new ultimate sport, trying to get along with as little memory as possible? You're assuming that you can throw hardware at the problem - fine, but that's not always possible. You might have already maxed out the RAM, you might have a huge database, you might be sharing your db server with other programs (BIND really likes to chew up every available drop of RAM, doesn't it :-). I'm not saying that you shouldn't throw hardware at it, but what if you can't? SQL-DBMS's are also very clever at using indexes, i.e. if they can find all necessary data from an index they will not even look at the table, so to speak. 
Same with MV And, even when presuming conservatively that there is no data in cache, depending on how the data is clustered, you will get more than one row per disk read (= 8K in most(?) systems). Same with MV I could now denormalise OrderDetail so that it contains cust_id also and cluster by cust_id (might cause you trouble down the road, if you can change the customer of an order), in which case, with 3 I/O's I would get - 8 customer rows - 16 order rows - 24 order detail rows (which would all apply to one customer) Now the amount of I/O's would depend on how many detail rows we have per customer. And, of course, because we are using sequential prefetch, we would be getting more than one I/O block (8?, 16?) per disk seek, so it's a hard comparison to make, but I suspect that it would about equal your example. Except my example was an *average* case, and yours is a *best* case. Oh, and my data is still normalised - I haven't had to denormalise it! AND I haven't run an optimiser over it :-) Are you hiding your optimiser behind the curtain? ;-) Well, if it is normalised, how easy is it for you to change the customer_id of an order? Anyway, if we stick to your example and even if we don't normalise using e.g. clustering features of Oracle, as Bob pointed out, we are getting at most the same number of I/O's. So, answer to your question: our formula is at least as good as yours. Actually, Bob pointed out we are getting at most 12.5% as many disk head movements or I/O's. I'll take an 87.5% improvement any day. Now, that was a *conservative* estimate, and we assumed that we did not have any rows lying around in the (global!) cache. As the size of the cache grows in proportion to the size of the total database we can assume less and less disk I/O. You're relying on the hardware to bail you out :-) We can do the same! Well why don't you? We achieved 8 times the performance with exactly the same hardware. What the hell is this idiot talking about us relying on hardware? 
He is a moron. You will do everyone a favour if you just bounce him off the bottom of your killfile. Note also that the cache can be configured many ways: you can put different tables (or indexes) in different caches, and even change the size of the cache on the fly (you might want a bigger cache during evening and night when your batch programs are running), so you can rig your system to favour certain types of queries. I haven't even gone into the topic of using thick indexes so table access can be totally avoided (= we are reading into memory only interesting columns). Now, in your example, what if the product department comes along and wants to make a report with sales per product? What would be your formula in that case? I'm not quite sure what you're trying to do. I'll assume you want a report of all invoices which refer to a given product. Assuming I've got the relevant indices defined, I
Re: [HACKERS] Dreaming About Redesigning SQL
Bob [EMAIL PROTECTED] spewed forth... We achieved 8 times the performance with exactly the same hardware. What the hell is this idiot talking about us relying on hardware? He is a moron. You will do everyone a favour if you just bounce him off the bottom of your killfile. ... [profanity-laced responses deleted] ... It's about time to take this discussion private, guys. It has long since stopped being on-topic for hacking PostgreSQL internals. Darren
Re: [HACKERS] Dreaming About Redesigning SQL
Dawn M. Wolthuis wrote: So, nope, I'm not trolling. I've been doing some research the past couple of years and I'm convinced that it is time to do something new (and yet old) with data persistence. Perhaps. But before you go down that road, you have to answer the following simple, yet possibly difficult-to-answer, question: What problem are you trying to solve? Data persistence is far too vague a term to be meaningful in its own right -- the term needs some context to have any real meaning here. We store data for a reason. Similarly, we retrieve it for a reason. The data we're interested in looking for and the way we are interested in looking for it will have a huge impact on any data retrieval solution one could craft. The relational model of data storage and retrieval is just a model, highly suitable to some things and not suitable at all to others. The amount of development work that has gone into it and the amount of use it has gotten shows that the relational model is actually reasonably good at meeting many of the data storage and retrieval needs that people have. As with any method, it has tradeoffs and drawbacks. There is no magic bullet and there never will be (or so experience says). I have no reason to believe that the problem of data persistence and retrieval is any exception to that. If you have a particular set of data retrieval problems in mind that you wish to solve, by all means feel free to develop the mathematical foundation to solve them. Feel free to tell us that the relational model is not suitable for that set of problems -- we might even agree with you on that. But don't make the claim that the relational model is lacking as a result of not being a storage and retrieval method that is suitable to all problems, and that there is a Better Way that will Solve Everything. Many have made such claims about many different technologies. They were wrong, too. 
I may be misreading you and responding to arguments you aren't making or implying, but if so I doubt I'm alone, based on some of the other responses I've seen here. By the way, language is only a means of expression, and the only sort of question (relevant to this discussion, anyway) that a language is the answer to is "what's the best way for me to express X?". It is neither an answer to the question of how to retrieve data nor is it a solution to the problem of storing data in a persistent manner. The answer to the question of how best to query data is certainly going to be a language, but the specific language one comes up with in answer to the question will depend on what the person asking wants. English is likely to be the best answer only under certain circumstances. SQL is likely to be the best answer (or, at least, a very good answer) only under other circumstances. It just depends. But as with any solution to any problem, there is no one-size-fits-all solution. As a mathematician, you should know this: the English language is horribly inadequate for expressing mathematical concepts. That's why mathematicians don't use it as their primary language for doing math. Why, then, should we expect English, or Java, or any other language to be any better for performing certain kinds of queries against data than some other, more directed language? Say what you want about SQL, but at least it was designed with querying table-based data in mind and is at least somewhat good at its job. -- Kevin Brown [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
On Mon, 2003-10-20 at 13:50, Anthony W. Youngman wrote: Note I didn't say relational is *incorrect* - the ideas of mathematically correct and scientifically provable are orthogonal, and have nothing to say about each other. Eh? Mathematical and Scientific reasoning (more correctly: axiomatic and non-axiomatic reasoning, respectively) are the same thing. Any apparent differences such that we can even make a distinction are the result of differences in relative system sizes (in terms of Kolmogorov complexity) in practice. If you think they are orthogonal, you don't understand the nature of this particular beast. Cheers, -James Rogers [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
Bob Badour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] All physical structures will bias performance for some operations and against others. This strikes me as a succinct statement of the value of data independence. One has the option (but not the requirement) to adjust the physical structures the DBMS uses while keeping the logical model (and therefore all application code and queries, etc.) unchanged. Unless one has data independence, one does not have this option; one will be locked into a particular performance model. This is why I found the MV guy's obvious pleasure at being able to precisely describe the performance model for his DB as odd: I thought it a deficit to be able to say what it was; he thought it an asset. Marshall PS. This is nothing you don't know, Bob; just a comment for the group.
Re: [HACKERS] Dreaming About Redesigning SQL
Lauri Pietarinen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Anthony W. Youngman wrote: Fine. But MV *doesn't* *need* much of a cache. Let's assume both SQL and MV have the same amount of RAM to cache in - i.e. *not* *much*. I did say the spec said extract maximum performance from the hardware available. So what's wrong with getting a machine with lots of memory? How much does 2G of memory for an Intel-box cost nowadays? Is this some kind of new ultimate sport, trying to get along with as little memory as possible? I presume you didn't read the bit below ... what if you have SEVERAL tables, and EACH of them is a gigabyte or two in size? OK, I get your point. Well, if it is normalised, how easy is it for you to change the customer_id of an order? Anyway, Incredibly easy. Just update the customer_id field of the invoice record. A single change to a single row. And I presume the system will automatically move all related stuff (order details etc.) into the same block as the new customer? How long will that take? What if there is no room for it there? if we stick to your example and even if we don't normalise using e.g. clustering features of Oracle, as Bob pointed out, we are getting at most the same number of I/O's. So, answer to your question: our formula is at least as good as yours. Except I think Bob said we could optimise to favour *certain* transactions. Exactly. This is as true for Pick as it is for any other file processor. I think actually ANY transaction benefits. Wol thinks a lot of things that are just plain wrong. That's inherent to his ignorance and his stupidity. You're relying on stuff that's outwith your theory, we're relying on stuff that's inherent to our model. I am relying on reality, and Wol relies on fantasy. In his mind, he is right and nothing will ever change his mind. That certainly is not true.
The theory says NOTHING about how data should be arranged on disk. You are talking about how modern SQL-databases behave. No, he isn't. Wol doesn't even know how modern SQL-databases really behave. He is talking about nothing but his own imagined prejudices. The DBMS is at liberty to do whatever it pleases with the data, even save it in a PICK database. Hey, wadda you think? Would that be a good idea? We get to keep our SQL but with the speed of PICK ;-) Now, that was a *conservative* estimate, and we assumed that we did not have any rows lying around in the (global!) cache. As the size of the cache grows in proportion to the size of the total database we can assume less and less disk I/O. You're relying on the hardware to bale you out :-) We can do the same! Well why don't you? We let the hardware help us out if it can. There's a big difference. If you can't get the hardware, you're stuffed. We don't need it, so while we may have a hard time of it it's nowhere near as bad for us. And again, relational separates the physical from the logical. You're being hypocritical if you call upon the physical representation to help out with the (speed of the) logical presentation. My goodness, no I'm not! It's the same as claiming that if you have a drawing for a house, you have to make that house out of paper?!? Don't you see? Wol is an ignorant moron. You will save a lot of bandwidth if you simply ignore the idiot. I want a list with all products with corresponding total sales, read from order detail e.g. Hammer 1$ Nail 5000$ Screw 1200$ How many disk reads (or head movements)? Actually, probably the same as you here. If we're indexed on order detail. If Hammer appears in N invoices, then T = (1+N) * ST * 1.05 for hammers, and the same for all the other products. Theory favours us, in that if a product appears X times in one invoice, that's one read for us and X for you. No, theory does not favour Wol's product at all.
Making ignorant and stupid assertions about how many reads are required for a relational dbms does not affect the actual number of reads required. Wol is an ignorant moron. No matter how many times you explain these points to him, he will remain convinced of Pick's mythical performance advantage. , but hardware will probably help you more than us (that is, assuming thrashing cuts in) in that you stand a marginally higher chance of getting multiple instances of a product in any given read. So for each product you get T = (1+N) * ST * 1.05. Now, for our SQL-DBMS, presuming that we build indexes for detail and product: order_detail(product_id, qty, unit_price) = 20 bytes/row product(product_id, product_name) = 50 bytes/row With 2 disk reads I would get 8K/20 = 400 order detail rows and 8K/50 = 160 product rows Since all rows are in product_id order, no need for random disk
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] As soon as a requirement for a database specifies extraction of the maximum power from the box, I don't for a second believe that this is your only requirement, or that this is even an actual requirement. If it really is an actual requirement, then I assume you're writing all of your code in hand-tuned assembly language, and that the document you consult most regularly when writing code is the CPU's instruction timing table. Another commodity box costs $1000, which is about the same as the cost to a company of a day's programmer time. What *really* matters is getting software delivered in a timely fashion, that is as correct as possible, and that will operate reliably over time and not cause data corruption. Marshall
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Just like the academics were brainwashed into thinking that microkernels were the be-all and end-all - until Linus showed them by practical example that they were all idiots The academics (presumably you mean Tanenbaum et al) never claimed that monolithic kernels could not obtain market acceptance; they never said anything about market acceptance. Instead, they had identified a number of weaknesses of monolithic kernels and pointed out that a microkernel architecture didn't suffer from these problems. Certainly the monolithic kernel is easier to implement. Linus set out to build a unix kernel workalike, and he chose the easiest path, copying architecture from the 1970s, along with all the weaknesses that those idiot academics had identified years earlier. Since then, his monolithic kernel has gotten a lot of marketshare, due to a number of different reasons, none of them being architectural superiority. Marshall
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman wrote: In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Anthony W. Youngman wrote: Fine. But MV *doesn't* *need* much of a cache. Let's assume both SQL and MV have the same amount of RAM to cache in - i.e. *not* *much*. I did say the spec said extract maximum performance from the hardware available. So what's wrong with getting a machine with lots of memory? How much does 2G of memory for an Intel-box cost nowadays? Is this some kind of new ultimate sport, trying to get along with as little memory as possible? I presume you didn't read the bit below ... what if you have SEVERAL tables, and EACH of them is a gigabyte or two in size? OK, I get your point. Well, if it is normalised, how easy is it for you to change the customer_id of an order? Anyway, Incredibly easy. Just update the customer_id field of the invoice record. A single change to a single row. And I presume the system will automatically move all related stuff (order details etc.) into the same block as the new customer? How long will that take? What if there is no room for it there? if we stick to your example and even if we don't normalise using e.g. clustering features of Oracle, as Bob pointed out, we are getting at most the same number of I/O's. So, answer to your question: our formula is at least as good as yours. Except I think Bob said we could optimise to favour *certain* transactions. I think actually ANY transaction benefits. You're relying on stuff that's outwith your theory, we're relying on stuff that's inherent to our model. That certainly is not true. The theory says NOTHING about how data should be arranged on disk. You are talking about how modern SQL-databases behave. The DBMS is at liberty to do whatever it pleases with the data, even save it in a PICK database. Hey, wadda you think? Would that be a good idea?
We get to keep our SQL but with the speed of PICK ;-) Now, that was a *conservative* estimate, and we assumed that we did not have any rows lying around in the (global!) cache. As the size of the cache grows in proportion to the size of the total database we can assume less and less disk I/O. You're relying on the hardware to bale you out :-) We can do the same! Well why don't you? We let the hardware help us out if it can. There's a big difference. If you can't get the hardware, you're stuffed. We don't need it, so while we may have a hard time of it it's nowhere near as bad for us. And again, relational separates the physical from the logical. You're being hypocritical if you call upon the physical representation to help out with the (speed of the) logical presentation. My goodness, no I'm not! It's the same as claiming that if you have a drawing for a house, you have to make that house out of paper?!? I want a list with all products with corresponding total sales, read from order detail e.g. Hammer 1$ Nail 5000$ Screw 1200$ How many disk reads (or head movements)? Actually, probably the same as you here. If we're indexed on order detail. If Hammer appears in N invoices, then T = (1+N) * ST * 1.05 for hammers, and the same for all the other products. Theory favours us, in that if a product appears X times in one invoice, that's one read for us and X for you, but hardware will probably help you more than us (that is, assuming thrashing cuts in) in that you stand a marginally higher chance of getting multiple instances of a product in any given read. So for each product you get T = (1+N) * ST * 1.05.
Now, for our SQL-DBMS, presuming that we build indexes for detail and product: order_detail(product_id, qty, unit_price) = 20 bytes/row product(product_id, product_name) = 50 bytes/row With 2 disk reads I would get 8K/20 = 400 order detail rows and 8K/50 = 160 product rows Since all rows are in product_id order, no need for random disk reads so T = 1 + N/400 + P/160 (N=number of details, P=number of products) for ALL products and details. And, because of sequential prefetch, we probably would not have to wait for I/O's at all. Really, however you calculate it, it is an order of magnitude less than your alternative. And please don't tell me that using indexes is not fair or not in the spirit of the relational model ;-) And: what if I was just reading customer-data. Would the same formula apply (= (2+N)*ST*1.05)? Nope. If I understand you correctly, you want attributes that belong to the entity customer, not the entity invoice. T = ST * 1.05. (By the way, billing and/or invoice address (for example) are invoice attributes, not company attributes.) No, I want you to give me a list of all your customers. How many disk reads? T = N * 1.05 where N is the number of customers. What do you want to know about those customers? Address? Phone number*s*? Anything else? That's *all* at no extra cost. Well, no thanks. I just
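To make the two cost formulas traded above concrete, here is a back-of-envelope sketch in Python. It is not a benchmark: ST is treated as a unit cost, the 400 and 160 rows-per-page figures are the ones Lauri quotes, and the data volumes (1000 products, 10 order lines each) are purely illustrative assumptions.

```python
# Back-of-envelope sketch of the two disk-read estimates debated above.
# ST is treated as a unit cost; all data volumes are illustrative assumptions.

def mv_reads(invoices_per_product):
    # MV estimate quoted above: T = (1 + N) * ST * 1.05 per product,
    # where N is the number of invoices mentioning that product;
    # summed over every product for the whole report.
    return sum((1 + n) * 1.05 for n in invoices_per_product)

def sql_reads(total_details, total_products,
              details_per_page=400, products_per_page=160):
    # SQL estimate quoted above, with covering indexes kept in product_id
    # order: T = 1 + N/400 + P/160 sequential page reads for the whole
    # report (N = total detail rows, P = total products).
    return 1 + total_details / details_per_page + total_products / products_per_page

products = 1000
details = [10] * products                 # assume 10 order lines per product
print(mv_reads(details))                  # per-product random seeks, summed
print(sql_reads(sum(details), products))  # sequential index pages
```

On these illustrative numbers the sequential-index estimate comes out orders of magnitude smaller, which is the point Lauri is arguing; change the assumptions and the gap changes with them.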
Re: [HACKERS] Dreaming About Redesigning SQL
Marshall Spight [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Bob Badour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] All physical structures will bias performance for some operations and against others. This strikes me as a succinct statement of the value of data independence. One has the option (but not the requirement) to adjust the physical structures the DBMS uses while keeping the logical model (and therefore all application code and queries, etc.) unchanged. Unless one has data independence, one does not have this option; one will be locked into a particular performance model. This is why I found the MV guy's obvious pleasure at being able to precisely describe the performance model for his DB as odd: I thought it a deficit to be able to say what it was; he thought it an asset. It becomes an obvious deficit as soon as he needs to improve upon the performance for some operation and he has no way to do it. Thus, he lacks the option to gain the factor of eight improvement for the first query offered by clustering. Marshall PS. This is nothing you don't know, Bob; just a comment for the group. Of course. Likewise.
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Lauri Pietarinen [EMAIL PROTECTED] writes Anthony W. Youngman wrote: Fine. But MV *doesn't* *need* much of a cache. Let's assume both SQL and MV have the same amount of RAM to cache in - i.e. *not* *much*. I did say the spec said extract maximum performance from the hardware available. So what's wrong with getting a machine with lots of memory? How much does 2G of memory for an Intel-box cost nowadays? Is this some kind of new ultimate sport, trying to get along with as little memory as possible? I presume you didn't read the bit below ... what if you have SEVERAL tables, and EACH of them is a gigabyte or two in size? If an engineer has a problem, throwing brute force at it is rarely the solution. Let's be topical (near enough) and look at the Titanic (seeing as there was this film recently). If they'd foreseen the problem, they could have thrown brute force at it and doubled the thickness of the steel plate. Except she would have then sunk when they launched her, before she even had a chance to hit the iceberg. Or look at aviation - especially in the early years. They had gliders that could fly, and they had engines that could easily provide the power to get a glider airborne. The problem was, every time they increased the power of the engine they got *further* *away* from the possibility of powered flight, because the increased power came at the price of increased weight. You're welcome to live in your mathematical world where power can be gained for no cost, but that doesn't work in the real world. And the cost isn't necessarily dollars. Like in the aircraft example, the cost could be a case of sorry, technology ain't that advanced yet mate! You're assuming that you can throw hardware at the problem - fine, but that's not always possible.
You might have already maxed out the ram, you might have a huge database, you might be sharing your db server with other programs (BIND really likes to chew up every available drop of ram, doesn't it :-). I'm not saying that you shouldn't throw hardware at it, but what if you can't? Except my example was an *average* case, and yours is a *best* case. Oh, and my data is still normalised - I haven't had to denormalise it! AND I haven't run an optimiser over it :-) Are you hiding your optimiser behind the curtain? ;-) Well, if you include getting optimisation for free because that's the way things work, maybe I am ;-) Well, if it is normalised, how easy is it for you to change the customer_id of an order? Anyway, Incredibly easy. Just update the customer_id field of the invoice record. A single change to a single row. If we stick to your example and even if we don't normalise using e.g. clustering features of Oracle, as Bob pointed out, we are getting at most the same number of I/O's. So, answer to your question: our formula is at least as good as yours. Except I think Bob said we could optimise to favour *certain* transactions. I think actually ANY transaction benefits. You're relying on stuff that's outwith your theory, we're relying on stuff that's inherent to our model. Now, that was a *conservative* estimate, and we assumed that we did not have any rows lying around in the (global!) cache. As the size of the cache grows in proportion to the size of the total database we can assume less and less disk I/O. You're relying on the hardware to bale you out :-) We can do the same! Well why don't you? We let the hardware help us out if it can. There's a big difference. If you can't get the hardware, you're stuffed. We don't need it, so while we may have a hard time of it it's nowhere near as bad for us. And again, relational separates the physical from the logical.
You're being hypocritical if you call upon the physical representation to help out with the (speed of the) logical presentation. Note also that the cache can be configured many ways, you can put different tables (or indexes) in different caches, and even change the size of the cache on the fly (you might want a bigger cache during evening and night when your batch programs are running) so you can rig your system to favour certain types of queries. I haven't even gone into the topic of using thick indexes so table access can be totally avoided (=we are reading into memory only interesting columns). Now, in your example, what if the product department comes along and wants to make a report with sales / product? What would be your formula in that case? I'm not quite sure what you're trying to do. I'll assume you want a report of all invoices which refer to a given product. Assuming I've got the relevant indices defined, I can simply read a list of invoices from the product code index, a second list of invoices from the month index, and do an intersect of the two lists. I want a list with all products with corresponding total sales, read from order detail e.g. Hammer 1$ Nail 5000$ Screw 1200$ How many disk reads
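Wol's plan above (pull one list of invoice ids from the product-code index, another from the month index, and intersect them) is a standard merge-intersection of two sorted id lists. A minimal sketch, with hypothetical id lists standing in for the two index scans:

```python
def intersect_sorted(a, b):
    """Merge-intersect two ascending lists of invoice ids,
    in the order two index scans would yield them."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:      # invoice appears in both indexes
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical id lists from the two index scans:
by_product = [102, 115, 140, 198, 260]
by_month = [98, 115, 180, 198, 305]
print(intersect_sorted(by_product, by_month))  # → [115, 198]
```

Each list is read once, so the cost is linear in the lengths of the two index scans, independent of table size.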
Re: [HACKERS] Dreaming About Redesigning SQL
Marshall Spight kirjutas N, 23.10.2003 kell 11:01: Anthony W. Youngman [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Just like the academics were brainwashed into thinking that microkernels were the be-all and end-all - until Linus showed them by practical example that they were all idiots ... Linus set out to build a unix kernel workalike, and he chose the easiest path, copying architecture from the 1970s, along with all the weaknesses that those idiot academics had identified years earlier. Since then, his monolithic kernel has gotten a lot of marketshare, due to a number of different reasons, none of them being architectural superiority. Unless you count as architectural superiority the fact that it can be actually written and debugged in a reasonable time. Being able to mathematically define something as not having certain weaknesses does not guarantee that the thing can be actually implemented and/or is usable. -- Hannu
Re: [HACKERS] Dreaming About Redesigning SQL
Wol, I think one MAJOR problem is that most (if not all) MV practitioners are not formally qualified in computing ... snip Relational is all about theory and proving things mathematically correct. MV is all about engineering and getting the result. And if that means pinching all the best ideas we can find from relational, then we're engineers - of course we'll do it :-) So what you're saying is that you use MV databases because you find them easier to understand and set up than relational databases. Sure. For that matter, spreadsheets are easier to understand and set up than relational databases -- or MV databases, for that matter. I've no problem with the idea that MV databases are easier for the neophyte to understand. But you seem to be making this bizarre logical leap that easier to understand without training == technically superior. And I noticed that you have completely backed off from your assertion that there was some kind of MV database theory without ever addressing it. As soon as a requirement for a database specifies extraction of the maximum power from the box, it OUGHT to rule out all the current relational databases. MV flattens it for performance. As an MV programmer, I *KNOW* that I can find anything I'm looking for (or find out it doesn't exist) with just ONE disk seek. A relational programmer has to ask the db does this exist and hope the db is optimised to be able to return the result quickly. Well, frankly, no. You're confusing your floundering because you find it hard to grasp relational database design and/or have a badly designed database with a performance problem. There are several people in our community running large genetics databases on PostgreSQL, using good relational design, and doing well with them. To quote the Pick FAQ SQL optimises the easy task of finding stuff in memory. Pick optimises the hard task of getting it into memory in the first place. And hope to the gods that nobody pulls the power cord.
Relational is all about theory and proving things mathematically correct. MV is all about engineering and getting the result. And if that means pinching all the best ideas we can find from relational, then we're engineers - of course we'll do it :-) Relational is all about preserving the *long-term* integrity, utility, and accessibility of your data. MV is all about getting an expedient result immediately and to heck with the future. Read some of E.F. Codd's original papers ... he was not concerned with expanding some obscure batch of mathematics, he was concerned with the problems of data that wasn't clean, or wasn't the same when you queried it twice, or data with rules that changed erratically over time, and wasn't portable at all. These are real, engineering problems that needed solving and relational theory solved it. Unless you can use set theory to predict the future, relational has nothing to do with science ... I wasn't aware that clairvoyance was a tenet of the scientific method. Science is about reproducing reliable results, which is also what relational theory is about. To conclude: Your entire advocacy of Pick can be summed up as I like Pick and find it easy to program, therefore it is superior. Well, Wol, that makes it superior for *you* but not for the rest of us. If you want to use Pick and not PostgreSQL, go for it; but if you barge in here and try to convince us that Pick is superior based strictly on your say-so, you've become one of the crazy preachers on Sproul Plaza. -- Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] Dreaming About Redesigning SQL
Josh == Josh Berkus [EMAIL PROTECTED] writes: Relational is all about theory and proving things mathematically correct. MV is all about engineering and getting the result. And if that means pinching all the best ideas we can find from relational, then we're engineers - of course we'll do it :-) Josh Relational is all about preserving the *long-term* Josh integrity, utility, and accessibility of your data. MV is Josh all about getting an expedient result immediately and to Josh heck with the future. To emphasize the *long-term* - relational is all about separating physical database representation from a logical view of the data. Josh, I'll be happy to meet at Intermezzo - after Nov 24 perhaps. I have a conference deadline ... the papers that we write just to secure funding - your tax dollars at work. Long live the NSF! BTW, I'm not necessarily that far from your views. There is, however, more to an XML database than just storing data - relational databases do just fine there. The tools that are used to build database systems have other uses - for instance XML message brokers. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Christopher Browne [EMAIL PROTECTED] writes How do you know it works? Without the theory and model, you really do not. And don't other databases have both theory and model? It's just that all the academics have been brainwashed into thinking this is true only for relational, so that's what they teach to everyone else, and the end result is that all research is ploughed into a model that may be (I didn't say is) bankrupt. Just like the academics were brainwashed into thinking that microkernels were the be-all and end-all - until Linus showed them by practical example that they were all idiots :-) In mathematics as well as in the analysis of computer algorithms, it is typical for someone who is trying to explain something new to try to do so in terms that allow the gentle reader to do as direct a comparison as possible between the things with which they are familiar (e.g. - in this case, relational database theory) and the things with which they are perhaps NOT familiar (e.g. - in this case, MV databases). Nobody seems to have been prepared to explain the MV model in adequate theoretical terms as to allow the gentle readers to compare the theory behind it with the other theories out there. I'm afraid that does not reflect very well on either those lauding MV or those trashing it. I think one MAJOR problem is that most (if not all) MV practitioners are not formally qualified in computing ... for example by education I'm a chemist. And I'm doing postgrad in medical science ... The trouble is - we MV'ers tend to take an engineering approach - we use it because we know it works. To quote you from another post ... When people _don't_ do that thinking differently, we are certain to see hideous performance, and that is neither a SQL issue nor a relational issue. 
The point is that if they are accessing a big pile of data, they have to think carefully [jumping to that different way of thinking] irrespective of what specific language(s), libraries, or other tools they are using. Well, as far as we MV'ers are concerned, performance IS a problem with the relational approach. The attitude (as far as I can tell) with relational is to hide the actual DB implementation from the programmers. So it is a design flaw that it is extremely easy for a programmer to do something stupid. And you need a DBA to try and protect the database from the programmers! As soon as a requirement for a database specifies extraction of the maximum power from the box, it OUGHT to rule out all the current relational databases. MV flattens it for performance. As an MV programmer, I *KNOW* that I can find anything I'm looking for (or find out it doesn't exist) with just ONE disk seek. A relational programmer has to ask the db does this exist and hope the db is optimised to be able to return the result quickly. To quote the Pick FAQ SQL optimises the easy task of finding stuff in memory. Pick optimises the hard task of getting it into memory in the first place. Relational is all about theory and proving things mathematically correct. MV is all about engineering and getting the result. And if that means pinching all the best ideas we can find from relational, then we're engineers - of course we'll do it :-) Think different. Think Engineering, not Maths. And for $DEITY's sake stop going on about science. Unless you can use set theory to predict the future, relational has nothing to do with science ... Cheers, Wol -- Anthony W. Youngman - wol at thewolery dot demon dot co dot uk Witches are curious by definition and inquisitive by nature. She moved in. Let me through. I'm a nosey person, she said, employing both elbows.
Maskerade : (c) 1995 Terry Pratchett
Re: [HACKERS] Dreaming About Redesigning SQL
-Original Message- From: Lauri Pietarinen [mailto:[EMAIL PROTECTED] Sent: Sunday, October 19, 2003 1:50 PM To: [EMAIL PROTECTED] Subject: Re: [HACKERS] Dreaming About Redesigning SQL Anthony W. Youngman wrote: Well, as far as we MV'ers are concerned, performance IS a problem with the relational approach. The attitude (as far as I can tell) with relational is to hide the actual DB implementation from the programmers. So it is a design flaw that it is extremely easy for a programmer to do something stupid. And you need a DBA to try and protect the database from the programmers! As soon as a requirement for a database specifies extraction of the maximum power from the box, it OUGHT to rule out all the current relational databases. MV flattens it for performance. As an MV programmer, I *KNOW* that I can find anything I'm looking for (or find out it doesn't exist) with just ONE disk seek. A relational programmer has to ask the db does this exist and hope the db is optimised to be able to return the result quickly. To quote the Pick FAQ SQL optimises the easy task of finding stuff in memory. Pick optimises the hard task of getting it into memory in the first place. So in your opinion, is the problem 1) SQL is so hard that the average programmer will not know how to use it efficiently or This same average programmer will have the same difficulty with a pure, orthogonal and mathematically rigorous language. 2) Relational (or SQL-) DBMS'es are just too slow If 2) then why don't we get a bit more concrete. Could you give an example of a query that in your experience would be too slow using a standard SQL database (e.g. Oracle, or MySQL). We could then actually try it out on some machine and compare. I suggest using the customer-order-order_detail-product database If 1) I would like to hear some concrete examples. I once worked on a telephone customer support system that had 5 million lines of C++ and 5 million lines of SQL.
It was too complex for the average person to comprehend. There are always going to be some queries that are too slow. If you put a key on everything, data modifications will go into the toilet. If you lack a key on a field required for a filtered query, you will have a table scan. Hence, there is no free lunch, and there will have to be compromises. However, due to the exponential expansion of compute power, the efficiency of SQL is the least of our worries. See this link, it makes for excellent reading: http://www.kurzweilai.net/articles/art0134.html?printable=1 best regards, Lauri Pietarinen ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 8: explain analyze is your friend
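Dann's no-free-lunch point above is easy to demonstrate with any SQL engine. Here is a minimal sketch in Python using SQLite (the table and column names are invented for illustration): without an index on the filtered column the planner must scan the whole table, while adding one turns the same query into an index search.

```python
import sqlite3

# Hypothetical orders table, used only to illustrate the index trade-off.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, float(i)) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in column 3.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Without a key on customer_id: full table scan.
before = plan("SELECT * FROM orders WHERE customer_id = 42")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
# With the key: index search (exact wording varies by SQLite version).
after = plan("SELECT * FROM orders WHERE customer_id = 42")
print(before)
print(after)
```

The flip side of the trade-off (index maintenance slowing down modifications) is not shown, but every INSERT after the CREATE INDEX would now have to update the index as well.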
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman kirjutas P, 19.10.2003 kell 21:24: As soon as a requirement for a database specifies extraction of the maximum power from the box, it OUGHT to rule out all the current relational databases. MV flattens it for performance. As an MV programmer, I *KNOW* that I can find anything I'm looking for (or find out it doesn't exist) with just ONE disk seek.

Relational or not, this requires either an in-memory index or a perfect hash. BTW, how do you find the oldest red elephant with just one disk seek? As in SQL:

select * from elephants where colour = 'red' order by age desc limit 1;

A relational programmer has to ask the db does this exist and hope the db is optimised to be able to return the result quickly. To quote the Pick FAQ: SQL optimises the easy task of finding stuff in memory. Pick optimises the hard task of getting it into memory in the first place.

SQL by itself optimises nothing: by definition it evaluates full cross products and then compares all rows with predicates. Some SQL implementations do optimise a little ;)

Relational is all about theory and proving things mathematically correct. MV is all about engineering and getting the result.

Or perhaps just getting _the_ result ;) Getting some other result will probably need another MV database ;)

Unless you can use set theory to predict the future,

Isn't this what PostgreSQL's optimiser does ?

-- Hannu ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
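Hannu's elephant query can be made concrete. A sketch in Python with SQLite (the schema is invented for illustration): with a composite index on (colour, age), the WHERE ... ORDER BY ... LIMIT 1 query is answered by a single descent of the index rather than a scan-and-sort, which is about as close as SQL gets to the one-disk-seek claim.

```python
import sqlite3

# Hypothetical elephants table; reds get ages 0, 3, ..., 27.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elephants (name TEXT, colour TEXT, age INTEGER)")
conn.executemany("INSERT INTO elephants VALUES (?, ?, ?)",
                 [("e%d" % i, "red" if i % 3 == 0 else "grey", i) for i in range(30)])

# Composite index: colour for the filter, age for the ordering.
conn.execute("CREATE INDEX idx_colour_age ON elephants (colour, age)")

row = conn.execute(
    "SELECT name, age FROM elephants "
    "WHERE colour = 'red' ORDER BY age DESC LIMIT 1").fetchone()
print(row)  # → ('e27', 27), the oldest red elephant

# The plan confirms the index is used instead of a scan.
detail = " ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM elephants "
    "WHERE colour = 'red' ORDER BY age DESC LIMIT 1"))
print(detail)
```

Without the index, the engine would scan all rows matching colour = 'red' and sort them by age before taking the first one.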
Re: [HACKERS] Dreaming About Redesigning SQL
[EMAIL PROTECTED] (Sailesh Krishnamurthy) writes:

Christopher == Christopher Browne [EMAIL PROTECTED] writes:

Christopher Ah, but do papers honestly indicate the emergence Christopher of some underlying theoretical model for which Christopher fidelity could be evaluated?

Certainly. The model is that of semi-structured data, where often times the schema isn't very clearly defined. It's a different model from the relational model - which I'm partial to. There are situations where the XML model does make sense.

Christopher Or merely that academics are looking to write Christopher papers on whatever topics can attract research Christopher funding?

Bash academics if you want. The truth is that industry is also working on it. As I said before, I have no axe to grind in this. I might be in academia now, but frankly I couldn't give a toss about XML.

Fifteen years ago, there were _vital_ things going on in academia; Stonebraker's group was assortedly working on Postgres, Ingres, and Cohera. _Real_ OS research was going on, with active work on such systems as Sprite, BSD, Mach, with _many_ schools building their own OS kernels to explore different edges. The problem is that with the dearth of funding for basic research, academia seems to be getting used as a partly-government-funded development area for the developments that industry hopes might work but where they would rather evade responsibility for the failures.

Christopher Half the articles in SIGOS have been about pretend Christopher applications of Java to operating systems; why does Christopher it seem likely that the database academics are any Christopher less affected by this?

I think you are looking at the wrong publishing location :-) The premier venues for the OS community are the SOSP and OSDI conferences. Please look at the SOSP04 papers - you'll find fairly good systems work.

When the first paragraph of the annual report contains the phrase ... despite the operating systems in our name ... 
I can't see this boding well for the area of operating systems research being particularly vital. They have had to expand it to some notion of systems that is sufficiently wide open that I'm not sure what it _wouldn't_ admit as relevant.

BTW, I'm not sure which database papers you read - the premier venues for database systems work are the SIGMOD and VLDB conferences.

I haven't been subscribing to anything for a couple of years now, TODS having pretty much fallen into irrelevance. I suppose I need to look to see if there is something worth following again.

Christopher CODASYL had a query system, albeit something that Christopher looked more like assembly language than anything

Please take a fair look at the XQuery data model and the XQuery language before comparing it with CODASYL. I will not admit (at least in public :-) to being a big fan of XQuery but that is because of certain details, not anything fundamental.

I have used it a bit for extracting data out of SOAP requests; I wasn't terribly impressed with it, finding that with the way it was remapping data, I would commonly have to code my own tree walker to do things _right_. It's not hard to do; when I had to modify my Galeon bookmark walker a few months ago when they changed from a custom XML form to RDF, I was pleasantly surprised to find that most of the changes involved changing names of tags in DEFCONSTANT declarations. (Guess the language :-).) Of course, web bookmark systems are likely to be structured similarly, so that this change would be easy shouldn't be much of a surprise. But when you need to write code, that indicates that there is some weakness to the model...

-- let name=cbbrowne and tld=libertyrms.info in String.concat @ [name;tld];; http://dev6.int.libertyrms.com/ Christopher Browne (416) 646 3304 x124 (land) ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony,

And don't other databases have both theory and model?

Actually, no, the new databases do not. The relational model is backed by relational algebra and relational calculus, plus a series of postulates and laws which have been refined and tested over 20 years. Neither object-oriented databases, nor XML databases, nor multi-value databases have any body of theory behind them, mathematical or otherwise. I defy you to post a single paper that presents a mathematical theory for MV or OODB, or even a firm set of laws that governs such a database. Nor is the industry moving toward developing such a theory; instead the marketeers of commercial OODB and XMLDB use a lot of ink to denigrate the idea of mathematical theory as antiquated and stuffy, and in one case have even used their advertising clout to drive critical theorists off the pages of IT magazines (see Fabian Pascal's web page).

Actually, amusingly enough, there is a body of theory backing XML databases, but it's not one any current developer would want to admit to: the XML database is functionally identical to the Network Databases of the 1960's. Of course, Network Databases gave way, historically, to Relational for good reason. And MV databases, despite decades of existence, never developed any theory behind them at all, possibly because one is not possible; MV databases are entirely an ad-hoc creation designed to work around decades-old limits in computer processing. Pick is merely TextMagic revived and put on the web.

Now, OODB could certainly *develop* a model and theory, and I think it's high time it did. The Zope project has amply demonstrated the usefulness of OODB for certain applications. But as long as there is no OODB calculus, no industry-agreed model, and no ANSI standard language or interface, each and every OODB will be 100% incompatible with every other one ... severely limiting their utility. 
The importance of theory, model, and standards cannot be overstated in an industry where every year the industry-favorite commercial databases get more ad-hoc, further from the theory, and more callous in their disregard of international standards.

FWIW, I share your dissatisfaction with SQL, but because it's not relational enough rather than the other way around.

-- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Dreaming About Redesigning SQL
Josh == Josh Berkus [EMAIL PROTECTED] writes:

This is an unfair characterization of XML databases, and I can say this without accusations of bias, for I vastly prefer working with the relational model.

Josh Actually, amusingly enough, there is a body of theory Josh backing XML databases, but it's not one any current developer Josh would want to admit to: the XML database is functionally Josh identical to the Network Databases of the 1960's. Of Josh course, Network Databases gave way, historically, to Josh Relational for good reason.

If you look at the academic research work, there have been gazillions of recent papers on XML database technology. All the major database vendors (Oracle, IBM and Microsoft) are investing fairly heavily in core-engine XMLDB technology. Finally, while it is true that some of XML db technology is evocative of network databases, XML databases are certainly more than network databases. For one, they are semi-structured; in addition they present query language access to their data (although I'm not a big fan of XQuery).

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Sailesh Krishnamurthy) transmitted:

Josh == Josh Berkus [EMAIL PROTECTED] writes:

This is an unfair characterization of XML databases, and I can say this without accusations of bias, for I vastly prefer working with the relational model.

Josh Actually, amusingly enough, there is a body of theory Josh backing XML databases, but it's not one any current developer Josh would want to admit to: the XML database is functionally Josh identical to the Network Databases of the 1960's. Of Josh course, Network Databases gave way, historically, to Josh Relational for good reason.

If you look at the academic research work, there have been gazillions of recent papers on XML database technology. All the major database vendors (Oracle, IBM and Microsoft) are investing fairly heavily in core-engine XMLDB technology.

Ah, but do papers honestly indicate the emergence of some underlying theoretical model for which fidelity could be evaluated? Or merely that academics are looking to write papers on whatever topics can attract research funding? Half the articles in SIGOS have been about pretend applications of Java to operating systems; why does it seem likely that the database academics are any less affected by this?

I haven't yet seen a coherent XML theory emerge from the chaos. It's not too dissimilar from the object chaos; the only works I am aware of that try hard to provide theory behind object orientedness are _A Theory of Objects_ by Abadi and Cardelli, and Benjamin Pierce's book, _Types and Programming Languages_. After twenty-odd years of object oriented programming, I find it quite appalling that there is as little theoretical OO literature as there is.

Finally, while it is true that some of XML db technology is evocative of network databases, XML databases are certainly more than network databases. For one, they are semi-structured; in addition they present query language access to their data (although I'm not a big fan of XQuery).

CODASYL had a query system, albeit something that looked more like assembly language than anything else...

-- output = reverse(gro.gultn @ enworbbc) http://www.ntlug.org/~cbbrowne/nonrdbms.html Rules of the Evil Overlord #63. Bulk trash will be disposed of in incinerators, not compactors. And they will be kept hot, with none of that nonsense about flames going through accessible tunnels at predictable intervals. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Dreaming About Redesigning SQL
Sailesh,

Warning: I get carried away in this response. I'm afraid that I'm a fond reader of Fabian Pascal and CJ Date, so I have far too much to say on the topic. So if you really care about XML databases, you should probably hold off on reading the rest until you're well-caffeinated and in a cheerful frame of mind.

Also, let me clarify that there is a distinction between using XML *as a* database, and putting XML documents into databases of other types. I find the latter obvious and sensible, but the former a silly and wrong-headed idea, and it's the pure-XML-database which I attack below.

If you want to really have this out, I live in San Francisco and I love to argue. Coffee at Intermezzo? I'll buy.

---

If you look at the academic research work, there have been gazillions of recent papers on XML database technology.

Point me to one which presents an algebra, calculus, or other mathematical underpinning of XML databases, and I will be happy to eat my words on this list. I can easily find lots of papers using google, but all of them are about *technical implementation* and do not provide a theoretical underpinning for XML databases. A few (such as Dan Suciu's paper) present some theory to back XQuery, but it is presented entirely as an XML-based data access extension to SQL ... a role which seems fine to me. Others, even those cited by xmldb.org like the below, have rather lukewarm things to say on the topic, such as David Mertz, PhD (http://www-106.ibm.com/developerworks/library/x-matters8/index.html):

XML is an extremely versatile data transport format, but despite high hopes for it, XML is mediocre to poor as a data storage and access format. ... snip ... XML has no inherent mechanism for enforcing constraints of this sort (DTDs and schemas are constraints of a different, more limited sort). Without constraints, you just have data, not a data model (to slightly oversimplify matters). ... snip ... 
In other words, go ahead and be excited by XML's promise of a universal data transport mechanism, but keep your backend data on something designed for it, like DB2 or Oracle (or on Postgres or MySQL for smaller-scale systems).

And this guy is cited by XMLDB.org? Perhaps not surprising, as among the 5 goals of XMLDB.org, development of a standard theory of XML databases is not present.

All the major database vendors (Oracle, IBM and Microsoft) are investing fairly heavily in core-engine XMLDB technology.

So? Oracle, IBM and Microsoft also have SQL databases that do a terrible job of upholding the SQL standard, and their (at least Oracle's and Microsoft's) adherence is getting worse with successive versions rather than better. I wouldn't look to them for guidance. If they're spending millions on XML databases, it's because it, however wrong-headed, is a fad, and fads mean sales, and they don't want to take a chance on missing out. And these companies have backed plenty of useless technologies before; remember Microsoft's Periodicals on CD?

Not that I'm against XML; as far as I'm concerned, for interchangeable, searchable, and archival documents, XML is the greatest thing since sliced Beatles. I love XML-RPC for pushing data through HTTP, and I will happily be in the cheering squad for anyone who writes a set of OSS tools to extract data from XML docs stored in a PostgreSQL database, or to automate some-standard-XML-to-relational-data-and-back conversion. That is a good application of XML+Database ideas.

XML databases, on the other hand, are an example of taking a good idea too far. XML is a great data transmission tool; it's a great document transformation tool; it's a good way to store documents. It is not, however, a good database.

-- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
Christopher == Christopher Browne [EMAIL PROTECTED] writes:

Christopher Ah, but do papers honestly indicate the emergence Christopher of some underlying theoretical model for which Christopher fidelity could be evaluated?

Certainly. The model is that of semi-structured data, where often times the schema isn't very clearly defined. It's a different model from the relational model - which I'm partial to. There are situations where the XML model does make sense.

Christopher Or merely that academics are looking to write papers Christopher on whatever topics can attract research funding?

Bash academics if you want. The truth is that industry is also working on it. As I said before, I have no axe to grind in this. I might be in academia now, but frankly I couldn't give a toss about XML.

Christopher Half the articles in SIGOS have been about pretend Christopher applications of Java to operating systems; why does Christopher it seem likely that the database academics are any Christopher less affected by this?

I think you are looking at the wrong publishing location :-) The premier venues for the OS community are the SOSP and OSDI conferences. Please look at the SOSP04 papers - you'll find fairly good systems work. BTW, I'm not sure which database papers you read - the premier venues for database systems work are the SIGMOD and VLDB conferences.

Christopher CODASYL had a query system, albeit something that Christopher looked more like assembly language than anything

Please take a fair look at the XQuery data model and the XQuery language before comparing it with CODASYL. I will not admit (at least in public :-) to being a big fan of XQuery but that is because of certain details, not anything fundamental.

-- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Dreaming About Redesigning SQL
Christopher Browne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

Quoth Anthony W. Youngman [EMAIL PROTECTED]:

In article [EMAIL PROTECTED], Gene Wirchenko [EMAIL PROTECTED] writes

[EMAIL PROTECTED] (Seun Osewa) wrote:

[snip] Sometimes I wonder why it's so important to model data in the relational way, to think of data in form of sets of tuples rather than tables or lists or whatever. I mean, though it's elegant and based on mathematical principles, I would like to know why it's the _right_ model to follow in designing a DBMS (or database). The way my mind sees it, should we not rather be interested in what works?

How do you know it works? Without the theory and model, you really do not.

And don't other databases have both theory and model? It's just that all the academics have been brainwashed into thinking this is true only for relational, so that's what they teach to everyone else, and the end result is that all research is ploughed into a model that may be (I didn't say is) bankrupt. Just like the academics were brainwashed into thinking that microkernels were the be-all and end-all - until Linus showed them by practical example that they were all idiots :-)

In mathematics as well as in the analysis of computer algorithms, it is typical for someone who is trying to explain something new to try to do so in terms that allow the gentle reader to do as direct a comparison as possible between the things with which they are familiar (e.g. - in this case, relational database theory) and the things with which they are perhaps NOT familiar (e.g. - in this case, MV databases). Nobody seems to have been prepared to explain the MV model in adequate theoretical terms as to allow the gentle readers to compare the theory behind it with the other theories out there. I'm afraid that does not reflect very well on either those lauding MV or those trashing it. 
- Those lauding it have not made an attempt to show why the theory behind it would support it being preferable to the other models around. I hear some vague Oh, it's not about models; it's about language which doesn't get to the heart of anything. - And all we get from Bob Badour are dismissive sound-bites that _don't_ explain why he should be taken seriously. Indeed, the sharper and shorter he gets, the less credible that gets. There are no pointers to Michael Stonebraker on Why Pick Is Not My Favorite Database. Brian Kernighan felt the issues with Pascal were important enough that he wrote a nice, approachable paper that quite cogently describes the problems with Standard Pascal. http://www.lysator.liu.se/c/bwk-on-pascal.html He nicely summarizes it with 9 points that fit on a sheet of paper. If Bob wanted people to take him really seriously about this, and has done all the research to back up the points that are apparently so obvious to him, then it should surely be _easy_ to write up Nine Reasons Pick Isn't My Favorite Database System. And just as people have been pointing back to Kernighan's paper on Pascal for over 20 years, folks could point back to the Pick essay. But apparently it is much too difficult for anyone to present any _useful_ discourse on it. How many times do I have to repeat the same points? I dislike Pick because it lacks logical identity, confuses the physical and the logical, lacks a robust query language, lacks physical independence, lacks logical independence and damages brains. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Dreaming About Redesigning SQL
On Fri, 2003-10-17 at 22:52, Christopher Browne wrote: Nobody seems to have been prepared to explain the MV model in adequate theoretical terms as to allow the gentle readers to compare the theory behind it with the other theories out there. I'm not convinced that there was a great deal of theory behind Dick Pick's database design. It has always struck me as very pragmatic. In terms of storage, the substantial difference between MV and relational databases is that each MV file (=table) holds, or can hold, what would be the result of a join in a relational database. Wherever we use arrays in PostgreSQL, we effectively do the same thing. The advantages of MV are that it is very simple to program and to define your data. If you want another attribute (=column) you simply define a new dictionary entry with a new attribute number; data output formatting can be simply done by defining new dictionary items which present the same data in a different way. The MV characteristic makes it very fast to get (for example) a list of invoices for a particular customer, since the list of invoice numbers can be kept as part of the customer record. The disadvantages (at least of original PICK) are: there are no constraints (not even by typecasting); there can be no relational enquiries -- everything must be defined in the dictionary; the environment is utterly undisciplined -- anything can be changed at will with a simple text editor; even more than in MySQL, all data validation must be done by programming; there is no requirement for a record in a file to correspond at all to the structure defined in its dictionary; finally, the security model was laughable. The effects of this can be seen in many places whose applications are based on PICK. There is usually a mass of programs of various ages, with no certainty that all have the same view of the database structure. 
The database design is often very amateurish; frequently it truly is amateur, because the simplicity of programming makes it easy for users to roll their own systems, but they usually lack the necessary experience in data analysis. Security usually depends on user ignorance; in UniVerse migrations of old PICK databases, I have often seen entire directories of important data with 777 permissions, and with everyone using the same login. Good use of MV requires the imposition of disciplined programming in an environment which is profoundly hostile to such discipline. It is not really possible to guarantee data integrity. There are some advances on this in some implementations. I know UniVerse: it provides SQL and adds it on top of the existing MV structure; it also provides transactions. These features give some of the advantages of a relational database, provided that only SQL facilities are used, but I doubt if many people have used UniVerse to build SQL systems from scratch. I feel that SQL was provided more to satisfy the box tickers who compare tenders than with a serious intention of providing data integrity. Having used both SQL and MV, I would not now design in any MV implementation known to me a system whose data I valued. -- Oliver Elphick[EMAIL PROTECTED] Isle of Wight, UK http://www.lfix.co.uk/oliver GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C Delight thyself also in the LORD; and he shall give thee the desires of thine heart. Psalms 37:4 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
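Oliver's storage comparison above can be sketched in a few lines of plain Python (the field names are hypothetical): the MV record carries the would-be join result inline, much like a pre-computed join, while the relational design keeps separate tables and recomputes the relationship with a join at query time.

```python
# MV-style record: the customer's invoice numbers live inside the
# customer record itself, like a stored join result.
mv_customer = {"id": 7, "name": "Acme", "invoices": [101, 102, 105]}

# Relational equivalent: two "tables" of rows, related by customer id.
customers = [{"id": 7, "name": "Acme"}]
invoices = [{"no": 101, "customer_id": 7},
            {"no": 102, "customer_id": 7},
            {"no": 105, "customer_id": 7}]

def invoices_for(cid):
    # The join the MV design avoids paying for at read time.
    return [inv["no"] for inv in invoices if inv["customer_id"] == cid]

print(invoices_for(7))  # → [101, 102, 105], same answer either way
```

The MV layout makes the per-customer read cheap, but - as the disadvantages listed above suggest - nothing in the structure itself stops the inline list and the invoice records from drifting out of sync.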
Re: [HACKERS] Dreaming About Redesigning SQL
[EMAIL PROTECTED] (Seun Osewa) wrote in message news:[EMAIL PROTECTED]...

Hi, This is for relational database theory experts on one hand and implementers of real-world applications on the other hand. If there was a chance to start again and design SQL afresh, for best cleanness/power/performance, what changes would you make? What would _your_ query language (and the underlying database concept) look like?

Seun Osewa

PS: I should want to post my ideas too for review but more experienced/qualified people should come first

Some ideas, links, and complaints about SQL and remaking or replacing it: http://www.c2.com/cgi/wiki?SqlFlaws

-T- ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Gene Wirchenko [EMAIL PROTECTED] writes

[EMAIL PROTECTED] (Seun Osewa) wrote:

[snip] Sometimes I wonder why it's so important to model data in the relational way, to think of data in form of sets of tuples rather than tables or lists or whatever. I mean, though it's elegant and based on mathematical principles, I would like to know why it's the _right_ model to follow in designing a DBMS (or database). The way my mind sees it, should we not rather be interested in what works?

How do you know it works? Without the theory and model, you really do not.

And don't other databases have both theory and model? It's just that all the academics have been brainwashed into thinking this is true only for relational, so that's what they teach to everyone else, and the end result is that all research is ploughed into a model that may be (I didn't say is) bankrupt. Just like the academics were brainwashed into thinking that microkernels were the be-all and end-all - until Linus showed them by practical example that they were all idiots :-)

Cheers, Wol -- Anthony W. Youngman - wol at thewolery dot demon dot co dot uk Witches are curious by definition and inquisitive by nature. She moved in. Let me through. I'm a nosey person., she said, employing both elbows. Maskerade : (c) 1995 Terry Pratchett ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
Quoth Anthony W. Youngman [EMAIL PROTECTED]:

In article [EMAIL PROTECTED], Gene Wirchenko [EMAIL PROTECTED] writes

[EMAIL PROTECTED] (Seun Osewa) wrote:

[snip] Sometimes I wonder why it's so important to model data in the relational way, to think of data in form of sets of tuples rather than tables or lists or whatever. I mean, though it's elegant and based on mathematical principles, I would like to know why it's the _right_ model to follow in designing a DBMS (or database). The way my mind sees it, should we not rather be interested in what works?

How do you know it works? Without the theory and model, you really do not.

And don't other databases have both theory and model? It's just that all the academics have been brainwashed into thinking this is true only for relational, so that's what they teach to everyone else, and the end result is that all research is ploughed into a model that may be (I didn't say is) bankrupt. Just like the academics were brainwashed into thinking that microkernels were the be-all and end-all - until Linus showed them by practical example that they were all idiots :-)

In mathematics as well as in the analysis of computer algorithms, it is typical for someone who is trying to explain something new to try to do so in terms that allow the gentle reader to do as direct a comparison as possible between the things with which they are familiar (e.g. - in this case, relational database theory) and the things with which they are perhaps NOT familiar (e.g. - in this case, MV databases). Nobody seems to have been prepared to explain the MV model in adequate theoretical terms as to allow the gentle readers to compare the theory behind it with the other theories out there. I'm afraid that does not reflect very well on either those lauding MV or those trashing it.

- Those lauding it have not made an attempt to show why the theory behind it would support it being preferable to the other models around. 
I hear some vague Oh, it's not about models; it's about language which doesn't get to the heart of anything. - And all we get from Bob Badour are dismissive sound-bites that _don't_ explain why he should be taken seriously. Indeed, the sharper and shorter he gets, the less credible that gets. There are no pointers to Michael Stonebraker on Why Pick Is Not My Favorite Database. Brian Kernighan felt the issues with Pascal were important enough that he wrote a nice, approachable paper that quite cogently describes the problems with Standard Pascal. http://www.lysator.liu.se/c/bwk-on-pascal.html He nicely summarizes it with 9 points that fit on a sheet of paper. If Bob wanted people to take him really seriously about this, and has done all the research to back up the points that are apparently so obvious to him, then it should surely be _easy_ to write up Nine Reasons Pick Isn't My Favorite Database System. And just as people have been pointing back to Kernighan's paper on Pascal for over 20 years, folks could point back to the Pick essay. But apparently it is much too difficult for anyone to present any _useful_ discourse on it. -- (reverse (concatenate 'string ac.notelrac.teneerf @ 454aa)) http://cbbrowne.com/info/nondbms.html For a good prime call: 391581 * 2^216193 - 1 -- [EMAIL PROTECTED] (Szymon Rusinkiewicz) ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
[EMAIL PROTECTED] (Seun Osewa) wrote:

[snip] Sometimes I wonder why it's so important to model data in the relational way, to think of data in form of sets of tuples rather than tables or lists or whatever. I mean, though it's elegant and based on mathematical principles, I would like to know why it's the _right_ model to follow in designing a DBMS (or database). The way my mind sees it, should we not rather be interested in what works?

How do you know it works? Without the theory and model, you really do not.

Sincerely, Gene Wirchenko

Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Dreaming About Redesigning SQL
Bob Badour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Mike Preece [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] [EMAIL PROTECTED] (Dawn M. Wolthuis) wrote in message news:[EMAIL PROTECTED]... Bob Badour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Dawn M. Wolthuis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Thank you, Seun, for asking your question with a bit of logic and not gut-reaction emotional baggage (and for also asking a question of me off-list so I could ramble). I'll try to make this more succinct. With all due respect, Dawn, you are an idiot. That statement has nothing to do with my viscera or with my emotions; I write it with complete dispassion. You simply are an idiot, and Seun simply lacks sufficient education to recognize that fact from what you have written. Thanks for the clarification, Bob. What seemed like an emotional outburst from me was actually a logically-derived statement regarding my capacity then, right? If I were you, I might be just a tad bit concerned that I have taught many engineers calculus and you are flying on planes or driving over bridges they designed. Let's hope they learned it well in spite of me, eh? smiles. --dawn Dawn M. Wolthuis Dawn: I'm sure you're not an idiot. Mike, you have expressed certainty contradicting the patently obvious before. Your opinion is only a useful gauge insofar as you are wrong so much more frequently than you are correct. Dawn: Bob professes to respect referential integrity!? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Dreaming About Redesigning SQL
[EMAIL PROTECTED] (Dann Corbit) wrote in message news:[EMAIL PROTECTED]... -Original Message- From: Seun Osewa [mailto:[EMAIL PROTECTED] Sent: Friday, October 03, 2003 11:52 AM To: [EMAIL PROTECTED] Subject: [HACKERS] Dreaming About Redesigning SQL Hi, This is for relational database theory experts on one hand and implementers of real-world applications on the other hand. If there was a chance to start again and design SQL afresh, for best cleanness/power/performance what changes would you make? What would your query language (and the underlying database concept) look like? Seun Osewa PS: I should want to post my ideas too for review but more experienced/qualified people should come first I imagine you have read the 3rd database manifesto by Codd. I think he's gone off the deep end a bit. Dann, you are showing your ignorance. While Dr. Codd recently died, if you think he wrote a third database manifesto, you have definitely gone off the deep end yourself. You don't just throw away a trillion dollars worth of effort and tools to make things mathematically orthogonal. Again, you are showing your ignorance. Nobody has ever suggested anything even remotely resembling the above. However, on some things he is clearly right. For instance, null values are evil. Dr. Codd believed we need two NULLs. You ascribe correctness to the one thing I think he clearly got wrong. Programmers understand it That's an absurd assertion. Therefore, his idea of using default values instead and never using null is a good one. That is not his idea. If SQL vendors would follow the ANSI/ISO standard to the letter, and implement the latest iteration, that would solve all of the problems that SQL tool users have to face. Upon what do you base this ridiculous opinion? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
The mathematics of language is more complex than the mathematics of relations, particularly simple relations (such as 1NF tables). Are you sure you know what you are talking about? I would suggest ditching the entire relational model (as both overly simplistic in its theory and overly complex in its implementation)... Incredible! How about reading some books on the subject? -- Anith ( Please reply to newsgroups only ) ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Dreaming About Redesigning SQL
[EMAIL PROTECTED] (Seun Osewa) wrote in message news:[EMAIL PROTECTED]... I have tried, twice, to download the evaluation version of the alphora product for testing and it doesn't work. Guess there would be a lot to learn from playing with it; the product is more than an RDBMS Aw, that's unfortunate. It took me a while to get it working. It is in fact an integrated application development environment where you can define a great part of your application in a declarative fashion. regards, Lauri Pietarinen ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Dreaming About Redesigning SQL
With all due respect, Dawn, you are an idiot. Dawn M. Wolthuis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Good question. Although I would want to move away from relational databases too, if there is an RDBMS and one wants to query it, what would I aim for? If you look at XQuery, you will see an example of what I would definitely NOT aim for. Although the user of such a language might very well be a technical person, instead of starting with mathematics (relational calculus, relational algebra) I would suggest starting with language. The mathematics of language is more complex than the mathematics of relations, particularly simple relations (such as 1NF tables). If you look at the history of data persistence prior to Codd's 1970 ACM paper, you will see several attempts at this. One I have studied of late is GIRLS (Generalized Information Retrieval Language and System), specified by Don Nelson and implemented by several folks with the most famous being Dick Pick. This GIRLS language was specified a full 40 years ago and lives today in many IT shops under a variety of about 10 different names, including IBM's UniQuery and Retrieve (for UniData and Universe respectively). This language is flawed, as are all, but so very close to what I would think would be a good approach. It was written at TRW in order to make it so that the military in Viet Nam could query their data without technical folks in the field. It went into production in 1969 with the US Army. Prior to the end of the cold war, it was used by the CIA to track (the associated database) and query about Russian spies in the US. I would suggest ditching the entire relational model (as both overly simplistic in its theory and overly complex in its implementation) and start with English (that is one of the other names for the GIRLS language). Note that language is also the starting point for putting data in XML documents, but it sure doesn't seem to be the starting point for XQuery, eh? 
--dawn Dawn M. Wolthuis www.tincat-group.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Dreaming About Redesigning SQL
Good question. Although I would want to move away from relational databases too, if there is an RDBMS and one wants to query it, what would I aim for? If you look at XQuery, you will see an example of what I would definitely NOT aim for. Although the user of such a language might very well be a technical person, instead of starting with mathematics (relational calculus, relational algebra) I would suggest starting with language. The mathematics of language is more complex than the mathematics of relations, particularly simple relations (such as 1NF tables). If you look at the history of data persistence prior to Codd's 1970 ACM paper, you will see several attempts at this. One I have studied of late is GIRLS (Generalized Information Retrieval Language and System), specified by Don Nelson and implemented by several folks with the most famous being Dick Pick. This GIRLS language was specified a full 40 years ago and lives today in many IT shops under a variety of about 10 different names, including IBM's UniQuery and Retrieve (for UniData and Universe respectively). This language is flawed, as are all, but so very close to what I would think would be a good approach. It was written at TRW in order to make it so that the military in Viet Nam could query their data without technical folks in the field. It went into production in 1969 with the US Army. Prior to the end of the cold war, it was used by the CIA to track (the associated database) and query about Russian spies in the US. I would suggest ditching the entire relational model (as both overly simplistic in its theory and overly complex in its implementation) and start with English (that is one of the other names for the GIRLS language). Note that language is also the starting point for putting data in XML documents, but it sure doesn't seem to be the starting point for XQuery, eh? --dawn Dawn M. 
Wolthuis www.tincat-group.com ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Dreaming About Redesigning SQL
[EMAIL PROTECTED] (Dawn M. Wolthuis) wrote in message news:[EMAIL PROTECTED]... I would suggest ditching the entire relational model (as both overly simplistic in its theory and overly complex in its implementation) and start with English (that is one of the other names for the GIRLS language). Note that language is also the starting point for putting data in XML documents, but it sure doesn't seem to be the starting point for XQuery, eh? --dawn Dawn M. Wolthuis www.tincat-group.com Please explain further. What do you really mean? It's natural for everyone here to think every word in that post was a troll unless you explain your views more clearly. You could not have expressed a more unpopular/unsupportable combination of ideas! Exactly how would we go about using language as a query tool? Is this AI? What would the underlying model be knowing how redundant and imprecise language can be? Tell us what we may have missed. Seun Osewa ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Dreaming About Redesigning SQL
Mike Preece [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] [EMAIL PROTECTED] (Dawn M. Wolthuis) wrote in message news:[EMAIL PROTECTED]... Bob Badour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Dawn M. Wolthuis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Thank you, Seun, for asking your question with a bit of logic and not gut-reaction emotional baggage (and for also asking a question of me off-list so I could ramble). I'll try to make this more succinct. With all due respect, Dawn, you are an idiot. That statement has nothing to do with my viscera or with my emotions; I write it with complete dispassion. You simply are an idiot, and Seun simply lacks sufficient education to recognize that fact from what you have written. Thanks for the clarification, Bob. What seemed like an emotional outburst from me was actually a logically-derived statement regarding my capacity then, right? If I were you, I might be just a tad bit concerned that I have taught many engineers calculus and you are flying on planes or driving over bridges they designed. Let's hope they learned it well in spite of me, eh? smiles. --dawn Dawn M. Wolthuis Dawn: I'm sure you're not an idiot. Mike, you have expressed certainty contradicting the patently obvious before. Your opinion is only a useful gauge insofar as you are wrong so much more frequently than you are correct. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Dreaming About Redesigning SQL
[EMAIL PROTECTED] (Dawn M. Wolthuis) wrote in message news:[EMAIL PROTECTED]... Bob Badour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Dawn M. Wolthuis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Thank you, Seun, for asking your question with a bit of logic and not gut-reaction emotional baggage (and for also asking a question of me off-list so I could ramble). I'll try to make this more succinct. With all due respect, Dawn, you are an idiot. That statement has nothing to do with my viscera or with my emotions; I write it with complete dispassion. You simply are an idiot, and Seun simply lacks sufficient education to recognize that fact from what you have written. Thanks for the clarification, Bob. What seemed like an emotional outburst from me was actually a logically-derived statement regarding my capacity then, right? If I were you, I might be just a tad bit concerned that I have taught many engineers calculus and you are flying on planes or driving over bridges they designed. Let's hope they learned it well in spite of me, eh? smiles. --dawn Dawn M. Wolthuis Dawn: I'm sure you're not an idiot. Bob: Do pigs really like people to wrestle in mud with them? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
Thank you, Seun, for asking your question with a bit of logic and not gut-reaction emotional baggage (and for also asking a question of me off-list so I could ramble). I'll try to make this more succinct. First of all, I have read Codd's 1970 and 1974 ACM papers, as well as his The Relational Model for Database Management, Version 2 book. I have also read several books by Chris Date and by Fabian Pascal. I held a dialog (that reads more like a monologue) with Pascal and it is reproduced in total if you scroll to the bottom of the http://store.tincat-group.com page and click on the Dick Pick / Ted Codd Blue Brothers parody picture. I had not been reading this news group until lately, but it strikes me that this is a group that might be very entertained by that dialog. I have a master's degree in mathematics and my father is a linguist. I find the language of mathematics and the mathematics of language both fascinating. My experience, however, is that I have run IT project teams working with a variety of databases (and languages) and have never seen any environment that is as agile for developers (both productive from the start and easy to maintain) as the teams I have led that worked with the UniData database. I have worked with SQL as well as both older and newer database languages. So, nope, I'm not trolling. I've been doing some research the past couple of years and I'm convinced that it is time to do something new (and yet old) with data persistence. I favor using Java for a variety of reasons, but am comfortable with other languages as well, and think that using Java both for the software application and for the constraints on the data, rather than encoding constraints in some other language within a database, makes for both a more agile development approach AND, surprisingly enough, tends to make for better data integrity, although a lousy software developer can certainly mess up either environment. 
Separating the DBA from the software developer has definitely had a negative effect on the speed with which software is developed and maintained, but my experience (and intuition -- I don't, yet, have scientific evidence) tells me that the benefits purported by the approach of having a dba work on some centralized constraints on the persisted data outside of the context of the use of that data have not really come to fruition and/or are not worth the costs of using this approach (more on that some other time). So, while some might classify me as an idiot (men can be so emotional sometimes ;-), I have several graduate classes in logic to my credit and believe that I am approaching this topic quite logically, even if my summaries skip some of the logical steps in the process. I have thought about how to prove my points and since my point is really about agility and quality in application software development and maintenance, a competition to see what tools and techniques and what data persistence approaches win such a competition might be the best proof. The current industry benchmarks for databases tend to be SQL-based and highly political, so let's put different approaches to the test. Thanks for asking your question and not just assuming I'm a nut because I disagree with the current state of the industry on this topic. I'm sure there are gaps in my thinking and I know some of my opinions are based on intuition that arises from my experience, but I do hope to have more proof in the future. I am also very willing to adjust my opinions with convincing arguments and evidence and trust that there are some on this list who work similarly. --dawn [EMAIL PROTECTED] (Seun Osewa) wrote in message news:[EMAIL PROTECTED]... [EMAIL PROTECTED] (Dawn M. Wolthuis) wrote in message news:[EMAIL PROTECTED]... 
I would suggest ditching the entire relational model (as both overly simplistic in its theory and overly complex in its implementation) and start with English (that is one of the other names for the GIRLS language). Note that language is also the starting point for putting data in XML documents, but it sure doesn't seem to be the starting point for XQuery, eh? --dawn Dawn M. Wolthuis www.tincat-group.com Please explain further. What do you really mean? It's natural for everyone here to think every word in that post was a troll unless you explain your views more clearly. You could not have expressed a more unpopular/unsupportable combination of ideas! Exactly how would we go about using language as a query tool? Is this AI? What would the underlying model be knowing how redundant and imprecise language can be? Tell us what we may have missed. Seun Osewa ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Dreaming About Redesigning SQL
Bob Badour [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Dawn M. Wolthuis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Thank you, Seun, for asking your question with a bit of logic and not gut-reaction emotional baggage (and for also asking a question of me off-list so I could ramble). I'll try to make this more succinct. With all due respect, Dawn, you are an idiot. That statement has nothing to do with my viscera or with my emotions; I write it with complete dispassion. You simply are an idiot, and Seun simply lacks sufficient education to recognize that fact from what you have written. Thanks for the clarification, Bob. What seemed like an emotional outburst from me was actually a logically-derived statement regarding my capacity then, right? If I were you, I might be just a tad bit concerned that I have taught many engineers calculus and you are flying on planes or driving over bridges they designed. Let's hope they learned it well in spite of me, eh? smiles. --dawn Dawn M. Wolthuis ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Dreaming About Redesigning SQL
Dawn M. Wolthuis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Thank you, Seun, for asking your question with a bit of logic and not gut-reaction emotional baggage (and for also asking a question of me off-list so I could ramble). I'll try to make this more succinct. With all due respect, Dawn, you are an idiot. That statement has nothing to do with my viscera or with my emotions; I write it with complete dispassion. You simply are an idiot, and Seun simply lacks sufficient education to recognize that fact from what you have written. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Dreaming About Redesigning SQL
While I definitely agree that the mathematics of the data persistence mechanism is not as important to me as whether it works or not, as a former mathematician, I have done a little study related to the mathematics of non-relational approaches, such as PICK (the one both Wol and I have been known to advocate on behalf of). These models tend to start with language rather than mathematics. So, what started out as my attempt to show such things as the fact that a PICK file is more like a mathematical RELATION than an RDBMS table ended with my studying the mathematics of language for a short time - one can see that the mathematics of language, which is what we are storing when working with text-based objects, is much more complex than simple relations. By the way, in case you are wondering how PICK files are more like mathematical relations -- they do have a numbered position for each domain (in other words, there is a location for each column within a row as there is a location in a PICK ITEM/RECORD); they do not by default request a constraint on the length of values in a given domain (a quite unnecessary database constraint); and they permit relations as elements within a relation -- there is no mathematical requirement that a relation be in first normal form, for example. I do tire of the thought that a database premised on the relational model is somehow more mathematically accurate than those premised on a language model. PICK, like XML, was used to make it easy to think about storing and retrieving text. If you set aside the need for storing other objects for now and focus on text-based data persistence, it is simply a means to persist propositions. If one were to normalize your sentences before you said them, you might guess that people would have a harder time figuring out what you were saying. Similarly, normalizing data before persisting it tends to make it difficult to retrieve the original propositions, reconstructing language from normalized data. 
It's time to move on from the relational model -- it was a good academic exercise, but has not proven a very agile means for persisting and retrieving propositions, methinks. smiles. --dawn Anthony W. Youngman [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... In article [EMAIL PROTECTED], Lee Fesperman [EMAIL PROTECTED] writes If you don't care for mathematical principles, there's always ad-hoc database models. Check out Pick, OO and XML databases. They're interested in what works and ignore elegance and mathematical principles. Mathematical principles? You mean like Euclidean Geometry and Newtonian Mechanics? They're perfectly solid, good, mathematically correct. Shame they don't actually WORK all the time in the real world. That's what I feel about relational, too ... Cheers, Wol ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
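Dawn's non-1NF point above (relations as elements within a relation, and the cost of reconstructing propositions from normalized data) can be sketched concretely. The following Python sketch uses invented names and plain dicts/lists as stand-ins for tables and a MultiValue record; it illustrates only the shape of the two representations, not any actual PICK or SQL implementation.

```python
# Hypothetical example: the same "order" entity stored two ways.
# All table, field, and record names here are invented for illustration.

# First normal form: the repeating group of line items is split into a
# second table, keyed back to the order, and must be joined to rebuild
# the entity.
orders = {100: {"customer": "Smith"}}
order_lines = [
    {"order_id": 100, "product": "widget", "qty": 2},
    {"order_id": 100, "product": "gadget", "qty": 1},
]

def reassemble(order_id):
    """Rebuild the whole entity from the normalized tables (i.e. a join)."""
    entity = dict(orders[order_id])
    entity["lines"] = [
        {"product": line["product"], "qty": line["qty"]}
        for line in order_lines
        if line["order_id"] == order_id
    ]
    return entity

# MultiValue-style: the record is non-1NF -- the repeating group lives
# inside the record, so the stored record already *is* the entity.
mv_record = {
    "customer": "Smith",
    "lines": [
        {"product": "widget", "qty": 2},
        {"product": "gadget", "qty": 1},
    ],
}

assert reassemble(100) == mv_record  # same entity, two storage shapes
```

Whether the join is a trivial cost the DBMS should bear (the relational view) or needless disassembly of the entity (the MultiValue view) is precisely the disagreement running through this thread.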
Re: [HACKERS] Dreaming About Redesigning SQL
I have tried, twice, to download the evaluation version of the alphora product for testing and it doesn't work. Guess there would be a lot to learn from playing with it; the product is more than an RDBMS Regards, Seun Osewa [EMAIL PROTECTED] (Lauri Pietarinen) wrote: That is, in fact, the approach taken in a product called Dataphor (see www.alphora.com). They have implemented a D-language (called D4) that translates into SQL and hence uses underlying SQLServer, Oracle or DB2 DBMSes as the engine. regards, Lauri Pietarinen ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Dreaming About Redesigning SQL
Christopher Browne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... I think an implementor would be better off using an SQL database underneath, and using their code layer in between to accomplish the divorce from the aspects of SQL that they disapprove of. That is, in fact, the approach taken in a product called Dataphor (see www.alphora.com). They have implemented a D-language (called D4) that translates into SQL and hence uses underlying SQLServer, Oracle or DB2 DBMSes as the engine. It is, however, not a very easy mapping to do and you have to resort to all sorts of unclean stuff to make it work... regards, Lauri Pietarinen ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Dreaming About Redesigning SQL
Seun Osewa wrote: Sometimes I wonder why it's so important to model data in the relational way, to think of data in the form of sets of tuples rather than tables or lists or whatever. I mean, though it's elegant and based on mathematical principles I would like to know why it's the _right_ model to follow in designing a DBMS (or database). The way my mind sees it, should we not rather be interested in what works? Relational is the _right_ model because 'it works'. It's the only truly comprehensive data model and subject of decades of research. All other data models have been found to be flawed and (nearly) discarded. If you don't care for mathematical principles, there's always ad-hoc database models. Check out Pick, OO and XML databases. They're interested in what works and ignore elegance and mathematical principles. -- Lee Fesperman, FirstSQL, Inc. (http://www.firstsql.com) == * The Ultimate DBMS is here! * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com) ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Dreaming About Redesigning SQL
On 3 Oct 2003 21:39:03 GMT, Christopher Browne [EMAIL PROTECTED] wrote: There are two notable 'projects' out there: 1. There's Darwen and Date's Tutorial D language, defined as part of their Third Manifesto about relational databases. 2. newSQL http://newsql.sourceforge.net/, where they are studying two syntaxes, one based on Java, and one based on a simplification (to my mind, oversimplification) of SQL. ISTR that Terry Halpin (of ORM fame) designed a language named ConQuer. I don't know the details, but I think Date's latest edition refers to it in a note. Halpin's working on Visio at Microsoft now, I think. -- Mike Sherrill Information Management Systems ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Seun Osewa [EMAIL PROTECTED] writes Thanks for the links. Christopher Browne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] 125932.news.uni-berlin.de... There are two notable 'projects' out there: 1. There's Darwen and Date's Tutorial D language, defined as part of their Third Manifesto about relational databases. 2. newSQL http://newsql.sourceforge.net/, where they are studying two syntaxes, one based on Java, and one based on a simplification (to my mind, oversimplification) of SQL. I was able to get a pdf copy of the Third Manifesto article here: http://citeseer.nj.nec.com/darwen95third.html but the details of tutorial D seem not to be a part of that article. NewSQL *might* be cool if someone found reason to use it in a DBMS. Is Darwen and Date's stuff that where they said SQL was crap? As I understand it, within about a year of designing SQL, at least one of Codd and Date said it was rubbish and tried to replace it with something better. Sometimes I wonder why it's so important to model data in the relational way, to think of data in the form of sets of tuples rather than tables or lists or whatever. I mean, though it's elegant and based on mathematical principles I would like to know why it's the _right_ model to follow in designing a DBMS (or database). The way my mind sees it, should we not rather be interested in what works? I couldn't agree more (of course I would). As I like to put it, surely Occam's Razor says that stuffing the four-dimensional world into a flat-earth database can't be the optimal solution! The trouble with so many SQL advocates is that they are so convinced in the mathematical rightness of the relational model, that they forget it is a *model* and, as such, needs to be shown as relevant to the real world. That said, I always think relationally when designing databases - it helps. Look at the multi-value databases. 
Think relationally, you can still store your data in normal form, but you're not stuffed by all the irrelevant restrictions that relational databases tend to impose. Get a freebie copy of jBASE, UniVerse or UniData, and try them out :-) Cheers, Wol -- Anthony W. Youngman [EMAIL PROTECTED] 'Yings, yow graley yin! Suz ae rikt dheu,' said the blue man, taking the thimble. 'What *is* he?' said Magrat. 'They're gnomes,' said Nanny. The man lowered the thimble. 'Pictsies!' Carpe Jugulum, Terry Pratchett 1998 Visit the MaVerick web-site - http://www.maverick-dbms.org Open Source Pick ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Dreaming About Redesigning SQL
In article [EMAIL PROTECTED], Lee Fesperman [EMAIL PROTECTED] writes If you don't care for mathematical principles, there's always ad-hoc database models. Check out Pick, OO and XML databases. They're interested in what works and ignore elegance and mathematical principles. Mathematical principles? You mean like Euclidean Geometry and Newtonian Mechanics? They're perfectly solid, good, mathematically correct. Shame they don't actually WORK all the time in the real world. That's what I feel about relational, too ... Cheers, Wol -- Anthony W. Youngman - wol at thewolery dot demon dot co dot uk Witches are curious by definition and inquisitive by nature. She moved in. "Let me through. I'm a nosey person," she said, employing both elbows. Maskerade : (c) 1995 Terry Pratchett ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Dreaming About Redesigning SQL
Anthony W. Youngman [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... In article [EMAIL PROTECTED], Lee Fesperman [EMAIL PROTECTED] writes If you don't care for mathematical principles, there's always ad-hoc database models. Check out Pick, OO and XML databases. They're interested in what works and ignore elegance and mathematical principles. Mathematical principles? You mean like Euclidean Geometry and Newtonian Mechanics? They're perfectly solid, good, mathematically correct. Shame they don't actually WORK all the time in the real world. That's what I feel about relational, too ... That explains the generally poor quality of your posts. You substitute emotion for reason. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Dreaming About Redesigning SQL
Thanks for the links. Christopher Browne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... There are two notable 'projects' out there: 1. There's Darwen and Date's Tutorial D language, defined as part of their Third Manifesto about relational databases. 2. newSQL http://newsql.sourceforge.net/, where they are studying two syntaxes, one based on Java, and one based on a simplification (to my mind, oversimplification) of SQL. I was able to get a pdf copy of the Third Manifesto article here: http://citeseer.nj.nec.com/darwen95third.html but the details of tutorial D seem not to be a part of that article. NewSQL *might* be cool if someone found reason to use it in a DBMS. Sometimes I wonder why it's so important to model data in the relational way, to think of data in the form of sets of tuples rather than tables or lists or whatever. I mean, though it's elegant and based on mathematical principles I would like to know why it's the _right_ model to follow in designing a DBMS (or database). The way my mind sees it, should we not rather be interested in what works? Seun Osewa ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Dreaming About Redesigning SQL
-----Original Message-----
From: Seun Osewa [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 03, 2003 11:52 AM
To: [EMAIL PROTECTED]
Subject: [HACKERS] Dreaming About Redesigning SQL

> Hi,
>
> This is for relational database theory experts on one hand and
> implementers of real-world applications on the other hand. If there was
> a chance to start again and design SQL afresh, for best
> cleanness/power/performance, what changes would you make? What would
> _your_ query language (and the underlying database concept) look like?
>
> Seun Osewa
>
> PS: I should want to post my ideas too for review but more
> experienced/qualified people should come first

I imagine you have read the 3rd database manifesto by Codd. I think he's
gone off the deep end a bit. You don't just throw away a trillion dollars
worth of effort and tools to make things mathematically orthogonal.

However, on some things he is clearly right. For instance, null values
are evil. Programmers understand it, but end users will *always* be
surprised that the two counts returned by:

    SELECT COUNT(shirts) FROM clothing WHERE color = 'blue'
    SELECT COUNT(shirts) FROM clothing WHERE NOT color = 'blue'

do not sum to the number of shirts in the inventory if any color fields
are null. Therefore, his idea of using default values instead and never
using null is a good one.

If SQL vendors would follow the ANSI/ISO standard to the letter, and
implement the latest iteration, that would solve all of the problems
that SQL tool users have to face.
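The COUNT surprise described above is easy to reproduce. Here is a minimal sketch using Python's sqlite3 module; the table and the sample rows are invented for illustration, but the NULL behavior is standard SQL three-valued logic:

```python
import sqlite3

# In-memory database with a nullable color column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clothing (shirt TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO clothing VALUES (?, ?)",
    [("shirt-1", "blue"), ("shirt-2", "red"), ("shirt-3", None)],
)

blue = conn.execute(
    "SELECT COUNT(*) FROM clothing WHERE color = 'blue'"
).fetchone()[0]
not_blue = conn.execute(
    "SELECT COUNT(*) FROM clothing WHERE NOT color = 'blue'"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM clothing").fetchone()[0]

# The NULL row satisfies neither predicate, because both
# (color = 'blue') and NOT (color = 'blue') evaluate to UNKNOWN.
print(blue, not_blue, total)  # 1 1 3
```

The row with a NULL color is invisible to both WHERE clauses, so the two counts sum to 2, not 3.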
Re: [HACKERS] Dreaming About Redesigning SQL
Mike Mascari wrote on Sat, 04.10.2003 at 06:32:
> 2) The query language should be computationally complete. The user
>    should be able to author complete applications in the language,
>    rather than the language being a sublanguage.

To me it seems like requiring that one should be able to author complete
programs in regex.

Yes, when all you have is a hammer, everything looks like a nail ;)

Hannu
Re: [HACKERS] Dreaming About Redesigning SQL
The world rejoiced as [EMAIL PROTECTED] (Mike Mascari) wrote:
> It's a very provocative read. At a minimum, one can learn what to avoid
> with SQL. The language looks neat on paper. Perhaps one day someone
> will provide an open source implementation. One could envision a D
> project along the same lines as the same sort of project that added SQL
> to Postgres...

I think you summed it up nicely. The manifesto is a provocative, if
painful, read. It is very useful at pointing out pointy edges of SQL
that might be wise to avoid.

I'm not thrilled with the language; I think they have made a mistake in
trying to make it too abbreviation-oriented. They keep operator names
short, to a fault.

As you say, the most likely way for a D to emerge in a popular way would
be by someone adding the language to an existing database system.

There is a project out on SourceForge for a D implementation, called
Duro. It takes the opposite approach; the operators are all defined as C
functions, so you write all your code in C. It uses a data store built
atop Berkeley DB.

I think an implementor would be better off using an SQL database
underneath, and using their code layer in between to accomplish the
divorce from the aspects of SQL that they disapprove of. Sort of like
MaVerick, a Pick implementation in Java that uses a DBMS such as
PostgreSQL as the underlying data store. You do a proof of concept by
building something that translates D requests to SQL requests. And
_then_ get a project going to put a D parser in as an alternative to
the SQL parser. (Yes, that oversimplifies matters. Tough...)
-- 
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/rdbms.html
Rules of the Evil Overlord #81. "If I am fighting with the hero atop a
moving platform, have disarmed him, and am about to finish him off and
he glances behind me and drops flat, I too will drop flat instead of
quizzically turning around to find out what he saw."
http://www.eviloverlord.com/
Re: [HACKERS] Dreaming About Redesigning SQL
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Seun Osewa) belched out...:
> This is for relational database theory experts on one hand and
> implementers of real-world applications on the other hand. If there was
> a chance to start again and design SQL afresh, for best
> cleanness/power/performance, what changes would you make? What would
> _your_ query language (and the underlying database concept) look like?

There are two notable 'projects' out there:

1. There's Darwen and Date's Tutorial D language, defined as part of
   their Third Manifesto about relational databases.

2. newSQL http://newsql.sourceforge.net/, where they are studying two
   syntaxes, one based on Java, and one based on a simplification (to my
   mind, oversimplification) of SQL.

The newSQL project suffers from their definition being something of a
"chip away everything that doesn't look like an elephant" definition.
They aren't defining, in mathematical terms, what their language is
supposed to be able to express; they are instead defining a big grab-bag
of minor syntactic features, and seem to expect that a database system
will emerge from that.

In contrast, Tutorial D is _all_ about mathematical definition of what
it is supposed to express, and the text is a tough read, irrespective of
other merits.
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','cbbrowne.com').
http://cbbrowne.com/info/thirdmanifesto.html
DOS: n., A small annoying boot virus that causes random spontaneous
system crashes, usually just before saving a massive project. Easily
cured by Unix. See also MS-DOS, IBM-DOS, DR-DOS. -- from David Vicker's
.plan
Re: [HACKERS] Dreaming About Redesigning SQL
Christopher Browne wrote:
> After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Seun
> Osewa) belched out...:
>> This is for relational database theory experts on one hand and
>> implementers of real-world applications on the other hand. If there
>> was a chance to start again and design SQL afresh, for best
>> cleanness/power/performance, what changes would you make? What would
>> _your_ query language (and the underlying database concept) look like?
>
> There are two notable 'projects' out there:
>
> 1. There's Darwen and Date's Tutorial D language, defined as part of
>    their Third Manifesto about relational databases.

I read the Third Manifesto. There are many ideas in the TTM that have
strong arguments, although I must confess I haven't read any critiques.
A few (of many) points:

1) Strict adherence to the relational model, where all of SQL's
   shortcomings are addressed:

   A) No attribute ordering
   B) No tuple ordering (sets aren't ordered)
   C) No duplicate tuples (relations are sets)
   D) No nulls (2VL is sufficient; missing information is meta-data)
   E) No nullogical mistakes (ex: SUM of an empty relation is zero,
      AVG is an error)
   F) Generalized transitive closure
   G) Declared attribute, relation variable, and database constraints,
      including transition constraints
   H) Candidate keys required (this has positive logical consequences
      for the DBMS implementor)
   I) Tuple and relation-valued attributes
   J) No tuple-level operations
   ... a bunch more ...

2) The query language should be computationally complete. The user
   should be able to author complete applications in the language,
   rather than the language being a sublanguage. This reverses Codd's
   query sublanguage proposed in "A Relational Model of Data for Large
   Shared Data Banks": http://www.acm.org/classics/nov95/s1p5.html
   <sarcasm>Thanks, ACM, for putting just part of the paper on-line,
   complete with broken links and spelling errors!</sarcasm>

3) The language (a D implementation) would ensure a separation between
   the logical design of the application and the physical
   implementation. The programmer should think in terms of the
   evaluation of relational algebraic expressions, not manipulating
   physical records in disk blocks in a file.

4) The type system should separate the actual, internal representation
   from the possible representations, of which there might be many. For
   example, a POINT may be internally expressed in cartesian coordinates
   but may supply both polar and cartesian THE_ operators.

5) The type system should implement D&D's view of multiple inheritance,
   where read-operators are inherited but write-operators aren't. This
   eliminates the "Is a Circle an Ellipse?" dilemma imposed by C++, for
   example. IOW, in a D language, a Circle is an Ellipse. They reject
   Stonebraker's ideas of OIDs and relation variable inheritance, which,
   of course, are in PostgreSQL.

It's a very provocative read. At a minimum, one can learn what to avoid
with SQL. The language looks neat on paper. Perhaps one day someone will
provide an open source implementation. One could envision a D project
along the same lines as the same sort of project that added SQL to
Postgres... But I'd rather have PITR :-)

Mike Mascari
[EMAIL PROTECTED]
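Point 1E above ("SUM of an empty relation is zero, AVG is an error") is easy to contrast with what SQL actually does. A minimal sketch with Python's sqlite3 module (the table name is invented): SQL returns NULL for both aggregates over an empty table, which is exactly the behavior TTM objects to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # an empty relation

s = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]
a = conn.execute("SELECT AVG(x) FROM t").fetchone()[0]

# SQL yields NULL (Python None) for both aggregates. TTM argues the
# sum of nothing is 0, while the average of nothing should be an
# error, since it divides by a cardinality of zero.
print(s, a)  # None None
```

So SQL gives the same answer, NULL, to two questions that TTM says have fundamentally different answers.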
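The inheritance idea in point 5 can be illustrated with a toy sketch. This is not TTM's actual model (a D would enforce this in the type system, not at runtime), and the class and method names are my own invention: read-operators like `area` are inherited unchanged, but a write-operator that could make a Circle's axes diverge must not be.

```python
from math import pi


class Ellipse:
    def __init__(self, a, b):
        self._a, self._b = a, b  # semi-axes

    # Read-operator: safe to inherit, never violates a subtype's
    # constraint.
    def area(self):
        return pi * self._a * self._b

    # Write-operator: can break a subtype's invariant if inherited
    # blindly.
    def set_axes(self, a, b):
        self._a, self._b = a, b


class Circle(Ellipse):
    def __init__(self, r):
        super().__init__(r, r)

    # The write-operator is NOT inherited as-is: updating a Circle
    # with unequal axes would silently violate a == b, the C++
    # "Is a Circle an Ellipse?" trap.
    def set_axes(self, a, b):
        if a != b:
            raise TypeError("a Circle's semi-axes must stay equal")
        super().set_axes(a, b)


c = Circle(2.0)
print(round(c.area(), 2))  # 12.57 -- reading works via inheritance
```

For reading, a Circle behaves exactly as an Ellipse; only the mutation that could falsify the subtype's constraint is restricted, which is the asymmetry the manifesto's inheritance model captures.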