Hi Paul,

Thanks for sharing. I'd like to add another good recent paper, "Everything you always wanted to know about compiled and vectorized queries but were afraid to ask": http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf
It explains the two main kinds of database execution architecture, vectorized and compiled. It can also help answer the frequently asked question of how Spark's whole-stage codegen differs from Drill's codegen.

On Tue, Jan 22, 2019 at 10:51 AM Paul Rogers <par0...@yahoo.com.invalid> wrote:
> Hi All,
>
> Wanted to pass along some good foundational material about databases. We
> find ourselves immersed day-to-day in the details of Drill's
> implementation. It is helpful to occasionally step back and look at the
> larger DB tradition in which Drill resides. This material is especially
> good for anyone who didn't study DB theory in college.
>
> "Architecture of a Database System":
> http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf - By
> Stonebraker et al. While focused on "classic" DB systems, the ideas readily
> apply to "Big Data" distributed engines such as Drill. Walks through many
> of the basic architectural choices. You'll find yourself saying, "I see,
> Drill chose the shared-nothing, OS thread model but random heap allocation
> rather than a buffer pool." That is, you can see Drill's design choices in
> the context of the overall DB solution space.
>
> "Database Management Systems", 3e by Ramakrishnan & Gehrke. A
> textbook-length overview of DB theory. I used the second edition years ago
> to design and build a complete embedded hybrid DB and object store. I keep
> returning to the book any time I need a refresher on some topic or other.
>
> What other favorites do people have? Anyone know of any good references
> that explain the rule-based architecture of a planner such as Calcite?
> (R&G, 2e, mostly discuss the classic "dynamic programming" style of
> planner.)
>
> Thanks,
> - Paul