Re: Is there a good discussion of optimizations?

Guyren Howe Thu, 07 Jan 2021 13:49:03 -0800

On Jan 7, 2021, 13:42 -0800, Florents Tselai <florents.tse...@gmail.com>, wrote:
> Apologies for the shameless self-promotion :)
>
> Around a year ago I collected my thoughts on this topic. You can read the 
> post here Modern Data Practice and the SQL Tradition .
> It looks like it resonated with a lot of folks in the community. HN 
> Discussion https://news.ycombinator.com/item?id=21482114
>
> I would specifically underline the fact that the newer generation of 
> programmers & data pros (my former self included) don't really appreciate 
> things like triggers and server-side programming. Triggers and DB-side 
> functions are considered something like Assembly code.
>
> Not many neophytes have been shown with use cases why for example writing 2-3 
> lines of PL/SQL can save you a huge overhead of back and forth and 
> environment set up to write the same thing in say Pandas.
>
> I would focus on Triggers, Indices on expressions, and time-related 
> functions. Probably on generated columns too. They may be considered a new 
> feature, but the reasoning of building successively columns on top of a few 
> base ones is quite appealing nowadays, especially for ML purposes.
>
> I also wouldn't hesitate to discuss advanced topics. They are usually 
> considered obscure because people don't present useful examples, rather toy 
> and arbitrary ones.
>
> In a recent O'Reilly training, I was skeptical of talking about triggers for 
> optimization but it looks like it was probably the most useful part of my 
> training as students could actually "steal and tweak" my code.


Thanks for this. May I steal some of your examples if they prove useful? I’ll 
credit you of course.

I’m planning on somewhat emphasizing that a relational database is a logic 
engine. Viewed through this lens, a query or view is a “backward” implication 
and a trigger is a “forward” one. This leads to considering triggers (and the 
moral equivalent in external code) as requiring “truth maintenance”, and is a 
great way to think about when the database is the appropriate place for some 
bit of logic.
> On Thu, Jan 7, 2021 at 11:21 PM Guyren Howe <guy...@gmail.com> wrote:
> > On Jan 7, 2021, 13:07 -0800, Kevin Brannen <kbran...@efji.com>, wrote:
> > > From: Guyren Howe <guy...@gmail.com>
> > >
> > > >Most folks, in my experience, who use relational databases don’t really 
> > > >understand the basic theory or even more important the why - the 
> > > >philosophy - of what a relational database is and how to get the most 
> > > >out of them. I see a lot of folks trying to use SQL in an imperative 
> > > >manner - make this temp table, then update it some, then make this other 
> > > >temp table, etc...
> >
> > Actually, I’m mostly going to talk about the relational model, rather than 
> > SQL. Our industry seems to always settle for third-best, and SQL is the 
> > worst of all the examples of this. The world desperately needs a good 
> > relational database based on a better query language — datalog, for example.
> >
> > I put up with SQL so I can use the relational model, and I think that 
> > understanding SQL has to start with that.
> >
> > Anyhow.
> > > An example of this is that we have a report we're trying to write that I'd
> > > like to think can be done in SQL, but I can't think of a way to do it. 
> > > Yet,
> > > if I do the base query and pull the data back into my application, I can 
> > > do
> > > the last bit with 3 lines of Perl very easily. The problem here revolves
> > > around comparing a row of data to the previous row to know whether the 
> > > data
> > > changed "significantly enough" to keep the new row.
> > >
> > > Another example is doing running totals. A couple of years ago I would 
> > > have
> > > said SQL can't do that. Now I know about the OVER clause, something that I
> > > would categorize as somewhat obscure, I can do it as needed.
> >
> > Actually, Window functions might be “advanced”, but are certainly not 
> > obscure. Your example sounds like it’s trivially solved with LAG().
> > > As Michael Lewis pointed out, large dataset can also cause you to choose 
> > > not
> > > to use SQL in 1 big statemet for everything (i.e. advocating the use to 
> > > temp
> > > tables). In some ways, using a CTE is a type of temp table, or at least I
> > > view it as such. That allows a person to solve a problem in bite-sized 
> > > chunks.
> > > I will agree that optimization can do it better at times, but the code 
> > > also has
> > > to be maintained as well – a balancing act.
> >
> > This appears to be good advice with SQL Server, which I’m coming to learn 
> > has a fairly poor query optimizer. But I would have thought Postgres’s 
> > optimizer would usually use a temporary table where appropriate.
> >
> > Curious to hear if that’s wrong.
> > > I think your idea is a good one, but I do hope you present that SQL can't
> > > solve everything ... else why do we have plpgsql. :) You’re correct 
> > > though,
> > > SQL isn’t used as much as it should be in many places.
> >
> > An important consideration will be when relational is inappropriate. My 
> > biggest goal, though, is to get folks to understand how much relations 
> > *can* do — far too many devs in my experience don’t use the power of SQL 
> > because they don’t understand it.
> >
> > Thanks for taking the time to give me this feedback.

Re: Is there a good discussion of optimizations?

Reply via email to