On 1/7/21 2:48 PM, Guyren Howe wrote:
On Jan 7, 2021, 13:42 -0800, Florents Tselai
<florents.tse...@gmail.com>, wrote:
Apologies for the shameless self-promotion :)
Around a year ago I collected my thoughts on this topic. You can
read the post here: Modern Data Practice and the SQL Tradition
<https://tselai.com/modern-data-practice-and-the-sql-tradition.html>.
It looks like it resonated with a lot of folks in the community.
HN Discussion: https://news.ycombinator.com/item?id=21482114
I would specifically underline the fact that the newer generation
of programmers & data pros (my former self included) doesn't really
appreciate things like triggers and server-side programming.
Triggers and DB-side functions are treated as something like
Assembly code.
Not many neophytes have been shown, with concrete use cases, why
writing 2-3 lines of PL/SQL can save you the huge overhead of back
and forth and environment setup needed to write the same thing in,
say, Pandas.
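To pick one tiny sketch (table and column names here are made up,
and EXECUTE FUNCTION needs PostgreSQL 11+): keeping an updated_at
column current with a trigger, with no application code and no
extra round trips:

  CREATE TABLE orders (
      id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      status     text NOT NULL,
      updated_at timestamptz NOT NULL DEFAULT now()
  );

  -- the whole "business logic" is these three lines of PL/pgSQL
  CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
  BEGIN
      NEW.updated_at := now();
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER orders_touch_updated_at
      BEFORE UPDATE ON orders
      FOR EACH ROW EXECUTE FUNCTION touch_updated_at();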
I would focus on triggers, indices on expressions, and
time-related functions, and probably on generated columns too. They
may be considered a new feature, but the reasoning of successively
building columns on top of a few base ones is quite appealing
nowadays, especially for ML purposes.
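For instance, something along these lines (a hypothetical events
table) shows both an index on an expression and a generated column
built on top of a base column (generated columns need PostgreSQL 12+):

  CREATE TABLE events (
      id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      payload jsonb NOT NULL,
      ts      timestamptz NOT NULL,
      -- generated column derived from a base column
      day     date GENERATED ALWAYS AS ((ts AT TIME ZONE 'UTC')::date) STORED
  );

  -- index on an expression: lookups by lower-cased email can use it
  CREATE INDEX events_email_idx ON events (lower(payload ->> 'email'));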
I also wouldn't hesitate to discuss advanced topics. They are
usually considered obscure because people present only toy and
arbitrary examples rather than useful ones.
In a recent O'Reilly training, I was skeptical about covering
triggers for optimization, but it turned out to be probably the
most useful part of the training, as students could actually "steal
and tweak" my code.
<https://github.com/Florents-Tselai/SQLite-for-Data-Scientists/blob/master/notebooks/5_advanced_SQL.ipynb>
Thanks for this. May I steal some of your examples if they prove
useful? I’ll credit you of course.
I’m planning on somewhat emphasizing that a relational database is a
logic engine. Viewed through this lens, a query or view is a
“backward” implication and a trigger is a “forward” one. This leads to
considering triggers (and the moral equivalent in external code) as
requiring “truth maintenance”, and is a great way to think about when
the database is the appropriate place for some bit of logic.
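A tiny sketch of that distinction, with made-up table names: a view
derives its rows whenever they are asked for, while a trigger keeps a
stored fact true as the base data changes:

  -- "backward" implication: the fact is derived on demand
  CREATE VIEW overdue_invoices AS
      SELECT * FROM invoices
      WHERE due_date < current_date AND NOT paid;

  -- "forward" implication: a stored fact is maintained eagerly
  CREATE FUNCTION maintain_balance() RETURNS trigger AS $$
  BEGIN
      UPDATE accounts
         SET balance = balance + NEW.amount
       WHERE id = NEW.account_id;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER payments_maintain_balance
      AFTER INSERT ON payments
      FOR EACH ROW EXECUTE FUNCTION maintain_balance();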
On Thu, Jan 7, 2021 at 11:21 PM Guyren Howe <guy...@gmail.com> wrote:
On Jan 7, 2021, 13:07 -0800, Kevin Brannen <kbran...@efji.com>, wrote:
From: Guyren Howe <guy...@gmail.com>
>Most folks, in my experience, who use relational databases don't really understand the basic theory or even more important the why - the philosophy - of what a relational database is and how to get the most out of them. I see a lot of folks trying to use SQL in an imperative manner - make this temp table, then update it some, then make this other temp table, etc...
Actually, I’m mostly going to talk about the relational model,
rather than SQL. Our industry seems to always settle for
third-best, and SQL is the worst of all the examples of this.
The world desperately needs a good relational database based
on a better query language — datalog, for example.
I put up with SQL so I can use the relational model, and I
think that understanding SQL has to start with that.
Anyhow.
An example of this is that we have a report we're trying to write that I'd like to think can be done in SQL, but I can't think of a way to do it. Yet, if I do the base query and pull the data back into my application, I can do the last bit with 3 lines of Perl very easily. The problem here revolves around comparing a row of data to the previous row to know whether the data changed "significantly enough" to keep the new row.
Another example is doing running totals. A couple of years ago I would have said SQL can't do that. Now that I know about the OVER clause, something that I would categorize as somewhat obscure, I can do it as needed.
Actually, Window functions might be “advanced”, but are
certainly not obscure. Your example sounds like it’s trivially
solved with LAG().
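Something along these lines would cover both cases; readings is a
hypothetical table, and the threshold is whatever "significantly
enough" means for the report:

  -- keep only rows that changed enough vs. the previous row
  SELECT ts, value
  FROM (
      SELECT ts,
             value,
             LAG(value) OVER (ORDER BY ts) AS prev_value
      FROM readings
  ) t
  WHERE prev_value IS NULL
     OR abs(value - prev_value) > 5;

  -- running total, no application-side loop
  SELECT ts,
         value,
         SUM(value) OVER (ORDER BY ts) AS running_total
  FROM readings;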
As Michael Lewis pointed out, large datasets can also cause you to choose not to use SQL in 1 big statement for everything (i.e. advocating the use of temp tables). In some ways, using a CTE is a type of temp table, or at least I view it as such. That allows a person to solve a problem in bite-sized chunks. I will agree that optimization can do it better at times, but the code also has to be maintained as well – a balancing act.
This appears to be good advice with SQL Server, which I'm coming to learn has a fairly poor query optimizer. But I would have thought Postgres's optimizer would usually use a temporary table where appropriate.
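For what it's worth, since PostgreSQL 12 you can state explicitly
whether a CTE should behave like a temp table or be folded into the
outer query (ledger here is just a placeholder table):

  -- compute the CTE once, roughly like a temp table
  WITH heavy AS MATERIALIZED (
      SELECT account_id, sum(amount) AS total
      FROM ledger
      GROUP BY account_id
  )
  SELECT * FROM heavy WHERE total > 1000;

  -- or let the planner fold it into the outer query
  WITH light AS NOT MATERIALIZED (
      SELECT account_id, amount FROM ledger
  )
  SELECT * FROM light WHERE account_id = 42;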
Please include, for all those front-end coders who might want to hit the
database, the expense/overhead involved. I've seen "foreach id, read
database, process record" all too often.
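Even a rough sketch of the contrast helps (names are invented; the
application loop is shown only as comments):

  -- the "foreach id" pattern: N round trips, one per id
  --   for each id in ids:
  --       SELECT * FROM users WHERE id = :id;
  --       ... process the record in the application ...

  -- the set-based version: one round trip, the database does the looping
  SELECT *
  FROM users
  WHERE id = ANY (ARRAY[1, 2, 3]);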