On 1/7/21 2:48 PM, Guyren Howe wrote:
On Jan 7, 2021, 13:42 -0800, Florents Tselai
<florents.tse...@gmail.com>, wrote:
Apologies for the shameless self-promotion :)
Around a year ago I collected my thoughts on this topic. You can
read the post here: Modern Data Practice and the SQL Tradition
<https://tselai.com/modern-data-practice-and-the-sql-tradition.html>.
It looks like it resonated with a lot of folks in the community.
HN Discussion: https://news.ycombinator.com/item?id=21482114
I would specifically underline the fact that the newer generation
of programmers & data pros (my former self included) doesn't really
appreciate things like triggers and server-side programming.
Triggers and DB-side functions are treated as something like
Assembly code.
Not many neophytes have been shown, with concrete use cases, why
writing 2-3 lines of PL/SQL can save you the huge overhead of back
and forth and environment setup needed to write the same thing in,
say, Pandas.
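To pick one tiny sketch (table and column names here are made up,
and EXECUTE FUNCTION needs PostgreSQL 11+): keeping an updated_at
column current with a trigger, with no application code and no
extra round trips:

  CREATE TABLE orders (
      id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      status     text NOT NULL,
      updated_at timestamptz NOT NULL DEFAULT now()
  );

  -- the whole "business logic" is these three lines of PL/pgSQL
  CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
  BEGIN
      NEW.updated_at := now();
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER orders_touch_updated_at
      BEFORE UPDATE ON orders
      FOR EACH ROW EXECUTE FUNCTION touch_updated_at();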
I would focus on triggers, indices on expressions, and
time-related functions, and probably on generated columns too. They
may be considered a new feature, but the reasoning of successively
building columns on top of a few base ones is quite appealing
nowadays, especially for ML purposes.
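For instance, something along these lines (a hypothetical events
table) shows both an index on an expression and a generated column
built on top of a base column (generated columns need PostgreSQL 12+):

  CREATE TABLE events (
      id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      payload jsonb NOT NULL,
      ts      timestamptz NOT NULL,
      -- generated column derived from a base column
      day     date GENERATED ALWAYS AS ((ts AT TIME ZONE 'UTC')::date) STORED
  );

  -- index on an expression: lookups by lower-cased email can use it
  CREATE INDEX events_email_idx ON events (lower(payload ->> 'email'));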
I also wouldn't hesitate to discuss advanced topics. They are
usually considered obscure because people present only toy and
arbitrary examples rather than useful ones.
In a recent O'Reilly training, I was skeptical about covering
triggers for optimization, but it turned out to be probably the
most useful part of the training, as students could actually "steal
and tweak" my code.
<https://github.com/Florents-Tselai/SQLite-for-Data-Scientists/blob/master/notebooks/5_advanced_SQL.ipynb>
Thanks for this. May I steal some of your examples if they prove
useful? I’ll credit you of course.
I’m planning on somewhat emphasizing that a relational database is a
logic engine. Viewed through this lens, a query or view is a
“backward” implication and a trigger is a “forward” one. This leads to
considering triggers (and the moral equivalent in external code) as
requiring “truth maintenance”, and is a great way to think about when
the database is the appropriate place for some bit of logic.
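A tiny sketch of that distinction, with made-up table names: a view
derives its rows whenever they are asked for, while a trigger keeps a
stored fact true as the base data changes:

  -- "backward" implication: the fact is derived on demand
  CREATE VIEW overdue_invoices AS
      SELECT * FROM invoices
      WHERE due_date < current_date AND NOT paid;

  -- "forward" implication: a stored fact is maintained eagerly
  CREATE FUNCTION maintain_balance() RETURNS trigger AS $$
  BEGIN
      UPDATE accounts
         SET balance = balance + NEW.amount
       WHERE id = NEW.account_id;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER payments_maintain_balance
      AFTER INSERT ON payments
      FOR EACH ROW EXECUTE FUNCTION maintain_balance();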
On Thu, Jan 7, 2021 at 11:21 PM Guyren Howe <guy...@gmail.com> wrote:
On Jan 7, 2021, 13:07 -0800, Kevin Brannen <kbran...@efji.com>, wrote:
From: Guyren Howe <guy...@gmail.com>
>Most folks, in my experience, who use relational databases don't really understand the basic theory or even more important the why - the philosophy - of what a relational database is and how to get the most out of them. I see a lot of folks trying to use SQL in an imperative manner - make this temp table, then update it some, then make this other temp table, etc...
Actually, I’m mostly going to talk about the relational model,
rather than SQL. Our industry seems to always settle for
third-best, and SQL is the worst of all the examples of this.
The world desperately needs a good relational database based
on a better query language — datalog, for example.
I put up with SQL so I can use the relational model, and I
think that understanding SQL has to start with that.
Anyhow.
An example of this is that we have a report we're trying to write that I'd like to think can be done in SQL, but I can't think of a way to do it. Yet, if I do the base query and pull the data back into my application, I can do the last bit with 3 lines of Perl very easily. The problem here revolves around comparing a row of data to the previous row to know whether the data changed "significantly enough" to keep the new row.
Another example is doing running totals. A couple of years ago I would have said SQL can't do that. Now that I know about the OVER clause, something that I would categorize as somewhat obscure, I can do it as needed.
Actually, Window functions might be “advanced”, but are
certainly not obscure. Your example sounds like it’s trivially
solved with LAG().
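Something along these lines would cover both cases; readings is a
hypothetical table, and the threshold is whatever "significantly
enough" means for the report:

  -- keep only rows that changed enough vs. the previous row
  SELECT ts, value
  FROM (
      SELECT ts,
             value,
             LAG(value) OVER (ORDER BY ts) AS prev_value
      FROM readings
  ) t
  WHERE prev_value IS NULL
     OR abs(value - prev_value) > 5;

  -- running total, no application-side loop
  SELECT ts,
         value,
         SUM(value) OVER (ORDER BY ts) AS running_total
  FROM readings;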
As Michael Lewis pointed out, large datasets can also cause you to choose not to use SQL in 1 big statement for everything (i.e. advocating the use of temp tables). In some ways, using a CTE is a type of temp table, or at least I view it as such. That allows a person to solve a problem in bite-sized chunks. I will agree that optimization can do it better at times, but the code also has to be maintained as well – a balancing act.
This appears to be good advice with SQL Server, which I'm coming to learn has a fairly poor query optimizer. But I would have thought Postgres's optimizer would usually use a temporary table where appropriate.
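For what it's worth, since PostgreSQL 12 you can state explicitly
whether a CTE should behave like a temp table or be folded into the
outer query (ledger here is just a placeholder table):

  -- compute the CTE once, roughly like a temp table
  WITH heavy AS MATERIALIZED (
      SELECT account_id, sum(amount) AS total
      FROM ledger
      GROUP BY account_id
  )
  SELECT * FROM heavy WHERE total > 1000;

  -- or let the planner fold it into the outer query
  WITH light AS NOT MATERIALIZED (
      SELECT account_id, amount FROM ledger
  )
  SELECT * FROM light WHERE account_id = 42;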
Please include, for all those front-end coders who might want to hit the
database, the expense/overhead involved. I've seen "foreach id, read
database, process record" all too often.
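Even a rough sketch of the contrast helps (names are invented; the
application loop is shown only as comments):

  -- the "foreach id" pattern: N round trips, one per id
  --   for each id in ids:
  --       SELECT * FROM users WHERE id = :id;
  --       ... process the record in the application ...

  -- the set-based version: one round trip, the database does the looping
  SELECT *
  FROM users
  WHERE id = ANY (ARRAY[1, 2, 3]);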