First off, I think PythonQL (and PonyORM before it) is a very
interesting piece of technology. However, I think some of the answers
so far suggest we may need to discuss a couple of meta-issues around
target audiences and available technical options before continuing on.

I'm quoting Gerald's post here because it highlights the "target
audience" problem, but my comments apply to the thread generally.

On 25 March 2017 at 22:51, Gerald Britton <gerald.brit...@gmail.com> wrote:
>
> I see lots of C# code, but (thankfully) not so much LINQ to SQL.  Yes, it is 
> a cool technology.  But I sometimes have a problem with the SQL it generates. 
>  Since I'm also a SQL developer, I'm sensitive to how queries are 
> constructed, for performance reasons, as well as how they look, for 
> readability and aesthetic reasons.
>
> LINQ queries can generate poorly-performing SQL, since LINQ is a basically a 
> translator, but not an AI.  As far as appearances go, LINQ queries can look 
> pretty gnarly, especially if they include sub queries or a few joins.  That 
> makes it hard for the SQL dev (me!) to read and understand if there are 
> performance problems (which there often are, in my experience)
>
> So, I would tend to code the SQL separately and put it in a SQL view, 
> function or stored procedure.  I can still parse the results with LINQ (not 
> LINQ to SQL), which is fine.
>
> For similar reasons, I'm not a huge fan of ORMs either.  Probably my bias 
> towards designing the database first and building up queries to meet the 
> business goals before writing a line of Python, C#, or the language de jour.


Right, the target audience here *isn't* folks who already know how to
construct their own relational queries in SQL, and it definitely isn't
folks who know how to tweak their queries to get optimal performance
from the specific database they're using. Rather, it's folks who
already know Python's comprehensions, and perhaps some of the
itertools features, and the goal is to provide them with a smoother
on-ramp into the world of relational data processing.

There's no question that folks dealing with sufficiently large data
sets with sufficiently stringent performance requirements are
eventually going to want to reach for handcrafted SQL or a distributed
computation framework like dask, but that's not really any different
from our standard position that when folks are attempting to optimise
a hot loop, they're eventually going to have to switch to something
that can eliminate the interpreter's default runtime object management
overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an
extension module in a different language entirely). It isn't an
argument against making it easier for folks to postpone the point
where they find it necessary to reach for the "something else" that
takes them beyond Python's default capabilities.

However, at the same time, PythonQL *is* a DSL for data manipulation
operations, and map and filter are far and away the most common of
those. Even reduce, which was previously a builtin, was pushed into
functools for Python 3.0, with the preferred alternative being to just
write a suitably named function that accepts an iterable and returns a
single value. And while Python is a very popular tool for data
manipulation, it would be a big stretch to assume that that was its
primary use case in all contexts.
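As a concrete illustration of that preference, a fold that might
otherwise be spelled with functools.reduce can instead be a suitably
named function over an iterable (the names here are just for
illustration):

```python
from functools import reduce

# The functools.reduce spelling: the folding logic is inline and unnamed
total = reduce(lambda acc, x: acc + x * x, [1, 2, 3], 0)

# The preferred spelling: a suitably named function that accepts an
# iterable and returns a single value
def sum_of_squares(values):
    return sum(x * x for x in values)

total_named = sum_of_squares([1, 2, 3])
```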

So it makes sense to review some of the technical options that are
available to help make projects like PythonQL more maintainable,
without necessarily gating improvements to them on the relatively slow
update and rollout cycle of new Python versions.

= Option 1 =

Fully commit to the model of allowing alternate syntactic dialects to
run atop Python interpreters. In Hylang and PythonQL we have at least
two genuinely interesting examples of that working through the text
encoding system, as well as other examples like Cython that work
through the extension module system.
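For reference, the text encoding hook that dialects like Hylang can
exploit looks roughly like this minimal sketch (the dialect name and
the toy `whenever` -> `if` rewrite are made up for illustration; a real
dialect codec would run a proper source translator, and would also
supply incremental decoders so the import machinery can use it):

```python
import codecs

def translate(source):
    # Toy "dialect": rewrite the made-up keyword 'whenever' to 'if'.
    # A real dialect codec would parse and retranslate the source here.
    return source.replace("whenever", "if")

def search(name):
    if name != "mydialect":
        return None
    utf8 = codecs.lookup("utf-8")
    def decode(data, errors="strict"):
        text, consumed = utf8.decode(data, errors)
        return translate(text), consumed
    return codecs.CodecInfo(
        name="mydialect",
        encode=utf8.encode,  # encoding back out is plain UTF-8
        decode=decode,
    )

codecs.register(search)

# Decoding directly shows the effect the '# coding: mydialect'
# declaration would have on a source file:
print(codecs.decode(b"whenever x > 1: pass", "mydialect"))
```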

So that's an opportunity to take this from "Possible, but a bit hacky"
to "Pluggable source code translation is supported at all levels of
the interpreter, including debugger source maps, etc" (perhaps by
borrowing ideas from other ecosystems like Java, JavaScript, and .NET,
where this kind of thing is already a lot more common).

The downside of this approach is that actually making it happen would
be getting pretty far afield from the original PythonQL goal of
"provide nicer data manipulation abstractions in Python", and it
wouldn't actually deliver anything new that can't already be done with
existing import and codec system features.

= Option 2 =

Back when f-strings were added for 3.6, I wrote PEP 501 to generalise
the idea as "i-strings": exposing the intermediate interpolated form
of f-strings, such that you could write code like `myquery =
sql(i"SELECT {column} FROM {table};")` where the "sql" function
received an "InterpolationTemplate" object that it could render
however it wanted, but the "column" and "table" references were just
regular Python expressions.
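Since i-strings don't exist yet, the delayed rendering idea can only
be approximated today, but a sketch shows the shape of it: the
renderer receives the literal text segments and the already-evaluated
interpolation values separately, and decides for itself how to combine
them. Everything below, including the `Template` stand-in for PEP
501's `InterpolationTemplate`, is hypothetical:

```python
class Template:
    """Stand-in for PEP 501's InterpolationTemplate: alternating
    literal text segments and already-evaluated expression values."""
    def __init__(self, *segments):
        self.segments = segments  # ("text", str) or ("value", object) pairs

def sql(template):
    # Because the renderer sees the template structure rather than a
    # pre-joined string, it can emit values as query parameters
    # instead of splicing them into the SQL text.
    text_parts, params = [], []
    for kind, item in template.segments:
        if kind == "text":
            text_parts.append(item)
        else:
            text_parts.append("?")
            params.append(item)
    return "".join(text_parts), params

user_id = 42
# What sql(i"SELECT name FROM users WHERE id = {user_id};") might
# produce under this sketch:
query, params = sql(Template(
    ("text", "SELECT name FROM users WHERE id = "),
    ("value", user_id),
    ("text", ";"),
))
# query  -> "SELECT name FROM users WHERE id = ?;"
# params -> [42]
```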

It's currently deferred indefinitely, as I didn't have any concrete
use cases that Guido found sufficiently compelling to make the
additional complexity worthwhile. However, given optionally delayed
rendering of interpolated strings, PythonQL could be used in the form:

    result = pyql(i"""
        (x,y)
        for x in {range(1,8)}
        for y in {range(1,7)}
        if x % 2 == 0 and
           y % 2 != 0 and
           x > y
    """)

I personally like this idea (otherwise I wouldn't have written PEP 501
in the first place), and the necessary technical underpinnings to
enable it are all largely already in place to support f-strings. If
the PEP were revised to show examples of using it to support
relatively seamless calling back and forth between Hylang, PythonQL
and regular Python code in the same process, that might be intriguing
enough to pique Guido's interest (and I'm open to adding co-authors
that are interested in pursuing that).

= Option 3 =

Go all the way to expanding comprehensions to natively be a full data
manipulation DSL.

I'm personally not a fan of that approach, as syntax is really hard to
search for help on (keywords are better for that than punctuation, but
not by much), while methods and functions get to have docstrings. It
also means the query language gets tightly coupled to the Python
grammar, which not only makes the query language difficult to update,
but also makes Python's base syntax harder for new users to learn.

By contrast, when DSLs are handled as interpolation templates with
delayed rendering, then the rendering function gets to provide runtime
documentation, and the definition of the DSL is coupled to the update
cycle of the rendering function, *not* that of the Python language
definition.
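As a small illustration of that difference, a hypothetical `pyql`
rendering function (the name and clause list are made up here) gets to
carry its own documentation, shipped and updated on the library's own
release cycle:

```python
def pyql(template):
    """Render a PythonQL-style query template.

    Supported clauses: for, let, where, group by, order by.
    (This docstring ships with the rendering library, so the DSL's
    documentation updates whenever the library does, independently
    of the Python language definition.)
    """
    raise NotImplementedError("rendering sketch only")

# help(pyql) now documents the DSL at runtime, which bare syntax
# baked into the language grammar can't do.
```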

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
