Re: [Python-ideas] Proposal: Query language extension to Python (PythonQL)

Pavel Velikhov Sun, 26 Mar 2017 09:32:39 -0700

Hi Nick!

> On 26 Mar 2017, at 18:02, Nick Coghlan <ncogh...@gmail.com> wrote:
> 
> On 26 March 2017 at 21:40, Pavel Velikhov <pavel.velik...@gmail.com> wrote:
>> On 25 Mar 2017, at 19:40, Nick Coghlan <ncogh...@gmail.com> wrote:
>>> Right, the target audience here *isn't* folks who already know how to
>>> construct their own relational queries in SQL, and it definitely isn't
>>> folks that know how to tweak their queries to get optimal performance
>>> from the specific database they're using. Rather, it's folks that
>>> already know Python's comprehensions, and perhaps some of the
>>> itertools features, and helping to provide them with a smoother
>>> on-ramp into the world of relational data processing.
>> 
>> 
> >> Actually I myself am a user of PythonQL, even though I’m an SQL expert.
> >> I work in data science, so I do a lot of ad-hoc querying, and we always
> >> get some new datasets we need to check out and work with. Some things
> >> like nested data models are also much better handled by PythonQL, and
> >> data like JSON or XML will also be easier to handle.
> 
> So perhaps a better way of framing it would be to say that PythonQL
> aims to provide a middle ground between interfaces that are fully in
> "Python mode" (e.g. ORMs, pandas DataFrames), where the primary
> interface is methods-on-objects, and those that are fully in "data
> manipulation mode" (e.g. raw SQL, lower level XML and JSON APIs).
> 
> At the Python level, success for PythonQL would look like people being
> able to seamlessly transfer their data manipulation skills from a
> Django ORM project to a SQLAlchemy project to a pandas analysis
> project to a distributed data analysis project in dask, without their
> data manipulation code really having to change - only the backing data
> structures and the runtime performance characteristics would differ.
> 
> At the data manipulation layer, success for PythonQL would look like
> people being able to easily get "good enough" performance for one-off
> scripts, regardless of the backing data store, with closer attention
> to detail only being needed for genuinely large data sets (where
> efficiency matters even for one-off analyses), or for frequently
> repeated operations (where wasted CPU hours show up as increased
> infrastructure expenses).


Yes, more along these lines. We can provide decent-looking hints for query
optimization, and we are planning a sophisticated optimizer in the future,
but especially at the beginning of the project this sounds quite fair.

> 
>>> There's no question that folks dealing with sufficiently large data
>>> sets with sufficiently stringent performance requirements are
>>> eventually going to want to reach for handcrafted SQL or a distributed
>>> computation framework like dask, but that's not really any different
>>> from our standard position that when folks are attempting to optimise
>>> a hot loop, they're eventually going to have to switch to something
>>> that can eliminate the interpreter's default runtime object management
>>> overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an
>>> extension module in a different language entirely). It isn't an
>>> argument against making it easier for folks to postpone the point
>>> where they find it necessary to reach for the "something else" that
>>> takes them beyond Python's default capabilities.
>> 
> >> Don’t know, for example one of the wrappers is going to be an Apache
> >> Spark wrapper, so you could quickly hack up a PythonQL query that would
> >> be run on a distributed platform.
> 
> Right, I meant this in the same sense that folks using an ORM like
> SQLAlchemy may eventually hit a point where rather than trying to
> convince the ORM to emit the SQL they want to run, it's easier to just
> bypass the ORM layer and write the exact SQL they want.
> 
> It's worthwhile attempting to reduce the number of cases where folks
> feel obliged to do that, but at the same time, abstraction layers need
> to hide at least some lower level details if they're going to actually
> work properly.

> 
>>> = Option 1 =
>>> 
>>> Fully commit to the model of allowing alternate syntactic dialects to
>>> run atop Python interpreters. In Hylang and PythonQL we have at least
>>> two genuinely interesting examples of that working through the text
>>> encoding system, as well as other examples like Cython that work
>>> through the extension module system.
>>> 
>>> So that's an opportunity to take this from "Possible, but a bit hacky"
>>> to "Pluggable source code translation is supported at all levels of
>>> the interpreter, including debugger source maps, etc" (perhaps by
> >>> borrowing ideas from other ecosystems like Java, JavaScript, and .NET,
> >>> where this kind of thing is already a lot more common).
>>> 
>>> The downside of this approach is that actually making it happen would
>>> be getting pretty far afield from the original PythonQL goal of
>>> "provide nicer data manipulation abstractions in Python", and it
>>> wouldn't actually deliver anything new that can't already be done with
>>> existing import and codec system features.
>> 
> >> This would be great anyway: it would be nice if we could rely on some
> >> preprocessor directive instead of hacking encodings.
> 
> Victor Stinner wrote up some ideas about that in PEP 511:
> https://www.python.org/dev/peps/pep-0511/
> 
> Preprocessing is one of the specific use cases considered:
> https://www.python.org/dev/peps/pep-0511/#usage-2-preprocessor
> 
>>> = Option 2 =
>>> 
>>> ... given optionally delayed
>>> rendering of interpolated strings, PythonQL could be used in the form:
>>> 
> >>>   result = pyql(i"""
>>>       (x,y)
>>>       for x in {range(1,8)}
>>>       for y in {range(1,7)}
>>>       if x % 2 == 0 and
>>>          y % 2 != 0 and
>>>          x > y
>>>   """)
>>> 
>>> I personally like this idea (otherwise I wouldn't have written PEP 501
>>> in the first place), and the necessary technical underpinnings to
>>> enable it are all largely already in place to support f-strings. If
>>> the PEP were revised to show examples of using it to support
>>> relatively seamless calling back and forth between Hylang, PythonQL
>>> and regular Python code in the same process, that might be intriguing
>>> enough to pique Guido's interest (and I'm open to adding co-authors
>>> that are interested in pursuing that).
>> 
>> What would be the difference between this and just executing a PythonQL
>> string for us, getting local and global variables into PythonQL scope?
> 
> The big new technical capability that f-strings introduced is that the
> compiler can see the variable references in the embedded expressions,
> so f-strings "just work" with closure references, whereas passing
> locals() and globals() explicitly is:
> 
> 1. slow (since you have to generate a full locals dict);
> 2. incompatible with the use of closure variables (since they're not
> visible in either locals() *or* globals())
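
A minimal sketch of point 2 above, assuming CPython 3.6 or later. The
function names are made up for illustration and are not part of PythonQL or
PEP 501. When the variable reference is hidden inside an opaque string, the
compiler creates no closure cell for it, so passing locals() and globals()
explicitly cannot find it; an f-string in the same position works because
the compiler sees the reference.

```python
def make_query_with_eval():
    limit = 8  # lives in the enclosing function's scope

    def inner():
        # The compiler sees no reference to `limit` in this body (it is
        # hidden inside a string), so `limit` is not captured as a closure
        # variable and is absent from both locals() and globals().
        return eval("limit * 2", globals(), locals())

    return inner


def make_query_with_fstring():
    limit = 8

    def inner():
        # Here the compiler sees `limit` inside the f-string and captures
        # it as a closure variable, so the lookup succeeds.
        return f"{limit * 2}"

    return inner


if __name__ == "__main__":
    try:
        make_query_with_eval()()
    except NameError as exc:
        print("eval with explicit namespaces failed:", exc)
    print("f-string result:", make_query_with_fstring()())
```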
> 
> The i-strings concept takes that closure-compatible interpolation
> capability and separates it from the str.format based rendering step.
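
As a rough runtime illustration of that separation (this is not the PEP 501
API, and `split_template` is a hypothetical helper), the standard library's
string.Formatter().parse can already break a template into its text
sections and its embedded expression strings without rendering anything.
The proposal would have the compiler do this step instead, which is what
makes closure references work.

```python
from string import Formatter


def split_template(template):
    """Split a str.format-style template into text parts and expressions.

    Nothing is evaluated or rendered here; the caller decides what to do
    with the pieces (for example, hand them to a query compiler).
    """
    texts, expressions = [], []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        texts.append(literal)
        if field is not None:
            expressions.append(field)
    return texts, expressions


if __name__ == "__main__":
    texts, expressions = split_template("for x in {range(1,8)} if x > {y}")
    print(texts)
    print(expressions)
```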
> 
> From a speed perspective, the interpolation aspects of this approach
> are so efficient they rival simple string concatenation:
> 
> $ python -m perf timeit -s 'first = "Hello"; second = " World!"' 'first + second'
> .....................
> Mean +- std dev: 71.7 ns +- 2.1 ns
> 
> $ python -m perf timeit -s 'first = "Hello"; second = " World!"' 'f"{first}{second}"'
> .....................
> Mean +- std dev: 77.8 ns +- 2.5 ns
> 
> Something like pyql that did more than just concatenate the text
> sections with the text values of the embedded expressions would still
> need some form of regex-style caching strategy to avoid parsing the
> same query string multiple times, but the Python interpreter would
> handle the task of breaking up the string into the text sections and
> the interpolated Python expressions.


Thanks, will start following this proposal!

> 
> Cheers,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
