Hi Nick!

> On 26 Mar 2017, at 18:02, Nick Coghlan <ncogh...@gmail.com> wrote:
>
> On 26 March 2017 at 21:40, Pavel Velikhov <pavel.velik...@gmail.com> wrote:
>> On 25 Mar 2017, at 19:40, Nick Coghlan <ncogh...@gmail.com> wrote:
>>> Right, the target audience here *isn't* folks who already know how to
>>> construct their own relational queries in SQL, and it definitely isn't
>>> folks that know how to tweak their queries to get optimal performance
>>> from the specific database they're using. Rather, it's folks that
>>> already know Python's comprehensions, and perhaps some of the
>>> itertools features, and helping to provide them with a smoother
>>> on-ramp into the world of relational data processing.
>>
>> Actually I myself am a user of PythonQL, even though I’m an SQL expert.
>> I work in data science, so I do a lot of ad-hoc querying and we always
>> get some new datasets we need to check out and work with. Some things
>> like nested data models are also much better handled by PythonQL, and
>> data like JSON or XML will also be easier to handle.
>
> So perhaps a better way of framing it would be to say that PythonQL
> aims to provide a middle ground between interfaces that are fully in
> "Python mode" (e.g. ORMs, pandas DataFrames), where the primary
> interface is methods-on-objects, and those that are fully in "data
> manipulation mode" (e.g. raw SQL, lower level XML and JSON APIs).
>
> At the Python level, success for PythonQL would look like people being
> able to seamlessly transfer their data manipulation skills from a
> Django ORM project to an SQLAlchemy project to a pandas analysis
> project to a distributed data analysis project in dask, without their
> data manipulation code really having to change - only the backing data
> structures and the runtime performance characteristics would differ.
>
> At the data manipulation layer, success for PythonQL would look like
> people being able to easily get "good enough" performance for one-off
> scripts, regardless of the backing data store, with closer attention
> to detail only being needed for genuinely large data sets (where
> efficiency matters even for one-off analyses), or for frequently
> repeated operations (where wasted CPU hours show up as increased
> infrastructure expenses).

Yes, more along these lines. It is possible for us to provide decent hints
for query optimization, and we are planning a sophisticated optimizer in the
future, but especially at the start of the project this sounds quite fair.

>
>>> There's no question that folks dealing with sufficiently large data
>>> sets with sufficiently stringent performance requirements are
>>> eventually going to want to reach for handcrafted SQL or a distributed
>>> computation framework like dask, but that's not really any different
>>> from our standard position that when folks are attempting to optimise
>>> a hot loop, they're eventually going to have to switch to something
>>> that can eliminate the interpreter's default runtime object management
>>> overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an
>>> extension module in a different language entirely). It isn't an
>>> argument against making it easier for folks to postpone the point
>>> where they find it necessary to reach for the "something else" that
>>> takes them beyond Python's default capabilities.
>>
>> Don’t know, for example one of the wrappers is going to be an Apache Spark
>> wrapper, so you could quickly hack up a PythonQL query that would be run
>> on a distributed platform.
>
> Right, I meant this in the same sense that folks using an ORM like
> SQLAlchemy may eventually hit a point where rather than trying to
> convince the ORM to emit the SQL they want to run, it's easier to just
> bypass the ORM layer and write the exact SQL they want.
>
> It's worthwhile attempting to reduce the number of cases where folks
> feel obliged to do that, but at the same time, abstraction layers need
> to hide at least some lower level details if they're going to actually
> work properly.
>
>>> = Option 1 =
>>>
>>> Fully commit to the model of allowing alternate syntactic dialects to
>>> run atop Python interpreters.
>>> In Hylang and PythonQL we have at least
>>> two genuinely interesting examples of that working through the text
>>> encoding system, as well as other examples like Cython that work
>>> through the extension module system.
>>>
>>> So that's an opportunity to take this from "Possible, but a bit hacky"
>>> to "Pluggable source code translation is supported at all levels of
>>> the interpreter, including debugger source maps, etc" (perhaps by
>>> borrowing ideas from other ecosystems like Java, JavaScript, and .NET,
>>> where this kind of thing is already a lot more common).
>>>
>>> The downside of this approach is that actually making it happen would
>>> be getting pretty far afield from the original PythonQL goal of
>>> "provide nicer data manipulation abstractions in Python", and it
>>> wouldn't actually deliver anything new that can't already be done with
>>> existing import and codec system features.
>>
>> This would be great anyway: if we could rely on some preprocessor
>> directive instead of hacking encodings, that would be nice.
>
> Victor Stinner wrote up some ideas about that in PEP 511:
> https://www.python.org/dev/peps/pep-0511/
>
> Preprocessing is one of the specific use cases considered:
> https://www.python.org/dev/peps/pep-0511/#usage-2-preprocessor
>
>>> = Option 2 =
>>>
>>> ... given optionally delayed
>>> rendering of interpolated strings, PythonQL could be used in the form:
>>>
>>>     result = pyql(i"""
>>>         (x,y)
>>>         for x in {range(1,8)}
>>>         for y in {range(1,7)}
>>>         if x % 2 == 0 and
>>>            y % 2 != 0 and
>>>            x > y
>>>         """)
>>>
>>> I personally like this idea (otherwise I wouldn't have written PEP 501
>>> in the first place), and the necessary technical underpinnings to
>>> enable it are all largely already in place to support f-strings.
>>> If the PEP were revised to show examples of using it to support
>>> relatively seamless calling back and forth between Hylang, PythonQL
>>> and regular Python code in the same process, that might be intriguing
>>> enough to pique Guido's interest (and I'm open to adding co-authors
>>> that are interested in pursuing that).
>>
>> What would be the difference between this and just executing a PythonQL
>> string for us, getting local and global variables into PythonQL scope?
>
> The big new technical capability that f-strings introduced is that the
> compiler can see the variable references in the embedded expressions,
> so f-strings "just work" with closure references, whereas passing
> locals() and globals() explicitly is:
>
> 1. slow (since you have to generate a full locals dict);
> 2. incompatible with the use of closure variables (since they're not
>    visible in either locals() *or* globals())
>
> The i-strings concept takes that closure-compatible interpolation
> capability and separates it from the str.format based rendering step.
>
> From a speed perspective, the interpolation aspects of this approach
> are so efficient they rival simple string concatenation:
>
>     $ python -m perf timeit -s 'first = "Hello"; second = " World!"'
>     'first + second'
>     .....................
>     Mean +- std dev: 71.7 ns +- 2.1 ns
>
>     $ python -m perf timeit -s 'first = "Hello"; second = " World!"'
>     'f"{first}{second}"'
>     .....................
>     Mean +- std dev: 77.8 ns +- 2.5 ns
>
> Something like pyql that did more than just concatenate the text
> sections with the text values of the embedded expressions would still
> need some form of regex-style caching strategy to avoid parsing the
> same query string multiple times, but the Python interpreter would
> handle the task of breaking up the string into the text sections and
> the interpolated Python expressions.

Thanks, will start following this proposal!

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
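
For reference, the closure point in the thread (names that are not visible
through locals() or globals()) can be reproduced with a small sketch. This is
only an illustration under stated assumptions: make_runners, via_eval,
via_fstring and threshold are hypothetical names made up for the example, and
plain eval() stands in for "executing a query string"; it is not how PythonQL
or PEP 501 is implemented. It covers the case where a name appears only inside
the query string, so the compiler never captures it.

```python
def make_runners():
    """Return (eval_result, fstring_result) for the same closure variable."""
    threshold = 5  # hypothetical value; lives in the enclosing scope only

    def via_eval(query):
        # `threshold` appears only inside the query string, so the compiler
        # never captures it for this function: locals() here holds just
        # `query`, and globals() does not contain it either.
        try:
            return eval(query, globals(), locals())
        except NameError:
            return None  # the name could not be resolved

    def via_fstring():
        # The compiler sees `threshold` inside the braces and creates a
        # real closure reference, so the lookup succeeds.
        return f"{threshold + 1}"

    return via_eval("threshold + 1"), via_fstring()


eval_result, fstring_result = make_runners()
print(eval_result)     # None
print(fstring_result)  # 6
```

The eval() variant only works if the caller remembers to mention every needed
name in its own source, which is the fragility the compiler-assisted
interpolation avoids.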