On Sun, Mar 26, 2017 at 10:02 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 26 March 2017 at 21:40, Pavel Velikhov <pavel.velik...@gmail.com> wrote:
> > On 25 Mar 2017, at 19:40, Nick Coghlan <ncogh...@gmail.com> wrote:
> >> Right, the target audience here *isn't* folks who already know how to
> >> construct their own relational queries in SQL, and it definitely isn't
> >> folks that know how to tweak their queries to get optimal performance
> >> from the specific database they're using. Rather, it's folks that
> >> already know Python's comprehensions, and perhaps some of the
> >> itertools features, and helping to provide them with a smoother
> >> on-ramp into the world of relational data processing.
> >
> > Actually I myself am a user of PythonQL, even though I'm an SQL expert.
> > I work in data science, so I do a lot of ad-hoc querying, and we always
> > get some new datasets we need to check out and work with.
> > Some things like nested data models are also much better handled by
> > PythonQL, and data like JSON or XML will also be easier to handle.
>
> So perhaps a better way of framing it would be to say that PythonQL
> aims to provide a middle ground between interfaces that are fully in
> "Python mode" (e.g. ORMs, pandas DataFrames), where the primary
> interface is methods-on-objects, and those that are fully in "data
> manipulation mode" (e.g. raw SQL, lower level XML and JSON APIs).
>
> At the Python level, success for PythonQL would look like people being
> able to seamlessly transfer their data manipulation skills from a
> Django ORM project to an SQL Alchemy project to a pandas analysis
> project to a distributed data analysis project in dask, without their
> data manipulation code really having to change - only the backing data
> structures and the runtime performance characteristics would differ.

e.g. Django ORM to SQLAlchemy:

- Does this necessarily imply a metamodel for relations?
  - Django: GenericForeignKey
  - SQLAlchemy: sqlalchemy_utils.generic_relationship

...
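To make the "folks that already know Python's comprehensions" framing concrete, here is a plain-Python sketch (example data invented for illustration, and this is ordinary Python rather than PythonQL syntax) of the relational operation a comprehension already expresses, which PythonQL then extends with SQL-style clauses:

```python
# Two "tables" as plain Python data structures (invented sample data).
employees = [
    {"name": "ada", "dept_id": 1},
    {"name": "grace", "dept_id": 2},
    {"name": "linus", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "dept": "engineering"},
    {"dept_id": 2, "dept": "research"},
]

# An inner join expressed as a comprehension -- the shape that is the
# on-ramp Nick describes: nested iteration plus an equality filter.
joined = [
    (e["name"], d["dept"])
    for e in employees
    for d in departments
    if e["dept_id"] == d["dept_id"]
]

print(joined)
# [('ada', 'engineering'), ('grace', 'research'), ('linus', 'engineering')]
```

The same nested-loop-plus-filter shape is what an ORM query, a pandas merge, or a dask join expresses against a different backing store; that is the skills-transfer argument above.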
> At the data manipulation layer, success for PythonQL would look like
> people being able to easily get "good enough" performance for one-off
> scripts, regardless of the backing data store, with closer attention
> to detail only being needed for genuinely large data sets (where
> efficiency matters even for one-off analyses), or for frequently
> repeated operations (where wasted CPU hours show up as increased
> infrastructure expenses).

http://pandas.pydata.org/pandas-docs/stable/ecosystem.html#out-of-core
(dask, blaze, odo, ...)

http://blaze.pydata.org/

- blaze
  - | Src: https://github.com/blaze/blaze
  - | Docs: https://blaze.readthedocs.io/en/latest/rosetta-pandas.html
  - | Docs: https://blaze.readthedocs.io/en/latest/rosetta-sql.html
  - | Docs: https://blaze.readthedocs.io/en/latest/backends.html
    - "Python, Pandas, SQLAlchemy, MongoDB, PyTables, and Spark"
- dask
  - | Src: https://github.com/dask/dask
  - | Docs: http://dask.pydata.org/en/latest/#familiar-user-interface **
  - | Docs: http://dask.pydata.org/en/latest/scheduler-choice.html
    - http://xarray.pydata.org/en/stable/dask.html
- odo
  - | Src: https://github.com/blaze/blaze#odo
  - | Docs: https://odo.readthedocs.io/en/latest/#formats
- zero-copy
  - https://www.slideshare.net/wesm/memory-interoperability-in-analytics-and-machine-learning
  - https://github.com/alex/zero_buffer/blob/master/zero_buffer.py
- ibis
  - | Src: https://github.com/cloudera/ibis
    - (a backend may compile to LLVM)
    - seeAlso: blaze, dask, "bulk synchronous parallel"
  - | Docs: http://docs.ibis-project.org/sql.html
  - | Docs: http://docs.ibis-project.org/tutorial.html "Expression tutorial" jupyter notebooks
  - | Docs: http://docs.ibis-project.org/
    - Apache Impala (incubating)
    - Apache Kudu (incubating)
    - Hadoop Distributed File System (HDFS)
    - PostgreSQL (Experimental)
    - SQLite
    - [ SQLAlchemy: { ... } ]
      - | Src: https://github.com/cloudera/ibis/blob/master/ibis/sql/alchemy.py
- apache beam
  - https://beam.apache.org/documentation/sdks/python/
  - https://beam.apache.org/get-started/quickstart-py/ (pip install apache-beam)
  - https://beam.apache.org/documentation/sdks/pydoc/0.6.0/
    - apache_beam.transforms
  - https://beam.apache.org/documentation/programming-guide/#transforms "Applying transforms"

Somewhere in this list, these become big data tools.

> >> There's no question that folks dealing with sufficiently large data
> >> sets with sufficiently stringent performance requirements are
> >> eventually going to want to reach for handcrafted SQL or a distributed
> >> computation framework like dask, but that's not really any different
> >> from our standard position that when folks are attempting to optimise
> >> a hot loop, they're eventually going to have to switch to something
> >> that can eliminate the interpreter's default runtime object management
> >> overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an
> >> extension module in a different language entirely). It isn't an
> >> argument against making it easier for folks to postpone the point
> >> where they find it necessary to reach for the "something else" that
> >> takes them beyond Python's default capabilities.
> >
> > Don't know, for example one of the wrappers is going to be an Apache
> > Spark wrapper, so you could quickly hack up a PythonQL query that would
> > be run on a distributed platform.
>
> Right, I meant this in the same sense that folks using an ORM like SQL
> Alchemy may eventually hit a point where rather than trying to
> convince the ORM to emit the SQL they want to run, it's easier to just
> bypass the ORM layer and write the exact SQL they want.
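As a stdlib-only sketch of that "bypass the ORM layer" step (using sqlite3 here only because it ships with Python; the same pattern applies to any DBAPI driver): once you know the exact SQL you want, run it through the DBAPI directly, binding *values* via placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO points (x, y) VALUES (?, ?)",
    [(1, 2), (3, 4), (5, 6)],
)

# The "exact SQL we want", with the value bound as a ?-placeholder
# rather than pasted into the string. Identifiers (table and column
# names) can never be placeholders and must come from trusted code.
rows = conn.execute(
    "SELECT x, y FROM points WHERE x > ? ORDER BY x", (1,)
).fetchall()
print(rows)  # [(3, 4), (5, 6)]
```

The trade-off Nick describes is exactly this: the raw query is precise and fast to write, but it is no longer portable across the backing stores that the ORM abstracted over.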
At that point one can either:

- reflect the tables/mappings at devtime
- reflect the tables/mappings at runtime

And then run the raw DBAPI query (using appropriate query interpolation
(-> i-strings and scoped configuration state)):

    session.execute("SELECT dbapi_version FROM ?", "tbl;name")

> It's worthwhile attempting to reduce the number of cases where folks
> feel obliged to do that, but at the same time, abstraction layers need
> to hide at least some lower level details if they're going to actually
> work properly.
>
> >> = Option 1 =
> >>
> >> Fully commit to the model of allowing alternate syntactic dialects to
> >> run atop Python interpreters. In Hylang and PythonQL we have at least
> >> two genuinely interesting examples of that working through the text
> >> encoding system, as well as other examples like Cython that work
> >> through the extension module system.
> >>
> >> So that's an opportunity to take this from "Possible, but a bit hacky"
> >> to "Pluggable source code translation is supported at all levels of
> >> the interpreter, including debugger source maps, etc" (perhaps by
> >> borrowing ideas from other ecosystems like Java, JavaScript, and .NET,
> >> where this kind of thing is already a lot more common).
> >>
> >> The downside of this approach is that actually making it happen would
> >> be getting pretty far afield from the original PythonQL goal of
> >> "provide nicer data manipulation abstractions in Python", and it
> >> wouldn't actually deliver anything new that can't already be done with
> >> existing import and codec system features.
> >
> > This would be great anyways, if we could rely on some preprocessor
> > directive, instead of hacking encodings, this could be nice.
>
> Victor Stinner wrote up some ideas about that in PEP 511:
> https://www.python.org/dev/peps/pep-0511/
>
> Preprocessing is one of the specific use cases considered:
> https://www.python.org/dev/peps/pep-0511/#usage-2-preprocessor
>
> >> = Option 2 =
> >>
> >> ...
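For a feel of what a PEP 511 preprocessor hook would plug in, here is a stdlib-only toy (the pluggable registration API is the part PEP 511 would add; the transformer itself can already be written today with the ast module). It rewrites every integer constant 2 into 3 before compiling:

```python
import ast

class SwapConstant(ast.NodeTransformer):
    """Toy source transformer: replace the constant 2 with 3."""

    def visit_Constant(self, node):
        if node.value == 2:
            return ast.copy_location(ast.Constant(value=3), node)
        return node

# Parse -> transform -> compile -> exec, which is the manual version of
# the pipeline PEP 511 proposes to make pluggable interpreter-wide.
tree = ast.parse("result = 2 + 2")
tree = ast.fix_missing_locations(SwapConstant().visit(tree))
namespace = {}
exec(compile(tree, "<transformed>", "exec"), namespace)
print(namespace["result"])  # 6
```

A PythonQL-style dialect would do the same kind of thing at the source or AST level, just with a much larger rewrite than this toy constant swap.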
> >> given optionally delayed
> >> rendering of interpolated strings, PythonQL could be used in the form:
> >>
> >>     result = pyql(i"""
> >>         (x,y)
> >>         for x in {range(1,8)}
> >>         for y in {range(1,7)}
> >>         if x % 2 == 0 and
> >>            y % 2 != 0 and
> >>            x > y
> >>         """)
> >>
> >> I personally like this idea (otherwise I wouldn't have written PEP 501
> >> in the first place), and the necessary technical underpinnings to
> >> enable it are all largely already in place to support f-strings. If
> >> the PEP were revised to show examples of using it to support
> >> relatively seamless calling back and forth between Hylang, PythonQL
> >> and regular Python code in the same process, that might be intriguing
> >> enough to pique Guido's interest (and I'm open to adding co-authors
> >> that are interested in pursuing that).
> >
> > What would be the difference between this and just executing a PythonQL
> > string for us, getting local and global variables into PythonQL scope?
>
> The big new technical capability that f-strings introduced is that the
> compiler can see the variable references in the embedded expressions,
> so f-strings "just work" with closure references, whereas passing
> locals() and globals() explicitly is:
>
> 1. slow (since you have to generate a full locals dict);
> 2. incompatible with the use of closure variables (since they're not
> visible in either locals() *or* globals())
>
> The i-strings concept takes that closure-compatible interpolation
> capability and separates it from the str.format based rendering step.
>
> From a speed perspective, the interpolation aspects of this approach
> are so efficient they rival simple string concatenation:
>
> $ python -m perf timeit -s 'first = "Hello"; second = " World!"' 'first + second'
> .....................
> Mean +- std dev: 71.7 ns +- 2.1 ns
>
> $ python -m perf timeit -s 'first = "Hello"; second = " World!"' 'f"{first}{second}"'
> .....................
> Mean +- std dev: 77.8 ns +- 2.5 ns
>
> Something like pyql that did more than just concatenate the text
> sections with the text values of the embedded expressions would still
> need some form of regex-style caching strategy to avoid parsing the
> same query string multiple times, but the Python interpreter would
> handle the task of breaking up the string into the text sections and
> the interpolated Python expressions.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/