Typo on the airflow link: it's https://issues.apache.org/jira/browse/AIRFLOW-746 Sorry
Le mer. 11 janv. 2017 à 15:48, Gael Magnan <gaelmag...@gmail.com> a écrit : > Thanks for all the information, > > currently what I have is working iso with what was existing, just need to > add some doc and tests, but I have a few questions. > > Currently airflow/urtils/db.py initdb function creates a lot of > connections by default. > I understand that they are used for the tests, but why are they created > even on production environment? > Looking into that made me realize that we can create connections of a type > that is not defined, (ex: beeline, aws, ...) by code just like you said but > it's to the best of my knowledge impossible with the UI. > > By the way, should I create the beeline and aws connection types? So > everyone can create them through the UI? > > I've created a JIRA issue: > https://issues.apache.org/jira/browse/AIRFLOW-235, and a first draft of > the PR: https://github.com/apache/incubator-airflow/pull/1981, more tests > and doc coming + all the changes depending on where this discussion takes > us. > > > Le mar. 10 janv. 2017 à 21:55, Maxime Beauchemin < > maximebeauche...@gmail.com> a écrit : > > Note that the `Conection.extra` attribute was designed to store anything > that is specific to a connection type. Here's an example of how it's used > in the HiveHook: > > https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L75 > > Also note that currently there's no enforcement of connection type that > match a hook, so you can use any connection with any hook. So you could set > a new connection for mongo with no type at all and still use your local > MongoHook without changing the Airflow source code. > > The one part that is missing is the association from the connection to the > hook here in Connection.get_hook > > https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L620 > This allows getting the hook from the connection, and allows for the Data > Profiling section of the UI to know how to execute a query against that > connection type. > > Max > > On Tue, Jan 10, 2017 at 2:58 AM, Gael Magnan <gaelmag...@gmail.com> wrote: > > > @George Leslie-Waksman: > > It could be done, but since it's marked as deprecated everywhere and > Alex > > Van Boxel asked me tu used entry points I'm not sure we should do both. > > > > @Jeremiah Lowin: > > I was thinking not changing much of the api, juste the implementation > > behind it. > > So for example, adding the mongodb connection type would just be doing > > the following (assuming we have a MongodbHook): > > ``` > > from airflow.models import BaseConnectionType > > > > class MongodbConnectionType(BaseConnectionType): > > name = 'mongodb' > > description = 'MongoDB' > > > > def get_hook(conn_id): > > try: > > from airflow.hooks.mongodb_hook import MongodbHook > > return MongodbHook(conn_id=conn_id) > > except: > > pass > > ``` > > > > and if we want to handle the query params (and we do) we could just do > > something like overriding the parse_from_uri method (not final code): > > ``` > > @classmethod > > def parse_from_uri(cls, uri): > > temp_uri = urlparse(uri) > > (host, schema, login, password, port, > > extras) = super().parse_from_uri(cls, uri) > > extras = dict([param.split('=', 1) for param in > > temp_uri.query.split('&')]) > > return host, schema, login, password, port, extras > > ``` > > > > So if you defined a connection using an uri the way you are able now, > and > > this URI started with mongodb, > > it would be extracted and a connection created. > > > > > > @Maxime Beauchemin: > > By can't easily I meant that to had a connection type you have to go > > through making a pr to airflow, you can't just use a plugin. > > I think all you have to do to add a new one is modify the Connection > > class in models. But it's still hard coded. > > > > I agree with you, we could easily use just a data structure for most of > > the connections, especially the ones existing now, > > but it would make it harder to add a connection whose uri format is not > > consistant with the other ones (not to mention ones that don't use uri), > > except if we also handle a string format that would parse the uri and > set > > the right fields. > > > > Le mar. 10 janv. 2017 à 01:35, Maxime Beauchemin < > > maximebeauche...@gmail.com> > > a écrit : > > > > > About *"you can't **easily add a connection type without modifying the > > > airflow code source*", can we start by listing out the places that need > > to > > > be touched up when adding a connection? > > > > > > Here's what I could find: > > > https://github.com/apache/incubator-airflow/blob/master/ > > > airflow/models.py#L516 > > > https://github.com/apache/incubator-airflow/blob/master/ > > > airflow/utils/db.py#L109 > > > https://github.com/apache/incubator-airflow/blob/master/ > > > airflow/models.py#L620 > > > > > > I didn't dig super deep but it seems like what we need is a > configurable > > > connection configuration blob that informs the connection type name, > the > > > verbose name and the related hook location. > > > > > > Maybe a solution is to have a `CONNECTION_TYPES` object in settings.py > > that > > > could be altered in different environments. This CONNECTION_TYPE would > > > store the name, the verbose_name and the related hook's path as a > string > > > `airflow.hooks.mysql_hook.MySqlHook` or something like that. It may not > > be > > > that far from what you were proposing, and whether it should be a class > > or > > > a data structure is arguable, but I'd like to keep configuration > elements > > > easily serializable so I'd vote for a data structure using only > > primitives. > > > > > > Max > > > > > > On Mon, Jan 9, 2017 at 5:13 AM, Alex Van Boxel <a...@vanboxel.be> > wrote: > > > > > > > I was actually going to propose something different with > entry-points, > > > but > > > > your requirement beat me to it (but that's ok :-). Actually I think > > with > > > > this mechanism people would be able to extend Airflow connection > > > mechanism > > > > (and later other stuff) by doing *pip install > > > airflow-sexy-new-connection* > > > > (for example). > > > > > > > > On Mon, Jan 9, 2017 at 1:39 PM Gael Magnan <gaelmag...@gmail.com> > > wrote: > > > > > > > > > Thank you for the read, I'm gonna look at it, it's probably gonna > be > > > > better > > > > > that what I have. > > > > > > > > > > Point taken about the URI, I'll see if i can find something generic > > > > enough > > > > > to handle all those cases. > > > > > > > > > > Le lun. 9 janv. 2017 à 13:36, Alex Van Boxel <a...@vanboxel.be> a > > > écrit > > > > : > > > > > > > > > > > Thanks a lot, yes it clarifies a lot and I do agree you really > need > > > to > > > > > hack > > > > > > inside Airflow to add a Connection type. While you're working at > > this > > > > > could > > > > > > you have a look at the standard python *entry-point mechanism* > for > > > > > > registering Connection types/components. > > > > > > > > > > > > A good read on this: > > > > > > > > > > > > > > > > > http://docs.pylonsproject.org/projects/pylons-webframework/ > > > > en/latest/advanced_pylons/entry_points_and_plugins.html > > > > > > > > > > > > My first though would be that just by adding an entry to the > > factory > > > > > method > > > > > > would be enough to register your Connection + ConnectionType and > > UI. > > > > > > > > > > > > Also note that not everything works with a URI. The Google Cloud > > > > > Connection > > > > > > doesn't have one, it uses a secret key file stored on disk, so > > don't > > > > > force > > > > > > every connection type to work with URI's. > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Jan 9, 2017 at 1:15 PM Gael Magnan <gaelmag...@gmail.com > > > > > > wrote: > > > > > > > > > > > > > Yes sure, > > > > > > > > > > > > > > The question was the following: > > > > > > > "I was looking at the code of the connections, and I realized > you > > > > can't > > > > > > > easily add a connection type without modifying the airflow code > > > > > source. I > > > > > > > wanted to create a mongodb connection type, but I think the > best > > > > > approche > > > > > > > would be to refactor connections first. Thoughts anyone?" > > > > > > > > > > > > > > The answer of Bolke de Bruin was: "making it more generic would > > be > > > > > > > appreciated" > > > > > > > > > > > > > > So basically the way the code is set up actually every types of > > > > > > connection > > > > > > > existing is defined though a list in the Connection class. It > > > > > implements > > > > > > > exactly the same code for parsing uri to get connections info > and > > > > > doesn't > > > > > > > allow for a simple way to get back the uri from the connection > > > infos. > > > > > > > > > > > > > > I need to add a mongodb connection and a way to get it back as > a > > > uri, > > > > > so > > > > > > i > > > > > > > could use an other type of connection and play around with that > > or > > > > > juste > > > > > > > add one more hard coded connection type, but I though this > might > > be > > > > > > > something that comes back regularly and having a simple way to > > plug > > > > in > > > > > > new > > > > > > > types of connection would make it easier for anyone to > > contribute a > > > > new > > > > > > > connection type. > > > > > > > > > > > > > > Hope this clarifies my proposal. > > > > > > > > > > > > > > Le lun. 9 janv. 2017 à 12:46, Alex Van Boxel <a...@vanboxel.be > > > > a > > > > > écrit > > > > > > : > > > > > > > > > > > > > > > Hey Gael, > > > > > > > > > > > > > > > > could you please recap the question here and provide some > > > context. > > > > > Not > > > > > > > > everyone on the mailinglist is actively following Gitter, > > > including > > > > > me. > > > > > > > > With some context it would be easier to give feedback. > Thanks. > > > > > > > > > > > > > > > > On Mon, Jan 9, 2017 at 11:15 AM Gael Magnan < > > > gaelmag...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > following my question on gitter the other day and the > > response > > > > from > > > > > > > Bolke > > > > > > > > > de Bruin, I've started working on refactoring the > connections > > > in > > > > > > > airflow. > > > > > > > > > > > > > > > > > > Before submitting a PR I wanted to share my proposal with > you > > > and > > > > > get > > > > > > > > > feedbacks. > > > > > > > > > > > > > > > > > > The idea is quite simple, I've divided the Connection class > > in > > > > two, > > > > > > > > > Connection and ConnectionType, connection has the same > > > interface > > > > it > > > > > > had > > > > > > > > > before plus a few methods, but the class keeps a reference > > to a > > > > > > > > dictionary > > > > > > > > > of registered ConnectionType. It delegates the work of > > parsing > > > > from > > > > > > > URI, > > > > > > > > > formatting to URI (added) and getting the hook to the > > > > > ConnectionType > > > > > > > > > associated with the conn_type. > > > > > > > > > > > > > > > > > > I've thought of two ways of registering new > ConnectionTypes, > > > the > > > > > > first > > > > > > > is > > > > > > > > > making the BaseConnectionType use a metaclass that > registered > > > any > > > > > new > > > > > > > > > ConnectionType with Connection when the class is declared, > it > > > > would > > > > > > > > require > > > > > > > > > the less work to extend the connection module, as just > > > importing > > > > > the > > > > > > > file > > > > > > > > > with the connection would do the trick. > > > > > > > > > The second one is juste to have a function/classmethod that > > you > > > > > call > > > > > > > > > manually to register your connection. It would be simpler > to > > > > > > understand > > > > > > > > but > > > > > > > > > requires more work every time you create a new > > ConnectionType. > > > > > > > > > > > > > > > > > > Hope this proposal is clear enough, and I'm waiting for > > > feebacks > > > > > and > > > > > > > > > possible improvements. > > > > > > > > > > > > > > > > > > Regards > > > > > > > > > Gael Magnan de Bornier > > > > > > > > > > > > > > > > > -- > > > > > > > > _/ > > > > > > > > _/ Alex Van Boxel > > > > > > > > > > > > > > > > > > > > > -- > > > > > > _/ > > > > > > _/ Alex Van Boxel > > > > > > > > > > > > > > > -- > > > > _/ > > > > _/ Alex Van Boxel > > > > > > > > > > >