[ 
https://issues.apache.org/jira/browse/AIRFLOW-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3615.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.10.3

> Connection parsed from URI - case-insensitive UNIX socket paths in python 2.7 
> -> 3.5 (but not in 3.6) 
> ------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3615
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3615
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Jarek Potiuk
>            Assignee: Kamil Bregula
>            Priority: Major
>             Fix For: 1.10.3
>
>
> There is a case-sensitivity problem when parsing URIs for database 
> connections that use local UNIX sockets rather than TCP connections.
> For a local UNIX socket, the hostname part of the URI contains the 
> URL-encoded socket path rather than an actual hostname. If this path 
> contains uppercase characters, urlparse deliberately lowercases them 
> when parsing. This is perfectly fine for real hostnames: according to 
> https://tools.ietf.org/html/rfc3986#section-6.2.3, case normalisation 
> should be done for hostnames.
> However, urlparse still treats this part as the hostname even when the URI 
> contains no host but only a local path (i.e. when the location starts with 
> %2F ("/")). What's more, the hostname is converted to lowercase in python 
> 2.7 - 3.5. Surprisingly, this is somewhat "fixed" in 3.6: if the URL 
> location starts with %2F, the hostname is no longer normalized to lowercase 
> (see the snippets below showing the behaviour for different python 
> versions).
> This problem bubbles up in Airflow's Connection. Airflow uses urlparse to 
> get the hostname/path in models.py:parse_from_uri, and for UNIX sockets 
> this is done via hostname. There is no other reliable way when using 
> urlparse, because the URI can also contain 'authority' information 
> (user/password) and it is urlparse's job to separate it out. Airflow's 
> Connection similarly does not distinguish between TCP and local socket 
> connections and uses the host field to store the socket path (which is 
> case sensitive, however). So you can use UPPERCASE when you define a 
> connection in the database, but this is a problem when parsing connections 
> from environment variables, because we currently cannot pass a URI whose 
> socket path contains UPPERCASE characters.
> Since urlparse is really meant for parsing URLs and is not good at parsing 
> non-URL URIs, we should probably use a different parser that handles more 
> generic URIs, including not lowercasing the path in any version of python.
> I think we could also consider adding a local path field to the Connection 
> model and using it instead of hostname to store the socket path. This 
> approach would be the "correct" one, but it might introduce some 
> compatibility issues, so maybe it's not worth it, considering that host is 
> case sensitive in Airflow.
> Snippets showing urlparse behaviour in different python versions:
> {quote}Python 2.7.10 (default, Aug 17 2018, 19:45:58)
>  [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> from urlparse import urlparse,unquote
>  >>> conn = urlparse("http://AAA")
>  >>> conn.hostname
>  'aaa'
>  >>> conn = urlparse("http://%2FAAA")
>  >>> conn.hostname
>  '%2faaa'
> {quote}
>  
> {quote}Python 3.5.4 (v3.5.4:3f56838976, Aug 7 2017, 12:56:33)
>  [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> from urlparse import urlparse,unquote
>  Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  ImportError: No module named 'urlparse'
>  >>> from urllib.parse import urlparse,unquote
>  >>> conn = urlparse("http://AAA")
>  >>> conn.hostname
>  'aaa'
>  >>> conn = urlparse("http://%2FAAA")
>  >>> conn.hostname
>  '%2faaa'
> {quote}
>  
> {quote}Python 3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 03:02:14)
>  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> from urllib.parse import urlparse,unquote
>  >>> conn = urlparse("http://AAA")
>  >>> conn.hostname
>  'aaa'
>  >>> conn = urlparse("http://%2FAAA")
>  >>> conn.hostname
>  '%2FAAA'
> {quote}
>  
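One way to get the same result on every python version is to avoid the `hostname` attribute and take the host from `netloc`, which urlparse/urlsplit does not lowercase. The following is only a minimal sketch of that idea, not the change that was merged for this issue; the function name `case_preserving_host` is hypothetical, and the sketch assumes Python 3 (on 2.7 the same names come from the `urlparse` and `urllib` modules) and does not handle bracketed IPv6 literals.

```python
from urllib.parse import unquote, urlsplit


def case_preserving_host(uri):
    """Return the decoded host part of ``uri`` without lowercasing it.

    Hypothetical helper for illustration only.
    """
    # netloc keeps the original case; only the .hostname property lowercases.
    netloc = urlsplit(uri).netloc
    # Drop an optional "user:password@" prefix (text before the last "@").
    hostport = netloc.rpartition("@")[2]
    # Drop an optional ":port" suffix; keep the text if it is not a port.
    host, sep, port = hostport.rpartition(":")
    if not sep or (port and not port.isdigit()):
        host = hostport
    # Decode %2F and friends so the socket path becomes a real path.
    return unquote(host)


socket_path = case_preserving_host(
    "postgres://user:pw@%2Ftmp%2FMySocket%2Fpg.sock:5432/mydb"
)
print(socket_path)
```

With this approach the uppercase characters of a URL-encoded socket path survive parsing, while user, password and port can still be read from the usual urlsplit attributes.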



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
