[ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:
--------------------------------
    Description: 
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in infra/python/env, some come from 
the toolchain, and some are generated and live in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl
<module 'sasl' from 
'/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
>>> import requests
>>> requests
<module 'requests' from 
'/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
>>> import Logging
>>> Logging
<module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
>>> import thrift
>>> thrift
<module 'thrift' from 
'/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by calling 
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that 
are responsible for cobbling together a PYTHONPATH based on known locations and 
current env variables.
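
For anyone who wants to repeat the check above without typing each import by hand, here is a minimal sketch. The helper names {{module_origin}} and {{report}} are made up for illustration and are not part of the Impala tree; the module list is simply the four modules from the session above.

```python
import importlib


def module_origin(name):
    """Return the file a module is loaded from, or a short note if unknown."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "<not importable>"
    # Built-in modules such as sys carry no __file__ attribute.
    return getattr(module, "__file__", None) or "<no file>"


def report(names):
    """Map each module name to the location it resolves to right now."""
    return {name: module_origin(name) for name in names}


if __name__ == "__main__":
    # Modules taken from the interpreter session in this description.
    for name, origin in sorted(report(["sasl", "requests", "Logging", "thrift"]).items()):
        print("%-10s %s" % (name, origin))
```

Running it under each of the different launchers should make it obvious which PYTHONPATH entry wins for each module in that context.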

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 in the build process. 
The dependency seems to come in during test data load (specifically, when 
calling testdata/bin/load_nested.py), mainly because of a well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
rely on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same modules, and there's no real need to keep it pinned to 0.9.3. However, 
since calling impala-shell.sh winds up invoking {{set-pythonpath.sh}}, the 
same script that sets up the environment during building or testing, the 
shell winds up defaulting to thrift 0.9.3 as well.
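
The mechanism here is just search-path ordering: whichever directory containing a given package appears first on the path is the one that gets imported. A small self-contained illustration follows, assuming Python 3. It uses a throwaway package called {{fakethrift}} in temporary directories (both names are invented for this demo; nothing here touches the real thrift or the Impala tree).

```python
import importlib
import os
import sys
import tempfile


def make_fake_package(root, dirname, pkg, version):
    """Create root/dirname/pkg/__init__.py exposing a VERSION string."""
    site_dir = os.path.join(root, dirname)
    pkg_dir = os.path.join(site_dir, pkg)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("VERSION = %r\n" % version)
    return site_dir


def version_seen_with(search_path, pkg):
    """Import pkg with search_path prepended to sys.path; return its VERSION."""
    saved_path = list(sys.path)
    sys.modules.pop(pkg, None)
    sys.path[:0] = search_path
    try:
        importlib.invalidate_caches()
        return importlib.import_module(pkg).VERSION
    finally:
        # Restore the interpreter state so repeated calls are independent.
        sys.path[:] = saved_path
        sys.modules.pop(pkg, None)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    old = make_fake_package(root, "thrift-0.9.3", "fakethrift", "0.9.3")
    new = make_fake_package(root, "thrift-0.11.0", "fakethrift", "0.11.0")
    print(version_seen_with([old, new], "fakethrift"))  # prints 0.9.3
    print(version_seen_with([new, old], "fakethrift"))  # prints 0.11.0
```

Both versions are installed in both runs; only the order of the path entries changes which one the import picks up.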

The dependency on thrift 0.9.3 is one of the many limitations restricting the 
impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is 
available -- we just have to use it. The way to accomplish this is by 
decoupling the impala-shell from calling either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}.
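
As a rough sketch of what "decoupled" could look like, the function below assembles a search path for the shell alone, with the newer thrift on it and nothing inherited from the build/test setup. This is not the actual change. The function name, its parameters, and the directory layout are assumptions modeled on the paths shown in the session above; in particular, the real toolchain directory name for thrift 0.11.0 may differ, which is why it is a parameter.

```python
import os


def shell_pythonpath(impala_home, thrift_dir="thrift-0.11.0", py_version="2.7"):
    """Build a PYTHONPATH value for impala-shell only.

    All directory names are assumptions based on the layout seen in the
    interpreter session in this description, not verified locations.
    """
    entries = [
        # The shell's own sources.
        os.path.join(impala_home, "shell"),
        # Generated thrift bindings.
        os.path.join(impala_home, "shell", "gen-py"),
        # Thrift runtime from the toolchain (hypothetical directory name).
        os.path.join(impala_home, "toolchain", thrift_dir, "python", "lib",
                     "python" + py_version, "site-packages"),
    ]
    return os.pathsep.join(entries)


if __name__ == "__main__":
    print(shell_pythonpath("/home/systest/Impala"))
```

The point is only that the shell's path is computed in one place, independent of whatever the build and test scripts need.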

  was:
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in infra/python/env, some come from 
the toolchain, and some are generated and live in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl
<module 'sasl' from 
'/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
>>> import requests
>>> requests
<module 'requests' from 
'/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
>>> import Logging
>>> Logging
<module 'Logging' from '/home/systest/Impala/shell/gen-py/Logging/__init__.pyc'>
>>> import thrift
>>> thrift
<module 'thrift' from 
'/home/systest/Impala/toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/__init__.pyc'>
{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by scripts like 
{{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that cobble 
together a PYTHONPATH based on known locations and current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 in the build process. 
The dependency seems to come in during test data load (specifically, when 
calling testdata/bin/load_nested.py), mainly because of a well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
rely on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same modules, and there's no real need to keep it pinned to 0.9.3. However, 
since calling impala-shell.sh winds up invoking {{set-pythonpath.sh}}, the 
same script that sets up the environment during building or testing, the 
shell winds up defaulting to thrift 0.9.3 as well.

The dependency on thrift 0.9.3 is one of the many limitations restricting the 
impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is 
available -- we just have to use it. The way to accomplish this is by 
decoupling the impala-shell from calling either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}.


> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9489
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9489
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: Impala 3.4.0
>            Reporter: David Knupp
>            Assignee: David Knupp
>            Priority: Major


