[jira] [Work started] (IMPALA-9492) TestRecoverPartitions::test_unescaped_string_partition failing on S3

2020-03-12 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9492 started by Quanlong Huang.
--
> TestRecoverPartitions::test_unescaped_string_partition failing on S3
> 
>
> Key: IMPALA-9492
> URL: https://issues.apache.org/jira/browse/IMPALA-9492
> Project: IMPALA
>  Issue Type: Bug
>Reporter: David Rorke
>Assignee: Quanlong Huang
>Priority: Blocker
>  Labels: build-failure
>
> TestRecoverPartitions::test_unescaped_string_partition is a new test recently 
> added by [https://gerrit.cloudera.org/#/c/15278/]
> It's failing when run against S3:
> {noformat}
> metadata/test_recover_partitions.py:374: in test_unescaped_string_partition
> assert self.count_partition(result.data) == 4
> E   assert 2 == 4
> E+  where 2 =   0x725e650>>(['"\t-1\t0\t0B\tNOT CACHED\tNOT 
> CACHED\tTEXT\tfalse\ts3a://impala-test-uswest2-1/test-warehouse/test_unescaped_string_...ouse/test_unescaped_string_partition_2265687.db/test_unescaped_string_partition/p=%27",
>  'Total\t-1\t0\t0B\t0B\t\t\t\t'])
> E+where  > = 
>  0x725e650>.count_partition
> E+and   ['"\t-1\t0\t0B\tNOT CACHED\tNOT 
> CACHED\tTEXT\tfalse\ts3a://impala-test-uswest2-1/test-warehouse/test_unescaped_string_...ouse/test_unescaped_string_partition_2265687.db/test_unescaped_string_partition/p=%27",
>  'Total\t-1\t0\t0B\t0B\t\t\t\t'] = 
> .data
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9492) TestRecoverPartitions::test_unescaped_string_partition failing on S3

2020-03-12 Thread David Rorke (Jira)
David Rorke created IMPALA-9492:
---

 Summary: TestRecoverPartitions::test_unescaped_string_partition 
failing on S3
 Key: IMPALA-9492
 URL: https://issues.apache.org/jira/browse/IMPALA-9492
 Project: IMPALA
  Issue Type: Bug
Reporter: David Rorke
Assignee: Quanlong Huang


TestRecoverPartitions::test_unescaped_string_partition is a new test recently 
added by [https://gerrit.cloudera.org/#/c/15278/]

It's failing when run against S3:
{noformat}
metadata/test_recover_partitions.py:374: in test_unescaped_string_partition
assert self.count_partition(result.data) == 4
E   assert 2 == 4
E+  where 2 = >(['"\t-1\t0\t0B\tNOT CACHED\tNOT 
CACHED\tTEXT\tfalse\ts3a://impala-test-uswest2-1/test-warehouse/test_unescaped_string_...ouse/test_unescaped_string_partition_2265687.db/test_unescaped_string_partition/p=%27",
 'Total\t-1\t0\t0B\t0B\t\t\t\t'])
E+where > = 
.count_partition
E+and   ['"\t-1\t0\t0B\tNOT CACHED\tNOT 
CACHED\tTEXT\tfalse\ts3a://impala-test-uswest2-1/test-warehouse/test_unescaped_string_...ouse/test_unescaped_string_partition_2265687.db/test_unescaped_string_partition/p=%27",
 'Total\t-1\t0\t0B\t0B\t\t\t\t'] = 
.data
{noformat}
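
For readers of the trace above: the only partition path visible in the output ends in p=%27, and %27 is the percent-encoded form of a single quote. The following is a minimal sketch of that encoding relationship only. It assumes Python 3 and the standard urllib.parse module; decode_partition_dir is a hypothetical helper written for illustration, not a function from the Impala test suite, and nothing here claims to identify the cause of the failure.

```python
from urllib.parse import quote, unquote


def decode_partition_dir(dirname):
    """Split a 'key=value' partition directory name and percent-decode the value.

    Hypothetical helper for illustration; not part of Impala.
    """
    key, _, raw_value = dirname.partition("=")
    return key, unquote(raw_value)


# The directory name seen at the end of the S3 path in the trace.
print(decode_partition_dir("p=%27"))

# Encoding a single quote with no characters treated as safe.
print(quote("'", safe=""))
```

Comparing the decoded value with the partition value the test expects is one way to check whether a listed directory corresponds to the intended partition.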
 






[jira] [Created] (IMPALA-9491) Compilation failure in KuduUtil.java

2020-03-12 Thread David Rorke (Jira)
David Rorke created IMPALA-9491:
---

 Summary: Compilation failure in KuduUtil.java
 Key: IMPALA-9491
 URL: https://issues.apache.org/jira/browse/IMPALA-9491
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: David Rorke
Assignee: Csaba Ringhofer


Build is failing with the following:
{noformat}
12:40:33 [INFO] BUILD FAILURE
12:40:33 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on 
project impala-frontend: Compilation failure: Compilation failure:
12:40:33 [ERROR] 
/data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[181,12]
 an enum switch case label must be the unqualified name of an enumeration 
constant
12:40:33 [ERROR] 
/data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[183,12]
 cannot find symbol
12:40:33 [ERROR] symbol:   method addDate(int,java.sql.Date)
12:40:33 [ERROR] location: variable key of type 
org.apache.kudu.client.PartialRow
12:40:33 [ERROR] 
/data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[239,12]
 an enum switch case label must be the unqualified name of an enumeration 
constant
12:40:33 [ERROR] 
/data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[442,45]
 cannot find symbol
12:40:33 [ERROR] symbol:   variable DATE
12:40:33 [ERROR] location: class org.apache.kudu.Type
12:40:33 [ERROR] 
/data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[468,12]
 an enum switch case label must be the unqualified name of an enumeration 
constant
{noformat}

Likely related to this change:  https://gerrit.cloudera.org/#/c/14705/






[jira] [Updated] (IMPALA-8800) Add DATE type support to Kudu scanner

2020-03-12 Thread David Rorke (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rorke updated IMPALA-8800:

Priority: Major  (was: Blocker)

> Add DATE type support to Kudu scanner
> -
>
> Key: IMPALA-8800
> URL: https://issues.apache.org/jira/browse/IMPALA-8800
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Major
>







[jira] [Updated] (IMPALA-8800) Add DATE type support to Kudu scanner

2020-03-12 Thread David Rorke (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rorke updated IMPALA-8800:

Labels:   (was: broken-build)

> Add DATE type support to Kudu scanner
> -
>
> Key: IMPALA-8800
> URL: https://issues.apache.org/jira/browse/IMPALA-8800
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Blocker
>







[jira] [Updated] (IMPALA-8800) Add DATE type support to Kudu scanner

2020-03-12 Thread David Rorke (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rorke updated IMPALA-8800:

Priority: Blocker  (was: Major)

> Add DATE type support to Kudu scanner
> -
>
> Key: IMPALA-8800
> URL: https://issues.apache.org/jira/browse/IMPALA-8800
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Blocker
>







[jira] [Updated] (IMPALA-8800) Add DATE type support to Kudu scanner

2020-03-12 Thread David Rorke (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rorke updated IMPALA-8800:

Labels: broken-build  (was: )

> Add DATE type support to Kudu scanner
> -
>
> Key: IMPALA-8800
> URL: https://issues.apache.org/jira/browse/IMPALA-8800
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Blocker
>  Labels: broken-build
>







[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Description: 
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- some 
packages are installed in infra/python/env, some come from the toolchain, and 
some are generated and live in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by calling 
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that 
are responsible for cobbling together a PYTHONPATH based on known locations and 
current env variables.
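
One way to check where a module is actually loaded from is to inspect its __file__ attribute and the effective search path. This is a minimal sketch, assuming Python 3; module_origin is a hypothetical helper, and the standard-library modules json and sys stand in for sasl, requests and thrift so that the snippet runs anywhere.

```python
import importlib
import os
import sys


def module_origin(name):
    """Return the file a module was imported from, or None if it has no file."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


# Stand-in modules; substitute the names you are curious about.
for name in ("json", "sys"):
    print(name, "->", module_origin(name))

# The search path that decides which copy of a module wins.
print("PYTHONPATH =", os.environ.get("PYTHONPATH", "(not set)"))
for entry in sys.path:
    print("  ", entry)
```

Running this under each of the launch contexts and comparing the output should show which directories each context contributes.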

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build 
process. This seems to come into play during the loading of test data 
(specifically, when calling testdata/bin/load_nested.py) mainly because at one 
point there was some well-intentioned but probably misguided attempt at code 
reuse from the test framework. The test code that gets re-used involves impyla 
and/or thrift-sasl, which currently still relies on thrift 0.9.3. So our test 
framework, and by extension the build, both inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same test modules, and there really is no need to keep it pinned to 0.9.3. 
However, since calling the impala-shell.sh winds up invoking 
{{set-pythonpath.sh}}, the same script that sets up the environment 
during building or testing, thrift 0.9.3 just kind of leaks over by default.
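
The "leak" described here follows from how Python resolves imports: the first matching entry on the search path wins. The sketch below illustrates only that general rule, assuming Python 3. The module name fakelib, the helper functions, and the temporary directories are all hypothetical stand-ins for two thrift versions; it does not reproduce what set-pythonpath.sh actually does.

```python
import importlib
import os
import sys
import tempfile


def write_fake_package(root, version):
    """Create root/<version>/fakelib.py exposing VERSION; return that directory."""
    pkg_dir = os.path.join(root, version)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "fakelib.py"), "w") as f:
        f.write("VERSION = %r\n" % version)
    return pkg_dir


def resolve_version(search_path):
    """Import fakelib with the given directories searched first, in order."""
    sys.modules.pop("fakelib", None)
    importlib.invalidate_caches()
    original = list(sys.path)
    sys.path[:0] = search_path
    try:
        return importlib.import_module("fakelib").VERSION
    finally:
        # Restore global state so repeated calls are independent.
        sys.path[:] = original
        sys.modules.pop("fakelib", None)


def demo():
    """Return the versions resolved for the two possible orderings."""
    with tempfile.TemporaryDirectory() as root:
        old = write_fake_package(root, "0.9.3")
        new = write_fake_package(root, "0.11.0")
        return resolve_version([old, new]), resolve_version([new, old])


print(demo())
```

Whichever directory a shared setup script places first is the version every caller gets, which is why a launcher that builds its own path can choose a different version.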

As it turns out, thrift 0.9.3 is also one of the many limitations restricting 
the impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 
is available -- we just have to use it. And the way to accomplish that is by 
decoupling the impala-shell from relying on either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}. 

As a first pass, we can address the dev environment by just having 
{{impala-shell.sh}} itself do whatever is required to find python dependencies, 
and we can specify thrift-0.11.0 there. Also, thrift 0.11.0 should be used by 
both of the scripts used to create the tarballs that package the impala-shell 
for customer environments. Neither of these should adversely affect building Impala or 
running the py.test test framework.

  was:
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by calling 
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that 
are responsible for cobbling together a PYTHONPATH based on known locations and 
current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test bases e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a 

[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Description: 
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by calling 
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that 
are responsible for cobbling together a PYTHONPATH based on known locations and 
current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build 
process. This seems to come into play during the loading of test data 
(specifically, when calling testdata/bin/load_nested.py) mainly because at one 
point there was some well-intentioned but probably misguided attempt at code 
reuse from the test framework. The test code that gets re-used involves impyla 
and/or thrift-sasl, which currently still relies on thrift 0.9.3. So our test 
framework, and by extension the build, both inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same test modules, and there really is no need to keep it pinned to 0.9.3. 
However, since calling the impala-shell.sh winds up invoking 
{{set-pythonpath.sh}}, the same script that sets up the environment 
during building or testing, thrift 0.9.3 just kind of leaks over by default.

As it turns out, thrift 0.9.3 is also one of the many limitations restricting 
the impala-shell to python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 
is available -- we just have to use it. And the way to accomplish that is by 
decoupling the impala-shell from relying on either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}. 

  was:
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by calling 
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that 
are responsible for cobbling together a PYTHONPATH based on known locations and 
current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test bases e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 

[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Description: 
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by scripts like 
{{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that cobble 
together a PYTHONPATH based on known locations and current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same modules, and there's no real need to keep it pinned to 0.9.3. However, 
since calling the impala-shell.sh winds up invoking {{set-pythonpath.sh}}, the 
same script that sets up the environment during building or testing, the 
shell winds up defaulting to thrift 0.9.3 as well.

thrift 0.9.3 is one of the many limitations restricting the impala-shell to 
python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is available -- we 
just have to use it. The way to accomplish this is by decoupling the 
impala-shell from calling either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}.

  was:
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just the collection of 
available modules that happen to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by scripts like 
{{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that cobble 
together a PYTHONPATH based on known locations and current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test bases e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the other 

[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Description: 
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by calling 
scripts like {{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that 
are responsible for cobbling together a PYTHONPATH based on known locations and 
current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same modules, and there's no real need to keep it pinned to 0.9.3. However, 
since calling the impala-shell.sh winds up invoking {{set-pythonpath.sh}}, the 
same script that sets up the environment during building or testing, the 
shell winds up defaulting to thrift 0.9.3 as well.

thrift 0.9.3 is one of the many limitations restricting the impala-shell to 
python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is available -- we 
just have to use it. The way to accomplish this is by decoupling the 
impala-shell from calling either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}.

  was:
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just whatever 
collection of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by scripts like 
{{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that cobble 
together a PYTHONPATH based on known locations and current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test bases e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The 

[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Description: 
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. 
Generally speaking, if you launch impala-python and import a module, it's not 
necessarily easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just whatever collection 
of modules happens to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by scripts like 
{{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that cobble 
together a PYTHONPATH based on known locations and current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 during the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same modules, and there's no real need to keep it pinned to 0.9.3. However, 
since calling the impala-shell.sh winds up invoking {{set-pythonpath.sh}}, the 
same script that sets up the environment during building or testing, the 
shell winds up defaulting to thrift 0.9.3 as well.

thrift 0.9.3 is one of the many limitations restricting the impala-shell to 
python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is available -- we 
just have to use it. The way to accomplish this is by decoupling the 
impala-shell from calling either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}.

  was:
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. Generally 
speaking, if you launch impala-python and import a module, it's not necessarily 
easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just the collection of 
modules that happen to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by scripts like 
{{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that cobble 
together a PYTHONPATH based on known locations and current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 in the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the 

[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Description: 
[Note: this JIRA was filed in relation to the ongoing effort to make the 
impala-shell compatible with python 3]

The impala python development environment is a fairly convoluted affair -- a 
number of packages are installed in the infra/python/env, some of it comes from 
the toolchain, some of it is generated and lives in the shell directory. Generally 
speaking, if you launch impala-python and import a module, it's not necessarily 
easy to predict where the module might live.
{noformat}
$ python
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sasl
>>> sasl

>>> import requests
>>> requests

>>> import Logging
>>> Logging

>>> import thrift
>>> thrift

{noformat}
Really, there is no one coherent environment -- there's just the collection of 
modules that happen to be available at a given time for a given type 
of invocation, all of which is accomplished behind the scenes by scripts like 
{{bin/set-pythonpath.sh}} and {{bin/impala-python-common.sh}} that cobble 
together a PYTHONPATH based on known locations and current env variables.

As far as I can tell, there are three important contexts where python comes 
into play...
* during the build process (used during data load, e.g., 
testdata/bin/load_nested.py)
* when running the py.test-based e2e tests
* whenever the impala-shell is invoked

As noted by IMPALA-7825 (and also in a conversation I had with 
[~stakiar_impala_496e]), we're dependent on thrift 0.9.3 in the build process. It 
seems to happen during test data load (specifically, when calling 
testdata/bin/load_nested.py) mainly because there was some well-intentioned but 
probably misjudged attempt at code reuse from the test framework. The test code 
that gets re-used involves impyla and/or thrift-sasl, which currently still 
relies on thrift 0.9.3. So our test framework, and by extension the build, both 
inherit the same limitation.

The impala-shell, on the other hand, luckily doesn't directly reuse any of the 
same modules, and there's no real need to keep it pinned to 0.9.3. However, 
since calling impala-shell.sh winds up invoking {{set-pythonpath.sh}}, the 
same script that sets up the environment during building or testing, the 
shell winds up defaulting to thrift 0.9.3 as well.

thrift 0.9.3 is one of the many limitations restricting the impala-shell to 
python 2. Luckily, with IMPALA-7924 resolved, thrift-0.11.0 is available -- we 
just have to use it. The way to accomplish this is by decoupling the 
impala-shell from calling either {{set-pythonpath.sh}} or 
{{impala-python-common.sh}}.
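
To check which copy of a module a given invocation actually resolves, one option is to ask Python directly rather than reason about PYTHONPATH by hand. The following is a minimal sketch and not part of the Impala tree: the helper name {{module_origin}} is hypothetical, and the module names are simply the ones from the session above. It relies only on {{importlib.import_module}} and the module's {{__file__}} attribute, so it should behave the same under python 2.7 and python 3.

```python
from __future__ import print_function

import importlib


def module_origin(name):
    """Return the file a module was loaded from.

    Returns None if the module cannot be imported, and the string
    "(built-in)" for modules that have no __file__ attribute.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules such as sys carry no __file__ attribute.
    return getattr(module, "__file__", "(built-in)")


if __name__ == "__main__":
    # Module names taken from the interactive session in the description.
    for name in ("sasl", "requests", "thrift"):
        print("{0:10s} {1}".format(name, module_origin(name)))
```

Running this once from the shell's environment and once from the test environment would show whether both resolve {{thrift}} to the same directory.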

  was:
The impala python development environment is a fairly convoluted affair. A 
number of packages are 

Apparently, we can't simply kick thrift-0.9.3 to the curb. Our build needs some 
attention before that can happen, per IMPALA-7825, and also a conversation I had 
with [~stakiar_impala_496e]. 

However, with IMPALA-7924 resolved, we do have access to thrift-0.11.0 python 
files, and we should use those by default. It turns out that being stuck with 
thrift-0.9.3 is a major impediment to achieving python 3 compatibility for our 
python stack.


> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default
> --
>
> Key: IMPALA-9489
> URL: https://issues.apache.org/jira/browse/IMPALA-9489
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Major
>
> [Note: this JIRA was filed in relation to the ongoing effort to make the 
> impala-shell compatible with python 3]
> The impala python development environment is a fairly convoluted affair -- a 
> number of packages are installed in the infra/python/env, some of it comes 
> from the toolchain, some of it is generated and lives in the shell directory. 
> Generally speaking, if you launch impala-python and import a module, it's not 
> necessarily easy to predict where the module might live.
> {noformat}
> $ python
> Python 2.7.10 (default, Aug 17 2018, 19:45:58)
> [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sasl
> >>> sasl
> <module 'sasl' from '/home/systest/Impala/shell/ext-py/sasl-0.1.1/dist/sasl-0.1.1-py2.7-linux-x86_64.egg/sasl/__init__.pyc'>
> >>> import requests
> >>> requests
> <module 'requests' from '/home/systest/Impala/infra/python/env/local/lib/python2.7/site-packages/requests/__init__.pyc'>
> >>> import Logging
> >>> Logging

[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Description: 
The impala python development environment is a fairly convoluted affair. A 
number of packages are 

Apparently, we can't simply kick thrift-0.9.3 to the curb. Our build needs some 
attention before that can happen, per IMPALA-7825, and also a conversation I had 
with [~stakiar_impala_496e]. 

However, with IMPALA-7924 resolved, we do have access to thrift-0.11.0 python 
files, and we should use those by default. It turns out that being stuck with 
thrift-0.9.3 is a major impediment to achieving python 3 compatibility for our 
python stack.

  was:
Apparently, we can't simply kick thrift-0.9.3 to the curb. Our build needs some 
attention before that can happen, per IMPALA-7825, and also a conversation I had 
with [~stakiar_impala_496e]. 

However, with IMPALA-7924 resolved, we do have access to thrift-0.11.0 python 
files, and we should use those by default. It turns out that being stuck with 
thrift-0.9.3 is a major impediment to achieving python 3 compatibility for our 
python stack.


> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default
> --
>
> Key: IMPALA-9489
> URL: https://issues.apache.org/jira/browse/IMPALA-9489
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Major
>
> The impala python development environment is a fairly convoluted affair. A 
> number of packages are 
> Apparently, we can't simply kick thrift-0.9.3 to the curb. Our build needs 
> some attention before that can happen, per IMPALA-7825, and also a conversation 
> I had with [~stakiar_impala_496e]. 
> However, with IMPALA-7924 resolved, we do have access to thrift-0.11.0 python 
> files, and we should use those by default. It turns out that being stuck with 
> thrift-0.9.3 is a major impediment to achieving python 3 compatibility for 
> our python stack.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Summary: Setup impala-shell.sh env separately, and use thrift-0.11.0 by 
default  (was: Setup impala-shell.sh env separately, and use thrift-0.11.0 by 
default.)

> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default
> --
>
> Key: IMPALA-9489
> URL: https://issues.apache.org/jira/browse/IMPALA-9489
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Major
>
> Apparently, we can't simply kick thrift-0.9.3 to the curb. Our build needs 
> some attention before that can happen, per IMPALA-7825, and also a conversation 
> I had with [~stakiar_impala_496e]. 
> However, with IMPALA-7924 resolved, we do have access to thrift-0.11.0 python 
> files, and we should use those by default. It turns out that being stuck with 
> thrift-0.9.3 is a major impediment to achieving python 3 compatibility for 
> our python stack.






[jira] [Updated] (IMPALA-9489) Setup impala-shell.sh env separately, and use thrift-0.11.0 by default.

2020-03-12 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-9489:

Summary: Setup impala-shell.sh env separately, and use thrift-0.11.0 by 
default.  (was: Have impala-python env and impala-shell use available 
thrift-0.11.0 files)

> Setup impala-shell.sh env separately, and use thrift-0.11.0 by default.
> ---
>
> Key: IMPALA-9489
> URL: https://issues.apache.org/jira/browse/IMPALA-9489
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Major
>
> Apparently, we can't simply kick thrift-0.9.3 to the curb. Our build needs 
> some attention before that can happen, per IMPALA-7825, and also a conversation 
> I had with [~stakiar_impala_496e]. 
> However, with IMPALA-7924 resolved, we do have access to thrift-0.11.0 python 
> files, and we should use those by default. It turns out that being stuck with 
> thrift-0.9.3 is a major impediment to achieving python 3 compatibility for 
> our python stack.






[jira] [Resolved] (IMPALA-9425) Statestore may fail to report when an impalad has failed

2020-03-12 Thread Thomas Tauber-Marshall (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-9425.

Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Statestore may fail to report when an impalad has failed
> 
>
> Key: IMPALA-9425
> URL: https://issues.apache.org/jira/browse/IMPALA-9425
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Critical
> Fix For: Impala 3.4.0
>
>
> If an impalad fails and another is quickly started at the same host:port 
> combination, the statestore may fail to report to the coordinators 
> that the impalad went down.
> The reason for this is that in the cluster membership topic, impalads are 
> keyed by their statestore subscriber id, which is "impalad@host:port". If the 
> new impalad registers itself before a topic update has been generated for a 
> particular coordinator, the statestore has no way of knowing that the 
> particular key was deleted and then re-added since the last update.
> The result is that queries that were running on the impalad that failed may 
> not be cancelled by the coordinator until they pass the unresponsive backend 
> timeout, which by default is ~12 minutes.
> I propose as a solution that we add a concept of uuids for impalads, where 
> each impalad will generate its own uuid on startup. This allows us to 
> differentiate between different impalads running at the same host:port 
> combination.
> It can also be used to simplify some logic in the scheduler and 
> ExecutorGroup/ExecutorBlacklist etc. where we currently have data structures 
> containing info about impalads that are keyed off host/port combinations.
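
The keying problem described above can be illustrated with a small model. This is a simplified sketch, not Impala's statestore code: it compares two whole membership snapshots instead of generating topic deltas, and the record layout, helper names, host name and port are all hypothetical. It only shows why a key derived from host:port hides a fail-and-restart that happens between two updates, while a per-process uuid exposes it.

```python
import uuid


def make_backend(host, port):
    """A backend record; each process start gets a fresh uuid."""
    return {"host": host, "port": port, "uuid": str(uuid.uuid4())}


def address_key(backend):
    # Mirrors the subscriber id format quoted above: "impalad@host:port".
    return "impalad@{0}:{1}".format(backend["host"], backend["port"])


def uuid_key(backend):
    return backend["uuid"]


def membership_changes(before, after, key):
    """Compare two membership snapshots; return (removed, added) key sets."""
    old_keys = {key(b) for b in before}
    new_keys = {key(b) for b in after}
    return old_keys - new_keys, new_keys - old_keys


# Hypothetical host and port, for illustration only.
failed = make_backend("host-1", 22000)
restarted = make_backend("host-1", 22000)

# Keyed by address, the two snapshots look identical: nothing to report.
by_address = membership_changes([failed], [restarted], address_key)

# Keyed by uuid, the old process shows up as removed and the new one as added.
by_uuid = membership_changes([failed], [restarted], uuid_key)
```

With the address key, a coordinator comparing the two snapshots has no signal to cancel queries that were running on the failed process. With the uuid key, the removal is visible even though the host:port pair never left the membership.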






[jira] [Assigned] (IMPALA-8964) Increase runtime filter wait timeout for mt_dop

2020-03-12 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8964:
-

Assignee: David Rorke

> Increase runtime filter wait timeout for mt_dop
> ---
>
> Key: IMPALA-8964
> URL: https://issues.apache.org/jira/browse/IMPALA-8964
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Tim Armstrong
>Assignee: David Rorke
>Priority: Major
>  Labels: multithreading
>
> When we enable joins for multithreaded plans, we should adjust the runtime 
> filter wait time. 
> A large part of the motivation for the timeout was to allow parallelism 
> between the different sides of the join - there was some concern that having 
> a scan block indefinitely would effectively reduce the amount of parallelism 
> that the plan executed with.
> With multithreading, we want to get parallelism across multiple copies of the 
> same fragment, rather than parallelism across different fragments. So this 
> motivation no longer applies. Making the filter wait time unlimited would 
> make query execution more predictable.






[jira] [Resolved] (IMPALA-9490) DOC: Include brief statement of support for reading Apache Hudi optimized table

2020-03-12 Thread Kris Hahn (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kris Hahn resolved IMPALA-9490.
---
Resolution: Fixed

> DOC: Include brief statement of support for reading Apache Hudi optimized 
> table
> ---
>
> Key: IMPALA-9490
> URL: https://issues.apache.org/jira/browse/IMPALA-9490
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Kris Hahn
>Assignee: Kris Hahn
>Priority: Major
>
> Document experimental support for Apache Hudi Read Optimized Table. See 
> IMPALA-8778.






[jira] [Work started] (IMPALA-6267) MT Scanners do not check runtime filters per-file before processing each split

2020-03-12 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-6267 started by Tim Armstrong.
-
> MT Scanners do not check runtime filters per-file before processing each split
> --
>
> Key: IMPALA-6267
> URL: https://issues.apache.org/jira/browse/IMPALA-6267
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: perf
>
> The old implementation of HdfsScanNode re-checks partition filters per scan 
> range in HdfsScanNode::ProcessSplit() before processing each scan range. 
> HdfsScanNodeMt does not have similar logic.
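
As an illustration of why the per-split re-check matters, here is a toy model in Python. It is not the C++ scan node code: the split records, the filter representation (plain predicates over partition values) and the function name are hypothetical. The point it shows is that a filter arriving after the scan has started can still prune later splits, provided the filters are evaluated again before each split is processed.

```python
def scan(splits, partition_filters):
    """Yield only the splits whose partition passes every filter known so far.

    The filter list is consulted again for each split, so filters appended
    while the scan is running apply to all splits not yet processed.
    """
    for split in splits:
        if all(passes(split["partition"]) for passes in partition_filters):
            yield split


# Hypothetical splits, one file per partition value p.
splits = [
    {"file": "a", "partition": {"p": 1}},
    {"file": "b", "partition": {"p": 2}},
    {"file": "c", "partition": {"p": 3}},
]

filters = []      # no runtime filters have arrived when the scan starts
processed = []

for split in scan(splits, filters):
    processed.append(split["file"])
    if len(processed) == 1:
        # A filter arrives late, after the first split was processed.
        filters.append(lambda partition: partition["p"] != 2)
```

Here the split for {{p=2}} is skipped even though the filter was not available at the start of the scan. Checking filters only once up front would have processed all three files.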






[jira] [Updated] (IMPALA-8322) Cancelling an in-progress IO can cause delays and/or RPC timeouts due to tangle of locks

2020-03-12 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-8322:
--
Summary: Cancelling an in-progress IO can cause delays and/or RPC timeouts 
due to tangle of locks  (was: S3 tests encounter "timed out waiting for 
receiver fragment instance")

> Cancelling an in-progress IO can cause delays and/or RPC timeouts due to 
> tangle of locks
> 
>
> Key: IMPALA-8322
> URL: https://issues.apache.org/jira/browse/IMPALA-8322
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.3.0
>
> Attachments: fb5b9729-2d7a-4590-ea365b87-d2ead75e.dmp_dumped, 
> run_tests_swimlane.json.gz
>
>
> This has been seen multiple times when running s3 tests:
> {noformat}
> query_test/test_join_queries.py:57: in test_basic_joins
> self.run_test_case('QueryTest/joins', new_vector)
> common/impala_test_suite.py:472: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:699: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:174: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:183: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:360: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:381: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   Query aborted:Sender 127.0.0.1 timed out waiting for receiver fragment 
> instance: 6c40d992bb87af2f:0ce96e5d0007, dest node: 4{noformat}
> This is related to IMPALA-6818. On a bad run, there are various timeouts in 
> the impalad logs:
> {noformat}
> I0316 10:47:16.359313 20175 krpc-data-stream-mgr.cc:354] Sender 127.0.0.1 
> timed out waiting for receiver fragment instance: 
> ef4a5dc32a6565bd:a8720b850007, dest node: 5
> I0316 10:47:16.359345 20175 rpcz_store.cc:265] Call 
> impala.DataStreamService.TransmitData from 127.0.0.1:40030 (request call id 
> 14881) took 120182ms. Request Metrics: {}
> I0316 10:47:16.359380 20175 krpc-data-stream-mgr.cc:354] Sender 127.0.0.1 
> timed out waiting for receiver fragment instance: 
> d148d83e11a4603d:54dc35f70004, dest node: 3
> I0316 10:47:16.359395 20175 rpcz_store.cc:265] Call 
> impala.DataStreamService.TransmitData from 127.0.0.1:40030 (request call id 
> 14880) took 123097ms. Request Metrics: {}
> ... various messages ...
> I0316 10:47:56.364990 20154 kudu-util.h:108] Cancel() RPC failed: Timed out: 
> CancelQueryFInstances RPC to 127.0.0.1:27000 timed out after 10.000s (SENT)
> ... various messages ...
> W0316 10:48:15.056421 20150 rpcz_store.cc:251] Call 
> impala.ControlService.CancelQueryFInstances from 127.0.0.1:40912 (request 
> call id 202) took 48695ms (client timeout 1).
> W0316 10:48:15.056473 20150 rpcz_store.cc:255] Trace:
> 0316 10:47:26.361265 (+ 0us) impala-service-pool.cc:165] Inserting onto call 
> queue
> 0316 10:47:26.361285 (+ 20us) impala-service-pool.cc:245] Handling call
> 0316 10:48:15.056398 (+48695113us) inbound_call.cc:162] Queueing success 
> response
> Metrics: {}
> I0316 10:48:15.057087 20139 connection.cc:584] Got response to call id 202 
> after client already timed out or cancelled{noformat}
> So far, this has only happened on s3. The system load at the time is not 
> higher than normal. If anything it is lower than normal. 


