[jira] [Commented] (IMPALA-7625) test_web_pages.py backend tests are failing

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737905#comment-16737905
 ] 

ASF subversion and git services commented on IMPALA-7625:
-

Commit 274e96bd147b5d91872c441c3a600fa8d5295bbe in impala's branch 
refs/heads/master from Lars Volker
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=274e96b ]

IMPALA-8059: Disable broken tests

IMPALA-7625 caused some tests to fail, but because the change otherwise
also addressed test failures, we explicitly disable the affected tests
here instead of reverting IMPALA-7625.

Change-Id: Ibbd11840aac63dc7d483cafc9ee9b419dc840f37
Reviewed-on: http://gerrit.cloudera.org:8080/12190
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins 


> test_web_pages.py backend tests are failing
> ---
>
> Key: IMPALA-7625
> URL: https://issues.apache.org/jira/browse/IMPALA-7625
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> While working on IMPALA-6249, we found that the tests under 
> {{webserver/test_web_pages.py}} are not being run by Jenkins. We re-enabled 
> the tests; however, a few of the backend-specific tests are failing. 
> IMPALA-6249 disabled these tests. This JIRA is to follow up on these tests 
> and fix them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7924) Generate Thrift 11 Python Code

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737903#comment-16737903
 ] 

ASF subversion and git services commented on IMPALA-7924:
-

Commit fa78c594de39878902f3887a209f29f7976583d0 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=fa78c59 ]

IMPALA-7924: Generate Thrift 11 Python Code

Upgrades the version of the toolchain in order to pull in Thrift 0.11.0.
Updates the CMake build to write generated Python code using Thrift 0.11
to shell/build/thrift-11-gen/gen-py/.

The Thrift 0.11 Python deserialization code has some big performance
improvements that allow faster parsing of runtime profiles. By adding
the ability to generate the Thrift Python code using Thrift 0.11 we can
take advantage of the Python performance improvements without going
through a full Thrift upgrade from 0.9 to 0.11.

Set USE_THRIFT11_GEN_PY=true and then run bin/set-pythonpath.sh to add
the Thrift 0.11 Python generated code to the PYTHONPATH rather than the
0.9 generated code.
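
As a rough illustration of the selection described above, the following
Python sketch models how an environment flag could choose between the two
generated-code directories. It is not the contents of bin/set-pythonpath.sh.
The function names and the 0.9 directory placeholder are assumptions made for
illustration; only the Thrift 0.11 path comes from the commit message.

```python
import os

# Directory named in the commit message for the Thrift 0.11 generated code.
THRIFT11_GEN_PY = "shell/build/thrift-11-gen/gen-py"
# Hypothetical placeholder: the commit message does not say where the
# Thrift 0.9 generated code lives.
THRIFT9_GEN_PY = "path/to/thrift-0.9/gen-py"


def select_gen_py_dir(environ):
    """Return the generated-code directory implied by USE_THRIFT11_GEN_PY."""
    if environ.get("USE_THRIFT11_GEN_PY", "false") == "true":
        return THRIFT11_GEN_PY
    return THRIFT9_GEN_PY


def build_pythonpath(environ):
    """Prepend the selected directory to any existing PYTHONPATH entries."""
    existing = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    return os.pathsep.join([select_gen_py_dir(environ)] + existing)
```

Passing os.environ to build_pythonpath() would give the value to export; a
plain dict works the same way for experimenting.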

Testing:
- Ran core tests

Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0
Reviewed-on: http://gerrit.cloudera.org:8080/12036
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Generate Thrift 11 Python Code
> --
>
> Key: IMPALA-7924
> URL: https://issues.apache.org/jira/browse/IMPALA-7924
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Until IMPALA-7825 has been completed, it would be good to add the ability to 
> generate Python code using Thrift 11. As stated in IMPALA-7825, Thrift has 
> added performance improvements to its Python deserialization code.






[jira] [Commented] (IMPALA-8059) TestWebPage::test_backend_states is flaky

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737904#comment-16737904
 ] 

ASF subversion and git services commented on IMPALA-8059:
-

Commit 274e96bd147b5d91872c441c3a600fa8d5295bbe in impala's branch 
refs/heads/master from Lars Volker
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=274e96b ]

IMPALA-8059: Disable broken tests

IMPALA-7625 caused some tests to fail, but because the change otherwise
also addressed test failures, we explicitly disable the affected tests
here instead of reverting IMPALA-7625.

Change-Id: Ibbd11840aac63dc7d483cafc9ee9b419dc840f37
Reviewed-on: http://gerrit.cloudera.org:8080/12190
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins 


> TestWebPage::test_backend_states is flaky
> -
>
> Key: IMPALA-8059
> URL: https://issues.apache.org/jira/browse/IMPALA-8059
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Sahil Takiar
>Priority: Blocker
>  Labels: broken-build, flaky-test
>
> test_backend_states is flaky. The query reaches the _"FINISHED"_ state before 
> its state is verified by the Python test. Here is the relevant log: 
> {code:java}
> 07:33:45 - Captured stderr call 
> -
> 07:33:45 -- executing async: localhost:21000
> 07:33:45 select sleep(1) from functional.alltypes limit 1;
> 07:33:45 
> 07:33:45 -- 2019-01-08 07:31:57,952 INFO MainThread: Started query 
> 7f46f15ed4d6d0f6:4d58cdbc
> 07:33:45 -- getting state for operation: 
> 
> {code}
> This bug was introduced by IMPALA-7625.






[jira] [Commented] (IMPALA-7446) Queries can spill earlier than necessary because of accumulation of free buffers and clean pages

2019-01-08 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737902#comment-16737902
 ] 

ASF subversion and git services commented on IMPALA-7446:
-

Commit ae65ff831966eb99417e15738c1aa96013a66f39 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=ae65ff8 ]

IMPALA-7446: enable buffer pool GC when near process mem limit

GC is performed when:
* The amount of memory allocated from the system for the buffer pool
  exceeds the reservation (i.e. free buffers and clean pages are not
  offset by unused reservation).
* The soft or hard process memory limit would otherwise cause an
  allocation to fail.

Testing:
Looped the old version of the semi_joins_exhaustive test, which
reliably reproduced the issue. I confirmed that the buffer pool GC was
running and that it prevented the query failures.

Added a backend test that reproduced the issue. A large chunk of the code
change is to add infrastructure to use TCMalloc memory metrics
for the process memory tracker in backend tests.

Ran exhaustive tests.

Change-Id: I81e8e29f1ba319f1b499032f9518d32c511b4b21
Reviewed-on: http://gerrit.cloudera.org:8080/12133
Reviewed-by: Bikramjeet Vig 
Tested-by: Impala Public Jenkins 


> Queries can spill earlier than necessary because of accumulation of free 
> buffers and clean pages
> 
>
> Key: IMPALA-7446
> URL: https://issues.apache.org/jira/browse/IMPALA-7446
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: resource-management
> Fix For: Impala 3.2.0
>
>
> See IMPALA-7442 for an example where the query started to spill even when 
> memory could have been made available by freeing buffers or evicting clean 
> pages. Usually this would just result in spilling earlier than necessary, but 
> in the case of IMPALA-7442 it led to a query failure.
> My original intent was that BufferPool::ReleaseMemory() should be called in 
> situations like this, but that was not done.






[jira] [Resolved] (IMPALA-7446) Queries can spill earlier than necessary because of accumulation of free buffers and clean pages

2019-01-08 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7446.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Queries can spill earlier than necessary because of accumulation of free 
> buffers and clean pages
> 
>
> Key: IMPALA-7446
> URL: https://issues.apache.org/jira/browse/IMPALA-7446
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: resource-management
> Fix For: Impala 3.2.0
>
>
> See IMPALA-7442 for an example where the query started to spill even when 
> memory could have been made available by freeing buffers or evicting clean 
> pages. Usually this would just result in spilling earlier than necessary, but 
> in the case of IMPALA-7442 it led to a query failure.
> My original intent was that BufferPool::ReleaseMemory() should be called in 
> situations like this, but that was not done.





[jira] [Closed] (IMPALA-3784) status-benchmark is broken

2019-01-08 Thread Greg Rahn (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Rahn closed IMPALA-3784.
-
   Resolution: Fixed
Fix Version/s: Impala 2.11.0

> status-benchmark is broken
> --
>
> Key: IMPALA-3784
> URL: https://issues.apache.org/jira/browse/IMPALA-3784
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.7.0
>Reporter: Jim Apple
>Assignee: Jinchul Kim
>Priority: Minor
>  Labels: newbie
> Fix For: Impala 2.11.0
>
>
> status-benchmark in debug mode crashes:
> {noformat}
> status-benchmark: 
> /opt/Impala-Toolchain/boost-1.57.0/include/boost/optional/optional.hpp:992: 
> boost::optional::reference_type boost::optional::get() [with T = 
> large_type; boost::optional::reference_type = large_type&]: Assertion 
> `this->is_initialized()' failed.
> {noformat}
> In release mode, with the toolchain gcc, compilation of just that one file 
> runs for over 15 minutes. After that, I killed it. I tried this multiple 
> times.





[jira] [Work started] (IMPALA-8061) S3_ACCESS_VALIDATED unbound variable when using TARGET_FILESYSTEM=s3

2019-01-08 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8061 started by Fredy Wijaya.

> S3_ACCESS_VALIDATED unbound variable when using TARGET_FILESYSTEM=s3
> 
>
> Key: IMPALA-8061
> URL: https://issues.apache.org/jira/browse/IMPALA-8061
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Major
>
> {noformat}
> TARGET_FILESYSTEM=s3 ./buildall.sh -format -testdata -notests
> impala-config.sh: line 331: S3_ACCESS_VALIDATED: unbound variable
> {noformat}






[jira] [Created] (IMPALA-8061) S3_ACCESS_VALIDATED unbound variable when using TARGET_FILESYSTEM=s3

2019-01-08 Thread Fredy Wijaya (JIRA)
Fredy Wijaya created IMPALA-8061:


 Summary: S3_ACCESS_VALIDATED unbound variable when using 
TARGET_FILESYSTEM=s3
 Key: IMPALA-8061
 URL: https://issues.apache.org/jira/browse/IMPALA-8061
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.2.0
Reporter: Fredy Wijaya
Assignee: Fredy Wijaya


{noformat}
TARGET_FILESYSTEM=s3 ./buildall.sh -format -testdata -notests
impala-config.sh: line 331: S3_ACCESS_VALIDATED: unbound variable
{noformat}





[jira] [Updated] (IMPALA-6503) Support reading complex types from ORC format files

2019-01-08 Thread Quanlong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6503:
---
Description: ORC already supports Hive complex(nested) types: STRUCT, 
ARRAY, MAP, UNION. UNION type is not widely used. Here we track the support of 
reading STRUCT, ARRAY, MAP types from ORC files.  (was: ORC already supports 
Hive complex(nested) types: STRUCT, ARRAY, MAP, UNION)

> Support reading complex types from ORC format files
> ---
>
> Key: IMPALA-6503
> URL: https://issues.apache.org/jira/browse/IMPALA-6503
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> ORC already supports Hive complex(nested) types: STRUCT, ARRAY, MAP, UNION. 
> UNION type is not widely used. Here we track the support of reading STRUCT, 
> ARRAY, MAP types from ORC files.






[jira] [Updated] (IMPALA-6503) Support reading complex types from ORC format files

2019-01-08 Thread Quanlong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6503:
---
Description: ORC already supports Hive complex(nested) types: STRUCT, 
ARRAY, MAP, UNION

> Support reading complex types from ORC format files
> ---
>
> Key: IMPALA-6503
> URL: https://issues.apache.org/jira/browse/IMPALA-6503
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> ORC already supports Hive complex(nested) types: STRUCT, ARRAY, MAP, UNION






[jira] [Work started] (IMPALA-8060) Impala Doc: Clean up and re-org resource management and admission control docs

2019-01-08 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8060 started by Alex Rodoni.
---
> Impala Doc: Clean up and re-org resource management and admission control docs
> --
>
> Key: IMPALA-8060
> URL: https://issues.apache.org/jira/browse/IMPALA-8060
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> Move the Admission Control doc out of the Administration category as the 
> first half of the doc is conceptual.
> Create a new category Resource Management, and put the Admission Control and 
> Configuring Admission Control docs under the new category.






[jira] [Commented] (IMPALA-8060) Impala Doc: Clean up and re-org resource management and admission control docs

2019-01-08 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737764#comment-16737764
 ] 

Alex Rodoni commented on IMPALA-8060:
-

https://gerrit.cloudera.org/#/c/12191/

> Impala Doc: Clean up and re-org resource management and admission control docs
> --
>
> Key: IMPALA-8060
> URL: https://issues.apache.org/jira/browse/IMPALA-8060
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> Move the Admission Control doc out of the Administration category as the 
> first half of the doc is conceptual.
> Create a new category Resource Management, and put the Admission Control and 
> Configuring Admission Control docs under the new category.






[jira] [Commented] (IMPALA-8059) TestWebPage::test_backend_states is flaky

2019-01-08 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737763#comment-16737763
 ] 

Sahil Takiar commented on IMPALA-8059:
--

Some more info from the Jenkins job where this test failed:

{code}
=== FAILURES ===
___ TestWebPage.test_backend_states 
webserver/test_web_pages.py:321: in test_backend_states
expected_state=running_state)
webserver/test_web_pages.py:297: in __run_query_and_get_debug_page
self.wait_for_state(query_handle, expected_state, 100)
common/impala_test_suite.py:841: in wait_for_state
% (handle.get_handle().id, expected_state, actual_state))
E   Timeout: query 'a94c0fc5da9a0f80:f74ed942' did not reach expected 
state '3', last known state '4'
 Captured stderr setup -
SET sync_ddl=False;
-- executing against localhost:21000
DROP DATABASE IF EXISTS `test_backend_states_22f4a3e6` CASCADE;

-- 2019-01-07 23:30:21,788 INFO MainThread: Started query 
2f402bda419c7cc4:25a44927
SET sync_ddl=False;
-- executing against localhost:21000
CREATE DATABASE `test_backend_states_22f4a3e6`;

-- 2019-01-07 23:30:22,619 INFO MainThread: Started query 
444d7423b372c442:10c3af78
-- 2019-01-07 23:30:22,621 INFO MainThread: Created database 
"test_backend_states_22f4a3e6" for test ID 
"webserver/test_web_pages.py::TestWebPage::()::test_backend_states"
- Captured stderr call -
-- executing async: localhost:21000
select sleep(1) from functional.alltypes limit 1;
-- 2019-01-07 23:30:22,626 INFO MainThread: Started query 
a94c0fc5da9a0f80:f74ed942
getting state for operation: tests.common.impala_connection.OperationHandle 
object at 0xc87d250
{code}

As stated in the description, the query {{select sleep(1) from 
functional.alltypes limit 1}} reached state FINISHED before the method 
{{__run_query_and_get_debug_page}} was able to check that it was in the RUNNING 
state. The intention of the test was for it to run for at least 10 seconds so that 
the Python test could query the backend URL while the query was running.
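
The race can be reproduced in miniature. The sketch below is a simplified
stand-in written for illustration, not the actual wait_for_state() helper in
common/impala_test_suite.py: poll_for_state(), fast_query() and the poll
budget are hypothetical. The numeric states 3 and 4 are taken from the
failure message above and correspond to RUNNING and FINISHED as described in
this comment.

```python
RUNNING, FINISHED = 3, 4  # numeric states as they appear in the failure message


class Timeout(Exception):
    """Raised when the expected state is never observed."""


def poll_for_state(get_state, expected_state, max_polls):
    """Poll get_state() until it returns expected_state or polls run out."""
    actual_state = None
    for _ in range(max_polls):
        actual_state = get_state()
        if actual_state == expected_state:
            return actual_state
    raise Timeout("did not reach expected state '%s', last known state '%s'"
                  % (expected_state, actual_state))


def fast_query():
    """Hypothetical query that has already finished by the first poll."""
    return FINISHED


try:
    poll_for_state(fast_query, RUNNING, max_polls=100)
    outcome = "reached RUNNING"
except Timeout as e:
    outcome = str(e)
```

Because the poller only accepts an exact match, a query that passes through
RUNNING between two polls can never satisfy it, which mirrors the timeout
reported by Jenkins.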

> TestWebPage::test_backend_states is flaky
> -
>
> Key: IMPALA-8059
> URL: https://issues.apache.org/jira/browse/IMPALA-8059
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Sahil Takiar
>Priority: Blocker
>  Labels: broken-build, flaky-test
>
> test_backend_states is flaky. The query reaches the _"FINISHED"_ state before 
> its state is verified by the Python test. Here is the relevant log: 
> {code:java}
> 07:33:45 - Captured stderr call 
> -
> 07:33:45 -- executing async: localhost:21000
> 07:33:45 select sleep(1) from functional.alltypes limit 1;
> 07:33:45 
> 07:33:45 -- 2019-01-08 07:31:57,952 INFO MainThread: Started query 
> 7f46f15ed4d6d0f6:4d58cdbc
> 07:33:45 -- getting state for operation: 
> 
> {code}
> This bug was introduced by IMPALA-7625.






[jira] [Updated] (IMPALA-7988) Support loading data into a dockerised minicluster

2019-01-08 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7988:
--
Description: 
This JIRA tracks getting data load to work against dockerised impala daemons.

* Fix start-impala-cluster.py to wait for cluster to become ready using 
ImpalaCluster
* Fix test configuration to work against all table formats (HDFS, HBase, Kudu)

  was:The ImpalaCluster class makes a bunch of assumptions about the cluster 
processes and ports that are invalidated by running them in docker. This JIRA 
tracks fixing ImpalaCluster to work correctly for the docker minicluster.


> Support loading data into a dockerised minicluster
> --
>
> Key: IMPALA-7988
> URL: https://issues.apache.org/jira/browse/IMPALA-7988
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: docker
>
> This JIRA tracks getting data load to work against dockerised impala daemons.
> * Fix start-impala-cluster.py to wait for cluster to become ready using 
> ImpalaCluster
> * Fix test configuration to work against all table formats (HDFS, HBase, Kudu)






[jira] [Updated] (IMPALA-7988) Support loading data into a dockerised minicluster

2019-01-08 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7988:
--
Summary: Support loading data into a dockerised minicluster  (was: 
Implement ImpalaCluster functionality for docker minicluster)

> Support loading data into a dockerised minicluster
> --
>
> Key: IMPALA-7988
> URL: https://issues.apache.org/jira/browse/IMPALA-7988
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: docker
>
> The ImpalaCluster class makes a bunch of assumptions about the cluster 
> processes and ports that are invalidated by running them in docker. This JIRA 
> tracks fixing ImpalaCluster to work correctly for the docker minicluster.






[jira] [Comment Edited] (IMPALA-8058) HBase scan cardinality division-by-zero leads to bogus cardinality

2019-01-08 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737681#comment-16737681
 ] 

Paul Rogers edited comment on IMPALA-8058 at 1/9/19 1:07 AM:
-

Proposed fix:
 * Retain the existing code, except...
 * If the estimated row width is less than 1, return -1 as the estimate.
 * In the HBase scan node, if we get back -1 from the cardinality estimator, 
use the row count from HMS table stats.

Multiply the total row count by filter selectivity to get scan cardinality. 
Note that we must multiply the selectivity of the key predicates.
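
A minimal Python sketch of the proposed control flow follows. It is only an
illustration of the steps listed above, not the Java code in the Impala
frontend: the function and parameter names are hypothetical, and the branch
that keeps the key-range estimate leaves the existing handling unmodeled.

```python
def estimate_key_range_rows(region_bytes, avg_row_width):
    """Return the estimated row count for the key range, or -1 if unknown."""
    if avg_row_width < 1:
        # No usable row width (for example, no rows were sampled).
        return -1
    return int(region_bytes / avg_row_width)


def scan_cardinality(region_bytes, avg_row_width, hms_row_count, selectivity):
    """Pick the key-range estimate, falling back to HMS table stats."""
    estimate = estimate_key_range_rows(region_bytes, avg_row_width)
    if estimate == -1:
        # Fallback: total row count from HMS times the filter selectivity,
        # which here is assumed to include the key predicates.
        return int(round(hms_row_count * selectivity))
    # Existing behaviour is retained for this branch and not modeled here.
    return estimate
```

For example, scan_cardinality(0, 0, 1000, 0.1) takes the fallback path and
yields 100 rather than an unbounded value.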


was (Author: paul.rogers):
Proposed fix:

* Retain the existing code, except...
* If the estimated row width is less than 1, return -1 as the estimate.
* In the HBase scan node, if we get back -1 from the cardinality estimator, use 
the row count from HMS table stats.

Multiply the total row count by filter selectivity to get scan cardinality.

Note that the existing HBase scan node double-counts filter cardinality:

* It uses the key range estimator described above to estimate the rows in that 
range.
* Applies the predicate selectivity a second time in the scan node.

So, a further fix is to:

* Apply all predicates only if we are using the HMS table stats row count.
* Apply only non-key predicate selectivity if we are using the (smaller) key 
range row count.

> HBase scan cardinality division-by-zero leads to bogus cardinality
> --
>
> Key: IMPALA-8058
> URL: https://issues.apache.org/jira/browse/IMPALA-8058
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Major
>
> A particular HBase query has highly selective key filters and runs into code 
> bugs that produce a bogus, huge cardinality value.
> {{HbaseScanNode.computeStats()}} attempts to compute table cardinality by 
> calling {{HBaseTable.getEstimatedRowStats()}}. This then calls into (in the 
> latest versions) {{FeHBaseTable.getEstimatedRowStats()}}.
> This code tries to estimate cardinality by:
> * Scanning a set of regions.
> * Getting the size of each region.
> * Averaging a bunch of rows to estimate row width.
> Once we know the size of the regions we need to scan, and the average row 
> width, we can compute the scan cardinality.
> The problem in this particular query is that the predicates are so selective 
> that no regions match. As a result, the average row width is zero. We divide 
> (as a double) the region size by 0 and get INF. We cast that to a long and 
> get Long.MAX_VALUE. We then use that as our (highly bogus) cardinality 
> estimate.
> The code must:
> * Detect the division-by-zero (no sample rows) case.
> * Use an alternative estimate (such as multiplying total table row count from 
> HMS by the filter selectivity.)
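
The arithmetic behind the bogus value can be sketched in Python by mimicking
Java's rules for double division and for casting a double to a long. The
helper names and the 4096-byte region size are hypothetical, chosen only to
make the example concrete, and the sign of a negative zero divisor is
ignored for brevity.

```python
import math

LONG_MAX = 2**63 - 1
LONG_MIN = -2**63


def java_double_div(a, b):
    """Mimic Java double division, which yields INF or NaN instead of raising."""
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        return math.inf if a > 0 else -math.inf
    return a / b


def java_double_to_long(x):
    """Mimic Java's (long) cast: NaN becomes 0, out-of-range values saturate."""
    if math.isnan(x):
        return 0
    if x >= LONG_MAX:
        return LONG_MAX
    if x <= LONG_MIN:
        return LONG_MIN
    return int(x)


region_bytes = 4096.0   # hypothetical region size
avg_row_width = 0.0     # no sample rows, so the average width is zero
bogus = java_double_to_long(java_double_div(region_bytes, avg_row_width))
```

Here bogus equals Long.MAX_VALUE (9223372036854775807), which is the
cardinality the planner ends up using in the case described above.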






[jira] [Created] (IMPALA-8060) Impala Doc: Clean up and re-org resource management and admission control docs

2019-01-08 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-8060:
---

 Summary: Impala Doc: Clean up and re-org resource management and 
admission control docs
 Key: IMPALA-8060
 URL: https://issues.apache.org/jira/browse/IMPALA-8060
 Project: IMPALA
  Issue Type: Bug
  Components: Docs
Affects Versions: Impala 3.1.0
Reporter: Alex Rodoni
Assignee: Alex Rodoni


Move the Admission Control doc out of the Administration category as the first 
half of the doc is conceptual.

Create a new category Resource Management, and put the Admission Control and 
Configuring Admission Control docs under the new category.






[jira] [Updated] (IMPALA-8060) Impala Doc: Clean up and re-org resource management and admission control docs

2019-01-08 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-8060:

Issue Type: Task  (was: Bug)

> Impala Doc: Clean up and re-org resource management and admission control docs
> --
>
> Key: IMPALA-8060
> URL: https://issues.apache.org/jira/browse/IMPALA-8060
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> Move the Admission Control doc out of the Administration category as the 
> first half of the doc is conceptual.
> Create a new category Resource Management, and put the Admission Control and 
> Configuring Admission Control docs under the new category.






[jira] [Commented] (IMPALA-8057) Unhelpful messages in impalad.INFO log

2019-01-08 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737690#comment-16737690
 ] 

Paul Rogers commented on IMPALA-8057:
-

Thanks [~tarmstrong]. Am in the middle of other stuff, but I'll loop back 
around to this and take your suggestion.

> Unhelpful messages in impalad.INFO log
> --
>
> Key: IMPALA-8057
> URL: https://issues.apache.org/jira/browse/IMPALA-8057
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Am analyzing a log. I see many lines of the form:
> {noformat}
> I0108 15:27:04.868429  4182 udf.cc:446] Allocate Local: 
> FunctionContext=0x7f9745a24430 size=6 result=0x7f9754a08928
> {noformat}
> and
> {noformat}
>   0x7f96ea0cc400
>   0x7f96ea0cc420
>   0x7f96ea0cc440
> {noformat}
> In the current form, these don't convey much information, especially the 
> numbers.
> This is probably a trace-level log. But, even there, just displaying hex 
> numbers is not super informative.
> Perhaps the log messages need a good scrub to ensure that they are useful.






[jira] [Created] (IMPALA-8059) TestWebPage::test_backend_states is flaky

2019-01-08 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-8059:


 Summary: TestWebPage::test_backend_states is flaky
 Key: IMPALA-8059
 URL: https://issues.apache.org/jira/browse/IMPALA-8059
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar


test_backend_states is flaky. The query reaches the _"FINISHED"_ state before 
its state is verified by the Python test. Here is the relevant log: 

{code:java}
07:33:45 - Captured stderr call 
-
07:33:45 -- executing async: localhost:21000
07:33:45 select sleep(1) from functional.alltypes limit 1;
07:33:45 
07:33:45 -- 2019-01-08 07:31:57,952 INFO MainThread: Started query 
7f46f15ed4d6d0f6:4d58cdbc
07:33:45 -- getting state for operation: 

{code}


This bug was introduced by IMPALA-7625.
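
One common way to make this kind of check tolerate the race is to poll for any 
of several acceptable states instead of asserting a single state at one 
instant. The sketch below is only an illustration: {{wait_for_state}} and the 
{{get_state}} callable are hypothetical names, not part of the Impala test 
framework, and whether "FINISHED" is an acceptable state depends on what the 
test needs to observe.

```python
import time


def wait_for_state(get_state, accepted_states, timeout_s=10.0, interval_s=0.1):
    """Poll get_state() until it returns one of accepted_states.

    get_state is a hypothetical zero-argument callable returning the
    current query state as a string. Raises AssertionError on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in accepted_states:
            return state
        if time.monotonic() >= deadline:
            raise AssertionError(
                "state %r not in %r after %.1fs" % (state, sorted(accepted_states), timeout_s))
        time.sleep(interval_s)


# A query that has already finished by the time we look still passes the check.
print(wait_for_state(lambda: "FINISHED", {"RUNNING", "FINISHED"}))
```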





[jira] [Comment Edited] (IMPALA-8058) HBase scan cardinality division-by-zero leads to bogus cardinality

2019-01-08 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737681#comment-16737681
 ] 

Paul Rogers edited comment on IMPALA-8058 at 1/9/19 12:25 AM:
--

Proposed fix:

* Retain the existing code, except...
* If the estimated row width is less than 1, return -1 as the estimate.
* In the HBase scan node, if we get back -1 from the cardinality estimator, use 
the row count from HMS table stats.

Multiply the total row count by filter selectivity to get scan cardinality.

Note that the existing HBase scan node double-counts filter cardinality:

* It uses the key range estimator described above to estimate the rows in that 
range.
* Applies the predicate selectivity a second time in the scan node.

So, a further fix is to:

* Apply all predicates only if we are using the HMS table stats row count.
* Apply only non-key predicate selectivity if we are using the (smaller) key 
range row count.


was (Author: paul.rogers):
Proposed fix:

* Retain the existing code, except...
* If the estimated row width is less than 1, return -1 as the estimate.
* In the HBase scan node, if we get back -1 from the cardinality estimator, use 
the row count from HMS table stats.

Multiply the total row count by filter selectivity to get scan cardinality.

Note that the existing HBase scan node double-counts filter cardinality:

* It uses the key range estimator described above to estimate the rows in that 
range.
* Applies the predicate selectivity a second time in the scan node.

So, a further fix is to:

* Apply all predicates only if we are using the HMS table stats row count.
* Apply only non-key predicate selectivity if we are using the (smaller) key 
range row count.

> HBase scan cardinality division-by-zero leads to bogus cardinality
> --
>
> Key: IMPALA-8058
> URL: https://issues.apache.org/jira/browse/IMPALA-8058
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Major
>
> A particular HBase query has highly selective key filters and runs into code 
> bugs that produce a bogus, huge cardinality value.
> {{HbaseScanNode.computeStats()}} attempts to compute table cardinality by 
> calling {{HBaseTable.getEstimatedRowStats()}}. This then calls into (in the 
> latest versions) {{FeHBaseTable.getEstimatedRowStats()}}.
> This code tries to estimate cardinality by:
> * Scanning a set of regions.
> * For each region, getting its size.
> * Averaging a bunch of rows to estimate row width.
> Once we know the size of the regions we need to scan, and the average row 
> width, we can compute the scan cardinality.
> The problem in this particular query is that the predicates are so selective 
> that no regions match. As a result, the average row width is zero. We divide 
> (as a double) the region size by 0 and get INF. We cast that to a long and 
> get Long.MAX_VALUE. We then use that as our (highly bogus) cardinality 
> estimate.
> The code must:
> * Detect the division-by-zero (no sample rows) case.
> * Use an alternative estimate (such as multiplying total table row count from 
> HMS by the filter selectivity.)






[jira] [Commented] (IMPALA-8058) HBase scan cardinality division-by-zero leads to bogus cardinality

2019-01-08 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737681#comment-16737681
 ] 

Paul Rogers commented on IMPALA-8058:
-

Proposed fix:

* Retain the existing code, except...
* If the estimated row width is less than 1, return -1 as the estimate.
* In the HBase scan node, if we get back -1 from the cardinality estimator, use 
the row count from HMS table stats.

Multiply the total row count by filter selectivity to get scan cardinality.

Note that the existing HBase scan node double-counts filter cardinality:

* It uses the key range estimator described above to estimate the rows in that 
range.
* Applies the predicate selectivity a second time in the scan node.

So, a further fix is to:

* Apply all predicates only if we are using the HMS table stats row count.
* Apply only non-key predicate selectivity if we are using the (smaller) key 
range row count.
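
To make the two-part proposal concrete, here is a minimal sketch in Python 
(the real code is Java in the Impala frontend). All names are hypothetical, 
and returning -1 when HMS has no row count is an assumption added for the 
sketch.

```python
def estimate_scan_cardinality(region_bytes, avg_row_width, hms_row_count,
                              key_selectivity, non_key_selectivity):
    """Return an estimated row count for the scan, or -1 if nothing is known.

    All parameter names are hypothetical. hms_row_count < 0 is assumed to
    mean that the table has no row count in HMS.
    """
    if avg_row_width >= 1:
        # The key-range estimate already reflects the key predicates,
        # so apply only the non-key selectivity to avoid double-counting.
        key_range_rows = region_bytes / avg_row_width
        return round(key_range_rows * non_key_selectivity)
    # No usable row-width sample: fall back to the HMS table row count
    # and apply the selectivity of all predicates.
    if hms_row_count < 0:
        return -1
    return round(hms_row_count * key_selectivity * non_key_selectivity)


# Normal case: 1000 bytes / 10 bytes per row = 100 rows, times 0.25.
print(estimate_scan_cardinality(1000, 10, 400, 0.5, 0.25))
# No sample rows: 400 HMS rows times 0.5 times 0.25.
print(estimate_scan_cardinality(1000, 0, 400, 0.5, 0.25))
```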

> HBase scan cardinality division-by-zero leads to bogus cardinality
> --
>
> Key: IMPALA-8058
> URL: https://issues.apache.org/jira/browse/IMPALA-8058
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Major
>
> A particular HBase query has highly selective key filters and runs into code 
> bugs that produce a bogus, huge cardinality value.
> {{HbaseScanNode.computeStats()}} attempts to compute table cardinality by 
> calling {{HBaseTable.getEstimatedRowStats()}}. This then calls into (in the 
> latest versions) {{FeHBaseTable.getEstimatedRowStats()}}.
> This code tries to estimate cardinality by:
> * Scanning a set of regions.
> * For each region, getting its size.
> * Averaging a bunch of rows to estimate row width.
> Once we know the size of the regions we need to scan, and the average row 
> width, we can compute the scan cardinality.
> The problem in this particular query is that the predicates are so selective 
> that no regions match. As a result, the average row width is zero. We divide 
> (as a double) the region size by 0 and get INF. We cast that to a long and 
> get Long.MAX_VALUE. We then use that as our (highly bogus) cardinality 
> estimate.
> The code must:
> * Detect the division-by-zero (no sample rows) case.
> * Use an alternative estimate (such as multiplying total table row count from 
> HMS by the filter selectivity.)






[jira] [Commented] (IMPALA-7832) Support IF NOT EXISTS in alter table add columns

2019-01-08 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737671#comment-16737671
 ] 

Paul Rogers commented on IMPALA-7832:
-

Turns out our existing syntax is:

{noformat}
ALTER TABLE  (REPLACE|ADD) COLUMNS ( ... )
{noformat}

That is, our syntax allows multiple columns per statement, not just one as in 
the ISO SQL. So, we could do two things:

* Add ISO SQL syntax that [~grahn] provided.
* Modify our existing syntax as shown in the original description.

This ensures that we are both SQL-compliant and backward-compatible. I believe 
that the two syntax variations can co-exist. I don't believe that they lead to 
ambiguities, but the parser generator will tell us if they do.
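
For reference, the intended IF NOT EXISTS behavior for a multi-column ADD can 
be sketched as below. This is an illustration in Python, not the planner 
code: {{columns_to_add}} is a hypothetical helper, case-insensitive name 
matching is an assumption, and the open question of a column that exists with 
a different type is not addressed (a name match is simply skipped).

```python
def columns_to_add(existing, requested, if_not_exists):
    """Return the (name, type) pairs that should actually be added.

    existing: list of current column names.
    requested: list of (name, type) pairs from the ALTER TABLE statement.
    if_not_exists: True if the statement carried IF NOT EXISTS.
    """
    # Assumption: column names compare case-insensitively.
    seen = {name.lower() for name in existing}
    to_add = []
    for name, col_type in requested:
        if name.lower() in seen:
            if if_not_exists:
                continue  # silently skip the column that is already there
            raise ValueError("Column already exists: " + name)
        seen.add(name.lower())
        to_add.append((name, col_type))
    return to_add


print(columns_to_add(["c1"], [("c1", "int"), ("c2", "int")], if_not_exists=True))
```

Without IF NOT EXISTS the same call raises an error, which matches the usual 
behavior of a plain ADD when a column name is already taken.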

> Support IF NOT EXISTS in alter table add columns
> 
>
> Key: IMPALA-7832
> URL: https://issues.apache.org/jira/browse/IMPALA-7832
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: ramp-up
>
> alter table  add [if not exists] columns (  [,  
> ...])
> would add the column only if a column of the same name does not already exist
> Probably worth checking out what other databases do in different situations, 
> eg. if the column already exists but with a different type, if "replace" is 
> used instead of "add", etc.






[jira] [Created] (IMPALA-8058) HBase scan cardinality division-by-zero leads to bogus cardinality

2019-01-08 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-8058:
---

 Summary: HBase scan cardinality division-by-zero leads to bogus 
cardinality
 Key: IMPALA-8058
 URL: https://issues.apache.org/jira/browse/IMPALA-8058
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers


A particular HBase query has highly selective key filters and runs into code 
bugs that produce a bogus, huge cardinality value.

{{HbaseScanNode.computeStats()}} attempts to compute table cardinality by 
calling {{HBaseTable.getEstimatedRowStats()}}. This then calls into (in the 
latest versions) {{FeHBaseTable.getEstimatedRowStats()}}.

This code tries to estimate cardinality by:

* Scanning a set of regions.
* For each region, getting its size.
* Averaging a bunch of rows to estimate row width.

Once we know the size of the regions we need to scan, and the average row 
width, we can compute the scan cardinality.

The problem in this particular query is that the predicates are so selective 
that no regions match. As a result, the average row width is zero. We divide 
(as a double) the region size by 0 and get INF. We cast that to a long and get 
Long.MAX_VALUE. We then use that as our (highly bogus) cardinality estimate.

The code must:

* Detect the division-by-zero (no sample rows) case.
* Use an alternative estimate (such as multiplying total table row count from 
HMS by the filter selectivity.)
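
The failure mode can be reproduced outside Impala. Python raises on float 
division by zero, so the sketch below emulates the Java semantics involved 
(double division yields INF, and a cast to long saturates). The helper names 
and the region size of 1024 bytes are hypothetical, and a positive-zero 
divisor is assumed.

```python
import math

LONG_MAX = 2**63 - 1   # Java Long.MAX_VALUE
LONG_MIN = -2**63      # Java Long.MIN_VALUE


def java_double_div(numerator, denominator):
    """Emulate Java double division, which yields INF or NaN instead of raising."""
    if denominator == 0:
        if numerator == 0 or math.isnan(numerator):
            return math.nan
        # Assumes the divisor is +0.0; the sign follows the numerator.
        return math.copysign(math.inf, numerator)
    return numerator / denominator


def java_double_to_long(value):
    """Emulate Java's (long) cast: NaN becomes 0, out-of-range values saturate."""
    if math.isnan(value):
        return 0
    if value >= LONG_MAX:
        return LONG_MAX
    if value <= LONG_MIN:
        return LONG_MIN
    return int(value)


region_bytes = 1024.0   # hypothetical region size
avg_row_width = 0.0     # no rows were sampled
cardinality = java_double_to_long(java_double_div(region_bytes, avg_row_width))
print(cardinality == LONG_MAX)
```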





[jira] [Updated] (IMPALA-7956) Avoid ad-hoc SQL parsing in shell

2019-01-08 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7956:
--
Summary: Avoid ad-hoc SQL parsing in shell  (was: Use Impala SQL parser in 
Impala shell)

> Avoid ad-hoc SQL parsing in shell
> -
>
> Key: IMPALA-7956
> URL: https://issues.apache.org/jira/browse/IMPALA-7956
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 3.2.0
>Reporter: Fredy Wijaya
>Priority: Major
>
> Impala shell uses a regular expression instead of a real SQL parser to 
> determine whether the with clause is a DML statement or not: 
> https://github.com/apache/impala/blob/ecf12bec42e11262b88dc0993e375fe4d8acaafb/shell/impala_shell.py#L1157.
>  We need to investigate using the Impala SQL parser instead.






[jira] [Updated] (IMPALA-7956) Avoid ad-hoc SQL parsing in shell

2019-01-08 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7956:
--
Description: 
Impala shell uses a regular expression instead of a real SQL parser to determine 
whether the with clause is a DML statement or not: 
https://github.com/apache/impala/blob/ecf12bec42e11262b88dc0993e375fe4d8acaafb/shell/impala_shell.py#L1157.

The shell does need to be able to correctly split SQL statements, but we'd like 
the rest of the SQL parsing to be done server-side.

  was:Impala shell uses a regular expression instead of a real SQL parser to 
determine whether the with clause is a DML statement or not: 
https://github.com/apache/impala/blob/ecf12bec42e11262b88dc0993e375fe4d8acaafb/shell/impala_shell.py#L1157.
 We need to investigate using the Impala SQL parser instead.


> Avoid ad-hoc SQL parsing in shell
> -
>
> Key: IMPALA-7956
> URL: https://issues.apache.org/jira/browse/IMPALA-7956
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 3.2.0
>Reporter: Fredy Wijaya
>Priority: Major
>
> Impala shell uses a regular expression instead of a real SQL parser to 
> determine whether the with clause is a DML statement or not: 
> https://github.com/apache/impala/blob/ecf12bec42e11262b88dc0993e375fe4d8acaafb/shell/impala_shell.py#L1157.
> The shell does need to be able to correctly split SQL statements, but we'd 
> like the rest of the SQL parsing to be done server-side.
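
As an illustration of the statement-splitting requirement, here is a minimal 
hand-rolled splitter. It is a sketch only, not what impala-shell does: 
{{split_statements}} is a hypothetical name, and the code handles only quoted 
strings, backtick identifiers, and "--" line comments (no backslash escapes or 
block comments).

```python
def split_statements(sql):
    """Split sql on semicolons that are outside quotes and -- line comments."""
    statements = []
    current = []
    quote = None          # the currently open quote character, if any
    in_comment = False    # inside a -- comment, which runs to end of line
    for i, ch in enumerate(sql):
        if in_comment:
            current.append(ch)
            if ch == "\n":
                in_comment = False
        elif quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif sql.startswith("--", i):
            in_comment = True
            current.append(ch)
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    # Drop empty pieces, such as the one after a trailing semicolon.
    return [s for s in statements if s]


print(split_statements("select ';' from t; select 1;"))
```

Everything beyond locating statement boundaries (for example, deciding 
whether a WITH clause introduces DML) would be left to the server.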






[jira] [Commented] (IMPALA-7832) Support IF NOT EXISTS in alter table add columns

2019-01-08 Thread Greg Rahn (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737655#comment-16737655
 ] 

Greg Rahn commented on IMPALA-7832:
---

The ANSI SQL standard syntax for this would be:
{noformat}
ALTER TABLE [ IF EXISTS ] ADD [ COLUMN ] [ IF NOT EXISTS ] column_name 
data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
{noformat}
For example:
{noformat}
alter table t 
add column if not exists c2 int,
add column if not exists c3 int;
{noformat}

> Support IF NOT EXISTS in alter table add columns
> 
>
> Key: IMPALA-7832
> URL: https://issues.apache.org/jira/browse/IMPALA-7832
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: ramp-up
>
> alter table  add [if not exists] columns (  [,  
> ...])
> would add the column only if a column of the same name does not already exist
> Probably worth checking out what other databases do in different situations, 
> eg. if the column already exists but with a different type, if "replace" is 
> used instead of "add", etc.






[jira] [Commented] (IMPALA-8057) Unhelpful messages in impalad.INFO log

2019-01-08 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737652#comment-16737652
 ] 

Tim Armstrong commented on IMPALA-8057:
---

I doubt you'll get much pushback if you just delete these logs. 

> Unhelpful messages in impalad.INFO log
> --
>
> Key: IMPALA-8057
> URL: https://issues.apache.org/jira/browse/IMPALA-8057
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Am analyzing a log. I see many lines of the form:
> {noformat}
> I0108 15:27:04.868429  4182 udf.cc:446] Allocate Local: 
> FunctionContext=0x7f9745a24430 size=6 result=0x7f9754a08928
> {noformat}
> and
> {noformat}
>   0x7f96ea0cc400
>   0x7f96ea0cc420
>   0x7f96ea0cc440
> {noformat}
> In the current form, these don't convey much information, especially the 
> numbers.
> This is probably a trace-level log. But, even there, just displaying hex 
> numbers is not super informative.
> Perhaps the log messages need a good scrub to ensure that they are useful.






[jira] [Created] (IMPALA-8057) Unhelpful messages in impalad.INFO log

2019-01-08 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-8057:
---

 Summary: Unhelpful messages in impalad.INFO log
 Key: IMPALA-8057
 URL: https://issues.apache.org/jira/browse/IMPALA-8057
 Project: IMPALA
  Issue Type: Improvement
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers


Am analyzing a log. I see many lines of the form:

{noformat}
I0108 15:27:04.868429  4182 udf.cc:446] Allocate Local: 
FunctionContext=0x7f9745a24430 size=6 result=0x7f9754a08928
{noformat}

and

{noformat}
  0x7f96ea0cc400
  0x7f96ea0cc420
  0x7f96ea0cc440
{noformat}

In the current form, these don't convey much information, especially the 
numbers.

This is probably a trace-level log. But, even there, just displaying hex 
numbers is not super informative.

Perhaps the log messages need a good scrub to ensure that they are useful.
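
To find which call sites produce the most noise before scrubbing, the log can 
be tallied by source location. The sketch below assumes the standard glog line 
prefix (severity letter, MMDD, time, thread id, file:line); the helper name is 
hypothetical, and the sample rejoins the first message above onto one line.

```python
import re
from collections import Counter

# glog prefix: severity, MMDD, HH:MM:SS.micros, thread id, file:line, then "] message".
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\s:]+:\d+)\] (?P<message>.*)$"
)


def count_by_source(lines):
    """Count log lines per emitting source location (file:line)."""
    counts = Counter()
    for line in lines:
        match = GLOG_LINE.match(line)
        if match:  # lines without a glog prefix, such as bare hex values, are skipped
            counts[match.group("source")] += 1
    return counts


sample = [
    "I0108 15:27:04.868429  4182 udf.cc:446] Allocate Local: "
    "FunctionContext=0x7f9745a24430 size=6 result=0x7f9754a08928",
    "  0x7f96ea0cc400",
    "  0x7f96ea0cc420",
]
print(count_by_source(sample).most_common())
```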




[jira] [Work started] (IMPALA-7832) Support IF NOT EXISTS in alter table add columns

2019-01-08 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7832 started by Fredy Wijaya.

> Support IF NOT EXISTS in alter table add columns
> 
>
> Key: IMPALA-7832
> URL: https://issues.apache.org/jira/browse/IMPALA-7832
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: ramp-up
>
> alter table  add [if not exists] columns (  [,  
> ...])
> would add the column only if a column of the same name does not already exist
> Probably worth checking out what other databases do in different situations, 
> eg. if the column already exists but with a different type, if "replace" is 
> used instead of "add", etc.






[jira] [Commented] (IMPALA-8055) run-tests.py reports tests as passed even if the did not

2019-01-08 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737462#comment-16737462
 ] 

Paul Rogers commented on IMPALA-8055:
-

Oddly, found that a different test, {{metadata/test_stats_extrapolation.py}}, 
does fail as expected. It also has a ".test" file and failed due to changes in 
the EXPLAIN format. However, this test failure did percolate to the 
{{run-tests.py}} output:

{noformat}
=== 1 failed, 1 passed in 65.40 seconds =
{noformat}

But, not to the final output:

{noformat}
= 2 passed in 0.05 seconds ==
{noformat}

That is, {{run-tests.py}} when run with only this one test, produces both lines 
of output.

My guess is that certain stats are not getting propagated from one level of 
scripts to another. It may be that {{test_explain.py}} does not "publish" its 
failures, but {{test_stats_extrapolation.py}} does.
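
If the wrapper only looks at the last summary line, one possible approach is 
to total the counts from every pytest summary line and derive the final 
status from the totals. The sketch below is an assumption about how that 
could be done, not how {{run-tests.py}} works today; the helper name is 
hypothetical.

```python
import re
from collections import Counter

# Matches count/outcome pairs such as "1 failed" or "2 passed" in a summary line.
OUTCOME = re.compile(r"(\d+) (passed|failed|error|skipped|xfailed|xpassed)")


def aggregate_summaries(summary_lines):
    """Sum the per-outcome counts over all pytest summary lines."""
    totals = Counter()
    for line in summary_lines:
        for count, outcome in OUTCOME.findall(line):
            totals[outcome] += int(count)
    return totals


lines = [
    "=== 1 failed, 1 passed in 65.40 seconds =",
    "= 2 passed in 0.05 seconds ==",
]
totals = aggregate_summaries(lines)
# Any failure or error in any stage makes the overall run fail.
exit_code = 1 if totals["failed"] or totals["error"] else 0
print(dict(totals), exit_code)
```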

> run-tests.py reports tests as passed even if the did not
> 
>
> Key: IMPALA-8055
> URL: https://issues.apache.org/jira/browse/IMPALA-8055
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Been mucking about with the EXPLAIN output format which required rebasing a 
> bunch of tests on the new format. PlannerTest is fine: it clearly fails when 
> the expected ".test" files don't match the new "actual" files.
> When run on Jenkins in "pre-review" mode, the build does fail if a Python 
> end-to-end test fails. But, the job seems to give up at that point, not 
> running other tests and finding more problems. (There were three separate 
> test cases that needed fixing; took multiple runs to find them.)
> When run on my dev box, I get the following (highly abbreviated) output:
> {noformat}
> '|  in pipelines: 00(GETNEXT)' != '|  row-size=402B cardinality=5.76M'
> ...
> [gw3] PASSED 
> metadata/test_explain.py::TestExplain::test_explain_level0[protocol: beeswax 
> | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: text/none] 
> ...
>  6 passed in 68.63 seconds =
> {noformat}
> I've learned that "passed" means "maybe failed" and to go back and inspect 
> the actual output to figure out if the test did, indeed, fail. I suspect 
> "passed" means "didn't crash" rather than "tests worked."
> Would be very helpful to plumb the failure through to the summary line so it 
> said "3 passed, 3 failed" or whatever. Would be a huge time-saver.






[jira] [Commented] (IMPALA-8055) run-tests.py reports tests as passed even if the did not

2019-01-08 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737439#comment-16737439
 ] 

Paul Rogers commented on IMPALA-8055:
-

Thanks [~philip] for taking a look at this one. Would do so myself but I'm 
knee-deep in other issues at the moment.

Agree that we don't need 100s of test failures because something basic is 
broken. On the other hand a limit of 1 failure might be a bit low. Perhaps 
somewhere around 10 failures before giving up might be more helpful.

Easy to reproduce. Find "explain-level1.test" and modify one of the expected 
lines so that the test will fail. Maybe change this:

{noformat}
'   runtime filters: RF000 -> l_orderkey'
{noformat}

To this:

{noformat}
'   runtime filters: RF000 -> bogus'
{noformat}

Do the same for "explain-level2.test". Run this in your dev environment with:

{noformat}
${IMPALA_HOME}/tests/run-tests.py -s --update_results metadata/test_explain.py
{noformat}

You'll get the results shown in the description.

Now, modify one other test as well, say {{test_compute_stats.py}}. Upload a 
fake patch and run the pre-review tests. If the pattern holds, the output will 
show one of the failures, but not the other. Fix the failure and rerun. Now the 
other failure will break the build.

> run-tests.py reports tests as passed even if the did not
> 
>
> Key: IMPALA-8055
> URL: https://issues.apache.org/jira/browse/IMPALA-8055
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Been mucking about with the EXPLAIN output format which required rebasing a 
> bunch of tests on the new format. PlannerTest is fine: it clearly fails when 
> the expected ".test" files don't match the new "actual" files.
> When run on Jenkins in "pre-review" mode, the build does fail if a Python 
> end-to-end test fails. But, the job seems to give up at that point, not 
> running other tests and finding more problems. (There were three separate 
> test cases that needed fixing; took multiple runs to find them.)
> When run on my dev box, I get the following (highly abbreviated) output:
> {noformat}
> '|  in pipelines: 00(GETNEXT)' != '|  row-size=402B cardinality=5.76M'
> ...
> [gw3] PASSED 
> metadata/test_explain.py::TestExplain::test_explain_level0[protocol: beeswax 
> | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: text/none] 
> ...
>  6 passed in 68.63 seconds =
> {noformat}
> I've learned that "passed" means "maybe failed" and to go back and inspect 
> the actual output to figure out if the test did, indeed, fail. I suspect 
> "passed" means "didn't crash" rather than "tests worked."
> Would be very helpful to plumb the failure through to the summary line so it 
> said "3 passed, 3 failed" or whatever. Would be a huge time-saver.






[jira] [Commented] (IMPALA-7992) test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs

2019-01-08 Thread Philip Zeyliger (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737441#comment-16737441
 ] 

Philip Zeyliger commented on IMPALA-7992:
-

Then how can this be fixed by reducing the number of iterations?

> test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
> 
>
> Key: IMPALA-7992
> URL: https://issues.apache.org/jira/browse/IMPALA-7992
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build
>
> Error Message
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops
>     self.execute_one_decimal_op()
> query_test/test_decimal_fuzz.py:247: in execute_one_decimal_op
>     assert self.result_equals(expected_result, result)
> E   assert  >(Decimal('-0.80'), None)
> E   +  where  > = .result_equals
> {noformat}
> Stacktrace
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops
>     self.execute_one_decimal_op()
> query_test/test_decimal_fuzz.py:247: in execute_one_decimal_op
>     assert self.result_equals(expected_result, result)
> E   assert  >(Decimal('-0.80'), None)
> E   +  where  > = .result_equals
> {noformat}
> stderr
> {noformat}
> -- 2018-12-16 00:10:48,905 INFO MainThread: Started query 
> aa4b44ad5b34c3fb:24d18385
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(-879550566.24 as decimal(11,2)) % 
> cast(-100.000 as decimal(28,5));
> -- 2018-12-16 00:10:48,979 INFO MainThread: Started query 
> b24acf22b1607dc6:4f287530
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(17179869.184 as decimal(19,7)) / 
> cast(-87808593158000679814.7939232649738916 as decimal(38,17));
> -- 2018-12-16 00:10:49,054 INFO MainThread: Started query 
> 38435f02022e590a:18f7e97
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(99 as decimal(32,2)) - 
> cast(-519203.671959101313 as decimal(18,12));
> -- 2018-12-16 00:10:49,132 INFO MainThread: Started query 
> 504edbac7ecb32ce:bfbbbe93
> ~ Stack of  (140061483271936) 
> ~
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 277, in _perform_spawn
> reply.run()
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 213, in run
> self._result = func(*args, **kwargs)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 954, in _thread_receiver
> msg = Message.from_io(io)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 418, in from_io
> header = io.read(9)  # type 1, channel 4, payload 4
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 386, in read
> data = self._read(numbytes-len(buf))
> {noformat}






[jira] [Assigned] (IMPALA-7832) Support IF NOT EXISTS in alter table add columns

2019-01-08 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya reassigned IMPALA-7832:


Assignee: Fredy Wijaya

> Support IF NOT EXISTS in alter table add columns
> 
>
> Key: IMPALA-7832
> URL: https://issues.apache.org/jira/browse/IMPALA-7832
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: ramp-up
>
> alter table <table> add [if not exists] columns (<col_spec> [, <col_spec> 
> ...])
> would add the column only if a column of the same name does not already exist
> Probably worth checking out what other databases do in different situations, 
> eg. if the column already exists but with a different type, if "replace" is 
> used instead of "add", etc.
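
A minimal sketch of the proposed semantics, modeled on a plain list of
(name, type) pairs rather than on Impala's catalog. The function name
add_columns, the case-insensitive name comparison, and the choice to skip a
same-named column even when its type differs are all assumptions made for
illustration; the issue leaves those questions open.

```python
def add_columns(schema, new_columns, if_not_exists=False):
    """Return a new schema with new_columns appended.

    schema and new_columns are lists of (name, type) tuples.
    Assumption: column names compare case-insensitively.
    """
    result = list(schema)
    existing = set(name.lower() for name, _ in schema)
    for name, col_type in new_columns:
        key = name.lower()
        if key in existing:
            if if_not_exists:
                # Assumption: skip silently, even if the existing type differs.
                continue
            raise ValueError("Column already exists: %s" % name)
        result.append((name, col_type))
        existing.add(key)
    return result


schema = [("id", "INT")]
updated = add_columns(schema, [("id", "INT"), ("name", "STRING")],
                      if_not_exists=True)
```

Without if_not_exists=True the same call raises, which mirrors the current
behavior of rejecting a duplicate column name.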






[jira] [Commented] (IMPALA-8007) test_slow_subscriber is flaky

2019-01-08 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737398#comment-16737398
 ] 

Tim Armstrong commented on IMPALA-8007:
---

[~poojanilangekar] that's a nice find with time.sleep(). I think we've 
generally assumed that it sleeps for at least the requested time. I suspect 
that a short sleep is rare in our tests, but it's hard to know.

The "right" way to do it, I think, would be to put a loop around time.sleep() 
and loop until enough time has elapsed. Unfortunately that would depend on 
using a monotonic clock, which wasn't exposed in Python until 3.3 
(time.monotonic()).

I think we could go with 1) except just validate that the first time is higher 
than the last time. E.g. collect 20 samples at 100ms intervals. It's 
theoretically possible that time.sleep() could return in less than a ms each 
time, but vanishingly unlikely.
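
A minimal sketch of the "loop around time.sleep()" idea, assuming Python 3.3
or later so that time.monotonic() is available. The helper name sleep_at_least
and its clock/sleep parameters are hypothetical, introduced only so the loop
can be exercised with a fake clock; this is not code from test_statestore.py.

```python
import time


def sleep_at_least(seconds, clock=time.monotonic, sleep=time.sleep):
    """Block until at least `seconds` have elapsed on a monotonic clock.

    time.sleep() may return early, so keep sleeping for whatever remains.
    """
    deadline = clock() + seconds
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return
        sleep(remaining)


start = time.monotonic()
sleep_at_least(0.05)
elapsed = time.monotonic() - start
```

The elapsed time is measured with the same monotonic clock, so a wall-clock
adjustment during the sleep does not affect the result.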

> test_slow_subscriber is flaky
> -
>
> Key: IMPALA-8007
> URL: https://issues.apache.org/jira/browse/IMPALA-8007
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: broken-build, flaky
> Fix For: Impala 3.2.0
>
>
> We have hit both the asserts in the test.
> *Exhaustive:*
> {noformat}
> statestore/test_statestore.py:574: in test_slow_subscriber assert 
> (secs_since_heartbeat < float(sleep_time + 1.0)) E   assert 
> 8.8043 < 6.0 E+  where 6.0 = float((5 + 1.0))
> Stacktrace
> statestore/test_statestore.py:574: in test_slow_subscriber
> assert (secs_since_heartbeat < float(sleep_time + 1.0))
> E   assert 8.8043 < 6.0
> E+  where 6.0 = float((5 + 1.0))
> {noformat}
> *ASAN*
> {noformat}
> Error Message
> statestore/test_statestore.py:573: in t assert (secs_since_heartbeat > 
> float(sleep_time - 1.0)) E   assert 4.995 > 5.0 E+  where 5.0 = float((6 
> - 1.0))
> Stacktrace
> statestore/test_statestore.py:573: in test_slow_subscriber
> assert (secs_since_heartbeat > float(sleep_time - 1.0))
> E   assert 4.995 > 5.0
> E+  where 5.0 = float((6 - 1.0))
> {noformat}
> I have only noticed this happen twice (the above two instances) since the 
> patch was committed. So, it looks like a racy bug.






[jira] [Updated] (IMPALA-8056) Impala accepts plus in front of string value

2019-01-08 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8056:
--
Component/s: Frontend

> Impala accepts plus in front of string value
> 
>
> Key: IMPALA-8056
> URL: https://issues.apache.org/jira/browse/IMPALA-8056
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Andrejs Dubovskis
>Priority: Minor
>
> Impala accepts a plus sign in front of a string value but rejects a minus sign.
> See the output for the corresponding queries:
> {code}
> Server version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 23f574543323301846b41fa5433690df32efe085)
> Query: select "a",+"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> Query progress can be monitored at: 
> http://catdn009:25000/query_plan?query_id=2640632c29c812c7:905a9fcd
> a b
> Fetched 1 row(s) in 0.01s
> Query: select "a",-"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> ERROR: AnalysisException: Arithmetic operation requires numeric operands: -1 
> * 'b'
> Could not execute command: select "a",-"b"
> {code}






[jira] [Commented] (IMPALA-8056) Impala accepts plus in front of string value

2019-01-08 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737371#comment-16737371
 ] 

Tim Armstrong commented on IMPALA-8056:
---

[~dubislv] are you expecting that the first query should fail with an analysis 
error? Seems like that's what it should do, but the issue is fairly benign.
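
A minimal sketch of the symmetric behavior described above, where unary plus
and unary minus are both rejected for non-numeric operands. Everything here is
hypothetical: check_unary_sign, AnalysisError, and the list of numeric type
names are illustrative stand-ins, not Impala's frontend code.

```python
# Assumption: an illustrative set of numeric type names.
NUMERIC_TYPES = frozenset(
    ["TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"])


class AnalysisError(Exception):
    """Stand-in for an analysis-time error; not Impala's AnalysisException."""


def check_unary_sign(op, operand_type):
    """Reject unary + and unary - alike when the operand is not numeric."""
    if op not in ("+", "-"):
        raise ValueError("unsupported unary operator: %r" % op)
    if operand_type not in NUMERIC_TYPES:
        raise AnalysisError(
            "Unary %s requires a numeric operand, got %s" % (op, operand_type))
    return operand_type
```

With a check like this, both select "a",+"b" and select "a",-"b" would fail
during analysis instead of only the second one.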

> Impala accepts plus in front of string value
> 
>
> Key: IMPALA-8056
> URL: https://issues.apache.org/jira/browse/IMPALA-8056
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Andrejs Dubovskis
>Priority: Minor
>
> Impala accepts a plus sign in front of a string value but rejects a minus sign.
> See the output for the corresponding queries:
> {code}
> Server version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 23f574543323301846b41fa5433690df32efe085)
> Query: select "a",+"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> Query progress can be monitored at: 
> http://catdn009:25000/query_plan?query_id=2640632c29c812c7:905a9fcd
> a b
> Fetched 1 row(s) in 0.01s
> Query: select "a",-"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> ERROR: AnalysisException: Arithmetic operation requires numeric operands: -1 
> * 'b'
> Could not execute command: select "a",-"b"
> {code}






[jira] [Commented] (IMPALA-7816) Race condition in HdfsScanNodeBase::StopAndFinalizeCounters

2019-01-08 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737237#comment-16737237
 ] 

Sahil Takiar commented on IMPALA-7816:
--

Would it make more sense to just wait for all the scanners to call 
{{HdfsParquetScanner::Close}} before calling 
{{HdfsScanNodeBase::StopAndFinalizeCounters}}? It seems odd that a scan node can 
be closed before its corresponding scanners are closed. Waiting would make the 
code easier to understand: I would guess most devs assume this ordering already 
holds, which is probably how the bug was introduced in the first place. There 
might be other race conditions in the code due to this behavior, although I 
haven't been able to reproduce any others.
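
A toy model of the ordering suggested above, written in Python rather than
C++ for brevity. It assumes the number of scanners is known up front, and the
class ScanNodeSketch and its method names are hypothetical; it only shows
finalization blocking until every scanner has reported its close.

```python
import threading


class ScanNodeSketch(object):
    """Finalizing counters waits until every scanner has closed."""

    def __init__(self, num_scanners):
        self._cond = threading.Condition()
        self._open_scanners = num_scanners
        self._file_type_counts = {}

    def close_scanner(self, file_type):
        # Stands in for a scanner's Close() reporting its completed range.
        with self._cond:
            count = self._file_type_counts.get(file_type, 0)
            self._file_type_counts[file_type] = count + 1
            self._open_scanners -= 1
            if self._open_scanners == 0:
                self._cond.notify_all()

    def stop_and_finalize_counters(self):
        # Block until all scanners have closed, then snapshot the counts.
        with self._cond:
            while self._open_scanners > 0:
                self._cond.wait()
            return dict(self._file_type_counts)


node = ScanNodeSketch(num_scanners=2)
scanners = [threading.Thread(target=node.close_scanner, args=("PARQUET/NONE",))
            for _ in range(2)]
for scanner in scanners:
    scanner.start()
counts = node.stop_and_finalize_counters()
for scanner in scanners:
    scanner.join()
```

Because the snapshot is taken only after the open-scanner count reaches zero,
the counts always include both ranges, whichever thread runs first.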

> Race condition in HdfsScanNodeBase::StopAndFinalizeCounters
> ---
>
> Key: IMPALA-7816
> URL: https://issues.apache.org/jira/browse/IMPALA-7816
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>  Labels: parquet
>
> While working on IMPALA-6964, I noticed that sometimes the runtime profile 
> for a {{HDFS_SCAN_NODE}} will include {{File Formats: PARQUET/NONE:2}} and 
> sometimes it won't (depending on the query). However, looking at the code, 
> any scan of Parquet files should include this line.
> I debugged the code and there seems to be a race condition where 
> {{HdfsScanNodeBase::StopAndFinalizeCounters}} can be called before 
> {{HdfsParquetScanner::Close}} is called for all the scan ranges. This causes 
> the {{File Formats}} issue above because {{HdfsParquetScanner::Close}} calls 
> {{HdfsScanNodeBase::RangeComplete}} which updates the shared object 
> {{file_type_counts_}}, which is read in {{StopAndFinalizeCounters}} (so 
> {{StopAndFinalizeCounters}} will write out the contents of 
> {{file_type_counts_}} before all scanners can update it).
> {{StopAndFinalizeCounters}} can be called in two places: 
> {{HdfsScanNodeBase::Close}} and in {{HdfsScanNode::GetNext}}. It can be 
> called in {{GetNext}} when {{GetNextInternal}} reads enough rows to cross the 
> query defined limit. So {{GetNext}} will call {{StopAndFinalizeCounters}} 
> once the limit is reached, but not necessarily before the scanners are closed.
> I'm able to reproduce this locally by using the queries:
> {code:java}
>  select * from functional_parquet.lineitem_sixblocks limit 10 {code}
> The runtime profile does not include {{File Formats}}
> {code:java}
>  select * from functional_parquet.lineitem_sixblocks order by l_orderkey 
> limit 10 {code}
> The runtime profile does include {{File Formats}}
> I tried to simply remove the call to {{StopAndFinalizeCounters}} from 
> {{GetNext}} but that doesn't seem to work. It actually caused several other 
> runtime profile (RP) messages to disappear (not entirely sure why).






[jira] [Work started] (IMPALA-8043) ExprTest fails on Ubuntu 16 when the timezone is America/Los_Angeles

2019-01-08 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8043 started by Attila Jeges.

> ExprTest fails on Ubuntu 16 when the timezone is America/Los_Angeles
> 
>
> Key: IMPALA-8043
> URL: https://issues.apache.org/jira/browse/IMPALA-8043
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Attila Jeges
>Priority: Major
>
> When setting up an Ubuntu development environment, I installed Ubuntu 16.04 
> using America/Los_Angeles as the timezone. When running on this 
> configuration, ExprTest.TimestampFunctions() fails with several timezone 
> issues:
> {noformat}
> [ RUN ] ExprTest.TimestampFunctions
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:559: Failure
> Value of: ConvertValue(result)
> Actual: -28800
> Expected: expected_result
> Which is: 0
> unix_timestamp(cast('1969-12-31 16:00:00' as timestamp))
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:559: Failure
> Value of: ConvertValue(result)
> Actual: -28800
> Expected: expected_result
> Which is: 0
> unix_timestamp('1969-12-31 16:00:00')
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:559: Failure
> Value of: ConvertValue(result)
> Actual: -28800
> Expected: expected_result
> Which is: 0
> unix_timestamp('1969-12-31 16:00:00', 'yyyy-MM-dd HH:mm:ss')
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:559: Failure
> Value of: ConvertValue(result)
> Actual: 0
> Expected: expected_result
> Which is: 28800
> unix_timestamp('1970-01-01', 'yyyy-MM-dd')
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:559: Failure
> Value of: ConvertValue(result)
> Actual: 0
> Expected: expected_result
> Which is: 28800
> unix_timestamp('1970-01-01 10:10:10', 'yyyy-MM-dd')
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:559: Failure
> Value of: ConvertValue(result)
> Actual: -28800
> Expected: expected_result
> Which is: 0
> unix_timestamp('1969-12-31 16:00:00 extra text', 'yyyy-MM-dd HH:mm:ss')
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:289: Failure
> Value of: GetValue(expr, TYPE_STRING)
> Actual: "1970-01-01 00:00:00"
> Expected: expected_result
> Which is: "1969-12-31 16:00:00"
> cast(cast(0 as timestamp) as string)
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:289: Failure
> Value of: GetValue(expr, TYPE_STRING)
> Actual: "1970-01-01 00:00:00"
> Expected: expected_result
> Which is: "1969-12-31 16:00:00"
> cast(cast(0 as timestamp) as string)
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:289: Failure
> Value of: GetValue(expr, TYPE_STRING)
> Actual: "1970-01-01 00:00:00"
> Expected: expected_result
> Which is: "1969-12-31 16:00:00"
> from_unixtime(0)
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:289: Failure
> Value of: GetValue(expr, TYPE_STRING)
> Actual: "1970-01-01 00:00:00"
> Expected: expected_result
> Which is: "1969-12-31 16:00:00"
> from_unixtime(cast(0 as bigint))
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:289: Failure
> Value of: GetValue(expr, TYPE_STRING)
> Actual: "1970-01-01 00:00:00"
> Expected: expected_result
> Which is: "1969-12-31 16:00:00"
> from_unixtime(0, 'yyyy-MM-dd HH:mm:ss')
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:289: Failure
> Value of: GetValue(expr, TYPE_STRING)
> Actual: "1970-01-01 00:00:00"
> Expected: expected_result
> Which is: "1969-12-31 16:00:00"
> from_unixtime(cast(0 as bigint), 'yyyy-MM-dd HH:mm:ss')
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:289: Failure
> Value of: GetValue(expr, TYPE_STRING)
> Actual: "1970-01-01 08:00:00"
> Expected: expected_result
> Which is: "1970-01-01 00:00:00"
> cast(to_timestamp(28800) as string)
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:445: Failure
> Expected: (value) <= (end), actual: 1546465102 vs 1546436302
> Google Test trace:
> /home/joe/view2/Impala/be/src/exprs/expr-test.cc:6572: 
> [ FAILED ] ExprTest.TimestampFunctions (3577 ms)
> {noformat}
> There are a couple workarounds to this:
> {noformat}
> # Set UTC as timezone
> sudo timedatectl set-timezone UTC
> # Use actual timezone name (e.g. PST8PDT rather than America/Los_Angeles
> # or CST6CDT rather than America/Chicago)
> sudo timedatectl set-timezone PST8PDT
> {noformat}
> This looks like it is coming from the tests using 
> TestTimestampUnixEpochConversions(). The tests work on other timezones 
> (America/La_Paz, Europe/Paris, Asia/Tokyo all work). It is likely that this 
> is a duplicate of some other known issue.
> Filing this Jira so that others who run into this might find this workaround.
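
The numbers in the failures above are consistent with the timestamp literals
being interpreted as UTC where the test expected local-time (UTC-8) semantics.
The sketch below only checks that arithmetic. It assumes a fixed UTC-8 offset
in place of the real America/Los_Angeles rules, and to_epoch_seconds is a
hypothetical helper, not an Impala function.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for America/Los_Angeles in winter.
UTC_MINUS_8 = timezone(timedelta(hours=-8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(naive_timestamp, tz):
    """Interpret a naive timestamp in the given zone and return Unix seconds."""
    aware = naive_timestamp.replace(tzinfo=tz)
    return int((aware - EPOCH).total_seconds())


literal = datetime(1969, 12, 31, 16, 0, 0)
as_local = to_epoch_seconds(literal, UTC_MINUS_8)  # 0, the expected value
as_utc = to_epoch_seconds(literal, timezone.utc)   # -28800, the actual value
```

The 28800-second gap is exactly eight hours, which matches every
unix_timestamp() mismatch listed in the output.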




[jira] [Created] (IMPALA-8056) Impala accepts plus in front of string value

2019-01-08 Thread Andrejs Dubovskis (JIRA)
Andrejs Dubovskis created IMPALA-8056:
-

 Summary: Impala accepts plus in front of string value
 Key: IMPALA-8056
 URL: https://issues.apache.org/jira/browse/IMPALA-8056
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 2.12.0
Reporter: Andrejs Dubovskis


Impala accepts a plus sign in front of a string value but rejects a minus sign.

See the output for the corresponding queries:

{code}
Server version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
23f574543323301846b41fa5433690df32efe085)

Query: select "a",+"b"
Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
Query progress can be monitored at: 
http://catdn009:25000/query_plan?query_id=2640632c29c812c7:905a9fcd
a b
Fetched 1 row(s) in 0.01s

Query: select "a",-"b"
Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
ERROR: AnalysisException: Arithmetic operation requires numeric operands: -1 * 
'b'

Could not execute command: select "a",-"b"
{code}







[jira] [Commented] (IMPALA-7992) test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs

2019-01-08 Thread Zoltán Borók-Nagy (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737005#comment-16737005
 ] 

Zoltán Borók-Nagy commented on IMPALA-7992:
---

[~philip] No, it is seeded. Python automatically seeds its random generator 
from current time or some operating system-specific random source:

[https://github.com/python/cpython/blob/78392885c9b08021c89649728053d31503d8a509/Lib/random.py#L93]

Unfortunately this behavior is not officially documented anymore. However, this 
stackoverflow comment suggests that it was documented earlier, but somehow got 
removed from the docs: [https://stackoverflow.com/a/817717]
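
When the seed is chosen implicitly, a failing fuzz run can be hard to replay.
A minimal sketch of one common remedy, which is to pick the seed explicitly
and log it. make_fuzz_rng is a hypothetical helper introduced for
illustration; nothing in this thread indicates that test_decimal_fuzz.py does
this.

```python
import random
import time


def make_fuzz_rng(seed=None):
    """Return (rng, seed); fall back to a time-based seed when none is given."""
    if seed is None:
        seed = int(time.time() * 1000)
    return random.Random(seed), seed


rng, seed = make_fuzz_rng()
print("fuzz seed: %d (pass it back in to replay this run)" % seed)
```

Passing the logged seed back to make_fuzz_rng() yields the same sequence of
random values, so the failing decimal expression can be regenerated.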

> test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
> 
>
> Key: IMPALA-7992
> URL: https://issues.apache.org/jira/browse/IMPALA-7992
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build
>
> Error Message
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops
>     self.execute_one_decimal_op()
> query_test/test_decimal_fuzz.py:247: in execute_one_decimal_op
>     assert self.result_equals(expected_result, result)
> E   assert  >(Decimal('-0.80'), None)
> E   +  where  > = .result_equals
> {noformat}
> Stacktrace
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops
>     self.execute_one_decimal_op()
> query_test/test_decimal_fuzz.py:247: in execute_one_decimal_op
>     assert self.result_equals(expected_result, result)
> E   assert  >(Decimal('-0.80'), None)
> E   +  where  > = .result_equals
> {noformat}
> stderr
> {noformat}
> -- 2018-12-16 00:10:48,905 INFO MainThread: Started query 
> aa4b44ad5b34c3fb:24d18385
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(-879550566.24 as decimal(11,2)) % 
> cast(-100.000 as decimal(28,5));
> -- 2018-12-16 00:10:48,979 INFO MainThread: Started query 
> b24acf22b1607dc6:4f287530
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(17179869.184 as decimal(19,7)) / 
> cast(-87808593158000679814.7939232649738916 as decimal(38,17));
> -- 2018-12-16 00:10:49,054 INFO MainThread: Started query 
> 38435f02022e590a:18f7e97
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(99 as decimal(32,2)) - 
> cast(-519203.671959101313 as decimal(18,12));
> -- 2018-12-16 00:10:49,132 INFO MainThread: Started query 
> 504edbac7ecb32ce:bfbbbe93
> ~ Stack of  (140061483271936) 
> ~
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 277, in _perform_spawn
> reply.run()
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 213, in run
> self._result = func(*args, **kwargs)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 954, in _thread_receiver
> msg = Message.from_io(io)
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 418, in from_io
> header = io.read(9)  # type 1, channel 4, payload 4
>   File 
> "/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/repos/Impala/infra/python/env/lib/python2.6/site-packages/execnet/gateway_base.py",
>  line 386, in read
> data = self._read(numbytes-len(buf))
> {noformat}


