[jira] [Created] (IMPALA-8836) Support COMPUTE STATS on insert only ACID tables

2019-08-06 Thread Csaba Ringhofer (JIRA)
Csaba Ringhofer created IMPALA-8836:
---

 Summary: Support COMPUTE STATS on insert only ACID tables
 Key: IMPALA-8836
 URL: https://issues.apache.org/jira/browse/IMPALA-8836
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend, Frontend
Affects Versions: Impala 3.3.0
Reporter: Csaba Ringhofer
Assignee: Csaba Ringhofer






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (IMPALA-8806) Add metrics to improve observability of executor groups

2019-08-06 Thread Bikramjeet Vig (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig resolved IMPALA-8806.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Add metrics to improve observability of executor groups
> ---
>
> Key: IMPALA-8806
> URL: https://issues.apache.org/jira/browse/IMPALA-8806
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: observability
> Fix For: Impala 3.3.0
>
>
> As a follow-on to IMPALA-8484, it makes sense to add some metrics to provide 
> better observability into the state of executor groups.
> Some metrics can be:
> - number of executor groups with any impalads in them
> - number of healthy executor groups
> - number of backends. Currently we have a python helper that calculates this 
> (get_num_known_live_backends), but it really should be a metric (we could 
> replace the test code with a metric if we did this).
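
As a sketch of what replacing the python helper with a metric lookup might look like: the metric name and the nested metrics-JSON layout below are assumptions for illustration, not the actual schema of the impalad debug webserver.

```python
# Illustrative only: metric name and document layout are invented stand-ins
# for whatever /metrics?json actually returns from an impalad.
def find_metric(metrics_doc, name):
    """Recursively search a nested metrics document for a metric by name."""
    for metric in metrics_doc.get("metrics", []):
        if metric.get("name") == name:
            return metric.get("value")
    for group in metrics_doc.get("metric_group", {}).get("child_groups", []):
        found = find_metric(group, name)
        if found is not None:
            return found
    return None

# Hypothetical document shape resembling a metrics-JSON response.
doc = {
    "metrics": [{"name": "cluster-membership.num-backends", "value": 3}],
    "metric_group": {"child_groups": []},
}
```

A test could then assert on find_metric(doc, "cluster-membership.num-backends") instead of counting entries on the /backends page.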





[jira] [Resolved] (IMPALA-7486) Admit less memory on dedicated coordinator for admission control purposes

2019-08-06 Thread Bikramjeet Vig (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig resolved IMPALA-7486.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Admit less memory on dedicated coordinator for admission control purposes
> -
>
> Key: IMPALA-7486
> URL: https://issues.apache.org/jira/browse/IMPALA-7486
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: resource-management, scalability
> Fix For: Impala 3.3.0
>
>
> Following on from IMPALA-7349, we should consider handling dedicated 
> coordinators specially rather than admitting a uniform amount of memory on 
> all backends.
> The specific scenario I'm interested in targeting is the case where we have a 
> coordinator that is executing many "lightweight" coordinator fragments, e.g. 
> just an ExchangeNode and PlanRootSink, plus maybe other lightweight operators 
> like UnionNode that don't use much memory or CPU. With the current behaviour 
> it's possible for a coordinator to reach capacity from the point-of-view of 
> admission control when at runtime it is actually very lightly loaded.
> This is particularly true if coordinators and executors have different 
> process mem limits. This will be somewhat common since they're often deployed 
> on different hardware or the coordinator will have more memory dedicated to 
> its embedded JVM for the catalog cache.
> More generally we could admit different amounts per backend depending on how 
> many fragments are running, but I think this incremental step would address 
> the most important cases and be a little easier to understand.
> We may want to defer this work until we've implemented distributed runtime 
> filter aggregation, which will significantly reduce coordinator memory 
> pressure, and until we've improved distributed overadmission (since the 
> coordinator behaviour may help throttle overadmission).
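
The idea above can be sketched roughly as follows; the function shape and the convention that the coordinator is a distinguished host are illustrative, not Impala's actual admission controller.

```python
# Hypothetical sketch of per-backend admission accounting: a dedicated
# coordinator admits only its (typically much smaller) coordinator-fragment
# estimate instead of the uniform per-backend amount.
def admitted_mem_per_backend(per_backend_estimate, coord_estimate,
                             coordinator, executors):
    """Return {host: admitted_bytes} for admission control accounting."""
    admitted = {coordinator: coord_estimate}  # lightweight fragments only
    for host in executors:
        admitted[host] = per_backend_estimate
    return admitted
```

With, say, a 1 GB uniform estimate but a 64 MB coordinator-fragment estimate, the coordinator no longer reaches its admission capacity while actually lightly loaded.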





[jira] [Resolved] (IMPALA-8832) Queries fail to run when connecting to Impala over Knox

2019-08-06 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-8832.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Queries fail to run when connecting to Impala over Knox
> ---
>
> Key: IMPALA-8832
> URL: https://issues.apache.org/jira/browse/IMPALA-8832
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 3.3.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Blocker
>  Labels: security
> Fix For: Impala 3.3.0
>
>
> Impala recently added support for HTTP clients over HS2. One of the 
> motivations for this work was to allow proxying of connections to Impala 
> through other services such as Apache Knox. 
> However, when testing with Knox, it seems that it's possible to connect 
> to Impala successfully, but then queries fail to run or results aren't 
> retrieved.





[jira] [Created] (IMPALA-8837) Impala Doc: Document impersonation via HTTP

2019-08-06 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-8837:
---

 Summary: Impala Doc: Document impersonation via HTTP
 Key: IMPALA-8837
 URL: https://issues.apache.org/jira/browse/IMPALA-8837
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni








[jira] [Resolved] (IMPALA-8781) Add additional tests in test_result_spooling.py and validate cancellation logic

2019-08-06 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8781.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

Commit Hash: bbec8fa74961755269298706302477780019e7d5

IMPALA-8781: Result spooling tests to cover edge cases and cancellation

Adds additional tests to test_result_spooling.py to cover various edge
cases when fetching query results (ensure all Impala types are returned
properly, UDFs are evaluated correctly, etc.). A new QueryTest file
result-spooling.test is added to encapsulate all these tests. Tests with
a decreased ROW_BATCH_SIZE are added as well to validate that
BufferedPlanRootSink buffers row batches correctly.

BufferedPlanRootSink requires careful synchronization of the producer
and consumer threads, especially when queries are cancelled. The
TestResultSpoolingCancellation class is dedicated to running
cancellation tests with SPOOL_QUERY_RESULTS = true. The implementation
is heavily borrowed from test_cancellation.py and some of the logic is
re-factored into a new utility class called cancel_utils.py to avoid
code duplication between test_cancellation.py and
test_result_spooling.py.

Testing:
* Looped test_result_spooling.py overnight with no failures
* Core tests passed

Change-Id: Ib3b3a1539c4a5fa9b43c8ca315cea16c9701e283
Reviewed-on: http://gerrit.cloudera.org:8080/13907
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 

> Add additional tests in test_result_spooling.py and validate cancellation 
> logic
> ---
>
> Key: IMPALA-8781
> URL: https://issues.apache.org/jira/browse/IMPALA-8781
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> {{test_result_spooling.py}} currently runs a few basic tests with result 
> spooling enabled. We should add some more to cover all necessary edge cases 
> (ensure all Impala types are returned correctly, UDFs are evaluated 
> correctly, etc.) and add tests to validate the cancellation logic in 
> {{PlanRootSink}}.





[jira] [Created] (IMPALA-8838) Impala wrote audit log with missing statement_type

2019-08-06 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8838:
-

 Summary: Impala wrote audit log with missing statement_type
 Key: IMPALA-8838
 URL: https://issues.apache.org/jira/browse/IMPALA-8838
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 2.9.0
Reporter: Tim Armstrong


We saw an audit log with a missing statement_type, where it should have been 
QUERY. Filing a bug to see if this reoccurs and if there is a pattern to it (we 
don't have a way to reproduce or debug now).

{noformat}
{
  "serviceType": "IMPALA", 
  "serviceName": "impala", 
  "extraValues": {
"12345678912345": {
  "status": "", 
  "impersonator": null, 
  "start_time": "2019-01-01 00:00:00.0", 
  "network_address": "123.123.123.123:12345", 
  "authorization_failure": false, 
  "sql_statement": "SELECT NDV_NO_FINALIZE(col) AS col, CAST(-1 as BIGINT), 
8, CAST(8 as DOUBLE), COUNT(col), ... FROM table WHERE (day='2019-01-01') GROUP 
BY day",
  "session_id\\ ": "xx:xx", 
  "query_id": "xxx:xx", 
  "catalog_objects": [
{
  "privilege": "VIEW_METADATA", 
  "object_type": "", 
  "name": "_impala_builtins"
}, 
{
  "privilege": "SELECT", 
  " object_type": "", 
  "name": "table"
}
  ], 
  "statement_type": "", 
  "user": "u...@realm.net"
}
  }
}
{noformat}

statement_type is printed here:
https://github.com/cloudera/Impala/blob/cdh5-2.9.0_5.12.2/be/src/service/impala-server.cc#L474

It calls out to the function which prints an enum here:
https://github.com/cloudera/Impala/blob/cdh5-2.9.0_5.12.2/be/src/util/debug-util.cc#L68
The only way it can produce an empty string is if the enum value is 
out-of-range, which shouldn't be possible unless we're reading an uninitialised 
value or the memory is somehow corrupted. However, all the surrounding fields 
in the TExecRequest object look like they were written out to the audit log OK.

The code has changed a bit in master because of the thrift version upgrade, but 
it is still equivalent as far as I can see.
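
That failure mode can be sketched in miniature; the name table below is an invented stand-in for the Thrift-generated enum-name lookup, not the real one.

```python
# Invented stand-in for the generated enum-name table consulted by
# debug-util.cc; only the lookup-with-empty-fallback pattern is the point.
STMT_TYPE_NAMES = {0: "QUERY", 1: "DDL", 2: "DML"}

def print_stmt_type(value):
    # An out-of-range value (e.g. from an uninitialised field or corrupted
    # memory) falls through to the empty string seen in the audit log.
    return STMT_TYPE_NAMES.get(value, "")
```

print_stmt_type(0) yields "QUERY", while any value outside the table yields "", matching the empty statement_type observed.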






[jira] [Resolved] (IMPALA-8534) Enable data cache by default for end-to-end containerised tests

2019-08-06 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8534.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Enable data cache by default for end-to-end containerised tests
> ---
>
> Key: IMPALA-8534
> URL: https://issues.apache.org/jira/browse/IMPALA-8534
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> Following on from IMPALA-8121, I don't think we can enable the data cache by 
> default, since it depends on what volumes are available to the container at 
> runtime. But we should definitely enable it for tests.
> [~kwho] said 
> {quote}When I tested with the data cache enabled in a mini-cluster with 3 
> nodes using the default scale of workload, I ran with 500 MB with 1 partition 
> by running
>  start-impala-cluster.py --data_cache_dir=/tmp --data_cache_size=500MB
> You can also pass a pre-existing directory as the startup flag of Impala like
> --data_cache=/tmp/data-cache-0:500MB
> {quote}
> start-impala-cluster.py already mounts some host directories into the 
> container, so we could either do the same for the data cache, or just depend 
> on the container root filesystem (which is likely to be slow, unfortunately).





[jira] [Created] (IMPALA-8839) Impala writing data to tables should not lead to incorrect results in Hive

2019-08-06 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created IMPALA-8839:


 Summary: Impala writing data to tables should not lead to 
incorrect results in Hive
 Key: IMPALA-8839
 URL: https://issues.apache.org/jira/browse/IMPALA-8839
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 3.3.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen


This covers both partitioned and unpartitioned tables.
The proposed solution is that when Impala writes data to an unpartitioned 
table, it should update the 'COLUMN_STATS_ACCURATE' JSON structure in the 
table properties by removing its 'COLUMN_STATS' nested field (this ends up in 
the TABLE_PARAMS table in HMS).

Likewise, when Impala writes data to a partitioned table, it should update the 
'COLUMN_STATS_ACCURATE' JSON structure by removing its 'COLUMN_STATS' nested 
field in the properties of the partitions where data was inserted (the 
PARTITION_PARAMS table in HMS).
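
A minimal sketch of that property rewrite; the key names come from the description above, while the helper itself is hypothetical.

```python
import json

def invalidate_column_stats(params):
    """Drop the COLUMN_STATS nested field from the COLUMN_STATS_ACCURATE
    JSON property (stored in TABLE_PARAMS for a table, or PARTITION_PARAMS
    for a partition, in HMS), leaving other fields intact."""
    raw = params.get("COLUMN_STATS_ACCURATE")
    if raw is None:
        return params
    accurate = json.loads(raw)
    accurate.pop("COLUMN_STATS", None)  # column stats are now stale
    params["COLUMN_STATS_ACCURATE"] = json.dumps(accurate)
    return params
```

Keeping the rest of the JSON (e.g. a BASIC_STATS flag) intact means Hive only discards the column-level stats that Impala's write invalidated.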








[jira] [Created] (IMPALA-8840) Check failed: num_bytes <= sizeof(T) (5 vs. 4)

2019-08-06 Thread Xiaomeng Zhang (JIRA)
Xiaomeng Zhang created IMPALA-8840:
--

 Summary: Check failed: num_bytes <= sizeof(T) (5 vs. 4) 
 Key: IMPALA-8840
 URL: https://issues.apache.org/jira/browse/IMPALA-8840
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Xiaomeng Zhang
Assignee: Daniel Becker


Not sure if this is due to the same issue as 
https://issues.apache.org/jira/browse/IMPALA-8833, but the error message is a 
little different.
{code:java}
F0805 18:48:08.737411 5488 bit-stream-utils.inline.h:173] 
284731e5d1aad693:05c883020001] Check failed: num_bytes <= sizeof(T) (8 vs. 
4)
*** Check failure stack trace: ***
@ 0x52fb9bc google::LogMessage::Fail()
@ 0x52fd261 google::LogMessage::SendToLog()
@ 0x52fb396 google::LogMessage::Flush()
@ 0x52fe95d google::LogMessageFatal::~LogMessageFatal()
@ 0x2b2b867 impala::BatchedBitReader::GetBytes<>()
@ 0x2aeda65 impala::RleBatchDecoder<>::NextCounts()
@ 0x2a82896 impala::RleBatchDecoder<>::NextNumRepeats()
@ 0x2b1927f impala::ScalarColumnReader<>::ReadSlotsNoConversion()
@ 0x2ac7c2c impala::ScalarColumnReader<>::ReadSlots()
@ 0x2a7b861 
impala::ScalarColumnReader<>::MaterializeValueBatchRepeatedDefLevel()
@ 0x2a5b3b0 impala::ScalarColumnReader<>::ReadValueBatch<>()
@ 0x2a256a4 impala::ScalarColumnReader<>::ReadNonRepeatedValueBatch()
@ 0x29b6eb6 impala::HdfsParquetScanner::AssembleRows()
@ 0x29b1cf8 impala::HdfsParquetScanner::GetNextInternal()
@ 0x29afc70 impala::HdfsParquetScanner::ProcessSplit()
@ 0x2494bc3 impala::HdfsScanNode::ProcessSplit()
@ 0x2493d98 impala::HdfsScanNode::ScannerThread()
@ 0x2493121 
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@ 0x24956e9 
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@ 0x1ea0241 boost::function0<>::operator()()
@ 0x23de77a impala::Thread::SuperviseThread()
@ 0x23e6afe boost::_bi::list5<>::operator()<>()
@ 0x23e6a22 boost::_bi::bind_t<>::operator()()
@ 0x23e69e5 boost::detail::thread_data<>::run()
@ 0x4224819 thread_proxy
@ 0x7fc1818c5e24 start_thread
@ 0x7fc17e01f34c __clone

{code}
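
The DCHECK in the trace can be illustrated with a Python stand-in for the templated C++ read (names and the 4-byte destination size are illustrative):

```python
# Sketch of a checked little-endian read in the spirit of
# BatchedBitReader::GetBytes<T>: asking for more bytes than the destination
# type can hold indicates corrupt or misdecoded RLE run-length input.
def get_bytes(buf, offset, num_bytes, dest_size=4):
    """Read num_bytes bytes from buf at offset into an int of at most
    dest_size bytes, failing loudly on an oversized request."""
    if num_bytes > dest_size:
        raise ValueError(
            f"Check failed: num_bytes <= sizeof(T) ({num_bytes} vs. {dest_size})")
    return int.from_bytes(buf[offset:offset + num_bytes], "little")
```

A byte-width of 5 or 8 against a 4-byte destination, as in the two reports, trips the check; the interesting question is how the decoder derived that width from the Parquet data.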





[jira] [Resolved] (IMPALA-8376) Add per-directory limits for scratch disk usage

2019-08-06 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8376.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Add per-directory limits for scratch disk usage
> ---
>
> Key: IMPALA-8376
> URL: https://issues.apache.org/jira/browse/IMPALA-8376
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 3.3.0
>
>
> The current syntax is:
> {noformat}
> --scratch_dirs=/data/1/impala/impalad,/data/10/impala/impalad,/data/11/impala/impalad,/data/2/impala/impalad,/data/3/impala/impalad,/data/4/impala/impalad,/data/5/impala/impalad,/data/6/impala/impalad,/data/7/impala/impalad,/data/8/impala/impalad,/data/9/impala/impalad,/data/12/impala/impalad
> {noformat}
> The current syntax for the data cache is
> {noformat}
> --data_cache_dir=/tmp --data_cache_size=500MB
> {noformat}
> One idea is to allow optionally specifying the limit after each directory:
> {noformat}
> --scratch_dirs=/data/1/impala/impalad:500MB,/data/10/impala/impalad:2GB,/data/11/impala/impalad
> {noformat}
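> A sketch of how the proposed PATH[:CAPACITY] entries might be parsed; the accepted capacity grammar (e.g. "500MB", "2GB") is an assumption.
> {code:python}
```python
import re

# Assumed capacity suffix grammar: digits, optional fraction, optional unit.
_CAPACITY_RE = re.compile(r"\d+(\.\d+)?[KMGT]?B?", re.IGNORECASE)

def parse_scratch_dirs(flag_value):
    """Split a --scratch_dirs value into (path, capacity) pairs, where the
    per-directory capacity after the last ':' is optional."""
    entries = []
    for item in flag_value.split(","):
        path, sep, limit = item.rpartition(":")
        if sep and _CAPACITY_RE.fullmatch(limit):
            entries.append((path, limit))
        else:
            entries.append((item, None))  # no capacity: unlimited, as today
    return entries
```
> {code}
> Splitting on the last ':' keeps the flag backwards compatible: entries without a recognisable capacity suffix parse exactly as before.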


