[jira] [Updated] (IMPALA-9006) Consolidate the Statestore subscriber's retry logic

2019-12-06 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-9006:
---
Attachment: 76c83e9.diff

> Consolidate the Statestore subscriber's retry logic
> ---
>
> Key: IMPALA-9006
> URL: https://issues.apache.org/jira/browse/IMPALA-9006
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Minor
> Attachments: 76c83e9.diff
>
>
> Currently, a Statestore subscriber starts a separate thread after the initial 
> registration with the Statestore to periodically check whether the Statestore 
> may have failed and to re-register with it if necessary. Separately, the 
> function {{StatestoreSubscriber::Register()}} relies on the old Thrift 
> client's retry logic to retry failed RPC attempts to the Statestore. This is 
> needed because the initial registration uses this retry logic to wait for the 
> Statestore to start up in case an Impala daemon starts before it. 
> These two retry paths may be consolidated. 
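To make the consolidation concrete, a single retry loop could cover both the
initial registration and later re-registration. This is only a sketch under
assumed names: RegisterWithRetry and the callable it takes are hypothetical
stand-ins, not Impala's actual subscriber API.

{code:cpp}
#include <chrono>
#include <functional>
#include <thread>

// One loop covers both the initial registration (waiting for the Statestore
// to come up if the daemon starts first) and later re-registration after a
// detected Statestore failure, instead of splitting the retries between the
// Thrift client and a separate recovery thread.
void RegisterWithRetry(const std::function<bool()>& try_register) {
  constexpr auto kRetryInterval = std::chrono::seconds(5);
  while (!try_register()) {
    std::this_thread::sleep_for(kRetryInterval);
  }
}

int main() {
  int attempts = 0;
  // Toy stand-in for the real registration RPC: succeeds on the third try.
  RegisterWithRetry([&attempts] { return ++attempts >= 3; });
  return 0;
}
{code}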






[jira] [Commented] (IMPALA-3189) Address scalability issue with N^2 KDC requests on cluster startup

2019-12-06 Thread Michael Ho (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990098#comment-16990098
 ] 

Michael Ho commented on IMPALA-3189:


Hi [~tlipcon], we still saw this in the cold-startup case even with KRPC at a 
large enough scale (e.g. 300+ nodes). It manifests as a negotiation error, and 
we had to increase the timeout to work around it (see IMPALA-5901).

> Address scalability issue with N^2 KDC requests on cluster startup
> --
>
> Key: IMPALA-3189
> URL: https://issues.apache.org/jira/browse/IMPALA-3189
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec, Security
>Affects Versions: Impala 2.5.0
>Reporter: Henry Robinson
>Priority: Critical
>  Labels: kerberos, scalability
>
> When Impala runs a query that shuffles data amongst all nodes in a 
> Kerberos-secured cluster, every node will need to acquire a TGS for every 
> other node. In a cluster of 100 nodes or more, this can overwhelm the KDC, 
> and queries can exit with an error ("Could not contact KDC for realm").
> A simple workaround is to run a warm-up query until it succeeds (which can 
> take a few minutes after cluster startup). The KDC can also be scaled (e.g. 
> with secondary KDC nodes). 
> Impala could also either force the TGS requests at start-up in a staggered 
> fashion, or move to recommending SSL + client certificates for 
> server<->server communication.
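A minimal sketch of the staggered start-up idea only; the host index, the
delay window, and the warm-up step are assumptions for illustration, not
existing Impala behavior.

{code:cpp}
#include <algorithm>
#include <chrono>
#include <thread>

// Spread the initial ticket (TGS) acquisition over a window so that N hosts do
// not all hit the KDC at the same instant on cluster startup. host_index and
// num_hosts would come from the cluster membership list; the window is an
// arbitrary example value.
void StaggeredKerberosWarmUp(int host_index, int num_hosts,
                             std::chrono::seconds window) {
  auto delay = (window * host_index) / std::max(num_hosts, 1);
  std::this_thread::sleep_for(delay);
  // ... each host would now open a connection to every peer, forcing its TGS
  // requests while no user query is waiting on them.
}

int main() { StaggeredKerberosWarmUp(3, 100, std::chrono::seconds(60)); }
{code}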






[jira] [Commented] (IMPALA-8775) Have the option to delete data cache files on Impala shutdown

2019-12-05 Thread Michael Ho (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989284#comment-16989284
 ] 

Michael Ho commented on IMPALA-8775:


One idea is to hook into {{ExecEnv::~ExecEnv()}}, where the RpcMgr and other 
subsystems are shut down. Ideally, we want a proper shutdown sequence for all 
subsystems, invoked from the graceful-shutdown path before calling exit(0). 
The current approach of relying on the destructors of some global objects to 
do the shutdown seems risky because, among other things, the state of the 
other global objects at that point is unknown.
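A rough sketch of the "explicit shutdown sequence before exit(0)" idea;
ExecEnvLike, Subsystem, and the member names are made up for illustration and
are not the real ExecEnv API.

{code:cpp}
#include <cstdlib>
#include <memory>

// Hypothetical subsystem with an explicit Shutdown() step, so cleanup does not
// depend on global-destructor ordering at process exit.
struct Subsystem {
  virtual void Shutdown() {}
  virtual ~Subsystem() = default;
};

struct ExecEnvLike {
  std::unique_ptr<Subsystem> rpc_mgr = std::make_unique<Subsystem>();
  std::unique_ptr<Subsystem> data_cache = std::make_unique<Subsystem>();

  // Called from the graceful-shutdown path *before* exit(0); the order is
  // stated explicitly here instead of being implied by destructor order.
  void ShutdownAll() {
    data_cache->Shutdown();  // e.g. delete cache files if requested
    rpc_mgr->Shutdown();
  }
};

int main() {
  ExecEnvLike env;
  env.ShutdownAll();
  std::exit(0);
}
{code}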

> Have the option to delete data cache files on Impala shutdown
> -
>
> Key: IMPALA-8775
> URL: https://issues.apache.org/jira/browse/IMPALA-8775
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Major
>
> Currently, Impala will delete old data cache files upon restart, but it only 
> does so when the data cache is enabled. If the user restarts with the data 
> cache turned off, the old data cache files may be left around until the next 
> restart with the cache enabled. We should have an option to delete the data 
> cache files on Impala shutdown. The initial implementation of the data cache 
> had unlink-on-create behavior, but it was confusing to users because the 
> usage on the storage side was unaccounted for (the file is not easily 
> visible after unlink). We may want to consider supporting this 
> "unlink-on-create" behavior behind a flag in case users prefer to have the 
> data cache files removed upon Impala's exit. In the long run, we probably 
> want an "exit handler" in Impala to make sure resources are freed up on 
> Impala exit.
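For context, "unlink-on-create" amounts to unlinking the cache file right
after opening it, so the space is reclaimed automatically when the process
exits. A minimal POSIX sketch of the mechanism (not the actual data cache
code; the path is arbitrary):

{code:cpp}
#include <fcntl.h>
#include <unistd.h>

// Open a cache backing file and immediately unlink it: the file remains usable
// through the fd, but the directory entry is gone, so the space is released as
// soon as the process exits (cleanly or not). The trade-off described above is
// that the usage is no longer visible with a plain directory listing.
int OpenUnlinkedCacheFile(const char* path) {
  int fd = open(path, O_CREAT | O_RDWR, 0600);
  if (fd >= 0) unlink(path);
  return fd;
}

int main() {
  int fd = OpenUnlinkedCacheFile("/tmp/data-cache-demo");
  if (fd >= 0) close(fd);
  return 0;
}
{code}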






[jira] [Commented] (IMPALA-8691) Query hint for disabling data caching

2019-12-05 Thread Michael Ho (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989280#comment-16989280
 ] 

Michael Ho commented on IMPALA-8691:


The remaining work is to add support for a query hint that disables caching at 
the table level.

> Query hint for disabling data caching
> -
>
> Key: IMPALA-8691
> URL: https://issues.apache.org/jira/browse/IMPALA-8691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Major
>
> IMPALA-8690 tracks the effort for a better eviction algorithm for Impala's 
> data cache. As a short-term workaround, it would be nice to allow users to 
> explicitly mark certain tables as not cacheable via query hints, or simply 
> to disable caching for a query via query options.






[jira] [Assigned] (IMPALA-8691) Query hint for disabling data caching

2019-12-05 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-8691:
--

Assignee: Joe McDonnell  (was: Michael Ho)

> Query hint for disabling data caching
> -
>
> Key: IMPALA-8691
> URL: https://issues.apache.org/jira/browse/IMPALA-8691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Major
>
> IMPALA-8690 tracks the effort for a better eviction algorithm for Impala's 
> data cache. As a short-term workaround, it would be nice to allow users to 
> explicitly mark certain tables as not cacheable via query hints, or simply 
> to disable caching for a query via query options.






[jira] [Assigned] (IMPALA-8775) Have the option to delete data cache files on Impala shutdown

2019-12-05 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-8775:
--

Assignee: Joe McDonnell  (was: Michael Ho)

> Have the option to delete data cache files on Impala shutdown
> -
>
> Key: IMPALA-8775
> URL: https://issues.apache.org/jira/browse/IMPALA-8775
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Major
>
> Currently, Impala will delete old data cache files upon restart, but it only 
> does so when the data cache is enabled. If the user restarts with the data 
> cache turned off, the old data cache files may be left around until the next 
> restart with the cache enabled. We should have an option to delete the data 
> cache files on Impala shutdown. The initial implementation of the data cache 
> had unlink-on-create behavior, but it was confusing to users because the 
> usage on the storage side was unaccounted for (the file is not easily 
> visible after unlink). We may want to consider supporting this 
> "unlink-on-create" behavior behind a flag in case users prefer to have the 
> data cache files removed upon Impala's exit. In the long run, we probably 
> want an "exit handler" in Impala to make sure resources are freed up on 
> Impala exit.






[jira] [Created] (IMPALA-9189) Add command to purge data cache

2019-11-22 Thread Michael Ho (Jira)
Michael Ho created IMPALA-9189:
--

 Summary: Add command to purge data cache
 Key: IMPALA-9189
 URL: https://issues.apache.org/jira/browse/IMPALA-9189
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Michael Ho


It would be great to have a command to purge the data cache on demand so that 
the data cache on all Impala compute nodes is emptied. This is useful, for 
example, for experimentation or demo purposes. cc'ing [~drorke], [~joemcdonnell]






[jira] [Commented] (IMPALA-9154) KRPC DataStreamService threads blocked in PublishFilter

2019-11-21 Thread Michael Ho (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979682#comment-16979682
 ] 

Michael Ho commented on IMPALA-9154:


Given the fix is non-trivial, it may make sense to back out the offending 
change for now.

> KRPC DataStreamService threads blocked in PublishFilter
> ---
>
> Key: IMPALA-9154
> URL: https://issues.apache.org/jira/browse/IMPALA-9154
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Tim Armstrong
>Assignee: Fang-Yu Rao
>Priority: Blocker
>  Labels: hang
> Attachments: image-2019-11-13-08-30-27-178.png, pstack-exchange.txt
>
>
> I hit this on primitive_many_fragments when doing a single node perf run:
> {noformat}
>  ./bin/single_node_perf_run.py --num_impalads=1 --scale=30 --ninja 
> --workloads=targeted-perf  --iterations=5
> {noformat}
> I noticed that the query was hung and the execution threads were hung sending 
> row batches. Then looking at the RPCz page, all of the threads were busy:
>  !image-2019-11-13-08-30-27-178.png! 
> Multiple threads were stuck in UpdateFilter() - see [^pstack-exchange.txt]. 
> It looks like a deadlock: a KRPC thread is blocked waiting for an RPC that 
> can only be served by one of the limited threads from that same thread pool.
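The general shape of the deadlock, and a common way out, sketched with a plain
std::thread rather than KRPC's actual thread pools; the function name is a
placeholder, not the real handler.

{code:cpp}
#include <thread>

// Sketch of the hazard only, not KRPC code: a handler running on a bounded RPC
// service pool must not wait synchronously for another RPC that can only be
// served by the same pool, or all pool threads can end up waiting on each
// other. One common fix is to push the dependent work onto a separate executor
// and release the service thread right away.
std::thread HandlePublishFilter() {
  return std::thread([] {
    // The blocking UpdateFilter() fan-out would run here, off the service
    // pool, so incoming RPCs can still be served.
  });
}

int main() {
  std::thread worker = HandlePublishFilter();
  worker.join();
  return 0;
}
{code}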






[jira] [Created] (IMPALA-9180) Remove legacy ImpalaInternalService

2019-11-21 Thread Michael Ho (Jira)
Michael Ho created IMPALA-9180:
--

 Summary: Remove legacy ImpalaInternalService
 Key: IMPALA-9180
 URL: https://issues.apache.org/jira/browse/IMPALA-9180
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Michael Ho
Assignee: Fang-Yu Rao


Now that IMPALA-7984 is done, the legacy Thrift-based Impala internal service 
can be removed and port 22000 can be freed up. In addition to the code change, 
the docs probably need to be updated to reflect the fact that port 22000 is no 
longer in use.






[jira] [Resolved] (IMPALA-9026) Add an option to use resolved IP address for Statestore subscriber

2019-11-20 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-9026.

Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Add an option to use resolved IP address for Statestore subscriber
> --
>
> Key: IMPALA-9026
> URL: https://issues.apache.org/jira/browse/IMPALA-9026
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Minor
> Fix For: Impala 3.4.0
>
>
> Currently, statestore subscribers register with the statestore using their 
> hostnames. There may be certain deployment scenarios in which a pod may have 
> an IP address but its DNS entry is not available for a valid reason (e.g. a 
> Kubernetes pod whose readiness probe returns false). An example could be 
> that there is more than one instance of an Impala component (e.g. the 
> coordinator) but only one of them is active at a time, while the rest serve 
> as backups. In that case, we still want the backup coordinator to receive 
> updates from the statestore but not serve any queries until the primary 
> coordinator fails.
> To handle the case above, we may allow statestore subscribers to register 
> with the statestore using their IP addresses instead of hostnames. This may 
> not work with some existing secure deployments in which TLS is enabled 
> between Impala hosts, as there may be a mismatch between the hostname used 
> at the Thrift layer and the certificate, so this option is disabled by default.
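A minimal POSIX sketch of the "register with a resolved IP" part; the
surrounding subscriber code and any flag name are assumptions, only the
getaddrinfo()/inet_ntop() usage is standard.

{code:cpp}
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>

#include <cstdio>
#include <cstring>
#include <string>

// Resolve a hostname (typically the subscriber's own) to an IPv4 string so the
// subscriber can register an address that remains usable even when its DNS
// entry is not resolvable by other hosts.
bool ResolveToIp(const std::string& hostname, std::string* ip) {
  addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;
  addrinfo* res = nullptr;
  if (getaddrinfo(hostname.c_str(), nullptr, &hints, &res) != 0) return false;
  char buf[INET_ADDRSTRLEN];
  const auto* addr = reinterpret_cast<const sockaddr_in*>(res->ai_addr);
  inet_ntop(AF_INET, &addr->sin_addr, buf, sizeof(buf));
  freeaddrinfo(res);
  *ip = buf;
  return true;
}

int main() {
  std::string ip;
  if (ResolveToIp("localhost", &ip)) std::printf("register as %s\n", ip.c_str());
  return 0;
}
{code}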






[jira] [Assigned] (IMPALA-8005) Randomize partitioning exchanges destinations

2019-10-24 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-8005:
--

Assignee: Anurag Mantripragada  (was: Michael Ho)

> Randomize partitioning exchanges destinations
> -
>
> Key: IMPALA-8005
> URL: https://issues.apache.org/jira/browse/IMPALA-8005
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.1.0
>Reporter: Michael Ho
>Assignee: Anurag Mantripragada
>Priority: Major
>  Labels: ramp-up
>
> Currently, we use the same hash seed for partitioning exchanges at the 
> sender. For a table with skew in the distribution of the shuffling keys, 
> multiple queries shuffling on the same keys will end up hashing the hot 
> values to the same destination fragments running on a particular host, 
> potentially overloading that host.
> We should consider using the query id or other query-specific information to 
> seed the hash function so that the destinations are randomized across 
> different queries. Thanks to [~tlipcon] for pointing this problem out.
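A toy sketch of the seeding idea; how the query id becomes a seed and how the
exchange sender actually picks destinations are assumptions here, not Impala's
exchange code.

{code:cpp}
#include <cstdint>
#include <cstdio>
#include <functional>

// Mix a per-query seed (e.g. derived from the query id) into the shuffle hash
// so the same skewed key value does not always land on the same destination
// host across different queries. The constant is just a 64-bit mixing prime.
uint64_t ExchangeDestination(uint64_t key_hash, uint64_t query_seed,
                             uint64_t num_receivers) {
  uint64_t mixed = key_hash ^ (query_seed * 0x9e3779b97f4a7c15ULL);
  return mixed % num_receivers;
}

int main() {
  uint64_t key = std::hash<long long>{}(42);
  // Two queries hashing the same key typically pick different receivers.
  std::printf("query A -> %llu, query B -> %llu\n",
              (unsigned long long)ExchangeDestination(key, 1, 8),
              (unsigned long long)ExchangeDestination(key, 2, 8));
  return 0;
}
{code}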






[jira] [Assigned] (IMPALA-2471) Investigate HS2 Thrift efficiency

2019-10-22 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-2471:
--

Assignee: (was: Michael Ho)

> Investigate HS2 Thrift efficiency
> -
>
> Key: IMPALA-2471
> URL: https://issues.apache.org/jira/browse/IMPALA-2471
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Alan Choi
>Priority: Minor
>
> There are some known performance issues with the current HS2 Thrift protocol:
> 1. Large overhead (~60%)
> This is unacceptably large.
> 2. Slow network throughput (~10 MB/sec)
> Even though the HS2 client is co-located with the server, the throughput 
> maxes out at ~10 MB/sec.






[jira] [Assigned] (IMPALA-3475) Extend partition key scans to support count(*)

2019-10-22 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-3475:
--

Assignee: (was: Michael Ho)

> Extend partition key scans to support count(*)
> --
>
> Key: IMPALA-3475
> URL: https://issues.apache.org/jira/browse/IMPALA-3475
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>
> Queries like the one below should be answered entirely from metadata, given 
> that store_sales is partitioned on ss_sold_date_sk:
> {code}
> select ss_sold_date_sk , count(*) from store_sales group by ss_sold_date_sk;
> {code}
> {code}
> +--+
> | Explain String   |
> +--+
> | Estimated Per-Host Requirements: Memory=20.00MB VCores=2 |
> |  |
> | 04:EXCHANGE [UNPARTITIONED]  |
> | ||
> | 03:AGGREGATE [FINALIZE]  |
> | |  output: count:merge(*)|
> | |  group by: ss_sold_date_sk |
> | ||
> | 02:EXCHANGE [HASH(ss_sold_date_sk)]  |
> | ||
> | 01:AGGREGATE [STREAMING] |
> | |  output: count(*)  |
> | |  group by: ss_sold_date_sk |
> | ||
> | 00:SCAN HDFS [tpcds_1000_parquet.store_sales]|
> |partitions=1824/1824 files=1824 size=189.24GB |
> +--+
> {code}






[jira] [Created] (IMPALA-9083) Explore co-partitioning join strategies

2019-10-22 Thread Michael Ho (Jira)
Michael Ho created IMPALA-9083:
--

 Summary: Explore co-partitioning join strategies
 Key: IMPALA-9083
 URL: https://issues.apache.org/jira/browse/IMPALA-9083
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Michael Ho


The idea of a co-partitioned join is not new and it has been implemented in 
other systems (see 
[link|https://docs.oracle.com/en/database/oracle/oracle-database/12.2/vldbg/partition-wise-joins.html]
 and 
[link|http://amithora.com/understanding-co-partitions-and-co-grouping-in-spark/]).
 With HDFS, we do not have as much control over the block locations, so it is 
not easy to make assumptions about the co-location of partitions of multiple 
tables.

With remote reads (e.g. S3), we no longer have this constraint because all 
data is remote. So, if we are joining on the partition key of two tables, we 
can create a scan-range schedule that co-locates partitions with the same 
partition values on the same executor and performs the join locally, avoiding 
the subsequent shuffle step.

The potential downside of this idea is that if there is skew in the data 
distribution, we may overwhelm a few of the nodes in both the scans and the 
join operators. Previously, skew might still overwhelm the join operator, but 
the scans could be better parallelized. cc'ing [~drorke]
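A sketch of the scheduling idea only, with placeholder executor names and a
plain string hash standing in for whatever the scheduler would really use; it
also makes the skew trade-off obvious, since everything for one partition
value lands on a single node.

{code:cpp}
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Deterministically map a partition value to an executor. If both join inputs
// are partitioned on the same key and their scan ranges are assigned with the
// same mapping, matching partitions are read on the same node and the join can
// run locally without a shuffle exchange.
size_t ExecutorForPartition(const std::string& partition_value,
                            const std::vector<std::string>& executors) {
  return std::hash<std::string>{}(partition_value) % executors.size();
}

int main() {
  const std::vector<std::string> executors = {"exec-0", "exec-1", "exec-2"};
  // Both tables' scan ranges for the same date land on the same executor.
  std::printf("partition 2019-10-22 -> %s\n",
              executors[ExecutorForPartition("2019-10-22", executors)].c_str());
  return 0;
}
{code}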






[jira] [Created] (IMPALA-9026) Add an option to use resolved IP address for Statestore subscriber

2019-10-08 Thread Michael Ho (Jira)
Michael Ho created IMPALA-9026:
--

 Summary: Add an option to use resolved IP address for Statestore 
subscriber
 Key: IMPALA-9026
 URL: https://issues.apache.org/jira/browse/IMPALA-9026
 Project: IMPALA
  Issue Type: Improvement
  Components: Distributed Exec
Affects Versions: Impala 3.4.0
Reporter: Michael Ho
Assignee: Michael Ho


Currently, statestore subscribers register with the statestore using their 
hostnames. There may be certain deployment scenarios in which a pod may have an 
IP address but its DNS entry is not available for a valid reason (e.g. a 
Kubernetes pod whose readiness probe returns false). An example could be that 
there is more than one instance of an Impala component (e.g. the coordinator) 
but only one of them is active at a time, while the rest serve as backups. In 
that case, we still want the backup coordinator to receive updates from the 
statestore but not serve any queries until the primary coordinator fails.

To handle the case above, we may allow statestore subscribers to register with 
the statestore using their IP addresses instead of hostnames. This may not work 
with some existing secure deployments in which TLS is enabled between Impala 
hosts, as there may be a mismatch between the hostname used at the Thrift layer 
and the certificate, so this option is disabled by default.






[jira] [Updated] (IMPALA-9006) Consolidate the Statestore subscriber's retry logic

2019-10-08 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-9006:
---
Description: Currently, a Statestore subscriber starts a separate thread after 
the initial registration with the Statestore to periodically check whether the 
Statestore may have failed and to re-register with it if necessary. 
Separately, the function {{StatestoreSubscriber::Register()}} relies on the 
old Thrift client's retry logic to retry failed RPC attempts to the 
Statestore. This is needed because the initial registration uses this retry 
logic to wait for the Statestore to start up in case an Impala daemon starts 
before it. These two retry paths may be consolidated.  (was: Currently, a 
Statestore subscriber starts a separate thread after the initial registration 
with the Statestore to periodically check whether the Statestore may have 
failed and to re-register with it if necessary. Separately, the function 
{{StatestoreSubscriber::Register()}} relies on the old Thrift client's retry 
logic to retry failed RPC attempts to the Statestore. This is needed because 
the initial registration uses this retry logic to wait for the Statestore to 
start up in case an Impala daemon starts before it. These two retry paths may 
be consolidated.

Last but not least, the current registration logic at the Statestore doesn't 
check whether the address provided by the subscriber can actually be resolved. 
In certain deployment scenarios, it's possible that the address passed by a 
subscriber is not yet resolvable (e.g. a Kubernetes pod whose readiness probe 
failed). The Statestore should check whether the address is resolvable and 
fail the registration if not. The subscriber can keep retrying until its 
address can be resolved by the Statestore. This is particularly useful in 
configurations where the readiness probe of a Kubernetes pod is used to 
implement a warm-backup setup.)

> Consolidate the Statestore subscriber's retry logic
> ---
>
> Key: IMPALA-9006
> URL: https://issues.apache.org/jira/browse/IMPALA-9006
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>
> Currently, a Statestore subscriber starts a separate thread after the initial 
> registration with the Statestore to periodically check whether the Statestore 
> may have failed and to re-register with it if necessary. Separately, the 
> function {{StatestoreSubscriber::Register()}} relies on the old Thrift 
> client's retry logic to retry failed RPC attempts to the Statestore. This is 
> needed because the initial registration uses this retry logic to wait for the 
> Statestore to start up in case an Impala daemon starts before it. 
> These two retry paths may be consolidated. 






[jira] [Updated] (IMPALA-9006) Consolidate the Statestore subscriber's retry logic

2019-10-08 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-9006:
---
Priority: Minor  (was: Major)

> Consolidate the Statestore subscriber's retry logic
> ---
>
> Key: IMPALA-9006
> URL: https://issues.apache.org/jira/browse/IMPALA-9006
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Minor
>
> Currently, a Statestore subscriber starts a separate thread after the initial 
> registration with the Statestore to periodically check whether the Statestore 
> may have failed and to re-register with it if necessary. Separately, the 
> function {{StatestoreSubscriber::Register()}} relies on the old Thrift 
> client's retry logic to retry failed RPC attempts to the Statestore. This is 
> needed because the initial registration uses this retry logic to wait for the 
> Statestore to start up in case an Impala daemon starts before it. 
> These two retry paths may be consolidated. 






[jira] [Updated] (IMPALA-9006) Consolidate the Statestore subscriber's retry logic

2019-10-03 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-9006:
---
Description: 
Currently, a Statestore subscriber starts a separate thread after the initial 
registration with the Statestore to periodically check whether the Statestore 
may have failed and to re-register with it if necessary. Separately, the 
function {{StatestoreSubscriber::Register()}} relies on the old Thrift 
client's retry logic to retry failed RPC attempts to the Statestore. This is 
needed because the initial registration uses this retry logic to wait for the 
Statestore to start up in case an Impala daemon starts before it. These two 
retry paths may be consolidated.

Last but not least, the current registration logic at the Statestore doesn't 
check whether the address provided by the subscriber can actually be resolved. 
In certain deployment scenarios, it's possible that the address passed by a 
subscriber is not yet resolvable (e.g. a Kubernetes pod whose readiness probe 
failed). The Statestore should check whether the address is resolvable and 
fail the registration if not. The subscriber can keep retrying until its 
address can be resolved by the Statestore. This is particularly useful in 
configurations where the readiness probe of a Kubernetes pod is used to 
implement a warm-backup setup.

  was:
Currently, a Statestore subscriber starts a separate thread after the initial 
registration with the Statestore to periodically check whether the Statestore 
may have failed and to re-register with it if necessary. Separately, the 
function {{StatestoreSubscriber::Register()}} relies on the old Thrift 
client's retry logic to retry failed RPC attempts to the Statestore. This is 
needed because the initial registration uses this retry logic to wait for the 
Statestore to start up in case an Impala daemon starts before it.

Last but not least, the current registration logic at the Statestore doesn't 
check whether the address provided by the subscriber can actually be resolved. 
In certain deployment scenarios, it's possible that the address passed by a 
subscriber is not yet resolvable (e.g. a Kubernetes pod whose readiness probe 
failed). The Statestore should check whether the address is resolvable and 
fail the registration if not. The subscriber can keep retrying until its 
address can be resolved by the Statestore. This is particularly useful in 
configurations where the readiness probe of a Kubernetes pod is used to 
implement a warm-backup setup.


> Consolidate the Statestore subscriber's retry logic
> ---
>
> Key: IMPALA-9006
> URL: https://issues.apache.org/jira/browse/IMPALA-9006
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>
> Currently, a Statestore subscriber starts a separate thread after the initial 
> registration with the Statestore to periodically check whether the Statestore 
> may have failed and to re-register with it if necessary. Separately, the 
> function {{StatestoreSubscriber::Register()}} relies on the old Thrift 
> client's retry logic to retry failed RPC attempts to the Statestore. This is 
> needed because the initial registration uses this retry logic to wait for the 
> Statestore to start up in case an Impala daemon starts before it. 
> These two retry paths may be consolidated. 
> Last but not least, the current registration logic at the Statestore doesn't 
> check whether the address provided by the subscriber can actually be 
> resolved. In certain deployment scenarios, it's possible that the address 
> passed by a subscriber is not yet resolvable (e.g. a Kubernetes pod whose 
> readiness probe failed). The Statestore should check whether the address is 
> resolvable and fail the registration if not. The subscriber can keep 
> retrying until its address can be resolved by the Statestore. This is 
> particularly useful in configurations where the readiness probe of a 
> Kubernetes pod is used to implement a warm-backup setup.






[jira] [Created] (IMPALA-9006) Consolidate the Statestore subscriber's retry logic

2019-10-03 Thread Michael Ho (Jira)
Michael Ho created IMPALA-9006:
--

 Summary: Consolidate the Statestore subscriber's retry logic
 Key: IMPALA-9006
 URL: https://issues.apache.org/jira/browse/IMPALA-9006
 Project: IMPALA
  Issue Type: Improvement
  Components: Distributed Exec
Affects Versions: Impala 3.4.0
Reporter: Michael Ho
Assignee: Michael Ho


Currently, a Statestore subscriber starts a separate thread after the initial 
registration with the Statestore to periodically check whether the Statestore 
may have failed and to re-register with it if necessary. Separately, the 
function {{StatestoreSubscriber::Register()}} relies on the old Thrift 
client's retry logic to retry failed RPC attempts to the Statestore. This is 
needed because the initial registration uses this retry logic to wait for the 
Statestore to start up in case an Impala daemon starts before it.

Last but not least, the current registration logic at the Statestore doesn't 
check whether the address provided by the subscriber can actually be resolved. 
In certain deployment scenarios, it's possible that the address passed by a 
subscriber is not yet resolvable (e.g. a Kubernetes pod whose readiness probe 
failed). The Statestore should check whether the address is resolvable and 
fail the registration if not. The subscriber can keep retrying until its 
address can be resolved by the Statestore. This is particularly useful in 
configurations where the readiness probe of a Kubernetes pod is used to 
implement a warm-backup setup.
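A sketch of the statestore-side resolvability check described above; the
registration-handler wiring is a hypothetical comment, only getaddrinfo()
itself is standard.

{code:cpp}
#include <netdb.h>

#include <string>

// Statestore-side guard (sketch): reject a subscriber registration when its
// advertised address cannot be resolved yet; the subscriber keeps retrying
// until its DNS entry (e.g. a Kubernetes pod that just became ready) shows up.
bool IsResolvable(const std::string& host) {
  addrinfo* result = nullptr;
  const bool ok = getaddrinfo(host.c_str(), nullptr, nullptr, &result) == 0;
  if (ok) freeaddrinfo(result);
  return ok;
}

// Hypothetical use in the registration handler:
//   if (!IsResolvable(subscriber_address)) { /* fail the registration */ }

int main() { return IsResolvable("localhost") ? 0 : 1; }
{code}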






[jira] [Resolved] (IMPALA-8960) test_drop_if_exists fails on S3 due to incomplete URI

2019-10-03 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-8960.

Fix Version/s: Impala 3.4.0
   Resolution: Fixed

Thanks for the fix.

> test_drop_if_exists fails on S3 due to incomplete URI
> -
>
> Key: IMPALA-8960
> URL: https://issues.apache.org/jira/browse/IMPALA-8960
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 3.4.0
>
>
> Error Message
> {noformat}
> ImpalaBeeswaxException: ImpalaBeeswaxException: INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: AnalysisException: Incomplete 
> HDFS URI, no host: hdfs:///test-warehouse/libTestUdfs.so CAUSED BY: 
> IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so{noformat}
> Stacktrace
> {noformat}
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException:  INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'>  MESSAGE: AnalysisException: Incomplete 
> HDFS URI, no host: hdfs:///test-warehouse/libTestUdfs.so CAUSED BY: 
> IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so
> Stacktrace
> authorization/test_owner_privileges.py:137: in test_drop_if_exists
> self._setup_drop_if_exist_test(unique_database, test_db)
> authorization/test_owner_privileges.py:172: in _setup_drop_if_exist_test
> self.execute_query("grant all on uri 
> 'hdfs:///test-warehouse/libTestUdfs.so' to"
> common/impala_test_suite.py:751: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:782: in execute_query
> return self.__execute_query(self.client, query, query_options)
> common/impala_test_suite.py:853: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:362: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:356: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:519: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so
> E   CAUSED BY: IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so{noformat}






[jira] [Created] (IMPALA-8996) test_show_create_table in test_zorder.py failed

2019-10-01 Thread Michael Ho (Jira)
Michael Ho created IMPALA-8996:
--

 Summary: test_show_create_table in test_zorder.py failed
 Key: IMPALA-8996
 URL: https://issues.apache.org/jira/browse/IMPALA-8996
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Michael Ho
Assignee: Norbert Luksa


test_show_create_table in the newly added test_zorder.py failed due to some 
unexpected mismatches in table properties. The test was introduced in this 
[change|https://gerrit.cloudera.org/#/c/13955/].

{noformat}
Error Message
assert {} == {'OBJCAPABILITIES': 'EXTREAD,EXTWRITE'}   Right contains more 
items:   {'OBJCAPABILITIES': 'EXTREAD,EXTWRITE'}   Full diff:   - {}   + 
{'OBJCAPABILITIES': 'EXTREAD,EXTWRITE'}
Stacktrace
custom_cluster/test_zorder.py:120: in test_show_create_table
unique_database)
custom_cluster/test_zorder.py:168: in __run_show_create_table_test_case
self.__compare_result(expected_result, create_table_result)
custom_cluster/test_zorder.py:196: in __compare_result
assert expected_tbl_props == actual_tbl_props
E   assert {} == {'OBJCAPABILITIES': 'EXTREAD,EXTWRITE'}
E Right contains more items:
E {'OBJCAPABILITIES': 'EXTREAD,EXTWRITE'}
E Full diff:
E - {}
E + {'OBJCAPABILITIES': 'EXTREAD,EXTWRITE'}
{noformat}






[jira] [Commented] (IMPALA-8926) TestResultSpooling::_test_full_queue is flaky

2019-10-01 Thread Michael Ho (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942283#comment-16942283
 ] 

Michael Ho commented on IMPALA-8926:


Bumping priority as a number of regular regression builds are affected. It may 
be worth disabling the test to unbreak the builds for now.

> TestResultSpooling::_test_full_queue is flaky
> -
>
> Key: IMPALA-8926
> URL: https://issues.apache.org/jira/browse/IMPALA-8926
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
>
> Has happened a few times, error message is:
> {code:java}
> query_test/test_result_spooling.py:116: in test_full_queue_large_fetch 
> self._test_full_queue(vector, query, fetch_size=num_rows) 
> query_test/test_result_spooling.py:148: in _test_full_queue assert 
> re.search(send_wait_time_regex, self.client.get_runtime_profile(handle)) \ E  
>  assert None is not None E+  where None =  0x7f35f0aee320>('RowBatchSendWaitTime: [1-9]', 'Query 
> (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n') E+where  = re.search E 
>+and   'Query (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE 
> WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n   
>   - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
> WriteIoWaitTime: 0.000ns\n' =  BeeswaxConnection.get_runtime_profile of 
>  0xcde2310>>( 0xcdf0810>) E+  where  BeeswaxConnection.get_runtime_profile of 
> > = 
>  0xcde2310>.get_runtime_profile E+where 
>  = 
> .client {code}






[jira] [Updated] (IMPALA-8926) TestResultSpooling::_test_full_queue is flaky

2019-10-01 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8926:
---
Priority: Critical  (was: Major)

> TestResultSpooling::_test_full_queue is flaky
> -
>
> Key: IMPALA-8926
> URL: https://issues.apache.org/jira/browse/IMPALA-8926
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
>
> Has happened a few times, error message is:
> {code:java}
> query_test/test_result_spooling.py:116: in test_full_queue_large_fetch 
> self._test_full_queue(vector, query, fetch_size=num_rows) 
> query_test/test_result_spooling.py:148: in _test_full_queue assert 
> re.search(send_wait_time_regex, self.client.get_runtime_profile(handle)) \ E  
>  assert None is not None E+  where None =  0x7f35f0aee320>('RowBatchSendWaitTime: [1-9]', 'Query 
> (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n') E+where  = re.search E 
>+and   'Query (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE 
> WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n   
>   - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
> WriteIoWaitTime: 0.000ns\n' =  BeeswaxConnection.get_runtime_profile of 
>  0xcde2310>>( 0xcdf0810>) E+  where  BeeswaxConnection.get_runtime_profile of 
> > = 
>  0xcde2310>.get_runtime_profile E+where 
>  = 
> .client {code}






[jira] [Updated] (IMPALA-8991) Data loading failed in Hive with error initializing MapReduce cluster

2019-09-30 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8991:
---
Summary: Data loading failed in Hive with error initializing MapReduce 
cluster  (was: Data loading due to failure initializing MapReduce cluster)

> Data loading failed in Hive with error initializing MapReduce cluster
> -
>
> Key: IMPALA-8991
> URL: https://issues.apache.org/jira/browse/IMPALA-8991
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Michael Ho
>Priority: Major
>
> Data loading with {noformat} insert into table 
> functional_seq_gzip.alltypesaggmultifilesnopart SELECT id, bool_col, 
> tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, 
> date_string_col, string_col, timestamp_col FROM 
> functional.alltypesaggmultifilesnopart where id % 4 = 2; {noformat} failed 
> in Hive with unexpected exception:
> {noformat}
> java.io.IOException: Cannot initialize Cluster. Please check your 
> configuration for mapreduce.framework.name and the correspond server 
> addresses.
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116)
> at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:109)
> at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:102)
> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
> at org.apache.hadoop.mapred.JobClient.(JobClient.java:454)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:402)
> at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.io.IOException: Failed to use 
> org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
> at 
> org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:148)
> ... 25 more
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics 
> source LocalJobRunnerMetrics-1803220940 already exists!
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> at 
> org.apache.hadoop.mapred.LocalJobRunnerMetrics.create(LocalJobRunnerMetrics.java:46)
> at 
> org.apache.hadoop.mapred.LocalJobRunner.(LocalJobRunner.java:777)
> at 
> org.apache.hadoop.mapred.LocalJobRunner.(LocalJobRunner.java:770)
> at 
> org.apache.hadoop.mapred.LocalClientProtocolProvider.create(LocalClientProtocolProvider.java:42)
> at 
> org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:130)
> ... 25 more
> Number of reduce tasks is set to 0 since there's no reduce operator
> Job Submission failed with exception 'java.io.IOException(Cannot initialize 
> Cluster. Please check your configuration for mapreduce.framework.name and the 
> correspond server addresses.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.had

[jira] [Created] (IMPALA-8991) Data loading due to failure initializing MapReduce cluster

2019-09-30 Thread Michael Ho (Jira)
Michael Ho created IMPALA-8991:
--

 Summary: Data loading due to failure initializing MapReduce cluster
 Key: IMPALA-8991
 URL: https://issues.apache.org/jira/browse/IMPALA-8991
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Reporter: Michael Ho


Data loading with {noformat} insert into table 
functional_seq_gzip.alltypesaggmultifilesnopart SELECT id, bool_col, 
tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, 
date_string_col, string_col, timestamp_col FROM 
functional.alltypesaggmultifilesnopart where id % 4 = 2; {noformat} failed in 
Hive with unexpected exception:

{noformat}
java.io.IOException: Cannot initialize Cluster. Please check your configuration 
for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116)
at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:109)
at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:102)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
at org.apache.hadoop.mapred.JobClient.(JobClient.java:454)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:402)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.io.IOException: Failed to use 
org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
at 
org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:148)
... 25 more
Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source 
LocalJobRunnerMetrics-1803220940 already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
at 
org.apache.hadoop.mapred.LocalJobRunnerMetrics.create(LocalJobRunnerMetrics.java:46)
at 
org.apache.hadoop.mapred.LocalJobRunner.(LocalJobRunner.java:777)
at 
org.apache.hadoop.mapred.LocalJobRunner.(LocalJobRunner.java:770)
at 
org.apache.hadoop.mapred.LocalClientProtocolProvider.create(LocalClientProtocolProvider.java:42)
at 
org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:130)
... 25 more
Number of reduce tasks is set to 0 since there's no reduce operator
Job Submission failed with exception 'java.io.IOException(Cannot initialize 
Cluster. Please check your configuration for mapreduce.framework.name and the 
correspond server addresses.)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Cannot initialize Cluster. Please 
check your configuration for mapreduce.framework.name and the correspond server 
addresses.
{noformat}

cc'ing [~stakiar], [~vihangk1] and [~joemcdonnell]






[jira] [Assigned] (IMPALA-8960) test_drop_if_exists fails on S3 due to incomplete URI

2019-09-30 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-8960:
--

Assignee: Joe McDonnell  (was: Vihang Karajgaonkar)

> test_drop_if_exists fails on S3 due to incomplete URI
> -
>
> Key: IMPALA-8960
> URL: https://issues.apache.org/jira/browse/IMPALA-8960
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Joe McDonnell
>Priority: Critical
>
> Error Message
> {noformat}
> ImpalaBeeswaxException: ImpalaBeeswaxException: INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: AnalysisException: Incomplete 
> HDFS URI, no host: hdfs:///test-warehouse/libTestUdfs.so CAUSED BY: 
> IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so{noformat}
> Stacktrace
> {noformat}
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException:  INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'>  MESSAGE: AnalysisException: Incomplete 
> HDFS URI, no host: hdfs:///test-warehouse/libTestUdfs.so CAUSED BY: 
> IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so
> Stacktrace
> authorization/test_owner_privileges.py:137: in test_drop_if_exists
> self._setup_drop_if_exist_test(unique_database, test_db)
> authorization/test_owner_privileges.py:172: in _setup_drop_if_exist_test
> self.execute_query("grant all on uri 
> 'hdfs:///test-warehouse/libTestUdfs.so' to"
> common/impala_test_suite.py:751: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:782: in execute_query
> return self.__execute_query(self.client, query, query_options)
> common/impala_test_suite.py:853: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:362: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:356: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:519: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so
> E   CAUSED BY: IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so{noformat}






[jira] [Created] (IMPALA-8990) TestAdmissionController.test_set_request_pool seems flaky

2019-09-30 Thread Michael Ho (Jira)
Michael Ho created IMPALA-8990:
--

 Summary: TestAdmissionController.test_set_request_pool seems flaky
 Key: IMPALA-8990
 URL: https://issues.apache.org/jira/browse/IMPALA-8990
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Michael Ho
Assignee: Bikramjeet Vig


The expected query error didn't occur. This has happened once so far. 
[~bikram], can you please take a look?

{noformat}
Error Message
AssertionError: Query should return error assert False
Stacktrace
hs2/hs2_test_suite.py:63: in add_session
lambda: fn(self))
hs2/hs2_test_suite.py:44: in add_session_helper
fn()
hs2/hs2_test_suite.py:63: in 
lambda: fn(self))
custom_cluster/test_admission_controller.py:312: in test_set_request_pool
self.__check_pool_rejected(client, 'root.queueA', "exceeded timeout")
custom_cluster/test_admission_controller.py:195: in __check_pool_rejected
assert False, "Query should return error"
E   AssertionError: Query should return error
E   assert False
{noformat}






[jira] [Created] (IMPALA-8989) TestAdmissionController.test_release_backend is flaky

2019-09-30 Thread Michael Ho (Jira)
Michael Ho created IMPALA-8989:
--

 Summary: TestAdmissionController.test_release_backend is flaky
 Key: IMPALA-8989
 URL: https://issues.apache.org/jira/browse/IMPALA-8989
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Michael Ho
Assignee: Sahil Takiar


The test seems to have failed due to a mismatch in the number of completed 
backends. It has only failed once so far.

{noformat}
Error Message
assert 'NumCompletedBackends: 1 (1)' in 'Query 
(id=c341f85f14c3d981:58274849):\n  DEBUG MODE WARNING: Query profile 
created while running a DEBUG buil... - OptimizationTime: 372.020ms\n   
- PeakMemoryUsage: 528.00 KB (540672)\n   - PrepareTime: 39.002ms\n'  + 
 where 'Query (id=c341f85f14c3d981:58274849):\n  DEBUG MODE WARNING: 
Query profile created while running a DEBUG buil... - OptimizationTime: 
372.020ms\n   - PeakMemoryUsage: 528.00 KB (540672)\n   - 
PrepareTime: 39.002ms\n' = >()  +where > = 
.get_runtime_profile  +  where 
 = 
.client
Stacktrace
custom_cluster/test_admission_controller.py:1338: in test_release_backends
assert "NumCompletedBackends: 1 (1)" in 
self.client.get_runtime_profile(handle)
E   assert 'NumCompletedBackends: 1 (1)' in 'Query 
(id=c341f85f14c3d981:58274849):\n  DEBUG MODE WARNING: Query profile 
created while running a DEBUG buil... - OptimizationTime: 372.020ms\n   
- PeakMemoryUsage: 528.00 KB (540672)\n   - PrepareTime: 39.002ms\n'
E+  where 'Query (id=c341f85f14c3d981:58274849):\n  DEBUG MODE 
WARNING: Query profile created while running a DEBUG buil... - 
OptimizationTime: 372.020ms\n   - PeakMemoryUsage: 528.00 KB (540672)\n 
  - PrepareTime: 39.002ms\n' = >()
E+where > = 
.get_runtime_profile
E+  where  = .client
{noformat}






[jira] [Assigned] (IMPALA-7864) TestLocalCatalogRetries::test_replan_limit is flaky

2019-09-30 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-7864:
--

Assignee: Vihang Karajgaonkar  (was: Bharath Vissapragada)

> TestLocalCatalogRetries::test_replan_limit is flaky
> ---
>
> Key: IMPALA-7864
> URL: https://issues.apache.org/jira/browse/IMPALA-7864
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.0, Impala 2.12.0
> Environment: Ubuntu 16.04
>Reporter: Jim Apple
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: broken-build, catalog-v2, flaky
> Fix For: Impala 3.2.0
>
>
> In https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3605/, 
> TestLocalCatalogRetries::test_replan_limit failed on an unrelated patch. On 
> my development machine, the test passed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8926) TestResultSpooling::_test_full_queue is flaky

2019-09-30 Thread Michael Ho (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941247#comment-16941247
 ] 

Michael Ho commented on IMPALA-8926:


Hi [~stakiar], we just hit another failure in a build with the latest commit for 
this JIRA. Can you please take a look?

{noformat}
Error Message
query_test/test_result_spooling.py:117: in test_full_queue_large_fetch 
self._test_full_queue(vector, query, fetch_size=num_rows) 
query_test/test_result_spooling.py:154: in _test_full_queue assert 
re.search(send_wait_time_regex, self.client.get_runtime_profile(handle)) E   
assert None E+  where None = ('RowBatchSendWaitTime: [1-9]', 'Query 
(id=7845739f7afbf276:0ca527e1):\n  DEBUG MODE WARNING: Query profile 
created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
0.000ns\n') E+where  = re.search E   
 +and   'Query (id=7845739f7afbf276:0ca527e1):\n  DEBUG MODE 
WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n 
- WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
WriteIoWaitTime: 0.000ns\n' = >() E+  where > = 
.get_runtime_profile E+where 
 = 
.client
Stacktrace
query_test/test_result_spooling.py:117: in test_full_queue_large_fetch
self._test_full_queue(vector, query, fetch_size=num_rows)
query_test/test_result_spooling.py:154: in _test_full_queue
assert re.search(send_wait_time_regex, 
self.client.get_runtime_profile(handle))
E   assert None
E+  where None = ('RowBatchSendWaitTime: 
[1-9]', 'Query (id=7845739f7afbf276:0ca527e1):\n  DEBUG MODE WARNING: 
Query profile created while running a DEBUG buil...: 0.000ns\n - 
WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
WriteIoWaitTime: 0.000ns\n')
E+where  = re.search
E+and   'Query (id=7845739f7afbf276:0ca527e1):\n  DEBUG MODE 
WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n 
- WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
WriteIoWaitTime: 0.000ns\n' = >()
E+  where > = 
.get_runtime_profile
E+where  = .client
{noformat}

> TestResultSpooling::_test_full_queue is flaky
> -
>
> Key: IMPALA-8926
> URL: https://issues.apache.org/jira/browse/IMPALA-8926
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Has happened a few times, error message is:
> {code:java}
> query_test/test_result_spooling.py:116: in test_full_queue_large_fetch 
> self._test_full_queue(vector, query, fetch_size=num_rows) 
> query_test/test_result_spooling.py:148: in _test_full_queue assert 
> re.search(send_wait_time_regex, self.client.get_runtime_profile(handle)) \ E  
>  assert None is not None E+  where None =  0x7f35f0aee320>('RowBatchSendWaitTime: [1-9]', 'Query 
> (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 
> 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 
> 0.000ns\n') E+where  = re.search E 
>+and   'Query (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE 
> WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n   
>   - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - 
> WriteIoWaitTime: 0.000ns\n' =  BeeswaxConnection.get_runtime_profile of 
>  0xcde2310>>( 0xcdf0810>) E+  where  BeeswaxConnection.get_runtime_profile of 
> > = 
>  0xcde2310>.get_runtime_profile E+where 
>  = 
> .client {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8960) test_drop_if_exists fails on S3 due to incomplete URI

2019-09-30 Thread Michael Ho (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941216#comment-16941216
 ] 

Michael Ho commented on IMPALA-8960:


Hit this in two different regular test runs recently. Bumping priority a bit. 
cc'ing [~joemcdonnell]

> test_drop_if_exists fails on S3 due to incomplete URI
> -
>
> Key: IMPALA-8960
> URL: https://issues.apache.org/jira/browse/IMPALA-8960
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> Error Message
> {noformat}
> ImpalaBeeswaxException: ImpalaBeeswaxException: INNER EXCEPTION:  'beeswaxd.ttypes.BeeswaxException'> MESSAGE: AnalysisException: Incomplete 
> HDFS URI, no host: hdfs:///test-warehouse/libTestUdfs.so CAUSED BY: 
> IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so{noformat}
> Stacktrace
> {noformat}
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException:  INNER EXCEPTION:  'beeswaxd.ttypes.BeeswaxException'>  MESSAGE: AnalysisException: Incomplete 
> HDFS URI, no host: hdfs:///test-warehouse/libTestUdfs.so CAUSED BY: 
> IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.soStacktrace
> authorization/test_owner_privileges.py:137: in test_drop_if_exists
> self._setup_drop_if_exist_test(unique_database, test_db)
> authorization/test_owner_privileges.py:172: in _setup_drop_if_exist_test
> self.execute_query("grant all on uri 
> 'hdfs:///test-warehouse/libTestUdfs.so' to"
> common/impala_test_suite.py:751: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:782: in execute_query
> return self.__execute_query(self.client, query, query_options)
> common/impala_test_suite.py:853: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:362: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:356: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:519: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so
> E   CAUSED BY: IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8960) test_drop_if_exists fails on S3 due to incomplete URI

2019-09-30 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8960:
---
Priority: Critical  (was: Major)

> test_drop_if_exists fails on S3 due to incomplete URI
> -
>
> Key: IMPALA-8960
> URL: https://issues.apache.org/jira/browse/IMPALA-8960
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> Error Message
> {noformat}
> ImpalaBeeswaxException: ImpalaBeeswaxException: INNER EXCEPTION:  'beeswaxd.ttypes.BeeswaxException'> MESSAGE: AnalysisException: Incomplete 
> HDFS URI, no host: hdfs:///test-warehouse/libTestUdfs.so CAUSED BY: 
> IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so{noformat}
> Stacktrace
> {noformat}
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException:  INNER EXCEPTION:  'beeswaxd.ttypes.BeeswaxException'>  MESSAGE: AnalysisException: Incomplete 
> HDFS URI, no host: hdfs:///test-warehouse/libTestUdfs.so CAUSED BY: 
> IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.soStacktrace
> authorization/test_owner_privileges.py:137: in test_drop_if_exists
> self._setup_drop_if_exist_test(unique_database, test_db)
> authorization/test_owner_privileges.py:172: in _setup_drop_if_exist_test
> self.execute_query("grant all on uri 
> 'hdfs:///test-warehouse/libTestUdfs.so' to"
> common/impala_test_suite.py:751: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:782: in execute_query
> return self.__execute_query(self.client, query, query_options)
> common/impala_test_suite.py:853: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:362: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:356: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:519: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so
> E   CAUSED BY: IOException: Incomplete HDFS URI, no host: 
> hdfs:///test-warehouse/libTestUdfs.so{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2019-09-30 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-7733:
--

Assignee: Lenisha Gandhi  (was: Tianyi Wang)

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Lenisha Gandhi
>Priority: Blocker
>  Labels: broken-build, flaky
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:625: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:176: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:350: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:371: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
> Hdfs op (RENAME 
> s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
>  TO 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
>  failed, error was: 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
> E   Error(5): Input/output error{noformat}
> Since we know this happens once in a while, some ideas to deflake it:
>  * retry
>  * check for this specific issue... if we think it's platform flakiness, then 
> we should skip it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8983) TestExecutorGroups.test_max_concurrent_queries seems flaky

2019-09-27 Thread Michael Ho (Jira)
Michael Ho created IMPALA-8983:
--

 Summary: TestExecutorGroups.test_max_concurrent_queries seems flaky
 Key: IMPALA-8983
 URL: https://issues.apache.org/jira/browse/IMPALA-8983
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Michael Ho
Assignee: Lars Volker


It appears the test failed because an expected query admission failure didn't 
happen. It has happened only once so far.

{noformat}
Error Message
assert 'Initial admission queue reason: No query slot available on host' in 
'Query (id=4a4773e2f93eff7f:754a96ed):\n  DEBUG MODE WARNING: Query 
profile created while running a DEBUG buil...0)\n - 
NumRowsFetchedFromCache: 0 (0)\n - RowMaterializationRate: 0\n - 
RowMaterializationTimer: 0.000ns\n'
Stacktrace
custom_cluster/test_executor_groups.py:212: in test_max_concurrent_queries
assert "Initial admission queue reason: No query slot available on host" in 
profile
E   assert 'Initial admission queue reason: No query slot available on host' in 
'Query (id=4a4773e2f93eff7f:754a96ed):\n  DEBUG MODE WARNING: Query 
profile created while running a DEBUG buil...0)\n - 
NumRowsFetchedFromCache: 0 (0)\n - RowMaterializationRate: 0\n - 
RowMaterializationTimer: 0.000ns\n'
Standard Error
-- 2019-09-26 13:41:15,462 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/start-impala-cluster.py
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50' --cluster_size=1 --num_coordinators=1 
--log_dir=/data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/logs/custom_cluster_tests
 --log_level=1 --use_exclusive_coordinators '--impalad_args= 
-executor_groups=coordinator ' --impalad_args=--default_query_options=
13:41:15 MainThread: Starting impala cluster without executors
13:41:16 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
13:41:16 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
13:41:16 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
13:41:16 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
13:41:19 MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
13:41:19 MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
13:41:19 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-r4-4xlarge-ondemand-0593.vpc.cloudera.com:25000
13:41:19 MainThread: Debug webpage not yet available: ('Connection aborted.', 
error(111, 'Connection refused'))
13:41:21 MainThread: Debug webpage did not become available in expected time.
13:41:21 MainThread: Waiting for num_known_live_backends=1. Current value: None
13:41:22 MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
13:41:22 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-r4-4xlarge-ondemand-0593.vpc.cloudera.com:25000
13:41:22 MainThread: num_known_live_backends has reached value: 1
13:41:22 MainThread: Impala Cluster Running with 1 nodes (1 coordinators, 0 
executors).
-- 2019-09-26 13:41:22,976 DEBUGMainThread: Found 1 impalad/1 statestored/1 
catalogd process(es)
-- 2019-09-26 13:41:22,976 INFO MainThread: Getting metric: 
statestore.live-backends from 
impala-ec2-centos74-r4-4xlarge-ondemand-0593.vpc.cloudera.com:25010
-- 2019-09-26 13:41:22,977 INFO MainThread: Starting new HTTP connection 
(1): impala-ec2-centos74-r4-4xlarge-ondemand-0593.vpc.cloudera.com
-- 2019-09-26 13:41:22,979 INFO MainThread: Metric 
'statestore.live-backends' has reached desired value: 2
-- 2019-09-26 13:41:22,980 DEBUGMainThread: Getting num_known_live_backends 
from impala-ec2-centos74-r4-4xlarge-ondemand-0593.vpc.cloudera.com:25000
-- 2019-09-26 13:41:22,980 INFO MainThread: Starting new HTTP connection 
(1): impala-ec2-centos74-r4-4xlarge-ondemand-0593.vpc.cloudera.com
-- 2019-09-26 13:41:22,982 INFO MainThread: num_known_live_backends has 
reached value: 1
SET 
client_identifier=custom_cluster/test_executor_groups.py::TestExecutorGroups::()::test_max_concurrent_queries;
-- connecting to: localhost:21000
-- connecting to localhost:21050 with impyla
-- 2019-09-26 13:41:23,170 INFO MainThread: Closing active operation
-- 2019-09-26 13:41:23,172 INFO MainThread: Adding 2 executors to group 
default-pool-group1 with minimum size 2
-- 2019-09-26 13:41:23,172 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/start-impala-cluster.py
 '--sta

[jira] [Assigned] (IMPALA-6788) Abort ExecFInstance() RPC loop early after query failure

2019-08-19 Thread Michael Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-6788:
--

Assignee: Thomas Tauber-Marshall

> Abort ExecFInstance() RPC loop early after query failure
> 
>
> Key: IMPALA-6788
> URL: https://issues.apache.org/jira/browse/IMPALA-6788
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: krpc, rpc
> Attachments: connect_thread_busy_queries_failing.txt, 
> impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip
>
>
> Logs from a large cluster show that query startup can take a long time; once 
> the startup completes, the query is cancelled because one of the intermediate 
> RPCs failed.
> It's not clear what the right answer is, as fragments are started 
> asynchronously; possibly a timeout?
> {code}
> I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() 
> query_id=334cc7dd9758c36c:ec38aeb4 stmt=with customer_total_return as
> I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 
> backends for query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 
> backends for query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() 
> query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() 
> query_id=334cc7dd9758c36c:ec38aeb4, tried to cancel 643 backends
> I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control 
> resources for query_id=334cc7dd9758c36c:ec38aeb4
> {code}
> {code}
> I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() 
> query_id=e44d553b04d47cfb:28f06bb8 stmt=with customer_total_return as
> I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 
> backends for query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 
> backends for query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() 
> query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() 
> query_id=e44d553b04d47cfb:28f06bb8, tried to cancel 639 backends
> I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control 
> resources for query_id=e44d553b04d47cfb:28f06bb8
> {code}
> Checked the coordinator and threads appear to be spending lots of time 
> waiting on exec_complete_barrier_
> {code}
> #0  0x7fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x01222944 in impala::Promise::Get() ()
> #2  0x01220d7b in impala::Coordinator::StartBackendExec() ()
> #3  0x01221c87 in impala::Coordinator::Exec() ()
> #4  0x00c3a925 in 
> impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest 
> const&) ()
> #5  0x00c41f7e in 
> impala::ClientRequestState::Exec(impala::TExecRequest*) ()
> #6  0x00bff597 in 
> impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, 
> std::shared_ptr, bool*, 
> std::shared_ptr*) ()
> #7  0x00c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, 
> std::shared_ptr, 
> std::shared_ptr*) ()
> #8  0x00c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, 
> beeswax::Query const&) ()
> /StartBackendExec
> #11 0x00d60c9a in boost::detail::thread_data void (*)(std::string const&, std::string const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > >::run() ()
> {code}
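
As a rough illustration of what "abort the loop early" could look like, here is a 
toy sketch (not Impala's actual Coordinator code; the {{ExecBarrier}} name and the 
backend-failure setup are made up for the example) of a wait that returns as soon 
as any ExecFInstance()-style RPC reports a failure instead of waiting for every 
backend to finish starting:

{code}
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Toy "exec barrier": Wait() returns either when every backend has reported in
// or as soon as any backend reports a failure, so the caller could start
// cancellation without waiting out the full startup loop.
class ExecBarrier {
 public:
  explicit ExecBarrier(int num_backends) : pending_(num_backends) {}

  void BackendDone(bool ok) {
    std::lock_guard<std::mutex> l(mu_);
    if (!ok) failed_ = true;
    --pending_;
    cv_.notify_all();
  }

  // True if all backends started successfully; false on the first failure.
  bool Wait() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return failed_ || pending_ == 0; });
    return !failed_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int pending_;
  bool failed_ = false;
};

int main() {
  const int kBackends = 4;
  ExecBarrier barrier(kBackends);
  std::vector<std::thread> rpcs;
  for (int i = 0; i < kBackends; ++i) {
    rpcs.emplace_back([&barrier, i] {
      // Simulated ExecFInstance()-style RPC completion; backend 2 fails.
      std::this_thread::sleep_for(std::chrono::milliseconds(10 * (i + 1)));
      barrier.BackendDone(/*ok=*/i != 2);
    });
  }
  // Returns as soon as backend 2 reports failure, before the slowest backend.
  std::cout << "all backends started ok? " << barrier.Wait() << "\n";
  for (auto& t : rpcs) t.join();
  return 0;
}
{code}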



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-13 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-8845:
--

Assignee: Michael Ho  (was: Sahil Takiar)

> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> 
>
> Key: IMPALA-8845
> URL: https://issues.apache.org/jira/browse/IMPALA-8845
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Michael Ho
>Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shutdown. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled, still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocking waiting for a response to a {{TransmitData()}} 
> RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.
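
A minimal sketch of the ordering described above (illustrative only; the types 
below are stand-ins, not the real {{FragmentInstanceState}}, {{ExecNode}} or 
{{DataSink}} implementations): the exec-node tree is closed right after its last 
batch has been handed to the sink, before the sink's potentially blocking 
{{FlushFinal()}} call.

{code}
#include <iostream>

// Stand-in types; only the ordering of Close() vs. FlushFinal() is the point.
struct RowBatch { bool eos = false; };

struct ExecNode {
  RowBatch GetNext() { RowBatch b; b.eos = true; return b; }
  void Close() { std::cout << "exec node tree closed\n"; }
};

struct DataSink {
  void Send(const RowBatch&) {}
  void FlushFinal() { std::cout << "flush final (may block on consumer)\n"; }
  void Close() {}
};

// Sketch of the proposed driver loop: close the ExecNode tree as soon as the
// last batch has been sent, *before* FlushFinal(), so scan and exchange
// resources are released even if FlushFinal() blocks until the consumer
// (e.g. the result spooling reader) has read all rows.
void DriveFragment(ExecNode* tree, DataSink* sink) {
  RowBatch batch;
  do {
    batch = tree->GetNext();
    sink->Send(batch);
  } while (!batch.eos);
  tree->Close();       // moved before FlushFinal() per this JIRA
  sink->FlushFinal();  // safe to block now; no ExecNode resources are held
  sink->Close();
}

int main() {
  ExecNode tree;
  DataSink sink;
  DriveFragment(&tree, &sink);
  return 0;
}
{code}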



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-13 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906607#comment-16906607
 ] 

Michael Ho commented on IMPALA-8845:


I spent more time looking into the issue after managing to reproduce it 
locally. Apparently, the problem was that the merging-exchange hit eos and 
stopped dequeuing from the data stream receiver. Some of the senders' batches 
could have been placed in the deferred queue due to the capacity limit in the 
receiver. Once the merging-exchange stops dequeuing, those deferred batches 
will not be dequeued until the receiver is cancelled and closed, so the sender 
would block waiting in the {{TransmitData()}} RPC forever.

So, it's not a case of IMPALA-3990 as far as I can tell.
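
As a toy model of that blocking pattern (not the actual {{KrpcDataStreamRecvr}} / 
{{KrpcDataStreamSender}} code; names, the capacity of 1 and the timings are made 
up), the sketch below shows how a sender whose batch is deferred stays blocked 
once the consumer stops dequeuing, and is only released by cancellation:

{code}
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// Toy receiver: Transmit() only "replies" (returns) once the batch is admitted
// or the receiver is cancelled, mimicking a deferred TransmitData() RPC.
class ToyReceiver {
 public:
  bool Transmit(int batch) {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return queue_.size() < capacity_ || cancelled_; });
    if (cancelled_) return false;   // the sender finally sees an error
    queue_.push(batch);
    return true;
  }
  bool Dequeue(int* batch) {
    std::lock_guard<std::mutex> l(mu_);
    if (queue_.empty()) return false;
    *batch = queue_.front();
    queue_.pop();
    cv_.notify_all();               // admits one deferred sender
    return true;
  }
  void Cancel() {
    std::lock_guard<std::mutex> l(mu_);
    cancelled_ = true;
    cv_.notify_all();
  }
 private:
  const size_t capacity_ = 1;       // tiny capacity to force deferral
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<int> queue_;
  bool cancelled_ = false;
};

int main() {
  ToyReceiver recvr;
  std::thread sender([&recvr] {
    recvr.Transmit(0);
    recvr.Transmit(1);
    bool ok = recvr.Transmit(2);    // stays deferred: consumer already hit eos
    std::cout << "third transmit admitted? " << ok << "\n";  // prints 0
  });
  int b;
  while (!recvr.Dequeue(&b)) {      // consumer reads its single "limit" row...
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  // ...hits eos here and never dequeues again; the deferred sender is stuck.
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  recvr.Cancel();                   // only cancellation unblocks the sender
  sender.join();
  return 0;
}
{code}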

> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> 
>
> Key: IMPALA-8845
> URL: https://issues.apache.org/jira/browse/IMPALA-8845
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shutdown. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled, still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocking waiting for a response to a {{TransmitData()}} 
> RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-08-12 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905619#comment-16905619
 ] 

Michael Ho commented on IMPALA-8712:


Please also see https://issues.apache.org/jira/browse/IMPALA-4475

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905597#comment-16905597
 ] 

Michael Ho commented on IMPALA-8845:


Oops.. looks like Sahil already updated the JIRA with the same observation. 
Didn't mean to post the same thing above but the observation is the same.

> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> 
>
> Key: IMPALA-8845
> URL: https://issues.apache.org/jira/browse/IMPALA-8845
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shutdown. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled, still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocking waiting for a response to a {{TransmitData()}} 
> RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905594#comment-16905594
 ] 

Michael Ho edited comment on IMPALA-8845 at 8/12/19 9:31 PM:
-

{quote} I haven't traced the issue exactly, but what I think is happening is 
that the MERGING-EXCHANGE operator in the coordinator fragment hits eos 
whenever it has received enough rows to reach the limit defined in the query, 
which could occur before the DATASTREAM SINK sends all the rows from the TopN / 
Scan Node fragment. {quote}

If I understand the above correctly, your observation was that the 
Merging-Exchange has already been closed while the other fragment instance is 
stuck in an RPC call. Usually, when the receiving fragment is closed, it is put 
into a "closed receiver cache". Incoming traffic probes against this cache, 
notices that the receiver is already closed, and short-circuits the reply to the 
DataStreamSender. At that point, the DataStreamSender should skip issuing the 
RPC (see [code here|
https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L410-L411
]). However, there is an expiration time (5 minutes) for entries in the cache, 
so expired entries are eventually removed. Traffic arriving for that receiver 
may then be stuck for {{--datastream_sender_timeout_ms}} before returning with 
an error.

I probably need to look at the log to confirm whether the latter case is what's 
happening there. Please also see 
https://issues.apache.org/jira/browse/IMPALA-6818
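
For illustration only, a toy version of the "closed receiver cache" idea above 
(not the real {{KrpcDataStreamMgr}}; the key format and TTL handling are made 
up): a recently closed receiver lets the sender short-circuit, while an expired 
or unknown entry leaves the sender to wait out the datastream sender timeout:

{code}
#include <chrono>
#include <iostream>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Toy cache of recently closed receivers. Entries expire after a TTL,
// mirroring the 5-minute expiry mentioned above.
class ClosedReceiverCache {
 public:
  explicit ClosedReceiverCache(std::chrono::seconds ttl) : ttl_(ttl) {}

  void MarkClosed(const std::string& receiver_id) {
    closed_[receiver_id] = Clock::now();
  }

  // True if the receiver is known to be closed, letting the sender
  // short-circuit the RPC instead of queueing and eventually timing out.
  bool WasRecentlyClosed(const std::string& receiver_id) {
    auto it = closed_.find(receiver_id);
    if (it == closed_.end()) return false;
    if (Clock::now() - it->second > ttl_) {
      closed_.erase(it);  // expired: senders arriving now may block / time out
      return false;
    }
    return true;
  }

 private:
  std::chrono::seconds ttl_;
  std::unordered_map<std::string, Clock::time_point> closed_;
};

int main() {
  ClosedReceiverCache cache(std::chrono::seconds(300));              // ~5 minutes
  cache.MarkClosed("finst-42:node-3");                               // made-up key
  std::cout << cache.WasRecentlyClosed("finst-42:node-3") << "\n";   // 1: short-circuit
  std::cout << cache.WasRecentlyClosed("finst-99:node-1") << "\n";   // 0: sender must wait
  return 0;
}
{code}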




was (Author: kwho):
{quote} I haven't traced the issue exactly, but what I think is happening is 
that the MERGING-EXCHANGE operator in the coordinator fragment hits eos 
whenever it has received enough rows to reach the limit defined in the query, 
which could occur before the DATASTREAM SINK sends all the rows from the TopN / 
Scan Node fragment. {quote}

If I understand the above correctly, your observation was that the 
Merging-Exchange has been closed already and the other fragment instance is 
stuck in an RPC call. Usually, when the receiving fragment is closed, it will 
be put into a "closed receiver cache". Incoming traffic will probe against this 
cache and notices that it's closed already and short-circuits the reply to the 
DataStreamSender. At which point, the DataStreamSender should skip issuing the 
RPC (see [code here| 
https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L410-L411
 ] However, there is an expiration time (5 minutes) for entries in the cache so 
eventually expired entries will be removed. Traffic arriving for that receiver 
may be stuck for {{--datastream_sender_timeout_ms}} before returning with an 
error.

That said, if the DataStreamSender manages to 



> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> 
>
> Key: IMPALA-8845
> URL: https://issues.apache.org/jira/browse/IMPALA-8845
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shutdown. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled, still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocking waiting for a response to a {{TransmitData()}} 
> RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905594#comment-16905594
 ] 

Michael Ho commented on IMPALA-8845:


{quote} I haven't traced the issue exactly, but what I think is happening is 
that the MERGING-EXCHANGE operator in the coordinator fragment hits eos 
whenever it has received enough rows to reach the limit defined in the query, 
which could occur before the DATASTREAM SINK sends all the rows from the TopN / 
Scan Node fragment. {quote}

If I understand the above correctly, your observation was that the 
Merging-Exchange has been closed already and the other fragment instance is 
stuck in an RPC call. Usually, when the receiving fragment is closed, it will 
be put into a "closed receiver cache". Incoming traffic will probe against this 
cache and notices that it's closed already and short-circuits the reply to the 
DataStreamSender. At which point, the DataStreamSender should skip issuing the 
RPC (see [code here| 
https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L410-L411
 ] However, there is an expiration time (5 minutes) for entries in the cache so 
eventually expired entries will be removed. Traffic arriving for that receiver 
may be stuck for {{--datastream_sender_timeout_ms}} before returning with an 
error.

That said, if the DataStreamSender manages to 



> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> 
>
> Key: IMPALA-8845
> URL: https://issues.apache.org/jira/browse/IMPALA-8845
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shutdown. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled, still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocking waiting for a response to a {{TransmitData()}} 
> RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8829) Document limitation of parsing memory string

2019-08-02 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899299#comment-16899299
 ] 

Michael Ho commented on IMPALA-8829:


We may also need to update any existing docs that use "TB" in their examples.

> Document limitation of parsing memory string
> 
>
> Key: IMPALA-8829
> URL: https://issues.apache.org/jira/browse/IMPALA-8829
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Alex Rodoni
>Priority: Major
>
> During review of https://gerrit.cloudera.org/#/c/13986/, [~tarmstrong] found 
> that {{ParseUtil::ParseMemSpec()}} doesn't support parsing strings with "TB" 
> in it. We may want to document this limitation in older versions so that users 
> won't specify "TB" in startup flags. Off the top of my head, the scratch 
> space and data cache are probably affected. Any memory limits related flags 
> may also be affected.
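
To illustrate the limitation, here is a toy memory-spec parser (a stand-in, not 
the real {{ParseUtil::ParseMemSpec()}}): only byte/KB/MB/GB-style suffixes are 
accepted, so a value such as "2TB" is rejected and would have to be written as 
"2048GB" instead:

{code}
#include <cctype>
#include <cstdint>
#include <iostream>
#include <string>

// Toy parser: number followed by an optional B/K/KB/M/MB/G/GB suffix.
// A "T"/"TB" suffix deliberately falls through to the unsupported branch.
bool ToyParseMemSpec(const std::string& spec, int64_t* bytes) {
  size_t pos = 0;
  double value = 0;
  try {
    value = std::stod(spec, &pos);
  } catch (...) {
    return false;
  }
  std::string suffix = spec.substr(pos);
  for (char& c : suffix) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  int64_t multiplier;
  if (suffix.empty() || suffix == "B") multiplier = 1LL;
  else if (suffix == "K" || suffix == "KB") multiplier = 1LL << 10;
  else if (suffix == "M" || suffix == "MB") multiplier = 1LL << 20;
  else if (suffix == "G" || suffix == "GB") multiplier = 1LL << 30;
  else return false;  // "TB" (and anything else) is not understood
  *bytes = static_cast<int64_t>(value * multiplier);
  return true;
}

int main() {
  int64_t bytes;
  std::cout << ToyParseMemSpec("500GB", &bytes) << "\n";  // 1: parsed
  std::cout << ToyParseMemSpec("2TB", &bytes) << "\n";    // 0: write 2048GB instead
  return 0;
}
{code}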



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8829) Document limitation of parsing memory string

2019-08-02 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8829:
--

 Summary: Document limitation of parsing memory string
 Key: IMPALA-8829
 URL: https://issues.apache.org/jira/browse/IMPALA-8829
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Affects Versions: Impala 3.2.0, Impala 3.1.0, Impala 3.3.0
Reporter: Michael Ho
Assignee: Alex Rodoni


During review of https://gerrit.cloudera.org/#/c/13986/, [~tarmstrong] found 
that {{ParseUtil::ParseMemSpec(}} doesn't support parsing strings with "TB" in 
it. We may want to document this limitation in older version so that users 
won't specify "TB" in startup flags. Off the top of my head, the scratch space 
and data cache are probably affected. Any memory limits related flags may also 
be affected.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8829) Document limitation of parsing memory string

2019-08-02 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8829:
---
Description: During review of https://gerrit.cloudera.org/#/c/13986/, 
[~tarmstrong] found that {{ParseUtil::ParseMemSpec()}} doesn't support parsing 
strings with "TB" in it. We may want to document this limitation in older 
version so that users won't specify "TB" in startup flags. Off the top of my 
head, the scratch space and data cache are probably affected. Any memory limits 
related flags may also be affected.  (was: During review of 
https://gerrit.cloudera.org/#/c/13986/, [~tarmstrong] found that 
{{ParseUtil::ParseMemSpec(}} doesn't support parsing strings with "TB" in it. 
We may want to document this limitation in older version so that users won't 
specify "TB" in startup flags. Off the top of my head, the scratch space and 
data cache are probably affected. Any memory limits related flags may also be 
affected.)

> Document limitation of parsing memory string
> 
>
> Key: IMPALA-8829
> URL: https://issues.apache.org/jira/browse/IMPALA-8829
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Alex Rodoni
>Priority: Major
>
> During review of https://gerrit.cloudera.org/#/c/13986/, [~tarmstrong] found 
> that {{ParseUtil::ParseMemSpec()}} doesn't support parsing strings with "TB" 
> in it. We may want to document this limitation in older versions so that users 
> won't specify "TB" in startup flags. Off the top of my head, the scratch 
> space and data cache are probably affected. Any memory limits related flags 
> may also be affected.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8403) Possible thread leak in impalad

2019-08-01 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898339#comment-16898339
 ] 

Michael Ho edited comment on IMPALA-8403 at 8/1/19 8:29 PM:


This is most likely due to Thrift connections accumulating in the client caches 
of various other Impalads over time. These connections aren't being closed. 
Each connection on the client side corresponds to a thread on the server side, 
and there is no limit enforced on the number of these Thrift connection threads 
for the "backend" service. Over time, the number of Thrift connection threads 
for the backend service grows.

We have converted quite a number of backend services to KRPC, including the 
biggest offenders (e.g. {{TransmitData()}}, {{ReportExecStatus()}}). Once 
IMPALA-7984 is fixed, we can remove the Thrift server for backend services and 
this problem should no longer happen.
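
A toy model of the thread-per-connection behavior described above (illustrative 
only; there is no real networking or Thrift here): every cached client 
connection pins one server-side handler thread, so with no connection limit and 
a client cache that never closes connections, the server's thread count only 
grows:

{code}
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

int main() {
  std::atomic<bool> shutdown{false};
  std::atomic<int> live_handler_threads{0};
  std::vector<std::thread> handlers;

  // Opening a "cached connection" spawns one handler thread that lives for as
  // long as the connection stays open (i.e. until the client closes it).
  auto open_cached_connection = [&] {
    handlers.emplace_back([&] {
      ++live_handler_threads;
      while (!shutdown) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
      }
      --live_handler_threads;
    });
  };

  // E.g. 8 peers each caching one client connection and never closing it.
  for (int i = 0; i < 8; ++i) open_cached_connection();
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  std::cout << "server handler threads: " << live_handler_threads << "\n";  // 8, never shrinks

  shutdown = true;  // stand-in for the clients finally closing their connections
  for (auto& t : handlers) t.join();
  return 0;
}
{code}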


was (Author: kwho):
This is most likely due to Thrift connections accumulated in the client cache 
of various other Impalads over time. These connections aren't being closed. 
Each connection on the client side will correspond to a thread on the server 
side and there is no limit enforced on the number of these Thrift connection 
threads for "backend" service. Overtime, the number of Thrift connection 
threads for backend service grows.

We have converted quite a number of backend service to KRPC, including the 
biggest offenders (e.g. {{TransmitData()}}, {{ReportExecStatus()}}). Once 
IMPALA-7984 is fixed, we can remove the Thrift server for backend services.

> Possible thread leak in impalad
> ---
>
> Key: IMPALA-8403
> URL: https://issues.apache.org/jira/browse/IMPALA-8403
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz
>
>
> The metric of thread-manager.running-threads got from 
> http://${impalad_host}:25000/metrics?json shows that the number of running 
> threads keeps increasing. (See the snapshot) This phenomenon is most 
> noticeable in coordinators.
> Maybe a counter bug or threads leak.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8403) Possible thread leak in impalad

2019-08-01 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898339#comment-16898339
 ] 

Michael Ho edited comment on IMPALA-8403 at 8/1/19 8:28 PM:


This is most likely due to Thrift connections accumulated in the client cache 
of various other Impalads over time. These connections aren't being closed. 
Each connection on the client side will correspond to a thread on the server 
side and there is no limit enforced on the number of these Thrift connection 
threads for "backend" service. Overtime, the number of Thrift connection 
threads for backend service grows.

We have converted quite a number of backend service to KRPC, including the 
biggest offenders (e.g. {{TransmitData()}}, {{ReportExecStatus()}}). Once 
IMPALA-7984 is fixed, we can remove the Thrift server for backend services.


was (Author: kwho):
This is most likely due to Thrift connections accumulated in the client cache 
of various other Impalads over time. These connections aren't being closed. 
Each connection on the client side will correspond to a thread on the server 
side and there is no limit enforced on the number of these Thrift connection 
threads for "backend" service. Overtime, the number of backend threads grow.

We have converted quite a number of backend service to KRPC, including the 
biggest offenders (e.g. {{TransmitData()}}, {{ReportExecStatus()}}). Once 
IMPALA-7984 is fixed, we can remove the Thrift server for backend services.

> Possible thread leak in impalad
> ---
>
> Key: IMPALA-8403
> URL: https://issues.apache.org/jira/browse/IMPALA-8403
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz
>
>
> The metric of thread-manager.running-threads got from 
> http://${impalad_host}:25000/metrics?json shows that the number of running 
> threads keeps increasing. (See the snapshot) This phenomenon is most 
> noticeable in coordinators.
> Maybe a counter bug or threads leak.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8403) Possible thread leak in impalad

2019-08-01 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898339#comment-16898339
 ] 

Michael Ho commented on IMPALA-8403:


This is most likely due to Thrift connections accumulated in the client cache 
of various other Impalads over time. These connections aren't being closed. 
Each connection on the client side will correspond to a thread on the server 
side and there is no limit enforced on the number of these Thrift connection 
threads for "backend" service. Overtime, the number of backend threads grow.

We have converted quite a number of backend service to KRPC, including the 
biggest offenders (e.g. {{TransmitData()}}, {{ReportExecStatus()}}). Once 
IMPALA-7984 is fixed, we can remove the Thrift server for backend services.

> Possible thread leak in impalad
> ---
>
> Key: IMPALA-8403
> URL: https://issues.apache.org/jira/browse/IMPALA-8403
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz
>
>
> The metric of thread-manager.running-threads got from 
> http://${impalad_host}:25000/metrics?json shows that the number of running 
> threads keeps increasing. (See the snapshot) This phenomenon is most 
> noticeable in coordinators.
> Maybe a counter bug or threads leak.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8775) Have the option to delete data cache files on Impala shutdown

2019-07-19 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8775:
---
Priority: Major  (was: Critical)

> Have the option to delete data cache files on Impala shutdown
> -
>
> Key: IMPALA-8775
> URL: https://issues.apache.org/jira/browse/IMPALA-8775
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>
> Currently, Impala will delete old data cache files upon restart but it only 
> does so when data cache is enabled. However, if the user turns off the data 
> cache after restart, the old data cache files may be left hanging around 
> until the next restart with the cache enabled. We should have an option to 
> delete the data cache files on Impala shutdown. The initial implementation of 
> the data cache has the unlink-on-create behavior but it was confusing to 
> users as there will be unaccounted usage on the storage side (as the file is 
> not easily visible after unlink). We may want to consider supporting this 
> "unlink-on-create" behavior behind a flag in case users prefer to have the 
> data cache files removed upon Impala's exit. In the long run, we probably 
> want to have an "exit handler" in Impala to make sure resources are freed up 
> on Impala exit.
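
For illustration, a minimal POSIX sketch of the "unlink-on-create" behavior 
mentioned above (the path and write size are made up; this is not the actual 
data cache code): the backing file is unlinked right after creation, so its 
space is reclaimed automatically when the process exits, but it no longer shows 
up in directory listings, which is the accounting confusion described here:

{code}
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <string>

int main() {
  const std::string path = "/tmp/toy-data-cache-backing-file";  // made-up path
  int fd = open(path.c_str(), O_CREAT | O_RDWR, 0600);
  if (fd < 0) { perror("open"); return 1; }

  // Unlink immediately: the file keeps existing while fd is open, but has no
  // directory entry, so its space shows up only as "unaccounted" usage.
  if (unlink(path.c_str()) != 0) { perror("unlink"); close(fd); return 1; }

  // Cache reads/writes keep going through the open fd as usual.
  const char buf[4096] = {0};
  ssize_t written = write(fd, buf, sizeof(buf));
  (void)written;

  close(fd);  // last reference dropped: space reclaimed, even on crash or exit
  return 0;
}
{code}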



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8775) Have the option to delete data cache files on Impala shutdown

2019-07-19 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8775:
---
Description: Currently, Impala will delete old data cache files upon 
restart but it only does so when data cache is enabled. However, if the user 
turns off the data cache after restart, the old data cache files may be left 
hanging around until the next restart with the cache enabled. We should have an 
option to delete the data cache files on Impala shutdown. The initial 
implementation of the data cache has the unlink-on-create behavior but it was 
confusing to users as there will be unaccounted usage on the storage side (as 
the file is not easily visible after unlink). We may want to consider supporting 
this "unlink-on-create" behavior behind a flag in case users prefer to have the 
data cache files removed upon Impala's exit. In the long run, we probably want 
to have an "exit handler" in Impala to make sure resources are freed up on 
Impala exit.  (was: Currently, Impala will delete old data cache files upon 
restart but it only does so when data cache is enabled. Impala should 
unconditionally delete the data cache files regardless of whether data cache is 
enabled.)

> Have the option to delete data cache files on Impala shutdown
> -
>
> Key: IMPALA-8775
> URL: https://issues.apache.org/jira/browse/IMPALA-8775
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Critical
>
> Currently, Impala will delete old data cache files upon restart but it only 
> does so when data cache is enabled. However, if the user turns off the data 
> cache after restart, the old data cache files may be left hanging around 
> until the next restart with the cache enabled. We should have an option to 
> delete the data cache files on Impala shutdown. The initial implementation of 
> the data cache has the unlink-on-create behavior but it was confusing to 
> users as there will be unaccounted usage on the storage side (as the file is 
> not easily visible after unlink). We may want to consider supporting this 
> "unlink-on-create" behavior behind a flag in case users prefer to have the 
> data cache files removed upon Impala's exit. In the long run, we probably 
> want to have an "exit handler" in Impala to make sure resources are freed up 
> on Impala exit.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8775) Have the option to delete data cache files on Impala shutdown

2019-07-19 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8775:
---
Summary: Have the option to delete data cache files on Impala shutdown  
(was: Always delete data cache files on restart)

> Have the option to delete data cache files on Impala shutdown
> -
>
> Key: IMPALA-8775
> URL: https://issues.apache.org/jira/browse/IMPALA-8775
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Critical
>
> Currently, Impala will delete old data cache files upon restart but it only 
> does so when data cache is enabled. Impala should unconditionally delete the 
> data cache files regardless of whether data cache is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8775) Always delete data cache files on restart

2019-07-19 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8775:
--

 Summary: Always delete data cache files on restart
 Key: IMPALA-8775
 URL: https://issues.apache.org/jira/browse/IMPALA-8775
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Michael Ho
Assignee: Michael Ho


Currently, Impala will delete old data cache files upon restart but it only 
does so when data cache is enabled. Impala should unconditionally delete the 
data cache files regardless of whether data cache is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2019-07-16 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886546#comment-16886546
 ] 

Michael Ho commented on IMPALA-7733:


A recent instance when running 
{{query_test/test_tpcds_queries.py::TestTpcdsInsert::()::test_tpcds_partitioned_insert}}:

{noformat}
query_test/test_tpcds_queries.py:521: in test_tpcds_partitioned_insert
self.run_test_case('partitioned-insert', vector)
common/impala_test_suite.py:563: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:500: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:798: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:184: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:364: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:385: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
Hdfs op (RENAME 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/_impala_insert_staging/834b0c158076d9d0_015f77df/.834b0c158076d9d0-015f77df0004_337386663_dir/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq
 TO 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq)
 failed, error was: 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/_impala_insert_staging/834b0c158076d9d0_015f77df/.834b0c158076d9d0-015f77df0004_337386663_dir/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq
E   Error(5): Input/output error
{noformat}

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Tianyi Wang
>Priority: Blocker
>  Labels: broken-build, flaky
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:625: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:176: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:350: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:371: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
> Hdfs op (RENAME 
> s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
>  TO 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
>  failed, error was: 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
> E   Error(5): Input/output error{noformat}
> Since we know this happens once in a while, some ideas to deflake it:
>  * retry
>  * check for this specific issue... if we think its platform flakiness, then 
> we should skip it.
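
As a concrete sketch of the "retry" idea above: a bounded retry-with-backoff wrapper 
around the rename. The rename callable is passed in as a stand-in for the actual 
filesystem call; this is not Impala's HDFS op API.

{noformat}
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Retry a flaky rename a few times with exponential backoff before giving up.
bool RenameWithRetry(const std::function<bool(const std::string&, const std::string&)>& rename_once,
                     const std::string& src, const std::string& dst, int max_attempts = 3) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (rename_once(src, dst)) return true;
    if (attempt < max_attempts) {
      // Back off 100ms, 200ms, 400ms, ...
      std::this_thread::sleep_for(std::chrono::milliseconds(100 << (attempt - 1)));
    }
  }
  return false;
}
{noformat}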



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2019-07-16 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886546#comment-16886546
 ] 

Michael Ho edited comment on IMPALA-7733 at 7/16/19 11:26 PM:
--

A recent instance when running 
{{query_test/test_tpcds_queries.py::TestTpcdsInsert}}:

{noformat}
query_test/test_tpcds_queries.py:521: in test_tpcds_partitioned_insert
self.run_test_case('partitioned-insert', vector)
common/impala_test_suite.py:563: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:500: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:798: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:184: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:364: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:385: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
Hdfs op (RENAME 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/_impala_insert_staging/834b0c158076d9d0_015f77df/.834b0c158076d9d0-015f77df0004_337386663_dir/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq
 TO 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq)
 failed, error was: 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/_impala_insert_staging/834b0c158076d9d0_015f77df/.834b0c158076d9d0-015f77df0004_337386663_dir/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq
E   Error(5): Input/output error
{noformat}


was (Author: kwho):
A recent instance when running 
{{query_test/test_tpcds_queries.py::TestTpcdsInsert::()::test_tpcds_partitioned_insert}}:

{noformat}
query_test/test_tpcds_queries.py:521: in test_tpcds_partitioned_insert
self.run_test_case('partitioned-insert', vector)
common/impala_test_suite.py:563: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:500: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:798: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:184: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:364: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:385: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
Hdfs op (RENAME 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/_impala_insert_staging/834b0c158076d9d0_015f77df/.834b0c158076d9d0-015f77df0004_337386663_dir/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq
 TO 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq)
 failed, error was: 
s3a:///test-warehouse/tpcds_parquet.db/store_sales_insert/_impala_insert_staging/834b0c158076d9d0_015f77df/.834b0c158076d9d0-015f77df0004_337386663_dir/ss_sold_date_sk=2451539/834b0c158076d9d0-015f77df0004_1260764580_data.0.parq
E   Error(5): Input/output error
{noformat}

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Tianyi Wang
>Priority: Blocker
>  Labels: broken-build, flaky
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute

[jira] [Commented] (IMPALA-8740) TestCodegen.test_disable_codegen flakiness

2019-07-11 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883431#comment-16883431
 ] 

Michael Ho commented on IMPALA-8740:


This is mostly due to some flakiness in stat computation or data loading. 
[~anuragmantri], do you happen to have the log of the entire run somewhere ?

> TestCodegen.test_disable_codegen flakiness 
> ---
>
> Key: IMPALA-8740
> URL: https://issues.apache.org/jira/browse/IMPALA-8740
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Anurag Mantripragada
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: broken-build
>
> Looks like the test cannot find "Codegen Disabled: disabled due to 
> optimization hints" in the profile and fails the assertion:
> {code:java}
> query_test/test_codegen.py:44: in test_disable_codegen
> self.run_test_case('QueryTest/disable-codegen', vector)
> common/impala_test_suite.py:617: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:605: in verify_runtime_profile
> actual))
> E   AssertionError: Did not find matches for lines in runtime profile:
> E   EXPECTED LINES:
> E   row_regex: .*Codegen Disabled: disabled due to optimization hints.*
> ...
> ACTUAL PROFILE FOLLOWS{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8740) TestCodegen.test_disable_codegen flakiness

2019-07-11 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883430#comment-16883430
 ] 

Michael Ho commented on IMPALA-8740:


Based on the profile, it seems one of the tables has corrupted stats:

{noformat}
E   WARNING: The following tables have potentially corrupt table statistics.
E   Drop and re-compute statistics to resolve this problem.
E   functional.alltypes
{noformat}

Consequently, the logic for disabling codegen didn't get run at all:
{noformat}
  private void checkForDisableCodegen(PlanNode distributedPlan) {
    MaxRowsProcessedVisitor visitor = new MaxRowsProcessedVisitor();
    distributedPlan.accept(visitor);
    if (!visitor.valid()) return; <<
    // This heuristic threshold tries to determine if the per-node codegen time will
    // reduce per-node execution time enough to justify the cost of codegen. Per-node
    // execution time is correlated with the number of rows flowing through the plan.
    if (visitor.getMaxRowsProcessedPerNode()
        < ctx_.getQueryOptions().getDisable_codegen_rows_threshold()) {
      ctx_.getQueryCtx().disable_codegen_hint = true;
    }
  }
{noformat}

> TestCodegen.test_disable_codegen flakiness 
> ---
>
> Key: IMPALA-8740
> URL: https://issues.apache.org/jira/browse/IMPALA-8740
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Anurag Mantripragada
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: broken-build
>
> Looks like the test cannot find "Codegen Disabled: disabled due to 
> optimization hints" in the profile and fails the assertion:
> {code:java}
> query_test/test_codegen.py:44: in test_disable_codegen
> self.run_test_case('QueryTest/disable-codegen', vector)
> common/impala_test_suite.py:617: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:605: in verify_runtime_profile
> actual))
> E   AssertionError: Did not find matches for lines in runtime profile:
> E   EXPECTED LINES:
> E   row_regex: .*Codegen Disabled: disabled due to optimization hints.*
> ...
> ACTUAL PROFILE FOLLOWS{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8543) Log more diagnostics information in TAcceptQueueServer

2019-07-11 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-8543.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Log more diagnostics information in TAcceptQueueServer
> --
>
> Key: IMPALA-8543
> URL: https://issues.apache.org/jira/browse/IMPALA-8543
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 3.3.0
>
>
> There is currently not much diagnostic information in 
> {{TAcceptQueueServer.cpp}}. It would be nice to add some additional logging 
> to identify clients which misbehaved or took longer than expected when 
> setting up connections.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8744) TestSessionExpiration.test_closing_idle_connection fails on Centos 6 due to Python 2.6 incompatibility

2019-07-09 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-8744.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> TestSessionExpiration.test_closing_idle_connection fails on Centos 6 due to 
> Python 2.6 incompatibility
> --
>
> Key: IMPALA-8744
> URL: https://issues.apache.org/jira/browse/IMPALA-8744
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Ho
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.3.0
>
>
> custom_cluster/test_session_expiry.py fails with the following message:
> {noformat}
> custom_cluster/test_session_expiration.py:131: in test_closing_idle_connection
> "impala.thrift-server.{}-frontend.connections-in-use".format(protocol)
> E   ValueError: zero length field name in format{noformat}
> The format needs to use "{0}" rather than "{}", because Python 2.6 doesn't 
> support "{}"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8748) Must pass hostname to RpcMgr::GetProxy()

2019-07-09 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-8748.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Must pass hostname to RpcMgr::GetProxy()
> 
>
> Key: IMPALA-8748
> URL: https://issues.apache.org/jira/browse/IMPALA-8748
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
> Fix For: Impala 3.3.0
>
>
> Various RPCs converted to KRPC recently mistakenly pass the resolved IP 
> address instead of the actual hostname. An example below in 
> coordinator-backend-state.cc. This may lead to failure when running with 
> Kerberos enabled.
> {noformat}
>   std::unique_ptr<ControlServiceProxy> proxy;
>   Status get_proxy_status =
>   ControlService::GetProxy(krpc_host_, krpc_host_.hostname, &proxy);
>   if (!get_proxy_status.ok()) {
> SetExecError(get_proxy_status);
> return;
>   }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8748) Must pass hostname to RpcMgr::GetProxy()

2019-07-08 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8748:
--

 Summary: Must pass hostname to RpcMgr::GetProxy()
 Key: IMPALA-8748
 URL: https://issues.apache.org/jira/browse/IMPALA-8748
 Project: IMPALA
  Issue Type: Bug
  Components: Distributed Exec
Affects Versions: Impala 3.3.0
Reporter: Michael Ho
Assignee: Michael Ho


Various RPCs converted to KRPC recently mistakenly pass the resolved IP address 
instead of the actual hostname. An example below in 
coordinator-backend-state.cc. This may lead to failure when running with 
Kerberos enabled.

{noformat}
  std::unique_ptr<ControlServiceProxy> proxy;
  Status get_proxy_status =
  ControlService::GetProxy(krpc_host_, krpc_host_.hostname, &proxy);
  if (!get_proxy_status.ok()) {
SetExecError(get_proxy_status);
return;
  }
{noformat}
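
A hedged sketch of the intended fix, mirroring the snippet above; apart from 
GetProxy(), the member names are illustrative stand-ins rather than the exact 
Impala code.

{noformat}
// The address used for the connection may carry a resolved IP, but the second
// argument must be the original hostname so that Kerberos principal validation
// (e.g. "impala/<hostname>@REALM") matches.
std::unique_ptr<ControlServiceProxy> proxy;
Status get_proxy_status =
    ControlService::GetProxy(krpc_host_, hostname_ /* hostname, not resolved IP */, &proxy);
if (!get_proxy_status.ok()) {
  SetExecError(get_proxy_status);
  return;
}
{noformat}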



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8733) Add "bytes in core" metric for data cache

2019-07-01 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8733:
---
Labels: observability  (was: )

> Add "bytes in core" metric for data cache
> -
>
> Key: IMPALA-8733
> URL: https://issues.apache.org/jira/browse/IMPALA-8733
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Priority: Major
>  Labels: observability
>
> The remote-read data cache is a major consumer of physical memory.
> It would be useful to have a "bytes in core" metric for the cache that 
> reports the current in core physical size of the Linux page cache memory for 
> the data cache.   This is useful for tuning data cache size or analyzing the 
> impact of the data cache on host memory use and memory pressure.
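
One way such a metric could be computed on Linux is mmap() plus mincore() over a 
cache file to count resident pages; the sketch below is illustrative only and not an 
existing Impala implementation.

{noformat}
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Returns the number of bytes of 'path' currently resident in the page cache,
// or -1 on error.
int64_t BytesInCore(const char* path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;
  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return -1; }
  if (st.st_size == 0) { close(fd); return 0; }
  void* addr = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
  close(fd);
  if (addr == MAP_FAILED) return -1;
  const long page_size = sysconf(_SC_PAGESIZE);
  const size_t num_pages = (st.st_size + page_size - 1) / page_size;
  std::vector<unsigned char> vec(num_pages);
  int64_t resident_bytes = -1;
  if (mincore(addr, st.st_size, vec.data()) == 0) {
    resident_bytes = 0;
    for (unsigned char v : vec) if (v & 1) resident_bytes += page_size;
  }
  munmap(addr, st.st_size);
  return resident_bytes;
}
{noformat}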



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-28 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875278#comment-16875278
 ] 

Michael Ho edited comment on IMPALA-8712 at 6/28/19 11:00 PM:
--

We may be able to work around some of the serialization overhead by serializing 
some of the immutable Thrift-based RPC parameters once and sending them as a 
sidecar. This should avoid having to serialize them once per backend.
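
For illustration, a hedged sketch of this "serialize once, attach as a sidecar" 
idea. All names below are stand-ins, not the actual Impala or KRPC API.

{noformat}
#include <string>

struct RpcController {};                        // stand-in for the KRPC controller
struct PerBackendRequest { int sidecar_idx; };  // stand-in for the protobuf request

std::string SerializeSharedThriftParams();                        // done exactly once
int AttachSidecar(RpcController* rpc, const std::string& blob);    // returns sidecar index
void SendAsync(const PerBackendRequest& req, RpcController* rpc);  // async KRPC call

void ExecOnAllBackends(int num_backends) {
  // Serialize the shared, immutable Thrift parameters a single time...
  const std::string shared_blob = SerializeSharedThriftParams();
  for (int i = 0; i < num_backends; ++i) {
    RpcController rpc;
    PerBackendRequest req;
    // ...and attach the same bytes to every per-backend RPC instead of re-serializing.
    req.sidecar_idx = AttachSidecar(&rpc, shared_blob);
    SendAsync(req, &rpc);
  }
}
{noformat}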


was (Author: kwho):
We may be able to work around some of the serialization overhead by sending some of 
the currently Thrift based RPC parameters as a sidecar or something.

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-28 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875278#comment-16875278
 ] 

Michael Ho commented on IMPALA-8712:


We may be able to work around some of the serialization overhead by sending some of 
the currently Thrift based RPC parameters as a sidecar or something.

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-28 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873765#comment-16873765
 ] 

Michael Ho edited comment on IMPALA-8712 at 6/28/19 10:57 PM:
--

On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

Some of the serialization overhead with ExecQueryFInstance() RPC even after 
IMPALA-7467 is still Thrift related as we just serialize a bunch of Thrift 
structures into a binary blob and send them via KRPC sidecar. The serialization 
is done in parallel by threads in {{exec_rpc_thread_pool_}}. 


was (Author: kwho):
On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

Some of the serialization overhead with ExecQueryFInstance() RPC even after 
IMPALA-7467 is still Thrift related as we just serialize a bunch of Thrift 
structures into a binary blob and send them via KRPC sidecar. The serialization 
is done in parallel by threads in {{exec_rpc_thread_pool_}}. -If we convert 
those Thrift structures into Protobuf, then the serialization can be done in 
parallel by reactor threads in the KRPC stack.-

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-28 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873765#comment-16873765
 ] 

Michael Ho edited comment on IMPALA-8712 at 6/28/19 7:22 AM:
-

On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

Some of the serialization overhead with ExecQueryFInstance() RPC even after 
IMPALA-7467 is still Thrift related as we just serialize a bunch of Thrift 
structures into a binary blob and send them via KRPC sidecar. The serialization 
is done in parallel by threads in {{exec_rpc_thread_pool_}}. -If we convert 
those Thrift structures into Protobuf, then the serialization can be done in 
parallel by reactor threads in the KRPC stack.-


was (Author: kwho):
On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

Some of the serialization overhead with ExecQueryFInstance() RPC even after 
IMPALA-7467 is still Thrift related as we just serialize a bunch of Thrift 
structures into a binary blob and send them via KRPC sidecar. The serialization 
is done in parallel by threads in {{exec_rpc_thread_pool_}}. If we convert 
those Thrift structures into Protobuf, then the serialization can be done in 
parallel by reactor threads in the KRPC stack.

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-27 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873765#comment-16873765
 ] 

Michael Ho edited comment on IMPALA-8712 at 6/27/19 6:59 PM:
-

On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

Some of the serialization overhead with ExecQueryFInstance() RPC even after 
IMPALA-7467 is still Thrift related as we just serialize a bunch of Thrift 
structures into a binary blob and send them via KRPC sidecar. The serialization 
is done in parallel by threads in {{exec_rpc_thread_pool_}}. If we convert 
those Thrift structures into Protobuf, then the serialization can be done in 
parallel by reactor threads in the KRPC stack.


was (Author: kwho):
On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-26 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873765#comment-16873765
 ] 

Michael Ho commented on IMPALA-8712:


On the other hand, {{exec_rpc_thread_pool_}} allows serialization of the RPC 
parameters to happen in parallel so it may not strictly be a simple conversion 
to asynchronous RPC without regression. So careful evaluation with huge RPC 
parameters (e.g. a large number of scan ranges) may be needed to see if there 
may be regression as a result.

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8714) Metrics for tracking query failure reasons

2019-06-26 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8714:
--

 Summary: Metrics for tracking query failure reasons
 Key: IMPALA-8714
 URL: https://issues.apache.org/jira/browse/IMPALA-8714
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Michael Ho


A query can fail for various reasons when run in Impala:
- analysis failure (e.g. SQL syntax mistake)
- access right issue (e.g. no privilege for certain ops on a table)
- memory limit exceeded
- executor crashes
- rpc failures
- corrupted data files 

This JIRA tracks the effort to explicitly classify query failures into some 
high level categories and expose them in the metrics. This should provide us a 
clearer view of what types of failures contribute the most to bad user 
experience for a given cluster.
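
A hedged sketch of what per-category failure counters could look like (illustrative 
only, not the actual Impala metrics API): each failed query is classified once and 
the matching counter is incremented, which is what the proposed metrics would expose.

{noformat}
#include <array>
#include <atomic>
#include <cstdint>

enum class QueryFailureCategory : int {
  ANALYSIS = 0, AUTHORIZATION, MEM_LIMIT_EXCEEDED, EXECUTOR_CRASH,
  RPC_FAILURE, CORRUPT_DATA, OTHER, NUM_CATEGORIES
};

class QueryFailureMetrics {
 public:
  void Increment(QueryFailureCategory c) {
    counters_[static_cast<int>(c)].fetch_add(1, std::memory_order_relaxed);
  }
  int64_t Get(QueryFailureCategory c) const {
    return counters_[static_cast<int>(c)].load(std::memory_order_relaxed);
  }
 private:
  std::array<std::atomic<int64_t>,
             static_cast<int>(QueryFailureCategory::NUM_CATEGORIES)> counters_{};
};
{noformat}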



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-26 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8712:
---
Issue Type: Sub-task  (was: Improvement)
Parent: IMPALA-5486

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-06-26 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8712:
--

 Summary: Convert ExecQueryFInstance() RPC to become asynchronous
 Key: IMPALA-8712
 URL: https://issues.apache.org/jira/browse/IMPALA-8712
 Project: IMPALA
  Issue Type: Improvement
  Components: Distributed Exec
Affects Versions: Impala 3.3.0
Reporter: Michael Ho
Assignee: Thomas Tauber-Marshall


Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
capabilities of KRPC instead of relying on the half-baked way of using 
{{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
already have a reactor thread pool in KRPC to handle sending client RPCs 
asynchronously. Also various tasks under IMPALA-5486 can also benefit from 
making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.
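
To make the asynchronous pattern concrete, a hedged sketch follows; the types and 
the proxy call are stand-ins, not the actual KRPC-generated API. The caller issues 
the RPC and returns immediately, a reactor thread later invokes the completion 
callback, and keeping the controller around allows cancellation.

{noformat}
#include <functional>

struct RpcController { void Cancel() {} };  // stand-in
struct ExecRequest {};                      // stand-in
struct ExecResponse {};                     // stand-in

// Stand-in for the generated async proxy call: returns immediately and invokes
// 'done' from a reactor thread when the RPC completes.
void ExecQueryFInstancesAsync(const ExecRequest& req, ExecResponse* resp,
                              RpcController* rpc, std::function<void()> done);

class BackendState {
 public:
  void ExecAsync() {
    ExecQueryFInstancesAsync(req_, &resp_, &rpc_, [this]() { ExecCompleteCb(); });
    // No thread blocks here waiting for the RPC, unlike the thread pool approach.
  }
  void Cancel() { rpc_.Cancel(); }  // cancel the in-flight RPC if the query is aborted
 private:
  void ExecCompleteCb() { /* inspect rpc_ and resp_, record exec status */ }
  ExecRequest req_;
  ExecResponse resp_;
  RpcController rpc_;
};
{noformat}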



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-5746) Remote fragments continue to hold onto memory after stopping the coordinator daemon

2019-06-25 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-5746:
--

Assignee: Joe McDonnell  (was: Michael Ho)

> Remote fragments continue to hold onto memory after stopping the coordinator 
> daemon
> ---
>
> Key: IMPALA-5746
> URL: https://issues.apache.org/jira/browse/IMPALA-5746
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.10.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Critical
> Attachments: remote_fragments_holding_memory.txt
>
>
> Repro 
> # Start running queries 
> # Kill the coordinator node 
> # On the running Impalad check the memz tab, remote fragments continue to run 
> and hold on to resources
> Remote fragments held on to memory +30 minutes after stopping the coordinator 
> service. 
> Attached thread dump from an Impalad running remote fragments .
> Snapshot of memz tab 30 minutes after killing the coordinator
> {code}
> Process: Limit=201.73 GB Total=5.32 GB Peak=179.36 GB
>   Free Disk IO Buffers: Total=1.87 GB Peak=1.87 GB
>   RequestPool=root.default: Total=1.35 GB Peak=178.51 GB
> Query(f64169d4bb3c901c:3a21d8ae): Total=2.64 MB Peak=104.73 MB
>   Fragment f64169d4bb3c901c:3a21d8ae0051: Total=2.64 MB Peak=2.67 MB
> AGGREGATION_NODE (id=15): Total=2.54 MB Peak=2.57 MB
>   Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=12.29 KB
> DataStreamSender (dst_id=17): Total=85.31 KB Peak=85.31 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
>   Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.54 MB
> Query(2a4f12b3b4b1dc8c:db7e8cf2): Total=258.29 MB Peak=412.98 MB
>   Fragment 2a4f12b3b4b1dc8c:db7e8cf2008c: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
>   Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
>   Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.66 MB
> Query(68421d2a5dea0775:83f5d972): Total=282.77 MB Peak=443.53 MB
>   Fragment 68421d2a5dea0775:83f5d972004a: Total=26.77 MB Peak=26.92 MB
> SORT_NODE (id=8): Total=8.00 KB Peak=8.00 KB
>   Exprs: Total=4.00 KB Peak=4.00 KB
> ANALYTIC_EVAL_NODE (id=7): Total=4.00 KB Peak=4.00 KB
>   Exprs: Total=4.00 KB Peak=4.00 KB
> SORT_NODE (id=6): Total=24.00 MB Peak=24.00 MB
> AGGREGATION_NODE (id=12): Total=2.72 MB Peak=2.83 MB
>   Exprs: Total=85.12 KB Peak=85.12 KB
> EXCHANGE_NODE (id=11): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=84.80 KB
> DataStreamSender (dst_id=13): Total=1.27 KB Peak=1.27 KB
> CodeGen: Total=24.80 KB Peak=4.13 MB
>   Block Manager: Limit=161.39 GB Total=280.50 MB Peak=286.52 MB
> Query(e94c89fa89a74d27:82812bf9): Total=258.29 MB Peak=436.85 MB
>   Fragment e94c89fa89a74d27:82812bf9008e: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
>   Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
>   Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.62 MB
> Query(4e43dad3bdc935d8:938b8b7e): Total=2.65 MB Peak=105.60 MB
>   Fragment 4e43dad3bdc935d8:938b8b7e0052: Total=2.65 MB Peak=2.68 MB
> AGGREGATION_NODE (id=15): Total=2.55 MB Peak=2.57 MB
>   Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=13.68 KB
> DataStreamSender (dst_id=17): Total=91.41 KB Peak=91.41 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
>   Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.30 MB
> Query(b34bdd65f1ed017e:5a0291bd): Total=2.37 MB Peak=106.56 MB
>   Fragment b34bdd65f1ed017e:5a0291bd004b: Total=2.37 MB Peak=2.37 MB
> SORT_NODE (id=6): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=10): Total=2.35 MB Peak=2.35 MB
>   Exprs: Total=34.12 KB Peak=34.12 KB
> EXCHANGE_NODE (id=9): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.23 KB
> DataStreamSender (dst_id=11): Total=3.45 KB 

[jira] [Commented] (IMPALA-8495) Impala Doc: Document Data Read Cache

2019-06-25 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872561#comment-16872561
 ] 

Michael Ho commented on IMPALA-8495:


[~arodoni_cloudera], the one listed in this JIRA {{--data_cache}} is the right 
one. The gerrit summary was probably stale. Sorry for the confusion.

> Impala Doc: Document Data Read Cache
> 
>
> Key: IMPALA-8495
> URL: https://issues.apache.org/jira/browse/IMPALA-8495
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>
> IMPALA-8341 introduces a data cache for remote reads. In particular, it 
> caches data for non-local reads (e.g. S3, ABFS, ADLS). The data cache can be 
> enabled by setting the startup flag 
> {{--data_cache=<dir1>,<dir2>,...,<dirN>:<quota>}} in which 
> {{<dir1>,...,<dirN>}} are directories on the local filesystem and {{<quota>}} is 
> the storage consumption quota for each directory. Note that multiple Impala 
> daemons running on the same host *must not* share cache directories.
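
For illustration only, a hypothetical flag value in the format above (the 
directories and the quota, including its suffix syntax, are made up):

{noformat}
--data_cache=/mnt/ssd1/impala-cache,/mnt/ssd2/impala-cache:500GB
{noformat}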



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8691) Query hint for disabling data caching

2019-06-24 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-8691:
--

Assignee: Michael Ho

> Query hint for disabling data caching
> -
>
> Key: IMPALA-8691
> URL: https://issues.apache.org/jira/browse/IMPALA-8691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>
> IMPALA-8690 tracks the effort for a better eviction algorithm for the 
> Impala's data cache. As a short term workaround, it would be nice to allow 
> users to explicitly set certain tables as not cacheable via query hints or 
> simply disable caching for a query via query options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8701) Document idle session management after IMPALA-7802

2019-06-24 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8701:
---
Affects Version/s: Impala 3.3.0

> Document idle session management after IMPALA-7802
> --
>
> Key: IMPALA-8701
> URL: https://issues.apache.org/jira/browse/IMPALA-8701
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Alex Rodoni
>Priority: Major
>
> After IMPALA-7802 is fixed, the network connection of an idle session will be 
> closed at most {{\-\-idle_client_poll_time_s}} seconds after the session has 
> been declared idle. This is a change of behavior from previous versions in 
> which the client connection will hang around until the user explicitly closes 
> the session and disconnects.
> By default, {{\-\-idle_client_poll_time_s}} is set to 30 seconds and the 
> previous behavior can be restored by setting to 0. In addition, the session 
> will only be closed if idle session timeout has been configured to be greater 
> than 0 for the session either via the startup flag {{--idle_session_timeout}} 
> or the query option {{IDLE_SESSION_TIMEOUT}}. If idle session timeout is not 
> configured, a session cannot become idle by definition and therefore its 
> connection cannot be closed until the client explicitly closes it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8701) Document idle session management after IMPALA-7802

2019-06-24 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8701:
--

 Summary: Document idle session management after IMPALA-7802
 Key: IMPALA-8701
 URL: https://issues.apache.org/jira/browse/IMPALA-8701
 Project: IMPALA
  Issue Type: Documentation
  Components: Docs
Reporter: Michael Ho
Assignee: Alex Rodoni


After IMPALA-7802 is fixed, the network connection of an idle session will be 
closed at most {{\-\-idle_client_poll_time_s}} seconds after the session has 
been declared idle. This is a change of behavior from previous versions in 
which the client connection will hang around until the user explicitly closes 
the session and disconnects.

By default, {{\-\-idle_client_poll_time_s}} is set to 30 seconds and the 
previous behavior can be restored by setting to 0. In addition, the session 
will only be closed if idle session timeout has been configured to be greater 
than 0 for the session either via the startup flag {{--idle_session_timeout}} 
or the query option {{IDLE_SESSION_TIMEOUT}}. If idle session timeout is not 
configured, a session cannot become idle by definition and therefore its 
connection cannot be closed until the client explicitly closes it.
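
A hedged sketch of the decision described above (illustrative only, not the actual 
Impala implementation): a connection is only force-closed when an idle session 
timeout greater than 0 is configured for the session and the session has been idle 
longer than that; the poll flag merely enables the periodic check.

{noformat}
#include <cstdint>

struct SessionState {
  int64_t idle_session_timeout_s;  // from --idle_session_timeout or IDLE_SESSION_TIMEOUT
  int64_t last_accessed_s;         // last time the session was used
};

bool ShouldCloseIdleConnection(const SessionState& s, int64_t now_s,
                               int64_t idle_client_poll_time_s) {
  if (idle_client_poll_time_s <= 0) return false;   // polling disabled: previous behavior
  if (s.idle_session_timeout_s <= 0) return false;  // session can never become idle
  return (now_s - s.last_accessed_s) > s.idle_session_timeout_s;
}
{noformat}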



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7802) Implement support for closing idle sessions

2019-06-24 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-7802.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Implement support for closing idle sessions
> ---
>
> Key: IMPALA-7802
> URL: https://issues.apache.org/jira/browse/IMPALA-7802
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Critical
>  Labels: supportability
> Fix For: Impala 3.3.0
>
> Attachments: image-2019-04-12-11-14-42-938.png
>
>
> Currently, the query option {{idle_session_timeout}} specifies a timeout in 
> seconds after which all running queries of that idle session will be 
> cancelled and no new queries can be issued to it. However, the idle session 
> will remain open and it needs to be closed explicitly. Please see the 
> [documentation|https://www.cloudera.com/documentation/enterprise/latest/topics/impala_idle_session_timeout.html]
>  for details.
> This behavior may be undesirable as each session still consumes an Impala 
> frontend service thread. The number of frontend service threads is bound by 
> the flag {{fe_service_threads}}. So, in a multi-tenant environment, an Impala 
> server can have a lot of idle sessions but they still consume against the 
> quota of {{fe_service_threads}}. If the number of sessions established 
> reaches {{fe_service_threads}}, all new session creations will block until 
> some of the existing sessions exit. There may be no time bound on when these 
> zombie idle sessions will be closed and it's at the mercy of the client 
> implementation to close them. In some sense, leaving many idle sessions open 
> is a way to launch a denial of service attack on Impala.
> To fix this situation, we should have an option to forcefully close a session 
> when it's considered idle so it won't unnecessarily consume the limited 
> number of frontend service threads. cc'ing [~zoram]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8685) Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES

2019-06-21 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869757#comment-16869757
 ] 

Michael Ho edited comment on IMPALA-8685 at 6/21/19 7:26 PM:
-

Yes, we can definitely see that in the profile if one were to look at the bytes 
scanned in the scan node instances and we can treat the query option as a 
workaround instead of penalizing the cache effectiveness by default. Also, the 
metrics {{impala-server.io-mgr.bytes-read}} can definitely show IO skew among 
executors.

I guess what I am trying to understand is: in a workload in which all scan 
ranges are remote, will the hash ring provide enough randomness / fairness that 
the skew is not an issue? Would there be unexpected side effects if we mix 
remote / local scan ranges, and can they be avoided? Just thinking out loud 
here, but I should probably go look at the code more :-).


was (Author: kwho):
Yes, we can definitely see that in the profile if one were to look at the bytes 
scanned in the scan node instances and we can treat the query option as a 
workaround instead of penalizing the cache effectiveness by default.

I guess what I am trying to understand is in a workload in which all scan 
ranges are remote, will the hash ring provide enough randomness / fairness that 
the skew is not an issue ? Would there be unexpected side effect if we mix 
remote / local scan ranges and can they be avoided ? Just thinking out aloud 
here but I should probably go look at the code more :-).

> Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
> 
>
> Key: IMPALA-8685
> URL: https://issues.apache.org/jira/browse/IMPALA-8685
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Critical
>
> The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
> This means that there are potentially 3 different executors which can process 
> a remote scan range. Over time, the data of a given remote scan range will be 
> spread across these 3 executors. My understanding of why this is not set to 1 
> is to avoid hot spots in pathological cases. On the other hand, this may mean 
> that we may not maximize the utilization of the file handle cache and data 
> cache. Also, for small clusters (e.g. a 3 node cluster), the default value 
> may render deterministic remote scan range scheduling ineffective. We may 
> want to re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. 
> One possible idea is to set it to min(3, half of cluster size) so it works 
> okay with small cluster, which may be rather common for demo purposes. 
> However, it doesn't address the problem of cache effectiveness in larger 
> clusters as the footprint of the cache is still amplified by 
> {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. There may also be other criteria for 
> evaluating the default value.
> cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8685) Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES

2019-06-21 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869757#comment-16869757
 ] 

Michael Ho edited comment on IMPALA-8685 at 6/21/19 6:05 PM:
-

Yes, we can definitely see that in the profile if one were to look at the bytes 
scanned in the scan node instances and we can treat the query option as a 
workaround instead of penalizing the cache effectiveness by default.

I guess what I am trying to understand is in a workload in which all scan 
ranges are remote, will the hash ring provide enough randomness / fairness that 
the skew is not an issue ? Would there be unexpected side effect if we mix 
remote / local scan ranges and can they be avoided ? Just thinking out aloud 
here but I should probably go look at the code more :-).


was (Author: kwho):
Yes, we can definitely see that in the profile and we can treat the knob as a 
workaround instead of penalizing the cache effectiveness by default.

I guess what I am trying to understand is in a workload in which all scan 
ranges are remote, will the hash ring provide enough randomness / fairness that 
the skew is not an issue ? Would there be unexpected side effect if we mix 
remote / local scan ranges and can they be avoided ? Just thinking out aloud 
here but I should probably go look at the code more :-).

> Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
> 
>
> Key: IMPALA-8685
> URL: https://issues.apache.org/jira/browse/IMPALA-8685
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Critical
>
> The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
> This means that there are potentially 3 different executors which can process 
> a remote scan range. Over time, the data of a given remote scan range will be 
> spread across these 3 executors. My understanding of why this is not set to 1 
> is to avoid hot spots in pathological cases. On the other hand, this may mean 
> that we may not maximize the utilization of the file handle cache and data 
> cache. Also, for small clusters (e.g. a 3 node cluster), the default value 
> may render deterministic remote scan range scheduling ineffective. We may 
> want to re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. 
> One possible idea is to set it to min(3, half of cluster size) so it works 
> okay with small cluster, which may be rather common for demo purposes. 
> However, it doesn't address the problem of cache effectiveness in larger 
> clusters as the footprint of the cache is still amplified by 
> {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. There may also be other criteria for 
> evaluating the default value.
> cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8685) Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES

2019-06-21 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869757#comment-16869757
 ] 

Michael Ho commented on IMPALA-8685:


Yes, we can definitely see that in the profile and we can treat the knob as a 
workaround instead of penalizing the cache effectiveness by default.

I guess what I am trying to understand is in a workload in which all scan 
ranges are remote, will the hash ring provide enough randomness / fairness that 
the skew is not an issue ? Would there be unexpected side effect if we mix 
remote / local scan ranges and can they be avoided ? Just thinking out aloud 
here but I should probably go look at the code more :-).

> Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
> 
>
> Key: IMPALA-8685
> URL: https://issues.apache.org/jira/browse/IMPALA-8685
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Critical
>
> The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
> This means that there are potentially 3 different executors which can process 
> a remote scan range. Over time, the data of a given remote scan range will be 
> spread across these 3 executors. My understanding of why this is not set to 1 
> is to avoid hot spots in pathological cases. On the other hand, this may mean 
> that we may not maximize the utilization of the file handle cache and data 
> cache. Also, for small clusters (e.g. a 3 node cluster), the default value 
> may render deterministic remote scan range scheduling ineffective. We may 
> want to re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. 
> One possible idea is to set it to min(3, half of cluster size) so it works 
> okay with small cluster, which may be rather common for demo purposes. 
> However, it doesn't address the problem of cache effectiveness in larger 
> clusters as the footprint of the cache is still amplified by 
> {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. There may also be other criteria for 
> evaluating the default value.
> cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8685) Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES

2019-06-21 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8685:
---
Description: 
The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
This means that there are potentially 3 different executors which can process a 
remote scan range. Over time, the data of a given remote scan range will be 
spread across these 3 executors. My understanding of why this is not set to 1 
is to avoid hot spots in pathological cases. On the other hand, this may mean 
that we may not maximize the utilization of the file handle cache and data 
cache. Also, for small clusters (e.g. a 3 node cluster), the default value may 
render deterministic remote scan range scheduling ineffective. We may want to 
re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. One 
possible idea is to set it to min(3, half of cluster size) so it works okay 
with small clusters, which may be rather common for demo purposes. However, it 
doesn't address the problem of cache effectiveness in larger clusters as the 
footprint of the cache is still amplified by 
{{NUM_REMOTE_EXECUTOR_CANDIDATES}}. There may also be other criteria for 
evaluating the default value.

cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]


  was:
The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
This means that there are potentially 3 different executors which can process a 
remote scan range. Over time, the data of a given remote scan range will be 
spread across these 3 executors. My understanding of why this is not set to 1 
is to avoid hot spots in pathological cases. On the other hand, this may mean 
that we may not maximize the utilization of the file handle cache and data 
cache. Also, for small clusters (e.g. a 3 node cluster), the default value may 
render deterministic remote scan range scheduling ineffective. We may want to 
re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. One idea 
is to set it to min(3, half of cluster size) so it works okay with small 
cluster, which may be rather common for demo purposes. There may also be other 
criteria for evaluating the default value.

cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]



> Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
> 
>
> Key: IMPALA-8685
> URL: https://issues.apache.org/jira/browse/IMPALA-8685
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Critical
>
> The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
> This means that there are potentially 3 different executors which can process 
> a remote scan range. Over time, the data of a given remote scan range will be 
> spread across these 3 executors. My understanding of why this is not set to 1 
> is to avoid hot spots in pathological cases. On the other hand, this may mean 
> that we may not maximize the utilization of the file handle cache and data 
> cache. Also, for small clusters (e.g. a 3 node cluster), the default value 
> may render deterministic remote scan range scheduling ineffective. We may 
> want to re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. 
> One possible idea is to set it to min(3, half of cluster size) so it works 
> okay with small cluster, which may be rather common for demo purposes. 
> However, it doesn't address the problem of cache effectiveness in larger 
> clusters as the footprint of the cache is still amplified by 
> {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. There may also be other criteria for 
> evaluating the default value.
> cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8685) Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES

2019-06-21 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-8685:
--

Assignee: Joe McDonnell

> Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
> 
>
> Key: IMPALA-8685
> URL: https://issues.apache.org/jira/browse/IMPALA-8685
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Assignee: Joe McDonnell
>Priority: Critical
>
> The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
> This means that there are potentially 3 different executors which can process 
> a remote scan range. Over time, the data of a given remote scan range will be 
> spread across these 3 executors. My understanding of why this is not set to 1 
> is to avoid hot spots in pathological cases. On the other hand, this may mean 
> that we may not maximize the utilization of the file handle cache and data 
> cache. Also, for small clusters (e.g. a 3 node cluster), the default value 
> may render deterministic remote scan range scheduling ineffective. We may 
> want to re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. 
> One idea is to set it to min(3, half of cluster size) so it works okay with 
> small cluster, which may be rather common for demo purposes. There may also 
> be other criteria for evaluating the default value.
> cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8685) Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES

2019-06-21 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869711#comment-16869711
 ] 

Michael Ho commented on IMPALA-8685:


Yes, I agree that setting NUM_REMOTE_EXECUTOR_CANDIDATES to anything other than 
1 will definitely reduce the effectiveness of the cache in general. I believe I 
need to understand the issue of skew and scheduling in general better and see 
if there are workarounds other than setting it to 3.

> Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
> 
>
> Key: IMPALA-8685
> URL: https://issues.apache.org/jira/browse/IMPALA-8685
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Michael Ho
>Priority: Critical
>
> The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
> This means that there are potentially 3 different executors which can process 
> a remote scan range. Over time, the data of a given remote scan range will be 
> spread across these 3 executors. My understanding of why this is not set to 1 
> is to avoid hot spots in pathological cases. On the other hand, this may mean 
> that we may not maximize the utilization of the file handle cache and data 
> cache. Also, for small clusters (e.g. a 3 node cluster), the default value 
> may render deterministic remote scan range scheduling ineffective. We may 
> want to re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. 
> One idea is to set it to min(3, half of cluster size) so it works okay with 
> small cluster, which may be rather common for demo purposes. There may also 
> be other criteria for evaluating the default value.
> cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8691) Query hint for disabling data caching

2019-06-20 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8691:
--

 Summary: Query hint for disabling data caching
 Key: IMPALA-8691
 URL: https://issues.apache.org/jira/browse/IMPALA-8691
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Michael Ho


IMPALA-8690 tracks the effort for a better eviction algorithm for the Impala's 
data cache. As a short term workaround, it would be nice to allow users to 
explicitly set certain tables as not cacheable via query hints or simply 
disable caching for a query via query options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8690) Better eviction algorithm for data cache

2019-06-20 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8690:
--

 Summary: Better eviction algorithm for data cache
 Key: IMPALA-8690
 URL: https://issues.apache.org/jira/browse/IMPALA-8690
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Michael Ho


With the current implementation of the data cache, all data accesses are cached 
regardless of the access pattern. The current LRU eviction algorithm is not 
resistant to scan traffic, so if a user scans a big fact table, a lot of 
the heavily accessed items will inevitably be evicted. We should adopt a better 
eviction algorithm (e.g. LRFU or other well-known algorithms from the literature). 
It would be nice to evaluate it against some users' traces now that IMPALA-8542 is 
fixed.

In the short run, we probably need some workaround (e.g. query hints to disable 
caching for certain tables). Will file a separate jira for it.
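
To make "resistant to scan traffic" concrete, below is a minimal sketch of one scan-resistant policy, segmented LRU. It is illustrative only: it is neither LRFU nor the data cache's actual implementation, and the class and capacities are assumptions.

{code}
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Toy segmented LRU: a newly cached key enters a probationary segment; only
// keys that are hit again are promoted to the protected segment, so a one-off
// scan of a big fact table only churns the probationary segment and cannot
// evict the frequently reused entries.
class ToySegmentedLru {
 public:
  ToySegmentedLru(size_t probation_cap, size_t protected_cap)
      : probation_cap_(probation_cap), protected_cap_(protected_cap) {}

  // Returns true on a hit and promotes the key to the protected segment.
  bool Lookup(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    (it->second.in_probation ? probation_ : protected_).erase(it->second.it);
    protected_.push_front(key);
    it->second = {false, protected_.begin()};
    TrimProtected();
    return true;
  }

  // Inserts a missed key into the probationary segment.
  void Insert(const std::string& key) {
    if (index_.count(key) > 0) return;
    probation_.push_front(key);
    index_[key] = {true, probation_.begin()};
    TrimProbation();
  }

 private:
  struct Loc {
    bool in_probation;
    std::list<std::string>::iterator it;
  };

  void TrimProbation() {
    if (probation_.size() <= probation_cap_) return;
    index_.erase(probation_.back());   // evict the probationary LRU victim
    probation_.pop_back();
  }

  // Demote the protected LRU victim back to probation instead of dropping it.
  void TrimProtected() {
    if (protected_.size() <= protected_cap_) return;
    std::string victim = protected_.back();
    protected_.pop_back();
    probation_.push_front(victim);
    index_[victim] = {true, probation_.begin()};
    TrimProbation();
  }

  size_t probation_cap_;
  size_t protected_cap_;
  std::list<std::string> probation_;   // MRU at front
  std::list<std::string> protected_;   // MRU at front
  std::unordered_map<std::string, Loc> index_;
};
{code}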



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8685) Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES

2019-06-19 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8685:
--

 Summary: Evaluate default configuration of 
NUM_REMOTE_EXECUTOR_CANDIDATES
 Key: IMPALA-8685
 URL: https://issues.apache.org/jira/browse/IMPALA-8685
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Michael Ho


The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} is set to 3 by default. 
This means that there are potentially 3 different executors which can process a 
remote scan range. Over time, the data of a given remote scan range will be 
spread across these 3 executors. My understanding of why this is not set to 1 
is to avoid hot spots in pathological cases. On the other hand, this may mean 
that we may not maximize the utilization of the file handle cache and data 
cache. Also, for small clusters (e.g. a 3 node cluster), the default value may 
render deterministic remote scan range scheduling ineffective. We may want to 
re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. One idea 
is to set it to min(3, half of cluster size) so it works okay with small 
clusters, which may be rather common for demo purposes. There may also be other 
criteria for evaluating the default value.

cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8656) Support for eagerly fetching and spooling all query result rows

2019-06-11 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8656:
--

 Summary: Support for eagerly fetching and spooling all query 
result rows
 Key: IMPALA-8656
 URL: https://issues.apache.org/jira/browse/IMPALA-8656
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.2.0, Impala 2.12.0
Reporter: Michael Ho


Impala's current interaction with clients is pull-based: it relies on clients 
to fetch results to trigger the generation of more result row batches until all 
the result rows have been produced. If a client issues a query without fetching 
all the results, the query fragments will continue to consume resources 
until the query is cancelled and unregistered, for whatever reason. This 
is undesirable as resources are held up by misbehaving clients, and other 
queries may wait for an extended period of time in admission control as a result.

The high-level idea for this JIRA is for Impala to have a mode in which result 
sets of queries are eagerly fetched and spooled somewhere (preferably some 
persistent storage). In this way, the cluster's resources are freed up once all 
result rows have been fetched and stored in the spooling location. Incoming 
client fetches can then be served from this spooling location.

cc'ing [~stakiar], [~twm378], [~joemcdonnell], [~lv]
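
A minimal sketch of the spooling idea follows. It models an in-memory spool only; the real feature would bound memory and spill to persistent storage, and none of the names below are Impala's actual classes.

{code}
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

using RowBatch = std::vector<std::string>;  // placeholder for a real row batch

// The producer (the query's root fragment) eagerly pushes every result batch
// here and can then release its execution resources; client fetches are
// served from the spool afterwards, at whatever pace the client chooses.
class ResultSpool {
 public:
  void Add(RowBatch batch) {
    std::lock_guard<std::mutex> l(mu_);
    batches_.push_back(std::move(batch));
    cv_.notify_one();
  }

  void MarkEos() {
    std::lock_guard<std::mutex> l(mu_);
    eos_ = true;
    cv_.notify_all();
  }

  // Blocks until a batch is available or production is done;
  // returns false at end of stream.
  bool Fetch(RowBatch* out) {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return !batches_.empty() || eos_; });
    if (batches_.empty()) return false;
    *out = std::move(batches_.front());
    batches_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<RowBatch> batches_;
  bool eos_ = false;
};
{code}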



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8634) Catalog client should be resilient to temporary Catalog outage

2019-06-06 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8634:
--

 Summary: Catalog client should be resilient to temporary Catalog 
outage
 Key: IMPALA-8634
 URL: https://issues.apache.org/jira/browse/IMPALA-8634
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Affects Versions: Impala 3.2.0
Reporter: Michael Ho


Currently, when the catalog server is down, catalog clients will fail all RPCs 
sent to it. In essence, DDL queries will fail and the Impala service becomes a 
lot less functional. Catalog clients should consider retrying failed RPCs with 
some exponential backoff in between while the catalog server is being restarted 
after a crash. We probably need to add [a test 
|https://github.com/apache/impala/blob/master/tests/custom_cluster/test_restart_services.py]
 to exercise the catalog restart path and verify that coordinators are resilient 
to it.

cc'ing [~stakiar], [~joemcdonnell], [~twm378]
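
For illustration, the retry-with-exponential-backoff loop described above could look roughly like this. It is a generic sketch, not the catalog client's code; the Status stand-in and the limits are assumptions.

{code}
#include <algorithm>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Minimal stand-in for a status type.
struct Status {
  bool ok = true;
  std::string msg;
};

// Retries 'rpc' up to 'max_attempts' times, sleeping with capped exponential
// backoff between attempts, so a coordinator rides out a catalogd restart
// instead of failing the DDL immediately.
Status RetryRpcWithBackoff(
    const std::function<Status()>& rpc, int max_attempts = 10,
    std::chrono::milliseconds initial_backoff = std::chrono::milliseconds(100),
    std::chrono::milliseconds max_backoff = std::chrono::milliseconds(10000)) {
  Status last;
  std::chrono::milliseconds backoff = initial_backoff;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    last = rpc();
    if (last.ok) return last;
    std::this_thread::sleep_for(backoff);
    backoff = std::min(backoff * 2, max_backoff);
  }
  return last;  // all attempts failed; surface the last error to the query
}
{code}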



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8562) Data cache should skip scan range with mtime == -1

2019-06-06 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-8562.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Data cache should skip scan range with mtime == -1
> --
>
> Key: IMPALA-8562
> URL: https://issues.apache.org/jira/browse/IMPALA-8562
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
> Fix For: Impala 3.3.0
>
>
> As shown in IMPALA-8561, using mtime == -1 as part of the cache key may lead to 
> reading stale data. Data cache should probably just skip caching those 
> entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6788) Abort ExecFInstance() RPC loop early after query failure

2019-06-06 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-6788:
--

Assignee: (was: Dan Hecht)

> Abort ExecFInstance() RPC loop early after query failure
> 
>
> Key: IMPALA-6788
> URL: https://issues.apache.org/jira/browse/IMPALA-6788
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: krpc, rpc
> Attachments: connect_thread_busy_queries_failing.txt, 
> impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip
>
>
> Logs from a large cluster show that query startup can take a long time; then, 
> once the startup completes, the query is cancelled because one of the 
> intermediate RPCs failed. 
> It's not clear what the right answer is, as fragments are started 
> asynchronously; possibly a timeout?
> {code}
> I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() 
> query_id=334cc7dd9758c36c:ec38aeb4 stmt=with customer_total_return as
> I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 
> backends for query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 
> backends for query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() 
> query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() 
> query_id=334cc7dd9758c36c:ec38aeb4, tried to cancel 643 backends
> I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control 
> resources for query_id=334cc7dd9758c36c:ec38aeb4
> {code}
> {code}
> I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() 
> query_id=e44d553b04d47cfb:28f06bb8 stmt=with customer_total_return as
> I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 
> backends for query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 
> backends for query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() 
> query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() 
> query_id=e44d553b04d47cfb:28f06bb8, tried to cancel 639 backends
> I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control 
> resources for query_id=e44d553b04d47cfb:28f06bb8
> {code}
> Checked the coordinator and threads appear to be spending lots of time 
> waiting on exec_complete_barrier_
> {code}
> #0  0x7fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x01222944 in impala::Promise::Get() ()
> #2  0x01220d7b in impala::Coordinator::StartBackendExec() ()
> #3  0x01221c87 in impala::Coordinator::Exec() ()
> #4  0x00c3a925 in 
> impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest 
> const&) ()
> #5  0x00c41f7e in 
> impala::ClientRequestState::Exec(impala::TExecRequest*) ()
> #6  0x00bff597 in 
> impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, 
> std::shared_ptr, bool*, 
> std::shared_ptr*) ()
> #7  0x00c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, 
> std::shared_ptr, 
> std::shared_ptr*) ()
> #8  0x00c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, 
> beeswax::Query const&) ()
> /StartBackendExec
> #11 0x00d60c9a in boost::detail::thread_data void (*)(std::string const&, std::string const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > >::run() ()
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-5119) Don't make RPCs from Coordinator::UpdateBackendExecStatus()

2019-06-06 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-5119:
--

Assignee: (was: Dan Hecht)

> Don't make RPCs from Coordinator::UpdateBackendExecStatus()
> ---
>
> Key: IMPALA-5119
> URL: https://issues.apache.org/jira/browse/IMPALA-5119
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.9.0
>Reporter: Henry Robinson
>Priority: Major
>
> If it reports a bad status, {{UpdateFragmentExecStatus()}} will call 
> {{UpdateStatus()}}, which takes {{Coordinator::lock_}} and then calls 
> {{Cancel()}}. That method issues one RPC per fragment instance.
> In KRPC, doing so much work from {{UpdateFragmentExecStatus()}} - which is an 
> RPC handler - is a bad idea, even if the RPCs are issued asynchronously. 
> There's still some serialization cost.
> It's also a bad idea to do all this work while holding {{lock_}}. We should 
> address both of these to ensure scalability of the cancellation path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6788) Abort ExecFInstance() RPC loop early after query failure

2019-06-06 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-6788:
---
Summary: Abort ExecFInstance() RPC loop early after query failure  (was: 
Query fragments can spend lots of time starting up then fail right after 
"starting" all backends)

> Abort ExecFInstance() RPC loop early after query failure
> 
>
> Key: IMPALA-6788
> URL: https://issues.apache.org/jira/browse/IMPALA-6788
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Assignee: Dan Hecht
>Priority: Major
>  Labels: krpc, rpc
> Attachments: connect_thread_busy_queries_failing.txt, 
> impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip
>
>
> Logs from a large cluster show that query startup can take a long time; then, 
> once the startup completes, the query is cancelled because one of the 
> intermediate RPCs failed. 
> It's not clear what the right answer is, as fragments are started 
> asynchronously; possibly a timeout?
> {code}
> I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() 
> query_id=334cc7dd9758c36c:ec38aeb4 stmt=with customer_total_return as
> I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 
> backends for query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 
> backends for query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() 
> query_id=334cc7dd9758c36c:ec38aeb4
> I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() 
> query_id=334cc7dd9758c36c:ec38aeb4, tried to cancel 643 backends
> I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control 
> resources for query_id=334cc7dd9758c36c:ec38aeb4
> {code}
> {code}
> I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() 
> query_id=e44d553b04d47cfb:28f06bb8 stmt=with customer_total_return as
> I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 
> backends for query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 
> backends for query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() 
> query_id=e44d553b04d47cfb:28f06bb8
> I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() 
> query_id=e44d553b04d47cfb:28f06bb8, tried to cancel 639 backends
> I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control 
> resources for query_id=e44d553b04d47cfb:28f06bb8
> {code}
> Checked the coordinator and threads appear to be spending lots of time 
> waiting on exec_complete_barrier_
> {code}
> #0  0x7fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x01222944 in impala::Promise::Get() ()
> #2  0x01220d7b in impala::Coordinator::StartBackendExec() ()
> #3  0x01221c87 in impala::Coordinator::Exec() ()
> #4  0x00c3a925 in 
> impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest 
> const&) ()
> #5  0x00c41f7e in 
> impala::ClientRequestState::Exec(impala::TExecRequest*) ()
> #6  0x00bff597 in 
> impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, 
> std::shared_ptr, bool*, 
> std::shared_ptr*) ()
> #7  0x00c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, 
> std::shared_ptr, 
> std::shared_ptr*) ()
> #8  0x00c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, 
> beeswax::Query const&) ()
> /StartBackendExec
> #11 0x00d60c9a in boost::detail::thread_data void (*)(std::string const&, std::string const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > >::run() ()
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8562) Data cache should scan range with mtime == -1

2019-05-16 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8562:
---
Summary: Data cache should scan range with mtime == -1  (was: Data cache 
should skip file with mtime == -1)

> Data cache should scan range with mtime == -1
> -
>
> Key: IMPALA-8562
> URL: https://issues.apache.org/jira/browse/IMPALA-8562
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
>
> As shown in IMPALA-8561, using mtime == -1 as part of the cache key may lead to 
> reading stale data. Data cache should probably just skip caching those 
> entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8562) Data cache should skip scan range with mtime == -1

2019-05-16 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8562:
---
Summary: Data cache should skip scan range with mtime == -1  (was: Data 
cache should scan range with mtime == -1)

> Data cache should skip scan range with mtime == -1
> --
>
> Key: IMPALA-8562
> URL: https://issues.apache.org/jira/browse/IMPALA-8562
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
>
> As shown in IMPALA-8561, using mtime == -1 as part of the cache key may lead to 
> reading stale data. Data cache should probably just skip caching those 
> entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8562) Data cache should skip file with mtime == -1

2019-05-16 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8562:
--

 Summary: Data cache should skip file with mtime == -1
 Key: IMPALA-8562
 URL: https://issues.apache.org/jira/browse/IMPALA-8562
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Michael Ho
Assignee: Michael Ho


As shown in IMPALA-8561, using mtime == -1 as part of the cache key may lead to 
reading stale data. Data cache should probably just skip caching those entries.
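
A sketch of the proposed guard follows. It is illustrative only; the struct below is not the data cache's real key layout.

{code}
#include <cstdint>
#include <string>

// The cache key identifies a byte range of one version of a file; mtime is
// what distinguishes versions. With mtime == -1 that distinction is lost, so
// such ranges are simply not cached.
struct DataCacheKey {
  std::string filename;
  int64_t mtime = -1;   // -1 means "modification time unknown"
  int64_t offset = 0;
};

bool ShouldCache(const DataCacheKey& key) {
  return key.mtime != -1;  // skip caching when the file version is unknown
}
{code}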



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8561) ScanRanges with mtime=-1 can lead to inconsistent reads when using the file handle cache

2019-05-16 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841805#comment-16841805
 ] 

Michael Ho commented on IMPALA-8561:


Nice debugging! This will probably affect the data cache too, as it relies on 
mtime as part of its cache key.

> ScanRanges with mtime=-1 can lead to inconsistent reads when using the file 
> handle cache
> 
>
> Key: IMPALA-8561
> URL: https://issues.apache.org/jira/browse/IMPALA-8561
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Blocker
>
> The file handle cache relies on the mtime to distinguish between different 
> versions of a file. For example, if file X exists with mtime=1, then it is 
> overwritten and the metadata is updated so that now it is at mtime=2, the 
> file handle cache treats them as completely different things and can never 
> use a single file handle to serve both. However, some codepaths generate 
> ScanRanges with an mtime of -1. This removes the ability to distinguish these 
> two versions of a file and can lead to consistency problems.
> A specific example is the code that reads the parquet footer 
> [HdfsParquetScanner::ProcessFooter()|https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1354].
>  We don't know ahead of time how big the Parquet footer is. So, we read 100KB 
> (determined by 
> [FOOTER_SIZE|https://github.com/apache/impala/blob/449fe73d2145bd22f0f857623c3652a097f06d73/be/src/exec/hdfs-scanner.h#L331]).
>  If the footer size encoded in the last few bytes of the file indicates that 
> the footer is larger than that [code 
> here|https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1414],
>  then we issue a separate read for the actual size of the footer. That 
> separate read does not inherit the mtime of the original read and instead 
> uses an mtime of -1. I verified this by adding tracing and issuing a select 
> against functional_parquet.widetable_1000_cols.
> A failure scenario associated with this is that we read the last 100KB using 
> a ScanRange with mtime=2, then we find that the footer is larger than 100KB 
> and issue a ScanRange with mtime=-1. This uses a file handle that is from a 
> previous version of the file equivalent to mtime=1. The data it is reading 
> may not come from the end of the file, or it may be at the end of the file 
> but the footer has a different length. (There is no validation on the new 
> read to check the magic value or metadata size reported by the new buffer.) 
> Either would result in a failure to deserialize the thrift for the footer. 
> For example, a problem case could produce an error message like:
>  
> {noformat}
> File hdfs://test-warehouse/example_file.parq of length 1048576 bytes has 
> invalid file metadata at file offset 462017. Error = couldn't deserialize 
> thrift msg:
> TProtocolException: Invalid data
> .{noformat}
> To fix this, we should examine all locations that can result in ScanRanges 
> with mtime=-1 and eliminate any that we can. For example, the 
> HdfsParquetScanner::ProcessFooter() code should create a ScanRange that 
> inherits the mtime from the original footer ScanRange. Also, the file handle 
> cache should refuse to cache file handles with mtime=-1.
> The code in HdfsParquetScanner::ProcessFooter() should add validation for the 
> magic value and metadata size when reading a footer larger than 100KB to 
> verify that we are reading something valid. The thrift deserialize failure 
> gives some information, but catching this case more specifically would 
> provide a better error message.
>  
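
To make the suggested fix concrete, a rough sketch of the extra validation follows. It is a hypothetical helper, not the actual scanner code; the second, larger footer read would also inherit the original ScanRange's mtime, which is the other half of the fix.

{code}
#include <cstdint>
#include <cstring>
#include <string>

// After re-reading a footer larger than the initial guess, check the trailing
// magic and the encoded metadata size before deserializing, so a stale file
// handle produces a clear error instead of a generic thrift failure.
bool ValidateParquetFooter(const uint8_t* buf, int64_t len,
                           int64_t expected_metadata_len, std::string* err) {
  static const char kMagic[4] = {'P', 'A', 'R', '1'};
  if (len < 8 || std::memcmp(buf + len - 4, kMagic, 4) != 0) {
    *err = "missing PAR1 magic at end of footer buffer";
    return false;
  }
  uint32_t metadata_len;
  std::memcpy(&metadata_len, buf + len - 8, 4);  // little-endian layout assumed
  if (metadata_len != expected_metadata_len) {
    *err = "footer metadata size changed between reads (stale file handle?)";
    return false;
  }
  return true;
}
{code}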



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6159) DataStreamSender should transparently handle some connection reset by peer

2019-05-15 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-6159:
--

Assignee: Michael Ho

> DataStreamSender should transparently handle some connection reset by peer
> --
>
> Key: IMPALA-6159
> URL: https://issues.apache.org/jira/browse/IMPALA-6159
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Critical
>
> A client to server KRPC connection can become stale if the socket was closed 
> on the server side due to various reasons such as idle connection removal or 
> remote Impalad restart. Currently, the KRPC code will invoke the callback of 
> all RPCs using that stale connection with the failed status (e.g. "Connection 
> reset by peer"). DataStreamSender should pattern match against certain error 
> strings (as they are mostly output from strerror()) and retry the RPC 
> transparently. This may also be useful for KUDU-2192, which tracks the 
> effort to detect stuck connections and close them; in that case, we may also 
> want to transparently retry the RPC.
> FWIW, KUDU-279 is tracking the effort to have a cleaner protocol for 
> connection teardown due to idle client connection removal on the server side. 
> However, Impala still needs to handle other reasons for a stale connection.
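
A sketch of the kind of error-string matching described above follows. It is illustrative; the list of retryable patterns is an assumption, not an exhaustive or authoritative set.

{code}
#include <string>
#include <vector>

// Treat a failed RPC as transparently retryable only if the error text looks
// like a stale-connection symptom (these mostly come from strerror()).
bool IsRetryableSendError(const std::string& error_msg) {
  static const std::vector<std::string> kRetryablePatterns = {
      "Connection reset by peer",
      "Broken pipe",
  };
  for (const std::string& pattern : kRetryablePatterns) {
    if (error_msg.find(pattern) != std::string::npos) return true;
  }
  return false;
}
{code}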



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7802) Implement support for closing idle sessions

2019-05-15 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-7802:
--

Assignee: Michael Ho

> Implement support for closing idle sessions
> ---
>
> Key: IMPALA-7802
> URL: https://issues.apache.org/jira/browse/IMPALA-7802
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Critical
>  Labels: supportability
> Attachments: image-2019-04-12-11-14-42-938.png
>
>
> Currently, the query option {{idle_session_timeout}} specifies a timeout in 
> seconds after which all running queries of that idle session will be 
> cancelled and no new queries can be issued to it. However, the idle session 
> will remain open and it needs to be closed explicitly. Please see the 
> [documentation|https://www.cloudera.com/documentation/enterprise/latest/topics/impala_idle_session_timeout.html]
>  for details.
> This behavior may be undesirable as each session still consumes an Impala 
> frontend service thread. The number of frontend service threads is bound by 
> the flag {{fe_service_threads}}. So, in a multi-tenant environment, an Impala 
> server can have a lot of idle sessions but they still consume against the 
> quota of {{fe_service_threads}}. If the number of sessions established 
> reaches {{fe_service_threads}}, all new session creations will block until 
> some of the existing sessions exit. There may be no time bound on when these 
> zombie idle sessions will be closed and it's at the mercy of the client 
> implementation to close them. In some sense, leaving many idle sessions open 
> is a way to launch a denial of service attack on Impala.
> To fix this situation, we should have an option to forcefully close a session 
> when it's considered idle so it won't unnecessarily consume the limited 
> number of frontend service threads. cc'ing [~zoram]
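
A toy sketch of the forced-close behaviour being requested follows (a periodic reaper; the session type and locking are assumptions, not Impala's session management code).

{code}
#include <chrono>
#include <memory>
#include <mutex>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Session {
  Clock::time_point last_activity;
  bool closed = false;
};

// Called periodically by a maintenance thread: any session idle longer than
// 'idle_timeout' is closed outright, releasing its frontend service thread
// instead of leaving a zombie session open indefinitely.
void CloseExpiredSessions(std::vector<std::shared_ptr<Session>>* sessions,
                          std::mutex* lock, Clock::duration idle_timeout) {
  std::lock_guard<std::mutex> l(*lock);
  Clock::time_point now = Clock::now();
  for (auto& s : *sessions) {
    if (!s->closed && now - s->last_activity > idle_timeout) {
      s->closed = true;  // a real implementation would also tear down state
    }
  }
}
{code}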



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8512) Data cache tests failing on older CentOS 6 versions

2019-05-13 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-8512.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Data cache tests failing on older CentOS 6 versions
> ---
>
> Key: IMPALA-8512
> URL: https://issues.apache.org/jira/browse/IMPALA-8512
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Michael Ho
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.3.0
>
>
> They are failing with errors like:
> {noformat}
> Error: Data dir /tmp/data-cache-test.0 is on an ext4 filesystem vulnerable to 
> KUDU-1508.
> {noformat}
> {noformat}
> custom_cluster.test_data_cache.TestDataCache.test_data_cache_deterministic[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none]
> DataCacheTest.TestBasics
> DataCacheTest.RotateFiles
> DataCacheTest.RotateAndDeleteFiles
> DataCacheTest.Eviction
> DataCacheTest.MultiThreadedNoMisses
> DataCacheTest.MultiThreadedWithMisses
> DataCacheTest.MultiPartitions
> DataCacheTest.LargeFootprint
> FilesystemUtil.DirEntryTypes
> custom_cluster.test_data_cache.TestDataCache.test_data_cache[protocol: 
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none]
> {noformat}
> Can we disable these tests on affected systems?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6048) Queries make very slow progress and report WaitForRPC() stuck for too long

2019-05-13 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-6048.

   Resolution: Duplicate
Fix Version/s: Not Applicable

So, as explained, this could be triggered by heavy spilling under high 
concurrency, where some nodes are slow to consume row batches, leading to long RPC 
wait times. This seems to be very similar to what's being tracked in IMPALA-6294.

> Queries make very slow progress and report  WaitForRPC() stuck for too long
> ---
>
> Key: IMPALA-6048
> URL: https://issues.apache.org/jira/browse/IMPALA-6048
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Distributed Exec
>Affects Versions: Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>Priority: Critical
> Fix For: Not Applicable
>
> Attachments: Archive 2.zip
>
>
> When running 32 concurrent queries from TPC-DS, a couple of instances of 
> Q78 took 9 hours to finish and appeared to be hung.
> On an idle cluster the query finished in under 5 minutes; profiles attached. 
> When the query ran long, fragments reported 16+ hours of network 
> send/receive time.
> The logs show a lot of messages like the one below, including incidents 
> where a node waited too long for an RPC from 
> itself:
> {code}
> W1012 00:47:57.633549 117475 krpc-data-stream-sender.cc:360] XXX: 
> WaitForRPC() stuck for too long address=10.17.234.37:29000 
> fragment_instace_id_=1e48ef897e797131:2f05789b05eb dest_node_id_=24 
> sender_id_=81
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8260) Improve TotalEosReceived counter in profile

2019-05-13 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8260:
---
Labels: observability profile ramp-up  (was: observability profile)

> Improve TotalEosReceived counter in profile
> ---
>
> Key: IMPALA-8260
> URL: https://issues.apache.org/jira/browse/IMPALA-8260
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Tim Armstrong
>Assignee: Lars Volker
>Priority: Major
>  Labels: observability, profile, ramp-up
>
> Feedback from [~jeszyb] I got a while back on the profiles. TotalEosReceived 
> could be made a lot more useful, since it could let you figure out, for a 
> stuck query, how many senders it's still waiting on. It's not obvious that it 
> corresponds to the other sender metrics. Maybe TotalSendersDone. It would also 
> be useful to have TotalSenders so you can see how many are left. Or 
> actually, you could just have TotalSendersRemaining, which would answer that 
> directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8006) Consider replacing the modulus in KrpcDatastreamSender with fast mod

2019-05-13 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-8006:
---
Labels: ramp-up  (was: )

> Consider replacing the modulus in KrpcDatastreamSender with fast mod 
> -
>
> Key: IMPALA-8006
> URL: https://issues.apache.org/jira/browse/IMPALA-8006
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.1.0
>Reporter: Michael Ho
>Priority: Major
>  Labels: ramp-up
>
> [~tlipcon] pointed out that there is a potential improvement which can be 
> implemented for the modulus used in our sender for the partitioning 
> exchanges: 
> http://www.idryman.org/blog/2017/05/03/writing-a-damn-fast-hash-table-with-tiny-memory-footprints/
>  (Optimizing Division for Hash Table Size).
> We should evaluate its effectiveness and implement it for 
> KrpcDataStreamSender if appropriate.
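
For reference, one of the tricks discussed in that post replaces an integer modulus with a multiply-and-shift range reduction; a hedged sketch follows (not KrpcDataStreamSender code).

{code}
#include <cstdint>

// "Fast range" reduction: maps a 32-bit hash to [0, n) with one multiply and
// one shift instead of an integer division. The mapping is proportional but
// uses the high bits of the hash rather than the low bits that '%' uses, so
// the hash function must mix its high bits well.
inline uint32_t FastMapToRange(uint32_t hash, uint32_t n) {
  return static_cast<uint32_t>((static_cast<uint64_t>(hash) * n) >> 32);
}
{code}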



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



  1   2   3   4   5   6   >