[jira] [Work started] (IMPALA-8761) Configuration validation introduced in IMPALA-8559 can be improved

2019-09-16 Thread Anurag Mantripragada (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8761 started by Anurag Mantripragada.

> Configuration validation introduced in IMPALA-8559 can be improved
> --
>
> Key: IMPALA-8761
> URL: https://issues.apache.org/jira/browse/IMPALA-8761
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
>
> The issue with configuration validation in IMPALA-8559 is that it validates 
> one configuration at a time and fails as soon as there is a validation error. 
> Since there is more than one configuration key to validate, the user may have 
> to restart HMS again and again if multiple configuration changes are needed. 
> This is not a great user experience. A simple improvement is to run all the 
> configuration validations together and present the results together in case 
> of failures, so that the user can make all the required changes in one go.
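
A minimal sketch of that approach, assuming a hypothetical ValidateAllConfigs() helper 
(illustrative C++ rather than Impala's actual validation code; the configuration keys 
shown are only examples): run every check, collect the failures, and report them all 
at once so a single HMS restart suffices.
{code}
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Each check returns an error message, or an empty string on success.
std::vector<std::string> ValidateAllConfigs(
    const std::vector<std::function<std::string()>>& checks) {
  std::vector<std::string> errors;
  for (const auto& check : checks) {
    std::string err = check();
    // Keep validating instead of failing on the first error.
    if (!err.empty()) errors.push_back(err);
  }
  return errors;
}

int main() {
  std::vector<std::function<std::string()>> checks = {
      [] { return std::string("example.config.key.one must be set to true"); },
      [] { return std::string(); },  // this check passes
      [] { return std::string("example.config.key.two must be set to true"); },
  };
  std::vector<std::string> errors = ValidateAllConfigs(checks);
  if (!errors.empty()) {
    std::cerr << "Invalid configurations (fix all of them before restarting HMS):\n";
    for (const auto& e : errors) std::cerr << "  - " << e << "\n";
    return 1;
  }
  return 0;
}
{code}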



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8761) Configuration validation introduced in IMPALA-8559 can be improved

2019-09-16 Thread Anurag Mantripragada (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Mantripragada reassigned IMPALA-8761:


Assignee: Anurag Mantripragada

> Configuration validation introduced in IMPALA-8559 can be improved
> --
>
> Key: IMPALA-8761
> URL: https://issues.apache.org/jira/browse/IMPALA-8761
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Anurag Mantripragada
>Priority: Major
>
> The issue with configuration validation in IMPALA-8559 is that it validates 
> one configuration at a time and fails as soon as there is a validation error. 
> Since there is more than one configuration key to validate, the user may have 
> to restart HMS again and again if multiple configuration changes are needed. 
> This is not a great user experience. A simple improvement is to run all the 
> configuration validations together and present the results together in case 
> of failures, so that the user can make all the required changes in one go.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8587) Show inherited privileges in show grant w/ Ranger

2019-09-16 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930910#comment-16930910
 ] 

Fang-Yu Rao edited comment on IMPALA-8587 at 9/17/19 3:28 AM:
--

After testing the proposed patch, I found that even if we log in to impalad via 
the Impala shell as a non-Ranger super user, that user's statement could still 
succeed. For example, if we log in to impalad as a user using
{code:java}
./bin/impala-shell.sh -u random_user;
{code}
the following SQL statement could still succeed.
{code:java}
show grant user admin on database functional;
{code}
This seems like a bug, since a user that does not correspond to a Ranger super 
user should not be able to execute this SQL statement successfully.


was (Author: fangyurao):
After testing the proposed patch, I found that even we log in to impalad via 
Impala shell as a non-Ranger super user, the execution of that SQL user could 
still succeed. For example, if we log in to impalad as a user using
{code:java}
./bin/impala-shell.sh -u random_user;
{code}
The SQL statement in the following could still succeed.
{code:java}
show grant user admin on database functional;
{code}
This seems like a bug since a user that does not correspond to a Ranger super 
user should not be able to execute this SQL statement successfully.

> Show inherited privileges in show grant w/ Ranger
> -
>
> Key: IMPALA-8587
> URL: https://issues.apache.org/jira/browse/IMPALA-8587
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Austin Nobis
>Assignee: Fang-Yu Rao
>Priority: Critical
>
> If an admin has privileges from:
> *grant all on server to user admin;*
>  
> Currently the command below will show no results:
> *show grant user admin on database functional;*
>  
> After the change, the user should see server level privileges from:
> *show grant user admin on database functional;*
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-3160) Queries may not get cancelled if cancellation pool hits MAX_CANCELLATION_QUEUE_SIZE

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3160:
-

Assignee: Thomas Tauber-Marshall

> Queries may not get cancelled if cancellation pool hits 
> MAX_CANCELLATION_QUEUE_SIZE
> ---
>
> Key: IMPALA-3160
> URL: https://issues.apache.org/jira/browse/IMPALA-3160
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.5.0
>Reporter: Sailesh Mukil
>Assignee: Thomas Tauber-Marshall
>Priority: Minor
>  Labels: correctness, downgraded
>
> The ImpalaServer::MembershipCallback() function determines whether any backends 
> are down based on the topic updates from the statestore. It also cancels all 
> queries that are already in flight on those failed backends, by comparing the 
> failed backends from the topic update against the query_locations_ map, which 
> maps each backend to the queries running on it.
> If the cancellation queue is too large (bounded by MAX_CANCELLATION_QUEUE_SIZE), 
> we do not cancel the queries, hoping that by the next heartbeat the cancellation 
> queue has freed up so we can retry the cancellation of these queries.
> However, by that point we have already removed the failed backend from the 
> query_locations_ map, so the next heartbeat will never find this backend to 
> cancel the queries running on it.
> {code:java}
> // Maps from query id (to be cancelled) to a list of failed Impalads that are
> // the cause of the cancellation.
> map<TUniqueId, vector<TNetworkAddress>> queries_to_cancel; // : LOCAL MAP
> {
>   // Build a list of queries that are running on failed hosts (as evidenced by
>   // their absence from the membership list).
>   // TODO: crash-restart failures can give false negatives for failed Impala daemons.
>   lock_guard<mutex> l(query_locations_lock_);
>   QueryLocations::const_iterator loc_entry = query_locations_.begin();
>   while (loc_entry != query_locations_.end()) {
>     if (current_membership.find(loc_entry->first) == current_membership.end()) {
>       unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin();
>       // Add failed backend locations to all queries that ran on that backend.
>       for (; query_id != loc_entry->second.end(); ++query_id) {
>         vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
>         failed_hosts.push_back(loc_entry->first);
>       }
>       exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first);
>       // We can remove the location wholesale once we know the backend has failed.
>       // To do so safely during iteration, we have to be careful not to invalidate
>       // the current iterator, so copy the iterator to do the erase(..) and advance
>       // the original.
>       QueryLocations::const_iterator failed_backend = loc_entry;
>       ++loc_entry;
>       // : WE ERASE THE ENTRY FROM THE GLOBAL MAP HERE.
>       query_locations_.erase(failed_backend);
>     } else {
>       ++loc_entry;
>     }
>   }
> }
> if (cancellation_thread_pool_->GetQueueSize() + queries_to_cancel.size() >
>     MAX_CANCELLATION_QUEUE_SIZE) {
>   // Ignore the cancellations - we'll be able to process them on the next
>   // heartbeat instead.
>   LOG_EVERY_N(WARNING, 60) << "Cancellation queue is full";
>   // : WE DON'T CANCEL HERE, AND BY THE NEXT HEARTBEAT WE WON'T FIND
>   // THE FAILED BACKEND AGAIN.
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-3160) Queries may not get cancelled if cancellation pool hits MAX_CANCELLATION_QUEUE_SIZE

2019-09-16 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930990#comment-16930990
 ] 

Tim Armstrong commented on IMPALA-3160:
---

[~twmarshall] is this still an issue?

> Queries may not get cancelled if cancellation pool hits 
> MAX_CANCELLATION_QUEUE_SIZE
> ---
>
> Key: IMPALA-3160
> URL: https://issues.apache.org/jira/browse/IMPALA-3160
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.5.0
>Reporter: Sailesh Mukil
>Assignee: Thomas Tauber-Marshall
>Priority: Minor
>  Labels: correctness, downgraded
>
> The ImpalaServer::MembershipCallback() function determines whether any backends 
> are down based on the topic updates from the statestore. It also cancels all 
> queries that are already in flight on those failed backends, by comparing the 
> failed backends from the topic update against the query_locations_ map, which 
> maps each backend to the queries running on it.
> If the cancellation queue is too large (bounded by MAX_CANCELLATION_QUEUE_SIZE), 
> we do not cancel the queries, hoping that by the next heartbeat the cancellation 
> queue has freed up so we can retry the cancellation of these queries.
> However, by that point we have already removed the failed backend from the 
> query_locations_ map, so the next heartbeat will never find this backend to 
> cancel the queries running on it.
> {code:java}
> // Maps from query id (to be cancelled) to a list of failed Impalads that are
> // the cause of the cancellation.
> map<TUniqueId, vector<TNetworkAddress>> queries_to_cancel; // : LOCAL MAP
> {
>   // Build a list of queries that are running on failed hosts (as evidenced by
>   // their absence from the membership list).
>   // TODO: crash-restart failures can give false negatives for failed Impala daemons.
>   lock_guard<mutex> l(query_locations_lock_);
>   QueryLocations::const_iterator loc_entry = query_locations_.begin();
>   while (loc_entry != query_locations_.end()) {
>     if (current_membership.find(loc_entry->first) == current_membership.end()) {
>       unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin();
>       // Add failed backend locations to all queries that ran on that backend.
>       for (; query_id != loc_entry->second.end(); ++query_id) {
>         vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
>         failed_hosts.push_back(loc_entry->first);
>       }
>       exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first);
>       // We can remove the location wholesale once we know the backend has failed.
>       // To do so safely during iteration, we have to be careful not to invalidate
>       // the current iterator, so copy the iterator to do the erase(..) and advance
>       // the original.
>       QueryLocations::const_iterator failed_backend = loc_entry;
>       ++loc_entry;
>       // : WE ERASE THE ENTRY FROM THE GLOBAL MAP HERE.
>       query_locations_.erase(failed_backend);
>     } else {
>       ++loc_entry;
>     }
>   }
> }
> if (cancellation_thread_pool_->GetQueueSize() + queries_to_cancel.size() >
>     MAX_CANCELLATION_QUEUE_SIZE) {
>   // Ignore the cancellations - we'll be able to process them on the next
>   // heartbeat instead.
>   LOG_EVERY_N(WARNING, 60) << "Cancellation queue is full";
>   // : WE DON'T CANCEL HERE, AND BY THE NEXT HEARTBEAT WE WON'T FIND
>   // THE FAILED BACKEND AGAIN.
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-3171) data-source-tables.test is flaky when BATCH_SIZE is changed

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3171.
---
Resolution: Later

> data-source-tables.test is flaky when BATCH_SIZE is changed
> ---
>
> Key: IMPALA-3171
> URL: https://issues.apache.org/jira/browse/IMPALA-3171
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Affects Versions: impala 2.3
>Reporter: Juan Yu
>Priority: Minor
>
> data-source-tables.test is flaky and could return different results when 
> BATCH_SIZE is changed:
> {code}
> [localhost:21000] > select count(*) from alltypes_datasource;
> Query: select count(*) from alltypes_datasource
> +--+
> | count(*) |
> +--+
> | 4510 |
> +--+
> Fetched 1 row(s) in 0.40s
> [localhost:21000] > set batch_size=12345;
> BATCH_SIZE set to 12345
> [localhost:21000] > select count(*) from alltypes_datasource;
> Query: select count(*) from alltypes_datasource
> +--+
> | count(*) |
> +--+
> | 5000 |
> +--+
> Fetched 1 row(s) in 0.40s
> [localhost:21000] > set batch_size=1;
> BATCH_SIZE set to 1
> [localhost:21000] > select count(*) from alltypes_datasource;
> Query: select count(*) from alltypes_datasource
> +--+
> | count(*) |
> +--+
> | 4501 |
> +--+
> Fetched 1 row(s) in 0.40s
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-2983) Optimize passthrough preaggregations

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2983.
---
Resolution: Later

> Optimize passthrough preaggregations
> 
>
> Key: IMPALA-2983
> URL: https://issues.apache.org/jira/browse/IMPALA-2983
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: performance
>
> The initial patch for IMPALA-1305 is fairly conservative and leaves a lot of 
> room for improvement. There were some ideas that were shelved because they 
> could cause perf regressions if not carefully implemented.
> * Tune the threshold values better. This is a little tricky since it depends 
> on the cost of exchange, which depends on the cluster properties.
> * Evict some or all partitions from memory to reduce memory overhead and 
> avoid the cost of hash table lookups. The memory reduction is more useful 
> here since the merge agg's hash table inserts will almost certainly be slower 
> than the preagg's hash table lookups.
> * Periodically evict hash table entries to keep the hash tables below a 
> certain threshold



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-2910) create nested types perf microbenchmarks

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2910.
---
Resolution: Later

> create nested types perf microbenchmarks
> 
>
> Key: IMPALA-2910
> URL: https://issues.apache.org/jira/browse/IMPALA-2910
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.3.0
>Reporter: Silvius Rus
>Priority: Minor
>
> Please extend the perf microbenchmarks to cover performance specific to 
> queries on nested data.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-2579) Limiting the number of records to be fetched/returned by default

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2579.
---
Resolution: Won't Fix

> Limiting the number of records to be fetched/returned by default
> 
>
> Key: IMPALA-2579
> URL: https://issues.apache.org/jira/browse/IMPALA-2579
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Eric Lin
>Priority: Minor
>
> It would be nice to have a feature to limit the result set returned to the 
> client side, whether that is impala-shell, Tableau, or another client.
> This can be achieved by setting a default limit on the following:
> 1) number of rows
> 2) data size
> This is particularly useful when dealing with tables that have millions or 
> billions of records (which is most cases): users often forget to manually 
> LIMIT the result, and the query will either crash the client software or 
> have to be killed manually.
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-2579) Limiting the number of records to be fetched/returned by default

2019-09-16 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930989#comment-16930989
 ] 

Tim Armstrong commented on IMPALA-2579:
---

IMPALA-8096 added the option for a limit.

I don't think we want to have an implicit limit - clients should do that if 
they want it, but we're not really in the business of truncating result sets.

> Limiting the number of records to be fetched/returned by default
> 
>
> Key: IMPALA-2579
> URL: https://issues.apache.org/jira/browse/IMPALA-2579
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Eric Lin
>Priority: Minor
>
> It would be nice to have a feature to limit the result set returned to the 
> client side, whether that is impala-shell, Tableau, or another client.
> This can be achieved by setting a default limit on the following:
> 1) number of rows
> 2) data size
> This is particularly useful when dealing with tables that have millions or 
> billions of records (which is most cases): users often forget to manually 
> LIMIT the result, and the query will either crash the client software or 
> have to be killed manually.
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-2312) Timing bug in both MonotonicStopWatch and StopWatch

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2312:
-

Assignee: Tim Armstrong  (was: Henry Robinson)

> Timing bug in both MonotonicStopWatch and StopWatch
> ---
>
> Key: IMPALA-2312
> URL: https://issues.apache.org/jira/browse/IMPALA-2312
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Henry Robinson
>Assignee: Tim Armstrong
>Priority: Minor
>
> Both {{MonotonicStopWatch}} and {{StopWatch}} underestimate the total time if 
> the stopwatch is running while {{ElapsedTime()}} is called. For example:
> {code}
> uint64_t ElapsedTime() const {
>   if (!running_) return total_time_;
>   timespec end;
>   clock_gettime(CLOCK_MONOTONIC, &end);
>   // Should include total_time_, but does not
>   return (end.tv_sec - start_.tv_sec) * 1000L * 1000L * 1000L +
>       (end.tv_nsec - start_.tv_nsec);
> }
> {code}
> The effect is that we could have:
> {code}
> MonotonicStopWatch sw;
> sw.Start();
> sw.Stop();
> uint64_t total = sw.ElapsedTime();
> sw.Start();
> // With the bug, this could fail.
> ASSERT_GE(sw.ElapsedTime(), total);
> {code}
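
A minimal sketch of the fix the code comment above points at (illustrative only, not 
the actual Impala patch): when the stopwatch is still running, the elapsed time of the 
current interval should be added to the previously accumulated total_time_.
{code}
uint64_t ElapsedTime() const {
  if (!running_) return total_time_;
  timespec end;
  clock_gettime(CLOCK_MONOTONIC, &end);
  // Include the time accumulated by earlier Start()/Stop() cycles in addition
  // to the currently running interval.
  return total_time_ +
      (end.tv_sec - start_.tv_sec) * 1000L * 1000L * 1000L +
      (end.tv_nsec - start_.tv_nsec);
}
{code}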



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8947) SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8947.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric
> -
>
> Key: IMPALA-8947
> URL: https://issues.apache.org/jira/browse/IMPALA-8947
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: supportability
> Fix For: Impala 3.4.0
>
>
> {noformat}
> ERROR: Could not create files in any configured scratch directories 
> (--scratch_dirs=/path/to/scratch) on backend ':22000'. 69.80 GB of 
> scratch is currently in use by this Impala Daemon (69.80 GB by this query). 
> See logs for previous errors that may have prevented creating or writing 
> scratch files. The following directories were at capacity: /path/to/scratch
> {noformat}
> The issue is that the total for the Impala daemon uses the wrong counter.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-2391) Add Hive to the performance framework.

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2391.
---
Resolution: Later

> Add Hive to the performance framework.
> --
>
> Key: IMPALA-2391
> URL: https://issues.apache.org/jira/browse/IMPALA-2391
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.3.0
>Reporter: Ishaan Joshi
>Priority: Minor
>  Labels: test-infra
>
> Currently, the Impala performance framework does not support Hive. With the 
> ongoing work to add impyla (and therefore HS2) as an interface to run 
> queries, we should support running Hive queries.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-2361) Using AVX intrinsic to accelerate the sort operation

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2361.
---
Resolution: Later

> Using AVX intrinsic to accelerate the sort operation
> 
>
> Key: IMPALA-2361
> URL: https://issues.apache.org/jira/browse/IMPALA-2361
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Youwei Wang
>Priority: Minor
>  Labels: performance
>
> Using AVX intrinsic to accelerate the sort operation



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-2214) Add some parquet-related testing

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2214.
---
Resolution: Later

> Add some parquet-related testing
> 
>
> Key: IMPALA-2214
> URL: https://issues.apache.org/jira/browse/IMPALA-2214
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.2
>Reporter: Ippokratis Pandis
>Priority: Minor
>  Labels: test, test-infra
>
> We need to add testing for:
> (A) Reading a file that is shorter than we think, because of stale 
> metadata 
> (B) Reading a file that is longer than we think, because of stale 
> metadata (IMPALA-2213)
> (C) Having a very low --read_size (IMPALA-1291)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-2311) Clean up duplicated deep copy code

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2311.
---
Resolution: Won't Fix

> Clean up duplicated deep copy code
> --
>
> Key: IMPALA-2311
> URL: https://issues.apache.org/jira/browse/IMPALA-2311
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Tim Armstrong
>Priority: Minor
>
> There are multiple implementations of deep copying in the codebase that 
> duplicate much of the same logic - in RowBatch (Serialize), Tuple (2x) and 
> BufferedTupleStream. There are some differences in how they read and write 
> data, but the main difference is how they allocate memory - it seems like 
> this could be factored out in some way so that the core deep copy logic can 
> be implemented only once (probably as a templated function).
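
A rough sketch of the factoring the description suggests, with hypothetical names 
(this is not Impala's actual RowBatch/Tuple/BufferedTupleStream API): the copy loop 
is written once and the memory-allocation policy is supplied as a template parameter.
{code}
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical allocation policy; real callers would wrap a MemPool, a row batch's
// tuple data buffer, or a BufferedTupleStream page instead.
struct HeapAllocator {
  uint8_t* Allocate(size_t len) {
    storage_.emplace_back(len);
    return storage_.back().data();
  }
  std::vector<std::vector<uint8_t>> storage_;
};

// The core deep-copy logic is implemented only once and parameterized on how
// destination memory is obtained.
template <typename Allocator>
uint8_t* DeepCopy(const uint8_t* src, size_t len, Allocator* alloc) {
  uint8_t* dst = alloc->Allocate(len);
  std::memcpy(dst, src, len);
  return dst;
}

int main() {
  HeapAllocator alloc;
  const uint8_t row[] = {1, 2, 3, 4};
  uint8_t* copy = DeepCopy(row, sizeof(row), &alloc);
  return copy != nullptr ? 0 : 1;
}
{code}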



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-2119) Clean up destructors and the fragment/query Close() path

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2119.
---
Resolution: Fixed

I think this cleanup has largely been completed.

> Clean up destructors and the fragment/query Close() path
> 
>
> Key: IMPALA-2119
> URL: https://issues.apache.org/jira/browse/IMPALA-2119
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: Matthew Jacobs
>Assignee: Marcel Kornacker
>Priority: Minor
>  Labels: query-lifecycle
>
> We need to make sure that there is a sane Close() path for queries and 
> fragments, i.e. we shouldn't be relying on destructors or shared/scoped_ptrs. 
> The worst offenders are the coordinator and fragment management classes, e.g. 
> PlanFragmentExecutor, FragmentExecState, QueryExecState, Coordinator. At the 
> same time, we should make sure that all ExecNodes, sinks, and other backend 
> classes used for query execution keep all cleanup logic in Close() methods. 
> In some cases, Close() methods will need to be added.
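
A minimal sketch of the Close() idiom the description argues for, using a hypothetical 
node class (illustrative, not Impala's actual ExecNode interface): resources are 
released in an explicit, idempotent Close() call rather than in the destructor.
{code}
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

class ExampleNode {
 public:
  void Open() { buffer_ = std::make_unique<std::vector<uint8_t>>(1024); }

  // All cleanup lives here; safe to call more than once.
  void Close() {
    if (closed_) return;
    buffer_.reset();  // release resources deterministically
    closed_ = true;
  }

  // The destructor only verifies that the explicit cleanup path was used.
  ~ExampleNode() { assert(closed_ && "Close() must be called before destruction"); }

 private:
  std::unique_ptr<std::vector<uint8_t>> buffer_;
  bool closed_ = false;
};

int main() {
  ExampleNode node;
  node.Open();
  node.Close();  // explicit query/fragment teardown, not reliant on the destructor
  return 0;
}
{code}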



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-2027) Create separate impala-shell packages

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2027.
---
Resolution: Duplicate

> Create separate impala-shell packages
> -
>
> Key: IMPALA-2027
> URL: https://issues.apache.org/jira/browse/IMPALA-2027
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: Impala 2.1.1, Impala 2.2
>Reporter: Jeff Hammerbacher
>Priority: Minor
>  Labels: shell, usability
>
> It would be wonderful if a separate {{impala-shell}} package were made 
> available so that users could install the Impala shell on their laptops using 
> their favorite package manager (e.g. Homebrew on Mac).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-2003) Implement "beeswax_field_delimiter" query option to choose different delimiters for beeswax instead of default "\t"

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2003.
---
Resolution: Won't Fix

Impala shell now has HS2 support (IMPALA-7290), which provides a better 
solution for this delimiter problem.

> Implement "beeswax_field_delimiter" query option to choose different 
> delimiters for beeswax instead of default "\t"
> ---
>
> Key: IMPALA-2003
> URL: https://issues.apache.org/jira/browse/IMPALA-2003
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 2.2
>Reporter: Mala Chikka Kempanna
>Priority: Minor
>  Labels: newbie, sql-language
>
> Please implement "beeswax_field_delimiter" query option to choose different 
> delimiters for beeswax instead of default "\t"



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1678) Reconsider use of memory buffer in TSaslTransport

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1678.
---
Resolution: Later

Probably less relevant after the KRPC work

> Reconsider use of memory buffer in TSaslTransport
> -
>
> Key: IMPALA-1678
> URL: https://issues.apache.org/jira/browse/IMPALA-1678
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Perf Investigation
>Affects Versions: Impala 2.1
>Reporter: Henry Robinson
>Priority: Minor
>  Labels: performance
>
> {{TSaslTransport}} uses a {{TMemoryBuffer}} to stage bytes that have not been 
> read. {{TMemoryBuffer}} doubles its capacity when it needs to expand. This 
> might not be the best policy - we should consider using a {{deque}} or some 
> other efficient queue implementation.
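
A small sketch of the alternative mentioned above (illustrative only, not Thrift's 
TMemoryBuffer or TSaslTransport API): staging unread bytes in a deque-backed queue so 
that consumed bytes are released from the front instead of the whole buffer repeatedly 
doubling its capacity.
{code}
#include <algorithm>
#include <cstdint>
#include <deque>

// Hypothetical staging buffer: incoming bytes are appended at the back and
// consumed from the front, so memory tracks the unread bytes only.
class DequeReadBuffer {
 public:
  void Append(const uint8_t* data, size_t len) {
    buf_.insert(buf_.end(), data, data + len);
  }

  // Copies up to 'len' bytes into 'out' and drops them from the queue.
  size_t Read(uint8_t* out, size_t len) {
    size_t n = std::min(len, buf_.size());
    std::copy(buf_.begin(), buf_.begin() + n, out);
    buf_.erase(buf_.begin(), buf_.begin() + n);
    return n;
  }

  size_t Available() const { return buf_.size(); }

 private:
  std::deque<uint8_t> buf_;
};
{code}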



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1581) Shell should fetch results in a separate thread

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1581.
---
Resolution: Later

> Shell should fetch results in a separate thread
> ---
>
> Key: IMPALA-1581
> URL: https://issues.apache.org/jira/browse/IMPALA-1581
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 2.0.1
>Reporter: casey
>Priority: Minor
>  Labels: shell
>
> For queries with large result sets, the client can add significant time to 
> the overall execution time as seen by the end user. For example see 
> IMPALA-1580. Fetching results in a separate thread should significantly 
> reduce the end user wait time.
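
A bare-bones sketch of the producer/consumer split the description suggests 
(illustrative C++ rather than the Python shell code, with a hypothetical FetchBatch() 
standing in for the fetch RPC): one thread keeps fetching result batches while the 
main thread formats and prints them.
{code}
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a fetch RPC; returns std::nullopt when exhausted.
std::optional<std::vector<std::string>> FetchBatch(int& remaining) {
  if (remaining == 0) return std::nullopt;
  --remaining;
  return std::vector<std::string>{"row"};
}

int main() {
  std::queue<std::vector<std::string>> batches;
  std::mutex mu;
  std::condition_variable cv;
  bool done = false;

  // Fetch thread: pulls the next batch while the main thread renders the last one.
  std::thread fetcher([&] {
    int remaining = 3;
    while (auto batch = FetchBatch(remaining)) {
      { std::lock_guard<std::mutex> l(mu); batches.push(*batch); }
      cv.notify_one();
    }
    { std::lock_guard<std::mutex> l(mu); done = true; }
    cv.notify_one();
  });

  // Main thread: display rows as they become available.
  while (true) {
    std::unique_lock<std::mutex> l(mu);
    cv.wait(l, [&] { return !batches.empty() || done; });
    if (batches.empty() && done) break;
    std::vector<std::string> batch = std::move(batches.front());
    batches.pop();
    l.unlock();
    for (const auto& row : batch) std::cout << row << "\n";
  }
  fetcher.join();
  return 0;
}
{code}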



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1580) Optimize conversion of row batch to query result set

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1580.
---
Resolution: Duplicate

> Optimize conversion of row batch to query result set
> 
>
> Key: IMPALA-1580
> URL: https://issues.apache.org/jira/browse/IMPALA-1580
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Perf Investigation
>Affects Versions: Impala 2.0.1
>Reporter: casey
>Priority: Minor
>  Labels: performance, ramp-up
> Attachments: select_lineitem.profile
>
>
> For simple queries that produce a large result set such as "select * from 
> tpch.lineitem" the server execution time is limited by the time required to 
> convert row batches (results in the internal structure) to query results (the 
> structure to be sent to the client). The data conversion is the limiting 
> factor in this case because the query plan execution happens in parallel.
> Here are some data points from the profile of "select * from tpch.lineitem" 
> using HS2 (this was taken using --exchg_node_buffer_size_bytes=2048576000 so 
> the exchange node would never block because of a full buffer). Beeswax takes 
> even longer to convert the rows.
> * Query Timeline: 1m9s
> * Execution Profile -- Total: 1s295ms
> * ClientFetchWaitTimer: 52s553ms
> * RowMaterializationTimer: 15s216ms
> * Coordinator Fragment F01:(Total: 1s092ms
> * Averaged Fragment F00:(Total: 5s608ms
> So the "RowMaterializationTimer", which is actually conversion time, adds ~9 
> seconds or ~2x the plan execution time to the overall time.
> Ideally the conversion time would be codegen'd but even without that there 
> should be a lot of room for improvement by reducing function calls.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1463) Improve HDFS Caching DDL performance for partitioned table

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1463.
---
Resolution: Later

Don't think there's much interest in this one.

> Improve HDFS Caching DDL performance for partitioned table
> --
>
> Key: IMPALA-1463
> URL: https://issues.apache.org/jira/browse/IMPALA-1463
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 1.4.1, Impala 2.0
>Reporter: Alan Choi
>Priority: Minor
>  Labels: performance
>
> Enabling HDFS caching for a partitioned table requires two steps. For each 
> partition, it has to set a cache directive in HDFS. Then, it issues an "alter 
> table" to the Hive Metastore. This is done in a single-threaded loop.
> Issuing the "alter table" to the Hive Metastore is very slow. In my experiment, 
> each call takes ~1 sec, so a 4k-partition table took 4k seconds.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1504) Allow non-delimited-text default file formats

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1504.
---
Resolution: Duplicate

> Allow non-delimited-text default file formats
> -
>
> Key: IMPALA-1504
> URL: https://issues.apache.org/jira/browse/IMPALA-1504
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.0
>Reporter: Jeremy Beard
>Priority: Minor
>  Labels: incompatibility, usability
>
> It would be helpful if tables could be created in a desired file format 
> (...Parquet) without specifying STORED AS X each time. This is especially 
> true for data analysts who are often repeatedly creating and dropping tables 
> in their work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1265) Allow embedding query options as hints.

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1265.
---
Resolution: Won't Fix

> Allow embedding query options as hints.
> ---
>
> Key: IMPALA-1265
> URL: https://issues.apache.org/jira/browse/IMPALA-1265
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.0
>Reporter: Alexander Behm
>Priority: Minor
>  Labels: planner, usability
>
> With SET we can change query options on a per-session basis, but it would be 
> nice to be able to set query options like hints (and by extension we could 
> create views with specific query options). Something like this:
> select /* +mem_limit=10g */ int_col ...



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-1171) create_testdata.sh called from both buildall.sh and load-test-warehouse-snapshot.sh

2019-09-16 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930946#comment-16930946
 ] 

Tim Armstrong commented on IMPALA-1171:
---

[~joemcdonnell] maybe not a bug, but I'd imagine you can triage this

> create_testdata.sh called from both buildall.sh and 
> load-test-warehouse-snapshot.sh
> ---
>
> Key: IMPALA-1171
> URL: https://issues.apache.org/jira/browse/IMPALA-1171
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.0
>Reporter: Dan Hecht
>Assignee: Joe McDonnell
>Priority: Minor
>
> buildall.sh calls bin/create_testdata.sh itself.  Then, if loading from a 
> snapshot, it calls:
> testdata/bin/create-load-data.sh which calls:
> testdata/bin/load-test-warehouse-snapshot.sh which calls:
> bin/create_testdata.sh again.
> There's no functional bug, but this may be an opportunity to reduce load time 
> and might indicate that the scripts could use some tidying up.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-1173) create-load-data.sh shouldn't try to do load-data.py --force when loading from a snapshot

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-1173:
-

Assignee: Joe McDonnell

> create-load-data.sh shouldn't try to do load-data.py --force when loading 
> from a snapshot
> -
>
> Key: IMPALA-1173
> URL: https://issues.apache.org/jira/browse/IMPALA-1173
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.0
>Reporter: Dan Hecht
>Assignee: Joe McDonnell
>Priority: Minor
>
> testdata/bin/create-load-data.sh first loads a snapshot.  Afterwards, it 
> checks to make sure the loaded schema matches that in git.  If it doesn't 
> match, it forces a reload through load-data.py.
> If the user supplied a snapshot file, then I think it would be better to fail 
> when the schema mismatch is detected rather than falling back to the 
> load_data.py --force path.  It seems more likely that the user would prefer 
> to download an updated snapshot to resolve the situation.
> This has burned me a couple of times now when I've downloaded snapshots in 
> the window between the schema update and when the new snapshot is ready.  
> Surprisingly (to me at least), the scripts went down the load_data.py --force 
> path, which led to another problem (which Lenni has since fixed). But it would 
> have been better if the script just told me that my snapshot is out of date 
> to begin with.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-1171) create_testdata.sh called from both buildall.sh and load-test-warehouse-snapshot.sh

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-1171:
-

Assignee: Joe McDonnell

> create_testdata.sh called from both buildall.sh and 
> load-test-warehouse-snapshot.sh
> ---
>
> Key: IMPALA-1171
> URL: https://issues.apache.org/jira/browse/IMPALA-1171
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.0
>Reporter: Dan Hecht
>Assignee: Joe McDonnell
>Priority: Minor
>
> buildall.sh calls bin/create_testdata.sh itself.  Then, if loading from a 
> snapshot, it calls:
> testdata/bin/create-load-data.sh which calls:
> testdata/bin/load-test-warehouse-snapshot.sh which calls:
> bin/create_testdata.sh again.
> There's no functional bug, but this may be an opportunity to reduce load time 
> and might indicate that the scripts could use some tidying up.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-1108) Impala should check the number of opened files/partition during insert

2019-09-16 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930944#comment-16930944
 ] 

Tim Armstrong commented on IMPALA-1108:
---

I think between the clustered inserts and IMPALA-8125 this would be covered.

> Impala should check the number of opened files/partition during insert
> --
>
> Key: IMPALA-1108
> URL: https://issues.apache.org/jira/browse/IMPALA-1108
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.4
>Reporter: Alan Choi
>Priority: Minor
>  Labels: ramp-up
>
> For insert, when Impala is inserting into a huge number of partitions, Impala 
> might be opening too many files. HDFS will return an error, but the error is 
> incomprehensible as "Error(12): Cannot allocate memory".
> We can do better to improve the error message. Here are two suggestions:
> 1. During planning, if there's stats, we know how many partitions are being 
> inserted per Impalad. Based on that, we can determine if we'll be opening too 
> many files. Either return an error or a warning message.
> 2. During query execution, keep track of the number of files opened for read 
> and write. If we're opening too many files for write, abort the query and 
> returns a proper error message.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-3929) multithreaded text encoding

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3929.
---
Resolution: Won't Fix

IMPALA-3902 is the threading model we should be going towards and will help 
parallelise inserts.

> multithreaded text encoding
> ---
>
> Key: IMPALA-3929
> URL: https://issues.apache.org/jira/browse/IMPALA-3929
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: Marcell Szabo
>Priority: Minor
>
> Often the bottleneck of the INSERT statement is the serialisation to text, 
> measured by EncodeTimer.
> Could we implement a producer-consumer model to allow serialisation happen in 
> multiple threads even if we write one file?
> Thank you



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1076) Add a shell option to limit the maximum number of rows that are pretty printed

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1076.
---
Resolution: Later

> Add a shell option to limit the maximum number of rows that are pretty printed
> --
>
> Key: IMPALA-1076
> URL: https://issues.apache.org/jira/browse/IMPALA-1076
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 1.3.1
>Reporter: Nong Li
>Priority: Minor
>  Labels: impala-shell
>
> I think people like to run a query that returns a large number of rows and 
> redirect the output as a simple benchmark. This results in a large amount of 
> time spent in pretty printing.
> We should add an option "max_pretty_printed_rows" or something similar; when we 
> hit that value, the shell should disable pretty printing for that query.
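
As a point of reference, the shell can already skip pretty printing entirely when 
the goal is just to drain rows (a sketch of the existing workaround; the query 
and output path are only examples):
{code}
impala-shell -B -q "select * from tpch.lineitem" > /dev/null
{code}
The requested option would keep pretty printing for small result sets and only 
switch it off once the row threshold is crossed.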



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1034) Verify RSS stays within mem_limit + JVM heapsize

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1034.
---
Resolution: Won't Do

When stress testing this is generally true. I don't think there's a 
particularly robust way to test this automatically.

> Verify RSS stays within mem_limit + JVM heapsize
> 
>
> Key: IMPALA-1034
> URL: https://issues.apache.org/jira/browse/IMPALA-1034
> Project: IMPALA
>  Issue Type: Test
>  Components: Backend
>Affects Versions: Impala 1.3.1
>Reporter: Alan Choi
>Priority: Major
>  Labels: resource-management
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-988) Join strategy (broadcast vs shuffle) decision does not take memory consumption and other joins into account

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-988:
-
Description: 
The amount of available memory changes the trade-off between partitioned and 
shuffle join strategies: if switching to shuffle join can avoid spilling to 
disk, it may be worth paying the cost of the additional network transfer.

There are two issues:
1. Join strategy decision only takes the query mem-limit into account but 
ignores the process mem-limit.
2. Join strategy decision does not take other joins of the same query into 
account. When multiple joins are present, memory consumption can be very high.

I ([~tarmstr...@cloudera.com]) don't think we should attempt to fix #1 - 
there's a phase ordering problem here - we currently choose the best-performing 
plan then decide how much memory to allocate in admission control based on that 
plan. We can't preserve that while attempting to change the plan to fit  the 
mem_limit. That said, I think the current heuristic is a little too aggressive 
about picking broadcast when the right side is very large - it should probably 
bias more towards shuffle as the right side gets larger.

Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
running to completion, but still affects performance.

  was:
The amount of available memory changes the trade-off between partitioned and 
shuffle join strategies: if switching to shuffle join can avoid spilling to 
disk, it may be worth paying the cost of the additional network transfer.

There are two issues:
1. Join strategy decision only takes query mem-limit into account but ignore 
process mem-limit.
2. Join strategy decision does not take other joins of the same query into 
account. When multiple joins are present, it'll go over the mem-limit.

Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
running to completion, but still affects performance.


> Join strategy (broadcast vs shuffle) decision does not take memory 
> consumption and other joins into account
> ---
>
> Key: IMPALA-988
> URL: https://issues.apache.org/jira/browse/IMPALA-988
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2.1
>Reporter: Alan Choi
>Priority: Minor
>  Labels: resource-management
>
> The amount of available memory changes the trade-off between partitioned and 
> shuffle join strategies: if switching to shuffle join can avoid spilling to 
> disk, it may be worth paying the cost of the additional network transfer.
> There are two issues:
> 1. Join strategy decision only takes query mem-limit into account but ignore 
> process mem-limit.
> 2. Join strategy decision does not take other joins of the same query into 
> account. When multiple joins are present, memory consumption can be very high.
> I ([~tarmstr...@cloudera.com]) don't think we should attempt to fix #1 - 
> there's a phase ordering problem here - we currently choose the 
> best-performing plan then decide how much memory to allocate in admission 
> control based on that plan. We can't preserve that while attempting to change 
> the plan to fit  the mem_limit. That said, I think the current heuristic is a 
> little too aggressive about picking broadcast when the right side is very 
> large - it should probably bias more towards shuffle as the right side gets 
> larger.
> Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
> running to completion, but still affects performance.
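
For anyone hitting this today, the planner's per-join choice can be overridden 
with the existing join hints (a sketch; the table and column names are 
placeholders):
{code:sql}
-- force a partitioned (shuffle) join instead of a broadcast join
SELECT * FROM big_fact f JOIN /* +shuffle */ big_dim d ON f.key = d.key;

-- or force the right-hand side to be broadcast
SELECT * FROM big_fact f JOIN /* +broadcast */ small_dim d ON f.key = d.key;
{code}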



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-988) Join strategy (broadcast vs shuffle) decision does not take memory consumption and other joins into account

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-988:
-
Summary: Join strategy (broadcast vs shuffle) decision does not take memory 
consumption and other joins into account  (was: Join strategy (broadcast vs 
shuffle) decision does not take mem limit and other joins into account)

> Join strategy (broadcast vs shuffle) decision does not take memory 
> consumption and other joins into account
> ---
>
> Key: IMPALA-988
> URL: https://issues.apache.org/jira/browse/IMPALA-988
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2.1
>Reporter: Alan Choi
>Priority: Minor
>  Labels: resource-management
>
> The amount of available memory changes the trade-off between partitioned and 
> shuffle join strategies: if switching to shuffle join can avoid spilling to 
> disk, it may be worth paying the cost of the additional network transfer.
> There are two issues:
> 1. Join strategy decision only takes query mem-limit into account but ignore 
> process mem-limit.
> 2. Join strategy decision does not take other joins of the same query into 
> account. When multiple joins are present, it'll go over the mem-limit.
> Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
> running to completion, but still affects performance.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-5734) Don't call fork() when the process VM size may be very large.

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5734.
---
Resolution: Fixed

> Don't call fork() when the process VM size may be very large.
> -
>
> Key: IMPALA-5734
> URL: https://issues.apache.org/jira/browse/IMPALA-5734
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Tim Armstrong
>Priority: Major
>
> We should avoid doing this because it can lead to OOMs with certain 
> vm.overcommit settings.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-975) Improve RunShellProcess() virtual memory behaviour

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-975.
--
Resolution: Won't Fix

Not so much of an issue now that we don't fork after startup - see IMPALA-5734

> Improve RunShellProcess() virtual memory behaviour
> --
>
> Key: IMPALA-975
> URL: https://issues.apache.org/jira/browse/IMPALA-975
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.3.1
>Reporter: Henry Robinson
>Priority: Minor
>
> Impala uses {{popen()}} to create a new process to call {{kinit}}. This may 
> not be wise (i.e. may return {{ENOMEM}}) when there's a lot of virtual memory 
> used by Impala.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-967) Investigate wide table performance

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-967.
--
Resolution: Won't Fix

This is kinda open ended. We fixed a bunch of perf issues in this area so I'll 
close for now.

> Investigate wide table performance
> --
>
> Key: IMPALA-967
> URL: https://issues.apache.org/jira/browse/IMPALA-967
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Perf Investigation
>Affects Versions: Impala 1.3
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>  Labels: codegen
>
> Querying wide tables (very roughly 1000+ columns) is very slow. It looks like 
> the time is spent in planning and/or codegen, and that the time increases 
> worse than linearly with the number of columns.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-714) Improve error messages for Analysis Exceptions

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-714.
--
Resolution: Won't Fix

> Improve error messages for Analysis Exceptions
> --
>
> Key: IMPALA-714
> URL: https://issues.apache.org/jira/browse/IMPALA-714
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2.1
>Reporter: Udai Kiran Potluri
>Priority: Minor
>
> For example, the exception below does not make it easy to understand the root 
> cause of why the metadata failed to load.
> {code}
> ERROR: AnalysisException: Failed to load metadata for table: default.abc
> CAUSED BY: TableLoadingException: Failed to load metadata for table: abc
> CAUSED BY: InvalidStorageDescriptorException: Unsupported SerDe:
> org.apache.hadoop.hive.contrib.serde2.RegexSerDe
> {code}
> It would be nice to bubble up the root cause, in this case "Unsupported SerDe: 
> org.apache.hadoop.hive.contrib.serde2.RegexSerDe", to the top of the shell 
> output, while still maintaining the whole trace in the logs.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-725) "errors" printed to the shell are sometimes warnings

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-725.
--
Resolution: Fixed

Fixed by IMPALA-5474

> "errors" printed to the shell are sometimes warnings
> 
>
> Key: IMPALA-725
> URL: https://issues.apache.org/jira/browse/IMPALA-725
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 1.2.3
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>
> We print runtime errors to the shell, prepended with "ERRORS ENCOUNTERED 
> DURING EXECUTION". This is potentially confusing when the message is actually 
> a warning (e.g. "Parquet file should not be split into multiple 
> hdfs-blocks"). It might be useful to separately log errors and warnings, or 
> at least to change the warning messages to indicate that the query still ran 
> successfully.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-708) optimize hdfs-table-sink output partition hashing

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-708.
--
Resolution: Won't Fix

See dan's last comment.

> optimize hdfs-table-sink output partition hashing
> -
>
> Key: IMPALA-708
> URL: https://issues.apache.org/jira/browse/IMPALA-708
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 1.0, Impala 1.2
>Reporter: Nong Li
>Priority: Minor
>  Labels: poc
>
> Looking at some basic profiling while doing an unpartitioned insert, it looks 
> like we have some very low hanging fruit:
>  226  16.2%  16.2%  226  16.2% 
> boost::unordered_detail::hash_table::find_iterator
>   <-- Need to track down where this is (we need better cluster tools) but 
> this seems like a big waste of time.
>  178  12.8%  29.0%  178  12.8% 
> impala::HdfsParquetTableWriter::AppendRowBatch
>  157  11.3%  40.3%  157  11.3% 
> impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9ff700
>  131   9.4%  49.7%  131   9.4% __strncmp_sse42
>  129   9.3%  59.0%  133   9.6% impala::TextConverter::WriteSlot
>  109   7.8%  66.9%  109   7.8% 
> impala::DelimitedTextParser::ParseFieldLocations
>   94   6.8%  73.6%   94   6.8% snappy::internal::CompressFragment
>   71   5.1%  78.7%   71   5.1% 
> impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9fca90
>   56   4.0%  82.7%   56   4.0% impala::HdfsScanner::WriteCompleteTuple
>   36   2.6%  85.3%   36   2.6% 
> impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9fd3f0
>   34   2.4%  87.8%   34   2.4% impala::HashUtil::Hash
>   34   2.4%  90.2%   34   2.4% 
> impala::StringParser::StringToIntInternal@801fd0



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-682) Improve statestore network performance with large topics

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-682.
--
Resolution: Won't Fix

With the local catalog improvements, the metadata topic only contains 
invalidations. So no need to invest in this.

> Improve statestore network performance with large topics
> 
>
> Key: IMPALA-682
> URL: https://issues.apache.org/jira/browse/IMPALA-682
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 1.2.2
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
>
> When the statestore has a large topic to transmit (e.g. over 100MB), a lot of 
> network bandwidth will be used. This is particularly acute at startup, when 
> many subscribers are competing for the complete version of a single large 
> topic.
> A lot of the statestore's content is textual, and therefore likely to be very 
> compressible. We can use Thrift's {{TZlibTransport}} to transparently 
> compress large topics, but the problem then is that we'll be doing a lot of 
> redundant work to compress the same topic many times. 
> Instead, maybe we will have to serialise the Topic's thrift structure to a 
> byte string (much as we do for topic values), and then compress it in a 
> single thread. We can do this repeatedly in the background as topics get 
> updated. There'll be some double-serialisation cost to pay, since presumably 
> serialising and deserialising Thrift structs in the application involves 
> another copy, but it should be worth it.
> We can also mitigate the startup problem by having subscribers wait for a lot 
> longer to get their first heartbeat after registration, since the first set 
> of topic updates is going to be large.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-685) Add method in FunctionContext to support loading binaries in HDFS

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-685.
--
Resolution: Later

Closing some JIRAs that have useful suggestions but limited interest. We can 
reopen if there is more interest.

> Add method in FunctionContext to support loading binaries in HDFS
> -
>
> Key: IMPALA-685
> URL: https://issues.apache.org/jira/browse/IMPALA-685
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 1.2, Impala 2.5.0
>Reporter: Nong Li
>Priority: Minor
>
> We need to add the ability to load additional .so's from the FunctionContext 
> object. There is a partner
> that needs to be able to load additional .so from a UDF. This would just be a 
> thin wrapper around our
> library cache.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-661) Explain plan should characterize the cost of evaluating predicates

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-661.
--
Resolution: Later

Closing some JIRAs that have useful suggestions but limited interest. We can 
reopen if there is more interest.

> Explain plan should characterize the cost of evaluating predicates
> --
>
> Key: IMPALA-661
> URL: https://issues.apache.org/jira/browse/IMPALA-661
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 1.1.1, Impala 2.5.0
>Reporter: Alan Choi
>Priority: Minor
>
> Explain plan is an effective tool for tuning queries. However, it doesn't give 
> much insight into the (CPU) cost of expression evaluation. For complex 
> predicates, this cost greatly affects the runtime of the query. For example, a 
> complex regex is very costly to evaluate.
> Right now, if a complex join predicate causes the join to slow down, it's not 
> very easy to tell.
> If explain plan can annotate the cost of predicate evaluation (roughly), then 
> it can guide our user to identify and tune the predicates.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-653) Add basic filtering to SHOW STATS command.

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-653.
--
Resolution: Later

Will close since there seems to be limited interest

> Add basic filtering to SHOW STATS command.
> --
>
> Key: IMPALA-653
> URL: https://issues.apache.org/jira/browse/IMPALA-653
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2
>Reporter: Alexander Behm
>Priority: Minor
>
> Enhance SHOW TABLE STATS with the capability to filter on partitions.
> Enhance SHOW COLUMN STATS with the capability to filter on columns.
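
To illustrate the request, the syntax might look roughly like this (purely 
hypothetical forms; neither exists today, and the table and column names are 
placeholders):
{code:sql}
-- hypothetical: restrict SHOW TABLE STATS to matching partitions
SHOW TABLE STATS store_sales PARTITION (year=2019, month=9);

-- hypothetical: restrict SHOW COLUMN STATS to the named columns
SHOW COLUMN STATS store_sales (ss_item_sk, ss_sold_date_sk);
{code}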



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-609) Impala should provide a more accurate error message when region server is unhealthy - fails with NoClassDefFoundError

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-609.
--
Resolution: Cannot Reproduce

> Impala should provide a more accurate error message when region server is 
> unhealthy - fails with NoClassDefFoundError
> -
>
> Key: IMPALA-609
> URL: https://issues.apache.org/jira/browse/IMPALA-609
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.1
> Environment: CentOS release 6.4 (Final)
> CDH: CDH 4.4.0-1.cdh4.4.0.p0.39
> JDK: java version "1.7.0_21"
>Reporter: Tony Xu
>Priority: Minor
>  Labels: impala
>
> When running a query in Impala (command line), "select name from product where 
> manufacturername = "Cat's Pride" limit 1;", Impala returns the error message: 
> "org.apache.hadoop.ipc.RemoteException(java.lang.NoClassDefFoundError): IPC 
> server unable to read call parameters: Could not initialize class 
> org.apache.hadoop.hbase.util.Classes"
> Note:
> 1. Basic "select * from table_name limit 1;" works in both command line and 
> Hue Impala UI. 
> 2. We used CM to install Impala, "All Services" -> "Actions" -> "Add a 
> Service" -> Then choose Impala for the initial installation then updated 
> through "Parcels" interface.
> 3. The same query works in hive. 
> Cause:
> One of the region servers couldn't talk to the HBase master; there are also 
> error messages in the HBase log file on the problematic node. The error message 
> I caught from the HBase log is:
> =
> WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for 
> client 10.6.70.3.
> =
> 10.6.70.3 is the Hbase master's IP address.
> Impala should have failed with a better error message than 
> "NoClassDefFoundError". 
> Impala log:
> http://paste.ubuntu.com/6179531/
> Hbase unhealthy canary report log:
> http://paste.ubuntu.com/6179569/



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-518) when a query skips bytes or encounters errors in HDFS files, include the details in the profile

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-518.
--
Resolution: Later

> when a query skips bytes or encounters errors in HDFS files, include the 
> details in the profile
> ---
>
> Key: IMPALA-518
> URL: https://issues.apache.org/jira/browse/IMPALA-518
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.0.1
>Reporter: Chris Leroy
>Priority: Minor
>  Labels: observability, supportability
>
> We'd like to be able to collect information on the actual files in HDFS that 
> contain the problematic data when a query hits such data. It's fine to stop 
> reporting once you hit some reporting limit, but it'd be nice to be able to 
> point to the specific problematic files.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-536) Verbose mode in the shell

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-536.
--
Resolution: Later

> Verbose mode in the shell
> -
>
> Key: IMPALA-536
> URL: https://issues.apache.org/jira/browse/IMPALA-536
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: Impala 1.1
>Reporter: John Russell
>Priority: Minor
>
> In normal usage, I appreciate the concise output of impala-shell compared to 
> the Hive shell. Sometimes during debugging, when I try operations in Hive, I 
> do find it convenient to see messages about processes, files, and URLs 
> printed directly in the Hive shell. Could we have a query option for 
> impala-shell to enable a 'verbose' mode?
> For example, after an INSERT it could print the HDFS URI(s) of data files 
> created. (When Impala write operations fail, it can be inconvenient to try 
> and track down files that need to be cleaned up by browsing through the HDFS 
> directory tree.) After a query, it could print the URL pointing to the web UI 
> where one could see the relevant log info. There are probably many other 
> possibilities that we could think of once the basic mechanism was in place.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-462) ALTER DATABASE statement

2019-09-16 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930919#comment-16930919
 ] 

Tim Armstrong commented on IMPALA-462:
--

IMPALA-7016 added the statement but not the support for the two requested 
statements. 

> ALTER DATABASE statement
> 
>
> Key: IMPALA-462
> URL: https://issues.apache.org/jira/browse/IMPALA-462
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 1.1, Impala 2.3.0
>Reporter: John Russell
>Priority: Minor
>  Labels: ramp-up
> Fix For: Product Backlog
>
>
> I suggest adding an ALTER DATABASE statement, for completeness and future 
> expansion.
> Currently, Hive has ALTER DATABASE that AFAICT only allows a SET clause to 
> change properties.
> One logical syntax / use case for an Impala ALTER DATABASE would be:
> ALTER DATABASE old_name RENAME TO new_name;
> (OK to disallow for the DEFAULT database or the currently USEd database.)
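
For comparison, the Hive statement referenced above and the requested Impala 
form would look roughly like this (the RENAME variant is the proposal, not 
existing syntax; database and property names are placeholders):
{code:sql}
-- what Hive's ALTER DATABASE supports today: property changes via SET
ALTER DATABASE sales_db SET DBPROPERTIES ('owner'='etl_team');

-- proposed for Impala (hypothetical until implemented)
ALTER DATABASE sales_db RENAME TO sales_db_archive;
{code}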



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-416) Impala and Hive's unescaping of octal escape sequences is flawed.

2019-09-16 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930917#comment-16930917
 ] 

Tim Armstrong commented on IMPALA-416:
--

Confirmed this is still a bug

> Impala and Hive's unescaping of octal escape sequences is flawed.
> -
>
> Key: IMPALA-416
> URL: https://issues.apache.org/jira/browse/IMPALA-416
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 1.0
>Reporter: Alexander Behm
>Priority: Minor
>
> Impala reuses Hive's code to do deal with escape sequences, so both Hive and 
> Impala are affected.
> Octal values > 127 fail to unescape properly. This may lead to problems if, 
> e.g., a user has text data with exotic single-byte delimiters (UTF-8 single 
> byte or ASCII extended).
> For example,
> select "\127" returns "W
> but
> select "\128" returns "128"



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8587) Show inherited privileges in show grant w/ Ranger

2019-09-16 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930910#comment-16930910
 ] 

Fang-Yu Rao edited comment on IMPALA-8587 at 9/16/19 10:27 PM:
---

After testing the proposed patch, I found that even if we log in to impalad via 
the Impala shell as a non-Ranger super user, the SQL statement issued by that 
user could still succeed. For example, if we log in to impalad as a user using
{code:java}
./bin/impala-shell.sh -u random_user;
{code}
The SQL statement in the following could still succeed.
{code:java}
show grant user admin on database functional;
{code}
This seems like a bug since a user that does not correspond to a Ranger super 
user should not be able to execute this SQL statement successfully.


was (Author: fangyurao):
After testing the proposed patch, I found that even if we log in to impalad via 
the Impala shell as a non-Ranger super user, the SQL statement issued by that 
user could still succeed. For example, if we log in to impalad as a user using
{code:java}
./bin/impala-shell.sh -u random_user;
{code}
The SQL statement in the following could still succeed.
{code:java}
show grant user admin on database functional;
{code}
This seems like a bug.

> Show inherited privileges in show grant w/ Ranger
> -
>
> Key: IMPALA-8587
> URL: https://issues.apache.org/jira/browse/IMPALA-8587
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Austin Nobis
>Assignee: Fang-Yu Rao
>Priority: Critical
>
> If an admin has privileges from:
> *grant all on server to user admin;*
>  
> Currently the command below will show no results:
> *show grant user admin on database functional;*
>  
> After the change, the user should see server level privileges from:
> *show grant user admin on database functional;*
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8587) Show inherited privileges in show grant w/ Ranger

2019-09-16 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930910#comment-16930910
 ] 

Fang-Yu Rao commented on IMPALA-8587:
-

After testing the proposed patch, I found that even if we log in to impalad via 
the Impala shell as a non-Ranger super user, the SQL statement issued by that 
user could still succeed. For example, if we log in to impalad as a user using
{code:java}
./bin/impala-shell.sh -u random_user;
{code}
The SQL statement in the following could still succeed.
{code:java}
show grant user admin on database functional;
{code}
This seems like a bug.

> Show inherited privileges in show grant w/ Ranger
> -
>
> Key: IMPALA-8587
> URL: https://issues.apache.org/jira/browse/IMPALA-8587
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Austin Nobis
>Assignee: Fang-Yu Rao
>Priority: Critical
>
> If an admin has privileges from:
> *grant all on server to user admin;*
>  
> Currently the command below will show no results:
> *show grant user admin on database functional;*
>  
> After the change, the user should see server level privileges from:
> *show grant user admin on database functional;*
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-5234) Get rid of redundant LogError() messages

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5234.
---
Resolution: Later

This has been dormant for a while. Not sure that it's a real problem in practice.

> Get rid of redundant LogError() messages
> 
>
> Key: IMPALA-5234
> URL: https://issues.apache.org/jira/browse/IMPALA-5234
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Sailesh Mukil
>Priority: Major
>  Labels: errorhandling
>
> In a few places in the codebase, there are redundant LogError() calls that 
> add error statuses to the error_log AND return the same error status up the 
> call stack. This results in the same error message being sent back twice to 
> the client. We need to find all such cases and remove these redundant 
> LogError() calls.
> Repro:
> set mem_limit=1m;
> select * from tpch.lineitem;
> Output:
> {code}
> [localhost:21000] > select * from tpch.lineitem;
> Query: select * from tpch.lineitem
> Query submitted at: 2017-04-20 12:04:22 (Coordinator: http://localhost:25000)
> Query progress can be monitored at: 
> http://localhost:25000/query_plan?query_id=6048492f67282f78:ef0f2bd4
> WARNINGS: Memory limit exceeded: Failed to allocate tuple buffer
> HDFS_SCAN_NODE (id=0) could not allocate 190.00 KB without exceeding limit.
> Error occurred on backend localhost:22000 by fragment 
> 6048492f67282f78:ef0f2bd40003
> Memory left in process limit: 8.24 GB
> Memory left in query limit: -7369392.00 B
> Query(6048492f67282f78:ef0f2bd4): memory limit exceeded. Limit=1.00 
> MB Total=8.03 MB Peak=8.03 MB
>   Fragment 6048492f67282f78:ef0f2bd4: Total=8.00 KB Peak=8.00 KB
> EXCHANGE_NODE (id=1): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> PLAN_ROOT_SINK: Total=0 Peak=0
> CodeGen: Total=0 Peak=0
>   Block Manager: Total=0 Peak=0
>   Fragment 6048492f67282f78:ef0f2bd40003: Total=8.02 MB Peak=8.02 MB
> HDFS_SCAN_NODE (id=0): Total=8.01 MB Peak=8.01 MB
> DataStreamSender (dst_id=1): Total=688.00 B Peak=688.00 B
> CodeGen: Total=0 Peak=0
> Memory limit exceeded: Failed to allocate tuple buffer
> HDFS_SCAN_NODE (id=0) could not allocate 190.00 KB without exceeding limit.
> Error occurred on backend localhost:22000 by fragment 
> 6048492f67282f78:ef0f2bd40003
> Memory left in process limit: 8.24 GB
> Memory left in query limit: -7369392.00 B
> Query(6048492f67282f78:ef0f2bd4): memory limit exceeded. Limit=1.00 
> MB Total=8.03 MB Peak=8.03 MB
>   Fragment 6048492f67282f78:ef0f2bd4: Total=8.00 KB Peak=8.00 KB
> EXCHANGE_NODE (id=1): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> PLAN_ROOT_SINK: Total=0 Peak=0
> CodeGen: Total=0 Peak=0
>   Block Manager: Total=0 Peak=0
>   Fragment 6048492f67282f78:ef0f2bd40003: Total=8.02 MB Peak=8.02 MB
> HDFS_SCAN_NODE (id=0): Total=8.01 MB Peak=8.01 MB
> DataStreamSender (dst_id=1): Total=688.00 B Peak=688.00 B
> CodeGen: Total=0 Peak=0
> {code}
> This can be traced back to:
> https://github.com/apache/incubator-impala/blob/a50c344077f6c9bbea3d3cbaa2e9146ba20ac9a9/be/src/runtime/row-batch.cc#L462
> https://github.com/apache/incubator-impala/blob/master/be/src/runtime/mem-tracker.cc#L319-L320
> There are more such examples that need to be taken care of too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-8945) Impala Doc: Incorrect Claim of Equivalence in Impala Docs

2019-09-16 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8945.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Impala Doc: Incorrect Claim of Equivalence in Impala Docs
> -
>
> Key: IMPALA-8945
> URL: https://issues.apache.org/jira/browse/IMPALA-8945
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Reported by [~icook]
> The Impala docs entry for the IS DISTINCT FROM operator states:
> The <=> operator, used like an equality operator in a join query, is more 
> efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The 
> <=> operator can use a hash join, while the OR expression cannot.
> But this expression is not equivalent to A <=> B. See the attached screenshot 
> demonstrating their non-equivalence. An expression that is equivalent to A 
> <=> B is this:
> (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B))
>  This expression should replace the existing incorrect expression.
> Another expression that is equivalent to A <=> B is:
> if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B)
> This one is a bit easier to follow. If you use this one in the docs, just 
> replace the following line with:
> The <=> operator can use a hash join, while the if expression cannot.
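A small worked example may make the claimed non-equivalence easier to see; the 
literal values below are made up purely for illustration. Under three-valued 
logic, when exactly one side is NULL the old docs expression evaluates to NULL 
while {{<=>}} evaluates to FALSE, and both suggested rewrites agree with {{<=>}}:
{code:sql}
-- Illustration only; the literals are assumptions chosen to show the NULL case.
SELECT
  NULL <=> 1                                         AS is_not_distinct,   -- FALSE
  (NULL = 1) OR (NULL IS NULL AND 1 IS NULL)         AS old_docs_expr,     -- NULL
  (NULL IS NULL AND 1 IS NULL)
    OR ((NULL IS NOT NULL AND 1 IS NOT NULL) AND (NULL = 1))
                                                     AS rewrite_1,         -- FALSE
  if(NULL IS NULL OR 1 IS NULL,
     NULL IS NULL AND 1 IS NULL,
     NULL = 1)                                       AS rewrite_2;         -- FALSE
{code}
In a WHERE clause NULL and FALSE filter rows the same way, so the difference 
only matters where the expression's value is used directly, e.g. in a SELECT list.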



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-8930) Impala Doc: Document object ownership with Ranger authorization provider

2019-09-16 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8930.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Impala Doc: Document object ownership with Ranger authorization provider
> 
>
> Key: IMPALA-8930
> URL: https://issues.apache.org/jira/browse/IMPALA-8930
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
> Fix For: Impala 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8805) Clarify behaviour around when resource pool config changes take effect

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8805:
--
Target Version: Product Backlog  (was: Impala 3.4.0)

> Clarify behaviour around when resource pool config changes take effect
> --
>
> Key: IMPALA-8805
> URL: https://issues.apache.org/jira/browse/IMPALA-8805
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: admission-control
>
> I spent some time playing around with resource pool configs and it's quite 
> difficult to tell when a config change took effect - I at least was 
> temporarily convinced that there was a bug where changes *were not* picked up. 
> I believe I was mistaken and was just confused by the staleness of the 
> /admission UI until a query is submitted to the pool, plus my query not 
> getting routed to the right pool.
> * Confirm that all resource pool config changes take effect dynamically when 
> the next query is submitted
> * Document this behaviour
> * Investigate what it would take to reflect pool configs in the admission UI 
> immediately when they take effect
> * Add information to the debug UI that indicates that pool configs are 
> correct as of the last query submitted to the pool
> * Maybe add logging when the config change is picked up
> * Maybe add a warning when REQUEST_POOL is specified but the query does not 
> end up in that pool (e.g. if the pool doesn't exist).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8945) Impala Doc: Incorrect Claim of Equivalence in Impala Docs

2019-09-16 Thread Alex Rodoni (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930848#comment-16930848
 ] 

Alex Rodoni commented on IMPALA-8945:
-

https://gerrit.cloudera.org/#/c/14239/

> Impala Doc: Incorrect Claim of Equivalence in Impala Docs
> -
>
> Key: IMPALA-8945
> URL: https://issues.apache.org/jira/browse/IMPALA-8945
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> Reported by [~icook]
> The Impala docs entry for the IS DISTINCT FROM operator states:
> The <=> operator, used like an equality operator in a join query, is more 
> efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The 
> <=> operator can use a hash join, while the OR expression cannot.
> But this expression is not equivalent to A <=> B. See the attached screenshot 
> demonstrating their non-equivalence. An expression that is equivalent to A 
> <=> B is this:
> (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B))
>  This expression should replace the existing incorrect expression.
> Another expression that is equivalent to A <=> B is:
> if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B)
> This one is a bit easier to follow. If you use this one in the docs, just 
> replace the following line with:
> The <=> operator can use a hash join, while the if expression cannot.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8384) Insert ACL tests fail on dockerised cluster

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8384.
---
Resolution: Won't Fix

I think this is a mix of catalogv2 issues, tracked by IMPALA-7539, and test 
issues; for example, test_multiple_group_acls appears to depend on assumptions 
about groups and users that are broken by the impalad running in the container.

> Insert ACL tests fail on dockerised cluster
> ---
>
> Key: IMPALA-8384
> URL: https://issues.apache.org/jira/browse/IMPALA-8384
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> {noformat}
> $ TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster" impala-py.test 
> tests/query_test/test_insert_behaviour.py -k acl
> ...
> tests/query_test/test_insert_behaviour.py::TestInsertBehaviour::test_insert_inherit_acls
>  xfail
> tests/query_test/test_insert_behaviour.py::TestInsertBehaviour::test_insert_acl_permissions
>  FAILED
> tests/query_test/test_insert_behaviour.py::TestInsertBehaviour::test_multiple_group_acls
>  FAILED
> {noformat}
> {noformat}
> _
>  TestInsertBehaviour.test_insert_acl_permissions 
> __
> tests/query_test/test_insert_behaviour.py:410: in test_insert_acl_permissions
> self.execute_query_expect_failure(self.client, insert_query)
> tests/common/impala_test_suite.py:607: in wrapper
> return function(*args, **kwargs)
> tests/common/impala_test_suite.py:629: in execute_query_expect_failure
> assert not result.success, "No failure encountered for query %s" % query
> E   AssertionError: No failure encountered for query INSERT INTO 
> `test_insert_acl_permissions_4941df88`.`insert_acl_permissions` VALUES(1)
> --
>  Captured stderr setup 
> ---
> SET 
> client_identifier=query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_acl_permissions;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> Conn 
> -- 2019-04-03 16:21:43,525 INFO MainThread: Closing active operation
> SET 
> client_identifier=query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_acl_permissions;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_insert_acl_permissions_4941df88` CASCADE;
> -- 2019-04-03 16:21:43,531 INFO MainThread: Started query 
> 0847d4339b358537:c1c6ad23
> SET 
> client_identifier=query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_acl_permissions;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_insert_acl_permissions_4941df88`;
> -- 2019-04-03 16:21:43,958 INFO MainThread: Started query 
> 694436ba3cf75303:287b5135
> -- 2019-04-03 16:21:43,966 INFO MainThread: Created database 
> "test_insert_acl_permissions_4941df88" for test ID 
> "query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_acl_permissions"
> ---
>  Captured stderr call 
> ---
> -- executing against localhost:21000
> DROP TABLE IF EXISTS 
> `test_insert_acl_permissions_4941df88`.`insert_acl_permissions`;
> -- 2019-04-03 16:21:43,977 INFO MainThread: Started query 
> ac48b16897d5b622:d96d3999
> -- executing against localhost:21000
> CREATE TABLE `test_insert_acl_permissions_4941df88`.`insert_acl_permissions` 
> (col int);
> -- 2019-04-03 16:21:44,454 INFO MainThread: Started query 
> e741c32bf0fbf1f5:84e69d42
> -- executing against localhost:21000
> INSERT INTO `test_insert_acl_permissions_4941df88`.`insert_acl_permissions` 
> VALUES(1);
> -- 2019-04-03 16:21:47,252 INFO MainThread: Started query 
> cb44c5c675b81864:3dfa1a50
> -- 2019-04-03 16:21:47,371 INFO MainThread: Starting new HTTP connection 
> (1): 0.0.0.0
> -- 2019-04-03 16:21:47,381 INFO MainThread: Starting new HTTP connection 
> (1): 0.0.0.0
> -- executing against localhost:21000
> REFRESH `test_insert_acl_permissions_4941df88`.`insert_acl_permissions`;
> -- 2019-04-03 16:21:47,482 INFO MainThread: Started query 
> b049107c5b287ed1:636e8735
> -- executing against localhost:21000
> INSERT INTO `test_insert_acl_permissions_4941df88`.`insert_acl_permissions` 
> VALUES(1);
> -- 2019-04-03 16:21:47,625 INFO  

[jira] [Work started] (IMPALA-8945) Impala Doc: Incorrect Claim of Equivalence in Impala Docs

2019-09-16 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8945 started by Alex Rodoni.
---
> Impala Doc: Incorrect Claim of Equivalence in Impala Docs
> -
>
> Key: IMPALA-8945
> URL: https://issues.apache.org/jira/browse/IMPALA-8945
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> Reported by [~icook]
> The Impala docs entry for the IS DISTINCT FROM operator states:
> The <=> operator, used like an equality operator in a join query, is more 
> efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The 
> <=> operator can use a hash join, while the OR expression cannot.
> But this expression is not equivalent to A <=> B. See the attached screenshot 
> demonstrating their non-equivalence. An expression that is equivalent to A 
> <=> B is this:
> (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B))
>  This expression should replace the existing incorrect expression.
> Another expression that is equivalent to A <=> B is:
> if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B)
> This one is a bit easier to follow. If you use this one in the docs, just 
> replace the following line with:
> The <=> operator can use a hash join, while the if expression cannot.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8947) SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric

2019-09-16 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8947 started by Tim Armstrong.
-
> SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric
> -
>
> Key: IMPALA-8947
> URL: https://issues.apache.org/jira/browse/IMPALA-8947
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: supportability
>
> {noformat}
> ERROR: Could not create files in any configured scratch directories 
> (--scratch_dirs=/path/to/scratch) on backend ':22000'. 69.80 GB of 
> scratch is currently in use by this Impala Daemon (69.80 GB by this query). 
> See logs for previous errors that may have prevented creating or writing 
> scratch files. The following directories were at capacity: /path/to/scratch
> {noformat}
> The issue is that the total for the Impala daemon uses the wrong counter.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8948) [DOCS] Review "How Impala Works with Hadoop File Formats"

2019-09-16 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8948 started by Alex Rodoni.
---
> [DOCS]  Review "How Impala Works with Hadoop File Formats"
> --
>
> Key: IMPALA-8948
> URL: https://issues.apache.org/jira/browse/IMPALA-8948
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 3.2.0
>Reporter: Vincent Tran
>Assignee: Alex Rodoni
>Priority: Minor
>
> Ref: 
> [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]
>  
> In the "Impala Can INSERT?" column of the file type support matrix for Text, 
> we claim that Impala can insert into a compressed-text table: "Yes: {{CREATE 
> TABLE}}, {{INSERT}}, {{LOAD DATA}}, and query."
>  
> This doesn't appear to be the case as Impala does not support the writing of 
> compressed text in any version at the time of this writing.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables

2019-09-16 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-8903:

Description: https://gerrit.cloudera.org/#/c/14235/

> Impala Doc: TRUNCATE for Insert-only ACID tables
> 
>
> Key: IMPALA-8903
> URL: https://issues.apache.org/jira/browse/IMPALA-8903
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
>
> https://gerrit.cloudera.org/#/c/14235/



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8948) [DOCS] Review "How Impala Works with Hadoop File Formats"

2019-09-16 Thread Vincent Tran (Jira)
Vincent Tran created IMPALA-8948:


 Summary: [DOCS]  Review "How Impala Works with Hadoop File Formats"
 Key: IMPALA-8948
 URL: https://issues.apache.org/jira/browse/IMPALA-8948
 Project: IMPALA
  Issue Type: Bug
  Components: Docs
Affects Versions: Impala 3.2.0
Reporter: Vincent Tran
Assignee: Alex Rodoni


Ref: [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]

 

In the "Impala Can INSERT?" column of the file type support matrix for Text, we 
claim that Impala can insert into a compressed-text table: "Yes: {{CREATE 
TABLE}}, {{INSERT}}, {{LOAD DATA}}, and query."

 

This doesn't appear to be the case as Impala does not support the writing of 
compressed text in any version at the time of this writing.
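A brief sketch of the distinction being drawn (the table name and HDFS path 
below are made up): Impala can read pre-compressed text files, for example gzip 
files brought in with LOAD DATA, but an INSERT into a text table writes plain, 
uncompressed files.
{code:sql}
-- Illustration only; table name and path are assumptions.
CREATE TABLE t_text (col INT) STORED AS TEXTFILE;
LOAD DATA INPATH '/tmp/data.csv.gz' INTO TABLE t_text;  -- loading/querying gzip text works
INSERT INTO t_text VALUES (1);                          -- the files written are uncompressed
{code}
This matches the point above: the "Impala Can INSERT?" cell should not suggest 
that INSERT produces compressed text output.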



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables

2019-09-16 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8903 started by Alex Rodoni.
---
> Impala Doc: TRUNCATE for Insert-only ACID tables
> 
>
> Key: IMPALA-8903
> URL: https://issues.apache.org/jira/browse/IMPALA-8903
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


