[jira] [Work started] (IMPALA-8761) Configuration validation introduced in IMPALA-8559 can be improved
[ https://issues.apache.org/jira/browse/IMPALA-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8761 started by Anurag Mantripragada. > Configuration validation introduced in IMPALA-8559 can be improved > -- > > Key: IMPALA-8761 > URL: https://issues.apache.org/jira/browse/IMPALA-8761 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Anurag Mantripragada >Priority: Major > > The issue with the configuration validation in IMPALA-8559 is that it validates > one configuration at a time and fails as soon as there is a validation error. > Since there is more than one configuration key to validate, the user may have > to restart HMS again and again if multiple configuration changes are needed. > This is not a great user experience. A simple improvement would be to run all > the configuration validations together and present the results together in > case of failures, so that the user can make all the required changes in one go. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-8761) Configuration validation introduced in IMPALA-8559 can be improved
[ https://issues.apache.org/jira/browse/IMPALA-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Mantripragada reassigned IMPALA-8761: Assignee: Anurag Mantripragada > Configuration validation introduced in IMPALA-8559 can be improved > -- > > Key: IMPALA-8761 > URL: https://issues.apache.org/jira/browse/IMPALA-8761 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Anurag Mantripragada >Priority: Major > > The issue with the configuration validation in IMPALA-8559 is that it validates > one configuration at a time and fails as soon as there is a validation error. > Since there is more than one configuration key to validate, the user may have > to restart HMS again and again if multiple configuration changes are needed. > This is not a great user experience. A simple improvement would be to run all > the configuration validations together and present the results together in > case of failures, so that the user can make all the required changes in one go.
[jira] [Comment Edited] (IMPALA-8587) Show inherited privileges in show grant w/ Ranger
[ https://issues.apache.org/jira/browse/IMPALA-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930910#comment-16930910 ] Fang-Yu Rao edited comment on IMPALA-8587 at 9/17/19 3:28 AM: -- After testing the proposed patch, I found that even if we log in to impalad via the Impala shell as a non-Ranger super user, the statement could still succeed for that user. For example, if we log in to impalad as a user using {code:java} ./bin/impala-shell.sh -u random_user; {code} The following SQL statement could still succeed. {code:java} show grant user admin on database functional; {code} This seems like a bug, since a user that does not correspond to a Ranger super user should not be able to execute this SQL statement successfully. was (Author: fangyurao): After testing the proposed patch, I found that even we log in to impalad via Impala shell as a non-Ranger super user, the execution of that SQL user could still succeed. For example, if we log in to impalad as a user using {code:java} ./bin/impala-shell.sh -u random_user; {code} The SQL statement in the following could still succeed. {code:java} show grant user admin on database functional; {code} This seems like a bug since a user that does not correspond to a Ranger super user should not be able to execute this SQL statement successfully. 
> Show inherited privileges in show grant w/ Ranger > - > > Key: IMPALA-8587 > URL: https://issues.apache.org/jira/browse/IMPALA-8587 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Austin Nobis >Assignee: Fang-Yu Rao >Priority: Critical > > If an admin has privileges from: > *grant all on server to user admin;* > > Currently the command below will show no results: > *show grant user admin on database functional;* > > After the change, the user should see server level privileges from: > *show grant user admin on database functional;*
[jira] [Assigned] (IMPALA-3160) Queries may not get cancelled if cancellation pool hits MAX_CANCELLATION_QUEUE_SIZE
[ https://issues.apache.org/jira/browse/IMPALA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-3160: - Assignee: Thomas Tauber-Marshall > Queries may not get cancelled if cancellation pool hits > MAX_CANCELLATION_QUEUE_SIZE > --- > > Key: IMPALA-3160 > URL: https://issues.apache.org/jira/browse/IMPALA-3160 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.5.0 >Reporter: Sailesh Mukil >Assignee: Thomas Tauber-Marshall >Priority: Minor > Labels: correctness, downgraded > > The ImpalaServer::MembershipCallback() function determines whether any backends are > down from the statestore's topic updates. It also cancels all the > queries that are already in flight on those failed backends, after comparing > each failed backend from the topic update against the query_locations_ map, > which maps backends to the queries running on them. > If the cancellation queue is too large (tracked by > MAX_CANCELLATION_QUEUE_SIZE), we do not cancel the queries, hoping that by the > next heartbeat the cancellation queue frees up so we can re-try the > cancellation of these queries. > However, by that point we have already removed the failed backend from the > query_locations_ map. So, the next heartbeat will never find this backend to > cancel the queries running on it. > {code:java} > // Maps from query id (to be cancelled) to a list of failed Impalads that are > // the cause of the cancellation. > map<TUniqueId, vector<TNetworkAddress>> queries_to_cancel; // : LOCAL MAP > { > // Build a list of queries that are running on failed hosts (as evidenced by their > // absence from the membership list). > // TODO: crash-restart failures can give false negatives for failed Impala daemons. 
> lock_guard<mutex> l(query_locations_lock_); > QueryLocations::const_iterator loc_entry = query_locations_.begin(); > while (loc_entry != query_locations_.end()) { > if (current_membership.find(loc_entry->first) == current_membership.end()) { > unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin(); > // Add failed backend locations to all queries that ran on that backend. > for (; query_id != loc_entry->second.end(); ++query_id) { > vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id]; > failed_hosts.push_back(loc_entry->first); > } > exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first); > // We can remove the location wholesale once we know the backend's failed. To do so > // safely during iteration, we have to be careful not to invalidate the current > // iterator, so copy the iterator to do the erase(..) and advance the original. > QueryLocations::const_iterator failed_backend = loc_entry; > ++loc_entry; > // : WE ERASE THE ENTRY FROM THE GLOBAL MAP HERE. > query_locations_.erase(failed_backend); > } else { > ++loc_entry; > } > } > } > if (cancellation_thread_pool_->GetQueueSize() + queries_to_cancel.size() > MAX_CANCELLATION_QUEUE_SIZE) { > // Ignore the cancellations - we'll be able to process them on the next heartbeat > // instead. > LOG_EVERY_N(WARNING, 60) << "Cancellation queue is full"; > // : WE DON'T CANCEL HERE AND BY THE NEXT HEARTBEAT, WE WON'T FIND THE FAILED BACKEND AGAIN. > } > {code}
[jira] [Commented] (IMPALA-3160) Queries may not get cancelled if cancellation pool hits MAX_CANCELLATION_QUEUE_SIZE
[ https://issues.apache.org/jira/browse/IMPALA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930990#comment-16930990 ] Tim Armstrong commented on IMPALA-3160: --- [~twmarshall] is this still an issue? > Queries may not get cancelled if cancellation pool hits > MAX_CANCELLATION_QUEUE_SIZE > --- > > Key: IMPALA-3160 > URL: https://issues.apache.org/jira/browse/IMPALA-3160 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.5.0 >Reporter: Sailesh Mukil >Assignee: Thomas Tauber-Marshall >Priority: Minor > Labels: correctness, downgraded > > The ImpalaServer::MembershipCallback() function determines whether any backends are > down from the statestore's topic updates. It also cancels all the > queries that are already in flight on those failed backends, after comparing > each failed backend from the topic update against the query_locations_ map, > which maps backends to the queries running on them. > If the cancellation queue is too large (tracked by > MAX_CANCELLATION_QUEUE_SIZE), we do not cancel the queries, hoping that by the > next heartbeat the cancellation queue frees up so we can re-try the > cancellation of these queries. > However, by that point we have already removed the failed backend from the > query_locations_ map. So, the next heartbeat will never find this backend to > cancel the queries running on it. > {code:java} > // Maps from query id (to be cancelled) to a list of failed Impalads that are > // the cause of the cancellation. > map<TUniqueId, vector<TNetworkAddress>> queries_to_cancel; // : LOCAL MAP > { > // Build a list of queries that are running on failed hosts (as evidenced by their > // absence from the membership list). > // TODO: crash-restart failures can give false negatives for failed Impala daemons. 
> lock_guard<mutex> l(query_locations_lock_); > QueryLocations::const_iterator loc_entry = query_locations_.begin(); > while (loc_entry != query_locations_.end()) { > if (current_membership.find(loc_entry->first) == current_membership.end()) { > unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin(); > // Add failed backend locations to all queries that ran on that backend. > for (; query_id != loc_entry->second.end(); ++query_id) { > vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id]; > failed_hosts.push_back(loc_entry->first); > } > exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first); > // We can remove the location wholesale once we know the backend's failed. To do so > // safely during iteration, we have to be careful not to invalidate the current > // iterator, so copy the iterator to do the erase(..) and advance the original. > QueryLocations::const_iterator failed_backend = loc_entry; > ++loc_entry; > // : WE ERASE THE ENTRY FROM THE GLOBAL MAP HERE. > query_locations_.erase(failed_backend); > } else { > ++loc_entry; > } > } > } > if (cancellation_thread_pool_->GetQueueSize() + queries_to_cancel.size() > MAX_CANCELLATION_QUEUE_SIZE) { > // Ignore the cancellations - we'll be able to process them on the next heartbeat > // instead. > LOG_EVERY_N(WARNING, 60) << "Cancellation queue is full"; > // : WE DON'T CANCEL HERE AND BY THE NEXT HEARTBEAT, WE WON'T FIND THE FAILED BACKEND AGAIN. > } > {code}
[jira] [Resolved] (IMPALA-3171) data-source-tables.test is flaky when BATCH_SIZE is changed
[ https://issues.apache.org/jira/browse/IMPALA-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-3171. --- Resolution: Later > data-source-tables.test is flaky when BATCH_SIZE is changed > --- > > Key: IMPALA-3171 > URL: https://issues.apache.org/jira/browse/IMPALA-3171 > Project: IMPALA > Issue Type: Test > Components: Infrastructure >Affects Versions: impala 2.3 >Reporter: Juan Yu >Priority: Minor > > data-source-tables.test is flaky and can return different results when > BATCH_SIZE is changed > {code} > [localhost:21000] > select count(*) from alltypes_datasource; > Query: select count(*) from alltypes_datasource > +--+ > | count(*) | > +--+ > | 4510 | > +--+ > Fetched 1 row(s) in 0.40s > [localhost:21000] > set batch_size=12345; > BATCH_SIZE set to 12345 > [localhost:21000] > select count(*) from alltypes_datasource; > Query: select count(*) from alltypes_datasource > +--+ > | count(*) | > +--+ > | 5000 | > +--+ > Fetched 1 row(s) in 0.40s > [localhost:21000] > set batch_size=1; > BATCH_SIZE set to 1 > [localhost:21000] > select count(*) from alltypes_datasource; > Query: select count(*) from alltypes_datasource > +--+ > | count(*) | > +--+ > | 4501 | > +--+ > Fetched 1 row(s) in 0.40s > {code}
[jira] [Resolved] (IMPALA-2983) Optimize passthrough preaggregations
[ https://issues.apache.org/jira/browse/IMPALA-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2983. --- Resolution: Later > Optimize passthrough preaggregations > > > Key: IMPALA-2983 > URL: https://issues.apache.org/jira/browse/IMPALA-2983 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.5.0 >Reporter: Tim Armstrong >Priority: Minor > Labels: performance > > The initial patch for IMPALA-1305 is fairly conservative and leaves a lot of > room for improvement. There were some ideas that were shelved because they > could cause perf regressions if not carefully implemented. > * Tune the threshold values better. This is a little tricky since it depends > on the cost of exchange, which depends on the cluster properties. > * Evict some or all partitions from memory to reduce memory overhead and > avoid the cost of hash table lookups. The memory reduction is more useful > here since the merge agg's hash table inserts will almost certainly be slower > than the preaggs hash table lookups. > * Periodically evict hash table entries to keep the hash tables below a > certain threshold
[jira] [Resolved] (IMPALA-2910) create nested types perf microbenchmarks
[ https://issues.apache.org/jira/browse/IMPALA-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2910. --- Resolution: Later > create nested types perf microbenchmarks > > > Key: IMPALA-2910 > URL: https://issues.apache.org/jira/browse/IMPALA-2910 > Project: IMPALA > Issue Type: Task > Components: Infrastructure >Affects Versions: Impala 2.3.0 >Reporter: Silvius Rus >Priority: Minor > > Please extend the perf microbenchmarks to cover performance specific to > queries on nested data.
[jira] [Resolved] (IMPALA-2579) Limiting the number of records to be fetched/returned by default
[ https://issues.apache.org/jira/browse/IMPALA-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2579. --- Resolution: Won't Fix > Limiting the number of records to be fetched/returned by default > > > Key: IMPALA-2579 > URL: https://issues.apache.org/jira/browse/IMPALA-2579 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.2.4 >Reporter: Eric Lin >Priority: Minor > > It would be nice to have a feature to limit the resultset to be returned back > to client side, either impala-shell or tableau or others. > This can be achieved by setting a default limit on the following: > 1) number of rows > 2) data size > This is particularly useful when dealing with tables that have millions or > billions of records (which is in most cases), and users usually forget to > manually LIMIT the result and it will either crash the client software or > have to end up manually killing the query. > Thanks
[jira] [Commented] (IMPALA-2579) Limiting the number of records to be fetched/returned by default
[ https://issues.apache.org/jira/browse/IMPALA-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930989#comment-16930989 ] Tim Armstrong commented on IMPALA-2579: --- IMPALA-8096 added the option for a limit. I don't think we want to have an implicit limit - clients should do that if they want it, but we're not really in the business of truncating result sets. > Limiting the number of records to be fetched/returned by default > > > Key: IMPALA-2579 > URL: https://issues.apache.org/jira/browse/IMPALA-2579 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.2.4 >Reporter: Eric Lin >Priority: Minor > > It would be nice to have a feature to limit the resultset to be returned back > to client side, either impala-shell or tableau or others. > This can be achieved by setting a default limit on the following: > 1) number of rows > 2) data size > This is particularly useful when dealing with tables that have millions or > billions of records (which is in most cases), and users usually forget to > manually LIMIT the result and it will either crash the client software or > have to end up manually killing the query. > Thanks
[jira] [Assigned] (IMPALA-2312) Timing bug in both MonotonicStopWatch and StopWatch
[ https://issues.apache.org/jira/browse/IMPALA-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-2312: - Assignee: Tim Armstrong (was: Henry Robinson) > Timing bug in both MonotonicStopWatch and StopWatch > --- > > Key: IMPALA-2312 > URL: https://issues.apache.org/jira/browse/IMPALA-2312 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.2.4 >Reporter: Henry Robinson >Assignee: Tim Armstrong >Priority: Minor > > Both {{MonotonicStopWatch}} and {{StopWatch}} underestimate the total time if > the stopwatch is running while {{ElapsedTime()}} is called. For example: > {code} > uint64_t ElapsedTime() const { > if (!running_) return total_time_; > timespec end; > clock_gettime(CLOCK_MONOTONIC, &end); > // Should include total_time_, but does not > return (end.tv_sec - start_.tv_sec) * 1000L * 1000L * 1000L + > (end.tv_nsec - start_.tv_nsec); > } > {code} > The effect is that we could have: > {code} > MonotonicStopWatch sw; > sw.Start(); > sw.Stop(); > uint64_t total = sw.ElapsedTime(); > sw.Start(); > // With the bug, this could fail. > ASSERT_GE(sw.ElapsedTime(), total); > {code}
[jira] [Resolved] (IMPALA-8947) SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric
[ https://issues.apache.org/jira/browse/IMPALA-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8947. --- Fix Version/s: Impala 3.4.0 Resolution: Fixed > SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric > - > > Key: IMPALA-8947 > URL: https://issues.apache.org/jira/browse/IMPALA-8947 > Project: IMPALA > Issue Type: Bug >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Critical > Labels: supportability > Fix For: Impala 3.4.0 > > > {noformat} > ERROR: Could not create files in any configured scratch directories > (--scratch_dirs=/path/to/scratch) on backend ':22000'. 69.80 GB of > scratch is currently in use by this Impala Daemon (69.80 GB by this query). > See logs for previous errors that may have prevented creating or writing > scratch files. The following directories were at capacity: /path/to/scratch > {noformat} > The issue is that the total reported for the Impala daemon uses the wrong counter.
[jira] [Resolved] (IMPALA-2391) Add Hive to the performance framework.
[ https://issues.apache.org/jira/browse/IMPALA-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2391. --- Resolution: Later > Add Hive to the performance framework. > -- > > Key: IMPALA-2391 > URL: https://issues.apache.org/jira/browse/IMPALA-2391 > Project: IMPALA > Issue Type: Task > Components: Infrastructure >Affects Versions: Impala 2.3.0 >Reporter: Ishaan Joshi >Priority: Minor > Labels: test-infra > > Currently, the Impala performance framework does not support Hive. With the > on-going work to add impyla (and therefore HS2) as an interface to run > queries, we should support running Hive queries.
[jira] [Resolved] (IMPALA-2361) Using AVX intrinsic to accelerate the sort operation
[ https://issues.apache.org/jira/browse/IMPALA-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2361. --- Resolution: Later > Using AVX intrinsic to accelerate the sort operation > > > Key: IMPALA-2361 > URL: https://issues.apache.org/jira/browse/IMPALA-2361 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.2.4 >Reporter: Youwei Wang >Priority: Minor > Labels: performance > > Using AVX intrinsic to accelerate the sort operation
[jira] [Resolved] (IMPALA-2214) Add some parquet-related testing
[ https://issues.apache.org/jira/browse/IMPALA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2214. --- Resolution: Later > Add some parquet-related testing > > > Key: IMPALA-2214 > URL: https://issues.apache.org/jira/browse/IMPALA-2214 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 2.2 >Reporter: Ippokratis Pandis >Priority: Minor > Labels: test, test-infra > > We need to add testing for: > (A) Reading a file that is shorter than what we think, because of stale > metadata > (B) Reading a file that is longer than what we think, because of stale > metadata (IMPALA-2213) > (C) Have a very low --read_size (IMPALA-1291)
[jira] [Resolved] (IMPALA-2311) Clean up duplicated deep copy code
[ https://issues.apache.org/jira/browse/IMPALA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2311. --- Resolution: Won't Fix > Clean up duplicated deep copy code > -- > > Key: IMPALA-2311 > URL: https://issues.apache.org/jira/browse/IMPALA-2311 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Tim Armstrong >Priority: Minor > > There are multiple implementations of deep copying in the codebase that > duplicate much of the same logic - in RowBatch (Serialize), Tuple (2x) and > BufferedTupleStream. There are some differences in how they read and write > data, but the main difference is how they allocate memory - it seems like > this could be factored out in some way so that that core deep copy logic can > be implemented only once (probably as a templated function). -- This message was sent by Atlassian Jira (v8.3.2#803003)
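[Editor's note] A minimal sketch of the factoring the issue above suggests: the core deep-copy logic written once as a templated function, with the per-caller difference (how memory is allocated) injected as a template parameter. `DeepCopy` and `HeapAllocator` are hypothetical names for illustration, not Impala's actual API.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Core copy logic written once; RowBatch, Tuple, and BufferedTupleStream
// would each supply their own allocator type (arena, tracked heap, etc.).
template <typename Allocator>
uint8_t* DeepCopy(const uint8_t* src, size_t len, Allocator* alloc) {
  uint8_t* dst = alloc->Allocate(len);  // allocation policy varies per caller
  std::memcpy(dst, src, len);           // shared copy logic
  return dst;
}

// One possible allocator: plain heap blocks owned by the allocator itself.
struct HeapAllocator {
  std::vector<std::vector<uint8_t>> blocks;
  uint8_t* Allocate(size_t len) {
    blocks.emplace_back(len);
    return blocks.back().data();
  }
};
```

Swapping in an arena-backed allocator would then change only the `Allocate` call path, not the copy routine.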
[jira] [Resolved] (IMPALA-2119) Clean up destructors and the fragment/query Close() path
[ https://issues.apache.org/jira/browse/IMPALA-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2119. --- Resolution: Fixed I think this cleanup has largely been completed. > Clean up destructors and the fragment/query Close() path > > > Key: IMPALA-2119 > URL: https://issues.apache.org/jira/browse/IMPALA-2119 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.2 >Reporter: Matthew Jacobs >Assignee: Marcel Kornacker >Priority: Minor > Labels: query-lifecycle > > We need to make sure that there is a sane query and fragment Close() path on > the query and fragments, i.e. we shouldn't be relying on destructors, > shared/scoped_ptrs. The worst offenders are the coordinator and fragment > management classes, e.g. PlanFragmentExecutor, FragmentExecState, > QueryExecState, Coordinator. At the same time, we should make sure that all > ExecNodes, sinks, and other backend classes used for query execution have all > cleanup logic in Close() methods. In some cases, Close() methods will need to > be added. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-2027) Create separate impala-shell packages
[ https://issues.apache.org/jira/browse/IMPALA-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2027. --- Resolution: Duplicate > Create separate impala-shell packages > - > > Key: IMPALA-2027 > URL: https://issues.apache.org/jira/browse/IMPALA-2027 > Project: IMPALA > Issue Type: New Feature > Components: Clients >Affects Versions: Impala 2.1.1, Impala 2.2 >Reporter: Jeff Hammerbacher >Priority: Minor > Labels: shell, usability > > It would be wonderful if a separate {{impala-shell}} package were made > available so that users could install the Impala shell on their laptops using > their favorite package manager (e.g. Homebrew on Mac). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-2003) Implement "beeswax_field_delimiter" query option to choose different delimiters for beeswax instead of default "\t"
[ https://issues.apache.org/jira/browse/IMPALA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2003. --- Resolution: Won't Fix Impala shell now has HS2 support - IMPALA-7290, which provides a better solution for this delimiter problem > Implement "beeswax_field_delimiter" query option to choose different > delimiters for beeswax instead of default "\t" > --- > > Key: IMPALA-2003 > URL: https://issues.apache.org/jira/browse/IMPALA-2003 > Project: IMPALA > Issue Type: Improvement > Components: Clients >Affects Versions: Impala 2.2 >Reporter: Mala Chikka Kempanna >Priority: Minor > Labels: newbie, sql-language > > Please implement "beeswax_field_delimiter" query option to choose different > delimiters for beeswax instead of default "\t" -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-1678) Reconsider use of memory buffer in TSaslTransport
[ https://issues.apache.org/jira/browse/IMPALA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1678. --- Resolution: Later Probably less relevant after the KRPC work > Reconsider use of memory buffer in TSaslTransport > - > > Key: IMPALA-1678 > URL: https://issues.apache.org/jira/browse/IMPALA-1678 > Project: IMPALA > Issue Type: New Feature > Components: Perf Investigation >Affects Versions: Impala 2.1 >Reporter: Henry Robinson >Priority: Minor > Labels: performance > > {{TSaslTransport}} uses a {{TMemoryBuffer}} to stage bytes that have not been > read. {{TMemoryBuffer}} doubles its capacity when it needs to expand. This > might not be the best policy - we should consider using a {{deque}} or some > other efficient queue implementation. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
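[Editor's note] A sketch of the alternative the issue above proposes: staging unread bytes in a `std::deque` rather than a flat buffer that doubles on growth. The class and method names below are illustrative, not Thrift's actual `TSaslTransport`/`TMemoryBuffer` API.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>

// Deque-backed staging buffer: consumed bytes are released chunk by chunk
// instead of lingering in an over-grown, capacity-doubled flat buffer.
class DequeStagingBuffer {
 public:
  void Write(const uint8_t* data, size_t len) {
    buf_.insert(buf_.end(), data, data + len);
  }
  // Pops up to 'len' bytes into 'out'; returns the number of bytes read.
  size_t Read(uint8_t* out, size_t len) {
    size_t n = std::min(len, buf_.size());
    for (size_t i = 0; i < n; ++i) {
      out[i] = buf_.front();
      buf_.pop_front();
    }
    return n;
  }
  size_t size() const { return buf_.size(); }
 private:
  std::deque<uint8_t> buf_;
};
```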
[jira] [Resolved] (IMPALA-1581) Shell should fetch results in a separate thread
[ https://issues.apache.org/jira/browse/IMPALA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1581. --- Resolution: Later > Shell should fetch results in a separate thread > --- > > Key: IMPALA-1581 > URL: https://issues.apache.org/jira/browse/IMPALA-1581 > Project: IMPALA > Issue Type: Improvement > Components: Clients >Affects Versions: Impala 2.0.1 >Reporter: casey >Priority: Minor > Labels: shell > > For queries with large result sets, the client can add significant time to > the overall execution time as seen by the end user. For example see > IMPALA-1580. Fetching results in a separate thread should significantly > reduce the end user wait time. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-1580) Optimize conversion of row batch to query result set
[ https://issues.apache.org/jira/browse/IMPALA-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1580. --- Resolution: Duplicate > Optimize conversion of row batch to query result set > > > Key: IMPALA-1580 > URL: https://issues.apache.org/jira/browse/IMPALA-1580 > Project: IMPALA > Issue Type: Improvement > Components: Perf Investigation >Affects Versions: Impala 2.0.1 >Reporter: casey >Priority: Minor > Labels: performance, ramp-up > Attachments: select_lineitem.profile > > > For simple queries that produce a large result set such as "select * from > tpch.lineitem" the server execution time is limited by the time required to > convert row batches (results in the internal structure) to query results (the > structure to be sent to the client). The data conversion is the limiting > factor in this case because the query plan execution happens in parallel. > Here are some data points from the profile of "select * from tpch.lineitem" > using HS2 (this was taken using --exchg_node_buffer_size_bytes=2048576000 so > the exchange node would never block because of a full buffer.). Beeswax takes > even longer to convert the rows. > * Query Timeline: 1m9s > * Execution Profile -- Total: 1s295ms > * ClientFetchWaitTimer: 52s553ms > * RowMaterializationTimer: 15s216ms > * Coordinator Fragment F01:(Total: 1s092ms > * Averaged Fragment F00:(Total: 5s608ms > So the "RowMaterializationTimer", which is actually conversion time, adds ~9 > seconds or ~2x the plan execution time to the overall time. > Ideally the conversion time would be codegen'd but even without that there > should be a lot of room for improvement by reducing function calls. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-1463) Improve HDFS Caching DDL performance for partitioned table
[ https://issues.apache.org/jira/browse/IMPALA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1463. --- Resolution: Later Don't think there's much interest in this one. > Improve HDFS Caching DDL performance for partitioned table > -- > > Key: IMPALA-1463 > URL: https://issues.apache.org/jira/browse/IMPALA-1463 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 1.4.1, Impala 2.0 >Reporter: Alan Choi >Priority: Minor > Labels: performance > > Enabling HDFS caching for partitioned table requires two steps. For each > partition, it'll have to set a directive to HDFS. Then, it'll issue an "alter > table" to HiveMetaStore. This is done in a single-threaded loop. > Issuing the "alter > table" to HiveMetaStore is very slow. In my experiment, > each call is ~1sec. For a 4k partition table, it took 4k seconds. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-1504) Allow non-delimited-text default file formats
[ https://issues.apache.org/jira/browse/IMPALA-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1504. --- Resolution: Duplicate > Allow non-delimited-text default file formats > - > > Key: IMPALA-1504 > URL: https://issues.apache.org/jira/browse/IMPALA-1504 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 2.0 >Reporter: Jeremy Beard >Priority: Minor > Labels: incompatibility, usability > > It would be helpful if tables could be created in a desired file format > (...Parquet) without specifying STORED AS X each time. This is especially > true for data analysts who are often repeatedly creating and dropping tables > in their work. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-1265) Allow embedding query options as hints.
[ https://issues.apache.org/jira/browse/IMPALA-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1265. --- Resolution: Won't Fix > Allow embedding query options as hints. > --- > > Key: IMPALA-1265 > URL: https://issues.apache.org/jira/browse/IMPALA-1265 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 2.0 >Reporter: Alexander Behm >Priority: Minor > Labels: planner, usability > > With SET we can change query options on a per-session basis, but it would be > nice to be able to set query options like hints (and by extension we could > create views with specific query options). Something like this: > select /* +mem_limit=10g */ int_col ... -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-1171) create_testdata.sh called from both buildall.sh and load-test-warehouse-snapshot.sh
[ https://issues.apache.org/jira/browse/IMPALA-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930946#comment-16930946 ] Tim Armstrong commented on IMPALA-1171: --- [~joemcdonnell] maybe not a bug, but I'd imagine you can triage this > create_testdata.sh called from both buildall.sh and > load-test-warehouse-snapshot.sh > --- > > Key: IMPALA-1171 > URL: https://issues.apache.org/jira/browse/IMPALA-1171 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 2.0 >Reporter: Dan Hecht >Assignee: Joe McDonnell >Priority: Minor > > buildall.sh calls bin/create_testdata.sh itself. Then, if loading from a > snapshot, it calls: > testdata/bin/create-load-data.sh which calls: > testdata/bin/load-test-warehouse-snapshot.sh which calls: > bin/create_testdata.sh again. > There's no functional bug, but this may be an opportunity to reduce load time > and might indicate that the scripts could use some tidying up. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-1173) create-load-data.sh shouldn't try to do load-data.py --force when loading from a snapshot
[ https://issues.apache.org/jira/browse/IMPALA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-1173: - Assignee: Joe McDonnell > create-load-data.sh shouldn't try to do load-data.py --force when loading > from a snapshot > - > > Key: IMPALA-1173 > URL: https://issues.apache.org/jira/browse/IMPALA-1173 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 2.0 >Reporter: Dan Hecht >Assignee: Joe McDonnell >Priority: Minor > > testdata/bin/create-load-data.sh first loads a snapshot. Afterwards, it > checks to make sure the loaded schema matches that in git. If it doesn't > match, it forces a reload through load-data.py. > If the user supplied a snapshot file, then I think it would be better to fail > when the schema mismatch is detected rather than falling back to the > load_data.py --force path. It seems more likely that the user would prefer > to download an updated snapshot to resolve the situation. > This has burned me a couple of times now when I've downloaded snapshots in > the window between the schema update and when the new snapshot is ready. > Surprisingly (to me at least), the scripts went down the load_data.py --force > path, which led to another problem (which Lenni has since fixed). But it would > have been better if the script just told me that my snapshot is out of date > to begin with. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-1171) create_testdata.sh called from both buildall.sh and load-test-warehouse-snapshot.sh
[ https://issues.apache.org/jira/browse/IMPALA-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-1171: - Assignee: Joe McDonnell > create_testdata.sh called from both buildall.sh and > load-test-warehouse-snapshot.sh > --- > > Key: IMPALA-1171 > URL: https://issues.apache.org/jira/browse/IMPALA-1171 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 2.0 >Reporter: Dan Hecht >Assignee: Joe McDonnell >Priority: Minor > > buildall.sh calls bin/create_testdata.sh itself. Then, if loading from a > snapshot, it calls: > testdata/bin/create-load-data.sh which calls: > testdata/bin/load-test-warehouse-snapshot.sh which calls: > bin/create_testdata.sh again. > There's no functional bug, but this may be an opportunity to reduce load time > and might indicate that the scripts could use some tidying up. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-1108) Impala should check the number of opened files/partition during insert
[ https://issues.apache.org/jira/browse/IMPALA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930944#comment-16930944 ] Tim Armstrong commented on IMPALA-1108: --- I think between the clustered inserts and IMPALA-8125 this would be covered. > Impala should check the number of opened files/partition during insert > -- > > Key: IMPALA-1108 > URL: https://issues.apache.org/jira/browse/IMPALA-1108 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 1.4 >Reporter: Alan Choi >Priority: Minor > Labels: ramp-up > > For insert, when Impala is inserting into a huge number of partition, Impala > might be opening too many files. HDFS will return an error, but the error is > incomprehensible as "Error(12): Cannot allocate memory". > We can do better to improve the error message. Here are two suggestions: > 1. During planning, if there's stats, we know how many partitions are being > inserted per Impalad. Based on that, we can determine if we'll be opening too > many files. Either return an error or a warning message. > 2. During query execution, keep track of the number of files opened for read > and write. If we're opening too many files for write, abort the query and > returns a proper error message. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-1076) Add a shell option to limit the maximum number of rows that are pretty printed
[ https://issues.apache.org/jira/browse/IMPALA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1076. --- Resolution: Later > Add a shell option to limit the maximum number of rows that are pretty printed > -- > > Key: IMPALA-1076 > URL: https://issues.apache.org/jira/browse/IMPALA-1076 > Project: IMPALA > Issue Type: Improvement > Components: Clients >Affects Versions: Impala 1.3.1 >Reporter: Nong Li >Priority: Minor > Labels: impala-shell > > I think people like to run a query that returns a large number of rows and > redirect them as a simple benchmark. This results in a high amount of time > spent in pretty print. > We should add an option "max_pretty_printed_rows" or something and when we > hit that value, the shell should disable pretty printing for that query. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-3929) multithreaded text encoding
[ https://issues.apache.org/jira/browse/IMPALA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-3929. --- Resolution: Won't Fix IMPALA-3902 is the threading model we should be going towards and will help parallelise inserts. > multithreaded text encoding > --- > > Key: IMPALA-3929 > URL: https://issues.apache.org/jira/browse/IMPALA-3929 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.5.0 >Reporter: Marcell Szabo >Priority: Minor > > Often the bottleneck of the INSERT statement is the serialisation to text, > measured by EncodeTimer. > Could we implement a producer-consumer model to allow serialisation happen in > multiple threads even if we write one file? > Thank you -- This message was sent by Atlassian Jira (v8.3.2#803003)
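[Editor's note] The producer-consumer model asked about above can be sketched as follows: encode row batches in parallel, then drain the results in submission order so a single writer still produces one correctly ordered file. `EncodeBatch` is a stand-in for Impala's text serialisation (the work measured by EncodeTimer), not its actual API.

```cpp
#include <future>
#include <string>
#include <vector>

// Stand-in for per-batch text serialisation: tab-separated values.
std::string EncodeBatch(const std::vector<int>& batch) {
  std::string out;
  for (int v : batch) out += std::to_string(v) + "\t";
  return out;
}

// Producers encode batches concurrently; the single consumer drains the
// futures in order, so the one output file keeps its row order.
std::string EncodeAllParallel(const std::vector<std::vector<int>>& batches) {
  std::vector<std::future<std::string>> futures;
  for (const auto& b : batches)
    futures.push_back(std::async(std::launch::async, EncodeBatch, b));
  std::string file;
  for (auto& f : futures) file += f.get();  // in-order single-writer drain
  return file;
}
```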
[jira] [Resolved] (IMPALA-1034) Verify RSS stays within mem_limit + JVM heapsize
[ https://issues.apache.org/jira/browse/IMPALA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1034. --- Resolution: Won't Do When stress testing this is generally true. I don't think there's a particularly robust way to test this automatically. > Verify RSS stays within mem_limit + JVM heapsize > > > Key: IMPALA-1034 > URL: https://issues.apache.org/jira/browse/IMPALA-1034 > Project: IMPALA > Issue Type: Test > Components: Backend >Affects Versions: Impala 1.3.1 >Reporter: Alan Choi >Priority: Major > Labels: resource-management > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-988) Join strategy (broadcast vs shuffle) decision does not take memory consumption and other joins into account
[ https://issues.apache.org/jira/browse/IMPALA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-988: - Description: The amount of available memory changes the trade-off between partitioned and shuffle join strategies: if switching to shuffle join can avoid spilling to disk, it may be worth paying the cost of the additional network transfer. There are two issues: 1. Join strategy decision only takes query mem-limit into account but ignore process mem-limit. 2. Join strategy decision does not take other joins of the same query into account. When multiple joins are present, memory consumption can be very high. I ([~tarmstr...@cloudera.com]) don't think we should attempt to fix #1 - there's a phase ordering problem here - we currently choose the best-performing plan then decide how much memory to allocate in admission control based on that plan. We can't preserve that while attempting to change the plan to fit the mem_limit. That said, I think the current heuristic is a little too aggressive about picking broadcast when the right side is very large - it should probably bias more towards shuffle as the right side gets larger. Note that when IMPALA-3200 is completed, this shouldn't prevent the query running to completion, but still affects performance. was: The amount of available memory changes the trade-off between partitioned and shuffle join strategies: if switching to shuffle join can avoid spilling to disk, it may be worth paying the cost of the additional network transfer. There are two issues: 1. Join strategy decision only takes query mem-limit into account but ignore process mem-limit. 2. Join strategy decision does not take other joins of the same query into account. When multiple joins are present, it'll go over the mem-limit. Note that when IMPALA-3200 is completed, this shouldn't prevent the query running to completion, but still affects performance. 
> Join strategy (broadcast vs shuffle) decision does not take memory > consumption and other joins into account > --- > > Key: IMPALA-988 > URL: https://issues.apache.org/jira/browse/IMPALA-988 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 1.2.1 >Reporter: Alan Choi >Priority: Minor > Labels: resource-management > > The amount of available memory changes the trade-off between partitioned and > shuffle join strategies: if switching to shuffle join can avoid spilling to > disk, it may be worth paying the cost of the additional network transfer. > There are two issues: > 1. Join strategy decision only takes query mem-limit into account but ignore > process mem-limit. > 2. Join strategy decision does not take other joins of the same query into > account. When multiple joins are present, memory consumption can be very high. > I ([~tarmstr...@cloudera.com]) don't think we should attempt to fix #1 - > there's a phase ordering problem here - we currently choose the > best-performing plan then decide how much memory to allocate in admission > control based on that plan. We can't preserve that while attempting to change > the plan to fit the mem_limit. That said, I think the current heuristic is a > little too aggressive about picking broadcast when the right side is very > large - it should probably bias more towards shuffle as the right side gets > larger. > Note that when IMPALA-3200 is completed, this shouldn't prevent the query > running to completion, but still affects performance. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
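[Editor's note] The trade-off described in IMPALA-988 can be illustrated with a minimal sketch. This is not Impala's planner code; the function name, the cost model, and the constants are invented for illustration. It shows the two quantities the issue says should both be considered: network cost and per-node hash-table memory.

```python
def choose_join_strategy(lhs_bytes, rhs_bytes, num_nodes, mem_limit_per_node):
    """Illustrative broadcast-vs-shuffle heuristic (not Impala's actual code).

    Broadcast copies the full right side to every node; shuffle transfers
    each side across the network once but builds a much smaller per-node
    hash table (~1/num_nodes of the right side).
    """
    broadcast_net_cost = rhs_bytes * num_nodes   # right side copied to each node
    shuffle_net_cost = lhs_bytes + rhs_bytes     # each side transferred once
    broadcast_mem = rhs_bytes                    # whole build side per node
    shuffle_mem = rhs_bytes / num_nodes          # build side split across nodes

    # Prefer broadcast only when it is cheaper on the network AND its hash
    # table fits in memory. Per the issue, the real heuristic considers only
    # the query mem-limit and ignores the process mem-limit and other joins.
    if broadcast_net_cost < shuffle_net_cost and broadcast_mem <= mem_limit_per_node:
        return "BROADCAST"
    return "SHUFFLE"
```

With a small right side, broadcast wins on network cost; as the right side grows, the memory check pushes the decision toward shuffle, which is the bias the comment above asks for.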
[jira] [Updated] (IMPALA-988) Join strategy (broadcast vs shuffle) decision does not take memory consumption and other joins into account
[ https://issues.apache.org/jira/browse/IMPALA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-988: - Summary: Join strategy (broadcast vs shuffle) decision does not take memory consumption and other joins into account (was: Join strategy (broadcast vs shuffle) decision does not take mem limit and other joins into account) > Join strategy (broadcast vs shuffle) decision does not take memory > consumption and other joins into account > --- > > Key: IMPALA-988 > URL: https://issues.apache.org/jira/browse/IMPALA-988 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 1.2.1 >Reporter: Alan Choi >Priority: Minor > Labels: resource-management > > The amount of available memory changes the trade-off between partitioned and > shuffle join strategies: if switching to shuffle join can avoid spilling to > disk, it may be worth paying the cost of the additional network transfer. > There are two issues: > 1. Join strategy decision only takes query mem-limit into account but ignore > process mem-limit. > 2. Join strategy decision does not take other joins of the same query into > account. When multiple joins are present, it'll go over the mem-limit. > Note that when IMPALA-3200 is completed, this shouldn't prevent the query > running to completion, but still affects performance. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-975) Improve RunShellProcess() virtual memory behaviour
[ https://issues.apache.org/jira/browse/IMPALA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-975. -- Resolution: Won't Fix Not so much of an issue now that we don't fork after startup - see IMPALA-5734 > Improve RunShellProcess() virtual memory behaviour > -- > > Key: IMPALA-975 > URL: https://issues.apache.org/jira/browse/IMPALA-975 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 1.3.1 >Reporter: Henry Robinson >Priority: Minor > > Impala uses {{popen()}} to create a new process to call {{kinit}}. This may > not be wise (i.e. may return {{ENOMEM}}) when there's a lot of virtual memory > used by Impala. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-5734) Don't call fork() when the process VM size may be very large.
[ https://issues.apache.org/jira/browse/IMPALA-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-5734. --- Resolution: Fixed > Don't call fork() when the process VM size may be very large. > - > > Key: IMPALA-5734 > URL: https://issues.apache.org/jira/browse/IMPALA-5734 > Project: IMPALA > Issue Type: Epic >Reporter: Tim Armstrong >Priority: Major > > We should avoid doing this because it can lead to OOMs with certain > vm.overcommit settings. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-967) Investigate wide table performance
[ https://issues.apache.org/jira/browse/IMPALA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-967. -- Resolution: Won't Fix This is kinda open ended. We fixed a bunch of perf issues in this area so I'll close for now. > Investigate wide table performance > -- > > Key: IMPALA-967 > URL: https://issues.apache.org/jira/browse/IMPALA-967 > Project: IMPALA > Issue Type: Improvement > Components: Perf Investigation >Affects Versions: Impala 1.3 >Reporter: Skye Wanderman-Milne >Priority: Minor > Labels: codegen > > Querying wide tables (very roughly 1000+ columns) is very slow. It looks like > the time is spent in planning and/or codegen, and that the time increases > worse than linearly with the number of columns. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-714) Improve error messages for Analysis Exceptions
[ https://issues.apache.org/jira/browse/IMPALA-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-714. -- Resolution: Won't Fix > Improve error messages for Analysis Exceptions > -- > > Key: IMPALA-714 > URL: https://issues.apache.org/jira/browse/IMPALA-714 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 1.2.1 >Reporter: Udai Kiran Potluri >Priority: Minor > > For example, the exception below does not make it easy to understand the root > cause of why it failed to load metadata. > {code} > ERROR: AnalysisException: Failed to load metadata for table: default.abc > CAUSED BY: TableLoadingException: Failed to load metadata for table: abc > CAUSED BY: InvalidStorageDescriptorException: Unsupported SerDe: > org.apache.hadoop.hive.contrib.serde2.RegexSerDe > {code} > It would be nice to bubble up the root cause in this case "Unsupported SerDe: > org.apache.hadoop.hive.contrib.serde2.RegexSerDe" to the top in the shell, > while still maintaining the whole trace in the logs. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
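[Editor's note] The "bubble up the root cause" idea in IMPALA-714 amounts to walking the exception cause chain to its innermost link. A minimal Python sketch follows; Python's `__cause__` stands in for Java's `Throwable.getCause()`, and the messages mirror the AnalysisException example above. None of this is Impala code.

```python
def root_cause(exc):
    """Walk an exception's cause chain (Python's __cause__, analogous to
    Java's getCause()) and return the innermost exception -- the one worth
    showing first in the shell, while the full chain goes to the logs."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

# Build a chain like the AnalysisException example quoted above:
# outer wrapper CAUSED BY the specific SerDe error.
inner = ValueError(
    "Unsupported SerDe: org.apache.hadoop.hive.contrib.serde2.RegexSerDe")
try:
    raise RuntimeError("Failed to load metadata for table: abc") from inner
except RuntimeError as e:
    caught = e

print(root_cause(caught))  # prints the SerDe error, not the generic wrapper
```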
[jira] [Resolved] (IMPALA-725) "errors" printed to the shell are sometimes warnings
[ https://issues.apache.org/jira/browse/IMPALA-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-725. -- Resolution: Fixed Fixed by IMPALA-5474 > "errors" printed to the shell are sometimes warnings > > > Key: IMPALA-725 > URL: https://issues.apache.org/jira/browse/IMPALA-725 > Project: IMPALA > Issue Type: Improvement > Components: Clients >Affects Versions: Impala 1.2.3 >Reporter: Skye Wanderman-Milne >Priority: Minor > > We print runtime errors to the shell, prepended with "ERRORS ENCOUNTERED > DURING EXECUTION". This is potentially confusing when the message is actually > a warning (e.g. "Parquet file should not be split into multiple > hdfs-blocks"). It might be useful to separately log errors and warnings, or > at least to change the warning messages to indicate that the query still ran > successfully. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
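[Editor's note] The separation IMPALA-725 asks for can be sketched as a small classifier over runtime messages. The pattern list and message texts below are hypothetical, not impala-shell's actual logic; the point is simply bucketing warnings apart from genuine errors before printing.

```python
# Hypothetical substrings that mark a message as a warning rather than an
# error; a real client would get a severity flag from the server instead.
WARNING_PATTERNS = ["should not be split", "truncated"]

def classify(messages):
    """Split runtime messages into (warnings, errors) for separate display."""
    warnings, errors = [], []
    for msg in messages:
        bucket = warnings if any(p in msg for p in WARNING_PATTERNS) else errors
        bucket.append(msg)
    return warnings, errors

warnings, errors = classify([
    "Parquet file should not be split into multiple hdfs-blocks",
    "Scan failed on host h1",
])
```

With this split the shell can print warnings under a separate heading, making clear the query itself still ran successfully.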
[jira] [Resolved] (IMPALA-708) optimize hdfs-table-sink output partition hashing
[ https://issues.apache.org/jira/browse/IMPALA-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-708. -- Resolution: Won't Fix See dan's last comment. > optimize hdfs-table-sink output partition hashing > - > > Key: IMPALA-708 > URL: https://issues.apache.org/jira/browse/IMPALA-708 > Project: IMPALA > Issue Type: Task > Components: Backend >Affects Versions: Impala 1.0, Impala 1.2 >Reporter: Nong Li >Priority: Minor > Labels: poc > > Looking at some basic profiling while doing an unpartitioned insert, it looks > like we have some very low hanging fruit: > 226 16.2% 16.2% 226 16.2% > boost::unordered_detail::hash_table::find_iterator > <-- Need to track down where this is (we need better cluster tools) but > this seems like a big waste of time. > 178 12.8% 29.0% 178 12.8% > impala::HdfsParquetTableWriter::AppendRowBatch > 157 11.3% 40.3% 157 11.3% > impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9ff700 > 131 9.4% 49.7% 131 9.4% __strncmp_sse42 > 129 9.3% 59.0% 133 9.6% impala::TextConverter::WriteSlot > 109 7.8% 66.9% 109 7.8% > impala::DelimitedTextParser::ParseFieldLocations > 94 6.8% 73.6% 94 6.8% snappy::internal::CompressFragment > 71 5.1% 78.7% 71 5.1% > impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9fca90 > 56 4.0% 82.7% 56 4.0% impala::HdfsScanner::WriteCompleteTuple > 36 2.6% 85.3% 36 2.6% > impala::HdfsParquetTableWriter::ColumnWriter::EncodeValue@9fd3f0 > 34 2.4% 87.8% 34 2.4% impala::HashUtil::Hash > 34 2.4% 90.2% 34 2.4% > impala::StringParser::StringToIntInternal@801fd0 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-682) Improve statestore network performance with large topics
[ https://issues.apache.org/jira/browse/IMPALA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-682. -- Resolution: Won't Fix With the local catalog improvements, the metadata topic only contains invalidations. So no need to invest in this. > Improve statestore network performance with large topics > > > Key: IMPALA-682 > URL: https://issues.apache.org/jira/browse/IMPALA-682 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec >Affects Versions: Impala 1.2.2 >Reporter: Henry Robinson >Assignee: Henry Robinson >Priority: Minor > > When the statestore has a large topic to transmit (e.g. over 100MB), a lot of > network bandwidth will be used. This is particularly acute at startup, when > many subscribers are competing for the complete version of a single large > topic. > A lot of the statestore's content is textual, and therefore likely to be very > compressible. We can use Thrift's {{TZlibTransport}} to transparently > compress large topics, but the problem then is that we'll be doing a lot of > redundant work to compress the same topic many times. > Instead, maybe we will have to serialise the Topic's thrift structure to a > byte string (much as we do for topic values), and then compress it in a > single thread. We can do this repeatedly in the background as topics get > updated. There'll be some double-serialisation cost to pay, since presumably > serialising and deserialising Thrift structs in the application involves > another copy, but it should be worth it. > We can also mitigate the startup problem by having subscribers wait for a lot > longer to get their first heartbeat after registration, since the first set > of topic updates is going to be large. -- This message was sent by Atlassian Jira (v8.3.2#803003)
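[Editor's note] The "compress once, serve many" idea from IMPALA-682 can be sketched with zlib. The class and method names are illustrative, not the statestore's API; the design point is that compression happens once per topic update, off the RPC path, instead of once per subscriber.

```python
import zlib

class TopicCache:
    """Illustrative sketch: cache one compressed copy of a large topic and
    hand the same bytes to every subscriber, avoiding redundant compression
    of identical payloads."""

    def __init__(self):
        self._compressed = None

    def update(self, topic_bytes):
        # Done once per topic update, e.g. in a background thread.
        self._compressed = zlib.compress(topic_bytes, level=6)

    def payload_for_subscriber(self):
        # Every subscriber receives the already-compressed copy.
        return self._compressed

cache = TopicCache()
# Catalog-style topics are largely textual and repetitive, so they
# compress well -- the scenario the issue describes.
cache.update(b"catalog-topic " * 10000)
```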
[jira] [Resolved] (IMPALA-685) Add method in FunctionContext to support loading binaries in HDFS
[ https://issues.apache.org/jira/browse/IMPALA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-685. -- Resolution: Later Closing some JIRAs that have useful suggestions but limited interest. We can reopen if there is more interest. > Add method in FunctionContext to support loading binaries in HDFS > - > > Key: IMPALA-685 > URL: https://issues.apache.org/jira/browse/IMPALA-685 > Project: IMPALA > Issue Type: Task > Components: Backend >Affects Versions: Impala 1.2, Impala 2.5.0 >Reporter: Nong Li >Priority: Minor > > We need to add the ability to load additional .so's from the FunctionContext > object. There is a partner > that needs to be able to load additional .so from a UDF. This would just be a > thin wrapper around our > library cache. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-661) Explain plan should characterize the cost of evaluating predicates
[ https://issues.apache.org/jira/browse/IMPALA-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-661. -- Resolution: Later Closing some JIRAs that have useful suggestions but limited interest. We can reopen if there is more interest. > Explain plan should characterize the cost of evaluating predicates > -- > > Key: IMPALA-661 > URL: https://issues.apache.org/jira/browse/IMPALA-661 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Affects Versions: Impala 1.1.1, Impala 2.5.0 >Reporter: Alan Choi >Priority: Minor > > Explain plan is an effective tool for tuning queries. However, it doesn't give > much insight into the (cpu) cost of expression evaluation. For complex > predicates, it'll greatly affect the runtime of the query. For example, > a complex regex is very costly to evaluate. > Right now, if a complex join predicate causes the join to slow down, it's not > very easy to tell. > If the explain plan can annotate the cost of predicate evaluation (roughly), then > it can guide users to identify and tune the predicates. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
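[Editor's note] A rough sketch of what IMPALA-661 proposes: annotate each predicate in an explain plan with a relative evaluation cost. The cost table below is invented for illustration (a real planner would derive per-expression costs); it only demonstrates how a cheap equality and an expensive regex would be distinguished.

```python
# Hypothetical relative per-row cost units by operator type. A regex match
# is orders of magnitude costlier than a simple comparison, which is the
# case the issue highlights.
EXPR_COST = {"eq": 1, "lt": 1, "like": 10, "regexp": 100}

def predicate_cost(predicates):
    """Sum estimated per-row cost units for a list of (op, column) predicates."""
    return sum(EXPR_COST.get(op, 1) for op, _ in predicates)

def explain_line(predicates):
    """Render one annotated explain-plan line for the given predicates."""
    rendered = ", ".join("%s %s" % (col, op) for op, col in predicates)
    return "predicates: %s (est. cost/row: %d units)" % (
        rendered, predicate_cost(predicates))

print(explain_line([("regexp", "url"), ("eq", "id")]))
```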
[jira] [Resolved] (IMPALA-653) Add basic filtering to SHOW STATS command.
[ https://issues.apache.org/jira/browse/IMPALA-653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-653. -- Resolution: Later Will close since there seems to be limited interest > Add basic filtering to SHOW STATS command. > -- > > Key: IMPALA-653 > URL: https://issues.apache.org/jira/browse/IMPALA-653 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 1.2 >Reporter: Alexander Behm >Priority: Minor > > Enhance SHOW TABLE STATS with the capability to filter on partitions. > Enhance SHOW COLUMN STATS with the capability to filter on columns. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-609) Impala should provide a more accurate error message when region server is unhealthy - fails with NoClassDefFoundError
[ https://issues.apache.org/jira/browse/IMPALA-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-609. -- Resolution: Cannot Reproduce > Impala should provide a more accurate error message when region server is > unhealthy - fails with NoClassDefFoundError > - > > Key: IMPALA-609 > URL: https://issues.apache.org/jira/browse/IMPALA-609 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 1.1 > Environment: CentOS release 6.4 (Final) > CDH: CDH 4.4.0-1.cdh4.4.0.p0.39 > JDK: java version "1.7.0_21" >Reporter: Tony Xu >Priority: Minor > Labels: impala > > When running query in Impala (Command line) "select name from product where > manufacturername = "Cat's Pride" limit 1;", Impala returns error message: > "org.apache.hadoop.ipc.RemoteException(java.lang.NoClassDefFoundError): IPC > server unable to read call parameters: Could not initialize class > org.apache.hadoop.hbase.util.Classes" > Note: > 1. Basic "select * from table_name limit 1;" works in both command line and > Hue Impala UI. > 2. We used CM to install Impala, "All Services" -> "Actions" -> "Add a > Service" -> Then choose Impala for the initial installation then updated > through "Parcels" interface. > 3. The same query works in hive. > Cause: > One of the "region server" couldn't talk to the Hbase master, there are also > error messages in Hbase log file on the problematic node. The error message I > caught from Hbase log is: > = > WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for > client 10.6.70.3. > = > 10.6.70.3 is the Hbase master's IP address. > Impala should have failed with a better error message than > "NoClassDefFoundError". > Impala log: > http://paste.ubuntu.com/6179531/ > Hbase unhealthy canary report log: > http://paste.ubuntu.com/6179569/ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-518) when a query skips bytes or encounters errors in HDFS files, include the details in the profile
[ https://issues.apache.org/jira/browse/IMPALA-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-518. -- Resolution: Later > when a query skips bytes or encounters errors in HDFS files, include the > details in the profile > --- > > Key: IMPALA-518 > URL: https://issues.apache.org/jira/browse/IMPALA-518 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 1.0.1 >Reporter: Chris Leroy >Priority: Minor > Labels: observability, supportability > > We'd like to be able to collect information on the actual files in HDFS that > contain the problematic data when a query hits such data. It's fine to stop > reporting once you hit some reporting limit, but it'd be nice to be able to > point to the specific problematic files. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
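[Editor's note] The capped reporting IMPALA-518 describes ("stop reporting once you hit some limit, but point to the specific files") can be sketched as a small accumulator. Class and method names are hypothetical, not Impala's backend API.

```python
class FileErrorReporter:
    """Record which files had problems, keep detail up to a limit, but keep
    counting past it so the summary can say how many were omitted."""

    def __init__(self, limit=5):
        self.limit = limit
        self.files = []   # paths recorded in detail
        self.total = 0    # total problematic files seen

    def report(self, path):
        self.total += 1
        if len(self.files) < self.limit:
            self.files.append(path)

    def summary(self):
        lines = ["problematic file: %s" % f for f in self.files]
        omitted = self.total - len(self.files)
        if omitted > 0:
            lines.append("... and %d more (reporting limit reached)" % omitted)
        return lines

r = FileErrorReporter(limit=2)
for p in ["/data/t/f1.parq", "/data/t/f2.parq", "/data/t/f3.parq"]:
    r.report(p)
```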
[jira] [Resolved] (IMPALA-536) Verbose mode in the shell
[ https://issues.apache.org/jira/browse/IMPALA-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-536. -- Resolution: Later > Verbose mode in the shell > - > > Key: IMPALA-536 > URL: https://issues.apache.org/jira/browse/IMPALA-536 > Project: IMPALA > Issue Type: New Feature > Components: Clients >Affects Versions: Impala 1.1 >Reporter: John Russell >Priority: Minor > > In normal usage, I appreciate the concise output of impala-shell compared to > the Hive shell. Sometimes during debugging, when I try operations in Hive, I > do find it convenient to see messages about processes, files, and URLs > printed directly in the Hive shell. Could we have a query option for > impala-shell to enable a 'verbose' mode? > For example, after an INSERT it could print the HDFS URI(s) of data files > created. (When Impala write operations fail, it can be inconvenient to try > and track down files that need to be cleaned up by browsing through the HDFS > directory tree.) After a query, it could print the URL pointing to the web UI > where one could see the relevant log info. There are probably many other > possibilities that we could think of once the basic mechanism was in place. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (IMPALA-462) ALTER DATABASE statement
[ https://issues.apache.org/jira/browse/IMPALA-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930919#comment-16930919 ] Tim Armstrong commented on IMPALA-462: -- IMPALA-7016 added the statement but not the support for the two requested statements. > ALTER DATABASE statement > > > Key: IMPALA-462 > URL: https://issues.apache.org/jira/browse/IMPALA-462 > Project: IMPALA > Issue Type: New Feature > Components: Catalog >Affects Versions: Impala 1.1, Impala 2.3.0 >Reporter: John Russell >Priority: Minor > Labels: ramp-up > Fix For: Product Backlog > > > I suggest adding an ALTER DATABASE statement, for completeness and future > expansion. > Currently, Hive has ALTER DATABASE that AFAICT only allows a SET clause to > change properties. > One logical syntax / use case for an Impala ALTER DATABASE would be: > ALTER DATABASE old_name RENAME TO new_name; > (OK to disallow for the DEFAULT database or the currently USEd database.) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-416) Impala and Hive's unescaping of octal escape sequences is flawed.
[ https://issues.apache.org/jira/browse/IMPALA-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930917#comment-16930917 ] Tim Armstrong commented on IMPALA-416: -- Confirmed this is still a bug > Impala and Hive's unescaping of octal escape sequences is flawed. > - > > Key: IMPALA-416 > URL: https://issues.apache.org/jira/browse/IMPALA-416 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 1.0 >Reporter: Alexander Behm >Priority: Minor > > Impala reuses Hive's code to do deal with escape sequences, so both Hive and > Impala are affected. > Octal values > 127 fail to unescape properly. This may lead to problems if, > e.g., a user has text data with exotic single-byte delimiters (UTF-8 single > byte or ASCII extended). > For example, > select "\127" returns "W > but > select "\128" returns "128" -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
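As a sketch of the intended semantics (not Hive's actual shared Java code), a minimal octal unescaper can be written with a regex substitution. The helper name is illustrative; note that under C-style rules `"\128"` parses as the escape `\12` (octal 12 = newline) followed by a literal `8`, and values above 127 such as `\200` still map to a single character, which is the case the report says is mishandled:

```python
import re

def unescape_octal(s):
    # Decode backslash-octal escapes (1-3 octal digits) into the
    # corresponding character. re.sub matches the longest run of octal
    # digits (up to 3), so "8" and "9" never join an escape, and values
    # above 127 (e.g. \200) decode to a single character rather than
    # being passed through literally.
    return re.sub(r'\\([0-7]{1,3})',
                  lambda m: chr(int(m.group(1), 8)),
                  s)
```

With this decoder, `"\127"` yields `W` (octal 127 = decimal 87), matching the working case in the report, while `"\200"` yields the single byte 0x80 instead of the literal text the bug produces.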
[jira] [Comment Edited] (IMPALA-8587) Show inherited privileges in show grant w/ Ranger
[ https://issues.apache.org/jira/browse/IMPALA-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930910#comment-16930910 ] Fang-Yu Rao edited comment on IMPALA-8587 at 9/16/19 10:27 PM: --- After testing the proposed patch, I found that even if we log in to impalad via the Impala shell as a non-Ranger super user, the execution of the SQL statement could still succeed. For example, if we log in to impalad as a user using {code:java} ./bin/impala-shell.sh -u random_user; {code} the following SQL statement could still succeed. {code:java} show grant user admin on database functional; {code} This seems like a bug since a user that does not correspond to a Ranger super user should not be able to execute this SQL statement successfully. was (Author: fangyurao): After testing the proposed patch, I found that even if we log in to impalad via the Impala shell as a non-Ranger super user, the execution of the SQL statement could still succeed. For example, if we log in to impalad as a user using {code:java} ./bin/impala-shell.sh -u random_user; {code} the following SQL statement could still succeed. {code:java} show grant user admin on database functional; {code} This seems like a bug. > Show inherited privileges in show grant w/ Ranger > - > > Key: IMPALA-8587 > URL: https://issues.apache.org/jira/browse/IMPALA-8587 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Austin Nobis >Assignee: Fang-Yu Rao >Priority: Critical > > If an admin has privileges from: > *grant all on server to user admin;* > > Currently the command below will show no results: > *show grant user admin on database functional;* > > After the change, the user should see server level privileges from: > *show grant user admin on database functional;* > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8587) Show inherited privileges in show grant w/ Ranger
[ https://issues.apache.org/jira/browse/IMPALA-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930910#comment-16930910 ] Fang-Yu Rao commented on IMPALA-8587: - After testing the proposed patch, I found that even if we log in to impalad via the Impala shell as a non-Ranger super user, the execution of the SQL statement could still succeed. For example, if we log in to impalad as a user using {code:java} ./bin/impala-shell.sh -u random_user; {code} the following SQL statement could still succeed. {code:java} show grant user admin on database functional; {code} This seems like a bug. > Show inherited privileges in show grant w/ Ranger > - > > Key: IMPALA-8587 > URL: https://issues.apache.org/jira/browse/IMPALA-8587 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Austin Nobis >Assignee: Fang-Yu Rao >Priority: Critical > > If an admin has privileges from: > *grant all on server to user admin;* > > Currently the command below will show no results: > *show grant user admin on database functional;* > > After the change, the user should see server level privileges from: > *show grant user admin on database functional;* > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
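The check the comment argues is missing can be sketched as a small predicate; the function name, parameters, and admin set below are hypothetical stand-ins, not Impala's or Ranger's actual API:

```python
def may_show_grant(requesting_user, target_user, ranger_admins):
    # Hypothetical authorization gate for "show grant user <target_user>":
    # only a Ranger administrator, or the target user inspecting their own
    # grants, should be allowed through. A plain user asking about another
    # user (the random_user -> admin case in the comment) is rejected.
    return requesting_user in ranger_admins or requesting_user == target_user
```

Under this rule, `random_user` running `show grant user admin on database functional;` would fail rather than succeed, which is the behavior the comment expects.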
[jira] [Resolved] (IMPALA-5234) Get rid of redundant LogError() messages
[ https://issues.apache.org/jira/browse/IMPALA-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-5234. --- Resolution: Later This has been dormant for a while. Not sure that it's a real problem in practice. > Get rid of redundant LogError() messages > > > Key: IMPALA-5234 > URL: https://issues.apache.org/jira/browse/IMPALA-5234 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Sailesh Mukil >Priority: Major > Labels: errorhandling > > In a few places in the codebase, there are redundant LogError() calls that > add error statuses to the error_log AND return the same error status up the > call stack. This results in the same error message being sent back twice to > the client. We need to find all such cases and remove these redundant > LogError() calls. > Repro: > set mem_limit=1m; > select * from tpch.lineitem; > Output: > {code} > [localhost:21000] > select * from tpch.lineitem; > Query: select * from tpch.lineitem > Query submitted at: 2017-04-20 12:04:22 (Coordinator: http://localhost:25000) > Query progress can be monitored at: > http://localhost:25000/query_plan?query_id=6048492f67282f78:ef0f2bd4 > WARNINGS: Memory limit exceeded: Failed to allocate tuple buffer > HDFS_SCAN_NODE (id=0) could not allocate 190.00 KB without exceeding limit. > Error occurred on backend localhost:22000 by fragment > 6048492f67282f78:ef0f2bd40003 > Memory left in process limit: 8.24 GB > Memory left in query limit: -7369392.00 B > Query(6048492f67282f78:ef0f2bd4): memory limit exceeded. 
Limit=1.00 > MB Total=8.03 MB Peak=8.03 MB > Fragment 6048492f67282f78:ef0f2bd4: Total=8.00 KB Peak=8.00 KB > EXCHANGE_NODE (id=1): Total=0 Peak=0 > DataStreamRecvr: Total=0 Peak=0 > PLAN_ROOT_SINK: Total=0 Peak=0 > CodeGen: Total=0 Peak=0 > Block Manager: Total=0 Peak=0 > Fragment 6048492f67282f78:ef0f2bd40003: Total=8.02 MB Peak=8.02 MB > HDFS_SCAN_NODE (id=0): Total=8.01 MB Peak=8.01 MB > DataStreamSender (dst_id=1): Total=688.00 B Peak=688.00 B > CodeGen: Total=0 Peak=0 > Memory limit exceeded: Failed to allocate tuple buffer > HDFS_SCAN_NODE (id=0) could not allocate 190.00 KB without exceeding limit. > Error occurred on backend localhost:22000 by fragment > 6048492f67282f78:ef0f2bd40003 > Memory left in process limit: 8.24 GB > Memory left in query limit: -7369392.00 B > Query(6048492f67282f78:ef0f2bd4): memory limit exceeded. Limit=1.00 > MB Total=8.03 MB Peak=8.03 MB > Fragment 6048492f67282f78:ef0f2bd4: Total=8.00 KB Peak=8.00 KB > EXCHANGE_NODE (id=1): Total=0 Peak=0 > DataStreamRecvr: Total=0 Peak=0 > PLAN_ROOT_SINK: Total=0 Peak=0 > CodeGen: Total=0 Peak=0 > Block Manager: Total=0 Peak=0 > Fragment 6048492f67282f78:ef0f2bd40003: Total=8.02 MB Peak=8.02 MB > HDFS_SCAN_NODE (id=0): Total=8.01 MB Peak=8.01 MB > DataStreamSender (dst_id=1): Total=688.00 B Peak=688.00 B > CodeGen: Total=0 Peak=0 > {code} > This can be traced back to: > https://github.com/apache/incubator-impala/blob/a50c344077f6c9bbea3d3cbaa2e9146ba20ac9a9/be/src/runtime/row-batch.cc#L462 > https://github.com/apache/incubator-impala/blob/master/be/src/runtime/mem-tracker.cc#L319-L320 > There are more such examples that need to be taken care of too. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
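The pattern described above (and a return-only alternative) can be sketched in a few lines; the class and function names are illustrative stand-ins for Impala's C++ RuntimeState/LogError machinery, not its actual API:

```python
class RuntimeState:
    """Minimal stand-in for the runtime state that owns the error log."""
    def __init__(self):
        self.error_log = []

def alloc_with_redundant_log(state):
    # Anti-pattern from the ticket: the failing call appends the status to
    # the error log AND returns it up the call stack, so a client that is
    # shown both the error log and the returned status reads it twice.
    status = "Memory limit exceeded"
    state.error_log.append(status)
    return status

def alloc_return_only(state):
    # Sketch of the fix: only return the status, and let a single layer
    # decide whether it lands in the error log or goes to the client.
    return "Memory limit exceeded"
```

Counting occurrences of the message across the error log plus the returned status shows the duplication in the first variant and its absence in the second.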
[jira] [Closed] (IMPALA-8945) Impala Doc: Incorrect Claim of Equivalence in Impala Docs
[ https://issues.apache.org/jira/browse/IMPALA-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni closed IMPALA-8945. --- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Impala Doc: Incorrect Claim of Equivalence in Impala Docs > - > > Key: IMPALA-8945 > URL: https://issues.apache.org/jira/browse/IMPALA-8945 > Project: IMPALA > Issue Type: Bug > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Fix For: Impala 3.4.0 > > > Reported by [~icook] > The Impala docs entry for the IS DISTINCT FROM operator states: > The <=> operator, used like an equality operator in a join query, is more > efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The > <=> operator can use a hash join, while the OR expression cannot. > But this expression is not equivalent to A <=> B. See the attached screenshot > demonstrating their non-equivalence. An expression that is equivalent to A > <=> B is this: > (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B)) > This expression should replace the existing incorrect expression. > Another expression that is equivalent to A <=> B is: > if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B) > This one is a bit easier to follow. If you use this one in the docs, just > replace the following line with: > The <=> operator can use a hash join, while the if expression cannot. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
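The non-equivalence hinges on SQL's three-valued logic. A small Python model, with None standing in for NULL, makes the difference concrete (a sketch of the semantics, not Impala's implementation):

```python
from itertools import product

def null_safe_eq(a, b):
    # Semantics of <=> (IS NOT DISTINCT FROM): NULL <=> NULL is true,
    # NULL against a non-NULL value is false, otherwise plain equality.
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

def corrected_rewrite(a, b):
    # The corrected expression from the report:
    # (A IS NULL AND B IS NULL) OR
    # (A IS NOT NULL AND B IS NOT NULL AND A = B)
    return (a is None and b is None) or (
        a is not None and b is not None and a == b)

def incorrect_rewrite(a, b):
    # The expression the docs previously claimed equivalent:
    # A = B OR (A IS NULL AND B IS NULL). Under three-valued logic,
    # A = B evaluates to NULL (not false) when exactly one side is NULL,
    # so the OR yields NULL rather than the FALSE that <=> produces.
    eq = None if (a is None or b is None) else a == b
    both_null = a is None and b is None
    if eq is True or both_null:
        return True
    return None if eq is None else False
```

Exhaustively comparing the three over {NULL, 1, 2} confirms the corrected rewrite matches `<=>` everywhere, while the old expression diverges exactly when one side is NULL.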
[jira] [Closed] (IMPALA-8930) Impala Doc: Document object ownership with Ranger authorization provider
[ https://issues.apache.org/jira/browse/IMPALA-8930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni closed IMPALA-8930. --- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Impala Doc: Document object ownership with Ranger authorization provider > > > Key: IMPALA-8930 > URL: https://issues.apache.org/jira/browse/IMPALA-8930 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_34 > Fix For: Impala 3.4.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8805) Clarify behaviour around when resource pool config changes take effect
[ https://issues.apache.org/jira/browse/IMPALA-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8805: -- Target Version: Product Backlog (was: Impala 3.4.0) > Clarify behaviour around when resource pool config changes take effect > -- > > Key: IMPALA-8805 > URL: https://issues.apache.org/jira/browse/IMPALA-8805 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: admission-control > > I spent some time playing around with resource pool configs and it's quite > difficult to tell when a config change took effect - I at least was > temporarily convinced that there was a bug where changes *were not* picked up. > I believe I was mistaken and just getting confused by the staleness of the > /admission UI until a query is submitted to the pool, plus my query not > getting routed to the right pool. > * Confirm that all resource pool config changes take effect dynamically when > the next query is submitted > * Document this behaviour > * Investigate what it would take to reflect pool configs in the admission UI > immediately when they take effect > * Add information to the debug UI that indicates that pool configs are > correct as-of the last query submitted to the pool > * Maybe add logging when the config change is picked up > * Maybe add a warning when REQUEST_POOL is specified but the query does not > end up in that pool (e.g. if the pool doesn't exist). -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8945) Impala Doc: Incorrect Claim of Equivalence in Impala Docs
[ https://issues.apache.org/jira/browse/IMPALA-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930848#comment-16930848 ] Alex Rodoni commented on IMPALA-8945: - https://gerrit.cloudera.org/#/c/14239/ > Impala Doc: Incorrect Claim of Equivalence in Impala Docs > - > > Key: IMPALA-8945 > URL: https://issues.apache.org/jira/browse/IMPALA-8945 > Project: IMPALA > Issue Type: Bug > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > > Reported by [~icook] > The Impala docs entry for the IS DISTINCT FROM operator states: > The <=> operator, used like an equality operator in a join query, is more > efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The > <=> operator can use a hash join, while the OR expression cannot. > But this expression is not equivalent to A <=> B. See the attached screenshot > demonstrating their non-equivalence. An expression that is equivalent to A > <=> B is this: > (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B)) > This expression should replace the existing incorrect expression. > Another expression that is equivalent to A <=> B is: > if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B) > This one is a bit easier to follow. If you use this one in the docs, just > replace the following line with: > The <=> operator can use a hash join, while the if expression cannot. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-8384) Insert ACL tests fail on dockerised cluster
[ https://issues.apache.org/jira/browse/IMPALA-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8384. --- Resolution: Won't Fix I think this is a mix of catalogv2 issues, tracked by IMPALA-7539, and test issues, e.g. test_multiple_group_acls I think depends on some assumptions about groups and users that are broken by the impalad running in the container. > Insert ACL tests fail on dockerised cluster > --- > > Key: IMPALA-8384 > URL: https://issues.apache.org/jira/browse/IMPALA-8384 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > {noformat} > $ TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster" impala-py.test > tests/query_test/test_insert_behaviour.py -k acl > ... > tests/query_test/test_insert_behaviour.py::TestInsertBehaviour::test_insert_inherit_acls > xfail > tests/query_test/test_insert_behaviour.py::TestInsertBehaviour::test_insert_acl_permissions > FAILED > tests/query_test/test_insert_behaviour.py::TestInsertBehaviour::test_multiple_group_acls > FAILED > {noformat} > {noformat} > _ > TestInsertBehaviour.test_insert_acl_permissions > __ > tests/query_test/test_insert_behaviour.py:410: in test_insert_acl_permissions > self.execute_query_expect_failure(self.client, insert_query) > tests/common/impala_test_suite.py:607: in wrapper > return function(*args, **kwargs) > tests/common/impala_test_suite.py:629: in execute_query_expect_failure > assert not result.success, "No failure encountered for query %s" % query > E AssertionError: No failure encountered for query INSERT INTO > `test_insert_acl_permissions_4941df88`.`insert_acl_permissions` VALUES(1) > -- > Captured stderr setup > --- > SET > client_identifier=query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_acl_permissions; > -- connecting to: localhost:21000 > -- connecting to localhost:21050 with impyla > Conn > -- 2019-04-03 16:21:43,525 
INFO MainThread: Closing active operation > SET > client_identifier=query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_acl_permissions; > SET sync_ddl=False; > -- executing against localhost:21000 > DROP DATABASE IF EXISTS `test_insert_acl_permissions_4941df88` CASCADE; > -- 2019-04-03 16:21:43,531 INFO MainThread: Started query > 0847d4339b358537:c1c6ad23 > SET > client_identifier=query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_acl_permissions; > SET sync_ddl=False; > -- executing against localhost:21000 > CREATE DATABASE `test_insert_acl_permissions_4941df88`; > -- 2019-04-03 16:21:43,958 INFO MainThread: Started query > 694436ba3cf75303:287b5135 > -- 2019-04-03 16:21:43,966 INFO MainThread: Created database > "test_insert_acl_permissions_4941df88" for test ID > "query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_acl_permissions" > --- > Captured stderr call > --- > -- executing against localhost:21000 > DROP TABLE IF EXISTS > `test_insert_acl_permissions_4941df88`.`insert_acl_permissions`; > -- 2019-04-03 16:21:43,977 INFO MainThread: Started query > ac48b16897d5b622:d96d3999 > -- executing against localhost:21000 > CREATE TABLE `test_insert_acl_permissions_4941df88`.`insert_acl_permissions` > (col int); > -- 2019-04-03 16:21:44,454 INFO MainThread: Started query > e741c32bf0fbf1f5:84e69d42 > -- executing against localhost:21000 > INSERT INTO `test_insert_acl_permissions_4941df88`.`insert_acl_permissions` > VALUES(1); > -- 2019-04-03 16:21:47,252 INFO MainThread: Started query > cb44c5c675b81864:3dfa1a50 > -- 2019-04-03 16:21:47,371 INFO MainThread: Starting new HTTP connection > (1): 0.0.0.0 > -- 2019-04-03 16:21:47,381 INFO MainThread: Starting new HTTP connection > (1): 0.0.0.0 > -- executing against localhost:21000 > REFRESH `test_insert_acl_permissions_4941df88`.`insert_acl_permissions`; > -- 2019-04-03 16:21:47,482 INFO MainThread: Started query > b049107c5b287ed1:636e8735 > -- 
executing against localhost:21000 > INSERT INTO `test_insert_acl_permissions_4941df88`.`insert_acl_permissions` > VALUES(1); > -- 2019-04-03 16:21:47,625 INFO
[jira] [Work started] (IMPALA-8945) Impala Doc: Incorrect Claim of Equivalence in Impala Docs
[ https://issues.apache.org/jira/browse/IMPALA-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8945 started by Alex Rodoni. --- > Impala Doc: Incorrect Claim of Equivalence in Impala Docs > - > > Key: IMPALA-8945 > URL: https://issues.apache.org/jira/browse/IMPALA-8945 > Project: IMPALA > Issue Type: Bug > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > > Reported by [~icook] > The Impala docs entry for the IS DISTINCT FROM operator states: > The <=> operator, used like an equality operator in a join query, is more > efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The > <=> operator can use a hash join, while the OR expression cannot. > But this expression is not equivalent to A <=> B. See the attached screenshot > demonstrating their non-equivalence. An expression that is equivalent to A > <=> B is this: > (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B)) > This expression should replace the existing incorrect expression. > Another expression that is equivalent to A <=> B is: > if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B) > This one is a bit easier to follow. If you use this one in the docs, just > replace the following line with: > The <=> operator can use a hash join, while the if expression cannot. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
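The non-equivalence reported above comes down to SQL's three-valued logic: `A <=> B` always yields TRUE or FALSE, while `A = B OR (A IS NULL AND B IS NULL)` can yield UNKNOWN (NULL) when exactly one operand is NULL. A minimal Python sketch modeling this (the helper names `sql_eq`, `sql_or`, `null_safe_eq`, and `docs_expr` are illustrative, not part of Impala; `None` stands for SQL NULL and UNKNOWN):

```python
def sql_eq(a, b):
    """SQL '=': UNKNOWN (None) if either operand is NULL, else plain equality."""
    if a is None or b is None:
        return None
    return a == b

def sql_or(x, y):
    """SQL OR under three-valued logic: TRUE dominates, then UNKNOWN."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def null_safe_eq(a, b):
    """A <=> B (IS NOT DISTINCT FROM): never yields UNKNOWN."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

def docs_expr(a, b):
    """The docs' claimed equivalent: A = B OR (A IS NULL AND B IS NULL)."""
    return sql_or(sql_eq(a, b), a is None and b is None)

# A = 1, B = NULL: <=> yields FALSE, but the docs' expression yields UNKNOWN.
print(null_safe_eq(1, None))   # False
print(docs_expr(1, None))      # None (UNKNOWN)

# Both agree when neither side is NULL, or when both are NULL.
print(null_safe_eq(None, None), docs_expr(None, None))  # True True
print(null_safe_eq(1, 2), docs_expr(1, 2))              # False False
```

The corrected expression proposed in the issue, `(A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B))`, avoids this because every disjunct is TRUE or FALSE whenever one operand is NULL, so the OR never produces UNKNOWN.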
[jira] [Work started] (IMPALA-8947) SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric
[ https://issues.apache.org/jira/browse/IMPALA-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8947 started by Tim Armstrong. - > SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric > - > > Key: IMPALA-8947 > URL: https://issues.apache.org/jira/browse/IMPALA-8947 > Project: IMPALA > Issue Type: Bug >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Critical > Labels: supportability > > {noformat} > ERROR: Could not create files in any configured scratch directories > (--scratch_dirs=/path/to/scratch) on backend ':22000'. 69.80 GB of > scratch is currently in use by this Impala Daemon (69.80 GB by this query). > See logs for previous errors that may have prevented creating or writing > scratch files. The following directories were at capacity: /path/to/scratch > {noformat} > This issue is that the total for the impala daemon uses the wrong counter. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-8948) [DOCS] Review "How Impala Works with Hadoop File Formats"
[ https://issues.apache.org/jira/browse/IMPALA-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8948 started by Alex Rodoni. --- > [DOCS] Review "How Impala Works with Hadoop File Formats" > -- > > Key: IMPALA-8948 > URL: https://issues.apache.org/jira/browse/IMPALA-8948 > Project: IMPALA > Issue Type: Bug > Components: Docs >Affects Versions: Impala 3.2.0 >Reporter: Vincent Tran >Assignee: Alex Rodoni >Priority: Minor > > Ref: > [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html] > > In the "Impala Can INSERT?" column of the file type support matrix for Text, > we claim that Impala can insert into a compressed-text table: "Yes: {{CREATE > TABLE}}, {{INSERT}}, {{LOAD DATA}}, and query." > > This doesn't appear to be the case as Impala does not support the writing of > compressed text in any version at the time of this writing. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables
[ https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-8903: Description: https://gerrit.cloudera.org/#/c/14235/ > Impala Doc: TRUNCATE for Insert-only ACID tables > > > Key: IMPALA-8903 > URL: https://issues.apache.org/jira/browse/IMPALA-8903 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_34 > > https://gerrit.cloudera.org/#/c/14235/ -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8948) [DOCS] Review "How Impala Works with Hadoop File Formats"
Vincent Tran created IMPALA-8948: Summary: [DOCS] Review "How Impala Works with Hadoop File Formats" Key: IMPALA-8948 URL: https://issues.apache.org/jira/browse/IMPALA-8948 Project: IMPALA Issue Type: Bug Components: Docs Affects Versions: Impala 3.2.0 Reporter: Vincent Tran Assignee: Alex Rodoni Ref: [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html] In the "Impala Can INSERT?" column of the file type support matrix for Text, we claim that Impala can insert into a compressed-text table: "Yes: {{CREATE TABLE}}, {{INSERT}}, {{LOAD DATA}}, and query." This doesn't appear to be the case as Impala does not support the writing of compressed text in any version at the time of this writing. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables
[ https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8903 started by Alex Rodoni. --- > Impala Doc: TRUNCATE for Insert-only ACID tables > > > Key: IMPALA-8903 > URL: https://issues.apache.org/jira/browse/IMPALA-8903 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_34 > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org