[jira] [Resolved] (IMPALA-4846) Upgrade snappy to 1.1.4
[ https://issues.apache.org/jira/browse/IMPALA-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson resolved IMPALA-4846.
------------------------------------
    Resolution: Fixed
 Fix Version/s: Impala 2.9.0

https://github.com/apache/incubator-impala/commit/aafcda0c9bcea5e316f3cc43cee49551c7be5c60

> Upgrade snappy to 1.1.4
> -----------------------
>
>                 Key: IMPALA-4846
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4846
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Tim Armstrong
>            Assignee: Laszlo Gaal
>              Labels: perf, ramp-up
>             Fix For: Impala 2.9.0
>
> The latest version of Snappy claims a significant perf improvement:
> https://github.com/google/snappy/blob/master/NEWS
> We should pick this up as an easy perf win.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Resolved] (IMPALA-4999) Impala.tests.custom_cluster.test_spilling.TestSpillStress.test_spill_stress failed intermittently
[ https://issues.apache.org/jira/browse/IMPALA-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-4999.
-----------------------------------
    Resolution: Fixed
 Fix Version/s: Impala 2.9.0

IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

The test does not work as intended and would likely not provide very good
coverage of the different spilling paths, because it only runs a single
simple query. The stress test (tests/stress/concurrent_select.py) provides
much better coverage. test_mem_usage_scaling.py probably also provides
better coverage than TestSpillStress in its current form.

Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
Reviewed-on: http://gerrit.cloudera.org:8080/6437
Reviewed-by: Michael Brown
Tested-by: Impala Public Jenkins

> Impala.tests.custom_cluster.test_spilling.TestSpillStress.test_spill_stress
> failed intermittently
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-4999
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4999
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Michael Ho
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: broken-build, flaky
>             Fix For: Impala 2.9.0
>
> {noformat}
> Stacktrace
> self =
> vector =
>     @pytest.mark.stress
>     def test_spill_stress(self, vector):
>       # Number of times to execute each query
>       for i in xrange(vector.get_value('iterations')):
> >       self.run_test_case('agg_stress', vector)
> custom_cluster/test_spilling.py:99:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> common/impala_test_suite.py:359: in run_test_case
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:567: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:339: in __execute_query
>     self.wait_for_completion(handle)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> self =
> query_handle = QueryHandle(log_context='21431d366cf751da:e62f8676',
> id='21431d366cf751da:e62f8676')
>     def wait_for_completion(self, query_handle):
>       """Given a query handle, polls the coordinator waiting for the query to
>       complete"""
>       while True:
>         query_state = self.get_state(query_handle)
>         # if the rpc succeeded, the output is the query state
>         if query_state == self.query_states["FINISHED"]:
>           break
>         elif query_state == self.query_states["EXCEPTION"]:
>           try:
>             error_log = self.__do_rpc(
>               lambda: self.imp_service.get_log(query_handle.log_context))
> >           raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    Query aborted:
> E    Memory limit exceeded
> E    The memory limit is set too low to initialize spilling operator
> E    (id=3). The minimum required memory to spill this operator is 4.25 MB.
> E
> E    Memory Limit Exceeded by fragment: 21431d366cf751da:e62f86760004
> E    Query(21431d366cf751da:e62f8676): Total=260.67 MB Peak=303.73 MB
> E      Fragment 21431d366cf751da:e62f8676000d: Total=27.86 MB Peak=33.97 MB
> E        AGGREGATION_NODE (id=6): Total=8.00 KB Peak=8.00 KB
> E          Exprs: Total=4.00 KB Peak=4.00 KB
> E        AGGREGATION_NODE (id=11): Total=27.82 MB Peak=32.07 MB
> E        EXCHANGE_NODE (id=10): Total=0 Peak=0
> E        DataStreamRecvr: Total=7.52 KB Peak=2.62 MB
> E        DataStreamSender (dst_id=12): Total=16.00 KB Peak=16.00 KB
> E        CodeGen: Total=5.57 KB Peak=395.50 KB
> E      Block Manager: Limit=250.00 MB Total=250.00 MB Peak=250.00 MB
> E      Fragment 21431d366cf751da:e62f8676000a: Total=224.32 MB Peak=228.25 MB
> E        Runtime Filter Bank: Total=1.00 MB Peak=1.00 MB
> E        AGGREGATION_NODE (id=5): Total=80.46 MB Peak=80.46 MB
> E        HASH_JOIN_NODE (id=4): Total=142.73 MB Peak=149.62 MB
> E          Hash Join Builder (join_node_id=4): Total=142.64 MB Peak=149.58 MB
> E        EXCHANGE_NODE (id=8): Total=0 Peak=0
> E        DataStreamRecvr: Total=2.84 KB Peak=23.96 MB
> E        EXCHANGE_NODE (id=9): Total=0 Peak=0
> E        DataStreamRecvr:
[jira] [Created] (IMPALA-5102) Handle uncaught exceptions in Impalad
Michael Ho created IMPALA-5102:
----------------------------------

             Summary: Handle uncaught exceptions in Impalad
                 Key: IMPALA-5102
                 URL: https://issues.apache.org/jira/browse/IMPALA-5102
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Michael Ho
            Assignee: Joe McDonnell
            Priority: Critical

Impalad uses the noexcept APIs of the boost library whenever possible. However,
certain APIs don't implement a noexcept variant. One example is the thread
creation interface:

{noformat}
void Thread::StartThread(const ThreadFunctor& functor) {
  DCHECK(thread_manager.get() != nullptr) << "Thread created before InitThreading called";
  DCHECK(tid_ == UNINITIALISED_THREAD_ID) << "StartThread called twice";
  Promise<int64_t> thread_started;
  thread_.reset(
      new thread(&Thread::SuperviseThread, name_, category_, functor, &thread_started));
  // TODO: This slows down thread creation although not enormously. To make this faster,
  // consider delaying thread_started.Get() until the first call to tid(), but bear in
  // mind that some coordination is required between SuperviseThread() and this to make
  // sure that the thread is still available to have its tid set.
  tid_ = thread_started.Get();
  VLOG(2) << "Started thread " << tid_ << " - " << category_ << ":" << name_;
}
{noformat}

We have been bitten by this kind of uncaught exception in the past (e.g.
IMPALA-3104). Such exceptions are more likely to occur when a large number of
fragment instances are running in an Impala cluster.

There are other uncaught exceptions in the code. Please update this JIRA as we
find more.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
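As an aside, the failure mode described above — thread creation throwing where the caller expects no exception — can be illustrated outside C++. The following is a minimal Python sketch (the helper name `start_thread_checked` is hypothetical, not Impala code); in CPython, `Thread.start()` raises `RuntimeError` when the OS refuses to create a new thread, which is loosely analogous to `boost::thread` throwing `boost::thread_resource_error` inside `Thread::StartThread`:

```python
import threading

def start_thread_checked(target, name):
    """Start a thread, converting a creation failure into a handled error.

    Without the try/except, a failed Thread.start() would propagate an
    uncaught exception up the stack -- in a C++ server, the equivalent
    would reach std::terminate and kill the process.
    """
    t = threading.Thread(target=target, name=name)
    try:
        t.start()
    except RuntimeError as e:
        # Resource exhaustion (e.g. too many threads) surfaces here.
        raise SystemExit("failed to start thread %s: %s" % (name, e))
    return t
```

The point of the wrapper is only to show where the handler must sit: at the call site that creates the thread, since the exception is raised before the new thread exists.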
[jira] [Commented] (IMPALA-2518) DROP DATABASE CASCADE does not remove cache directives of tables
[ https://issues.apache.org/jira/browse/IMPALA-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933328#comment-15933328 ]

Jim Apple commented on IMPALA-2518:
-----------------------------------

Are you sure this was resolved by 2.8?

https://github.com/apache/incubator-impala/tree/2.8.0

> DROP DATABASE CASCADE does not remove cache directives of tables
> ----------------------------------------------------------------
>
>                 Key: IMPALA-2518
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2518
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.2.4
>            Reporter: Dimitris Tsirogiannis
>            Assignee: Dimitris Tsirogiannis
>            Priority: Critical
>              Labels: catalog-server, usability
>             Fix For: Impala 2.8.0
>
> Executing a DROP DATABASE statement with the CASCADE option does not clear
> the cache directives associated with the cached tables contained in the
> dropped database.
> To reproduce:
> {code}
> impala> create database foo;
> impala> use foo;
> impala> create table t1 (a int) cached in 'testPool' with replication = 8;
> shell>  hdfs cacheadmin -listDirectives
> impala> use default;
> impala> drop database foo cascade;
> shell>  hdfs cacheadmin -listDirectives  <-- the output still contains the
>         directive associated with the path of table t1
> {code}
> This has been breaking impala-cdh5.5.x-repeated-runs
> (https://issues.cloudera.org/browse/IMPALA-2510)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Assigned] (IMPALA-4914) TestSpillStress makes flawed assumptions about running concurrently
[ https://issues.apache.org/jira/browse/IMPALA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-4914:
-------------------------------------
    Assignee: Tim Armstrong

> TestSpillStress makes flawed assumptions about running concurrently
> -------------------------------------------------------------------
>
>                 Key: IMPALA-4914
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4914
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 2.9.0
>            Reporter: Michael Brown
>            Assignee: Tim Armstrong
>
> I took a look at TestSpillStress and found a number of problems with it.
> # It's not being run, because its workload isn't being run exhaustively.
> # It can't run, because it doesn't set up a client properly.
> # It looks like it was intended to run in parallel, but custom cluster
>   tests don't run in parallel.
> The evidence for #3 is that it uses the agg_stress workload, which says:
> {noformat}
> # This query forces many joins and aggregations with spilling
> # and can expose race conditions in the spilling code if run in parallel
> {noformat}
> 1 and 2 could be fixed very quickly and were fixed at
> https://gerrit.cloudera.org/#/c/6002/
> 3 is a different matter and requires some thought. We don't have any
> mechanism to run custom cluster tests in parallel. All custom cluster tests
> are serial, even though they lack the serial mark. This is because most
> custom cluster tests restart impalad on every *method*; methods can't run
> in parallel if they restart the cluster on top of each other. As such,
> custom cluster tests are invoked differently than other e2e tests, and the
> invocation does not include {{-n}}, which causes tests to run in parallel.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Resolved] (IMPALA-2518) DROP DATABASE CASCADE does not remove cache directives of tables
[ https://issues.apache.org/jira/browse/IMPALA-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimitris Tsirogiannis resolved IMPALA-2518.
-------------------------------------------
    Resolution: Fixed
 Fix Version/s: Impala 2.8.0

Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88
Reviewed-on: http://gerrit.cloudera.org:8080/5815
Reviewed-by: Dimitris Tsirogiannis
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java
M tests/query_test/test_hdfs_caching.py
3 files changed, 66 insertions(+), 34 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved

> DROP DATABASE CASCADE does not remove cache directives of tables
> ----------------------------------------------------------------
>
>                 Key: IMPALA-2518
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2518
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.2.4
>            Reporter: Dimitris Tsirogiannis
>            Assignee: Dimitris Tsirogiannis
>            Priority: Critical
>              Labels: catalog-server, usability
>             Fix For: Impala 2.8.0
>
> Executing a DROP DATABASE statement with the CASCADE option does not clear
> the cache directives associated with the cached tables contained in the
> dropped database.
> To reproduce:
> {code}
> impala> create database foo;
> impala> use foo;
> impala> create table t1 (a int) cached in 'testPool' with replication = 8;
> shell>  hdfs cacheadmin -listDirectives
> impala> use default;
> impala> drop database foo cascade;
> shell>  hdfs cacheadmin -listDirectives  <-- the output still contains the
>         directive associated with the path of table t1
> {code}
> This has been breaking impala-cdh5.5.x-repeated-runs
> (https://issues.cloudera.org/browse/IMPALA-2510)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Assigned] (IMPALA-4999) Impala.tests.custom_cluster.test_spilling.TestSpillStress.test_spill_stress failed intermittently
[ https://issues.apache.org/jira/browse/IMPALA-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Brown reassigned IMPALA-4999:
-------------------------------------
    Assignee: Tim Armstrong

> Impala.tests.custom_cluster.test_spilling.TestSpillStress.test_spill_stress
> failed intermittently
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-4999
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4999
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Michael Ho
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: broken-build, flaky
>
> {noformat}
> Stacktrace
> self =
> vector =
>     @pytest.mark.stress
>     def test_spill_stress(self, vector):
>       # Number of times to execute each query
>       for i in xrange(vector.get_value('iterations')):
> >       self.run_test_case('agg_stress', vector)
> custom_cluster/test_spilling.py:99:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> common/impala_test_suite.py:359: in run_test_case
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:567: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:339: in __execute_query
>     self.wait_for_completion(handle)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> self =
> query_handle = QueryHandle(log_context='21431d366cf751da:e62f8676',
> id='21431d366cf751da:e62f8676')
>     def wait_for_completion(self, query_handle):
>       """Given a query handle, polls the coordinator waiting for the query to
>       complete"""
>       while True:
>         query_state = self.get_state(query_handle)
>         # if the rpc succeeded, the output is the query state
>         if query_state == self.query_states["FINISHED"]:
>           break
>         elif query_state == self.query_states["EXCEPTION"]:
>           try:
>             error_log = self.__do_rpc(
>               lambda: self.imp_service.get_log(query_handle.log_context))
> >           raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    Query aborted:
> E    Memory limit exceeded
> E    The memory limit is set too low to initialize spilling operator
> E    (id=3). The minimum required memory to spill this operator is 4.25 MB.
> E
> E    Memory Limit Exceeded by fragment: 21431d366cf751da:e62f86760004
> E    Query(21431d366cf751da:e62f8676): Total=260.67 MB Peak=303.73 MB
> E      Fragment 21431d366cf751da:e62f8676000d: Total=27.86 MB Peak=33.97 MB
> E        AGGREGATION_NODE (id=6): Total=8.00 KB Peak=8.00 KB
> E          Exprs: Total=4.00 KB Peak=4.00 KB
> E        AGGREGATION_NODE (id=11): Total=27.82 MB Peak=32.07 MB
> E        EXCHANGE_NODE (id=10): Total=0 Peak=0
> E        DataStreamRecvr: Total=7.52 KB Peak=2.62 MB
> E        DataStreamSender (dst_id=12): Total=16.00 KB Peak=16.00 KB
> E        CodeGen: Total=5.57 KB Peak=395.50 KB
> E      Block Manager: Limit=250.00 MB Total=250.00 MB Peak=250.00 MB
> E      Fragment 21431d366cf751da:e62f8676000a: Total=224.32 MB Peak=228.25 MB
> E        Runtime Filter Bank: Total=1.00 MB Peak=1.00 MB
> E        AGGREGATION_NODE (id=5): Total=80.46 MB Peak=80.46 MB
> E        HASH_JOIN_NODE (id=4): Total=142.73 MB Peak=149.62 MB
> E          Hash Join Builder (join_node_id=4): Total=142.64 MB Peak=149.58 MB
> E        EXCHANGE_NODE (id=8): Total=0 Peak=0
> E        DataStreamRecvr: Total=2.84 KB Peak=23.96 MB
> E        EXCHANGE_NODE (id=9): Total=0 Peak=0
> E        DataStreamRecvr: Total=0 Peak=130.42 KB
> E        DataStreamSender (dst_id=10): Total=90.57 KB Peak=202.57 KB
> E        CodeGen: Total=25.46 KB Peak=1.53 MB
> E      Fragment 21431d366cf751da:e62f86760004: Total=8.49 MB Peak=193.94 MB
> E        Runtime Filter Bank: Total=4.00 MB Peak=4.00 MB
> E        HASH_JOIN_NODE (id=3): Total=4.36 MB Peak=168.69 MB
> E          Hash Join Builder (join_node_id=3): Total=4.30 MB Peak=168.55 MB
> E        HDFS_SCAN_NODE (id=2): Total=0 Peak=11.95 MB
> E        EXCHANGE_NODE (id=7): Total=0 Peak=0
> E        DataStreamRecvr: Total=0 Peak=22.20 MB
> E
[jira] [Created] (IMPALA-5101) Allow insert into table with BOOLEAN partition columns
Lars Volker created IMPALA-5101:
-----------------------------------

             Summary: Allow insert into table with BOOLEAN partition columns
                 Key: IMPALA-5101
                 URL: https://issues.apache.org/jira/browse/IMPALA-5101
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 2.9.0
            Reporter: Lars Volker

HIVE-6590 will add support for inserting into tables with BOOLEAN partition
columns. Once it has been merged, we should enable these inserts, too. In
particular, this means changing
[analysis/InsertStmt.java#L475|https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java#L475].

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (IMPALA-5100) add runtime single row subquery check
[ https://issues.apache.org/jira/browse/IMPALA-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Rahn updated IMPALA-5100:
------------------------------
    Labels: planner tpc-ds  (was: planner)

> add runtime single row subquery check
> -------------------------------------
>
>                 Key: IMPALA-5100
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5100
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Greg Rahn
>              Labels: planner, tpc-ds
>
> If an uncorrelated subquery is used with an equality predicate, it must
> return only one row to be valid. If this cannot be guaranteed at parse time
> through a single-row aggregate or a limit clause, Impala fails the query as
> follows:
> {noformat}
> select i_manufact from item where i_item_sk = (select i_item_sk from item
> where i_item_sk = 1);
> ERROR: AnalysisException: Subquery must return a single row: (SELECT
> i_item_sk FROM tpcds_1_parquet.item WHERE i_item_sk = 1)
> {noformat}
> Impala should allow such queries to run by adding a runtime assert that
> aborts the query only if the subquery actually returns more than one row.
> This impacts TPC-DS query6, query54, and query58.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
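The semantics being requested above can be sketched in a few lines: defer the cardinality check from analysis time to execution time, and fail only when the scalar subquery actually produces more than one row. A minimal Python illustration (the helper name `scalar_subquery_value` is hypothetical, not Impala code):

```python
def scalar_subquery_value(rows):
    """Runtime single-row check for an uncorrelated scalar subquery.

    Instead of rejecting the query at analysis time, raise only when more
    than one row actually comes back. An empty result behaves like SQL NULL.
    """
    if len(rows) > 1:
        raise RuntimeError(
            "Subquery must return a single row, got %d" % len(rows))
    return rows[0] if rows else None

# Emulate: select i_manufact from item
#          where i_item_sk = (select i_item_sk from item where i_item_sk = 1)
item = [{"i_item_sk": 1, "i_manufact": "m1"},
        {"i_item_sk": 2, "i_manufact": "m2"}]
sub = [r["i_item_sk"] for r in item if r["i_item_sk"] == 1]  # one row: OK
key = scalar_subquery_value(sub)
result = [r["i_manufact"] for r in item if r["i_item_sk"] == key]
```

The same query with a predicate matching two rows would raise at "execution" time, mirroring the proposed runtime assert rather than the current AnalysisException.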
[jira] [Updated] (IMPALA-5100) add runtime single row subquery check
[ https://issues.apache.org/jira/browse/IMPALA-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Rahn updated IMPALA-5100:
------------------------------
    Labels:   (was: planner)

> add runtime single row subquery check
> -------------------------------------
>
>                 Key: IMPALA-5100
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5100
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Greg Rahn
>              Labels: planner
>
> If an uncorrelated subquery is used with an equality predicate, it must
> return only one row to be valid. If this cannot be guaranteed at parse time
> through a single-row aggregate or a limit clause, Impala fails the query as
> follows:
> {noformat}
> select i_manufact from item where i_item_sk = (select i_item_sk from item
> where i_item_sk = 1);
> ERROR: AnalysisException: Subquery must return a single row: (SELECT
> i_item_sk FROM tpcds_1_parquet.item WHERE i_item_sk = 1)
> {noformat}
> Impala should allow such queries to run by adding a runtime assert that
> aborts the query only if the subquery actually returns more than one row.
> This impacts TPC-DS query6, query54, and query58.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)