[jira] [Commented] (IMPALA-5165) Allocate memory for all data from BufferPool
[ https://issues.apache.org/jira/browse/IMPALA-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430319#comment-17430319 ] Tim Armstrong commented on IMPALA-5165: --- [~Xinyi Zou] it's been a while since I worked on this, but the work we did on memory management ended up doing a lot to solve the problem even without converting everything. > Allocate memory for all data from BufferPool > > > Key: IMPALA-5165 > URL: https://issues.apache.org/jira/browse/IMPALA-5165 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Major > Labels: resource-management > > Eventually we should back RowBatches and other runtime memory (e.g. MemPools, > FreePools, compression buffers, etc) with memory from BufferPool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-5165) Allocate memory for all data from BufferPool
[ https://issues.apache.org/jira/browse/IMPALA-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-5165. --- Resolution: Later I think in the end this wasn't all that important - allocating small amounts of memory from MemPool is OK - it turned out that moving the large allocations to BufferPool was generally sufficient. > Allocate memory for all data from BufferPool > > > Key: IMPALA-5165 > URL: https://issues.apache.org/jira/browse/IMPALA-5165 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Major > Labels: resource-management > > Eventually we should back RowBatches and other runtime memory (e.g. MemPools, > FreePools, compression buffers, etc) with memory from BufferPool.
[jira] [Created] (IMPALA-10506) Check if Impala LZ4 has same bug as ARROW-11301
Tim Armstrong created IMPALA-10506: -- Summary: Check if Impala LZ4 has same bug as ARROW-11301 Key: IMPALA-10506 URL: https://issues.apache.org/jira/browse/IMPALA-10506 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Tim Armstrong Assignee: Csaba Ringhofer I noticed ARROW-11301 in the context of a Parquet discussion (https://github.com/apache/parquet-format/pull/164/files/2dfe463c948948f7d9624bee3cdd4706eb3488b5#diff-a1727652430ce24c121536393f2ece63c5799a99583738f48aa8bb9fa71cb3f8) and wondered if Impala has made the same mistake. CC [~arawat] [~csringhofer] [~boroknagyz]
[jira] [Resolved] (IMPALA-9382) Prototype denser runtime profile implementation
[ https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9382. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Prototype denser runtime profile implementation > --- > > Key: IMPALA-9382 > URL: https://issues.apache.org/jira/browse/IMPALA-9382 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > Attachments: profile_504b379400cba9f2_2d2cf007, > tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt > > > RuntimeProfile trees can potentially stress the memory allocator and use up a > lot more memory and cache than is really necessary: > * std::map is used throughout, and allocates a node per map entry. We do > depend on the counters being displayed in-order, but we would probably be > better off storing the counters in a vector and lazily sorting when needed > (since the set of counters is generally static after Prepare()). > * We store the same counter names redundantly all over the place. We'd > probably be best off using a pool of constant counter names (we could just > require registering them upfront). > There may be a small gain from switching thrift to using unordered_map, e.g. > for the info strings that appear with some frequency in profiles. > However, I think we need to restructure the thrift representation and > in-memory representation to get significant gains.
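The vector-plus-lazy-sort alternative described in the issue could look roughly like the sketch below. This is a minimal illustration under assumed names (CounterGroup is not an Impala class): counters live in a flat vector instead of a node-per-entry std::map, and are sorted only when displayed, since the counter set is essentially static after Prepare().

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch only: a flat vector of (name, value) pairs avoids the
// per-node heap allocation that std::map incurs for every counter.
class CounterGroup {
 public:
  void AddCounter(std::string name, int64_t value) {
    counters_.emplace_back(std::move(name), value);
    sorted_ = false;  // defer sorting until the profile is rendered
  }

  // Sorts by counter name on demand so iteration for display stays in-order.
  const std::vector<std::pair<std::string, int64_t>>& SortedCounters() {
    if (!sorted_) {
      std::sort(counters_.begin(), counters_.end());
      sorted_ = true;
    }
    return counters_;
  }

 private:
  std::vector<std::pair<std::string, int64_t>> counters_;
  bool sorted_ = true;
};
```

The trade-off is O(n log n) sorting on render versus O(log n) insertion cost and worse cache behavior for the map; for a write-heavy, rarely-rendered profile the vector wins.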
[jira] [Resolved] (IMPALA-10032) Unable to close the connection when fetching data from two databases
[ https://issues.apache.org/jira/browse/IMPALA-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-10032. Resolution: Cannot Reproduce I tried to understand what the issue was here. Attaching a test program after doing the work to clean up and reformat your code (please try to actually provide usable examples and don't make us poor maintainers work to understand what you're talking about). It works fine if the SQL statements succeed. If you don't clean up the connections by calling close() on an error, it does leave the connection open. The best practice is to close the connection in a finally block so it is cleaned up in the event of an error. > Unable to close the connection when fetching data from two databases > > > Key: IMPALA-10032 > URL: https://issues.apache.org/jira/browse/IMPALA-10032 > Project: IMPALA > Issue Type: Bug > Components: Clients >Reporter: jayashree >Assignee: Tim Armstrong >Priority: Blocker > Attachments: JDBCExample2.java > > > Hi Team, > I am connecting two databases using cloudera.impala.jdbc41.Driver. > I have two classes with two different connections, each having different SQLs > to perform in respective database. > When I am executing these two classes together, I am getting below error > though I am closing first connection and then connecting to other Database. > Error: > *java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing > query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, > sqlState:HY000, errorMessage:AnalysisException: Could not resolve table > reference:* > *Caused by: java.sql.SQLException:* > *[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error > Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, > errorMessage:AnalysisException: Could not resolve table reference:* > *Caused by: com.cloudera.support.exceptions.GeneralException:* > *[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error > Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, > errorMessage:AnalysisException: Could not resolve table reference:* > > *Seems it's not able to close first connection.* > *Can you please check it*
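The cleanup advice in the comment above (close the connection in a finally block, or equivalently with try-with-resources) can be sketched as follows. FakeConnection is a hypothetical stand-in for java.sql.Connection so the example runs without a JDBC driver; the pattern is identical for the real class.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for java.sql.Connection: any resource whose close() must run even
// when the statement fails with an AnalysisException-style error.
class FakeConnection implements AutoCloseable {
    static final List<String> events = new ArrayList<>();

    void execute(String sql) {
        if (sql.contains("missing_table")) {
            throw new RuntimeException(
                "AnalysisException: Could not resolve table reference");
        }
        events.add("executed");
    }

    @Override
    public void close() {
        events.add("closed");
    }
}

public class CleanupDemo {
    // try-with-resources guarantees close() runs even when execute() throws,
    // which a plain sequential close() after the query does not.
    static void runQuery(String sql) {
        try (FakeConnection conn = new FakeConnection()) {
            conn.execute(sql);
        } catch (RuntimeException e) {
            // The connection is already closed by the time we get here.
        }
    }
}
```

With java.sql.Connection the same shape applies: `try (Connection conn = DriverManager.getConnection(url)) { ... }` closes the connection on both success and failure paths.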
[jira] [Commented] (IMPALA-10032) Unable to close the connection when fetching data from two databases
[ https://issues.apache.org/jira/browse/IMPALA-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283281#comment-17283281 ] Tim Armstrong commented on IMPALA-10032: javac JDBCExample2.java && CLASSPATH=.:~/ClouderaImpalaJDBC-2.6.3.1004/ImpalaJDBC41.jar java JDBCExample2 > Unable to close the connection when fetching data from two databases > > > Key: IMPALA-10032 > URL: https://issues.apache.org/jira/browse/IMPALA-10032 > Project: IMPALA > Issue Type: Bug > Components: Clients >Reporter: jayashree >Assignee: Tim Armstrong >Priority: Blocker > Attachments: JDBCExample2.java > > > Hi Team, > I am connecting two databases using cloudera.impala.jdbc41.Driver. > I have two classes with two different connections, each having different SQLs > to perform in respective database. > When I am executing these two classes together, I am getting below error > though I am closing first connection and then connecting to other Database. > Error: > *java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing > query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, > sqlState:HY000, errorMessage:AnalysisException: Could not resolve table > reference:* > *Caused by: java.sql.SQLException:* > *[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error > Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, > errorMessage:AnalysisException: Could not resolve table reference:* > *Caused by: com.cloudera.support.exceptions.GeneralException:* > *[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error > Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, > errorMessage:AnalysisException: Could not resolve table reference:* > > *Seems it's not able to close first connection.* > *Can you please check it*
[jira] [Updated] (IMPALA-10032) Unable to close the connection when fetching data from two databases
[ https://issues.apache.org/jira/browse/IMPALA-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10032: --- Attachment: JDBCExample2.java > Unable to close the connection when fetching data from two databases > > > Key: IMPALA-10032 > URL: https://issues.apache.org/jira/browse/IMPALA-10032 > Project: IMPALA > Issue Type: Bug > Components: Clients >Reporter: jayashree >Assignee: Tim Armstrong >Priority: Blocker > Attachments: JDBCExample2.java > > > Hi Team, > I am connecting two databases using cloudera.impala.jdbc41.Driver. > I have two classes with two different connections, each having different SQLs > to perform in respective database. > When I am executing these two classes together, I am getting below error > though I am closing first connection and then connecting to other Database. > Error: > *java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing > query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, > sqlState:HY000, errorMessage:AnalysisException: Could not resolve table > reference:* > *Caused by: java.sql.SQLException:* > *[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error > Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, > errorMessage:AnalysisException: Could not resolve table reference:* > *Caused by: com.cloudera.support.exceptions.GeneralException:* > *[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error > Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, > errorMessage:AnalysisException: Could not resolve table reference:* > > *Seems it's not able to close first connection.* > *Can you please check it*
[jira] [Assigned] (IMPALA-10339) Apparent hang or crash in TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action
[ https://issues.apache.org/jira/browse/IMPALA-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-10339: -- Assignee: Wenzhe Zhou (was: Tim Armstrong) > Apparent hang or crash in > TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action > --- > > Key: IMPALA-10339 > URL: https://issues.apache.org/jira/browse/IMPALA-10339 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Tim Armstrong >Assignee: Wenzhe Zhou >Priority: Blocker > Labels: broken-build, flaky, hang > > Release build with this commit as the tip: > {noformat} > commit 9400e9b17b13f613defb6d7b9deb471256b1d95c (CDH/cdpd-master-staging) > Author: wzhou-code > Date: Thu Oct 29 22:32:03 2020 -0700 > IMPALA-10305: Sync Kudu's FIPS compliant changes > > {noformat} > {noformat} > Regression > query_test.test_spilling.TestSpillingNoDebugActionDimensions.test_spilling_no_debug_action[protocol: > beeswax | exec_option: {'mt_dop': 0, 'default_spillable_buffer_size': '64k'} > | table_format: parquet/none] (from pytest) > Failing for the past 1 build (Since Failed#100 ) > Took 1 hr 59 min. > add description > Error Message > query_test/test_spilling.py:134: in test_spilling_no_debug_action > self.run_test_case('QueryTest/spilling-no-debug-action', vector) > common/impala_test_suite.py:668: in run_test_case > self.__verify_exceptions(test_section['CATCH'], str(e), use_db) > common/impala_test_suite.py:485: in __verify_exceptions (expected_str, > actual_str) E AssertionError: Unexpected exception string. Expected: > row_regex:.*Cannot perform hash join at node with id .*. 
Repartitioning did > not reduce the size of a spilled partition.* E Not found in actual: Timeout > >7200s > Stacktrace > query_test/test_spilling.py:134: in test_spilling_no_debug_action > self.run_test_case('QueryTest/spilling-no-debug-action', vector) > common/impala_test_suite.py:668: in run_test_case > self.__verify_exceptions(test_section['CATCH'], str(e), use_db) > common/impala_test_suite.py:485: in __verify_exceptions > (expected_str, actual_str) > E AssertionError: Unexpected exception string. Expected: row_regex:.*Cannot > perform hash join at node with id .*. Repartitioning did not reduce the size > of a spilled partition.* > E Not found in actual: Timeout >7200s > Standard Error > SET > client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none]; > -- executing against localhost:21000 > use tpch_parquet; > -- 2020-11-11 23:12:04,319 INFO MainThread: Started query > c740c1c66d9679a9:6a40f161 > SET > client_identifier=query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::()::test_spilling_no_debug_action[protocol:beeswax|exec_option:{'mt_dop':0;'default_spillable_buffer_size':'64k'}|table_format:parquet/none]; > SET mt_dop=0; > SET default_spillable_buffer_size=64k; > -- 2020-11-11 23:12:04,320 INFO MainThread: Loading query test file: > /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test > -- 2020-11-11 23:12:04,323 INFO MainThread: Starting new HTTP connection > (1): localhost > -- executing against localhost:21000 > set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0"; > -- 2020-11-11 23:12:04,377 INFO MainThread: Started query > c044afcf5ae44df9:a2e2e7c6 > -- executing against localhost:21000 > select straight_join count(*) > from > lineitem a, lineitem 
b > where > a.l_partkey = 1 and > a.l_orderkey = b.l_orderkey; > -- 2020-11-11 23:12:04,385 INFO MainThread: Started query > 314c019cd252f322:2411bc76 > -- executing against localhost:21000 > SET DEBUG_ACTION=""; > -- 2020-11-11 23:12:05,199 INFO MainThread: Started query > 80424e68922c30f9:b2144dff > -- executing against localhost:21000 > set debug_action="-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0"; > -- 2020-11-11 23:12:05,207 INFO MainThread: Started query > 2a4c1f4b26ea52da:4339f3ff > -- executing against localhost:21000 > select straight_join count(*) > from > lineitem a > where > a.l_partkey not in (select l_partkey from lineitem where l_partkey > 10) > and a.l_partkey < 1000; > -- 2020-11-11 23:12:05,215 INFO MainThread: Started query > f845afd00a569446:79c5054a > -- executing against localhost:21000 > SET DEBUG_ACTION=""; > -- 2020-11-11 23:12:07,507 INFO MainThread: Started query > ee4f8a685928e7ef:830d9651
[jira] [Assigned] (IMPALA-10301) Insert query hangs in test_local_catalog_ddls_with_invalidate_metadata_sync_ddl
[ https://issues.apache.org/jira/browse/IMPALA-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-10301: -- Assignee: Wenzhe Zhou (was: Tim Armstrong) > Insert query hangs in > test_local_catalog_ddls_with_invalidate_metadata_sync_ddl > --- > > Key: IMPALA-10301 > URL: https://issues.apache.org/jira/browse/IMPALA-10301 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Qifan Chen >Assignee: Wenzhe Zhou >Priority: Blocker > Labels: broken-build, flaky, hang > Attachments: failure-output.txt, > test_mixed_catalog_ddls_with_invalidate_metadata.tar.gz > > > In impala-cdpd-master-staging-core-s3 test, the following error was seen > (insert into a partitioned table timeout). > Error Message > {code:java} > AssertionError: Query timeout(60s): insert overwrite table > test_local_catalog_ddls_with_invalidate_metadata_sync_ddl_b87f02d6.test_2_part > partition(j=2) values (1), (2), (3), (4), (5) assert False > Stacktrace > {code} > {code:java} > custom_cluster/test_concurrent_ddls.py:83: in > test_local_catalog_ddls_with_invalidate_metadata_sync_ddl > self._run_ddls_with_invalidation(unique_database, sync_ddl=True) > custom_cluster/test_concurrent_ddls.py:146: in _run_ddls_with_invalidation > for i in pool.imap_unordered(run_ddls, xrange(1, NUM_ITERS + 1)): > /usr/lib64/python2.7/multiprocessing/pool.py:655: in next > raise value > E AssertionError: Query timeout(60s): insert overwrite table > test_local_catalog_ddls_with_invalidate_metadata_sync_ddl_b87f02d6.test_2_part > partition(j=2) values (1), (2), (3), (4), (5) > E assert False > {code} > The URL is > https://master-02.jenkins.cloudera.com/view/Impala/view/Evergreen-cdpd-master-staging/job/impala-cdpd-master-staging-core-s3/lastCompletedBuild/testReport/custom_cluster.test_concurrent_ddls/TestConcurrentDdls/test_local_catalog_ddls_with_invalidate_metadata_sync_ddl/. 
[jira] [Resolved] (IMPALA-10470) Update wiki and README with info about Impala quickstart
[ https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-10470. Fix Version/s: Impala 4.0 Resolution: Fixed > Update wiki and README with info about Impala quickstart > > > Key: IMPALA-10470 > URL: https://issues.apache.org/jira/browse/IMPALA-10470 > Project: IMPALA > Issue Type: Sub-task >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > >
[jira] [Resolved] (IMPALA-9793) Improved Impala quickstart
[ https://issues.apache.org/jira/browse/IMPALA-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9793. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Improved Impala quickstart > -- > > Key: IMPALA-9793 > URL: https://issues.apache.org/jira/browse/IMPALA-9793 > Project: IMPALA > Issue Type: New Feature > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > > Kudu built a single container quickstart here > https://github.com/apache/kudu/tree/master/examples/quickstart/impala . > We should do a better quickstart container with the following features: > * Store data in docker volumes > * Use the daemon containers that are more production ready > * Have an easy solution for loading data > * Support Kudu and Hive tables.
[jira] [Created] (IMPALA-10501) Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <= num_buffered_values_
Tim Armstrong created IMPALA-10501: -- Summary: Hit DCHECK in parquet-column-readers.cc: def_levels_.CacheRemaining() <= num_buffered_values_ Key: IMPALA-10501 URL: https://issues.apache.org/jira/browse/IMPALA-10501 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Impala 4.0 Reporter: Tim Armstrong Assignee: Zoltán Borók-Nagy Attachments: consoleText.3.gz, impalad_coord_exec-0.tar.gz https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3814/ {noformat} F0211 03:55:26.383247 14487 parquet-column-readers.cc:517] be46bb72819942fd:85934edd0001] Check failed: def_levels_.CacheRemaining() <= num_buffered_values_ (921 vs. 916) *** Check failure stack trace: *** @ 0x53646ec google::LogMessage::Fail() @ 0x5365fdc google::LogMessage::SendToLog() @ 0x536404a google::LogMessage::Flush() @ 0x5367c48 google::LogMessageFatal::~LogMessageFatal() @ 0x2ff886f impala::ScalarColumnReader<>::MaterializeValueBatch<>() @ 0x2f8ae44 impala::ScalarColumnReader<>::MaterializeValueBatch<>() @ 0x2f761bf impala::ScalarColumnReader<>::ReadValueBatch<>() @ 0x2f2889a impala::ScalarColumnReader<>::ReadValueBatch() @ 0x2ebd8c0 impala::HdfsParquetScanner::AssembleRows() @ 0x2eb882e impala::HdfsParquetScanner::GetNextInternal() @ 0x2eb67bd impala::HdfsParquetScanner::ProcessSplit() @ 0x2aaf3f2 impala::HdfsScanNode::ProcessSplit() @ 0x2aae773 impala::HdfsScanNode::ScannerThread() @ 0x2aadadb _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv @ 0x2aafe94 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x220e331 boost::function0<>::operator()() @ 0x2842e7f impala::Thread::SuperviseThread() @ 0x284ae1c boost::_bi::list5<>::operator()<>() @ 0x284ad40 boost::_bi::bind_t<>::operator()() @ 0x284ad01 boost::detail::thread_data<>::run() @ 0x406b291 thread_proxy @ 0x7f2465cba6b9 start_thread @ 0x7f24627e64dc clone 
rImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:866) {noformat} It was likely a fuzz test: {noformat} 19:55:23 query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit: 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 19:55:23 [gw5] PASSED query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit: 50 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 19:55:23 query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit: 80 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 19:55:25 [gw2] PASSED query_test/test_queries.py::TestPartitionKeyScans::test_partition_key_scans[protocol: beeswax | exec_option: {'mt_dop': 0, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
19:55:25 query_test/test_queries.py::TestPartitionKeyScans::test_partition_key_scans[protocol: beeswax | exec_option: {'mt_dop': 1, 'exec_single_node_rows_threshold': 0} | table_format: avro/snap/block] 19:55:26 [gw5] PASSED query_test/test_mem_usage_scaling.py::TestTpchMemLimitError::test_low_mem_limit_q22[mem_limit: 80 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, '
[jira] [Commented] (IMPALA-10470) Update wiki and README with info about Impala quickstart
[ https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282824#comment-17282824 ] Tim Armstrong commented on IMPALA-10470: I updated the front page of CWiki > Update wiki and README with info about Impala quickstart > > > Key: IMPALA-10470 > URL: https://issues.apache.org/jira/browse/IMPALA-10470 > Project: IMPALA > Issue Type: Sub-task >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major >
[jira] [Work started] (IMPALA-10470) Update wiki and README with info about Impala quickstart
[ https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10470 started by Tim Armstrong. -- > Update wiki and README with info about Impala quickstart > > > Key: IMPALA-10470 > URL: https://issues.apache.org/jira/browse/IMPALA-10470 > Project: IMPALA > Issue Type: Sub-task >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major >
[jira] [Updated] (IMPALA-10470) Update wiki and README with info about Impala quickstart
[ https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10470: --- Summary: Update wiki and README with info about Impala quickstart (was: Update wiki with info about Impala quickstart) > Update wiki and README with info about Impala quickstart > > > Key: IMPALA-10470 > URL: https://issues.apache.org/jira/browse/IMPALA-10470 > Project: IMPALA > Issue Type: Sub-task >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major >
[jira] [Updated] (IMPALA-10228) Avoid or codegen std::map comparisons in partitioned top-n
[ https://issues.apache.org/jira/browse/IMPALA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10228: --- Affects Version/s: Impala 4.0 > Avoid or codegen std::map comparisons in partitioned top-n > -- > > Key: IMPALA-10228 > URL: https://issues.apache.org/jira/browse/IMPALA-10228 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Tim Armstrong >Priority: Major > Labels: codegen, performance > > The partitioned top-n implementation currently uses std::map to store the > heaps. We can't inline the tuple comparator easily because of the > indirections in the standard library code.
[jira] [Updated] (IMPALA-10228) Avoid or codegen std::map comparisons in partitioned top-n
[ https://issues.apache.org/jira/browse/IMPALA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10228: --- Labels: codegen performance (was: ) > Avoid or codegen std::map comparisons in partitioned top-n > -- > > Key: IMPALA-10228 > URL: https://issues.apache.org/jira/browse/IMPALA-10228 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Major > Labels: codegen, performance > > The partitioned top-n implementation currently uses std::map to store the > heaps. We can't inline the tuple comparator easily because of the > indirections in the standard library code.
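For context, the pattern the issue describes — one heap of rows per partition, kept in a std::map keyed by the partition exprs — can be sketched as below. This is a simplified stand-in (a single int64 partition key instead of evaluated tuple exprs): with a plain functor the compiler can inline every comparison, which is exactly what the real expr-based tuple comparator's indirections prevent, hence the proposal to codegen it or avoid the map.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Simplified stand-in for the partition-key comparator. As an inlinable
// functor, each std::map lookup's O(log n) comparisons compile to cheap
// integer compares; Impala's real comparator dispatches through
// indirections the optimizer cannot see through.
struct PartitionKeyLess {
  bool operator()(int64_t a, int64_t b) const { return a < b; }
};

using Heap = std::vector<int64_t>;
using PartitionedHeaps = std::map<int64_t, Heap, PartitionKeyLess>;

// Each inserted row first locates (or creates) its partition's heap via the
// map, paying the comparator cost on every tree-descent step.
inline void AddRow(PartitionedHeaps& heaps, int64_t partition_key, int64_t row) {
  heaps[partition_key].push_back(row);
}
```

The issue's point is that when the key is a tuple compared by runtime-evaluated exprs, these per-step comparisons dominate, so either codegen'ing the comparator or replacing std::map is worth prototyping.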
[jira] [Resolved] (IMPALA-9853) Push rank() predicates into sort
[ https://issues.apache.org/jira/browse/IMPALA-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9853. --- Fix Version/s: Impala 4.0 Resolution: Fixed Commit b42c64993d46893488a667fb9c425548fdf964ab in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b42c649 ] IMPALA-9979: part 2: partitioned top-n Planner changes: --- The planner now identifies predicates that can be converted into limits in a partitioned or unpartitioned top-n with the following method: Push down predicates that reference analytic tuple into inline view. These will be evaluated after the analytic plan for the inline SelectStmt is generated. Identify predicates that reference the analytic tuple and could be converted to limits. If they can be applied to the last sort group of the analytic plan, and the windows are all compatible, then the lowest limit gets converted into a limit in the top N. Otherwise generate a select node with the conjuncts. We add logic to merge SELECT nodes to avoid generating duplicates from inside and outside the inline view. The pushed predicate is still added to the SELECT node because it is necessary for correctness for predicates like '=' to filter additional rows and also the limit pushdown optimization looks for analytic predicates there, so retaining all predicates simplifies that. The selectivity of the predicate is adjusted so that cardinality estimates remain accurate. The optimization can be disabled by setting ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is only enabled for limits of 1000 or less, because the in-memory Top-N may perform significantly worse than a full sort for large heaps (since updating the heap for every input row ends up being more expensive than doing a traditional sort). We could probably optimize this more with better tuning so that it can gracefully fall back to doing the full sort at runtime. 
rank() and row_number() are handled. rank() needs support in the TopN node to include ties for the last place, which is also added in this patch. If predicates are trivially false, we generate empty nodes. This interacts with the limit pushdown optimization. The limit pushdown optimization is applied after the partitioned top-n is generated, and can sometimes result in more optimal plans, so it is generalized to handle pushing into partitioned top-n nodes. Backend changes: --- The top-n node in the backend is augmented to handle the partitioned case, for which we use a std::map and a comparator based on the partition exprs. The partitioned top-n node has a soft limit of 64MB on the size of the in-memory heaps and can spill with use of an embedded Sorter. The current implementation tries to evict heaps that are less effective at filtering rows. Limitations: --- There are several possible extensions to this that we did not do: dense_rank() is not supported because it would require additional backend support - IMPALA-10014. ntile() is not supported because it would require additional backend support - IMPALA-10174. Only one predicate per analytic is pushed. Redundant rank()/row_number() predicates are not merged, only the lowest is chosen. Lower bounds are not converted into OFFSET. The analytic operator cannot be eliminated even if the analytic expression was only used in the predicate. This doesn't push predicates into UNION - IMPALA-10013 Always false predicates don't result in empty plan - IMPALA-10015 Tests: Planner tests - added tests that exercise the interesting code paths added in planning. Predicate ordering in SELECT nodes changed in a couple of cases because some predicates were pushed into the inline views. Modified SORT targeted perf tests to avoid conversion to Top-N Added targeted perf test for partitioned top-n.
End-to-end tests Unpartitioned Top-N end-to-end tests Basic partitioning and duplicate handling tests on functional Similar basic tests on larger inputs from TPC-DS and with larger partition counts. I inspected the results and also ran the same tests with analytic_rank_pushdown_threshold=0 to confirm that the results were the same as with the full sort. Fallback to spilling sort. Perf: Added a targeted benchmark that goes from ~2s to ~1s with mt_dop=8 on TPC-H 30 on my desktop. Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Reviewed-on: http://gerrit.cloudera.org:8080/16242 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Push rank() predicates into sort > > > Key: IMPALA-9853 > URL: https://issues.apache.org/jira/browse/IMPALA-9853 > Project: IMPALA > Issue T
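The backend structure the commit message describes (a std::map holding one bounded in-memory heap per partition) can be sketched roughly as follows. This is an illustrative sketch only, not Impala's TopNNode: the class name and members are invented, there is no spilling or heap eviction, and the partition key and order-by value are simplified to a string and an int.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Illustrative partitioned top-n: each partition keeps only its `limit_`
// smallest order-by values in a max-heap, so rows worse than the current
// top-n are filtered immediately (names invented for the example).
class PartitionedTopN {
 public:
  explicit PartitionedTopN(size_t limit) : limit_(limit) {}

  // Offer one row to the heap for its partition.
  void Insert(const std::string& partition, int order_by_val) {
    auto& heap = heaps_[partition];
    if (heap.size() < limit_) {
      heap.push(order_by_val);
    } else if (order_by_val < heap.top()) {
      heap.pop();                 // evict the current worst surviving row
      heap.push(order_by_val);
    }                             // else: row is filtered out entirely
  }

  // Return the surviving values for one partition, ascending.
  std::vector<int> Result(const std::string& partition) {
    auto heap = heaps_[partition];  // copy so we can drain it
    std::vector<int> out;
    while (!heap.empty()) { out.push_back(heap.top()); heap.pop(); }
    std::reverse(out.begin(), out.end());
    return out;
  }

 private:
  size_t limit_;
  std::map<std::string, std::priority_queue<int>> heaps_;  // per commit msg
};
```

The real operator keys the map with a comparator built from the partition exprs and, per the commit message, spills via an embedded Sorter once the heaps pass a 64MB soft limit.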
[jira] [Updated] (IMPALA-10228) Avoid or codegen std::map comparisons in partitioned top-n
[ https://issues.apache.org/jira/browse/IMPALA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10228: --- Priority: Minor (was: Major) > Avoid or codegen std::map comparisons in partitioned top-n > -- > > Key: IMPALA-10228 > URL: https://issues.apache.org/jira/browse/IMPALA-10228 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Tim Armstrong >Priority: Minor > Labels: codegen, performance > > The partitioned top-n implementation currently uses std::map to store the > heaps. We can't inline the tuple comparator easily because of the > indirections in the standard library code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
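One way to read the "avoid std::map" half of this ticket is to keep partitions in a flat sorted vector and binary-search with a comparator the compiler can inline, instead of std::map's node-based tree whose comparison calls go through library indirections. A hedged sketch of that alternative (the class and names are invented for illustration, not code from the ticket):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Flat sorted vector as a drop-in for std::map<string, V> lookup:
// the lambda comparator is a concrete type visible at the call site,
// so std::lower_bound's comparisons can be inlined.
template <typename V>
class FlatPartitionMap {
 public:
  // Find the entry for `key`, inserting a default-constructed V if absent.
  V& FindOrInsert(const std::string& key) {
    auto it = std::lower_bound(
        entries_.begin(), entries_.end(), key,
        [](const std::pair<std::string, V>& e, const std::string& k) {
          return e.first < k;
        });
    if (it == entries_.end() || it->first != key) {
      it = entries_.insert(it, {key, V{}});  // keep the vector sorted
    }
    return it->second;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::vector<std::pair<std::string, V>> entries_;  // sorted by key
};
```

The trade-off is O(n) inserts versus std::map's O(log n), which may be acceptable when partitions are inserted rarely relative to lookups; codegen'ing the tuple comparator, the other option the ticket names, attacks the same indirection from the other side.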
[jira] [Updated] (IMPALA-10228) Avoid or codegen std::map comparisons in partitioned top-n
[ https://issues.apache.org/jira/browse/IMPALA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10228: --- Parent: (was: IMPALA-9853) Issue Type: Improvement (was: Sub-task) > Avoid or codegen std::map comparisons in partitioned top-n > -- > > Key: IMPALA-10228 > URL: https://issues.apache.org/jira/browse/IMPALA-10228 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Major > > The partitioned top-n implementation currently uses std::map to store the > heaps. We can't inline the tuple comparator easily because of the > indirections in the standard library code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-2783) Push down filters on rank similar to limit
[ https://issues.apache.org/jira/browse/IMPALA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-2783. --- Fix Version/s: Impala 4.0 Resolution: Fixed Fixed with Commit b42c64993d46893488a667fb9c425548fdf964ab in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b42c649 ] IMPALA-9979: part 2: partitioned top-n Planner changes: --- The planner now identifies predicates that can be converted into limits in a partitioned or unpartitioned top-n with the following method: Push down predicates that reference analytic tuple into inline view. These will be evaluated after the analytic plan for the inline SelectStmt is generated. Identify predicates that reference the analytic tuple and could be converted to limits. If they can be applied to the last sort group of the analytic plan, and the windows are all compatible, then the lowest limit gets converted into a limit in the top N. Otherwise generate a select node with the conjuncts. We add logic to merge SELECT nodes to avoid generating duplicates from inside and outside the inline view. The pushed predicate is still added to the SELECT node because it is necessary for correctness for predicates like '=' to filter additional rows and also the limit pushdown optimization looks for analytic predicates there, so retaining all predicates simplifies that. The selectivity of the predicate is adjusted so that cardinality estimates remain accurate. The optimization can be disabled by setting ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is only enabled for limits of 1000 or less, because the in-memory Top-N may perform significantly worse than a full sort for large heaps (since updating the heap for every input row ends up being more expensive than doing a traditional sort). We could probably optimize this more with better tuning so that it can gracefully fall back to doing the full sort at runtime. 
rank() and row_number() are handled. rank() needs support in the TopN node to include ties for the last place, which is also added in this patch. If predicates are trivially false, we generate empty nodes. This interacts with the limit pushdown optimization. The limit pushdown optimization is applied after the partitioned top-n is generated, and can sometimes result in more optimal plans, so it is generalized to handle pushing into partitioned top-n nodes. Backend changes: --- The top-n node in the backend is augmented to handle the partitioned case, for which we use a std::map and a comparator based on the partition exprs. The partitioned top-n node has a soft limit of 64MB on the size of the in-memory heaps and can spill with use of an embedded Sorter. The current implementation tries to evict heaps that are less effective at filtering rows. Limitations: --- There are several possible extensions to this that we did not do: dense_rank() is not supported because it would require additional backend support - IMPALA-10014. ntile() is not supported because it would require additional backend support - IMPALA-10174. Only one predicate per analytic is pushed. Redundant rank()/row_number() predicates are not merged, only the lowest is chosen. Lower bounds are not converted into OFFSET. The analytic operator cannot be eliminated even if the analytic expression was only used in the predicate. This doesn't push predicates into UNION - IMPALA-10013 Always false predicates don't result in empty plan - IMPALA-10015 Tests: Planner tests - added tests that exercise the interesting code paths added in planning. Predicate ordering in SELECT nodes changed in a couple of cases because some predicates were pushed into the inline views. Modified SORT targeted perf tests to avoid conversion to Top-N. Added targeted perf test for partitioned top-n. 
End-to-end tests Unpartitioned Top-N end-to-end tests Basic partitioning and duplicate handling tests on functional Similar basic tests on larger inputs from TPC-DS and with larger partition counts. I inspected the results and also ran the same tests with analytic_rank_pushdown_threshold=0 to confirm that the results were the same as with the full sort. Fallback to spilling sort. Perf: Added a targeted benchmark that goes from ~2s to ~1s with mt_dop=8 on TPC-H 30 on my desktop. Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Reviewed-on: http://gerrit.cloudera.org:8080/16242 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Push down filters on rank similar to limit > -- > > Key: IMPALA-2783 > URL: https://issues.apache.org/jira/browse/IMPALA-2783 > Pro
[jira] [Resolved] (IMPALA-9979) Backend partitioned top-n operator
[ https://issues.apache.org/jira/browse/IMPALA-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9979. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Backend partitioned top-n operator > -- > > Key: IMPALA-9979 > URL: https://issues.apache.org/jira/browse/IMPALA-9979 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > > This is to implement the backend support for the partitioned top-n operator. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10469) Support pushing quickstart images to Apache repo
[ https://issues.apache.org/jira/browse/IMPALA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-10469. Fix Version/s: Impala 4.0 Resolution: Fixed > Support pushing quickstart images to Apache repo > > > Key: IMPALA-10469 > URL: https://issues.apache.org/jira/browse/IMPALA-10469 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > > We need a naming scheme and maybe a script to do the push. We've so far > assumed a different repository for each image, but in the Apache docker, we > only have a single repository and need to encode the image type and version > into the tag > See https://hub.docker.com/repository/docker/apache/kudu for an example. > They have: > apache/kudu: > apache/kudu:kudu-python- > apache/kudu:impala-latest > Airflow does the opposite, and this might be easier to use with > IMPALA_QUICKSTART_IMAGE_PREFIX: > https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1&ordering=last_updated -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column
[ https://issues.apache.org/jira/browse/IMPALA-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8721. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Wrong result when Impala reads a Hive written parquet TimeStamp column > -- > > Key: IMPALA-8721 > URL: https://issues.apache.org/jira/browse/IMPALA-8721 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Abhishek Rawat >Assignee: Tim Armstrong >Priority: Critical > Labels: Interoperability, correctness, hive, impala, parquet, > timestamp > Fix For: Impala 4.0 > > > > Easy to repro on latest upstream: > {code:java} > hive> create table t1_hive(c1 timestamp) stored as parquet; > hive> insert into t1_hive values('2009-03-09 01:20:03.6'); > hive> select * from t1_hive; > OK > 2009-03-09 01:20:03.6 > [localhost:21000] default> invalidate metadata t1_hive; > [localhost:21000] default> select * from t1_hive; > Query: select * from t1_hive > Query submitted at: 2019-06-24 09:55:36 (Coordinator: > http://optimus-prime:25000) > Query progress can be monitored at: > http://optimus-prime:25000/query_plan?query_id=b34f85cb5da29c26:d4dfcb24 > +---+ > | c1 | > +---+ > | 2009-03-09 09:20:03.6 | +---+ > bin/start-impala-cluster.py > --impalad_args='-convert_legacy_hive_parquet_utc_timestamps=true' > [localhost:21000] default> select * from t1_hive; > Query: select * from t1_hive > Query submitted at: 2019-06-24 10:00:22 (Coordinator: > http://optimus-prime:25000) > Query progress can be monitored at: > http://optimus-prime:25000/query_plan?query_id=d5428bb21fb259b9:7b107034 > +---+ > | c1 | > +---+ > | 2009-03-09 02:20:03.6 |. < +---+ > > {code} > > This issue is causing test case test_hive_impala_interop to fail. Until this > issue is fixed, the test case will be updated to not include a timestamp > column. The test case should be updated to include a timestamp column once > this issue is fixed. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7092) Re-enable EC tests broken by HDFS-13539
[ https://issues.apache.org/jira/browse/IMPALA-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281953#comment-17281953 ] Tim Armstrong commented on IMPALA-7092: --- These seem to be marked by @SkipIfEC.oom > Re-enable EC tests broken by HDFS-13539 > > > Key: IMPALA-7092 > URL: https://issues.apache.org/jira/browse/IMPALA-7092 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend, Infrastructure >Affects Versions: Impala 3.1.0 >Reporter: Tianyi Wang >Priority: Major > > With HDFS-13539 and HDFS-13540 fixed, we should be able to re-enable some > tests and diagnose the causes of the remaining failed tests without much > noise. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-9586) Update query option docs to account for interactions with mt_dop
[ https://issues.apache.org/jira/browse/IMPALA-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9586. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Update query option docs to account for interactions with mt_dop > > > Key: IMPALA-9586 > URL: https://issues.apache.org/jira/browse/IMPALA-9586 > Project: IMPALA > Issue Type: Improvement > Components: Docs >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > > In some cases, mt_dop changes the behaviour of other options or makes them a > no-op. We need to update docs to reflect this. > * Setting NUM_NODES=1 along with MT_DOP >=1 effectively reduces MT_DOP to 1, > i.e. only one thread is used. > * NUM_SCANNER_THREADS has no effect when MT_DOP>=1 > * Maybe other changes? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9382) Prototype denser runtime profile implementation
[ https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9382 started by Tim Armstrong. - > Prototype denser runtime profile implementation > --- > > Key: IMPALA-9382 > URL: https://issues.apache.org/jira/browse/IMPALA-9382 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Attachments: profile_504b379400cba9f2_2d2cf007, > tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt > > > RuntimeProfile trees can potentially stress the memory allocator and use up a > lot more memory and cache than is really necessary: > * std::map is used throughout, and allocates a node per map entry. We do > depend on the counters being displayed in-order, but we would probably be > better off storing the counters in a vector and lazily sorting when needed > (since the set of counters is generally static after Prepare()). > * We store the same counter names redundantly all over the place. We'd > probably be best off using a pool of constant counter names (we could just > require registering them upfront). > There may be a small gain from switching thrift to using unordered_map, e.g. > for the info strings that appear with some frequency in profiles. > However, I think we need to restructure the thrift representation and > in-memory representation to get significant gains. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
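The first bullet's suggestion (store counters in a vector, sorting lazily only when ordered display is needed) might look roughly like this. The class and names are invented for illustration and are not Impala's RuntimeProfile API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Counters live in one contiguous vector (one allocation amortized over
// many counters) instead of one tree node per std::map entry; sorting is
// deferred to display time, which is rare relative to updates.
class CounterSet {
 public:
  void Add(const std::string& name, int64_t value) {
    counters_.emplace_back(name, value);
    sorted_ = false;  // invalidate; re-sort lazily on next Sorted() call
  }

  // Sorted view for pretty-printing; sorts at most once per batch of adds.
  const std::vector<std::pair<std::string, int64_t>>& Sorted() {
    if (!sorted_) {
      std::sort(counters_.begin(), counters_.end());
      sorted_ = true;
    }
    return counters_;
  }

 private:
  std::vector<std::pair<std::string, int64_t>> counters_;
  bool sorted_ = true;
};
```

This works well precisely because, as the ticket notes, the counter set is generally static after Prepare(): the sort runs once and every later Sorted() call is free.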
[jira] [Commented] (IMPALA-9382) Prototype denser runtime profile implementation
[ https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281873#comment-17281873 ] Tim Armstrong commented on IMPALA-9382: --- Actually I should reduce the verbosity of the default option a bit as part 3 > Prototype denser runtime profile implementation > --- > > Key: IMPALA-9382 > URL: https://issues.apache.org/jira/browse/IMPALA-9382 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Attachments: profile_504b379400cba9f2_2d2cf007, > tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt > > > RuntimeProfile trees can potentially stress the memory allocator and use up a > lot more memory and cache than is really necessary: > * std::map is used throughout, and allocates a node per map entry. We do > depend on the counters being displayed in-order, but we would probably be > better off storing the counters in a vector and lazily sorting when needed > (since the set of counters is generally static after Prepare()). > * We store the same counter names redundantly all over the place. We'd > probably be best off using a pool of constant counter names (we could just > require registering them upfront). > There may be a small gain from switching thrift to using unordered_map, e.g. > for the info strings that appear with some frequency in profiles. > However, I think we need to restructure the thrift representation and > in-memory representation to get significant gains. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Reopened] (IMPALA-9382) Prototype denser runtime profile implementation
[ https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reopened IMPALA-9382: --- > Prototype denser runtime profile implementation > --- > > Key: IMPALA-9382 > URL: https://issues.apache.org/jira/browse/IMPALA-9382 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > Attachments: profile_504b379400cba9f2_2d2cf007, > tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt > > > RuntimeProfile trees can potentially stress the memory allocator and use up a > lot more memory and cache than is really necessary: > * std::map is used throughout, and allocates a node per map entry. We do > depend on the counters being displayed in-order, but we would probably be > better off storing the counters in a vector and lazily sorting when needed > (since the set of counters is generally static after Prepare()). > * We store the same counter names redundantly all over the place. We'd > probably be best off using a pool of constant counter names (we could just > require registering them upfront). > There may be a small gain from switching thrift to using unordered_map, e.g. > for the info strings that appear with some frequency in profiles. > However, I think we need to restructure the thrift representation and > in-memory representation to get significant gains. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9382) Prototype denser runtime profile implementation
[ https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9382: -- Fix Version/s: (was: Impala 4.0) > Prototype denser runtime profile implementation > --- > > Key: IMPALA-9382 > URL: https://issues.apache.org/jira/browse/IMPALA-9382 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Attachments: profile_504b379400cba9f2_2d2cf007, > tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt, tpcds_q10_profile_v2.txt > > > RuntimeProfile trees can potentially stress the memory allocator and use up a > lot more memory and cache than is really necessary: > * std::map is used throughout, and allocates a node per map entry. We do > depend on the counters being displayed in-order, but we would probably be > better off storing the counters in a vector and lazily sorting when needed > (since the set of counters is generally static after Prepare()). > * We store the same counter names redundantly all over the place. We'd > probably be best off using a pool of constant counter names (we could just > require registering them upfront). > There may be a small gain from switching thrift to using unordered_map, e.g. > for the info strings that appear with some frequency in profiles. > However, I think we need to restructure the thrift representation and > in-memory representation to get significant gains. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9586) Update query option docs to account for interactions with mt_dop
[ https://issues.apache.org/jira/browse/IMPALA-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9586 started by Tim Armstrong. - > Update query option docs to account for interactions with mt_dop > > > Key: IMPALA-9586 > URL: https://issues.apache.org/jira/browse/IMPALA-9586 > Project: IMPALA > Issue Type: Improvement > Components: Docs >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > In some cases, mt_dop changes the behaviour of other options or makes them a > no-op. We need to update docs to reflect this. > * Setting NUM_NODES=1 along with MT_DOP >=1 effectively reduces MT_DOP to 1, > i.e. only one thread is used. > * NUM_SCANNER_THREADS has no effect when MT_DOP>=1 > * Maybe other changes? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-9378) CPU usage for runtime profiles with multithreading
[ https://issues.apache.org/jira/browse/IMPALA-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9378: - Assignee: (was: Tim Armstrong) > CPU usage for runtime profiles with multithreading > -- > > Key: IMPALA-9378 > URL: https://issues.apache.org/jira/browse/IMPALA-9378 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Major > Labels: multithreading, performance > Attachments: coord_q5_dop0.svg, coord_q5_dop16.svg > > > [~drorke] reports that significant amounts of time can be spent on the > runtime profile with higher values of mt_dop. This can impact query > performance from the client's point of view since profile serialisation is on > the critical path for closing the query. Also serialising the profile for the > webserver holds the ClientRequestState's lock, so can block query progress. > We should figure out how to make this more efficient. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9884: - Assignee: (was: Tim Armstrong) > TestAdmissionControllerStress.test_mem_limit failing occasionally > - > > Key: IMPALA-9884 > URL: https://issues.apache.org/jira/browse/IMPALA-9884 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Vihang Karajgaonkar >Priority: Critical > Labels: broken-build, flaky > Attachments: impalad-executors.tar.gz, > impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz > > > Recently, I saw this test failing with the exception trace below. > {noformat} > custom_cluster/test_admission_controller.py:1782: in test_mem_limit > {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) > custom_cluster/test_admission_controller.py:1638: in run_admission_test > assert metric_deltas['dequeued'] == 0,\ > E AssertionError: Queued queries should not run until others are made to > finish > E assert 1 == 0 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10491) Impala parquet scanner should use writer.time.zone when converting Hive timestamps
Tim Armstrong created IMPALA-10491: -- Summary: Impala parquet scanner should use writer.time.zone when converting Hive timestamps Key: IMPALA-10491 URL: https://issues.apache.org/jira/browse/IMPALA-10491 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Impala 3.4.0 Reporter: Tim Armstrong IMPALA-8721 reports some issues with Hive 3 and timezone conversion. HIVE-21290 fixed some of the issues, and also sets writer.time.zone in the Parquet metadata, which provides a better way to determine how the time zone was written. E.g. {noformat} tarmstrong@tarmstrong-Precision-7540:~/impala/impala$ hadoop jar ~/repos/parquet-mr/parquet-tools/target/parquet-tools-1.12.0-SNAPSHOT.jar meta /test-warehouse/asdfgh/00_0 21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5 21/02/08 20:26:44 INFO hadoop.ParquetFileReader: reading another 1 footers 21/02/08 20:26:44 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5 file:hdfs://localhost:20500/test-warehouse/asdfgh/00_0 creator: parquet-mr version 1.10.99.7.2.7.0-44 (build 27344fd5fdaa371e364c604f471b340f8bcf8936) extra: writer.date.proleptic = false extra: writer.time.zone = America/Los_Angeles extra: writer.model.name = 3.1.3000.7.2.7.0-44 {noformat} We should use this timezone when converting timestamps, I think either always or when convert_legacy_hive_parquet_utc_timestamps=true. CC [~boroknagyz] [~csringhofer] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
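The proposal amounts to preferring the footer's writer.time.zone entry over a global flag when choosing the conversion zone. A minimal sketch, assuming the footer key-value pairs have already been read into a string map; the function name and fallback parameter are invented for illustration, not Impala's scanner API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Pick the time zone for converting legacy Hive parquet timestamps:
// trust the writer.time.zone the writer recorded (HIVE-21290) when it
// is present, otherwise fall back to the zone implied by the existing
// convert_legacy_hive_parquet_utc_timestamps behaviour.
std::string ChooseConversionZone(
    const std::map<std::string, std::string>& footer_kv,
    const std::string& fallback_zone) {
  auto it = footer_kv.find("writer.time.zone");
  if (it != footer_kv.end() && !it->second.empty()) return it->second;
  return fallback_zone;  // writer predates HIVE-21290, or non-Hive writer
}
```

In real code the key-value pairs come from the Parquet FileMetaData thrift struct, and the chosen zone would feed the scanner's timestamp conversion rather than be returned as a string.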
[jira] [Commented] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column
[ https://issues.apache.org/jira/browse/IMPALA-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281518#comment-17281518 ] Tim Armstrong commented on IMPALA-8721: --- I think this was fixed by HIVE-21290 - the test passes now if I revert IMPALA-8689 > Wrong result when Impala reads a Hive written parquet TimeStamp column > -- > > Key: IMPALA-8721 > URL: https://issues.apache.org/jira/browse/IMPALA-8721 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Abhishek Rawat >Assignee: Tim Armstrong >Priority: Critical > Labels: Interoperability, correctness, hive, impala, parquet, > timestamp > > > Easy to repro on latest upstream: > {code:java} > hive> create table t1_hive(c1 timestamp) stored as parquet; > hive> insert into t1_hive values('2009-03-09 01:20:03.6'); > hive> select * from t1_hive; > OK > 2009-03-09 01:20:03.6 > [localhost:21000] default> invalidate metadata t1_hive; > [localhost:21000] default> select * from t1_hive; > Query: select * from t1_hive > Query submitted at: 2019-06-24 09:55:36 (Coordinator: > http://optimus-prime:25000) > Query progress can be monitored at: > http://optimus-prime:25000/query_plan?query_id=b34f85cb5da29c26:d4dfcb24 > +---+ > | c1 | > +---+ > | 2009-03-09 09:20:03.6 | +---+ > bin/start-impala-cluster.py > --impalad_args='-convert_legacy_hive_parquet_utc_timestamps=true' > [localhost:21000] default> select * from t1_hive; > Query: select * from t1_hive > Query submitted at: 2019-06-24 10:00:22 (Coordinator: > http://optimus-prime:25000) > Query progress can be monitored at: > http://optimus-prime:25000/query_plan?query_id=d5428bb21fb259b9:7b107034 > +---+ > | c1 | > +---+ > | 2009-03-09 02:20:03.6 |. < +---+ > > {code} > > This issue is causing test case test_hive_impala_interop to fail. Until this > issue is fixed, the test case will be updated to not include a timestamp > column. The test case should be updated to include a timestamp column once > this issue is fixed. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-8646) Integrate etcd into Impala for cluster membership updates
[ https://issues.apache.org/jira/browse/IMPALA-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-8646: - Assignee: (was: Ethan) > Integrate etcd into Impala for cluster membership updates > - > > Key: IMPALA-8646 > URL: https://issues.apache.org/jira/browse/IMPALA-8646 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Ethan >Priority: Minor > Attachments: 91204e6.diff.zip > > > This task involves replacing usage of the statestore membership topic with > etcd. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8646) Integrate etcd into Impala for cluster membership updates
[ https://issues.apache.org/jira/browse/IMPALA-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281429#comment-17281429 ] Tim Armstrong commented on IMPALA-8646: --- Uploading patches that [~ethan.xue] had on gerrit > Integrate etcd into Impala for cluster membership updates > - > > Key: IMPALA-8646 > URL: https://issues.apache.org/jira/browse/IMPALA-8646 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Ethan >Assignee: Ethan >Priority: Minor > Attachments: 91204e6.diff.zip > > > This task involves replacing usage of the statestore membership topic with > etcd. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8646) Integrate etcd into Impala for cluster membership updates
[ https://issues.apache.org/jira/browse/IMPALA-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8646: -- Attachment: 91204e6.diff.zip > Integrate etcd into Impala for cluster membership updates > - > > Key: IMPALA-8646 > URL: https://issues.apache.org/jira/browse/IMPALA-8646 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Ethan >Assignee: Ethan >Priority: Minor > Attachments: 91204e6.diff.zip > > > This task involves replacing usage of the statestore membership topic with > etcd. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8645) Write a basic C++ gRPC client for etcd
[ https://issues.apache.org/jira/browse/IMPALA-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8645: -- Attachment: 5684404.diff.zip a8e3d99.diff.zip > Write a basic C++ gRPC client for etcd > -- > > Key: IMPALA-8645 > URL: https://issues.apache.org/jira/browse/IMPALA-8645 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Ethan >Assignee: Ethan >Priority: Minor > Attachments: 5684404.diff.zip, a8e3d99.diff.zip > > > This task involves creating a basic C++ gRPC client that can interact with a > local etcd pseudo-cluster and has an API that can be used to replace the > functionality of the statestore. Also, this client and its dependencies > should be integrated into the Impala repo. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-8645) Write a basic C++ gRPC client for etcd
[ https://issues.apache.org/jira/browse/IMPALA-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-8645: - Assignee: (was: Ethan) > Write a basic C++ gRPC client for etcd > -- > > Key: IMPALA-8645 > URL: https://issues.apache.org/jira/browse/IMPALA-8645 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Ethan >Priority: Minor > Attachments: 5684404.diff.zip, a8e3d99.diff.zip > > > This task involves creating a basic C++ gRPC client that can interact with a > local etcd pseudo-cluster and has an API that can be used to replace the > functionality of the statestore. Also, this client and its dependencies > should be integrated into the Impala repo. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8645) Write a basic C++ gRPC client for etcd
[ https://issues.apache.org/jira/browse/IMPALA-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281427#comment-17281427 ] Tim Armstrong commented on IMPALA-8645: --- Uploading patches that [~ethan.xue] submitted to gerrit > Write a basic C++ gRPC client for etcd > -- > > Key: IMPALA-8645 > URL: https://issues.apache.org/jira/browse/IMPALA-8645 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Ethan >Assignee: Ethan >Priority: Minor > Attachments: 5684404.diff.zip, a8e3d99.diff.zip > > > This task involves creating a basic C++ gRPC client that can interact with a > local etcd pseudo-cluster and has an API that can be used to replace the > functionality of the statestore. Also, this client and its dependencies > should be integrated into the Impala repo. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-9382) Prototype denser runtime profile implementation
[ https://issues.apache.org/jira/browse/IMPALA-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9382. --- Fix Version/s: Impala 4.0 Resolution: Fixed We have a solid prototype > Prototype denser runtime profile implementation > --- > > Key: IMPALA-9382 > URL: https://issues.apache.org/jira/browse/IMPALA-9382 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > Attachments: profile_504b379400cba9f2_2d2cf007, > tpcds_q10_profile_v1.txt, tpcds_q10_profile_v2.txt > > > RuntimeProfile trees can potentially stress the memory allocator and use up a > lot more memory and cache than is really necessary: > * std::map is used throughout, and allocates a node per map entry. We do > depend on the counters being displayed in-order, but we would probably be > better off storing the counters in a vector and lazily sorting when needed > (since the set of counters is generally static after Prepare()). > * We store the same counter names redundantly all over the place. We'd > probably be best off using a pool of constant counter names (we could just > require registering them upfront). > There may be a small gain from switching thrift to using unordered_map, e.g. > for the info strings that appear with some frequency in profiles. > However, I think we need to restructure the thrift representation and > in-memory representation to get significant gains.
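The vector-plus-lazy-sort idea described in the issue can be sketched as below. This is a minimal standalone illustration, not Impala's actual RuntimeProfile classes; the names (CounterMap, Add, Sorted) are made up for the example. Counters append in O(1) with no per-entry node allocation, and sorting by name is deferred until ordered output is actually needed:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Sketch of vector-backed counter storage with lazy sorting.
class CounterMap {
 public:
  // Append-only; no per-entry allocation beyond the vector's growth.
  void Add(std::string name, int64_t value) {
    counters_.emplace_back(std::move(name), value);
    sorted_ = false;
  }

  // Sorts lazily by counter name. Cheap in practice when the counter set
  // is static after Prepare(), as the issue notes.
  const std::vector<std::pair<std::string, int64_t>>& Sorted() {
    if (!sorted_) {
      std::sort(counters_.begin(), counters_.end());
      sorted_ = true;
    }
    return counters_;
  }

 private:
  std::vector<std::pair<std::string, int64_t>> counters_;
  bool sorted_ = true;
};
```

A pool of interned counter names, the other suggestion in the issue, would layer on top of this by storing an index into the pool instead of a std::string per entry.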
[jira] [Assigned] (IMPALA-7885) Create function to convert to ts from unix millis
[ https://issues.apache.org/jira/browse/IMPALA-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-7885: - Assignee: (was: Tim Armstrong) > Create function to convert to ts from unix millis > - > > Key: IMPALA-7885 > URL: https://issues.apache.org/jira/browse/IMPALA-7885 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: eugen yushin >Priority: Major > Labels: ramp-up > > There are several functions like > `from_unixtime`/`unix_micros_to_utc_timestamp`/`to_timestamp` in Impala which > accept seconds and micros, but none of them works with millis. > At the same time, Impala already has all the necessary utility methods to add > such functionality: > [https://github.com/apache/impala/blob/master/be/src/runtime/timestamp-value.inline.h#L54] > {code} > inline TimestampValue TimestampValue::UtcFromUnixTimeMillis(int64_t > unix_time_millis) { > return UtcFromUnixTimeTicks(unix_time_millis); > } > {code} > https://github.com/apache/impala/blob/master/be/src/exprs/timestamp-functions-ir.cc#L141 > {code} > TimestampVal TimestampFunctions::UnixMicrosToUtcTimestamp(FunctionContext* > context, > const BigIntVal& unix_time_micros) { > if (unix_time_micros.is_null) return TimestampVal::null(); > TimestampValue tv = > TimestampValue::UtcFromUnixTimeMicros(unix_time_micros.val); > TimestampVal result; > tv.ToTimestampVal(&result); > return result; > } > {code} > It would be better to have a Unix-millis-to-timestamp conversion function as > built-in functionality, to avoid: > - creating cumbersome 'aliases' like: > {code} > select unix_micros_to_utc_timestamp(1513895588243 * 1000) > {code} > or > http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Why-not-from-unixtime-function-handles-an-unix-timestamp-in/m-p/63182#M3969 > {code} > select cast(1513895588243 div 1000 as timestamp) + interval (1513895588243 % > 1000) milliseconds; > {code} > - writing relatively slow UDFs in Java
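For reference, the conversion such a built-in would perform can be sketched standalone. This uses POSIX time functions rather than Impala's TimestampValue, and the function name is hypothetical; a real Impala built-in would route through TimestampValue::UtcFromUnixTimeMillis() quoted in the issue above:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Sketch only: convert Unix milliseconds to a UTC "YYYY-MM-DD HH:MM:SS.mmm"
// string, the behaviour a hypothetical unix_millis_to_utc_timestamp()
// built-in would expose.
std::string UnixMillisToUtcString(int64_t unix_time_millis) {
  time_t secs = static_cast<time_t>(unix_time_millis / 1000);
  int millis = static_cast<int>(unix_time_millis % 1000);
  if (millis < 0) {  // C++ division truncates toward zero; fix pre-epoch values
    millis += 1000;
    --secs;
  }
  tm tm_utc;
  gmtime_r(&secs, &tm_utc);  // UTC broken-down time, thread-safe variant
  char date_buf[32];
  strftime(date_buf, sizeof(date_buf), "%Y-%m-%d %H:%M:%S", &tm_utc);
  char out[48];
  snprintf(out, sizeof(out), "%s.%03d", date_buf, millis);
  return out;
}
```

With the issue's example value, UnixMillisToUtcString(1513895588243) yields "2017-12-21 22:33:08.243", matching what the cumbersome micros-based workaround produces.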
[jira] [Assigned] (IMPALA-9846) Switch to aggregated runtime profile representation
[ https://issues.apache.org/jira/browse/IMPALA-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9846: - Assignee: (was: Tim Armstrong) > Switch to aggregated runtime profile representation > --- > > Key: IMPALA-9846 > URL: https://issues.apache.org/jira/browse/IMPALA-9846 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Priority: Major > Labels: multithreading > > We need to ensure that the aggregated profile is an adequate replacement, > then switch over the default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-6973) auth_to_local not considered for delegated users
[ https://issues.apache.org/jira/browse/IMPALA-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-6973: - Assignee: (was: Tim Armstrong) > auth_to_local not considered for delegated users > > > Key: IMPALA-6973 > URL: https://issues.apache.org/jira/browse/IMPALA-6973 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Adriano >Priority: Major > Labels: seca > > When user names are stored in Active Directory in UPPERCASE but all > usernames in Linux/CDH are in lowercase, the user name is usually converted > via an auth_to_local rule. > I.e.: > To perform this conversion, we use the rule: > auth_to_local=RULE:[1:$1@$0](.*@*.COMPANY.COM)s/@.*///L > with the switch "/L" to convert usernames to lower case. > This works for "normal user" authentication, i.e. the web interfaces and access > to Impala via ODBC. > However, when user delegation is used, the auth_to_local rule is not > applied, and to make it work the delegated user name should be configured > in UPPERCASE. > We check auth_to_local for user authentication: > https://github.com/cloudera/Impala/blob/cdh5-2.5.0_5.7.5/fe/src/main/java/com/cloudera/impala/authorization/User.java > but not for the delegated user: > https://github.com/cloudera/Impala/blob/87482a4f367f8c1edd12af494e4992ac8f7aa3ba/be/src/service/impala-hs2-server.cc#L308-L336 > https://github.com/cloudera/Impala/blob/cdh5-2.5.0_5.7.5/be/src/service/impala-server.cc#L1197-L1230
[jira] [Assigned] (IMPALA-9774) impala-shell regression when connecting to cluster with SSL enabled
[ https://issues.apache.org/jira/browse/IMPALA-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9774: - Assignee: (was: Tim Armstrong) > impala-shell regression when connecting to cluster with SSL enabled > --- > > Key: IMPALA-9774 > URL: https://issues.apache.org/jira/browse/IMPALA-9774 > Project: IMPALA > Issue Type: Sub-task > Components: Clients >Reporter: Tim Armstrong >Priority: Major > > {noformat} > $ impala-shell -i xxx.vpc.cloudera.com -d default -k --ssl --ca_cert > /xx.pem > Starting Impala Shell with Kerberos authentication using Python 2.7.5 > Using service name 'impala' > SSL is enabled > No handlers could be found for logger "thrift.transport.sslcompat" > Error connecting: NotImplementedError, Wrong number of arguments for > overloaded function 'Client_setAttr'. > Possible C/C++ prototypes are: > setAttr(saslwrapper::Client *,std::string const &,std::string const &) > setAttr(saslwrapper::Client *,std::string const &,uint32_t) > {noformat} > This was caused by the unicode changes in "IMPALA-3343, IMPALA-9489: Make > impala-shell compatible with python 3" - in some places a unicode string gets > passed into the sasl library, and the older version of the library can't > handle it. The SASL upgrade - IMPALA-9719 - fixes it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-3902) Multi-threaded query execution
[ https://issues.apache.org/jira/browse/IMPALA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-3902: -- Fix Version/s: Impala 4.0 > Multi-threaded query execution > -- > > Key: IMPALA-3902 > URL: https://issues.apache.org/jira/browse/IMPALA-3902 > Project: IMPALA > Issue Type: Epic > Components: Backend >Affects Versions: Impala 2.6.0 >Reporter: Marcel Kinard >Assignee: Tim Armstrong >Priority: Minor > Labels: multithreading > Fix For: Impala 4.0 > > > Currently, a single query fragment is run in a quasi-single threaded manner > on a node: the scanners are run in multiple threads, but all other operators > (joins, aggregation) are run in the main thread. > The goal is to add multi-threaded execution on a single node by running > multiple fragment instances (each of which runs in a single thread). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-3902) Multi-threaded query execution
[ https://issues.apache.org/jira/browse/IMPALA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-3902. --- Resolution: Fixed Closing because it is now broadly usable for many use cases. There are some issues that might cause challenges for migration of complex workloads that I have moved to IMPALA-10486. > Multi-threaded query execution > -- > > Key: IMPALA-3902 > URL: https://issues.apache.org/jira/browse/IMPALA-3902 > Project: IMPALA > Issue Type: Epic > Components: Backend >Affects Versions: Impala 2.6.0 >Reporter: Marcel Kinard >Assignee: Tim Armstrong >Priority: Minor > Labels: multithreading > > Currently, a single query fragment is run in a quasi-single threaded manner > on a node: the scanners are run in multiple threads, but all other operators > (joins, aggregation) are run in the main thread. > The goal is to add multi-threaded execution on a single node by running > multiple fragment instances (each of which runs in a single thread). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-9586) Update query option docs to account for interactions with mt_dop
[ https://issues.apache.org/jira/browse/IMPALA-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9586: - Assignee: Tim Armstrong > Update query option docs to account for interactions with mt_dop > > > Key: IMPALA-9586 > URL: https://issues.apache.org/jira/browse/IMPALA-9586 > Project: IMPALA > Issue Type: Improvement > Components: Docs >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > in some cases mt_dop changes the behaviour of other options or makes them a > no-op. We need to update docs to reflect this. > * Setting NUM_NODES=1 along with MT_DOP >=1 effectively reduces MT_DOP to 1, > i.e. only one thread is used. > * NUM_SCANNER_THREADS has no effect when MT_DOP>=1 > * Maybe other changes? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10486) Multithreading upgrade path for large clusters
Tim Armstrong created IMPALA-10486: -- Summary: Multithreading upgrade path for large clusters Key: IMPALA-10486 URL: https://issues.apache.org/jira/browse/IMPALA-10486 Project: IMPALA Issue Type: Epic Components: Backend Reporter: Tim Armstrong Issues needed to be able to smoothly enable multithreading for existing workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-3902) Multi-threaded query execution
[ https://issues.apache.org/jira/browse/IMPALA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-3902: - Assignee: (was: Tim Armstrong) > Multi-threaded query execution > -- > > Key: IMPALA-3902 > URL: https://issues.apache.org/jira/browse/IMPALA-3902 > Project: IMPALA > Issue Type: Epic > Components: Backend >Affects Versions: Impala 2.6.0 >Reporter: Marcel Kinard >Priority: Minor > Labels: multithreading > > Currently, a single query fragment is run in a quasi-single threaded manner > on a node: the scanners are run in multiple threads, but all other operators > (joins, aggregation) are run in the main thread. > The goal is to add multi-threaded execution on a single node by running > multiple fragment instances (each of which runs in a single thread). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-3902) Multi-threaded query execution
[ https://issues.apache.org/jira/browse/IMPALA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-3902: - Assignee: Tim Armstrong > Multi-threaded query execution > -- > > Key: IMPALA-3902 > URL: https://issues.apache.org/jira/browse/IMPALA-3902 > Project: IMPALA > Issue Type: Epic > Components: Backend >Affects Versions: Impala 2.6.0 >Reporter: Marcel Kinard >Assignee: Tim Armstrong >Priority: Minor > Labels: multithreading > > Currently, a single query fragment is run in a quasi-single threaded manner > on a node: the scanners are run in multiple threads, but all other operators > (joins, aggregation) are run in the main thread. > The goal is to add multi-threaded execution on a single node by running > multiple fragment instances (each of which runs in a single thread). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10470) Update wiki with info about Impala quickstart
[ https://issues.apache.org/jira/browse/IMPALA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-10470: -- Assignee: Tim Armstrong > Update wiki with info about Impala quickstart > - > > Key: IMPALA-10470 > URL: https://issues.apache.org/jira/browse/IMPALA-10470 > Project: IMPALA > Issue Type: Sub-task >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10469) Support pushing quickstart images to Apache repo
[ https://issues.apache.org/jira/browse/IMPALA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-10469: -- Assignee: Tim Armstrong > Support pushing quickstart images to Apache repo > > > Key: IMPALA-10469 > URL: https://issues.apache.org/jira/browse/IMPALA-10469 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > We need a naming scheme and maybe a script to do the push. We've so far > assumed a different repository for each image, but in the Apache docker, we > only have a single repository and need to encode the image type and version > into the tag > See https://hub.docker.com/repository/docker/apache/kudu for an example. > They have: > apache/kudu: > apache/kudu:kudu-python- > apache/kudu:impala-latest > Airflow does the opposite, and this might be easier to use with > IMPALA_QUICKSTART_IMAGE_PREFIX: > https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1&ordering=last_updated -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-10469) Support pushing quickstart images to Apache repo
[ https://issues.apache.org/jira/browse/IMPALA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10469 started by Tim Armstrong. -- > Support pushing quickstart images to Apache repo > > > Key: IMPALA-10469 > URL: https://issues.apache.org/jira/browse/IMPALA-10469 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > We need a naming scheme and maybe a script to do the push. We've so far > assumed a different repository for each image, but in the Apache docker, we > only have a single repository and need to encode the image type and version > into the tag > See https://hub.docker.com/repository/docker/apache/kudu for an example. > They have: > apache/kudu: > apache/kudu:kudu-python- > apache/kudu:impala-latest > Airflow does the opposite, and this might be easier to use with > IMPALA_QUICKSTART_IMAGE_PREFIX: > https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1&ordering=last_updated -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work stopped] (IMPALA-9793) Improved Impala quickstart
[ https://issues.apache.org/jira/browse/IMPALA-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9793 stopped by Tim Armstrong. - > Improved Impala quickstart > -- > > Key: IMPALA-9793 > URL: https://issues.apache.org/jira/browse/IMPALA-9793 > Project: IMPALA > Issue Type: New Feature > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > Kudu built a single container quickstart here > https://github.com/apache/kudu/tree/master/examples/quickstart/impala . > We should do a better quickstart container with the following features: > * Store data in docker volumes > * Use the daemon containers that are more production ready > * Have an easy solution for loading data > * Support Kudu and Hive tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9793) Improved Impala quickstart
[ https://issues.apache.org/jira/browse/IMPALA-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9793 started by Tim Armstrong. - > Improved Impala quickstart > -- > > Key: IMPALA-9793 > URL: https://issues.apache.org/jira/browse/IMPALA-9793 > Project: IMPALA > Issue Type: New Feature > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > Kudu built a single container quickstart here > https://github.com/apache/kudu/tree/master/examples/quickstart/impala . > We should do a better quickstart container with the following features: > * Store data in docker volumes > * Use the daemon containers that are more production ready > * Have an easy solution for loading data > * Support Kudu and Hive tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10469) Support pushing quickstart images to Apache repo
[ https://issues.apache.org/jira/browse/IMPALA-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10469: --- Description: We need a naming scheme and maybe a script to do the push. We've so far assumed a different repository for each image, but in the Apache docker, we only have a single repository and need to encode the image type and version into the tag See https://hub.docker.com/repository/docker/apache/kudu for an example. They have: apache/kudu: apache/kudu:kudu-python- apache/kudu:impala-latest Airflow does the opposite, and this might be easier to use with IMPALA_QUICKSTART_IMAGE_PREFIX: https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1&ordering=last_updated was: We need a naming scheme and maybe a script to do the push. We've so far assumed a different repository for each image, but in the Apache docker, we only have a single repository and need to encode the image type and version into the tag See https://hub.docker.com/repository/docker/apache/kudu for an example. They have: apache/kudu: apache/kudu:kudu-python- apache/kudu:impala-latest > Support pushing quickstart images to Apache repo > > > Key: IMPALA-10469 > URL: https://issues.apache.org/jira/browse/IMPALA-10469 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Tim Armstrong >Priority: Major > > We need a naming scheme and maybe a script to do the push. We've so far > assumed a different repository for each image, but in the Apache docker, we > only have a single repository and need to encode the image type and version > into the tag > See https://hub.docker.com/repository/docker/apache/kudu for an example. 
> They have: > apache/kudu: > apache/kudu:kudu-python- > apache/kudu:impala-latest > Airflow does the opposite, and this might be easier to use with > IMPALA_QUICKSTART_IMAGE_PREFIX: > https://hub.docker.com/repository/registry-1.docker.io/apache/airflow/tags?page=1&ordering=last_updated -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10389) Container for impala-profile-tool
[ https://issues.apache.org/jira/browse/IMPALA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-10389. Fix Version/s: Impala 4.0 Resolution: Fixed > Container for impala-profile-tool > - > > Key: IMPALA-10389 > URL: https://issues.apache.org/jira/browse/IMPALA-10389 > Project: IMPALA > Issue Type: Improvement > Components: Clients >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > > Following on from IMPALA-9865, it would be useful to have a docker container > available to dump out Impala profiles - this would make it much easier to > consume profile logs without installing anything complex.
[jira] [Commented] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML
[ https://issues.apache.org/jira/browse/IMPALA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279237#comment-17279237 ] Tim Armstrong commented on IMPALA-10475: I think I was a bit sloppy saying "traditional", we don't use that in the doc. I guess maybe we should actually just say that it does apply to all filesystem-based tables - transactional tables will be strongly consistent anyway, so the user-facing behaviour will be the same as if SYNC_DDL was used. Maybe we can just weave it into the original sentence "Although INSERT is classified as a DML statement, when the SYNC_DDL option is enabled, INSERT statements on filesystem-based tables ..." > SYNC_DDL docs should clarify that it only affects DML > - > > Key: IMPALA-10475 > URL: https://issues.apache.org/jira/browse/IMPALA-10475 > Project: IMPALA > Issue Type: Task > Components: Docs >Reporter: Tim Armstrong >Assignee: shajini thayasingh >Priority: Major > > https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html > {noformat} > Although INSERT is classified as a DML statement, when the SYNC_DDL option is > enabled, INSERT statements also delay their completion until all the > underlying data and metadata changes are propagated to all Impala nodes. > Internally, Impala inserts have similarities with DDL statements in > traditional database systems, because they create metadata needed to track > HDFS block locations for new files and they potentially add new partitions to > partitioned tables. > {noformat} > I saw someone read this as applying to all tables (Kudu, HBase, etc) but it > only inherently applies to traditional non-transactional filesystem-based > tables. It also applies to transactional tables until IMPALA-8631 is fixed, > after which they will be more strongly consistent. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML
[ https://issues.apache.org/jira/browse/IMPALA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10475: --- Issue Type: Task (was: Documentation) > SYNC_DDL docs should clarify that it only affects DML > - > > Key: IMPALA-10475 > URL: https://issues.apache.org/jira/browse/IMPALA-10475 > Project: IMPALA > Issue Type: Task > Components: Docs >Reporter: Tim Armstrong >Assignee: shajini thayasingh >Priority: Major > > https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html > {noformat} > Although INSERT is classified as a DML statement, when the SYNC_DDL option is > enabled, INSERT statements also delay their completion until all the > underlying data and metadata changes are propagated to all Impala nodes. > Internally, Impala inserts have similarities with DDL statements in > traditional database systems, because they create metadata needed to track > HDFS block locations for new files and they potentially add new partitions to > partitioned tables. > {noformat} > I saw someone read this as applying to all tables (Kudu, HBase, etc) but it > only inherently applies to traditional non-transactional filesystem-based > tables. It also applies to transactional tables until IMPALA-8631 is fixed, > after which they will be more strongly consistent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML
[ https://issues.apache.org/jira/browse/IMPALA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-10475: -- Assignee: shajini thayasingh > SYNC_DDL docs should clarify that it only affects DML > - > > Key: IMPALA-10475 > URL: https://issues.apache.org/jira/browse/IMPALA-10475 > Project: IMPALA > Issue Type: Documentation > Components: Docs >Reporter: Tim Armstrong >Assignee: shajini thayasingh >Priority: Major > > https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html > {noformat} > Although INSERT is classified as a DML statement, when the SYNC_DDL option is > enabled, INSERT statements also delay their completion until all the > underlying data and metadata changes are propagated to all Impala nodes. > Internally, Impala inserts have similarities with DDL statements in > traditional database systems, because they create metadata needed to track > HDFS block locations for new files and they potentially add new partitions to > partitioned tables. > {noformat} > I saw someone read this as applying to all tables (Kudu, HBase, etc) but it > only inherently applies to traditional non-transactional filesystem-based > tables. It also applies to transactional tables until IMPALA-8631 is fixed, > after which they will be more strongly consistent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML
[ https://issues.apache.org/jira/browse/IMPALA-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10475: --- Issue Type: Documentation (was: Bug) > SYNC_DDL docs should clarify that it only affects DML > - > > Key: IMPALA-10475 > URL: https://issues.apache.org/jira/browse/IMPALA-10475 > Project: IMPALA > Issue Type: Documentation > Components: Docs >Reporter: Tim Armstrong >Priority: Major > > https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html > {noformat} > Although INSERT is classified as a DML statement, when the SYNC_DDL option is > enabled, INSERT statements also delay their completion until all the > underlying data and metadata changes are propagated to all Impala nodes. > Internally, Impala inserts have similarities with DDL statements in > traditional database systems, because they create metadata needed to track > HDFS block locations for new files and they potentially add new partitions to > partitioned tables. > {noformat} > I saw someone read this as applying to all tables (Kudu, HBase, etc) but it > only inherently applies to traditional non-transactional filesystem-based > tables. It also applies to transactional tables until IMPALA-8631 is fixed, > after which they will be more strongly consistent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10475) SYNC_DDL docs should clarify that it only affects DML
Tim Armstrong created IMPALA-10475: -- Summary: SYNC_DDL docs should clarify that it only affects DML Key: IMPALA-10475 URL: https://issues.apache.org/jira/browse/IMPALA-10475 Project: IMPALA Issue Type: Bug Components: Docs Reporter: Tim Armstrong https://impala.apache.org/docs/build/html/topics/impala_sync_ddl.html {noformat} Although INSERT is classified as a DML statement, when the SYNC_DDL option is enabled, INSERT statements also delay their completion until all the underlying data and metadata changes are propagated to all Impala nodes. Internally, Impala inserts have similarities with DDL statements in traditional database systems, because they create metadata needed to track HDFS block locations for new files and they potentially add new partitions to partitioned tables. {noformat} I saw someone read this as applying to all tables (Kudu, HBase, etc) but it only inherently applies to traditional non-transactional filesystem-based tables. It also applies to transactional tables until IMPALA-8631 is fixed, after which they will be more strongly consistent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-4373) Wrong results with correlated WHERE-clause subquery inside a NULL-checking conditional function.
[ https://issues.apache.org/jira/browse/IMPALA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-4373: - Assignee: (was: Tim Armstrong) > Wrong results with correlated WHERE-clause subquery inside a NULL-checking > conditional function. > > > Key: IMPALA-4373 > URL: https://issues.apache.org/jira/browse/IMPALA-4373 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, > Impala 2.8.0, Impala 2.9.0 >Reporter: Alexander Behm >Priority: Critical > Labels: correctness > > Impala may generate an incorrect plan for queries that have a correlated > scalar subquery as a parameter to a NULL-checking conditional function like > ISNULL(). > Example query and incorrect plan: > {code} > select t1.int_col > from functional.alltypessmall as t1 > where t1.int_col >= isnull > ( >( > SELECT > MAX(t2.bigint_col) > FROM > functional.alltypestiny AS t2 > WHERE > t1.id = t2.id + 1 > ), >0 > ) > Fetched 0 row(s) in 1.09s > Single-node plan: > +---+ > | Explain String| > +---+ > | Estimated Per-Host Requirements: Memory=0B VCores=0 | > | | > | PLAN-ROOT SINK| > | | | > | 03:HASH JOIN [LEFT SEMI JOIN] | > | | hash predicates: t1.id = t2.id + 1 | > | | other join predicates: t1.int_col >= isnull(max(t2.bigint_col), 0) | > | | runtime filters: RF000 <- t2.id + 1| > | | | > | |--02:AGGREGATE [FINALIZE]| > | | | output: max(t2.bigint_col) | > | | | group by: t2.id | > | | | | > | | 01:SCAN HDFS [functional.alltypestiny t2] | > | | partitions=4/4 files=4 size=460B| > | | | > | 00:SCAN HDFS [functional.alltypessmall t1]| > |partitions=4/4 files=4 size=6.32KB | > |runtime filters: RF000 -> t1.id| > +---+ > {code} > The query returns an empty result set but instead should return all rows from > t1 because all invocations of the subquery return NULL, and all rows from t1 > satisfy "t1.int_col >= 0". 
[jira] [Commented] (IMPALA-4373) Wrong results with correlated WHERE-clause subquery inside a NULL-checking conditional function.
[ https://issues.apache.org/jira/browse/IMPALA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278346#comment-17278346 ] Tim Armstrong commented on IMPALA-4373: --- The fix might be similar to IMPALA-10382 that [~xqhe] fixed recently. > Wrong results with correlated WHERE-clause subquery inside a NULL-checking > conditional function. > > > Key: IMPALA-4373 > URL: https://issues.apache.org/jira/browse/IMPALA-4373 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, > Impala 2.8.0, Impala 2.9.0 >Reporter: Alexander Behm >Assignee: Tim Armstrong >Priority: Critical > Labels: correctness > > Impala may generate an incorrect plan for queries that have a correlated > scalar subquery as a parameter to a NULL-checking conditional function like > ISNULL(). > Example query and incorrect plan: > {code} > select t1.int_col > from functional.alltypessmall as t1 > where t1.int_col >= isnull > ( >( > SELECT > MAX(t2.bigint_col) > FROM > functional.alltypestiny AS t2 > WHERE > t1.id = t2.id + 1 > ), >0 > ) > Fetched 0 row(s) in 1.09s > Single-node plan: > +---+ > | Explain String| > +---+ > | Estimated Per-Host Requirements: Memory=0B VCores=0 | > | | > | PLAN-ROOT SINK| > | | | > | 03:HASH JOIN [LEFT SEMI JOIN] | > | | hash predicates: t1.id = t2.id + 1 | > | | other join predicates: t1.int_col >= isnull(max(t2.bigint_col), 0) | > | | runtime filters: RF000 <- t2.id + 1| > | | | > | |--02:AGGREGATE [FINALIZE]| > | | | output: max(t2.bigint_col) | > | | | group by: t2.id | > | | | | > | | 01:SCAN HDFS [functional.alltypestiny t2] | > | | partitions=4/4 files=4 size=460B| > | | | > | 00:SCAN HDFS [functional.alltypessmall t1]| > |partitions=4/4 files=4 size=6.32KB | > |runtime filters: RF000 -> t1.id| > +---+ > {code} > The query returns an empty result set but instead should return all rows from > t1 because all invocations of the subquery return NULL, and 
all rows from t1 > satisfy "t1.int_col >= 0". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
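The expected semantics in the report above can be checked with a small sketch (Python stands in for the SQL here; `isnull` is modelled after Impala's two-argument form, and the `int_cols` values are hypothetical):

```python
def isnull(a, b):
    """Model of Impala's ISNULL(a, b): return b when a is NULL (None here)."""
    return b if a is None else a

# Per the report, every invocation of the correlated subquery returns NULL,
# so the predicate reduces to t1.int_col >= isnull(NULL, 0), i.e. >= 0.
subquery_result = None
int_cols = [0, 1, 2, 3]  # hypothetical t1.int_col values
kept = [c for c in int_cols if c >= isnull(subquery_result, 0)]
assert kept == int_cols  # every row should survive, unlike the buggy plan
```

The buggy LEFT SEMI JOIN plan instead filters out rows with no matching build-side tuple, which is why the result set comes back empty.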
[jira] [Created] (IMPALA-10470) Update wiki with info about Impala quickstart
Tim Armstrong created IMPALA-10470: -- Summary: Update wiki with info about Impala quickstart Key: IMPALA-10470 URL: https://issues.apache.org/jira/browse/IMPALA-10470 Project: IMPALA Issue Type: Sub-task Reporter: Tim Armstrong -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10469) Support pushing quickstart images to Apache repo
Tim Armstrong created IMPALA-10469: -- Summary: Support pushing quickstart images to Apache repo Key: IMPALA-10469 URL: https://issues.apache.org/jira/browse/IMPALA-10469 Project: IMPALA Issue Type: Sub-task Components: Infrastructure Reporter: Tim Armstrong We need a naming scheme and maybe a script to do the push. We've so far assumed a different repository for each image, but in the Apache Docker Hub organization we only have a single repository, so we need to encode the image type and version into the tag. See https://hub.docker.com/repository/docker/apache/kudu for an example. They have: apache/kudu: apache/kudu:kudu-python- apache/kudu:impala-latest -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
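One possible encoding, sketched along the lines of the Kudu example (the `quickstart_tag` helper and the `apache/impala` repository name are assumptions for illustration, not a settled scheme):

```python
def quickstart_tag(image_type: str, version: str, repo: str = "apache/impala") -> str:
    # Single Docker Hub repository; image type and version are folded into
    # the tag, mirroring apache/kudu's "<type>-<version>" style.
    return f"{repo}:{image_type}-{version}"

print(quickstart_tag("quickstart-hms", "4.0.0"))
# apache/impala:quickstart-hms-4.0.0
```

A push script would then just `docker tag` the locally built image to this name before `docker push`.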
[jira] [Assigned] (IMPALA-6452) RegEx option support for regexp_extract
[ https://issues.apache.org/jira/browse/IMPALA-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-6452: - Assignee: (was: Tim Armstrong) > RegEx option support for regexp_extract > --- > > Key: IMPALA-6452 > URL: https://issues.apache.org/jira/browse/IMPALA-6452 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Harsh J >Priority: Minor > Labels: ramp-up > > Impala's {{regexp_like}} supports passing options that enable newline and > multi-line matching patterns. The same isn't supported for > {{regexp_extract}}, forcing users to resort to using {{split_part}} or other > techniques that work with newline characters in a string. > > Please consider supporting options similar to those available in > {{regexp_like}} in {{regexp_extract}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
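The difference the requested options make can be illustrated with Python's `re` module (Impala's regex engine is RE2, so this is only an analogy for the multi-line behaviour that `regexp_like`'s options argument enables):

```python
import re

s = "key=1\nkey=2"

# Without a multi-line option, ^ anchors only at the start of the whole string.
assert re.findall(r"^key=(\d)", s) == ["1"]

# With it, ^ matches at every line start -- the behaviour regexp_like's
# options argument enables and regexp_extract currently cannot express.
assert re.findall(r"^key=(\d)", s, re.MULTILINE) == ["1", "2"]
```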
[jira] [Assigned] (IMPALA-9690) Bump minimum x86-64 CPU requirements
[ https://issues.apache.org/jira/browse/IMPALA-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9690: - Assignee: (was: Tim Armstrong) > Bump minimum x86-64 CPU requirements > > > Key: IMPALA-9690 > URL: https://issues.apache.org/jira/browse/IMPALA-9690 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Blocker > Labels: performance > > We still have a minimum CPU requirement of SSSE3 support > https://impala.apache.org/docs/build/html/topics/impala_prereqs.html. I.e. we > don't assume SSE4.2 or AVX or AVX2. > There is a lot of legacy code to support CPUs without SSE4.2 and various > other extensions. As a start, here are all the locations in the code where we > branch based on CPU feature: > {noformat} > :~/impala/impala$ git grep CpuInfo::IsSupport > be/src/benchmarks/int-hash-benchmark.cc: if > (CpuInfo::IsSupported(CpuInfo::SSE4_2)) suite32.BENCH(uint32_t, CRC); > be/src/benchmarks/int-hash-benchmark.cc: if > (CpuInfo::IsSupported(CpuInfo::SSE4_2)) { > be/src/benchmarks/int-hash-benchmark.cc: if > (CpuInfo::IsSupported(CpuInfo::SSE4_1)) { > be/src/benchmarks/int-hash-benchmark.cc: if > (CpuInfo::IsSupported(CpuInfo::AVX2)) { > be/src/benchmarks/string-compare-benchmark.cc: if > (CpuInfo::IsSupported(CpuInfo::SSE4_2)) { > be/src/exec/delimited-text-parser.cc: if > (CpuInfo::IsSupported(CpuInfo::SSE4_2)) { > be/src/exec/delimited-text-parser.cc: if > (CpuInfo::IsSupported(CpuInfo::SSE4_2)) { > be/src/exec/delimited-text-parser.inline.h: > DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2)); > be/src/exec/delimited-text-parser.inline.h: if > (LIKELY(CpuInfo::IsSupported(CpuInfo::SSE4_2))) { > be/src/runtime/io/disk-io-mgr.cc: if > (!CpuInfo::IsSupported(CpuInfo::SSE4_2)) { > be/src/util/bit-util-test.cc: if (CpuInfo::IsSupported(CpuInfo::SSSE3)) { > be/src/util/bit-util-test.cc: if (CpuInfo::IsSupported(CpuInfo::AVX2)) { > be/src/util/bit-util-test.cc: if 
(CpuInfo::IsSupported(cpu_info_flag)) { > be/src/util/bit-util-test.cc:// CpuInfo::IsSupported() checks. This doesn't > test the bug precisely but is a canary for > be/src/util/bit-util.cc:if (CpuInfo::IsSupported(CpuInfo::AVX2)) { > be/src/util/bit-util.cc:} else if > (LIKELY(CpuInfo::IsSupported(CpuInfo::SSSE3))) { > be/src/util/bit-util.cc:if (LIKELY(CpuInfo::IsSupported(CpuInfo::SSSE3))) > { > be/src/util/bit-util.h:if (LIKELY(CpuInfo::IsSupported(CpuInfo::POPCNT))) > { > be/src/util/bloom-filter.cc: if (CpuInfo::IsSupported(CpuInfo::AVX)) { > be/src/util/bloom-filter.h: if (CpuInfo::IsSupported(CpuInfo::AVX2)) { > be/src/util/bloom-filter.h: if (CpuInfo::IsSupported(CpuInfo::AVX2)) { > be/src/util/cpu-info.cc: if (!CpuInfo::IsSupported(CpuInfo::SSSE3)) { > be/src/util/cpu-info.h: /// // line, CpuInfo::IsSupported(CpuInfo::AVX2) > will return false. > be/src/util/cpu-info.h: : feature_(feature), > reenable_(CpuInfo::IsSupported(feature)) { > be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2)); > be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2)); > be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2)); > be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2)); > be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2)); > be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2)); > be/src/util/hash-util.h:DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2)); > be/src/util/hash-util.h:if > (LIKELY(CpuInfo::IsSupported(CpuInfo::SSE4_2))) { > be/src/util/openssl-util.cc: return > (CpuInfo::IsSupported(CpuInfo::PCLMULQDQ) > {noformat} > We also ship two versions of the codegen module, one of which (nosse42) is > essentially never used. > I think it would be uncontroversial to bump the minimum requirement to > SSE4.2, which would allow us to delete some old fallbacks. I think the last > time Intel or AMD shipped a processor without this was 2010 or 2011. 
Jumping > to AVX is probably almost as uncontroversial, since it looks like that has > been universal for nearly as long: > https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX. > Some older lower-performance cloud instance types don't support AVX, it looks > like, but I think this is an edge case. > It would be very nice to require AVX2 because that could remove a bunch of > conditional code (I think the contributors adding ARM support might want to > keep the scalar fallback code though, potentially). It looks like most Intel > and AMD processors have supported it since 2013 and 2015 respectively, except > some low-end Intel processors: > https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_
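The branches that the grep hits above implement follow one runtime-dispatch pattern, sketched here in Python (the feature names and implementation names are stand-ins for the `CpuInfo::IsSupported` flags and their guarded code paths):

```python
def select_hash_impl(features):
    """Pick the best implementation the host CPU supports, falling back to a
    scalar version -- the pattern behind each CpuInfo::IsSupported branch."""
    if "avx2" in features:
        return "hash_avx2"
    if "sse4_2" in features:
        return "hash_crc32"   # SSE4.2 CRC-instruction-based hash
    return "hash_scalar"      # legacy fallback the ticket proposes deleting

assert select_hash_impl({"avx2", "sse4_2"}) == "hash_avx2"
assert select_hash_impl({"sse4_2"}) == "hash_crc32"
assert select_hash_impl(set()) == "hash_scalar"
```

Raising the minimum requirement to SSE4.2 would let the final fallback branch, and the nosse42 codegen module, be deleted outright.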
[jira] [Assigned] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization
[ https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-2138: - Assignee: (was: Tim Armstrong) > Get rid of unused columns by upstream operators at points of materialization > > > Key: IMPALA-2138 > URL: https://issues.apache.org/jira/browse/IMPALA-2138 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2 >Reporter: Ippokratis Pandis >Priority: Major > Labels: performance > Attachments: 0001-Projection-prototype.patch, performance_result.txt > > > It would be a very good performance improvement if we were able to get rid of > columns as soon as we know that they are not going to be used from any other > operators upstream. The amount of data we are handling will reduce making the > network and I/O (spilling) transfers more efficient. It will also improve > cache performance. > The current row-wise in-memory format does not make it very easy to get rid > of such unused columns. However, there are points of materialization where we > copy-out the tuples and we can actually perform these projections. There are > multiple points of materialization, notably: > * The exchange operator > * The build side of hash join > * The probe side of hash join when we have spilling > * The aggregation > * Sorts and analytic function evaluation > In order to do these projections we need to modify the FE and know at each > operator what's the minimum set of columns that are being referenced by this > operator and all the upstream ones. (That minimum set is very easy to be > calculated during an additional top-down traversal of the plan.) We also need > to modify the BE and make the copy-out operation aware of such projections. > Assigning first to Alex, because of the needed FE changes. Happy to take care > of the needed BE changes. Perhaps we could split this issue into 2 sub-tasks, > the FE and the BE changes. 
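The top-down pass the description mentions can be sketched as follows (the `PlanNode` shape is an assumption for illustration, not Impala's FE classes):

```python
class PlanNode:
    def __init__(self, name, referenced, children=()):
        self.name = name
        self.referenced = set(referenced)   # columns this operator evaluates
        self.children = list(children)
        self.required_output = set()        # columns operators upstream still need

def compute_required(node, needed_by_ancestors=frozenset()):
    # One top-down traversal: what a node must emit is exactly what the
    # operators above it reference; at a materialization point (exchange,
    # sort, join build...), every other column can be projected away.
    node.required_output = set(needed_by_ancestors)
    needed_below = needed_by_ancestors | node.referenced
    for child in node.children:
        compute_required(child, needed_below)

scan = PlanNode("scan", {"c"})                  # predicate on c only
join = PlanNode("join", {"a"}, [scan])          # join key a
root = PlanNode("select", {"a", "b"}, [join])   # final projection
compute_required(root)
assert scan.required_output == {"a", "b"}       # c is consumed at the scan and dropped
assert join.required_output == {"a", "b"}
```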
[jira] [Assigned] (IMPALA-2268) implicit casting of string to timestamp for functions
[ https://issues.apache.org/jira/browse/IMPALA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-2268: - Assignee: (was: Tim Armstrong) > implicit casting of string to timestamp for functions > - > > Key: IMPALA-2268 > URL: https://issues.apache.org/jira/browse/IMPALA-2268 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 2.2 >Reporter: Bharath Vissapragada >Priority: Minor > Labels: newbie, usability > > Consider date_add() builtin. string is automatically cast to a timestamp. > {code} > select date_add( "1900-01-01", 1 ) ; > Query: select date_add( "1900-01-01", 1 ) > +---+ > | date_add('1900-01-01', 1) | > +---+ > | 1900-01-02 00:00:00 | > +---+ > Fetched 1 row(s) in 0.12s > {code} > However with an "interval" > {code} > select date_add( '1900-01-01', interval 72 days ) ; > Query: select date_add( '1900-01-01', interval 72 days ) > ERROR: AnalysisException: Operand ''1900-01-01'' of timestamp arithmetic > expression 'DATE_ADD('1900-01-01', INTERVAL 72 days)' returns type 'STRING'. > Expected type 'TIMESTAMP'. > {code} > We need to manually cast it to a timestamp, something like, > {code} > select date_add(cast("1900-01-01" as TIMESTAMP), interval 10 days ) ; > Query: select date_add(cast("1900-01-01" as TIMESTAMP), interval 10 days ) > +-+ > | date_add(cast('1900-01-01' as timestamp), interval 10 days) | > +-+ > | 1900-01-11 00:00:00 | > +-+ > Fetched 1 row(s) in 0.02s > {code} > Its convenient to make this behavior consistent across all builtins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-8306) Debug WebUI's Sessions page verbiage clarification
[ https://issues.apache.org/jira/browse/IMPALA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8306. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Debug WebUI's Sessions page verbiage clarification > -- > > Key: IMPALA-8306 > URL: https://issues.apache.org/jira/browse/IMPALA-8306 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.12.0, Impala 3.1.0 >Reporter: Vincent Tran >Assignee: Tim Armstrong >Priority: Minor > Labels: supportability > Fix For: Impala 4.0 > > Attachments: sessions.png > > > Currently, the Debug WebUI's Sessions page captures both active sessions and > expired sessions. On the top of the page there is a message along the line of: > {noformat} > There are {{num_sessions}} sessions, of which {{num_active}} are active. > Sessions may be closed either when they are idle for some time (see Idle > Timeout > below), or if they are deliberately closed, otherwise they are called active. > {noformat} > This text is ambiguous for me. If all non-active sessions are expired > sessions, it should explicitly tell the user that. And since an active > session becomes an expired session when it breaches the Session Idle Timeout, > the second sentence is also somewhat misleading. User has to "deliberately > close" both active sessions and expired sessions to close them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-1652) Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed column.
[ https://issues.apache.org/jira/browse/IMPALA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273369#comment-17273369 ] Tim Armstrong commented on IMPALA-1652: --- [~stigahuang] it does require some additional logic in the CHAR->STRING cast - https://gerrit.cloudera.org/#/c/16339/3/be/src/exprs/cast-functions-ir.cc. But it doesn't require additional casts or any additional copies. I haven't measured but I think it would be very cheap in practice. > Maybe we can store the actual length of each CHAR value in the tuple layout, > and calculte the actual length once when materializing the value I think if we're going to do change the slot layout, it's probably easier to treat it as variable-length and store it in a StringValue. > Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed > column. > --- > > Key: IMPALA-1652 > URL: https://issues.apache.org/jira/browse/IMPALA-1652 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.1, Impala 2.3.0 >Reporter: Alexander Behm >Priority: Major > Labels: correctness, downgraded, usability > Attachments: 8be18d4.diff > > > Repro: > {code} > create table foo(col1 char(10)); > insert into foo values (cast('test1' as char(10))); > select * from foo where col1 = 'test1'; <-- returns an empty result set > select * from foo where col1 = cast('test1' as char(10)); <-- correctly > returns 1 row > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
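The repro's behaviour follows from CHAR(n) padding semantics, which can be modelled in a couple of lines (Python stands in for the backend types here):

```python
def char_cast(value, n):
    # CHAR(n) semantics: truncate to n characters, then right-pad with spaces.
    return value[:n].ljust(n)

padded = char_cast("test1", 10)
assert padded == "test1     "
# A raw comparison of the CHAR(10) value against the unpadded STRING literal
# fails, matching the empty result set in the repro; casting the literal to
# CHAR(10) pads it too, so that comparison succeeds.
assert padded != "test1"
assert padded == char_cast("test1", 10)
```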
[jira] [Resolved] (IMPALA-10404) Update docs to reflect RLE_DICTIONARY support
[ https://issues.apache.org/jira/browse/IMPALA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-10404. Fix Version/s: Impala 4.0 Resolution: Fixed > Update docs to reflect RLE_DICTIONARY support > - > > Key: IMPALA-10404 > URL: https://issues.apache.org/jira/browse/IMPALA-10404 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-9726) Update boilerplate in the PyPI sidebar for impala-shell supported versions
[ https://issues.apache.org/jira/browse/IMPALA-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9726: - Assignee: (was: Tim Armstrong) > Update boilerplate in the PyPI sidebar for impala-shell supported versions > -- > > Key: IMPALA-9726 > URL: https://issues.apache.org/jira/browse/IMPALA-9726 > Project: IMPALA > Issue Type: Sub-task > Components: Clients >Affects Versions: Impala 4.0 >Reporter: David Knupp >Priority: Minor > > The following lines need to be updated to reflect that the shell now supports > python 2.7+ and 3+. > https://github.com/apache/impala/blob/master/shell/packaging/setup.py#L164-167 > {noformat} > 'Programming Language :: Python :: 2 :: Only', > 'Programming Language :: Python :: 2.6', > 'Programming Language :: Python :: 2.7', > {noformat} > Note that this has no effect on the actual installation. The following line > is what manages that, and its value is correct for both Impala 3.4.0 and > Impala 4.0: > https://github.com/apache/impala/blob/master/shell/packaging/setup.py#L138 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-8306) Debug WebUI's Sessions page verbiage clarification
[ https://issues.apache.org/jira/browse/IMPALA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8306 started by Tim Armstrong. - > Debug WebUI's Sessions page verbiage clarification > -- > > Key: IMPALA-8306 > URL: https://issues.apache.org/jira/browse/IMPALA-8306 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.12.0, Impala 3.1.0 >Reporter: Vincent Tran >Assignee: Tim Armstrong >Priority: Minor > Labels: supportability > Attachments: sessions.png > > > Currently, the Debug WebUI's Sessions page captures both active sessions and > expired sessions. On the top of the page there is a message along the line of: > {noformat} > There are {{num_sessions}} sessions, of which {{num_active}} are active. > Sessions may be closed either when they are idle for some time (see Idle > Timeout > below), or if they are deliberately closed, otherwise they are called active. > {noformat} > This text is ambiguous for me. If all non-active sessions are expired > sessions, it should explicitly tell the user that. And since an active > session becomes an expired session when it breaches the Session Idle Timeout, > the second sentence is also somewhat misleading. User has to "deliberately > close" both active sessions and expired sessions to close them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8306) Debug WebUI's Sessions page verbiage clarification
[ https://issues.apache.org/jira/browse/IMPALA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271701#comment-17271701 ] Tim Armstrong commented on IMPALA-8306: --- [~thundergun] I had a go at improving this here - http://gerrit.cloudera.org:8080/16981. Would welcome your feedback. > Debug WebUI's Sessions page verbiage clarification > -- > > Key: IMPALA-8306 > URL: https://issues.apache.org/jira/browse/IMPALA-8306 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.12.0, Impala 3.1.0 >Reporter: Vincent Tran >Assignee: Tim Armstrong >Priority: Minor > Labels: supportability > Attachments: sessions.png > > > Currently, the Debug WebUI's Sessions page captures both active sessions and > expired sessions. On the top of the page there is a message along the line of: > {noformat} > There are {{num_sessions}} sessions, of which {{num_active}} are active. > Sessions may be closed either when they are idle for some time (see Idle > Timeout > below), or if they are deliberately closed, otherwise they are called active. > {noformat} > This text is ambiguous for me. If all non-active sessions are expired > sessions, it should explicitly tell the user that. And since an active > session becomes an expired session when it breaches the Session Idle Timeout, > the second sentence is also somewhat misleading. User has to "deliberately > close" both active sessions and expired sessions to close them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-3657) Permission upon insert are wrong in hive warehouse table files
[ https://issues.apache.org/jira/browse/IMPALA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-3657. --- Resolution: Not A Bug > Permission upon insert are wrong in hive warehouse table files > -- > > Key: IMPALA-3657 > URL: https://issues.apache.org/jira/browse/IMPALA-3657 > Project: IMPALA > Issue Type: Bug > Components: Security >Affects Versions: Impala 2.2.3 > Environment: Cluster is Kerberized and has sentry >Reporter: Bala Chander >Assignee: Tim Armstrong >Priority: Minor > Labels: security > > Found an issue with permissions on warehouse. > The Warehouse /user/hive/warehouse was set to owner hive:hive with 771 > permissions recursively. User was granted write privilege on table (tbl-1) on > database (db-1). > Initially all grants were done with beeline. > Next the user switched to impala-shell and inserted some data into tbl-1. The > permissions on the new hdfs file was the following: > ownership : impala:hive > permissions: 751 i.e. read and execute on group. > The user cannot use insert overwrite via beeline sine the group hive has read > only permissions. > The documentation: > http://www.cloudera.com/documentation/enterprise/latest/topics/impala_insert.html > has the following: > Related startup options: > By default, if an INSERT statement creates any new subdirectories underneath > a partitioned table, those subdirectories are assigned default HDFS > permissions for the impala user. To make each subdirectory have the same > permissions as its parent directory in HDFS, specify the > --insert_inherit_permissions startup option for the impalad daemon. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-3657) Permission upon insert are wrong in hive warehouse table files
[ https://issues.apache.org/jira/browse/IMPALA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271655#comment-17271655 ] Tim Armstrong commented on IMPALA-3657: --- This is controlled by the fs.permissions.umask-mode setting in hdfs-site.xml, which defaults to 022. It could make sense to change it to 002 if you're in a setup like this where Impala is in the hive group. This is probably not something that needs to be fixed in Apache Impala, but rather in management software that sets up users/groups etc. > Permission upon insert are wrong in hive warehouse table files > -- > > Key: IMPALA-3657 > URL: https://issues.apache.org/jira/browse/IMPALA-3657 > Project: IMPALA > Issue Type: Bug > Components: Security >Affects Versions: Impala 2.2.3 > Environment: Cluster is Kerberized and has sentry >Reporter: Bala Chander >Assignee: Tim Armstrong >Priority: Minor > Labels: security > > Found an issue with permissions on warehouse. > The Warehouse /user/hive/warehouse was set to owner hive:hive with 771 > permissions recursively. User was granted write privilege on table (tbl-1) on > database (db-1). > Initially all grants were done with beeline. > Next the user switched to impala-shell and inserted some data into tbl-1. The > permissions on the new hdfs file was the following: > ownership : impala:hive > permissions: 751 i.e. read and execute on group. > The user cannot use insert overwrite via beeline sine the group hive has read > only permissions. > The documentation: > http://www.cloudera.com/documentation/enterprise/latest/topics/impala_insert.html > has the following: > Related startup options: > By default, if an INSERT statement creates any new subdirectories underneath > a partitioned table, those subdirectories are assigned default HDFS > permissions for the impala user. 
To make each subdirectory have the same > permissions as its parent directory in HDFS, specify the > --insert_inherit_permissions startup option for the impalad daemon. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
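The permission arithmetic behind the comment is the usual mode-and-not-umask rule; a quick check (illustrative only, not Impala or HDFS code):

```python
def apply_umask(requested_mode, umask):
    # New files and directories receive the requested mode with the umask
    # bits cleared, i.e. mode & ~umask.
    return requested_mode & ~umask

# The default fs.permissions.umask-mode of 022 strips group and other write:
assert oct(apply_umask(0o777, 0o022)) == "0o755"
# Relaxing it to 002 keeps group write, letting the hive group overwrite:
assert oct(apply_umask(0o777, 0o002)) == "0o775"
```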
[jira] [Commented] (IMPALA-1652) Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed column.
[ https://issues.apache.org/jira/browse/IMPALA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271638#comment-17271638 ] Tim Armstrong commented on IMPALA-1652: --- I had a WIP here - https://gerrit.cloudera.org/#/c/16339/ that illustrated how it might be possible to tweak > Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed > column. > --- > > Key: IMPALA-1652 > URL: https://issues.apache.org/jira/browse/IMPALA-1652 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.1, Impala 2.3.0 >Reporter: Alexander Behm >Priority: Major > Labels: correctness, downgraded, usability > Attachments: 8be18d4.diff > > > Repro: > {code} > create table foo(col1 char(10)); > insert into foo values (cast('test1' as char(10))); > select * from foo where col1 = 'test1'; <-- returns an empty result set > select * from foo where col1 = cast('test1' as char(10)); <-- correctly > returns 1 row > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization
[ https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271635#comment-17271635 ] Tim Armstrong commented on IMPALA-2138: --- Abandoned - https://gerrit.cloudera.org/#/c/14216/ https://gerrit.cloudera.org/#/c/14399/1 > Get rid of unused columns by upstream operators at points of materialization > > > Key: IMPALA-2138 > URL: https://issues.apache.org/jira/browse/IMPALA-2138 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2 >Reporter: Ippokratis Pandis >Assignee: Tim Armstrong >Priority: Major > Labels: performance > Attachments: 0001-Projection-prototype.patch, performance_result.txt > > > It would be a very good performance improvement if we were able to get rid of > columns as soon as we know that they are not going to be used from any other > operators upstream. The amount of data we are handling will reduce making the > network and I/O (spilling) transfers more efficient. It will also improve > cache performance. > The current row-wise in-memory format does not make it very easy to get rid > of such unused columns. However, there are points of materialization where we > copy-out the tuples and we can actually perform these projections. There are > multiple points of materialization, notably: > * The exchange operator > * The build side of hash join > * The probe side of hash join when we have spilling > * The aggregation > * Sorts and analytic function evaluation > In order to do these projections we need to modify the FE and know at each > operator what's the minimum set of columns that are being referenced by this > operator and all the upstream ones. (That minimum set is very easy to be > calculated during an additional top-down traversal of the plan.) We also need > to modify the BE and make the copy-out operation aware of such projections. > Assigning first to Alex, because of the needed FE changes. 
Happy to take care > of the needed BE changes. Perhaps we could split this issue into 2 sub-tasks, > the FE and the BE changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7885) Create function to convert to ts from unix millis
[ https://issues.apache.org/jira/browse/IMPALA-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7885: -- Labels: ramp-up (was: ) > Create function to convert to ts from unix millis > - > > Key: IMPALA-7885 > URL: https://issues.apache.org/jira/browse/IMPALA-7885 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: eugen yushin >Assignee: Tim Armstrong >Priority: Major > Labels: ramp-up > > There are several functions like > `from_unixtime`/`unix_micros_to_utc_timestamp`/`to_timestamp` in Impala which > accept seconds and micros, but none of them works with millis. > At the same time, Impala already has all necessary utility methods to add > such functionality: > [https://github.com/apache/impala/blob/master/be/src/runtime/timestamp-value.inline.h#L54] > {code} > inline TimestampValue TimestampValue::UtcFromUnixTimeMillis(int64_t > unix_time_millis) { > return UtcFromUnixTimeTicks(unix_time_millis); > } > {code} > https://github.com/apache/impala/blob/master/be/src/exprs/timestamp-functions-ir.cc#L141 > {code} > TimestampVal TimestampFunctions::UnixMicrosToUtcTimestamp(FunctionContext* > context, > const BigIntVal& unix_time_micros) { > if (unix_time_micros.is_null) return TimestampVal::null(); > TimestampValue tv = > TimestampValue::UtcFromUnixTimeMicros(unix_time_micros.val); > TimestampVal result; > tv.ToTimestampVal(&result); > return result; > } > {code} > It would be better to have a built-in Unix-millis-to-timestamp conversion > function, to avoid: > - creating cumbersome 'aliases' like: > {code} > select unix_micros_to_utc_timestamp(1513895588243 * 1000) > {code} > or > http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Why-not-from-unixtime-function-handles-an-unix-timestamp-in/m-p/63182#M3969 > {code} > select cast(1513895588243 div 1000 as timestamp) + interval (1513895588243 % > 1000) milliseconds; > {code} > - writing relatively slow UDFs in Java 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
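[Editor's note] The snippets above already cover the micros path; the millis variant the reporter asks for would be a thin wrapper over it. Below is a standalone sketch of the conversion itself. The function name `UnixMillisToUtcTimestamp` and the `SimpleTimestamp` struct are hypothetical stand-ins: the real built-in would delegate to `TimestampValue::UtcFromUnixTimeMillis()` rather than `gmtime_r`.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical sketch of the proposed built-in: convert Unix epoch
// milliseconds to a broken-down UTC timestamp. Uses POSIX gmtime_r to
// stay self-contained; Impala would use its own TimestampValue machinery.
struct SimpleTimestamp {
  std::tm tm;   // seconds-resolution calendar fields (UTC)
  int millis;   // sub-second part, 0-999
};

SimpleTimestamp UnixMillisToUtcTimestamp(int64_t unix_time_millis) {
  SimpleTimestamp result;
  // Split into whole seconds and a millisecond remainder. This assumes a
  // non-negative input; pre-epoch timestamps would need floor division.
  time_t secs = static_cast<time_t>(unix_time_millis / 1000);
  result.millis = static_cast<int>(unix_time_millis % 1000);
  gmtime_r(&secs, &result.tm);
  return result;
}
```

With the JIRA's example value, 1513895588243 ms maps to 2017-12-21 22:33:08.243 UTC, i.e. the same instant as the `unix_micros_to_utc_timestamp(1513895588243 * 1000)` workaround.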
[jira] [Assigned] (IMPALA-9457) Lazy start of disk threads in I/O manager
[ https://issues.apache.org/jira/browse/IMPALA-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9457: - Assignee: (was: Tim Armstrong) > Lazy start of disk threads in I/O manager > - > > Key: IMPALA-9457 > URL: https://issues.apache.org/jira/browse/IMPALA-9457 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Major > Labels: ramp-up, supportability > > Currently DiskIoMgr starts all the I/O threads upfront for all supported > filesystems. This means there are 100s of idle threads in most impalads that > never do anything. It would be sensible to start the threads for a disk only > when the first range is submitted. It's not immediately obvious where the > best place to do this is. A couple of ideas: > * Try to do it in ScheduleContext in a lightweight way, e.g. check an atomic > to see if it's been initialised, then acquire a lock and create the threads > if needed. Propagating the status if thread creation fails may be the tricky > part > * Start up one thread per disk, so I/O can always make progress, and start an > extra thread per disk each time a range is pulled off the queue in > DiskQueue::GetNextRequestRange() so that the number of threads ramps up as > scan ranges are submitted. It could potentially be clever and try to track > how many threads are parked and only create new threads if 0 threads are > parked. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
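[Editor's note] The first idea above — check an atomic on the fast path, then acquire a lock and create the threads if needed — can be sketched as below. `LazyDiskQueue`, the thread count, and the no-op worker bodies are simplified stand-ins for the real DiskIoMgr structures, and the tricky part the ticket mentions (propagating a Status when thread creation fails) is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of lazily starting a disk's I/O threads on first submission,
// using double-checked locking so the steady-state path is lock-free.
class LazyDiskQueue {
 public:
  explicit LazyDiskQueue(int num_threads) : num_threads_(num_threads) {}

  ~LazyDiskQueue() {
    for (auto& t : threads_) t.join();
  }

  // Called when a scan range is submitted to this disk's queue.
  void Submit() {
    // Fast path: threads already started, no lock taken.
    if (!started_.load(std::memory_order_acquire)) {
      std::lock_guard<std::mutex> l(lock_);
      // Re-check under the lock: another thread may have won the race.
      if (!started_.load(std::memory_order_relaxed)) {
        for (int i = 0; i < num_threads_; ++i) {
          threads_.emplace_back([] { /* worker loop would go here */ });
        }
        started_.store(true, std::memory_order_release);
      }
    }
    // ... enqueue the range ...
  }

  int num_started() const { return static_cast<int>(threads_.size()); }

 private:
  const int num_threads_;
  std::atomic<bool> started_{false};
  std::mutex lock_;
  std::vector<std::thread> threads_;
};
```

The second idea (one thread per disk upfront, ramping up as ranges are pulled) would replace the all-at-once loop with an incremental `emplace_back` guarded by a parked-thread count.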
[jira] [Resolved] (IMPALA-10229) Analytic limit pushdown optimization can be applied incorrectly based on predicates present
[ https://issues.apache.org/jira/browse/IMPALA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-10229. Fix Version/s: Impala 4.0 Resolution: Fixed Finished both subtasks > Analytic limit pushdown optimization can be applied incorrectly based on > predicates present > --- > > Key: IMPALA-10229 > URL: https://issues.apache.org/jira/browse/IMPALA-10229 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: correctness > Fix For: Impala 4.0 > > > {noformat} > [localhost.EXAMPLE.COM:21050] default> select * from (select month, id, > rank() over (partition by month order by id desc) rnk from > functional_parquet.alltypes WHERE month >= 11) v order by month, id limit 3; > +---+--+-+ > | month | id | rnk | > +---+--+-+ > | 11| 6987 | 3 | > | 11| 6988 | 2 | > | 11| 6989 | 1 | > +---+--+-+ > Fetched 3 row(s) in 4.16s > {noformat} > These are not the top 3 rows when ordering by month, id. Hive's result is > correct: > {noformat} > +--+---++ > | v.month | v.id | v.rnk | > +--+---++ > | 11 | 3040 | 600| > | 11 | 3041 | 599| > | 11 | 3042 | 598| > +--+---++ > {noformat} > I think that when there are no select predicates, the ordering in the analytic > sort needs to exactly match the TOP N sort ordering. I'm not sure if there > are fixes needed for the case where there are select predicates. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10453) Support file/partition pruning via runtime filters on Iceberg
Tim Armstrong created IMPALA-10453: -- Summary: Support file/partition pruning via runtime filters on Iceberg Key: IMPALA-10453 URL: https://issues.apache.org/jira/browse/IMPALA-10453 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Tim Armstrong This is a placeholder to figure out what we'd need to do to support dynamic file-level pruning in Iceberg using runtime filters, i.e. have parity for partition pruning. * If there is a single partition value per file, then applying bloom filters to the row group stats would be effective at pruning files. * If there are partition transforms, e.g. hash-based, then I think we probably need to track the partition that the file is associated with and then have some custom logic in the parquet scanner to do partition pruning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10296) Fix analytic limit pushdown when predicates are present
[ https://issues.apache.org/jira/browse/IMPALA-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-10296. Fix Version/s: Impala 4.0 Resolution: Fixed > Fix analytic limit pushdown when predicates are present > --- > > Key: IMPALA-10296 > URL: https://issues.apache.org/jira/browse/IMPALA-10296 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: correctness > Fix For: Impala 4.0 > > > This is to fix case 1 of the parent JIRA. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10448) observability/test_profile_tool.py fails missing impala-profile-tool during Docker-based tests
[ https://issues.apache.org/jira/browse/IMPALA-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268289#comment-17268289 ] Tim Armstrong commented on IMPALA-10448: Looks like I missed adding this in there. > observability/test_profile_tool.py fails missing impala-profile-tool during > Docker-based tests > -- > > Key: IMPALA-10448 > URL: https://issues.apache.org/jira/browse/IMPALA-10448 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Laszlo Gaal >Assignee: Laszlo Gaal >Priority: Major > > The test executable {{impala-profile-tool}} is missing during the execution > of the test suite EE_TEST_PARALLEL. This is specific to Docker-based tests > (driven by docker/test-with-docker.py), because in that environment test > executables are built only in the container executing the suite BE_TEST. All > other containers receive only the core Impala binaries as built by > {code} > buildall.sh -notests > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-9458) Improve runtime profile counters for slow IO from remote stores
[ https://issues.apache.org/jira/browse/IMPALA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9458: - Assignee: (was: Tim Armstrong) > Improve runtime profile counters for slow IO from remote stores > --- > > Key: IMPALA-9458 > URL: https://issues.apache.org/jira/browse/IMPALA-9458 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Sahil Takiar >Priority: Major > Labels: observability > > Remote storage systems (e.g. cloud stores like S3 and ABFS) often have long > tail latencies. Most I/O finishes relatively quickly, but some calls make > take significantly longer. Even for HDFS, this is an issue (e.g. hedged reads > were developed to help mitigate tail latencies, although no such feature > exists for cloud storage connectors). > Currently, scan nodes just track the total amount of time spent reading data. > It would be good to have a summary stats counter that tracks the min, avg, > and max time spent reading data. This should at least allow us to identify > when calls to remote storage services are taking longer than usual. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
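[Editor's note] A minimal sketch of the counter the ticket proposes: track min, max, sum, and count of observed read latencies so min/avg/max can be reported in the runtime profile. Impala's profile code has its own summary-stats counter type; this standalone class illustrates only the bookkeeping, not the actual interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Summary statistics over read latencies (nanoseconds). Thread safety is
// omitted for brevity; real profile counters would use atomics or a lock.
class ReadLatencyStats {
 public:
  void Update(int64_t latency_ns) {
    min_ = std::min(min_, latency_ns);
    max_ = std::max(max_, latency_ns);
    sum_ += latency_ns;
    ++count_;
  }

  int64_t min() const { return count_ == 0 ? 0 : min_; }
  int64_t max() const { return max_; }
  int64_t avg() const { return count_ == 0 ? 0 : sum_ / count_; }
  int64_t count() const { return count_; }

 private:
  int64_t min_ = std::numeric_limits<int64_t>::max();
  int64_t max_ = 0;
  int64_t sum_ = 0;
  int64_t count_ = 0;
};
```

A high max relative to avg is exactly the long-tail signature the ticket wants to surface for remote reads.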
[jira] [Commented] (IMPALA-10153) Support time travel for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266219#comment-17266219 ] Tim Armstrong commented on IMPALA-10153: With Kudu my understanding is that you could do temporal queries back until the ancient history marker, beyond which point per-row timestamps are no longer maintained - https://github.com/cloudera/kudu/blob/master/docs/design-docs/tablet-history-gc.md > Support time travel for Iceberg tables > -- > > Key: IMPALA-10153 > URL: https://issues.apache.org/jira/browse/IMPALA-10153 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Gabor Kaszab >Priority: Major > Labels: impala-iceberg > > Iceberg tables support snapshots/data versioning/time travel. > It means we can query an older version of the table. > Probably we'll need to extend Impala's SQL syntax to support such queries > (Hive will also support such queries, so we should use the same syntax). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10153) Support time travel for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266218#comment-17266218 ] Tim Armstrong commented on IMPALA-10153: [~patrickangeles] IMPALA-9773 is the JIRA > Support time travel for Iceberg tables > -- > > Key: IMPALA-10153 > URL: https://issues.apache.org/jira/browse/IMPALA-10153 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Zoltán Borók-Nagy >Assignee: Gabor Kaszab >Priority: Major > Labels: impala-iceberg > > Iceberg tables support snapshots/data versioning/time travel. > It means we can query an older version of the table. > Probably we'll need to extend Impala's SQL syntax to support such queries > (Hive will also support such queries, so we should use the same syntax). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-9865) Utility to pretty-print thrift profiles at various levels
[ https://issues.apache.org/jira/browse/IMPALA-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9865. --- Fix Version/s: Impala 4.0 Resolution: Fixed > Utility to pretty-print thrift profiles at various levels > - > > Key: IMPALA-9865 > URL: https://issues.apache.org/jira/browse/IMPALA-9865 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > Attachments: image-2020-11-20-09-25-08-082.png > > > The prototyping work in IMPALA-9382 revealed some hard trade-offs between > having a full-fidelity text profile and readability. > We want to have a text profile with less information by default so that it is > more readable, and rely on the thrift profile for more detailed debugging. > This would be easier if we provided a utility that can pretty-print a thrift > profile at different levels of detail. > This JIRA is to reduce the default level of pretty-printing for aggregated > profiles, but provide a utility that can dump both the basic and full > versions. My thought is to start off with the same 4 levels as explain, but > maybe only implement 2 levels to start off with - basic and extended. > The utility should be able to handle the same cases as > bin/parse-thrift-profile.py (profile log and single profile in a file) and > maybe print only a specified query from a profile log. We can use the > DeserializeFromArchiveString() method that was removed in IMPALA-9381, then > pretty-print the deserialised profile. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9486) Creating a Kudu table via JDBC fails with "IllegalArgumentException"
[ https://issues.apache.org/jira/browse/IMPALA-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262890#comment-17262890 ] Tim Armstrong commented on IMPALA-9486: --- I wonder if IMPALA-10027 would prevent this in a different way, at least if the root cause is that an unauthenticated connection doesn't have a valid user set. > Creating a Kudu table via JDBC fails with "IllegalArgumentException" > > > Key: IMPALA-9486 > URL: https://issues.apache.org/jira/browse/IMPALA-9486 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.2.0 >Reporter: Grant Henke >Assignee: Fang-Yu Rao >Priority: Blocker > > A Kudu user reported that, though creating tables via impala shell or Hue works, > when using an external tool connected via JDBC the create statement fails > with the following: > {noformat} > [ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, > SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, > errorMessage:ImpalaRuntimeException: Error creating Kudu table > 'impala::default.foo' CAUSED BY: IllegalArgumentException: table owner must > not be null or empty ), Query: … > {noformat} > > When debugging the issue further it looks like the call to set the owner on > the Kudu table should not be made if an owner is not explicitly set: > [https://github.com/apache/impala/blob/497a17dbdc0669abd47c2360b8ca94de8b54d413/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java#L252] > > A possible fix could be to guard the call with _isSetOwner_: > {code:java} > if (msTbl.isSetOwner()) { >tableOpts.setOwner(msTbl.getOwner()); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-9958) Implement Introsort by adding a heapsort case
[ https://issues.apache.org/jira/browse/IMPALA-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9958. --- Resolution: Won't Do This isn't an obvious win - we do a randomized median-of-three pivot selection that's fairly robust. I think we should look at the sort holistically instead of assuming this is the right solution. > Implement Introsort by adding a heapsort case > -- > > Key: IMPALA-9958 > URL: https://issues.apache.org/jira/browse/IMPALA-9958 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Shant Hovsepian >Priority: Minor > > Introsort is the standard hybrid sort implementation > [https://en.wikipedia.org/wiki/Introsort] which chooses between quicksort, > heapsort, and insertion sort given the current sort run size. > > Currently the Sorter uses quicksort with insertion sort for batches smaller > than 16. With introsort, when the quicksort recursion depth exceeds a > threshold of 2*log(N), the algorithm switches to heapsort. > This should help mitigate worst-case pivot selections. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
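[Editor's note] For reference, the hybrid the ticket describes can be sketched as follows: quicksort with a recursion-depth budget of 2*log2(N), falling back to heapsort when the budget is exhausted, and a final insertion-sort pass for runs under 16 elements (the Sorter's threshold). This is a generic illustration over ints, not the Sorter's tuple-based code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

namespace {
constexpr int kInsertionThreshold = 16;

// Median-of-three Lomuto partition over v[lo, hi); returns the pivot's
// final index. Indices lo, mid, hi-1 are distinct because callers only
// partition runs longer than kInsertionThreshold.
int Partition(std::vector<int>& v, int lo, int hi) {
  int mid = lo + (hi - lo) / 2;
  if (v[mid] < v[lo]) std::swap(v[mid], v[lo]);
  if (v[hi - 1] < v[lo]) std::swap(v[hi - 1], v[lo]);
  if (v[hi - 1] < v[mid]) std::swap(v[hi - 1], v[mid]);
  std::swap(v[mid], v[hi - 1]);  // the median of three becomes the pivot
  int pivot = v[hi - 1];
  int i = lo;
  for (int j = lo; j < hi - 1; ++j) {
    if (v[j] < pivot) std::swap(v[i++], v[j]);
  }
  std::swap(v[i], v[hi - 1]);
  return i;
}

void IntrosortImpl(std::vector<int>& v, int lo, int hi, int depth_budget) {
  while (hi - lo > kInsertionThreshold) {
    if (depth_budget-- == 0) {
      // Pivot selection has gone badly; heapsort guarantees O(n log n).
      std::make_heap(v.begin() + lo, v.begin() + hi);
      std::sort_heap(v.begin() + lo, v.begin() + hi);
      return;
    }
    int p = Partition(v, lo, hi);
    // Recurse on the smaller side, iterate on the larger to bound the stack.
    if (p - lo < hi - p - 1) {
      IntrosortImpl(v, lo, p, depth_budget);
      lo = p + 1;
    } else {
      IntrosortImpl(v, p + 1, hi, depth_budget);
      hi = p;
    }
  }
}
}  // namespace

void Introsort(std::vector<int>& v) {
  int n = static_cast<int>(v.size());
  if (n > 1) {
    IntrosortImpl(v, 0, n, 2 * static_cast<int>(std::log2(n)));
  }
  // Insertion sort cleans up the sub-threshold runs in one final pass.
  for (int i = 1; i < n; ++i) {
    int key = v[i];
    int j = i - 1;
    while (j >= 0 && v[j] > key) {
      v[j + 1] = v[j];
      --j;
    }
    v[j + 1] = key;
  }
}
```

As the resolution notes, the randomized median-of-three already in the Sorter defends against most adversarial inputs, so the heapsort case is insurance rather than a clear speedup.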
[jira] [Resolved] (IMPALA-8900) Allow /healthz access without authentication
[ https://issues.apache.org/jira/browse/IMPALA-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8900. --- Resolution: Duplicate > Allow /healthz access without authentication > > > Key: IMPALA-8900 > URL: https://issues.apache.org/jira/browse/IMPALA-8900 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.4.0 >Reporter: Lars Volker >Priority: Major > > When enabling SPNEGO authentication for the debug webpages, /healthz becomes > unavailable. Some tooling might rely on the endpoint being accessible without > authentication and it does not pose a security risk to make it available. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7427) Write Impala version information to writer.model.name footer field of Parquet
[ https://issues.apache.org/jira/browse/IMPALA-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7427: -- Labels: newbie parquet ramp-up (was: parquet) > Write Impala version information to writer.model.name footer field of Parquet > - > > Key: IMPALA-7427 > URL: https://issues.apache.org/jira/browse/IMPALA-7427 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Zoltan Ivanfi >Priority: Minor > Labels: newbie, parquet, ramp-up > > PARQUET-352 added support for the "writer.model.name" property in the Parquet > metadata to identify the object model (application) that wrote the file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-6979) BloomFilterBenchmark hits DCHECK
[ https://issues.apache.org/jira/browse/IMPALA-6979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-6979. --- Resolution: Later Don't need to track this, if someone wants to use the benchmark they'll need to fix it. > BloomFilterBenchmark hits DCHECK > > > Key: IMPALA-6979 > URL: https://issues.apache.org/jira/browse/IMPALA-6979 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tianyi Wang >Priority: Minor > > Leaving this here in case someone else runs into it and needs to fix the > benchmark. We don't run this benchmark as part of builds so it's not a high > priority to fix. > {noformat} > F0504 15:18:55.533821 26709 bloom-filter.cc:192] Check failed: > !out->always_false > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-6895) Reduce flush, close and open calls in SimpleLogger::flush()
[ https://issues.apache.org/jira/browse/IMPALA-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-6895. --- Resolution: Won't Do > Reduce flush, close and open calls in SimpleLogger::flush() > --- > > Key: IMPALA-6895 > URL: https://issues.apache.org/jira/browse/IMPALA-6895 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.0 >Reporter: Zoram Thanga >Priority: Minor > Labels: ramp-up > > Currently, SimpleLogger provides a Flush() interface which is used by its > client(s) to periodically (hard-coded to 5 seconds) flush the log file. We > could eliminate these flush threads by keeping track of last flush time, and > have the caller of SimpleLogger::AppendEntry() flush on demand (now - > last_flush_time >= 5 seconds or whatever). > This has the added benefit of reducing contention on the > SimpleLogger::log_file_lock_ mutex to just between the threads adding entries > to the log file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
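[Editor's note] The flush-on-demand idea above can be sketched like this. The buffer and "file" are simplified to string vectors, and the current time is injected as a parameter so the logic is testable without sleeping; the real AppendEntry() would call the clock itself.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <vector>

// Sketch of SimpleLogger without a dedicated flush thread: AppendEntry()
// checks how long it has been since the last flush and flushes inline when
// the interval (5 seconds in the real code) has elapsed.
class SimpleLoggerSketch {
 public:
  using Clock = std::chrono::steady_clock;

  SimpleLoggerSketch(std::chrono::seconds flush_interval,
                     Clock::time_point start)
      : flush_interval_(flush_interval), last_flush_(start) {}

  // 'now' is injected for testability; production code would use Clock::now().
  void AppendEntry(const std::string& entry, Clock::time_point now) {
    std::lock_guard<std::mutex> l(lock_);
    buffer_.push_back(entry);
    if (now - last_flush_ >= flush_interval_) {
      // Flush on demand: move buffered entries to the "file".
      for (auto& e : buffer_) flushed_.push_back(e);
      buffer_.clear();
      last_flush_ = now;
    }
  }

  size_t num_flushed() const { return flushed_.size(); }

 private:
  const std::chrono::seconds flush_interval_;
  Clock::time_point last_flush_;
  std::mutex lock_;
  std::vector<std::string> buffer_;   // entries since the last flush
  std::vector<std::string> flushed_;  // stands in for the log file
};
```

One trade-off to note: with no background thread, a logger that stops receiving entries never flushes its tail, so shutdown paths would still need an explicit flush.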
[jira] [Resolved] (IMPALA-6555) Clean up relationship between DiskIoMgr::min_buffer_size_ and BufferPool::min_buffer_len_
[ https://issues.apache.org/jira/browse/IMPALA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-6555. --- Resolution: Later > Clean up relationship between DiskIoMgr::min_buffer_size_ and > BufferPool::min_buffer_len_ > - > > Key: IMPALA-6555 > URL: https://issues.apache.org/jira/browse/IMPALA-6555 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Minor > > They are always the same value in practice, obtained from --min_buffer_size. > We should probably get rid of DiskIoMgr::min_buffer_size_ and fix up all > references to it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-6344) Optimize decimal multiplication
[ https://issues.apache.org/jira/browse/IMPALA-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-6344. --- Resolution: Later > Optimize decimal multiplication > --- > > Key: IMPALA-6344 > URL: https://issues.apache.org/jira/browse/IMPALA-6344 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Taras Bobrovytsky >Priority: Major > Labels: decimal, perf > > Our current implementation of decimal multiplication can be slow and > non-optimal due to having branches in our code. > [~zamsden] suggested to use > [https://en.wikipedia.org/wiki/Karatsuba_algorithm] multiplication for int128 > * int128 -> int256 multiply. The following example implements this and uses 3 > hardware 64-bit multiplies to get a full 256 bit result. The code is written > in inline assembly and has no branches. > http://coliru.stacked-crooked.com/a/25a697389211189f > We should consider benchmarking this code and using this approach if it turns > out to be faster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
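[Editor's note] For comparison with the linked prototype, here is a portable widening 128x128 -> 256-bit unsigned multiply. Note this is the schoolbook variant with four 64-bit partial products, not Karatsuba's three-multiply trick from the ticket, and it relies on the compiler's `unsigned __int128` support (GCC/Clang) instead of inline assembly. `U256` and `MulWide` are names invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// 256-bit unsigned result, split into two 128-bit halves.
struct U256 {
  unsigned __int128 lo;
  unsigned __int128 hi;
};

U256 MulWide(unsigned __int128 a, unsigned __int128 b) {
  using u128 = unsigned __int128;
  uint64_t a_lo = static_cast<uint64_t>(a);
  uint64_t a_hi = static_cast<uint64_t>(a >> 64);
  uint64_t b_lo = static_cast<uint64_t>(b);
  uint64_t b_hi = static_cast<uint64_t>(b >> 64);

  u128 p0 = (u128)a_lo * b_lo;  // contributes to bits 0..127
  u128 p1 = (u128)a_lo * b_hi;  // contributes to bits 64..191
  u128 p2 = (u128)a_hi * b_lo;  // contributes to bits 64..191
  u128 p3 = (u128)a_hi * b_hi;  // contributes to bits 128..255

  // Accumulate the middle terms with explicit carry tracking: mid holds the
  // 64..191 bit range, so its low 64 bits join lo and the rest carries to hi.
  u128 mid = (p0 >> 64) + (u128)(uint64_t)p1 + (u128)(uint64_t)p2;
  U256 r;
  r.lo = (u128)(uint64_t)p0 | (mid << 64);
  r.hi = p3 + (p1 >> 64) + (p2 >> 64) + (mid >> 64);
  return r;
}
```

The branch-free structure is the point the ticket makes: all four products and the carry folds execute unconditionally, which is friendlier to the pipeline than the branching in the current decimal code.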