[jira] [Comment Edited] (IMPALA-9063) Allow building Impala against LLVM with -DLLVM_ENABLE_TERMINFO=ON
[ https://issues.apache.org/jira/browse/IMPALA-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955697#comment-16955697 ]

Donghui Xu edited comment on IMPALA-9063 at 10/21/19 2:29 AM:
--------------------------------------------------------------

I have built LLVM with -DLLVM_ENABLE_TERMINFO=ON as follows, but the build still reports the error above.

cmake /llvm/llvm-5.0.1.src -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=Release -DLLVM_USE_LINKER=gold -DLLVM_ENABLE_RTTI=ON -DLLVM_ENABLE_TERMINFO=ON -DLLVM_PARALLEL_COMPILE_JOBS=4 -DLLVM_PARALLEL_LINK_JOBS=4

was (Author: davidxdh):
I have built LLVM with -DLLVM_ENABLE_TERMINFO=ON, but the build still reports the error above.

> Allow building Impala against LLVM with -DLLVM_ENABLE_TERMINFO=ON
> -----------------------------------------------------------------
>
>                 Key: IMPALA-9063
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9063
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: Impala 3.3.0
>            Reporter: Donghui Xu
>            Priority: Minor
>
> I failed to link impalad because of LLVM. I had compiled and installed llvm-5.0.1.
> The error message is as follows:
> /usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference to 'setupterm'
> /usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference to 'tigetnum'
> /usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference to 'set_curterm'
> /usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference to 'del_curterm'
> /media/B/impala/apache/toolchain/openldap-2.4.47/lib/libldap.a(os-ip.o):os-ip.c:function ldap_int_poll: warning: `sys_nerr' is deprecated; use `strerror' or `strerror_r' instead
> /media/B/impala/apache/toolchain/openldap-2.4.47/lib/libldap.a(os-ip.o):os-ip.c:function ldap_int_poll: warning: `sys_errlist' is deprecated; use `strerror' or `strerror_r' instead
> collect2: error: ld returned 1 exit status
> be/src/service/CMakeFiles/impalad.dir/build.make:208: recipe for target 'be/build/release/service/impalad' failed
> make[3]: *** [be/build/release/service/impalad] Error 1
> CMakeFiles/Makefile2:7075: recipe for target 'be/src/service/CMakeFiles/impalad.dir/all' failed
> make[2]: *** [be/src/service/CMakeFiles/impalad.dir/all] Error 2
> CMakeFiles/Makefile2:7087: recipe for target 'be/src/service/CMakeFiles/impalad.dir/rule' failed
> make[1]: *** [be/src/service/CMakeFiles/impalad.dir/rule] Error 2
> Makefile:: recipe for target 'impalad' failed
> make: *** [impalad] Error 2

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
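The four undefined symbols in the log above (setupterm, tigetnum, set_curterm, del_curterm) are terminfo functions, which are normally provided by ncurses/libtinfo. That suggests, as an assumption this thread does not confirm, that the final impalad link line lacks the terminfo library even though LLVM itself was built with -DLLVM_ENABLE_TERMINFO=ON. A minimal sketch for pulling the missing symbols out of such a log, grouped by the archive that references them, might look like this. The helper name `undefined_symbols` and the shortened sample log are illustrative only.

```python
import re

# Two error lines and the final linker status, copied from the log above.
LINK_LOG = """\
/usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference to 'setupterm'
/usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference to 'tigetnum'
collect2: error: ld returned 1 exit status
"""

# archive(object): ... undefined reference to 'symbol'
UNDEFINED_REF = re.compile(
    r"^(?P<archive>[^\s(]+)\((?P<obj>[^)]+)\):.*"
    r"undefined reference to '(?P<symbol>[^']+)'"
)


def undefined_symbols(log):
    """Return {archive path: [undefined symbols]} for a linker log (hypothetical helper)."""
    result = {}
    for line in log.splitlines():
        match = UNDEFINED_REF.match(line)
        if match:
            result.setdefault(match.group("archive"), []).append(match.group("symbol"))
    return result


if __name__ == "__main__":
    for archive, symbols in undefined_symbols(LINK_LOG).items():
        print(archive, "needs:", ", ".join(symbols))
```

Knowing which archive pulls in the symbols helps decide where an extra library would have to appear on the link line, since static archives are resolved in order.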
[jira] [Commented] (IMPALA-9063) Allow building Impala against LLVM with -DLLVM_ENABLE_TERMINFO=ON
[ https://issues.apache.org/jira/browse/IMPALA-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955697#comment-16955697 ]

Donghui Xu commented on IMPALA-9063:
------------------------------------

I have built LLVM with -DLLVM_ENABLE_TERMINFO=ON, but the build still reports the error above.
[jira] [Resolved] (IMPALA-9061) Update ant version for centos in bootstrap_system.sh
[ https://issues.apache.org/jira/browse/IMPALA-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fucun Chu resolved IMPALA-9061.
-------------------------------
    Resolution: Fixed

> Update ant version for centos in bootstrap_system.sh
> ----------------------------------------------------
>
>                 Key: IMPALA-9061
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9061
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>         Environment: Centos 7
>            Reporter: Fucun Chu
>            Assignee: Fucun Chu
>            Priority: Major
>              Labels: easyfix
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> {{bootstrap_system.sh}} currently uses [ant 1.9.13|https://github.com/apache/impala/blob/b0c6740faec6b0a00dcfee126ab39324026c0ca9/bin/bootstrap_system.sh#L239] on CentOS/Redhat environments. The ant-1.9.13-bin.tar.gz release can no longer be downloaded; the earliest version available is 1.9.14. Please see [here|https://www-us.apache.org/dist/ant/binaries/].
[jira] [Commented] (IMPALA-3741) Push bloom filters to Kudu scanners
[ https://issues.apache.org/jira/browse/IMPALA-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955678#comment-16955678 ]

Mauricio Aristizabal commented on IMPALA-3741:
----------------------------------------------

This just became a huge blocker for Kudu adoption for us, and I'm worried that this ticket hasn't had any movement in over a year.

We wanted to move our aggregation/cube tables to Kudu and have them be true aggregations: one record per combination of dimension columns, updated as the measures increase (as opposed to additive aggregations in Parquet that we routinely re-aggregate/compact, which is very resource intensive and hard to manage).

The problem is that when updating a 1 billion record table with just 10K arriving changes, the min-max filter in the join between the target agg table and the table with the updates is pretty useless. The updates typically touch just a handful of the dimension values, but those values are not consecutive or even close to each other; they are all over the place. So the query ends up scanning most of the big table, and it gets slower and slower as the table grows.

So we'll have to hold off on adopting Kudu until this (and support in Kudu) is added, or until we switch ETL to programmatically mutate the records individually with the Kudu Java client (perhaps in a Spark RDD).

Please prioritize this. Otherwise Kudu is good only for end-user queries with highly selective filters and joins, and doesn't really support ETL or large-scale analysis via SQL.

> Push bloom filters to Kudu scanners
> -----------------------------------
>
>                 Key: IMPALA-3741
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3741
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Kudu_Impala
>            Reporter: Matthew Jacobs
>            Priority: Major
>              Labels: kudu, performance
>
> Impala relies on bloom filters to reduce the number of rows coming out of the scan node for selective joins.
> Queries get up to a 20x speedup, so not having bloom filter support in Kudu will create a big performance gap between Parquet and Kudu.
> https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/util/bloom-filter.h
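The point made in the comment above, that a min-max filter prunes almost nothing when the update keys are scattered while a bloom filter would, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `BloomFilter` class is a simple SHA-256 based filter, not Impala's implementation in bloom-filter.h, and the key sets are made up and much smaller than the 1 billion / 10K case described.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (illustrative only, not Impala's bloom-filter.h)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # at most 4 with a 32-byte digest
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive several bit positions from disjoint 8-byte slices of one digest.
        digest = hashlib.sha256(str(key).encode()).digest()
        for i in range(self.num_hashes):
            chunk = digest[8 * i:8 * (i + 1)]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def rows_passing_filters(table_keys, update_keys):
    """Count big-table rows that survive a min-max filter and a bloom filter."""
    lo, hi = min(update_keys), max(update_keys)
    bloom = BloomFilter()
    for key in update_keys:
        bloom.add(key)
    minmax_rows = sum(1 for key in table_keys if lo <= key <= hi)
    bloom_rows = sum(1 for key in table_keys if bloom.might_contain(key))
    return minmax_rows, bloom_rows


# Hypothetical data: a 100,000-row target table and five scattered update keys.
TABLE_KEYS = range(100_000)
UPDATE_KEYS = [7, 1_234, 50_000, 77_777, 99_990]

if __name__ == "__main__":
    minmax_rows, bloom_rows = rows_passing_filters(TABLE_KEYS, UPDATE_KEYS)
    print("rows passing min-max filter:", minmax_rows)
    print("rows passing bloom filter:  ", bloom_rows)
```

Because the smallest and largest update keys sit near the two ends of the key space, the min-max range covers nearly the whole table, whereas the bloom filter admits only the matching keys plus a small number of false positives.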
[jira] [Updated] (IMPALA-9073) Failed test during pre-commit: custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency
[ https://issues.apache.org/jira/browse/IMPALA-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anurag Mantripragada updated IMPALA-9073:
-----------------------------------------
    Description:
Observed this test failure in the pre-commit test of an unrelated change. It looks like the number of concurrent queries reached 4, while it is expected to be 3 or fewer.
{code:java}
custom_cluster/test_executor_groups.py:293: in test_executor_concurrency
    assert max(num_running) == 3, \
E   AssertionError: Unexpected number of running queries: [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
E   assert 4 == 3
E    +  where 4 = max([3, 3, 3, 3, 3, 3, ...])
{code}

> Failed test during pre-commit: custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-9073
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9073
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Anurag Mantripragada
>            Priority: Major
>              Labels: build-failure
>         Attachments: TEST-impala-custom-cluster.log, TEST-impala-custom-cluster.xml, impalad.ip-172-31-3-83.ubuntu.log.INFO.20191020-182539.109469
[jira] [Created] (IMPALA-9073) Failed test during pre-commit: custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency
Anurag Mantripragada created IMPALA-9073:
--------------------------------------------

             Summary: Failed test during pre-commit: custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency
                 Key: IMPALA-9073
                 URL: https://issues.apache.org/jira/browse/IMPALA-9073
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Anurag Mantripragada
         Attachments: TEST-impala-custom-cluster.log, TEST-impala-custom-cluster.xml, impalad.ip-172-31-3-83.ubuntu.log.INFO.20191020-182539.109469

Observed this test failure in the pre-commit test of an unrelated change. It looks like the number of concurrent queries reached 4, while it is expected to be 3 or fewer.
{code:java}
custom_cluster/test_executor_groups.py:293: in test_executor_concurrency
    assert max(num_running) == 3, \
E   AssertionError: Unexpected number of running queries: [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
E   assert 4 == 3
E    +  where 4 = max([3, 3, 3, 3, 3, 3, ...])
{code}
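To make the failure above easier to read, here is a small sketch that scans the sampled values from the assertion message and reports which samples exceed the limit. Note that the test asserts `max(num_running) == 3`, while the description says "3 or fewer"; the sketch follows the prose and only flags values above the limit. The helper name `over_limit` and the constant names are hypothetical, not part of the Impala test suite.

```python
# Samples copied from the AssertionError message above.
NUM_RUNNING = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3,
               3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

# Concurrency limit as stated in the description ("3 or fewer").
EXPECTED_MAX = 3


def over_limit(samples, limit):
    """Return (sample index, value) pairs where the value exceeds the limit."""
    return [(i, n) for i, n in enumerate(samples) if n > limit]


if __name__ == "__main__":
    for index, value in over_limit(NUM_RUNNING, EXPECTED_MAX):
        print(f"sample {index}: {value} running queries (limit {EXPECTED_MAX})")
```

A single sample above the limit, surrounded by samples at the limit, is the kind of pattern one might expect from a brief overlap between one query finishing and the next being admitted, though the report itself does not establish the cause.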
[jira] [Commented] (IMPALA-9071) When metastore.warehouse.dir != metastore.warehouse.external.dir, Impala writes to the wrong location for external tables
[ https://issues.apache.org/jira/browse/IMPALA-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955542#comment-16955542 ]

Joe McDonnell commented on IMPALA-9071:
---------------------------------------

From CreateTableAsSelectStmt.java ([https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java#L217]):
{code:java}
if (msTbl.getSd().getLocation() == null || msTbl.getSd().getLocation().isEmpty()) {
  msTbl.getSd().setLocation(getPathForNewTable(db, msTbl));
}
{code}
[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java#L241]
{code:java}
private static String getPathForNewTable(FeDb db, Table msTbl) {
  String dbLocation = db.getMetaStoreDb().getLocationUri();
  return new Path(dbLocation, msTbl.getTableName().toLowerCase()).toString();
}
{code}
It assumes the new table is placed under the database's location, which is no longer true.

> When metastore.warehouse.dir != metastore.warehouse.external.dir, Impala writes to the wrong location for external tables
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-9071
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9071
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Joe McDonnell
>            Priority: Blocker
>              Labels: broken-build
>
> Hive introduced a translation layer that can convert a normal table to an external table. When doing so without a specified location, the translated external table uses metastore.warehouse.external.dir as the location rather than metastore.warehouse.dir. Impala does not know about this distinction, so it writes to the location it thinks the table should be (under metastore.warehouse.dir). This means I can do the following:
> {noformat}
> [localhost:21000] joetest> select count(*) from functional.alltypes;
> Query: select count(*) from functional.alltypes
> Query submitted at: 2019-10-19 13:08:24 (Coordinator: http://joemcdonnell:25000)
> Query progress can be monitored at: http://joemcdonnell:25000/query_plan?query_id=68434b05e2badd50:a18a2e30
> +----------+
> | count(*) |
> +----------+
> | 7300     |
> +----------+
> Fetched 1 row(s) in 0.14s
> [localhost:21000] joetest> create table testtable as select * from functional.alltypes;
> Query: create table testtable as select * from functional.alltypes
> Query submitted at: 2019-10-19 13:08:36 (Coordinator: http://joemcdonnell:25000)
> Query progress can be monitored at: http://joemcdonnell:25000/query_plan?query_id=794b92fb68f36ab0:910d0364
> +----------------------+
> | summary              |
> +----------------------+
> | Inserted 7300 row(s) |
> +----------------------+
> Fetched 1 row(s) in 0.50s
> [localhost:21000] joetest> select count(*) from testtable;
> Query: select count(*) from testtable
> Query submitted at: 2019-10-19 13:08:43 (Coordinator: http://joemcdonnell:25000)
> Query progress can be monitored at: http://joemcdonnell:25000/query_plan?query_id=66423abf016e65af:83624609
> +----------+
> | count(*) |
> +----------+
> | 0        |
> +----------+
> Fetched 1 row(s) in 0.13s
> {noformat}
> We inserted 7300 rows, but we can't select them back because they were written to the wrong location.
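The mismatch described above can be sketched in a few lines. The function below mirrors the quoted getPathForNewTable logic (database location plus lower-cased table name). The two directory values and the assumption that the translated external table lands in an analogous database directory under metastore.warehouse.external.dir are hypothetical, chosen only to show how the write path and the read path diverge.

```python
import posixpath


def path_for_new_table(db_location, table_name):
    """Mirror of the quoted getPathForNewTable: database location + lower-cased table name."""
    return posixpath.join(db_location, table_name.lower())


# Hypothetical locations; the real values depend on the cluster configuration.
MANAGED_DB_LOCATION = "hdfs://localhost:20500/warehouse/managed/joetest.db"
EXTERNAL_DB_LOCATION = "hdfs://localhost:20500/warehouse/external/joetest.db"

# Where Impala writes: under the database's location, as in the quoted code.
impala_write_path = path_for_new_table(MANAGED_DB_LOCATION, "TestTable")

# Where the translated external table is assumed to live (assumption, see above).
metastore_table_path = path_for_new_table(EXTERNAL_DB_LOCATION, "TestTable")

if __name__ == "__main__":
    print("Impala writes to:    ", impala_write_path)
    print("Table is read from:  ", metastore_table_path)
    print("Locations match:     ", impala_write_path == metastore_table_path)
```

When the two warehouse directories are the same, both paths coincide and the problem is invisible, which is consistent with the issue only appearing once metastore.warehouse.dir and metastore.warehouse.external.dir differ.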
[jira] [Assigned] (IMPALA-7655) Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal
[ https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers reassigned IMPALA-7655:
-----------------------------------
    Assignee:     (was: Paul Rogers)

> Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-7655
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7655
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation involving an if() function was very slow, 10x slower than the equivalent version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: http://tarmstrong-box:25000)
> Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a19642
> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0                                                        |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 0.51s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:23:07 (Coordinator: http://tarmstrong-box:25000)
> Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca26
> +----------------------------------------+
> | count(if(l_orderkey is null, 1, null)) |
> +----------------------------------------+
> | 0                                      |
> +----------------------------------------+
> Fetched 1 row(s) in 1.01s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 422.07ms | 422.07ms | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 511.13ms | 511.13ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> {noformat}
> It turns out that this is because we don't have good codegen support for ConditionalFunction, and just fall back to emitting a call to the interpreted path:
> https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
> See CaseExpr for an example of much better codegen support:
> https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178
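As a quick check of the "10x slower" figure, the per-operator timings from the two summaries above can be compared directly. This is only arithmetic on the numbers quoted in the issue; the dictionary layout and the helper name `parse_ms` are illustrative.

```python
def parse_ms(text):
    """Convert a summary cell such as '44.03ms' to a float number of milliseconds."""
    if not text.endswith("ms"):
        raise ValueError(f"expected a value in ms, got {text!r}")
    return float(text[:-2])


# Avg Time per operator, copied from the two query summaries above.
CASE_QUERY = {"01:AGGREGATE": "44.03ms", "00:SCAN HDFS": "411.57ms"}
IF_QUERY = {"01:AGGREGATE": "422.07ms", "00:SCAN HDFS": "511.13ms"}


def slowdown(operator):
    """Ratio of the if() query's operator time to the case query's operator time."""
    return parse_ms(IF_QUERY[operator]) / parse_ms(CASE_QUERY[operator])


if __name__ == "__main__":
    for operator in CASE_QUERY:
        print(f"{operator}: {slowdown(operator):.2f}x")
    # End-to-end times reported by the shell: 1.01s for if() versus 0.51s for case.
    print(f"end to end: {1.01 / 0.51:.2f}x")
```

The aggregation node accounts for nearly all of the difference (roughly 9.6x on that operator), while the end-to-end time only about doubles because the scan dominates the faster query.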