[jira] [Comment Edited] (IMPALA-9063) Allow building Impala against LLVM with -DLLVM_ENABLE_TERMINFO=ON

2019-10-20 Thread Donghui Xu (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955697#comment-16955697
 ] 

Donghui Xu edited comment on IMPALA-9063 at 10/21/19 2:29 AM:
--

I have built LLVM with -DLLVM_ENABLE_TERMINFO=ON as follows, but it still 
reports the above error.
cmake /llvm/llvm-5.0.1.src -DLLVM_TARGETS_TO_BUILD=X86 
-DCMAKE_BUILD_TYPE=Release -DLLVM_USE_LINKER=gold -DLLVM_ENABLE_RTTI=ON 
-DLLVM_ENABLE_TERMINFO=ON -DLLVM_PARALLEL_COMPILE_JOBS=4 
-DLLVM_PARALLEL_LINK_JOBS=4


was (Author: davidxdh):
I have built LLVM with -DLLVM_ENABLE_TERMINFO=ON, but it still reports the 
above error.

> Allow building Impala against LLVM with -DLLVM_ENABLE_TERMINFO=ON
> -
>
> Key: IMPALA-9063
> URL: https://issues.apache.org/jira/browse/IMPALA-9063
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Donghui Xu
>Priority: Minor
>
> I failed to link impalad because of LLVM.  I had compiled and installed 
> llvm-5.0.1.
> The error message is as follows:
> /usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function 
> llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference 
> to 'setupterm'
> /usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function 
> llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference 
> to 'tigetnum'
> /usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function 
> llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference 
> to 'set_curterm'
> /usr/local/lib/libLLVMSupport.a(Process.cpp.o):Process.cpp:function 
> llvm::sys::Process::FileDescriptorHasColors(int): error: undefined reference 
> to 'del_curterm'
> /media/B/impala/apache/toolchain/openldap-2.4.47/lib/libldap.a(os-ip.o):os-ip.c:function
>  ldap_int_poll: warning: `sys_nerr' is deprecated; use `strerror' or 
> `strerror_r' instead
> /media/B/impala/apache/toolchain/openldap-2.4.47/lib/libldap.a(os-ip.o):os-ip.c:function
>  ldap_int_poll: warning: `sys_errlist' is deprecated; use `strerror' or 
> `strerror_r' instead
> collect2: error: ld returned 1 exit status
> be/src/service/CMakeFiles/impalad.dir/build.make:208: recipe for target 
> 'be/build/release/service/impalad' failed
> make[3]: *** [be/build/release/service/impalad] Error 1
> CMakeFiles/Makefile2:7075: recipe for target 
> 'be/src/service/CMakeFiles/impalad.dir/all' failed
> make[2]: *** [be/src/service/CMakeFiles/impalad.dir/all] Error 2
> CMakeFiles/Makefile2:7087: recipe for target 
> 'be/src/service/CMakeFiles/impalad.dir/rule' failed
> make[1]: *** [be/src/service/CMakeFiles/impalad.dir/rule] Error 2
> Makefile:: recipe for target 'impalad' failed
> make: *** [impalad] Error 2
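
The four undefined symbols in the log above (setupterm, tigetnum, set_curterm, del_curterm) belong to the terminfo API, which is usually shipped in libtinfo or libncurses. One possible reading of the error is that the impalad link line does not pull in that library. The Python sketch below is only a diagnostic aid and not part of the Impala build: it checks whether a terminfo library on the build host exports those symbols. The candidate library names and the helper name find_terminfo_provider are assumptions and may differ between distributions.

```python
import ctypes
import ctypes.util

# Symbols reported as undefined in the link error above.
TERMINFO_SYMBOLS = ["setupterm", "tigetnum", "set_curterm", "del_curterm"]

# Candidate library names (assumption: which one exists depends on the distribution).
CANDIDATE_LIBS = ["tinfo", "ncurses", "curses"]


def find_terminfo_provider():
    """Return (library, missing_symbols) for the first candidate that loads.

    Returns (None, all symbols) when no candidate library can be loaded.
    """
    for short_name in CANDIDATE_LIBS:
        path = ctypes.util.find_library(short_name)
        if path is None:
            continue
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue
        # hasattr() is False when the shared library does not export the symbol.
        missing = [sym for sym in TERMINFO_SYMBOLS if not hasattr(lib, sym)]
        return path, missing
    return None, list(TERMINFO_SYMBOLS)


if __name__ == "__main__":
    provider, missing = find_terminfo_provider()
    print("terminfo provider:", provider)
    print("missing symbols:", missing)
```

If a provider is found and no symbols are missing, the library that needs to appear on the link line is probably the one printed here; that still has to be confirmed against the actual build configuration.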



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9063) Allow building Impala against LLVM with -DLLVM_ENABLE_TERMINFO=ON

2019-10-20 Thread Donghui Xu (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955697#comment-16955697
 ] 

Donghui Xu commented on IMPALA-9063:


I have built LLVM with -DLLVM_ENABLE_TERMINFO=ON, but it still reports the 
above error.







[jira] [Resolved] (IMPALA-9061) Update ant version for centos in bootstrap_system.sh

2019-10-20 Thread Fucun Chu (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fucun Chu resolved IMPALA-9061.
---
Resolution: Fixed

> Update ant version for centos in bootstrap_system.sh
> 
>
> Key: IMPALA-9061
> URL: https://issues.apache.org/jira/browse/IMPALA-9061
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
> Environment: Centos 7
>Reporter: Fucun Chu
>Assignee: Fucun Chu
>Priority: Major
>  Labels: easyfix
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> {{bootstrap_system.sh}} currently uses 
> [ant 1.9.13|https://github.com/apache/impala/blob/b0c6740faec6b0a00dcfee126ab39324026c0ca9/bin/bootstrap_system.sh#L239]
> on CentOS/Red Hat environments. The ant-1.9.13-bin.tar.gz release can no 
> longer be accessed; the earliest available version is 1.9.14. Please see 
> [here|https://www-us.apache.org/dist/ant/binaries/].






[jira] [Commented] (IMPALA-3741) Push bloom filters to Kudu scanners

2019-10-20 Thread Mauricio Aristizabal (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955678#comment-16955678
 ] 

Mauricio Aristizabal commented on IMPALA-3741:
--

This just became a huge blocker for Kudu adoption for us, and I'm worried that 
this ticket hasn't had any movement in over a year.

We just wanted to move our aggregation/cube tables to Kudu, and have them be 
true aggregations: one record per combination of dimension columns gets updated 
as the measures increase (as opposed to additive aggregations in Parquet that 
we routinely re-aggregate/compact which is very resource intensive and hard to 
manage).

The problem is that, to update a 1-billion-record table with just 10K arriving 
changes, the min-max filter in the join between the target agg table and the 
table with the updates is pretty useless. The updates will typically be for 
just a handful of the dimensions, but their values are not nicely consecutive 
or even close together; they are all over the place. So the join ends up 
scanning most of the big table, and it gets slower and slower as the table 
grows.

So we'll have to hold off on adopting Kudu until this (and support in Kudu) is 
added, or until we switch ETL to programmatically mutate the records 
individually with the Kudu Java client (perhaps in a Spark RDD).

Please prioritize this, otherwise Kudu is good only for end-user queries with 
highly selective filters and joins, and doesn't really support ETL or 
large-scale analysis via SQL.
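
The min-max limitation described in this comment can be illustrated with a small simulation. The Python sketch below is a scaled-down illustration under stated assumptions, not Impala or Kudu code: the table size, the number of updates, the evenly spread update keys, and the textbook Bloom filter (salted SHA-256 hashes, 16 bits per key, 4 hash functions) are all chosen for the example.

```python
import hashlib

TABLE_ROWS = 100_000   # scaled down from the 1 billion rows in the comment
NUM_UPDATES = 100      # scaled down from the 10K arriving changes

# Update keys spread evenly over the whole key space ("all over the place").
update_keys = list(range(0, TABLE_ROWS, TABLE_ROWS // NUM_UPDATES))


class BloomFilter:
    """Textbook Bloom filter; not the implementation used by Impala or Kudu."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Min-max filter: a scanned row survives if its key lies within [lo, hi].
lo, hi = min(update_keys), max(update_keys)
minmax_survivors = sum(1 for key in range(TABLE_ROWS) if lo <= key <= hi)

# Bloom filter: a scanned row survives only if all of its bit positions are set.
bloom = BloomFilter(num_bits=16 * NUM_UPDATES, num_hashes=4)
for key in update_keys:
    bloom.add(key)
bloom_survivors = sum(1 for key in range(TABLE_ROWS) if bloom.might_contain(key))

print(f"min-max filter keeps {minmax_survivors} of {TABLE_ROWS} rows")
print(f"bloom filter keeps   {bloom_survivors} of {TABLE_ROWS} rows")
```

With these parameters the min-max filter keeps 99,001 of the 100,000 rows, because the update keys span almost the entire key range. The Bloom filter keeps the 100 matching rows plus a comparatively small number of false positives.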

> Push bloom filters to Kudu scanners
> ---
>
> Key: IMPALA-3741
> URL: https://issues.apache.org/jira/browse/IMPALA-3741
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Kudu_Impala
>Reporter: Matthew Jacobs
>Priority: Major
>  Labels: kudu, performance
>
> Impala relies on bloom filters to reduce the number of rows coming out of 
> the scan node for selective joins. 
> Queries get up to a 20x speedup, so not having bloom filter support in Kudu 
> will create a big performance gap between Parquet and Kudu.
> https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/util/bloom-filter.h






[jira] [Updated] (IMPALA-9073) Failed test during pre-commit: custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency

2019-10-20 Thread Anurag Mantripragada (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Mantripragada updated IMPALA-9073:
-
Description: 
Observed this test failure in a pre-commit test of an unrelated change. It 
looks like the number of concurrent queries reached 4, while it is expected 
to be 3 or less.
{code:java}
custom_cluster/test_executor_groups.py:293: in test_executor_concurrency
assert max(num_running) == 3, \
E   AssertionError: Unexpected number of running queries: [3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
E   assert 4 == 3
E+  where 4 = max([3, 3, 3, 3, 3, 3, ...]) {code}
 

  was:
Observed this test failure in a pre-commit test of an unrelated change. It 
looks like the number of concurrent queries reached 4, while it is expected 
to be 3 or less.

 
{code:java}
custom_cluster/test_executor_groups.py:293: in test_executor_concurrency
assert max(num_running) == 3, \
E   AssertionError: Unexpected number of running queries: [3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
E   assert 4 == 3
E+  where 4 = max([3, 3, 3, 3, 3, 3, ...]) {code}
 


> Failed test during pre-commit: 
> custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency
> ---
>
> Key: IMPALA-9073
> URL: https://issues.apache.org/jira/browse/IMPALA-9073
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Anurag Mantripragada
>Priority: Major
>  Labels: build-failure
> Attachments: TEST-impala-custom-cluster.log, 
> TEST-impala-custom-cluster.xml, 
> impalad.ip-172-31-3-83.ubuntu.log.INFO.20191020-182539.109469
>
>
> Observed this test failure in a pre-commit test of an unrelated change. It 
> looks like the number of concurrent queries reached 4, while it is expected 
> to be 3 or less.
> {code:java}
> custom_cluster/test_executor_groups.py:293: in test_executor_concurrency
> assert max(num_running) == 3, \
> E   AssertionError: Unexpected number of running queries: [3, 3, 3, 3, 3, 3, 
> 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
> E   assert 4 == 3
> E+  where 4 = max([3, 3, 3, 3, 3, 3, ...]) {code}
>  






[jira] [Created] (IMPALA-9073) Failed test during pre-commit: custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency

2019-10-20 Thread Anurag Mantripragada (Jira)
Anurag Mantripragada created IMPALA-9073:


 Summary: Failed test during pre-commit: 
custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_concurrency
 Key: IMPALA-9073
 URL: https://issues.apache.org/jira/browse/IMPALA-9073
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Anurag Mantripragada
 Attachments: TEST-impala-custom-cluster.log, 
TEST-impala-custom-cluster.xml, 
impalad.ip-172-31-3-83.ubuntu.log.INFO.20191020-182539.109469

Observed this test failure in a pre-commit test of an unrelated change. It 
looks like the number of concurrent queries reached 4, while it is expected 
to be 3 or less.

 
{code:java}
custom_cluster/test_executor_groups.py:293: in test_executor_concurrency
assert max(num_running) == 3, \
E   AssertionError: Unexpected number of running queries: [3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
E   assert 4 == 3
E+  where 4 = max([3, 3, 3, 3, 3, 3, ...]) {code}
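
The assertion output above can be replayed in isolation to see which sample tripped the check. The Python sketch below copies the sampled values from the assertion message; the names EXPECTED_MAX and over_limit are introduced here for illustration and do not come from the test file.

```python
# Samples of the number of running queries, copied from the assertion message.
num_running = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4,
               3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

EXPECTED_MAX = 3  # value compared against at test_executor_groups.py:293

# Positions and values of the samples that exceed the expected maximum.
over_limit = [(i, n) for i, n in enumerate(num_running) if n > EXPECTED_MAX]

print("max running:", max(num_running))
print("over limit (index, value):", over_limit)
```

A single sample above the expected maximum is enough to fail the check, which suggests a brief overshoot and not a sustained one.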
 






[jira] [Commented] (IMPALA-9071) When metastore.warehouse.dir != metastore.warehouse.external.dir, Impala writes to the wrong location for external tables

2019-10-20 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955542#comment-16955542
 ] 

Joe McDonnell commented on IMPALA-9071:
---

From CreateTableAsSelectStmt.java 
([https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java#L217]):
{code:java}
if (msTbl.getSd().getLocation() == null || 
msTbl.getSd().getLocation().isEmpty()) {
  msTbl.getSd().setLocation(getPathForNewTable(db, msTbl));
}{code}
[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java#L241]
{code:java}
private static String getPathForNewTable(FeDb db, Table msTbl) {
  String dbLocation = db.getMetaStoreDb().getLocationUri();
  return new Path(dbLocation, msTbl.getTableName().toLowerCase()).toString();
}
{code}
It assumes the new table is placed under the database's location, which is no 
longer true when Hive translates the table to an external table.
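
The mismatch can be sketched outside Impala. The Python example below mirrors the path derivation of getPathForNewTable() and compares it with the location a translated external table might get. The directory values and the "<db>.db/<table>" layout under the external directory are assumptions made for illustration, as is the helper name path_after_translation.

```python
import posixpath

# Hypothetical warehouse roots, chosen only for illustration.
MANAGED_DIR = "/warehouse/managed"     # stands in for metastore.warehouse.dir
EXTERNAL_DIR = "/warehouse/external"   # stands in for metastore.warehouse.external.dir


def path_assumed_by_ctas(db_location, table_name):
    """Mirror getPathForNewTable(): database location plus lower-cased table name."""
    return posixpath.join(db_location, table_name.lower())


def path_after_translation(db_name, table_name):
    """Assumed location of a table that Hive translated to an external table."""
    return posixpath.join(EXTERNAL_DIR, db_name + ".db", table_name.lower())


db_location = posixpath.join(MANAGED_DIR, "joetest.db")
written_to = path_assumed_by_ctas(db_location, "testtable")
read_from = path_after_translation("joetest", "testtable")

print("data written to:", written_to)
print("table location: ", read_from)
print("locations match:", written_to == read_from)
```

Under these assumptions the rows land in one directory while the table points at another, which is consistent with the count(*) of 0 reported in the issue description.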

> When metastore.warehouse.dir != metastore.warehouse.external.dir, Impala 
> writes to the wrong location for external tables
> -
>
> Key: IMPALA-9071
> URL: https://issues.apache.org/jira/browse/IMPALA-9071
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Priority: Blocker
>  Labels: broken-build
>
> Hive introduced a translation layer that can convert a normal table to an 
> external table. When doing so without a specified location, the translated 
> external table uses metastore.warehouse.external.dir as the location rather 
> than metastore.warehouse.dir. Impala does not know about this distinction, so 
> it writes to the location it thinks the table should be (under 
> metastore.warehouse.dir). This means I can do the following:
> {noformat}
> [localhost:21000] joetest> select count(*) from functional.alltypes;
> Query: select count(*) from functional.alltypes
> Query submitted at: 2019-10-19 13:08:24 (Coordinator: 
> http://joemcdonnell:25000)
> Query progress can be monitored at: 
> http://joemcdonnell:25000/query_plan?query_id=68434b05e2badd50:a18a2e30
> +----------+
> | count(*) |
> +----------+
> | 7300     |
> +----------+
> Fetched 1 row(s) in 0.14s
> [localhost:21000] joetest> create table testtable as select * from 
> functional.alltypes;
> Query: create table testtable as select * from functional.alltypes
> Query submitted at: 2019-10-19 13:08:36 (Coordinator: 
> http://joemcdonnell:25000)
> Query progress can be monitored at: 
> http://joemcdonnell:25000/query_plan?query_id=794b92fb68f36ab0:910d0364
> +----------------------+
> | summary              |
> +----------------------+
> | Inserted 7300 row(s) |
> +----------------------+
> Fetched 1 row(s) in 0.50s
> [localhost:21000] joetest> select count(*) from testtable;
> Query: select count(*) from testtable
> Query submitted at: 2019-10-19 13:08:43 (Coordinator: 
> http://joemcdonnell:25000)
> Query progress can be monitored at: 
> http://joemcdonnell:25000/query_plan?query_id=66423abf016e65af:83624609
> +----------+
> | count(*) |
> +----------+
> | 0        |
> +----------+
> Fetched 1 row(s) in 0.13s
> {noformat}
> We inserted 7300 rows, but we can't select them back because they were 
> written to the wrong location.






[jira] [Assigned] (IMPALA-7655) Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal

2019-10-20 Thread Paul Rogers (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7655:
---

Assignee: (was: Paul Rogers)

> Codegen output for conditional functions (if,isnull, coalesce) is very 
> suboptimal
> -
>
> Key: IMPALA-7655
> URL: https://issues.apache.org/jira/browse/IMPALA-7655
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation 
> involving an if() function was very slow, 10x slower than the equivalent 
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case 
> when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a19642
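> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0                                                        |
> +----------------------------------------------------------+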
> Fetched 1 row(s) in 0.51s
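> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+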
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select 
> count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(if(l_orderkey is NULL, 1, NULL)) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:23:07 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca26
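> +----------------------------------------+
> | count(if(l_orderkey is null, 1, null)) |
> +----------------------------------------+
> | 0                                      |
> +----------------------------------------+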
> Fetched 1 row(s) in 1.01s
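> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 422.07ms | 422.07ms | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 511.13ms | 511.13ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+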
> {noformat}
> It turns out that this is because we don't have good codegen support for 
> ConditionalFunction, and just fall back to emitting a call to the interpreted 
> path: 
> https://github.com/apache/impala/blob/master/be/src/exprs/conditional-functions.cc#L28
> See CaseExpr for an example of much better codegen support: 
> https://github.com/apache/impala/blob/master/be/src/exprs/case-expr.cc#L178
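
The "10x slower" figure in the description can be checked against the two query summaries. The short Python calculation below uses only the timings shown above; the variable names are introduced here for illustration.

```python
# Timings copied from the two query summaries in the description.
agg_case_ms = 44.03    # 01:AGGREGATE for count(case ...)
agg_if_ms = 422.07     # 01:AGGREGATE for count(if(...))
total_case_s = 0.51    # "Fetched 1 row(s)" time for the case version
total_if_s = 1.01      # "Fetched 1 row(s)" time for the if() version

agg_slowdown = agg_if_ms / agg_case_ms
total_slowdown = total_if_s / total_case_s

print(f"aggregation slowdown: {agg_slowdown:.1f}x")
print(f"end-to-end slowdown:  {total_slowdown:.1f}x")
```

The aggregation node alone is roughly 9.6x slower with if(), while the end-to-end query time roughly doubles, since the scan dominates the case version.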


