[jira] [Created] (IMPALA-6628) Use unqualified table references in .test files run from test_queries.py
Alexander Behm created IMPALA-6628: -- Summary: Use unqualified table references in .test files run from test_queries.py Key: IMPALA-6628 URL: https://issues.apache.org/jira/browse/IMPALA-6628 Project: IMPALA Issue Type: Improvement Components: Infrastructure Reporter: Alexander Behm To increase our test coverage over different file formats we should go through the .test files referenced from test_queries.py and switch to using unqualified table references where possible. The state today is that in the exhaustive exploration strategy we run every .test file once for every file format. However, since many .test files use fully-qualified table references we are not actually getting coverage over all formats, so we are spending the time to run the tests but not getting the coverage we'd like. I skimmed a few files and identified that at least these could be improved: analytic-fns.test subquery.test limit.test top-n.test Likely there are more .test files. Probably there are similar issues in other .py files as well, but to keep this JIRA focused I propose we focus on test_queries.py first. *What to do* * Go through the .test files and change fully-qualified table references to unqualified table references where possible. Our test framework issues a "use
[jira] [Created] (IMPALA-6627) Document Hive-incompatible behavior with the serialization.null.format table property
Alexander Behm created IMPALA-6627: -- Summary: Document Hive-incompatible behavior with the serialization.null.format table property Key: IMPALA-6627 URL: https://issues.apache.org/jira/browse/IMPALA-6627 Project: IMPALA Issue Type: Improvement Components: Docs Reporter: Alexander Behm Assignee: Alex Rodoni Impala only respects the "serialization.null.format" table property for TEXT tables and ignores it for Parquet and other formats. Hive respects the "serialization.null.format" property even for other formats, converting matching values to NULL during the scan. There is a separate discussion to be had about which behavior makes more sense, but let's document this as an incompatibility for now since it has come up several times already. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-6595) Hit crash freeing buffer in NljBuilder::Close()
[ https://issues.apache.org/jira/browse/IMPALA-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-6595. --- Resolution: Fixed Fix Version/s: Impala 2.12.0 > Hit crash freeing buffer in NljBuilder::Close() > --- > > Key: IMPALA-6595 > URL: https://issues.apache.org/jira/browse/IMPALA-6595 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala > 2.11.0, Impala 3.0, Impala 2.12.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: crash > Fix For: Impala 3.0, Impala 2.12.0 > > > I think this is related to the ExchangeNode's use of buffers. > Repro: increase the mem_limit for this test: > {noformat} > tarmstrong@tarmstrong-ubuntu:~/Impala$ git diff > diff --git > a/testdata/workloads/functional-query/queries/QueryTest/single-node-nlj-exhaustive.test > > b/testdata/workloads/functional-query/queries/QueryTest/single-node-nlj-exhaustive.test > index a6b3cae..7e8e862 100644 > --- > a/testdata/workloads/functional-query/queries/QueryTest/single-node-nlj-exhaustive.test > +++ > b/testdata/workloads/functional-query/queries/QueryTest/single-node-nlj-exhaustive.test > @@ -1,7 +1,7 @@ > > QUERY > # IMPALA-2207: Analytic eval node feeding into build side of nested loop > join. 
> -set mem_limit=200m; > +set mem_limit=220m; > select straight_join * from (values(1 id), (2), (3)) v1, > (select *, count(*) over() from tpch.lineitem where l_orderkey < 10) v2 > order by id, l_orderkey, l_partkey, l_suppkey, l_linenumber > {noformat} > Loop the test: > {noformat} > while impala-py.test tests/query_test/test_join_queries.py -k > 'test_single_node_nested_loop_joins_exhaustive' > --workload_exploration_strategy=functional-query:exhaustive; do :; done > {noformat} > Boom: > {noformat} > (gdb) bt > #0 0x7fcffc535428 in __GI_raise (sig=sig@entry=6) at > ../sysdeps/unix/sysv/linux/raise.c:54 > #1 0x7fcffc53702a in __GI_abort () at abort.c:89 > #2 0x7fcfff479069 in os::abort(bool) (dump_core=) at > /build/openjdk-8-W2Qe27/openjdk-8-8u151-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:1509 > #3 0x7fcfff62c997 in VMError::report_and_die() > (this=this@entry=0x7fcf8b201f50) at > /build/openjdk-8-W2Qe27/openjdk-8-8u151-b12/src/hotspot/src/share/vm/utilities/vmError.cpp:1060 > #4 0x7fcfff48254f in JVM_handle_linux_signal(int, siginfo_t*, void*, > int) (sig=sig@entry=11, info=info@entry=0x7fcf8b2021f0, > ucVoid=ucVoid@entry=0x7fcf8b2020c0, > abort_if_unrecognized=abort_if_unrecognized@entry=1) at > /build/openjdk-8-W2Qe27/openjdk-8-8u151-b12/src/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541 > #5 0x7fcfff4761a8 in signalHandler(int, siginfo_t*, void*) (sig=11, > info=0x7fcf8b2021f0, uc=0x7fcf8b2020c0) at > /build/openjdk-8-W2Qe27/openjdk-8-8u151-b12/src/hotspot/src/os/linux/vm/os_linux.cpp:4346 > #6 0x7fcffc8db390 in () at > /lib/x86_64-linux-gnu/libpthread.so.0 > #7 0x7fcffc8d3d44 in __GI___pthread_mutex_lock (mutex=0xd8) at > ../nptl/pthread_mutex_lock.c:67 > #8 0x00d7f680 in > impala::BufferPool::FreeBuffer(impala::BufferPool::ClientHandle*, > impala::BufferPool::BufferHandle*) (m=0xd8) at > toolchain/boost-1.57.0-p3/include/boost/thread/pthread/mutex.hpp:62 > #9 0x00d7f680 in > impala::BufferPool::FreeBuffer(impala::BufferPool::ClientHandle*, > 
impala::BufferPool::BufferHandle*) (this=0xd8) at > toolchain/boost-1.57.0-p3/include/boost/thread/pthread/mutex.hpp:116 > #10 0x00d7f680 in > impala::BufferPool::FreeBuffer(impala::BufferPool::ClientHandle*, > impala::BufferPool::BufferHandle*) (m_=..., this=) > at toolchain/boost-1.57.0-p3/include/boost/thread/lock_guard.hpp:38 > #11 0x00d7f680 in > impala::BufferPool::FreeBuffer(impala::BufferPool::ClientHandle*, > impala::BufferPool::BufferHandle*) (len=2097152, this=0x0) at > be/src/runtime/bufferpool/buffer-pool-internal.h:262 > #12 0x00d7f680 in > impala::BufferPool::FreeBuffer(impala::BufferPool::ClientHandle*, > impala::BufferPool::BufferHandle*) (this=, client= out>, handle=handle@entry=0xb971418) > at be/src/runtime/bufferpool/buffer-pool.cc:254 > #13 0x00b66fc2 in impala::RowBatch::FreeBuffers() > (this=this@entry=0x1bfbfea0) at be/src/runtime/row-batch.cc:425 > #14 0x00b67002 in impala::RowBatch::~RowBatch() (this=0x1bfbfea0, > __in_chrg=) at be/src/runtime/row-batch.cc:220 > #15 0x0107ecc0 in impala::NljBuilder::Close(impala::RuntimeState*) > (this=, __ptr=0x1bfbfea0) at > /home/tarmstrong/Impala/toolchain/gcc-4.9.2/include/c++/4.9.2/bits/unique_ptr.h:76 > #16 0x0107ecc0 in impala::NljBuilder::Close(impala::RuntimeState*) >
[jira] [Created] (IMPALA-6626) Failure to assign dictionary predicates should not result in query failure
Alexander Behm created IMPALA-6626: -- Summary: Failure to assign dictionary predicates should not result in query failure Key: IMPALA-6626 URL: https://issues.apache.org/jira/browse/IMPALA-6626 Project: IMPALA Issue Type: Improvement Components: Frontend Affects Versions: Impala 2.11.0, Impala 2.10.0, Impala 2.9.0 Reporter: Alexander Behm Assigning dictionary predicates to Parquet scans may involve evaluating expressions in the BE, which could fail for various reasons. Such failures should lead to non-assignment of dictionary predicates, but not to query failure. See HdfsScanNode:
{code}
private void addDictionaryFilter(...) {
  ...
  try {
    if (analyzer.isTrueWithNullSlots(conjunct)) return;
  } catch (InternalException e) { // <--- does not handle Exception, which will cause the query to fail
    // Expr evaluation failed in the backend. Skip this conjunct since we cannot
    // determine whether it is safe to apply it against a dictionary.
    LOG.warn("Skipping dictionary filter because backend evaluation failed: " + conjunct.toSql(), e);
    return;
  }
}
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
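The fix suggested above (skip the conjunct on any evaluation failure instead of propagating it) can be sketched as follows. This is a hedged, self-contained illustration, not Impala's actual code: the `Conjunct` interface and `assignDictionaryFilters` helper are hypothetical stand-ins for the real analyzer/planner types.

```java
import java.util.ArrayList;
import java.util.List;

public class DictionaryFilterSketch {
    // Hypothetical stand-in for an expr whose backend evaluation may fail.
    interface Conjunct { boolean isTrueWithNullSlots() throws Exception; }

    static List<Conjunct> assignDictionaryFilters(List<Conjunct> conjuncts) {
        List<Conjunct> assigned = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            try {
                if (c.isTrueWithNullSlots()) continue; // unsafe against a dictionary
                assigned.add(c);
            } catch (Exception e) {
                // Broadened catch: ANY evaluation failure skips this conjunct
                // rather than failing the whole query.
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        Conjunct ok = () -> false; // evaluates cleanly, gets assigned
        Conjunct failing = () -> { throw new RuntimeException("BE eval failed"); };
        List<Conjunct> result = assignDictionaryFilters(List.of(ok, failing));
        // The failing conjunct is silently skipped instead of erroring out.
        System.out.println("assigned=" + result.size());
    }
}
```

The key design point is that dictionary filters are an optimization: dropping one is always safe, so no evaluation failure should be fatal.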
[jira] [Created] (IMPALA-6625) Skip dictionary and collection conjunct assignment for non-Parquet scans.
Alexander Behm created IMPALA-6625: -- Summary: Skip dictionary and collection conjunct assignment for non-Parquet scans. Key: IMPALA-6625 URL: https://issues.apache.org/jira/browse/IMPALA-6625 Project: IMPALA Issue Type: Improvement Components: Frontend Affects Versions: Impala 2.11.0, Impala 2.10.0, Impala 2.9.0 Reporter: Alexander Behm In HdfsScanNode.init() we try to assign dictionary and collection conjuncts even for non-Parquet scans. Such predicates only make sense for Parquet scans, so there is no point in collecting them for other scans. The current behavior is undesirable because: * init() can be substantially slower, because assigning dictionary filters may involve evaluating exprs in the BE, which can be expensive * the explain plan of non-Parquet scans may have a "parquet dictionary predicates" section, which is confusing/misleading Relevant code snippet from HdfsScanNode:
{code}
@Override
public void init(Analyzer analyzer) throws ImpalaException {
  conjuncts_ = orderConjunctsByCost(conjuncts_);
  checkForSupportedFileFormats();
  assignCollectionConjuncts(analyzer);
  computeDictionaryFilterConjuncts(analyzer);
  // compute scan range locations with optional sampling
  Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
  ...
  if (fileFormats.contains(HdfsFileFormat.PARQUET)) { // <--- assignment should go in here
    computeMinMaxTupleAndConjuncts(analyzer);
  }
  ...
}
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
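The proposed restructuring (do Parquet-only work behind the format check) can be sketched like this. It is an illustrative, simplified model, not HdfsScanNode itself: the enum, the flag, and the `init` signature are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

public class ScanInitSketch {
    // Hypothetical simplified stand-in for HdfsFileFormat.
    enum FileFormat { TEXT, AVRO, PARQUET }

    static boolean dictionaryFiltersComputed = false;

    static void init(Set<FileFormat> fileFormats) {
        dictionaryFiltersComputed = false;
        // Gate Parquet-only conjunct assignment on the formats actually present,
        // so non-Parquet scans skip the expensive BE expr evaluation entirely.
        if (fileFormats.contains(FileFormat.PARQUET)) {
            dictionaryFiltersComputed = true; // would call computeDictionaryFilterConjuncts(...)
        }
    }

    public static void main(String[] args) {
        init(EnumSet.of(FileFormat.TEXT));
        System.out.println("text-only scan computes filters: " + dictionaryFiltersComputed);
        init(EnumSet.of(FileFormat.PARQUET, FileFormat.TEXT));
        System.out.println("mixed scan with parquet computes filters: " + dictionaryFiltersComputed);
    }
}
```

This both avoids the wasted work in init() and keeps the "parquet dictionary predicates" section out of non-Parquet explain plans.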
[jira] [Created] (IMPALA-6623) Impala Doc: Update ltrim and rtrim functions
Alex Rodoni created IMPALA-6623: --- Summary: Impala Doc: Update ltrim and rtrim functions Key: IMPALA-6623 URL: https://issues.apache.org/jira/browse/IMPALA-6623 Project: IMPALA Issue Type: Sub-task Components: Docs Reporter: Alex Rodoni Assignee: Alex Rodoni -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-6347) Monitor queue depth size for outgoing RPCs for Reactor threads
[ https://issues.apache.org/jira/browse/IMPALA-6347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sailesh Mukil resolved IMPALA-6347. --- Resolution: Fixed Fix Version/s: Impala 2.12.0 https://github.com/apache/impala/commit/8dcff3aa41e7f252aa27c6ab1275712103ed5d2c > Monitor queue depth size for outgoing RPCs for Reactor threads > -- > > Key: IMPALA-6347 > URL: https://issues.apache.org/jira/browse/IMPALA-6347 > Project: IMPALA > Issue Type: Sub-task >Reporter: Mostafa Mokhtar >Assignee: Sailesh Mukil >Priority: Major > Fix For: Impala 2.12.0 > > > On systems with slow networking, large queuing can occur in the Reactor > threads, which may result in untracked memory. > It would be good to quantify how much queueing occurred and how much memory > is tied to that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-6477) rpc-mgr-kerberized-test fails on CentOS 6.4
[ https://issues.apache.org/jira/browse/IMPALA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sailesh Mukil resolved IMPALA-6477. --- Resolution: Fixed Fix Version/s: Impala 2.12.0 > rpc-mgr-kerberized-test fails on CentOS 6.4 > --- > > Key: IMPALA-6477 > URL: https://issues.apache.org/jira/browse/IMPALA-6477 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Affects Versions: Impala 2.12.0 >Reporter: Alexander Behm >Assignee: Sailesh Mukil >Priority: Blocker > Labels: broken-build > Fix For: Impala 2.12.0 > > > From the Jenkins logs: > {code} > 15:25:38 > /data/jenkins/workspace/impala-cdh5-trunk-core-data-load/repos/Impala/be/src/rpc/thrift-server-test.cc:176: > Failure > 15:25:38 Value of: "No more data to read" > 15:25:38 Actual: "No more data to read" > 15:25:38 Expected: a substring of non_ssl_client.Open().GetDetail() > 15:25:38 Which is: "Couldn't open transport for localhost:55428 (write() > send(): Broken pipe) > 15:25:38 " > 15:25:38 [ FAILED ] > KerberosOnAndOff/ThriftKerberizedParamsTest.SslConnectivity/2, where > GetParam() = 2 (411 ms) > 15:25:38 [--] 3 tests from > KerberosOnAndOff/ThriftKerberizedParamsTest (894 ms total) > 15:25:38 > 15:25:38 [--] Global test environment tear-down > 15:25:38 [==] 17 tests from 6 test cases ran. (1716 ms total) > 15:25:38 [ PASSED ] 16 tests. > 15:25:38 [ FAILED ] 1 test, listed below: > 15:25:38 [ FAILED ] > KerberosOnAndOff/ThriftKerberizedParamsTest.SslConnectivity/2, where > GetParam() = 2 > 15:25:38 > 15:25:38 1 FAILED TEST > 15:25:38 YOU HAVE 1 DISABLED TEST > {code} > Any ideas, Sailesh? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-6622) Backport parts of IMPALA-4924 to 2.x
Lars Volker created IMPALA-6622: --- Summary: Backport parts of IMPALA-4924 to 2.x Key: IMPALA-6622 URL: https://issues.apache.org/jira/browse/IMPALA-6622 Project: IMPALA Issue Type: Task Components: Frontend, Infrastructure Affects Versions: Impala 2.12.0 Reporter: Lars Volker Assignee: Taras Bobrovytsky We should consider backporting parts of the change that enabled Decimal V2 by default to the 2.x branch: https://gerrit.cloudera.org/#/c/9062/ Some of the tests and infrastructure code should be compatible, and backporting them will reduce the chance of future merge conflicts, e.g. IMPALA-6405. This should not enable Decimal V2 by default. This came out of a discussion with [~alex.behm] and [~tarmstrong]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-6621) Improve In-predicate performance by using an alternative data structure for checking set membership
Bikramjeet Vig created IMPALA-6621: -- Summary: Improve In-predicate performance by using an alternative data structure for checking set membership Key: IMPALA-6621 URL: https://issues.apache.org/jira/browse/IMPALA-6621 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Impala 2.13.0 Reporter: Bikramjeet Vig Assignee: Bikramjeet Vig Attachments: release_build_BoostFlatset.txt.txt, release_build_BoostUnorderedset.txt.txt, release_build_StdSet.txt.txt Currently, when using the SET_LOOKUP strategy for in-predicates in Impala, we use an std::set object for checking membership. Using other data structures like boost::unordered_set and boost::flat_set we can get a significant performance improvement. Please see the attached results of micro-benchmarks using std::set, flat_set, and unordered_set. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
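The containers in question are C++ (std::set vs boost::unordered_set/boost::flat_set), but the underlying tradeoff, an ordered tree with O(log n) lookups versus a hashed set with O(1) expected lookups, can be illustrated with a minimal Java micro-benchmark sketch using TreeSet and HashSet. The harness below is illustrative only; it is not Impala's benchmark and exact numbers depend on the JIT, set size, and value distribution.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetLookupSketch {
    // Fill a set with n values, then time `probes` membership checks (~50% hits).
    static long timeLookupsNanos(Set<Integer> set, int n, int probes) {
        for (int i = 0; i < n; i++) set.add(i);
        long hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < probes; i++) {
            if (set.contains(i % (2 * n))) hits++;
        }
        long elapsed = System.nanoTime() - start;
        if (hits == 0) throw new IllegalStateException(); // keep the loop live
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 100_000, probes = 1_000_000;
        long tree = timeLookupsNanos(new TreeSet<>(), n, probes);
        long hash = timeLookupsNanos(new HashSet<>(), n, probes);
        // The hashed set is typically faster for pure membership checks,
        // mirroring the std::set -> unordered_set/flat_set motivation above.
        System.out.printf("TreeSet: %d ns, HashSet: %d ns%n", tree, hash);
    }
}
```

flat_set adds a further wrinkle the sketch does not capture: it trades O(log n) lookups in a contiguous array for better cache locality, which is why it can beat both on small sets.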
[jira] [Resolved] (IMPALA-4257) Parallel Queries fired from Cognos get stuck forever.
[ https://issues.apache.org/jira/browse/IMPALA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-4257. --- Resolution: Cannot Reproduce > Parallel Queries fired from Cognos get stuck forever. > - > > Key: IMPALA-4257 > URL: https://issues.apache.org/jira/browse/IMPALA-4257 > Project: IMPALA > Issue Type: Bug > Components: Clients >Affects Versions: Impala 2.5.0 > Environment: Dev >Reporter: Deepak Nagar >Priority: Major > Labels: performance > Attachments: profile (1).txt, profile (2).txt, profile (3).txt, > profile (4).txt, profile (5).txt, profile (6).txt, profile (7).txt, profile > (7).txt > > > We are using Cloudera Impala 2.5.0 and created Parquet tables (snappy > compression). These tables are accessed by Cognos 10.2 for > generating reports, and Cognos 10.2 runs the queries in Dynamic Query Mode > (Dynamic Query Mode introduces the concept of a memory-resident data cache in > Cognos; the query engine ultimately issues SQL against the database in > order to populate the cache, so only minimal and simplified queries are issued > against the database). These simplified queries are, however, fired to Impala > in parallel. > The parallel execution of these queries seems to slow down the whole > process, and most of the queries (6 out of 8 fired in parallel in our case) > never finish; they run for days. The same set of queries, when run > individually in sequence, takes only a few seconds to complete. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-6620) Compute incremental stats for groups of partitions does not update stats correctly
H Milyakov created IMPALA-6620: -- Summary: Compute incremental stats for groups of partitions does not update stats correctly Key: IMPALA-6620 URL: https://issues.apache.org/jira/browse/IMPALA-6620 Project: IMPALA Issue Type: Bug Components: Catalog Affects Versions: Impala 2.8.0 Environment: Impala - v2.8.0-cdh5.11.1 We are using the Hive Metastore Database embedded (by Cloudera). It's postgres 8.4.20 OS: Centos Reporter: H Milyakov Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`) does not compute statistics correctly (computes 0) when `partition clause` matches more than one partition. Executing the same command when `partition clause` matches just a single partition results in statistics being computed correctly (non 0 and non -1). The issue was observed on our production cluster for a table with 40 000 partitions and 20 columns. I have copied the table to a separate isolated cluster and observed the same behaviour. We use Impala 2.8.0 in Cloudera CDH 5.11. The issue can be simulated with the following: 1. CREATE TABLE my_test_table ( some_ints BIGINT ) PARTITIONED BY ( part_1 BIGINT, part_2 STRING ) STORED AS PARQUET; 2. The only column 'some_ints' is populated so that there are 10 000 different partitions (part_1, part_2). The total number of records in the table does not matter and could be the same as the number of different partitions. 3. Then running the compute incremental command as described above reproduces the issue. Has anybody faced a similar issue, or does anyone have more info on this case? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-6619) Alter table recover partitions creates unneeded partitions when faces percent sign
Miklos Szurap created IMPALA-6619: - Summary: Alter table recover partitions creates unneeded partitions when faces percent sign Key: IMPALA-6619 URL: https://issues.apache.org/jira/browse/IMPALA-6619 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 2.11.0 Reporter: Miklos Szurap When a table has a partition with a special character in its name, the HDFS directory contains a percent sign (due to an escaped/URL-encoded sequence). This is not decoded and compared properly when running {{alter table recover partitions}}, which creates new, unneeded partitions on each execution. The steps to reproduce/demonstrate the issue:
{noformat}
[nightly-2:21000] > CREATE TABLE tbl_with_partition(col1 string) partitioned by (p string);
Query: CREATE TABLE tbl_with_partition(col1 string) partitioned by (p string)
Fetched 0 row(s) in 1.08s
[nightly-2:21000] > ALTER TABLE tbl_with_partition add partition (p='100%');
Query: ALTER TABLE tbl_with_partition add partition (p='100%')
Fetched 0 row(s) in 5.72s
[nightly-2:21000] > show partitions tbl_with_partition;
Query: show partitions tbl_with_partition
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
| p     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                   |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
| 100%  | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://ns1/user/hive/warehouse/tbl_with_partition/p=100%25 |
| Total | -1    | 0      | 0B   | 0B           |                   |        |                   |                                                            |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
Fetched 2 row(s) in 0.02s
[nightly-2:21000] > ALTER TABLE tbl_with_partition recover partitions;
Query: ALTER TABLE tbl_with_partition recover partitions
Fetched 0 row(s) in 0.29s
[nightly-2:21000] > show partitions tbl_with_partition;
Query: show partitions tbl_with_partition
+--------+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------------+
| p      | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                     |
+--------+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------------+
| 100%   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://ns1/user/hive/warehouse/tbl_with_partition/p=100%25   |
| 100%25 | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://ns1/user/hive/warehouse/tbl_with_partition/p=100%2525 |
| Total  | -1    | 0      | 0B   | 0B           |                   |        |                   |                                                              |
+--------+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------------+
Fetched 3 row(s) in 0.02s
[nightly-2:21000] > ALTER TABLE tbl_with_partition recover partitions;
Query: ALTER TABLE tbl_with_partition recover partitions
Fetched 0 row(s) in 0.27s
[nightly-2:21000] > show partitions tbl_with_partition;
Query: show partitions tbl_with_partition
+----------+-------+--------+------+--------------+-------------------+--------+-------------------+----------------------------------------------------------------+
| p        | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                       |
+----------+-------+--------+------+--------------+-------------------+--------+-------------------+----------------------------------------------------------------+
| 100%     | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://ns1/user/hive/warehouse/tbl_with_partition/p=100%25     |
| 100%25   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://ns1/user/hive/warehouse/tbl_with_partition/p=100%2525   |
| 100%2525 | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://ns1/user/hive/warehouse/tbl_with_partition/p=100%252525 |
| Total    | -1    | 0      | 0B   | 0B           |                   |        |                   |                                                                |
+----------+-------+--------+------+--------------+-------------------+--------+-------------------+----------------------------------------------------------------+
Fetched 4 row(s) in 0.02s
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
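The runaway re-encoding above can be demonstrated with standard URL-encoding round trips. This sketch uses java.net.URLEncoder/URLDecoder as an illustration; Impala's actual escaping code may differ in detail, but the failure mode is the same: if recover partitions treats the already-encoded directory name as a raw partition value and encodes it again, each pass mints a new bogus partition.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodingSketch {
    public static void main(String[] args) {
        String value = "100%";
        // Writing the partition directory: '%' is escaped as "%25".
        String dir = URLEncoder.encode(value, StandardCharsets.UTF_8);
        System.out.println(dir); // 100%25

        // The bug: mistaking the encoded directory name for a raw value
        // and encoding it again yields yet another "new" partition name.
        String again = URLEncoder.encode(dir, StandardCharsets.UTF_8);
        System.out.println(again); // 100%2525

        // The fix direction: decode the directory name before comparing it
        // against existing partition values.
        System.out.println(URLDecoder.decode(dir, StandardCharsets.UTF_8)); // 100%
    }
}
```

Each recover partitions run repeats the second step, which matches the observed progression p=100%25, p=100%2525, p=100%252525.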
[jira] [Created] (IMPALA-6618) Compare expr results against PostgreSQL in test time
Attila Jeges created IMPALA-6618: Summary: Compare expr results against PostgreSQL in test time Key: IMPALA-6618 URL: https://issues.apache.org/jira/browse/IMPALA-6618 Project: IMPALA Issue Type: Improvement Components: Infrastructure Reporter: Attila Jeges We already depend on PostgreSQL in development environments because of the mini-cluster. It would be helpful to have test infrastructure that makes it easy to compare Impala expr results against PostgreSQL in cases where we expect them to be the same. It might be useful to compare against Hive as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)