[jira] [Created] (IMPALA-9426) Download Python dependencies even skipping bootstrap toolchain
zhaorenhai created IMPALA-9426: -- Summary: Download Python dependencies even skipping bootstrap toolchain Key: IMPALA-9426 URL: https://issues.apache.org/jira/browse/IMPALA-9426 Project: IMPALA Issue Type: Sub-task Reporter: zhaorenhai Assignee: zhaorenhai Download Python dependencies even when skipping the bootstrap toolchain: when SKIP_TOOLCHAIN_BOOTSTRAP=true is set, the Python dependencies still need to be downloaded, because the toolchain build process will not download them automatically. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9425) Statestore may fail to report when an impalad has failed
Thomas Tauber-Marshall created IMPALA-9425: -- Summary: Statestore may fail to report when an impalad has failed Key: IMPALA-9425 URL: https://issues.apache.org/jira/browse/IMPALA-9425 Project: IMPALA Issue Type: Bug Components: Distributed Exec Affects Versions: Impala 3.4.0 Reporter: Thomas Tauber-Marshall Assignee: Thomas Tauber-Marshall If an impalad fails and another is restarted at the same host:port combination quickly, the statestore may fail to report to the coordinators that the impalad went down. The reason for this is that in the cluster membership topic, impalads are keyed by their statestore subscriber id, which is "impalad@host:port". If the new impalad registers itself before a topic update has been generated for a particular coordinator, the statestore has no way of knowing that the particular key was deleted and then re-added since the last update. The result is that queries that were running on the impalad that failed may not be cancelled by the coordinator until they pass the unresponsive backend timeout, which by default is ~12 minutes. I propose as a solution that we add a concept of uuids for impalads, where each impalad will generate its own uuid on startup. This allows us to differentiate between different impalads running at the same host:port combination. It can also be used to simplify some logic in the scheduler and ExecutorGroup/ExecutorBlacklist etc. where we currently have data structures containing info about impalads that are keyed off host/port combinations.
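The keying problem described above can be sketched in a few lines. This is a hypothetical illustration (the dict layouts and field names are invented, not Impala's actual topic structures): diffing membership snapshots keyed by host:port cannot see a delete-then-re-add, while per-process UUID keys can.

```python
import uuid

# Snapshot from the last topic update, keyed by subscriber id
# ("impalad@host:port"), as the statestore does today.
last_update = {"impalad@host1:22000": {"instance": "old"}}

# The impalad crashes and a new one registers at the same host:port
# before the next update is generated: same key, so a keyset diff
# shows no deletion and the coordinator never learns of the failure.
current = {"impalad@host1:22000": {"instance": "new"}}
assert set(last_update) == set(current)  # restart is invisible

# With the proposed per-startup UUID as the key, the restart shows
# up as one key removed and a different key added.
last_by_uuid = {uuid.uuid4(): "impalad@host1:22000"}
current_by_uuid = {uuid.uuid4(): "impalad@host1:22000"}
assert set(last_by_uuid) != set(current_by_uuid)  # restart is visible
```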
[jira] [Commented] (IMPALA-9389) Impala Doc: support reading zstd text files
[ https://issues.apache.org/jira/browse/IMPALA-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045026#comment-17045026 ] Kris Hahn commented on IMPALA-9389: --- Here are some possible places to document reading zstd files: * [Release notes|https://impala.apache.org/docs/build/html/topics/impala_new_features.html ] under Zstd Compression for Parquet files * [How Impala Works with Hadoop File Formats|https://impala.apache.org/docs/build/html/topics/impala_file_formats.html] briefly mention the new functionality in the Zstd bullet. * [Compressions for Parquet Data Files|https://impala.apache.org/docs/build/html/topics/impala_parquet.html] how about adding an example of setting the compression codec, writing some data, and reading the file? > Impala Doc: support reading zstd text files > --- > > Key: IMPALA-9389 > URL: https://issues.apache.org/jira/browse/IMPALA-9389 > Project: IMPALA > Issue Type: Documentation > Components: Backend >Affects Versions: Impala 3.3.0 >Reporter: Xiaomeng Zhang >Assignee: Kris Hahn >Priority: Major > > [https://gerrit.cloudera.org/#/c/15023/] > We add support for reading zstd encoded text files. > This includes: > # support reading zstd file written by Hive which uses streaming. > # support reading zstd file compressed by standard zstd library which > uses block.
[jira] [Created] (IMPALA-9424) Add six python library to shell/ext-py
David Knupp created IMPALA-9424: --- Summary: Add six python library to shell/ext-py Key: IMPALA-9424 URL: https://issues.apache.org/jira/browse/IMPALA-9424 Project: IMPALA Issue Type: Improvement Components: Infrastructure Affects Versions: Impala 3.4.0 Reporter: David Knupp A couple of impala-shell changes that are coming in the near future (thrift_sasl update, possible changes to THttpClient, python 3 support) will require the six python library.
[jira] [Assigned] (IMPALA-9389) Impala Doc: support reading zstd text files
[ https://issues.apache.org/jira/browse/IMPALA-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kris Hahn reassigned IMPALA-9389: - Assignee: Kris Hahn (was: Xiaomeng Zhang) > Impala Doc: support reading zstd text files > --- > > Key: IMPALA-9389 > URL: https://issues.apache.org/jira/browse/IMPALA-9389 > Project: IMPALA > Issue Type: Documentation > Components: Backend >Affects Versions: Impala 3.3.0 >Reporter: Xiaomeng Zhang >Assignee: Kris Hahn >Priority: Major > > [https://gerrit.cloudera.org/#/c/15023/] > We add support for reading zstd encoded text files. > This includes: > # support reading zstd file written by Hive which uses streaming. > # support reading zstd file compressed by standard zstd library which > uses block.
[jira] [Resolved] (IMPALA-9381) Lazily convert and/or cache different representations of the query profile
[ https://issues.apache.org/jira/browse/IMPALA-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-9381. --- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Lazily convert and/or cache different representations of the query profile > -- > > Key: IMPALA-9381 > URL: https://issues.apache.org/jira/browse/IMPALA-9381 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 3.4.0 > > > There are some obvious inefficiencies with how the query state record works: > * We do an unnecessary copy of the archive string when adding it to the query > log > https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1812. > * We eagerly convert the profile to text and JSON, when in many cases they > won't be needed - > https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1839 > . I think it is generally rare for more than one profile format to be > downloaded from the web UI. I know of tools that scrape the thrift profile, > but the human-readable version would usually only be consumed by humans. We > could avoid this by only storing the thrift representation of the profile, > then reconstituting the other representations from thrift if requested. > * After ComputeExecSummary(), the profile shouldn't change, but we'll > regenerate the thrift representation for every web request to get the > encoded. This may waste a lot of CPU for tools scraping the profiles.
[jira] [Commented] (IMPALA-9381) Lazily convert and/or cache different representations of the query profile
[ https://issues.apache.org/jira/browse/IMPALA-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044855#comment-17044855 ] ASF subversion and git services commented on IMPALA-9381: - Commit 1bd45d295ebfc3f526a98eebb9b61525b9332c91 in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=1bd45d2 ] IMPALA-9381: on-demand conversion of runtime profile Converting the runtime profile to JSON and text representations at the end of the query used significant CPU and time. These representations will commonly never be accessed, because they need to be explicitly requested by a client via the HTTP debug interface or via a thrift profile request. So it is a waste of resources to eagerly convert them, and in particular it is a bad idea to do so on the critical path of a query. This commit switches to generating alternative profile representations on-demand. Only the compressed thrift version of the profile is stored in QueryStateRecord. This is the most compact representation of the profile and it is relatively convenient to convert into other formats. Also use a move() when constructing QueryStateRecord to avoid copying the profile unnecessarily. Fix a couple of potential use-after-free issues where Json objects generated by RuntimeProfile::ToJson() could reference strings owned by the object pool. These were detected by running an ASAN build, because after this change, the temporary object pool used to hold the deserialized profile was freed before the JSON tree was returned. The "kind" field of counters is removed from the JSON profile. This couldn't be round-tripped correctly through thrift, and probably isn't necessary. It also helps slim down the profiles. Also make sure to preserve the "indent" field when round-tripping to thrift. Testing: Ran core tests. 
Diffed JSON and text profiles downloaded from the web UI from before and after to make sure there were no unexpected changes as a result of the round-trip via thrift. Change-Id: Ic2f5133cc146adc3b044cf4b64aae0a9688449fa Reviewed-on: http://gerrit.cloudera.org:8080/15236 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Lazily convert and/or cache different representations of the query profile > -- > > Key: IMPALA-9381 > URL: https://issues.apache.org/jira/browse/IMPALA-9381 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > There are some obvious inefficiencies with how the query state record works: > * We do an unnecessary copy of the archive string when adding it to the query > log > https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1812. > * We eagerly convert the profile to text and JSON, when in many cases they > won't be needed - > https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1839 > . I think it is generally rare for more than one profile format to be > downloaded from the web UI. I know of tools that scrape the thrift profile, > but the human-readable version would usually only be consumed by humans. We > could avoid this by only storing the thrift representation of the profile, > then reconstituting the other representations from thrift if requested. > * After ComputeExecSummary(), the profile shouldn't change, but we'll > regenerate the thrift representation for every web request to get the > encoded. This may waste a lot of CPU for tools scraping the profiles.
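The on-demand conversion pattern in the IMPALA-9381 commit can be sketched generically. This is not Impala's C++ code: the class, `zlib`, and `json` below are stand-ins for the compressed-thrift canonical representation, chosen only to show "store one compact form, convert lazily and cache on first request".

```python
import json
import zlib

class QueryStateRecord:
    """Sketch (invented names): keep only the compressed canonical
    profile; build other representations only when a client asks."""

    def __init__(self, profile: dict):
        # Canonical form stored at end of query; zlib+json stand in
        # for the compressed thrift representation.
        self._compressed = zlib.compress(json.dumps(profile).encode())
        self._json_cache = None  # built lazily, cached after first use

    def json_profile(self) -> str:
        if self._json_cache is None:  # convert on demand, off the query's critical path
            self._json_cache = zlib.decompress(self._compressed).decode()
        return self._json_cache

rec = QueryStateRecord({"query_id": "abc", "counters": {"rows": 42}})
assert rec._json_cache is None            # nothing converted eagerly
out = rec.json_profile()                  # first web request pays the cost
assert json.loads(out)["counters"]["rows"] == 42
```

Most queries never have any alternate representation requested, so under this scheme they pay only for the compact canonical copy.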
[jira] [Created] (IMPALA-9423) Fix cookie auth with Knox
Thomas Tauber-Marshall created IMPALA-9423: -- Summary: Fix cookie auth with Knox Key: IMPALA-9423 URL: https://issues.apache.org/jira/browse/IMPALA-9423 Project: IMPALA Issue Type: Bug Components: Clients Reporter: Thomas Tauber-Marshall Assignee: Thomas Tauber-Marshall When Apache Knox is being used to proxy connections to Impala, it used to be the case that Knox would return the authentication cookies generated by Impala, saving extra round trips and authentications to Kerberos/LDAP. This was broken by KNOX-2223 - Knox only returns auth cookies that it thinks are for it, which it determines by checking for its Kerberos principal in the cookie string. With KNOX-2223, the principal is expected to be preceded by a '=', which Impala doesn't do.
[jira] [Resolved] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous
[ https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Tauber-Marshall resolved IMPALA-8712. Fix Version/s: Impala 3.4.0 Resolution: Fixed > Convert ExecQueryFInstance() RPC to become asynchronous > --- > > Key: IMPALA-8712 > URL: https://issues.apache.org/jira/browse/IMPALA-8712 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 3.3.0 >Reporter: Michael Ho >Assignee: Thomas Tauber-Marshall >Priority: Major > Fix For: Impala 3.4.0 > > > Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC > capabilities of KRPC instead of relying on the half-baked way of using > {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We > already have a reactor thread pool in KRPC to handle sending client RPCs > asynchronously. Various tasks under IMPALA-5486 can also benefit from > making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.
[jira] [Commented] (IMPALA-9075) Add support for reading zstd text files
[ https://issues.apache.org/jira/browse/IMPALA-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044664#comment-17044664 ] ASF subversion and git services commented on IMPALA-9075: - Commit 571131fdc11acecf4c2003668dbccde0667efe07 in impala's branch refs/heads/master from xiaomeng [ https://gitbox.apache.org/repos/asf?p=impala.git;h=571131f ] IMPALA-9075: Add support for reading zstd text files In this patch, we add support for reading zstd encoded text files. This includes: 1. support reading zstd file written by Hive which uses streaming. 2. support reading zstd file compressed by standard zstd library which uses block. To support decompressing both formats, a function ProcessBlockStreaming is added in zstd decompressor. Testing done: Added two backend tests: 1. streaming decompress test. 2. large data test for both block and streaming decompress. Added two end to end tests: 1. hive and impala integration. For four compression codecs, write in hive and read from impala. 2. zstd library and impala integration. Copy a zstd lib compressed file to HDFS, and read from impala. Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4 Reviewed-on: http://gerrit.cloudera.org:8080/15023 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Add support for reading zstd text files > --- > > Key: IMPALA-9075 > URL: https://issues.apache.org/jira/browse/IMPALA-9075 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.3.0 >Reporter: Andrew Sherman >Assignee: Xiaomeng Zhang >Priority: Critical > > IMPALA-8450 added support for zstd in parquet. > We should also add support for reading zstd encoded text files. > Another useful jira to look at is IMPALA-8549 (Add support for scanning > DEFLATE text files)
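The block-vs-streaming distinction the commit describes can be illustrated outside Impala. zstd is not in the Python standard library, so this sketch uses zlib purely as an analogy: a one-shot decompress call corresponds to reading a block-compressed file, while a decompression object fed chunk by chunk corresponds to the streaming case (e.g. a file written by a streaming writer such as Hive).

```python
import zlib

data = b"hello zstd world " * 1000

# "Block" style: decompress the whole payload in one call, analogous
# to a file compressed by a standard library/CLI in block mode.
block = zlib.compress(data)
assert zlib.decompress(block) == data

# "Streaming" style: feed compressed bytes incrementally, the way a
# scanner consumes a stream without the full payload in hand.
d = zlib.decompressobj()
out = b"".join(d.decompress(block[i:i + 64]) for i in range(0, len(block), 64))
out += d.flush()
assert out == data
```

A decompressor that must handle files from both kinds of writers, as the patch's ProcessBlockStreaming does for zstd, effectively needs to support both calling patterns.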
[jira] [Commented] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"
[ https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044665#comment-17044665 ] ASF subversion and git services commented on IMPALA-8852: - Commit 777d0d203f138183d65885b523b619421b487714 in impala's branch refs/heads/master from Tamas Mate [ https://gitbox.apache.org/repos/asf?p=impala.git;h=777d0d2 ] IMPALA-8852: Skip short-circuit config check for dedicated coordinator ImpalaD should not abort when running as a dedicated coordinator and no DataNode is available on the host. This change adds a condition to skip the short-circuit socket path directory checks when ImpalaD is started with the 'is_executor=false' flag. Testing: - Added test to JniFrontendTest.java to verify the short-circuit directory check is skipped if ImpalaD is started in dedicated coordinator mode. - Manually tested the appearance of the warning message with: start-impala-cluster.py --num_coordinators 1 --use_exclusive_coordinators true Change-Id: I373d4037f4cee203322a398b77b75810ba708bb5 Reviewed-on: http://gerrit.cloudera.org:8080/15173 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads > configuration" > > > Key: IMPALA-8852 > URL: https://issues.apache.org/jira/browse/IMPALA-8852 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.2.0, Impala 3.3.0 >Reporter: Adriano >Assignee: Tamas Mate >Priority: Major > Labels: ramp-up > > On coordinator-only nodes ([typically the edge > nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]): > {code:java} > --is_coordinator=true > --is_executor=false > {code} > the *dfs.domain.socket.path* can be nonexistent on the local FS, as the > DataNode role may not be installed on the edge node.
> The nonexistent path prevents the ImpalaD from starting with the message: > {code:java} > I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads > configuration: > - Impala cannot read or execute the parent directory of > dfs.domain.socket.path > @ 0xb35f19 > @ 0x100e2fe > @ 0x103f274 > @ 0x102836f > @ 0xa9f573 > @ 0x7f97807e93d4 > @ 0xafb3b8 > E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads > configuration: > - Impala cannot read or execute the parent directory of > dfs.domain.socket.path > {code} > even though a coordinator-only ImpalaD does not do short-circuit reads.
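The logic of the fix is a simple gate. This is a hypothetical sketch (function and parameter names invented; the real check lives in the Java frontend startup path): validate the short-circuit socket path only when the daemon can actually execute scans.

```python
# Sketch of the condition the commit adds, with invented names.
def should_check_short_circuit(is_executor: bool, short_circuit_enabled: bool) -> bool:
    # Only executors perform short-circuit reads, so only they need a
    # valid dfs.domain.socket.path on the local filesystem.
    return short_circuit_enabled and is_executor

# Dedicated coordinator (--is_executor=false): the check is skipped,
# so a missing socket path no longer aborts startup.
assert not should_check_short_circuit(is_executor=False, short_circuit_enabled=True)
# A regular executor still validates the configuration.
assert should_check_short_circuit(is_executor=True, short_circuit_enabled=True)
```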
[jira] [Commented] (IMPALA-9226) Improve string allocations of the ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044670#comment-17044670 ] ASF subversion and git services commented on IMPALA-9226: - Commit f22812144279a3f722fcace5925cfb2f52efb598 in impala's branch refs/heads/master from norbert.luksa [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f228121 ] IMPALA-9226: Bump ORC version to 1.6.2-p7 Bump ORC version to include patch for ORC-600 that unblocks IMPALA-9226. Tests: - Run scanner tests for orc/def/block. Change-Id: I444bfac435e5b05eee1ff7c8cf6a32ff5b65 Reviewed-on: http://gerrit.cloudera.org:8080/15287 Reviewed-by: Gabor Kaszab Tested-by: Impala Public Jenkins > Improve string allocations of the ORC scanner > - > > Key: IMPALA-9226 > URL: https://issues.apache.org/jira/browse/IMPALA-9226 > Project: IMPALA > Issue Type: Improvement >Reporter: Zoltán Borók-Nagy >Assignee: Norbert Luksa >Priority: Major > Labels: orc > > Currently the ORC scanner allocates new memory for each string value (except > for fixed-size strings): > [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172] > Besides the many allocations and copies, it's also bad for memory > locality. > Since ORC-501 StringVectorBatch has a member named 'blob' that contains the > strings in the batch: > [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126] > 'blob' has type DataBuffer which is movable, so Impala might be able to get > ownership of it. Or, at least we could copy the whole blob array instead of > copying the strings one-by-one. > ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC > 1.5.5. > ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch: > [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153] > It uses dictionary encoding for storing the values.
Impala could copy/move > the dictionary as well.
[jira] [Commented] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous
[ https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044667#comment-17044667 ] ASF subversion and git services commented on IMPALA-8712: - Commit 1e616774d4d3a00e002d1e383ccd89c46f6d9010 in impala's branch refs/heads/master from Thomas Tauber-Marshall [ https://gitbox.apache.org/repos/asf?p=impala.git;h=1e61677 ] IMPALA-8712: Make ExecQueryFInstances async This patch refactors the ExecQueryFInstances rpc to be asynchronous. Previously, Impala would issue all the Exec()s, wait for all of them to complete, and then check if any of them resulted in an error. We now stop issuing Exec()s and cancel any that are still in flight as soon as an error occurs. It also performs some cleanup around the thread safety of Coordinator::BackendState, including adding comments and DCHECKS. === Exec RPC Thread Pool === This patch also removes the 'exec_rpc_thread_pool_' from ExecEnv. This thread pool was used to partially simulate async Exec() prior to the switch to KRPC, which provides built-in async rpc capabilities. Removing this thread pool has potential performance implications, as it means that the Exec() parameters are serialized serially rather than in parallel (with the level of parallelism determined by the size of the thread pool, which was configurable by an Advanced flag and defaulted to 12). To ensure we don't regress query startup times, I did some performance testing. All tests were done on a 10 node cluster. The baseline used for the tests did not include IMPALA-9181, a perf optimization for query startup done to facilitate this work. I ran TPCH 100 at concurrency levels of 1, 4, and 8 and extracted the query startup times from the profiles. For each concurrency level, the average regression in query startup time was < 2ms. Because query e2e running time was much longer than this, there was no noticeable change in total query time.
I also ran a 'worst case scenario' with a table with 10,000 partitions to create a very large Exec() payload to serialize (~1.21MB vs. ~10KB-30KB for TPCH 100). Again, the change in query startup time was negligible. Testing: - Added an e2e test that verifies that a query where an Exec() fails doesn't wait for all Exec()s to complete before cancelling and returning the error to the client. Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32 Reviewed-on: http://gerrit.cloudera.org:8080/15154 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins > Convert ExecQueryFInstance() RPC to become asynchronous > --- > > Key: IMPALA-8712 > URL: https://issues.apache.org/jira/browse/IMPALA-8712 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 3.3.0 >Reporter: Michael Ho >Assignee: Thomas Tauber-Marshall >Priority: Major > > Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC > capabilities of KRPC instead of relying on the half-baked way of using > {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We > already have a reactor thread pool in KRPC to handle sending client RPCs > asynchronously. Various tasks under IMPALA-5486 can also benefit from > making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.
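The cancel-on-first-error pattern the commit describes can be sketched with Python's asyncio standing in for async KRPC calls (names and failure injection are invented for illustration): issue all Exec()s concurrently, return as soon as one fails, and cancel the rest instead of waiting for them.

```python
import asyncio

async def exec_on_backend(name: str, fail: bool):
    # Stand-in for an async Exec() rpc to one backend.
    if fail:
        raise RuntimeError(f"Exec failed on {name}")
    await asyncio.sleep(10)  # long-running Exec that should get cancelled
    return name

async def exec_all():
    tasks = [
        asyncio.create_task(exec_on_backend("backend1", fail=False)),
        asyncio.create_task(exec_on_backend("backend2", fail=True)),
    ]
    # Return as soon as any Exec() raises, instead of waiting for all.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for t in pending:  # cancel in-flight Exec()s once an error occurs
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return [t.exception() for t in done if t.exception()]

errors = asyncio.run(exec_all())
assert len(errors) == 1 and "backend2" in str(errors[0])
```

The query fails fast with backend2's error rather than idling for the full duration of backend1's Exec(), mirroring the behavior the e2e test above verifies.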
[jira] [Commented] (IMPALA-9181) Serialize TQueryCtx once per query
[ https://issues.apache.org/jira/browse/IMPALA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044668#comment-17044668 ] ASF subversion and git services commented on IMPALA-9181: - Commit 1e616774d4d3a00e002d1e383ccd89c46f6d9010 in impala's branch refs/heads/master from Thomas Tauber-Marshall [ https://gitbox.apache.org/repos/asf?p=impala.git;h=1e61677 ] IMPALA-8712: Make ExecQueryFInstances async This patch refactors the ExecQueryFInstances rpc to be asynchronous. Previously, Impala would issue all the Exec()s, wait for all of them to complete, and then check if any of them resulted in an error. We now stop issuing Exec()s and cancel any that are still in flight as soon as an error occurs. It also performs some cleanup around the thread safety of Coordinator::BackendState, including adding comments and DCHECKS. === Exec RPC Thread Pool === This patch also removes the 'exec_rpc_thread_pool_' from ExecEnv. This thread pool was used to partially simulate async Exec() prior to the switch to KRPC, which provides built-in async rpc capabilities. Removing this thread pool has potential performance implications, as it means that the Exec() parameters are serialized serially rather than in parallel (with the level of parallelism determined by the size of the thread pool, which was configurable by an Advanced flag and defaulted to 12). To ensure we don't regress query startup times, I did some performance testing. All tests were done on a 10 node cluster. The baseline used for the tests did not include IMPALA-9181, a perf optimization for query startup done to facilitate this work. I ran TPCH 100 at concurrency levels of 1, 4, and 8 and extracted the query startup times from the profiles. For each concurrency level, the average regression in query startup time was < 2ms. Because query e2e running time was much longer than this, there was no noticeable change in total query time.
I also ran a 'worst case scenario' with a table with 10,000 partitions to create a very large Exec() payload to serialize (~1.21MB vs. ~10KB-30KB for TPCH 100). Again, the change in query startup time was negligible. Testing: - Added an e2e test that verifies that a query where an Exec() fails doesn't wait for all Exec()s to complete before cancelling and returning the error to the client. Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32 Reviewed-on: http://gerrit.cloudera.org:8080/15154 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins > Serialize TQueryCtx once per query > -- > > Key: IMPALA-9181 > URL: https://issues.apache.org/jira/browse/IMPALA-9181 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.4.0 >Reporter: Thomas Tauber-Marshall >Assignee: Thomas Tauber-Marshall >Priority: Major > Fix For: Impala 3.4.0 > > > When issuing Exec() rpcs to backends, we currently serialize the TQueryCtx > once per backend. This is inefficient as the TQueryCtx is the same for all > backends and really only needs to be serialized once.
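The serialize-once idea in the IMPALA-9181 description is a general pattern. In this sketch, pickle stands in for thrift serialization and the dict for TQueryCtx (all names invented): encode the shared context one time and hand the same bytes to every backend rather than re-encoding per Exec() rpc.

```python
import pickle

# Shared per-query context; identical for every backend.
query_ctx = {"query_id": "abc", "options": {"mem_limit": "2g"}}

# Serialize once per query, not once per backend.
serialized_once = pickle.dumps(query_ctx)

backends = ["host1:27000", "host2:27000", "host3:27000"]
payloads = {b: serialized_once for b in backends}  # reuse bytes, no re-encoding

# Every backend receives an identical payload that round-trips correctly.
assert all(pickle.loads(p) == query_ctx for p in payloads.values())
```

With N backends this turns N serializations of an identical structure into one, which is exactly the saving that matters when the payload is large (e.g. the ~1.21MB worst case measured above).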
[jira] [Assigned] (IMPALA-9422) Improve join builder profiles
[ https://issues.apache.org/jira/browse/IMPALA-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9422: - Assignee: Tim Armstrong > Improve join builder profiles > - > > Key: IMPALA-9422 > URL: https://issues.apache.org/jira/browse/IMPALA-9422 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: multithreading > > We should clean up/improve the join builder profiles for the separate build. > First, for the separate build, we should ensure that all time spent in the > builder is counted against the builder. E.g. calls into public methods like > BeginSpilledProbe(). These should be counted as idle time for the actual join > implementation, so that we can see that the time is spent in the (serial) > builder instead of the (parallel) probe. > We might need to fix things like Send() being called by > RepartitionBuildInput, resulting in double counting. > Second, we should revisit the assortment of timers - BuildRowsPartitionTime, > HashTablesBuildTime, RepartitionTime. Maybe it makes sense to make them child > counters of total time to make the relationship clearer.
[jira] [Created] (IMPALA-9422) Improve join builder profiles
Tim Armstrong created IMPALA-9422: - Summary: Improve join builder profiles Key: IMPALA-9422 URL: https://issues.apache.org/jira/browse/IMPALA-9422 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Tim Armstrong We should clean up/improve the join builder profiles for the separate build. First, for the separate build, we should ensure that all time spent in the builder is counted against the builder. E.g. calls into public methods like BeginSpilledProbe(). These should be counted as idle time for the actual join implementation, so that we can see that the time is spent in the (serial) builder instead of the (parallel) probe. We might need to fix things like Send() being called by RepartitionBuildInput, resulting in double counting. Second, we should revisit the assortment of timers - BuildRowsPartitionTime, HashTablesBuildTime, RepartitionTime. Maybe it makes sense to make them child counters of total time to make the relationship clearer.
[jira] [Updated] (IMPALA-9421) Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.
[ https://issues.apache.org/jira/browse/IMPALA-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Jeges updated IMPALA-9421: - Description: Show database operation takes ~ 3 - 4 seconds, sometimes ~ 8 - 9 seconds in impala-shell when connecting to a coordinator using hs2-http with LDAP authentication: {code:java} $ impala-shell.sh --protocol='hs2-http' --ssl -i "impala-coordinator:443" -u username -l impala-shell> show database; ++--+ | name | comment | ++--+ | _impala_builtins | System database for Impala builtin functions | | airline_ontime_orc | | | airline_ontime_parquet | | | default | Default Hive database | ++--+ Fetched 4 row(s) in 8.87s {code} impala-coordinator logs show that there are multiple new connections set up and authenticated: {code:java} I0225 16:07:58.143942 317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client I0225 16:07:58.144042 321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client I0225 16:07:58.144101 321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client I0225 16:07:58.144338 128883 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work I0225 16:07:58.155827 128883 authentication.cc:273] LDAP bind successful I0225 16:07:58.155901 128883 impala-hs2-server.cc:1085] PingImpalaHS2Service(): request=TPingImpalaHS2ServiceReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h", 02: secret (string) = "\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04", }, }, } I0225 16:07:58.876168 317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client I0225 16:07:58.876317 320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client I0225 16:07:58.876364 320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client I0225 16:07:58.876847 128884 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work I0225 16:07:58.887931 128884 authentication.cc:273] LDAP bind successful I0225 16:07:58.888008 128884 impala-hs2-server.cc:442] ExecuteStatement(): request=TExecuteStatementReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h", 02: secret (string) = "\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04", }, }, 02: statement (string) = "show databases", 03: confOverlay (map) = map[1] { "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020", }, 04: runAsync (bool) = true, } I0225 16:07:58.888049 128884 impala-hs2-server.cc:230] TExecuteStatementReq: TExecuteStatementReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h", 02: secret (string) = "\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04", }, }, 02: statement (string) = "show databases", 03: confOverlay (map) = map[1] { "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020", }, 04: runAsync (bool) = true, } I0225 16:07:58.898981 128884 impala-hs2-server.cc:268] TClientRequest.queryOptions: TQueryOptions { 01: abort_on_error (bool) = false, 02: max_errors (i32) = 100, 03: disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 11: debug_action (string) = "", 12: mem_limit (i64) = 0, 15: hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 
17: parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: sync_ddl (bool) = false, 24: disable_outermost_topn (bool) = false, 26: query_timeout_s (i32) = 0, 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = false, 31: exec_single_node_rows_threshold (i32) = 100, 32: optimize_partition_key_scans (bool) = false, 33: replica_preference (i32) = 0, 34: schedule_random_replica (bool) = false, 36: disable_streaming_preaggregations (bool) = false, 37: runtime_filter_mode (i32) = 2, 38: runtime_bloom_filter_size (i32) = 1048576, 39:
[jira] [Resolved] (IMPALA-7496) Schedule query taking in account the mem available on the impalad nodes
[ https://issues.apache.org/jira/browse/IMPALA-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-7496. --- Resolution: Later It's unclear that we want to do this - it's a pretty significant change to how Impala works and it has complex interactions with data locality, scheduling, etc. The executor group support we added in the meantime can solve some use cases like this in a simpler and more predictable way - the query will run on the first executor group with resources. Closing as "later" to indicate that it might be something to revisit later. > Schedule query taking in account the mem available on the impalad nodes > --- > > Key: IMPALA-7496 > URL: https://issues.apache.org/jira/browse/IMPALA-7496 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Adriano >Priority: Major > Labels: admission-control, resource-management, scheduler > > Environment description: cluster scale (50/100/150 nodes and terabytes of RAM > available) - Admission Control enabled. > Issue description: > Whichever coordinator is chosen (with data and statistics unchanged), a query > will always be planned the same way, based on the metadata that the > coordinator has. > The query will always be scheduled on the same nodes if the memory > requirements for admission are satisfied: > https://github.com/cloudera/Impala/blob/cdh5-2.7.0_5.9.1/be/src/scheduling/admission-controller.cc#L307-L333 > Identical queries are always planned/scheduled the same way (hitting the > same nodes). > This often leads to the queries hitting those nodes being queued (not admitted), > as those nodes have no more memory available > within their process limit, despite the pool having plenty of free memory and the > overall cluster load being low. 
> When the plan is finished and the query is evaluated for admission, it often > happens that admission is denied because one of the nodes does not have enough > memory to run its part of the query (and the query is moved to the pool queue), > despite the cluster having 50/100/150 nodes and terabytes of RAM available. > Why does the scheduler not take into consideration the memory available on the > nodes involved in the query before building the schedule (perhaps preferring a > remote read/operation on a node with free memory instead of always including > the same nodes in the plan)? Those nodes end up: > 1- overloaded > 2- the query is not immediately admitted, risking being timed out in the > pool queue > Since 2.7, REPLICA_PREFERENCE can possibly help, but it is not good enough, > as it does not prevent the scheduler from choosing busy nodes (with the same > potential effect: the query queued for lack of resources on a specific node despite > terabytes of free memory being available). > Feature Request: > It would be good if Impala had an option to execute queries (even with worse > performance) excluding the overloaded nodes and including different nodes, in > order to get the query immediately admitted and executed.
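The executor-group behavior mentioned in the resolution ("the query will run on the first executor group with resources") can be sketched roughly as follows. The names and the per-node memory model are illustrative assumptions, not Impala's real admission-control code:

```python
# Sketch: admit a query on the first executor group in which every executor
# has enough free memory; otherwise the query would be queued.
# Illustrative only -- real admission control considers many more factors.

def first_admittable_group(groups, per_node_mem_needed):
    """groups: ordered list of (name, [free_mem_bytes per executor])."""
    for name, free_mem in groups:
        if all(m >= per_node_mem_needed for m in free_mem):
            return name
    return None  # no group can admit right now; the query waits in the queue

groups = [
    ("group-small", [2 << 30, 1 << 30]),  # one executor too loaded
    ("group-large", [8 << 30, 8 << 30]),
]
print(first_admittable_group(groups, 4 << 30))  # -> group-large
```

This captures why the approach is "simpler and more predictable" than memory-aware per-node scheduling: the plan itself is unchanged, and only the choice of group reacts to load.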
[jira] [Updated] (IMPALA-9421) Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.
[ https://issues.apache.org/jira/browse/IMPALA-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Jeges updated IMPALA-9421: - Description: Show database operation takes over 3-4 seconds in impala-shell when connecting to a coordinator using hs2-http with LDAP authentication: {code:java} $ impala-shell.sh --protocol='hs2-http' --ssl -i "impala-coordinator:443" -u username -l impala-shell> show database; ++--+ | name | comment | ++--+ | _impala_builtins | System database for Impala builtin functions | | airline_ontime_orc | | | airline_ontime_parquet | | | default | Default Hive database | ++--+ Fetched 4 row(s) in 3.79s {code} impala-coordinator logs show that there are multiple new connections set up and authenticated: {code:java} I0225 15:45:28.478219 317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client I0225 15:45:28.478384 321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client I0225 15:45:28.478454 321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client I0225 15:45:28.478729 126270 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work I0225 15:45:28.491451 126270 authentication.cc:273] LDAP bind successful I0225 15:45:28.491571 126270 impala-hs2-server.cc:1085] PingImpalaHS2Service(): request=TPingImpalaHS2ServiceReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "\xd7U\x11\x89\xf0\xd4J\x12\xbc\x9c\x0e\x19\xff\xd5\xec?", 02: secret (string) = "\xcd\xf9\x86\a\x90\xa6E\xaf\x92\x19\xee\x1e6S\xea\x85", }, }, } I0225 15:45:29.199357 317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client I0225 15:45:29.199455 320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started 
connection setup for client I0225 15:45:29.199498 320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client I0225 15:45:29.199753 126271 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work I0225 15:45:29.210222 126271 authentication.cc:273] LDAP bind successful I0225 15:45:29.210384 126271 impala-hs2-server.cc:442] ExecuteStatement(): request=TExecuteStatementReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "\xd7U\x11\x89\xf0\xd4J\x12\xbc\x9c\x0e\x19\xff\xd5\xec?", 02: secret (string) = "\xcd\xf9\x86\a\x90\xa6E\xaf\x92\x19\xee\x1e6S\xea\x85", }, }, 02: statement (string) = "show databases", 03: confOverlay (map) = map[1] { "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020", }, 04: runAsync (bool) = true, } I0225 15:45:29.210427 126271 impala-hs2-server.cc:230] TExecuteStatementReq: TExecuteStatementReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "\xd7U\x11\x89\xf0\xd4J\x12\xbc\x9c\x0e\x19\xff\xd5\xec?", 02: secret (string) = "\xcd\xf9\x86\a\x90\xa6E\xaf\x92\x19\xee\x1e6S\xea\x85", }, }, 02: statement (string) = "show databases", 03: confOverlay (map) = map[1] { "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020", }, 04: runAsync (bool) = true, } I0225 15:45:29.220592 126271 impala-hs2-server.cc:268] TClientRequest.queryOptions: TQueryOptions { 01: abort_on_error (bool) = false, 02: max_errors (i32) = 100, 03: disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 11: debug_action (string) = "", 12: mem_limit (i64) = 0, 15: hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 17: 
parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: sync_ddl (bool) = false, 24: disable_outermost_topn (bool) = false, 26: query_timeout_s (i32) = 0, 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = false, 31: exec_single_node_rows_threshold (i32) = 100, 32: optimize_partition_key_scans (bool) = false, 33: replica_preference (i32) = 0, 34: schedule_random_replica (bool) = false, 36: disable_streaming_preaggregations (bool) = false, 37: runtime_filter_mode (i32) = 2, 38: runtime_bloom_filter_size (i32) = 1048576, 39:
[jira] [Updated] (IMPALA-9421) Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.
[ https://issues.apache.org/jira/browse/IMPALA-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Jeges updated IMPALA-9421: - Description: Show database operation takes over 3-4 seconds in impala-shell when connecting to a coordinator using hs2-http with LDAP authentication: {code:java} $ impala-shell.sh --protocol='hs2-http' --ssl -i "impala-coordinator:443" -u username -l impala-shell> show database; ++--+ | name | comment | ++--+ | _impala_builtins | System database for Impala builtin functions | | airline_ontime_orc | | | airline_ontime_parquet | | | default | Default Hive database | ++--+ Fetched 4 row(s) in 3.66s {code} impala-coordinator logs show that there are multiple new connections set up and authenticated: {code:java} I0225 14:15:48.976776 317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client I0225 14:15:48.976878 320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client I0225 14:15:48.976912 320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client I0225 14:15:48.977216 115929 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work I0225 14:15:48.989554 115929 authentication.cc:273] LDAP bind successful I0225 14:15:48.989639 115929 impala-hs2-server.cc:1085] PingImpalaHS2Service(): request=TPingImpalaHS2ServiceReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93", 02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0", }, }, } I0225 14:15:50.152348 317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client I0225 14:15:50.152446 321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for 
client I0225 14:15:50.152493 321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client I0225 14:15:50.152722 115930 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work I0225 14:15:50.163576 115930 authentication.cc:273] LDAP bind successful I0225 14:15:50.163733 115930 impala-hs2-server.cc:442] ExecuteStatement(): request=TExecuteStatementReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93", 02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0", }, }, 02: statement (string) = "show databases", 03: confOverlay (map) = map[1] { "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020", }, 04: runAsync (bool) = true, } I0225 14:15:50.163775 115930 impala-hs2-server.cc:230] TExecuteStatementReq: TExecuteStatementReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93", 02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0", }, }, 02: statement (string) = "show databases", 03: confOverlay (map) = map[1] { "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020", }, 04: runAsync (bool) = true, } I0225 14:15:50.173715 115930 impala-hs2-server.cc:268] TClientRequest.queryOptions: TQueryOptions { 01: abort_on_error (bool) = false, 02: max_errors (i32) = 100, 03: disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 11: debug_action (string) = "", 12: mem_limit (i64) = 0, 15: hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 17: parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: 
sync_ddl (bool) = false, 24: disable_outermost_topn (bool) = false, 26: query_timeout_s (i32) = 0, 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = false, 31: exec_single_node_rows_threshold (i32) = 100, 32: optimize_partition_key_scans (bool) = false, 33: replica_preference (i32) = 0, 34: schedule_random_replica (bool) = false, 36: disable_streaming_preaggregations (bool) = false, 37: runtime_filter_mode (i32) = 2, 38: runtime_bloom_filter_size (i32) = 1048576, 39: runtime_filter_wait_time_ms (i32) = 0, 40:
[jira] [Created] (IMPALA-9421) Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.
Attila Jeges created IMPALA-9421: Summary: Metadata operations are slow in impala-shell when using hs2-http with LDAP auth. Key: IMPALA-9421 URL: https://issues.apache.org/jira/browse/IMPALA-9421 Project: IMPALA Issue Type: Improvement Components: Clients Affects Versions: Impala 3.4.0 Reporter: Attila Jeges The show databases operation takes 3-4 seconds in impala-shell when connecting to a CDW Azure environment: {code:java} $ impala-shell.sh --protocol='hs2-http' --ssl -i "coordinator-attilaj-test-impala-vw.env-q52cn6.dwx.workload-dev.cloudera.com:443" -u csso_attilaj -l impala-shell> show databases; ++--+ | name | comment | ++--+ | _impala_builtins | System database for Impala builtin functions | | airline_ontime_orc | | | airline_ontime_parquet | | | default | Default Hive database | ++--+ Fetched 4 row(s) in 3.66s {code} impala-coordinator logs show that there are multiple new connections set up and authenticated: {code:java} I0225 14:15:48.976776 317 TAcceptQueueServer.cpp:340] New connection to server hiveserver2-http-frontend from client I0225 14:15:48.976878 320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client I0225 14:15:48.976912 320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client I0225 14:15:48.977216 115929 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work I0225 14:15:48.989554 115929 authentication.cc:273] LDAP bind successful I0225 14:15:48.989639 115929 impala-hs2-server.cc:1085] PingImpalaHS2Service(): request=TPingImpalaHS2ServiceReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93", 02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0", }, }, } I0225 14:15:50.152348 317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client I0225 14:15:50.152446 321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-http-frontend started connection setup for client I0225 14:15:50.152493 321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: hiveserver2-http-frontend finished connection setup for client I0225 14:15:50.152722 115930 authentication.cc:261] Trying simple LDAP bind for: uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work I0225 14:15:50.163576 115930 authentication.cc:273] LDAP bind successful I0225 14:15:50.163733 115930 impala-hs2-server.cc:442] ExecuteStatement(): request=TExecuteStatementReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93", 02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0", }, }, 02: statement (string) = "show databases", 03: confOverlay (map) = map[1] { "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020", }, 04: runAsync (bool) = true, } I0225 14:15:50.163775 115930 impala-hs2-server.cc:230] TExecuteStatementReq: TExecuteStatementReq { 01: sessionHandle (struct) = TSessionHandle { 01: sessionId (struct) = THandleIdentifier { 01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93", 02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0", }, }, 02: statement (string) = "show databases", 03: confOverlay (map) = map[1] { "CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri Feb 14 14:15:26 CET 2020", }, 04: runAsync (bool) = true, } I0225 14:15:50.173715 115930 impala-hs2-server.cc:268] TClientRequest.queryOptions: TQueryOptions { 01: abort_on_error (bool) = false, 02: max_errors (i32) = 100, 03: disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 11: debug_action 
(string) = "", 12: mem_limit (i64) = 0, 15: hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 17: parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: sync_ddl (bool) = false, 24: disable_outermost_topn (bool) = false, 26: query_timeout_s (i32) = 0, 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = false, 31: exec_single_node_rows_threshold (i32) = 100, 32: optimize_partition_key_scans (bool) = false, 33: replica_preference
[jira] [Created] (IMPALA-9420) test_scanners.TestOrc.test_type_conversions fails after first run
Norbert Luksa created IMPALA-9420: - Summary: test_scanners.TestOrc.test_type_conversions fails after first run Key: IMPALA-9420 URL: https://issues.apache.org/jira/browse/IMPALA-9420 Project: IMPALA Issue Type: Bug Reporter: Norbert Luksa Assignee: Gabor Kaszab The mentioned test passes on the first run, but fails later on, finding more rows than expected. By running {code:java} hdfs dfs -ls -R / | grep union_complextypes {code} we can find that the previously created files are not cleaned up, so Impala will find and scan them. The problem could be that the union_complextypes and ill_complextypes tables are created as external tables.
[jira] [Updated] (IMPALA-9420) test_scanners.TestOrc.test_type_conversions fails after first run
[ https://issues.apache.org/jira/browse/IMPALA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Norbert Luksa updated IMPALA-9420: -- Labels: orc ramp-up (was: ) > test_scanners.TestOrc.test_type_conversions fails after first run > - > > Key: IMPALA-9420 > URL: https://issues.apache.org/jira/browse/IMPALA-9420 > Project: IMPALA > Issue Type: Bug >Reporter: Norbert Luksa >Assignee: Gabor Kaszab >Priority: Major > Labels: orc, ramp-up > > The mentioned test passes on the first run, but fails later on, finding more > rows than expected. > By running > {code:java} > hdfs dfs -ls -R / | grep union_complextypes > {code} > we can find that the previously created files are not cleaned up, so Impala > will find and scan them. > The problem could be that the union_complextypes and ill_complextypes tables > are created as external tables.
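The suspected cause above (external tables leaving data files behind) can be illustrated with a local directory standing in for the HDFS table location. The drop semantics below are a simplification for illustration, not Impala's actual DDL code:

```python
# Sketch: dropping a managed table removes its data directory, while dropping
# an EXTERNAL table leaves the files in place -- so a test that recreates the
# table on a second run sees the leftover rows from the first run.

import pathlib
import shutil
import tempfile

def drop_table(location: pathlib.Path, external: bool):
    # Simplified drop semantics: only managed-table drops delete the data.
    if not external:
        shutil.rmtree(location)

tmp = pathlib.Path(tempfile.mkdtemp())
loc = tmp / "union_complextypes"   # stand-in for the HDFS table directory
loc.mkdir()
(loc / "part-0001.orc").write_text("rows from run 1")

drop_table(loc, external=True)
print(loc.exists())                # -> True: stale files survive the drop
shutil.rmtree(tmp)                 # test teardown must remove files explicitly
```

This is why the fix would be either to create the tables as managed tables or to delete the table directories explicitly in test cleanup.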