[jira] [Created] (IMPALA-9426) Download Python dependencies even skipping bootstrap toolchain

2020-02-25 Thread zhaorenhai (Jira)
zhaorenhai created IMPALA-9426:
--

 Summary: Download Python dependencies even skipping bootstrap 
toolchain
 Key: IMPALA-9426
 URL: https://issues.apache.org/jira/browse/IMPALA-9426
 Project: IMPALA
  Issue Type: Sub-task
Reporter: zhaorenhai
Assignee: zhaorenhai


Download Python dependencies even when skipping the bootstrap toolchain.

When SKIP_TOOLCHAIN_BOOTSTRAP=true is set, the Python dependencies still need 
to be downloaded, because the toolchain bootstrap process will not download 
them automatically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9425) Statestore may fail to report when an impalad has failed

2020-02-25 Thread Thomas Tauber-Marshall (Jira)
Thomas Tauber-Marshall created IMPALA-9425:
--

 Summary: Statestore may fail to report when an impalad has failed
 Key: IMPALA-9425
 URL: https://issues.apache.org/jira/browse/IMPALA-9425
 Project: IMPALA
  Issue Type: Bug
  Components: Distributed Exec
Affects Versions: Impala 3.4.0
Reporter: Thomas Tauber-Marshall
Assignee: Thomas Tauber-Marshall


If an impalad fails and another is restarted at the same host:port combination 
quickly, the statestore may fail to report to the coordinators that the impalad 
went down.

The reason for this is that in the cluster membership topic, impalads are keyed 
by their statestore subscriber id, which is "impalad@host:port". If the new 
impalad registers itself before a topic update has been generated for a 
particular coordinator, the statestore has no way of knowing that the 
particular key was deleted and then re-added since the last update.

The result is that queries that were running on the impalad that failed may not 
be cancelled by the coordinator until they pass the unresponsive backend 
timeout, which by default is ~12 minutes.

I propose as a solution that we add a concept of uuids for impalads, where each 
impalad will generate its own uuid on startup. This allows us to differentiate 
between different impalads running at the same host:port combination.

It can also be used to simplify some logic in the scheduler and 
ExecutorGroup/ExecutorBlacklist etc. where we currently have data structures 
containing info about impalads that are keyed off host/port combinations.
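
Not Impala's actual statestore code, just a minimal C++ sketch of the proposal: each impalad generates a random UUID-like id at startup, and membership entries are keyed by host:port plus that id, so a restart at the same address produces a distinct key rather than silently reusing the old one (names such as MakeBackendKey are hypothetical).
{code:cpp}
#include <iostream>
#include <map>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: combine the subscriber id with a random id that each
// impalad process would generate once at startup.
std::string MakeBackendKey(const std::string& host_port) {
  std::mt19937_64 gen{std::random_device{}()};
  std::ostringstream key;
  key << host_port << "#" << std::hex << gen() << gen();
  return key.str();
}

int main() {
  std::map<std::string, std::string> membership;  // key -> backend descriptor

  // First impalad registers at host:port.
  membership[MakeBackendKey("impalad@host:22000")] = "generation 1";

  // It crashes and a new impalad starts at the same host:port before the next
  // topic update. Its fresh id yields a different key, so the old entry is
  // visibly deleted-and-re-added instead of silently replaced.
  membership[MakeBackendKey("impalad@host:22000")] = "generation 2";

  std::cout << "entries for impalad@host:22000: " << membership.size() << "\n";  // 2
}
{code}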



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9389) Impala Doc: support reading zstd text files

2020-02-25 Thread Kris Hahn (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045026#comment-17045026
 ] 

Kris Hahn commented on IMPALA-9389:
---

Here are some possible places to document reading zstd files:
 * [Release notes|https://impala.apache.org/docs/build/html/topics/impala_new_features.html] under Zstd Compression for Parquet files
 * [How Impala Works with Hadoop File Formats|https://impala.apache.org/docs/build/html/topics/impala_file_formats.html] briefly mention the new functionality in the Zstd bullet.
 * [Compressions for Parquet Data Files|https://impala.apache.org/docs/build/html/topics/impala_parquet.html] how about adding an example of setting the compression codec, writing some data, and reading the file?

> Impala Doc: support reading zstd text files
> ---
>
> Key: IMPALA-9389
> URL: https://issues.apache.org/jira/browse/IMPALA-9389
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Xiaomeng Zhang
>Assignee: Kris Hahn
>Priority: Major
>
> [https://gerrit.cloudera.org/#/c/15023/]
> We add support for reading zstd encoded text files.
> This includes:
>  # support reading zstd files written by Hive, which uses streaming compression.
>  # support reading zstd files compressed by the standard zstd library, which 
> uses block compression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9424) Add six python library to shell/ext-py

2020-02-25 Thread David Knupp (Jira)
David Knupp created IMPALA-9424:
---

 Summary: Add six python library to shell/ext-py
 Key: IMPALA-9424
 URL: https://issues.apache.org/jira/browse/IMPALA-9424
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 3.4.0
Reporter: David Knupp


A couple of impala-shell changes that are coming in the near future 
(thrift_sasl update, possible changes to THttpClient, python 3 support) will 
require the six python library.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9389) Impala Doc: support reading zstd text files

2020-02-25 Thread Kris Hahn (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kris Hahn reassigned IMPALA-9389:
-

Assignee: Kris Hahn  (was: Xiaomeng Zhang)

> Impala Doc: support reading zstd text files
> ---
>
> Key: IMPALA-9389
> URL: https://issues.apache.org/jira/browse/IMPALA-9389
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Xiaomeng Zhang
>Assignee: Kris Hahn
>Priority: Major
>
> [https://gerrit.cloudera.org/#/c/15023/]
> We add support for reading zstd encoded text files.
> This includes:
>  # support reading zstd files written by Hive, which uses streaming compression.
>  # support reading zstd files compressed by the standard zstd library, which 
> uses block compression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-9381) Lazily convert and/or cache different representations of the query profile

2020-02-25 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9381.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Lazily convert and/or cache different representations of the query profile
> --
>
> Key: IMPALA-9381
> URL: https://issues.apache.org/jira/browse/IMPALA-9381
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> There are some obvious inefficiencies with how the query state record works:
> * We do an unnecessary copy of the archive string when adding it to the query 
> log
> https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1812.
> * We eagerly convert the profile to text and JSON, when in many cases they 
> won't be needed - 
> https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1839
>  . I think it is generally rare for more than one profile format to be 
> downloaded from the web UI. I know of tools that scrape the thrift profile, 
> but the human-readable version would usually only be consumed by humans. We 
> could avoid this by only storing the thrift representation of the profile, 
> then reconstituting the other representations from thrift if requested.
> * After ComputeExecSummary(), the profile shouldn't change, but we'll 
> regenerate the thrift representation for every web request to get the 
> encoded form. This may waste a lot of CPU for tools scraping the profiles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9381) Lazily convert and/or cache different representations of the query profile

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044855#comment-17044855
 ] 

ASF subversion and git services commented on IMPALA-9381:
-

Commit 1bd45d295ebfc3f526a98eebb9b61525b9332c91 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1bd45d2 ]

IMPALA-9381: on-demand conversion of runtime profile

Converting the runtime profile to JSON and text representations
at the end of the query used significant CPU and time. These
representations will commonly never be accessed, because
they need to be explicitly requested by a client via the
HTTP debug interface or via a thrift profile request.
So it is a waste of resources to eagerly convert them, and
in particular it is a bad idea to do so on the critical path
of a query.

This commit switches to generating alternative profile
representations on-demand. Only the compressed thrift version
of the profile is stored in QueryStateRecord. This is the
most compact representation of the profile and it is
relatively convenient to convert into other formats.

Also use a move() when constructing QueryStateRecord to avoid
copying the profile unnecessarily.

Fix a couple of potential use-after-free issues where Json
objects generated by RuntimeProfile::ToJson() could reference
strings owned by the object pool. These were detected by
running an ASAN build, because after this change, the temporary
object pool used to hold the deserialized profile was freed before
the JSON tree was returned.

The "kind" field of counters is removed from the JSON profile.
This couldn't be round-tripped correctly through thrift, and
probably isn't necessary. It also helps slim down the profiles.

Also make sure to preserve the "indent" field when round-tripping
to thrift.

Testing:
Ran core tests.

Diffed JSON and text profiles downloaded from the web UI before and
after to make sure there were no unexpected changes as a result
of the round-trip via thrift.

Change-Id: Ic2f5133cc146adc3b044cf4b64aae0a9688449fa
Reviewed-on: http://gerrit.cloudera.org:8080/15236
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
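
As a rough illustration of the on-demand approach described in this commit (hypothetical types, not Impala's RuntimeProfile/QueryStateRecord code): only one canonical representation is kept, and the text/JSON views are produced and cached the first time a client asks for them.
{code:cpp}
#include <iostream>
#include <optional>
#include <string>

// The stored string stands in for the compressed thrift profile; other views
// are generated on first request and then cached.
class QueryProfile {
 public:
  explicit QueryProfile(std::string compressed_thrift)
      : compressed_thrift_(std::move(compressed_thrift)) {}

  const std::string& Text() {
    if (!text_) text_ = Render("TEXT");  // converted on demand, not at query end
    return *text_;
  }
  const std::string& Json() {
    if (!json_) json_ = Render("JSON");
    return *json_;
  }

 private:
  // Hypothetical: decompress + deserialize the thrift payload, then format it.
  std::string Render(const std::string& kind) const {
    return kind + "(" + compressed_thrift_ + ")";
  }

  std::string compressed_thrift_;
  std::optional<std::string> text_;
  std::optional<std::string> json_;
};

int main() {
  QueryProfile profile("<compressed thrift bytes>");
  std::cout << profile.Text() << "\n";  // generated here, on the first request
  std::cout << profile.Text() << "\n";  // served from the cached copy
}
{code}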


> Lazily convert and/or cache different representations of the query profile
> --
>
> Key: IMPALA-9381
> URL: https://issues.apache.org/jira/browse/IMPALA-9381
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> There are some obvious inefficiencies with how the query state record works:
> * We do an unnecessary copy of the archive string when adding it to the query 
> log
> https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1812.
> * We eagerly convert the profile to text and JSON, when in many cases they 
> won't be needed - 
> https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1839
>  . I think it is generally rare for more than one profile format to be 
> downloaded from the web UI. I know of tools that scrape the thrift profile, 
> but the human-readable version would usually only be consumed by humans. We 
> could avoid this by only storing the thrift representation of the profile, 
> then reconstituting the other representations from thrift if requested.
> * After ComputeExecSummary(), the profile shouldn't change, but we'll 
> regenerate the thrift representation for every web request to get the 
> encoded form. This may waste a lot of CPU for tools scraping the profiles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9423) Fix cookie auth with Knox

2020-02-25 Thread Thomas Tauber-Marshall (Jira)
Thomas Tauber-Marshall created IMPALA-9423:
--

 Summary: Fix cookie auth with Knox
 Key: IMPALA-9423
 URL: https://issues.apache.org/jira/browse/IMPALA-9423
 Project: IMPALA
  Issue Type: Bug
  Components: Clients
Reporter: Thomas Tauber-Marshall
Assignee: Thomas Tauber-Marshall


When Apache Knox is being used to proxy connections to Impala, it used to be 
the case that Knox would return the authentication cookies generated by Impala, 
saving extra round trips and authentications to Kerberos/LDAP.

This was broken by KNOX-2223 - Knox only returns auth cookies that it thinks 
are for it, which it determines by checking for its Kerberos principal in the 
cookie string. With KNOX-2223, the principal is expected to be preceded by a 
'=', which Impala doesn't do.
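
A minimal sketch of the matching rule as described above (this is not Knox's or Impala's real code): a cookie is only treated as the server's own if the string "=" followed by the principal appears in it, so a cookie that carries the principal without a leading '=' gets dropped.
{code:cpp}
#include <iostream>
#include <string>

// Hypothetical check modelling the behaviour described for KNOX-2223.
bool CookieBelongsToServer(const std::string& cookie, const std::string& principal) {
  return cookie.find("=" + principal) != std::string::npos;
}

int main() {
  const std::string principal = "knox/gateway.example.com@EXAMPLE.COM";
  std::cout << std::boolalpha;
  // '=' immediately precedes the principal -> accepted.
  std::cout << CookieBelongsToServer("a=1&p=" + principal, principal) << "\n";  // true
  // Principal present but not preceded by '=' -> rejected.
  std::cout << CookieBelongsToServer("a=1&p:" + principal, principal) << "\n";  // false
}
{code}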



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2020-02-25 Thread Thomas Tauber-Marshall (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-8712.

Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9075) Add support for reading zstd text files

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044664#comment-17044664
 ] 

ASF subversion and git services commented on IMPALA-9075:
-

Commit 571131fdc11acecf4c2003668dbccde0667efe07 in impala's branch 
refs/heads/master from xiaomeng
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=571131f ]

IMPALA-9075: Add support for reading zstd text files

In this patch, we add support for reading zstd encoded text files.
This includes:
1. support reading zstd file written by Hive which uses streaming.
2. support reading zstd file compressed by standard zstd library which
uses block.
To support decompressing both formats, a function ProcessBlockStreaming
is added to the zstd decompressor.

Testing done:
Added two backend tests:
1. streaming decompress test.
2. large data test for both block and streaming decompress.
Added two end-to-end tests:
1. Hive and Impala integration: for four compression codecs, write in
Hive and read from Impala.
2. zstd library and Impala integration: copy a file compressed with the
standard zstd library to HDFS, and read it from Impala.

Change-Id: I2adce9fe00190558525fa5cd3d50cf5e0f0b0aa4
Reviewed-on: http://gerrit.cloudera.org:8080/15023
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
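
For readers unfamiliar with the two layouts, here is a small standalone sketch using the standard zstd streaming API (this is not Impala's ProcessBlockStreaming, just the plain libzstd calls): ZSTD_decompressStream() consumes the input incrementally and handles both a single block-style frame and a stream of frames.
{code:cpp}
#include <zstd.h>

#include <cstdio>
#include <string>
#include <vector>

// Decompress an arbitrary zstd payload (one frame or many) with the streaming API.
std::string DecompressAll(const std::string& compressed) {
  ZSTD_DStream* ds = ZSTD_createDStream();
  ZSTD_initDStream(ds);
  std::string out;
  std::vector<char> buf(ZSTD_DStreamOutSize());
  ZSTD_inBuffer in{compressed.data(), compressed.size(), 0};
  while (in.pos < in.size) {
    ZSTD_outBuffer ob{buf.data(), buf.size(), 0};
    size_t ret = ZSTD_decompressStream(ds, &ob, &in);
    if (ZSTD_isError(ret)) {
      std::fprintf(stderr, "zstd error: %s\n", ZSTD_getErrorName(ret));
      break;
    }
    out.append(buf.data(), ob.pos);
  }
  ZSTD_freeDStream(ds);
  return out;
}

int main() {
  // Round-trip a small string (error handling of ZSTD_compress omitted for brevity).
  std::string text = "some zstd-compressed text file contents\n";
  std::string comp(ZSTD_compressBound(text.size()), '\0');
  size_t n = ZSTD_compress(&comp[0], comp.size(), text.data(), text.size(), 1);
  comp.resize(n);
  std::printf("%s", DecompressAll(comp).c_str());
}
{code}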


> Add support for reading zstd text files
> ---
>
> Key: IMPALA-9075
> URL: https://issues.apache.org/jira/browse/IMPALA-9075
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Andrew Sherman
>Assignee: Xiaomeng Zhang
>Priority: Critical
>
> IMPALA-8450 added support for zstd in parquet.
> We should also support support for reading  zstd encoded text files.
> Another useful jira to look at is IMPALA-8549 (Add support for scanning 
> DEFLATE text files)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044665#comment-17044665
 ] 

ASF subversion and git services commented on IMPALA-8852:
-

Commit 777d0d203f138183d65885b523b619421b487714 in impala's branch 
refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=777d0d2 ]

IMPALA-8852: Skip short-circuit config check for dedicated coordinator

ImpalaD should not abort when running as a dedicated coordinator and a DataNode is
not available on the host. This change adds a condition to skip the short-
circuit socket path directory checks when ImpalaD is started with the
'is_executor=false' flag.

Testing:
 - Added test to JniFrontendTest.java to verify the short-circuit directory
check is skipped if ImpalaD is started as dedicated coordinator mode.
 - Manually tested the appearance of the warning message with:
start-impala-cluster.py --num_coordinators 1 --use_exclusive_coordinators true

Change-Id: I373d4037f4cee203322a398b77b75810ba708bb5
Reviewed-on: http://gerrit.cloudera.org:8080/15173
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
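
A tiny sketch of the condition this change describes (hypothetical function names; the real check lives in the Java frontend, JniFrontend): the dfs.domain.socket.path validation is simply skipped when the daemon is not an executor.
{code:cpp}
#include <iostream>

// Hypothetical stand-in for the real validation of dfs.domain.socket.path:
// on a coordinator-only edge node the directory may not even exist.
bool ShortCircuitSocketDirIsValid() { return false; }

int main() {
  bool is_executor = false;  // impalad started with --is_executor=false

  if (!is_executor) {
    std::cout << "dedicated coordinator: skipping short-circuit config check\n";
    return 0;
  }
  if (!ShortCircuitSocketDirIsValid()) {
    std::cout << "Invalid short-circuit reads configuration\n";
    return 1;
  }
  return 0;
}
{code}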


> ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads 
> configuration"
> 
>
> Key: IMPALA-8852
> URL: https://issues.apache.org/jira/browse/IMPALA-8852
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0, Impala 3.3.0
>Reporter: Adriano
>Assignee: Tamas Mate
>Priority: Major
>  Labels: ramp-up
>
> On coordinator only nodes ([typically the edge 
> nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]):
> {code:java}
> --is_coordinator=true
> --is_executor=false
> {code}
> the *dfs.domain.socket.path* can be nonexistent on the local FS, as the 
> DataNode role may not be installed on the edge node.
> The nonexistent path prevents the ImpalaD from starting, with the message:
> {code:java}
> I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> @   0xb35f19
> @  0x100e2fe
> @  0x103f274
> @  0x102836f
> @   0xa9f573
> @ 0x7f97807e93d4
> @   0xafb3b8
> E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> {code}
> even though a coordinator-only ImpalaD does not do short-circuit reads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9226) Improve string allocations of the ORC scanner

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044670#comment-17044670
 ] 

ASF subversion and git services commented on IMPALA-9226:
-

Commit f22812144279a3f722fcace5925cfb2f52efb598 in impala's branch 
refs/heads/master from norbert.luksa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f228121 ]

IMPALA-9226: Bump ORC version to 1.6.2-p7

Bump ORC version to include patch for ORC-600 that
unblocks IMPALA-9226.

Tests:
 - Run scanner tests for orc/def/block.

Change-Id: I444bfac435e5b05eee1ff7c8cf6a32ff5b65
Reviewed-on: http://gerrit.cloudera.org:8080/15287
Reviewed-by: Gabor Kaszab 
Tested-by: Impala Public Jenkins 


> Improve string allocations of the ORC scanner
> -
>
> Key: IMPALA-9226
> URL: https://issues.apache.org/jira/browse/IMPALA-9226
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Norbert Luksa
>Priority: Major
>  Labels: orc
>
> Currently the ORC scanner allocates new memory for each string value (except 
> for fixed-size strings):
> [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172]
> Besides the many allocations and copies, it's also bad for memory 
> locality.
> Since ORC-501, StringVectorBatch has a member named 'blob' that contains the 
> strings in the batch: 
> [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126]
> 'blob' has type DataBuffer which is movable, so Impala might be able to get 
> ownership of it. Or, at least we could copy the whole blob array instead of 
> copying the strings one-by-one.
> ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC 
> 1.5.5.
> ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch:
> [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153]
> It uses dictionary encoding for storing the values. Impala could copy/move 
> the dictionary as well.
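
To make the idea in the description above concrete, here is a minimal sketch (it uses neither the ORC nor the Impala API; the StringSlot struct and the offsets are made up): copy the batch's contiguous blob once, then let each column slot point into that single buffer instead of allocating and copying every string separately.
{code:cpp}
#include <iostream>
#include <string>
#include <vector>

struct StringSlot { const char* ptr; int len; };  // stand-in for a string slot in a tuple

int main() {
  // Pretend this came from a StringVectorBatch: one blob plus per-value offsets/lengths.
  std::string blob = "heyworldimpala";
  std::vector<int> offsets = {0, 3, 8};
  std::vector<int> lengths = {3, 5, 6};

  // One allocation and one copy for the whole batch...
  std::vector<char> tuple_mem(blob.begin(), blob.end());

  // ...then every slot just references a range inside that buffer.
  std::vector<StringSlot> column;
  for (size_t i = 0; i < offsets.size(); ++i) {
    column.push_back({tuple_mem.data() + offsets[i], lengths[i]});
  }
  for (const auto& v : column) std::cout << std::string(v.ptr, v.len) << "\n";
}
{code}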



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9226) Improve string allocations of the ORC scanner

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044669#comment-17044669
 ] 

ASF subversion and git services commented on IMPALA-9226:
-

Commit f22812144279a3f722fcace5925cfb2f52efb598 in impala's branch 
refs/heads/master from norbert.luksa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f228121 ]

IMPALA-9226: Bump ORC version to 1.6.2-p7

Bump ORC version to include patch for ORC-600 that
unblocks IMPALA-9226.

Tests:
 - Run scanner tests for orc/def/block.

Change-Id: I444bfac435e5b05eee1ff7c8cf6a32ff5b65
Reviewed-on: http://gerrit.cloudera.org:8080/15287
Reviewed-by: Gabor Kaszab 
Tested-by: Impala Public Jenkins 


> Improve string allocations of the ORC scanner
> -
>
> Key: IMPALA-9226
> URL: https://issues.apache.org/jira/browse/IMPALA-9226
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Norbert Luksa
>Priority: Major
>  Labels: orc
>
> Currently the ORC scanner allocates new memory for each string value (except 
> for fixed-size strings):
> [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172]
> Besides the many allocations and copies, it's also bad for memory 
> locality.
> Since ORC-501, StringVectorBatch has a member named 'blob' that contains the 
> strings in the batch: 
> [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126]
> 'blob' has type DataBuffer which is movable, so Impala might be able to get 
> ownership of it. Or, at least we could copy the whole blob array instead of 
> copying the strings one-by-one.
> ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC 
> 1.5.5.
> ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch:
> [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153]
> It uses dictionary encoding for storing the values. Impala could copy/move 
> the dictionary as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044667#comment-17044667
 ] 

ASF subversion and git services commented on IMPALA-8712:
-

Commit 1e616774d4d3a00e002d1e383ccd89c46f6d9010 in impala's branch 
refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1e61677 ]

IMPALA-8712: Make ExecQueryFInstances async

This patch refactors the ExecQueryFInstances rpc to be asynchronous.
Previously, Impala would issue all the Exec()s, wait for all of them
to complete, and then check if any of them resulted in an error. We
now stop issuing Exec()s and cancel any that are still in flight as
soon as an error occurs.

It also performs some cleanup around the thread safety of
Coordinator::BackendState, including adding comments and DCHECKS.

=== Exec RPC Thread Pool ===
This patch also removes the 'exec_rpc_thread_pool_' from ExecEnv. This
thread pool was used to partially simulate async Exec() prior to the
switch to KRPC, which provides built-in async rpc capabilities.

Removing this thread pool has potential performance implications, as
it means that the Exec() parameters are serialized serially rather
than in parallel (with the level of parallelism determined by the size
of the thread pool, which was configurable by an Advanced flag and
defaulted to 12).

To ensure we don't regress query startup times, I did some performance
testing. All tests were done on a 10 node cluster. The baseline used
for the tests did not include IMPALA-9181, a perf optimization for
query startup done to facilitate this work.

I ran TPCH 100 at concurrency levels of 1, 4, and 8 and extracted the
query startup times from the profiles. For each concurrency level, the
average regression in query startup time was < 2ms. Because query e2e
running time was much longer than this, there was no noticeable change 
in total query time.

I also ran a 'worst case scenario' with a table with 10,000 partitions 
to create a very large Exec() payload to serialize (~1.21MB vs.
~10KB-30KB for TPCH 100). Again, change in query startup time was
negligible.


Testing:
- Added an e2e test that verifies that a query where an Exec() fails
  doesn't wait for all Exec()s to complete before cancelling and
  returning the error to the client.

Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32
Reviewed-on: http://gerrit.cloudera.org:8080/15154
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 
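
A rough sketch of the control flow described above, using plain std::async rather than KRPC (all names here are hypothetical): Exec() calls are issued asynchronously, and as soon as one reports an error the remaining in-flight calls are asked to cancel instead of being waited on to completion.
{code:cpp}
#include <atomic>
#include <future>
#include <iostream>
#include <vector>

std::atomic<bool> cancelled{false};

// Hypothetical per-backend Exec(); returns false on error. Backends that run
// after cancellation bail out immediately.
bool ExecOnBackend(int backend_id) {
  if (cancelled.load()) return false;
  return backend_id != 2;  // pretend backend 2 fails
}

int main() {
  std::vector<std::future<bool>> rpcs;
  for (int i = 0; i < 5; ++i) {
    rpcs.push_back(std::async(std::launch::async, ExecOnBackend, i));
  }
  for (size_t i = 0; i < rpcs.size(); ++i) {
    if (!rpcs[i].get()) {
      std::cout << "Exec() failed on backend " << i << ", cancelling the rest\n";
      cancelled = true;  // later Execs observe this and return early
      break;
    }
  }
  // Remaining futures are joined on destruction; with real async RPCs they
  // would instead be cancelled.
}
{code}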


> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9181) Serialize TQueryCtx once per query

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044668#comment-17044668
 ] 

ASF subversion and git services commented on IMPALA-9181:
-

Commit 1e616774d4d3a00e002d1e383ccd89c46f6d9010 in impala's branch 
refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1e61677 ]

IMPALA-8712: Make ExecQueryFInstances async

This patch refactors the ExecQueryFInstances rpc to be asynchronous.
Previously, Impala would issue all the Exec()s, wait for all of them
to complete, and then check if any of them resulted in an error. We
now stop issuing Exec()s and cancel any that are still in flight as
soon as an error occurs.

It also performs some cleanup around the thread safety of
Coordinator::BackendState, including adding comments and DCHECKS.

=== Exec RPC Thread Pool ===
This patch also removes the 'exec_rpc_thread_pool_' from ExecEnv. This
thread pool was used to partially simulate async Exec() prior to the
switch to KRPC, which provides built-in async rpc capabilities.

Removing this thread pool has potential performance implications, as
it means that the Exec() parameters are serialized serially rather
than in parallel (with the level of parallelism determined by the size
of the thread pool, which was configurable by an Advanced flag and
defaulted to 12).

To ensure we don't regress query startup times, I did some performance
testing. All tests were done on a 10 node cluster. The baseline used
for the tests did not include IMPALA-9181, a perf optimization for
query startup done to facilitate this work.

I ran TPCH 100 at concurrency levels of 1, 4, and 8 and extracted the
query startup times from the profiles. For each concurrency level, the
average regression in query startup time was < 2ms. Because query e2e
running time was much longer than this, there was no noticeable change 
in total query time.

I also ran a 'worst case scenario' with a table with 10,000 partitions 
to create a very large Exec() payload to serialize (~1.21MB vs.
~10KB-30KB for TPCH 100). Again, change in query startup time was
negligible.


Testing:
- Added an e2e test that verifies that a query where an Exec() fails
  doesn't wait for all Exec()s to complete before cancelling and
  returning the error to the client.

Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32
Reviewed-on: http://gerrit.cloudera.org:8080/15154
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 


> Serialize TQueryCtx once per query
> --
>
> Key: IMPALA-9181
> URL: https://issues.apache.org/jira/browse/IMPALA-9181
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> When issuing Exec() rpcs to backends, we currently serialize the TQueryCtx 
> once per backend. This is inefficient as the TQueryCtx is the same for all 
> backends and really only needs to be serialized once.
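
A minimal sketch of that idea (hypothetical types, not Impala's thrift serialization code): serialize the shared context once, outside the per-backend loop, and attach the same byte buffer to every Exec() request.
{code:cpp}
#include <iostream>
#include <string>
#include <vector>

struct TQueryCtx { std::string sql; };  // stand-in for the real thrift struct

// Stand-in for thrift serialization of the query context.
std::string Serialize(const TQueryCtx& ctx) { return "serialized{" + ctx.sql + "}"; }

int main() {
  TQueryCtx ctx{"select count(*) from t"};
  std::vector<std::string> backends = {"host1:27000", "host2:27000", "host3:27000"};

  // Serialize once...
  const std::string payload = Serialize(ctx);

  // ...and reuse the same payload for each backend's Exec() request.
  for (const auto& backend : backends) {
    std::cout << "Exec() to " << backend << " with " << payload.size() << " bytes\n";
  }
}
{code}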



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9422) Improve join builder profiles

2020-02-25 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-9422:
-

Assignee: Tim Armstrong

> Improve join builder profiles
> -
>
> Key: IMPALA-9422
> URL: https://issues.apache.org/jira/browse/IMPALA-9422
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: multithreading
>
> We should clean up/improve the join builder profiles for the separate build.
> First, for the separate build, we should ensure that all time spent in the 
> builder is counted against the builder. E.g. calls into public methods like 
> BeginSpilledProbe(). These should be counted as idle time for the actual join 
> implementation, so that we can see that the time is spent in the (serial) 
> builder instead of the (parallel) probe.
> We might need to fix things like Send() being called by 
> RepartitionBuildInput, resulting in double counting.
> Second, we should revisit the assortment of timers - BuildRowsPartitionTime, 
> HashTablesBuildTime, RepartitionTime. Maybe it makes sense to make them child 
> counters of total time to make the relationship clearer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9422) Improve join builder profiles

2020-02-25 Thread Tim Armstrong (Jira)
Tim Armstrong created IMPALA-9422:
-

 Summary: Improve join builder profiles
 Key: IMPALA-9422
 URL: https://issues.apache.org/jira/browse/IMPALA-9422
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Tim Armstrong


We should clean up/improve the join builder profiles for the separate build.

First, for the separate build, we should ensure that all time spent in the 
builder is counted against the builder. E.g. calls into public methods like 
BeginSpilledProbe(). These should be counted as idle time for the actual join 
implementation, so that we can see that the time is spent in the (serial) 
builder instead of the (parallel) probe.

We might need to fix things like Send() being called by RepartitionBuildInput, 
resulting in double counting.

Second, we should revisit the assortment of timers - BuildRowsPartitionTime, 
HashTablesBuildTime, RepartitionTime. Maybe it makes sense to make them child 
counters of total time to make the relationship clearer.
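
A small sketch of the accounting being proposed (hypothetical counters, not Impala's RuntimeProfile API): time spent inside a public builder entry point such as BeginSpilledProbe() is charged to the builder and simultaneously counted as idle time for the join node, so the profile shows the work on the (serial) build side.
{code:cpp}
#include <chrono>
#include <iostream>
#include <thread>

struct Counters {
  std::chrono::nanoseconds builder_time{0};
  std::chrono::nanoseconds join_idle_time{0};
};

// RAII helper: on destruction, the elapsed time is added to both counters.
class ScopedBuilderTimer {
 public:
  explicit ScopedBuilderTimer(Counters* c)
      : c_(c), start_(std::chrono::steady_clock::now()) {}
  ~ScopedBuilderTimer() {
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start_);
    c_->builder_time += elapsed;
    c_->join_idle_time += elapsed;  // the probe side is waiting on the builder
  }

 private:
  Counters* c_;
  std::chrono::steady_clock::time_point start_;
};

// Stand-in for a public builder method called from the join node.
void BeginSpilledProbe(Counters* c) {
  ScopedBuilderTimer t(c);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));  // pretend to repartition
}

int main() {
  Counters c;
  BeginSpilledProbe(&c);
  std::cout << "builder time: " << c.builder_time.count()
            << " ns, join idle time: " << c.join_idle_time.count() << " ns\n";
}
{code}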





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9421) Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.

2020-02-25 Thread Attila Jeges (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges updated IMPALA-9421:
-
Description: 
The show databases operation takes ~3-4 seconds, sometimes ~8-9 seconds, in 
impala-shell when connecting to a coordinator using hs2-http with LDAP 
authentication:
{code:java}
$ impala-shell.sh --protocol='hs2-http' --ssl -i "impala-coordinator:443" -u 
username -l

impala-shell> show database;
+------------------------+-----------------------------------------------+
| name                   | comment                                       |
+------------------------+-----------------------------------------------+
| _impala_builtins       | System database for Impala builtin functions |
| airline_ontime_orc     |                                               |
| airline_ontime_parquet |                                               |
| default                | Default Hive database                        |
+------------------------+-----------------------------------------------+
Fetched 4 row(s) in 8.87s
{code}
impala-coordinator logs show that there are multiple new connections set up and 
authenticated:
{code:java}
I0225 16:07:58.143942   317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client 
I0225 16:07:58.144042   321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client 
I0225 16:07:58.144101   321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: 
hiveserver2-http-frontend finished connection setup for client 
I0225 16:07:58.144338 128883 authentication.cc:261] Trying simple LDAP bind 
for: 
uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
I0225 16:07:58.155827 128883 authentication.cc:273] LDAP bind successful
I0225 16:07:58.155901 128883 impala-hs2-server.cc:1085] PingImpalaHS2Service(): 
request=TPingImpalaHS2ServiceReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h",
  02: secret (string) = 
"\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04",
},
  },
}
I0225 16:07:58.876168   317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client 
I0225 16:07:58.876317   320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client 
I0225 16:07:58.876364   320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: 
hiveserver2-http-frontend finished connection setup for client 
I0225 16:07:58.876847 128884 authentication.cc:261] Trying simple LDAP bind 
for: 
uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
I0225 16:07:58.887931 128884 authentication.cc:273] LDAP bind successful
I0225 16:07:58.888008 128884 impala-hs2-server.cc:442] ExecuteStatement(): 
request=TExecuteStatementReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h",
  02: secret (string) = 
"\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04",
},
  },
  02: statement (string) = "show databases",
  03: confOverlay (map) = map[1] {
"CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri 
Feb 14 14:15:26 CET 2020",
  },
  04: runAsync (bool) = true,
}
I0225 16:07:58.888049 128884 impala-hs2-server.cc:230] TExecuteStatementReq: 
TExecuteStatementReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "\xab\x9bS/\r\xd1@\xab\x862z\xee(#\x14h",
  02: secret (string) = 
"\x81\x84\xf0\x7f\v\xac@\x9a\x9b\x9e\xdf#\xa1\xc3\xc4\x04",
},
  },
  02: statement (string) = "show databases",
  03: confOverlay (map) = map[1] {
"CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri 
Feb 14 14:15:26 CET 2020",
  },
  04: runAsync (bool) = true,
}
I0225 16:07:58.898981 128884 impala-hs2-server.cc:268] 
TClientRequest.queryOptions: TQueryOptions {
  01: abort_on_error (bool) = false,
  02: max_errors (i32) = 100,
  03: disable_codegen (bool) = false,
  04: batch_size (i32) = 0,
  05: num_nodes (i32) = 0,
  06: max_scan_range_length (i64) = 0,
  07: num_scanner_threads (i32) = 0,
  11: debug_action (string) = "",
  12: mem_limit (i64) = 0,
  15: hbase_caching (i32) = 0,
  16: hbase_cache_blocks (bool) = false,
  17: parquet_file_size (i64) = 0,
  18: explain_level (i32) = 1,
  19: sync_ddl (bool) = false,
  24: disable_outermost_topn (bool) = false,
  26: query_timeout_s (i32) = 0,
  28: appx_count_distinct (bool) = false,
  29: disable_unsafe_spills (bool) = false,
  31: exec_single_node_rows_threshold (i32) = 100,
  32: optimize_partition_key_scans (bool) = false,
  33: replica_preference (i32) = 0,
  34: schedule_random_replica (bool) = false,
  36: disable_streaming_preaggregations (bool) = false,
  37: runtime_filter_mode (i32) = 2,
  38: runtime_bloom_filter_size (i32) = 1048576,
  39: 

[jira] [Resolved] (IMPALA-7496) Schedule query taking in account the mem available on the impalad nodes

2020-02-25 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7496.
---
Resolution: Later

It's unclear that we want to do this - it's a pretty significant change to how 
Impala works and it has complex interactions with data locality, scheduling, 
etc.

The executor group support we added in the meantime can solve some use cases 
like this but in a simpler and more predictable way - the query will run on the 
first executor group with resources.

Closing as "later" to indicate that it might be something to revisit later.

> Schedule query taking in account the mem available on the impalad nodes
> ---
>
> Key: IMPALA-7496
> URL: https://issues.apache.org/jira/browse/IMPALA-7496
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Adriano
>Priority: Major
>  Labels: admission-control, resource-management, scheduler
>
> Environment description: cluster scale (50/100/150 nodes and terabytes of RAM 
> available) - Admission Control enabled.
> Issue description:
> Regardless of the coordinator chosen (with data and statistics unchanged), a 
> query will always be planned the same way, based on the metadata that the 
> coordinator has.
> The query will always be scheduled on the same nodes if the memory 
> requirements for admission are satisfied:
> https://github.com/cloudera/Impala/blob/cdh5-2.7.0_5.9.1/be/src/scheduling/admission-controller.cc#L307-L333
> Identical queries are planned/scheduled the same way, so they always hit the 
> same nodes.
> This often leads to the queries hitting those nodes being queued (not 
> admitted), because the nodes have no more memory available within their 
> process limit, even though the pool has plenty of free memory and the overall 
> cluster load is low.
> When the plan is finished and the query is evaluated for admission, it often 
> happens that admission is denied because one of the nodes does not have 
> enough memory to run its part of the query (and the query is moved to the 
> pool queue), even though the cluster has 50/100/150 nodes and terabytes of 
> RAM available.
> Why doesn't the scheduler take the memory available on the involved nodes 
> into account before building the schedule (perhaps preferring a remote 
> read/operation on a node with free memory), instead of always including the 
> same nodes in the plan, which ends up meaning:
> 1- those nodes are overloaded
> 2- the query is not immediately admitted, risking a timeout in the pool queue
> Since 2.7 REPLICA_PREFERENCE can possibly help, but it is not good enough, as 
> it does not prevent the scheduler from choosing busy nodes (with the same 
> potential effect: the query is queued for lack of resources on a specific 
> node despite terabytes of free memory).
> Feature Request:
> It would be good if Impala had an option to execute queries (even with worse 
> performance) that excludes overloaded nodes and includes different ones, so 
> that the query is immediately admitted and executed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9421) Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.

2020-02-25 Thread Attila Jeges (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges updated IMPALA-9421:
-
Description: 
The show databases operation takes 3-4 seconds or more in impala-shell when 
connecting to a coordinator using hs2-http with LDAP authentication:
{code:java}
$ impala-shell.sh --protocol='hs2-http' --ssl -i "impala-coordinator:443" -u 
username -l

impala-shell> show database;
+------------------------+-----------------------------------------------+
| name                   | comment                                       |
+------------------------+-----------------------------------------------+
| _impala_builtins       | System database for Impala builtin functions |
| airline_ontime_orc     |                                               |
| airline_ontime_parquet |                                               |
| default                | Default Hive database                        |
+------------------------+-----------------------------------------------+

Fetched 4 row(s) in 3.79s
{code}
impala-coordinator logs show that there are multiple new connections set up and 
authenticated:
{code:java}
I0225 15:45:28.478219   317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client 
I0225 15:45:28.478384   321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client 
I0225 15:45:28.478454   321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: 
hiveserver2-http-frontend finished connection setup for client 
I0225 15:45:28.478729 126270 authentication.cc:261] Trying simple LDAP bind 
for: 
uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
I0225 15:45:28.491451 126270 authentication.cc:273] LDAP bind successful
I0225 15:45:28.491571 126270 impala-hs2-server.cc:1085] PingImpalaHS2Service(): 
request=TPingImpalaHS2ServiceReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = 
"\xd7U\x11\x89\xf0\xd4J\x12\xbc\x9c\x0e\x19\xff\xd5\xec?",
  02: secret (string) = 
"\xcd\xf9\x86\a\x90\xa6E\xaf\x92\x19\xee\x1e6S\xea\x85",
},
  },
}
I0225 15:45:29.199357   317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client 
I0225 15:45:29.199455   320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client 
I0225 15:45:29.199498   320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: 
hiveserver2-http-frontend finished connection setup for client 
I0225 15:45:29.199753 126271 authentication.cc:261] Trying simple LDAP bind 
for: 
uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
I0225 15:45:29.210222 126271 authentication.cc:273] LDAP bind successful
I0225 15:45:29.210384 126271 impala-hs2-server.cc:442] ExecuteStatement(): 
request=TExecuteStatementReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = 
"\xd7U\x11\x89\xf0\xd4J\x12\xbc\x9c\x0e\x19\xff\xd5\xec?",
  02: secret (string) = 
"\xcd\xf9\x86\a\x90\xa6E\xaf\x92\x19\xee\x1e6S\xea\x85",
},
  },
  02: statement (string) = "show databases",
  03: confOverlay (map) = map[1] {
"CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri 
Feb 14 14:15:26 CET 2020",
  },
  04: runAsync (bool) = true,
}
I0225 15:45:29.210427 126271 impala-hs2-server.cc:230] TExecuteStatementReq: 
TExecuteStatementReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = 
"\xd7U\x11\x89\xf0\xd4J\x12\xbc\x9c\x0e\x19\xff\xd5\xec?",
  02: secret (string) = 
"\xcd\xf9\x86\a\x90\xa6E\xaf\x92\x19\xee\x1e6S\xea\x85",
},
  },
  02: statement (string) = "show databases",
  03: confOverlay (map) = map[1] {
"CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri 
Feb 14 14:15:26 CET 2020",
  },
  04: runAsync (bool) = true,
}
I0225 15:45:29.220592 126271 impala-hs2-server.cc:268] 
TClientRequest.queryOptions: TQueryOptions {
  01: abort_on_error (bool) = false,
  02: max_errors (i32) = 100,
  03: disable_codegen (bool) = false,
  04: batch_size (i32) = 0,
  05: num_nodes (i32) = 0,
  06: max_scan_range_length (i64) = 0,
  07: num_scanner_threads (i32) = 0,
  11: debug_action (string) = "",
  12: mem_limit (i64) = 0,
  15: hbase_caching (i32) = 0,
  16: hbase_cache_blocks (bool) = false,
  17: parquet_file_size (i64) = 0,
  18: explain_level (i32) = 1,
  19: sync_ddl (bool) = false,
  24: disable_outermost_topn (bool) = false,
  26: query_timeout_s (i32) = 0,
  28: appx_count_distinct (bool) = false,
  29: disable_unsafe_spills (bool) = false,
  31: exec_single_node_rows_threshold (i32) = 100,
  32: optimize_partition_key_scans (bool) = false,
  33: replica_preference (i32) = 0,
  34: schedule_random_replica (bool) = false,
  36: disable_streaming_preaggregations (bool) = false,
  37: runtime_filter_mode (i32) = 2,
  38: runtime_bloom_filter_size (i32) = 1048576,
  39: 

[jira] [Updated] (IMPALA-9421) Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.

2020-02-25 Thread Attila Jeges (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges updated IMPALA-9421:
-
Description: 
The show databases operation takes 3-4 seconds or more in impala-shell when 
connecting to a coordinator using hs2-http with LDAP authentication:
{code:java}
$ impala-shell.sh --protocol='hs2-http' --ssl -i "impala-coordinator:443" -u 
username -l

impala-shell> show database;
+------------------------+-----------------------------------------------+
| name                   | comment                                       |
+------------------------+-----------------------------------------------+
| _impala_builtins       | System database for Impala builtin functions |
| airline_ontime_orc     |                                               |
| airline_ontime_parquet |                                               |
| default                | Default Hive database                        |
+------------------------+-----------------------------------------------+

Fetched 4 row(s) in 3.66s
{code}
impala-coordinator logs show that there are multiple new connections set up and 
authenticated:
{code:java}
I0225 14:15:48.976776   317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client 
I0225 14:15:48.976878   320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client 
I0225 14:15:48.976912   320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: 
hiveserver2-http-frontend finished connection setup for client 
I0225 14:15:48.977216 115929 authentication.cc:261] Trying simple LDAP bind 
for: 
uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
I0225 14:15:48.989554 115929 authentication.cc:273] LDAP bind successful
I0225 14:15:48.989639 115929 impala-hs2-server.cc:1085] PingImpalaHS2Service(): 
request=TPingImpalaHS2ServiceReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93",
  02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0",
},
  },
}
I0225 14:15:50.152348   317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client 
I0225 14:15:50.152446   321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client 
I0225 14:15:50.152493   321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: 
hiveserver2-http-frontend finished connection setup for client 
I0225 14:15:50.152722 115930 authentication.cc:261] Trying simple LDAP bind 
for: 
uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
I0225 14:15:50.163576 115930 authentication.cc:273] LDAP bind successful
I0225 14:15:50.163733 115930 impala-hs2-server.cc:442] ExecuteStatement(): 
request=TExecuteStatementReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93",
  02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0",
},
  },
  02: statement (string) = "show databases",
  03: confOverlay (map) = map[1] {
"CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri 
Feb 14 14:15:26 CET 2020",
  },
  04: runAsync (bool) = true,
}
I0225 14:15:50.163775 115930 impala-hs2-server.cc:230] TExecuteStatementReq: 
TExecuteStatementReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93",
  02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0",
},
  },
  02: statement (string) = "show databases",
  03: confOverlay (map) = map[1] {
"CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri 
Feb 14 14:15:26 CET 2020",
  },
  04: runAsync (bool) = true,
}
I0225 14:15:50.173715 115930 impala-hs2-server.cc:268] 
TClientRequest.queryOptions: TQueryOptions {
  01: abort_on_error (bool) = false,
  02: max_errors (i32) = 100,
  03: disable_codegen (bool) = false,
  04: batch_size (i32) = 0,
  05: num_nodes (i32) = 0,
  06: max_scan_range_length (i64) = 0,
  07: num_scanner_threads (i32) = 0,
  11: debug_action (string) = "",
  12: mem_limit (i64) = 0,
  15: hbase_caching (i32) = 0,
  16: hbase_cache_blocks (bool) = false,
  17: parquet_file_size (i64) = 0,
  18: explain_level (i32) = 1,
  19: sync_ddl (bool) = false,
  24: disable_outermost_topn (bool) = false,
  26: query_timeout_s (i32) = 0,
  28: appx_count_distinct (bool) = false,
  29: disable_unsafe_spills (bool) = false,
  31: exec_single_node_rows_threshold (i32) = 100,
  32: optimize_partition_key_scans (bool) = false,
  33: replica_preference (i32) = 0,
  34: schedule_random_replica (bool) = false,
  36: disable_streaming_preaggregations (bool) = false,
  37: runtime_filter_mode (i32) = 2,
  38: runtime_bloom_filter_size (i32) = 1048576,
  39: runtime_filter_wait_time_ms (i32) = 0,
  40: 

[jira] [Created] (IMPALA-9421) Metadata operations are slow in impala-shell when using hs2-http with LDAP auth.

2020-02-25 Thread Attila Jeges (Jira)
Attila Jeges created IMPALA-9421:


 Summary: Metadata operations are slow in impala-shell when using 
hs2-http with LDAP auth.
 Key: IMPALA-9421
 URL: https://issues.apache.org/jira/browse/IMPALA-9421
 Project: IMPALA
  Issue Type: Improvement
  Components: Clients
Affects Versions: Impala 3.4.0
Reporter: Attila Jeges


The show databases operation takes 3-4 seconds or more in impala-shell when 
connecting to a CDW Azure environment:
{code:java}
$ impala-shell.sh --protocol='hs2-http' --ssl -i 
"coordinator-attilaj-test-impala-vw.env-q52cn6.dwx.workload-dev.cloudera.com:443"
 -u csso_attilaj -l

impala-shell> show database;
+------------------------+-----------------------------------------------+
| name                   | comment                                       |
+------------------------+-----------------------------------------------+
| _impala_builtins       | System database for Impala builtin functions |
| airline_ontime_orc     |                                               |
| airline_ontime_parquet |                                               |
| default                | Default Hive database                        |
+------------------------+-----------------------------------------------+

Fetched 4 row(s) in 3.66s
{code}
impala-coordinator logs show that there are multiple new connections set up and 
authenticated:
{code:java}
I0225 14:15:48.976776   317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client 
I0225 14:15:48.976878   320 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client 
I0225 14:15:48.976912   320 TAcceptQueueServer.cpp:245] TAcceptQueueServer: 
hiveserver2-http-frontend finished connection setup for client 
I0225 14:15:48.977216 115929 authentication.cc:261] Trying simple LDAP bind 
for: 
uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
I0225 14:15:48.989554 115929 authentication.cc:273] LDAP bind successful
I0225 14:15:48.989639 115929 impala-hs2-server.cc:1085] PingImpalaHS2Service(): 
request=TPingImpalaHS2ServiceReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93",
  02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0",
},
  },
}
I0225 14:15:50.152348   317 TAcceptQueueServer.cpp:340] New connection to 
server hiveserver2-http-frontend from client 
I0225 14:15:50.152446   321 TAcceptQueueServer.cpp:227] TAcceptQueueServer: 
hiveserver2-http-frontend started connection setup for client 
I0225 14:15:50.152493   321 TAcceptQueueServer.cpp:245] TAcceptQueueServer: 
hiveserver2-http-frontend finished connection setup for client 
I0225 14:15:50.152722 115930 authentication.cc:261] Trying simple LDAP bind 
for: 
uid=csso_attilaj,cn=users,cn=accounts,dc=attilaj,dc=xcu2-8y8x,dc=dev,dc=cldr,dc=work
I0225 14:15:50.163576 115930 authentication.cc:273] LDAP bind successful
I0225 14:15:50.163733 115930 impala-hs2-server.cc:442] ExecuteStatement(): 
request=TExecuteStatementReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93",
  02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0",
},
  },
  02: statement (string) = "show databases",
  03: confOverlay (map) = map[1] {
"CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri 
Feb 14 14:15:26 CET 2020",
  },
  04: runAsync (bool) = true,
}
I0225 14:15:50.163775 115930 impala-hs2-server.cc:230] TExecuteStatementReq: 
TExecuteStatementReq {
  01: sessionHandle (struct) = TSessionHandle {
01: sessionId (struct) = THandleIdentifier {
  01: guid (string) = "#\x8f\xdf\x01\xd7\xd6Bv\xa5\xec\xcd\x17Q\xb9q\x93",
  02: secret (string) = "\xd6\xaaO\v\xedXE!\x89}x\xbds\x1f\xe1\xf0",
},
  },
  02: statement (string) = "show databases",
  03: confOverlay (map) = map[1] {
"CLIENT_IDENTIFIER" -> "Impala Shell v3.4.0-SNAPSHOT (cad1561) built on Fri 
Feb 14 14:15:26 CET 2020",
  },
  04: runAsync (bool) = true,
}
I0225 14:15:50.173715 115930 impala-hs2-server.cc:268] 
TClientRequest.queryOptions: TQueryOptions {
  01: abort_on_error (bool) = false,
  02: max_errors (i32) = 100,
  03: disable_codegen (bool) = false,
  04: batch_size (i32) = 0,
  05: num_nodes (i32) = 0,
  06: max_scan_range_length (i64) = 0,
  07: num_scanner_threads (i32) = 0,
  11: debug_action (string) = "",
  12: mem_limit (i64) = 0,
  15: hbase_caching (i32) = 0,
  16: hbase_cache_blocks (bool) = false,
  17: parquet_file_size (i64) = 0,
  18: explain_level (i32) = 1,
  19: sync_ddl (bool) = false,
  24: disable_outermost_topn (bool) = false,
  26: query_timeout_s (i32) = 0,
  28: appx_count_distinct (bool) = false,
  29: disable_unsafe_spills (bool) = false,
  31: exec_single_node_rows_threshold (i32) = 100,
  32: optimize_partition_key_scans (bool) = false,
  33: replica_preference 

[jira] [Created] (IMPALA-9420) test_scanners.TestOrc.test_type conversions fails after first run

2020-02-25 Thread Norbert Luksa (Jira)
Norbert Luksa created IMPALA-9420:
-

 Summary: test_scanners.TestOrc.test_type conversions fails after 
first run
 Key: IMPALA-9420
 URL: https://issues.apache.org/jira/browse/IMPALA-9420
 Project: IMPALA
  Issue Type: Bug
Reporter: Norbert Luksa
Assignee: Gabor Kaszab


The mentioned test passes on the first run, but fails later on, finding more 
rows than expected.
 By running
{code:java}
hdfs dfs -ls -R / | grep union_complextypes
{code}
we can find that the previously created files are not cleaned up, so Impala 
will find and scan them.

The problem could be that the union_complextypes and ill_complextypes tables 
are created as external tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9420) test_scanners.TestOrc.test_type conversions fails after first run

2020-02-25 Thread Norbert Luksa (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Luksa updated IMPALA-9420:
--
Labels: orc ramp-up  (was: )

> test_scanners.TestOrc.test_type conversions fails after first run
> -
>
> Key: IMPALA-9420
> URL: https://issues.apache.org/jira/browse/IMPALA-9420
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Norbert Luksa
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: orc, ramp-up
>
> The mentioned test passes on the first run, but fails later on, finding more 
> rows than expected.
>  By running
> {code:java}
> hdfs dfs -ls -R / | grep union_complextypes
> {code}
> we can find that the previously created files are not cleaned up, so Impala 
> will find and scan them.
> The problem could be that the union_complextypes and ill_complextypes tables 
> are created as external tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org