[jira] [Created] (KUDU-3595) Add a way to set Kudu client's rpc_max_message_size via KuduClientBuilder

2024-07-24 Thread Joe McDonnell (Jira)
Joe McDonnell created KUDU-3595:
---

 Summary: Add a way to set Kudu client's rpc_max_message_size via 
KuduClientBuilder
 Key: KUDU-3595
 URL: https://issues.apache.org/jira/browse/KUDU-3595
 Project: Kudu
  Issue Type: Task
  Components: client
Affects Versions: 1.17.0
Reporter: Joe McDonnell


In some Impala workloads, we have seen issues fetching data from Kudu, because 
the RPC message size exceeds the rpc_max_message_size for the Kudu client 
(which defaults to 50MB). This is likely due to very large strings or binary 
data. See IMPALA-13202.

We should add a way to tune the rpc_max_message_size for the Kudu client. In 
discussions on IMPALA-13202, the option we preferred is to add a way to specify 
this on the KuduClientBuilder (similar to how we handle num_reactors today).
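
For illustration, the call site could look roughly like this. This is a 
minimal sketch assuming a new rpc_max_message_size() setter on 
KuduClientBuilder; the method name and units are hypothetical until a patch 
lands (num_reactors() and the other calls are the existing API):
{code}
#include <string>

#include "kudu/client/client.h"

using kudu::client::KuduClient;
using kudu::client::KuduClientBuilder;

kudu::Status CreateClient(const std::string& master_addr,
                          kudu::client::sp::shared_ptr<KuduClient>* client) {
  return KuduClientBuilder()
      .add_master_server_addr(master_addr)
      .num_reactors(4)  // existing tuning knob, set the same way
      // Hypothetical setter modeled on num_reactors(): raise the 50MB default.
      .rpc_max_message_size(256 * 1024 * 1024)
      .Build(client);
}
{code}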

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KUDU-3484) Kudu C++ client needs to invalidate cache if Java client issued a DDL op on same partition

2023-05-30 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727684#comment-17727684
 ] 

Joe McDonnell commented on KUDU-3484:
-

[~araina] From Impala's point of view, we want the Kudu C++ client to behave 
transparently as if the cache doesn't exist. I think #3 is closest to that 
approach, so here is a sketch of #3:
 # Operation arrives
 # Take a timestamp before accessing the cache
 # Try with the cache
 ## If success, then everything is the same
 # If failure, look up the tablet
 # If the tablet is older than the timestamp from before the cache access, then 
the tablet existed before the operation started. Retry with the new tablet.
 # If the tablet is newer than the timestamp from before the cache access, 
throw an error (tablet not found)

If necessary, Impala can pass in a timestamp for when the operation starts.
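
A minimal sketch of that flow, with placeholder types and helpers 
(CacheLookup()/RemoteLookup()/creation_time are hypothetical stand-ins for 
the metacache internals; only the ordering logic is the point):
{code}
#include <ctime>
#include <string>

struct Tablet { std::time_t creation_time; };

// Hypothetical stand-ins for the fast-path cache lookup and the RPC lookup.
bool CacheLookup(const std::string& key, Tablet* tablet);
bool RemoteLookup(const std::string& key, Tablet* tablet);

// Returns true if the op may proceed with *tablet, false for "tablet not
// found".
bool LookupWithValidation(const std::string& key, Tablet* tablet) {
  std::time_t start = std::time(nullptr);  // timestamp before cache access
  if (CacheLookup(key, tablet)) {
    return true;  // cache hit: everything is the same as today
  }
  if (!RemoteLookup(key, tablet)) {
    return false;  // no such tablet at all
  }
  if (tablet->creation_time < start) {
    return true;  // tablet predates the op: retry with the new tablet
  }
  return false;  // tablet created after the op started: tablet not found
}
{code}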

> Kudu C++ client needs to invalidate cache if Java client issued a DDL op on 
> same partition
> --
>
> Key: KUDU-3484
> URL: https://issues.apache.org/jira/browse/KUDU-3484
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: Ashwani Raina
>Assignee: Ashwani Raina
>Priority: Major
>
> This Jira is created to track the work for improving the Kudu C++ client's 
> metacache integrity when both C++ and Java clients are used on the same 
> partition in a particular sequence of steps that results in a cache entry 
> becoming stale in the Kudu C++ client.
> Here is a detailed step-wise execution of a test that could result in such a 
> situation:
> +++
> The metacache at the Kudu client doesn't clean up the old tablet from its 
> cache after a new partition with the same range is created.
> The new tablet id is the valid one (it is what the server response returns) 
> and should be used everywhere from that point on.
> If we look at the steps to repro:
> 1. We create a table first with following query:
> +++
> /** 1. Create table **/
> drop table if exists impala_crash;
> create table if not exists impala_crash ( dt string, col string, 
> primary key(dt) ) partition by range(dt) ( partition values <= '' ) 
> stored as kudu;
> +++
> 2. Then, the table is altered by adding a partition with range 20230301:
> +++
> alter table impala_crash drop if exists range partition value='20230301'; 
> alter table impala_crash add if not exists range partition value='20230301'; 
> insert into impala_crash values ('20230301','abc');
> +++
> 3. Then, we alter the table again by adding a partition with the same range 
> after deleting the old partition:
> +++
> alter table impala_crash drop if exists range partition value='20230301'; 
> alter table impala_crash add if not exists range partition value='20230301'; 
> insert into impala_crash values ('20230301','abc');
> +++
> Even though the old partition is dropped and a new one is added, the old 
> cache entry (with the old tablet id) still remains in the Kudu client 
> metacache, although it is marked as stale.
> When we try to write the new value to the same range, the client first 
> searches for the entry (using the tablet id) inside the metacache and finds 
> it to be stale. As a result, an rpc lookup is issued, which connects to the 
> server and fetches a payload response containing the new tablet id, as there 
> is no old tablet entry on the server anymore. This new tablet id is recorded 
> in the client metacache. When PickLeader resumes again, it goes into the rpc 
> lookup cycle, which now does a successful fastpath lookup because the latest 
> entry is present in the cache. But when its callback is invoked, it again 
> resumes work with the old tablet id at hand, which never gets updated.
> +++
> Different approaches were discussed to address this. Following are some of 
> the approaches, captured here for posterity:
> +++
> 1. Maintain a context in Impala that can be shared among different clients. 
> The same context can be used to notify the C++ client to get rid of the 
> cache if there has been a set of operations that could invalidate it. Simply 
> passing the tablet id may not work because that may not be enough for a 
> client to take the decision.
> 2. Impala sends a hint to the C++ client to remove the cache entry after a 
> DDL operation (invoked via the Java client) and perform a remote lookup 
> instead of relying on the local cache.
> 3. Kudu detects the problem internally and returns up to the RPC layer, 
> which overwrites the rpc structure with the new tablet object and retries. 
> This is a tricky and unclean approach and has the potential of introducing 
> bugs.
> 4. Change the tablet id in the RPC itself. This is a non-trivial and 
> error-prone approach, as the tablet id is defined const and the 
> implementation of the rpc, batcher, and client is done with the assumption 
> that the tablet id becomes read-only after the RPC is registered for an 
> incoming op.
> +++
> The likelihood of 

[jira] [Resolved] (KUDU-3475) Build on Ubuntu 20 ARM hits errors due to redeclaration of vld1q_u8_x4

2023-05-19 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved KUDU-3475.
-
Fix Version/s: 1.17.0
 Assignee: Joe McDonnell
   Resolution: Fixed

> Build on Ubuntu 20 ARM hits errors due to redeclaration of vld1q_u8_x4
> --
>
> Key: KUDU-3475
> URL: https://issues.apache.org/jira/browse/KUDU-3475
> Project: Kudu
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.17.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 1.17.0
>
>
> When building on an Ubuntu 20 ARM machine, the build now uses GCC 9.4, which 
> includes a definition of vld1q_u8_x4. The build fails with messages like this:
> {noformat}
> 20:28:53 In file included from 
> /home/ubuntu/kudu/src/kudu/util/group_varint-inl.h:25,
> 20:28:53  from 
> /home/ubuntu/kudu/src/kudu/util/group_varint.cc:18:
> 20:28:53 /home/ubuntu/kudu/src/kudu/util/sse2neon.h:184:27: error: 
> ‘uint8x16x4_t vld1q_u8_x4(const uint8_t*)’ redeclared inline without 
> ‘gnu_inline’ attribute
> 20:28:53   184 | FORCE_INLINE uint8x16x4_t vld1q_u8_x4(const uint8_t *p) {
> 20:28:53   |   ^~~
> 20:28:53 In file included from /home/ubuntu/kudu/src/kudu/util/sse2neon.h:66,
> 20:28:53  from 
> /home/ubuntu/kudu/src/kudu/util/group_varint-inl.h:25,
> 20:28:53  from 
> /home/ubuntu/kudu/src/kudu/util/group_varint.cc:18:
> 20:28:53 /usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h:18122:1: note: 
> ‘uint8x16x4_t vld1q_u8_x4(const uint8_t*)’ previously defined here
> 20:28:53 18122 | vld1q_u8_x4 (const uint8_t *__a)
> 20:28:53   | ^~~{noformat}
> There have been major changes in the logic of sse2neon.h over the past couple 
> of years. The upstream code now uses a different name to avoid collisions and 
> has more sophisticated version checks.
> See these commits: 
> [https://github.com/DLTcollab/sse2neon/commit/e96c9818e25f019629a6b96f62382d42179eab3c]
> [https://github.com/DLTcollab/sse2neon/commit/26011f2ca7f22fd2b93b85fa84a2465ffc489710]
> One possible fix is to update sse2neon to a more recent version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KUDU-3474) Build on Ubuntu 20 ARM fails if zlib is installed

2023-05-19 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved KUDU-3474.
-
Fix Version/s: 1.17.0
 Assignee: Joe McDonnell
   Resolution: Fixed

> Build on Ubuntu 20 ARM fails if zlib is installed
> -
>
> Key: KUDU-3474
> URL: https://issues.apache.org/jira/browse/KUDU-3474
> Project: Kudu
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.17.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 1.17.0
>
>
> Here is a quick note on a Kudu build failure I saw on an Ubuntu 20 ARM 
> machine recently. The Kudu thirdparty builds fine, but then when we get to 
> building Kudu it fails with this:
>  
> {noformat}
> 21:07:31 [  7%] Linking CXX executable ../../../bin/protoc-gen-insertions
> 21:07:31 
> /home/ubuntu/kudu/thirdparty/src/libunwind-1.5.0/src/dwarf/Gfind_proc_info-lsb.c:140:
>  error: undefined reference to 'uncompress'
> 21:07:32 collect2: error: ld returned 1 exit status
> 21:07:32 make[2]: *** 
> [src/kudu/util/CMakeFiles/protoc-gen-insertions.dir/build.make:113: 
> bin/protoc-gen-insertions] Error 1{noformat}
> Here's what is going on:
>  
>  # libunwind's .debug_frame support is enabled for ARM/aarch64, but it is 
> disabled for other platforms. 
> [https://github.com/libunwind/libunwind/blob/master/configure.ac#L262-L276]
>  # The .debug_frame support uses zlib uncompress if zlib is available. 
> [https://github.com/libunwind/libunwind/blob/master/src/dwarf/Gfind_proc_info-lsb.c#L139-L168]
> [https://github.com/libunwind/libunwind/blob/master/configure.ac#L322-L337]
>  # If thirdparty is built on an ARM machine that has zlib installed, then 
> CONFIG_DEBUG_FRAME is true and HAVE_ZLIB is true and the uncompress() 
> reference is compiled in.
>  # The Kudu build doesn't know that libunwind needs zlib, so the list of 
> libraries linked in for protoc-gen-insertions doesn't include zlib.
> One potential fix is to add zlib as a dependency for libunwind for 
> ARM/aarch64. It might be worth compiling libunwind after zlib in thirdparty 
> so that it always has the zlib support on ARM.
> Reproducing steps on Ubuntu 20 ARM machine:
> {noformat}
> export DEBIAN_FRONTEND=noninteractive
> sudo DEBIAN_FRONTEND=noninteractive apt-get install -y autoconf automake curl 
> flex g++ gcc gdb git \
>   krb5-admin-server krb5-kdc krb5-user libkrb5-dev libsasl2-dev 
> libsasl2-modules \
>   libsasl2-modules-gssapi-mit libssl-dev libtool lsb-release make ntp \
>   openjdk-8-jdk openssl patch pkg-config python rsync unzip vim-common 
> libz-dev
> rm -rf kudu
> mkdir kudu
> cd kudu
> git init
> git fetch "${KUDU_REPO_URL}"
> git fetch "${KUDU_REPO_URL}" "${KUDU_REPO_BRANCH}"
> git checkout FETCH_HEAD
> git rev-parse FETCH_HEAD
> thirdparty/build-if-necessary.sh
>   
> mkdir -p build/release
> cd build/release
> ../../thirdparty/installed/common/bin/cmake -DCMAKE_BUILD_TYPE=release 
> -DNO_TESTS=1 ../..
> make -j{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KUDU-3475) Build on Ubuntu 20 ARM hits errors due to redeclaration of vld1q_u8_x4

2023-04-25 Thread Joe McDonnell (Jira)
Joe McDonnell created KUDU-3475:
---

 Summary: Build on Ubuntu 20 ARM hits errors due to redeclaration 
of vld1q_u8_x4
 Key: KUDU-3475
 URL: https://issues.apache.org/jira/browse/KUDU-3475
 Project: Kudu
  Issue Type: Bug
  Components: build
Affects Versions: 1.17.0
Reporter: Joe McDonnell


When building on an Ubuntu 20 ARM machine, the build now uses GCC 9.4, which 
includes a definition of vld1q_u8_x4. The build fails with messages like this:
{noformat}
20:28:53 In file included from 
/home/ubuntu/kudu/src/kudu/util/group_varint-inl.h:25,
20:28:53  from 
/home/ubuntu/kudu/src/kudu/util/group_varint.cc:18:
20:28:53 /home/ubuntu/kudu/src/kudu/util/sse2neon.h:184:27: error: 
‘uint8x16x4_t vld1q_u8_x4(const uint8_t*)’ redeclared inline without 
‘gnu_inline’ attribute
20:28:53   184 | FORCE_INLINE uint8x16x4_t vld1q_u8_x4(const uint8_t *p) {
20:28:53   |   ^~~
20:28:53 In file included from /home/ubuntu/kudu/src/kudu/util/sse2neon.h:66,
20:28:53  from 
/home/ubuntu/kudu/src/kudu/util/group_varint-inl.h:25,
20:28:53  from 
/home/ubuntu/kudu/src/kudu/util/group_varint.cc:18:
20:28:53 /usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h:18122:1: note: 
‘uint8x16x4_t vld1q_u8_x4(const uint8_t*)’ previously defined here
20:28:53 18122 | vld1q_u8_x4 (const uint8_t *__a)
20:28:53   | ^~~{noformat}
There have been major changes in the logic of sse2neon.h over the past couple 
of years. The upstream code now uses a different name to avoid collisions and 
has more sophisticated version checks.

See these commits: 
[https://github.com/DLTcollab/sse2neon/commit/e96c9818e25f019629a6b96f62382d42179eab3c]

[https://github.com/DLTcollab/sse2neon/commit/26011f2ca7f22fd2b93b85fa84a2465ffc489710]

One possible fix is to update sse2neon to a more recent version.
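
The shape of the upstream approach, paraphrased (not the literal sse2neon 
code; the shim name is illustrative and the GCC 9 cutoff is inferred from 
the error above rather than taken from the commits):
{code}
#include <arm_neon.h>

// Only define a fallback when the toolchain lacks vld1q_u8_x4 (GCC 9's
// arm_neon.h provides it), and give it a prefixed name so it can never
// collide with the compiler's declaration.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ < 9
static inline uint8x16x4_t _sse2neon_vld1q_u8_x4(const uint8_t *p) {
  uint8x16x4_t ret;
  ret.val[0] = vld1q_u8(p);
  ret.val[1] = vld1q_u8(p + 16);
  ret.val[2] = vld1q_u8(p + 32);
  ret.val[3] = vld1q_u8(p + 48);
  return ret;
}
#else
#define _sse2neon_vld1q_u8_x4 vld1q_u8_x4  // compiler intrinsic is available
#endif
{code}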



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KUDU-3474) Build on Ubuntu 20 ARM fails if zlib is installed

2023-04-25 Thread Joe McDonnell (Jira)
Joe McDonnell created KUDU-3474:
---

 Summary: Build on Ubuntu 20 ARM fails if zlib is installed
 Key: KUDU-3474
 URL: https://issues.apache.org/jira/browse/KUDU-3474
 Project: Kudu
  Issue Type: Bug
  Components: build
Affects Versions: 1.17.0
Reporter: Joe McDonnell


Here is a quick note on a Kudu build failure I saw on an Ubuntu 20 ARM machine 
recently. The Kudu thirdparty builds fine, but then when we get to building 
Kudu it fails with this:

 
{noformat}
21:07:31 [  7%] Linking CXX executable ../../../bin/protoc-gen-insertions
21:07:31 
/home/ubuntu/kudu/thirdparty/src/libunwind-1.5.0/src/dwarf/Gfind_proc_info-lsb.c:140:
 error: undefined reference to 'uncompress'
21:07:32 collect2: error: ld returned 1 exit status
21:07:32 make[2]: *** 
[src/kudu/util/CMakeFiles/protoc-gen-insertions.dir/build.make:113: 
bin/protoc-gen-insertions] Error 1{noformat}
Here's what is going on:

 
 # libunwind's .debug_frame support is enabled for ARM/aarch64, but it is 
disabled for other platforms. 
[https://github.com/libunwind/libunwind/blob/master/configure.ac#L262-L276]
 # The .debug_frame support uses zlib uncompress if zlib is available. 
[https://github.com/libunwind/libunwind/blob/master/src/dwarf/Gfind_proc_info-lsb.c#L139-L168]
[https://github.com/libunwind/libunwind/blob/master/configure.ac#L322-L337]
 # If thirdparty is built on an ARM machine that has zlib installed, then 
CONFIG_DEBUG_FRAME is true and HAVE_ZLIB is true and the uncompress() reference 
is compiled in.
 # The Kudu build doesn't know that libunwind needs zlib, so the list of 
libraries linked in for protoc-gen-insertions doesn't include zlib.

One potential fix is to add zlib as a dependency for libunwind for ARM/aarch64. 
It might be worth compiling libunwind after zlib in thirdparty so that it 
always has the zlib support on ARM.

Reproducing steps on Ubuntu 20 ARM machine:
{noformat}
export DEBIAN_FRONTEND=noninteractive
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y autoconf automake curl 
flex g++ gcc gdb git \
  krb5-admin-server krb5-kdc krb5-user libkrb5-dev libsasl2-dev 
libsasl2-modules \
  libsasl2-modules-gssapi-mit libssl-dev libtool lsb-release make ntp \
  openjdk-8-jdk openssl patch pkg-config python rsync unzip vim-common 
libz-dev
rm -rf kudu
mkdir kudu
cd kudu
git init
git fetch "${KUDU_REPO_URL}"
git fetch "${KUDU_REPO_URL}" "${KUDU_REPO_BRANCH}"
git checkout FETCH_HEAD
git rev-parse FETCH_HEAD
thirdparty/build-if-necessary.sh
  
mkdir -p build/release
cd build/release
../../thirdparty/installed/common/bin/cmake -DCMAKE_BUILD_TYPE=release 
-DNO_TESTS=1 ../..
make -j{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KUDU-3461) Kudu client can blow the stack with infinite recursions between PickLeader() and LookupTabletByKey()

2023-03-20 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702895#comment-17702895
 ] 

Joe McDonnell commented on KUDU-3461:
-

From what I can tell, this seems to be what happens:

MetaCacheServerPicker::PickLeader() goes through this chunk of logic without 
finding a leader 
([https://github.com/apache/kudu/blob/master/src/kudu/client/meta_cache.cc#L461-L506])

Then, it calls into MetaCache::LookupTabletByKey() with LookUpTabletCb() as the 
callback function 
([https://github.com/apache/kudu/blob/master/src/kudu/client/meta_cache.cc#L521-L529])

MetaCache::LookupTabletByKey() finds it in the hot path and calls the callback 
function LookUpTabletCb(). 
[https://github.com/apache/kudu/blob/master/src/kudu/client/meta_cache.cc#L1405-L1410]

MetaCacheServerPicker::LookUpTabletCb() calls into PickLeader():

[https://github.com/apache/kudu/blob/master/src/kudu/client/meta_cache.cc#L585]

Repeat until the stack is gone.
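
Boiled down to a sketch (a hypothetical simplification; the real code routes 
std::function callbacks through meta_cache.cc, but the control flow reduces 
to this shape):
{code}
#include <functional>

struct MetaCache {
  bool cached_but_no_leader = true;  // entry is in the cache, no leader known

  void LookupTabletByKey(const std::function<void()>& cb) {
    if (cached_but_no_leader) {
      cb();  // hot path (meta_cache.cc:1405-1410): callback runs inline
    }
    // ... otherwise an async RPC lookup would invoke cb later ...
  }

  void PickLeader() {
    // No leader found in the replica scan (meta_cache.cc:461-506), so
    // re-lookup with LookUpTabletCb as the callback (:521-529), which
    // calls PickLeader() again (:585).
    LookupTabletByKey([this] { PickLeader(); });
  }
};

// With a synchronous cache hit and still no leader, PickLeader() ->
// LookupTabletByKey() -> callback -> PickLeader() recurses until the stack
// is exhausted.
{code}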

> Kudu client can blow the stack with infinite recursions between PickLeader() 
> and LookupTabletByKey()
> 
>
> Key: KUDU-3461
> URL: https://issues.apache.org/jira/browse/KUDU-3461
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.17.0
>Reporter: Joe McDonnell
>Priority: Blocker
>
> In an Impala cluster, we ran into a scenario that causes Impala to crash with 
> a SIGSEGV. When reproducing while running in gdb, we see the stack get blown 
> out with this recursion:
> {noformat}
> #0  0x7f983e031a1c in clock_gettime ()
> #1  0x7f983bfda0b5 in __GI___clock_gettime (clock_id=clock_id@entry=1, 
> tp=0x7f967bd8b070) at ../sysdeps/unix/sysv/linux/clock_gettime.c:38
> #2  0x7f983c9f8e48 in kudu::Stopwatch::GetTimes (times=0x7f967bd8b1b0, 
> this=<optimized out>, this=<optimized out>) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:294
> #3  0x7f983ca09829 in kudu::Stopwatch::stop (this=0x7f967bd8b320) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:218
> #4  kudu::Stopwatch::stop (this=0x7f967bd8b320) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:213
> #5  kudu::sw_internal::LogTiming::Print (max_expected_millis=50, 
> this=0x7f967bd8b320) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:359
> #6  kudu::sw_internal::LogTiming::~LogTiming (this=0x7f967bd8b320, 
> __in_chrg=<optimized out>) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:329
> #7  0x7f983c9fe32c in 
> kudu::client::internal::MetaCache::LookupEntryByKeyFastPath (this=<optimized 
> out>, table=<optimized out>, partition_key=..., entry=0x7f967bd8b4c0) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/locks.h:99
> #8  0x7f983c9fe656 in kudu::client::internal::MetaCache::DoFastPathLookup 
> (this=0xde431e0, table=0xf899300, partition_key=0x7f967bd8b700, 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1243
> #9  0x7f983ca05731 in 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1405
> #10 0x7f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #11 0x7f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967bd8b8c0) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #12 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #13 0x7f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec, callback=..., 

[jira] [Created] (KUDU-3461) Kudu client can blow the stack with infinite recursions between PickLeader() and LookupTabletByKey()

2023-03-20 Thread Joe McDonnell (Jira)
Joe McDonnell created KUDU-3461:
---

 Summary: Kudu client can blow the stack with infinite recursions 
between PickLeader() and LookupTabletByKey()
 Key: KUDU-3461
 URL: https://issues.apache.org/jira/browse/KUDU-3461
 Project: Kudu
  Issue Type: Bug
  Components: client
Affects Versions: 1.17.0
Reporter: Joe McDonnell


In an Impala cluster, we ran into a scenario that causes Impala to crash with a 
SIGSEGV. When reproducing while running in gdb, we see the stack get blown out 
with this recursion:
{noformat}
#0  0x7f983e031a1c in clock_gettime ()
#1  0x7f983bfda0b5 in __GI___clock_gettime (clock_id=clock_id@entry=1, 
tp=0x7f967bd8b070) at ../sysdeps/unix/sysv/linux/clock_gettime.c:38
#2  0x7f983c9f8e48 in kudu::Stopwatch::GetTimes (times=0x7f967bd8b1b0, 
this=<optimized out>, this=<optimized out>) at 
/mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:294
#3  0x7f983ca09829 in kudu::Stopwatch::stop (this=0x7f967bd8b320) at 
/mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:218
#4  kudu::Stopwatch::stop (this=0x7f967bd8b320) at 
/mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:213
#5  kudu::sw_internal::LogTiming::Print (max_expected_millis=50, 
this=0x7f967bd8b320) at 
/mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:359
#6  kudu::sw_internal::LogTiming::~LogTiming (this=0x7f967bd8b320, 
__in_chrg=<optimized out>) at 
/mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:329
#7  0x7f983c9fe32c in 
kudu::client::internal::MetaCache::LookupEntryByKeyFastPath (this=<optimized 
out>, table=<optimized out>, partition_key=..., entry=0x7f967bd8b4c0) at 
/mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/locks.h:99
#8  0x7f983c9fe656 in kudu::client::internal::MetaCache::DoFastPathLookup 
(this=0xde431e0, table=0xf899300, partition_key=0x7f967bd8b700, 
lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
remote_tablet=0x0)
    at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1243
#9  0x7f983ca05731 in 
kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
const*, kudu::PartitionKey, kudu::MonoTime const&, 
kudu::client::internal::MetaCache::LookupType, 
scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
(kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
partition_key=..., deadline=..., 
lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
remote_tablet=0x0, callback=...)
    at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1405
#10 0x7f983ca0598c in 
kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
(kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
kudu::MonoTime const&) (this=0xdec, callback=..., deadline=...)
    at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
#11 0x7f983ca0575f in std::function<void (kudu::Status 
const&)>::operator()(kudu::Status const&) const (__args#0=..., 
this=0x7f967bd8b8c0) at 
/mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
#12 
kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
const*, kudu::PartitionKey, kudu::MonoTime const&, 
kudu::client::internal::MetaCache::LookupType, 
scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
(kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
partition_key=..., deadline=..., 
lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
remote_tablet=0x0, callback=...) at 
/mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
#13 0x7f983ca0598c in 
kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
(kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
kudu::MonoTime const&) (this=0xdec, callback=..., deadline=...)
    at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
#14 0x7f983ca0575f in std::function<void (kudu::Status 
const&)>::operator()(kudu::Status const&) const (__args#0=..., 
this=0x7f967bd8bad0) at 
/mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
#15 
kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
const*, kudu::PartitionKey, kudu::MonoTime const&, 
kudu::client::internal::MetaCache::LookupType, 
scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
(kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
partition_key=..., deadline=..., 
lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
remote_tablet=0x0, callback=...) at 
/mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
#16 0x7f983ca0598c in 
kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
(kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
kudu::MonoTime const&) (this=0xdec, callback=..., deadline=...)
    at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
#17 0x7f983ca0575f in std::function<void (kudu::Status 
const&)>::operator()(kudu::Status const&) const (__args#0=..., 
this=0x7f967bd8bce0) at 
/mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
#18 
kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
const*, kudu::PartitionKey, kudu::MonoTime const&, 
kudu::client::internal::MetaCache::LookupType, 
scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
(kudu::Status const&)> const&) (this=0xde431e0, 

[jira] [Resolved] (KUDU-3416) Building javadoc on non-Unicode system fails with "unmappable character for encoding ASCII"

2022-11-03 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved KUDU-3416.
-
Fix Version/s: 1.17.0
   Resolution: Fixed

> Building javadoc on non-Unicode system fails with "unmappable character for 
> encoding ASCII"
> ---
>
> Key: KUDU-3416
> URL: https://issues.apache.org/jira/browse/KUDU-3416
> Project: Kudu
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.17.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 1.17.0
>
>
> When building the Javadocs, I see the following error:
> {noformat}
> > Task :kudu-client:javadoc FAILED
> /mnt/source/kudu/kudu-892bda293f/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:926:
>  error: unmappable character for encoding ASCII
>* tables or all tables???i.e. soft deleted tables and regular 
> tables???{noformat}
> It corresponds to this comment:
> [https://github.com/apache/kudu/blob/master/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java#L926]
> It looks like the parentheses are Unicode characters?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KUDU-3416) Building javadoc on non-Unicode system fails with "unmappable character for encoding ASCII"

2022-11-02 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated KUDU-3416:

Summary: Building javadoc on non-Unicode system fails with "unmappable 
character for encoding ASCII"  (was: Building javadoc fails with "unmappable 
character for encoding ASCII")

> Building javadoc on non-Unicode system fails with "unmappable character for 
> encoding ASCII"
> ---
>
> Key: KUDU-3416
> URL: https://issues.apache.org/jira/browse/KUDU-3416
> Project: Kudu
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.17.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> When building the Javadocs, I see the following error:
> {noformat}
> > Task :kudu-client:javadoc FAILED
> /mnt/source/kudu/kudu-892bda293f/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:926:
>  error: unmappable character for encoding ASCII
>* tables or all tables???i.e. soft deleted tables and regular 
> tables???{noformat}
> It corresponds to this comment:
> [https://github.com/apache/kudu/blob/master/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java#L926]
> It looks like the parentheses are Unicode characters?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KUDU-3416) Building javadoc fails with "unmappable character for encoding ASCII"

2022-11-02 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned KUDU-3416:
---

Assignee: Joe McDonnell

> Building javadoc fails with "unmappable character for encoding ASCII"
> -
>
> Key: KUDU-3416
> URL: https://issues.apache.org/jira/browse/KUDU-3416
> Project: Kudu
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.17.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> When building the Javadocs, I see the following error:
> {noformat}
> > Task :kudu-client:javadoc FAILED
> /mnt/source/kudu/kudu-892bda293f/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:926:
>  error: unmappable character for encoding ASCII
>* tables or all tables???i.e. soft deleted tables and regular 
> tables???{noformat}
> It corresponds to this comment:
> [https://github.com/apache/kudu/blob/master/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java#L926]
> It looks like the parentheses are Unicode characters?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KUDU-3416) Building javadoc fails with "unmappable character for encoding ASCII"

2022-10-31 Thread Joe McDonnell (Jira)
Joe McDonnell created KUDU-3416:
---

 Summary: Building javadoc fails with "unmappable character for 
encoding ASCII"
 Key: KUDU-3416
 URL: https://issues.apache.org/jira/browse/KUDU-3416
 Project: Kudu
  Issue Type: Bug
  Components: java
Affects Versions: 1.17.0
Reporter: Joe McDonnell


When building the Javadocs, I see the following error:
{noformat}
> Task :kudu-client:javadoc FAILED
/mnt/source/kudu/kudu-892bda293f/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:926:
 error: unmappable character for encoding ASCII
   * tables or all tables???i.e. soft deleted tables and regular 
tables???{noformat}
It corresponds to this comment:

[https://github.com/apache/kudu/blob/master/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java#L926]

It looks like the parentheses are Unicode characters?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KUDU-3404) glog 0.6.0 increases the TLS usage of libkudu_client.so substantially

2022-10-04 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612735#comment-17612735
 ] 

Joe McDonnell commented on KUDU-3404:
-

It sounds like using thread local storage would avoid some memory allocations, 
so there may be a performance impact to setting WITH_TLS=OFF.

[~MikaelSmith] Impala is capable of changing our build steps for Kudu to handle 
it, but right now we would need to patch Kudu source. I'm definitely ok with 
some build setting that Impala can use which turns glog's WITH_TLS to OFF. It's 
up to Kudu whether WITH_TLS=ON is something that seems useful for Kudu server 
binaries.

> glog 0.6.0 increases the TLS usage of libkudu_client.so substantially
> -
>
> Key: KUDU-3404
> URL: https://issues.apache.org/jira/browse/KUDU-3404
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.17.0
>Reporter: Joe McDonnell
>Assignee: Marton Greber
>Priority: Critical
> Attachments: 0001-Add-WITH_TLS-OFF-to-glog-build-definition.patch
>
>
> Glog 0.4 introduced support for using thread local storage for its buffer. 
> This feature is controlled by the WITH_TLS CMake variable, and it defaults to 
> ON. See 
> [https://github.com/google/glog/commit/2df0ca34aa3000dadf76633ca700abf0bf50756d]
>  . When Kudu upgraded to glog 0.6.0 as part of the M1 fixes in "[KUDU-3374 
> Add support for M1 and macOS 
> Monterey|https://github.com/apache/kudu/commit/543e128d473f8f7836e605bba8cd6512fa918550]",
>  it increased the thread local storage usage by >30k bytes.
> {noformat}
> # Older libkudu_client.so has 0x100 = 256 bytes of TLS:
> $ readelf -l libkudu_client.so | grep "TLS" -A1
>   TLS            0x007d14c0 0x007d24c0 0x007d24c0
>                  0x0080 0x0100  R      0x40
> # Newer libkudu_client.so has 0x77b9 = 30649 bytes of TLS:
> $ readelf -l libkudu_client.so.0 | grep TLS -A1
>   TLS            0x00751280 0x00752280 0x00752280
>                  0x0080 0x77b9  R      40{noformat}
> This is a problem for Impala, because Impala starts a JVM. There are certain 
> JVM threads (like the "reaper thread") that have very small stacks (e.g. 
> 32KB) and with glibc the TLS space is allocated at the expense of stack 
> space. 30k of TLS usage leaves very little for the reaper thread. There are a 
> series of bugs where the Java reaper thread hits a StackOverflowException 
> because of high TLS usage. This can cause various symptoms including hangs.
> GLIBC message thread: [https://sourceware.org/bugzilla/show_bug.cgi?id=11787]
> JDK bugs:  
> [JDK-8217475|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8217475], 
> [JDK-8225035|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225035]
> To resolve Impala's problem, it would be useful to build libkudu_client.so 
> with glog's WITH_TLS=OFF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KUDU-3404) glog 0.6.0 increases the TLS usage of libkudu_client.so substantially

2022-10-03 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated KUDU-3404:

Attachment: 0001-Add-WITH_TLS-OFF-to-glog-build-definition.patch

> glog 0.6.0 increases the TLS usage of libkudu_client.so substantially
> -
>
> Key: KUDU-3404
> URL: https://issues.apache.org/jira/browse/KUDU-3404
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.17.0
>Reporter: Joe McDonnell
>Priority: Critical
> Attachments: 0001-Add-WITH_TLS-OFF-to-glog-build-definition.patch
>
>
> Glog 0.4 introduced support for using thread local storage for its buffer. 
> This feature is controlled by the WITH_TLS CMake variable, and it defaults to 
> ON. See 
> [https://github.com/google/glog/commit/2df0ca34aa3000dadf76633ca700abf0bf50756d]
>  . When Kudu upgraded to glog 0.6.0 as part of the M1 fixes in "[KUDU-3374 
> Add support for M1 and macOS 
> Monterey|https://github.com/apache/kudu/commit/543e128d473f8f7836e605bba8cd6512fa918550]",
>  it increased the thread local storage usage by >30k bytes.
> {noformat}
> # Older libkudu_client.so has 0x100 = 256 bytes of TLS:
> $ readelf -l libkudu_client.so | grep "TLS" -A1
>   TLS            0x007d14c0 0x007d24c0 0x007d24c0
>                  0x0080 0x0100  R      0x40
> # Newer libkudu_client.so has 0x77b9 = 30649 bytes of TLS:
> $ readelf -l libkudu_client.so.0 | grep TLS -A1
>   TLS            0x00751280 0x00752280 0x00752280
>                  0x0080 0x77b9  R      40{noformat}
> This is a problem for Impala, because Impala starts a JVM. There are certain 
> JVM threads (like the "reaper thread") that have very small stacks (e.g. 
> 32KB) and with glibc the TLS space is allocated at the expense of stack 
> space. 30k of TLS usage leaves very little for the reaper thread. There are a 
> series of bugs where the Java reaper thread hits a StackOverflowException 
> because of high TLS usage. This can cause various symptoms including hangs.
> GLIBC message thread: [https://sourceware.org/bugzilla/show_bug.cgi?id=11787]
> JDK bugs:  
> [JDK-8217475|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8217475], 
> [JDK-8225035|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225035]
> To resolve Impala's problem, it would be useful to build libkudu_client.so 
> with glog's WITH_TLS=OFF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KUDU-3404) glog 0.6.0 increases the TLS usage of libkudu_client.so substantially

2022-10-03 Thread Joe McDonnell (Jira)
Joe McDonnell created KUDU-3404:
---

 Summary: glog 0.6.0 increases the TLS usage of libkudu_client.so 
substantially
 Key: KUDU-3404
 URL: https://issues.apache.org/jira/browse/KUDU-3404
 Project: Kudu
  Issue Type: Bug
  Components: client
Affects Versions: 1.17.0
Reporter: Joe McDonnell


Glog 0.4 introduced support for using thread local storage for its buffer. 
This feature is controlled by the WITH_TLS CMake variable, and it defaults to 
ON. See 
[https://github.com/google/glog/commit/2df0ca34aa3000dadf76633ca700abf0bf50756d]
 . When Kudu upgraded to glog 0.6.0 as part of the M1 fixes in "[KUDU-3374 Add 
support for M1 and macOS 
Monterey|https://github.com/apache/kudu/commit/543e128d473f8f7836e605bba8cd6512fa918550]",
 it increased the thread local storage usage by >30k bytes.
{noformat}
# Older libkudu_client.so has 0x100 = 256 bytes of TLS:
$ readelf -l libkudu_client.so | grep "TLS" -A1
  TLS            0x007d14c0 0x007d24c0 0x007d24c0
                 0x0080 0x0100  R      0x40

# Newer libkudu_client.so has 0x77b9 = 30649 bytes of TLS:
$ readelf -l libkudu_client.so.0 | grep TLS -A1
  TLS            0x00751280 0x00752280 0x00752280
                 0x0080 0x77b9  R      40{noformat}
This is a problem for Impala, because Impala starts a JVM. There are certain 
JVM threads (like the "reaper thread") that have very small stacks (e.g. 32KB) 
and with glibc the TLS space is allocated at the expense of stack space. 30k of 
TLS usage leaves very little for the reaper thread. There are a series of bugs 
where the Java reaper thread hits a StackOverflowException because of high TLS 
usage. This can cause various symptoms including hangs.

GLIBC message thread: [https://sourceware.org/bugzilla/show_bug.cgi?id=11787]

JDK bugs:  
[JDK-8217475|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8217475], 
[JDK-8225035|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225035]

To resolve Impala's problem, it would be useful to build libkudu_client.so with 
glog's WITH_TLS=OFF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-03-22 Thread Joe McDonnell (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409830#comment-16409830
 ] 

Joe McDonnell commented on KUDU-2086:
-

[~tlipcon] Good point, I changed this to an Improvement and dropped the 
priority.

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Major
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run @100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads are still running much hotter than others.
> Snapshot below is from a 20 node cluster
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-03-22 Thread Joe McDonnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated KUDU-2086:

Priority: Major  (was: Critical)

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Major
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run @100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads are still running much hotter than others.
> Snapshot below is from a 20 node cluster
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2305) Local variables can overflow when serializing a 2GB message

2018-03-09 Thread Joe McDonnell (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393792#comment-16393792
 ] 

Joe McDonnell commented on KUDU-2305:
-

Other thought: 
InboundCall::AddOutboundSidecar()/RpcController::AddOutboundSidecar() already 
return Status and are used to construct the list of sidecars. If those would 
fail when the total size of the sidecars reaches INT_MAX, then we would never 
even get to 
OutboundCall::SetRequestPayload()/InboundCall::SerializeResponseBuffer(). 
Essentially, no one could construct a message that is too big. We already limit 
the total number of sidecars. It makes sense to limit the total size of the 
sidecars.
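
A sketch of what that cap could look like (hypothetical bookkeeping; the 
struct, field, and method names are illustrative, not the actual rpc code):
{code}
#include <climits>
#include <cstdint>
#include <vector>

struct Sidecar { int64_t size; };

struct OutboundSidecars {
  std::vector<Sidecar> sidecars;
  int64_t total_size = 0;

  // In Kudu this would return a non-OK Status; a message whose sidecars
  // exceed INT_MAX could never be accepted by the receiver anyway.
  bool AddOutboundSidecar(Sidecar s) {
    if (total_size + s.size > INT_MAX) {
      return false;  // reject up front, before serialization ever starts
    }
    total_size += s.size;
    sidecars.push_back(s);
    return true;
  }
};
{code}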

> Local variables can overflow when serializing a 2GB message
> ---
>
> Key: KUDU-2305
> URL: https://issues.apache.org/jira/browse/KUDU-2305
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.6.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 1.7.0
>
>
> When rpc_max_message_size is set to its maximum of INT_MAX (2147483647), 
> certain local variables in SerializeMessage can overflow as messages approach 
> this size. Specifically, recorded_size, size_with_delim, and total_size are 4 
> byte signed integers and could overflow when additional_size becomes large.
> Since INT_MAX is the largest allowable value for rpc_max_message_size (a 4 
> byte signed integer), these variables will not overflow if changed to 4 byte 
> unsigned integers. This would eliminate the potential problem in 
> serialization.
> A similar problem exists in InboundTransfer::ReceiveBuffer() and similar 
> codepaths. Changing those variables to unsigned integers should resolve the 
> issue.
> This does not impact existing systems, because the default value of 
> rpc_max_message_size is 50MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2305) Local variables can overflow when serializing a 2GB message

2018-03-09 Thread Joe McDonnell (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393738#comment-16393738
 ] 

Joe McDonnell commented on KUDU-2305:
-

The rpc header encodes the sidecar offsets with a uint32. So, we know that it 
is impossible to correctly serialize a message larger than UINT_MAX, because 
the sidecar offsets will wrap. It is also pointless because the receiver would 
always reject a message larger than INT_MAX anyway. One way to fix this is for 
the code in OutboundCall::SetRequestPayload() and 
InboundCall::SerializeResponseBuffer() to fail up front if the sidecars are too 
large. This would avoid the serialization and transfer codepath. This would 
require allowing these functions to return Status and routing it correctly.
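
A small standalone demonstration of why a message past UINT_MAX cannot 
serialize correctly (illustrative example, not Kudu code):
{code}
#include <cstdint>
#include <iostream>

int main() {
  // The rpc header stores sidecar offsets as uint32, so an offset past
  // UINT_MAX wraps and points at the wrong place in the payload.
  uint64_t real_offset = 4500000000ULL;  // > UINT_MAX (4294967295)
  uint32_t encoded = static_cast<uint32_t>(real_offset);  // wraps on encode
  std::cout << encoded << std::endl;  // prints 205032704, not 4500000000
  return 0;
}
{code}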

> Local variables can overflow when serializing a 2GB message
> ---
>
> Key: KUDU-2305
> URL: https://issues.apache.org/jira/browse/KUDU-2305
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.6.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 1.7.0
>
>
> When rpc_max_message_size is set to its maximum of INT_MAX (2147483647), 
> certain local variables in SerializeMessage can overflow as messages approach 
> this size. Specifically, recorded_size, size_with_delim, and total_size are 4 
> byte signed integers and could overflow when additional_size becomes large.
> Since INT_MAX is the largest allowable value for rpc_max_message_size (a 4 
> byte signed integer), these variables will not overflow if changed to 4 byte 
> unsigned integers. This would eliminate the potential problem in 
> serialization.
> A similar problem exists in InboundTransfer::ReceiveBuffer() and similar 
> codepaths. Changing those variables to unsigned integers should resolve the 
> issue.
> This does not impact existing systems, because the default value of 
> rpc_max_message_size is 50MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2305) Local variables can overflow when serializing a 2GB message

2018-02-16 Thread Joe McDonnell (JIRA)
Joe McDonnell created KUDU-2305:
---

 Summary: Local variables can overflow when serializing a 2GB 
message
 Key: KUDU-2305
 URL: https://issues.apache.org/jira/browse/KUDU-2305
 Project: Kudu
  Issue Type: Bug
  Components: rpc
Affects Versions: 1.6.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


When rpc_max_message_size is set to its maximum of INT_MAX (2147483647), 
certain local variables in SerializeMessage can overflow as messages approach 
this size. Specifically, recorded_size, size_with_delim, and total_size are 4 
byte signed integers and could overflow when additional_size becomes large.

Since INT_MAX is the largest allowable value for rpc_max_message_size (a 4 byte 
signed integer), these variables will not overflow if changed to 4 byte 
unsigned integers. This would eliminate the potential problem in serialization.

A similar problem exists in InboundTransfer::ReceiveBuffer() and similar 
codepaths. Changing those variables to unsigned integers should resolve the 
issue.

This does not impact existing systems, because the default value of 
rpc_max_message_size is 50MB.
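
A standalone illustration of the overflow (hypothetical sizes; the point is 
the variable types, not the actual serialization code):
{code}
#include <cstdint>
#include <iostream>

int main() {
  // With rpc_max_message_size at INT_MAX, the message length plus framing
  // overhead can exceed what a signed 32-bit total can hold.
  int32_t main_msg_len = 2147483600;  // near INT_MAX
  int32_t additional_size = 100;      // header/delimiter bytes
  // int32_t total = main_msg_len + additional_size;  // signed overflow: UB
  uint32_t total =
      static_cast<uint32_t>(main_msg_len) + additional_size;  // 2147483700
  std::cout << total << std::endl;  // still fits in an unsigned 32-bit int
  return 0;
}
{code}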



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2296) Kudu RPC cannot deserialize messages larger than 64MB

2018-02-13 Thread Joe McDonnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved KUDU-2296.
-
   Resolution: Fixed
Fix Version/s: 1.7.0

commit dc3543300ebbe8121ec0a96b7076f69b46dc9868
Author: Joe McDonnell 
Date: Tue Feb 13 13:53:33 2018 -0800

KUDU-2296: Fix deserialization of messages larger than 64MB
 
 Protobuf's CodedInputStream has a 64MB total byte limit by
 default. When trying to deserialize messages larger than
 this, ParseMessage() hits this limit and mistakenly
 thinks that the packet is too short. This issue is dormant
 due to Kudu's default rpc_max_message_size of 50MB.
 However, Impala will be using a larger value for
 rpc_max_message_size and requires this fix.
 
 The fix is to override the default 64MB limit by calling
 CodedInputStream::SetTotalBytesLimit() with the buffer's
 size.
 
 Change-Id: I57d3f3ca6ec0aa8be0e67e6a13c4b560c9d2c63a
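
The core of the fix, sketched (buf/size stand in for the received frame and 
ParseBigMessage is illustrative, not the actual function; recent protobuf 
takes a single limit argument):
{code}
#include <cstdint>

#include <google/protobuf/io/coded_stream.h>

void ParseBigMessage(const uint8_t* buf, int size) {
  google::protobuf::io::CodedInputStream in(buf, size);
  // The default total byte limit is 64MB; without this, Skip()/parsing past
  // 64MB fails and looks like a truncated packet.
  in.SetTotalBytesLimit(size);
  // ... in.Skip(...) and message parsing proceed as before ...
}
{code}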

> Kudu RPC cannot deserialize messages larger than 64MB
> -
>
> Key: KUDU-2296
> URL: https://issues.apache.org/jira/browse/KUDU-2296
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.6.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: 1.7.0
>
>
> Impala has been testing Kudu RPC with a larger value for 
> rpc_max_message_size. I noticed that when the message size exceeds 64MB, 
> rpc::serialization::ParseMessage() hits this condition:
> {code:java}
> if (PREDICT_FALSE(!in.Skip(main_msg_len))) {
>   return Status::Corruption(
>   StringPrintf("Invalid packet: data too short, expected %d byte 
> main_msg", main_msg_len),
>   KUDU_REDACT(buf.ToDebugString()));
> }
> {code}
> The actual buffer is the appropriate size. What is happening is that protobuf 
> imposes a 64MB total byte limit by default. Once a message exceeds that, the 
> Skip() call will return false when trying to go past the 64MB limit. The 
> deserialization code can get around this by setting the total byte limit with 
> CodedInputStream::SetTotalBytesLimit().
> This should not impact existing systems at the moment, because the default 
> value for rpc_max_message_size is 50MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2296) Kudu RPC cannot deserialize messages larger than 64MB

2018-02-13 Thread Joe McDonnell (JIRA)
Joe McDonnell created KUDU-2296:
---

 Summary: Kudu RPC cannot deserialize messages larger than 64MB
 Key: KUDU-2296
 URL: https://issues.apache.org/jira/browse/KUDU-2296
 Project: Kudu
  Issue Type: Bug
  Components: rpc
Affects Versions: 1.6.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


Impala has been testing Kudu RPC with a larger value for rpc_max_message_size. 
I noticed that when the message size exceeds 64MB, 
rpc::serialization::ParseMessage() hits this condition:
{code:java}
if (PREDICT_FALSE(!in.Skip(main_msg_len))) {
  return Status::Corruption(
  StringPrintf("Invalid packet: data too short, expected %d byte main_msg", 
main_msg_len),
  KUDU_REDACT(buf.ToDebugString()));
}
{code}
The actual buffer is the appropriate size. What is happening is that protobuf 
imposes a 64MB total byte limit by default. Once a message exceeds that, the 
Skip() call will return false when trying to go past the 64MB limit. The 
deserialization code can get around this by setting the total byte limit with 
CodedInputStream::SetTotalBytesLimit().

This should not impact existing systems at the moment, because the default 
value for rpc_max_message_size is 50MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2296) Kudu RPC cannot deserialize messages larger than 64MB

2018-02-13 Thread Joe McDonnell (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362963#comment-16362963
 ] 

Joe McDonnell commented on KUDU-2296:
-

Working on a patch

> Kudu RPC cannot deserialize messages larger than 64MB
> -
>
> Key: KUDU-2296
> URL: https://issues.apache.org/jira/browse/KUDU-2296
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.6.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Impala has been testing Kudu RPC with a larger value for 
> rpc_max_message_size. I noticed that when the message size exceeds 64MB, 
> rpc::serialization::ParseMessage() hits this condition:
> {code:java}
> if (PREDICT_FALSE(!in.Skip(main_msg_len))) {
>   return Status::Corruption(
>   StringPrintf("Invalid packet: data too short, expected %d byte 
> main_msg", main_msg_len),
>   KUDU_REDACT(buf.ToDebugString()));
> }
> {code}
> The actual buffer is the appropriate size. What is happening is that protobuf 
> imposes a 64MB total byte limit by default. Once a message exceeds that, the 
> Skip() call will return false when trying to go past the 64MB limit. The 
> deserialization code can get around this by setting the total byte limit with 
> CodedInputStream::SetTotalBytesLimit().
> This should not impact existing systems at the moment, because the default 
> value for rpc_max_message_size is 50MB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-01-18 Thread Joe McDonnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated KUDU-2086:

Priority: Critical  (was: Blocker)

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Critical
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run @100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads are still running much hotter than others.
> Snapshot below is from a 20 node cluster
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}
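For intuition, here is a toy standalone sketch (separate from the attached 
krpc_hash_test.c; the addresses and reactor count are invented) of why 
bucketing a handful of connections by hash leaves some reactors idle and 
others oversubscribed:

{code}
// Hedged sketch: hash 20 peer addresses into 16 reactor buckets. With so
// few connections, even a uniform hash leaves some buckets empty and
// others with 2-3 connections, matching the skewed CPU times above.
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

int main() {
  const int kNumReactors = 16;
  std::vector<int> load(kNumReactors, 0);
  for (int node = 1; node <= 20; ++node) {
    std::string addr = "10.0.0." + std::to_string(node) + ":27000";
    load[std::hash<std::string>()(addr) % kNumReactors]++;
  }
  for (int r = 0; r < kNumReactors; ++r) {
    printf("reactor-%02d: %d connection(s)\n", r, load[r]);
  }
  return 0;
}
{code}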



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-01-18 Thread Joe McDonnell (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331492#comment-16331492
 ] 

Joe McDonnell commented on KUDU-2086:
-

[~tlipcon] Good point, this is not a blocker. I will lower the priority.

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Blocker
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run at 100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads still run much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-01-02 Thread Joe McDonnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated KUDU-2086:

Attachment: krpc_hash_test.c

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Blocker
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run at 100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads still run much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2017-12-08 Thread Joe McDonnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell reassigned KUDU-2086:
---

Assignee: Joe McDonnell  (was: Michael Ho)

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Blocker
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run at 100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads still run much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KUDU-1880) IS NULL predicates eliminate some NULL values

2017-02-14 Thread Joe McDonnell (JIRA)
Joe McDonnell created KUDU-1880:
---

 Summary: IS NULL predicates eliminate some NULL values
 Key: KUDU-1880
 URL: https://issues.apache.org/jira/browse/KUDU-1880
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Joe McDonnell


Impala is implementing IS NULL/IS NOT NULL Kudu predicates as part of 
IMPALA-4859 (see review: https://gerrit.cloudera.org/#/c/5958/ ). In testing, 
Kudu's IS NULL predicate is eliminating valid NULL values from the returned 
results.

Here is an example:
select id, float_col from functional_kudu.alltypesagg where id < 10;
+----+---------------+
| id | float_col     |
+----+---------------+
| 3  | 3.29952316284 |
| 7  | 7.69809265137 |
| 0  | NULL          |
| 6  | 6.59904632568 |
| 8  | 8.80190734863 |
| 9  | 9.89618530273 |
| 0  | NULL          |
| 1  | 1.10023841858 |
| 2  | 2.20047683716 |
| 4  | 4.40095367432 |
| 5  | 5.5           |
+----+---------------+
Fetched 11 row(s) in 0.57s

When an IS NULL condition on float_col is added, the query returns no rows:
select id, float_col from functional_kudu.alltypesagg where id < 10 and 
float_col is null;
Fetched 0 row(s) in 0.25s

This is also true for other tables, such as functional_kudu.nulltable.

select * from functional_kudu.nulltable;
+---+---+------+------+------+----+---+
| a | b | c    | d    | e    | f  | g |
+---+---+------+------+------+----+---+
| a |   | NULL | NULL | NULL | ab |   |
+---+---+------+------+------+----+---+

Fetched 1 row(s) in 0.49s

The following queries also return no rows:
select * from functional_kudu.nulltable where c is null;
select * from functional_kudu.nulltable where d is null;
select * from functional_kudu.nulltable where e is null;

Impala statistics indicate that Kudu is not returning any rows. IS NOT NULL 
seems to work correctly. 
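For reference, here is a minimal sketch of how a client hits this through the 
Kudu C++ scan API (the master address is a placeholder, and the table is named 
as Impala exposes it; the actual Kudu-side table name may differ):

{code}
// Hedged sketch: reproduce "id < 10 AND float_col IS NULL" with the C++
// client. With the bug described above, this prints 0 instead of 2.
#include <cstdio>
#include <cstdlib>
#include <kudu/client/client.h>

using kudu::client::KuduClient;
using kudu::client::KuduClientBuilder;
using kudu::client::KuduPredicate;
using kudu::client::KuduScanBatch;
using kudu::client::KuduScanner;
using kudu::client::KuduTable;
using kudu::client::KuduValue;

static void CheckOk(const kudu::Status& s) {
  if (!s.ok()) {
    fprintf(stderr, "%s\n", s.ToString().c_str());
    exit(1);
  }
}

int main() {
  kudu::client::sp::shared_ptr<KuduClient> client;
  CheckOk(KuduClientBuilder()
              .add_master_server_addr("kudu-master:7051")  // placeholder
              .Build(&client));

  kudu::client::sp::shared_ptr<KuduTable> table;
  CheckOk(client->OpenTable("functional_kudu.alltypesagg", &table));

  KuduScanner scanner(table.get());
  // The scanner takes ownership of each conjunct predicate.
  CheckOk(scanner.AddConjunctPredicate(table->NewComparisonPredicate(
      "id", KuduPredicate::LESS, KuduValue::FromInt(10))));
  CheckOk(scanner.AddConjunctPredicate(
      table->NewIsNullPredicate("float_col")));

  CheckOk(scanner.Open());
  int64_t rows = 0;
  KuduScanBatch batch;
  while (scanner.HasMoreRows()) {
    CheckOk(scanner.NextBatch(&batch));
    rows += batch.NumRows();
  }
  printf("matching rows: %lld\n", static_cast<long long>(rows));
  return 0;
}
{code}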



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)