[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-29 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

> (5 comments)
 >
 > >When I run Impala-TSAN locally, it uses the TSAN code under:
 > '/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc'.
 > I would guess it should be using the LLVM from the toolchain?
 > Although that class doesn't seem to exist anywhere in the
 > toolchain/ dir.
 >
 > I think that might be where the source was mounted during the
 > toolchain build?

The same thing happens with ASAN 
(/mnt/source/llvm/llvm-5.0.1.src-p1/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:67).
 So I guess its okay.

 >
 > >I'm a bit confused about running TSAN builds with/without the
 > -build_shared_libs option and how TSAN interacts with static vs
 > dynamic linking (trying to mimic Kudu's TSAN usage
 > https://github.com/apache/kudu/blob/master/CMakeLists.txt#L385)
 > I don't think I know any more
 >
 > >According to the TSAN docs, everything (including third party
 > libs) should be build with -fsanitize=thread 
 > (https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code).
 > Not sure if we are doing that in Impala?
 > We're not, the toolchain is just a regular release build.

Might consider doing this in a follow up JIRA, seems like a considerable amount 
of effort. It seems we can surface a lot of race conditions using TSAN without 
this.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jan 2020 01:54:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

> > Does ignore_noninstrumented_modules work on Linux? Looking at the
 > > LLVM source, https://reviews.llvm.org/D61708 doesn't look like
 > it's
 > > been merged.
 > >
 > > FWIW, it didn't exist (and certainly didn't work on Linux) when
 > > Kudu's TSAN support was added, which is why we went the other
 > > direction and recompile all of our dependencies with
 > > -fsanitize=thread if doing a TSAN build.
 >
 > Maybe https://reviews.llvm.org/D28263 fixed it?
 >
 > It definitely does something on my dev machine, which is running
 > Ubuntu 16.04.

mm ignore that review link. I'll try to see if I can find the patch that fixed 
it, but confirmed it works on my dev machine.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 22:32:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

> Does ignore_noninstrumented_modules work on Linux? Looking at the
 > LLVM source, https://reviews.llvm.org/D61708 doesn't look like it's
 > been merged.
 >
 > FWIW, it didn't exist (and certainly didn't work on Linux) when
 > Kudu's TSAN support was added, which is why we went the other
 > direction and recompile all of our dependencies with
 > -fsanitize=thread if doing a TSAN build.

Maybe https://reviews.llvm.org/D28263 fixed it?

It definitely does something on my dev machine, which is running Ubuntu 16.04.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 22:30:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

Does ignore_noninstrumented_modules work on Linux? Looking at the LLVM source, 
https://reviews.llvm.org/D61708 doesn't look like it's been merged.

FWIW, it didn't exist (and certainly didn't work on Linux) when Kudu's TSAN 
support was added, which is why we went the other direction and recompile all 
of our dependencies with -fsanitize=thread if doing a TSAN build.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 22:01:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

(5 comments)

>When I run Impala-TSAN locally, it uses the TSAN code under: 
> '/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc'.
>  I would guess it should be using the LLVM from the toolchain? Although that 
> class doesn't seem to exist anywhere in the toolchain/ dir.

I think that might be where the source was mounted during the toolchain build?

>I'm a bit confused about running TSAN builds with/without the 
> -build_shared_libs option and how TSAN interacts with static vs dynamic 
> linking (trying to mimic Kudu's TSAN usage 
> https://github.com/apache/kudu/blob/master/CMakeLists.txt#L385)
I don't think I know any more

>According to the TSAN docs, everything (including third party libs) should 
> be build with -fsanitize=thread 
> (https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code).
>  Not sure if we are doing that in Impala?
We're not, the toolchain is just a regular release build.

http://gerrit.cloudera.org:8080/#/c/15116/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15116/1//COMMIT_MSG@10
PS1, Line 10: existing -tsan build flag. -tsan_full is equivalent to the 
current -tsan
Is -tsan_full useful? Wonder if we should just get rid of it.


http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/reservation-tracker.h
File be/src/runtime/bufferpool/reservation-tracker.h:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/reservation-tracker.h@316
PS1, Line 316:   AtomicInt64 reservation_;
I think these members need some explanation that they can be read without 
holding the lock (otherwise it's confusing whether code should be holding the 
lock or not).


http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/system-allocator.cc
File be/src/runtime/bufferpool/system-allocator.cc:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/bufferpool/system-allocator.cc@124
PS1, Line 124: #if defined(ADDRESS_SANITIZER) || defined(THREAD_SANITIZER)
Did you run into this bug? I think it's meant to be fixed: 
https://bugs.llvm.org/show_bug.cgi?id=32968


http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator-backend-state.cc@179
PS1, Line 179:   unique_lock lock(lock_);
Could this cause a deadlock if we exit from one of the points below where 
'lock_' is held?. I think maybe not since the destructor for lock_guard below 
runs first. But is there a way we can make the correctness more obvious?


http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/runtime/coordinator.h@292
PS1, Line 292:   AtomicBool has_called_wait_;  // if true, Wait() was called
I think the writer should still hold wait_lock_?



--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 21:37:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5536/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 19:07:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

A few things I could use input on:

* When I run Impala-TSAN locally, it uses the TSAN code under: 
'/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc'.
 I would guess it should be using the LLVM from the toolchain? Although that 
class doesn't seem to exist anywhere in the toolchain/ dir.

* I'm a bit confused about running TSAN builds with/without the 
-build_shared_libs option and how TSAN interacts with static vs dynamic linking 
(trying to mimic Kudu's TSAN usage 
https://github.com/apache/kudu/blob/master/CMakeLists.txt#L385)

* According to the TSAN docs, everything (including third party libs) should be 
build with -fsanitize=thread 
(https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code).
 Not sure if we are doing that in Impala?


--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 28 Jan 2020 18:37:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15116 )

Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/15116/1/be/src/common/init.cc@423
PS1, Line 423:   // 
https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code),
line too long (95 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Jan 2020 18:23:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5904: Add tsan full option and fix several TSAN bugs

2020-01-28 Thread Sahil Takiar (Code Review)
Sahil Takiar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15116


Change subject: IMPALA-5904: Add tsan_full option and fix several TSAN bugs
..

IMPALA-5904: Add tsan_full option and fix several TSAN bugs

This patch adds an additional build flag -tsan_full in addition to the
existing -tsan build flag. -tsan_full is equivalent to the current -tsan
behavior, and -tsan is changed to set the ignore_noninstrumented_modules
flag to true. ignore_noninstrumented_modules causes TSAN to ignore any
modules that are not TSAN-instrumented. This is necessary to get TSAN to
play nicely with Java, since Java is not TSAN-instrumented (see
https://wiki.openjdk.java.net/display/tsan/Main and JDK-8208520). While
this might decrease the number of issues surfaced by TSAN, it drastically
decreases the amount of noise produced by TSAN because the JVM is not
running TSAN-instrumented code. Without this flag set to true, almost
every single backend test fails with the error:

WARNING: ThreadSanitizer: data race (pid=12939)
  Write of size 1 at 0x7fcbe379c4c6 by thread T31:
#0 strncpy 
/mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:650
 (unifiedbetests+0x1b2a4ad)
#1   (libjvm.so+0x90e706)

This patch fixes various TSAN bugs (e.g. data races) reported while
running backend tests and E2E against a TSAN build (it does not make
Impala completely TSAN-clean). This patch makes the following changes:
* Fixes several bugs involving issues with updating shared variables
  between threads
* Fixes a few race conditions in test classes
* Where possible, existing locks are used to fix any data races; in cases
  where the locking logic is non-trivial, atomics are used
* There are a few places where variables are marked as 'volatile'
  presumably for synchronization purposes; TSAN flags these 'volatile'
  variables as unsafe, and according to
  
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rconc-volatile
  using 'volatile' for synchronization is dangerous; in these cases, the
  'volatile' variables are changed to 'atomic' variables
* This patch adds a suppression file (bin/tsan-suppresions.txt) similar to
  the UBSAN suppresion file (bin/ubsan-suppresions.txt)

Testing:
* Ran exhaustive tests
* Manually re-ran backend tests against a TSAN build and made sure the
reported errors are gone

Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/common/init.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/rpc/thrift-thread.cc
M be/src/rpc/thrift-thread.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M be/src/runtime/bufferpool/reservation-tracker.h
M be/src/runtime/bufferpool/system-allocator.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-stress.h
M be/src/runtime/io/scan-range.cc
M be/src/service/session-expiry-test.cc
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/util/runtime-profile-test.cc
M be/src/util/stopwatch.h
M bin/run-backend-tests.sh
A bin/tsan-suppressions.txt
M buildall.sh
M tests/common/environ.py
M tests/webserver/test_web_pages.py
30 files changed, 199 insertions(+), 94 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/15116/1
--
To view, visit http://gerrit.cloudera.org:8080/15116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Gerrit-Change-Number: 15116
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar