gamezhoulei opened a new pull request, #3282:
URL: https://github.com/apache/brpc/pull/3282

   ### What problem does this PR solve?
   
   Issue Number: resolve
   
   Problem Summary:
   
   `bvar::read_proc_status()` may be sampled while default bvars are initialized
   before `main()`. If reading or parsing `/proc/self/stat` fails in that phase,
   the current code logs the failure with `PLOG`/`PLOG_ONCE`. For applications 
that
   initialize glog in `main()`, this can call glog before it is ready and crash
   during static initialization.
   
   The crash happens before entering `main()`, so application code has no 
chance to
   call `google::InitGoogleLogging()` first.
   
   One observed stack of the crashing thread is:
   
   ```text
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0  0x0000000001c67520 in google::LogMessage::Init(char const*, int, int, 
void (google::LogMessage::*)()) ()
   #1  0x0000000001c68291 in google::ErrnoLogMessage::ErrnoLogMessage(char 
const*, int, int, long, void (google::LogMessage::*)()) ()
   #2  0x000000000186b771 in bvar::read_proc_status(bvar::ProcStat&) ()
   #3  0x000000000186fad8 in unsigned long 
bvar::ProcStatReader::get_field<unsigned long, 32ul>(void*) ()
   #4  0x000000000186d156 in 
bvar::detail::ReducerSampler<bvar::PassiveStatus<unsigned long>, unsigned long, 
bvar::detail::AddTo<unsigned long>, bvar::detail::MinusFrom<unsigned long> 
>::take_sample() ()
   #5  0x000000000187148c in bvar::PerSecond<bvar::PassiveStatus<unsigned long> 
>::PerSecond(butil::BasicStringPiece<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > const&, 
bvar::PassiveStatus<unsigned long>*) ()
   #6  0x0000000001871b11 in __static_initialization_and_destruction_0(int, 
int) [clone .constprop.0] ()
   #7  0x00007f40f3a8073b in __libc_start_main_impl ()
   #8  0x0000000000a5a665 in _start ()
   ```
   
   Another thread was already the bvar sampler thread, which shows that the 
crash
   was triggered by the default bvar sampling path:
   
   ```text
   Thread 1:
   #0  google::LogMessage::Init(...)
   #1  google::ErrnoLogMessage::ErrnoLogMessage(...)
   #2  bvar::read_proc_status(...)
   #3  bvar::ProcStatReader::get_field<unsigned long, 32ul>(...)
   #4  bvar::detail::ReducerSampler<...>::take_sample()
   #5  bvar::PerSecond<...>::PerSecond(...)
   #6  __static_initialization_and_destruction_0(...)
   #7  __libc_start_main_impl()
   #8  _start()
   
   Thread 2:
   #0  clock_nanosleep@GLIBC_2.2.5()
   #1  nanosleep()
   #2  usleep()
   #3  bvar::detail::SamplerCollector::run()
   #4  bvar::detail::SamplerCollector::sampling_thread(void*)
   #5  start_thread()
   #6  clone3()
   ```
   
   The faulting instruction dereferenced a null glog flag pointer while 
initializing
   a log message:
   
   ```asm
   0x1c67519 <google::LogMessage::Init+665>:
     mov    0x150e468(%rip),%rdx        # fLS::FLAGS_log_backtrace_at[abi:cxx11]
   => 0x1c67520 <google::LogMessage::Init+672>:
     cmpq   $0x0,0x8(%rdx)
   ```
   
   The corresponding variable value in the core was null:
   
   ```text
   fLS::FLAGS_log_backtrace_at[abi:cxx11]: 0x0000000000000000
   ```
   
   The effective trigger path is:
   
   ```text
   default bvar construction
   -> PerSecond construction
   -> ReducerSampler::take_sample()
   -> PassiveStatus::get_value()
   -> read_proc_status()
   -> PLOG/PLOG_ONCE
   -> google::LogMessage::Init()
   -> dereference uninitialized glog flag
   -> SIGSEGV
   ```
   
   This is similar to the static-initialization crash fixed for 
`read_proc_io()` in
   #3184, but this path goes through `/proc/self/stat`.
   
   ### What is changed and the side effects?
   
   Changed:
   
   - Replace the Linux `/proc/self/stat` open-failure `PLOG_ONCE` in
     `read_proc_status()` with a one-time `fprintf(stderr, ...)` warning.
   - Replace the Linux `/proc/self/stat` parse-failure `PLOG` in
     `read_proc_status()` with `fprintf(stderr, ...)`.
   - Keep the existing return values and bvar sampling behavior unchanged.
   
   Side effects:
   - Performance effects: none for the normal path; the change only affects 
failure
     paths.
   - Breaking backward compatibility: no.
   
   ---
   ### Check List:
   - Please make sure your changes are compilable.
   - When providing us with a new feature, it is best to add related tests.
   - Please follow [Contributor Covenant Code of 
Conduct](https://github.com/apache/brpc/blob/master/CODE_OF_CONDUCT.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to