[ 
https://issues.apache.org/jira/browse/KUDU-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315958#comment-15315958
 ] 

Todd Lipcon commented on KUDU-1472:
-----------------------------------

In this section of the stack, something seems to be wrong:
{code}

#2 kudu::BlockId::CopyToPB (this=this@entry=0x42a70848, pb=0xd00) at /export/ldb/kudu_build/kudu-gitlab/src/kudu/fs/block_id.cc:44
#3 0x00000000008e7e9b in kudu::tablet::RowSetMetadata::ToProtobuf (this=0x42a70820, pb=0x1234fe100) at /export/ldb/kudu_build/kudu-gitlab/src/kudu/tablet/rowset_metadata.cc:129
#4 0x00000000008e208f in kudu::tablet::TabletMetadata::ToSuperBlockUnlocked (this=this@entry=0x42ab6480, super_block=super_block@entry=0x7fe0bcc8cdf0, rowsets=...)
{code}

Notice the address 'pb=0xd00' in frame #2. This is obviously a bad pointer, but 
I can't figure out how it got that way. The 'pb' argument in frame #3 may or 
may not be correct. Do you think it would be possible to run a binary 
instrumented with ASAN on your test machines?
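
To make the "obviously a bad pointer" reasoning concrete, here is a minimal sketch of the check being applied by eye. It assumes a 64-bit Linux process in which the lowest 64 KiB of the address space is never mapped (a common vm.mmap_min_addr setting, not a guarantee). The helper name and the threshold constant are hypothetical and are not part of Kudu.

```cpp
#include <cstdint>

// Assumption: addresses below 64 KiB are never mapped in this process. This
// matches a common Linux vm.mmap_min_addr setting but is not guaranteed.
constexpr std::uintptr_t kAssumedLowestMappableAddr = 0x10000;

// Hypothetical helper (not part of Kudu): returns true if a non-null address
// falls inside the assumed-unmapped low range, so it cannot point at a live
// object. A false result does NOT prove that the pointer is valid.
constexpr bool LooksLikeLowGarbagePointer(std::uintptr_t addr) {
  return addr != 0 && addr < kAssumedLowestMappableAddr;
}

// Values taken from the stack trace above.
static_assert(LooksLikeLowGarbagePointer(0xd00),
              "frame #2: pb=0xd00 is in the low, unmapped range");
static_assert(!LooksLikeLowGarbagePointer(0x1234fe100),
              "frame #3: pb=0x1234fe100 is not flagged by this heuristic");
```

The heuristic only flags 'pb=0xd00'. It says nothing about whether 'pb=0x1234fe100' in frame #3 points at a live object, which is the kind of question an ASAN-instrumented run is better suited to answer.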

Also, do you think there is any possibility that your machine has bad RAM?

> kudu-tserver crash unexpected
> -----------------------------
>
>                 Key: KUDU-1472
>                 URL: https://issues.apache.org/jira/browse/KUDU-1472
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: zhangsong
>            Priority: Critical
>
> kudu-tserver crashes in some cases; in the jd.com 200-node environment it 
> happens frequently.
> Some crash info from the core file:
> (gdb) bt
> #0  0x0000000000a2489f in kudu::tablet::RowSetDataPB::SharedDtor (this=0x58fb5b180)
>    at /export/ldb/kudu-master/build/release/src/kudu/tablet/metadata.pb.cc:815
> #1  kudu::tablet::RowSetDataPB::~RowSetDataPB (this=0x58fb5b180, __in_chrg=<optimized out>)
>    at /export/ldb/kudu-master/build/release/src/kudu/tablet/metadata.pb.cc:809
> #2  kudu::tablet::RowSetDataPB::~RowSetDataPB (this=0x58fb5b180, __in_chrg=<optimized out>)
>    at /export/ldb/kudu-master/build/release/src/kudu/tablet/metadata.pb.cc:810
> #3  google::protobuf::internal::GenericTypeHandler<kudu::tablet::RowSetDataPB>::Delete (value=0x58fb5b180)
>    at /export/ldb/kudu-master/thirdparty/installed-deps/include/google/protobuf/repeated_field.h:363
> #4  google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField<kudu::tablet::RowSetDataPB>::TypeHandler> (this=<optimized out>, this=<optimized out>)
>    at /export/ldb/kudu-master/thirdparty/installed-deps/include/google/protobuf/repeated_field.h:869
> Backtrace stopped: Cannot access memory at address 0x7fc1f230fd08
> After a crash, kudu-tserver cannot be restarted successfully because some PB 
> validation checks fail, for example:
>  check failed: _s.ok() Bad status: IO error: Could not init Tablet Manager: Failed to open tablet metadata for tablet: 260359a41a134c1f91631e9094847bcf: Failed to load tablet metadata for tablet id 260359a41a134c1f91631e9094847bcf: Could not load tablet metadata from /export/servers/kudu/tserver_data_7052/tablet-meta/260359a41a134c1f91631e9094847bcf: Unable to parse PB from path: /export/servers/kudu/tserver_data_7052/tablet-meta/260359a41a134c1f91631e9094847bcf
> Kudu version is 0.9.0-snapshot, last commit id: 
> be10f8514c48950b64c7d59bbce848f3792ec52d
> Workload: several write tasks keep inserting into a Kudu table, some using 
> the Java API and others using Impala.
> The Kudu table is scanned while those tasks are running.
> A crash occurs almost every day, with the same symptoms as described above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
