This error is more "sane" than the previous one, as the DB operation is
retried and finally achieved, instead of just being dropped.
However, there is definitely something wrong with the DB locking.
We also see these deadlock errors on our systems.
I first suspected an issue with the robinhood internal mecanism to avoid
concurrent requests on the same id,
checked it all the afternoon, and found no issue.
I seams to be a side effet of the request batching on MySQL locking
strategy,
with a wrong effet when several threads perform batched requests
simultaneously.
As robinhood prevents from simultaneous operations on the same primary key,
perhaps the locking issue is more about other indexes (like parent_id in
the names table).
What would help for a deeper investigation is the detailed description
of the DB deadlock.
The last deadlock description is displayed by following SQL command:
SHOW ENGINE INNODB STATUS ;
The interesting section is "LATEST DETECTED DEADLOCK"
Thomas
On 12/17/13 20:34, Brock Palen wrote:
> So I set:
>
> innodb_lock_wait_timeout=3600
>
> Started a new scan and I am still getting the errors,
>
> 2013/12/17 14:33:35 robinhood@flux-xfer1[30211/13]: ListMgr | Deadlock during
> DB request 'INSERT INTO
> ENTRIES(id,owner,gr_name,size,blocks,last_access,last_mod,type,mode,nlink,md_update,invalid,release_class,rel_cl_update)
> VALUES
> ('483A93D8:45A3873D','gvr','screms',2807,8,1387229860,1387229860,'file',420,1,1387308576,0,'@default@',1387308577),('483A93D8:45A38E5A','gvr','screms',3618,8,1387230001,1387230001,'file',420,1,1387308576,0,'@default@',1387308577)'
> after 0 sec. Retrying in 1 sec.
>
>
> after 0 seconds? None of the querries show taking more than a few seconds,
> and still the log is getting constant input of messages like the above.
>
> mysql> show processlist;
> +-------+-----------+-----------+--------------------+---------+------+--------+------------------------------------------------------------------------------------------------------+
> | Id | User | Host | db | Command | Time | State
> | Info
> |
> +-------+-----------+-----------+--------------------+---------+------+--------+------------------------------------------------------------------------------------------------------+
> | 17923 | robinhood | localhost | robinhood_brockp | Sleep | 1782 |
> | NULL
> |
> | 17924 | robinhood | localhost | robinhood_brockp | Sleep | 1782 |
> | NULL
> |
> | 17925 | robinhood | localhost | robinhood_brockp | Sleep | 1782 |
> | NULL
> |
> | 17926 | robinhood | localhost | robinhood_brockp | Sleep | 1782 |
> | NULL
> |
> | 17927 | robinhood | localhost | robinhood_brockp | Sleep | 1782 |
> | NULL
> |
> | 17928 | robinhood | localhost | robinhood_brockp | Sleep | 1782 |
> | NULL
> |
> | 17929 | robinhood | localhost | robinhood_brockp | Sleep | 1782 |
> | NULL
> |
> | 17930 | robinhood | localhost | robinhood_brockp | Sleep | 1781 |
> | NULL
> |
> | 17931 | robinhood | localhost | robinhood_brockp | Sleep | 120 |
> | NULL
> |
> | 17936 | robinhood | localhost | robinhood_nobackup | Query | 4 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('1F02CF0C:73495EE','1F02CF0C:734934B',' |
> | 17937 | robinhood | localhost | robinhood_nobackup | Sleep | 2 |
> | NULL
> |
> | 17938 | robinhood | localhost | robinhood_nobackup | Query | 3 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('1F02CF0C:731205A','1F02CF0C:7312058',' |
> | 17939 | robinhood | localhost | robinhood_nobackup | Query | 27 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('1F02CF0C:7349700','1F02CF0C:73496F2',' |
> | 17940 | robinhood | localhost | robinhood_nobackup | Query | 28 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('1F02CF0C:73CA236','1F02CF0C:73CA0A3',' |
> | 17941 | robinhood | localhost | robinhood_nobackup | Sleep | 2 |
> | NULL
> |
> | 17942 | robinhood | localhost | robinhood_nobackup | Query | 15 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('1F02CF0C:73CA065','1F02CF0C:73C9E17',' |
> | 17943 | robinhood | localhost | robinhood_nobackup | Sleep | 2 |
> | NULL
> |
> | 17944 | robinhood | localhost | robinhood_nobackup | Query | 4 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('1F02CF0C:7348738','1F02CF0C:734872D',' |
> | 17945 | robinhood | localhost | robinhood_nobackup | Query | 15 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('1F02CF0C:7348C3F','1F02CF0C:7348C1E',' |
> | 17946 | robinhood | localhost | robinhood_nobackup | Sleep | 2 |
> | NULL
> |
> | 17947 | robinhood | localhost | robinhood_nobackup | Query | 15 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('1F02CF0C:73C95CF','1F02CF0C:73C93D0',' |
> | 17948 | robinhood | localhost | robinhood_nobackup | Sleep | 840 |
> | NULL
> |
> | 17953 | robinhood | localhost | robinhood_scratch | Query | 3 |
> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
> | 17954 | robinhood | localhost | robinhood_scratch | Sleep | 10 |
> | NULL
> |
> | 17955 | robinhood | localhost | robinhood_scratch | Sleep | 3 |
> | NULL
> |
> | 17956 | robinhood | localhost | robinhood_scratch | Sleep | 3 |
> | NULL
> |
> | 17957 | robinhood | localhost | robinhood_scratch | Query | 2 |
> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
> | 17958 | robinhood | localhost | robinhood_scratch | Sleep | 3 |
> | NULL
> |
> | 17959 | robinhood | localhost | robinhood_scratch | Query | 1 |
> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
> | 17960 | robinhood | localhost | robinhood_scratch | Query | 1 |
> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
> | 17961 | robinhood | localhost | robinhood_scratch | Query | 3 |
> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
> ('483A93D8:4603567A','483A93D8:46030B64' |
> | 17962 | robinhood | localhost | robinhood_scratch | Sleep | 3 |
> | NULL
> |
> | 17963 | robinhood | localhost | robinhood_scratch | Sleep | 10 |
> | NULL
> |
> | 17964 | robinhood | localhost | robinhood_scratch | Query | 0 |
> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
> | 17965 | robinhood | localhost | robinhood_scratch | Sleep | 810 |
> | NULL
> |
> | 18031 | root | localhost | NULL | Query | 0 | NULL
> | show processlist
> |
> +-------+-----------+-----------+--------------------+---------+------+--------+------------------------------------------------------------------------------------------------------+
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> [email protected]
> (734)936-1985
>
>
>
> On Dec 17, 2013, at 4:02 AM, LEIBOVICI Thomas <[email protected]> wrote:
>
>> Hi Brock,
>>
>> I have news about the lock wait timeout issue, that may be caused by many
>> simultaneous long requests on the tables.
>>
>> You can try increasing the following MySQL parameter in my.cnf:
>> for innodb: innodb_lock_wait_timeout (default is 50sec)
>> for myisam: lock_wait_timeout (you should not have to increase it, as the
>> default is 1 year!)
>>
>> Also, you can add the following quick fix in robinhood code, to handle this
>> error properly:
>> in src/list_mgr/mysql_wrapper.c, near line 52:
>>
>> case ER_LOCK_DEADLOCK:
>> + case ER_LOCK_WAIT_TIMEOUT:
>> DisplayLog( LVL_MAJOR, LISTMGR_TAG, "DB deadlock detected" );
>> return DB_DEADLOCK;
>>
>> Thomas
>>
>> On 12/13/13 16:47, Brock Palen wrote:
>>> We do plan to use Sqoop at the end of each run to copy the data into
>>> Haddop/Pig for us to run a series of stats. We are not currently running
>>> that as we have been struggling to get robinhood up to the performance we
>>> had in earlier versions. We would just run the early versions but we want
>>> the post scan hook to start the sqoop. So right now nothing long running.
>>> Also no robinhood commands being ran, other than rbh-report -a every so
>>> often to check status.
>>>
>>> commit_behavior is not set in any of our robinhood configs
>>>
>>> Currently we have
>>> innodb_flush_log_at_trx_commit=0
>>>
>>> mysql> show processlist;
>>> +------+-----------+-----------+--------------------+---------+-------+--------+------------------------------------------------------------------------------------------------------+
>>> | Id | User | Host | db | Command | Time |
>>> State | Info
>>> |
>>> +------+-----------+-----------+--------------------+---------+-------+--------+------------------------------------------------------------------------------------------------------+
>>> | 558 | robinhood | localhost | robinhood_brockp | Sleep | 21925 |
>>> | NULL
>>> |
>>> | 559 | robinhood | localhost | robinhood_brockp | Sleep | 21925 |
>>> | NULL
>>> |
>>> | 560 | robinhood | localhost | robinhood_brockp | Sleep | 21925 |
>>> | NULL
>>> |
>>> | 561 | robinhood | localhost | robinhood_brockp | Sleep | 21925 |
>>> | NULL
>>> |
>>> | 562 | robinhood | localhost | robinhood_brockp | Sleep | 21925 |
>>> | NULL
>>> |
>>> | 563 | robinhood | localhost | robinhood_brockp | Sleep | 21925 |
>>> | NULL
>>> |
>>> | 564 | robinhood | localhost | robinhood_brockp | Sleep | 21925 |
>>> | NULL
>>> |
>>> | 565 | robinhood | localhost | robinhood_brockp | Sleep | 21925 |
>>> | NULL
>>> |
>>> | 566 | robinhood | localhost | robinhood_brockp | Sleep | 124 |
>>> | NULL
>>> |
>>> | 571 | robinhood | localhost | robinhood_nobackup | Query | 4 |
>>> update | INSERT INTO NAMES(id,parent_id,name,path_update,pkn) VALUES
>>> ('1F02CF0C:B0DE249','1F02CF0C:B0DDADF',' |
>>> | 572 | robinhood | localhost | robinhood_nobackup | Query | 13 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 573 | robinhood | localhost | robinhood_nobackup | Sleep | 3 |
>>> | NULL
>>> |
>>> | 574 | robinhood | localhost | robinhood_nobackup | Sleep | 3 |
>>> | NULL
>>> |
>>> | 575 | robinhood | localhost | robinhood_nobackup | Sleep | 3 |
>>> | NULL
>>> |
>>> | 576 | robinhood | localhost | robinhood_nobackup | Sleep | 3 |
>>> | NULL
>>> |
>>> | 577 | robinhood | localhost | robinhood_nobackup | Query | 37 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 578 | robinhood | localhost | robinhood_nobackup | Query | 13 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 579 | robinhood | localhost | robinhood_nobackup | Query | 4 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 580 | robinhood | localhost | robinhood_nobackup | Query | 37 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 581 | robinhood | localhost | robinhood_nobackup | Query | 37 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 582 | robinhood | localhost | robinhood_nobackup | Query | 41 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 583 | robinhood | localhost | robinhood_nobackup | Sleep | 96 |
>>> | NULL
>>> |
>>> | 588 | robinhood | localhost | robinhood_scratch | Query | 1 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 589 | robinhood | localhost | robinhood_scratch | Query | 41 |
>>> NULL | INSERT INTO
>>> ENTRIES(id,owner,gr_name,size,blocks,last_access,last_mod,type,mode,nlink,md_update,inva
>>> |
>>> | 590 | robinhood | localhost | robinhood_scratch | Query | 8 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 591 | robinhood | localhost | robinhood_scratch | Sleep | 1 |
>>> | NULL
>>> |
>>> | 592 | robinhood | localhost | robinhood_scratch | Query | 30 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 593 | robinhood | localhost | robinhood_scratch | Query | 24 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 594 | robinhood | localhost | robinhood_scratch | Query | 20 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 595 | robinhood | localhost | robinhood_scratch | Sleep | 1 |
>>> | NULL
>>> |
>>> | 596 | robinhood | localhost | robinhood_scratch | Sleep | 1 |
>>> | NULL
>>> |
>>> | 597 | robinhood | localhost | robinhood_scratch | Query | 5 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 598 | robinhood | localhost | robinhood_scratch | Query | 41 |
>>> update | INSERT INTO ACCT_STAT(owner,gr_name,type,size,blocks, count, sz0,
>>> sz1, sz32, sz1K, sz32K, sz1M, sz32 |
>>> | 599 | robinhood | localhost | robinhood_scratch | Sleep | 1 |
>>> | NULL
>>> |
>>> | 600 | robinhood | localhost | robinhood_scratch | Sleep | 68 |
>>> | NULL
>>> |
>>> | 8208 | root | localhost | NULL | Query | 0 |
>>> NULL | show processlist
>>> |
>>> +------+-----------+-----------+--------------------+---------+-------+--------+------------------------------------------------------------------------------------------------------+
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> [email protected]
>>> (734)936-1985
>>>
>>>
>>>
>>> On Dec 13, 2013, at 8:50 AM, LEIBOVICI Thomas <[email protected]>
>>> wrote:
>>>
>>>> Hi Brock,
>>>>
>>>> I'm pleased to see how your DB requests are batched in single
>>>> transactions, as the INSERT requests shows. good!
>>>>
>>>> About your issue:
>>>>
>>>>> grep 1205 /usr/include/mysql/mysqld_error.h
>>>> #define ER_LOCK_WAIT_TIMEOUT 1205
>>>>
>>>> It seams the request timed out because a table is locked, perhaps by a
>>>> long running request.
>>>>
>>>> - do you run external tools that perform long SQL request on robinhood
>>>> database?
>>>> => I recommend running such requests with "READ UNCOMMITTED"
>>>> isolation level to avoid locking robinhood tables.
>>>> Example:
>>>> SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
>>>> START TRANSACTION;
>>>> SELECT ...
>>>> COMMIT;
>>>>
>>>> - do you run robinhood policies (purge, ...)?
>>>> - do you execute rbh-report commands that take a long time?
>>>> - did you set a value for commit_behavior in robinhood config?
>>>> - did you set a value for innodb_flush_log_at_trx_commit in /etc/my.cnf?
>>>>
>>>> When the problem occurs, you can run the following SQL command to know
>>>> what's going on:
>>>> show processlist;
>>>> It displays the current request and for how long they are running.
>>>>
>>>> HTH,
>>>> Thomas
>>>>
>>>>
>>>> On 12/12/13 18:11, Brock Palen wrote:
>>>>> I have version 2.5 up and running. I twas scanning find by now I see
>>>>> many errors in the log files:
>>>>> Any thoughts what might cause this?
>>>>>
>>>>> 2013/12/12 12:09:40 robinhood@flux-xfer1[25340/7]: ListMgr | DB query
>>>>> failed in ListMgr_BatchInsert line 267: code=7:
>>>>> 2013/12/12 12:09:40 robinhood@flux-xfer1[25340/10]: ListMgr | DB query
>>>>> failed in ListMgr_BatchInsert line 267: code=7:
>>>>> 2013/12/12 12:09:40 robinhood@flux-xfer1[25340/7]: EntryProc | Error 7
>>>>> performing batch database operation.
>>>>> 2013/12/12 12:09:40 robinhood@flux-xfer1[25340/10]: EntryProc | Error 7
>>>>> performing batch database operation.
>>>>> 2013/12/12 12:10:35 robinhood@flux-xfer1[25340/12]: ListMgr | Unhandled
>>>>> error 1205: default conversion to DB_REQUEST_FAILED
>>>>> 2013/12/12 12:10:35 robinhood@flux-xfer1[25340/13]: ListMgr | Unhandled
>>>>> error 1205: default conversion to DB_REQUEST_FAILED
>>>>> 2013/12/12 12:10:35 robinhood@flux-xfer1[25340/12]: ListMgr | Error 1
>>>>> executing query 'INSERT INTO
>>>>> ENTRIES(id,owner,gr_name,size,blocks,last_access,last_mod,type,mode,nlink,md_update,invalid,release_class,rel_cl_update)
>>>>> VALUES
>>>>> ('483A93D8:3C2A4652','tsurma','chem',217300,432,1368541226,1367435409,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:347E4380','tsurma','chem',217300,432,1370891082,1361301663,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:E4DBE36','tsurma','chem',217300,432,1371056025,1362183131,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:5B499DC7','tsurma','chem',217300,432,1371162008,1363399269,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:3B638944','tsurma','chem',217300,432,1366390280,1365556977,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:4D46F45','tsurma','chem',217300,432,1364568182,1364147613,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:34701C7F','tsurma','chem',217300,432
,137089
>>>>> 1082,1361132304,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:EAD8FD8','tsurma','chem',217300,432,1371056025,1362292647,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:4DA410D','tsurma','chem',217300,432,1364568183,1364253065,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:3B5BF664','tsurma','chem',217300,432,1366390272,1365537100,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:5B49E1FD','tsurma','chem',217300,432,1371162009,1363425511,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:E6D99AE','tsurma','chem',217300,432,1371056026,1362218974,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:348E2FE4','tsurma','chem',217300,432,1370891082,1361343903,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:3B63E110','tsurma','chem',217300,432,1366390238,1365649768,'file',420,1,1386868165,0,'@default@',1386868165),('483A93D8:4F25415','tsurma','chem',217300,432,1364568183,1364399293,'file',420,1,1386868
165,0,'
>>>>> @default@',1386868165),('483A93D8:4451F970','tsur... <Line truncated.
>>>>> Real size=60069>
>>>>> 2013/12/12 12:10:35 robinhood@flux-xfer1[25340/12]: ListMgr | Unhandled
>>>>> error 1205: default conversion to DB_REQUEST_FAILED
>>>>>
>>>>>
>>>>> Brock Palen
>>>>>
>>>>> www.umich.edu/~brockp
>>>>>
>>>>> CAEN Advanced Computing
>>>>> XSEDE Campus Champion
>>>>>
>>>>> [email protected]
>>>>>
>>>>> (734)936-1985
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Rapidly troubleshoot problems before they affect your business. Most IT
>>>>> organizations don't have a clear picture of how application performance
>>>>> affects their revenue. With AppDynamics, you get 100% visibility into your
>>>>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
>>>>> Pro!
>>>>>
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> robinhood-support mailing list
>>>>>
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/robinhood-support
>>
------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT
organizations don't have a clear picture of how application performance
affects their revenue. With AppDynamics, you get 100% visibility into your
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support