IIRC, that was a full paste of all the bitcask.write.locks. Riak fails pretty
much immediately while running my test suite, maybe before a lock is opened for
each partition?
My ulimit was set to 256, which is obviously no good. After boosting it to
9000 and running my test suite, I have the locks shown below. Riak is still
running. I guess that makes it an issue with max open files rather than a
write lock issue?
$ lsof -p 53113 | awk '{print $9}'| uniq -c | grep lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1438665674247607560106752257205091097473808596992/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/22835963083295358096932575511191922182123945984/bitcask.write.lock
1 /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/0/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/776422744832042175295707567380525354192214163456/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/730750818665451459101842416358141509827966271488/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/753586781748746817198774991869333432010090217472/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1164634117248063262943561351070788031288321245184/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1187470080331358621040493926581979953470445191168/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1210306043414653979137426502093171875652569137152/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/45671926166590716193865151022383844364247891968/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/91343852333181432387730302044767688728495783936/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/137015778499772148581595453067151533092743675904/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/114179815416476790484662877555959610910619729920/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/182687704666362864775460604089535377456991567872/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/228359630832953580969325755111919221821239459840/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/205523667749658222872393179600727299639115513856/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/639406966332270026714112114313373821099470487552/bitcask.write.lock
1
/usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/616571003248974668617179538802181898917346541568/bitcask.write.lock
On Jul 30, 2010, at 8:34 PM, David Smith wrote:
> That's only a partial paste, correct? How many partitions
> ({ring_creation_size, 64} in your etc/app.config) do you have defined? There
> should be a write lock file open for each partition. Also, what is your
> ulimit -n set to?
>
> Thanks,
>
> D.
>
> On Fri, Jul 30, 2010 at 5:09 PM, Alex Wolfe <[email protected]> wrote:
> $ lsof -p 16129 | awk '{print $9}'| uniq -c | grep lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/913438523331814323877303020447676887284957839360/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/959110449498405040071168171470060731649205731328/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/936274486415109681974235595958868809467081785344/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/411047335499316445744786359201454599278231027712/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/456719261665907161938651510223838443642478919680/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/433883298582611803841718934712646521460354973696/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/388211372416021087647853783690262677096107081728/bitcask.write.lock
>
>
> On Jul 30, 2010, at 6:03 PM, David Smith wrote:
>
> > Yup, that looks like the file handle leak. You can verify by using
> > lsof on the server and looking for multiple handles to
> > bitcask.write.lock. Something like:
> >
> > lsof -p pid | awk '{print $9}'| uniq -c
> >
> > D.
> >
> > On Friday, July 30, 2010, Alex Wolfe <[email protected]> wrote:
> >> Hey David.
> >> Does the below log output look like it could be caused by the issue you
> >> fixed?
> >> Alex
> >>
> >> ==== Fri Jul 30 14:22:34 CDT 2010
> >> =ERROR REPORT==== 30-Jul-2010::14:22:34 ===** State machine <0.176.0>
> >> terminating ** Last event in was {riak_vnode_req_v1,
> >> 593735040165679310520246963290989976735222595584,
> >> {fsm,undefined,<0.12466.0>}, {riak_kv_put_req_v1,
> >> {<<"test.groups">>,<<"EghzXywWrGGtp2fCcSLoatIdjML">>},
> >> {r_object,<<"test.groups">>,
> >> <<"EghzXywWrGGtp2fCcSLoatIdjML">>, [{r_content,
> >> {dict,5,16,16,8,80,48,
> >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> >> {{[],[], [[<<"Links">>]],
> >> [],[],[],[],[],[],[],
> >> [[<<"content-type">>,97,112,112,108,105,99,97,
> >> 116,105,111,110,47,106,115,111,110],
> >> [<<"X-Riak-VTag">>,89,69,78,55,55,111,66,121,73,
> >> 69,78,53,122,101,85,105,117,68,89,80,52]],
> >> [],[], [[<<"X-Riak-Last-Modified">>|
> >> {1280,517754,951062}]], [],
> >> [[<<"X-Riak-Meta">>]]}}},
> >> <<"{\"name\":\"foo\",\"created_at\":\"2010-07-30T19:22:34.947Z\",\"type\":\"group\",\"version\":1}">>}],
> >> [{<<0,55,119,231>>,{1,63447736954}}],
> >> {dict,1,16,16,8,80,48,
> >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> >> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> >> [[clean|true]], []}}},
> >> undefined}, 33218311,63447736954,
> >> [{returnbody,true}]}}** When State == active** Data ==
> >> {state,593735040165679310520246963290989976735222595584,
> >> riak_kv_vnode,
> >> {state,593735040165679310520246963290989976735222595584,
> >> riak_kv_bitcask_backend,
> >> {#Ref<0.0.0.611>,
> >> "data/bitcask/593735040165679310520246963290989976735222595584"},
> >> [],false}, undefined,none}**
> >> Reason for termination = ** {{badmatch,{error,emfile}},
> >> [{bitcask_fileops,create_file_loop,3}, {bitcask,put,3},
> >> {riak_kv_bitcask_backend,put,3}, {riak_kv_vnode,perform_put,3},
> >> {riak_kv_vnode,do_put,7}, {riak_kv_vnode,handle_command,3},
> >> {riak_core_vnode,vnode_command,3}, {gen_fsm,handle_msg,7}]}
> >> =ERROR REPORT==== 30-Jul-2010::14:22:35 ===Failed to open lock file
> >> data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock:
> >> emfile
> >> =ERROR REPORT==== 30-Jul-2010::14:22:35 ===** State machine <0.12471.0>
> >> terminating ** Last event in was {riak_vnode_req_v1,
> >> 593735040165679310520246963290989976735222595584,
> >> {fsm,undefined,<0.12470.0>}, {riak_kv_put_req_v1,
> >> {<<"test.users">>,<<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>},
> >> {r_object,<<"test.users">>,
> >> <<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>, [{r_content,
> >> {dict,5,16,16,8,80,48,
> >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> >> {{[],[], [[<<"Links">>,
> >> {{<<"test.groups">>,
> >> <<"EghzXywWrGGtp2fCcSLoatIdjML">>},
> >> <<"groups">>}]], [],[],[],[],[],[],[],
> >> [[<<"content-type">>,97,112,112,108,105,99,97,
> >> 116,105,111,110,47,106,115,111,110],
> >> [<<"X-Riak-VTag">>,50,88,97,105,85,49,70,49,55,
> >> 54,48,71,113,115,75,103,54,102,84,56,118,84]],
> >> [],[],
> >> [[<<"X-Riak-Last-Modified">>|{1280,517755,638}]],
> >> [], [[<<"X-Riak-Meta">>]]}}},
> >>
> >> <<"{\"name\":\"bar\",\"created_at\":\"2010-07-30T19:22:34.998Z\",\"type\":\"user\",\"version\":1}">>}],
> >> [{<<3,30,180,15>>,{1,63447736955}}],
> >> {dict,1,16,16,8,80,48,
> >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> >> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> >> [[clean|true]], []}}},
> >> undefined}, 122945524,63447736955,
> >> [{returnbody,true}]}}** When State == active** Data ==
> >> {state,593735040165679310520246963290989976735222595584,
> >> riak_kv_vnode,
> >> {state,593735040165679310520246963290989976735222595584,
> >> riak_kv_bitcask_backend,
> >> {#Ref<0.0.0.16357>,
> >> "data/bitcask/593735040165679310520246963290989976735222595584"},
> >> [],false}, undefined,none}**
> >> Reason for termination = **
> >> {bad_return_value,{error,{write_locked,emfile}}}
> >> ==> sasl-error.log <==
> >> =CRASH REPORT==== 30-Jul-2010::14:22:35 === crasher: initial call:
> >> riak_core_vnode:init/1 pid: <0.176.0> registered_name: [] exception
> >> exit: {{badmatch,{error,emfile}},
> >> [{bitcask_fileops,create_file_loop,3},
> >> {bitcask,put,3}, {riak_kv_bitcask_backend,put,3},
> >> {riak_kv_vnode,perform_put,3},
> >> {riak_kv_vnode,do_put,7},
> >> {riak_kv_vnode,handle_command,3},
> >> {riak_core_vnode,vnode_command,3},
> >> {gen_fsm,handle_msg,7}]} in function gen_fsm:terminate/7 ancestors:
> >> [riak_core_vnode_sup,riak_core_sup,<0.98.0>] messages: [] links:
> >> [#Port<0.3507>,<0.100.0>] dictionary: [] trap_exit: true status:
> >> running heap_size: 1597 stack_size: 24 reductions: 11185 neighbours:
> >> =SUPERVISOR REPORT==== 30-Jul-2010::14:22:35 === Supervisor:
> >> {local,riak_core_vnode_sup} Context: child_terminated Reason:
> >> {{badmatch,{error,emfile}},
> >> [{bitcask_fileops,create_file_loop,3}, {bitcask,put,3},
> >> {riak_kv_bitcask_backend,put,3},
> >> {riak_kv_vnode,perform_put,3}, {riak_kv_vnode,do_put,7},
> >> {riak_kv_vnode,handle_command,3},
> >> {riak_core_vnode,vnode_command,3},
> >> {gen_fsm,handle_msg,7}]} Offender: [{pid,<0.176.0>},
> >> {name,undefined}, {mfa,
> >> {riak_core_vnode,start_link, [riak_kv_vnode,
> >> 593735040165679310520246963290989976735222595584]}},
> >> {restart_type,temporary},
> >> {shutdown,brutal_kill}, {child_type,worker}]
> >>
> >> =CRASH REPORT==== 30-Jul-2010::14:22:35 === crasher: initial call:
> >> riak_core_vnode:init/1 pid: <0.12471.0> registered_name: []
> >> exception exit: {bad_return_value,{error,{write_locked,emfile}}} in
> >> function gen_fsm:terminate/7 ancestors:
> >> [riak_core_vnode_sup,riak_core_sup,<0.98.0>] messages: [] links:
> >> [<0.100.0>] dictionary: [] trap_exit: true status: running
> >> heap_size: 2584 stack_size: 24 reductions: 1955 neighbours:
> >> =SUPERVISOR REPORT==== 30-Jul-2010::14:22:35 === Supervisor:
> >> {local,riak_core_vnode_sup} Context: child_terminated Reason:
> >> {bad_return_value,{error,{write_locked,emfile}}} Offender:
> >> [{pid,<0.12471.0>}, {name,undefined},
> >> {mfa, {riak_core_vnode,start_link,
> >> [riak_kv_vnode,
> >> 593735040165679310520246963290989976735222595584]}},
> >> {restart_type,temporary}, {shutdown,brutal_kill},
> >> {child_type,worker}]
> >>
> >> ==> erlang.log.4 <==
> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
> >> terminating ** Last message in was {'EXIT',<0.21020.78>,
> >> {emfile, [{erlang,open_port,
> >> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
> >> [stream]]},
> >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
> >> false,
> >> {1897500000,7776508000}, {<0.81.0>,972008},
> >> false,60000,30000,0.8,0.05,<0.21020.78>,
> >> #Ref<0.0.2.58535>,undefined,
> >> [reg], []}** Reason for termination ==
> >> ** {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd
> >> 2>&1"},[stream]]}, {os,start_port_srv_loop,2}]}
> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
> >> terminating ** Last message in was {'EXIT',<0.21186.78>,
> >> {emfile, [{erlang,open_port,
> >> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
> >> [stream]]},
> >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
> >> false,undefined,undefined,false,60000,30000,
> >> 0.8,0.05,<0.21186.78>,#Ref<0.0.2.58551>,
> >> undefined, [reg],
> >> []}** Reason for termination == **
> >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
> >> {os,start_port_srv_loop,2}]}
> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
> >> terminating ** Last message in was {'EXIT',<0.21252.78>,
> >> {emfile, [{erlang,open_port,
> >> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
> >> [stream]]},
> >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
> >> false,undefined,undefined,false,60000,30000,
> >> 0.8,0.05,<0.21252.78>,#Ref<0.0.2.58559>,
> >> undefined, [reg],
> >> []}** Reason for termination == **
> >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
> >> {os,start_port_srv_loop,2}]}
> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
> >> terminating ** Last message in was {'EXIT',<0.21374.78>,
> >> {emfile, [{erlang,open_port,
> >> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
> >> [stream]]},
> >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
> >> false,undefined,undefined,false,60000,30000,
> >> 0.8,0.05,<0.21374.78>,#Ref<0.0.2.58569>,
> >> undefined, [reg],
> >> []}** Reason for termination == **
> >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
> >> {os,start_port_srv_loop,2}]}
> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
> >> terminating ** Last message in was {'EXIT',<0.21396.78>,
> >> {emfile, [{erlang,open_port,
> >> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
> >> [stream]]},
> >> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
> >> false,undefined,undefined,false,60000,30000,
> >> 0.8,0.05,<0.21396.78>,#Ref<0.0.2.58575>,
> >> undefined, [reg],
> >> []}** Reason for termination == **
> >> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
> >> {os,start_port_srv_loop,2}]}
> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** State machine <0.175.0>
> >> terminating ** Last event in was {riak_vnode_req_v1,
> >> 570899077082383952423314387779798054553098649600,
> >> {fsm,undefined,<0.12470.0>}, {riak_kv_put_req_v1,
> >> {<<"test.users">>,<<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>},
> >> {r_object,<<"test.users">>,
> >> <<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>, [{r_content,
> >> {dict,5,16,16,8,80,48,
> >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> >> {{[],[], [[<<"Links">>,
> >> {{<<"test.groups">>,
> >> <<"EghzXywWrGGtp2fCcSLoatIdjML">>},
> >> <<"groups">>}]], [],[],[],[],[],[],[],
> >> [[<<"content-type">>,97,112,112,108,105,99,97,
> >> 116,105,111,110,47,106,115,111,110],
> >> [<<"X-Riak-VTag">>,50,88,97,105,85,49,70,49,55,
> >> 54,48,71,113,115,75,103,54,102,84,56,118,84]],
> >> [],[],
> >> [[<<"X-Riak-Last-Modified">>|{1280,517755,638}]],
> >> [], [[<<"X-Riak-Meta">>]]}}},
> >>
> >> <<"{\"name\":\"bar\",\"created_at\":\"2010-07-30T19:22:34.998Z\",\"type\":\"user\",\"version\":1}">>}],
> >> [{<<3,30,180,15>>,{1,63447736955}}],
> >> {dict,1,16,16,8,80,48,
> >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
> >> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
> >> [[clean|true]], []}}},
> >> undefined}, 122945524,63447736955,
> >> [{returnbody,true}]}}** When State == active** Data ==
> >> {state,570899077082383952423314387779798054553098649600,
> >> riak_kv_vnode,
> >> {state,570899077082383952423314387779798054553098649600,
>
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com