On Sun, Aug 1, 2010 at 11:40 AM, Alex Wolfe <[email protected]> wrote:
> IIRC, that was a full paste of all the bitcask.write.locks. Riak fails
> pretty much immediately while running my test suite, maybe before a lock is
> opened for each partition?
>
If that was a full paste, yes, you weren't even getting the whole system
spun up. You should have one .write.lock open per partition.
> My ulimit was set to 256, which is obviously no good. After boosting it to
> 9000 and running my test suite, I have the locks shown below. Riak is still
> running. I guess that makes it an issue with max open files rather than a
> write lock issue?
>
Yup. But, leave it running long enough with enough traffic and the write
lock bug will eventually show up. :)
D.
>
> $ lsof -p 53113 | awk '{print $9}'| uniq -c | grep lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1438665674247607560106752257205091097473808596992/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/22835963083295358096932575511191922182123945984/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/0/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/776422744832042175295707567380525354192214163456/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/730750818665451459101842416358141509827966271488/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/753586781748746817198774991869333432010090217472/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1164634117248063262943561351070788031288321245184/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1187470080331358621040493926581979953470445191168/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/1210306043414653979137426502093171875652569137152/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/45671926166590716193865151022383844364247891968/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/91343852333181432387730302044767688728495783936/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/137015778499772148581595453067151533092743675904/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/114179815416476790484662877555959610910619729920/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/182687704666362864775460604089535377456991567872/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/228359630832953580969325755111919221821239459840/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/205523667749658222872393179600727299639115513856/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/639406966332270026714112114313373821099470487552/bitcask.write.lock
> 1
> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/616571003248974668617179538802181898917346541568/bitcask.write.lock
>
>
> On Jul 30, 2010, at 8:34 PM, David Smith wrote:
>
> That's only a partial paste, correct? How many partitions
> ({ring_creation_size, 64} in your etc/app.config) do you have defined? There
> should be a write lock file open for each partition. Also, what is your
> ulimit -n set to?
>
> Thanks,
>
> D.
>
> On Fri, Jul 30, 2010 at 5:09 PM, Alex Wolfe <[email protected]>wrote:
>
>> $ lsof -p 16129 | awk '{print $9}'| uniq -c | grep lock
>> 1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/913438523331814323877303020447676887284957839360/bitcask.write.lock
>> 1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/959110449498405040071168171470060731649205731328/bitcask.write.lock
>> 1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/936274486415109681974235595958868809467081785344/bitcask.write.lock
>> 1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/411047335499316445744786359201454599278231027712/bitcask.write.lock
>> 1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/456719261665907161938651510223838443642478919680/bitcask.write.lock
>> 1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/433883298582611803841718934712646521460354973696/bitcask.write.lock
>> 1
>> /usr/local/Cellar/riak/0.12.0/libexec/data/bitcask/388211372416021087647853783690262677096107081728/bitcask.write.lock
>>
>>
>> On Jul 30, 2010, at 6:03 PM, David Smith wrote:
>>
>> > Yup, that looks like the file handle leak. You can verify by using
>> > lsof on the server and looking for multiple handles to
>> > bitcask.write.lock. Something like:
>> >
>> > lsof -p pid | awk '{print $9}'| uniq -c
>> >
>> > D.
>> >
>> > On Friday, July 30, 2010, Alex Wolfe <[email protected]> wrote:
>> >> Hey David.
>> >> Does the below log output look like it could be caused by the issue you
>> fixed?
>> >> Alex
>> >>
>> >> ==== Fri Jul 30 14:22:34 CDT 2010
>> >> =ERROR REPORT==== 30-Jul-2010::14:22:34 ===** State machine <0.176.0>
>> terminating ** Last event in was {riak_vnode_req_v1,
>> 593735040165679310520246963290989976735222595584,
>> {fsm,undefined,<0.12466.0>}, {riak_kv_put_req_v1,
>> {<<"test.groups">>,<<"EghzXywWrGGtp2fCcSLoatIdjML">>},
>> {r_object,<<"test.groups">>,
>> <<"EghzXywWrGGtp2fCcSLoatIdjML">>, [{r_content,
>> {dict,5,16,16,8,80,48,
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[], [[<<"Links">>]],
>> [],[],[],[],[],[],[],
>> [[<<"content-type">>,97,112,112,108,105,99,97,
>> 116,105,111,110,47,106,115,111,110],
>> [<<"X-Riak-VTag">>,89,69,78,55,55,111,66,121,73,
>> 69,78,53,122,101,85,105,117,68,89,80,52]],
>> [],[], [[<<"X-Riak-Last-Modified">>|
>> {1280,517754,951062}]], [],
>> [[<<"X-Riak-Meta">>]]}}},
>> <<"{\"name\":\"foo\",\"created_at\":\"2010-07-30T19:22:34.947Z\",\"type\":\"group\",\"version\":1}">>}],
>> [{<<0,55,119,231>>,{1,63447736954}}],
>> {dict,1,16,16,8,80,48,
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> [[clean|true]], []}}},
>> undefined}, 33218311,63447736954,
>> [{returnbody,true}]}}** When State == active** Data ==
>> {state,593735040165679310520246963290989976735222595584,
>> riak_kv_vnode,
>> {state,593735040165679310520246963290989976735222595584,
>> riak_kv_bitcask_backend,
>> {#Ref<0.0.0.611>,
>> "data/bitcask/593735040165679310520246963290989976735222595584"},
>> [],false}, undefined,none}** Reason
>> for termination = ** {{badmatch,{error,emfile}},
>> [{bitcask_fileops,create_file_loop,3}, {bitcask,put,3},
>> {riak_kv_bitcask_backend,put,3}, {riak_kv_vnode,perform_put,3},
>> {riak_kv_vnode,do_put,7}, {riak_kv_vnode,handle_command,3},
>> {riak_core_vnode,vnode_command,3}, {gen_fsm,handle_msg,7}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:22:35 ===Failed to open lock file
>> data/bitcask/593735040165679310520246963290989976735222595584/bitcask.write.lock:
>> emfile
>> >> =ERROR REPORT==== 30-Jul-2010::14:22:35 ===** State machine <0.12471.0>
>> terminating ** Last event in was {riak_vnode_req_v1,
>> 593735040165679310520246963290989976735222595584,
>> {fsm,undefined,<0.12470.0>}, {riak_kv_put_req_v1,
>> {<<"test.users">>,<<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>},
>> {r_object,<<"test.users">>,
>> <<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>, [{r_content,
>> {dict,5,16,16,8,80,48,
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[], [[<<"Links">>,
>> {{<<"test.groups">>,
>> <<"EghzXywWrGGtp2fCcSLoatIdjML">>},
>> <<"groups">>}]], [],[],[],[],[],[],[],
>> [[<<"content-type">>,97,112,112,108,105,99,97,
>> 116,105,111,110,47,106,115,111,110],
>> [<<"X-Riak-VTag">>,50,88,97,105,85,49,70,49,55,
>> 54,48,71,113,115,75,103,54,102,84,56,118,84]],
>> [],[],
>> [[<<"X-Riak-Last-Modified">>|{1280,517755,638}]],
>> [], [[<<"X-Riak-Meta">>]]}}},
>>
>> <<"{\"name\":\"bar\",\"created_at\":\"2010-07-30T19:22:34.998Z\",\"type\":\"user\",\"version\":1}">>}],
>> [{<<3,30,180,15>>,{1,63447736955}}],
>> {dict,1,16,16,8,80,48,
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> [[clean|true]], []}}},
>> undefined}, 122945524,63447736955,
>> [{returnbody,true}]}}** When State == active** Data ==
>> {state,593735040165679310520246963290989976735222595584,
>> riak_kv_vnode,
>> {state,593735040165679310520246963290989976735222595584,
>> riak_kv_bitcask_backend,
>> {#Ref<0.0.0.16357>,
>> "data/bitcask/593735040165679310520246963290989976735222595584"},
>> [],false}, undefined,none}** Reason
>> for termination = ** {bad_return_value,{error,{write_locked,emfile}}}
>> >> ==> sasl-error.log <==
>> >> =CRASH REPORT==== 30-Jul-2010::14:22:35 === crasher: initial call:
>> riak_core_vnode:init/1 pid: <0.176.0> registered_name: [] exception
>> exit: {{badmatch,{error,emfile}},
>> [{bitcask_fileops,create_file_loop,3}, {bitcask,put,3},
>> {riak_kv_bitcask_backend,put,3},
>> {riak_kv_vnode,perform_put,3}, {riak_kv_vnode,do_put,7},
>> {riak_kv_vnode,handle_command,3},
>> {riak_core_vnode,vnode_command,3},
>> {gen_fsm,handle_msg,7}]} in function gen_fsm:terminate/7 ancestors:
>> [riak_core_vnode_sup,riak_core_sup,<0.98.0>] messages: [] links:
>> [#Port<0.3507>,<0.100.0>] dictionary: [] trap_exit: true status:
>> running heap_size: 1597 stack_size: 24 reductions: 11185 neighbours:
>> >> =SUPERVISOR REPORT==== 30-Jul-2010::14:22:35 === Supervisor:
>> {local,riak_core_vnode_sup} Context: child_terminated Reason:
>> {{badmatch,{error,emfile}},
>> [{bitcask_fileops,create_file_loop,3}, {bitcask,put,3},
>> {riak_kv_bitcask_backend,put,3},
>> {riak_kv_vnode,perform_put,3}, {riak_kv_vnode,do_put,7},
>> {riak_kv_vnode,handle_command,3},
>> {riak_core_vnode,vnode_command,3},
>> {gen_fsm,handle_msg,7}]} Offender: [{pid,<0.176.0>},
>> {name,undefined}, {mfa,
>> {riak_core_vnode,start_link, [riak_kv_vnode,
>> 593735040165679310520246963290989976735222595584]}},
>> {restart_type,temporary}, {shutdown,brutal_kill},
>> {child_type,worker}]
>> >>
>> >> =CRASH REPORT==== 30-Jul-2010::14:22:35 === crasher: initial call:
>> riak_core_vnode:init/1 pid: <0.12471.0> registered_name: [] exception
>> exit: {bad_return_value,{error,{write_locked,emfile}}} in function
>> gen_fsm:terminate/7 ancestors:
>> [riak_core_vnode_sup,riak_core_sup,<0.98.0>] messages: [] links:
>> [<0.100.0>] dictionary: [] trap_exit: true status: running
>> heap_size: 2584 stack_size: 24 reductions: 1955 neighbours:
>> >> =SUPERVISOR REPORT==== 30-Jul-2010::14:22:35 === Supervisor:
>> {local,riak_core_vnode_sup} Context: child_terminated Reason:
>> {bad_return_value,{error,{write_locked,emfile}}} Offender:
>> [{pid,<0.12471.0>}, {name,undefined}, {mfa,
>> {riak_core_vnode,start_link,
>> [riak_kv_vnode,
>> 593735040165679310520246963290989976735222595584]}},
>> {restart_type,temporary}, {shutdown,brutal_kill},
>> {child_type,worker}]
>> >>
>> >> ==> erlang.log.4 <==
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21020.78>,
>> {emfile, [{erlang,open_port,
>> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>> [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>> false,
>> {1897500000,7776508000}, {<0.81.0>,972008},
>> false,60000,30000,0.8,0.05,<0.21020.78>,
>> #Ref<0.0.2.58535>,undefined,
>> [reg], []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>> {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21186.78>,
>> {emfile, [{erlang,open_port,
>> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>> [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>> false,undefined,undefined,false,60000,30000,
>> 0.8,0.05,<0.21186.78>,#Ref<0.0.2.58551>,
>> undefined, [reg],
>> []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>> {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21252.78>,
>> {emfile, [{erlang,open_port,
>> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>> [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>> false,undefined,undefined,false,60000,30000,
>> 0.8,0.05,<0.21252.78>,#Ref<0.0.2.58559>,
>> undefined, [reg],
>> []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>> {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21374.78>,
>> {emfile, [{erlang,open_port,
>> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>> [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>> false,undefined,undefined,false,60000,30000,
>> 0.8,0.05,<0.21374.78>,#Ref<0.0.2.58569>,
>> undefined, [reg],
>> []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>> {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** Generic server memsup
>> terminating ** Last message in was {'EXIT',<0.21396.78>,
>> {emfile, [{erlang,open_port,
>> [{spawn,"/bin/sh -s unix:cmd 2>&1"},
>> [stream]]},
>> {os,start_port_srv_loop,2}]}}** When Server state == {state,{unix,darwin},
>> false,undefined,undefined,false,60000,30000,
>> 0.8,0.05,<0.21396.78>,#Ref<0.0.2.58575>,
>> undefined, [reg],
>> []}** Reason for termination == **
>> {emfile,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd 2>&1"},[stream]]},
>> {os,start_port_srv_loop,2}]}
>> >> =ERROR REPORT==== 30-Jul-2010::14:23:27 ===** State machine <0.175.0>
>> terminating ** Last event in was {riak_vnode_req_v1,
>> 570899077082383952423314387779798054553098649600,
>> {fsm,undefined,<0.12470.0>}, {riak_kv_put_req_v1,
>> {<<"test.users">>,<<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>},
>> {r_object,<<"test.users">>,
>> <<"ZrAxzFghd51VG902GuCcJ2gYOMJ">>, [{r_content,
>> {dict,5,16,16,8,80,48,
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[], [[<<"Links">>,
>> {{<<"test.groups">>,
>> <<"EghzXywWrGGtp2fCcSLoatIdjML">>},
>> <<"groups">>}]], [],[],[],[],[],[],[],
>> [[<<"content-type">>,97,112,112,108,105,99,97,
>> 116,105,111,110,47,106,115,111,110],
>> [<<"X-Riak-VTag">>,50,88,97,105,85,49,70,49,55,
>> 54,48,71,113,115,75,103,54,102,84,56,118,84]],
>> [],[],
>> [[<<"X-Riak-Last-Modified">>|{1280,517755,638}]],
>> [], [[<<"X-Riak-Meta">>]]}}},
>>
>> <<"{\"name\":\"bar\",\"created_at\":\"2010-07-30T19:22:34.998Z\",\"type\":\"user\",\"version\":1}">>}],
>> [{<<3,30,180,15>>,{1,63447736955}}],
>> {dict,1,16,16,8,80,48,
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>> [[clean|true]], []}}},
>> undefined}, 122945524,63447736955,
>> [{returnbody,true}]}}** When State == active** Data ==
>> {state,570899077082383952423314387779798054553098649600,
>> riak_kv_vnode,
>> {state,570899077082383952423314387779798054553098649600,
>>
>>
>
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com