Re: failover of nimbus server
Sure. I am now merging the official release into my branch.

2014-09-10 13:54 GMT+08:00 Jiang Jacky jiang0...@gmail.com:
> Normally I do not modify libraries directly. It makes merging future official releases complex and can cause unpredictable bugs. A better way is to inherit from the official release's classes in a separate place (this can still cause bugs, but there should be fewer), or to send feedback directly to the author.
Re: failover of nimbus server
As far as I know, that is not the case. You should check on it with a script or some other way.

2014-09-10 0:49 GMT+08:00 Jiang Jacky jiang0...@gmail.com:
> Hi, I read the articles about Nimbus, which say the Nimbus daemon is fail-fast. But I am not sure whether, as in Hadoop, there is a secondary server for failover, so that if the Nimbus server goes down completely, the secondary server can come up. Thanks
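To make the "check it with a script" advice concrete, here is a minimal sketch of one watchdog pass. It is only an illustration, not something posted in this thread: it assumes Nimbus is reachable on Storm's default Thrift port 6627, and the `restart` callable is a hypothetical hook for whatever restart mechanism you use.

```python
import socket

NIMBUS_THRIFT_PORT = 6627  # Storm's default nimbus.thrift.port; adjust if you changed it


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_and_restart(host, port, restart, timeout=3.0):
    """Run one watchdog pass.

    `restart` is a hypothetical caller-supplied function (for example one that
    asks your process supervisor to start Nimbus again). It is called only when
    the port check fails. Returns True if a restart was triggered.
    """
    if port_is_open(host, port, timeout):
        return False
    restart()
    return True
```

A pass like this would typically be run from cron or a loop. Note that it only covers the case where the Nimbus process dies; it does nothing for a failed machine or disk.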
Re: failover of nimbus server
I do not agree, Nathan. If only the Nimbus process goes down, it is fail-fast. But if the machine itself has an error (such as a disk error), this may cause the topologies to be cleared.

2014-09-10 9:39 GMT+08:00 潘臻轩 zhenxuan...@gmail.com:
> As far as I know, that is not the case. You should check on it with a script or some other way.
Re: failover of nimbus server
Yes, that's a problem area, and we have been discussing internally how we can handle it better. We are considering moving to an HDFS-based solution where Nimbus uploads the jars to HDFS instead of local disk (as that is a single point of failure) and supervisors download the jars from HDFS as well.

The other problem we ran into was NIC saturation on the Nimbus host, since too many machines were copying the jars (180 MB in size) to worker machines, which increased the total time. By moving to an HDFS-based solution we can do this more effectively and faster, and it scales better.

We do not have a working prototype for it, but it is something we are actively pursuing.

Ankit

On Tue, Sep 9, 2014 at 6:43 PM, 潘臻轩 zhenxuan...@gmail.com wrote:
> I do not agree, Nathan. If only the Nimbus process goes down, it is fail-fast. But if the machine itself has an error (such as a disk error), this may cause the topologies to be cleared.
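A back-of-the-envelope estimate shows why the NIC on the Nimbus host becomes the bottleneck. Only the 180 MB jar size comes from the message above; the machine count and the 1 Gbit/s link speed below are assumptions picked for illustration, and protocol overhead is ignored.

```python
def single_link_copy_seconds(jar_mb, machines, link_gbps):
    """Lower bound on the time for one host to send a jar to every machine.

    All copies share the sender's single network link, so the total volume is
    jar_mb * machines. Uses 1 MB = 8 megabits and 1 Gbit/s = 1000 megabits/s.
    """
    total_megabits = jar_mb * 8 * machines
    return total_megabits / (link_gbps * 1000)


# 180 MB jar (from the thread), 100 machines and 1 Gbit/s (both assumed).
print(single_link_copy_seconds(180, 100, 1.0))  # prints 144.0
```

The time grows linearly with the number of machines because every byte leaves through the same link. Serving the jar from HDFS would let the downloads be spread over the datanodes holding replicas instead.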
Re: failover of nimbus server
Yes, I have implemented it this way, and it works fine in practice. I implemented a complete HA solution for Nimbus, and our team wrote a complete scheduler for Storm (along the lines of YARN, to support a 700+ cluster).

2014-09-10 10:02 GMT+08:00 Ankit Toshniwal ankitoshni...@gmail.com:
> Yes, that's a problem area, and we have been discussing internally how we can handle it better. We are considering moving to an HDFS-based solution where Nimbus uploads the jars to HDFS instead of local disk (as that is a single point of failure) and supervisors download the jars from HDFS as well.
>
> The other problem we ran into was NIC saturation on the Nimbus host, since too many machines were copying the jars (180 MB in size) to worker machines, which increased the total time. By moving to an HDFS-based solution we can do this more effectively and faster, and it scales better.
>
> We do not have a working prototype for it, but it is something we are actively pursuing.
>
> Ankit
Re: failover of nimbus server
The best solution would be to let us list multiple Nimbus servers in storm.yaml for failover. That would also be easy to configure.

2014-09-09 22:19 GMT-04:00 潘臻轩 zhenxuan...@gmail.com:
> Yes, I have implemented it this way, and it works fine in practice. I implemented a complete HA solution for Nimbus, and our team wrote a complete scheduler for Storm (along the lines of YARN, to support a 700+ cluster).
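The idea of listing several Nimbus servers can be sketched from the client side as "try each configured host in order and use the first one that answers". This is a hypothetical illustration of the proposal, not an existing Storm API: the host names, the `pick_nimbus` helper and the `is_alive` probe are all made up for the example.

```python
def pick_nimbus(candidates, is_alive):
    """Return the first host in `candidates` for which is_alive(host) is true.

    `candidates` is the ordered list of Nimbus hosts that would be read from
    the configuration; `is_alive` is a caller-supplied probe (for example a
    TCP connect to the Thrift port). Returns None if no host responds.
    """
    for host in candidates:
        if is_alive(host):
            return host
    return None


# Hypothetical host names; pretend the first Nimbus is down.
hosts = ["nimbus1.example.com", "nimbus2.example.com"]
print(pick_nimbus(hosts, lambda h: h != "nimbus1.example.com"))
# prints nimbus2.example.com
```

Choosing a reachable host is the easy part. A standby Nimbus also needs access to the jars and topology state, which is why the thread keeps coming back to shared storage such as HDFS.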
Re: failover of nimbus server
I am also interested in how you connected Storm to HDFS. Did you modify the Storm library? Could you roughly describe the steps? Thanks

2014-09-10 0:16 GMT-04:00 Jiang Jacky jiang0...@gmail.com:
> The best solution would be to let us list multiple Nimbus servers in storm.yaml for failover. That would also be easy to configure.
Re: failover of nimbus server
I modified a lot of the Storm code, and I maintain my own branch for development. I changed the writing of the jar/conf/topology files from the local file system to HDFS.

2014-09-10 12:30 GMT+08:00 Jiang Jacky jiang0...@gmail.com:
> I am also interested in how you connected Storm to HDFS. Did you modify the Storm library? Could you roughly describe the steps? Thanks
Re: failover of nimbus server
Normally I do not modify libraries directly. It makes merging future official releases complex and can cause unpredictable bugs. A better way is to inherit from the official release's classes in a separate place (this can still cause bugs, but there should be fewer), or to send feedback directly to the author.

2014-09-10 0:59 GMT-04:00 潘臻轩 zhenxuan...@gmail.com:
> I modified a lot of the Storm code, and I maintain my own branch for development. I changed the writing of the jar/conf/topology files from the local file system to HDFS.