Hello:
I have developed a set of tools to simplify operation and maintenance for DS. I maintain TiDB and CM/CDH clusters daily, and that gave me some inspiration. The program is written with Ansible. My current idea follows a DevOps approach and makes no changes to the architecture.

Features:
Fully automatic installation of all roles (JDK, DB, ZK, big-data related, DS related, log collection).
Online or offline installation. (For offline installation, manually download the package I prepared and put it in the specified directory; no verification is needed.)
Start and stop of the various roles.
Remote modification of role configuration.
Integration with big data components.
Cluster rolling upgrade and single-role upgrade.
Cluster expansion and reduction (for example, switching from a cluster to a single node).
Log collection based on GaryLog.
Cluster destruction.

Adjustment:
Split the configuration files into two categories, main and other. The main ones are what is needed to run; the others are used for tuning. (No configuration items are added or deleted; users are only told which settings take priority.)
Ignore any previous installation.
Use the native ZK method instead of Kazoo.
Added original patches for the start and stop scripts.

Advantage:
The previous installation method only installs, and the user cannot get a correct error prompt during installation or other operations. This program automatically avoids various minor problems during installation and tells the user what the error is.
It prepares for future rolling upgrades.
When a user needs to troubleshoot or upgrade, they can describe their cluster status simply and comprehensively.

This is what I have in mind so far. Already implemented: [Automatic installation 70%] [Remote role configuration modification 100%] [Start and stop, various roles 100%].

The program is at https://github.com/feloxx/ds-yibasuo. Criticism and fixes are welcome.

chendapao

On 12/11/2019 20:26, leon bao wrote:

As an open source project, I think it's important to stay open and extensible.
Our current alert has few shortcomings other than scanning the database with a single thread. At the same time, keeping the alert service as a separate module provides better scalability. So I don't think it is necessary to merge alert into other modules when there is not much benefit.

guo jiwei wrote on Wed, Dec 11, 2019 at 7:33 PM:

To xiaochun: That is not a good approach. An alert must be triggered by whoever schedules the task; in DS, that is the MasterServer, because only the scheduler can know the task status in time. If a task times out, triggering the timeout alert promptly is very important for users. This is why we have to move alert into the server module. The alert implementation will be refactored in the future so that it no longer scans the db.

On Wed, Dec 11, 2019 at 7:26 PM Xiaochun Liu wrote:

To guo jiwei: Why not put it together with the api server? The alert server's function is very small, and its load will not be very high. If logs are stored together in the future, we can combine the alert server, log server, and api server; together these could be called the management server.

Best Regards
---------------
DolphinScheduler(Incubator) Committer
Xiaochun Liu 刘小春
---------------

On Dec 11, 2019, at 7:15 PM, guo jiwei wrote:

To ligang: That's right. But the alert server is only a small function, and we define it as an individual module and as a server. Do you think alert is expensive or takes a lot of resources? If not, why a separate module? Also, the alert server triggers task events by scanning the db; do you think that is a good way? Moving it into the server module is only our first step toward simplifying user deployment. Extensions to alert can be rolled out by redeploying the server, and that is not a frequent operation. As the architecture changes, the alert implementation will change.

On Wed, Dec 11, 2019 at 6:23 PM 李 岗 wrote:

Looking at it from another angle: Master and Worker are key services, and I think they should not be redeployed during normal execution.
If tasks are still running, redeploying the master and worker may lead to missed timed tasks.

________________________________
DolphinScheduler(Incubator) PPMC
Gang Li 李岗

From: guo jiwei
Date: 2019-12-11 18:11
To: dev
Subject: Re: Aproposal for DolphinScheduler Simplified Deployment

To ligang: Redeploying is simple, but what about the latency of alerts? It's easy to redeploy the master server to update alert.

On Wed, Dec 11, 2019 at 6:03 PM 李 岗 wrote:

I think the alert module can be retained. Currently it only supports email and WeChat, but more alarm modes can be added in the future. At present, alert is an independent service. The alert service only consumes alarm information from the database; other services produce that alarm information. If a new alarm mode is added, only the alert service needs to be redeployed.

________________________________
DolphinScheduler(Incubator) PPMC
Gang Li 李岗

From: qiao zhanwei
Date: 2019-12-10 14:24
To: dev
Subject: Aproposal for DolphinScheduler Simplified Deployment

Hello All,

DolphinScheduler now has a lot of configuration files, for example:

dolphinscheduler-alert : alert.properties
dolphinscheduler-api : application-api.properties application-combined.properties
dolphinscheduler-common : hadoop.properties common.properties quartz.properties zookeeper.properties
dolphinscheduler-dao : application-dao.properties
dolphinscheduler-server : application-master.properties application-master.properties master.properties worker.properties .dolphinscheduler_env.sh

Can we simplify deployment?
Main points:
1 simplify and merge the configuration files
2 remove the port from the master server
3 support offline installation; remove the kazoo dependency from install and monitor
4 simplify the install.sh script

---------------
DolphinScheduler(Incubator) PPMC
Zhanwei Qiao 乔占卫

--
DolphinScheduler(Incubator) PPMC
BaoLiang 鲍亮
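
Point 1 of the original proposal (merging the configuration files) could look roughly like the sketch below. It is only an illustration under stated assumptions: the files are treated as plain key=value .properties text, later files override earlier ones, and the function names are hypothetical and not part of DolphinScheduler. Real .properties files also allow ':' separators and line continuations, which this sketch ignores.

```python
def parse_properties(text):
    """Parse simple key=value lines; skip blank lines and '#' or '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are ignored in this sketch
            props[key.strip()] = value.strip()
    return props


def merge_properties(texts):
    """Merge several properties texts in order; later texts override earlier keys."""
    merged = {}
    for text in texts:
        merged.update(parse_properties(text))
    return merged


def render_properties(props):
    """Render a merged mapping back into sorted key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))
```

Passing the contents of the per-module files in a fixed order to merge_properties and writing the result of render_properties to one file would give a single merged configuration; which file should win on a conflicting key is a design decision the thread does not settle.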
