Hi Yiqun,

Thanks for taking a look.

> Can the container data be read by the client when the container's node is
> in the DECOMMISSIONING/DECOMMISSIONED state? If the container cannot be
> accessed, containers can be lost in a short time when multiple nodes are
> decommissioning.

There is no limitation on the DN side for this. I need to check the SCM read
path to ensure that nodes which are DECOMMISSIONING or ENTERING_MAINTENANCE
are still returned when OM requests the block locations. I agree this is
important, and we need to ensure these nodes can still be read.

> Do we have rate limiting for node decommission?

Not at the moment. I feel this is something we should control in the
Replication Manager (RM) rather than in decommissioning. We have already seen
issues with RM where too many in-flight replication commands are sent to the
DNs, which cannot complete them in time, and then more get scheduled, and so
on. Each DN has a replication limit, so I think we need to enhance RM to hold
back the commands until the DNs have capacity to service them. We may also
want to give priority to containers that are under-replicated due to a dead
node over containers that are being decommissioned.

> For the above command usage, will we support passing the nodes in via a
> node list file? That would be useful for admin users of this feature.

That is certainly something that can be added, and I would see it as one of
the "usability enhancements" I mentioned. We could create a new epic Jira for
"post branch merge enhancements" and start collecting these suggestions
there.

Thanks,

Stephen.

On Tue, Oct 27, 2020 at 7:09 AM Lin, Yiqun <[email protected]> wrote:

> Hi Stephen,
>
> I haven't reviewed much of the decommission feature code, but I have had a
> look at the overview doc you attached.
>
> Just some questions and comments from me:
>
> * Can the container data be read by the client when the container's node
> is in the DECOMMISSIONING/DECOMMISSIONED state?
> If the container cannot be accessed, containers can be lost in a short
> time when multiple nodes are decommissioning.
> * Do we have rate limiting for node decommission? If a large number of
> nodes are decommissioned concurrently, lots of closed containers will be
> in replication, and I think this can impact the performance of SCM.
>
> Minor suggestion:
> ozone admin datanode decommission <list of nodes to remove>
> ozone admin datanode maintenance <list of nodes to put to maintenance>
> ozone admin datanode recommission <list of nodes to recommission>
>
> For the above command usage, will we support passing the nodes in via a
> node list file? That would be useful for admin users of this feature.
>
> Thanks,
> Yiqun
>
> On 2020/10/27, 2:09 AM, "Stephen O'Donnell" <[email protected]>
> wrote:
>
> External Email
>
> Someone reported that the attachment did not come through - perhaps the
> mailing list strips out attachments?
>
> I have attached it to the HDDS-1880 Jira - here is the direct link:
>
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fsecure%2Fattachment%2F13014144%2FDecommission%2520and%2520Maintenance%2520Overview.pdf&data=04%7C01%7Cyiqlin%40ebay.com%7Cdee6f8e2c0394a384c7108d879da576f%7C46326bff992841a0baca17c16c94ea99%7C0%7C1%7C637393325964258052%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=YMi7AhzcN7XceFeC8ZRckPnsiJ2eMYjd34TpImIm0kM%3D&reserved=0
>
> Thanks,
>
> Stephen.
>
> On Mon, Oct 26, 2020 at 5:47 PM Stephen O'Donnell <[email protected]>
> wrote:
>
> > Hi All,
> >
> > I am pleased to announce the Datanode Decommission and Maintenance
> > feature for Ozone -
> > https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FHDDS-1880&data=04%7C01%7Cyiqlin%40ebay.com%7Cdee6f8e2c0394a384c7108d879da576f%7C46326bff992841a0baca17c16c94ea99%7C0%7C1%7C637393325964258052%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3F%2Fwmrrh72uNAGkgv7k7OGi%2BwDxi24JpmkocMNY1LQU%3D&reserved=0
> >
> > The feature is working in Integration tests and also via docker-compose.
> > There is still some work to improve monitoring and usability, but I
> > believe the feature is now complete enough to merge into master and
> > continue development there.
> >
> > I would like to use this thread to discuss the feature and agree on
> > whether we can merge it into master. To help with the discussion, I have
> > attached a short document describing the major changes.
> >
> > The decommission changes are all on the branch HDDS-1880-Decom.
> >
> > Please reply here with any questions and comments.
> >
> > Thanks,
> >
> > Stephen.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
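
For anyone following the rate-limiting part of this thread: the idea of having the Replication Manager hold back commands until a datanode has capacity, and of serving containers under-replicated by a dead node ahead of decommissioning ones, could look roughly like the sketch below. This is only an illustration under stated assumptions. The class ThrottledReplicationQueue, its nested Task and Reason types, and the perDnLimit parameter are all hypothetical names and are not taken from the Ozone code base or the HDDS-1880-Decom branch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not Ozone code: holds replication tasks back until
// the target datanode has a free slot, and serves dead-node work first.
class ThrottledReplicationQueue {

    // Why a container needs a new replica; a lower ordinal is more urgent.
    enum Reason { DEAD_NODE, DECOMMISSION }

    // One pending replication: copy a container onto a target datanode.
    static final class Task {
        final long containerId;
        final String targetDn;
        final Reason reason;

        Task(long containerId, String targetDn, Reason reason) {
            this.containerId = containerId;
            this.targetDn = targetDn;
            this.reason = reason;
        }
    }

    private final int perDnLimit; // assumed per-datanode in-flight limit
    private final Map<String, Integer> inFlight = new HashMap<>();
    private final PriorityQueue<Task> pending = new PriorityQueue<>(
        Comparator.comparing((Task t) -> t.reason)
            .thenComparingLong(t -> t.containerId));

    ThrottledReplicationQueue(int perDnLimit) {
        this.perDnLimit = perDnLimit;
    }

    void submit(Task task) {
        pending.add(task);
    }

    // Returns the tasks that can be sent now; tasks whose target datanode
    // is already at its limit stay queued for a later call.
    List<Task> dispatch() {
        List<Task> sent = new ArrayList<>();
        List<Task> held = new ArrayList<>();
        Task task;
        while ((task = pending.poll()) != null) {
            int used = inFlight.getOrDefault(task.targetDn, 0);
            if (used < perDnLimit) {
                inFlight.put(task.targetDn, used + 1);
                sent.add(task);
            } else {
                held.add(task);
            }
        }
        pending.addAll(held);
        return sent;
    }

    // Frees the slot on the target datanode once a replication completes.
    void complete(Task task) {
        inFlight.computeIfPresent(task.targetDn,
            (dn, n) -> n > 1 ? n - 1 : null);
    }

    public static void main(String[] args) {
        ThrottledReplicationQueue queue = new ThrottledReplicationQueue(1);
        queue.submit(new Task(1L, "dn1", Reason.DECOMMISSION));
        queue.submit(new Task(2L, "dn1", Reason.DEAD_NODE));
        for (Task t : queue.dispatch()) {
            System.out.println(t.containerId + " -> " + t.targetDn
                + " (" + t.reason + ")");
        }
    }
}
```

With a limit of one in-flight command per datanode, the first dispatch() call in main sends only the dead-node task for container 2, and the decommission task for container 1 stays queued until complete() is called. A real implementation would also need command timeouts and limits on the source datanode, which this sketch leaves out.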
