Thanks to Wangda's help, I was able to retrieve the recording of this session.
Please feel free to download the recording at:
https://cloudera.zoom.us/rec/share/7MF_dLX0339OY5391xvkZP8NLrXieaa8gyZK-fYJnUkGOUUXvaUh5cl_6AVYetQl

Non-Mandarin speakers, please send me your feedback on this session. I
served as the translator this time, and I need your feedback to improve
next time.

On Fri, Jan 3, 2020 at 10:01 PM Wei-Chiu Chuang <weic...@apache.org> wrote:

> Hi, it was a well-attended session with more than 40 attendees!
> Thanks to Fei Hui for giving us such a great talk.
>
> Here's the summary for your reference.
>
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>
> 01/02/2020 Didi talked about their large-scale HDFS cluster upgrade
> experience.
>
> Slides:
> https://drive.google.com/open?id=1iwJ1asalYfgnOCBuE-RfeG-NpSocjIcy
>
> Didi studied two upgrade approaches from the community documentation:
> express upgrade and rolling upgrade. Rolling upgrade was selected.
>
> The upgrade involved the HDFS server side only. Clients are still on
> Hadoop 2.7 because applications such as Hive and Spark do not support
> Hadoop 3 yet.
>
> ZooKeeper was not upgraded.
>
> Didi practiced upgrade + downgrade more than 10 times before doing it for
> real.
>
> Didi’s largest cluster has 5 federated namespaces and 10+ thousand nodes.
> The upgrade took a month: JournalNodes took 1 week, NameNodes 2 weeks,
> and DataNodes 1 week.
>
> During an upgrade, HDFS does not clean up trash. Because the upgrade
> window was a month long, the trash became a concern: it could exhaust all
> available space. Didi has a (script?) to clean trash daily.
>
> A problem was encountered which may not be related to the upgrade:
> clients were occasionally unable to close files. Solution: reviewed the
> DataNode log and found that blocks were not reported in time because
> block deletion took too long.
>
> Two parameters were changed to address the issue:
>
> Increase dfs.client.block.write.locateFollowingBlock.retries and
>
> Reduce dfs.block.invalidate.limit (from the default 1000 to 500)
>
> Didi believes the new upstream change HDFS-14997 can alleviate this issue.
>
> Timeline:
>
> May 2019: verified the plan is good.
>
> July: trial run with a 100-node cluster; rolling upgrade completed
> successfully.
>
> Oct: 300+ node cluster rolling upgrade completed.
>
> Nov: 10-thousand node cluster rolling upgrade completed.
>
> Offline test
>
> Ran the full Spark, Hive and Hadoop test sets and verified that the
> upgrade/downgrade has no impact.
>
> Reviewed the 4000+ patches between Hadoop 2.7 and 3.2 to make sure
> there are no incompatible changes.
>
> Authored 40+ internal wikis to document the process.
>
> Future:
>
> Didi is interested in Ozone to address the small-file problem.
>
> Want to incorporate the Consistent Read from Standby feature to increase
> NameNode RPC performance.
>
> Finally, DataNode upgrade is hard. Will look into HDFS Maintenance Mode to
> make this easier in the future.
>
> This is an HDFS-only upgrade. A YARN upgrade is planned for the second
> half of 2020. Since the main purpose is to use EC to reduce space usage,
> Didi ported the EC client-side code to Hadoop 2.7 clients, and these
> clients can read/write EC blocks!
>
>
> On Wed, Jan 1, 2020 at 7:42 PM Wei-Chiu Chuang <weic...@apache.org> wrote:
>
>> Hi,
>> This is a gentle reminder for tomorrow's online meetup. Fei Hui from DiDi
>> is going to give a presentation about DiDi's Hadoop 2 -> Hadoop 3 upgrade
>> experience.
>>
>> We will extend this session to 1 hour. Fei will speak in Mandarin and I
>> will help translate, so non-Mandarin speakers, feel free to join!
>>
>> Time/Date:
>> Jan 1 10PM (US west coast PST) / Jan 2 2pm (Beijing, China CST) / Jan 2
>> 11:30am (India, IST) / Jan 2 3pm (Tokyo, Japan, JST)
>>
>> Join Zoom Meeting
>>
>> https://cloudera.zoom.us/j/880548968
>>
>> One tap mobile
>>
>> +16465588656,,880548968# US (New York)
>>
>> +17207072699,,880548968# US
>>
>> Dial by your location
>>
>> +1 646 558 8656 US (New York)
>>
>> +1 720 707 2699 US
>>
>> 877 853 5257 US Toll-free
>>
>> 888 475 4499 US Toll-free
>>
>> Meeting ID: 880 548 968
>> Find your local number: https://zoom.us/u/acaGRDfMVl
>>
>
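
P.S. For anyone who wants to experiment with the two parameter changes
mentioned in the summary, here is a minimal sketch that renders them as
hdfs-site.xml style properties. This is only an illustration, not Didi's
actual configuration: the value 500 comes from the summary, but the summary
only says the retry count was increased, so the value 10 below is an assumed
placeholder, and the helper name render_properties is made up for this
example.

```python
import xml.etree.ElementTree as ET

# Settings discussed in the summary.
# 500 is taken from the summary (default is 1000).
# 10 is an ASSUMED placeholder: the summary does not say what the
# retry count was raised to.
OVERRIDES = {
    "dfs.client.block.write.locateFollowingBlock.retries": "10",  # assumption
    "dfs.block.invalidate.limit": "500",  # from the summary
}


def render_properties(overrides):
    """Render a dict of settings as an hdfs-site.xml style <configuration>."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(OVERRIDES))
```

Note that the two settings likely belong in different places:
dfs.block.invalidate.limit is a NameNode-side setting, while
dfs.client.block.write.locateFollowingBlock.retries is read by HDFS clients,
so each would need to go into the configuration of the corresponding side.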