Hi Abhinav,

Can you post your proposal to this JIRA and also start drafting it on the GSoC portal?
https://issues.apache.org/jira/browse/AIRAVATA-3608

Suresh

On Apr 15, 2022, at 12:41 PM, Abhinav Sinha <[email protected]> wrote:

Hello Dev!

I’ve attached the first draft of my project proposal for GSoC 2022. I’d love for you to take a look and suggest any improvements/changes.

Thanks,
Abhinav

From: Abhinav Sinha <[email protected]>
Date: Sunday, April 3, 2022 at 7:48 AM
To: Airavata Dev <[email protected]>
Cc: Ranawaka, Isuru Janith <[email protected]>, Marru, Suresh <[email protected]>
Subject: Apache Custos for GSoC 2022

Hi,

I had fruitful discussions with Isuru last week.

1. We started off with an overview of the Custos Portal. Isuru provided me with a demo run of a sample use case. We went over the Authentication process -> Authorization tiers -> Tenant creation -> Role-based configurations. As we were going through the demo, Isuru explained the different features of the application. (Following our discussion, I went through the tutorial at https://cwiki.apache.org/confluence/display/CUSTOS/Tutorial+Steps+for+Reference+Portal+and+Custos+Portal to recap.)
2. Isuru provided me with a brief summary of Custos Architecture as well.
I had read the following papers suggested by Suresh earlier to build on my understanding:
https://dl.acm.org/doi/pdf/10.1145/3311790.3396635
https://arxiv.org/pdf/2107.04172.pdf
3. Then we went over the current deployment architecture. Here, Isuru demoed the Kubernetes cluster deployment and we looked at the various components and their config files. We also went over the Keycloak IDP and HashiCorp secret storage. (I went through the tutorial at https://cwiki.apache.org/confluence/display/CUSTOS/Custos+Deployment+Architecture+and+Installation+Guide to recap our discussion.)

After the demos, we discussed the following open items that I could possibly work on as part of GSoC.

1. An important missing piece in Custos today is external data backups (as mentioned in the documentation at https://cwiki.apache.org/confluence/display/CUSTOS/Custos+Deployment+Architecture+and+Installation+Guide). Isuru suggested using Velero (an open source tool to back up Kubernetes resources) to create database backups. I am going over the documentation at https://velero.io/docs/v1.8/how-velero-works/ to understand how to use Velero and come up with a plan to implement the data backup feature.
2. Another open item is creating a Custos Operator (https://github.com/apache/airavata-custos/issues/149). The goal here is to automate deployments.
Isuru briefly went over the Kubernetes Operator pattern (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), which could help provide an abstract resource to manage deployments for all of the Custos microservices. (Current deployments are done using a Maven task.) As per Isuru’s suggestion, I am going over the Keycloak operator (https://github.com/keycloak/keycloak-operator) as a guide to implement something similar for Custos.
3. Finally, Isuru highlighted the need for an intelligent way to divide microservices in deployment configs in order to boost performance and reduce memory consumption. Currently, in Custos, this composition is based on the functions served by the microservices, so we have 2 major units in the deployments: Core services and Integration services. This division approach isn’t very scalable.
A better approach could be based on resource utilization (and other carefully designed heuristics, maybe like affinity as discussed in this paper: https://ieeexplore.ieee.org/document/6531761). I am also going over another cool paper that discusses this problem: https://jisajournal.springeropen.com/articles/10.1186/s13174-019-0104-0

I would love to work on any of the 3 open items, but I need some of your ideas on which of these could be a good GSoC project.

#1: Isuru pointed out that data backup is a high-priority item for Custos at the moment, but it may or may not be an ideal choice for a full-fledged GSoC project.
#2: Building a Custos operator, unlike the data backup feature, is a full-fledged project in itself. It needs expert understanding of the Kubernetes Operator pattern, which I am starting to explore.
#3: I’d love to explore the open research problem of microservice placement, but I’d like to hear your opinion on whether this could be a good project. It closely aligns with my goal of exploring a research problem as part of my Master’s thesis project.

Thanks,
Abhinav

<GSoC_Proposal.pdf>
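
For illustration only, the third open item (dividing microservices into deployment units by resource utilization rather than by function) could be sketched as a simple packing problem. This is a rough sketch under stated assumptions: the service names, the memory figures, and the 1024 MiB unit capacity are made up for the example and do not come from Custos, and first-fit decreasing is just one simple heuristic, not the method from either paper linked above.

```python
from typing import Dict, List


def group_by_memory(usage_mib: Dict[str, int], capacity_mib: int) -> List[List[str]]:
    """First-fit decreasing: put each service in the first unit that has room."""
    units: List[List[str]] = []
    free: List[int] = []  # remaining capacity per unit, parallel to `units`
    # Largest services first; ties broken by name so the result is deterministic.
    for name, need in sorted(usage_mib.items(), key=lambda kv: (-kv[1], kv[0])):
        if need > capacity_mib:
            raise ValueError(f"{name} needs {need} MiB, more than one unit holds")
        for i, room in enumerate(free):
            if need <= room:
                units[i].append(name)
                free[i] -= need
                break
        else:
            # No existing unit has room, so open a new one.
            units.append([name])
            free.append(capacity_mib - need)
    return units


# Hypothetical services and memory figures (MiB), for illustration only.
sample = {
    "identity": 600,
    "tenant-profile": 300,
    "user-profile": 300,
    "secret-store": 500,
    "logging": 200,
}
groups = group_by_memory(sample, capacity_mib=1024)
print(groups)
```

A real heuristic would presumably use measured utilization and add further signals such as affinity between services that call each other often; the sketch only shows the shape of the grouping step.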
