Re: [GSoC 2024] ovfs proposal discussion

Xuanwo Wed, 20 Mar 2024 19:32:10 -0700

> As we know, observability is very important for a service. I think 
> we might need to define and export some metrics so people can see the 
> status of the ovfs daemon service.
>
> For now, OpenDAL provides some layers out of the box (like Prometheus, 
> OpenTelemetry, Dtrace). I'm not sure whether we should add more metrics, 
> such as cache hit rate.


Hi, I agree that observability is important. However, it seems we're 
going too far. The feature requests keep extending, making it a never-ending 
project.

I suggest we leave them as points for future expansion.

On Wed, Mar 20, 2024, at 23:24, Manjusaka wrote:
> On 2024/3/20 22:33, Runjie Yu wrote:
>> Got it, thanks for your suggestions, I'll keep this in mind.
>> 
>> I've put the main content of the proposal in a Google Doc; here's the link:
>> https://docs.google.com/document/d/1huy8vHcoCTf-GausabR3PCwXIXfRJAEckeyTulp2gDU/edit?usp=sharing
>> 
>> I've specified in the deliverables section a description of the target for
>> each storage type, S3 for object storage and HDFS for file storage.
>> 
>> ```
>> 1) A code repository that implements the functions described in the project
>> details. The services implemented by OVFS in the code repository need to
>> meet the following requirements: (1) VirtioFS implementation, well
>> integrated with VMs and QEMU, able to correctly handle VM read and write
>> requests to the file system. (2) Supports the use of distributed object
>> storage systems and distributed file systems as storage backends, and
>> provides complete and correct support for at least one specific storage
>> service type for each storage system type. S3 can be used as the target for
>> object storage systems, and HDFS can be used as the target for distributed
>> file systems. (3) Supports related configurations of various storage
>> systems. Users can configure storage system access and use according to
>> actual needs. When an error occurs, users can use the configuration file to
>> restart services.
>> ```
>> 
>> On Tue, Mar 19, 2024 at 10:41 PM Manjusaka <[email protected]> wrote:
>> 
>>> On 2024/3/19 20:57, 余润杰 wrote:
>>>> Thank you for your suggestion.
>>>>
>>>> For the first point, I will update this in the proposal with an exact
>>> goal for each storage system type. For the second point, I assume this
>>> cache is shared by all VMs running in the same host OS.
>>>>
>>>> Regarding cloud documents, I think this is a very good suggestion. Yes,
>>> I need to create and maintain a cloud document. This is not only easy to
>>> browse, but by updating and maintaining this document during the GSoC
>>> cycle, it helps us focus on our goals and demonstrate the phased results of
>>> development.
>>>>
>>>> For now, I will create a Google Doc tomorrow to display the content of
>>> the existing proposal.
>>>>
>>>> Thanks again for your advice!
>>>>
>>>> Manjusaka <[email protected]> wrote on Tue, Mar 19, 2024 at 17:48:
>>>>
>>>>     On 2024/3/19 16:58, 余润杰 wrote:
>>>>     > Hi, Xuanwo and Manjusaka.
>>>>     >
>>>>     > I hope this email didn’t bother you!
>>>>     >
>>>>     > Applications for GSoC 2024 contributors opened today, and I hope
>>> to join the GSoC project in Apache OpenDAL as a candidate. I have added you
>>> to the list of mentors for the ovfs project proposal and hope to have the
>>> opportunity to be mentored by you!
>>>>     >
>>>>     > /Project Mentors: Xuanwo ([email protected]),
>>> Manjusaka ([email protected])/
>>>>     >
>>>>     > I have supplemented and modified some of the content based on
>>> the previous proposal, mainly including the following points:
>>>>     >
>>>>     > 1) Based on the discussion in the previous email, the name of the
>>> project was changed from ovirtiofs to ovfs.
>>>>     >
>>>>     > 2) Added explanation of ovfs design philosophy.
>>>>     >
>>>>     > 3) Avoid ovfs persisting any metadata.
>>>>     >
>>>>     > 4) Added potential application scenarios.
>>>>     >
>>>>     > 5) Added project deliverables.
>>>>     >
>>>>     > 6) Added the "Why Me And Why Do I Wish To Take Part In GSoC 2024"
>>> section.
>>>>     >
>>>>     > I hope to submit the proposal this week. I'd like to know if there
>>> are still areas that need to be revised or discussed before the proposal is
>>> formally submitted.
>>>>     >
>>>>     > Have a nice day!
>>>>
>>>>     Hi Runjie
>>>>
>>>>     Glad to hear from you!
>>>>
>>>>     Nice proposal! BTW maybe you can upload the document to a website
>>> like Google Docs or Gist, so other people can preview the docs online (LOL).
>>>>
>>>>     This version of the proposal mostly LGTM. I have a few
>>> questions/suggestions:
>>>>
>>>>     1. We can set our target to implement only one service backend for
>>> each category of service (like S3 for blob, HDFS for file-like; KV
>>> storage is not in the plan). This will help us focus on the functionality,
>>> not the corner-case behavior.
>>>>     It would also help us reach the fully function-tested target (I
>>> think that's important for us).
>>>>
>>>>     2. About the cache, I would like to ask: is the cache shared by all
>>> the VMs, or would each VM have its own cache?
>>>>
>>>>
>>>>     Thanks for your nice proposal, have a nice day
>>>>
>>>>     Best
>>>>
>>>>     Manjusaka
>>>>
>>>
>>> Sorry, I forgot something in the previous email.
>>>
>>> About the cache, I have another suggestion. I think we should split it
>>> into two parts: the read cache and the write cache. Users can choose
>>> which cache to enable based on their circumstances.
>>>
>>> For example, if the user mounts an S3 bucket as the backend and it is
>>> modified at high frequency (by other services), they shouldn't enable the
>>> read cache.
>>>
>>> I think this would be good for production usage.
>>>
>>> Best
>>>
>>> Manjusaka
>>>
>> 
>
> Thanks for making the docs public. I would like to say this is the most 
> extraordinary proposal I have ever seen. Great job!
>
> BTW, I'm not sure whether the following should be included in the original 
> proposal; for now, this is just a personal idea.
>
> As we know, observability is very important for a service. I think 
> we might need to define and export some metrics so people can see the 
> status of the ovfs daemon service.
>
> For now, OpenDAL provides some layers out of the box (like Prometheus, 
> OpenTelemetry, Dtrace). I'm not sure whether we should add more metrics, 
> such as cache hit rate.
>
> WDYT?
>
> Best
>
> Manjusaka

-- 
Xuanwo
