Hi Joseph,

On 08/05/2015 07:18 PM, Joseph Qi wrote:
> On 2015/8/5 16:07, Ryan Ding wrote:
>> On 08/05/2015 02:40 PM, Joseph Qi wrote:
>>> On 2015/8/5 12:40, Ryan Ding wrote:
>>>> Hi Joseph,
>>>>
>>>>
>>>> On 08/04/2015 05:03 PM, Joseph Qi wrote:
>>>>> Hi Ryan,
>>>>>
>>>>> On 2015/8/4 14:16, Ryan Ding wrote:
>>>>>> Hi Joseph,
>>>>>>
>>>>>> Sorry for bothering you with the old patches. But I really need to know 
>>>>>> what this patch is for.
>>>>>>
>>>>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-January/010496.html
>>>>>>
>>>>>>    From the email archive above, you mentioned those patches aim to reduce 
>>>>>> host page cache consumption. But in my opinion, after an append direct 
>>>>>> io, the pages used as buffers are clean, and the system can reclaim those 
>>>>>> cached pages. We can even call invalidate_mapping_pages() to speed up that 
>>>>>> process. Maybe more pages will be needed during direct io, but the direct 
>>>>>> io size cannot be too large, right?
>>>>>>
>>>>> We introduced the append direct io because originally ocfs2 would fall
>>>>> back to buffer io in the thin provision case, which was not the
>>>>> behavior that users expect.
>>>> direct io has 2 semantics:
>>>> 1. io is performed synchronously; data is guaranteed to be transferred 
>>>> after the write syscall returns.
>>>> 2. File I/O is done directly to/from user space buffers, with no page 
>>>> cache involved.
>>>> But I think #2 is invisible to user space; #1 is the only thing that user 
>>>> space is really interested in.
>>>> We should weigh the benefits against the disadvantages to determine whether 
>>>> #2 should be supported.
>>>> The disadvantages are: it brings too much complexity to the code, so bugs 
>>>> will come along, and it introduces an incompatible feature.
>>>> For example, I did a single node sparse file test, and it failed.
>>> What do you mean by "failed"? Could you please send out the test case
>>> and the actual output?
>>> And which version did you test? Because some bug fixes were submitted later.
>>> Currently, doing direct io into a hole is not supported.
>> I used Linux 4.0, latest commit 39a8804455fb23f09157341d3ba7db6d7ae6ee76.
>> A simplified test case is:
>> dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct && truncate 
>> /mnt/hello -s 2097152
>> File 'hello' does not exist before the test. After these commands, file 
>> 'hello' should be all zero, but the range 512~4096 contains random data.
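(As a quick sanity check, the corruption can be confirmed with something like
the following rough userspace sketch; nothing ocfs2-specific, the path and
offsets just follow the test case above.)

/* Rough sketch: read bytes 512..4096 of /mnt/hello and report any
 * non-zero byte.  After the dd + truncate above, the whole file should
 * read back as zeroes. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char buf[4096 - 512];
        ssize_t n;
        size_t i;
        int fd = open("/mnt/hello", O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        n = pread(fd, buf, sizeof(buf), 512);
        if (n != (ssize_t)sizeof(buf)) {
                perror("pread");
                return 1;
        }
        for (i = 0; i < sizeof(buf); i++) {
                if (buf[i] != 0) {
                        printf("non-zero byte at offset %zu\n", (size_t)512 + i);
                        return 1;
                }
        }
        printf("range 512..4096 is all zero\n");
        return 0;
}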
> I've got the issue.
> dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct
> The above dd command will allocate a cluster but only write 512B, leaving
> the range from 512B up to the cluster size uninitialized.
> truncate /mnt/hello -s 2097152
> The above truncate only works block aligned, so it will only zero out from
> 4KB to 2MB.
> In my design, I only considered zeroing out the head of the currently
> allocated cluster, and left zeroing the tail to the next direct io for
> performance reasons (so we have no need to zero out first and then
> write).
> So to fix this issue, we should at least zero out the block-aligned pad.
> But this may be unnecessary in the case of continuous direct io. Do you have
> any suggestions?
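To make the gap concrete, the arithmetic looks like this (a small standalone
sketch; the 512B write, 4KB block size and 2MB truncate come from the test
above, the variable names are made up):

#include <stdio.h>

int main(void)
{
        unsigned long long write_end = 512;     /* end of the 512B dd write */
        unsigned long long blocksize = 4096;    /* filesystem block size    */
        unsigned long long new_size = 2097152;  /* size after truncate      */

        /* pad = the rest of the last written block, which neither the
         * direct write nor the block-aligned truncate zeroing touches */
        unsigned long long pad_start = write_end;
        unsigned long long pad_end =
                (write_end + blocksize - 1) / blocksize * blocksize;

        printf("truncate zeroes [%llu, %llu)\n", pad_end, new_size);
        printf("left uninitialized: [%llu, %llu)\n", pad_start, pad_end);
        return 0;
}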
I have an idea to resolve those problems. I will start a new mail and have 
it discussed.
>>>> The original way ocfs2 handled direct io (falling back to buffer io for an 
>>>> append write or a write into a file hole) has 2 considerations:
>>>> 1. it is easier to support cluster-wide coherence.
>>>> 2. it is easier to support sparse files.
>>>> But it seems that your patch does not handle #2 very well.
>>>> There may be more issues that I have not found.
>>>>> I didn't get your point that more pages would be needed during direct io.
>>>>> Could you please explain it more clearly?
>>>> I mean the original way of handling append-dio will consume some page 
>>>> cache. The amount of page cache it consumes depends on the direct io size; 
>>>> for example, a 1MB direct io will consume 1MB of page cache. But since the 
>>>> direct io size cannot be too large, the page cache it consumes cannot be 
>>>> too large either. And those pages can be freed after the direct io finishes 
>>>> by calling invalidate_mapping_pages().
>>> I've got your point. Please consider the following user scenario.
>>> 1. A node mounts several ocfs2 volumes, for example, 10.
>>> 2. On each ocfs2 volume, there are several thin-provisioned VMs.
>> Have many parallel direct ios been tested in that scenario?
>> About the issue you mentioned in another mail, that o2net_wq will block 
>> cache reclaim: invalidate_mapping_pages() only frees the page cache pages 
>> that store data. It will not affect the metadata cache, so it will not wait 
>> on the unlock. Is that right?
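To be concrete about what I mean, here is a rough sketch (not the actual
ocfs2 code; the helper name and call site are made up) of dropping the clean
data pages once the buffered fallback for an append write has completed and
been flushed:

#include <linux/fs.h>
#include <linux/pagemap.h>

/* Hypothetical helper, not ocfs2 code: drop the now-clean page cache
 * pages that backed a buffered fallback for an append direct write. */
static void drop_clean_pages_after_append_dio(struct inode *inode,
                                              loff_t pos, size_t count)
{
        pgoff_t start = pos >> PAGE_CACHE_SHIFT;
        pgoff_t end = (pos + count - 1) >> PAGE_CACHE_SHIFT;

        /* invalidate_mapping_pages() skips dirty and mapped pages, so it
         * is safe once the data has been written back; only the clean
         * data pages are freed, metadata caches are untouched. */
        invalidate_mapping_pages(inode->i_mapping, start, end);
}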
>>>>> Thanks,
>>>>> Joseph
>>>>>
>>>>>> Thanks,
>>>>>> Ryan
>>>>>>
>>>>>>


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
