[ 
https://issues.apache.org/jira/browse/HADOOP-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675998#action_12675998
 ] 

Luca Telloli commented on HADOOP-5188:
--------------------------------------

Konstantin> No I don't think logSync() method should be modified at least not 
the way you did in your patch HADOOP-5189. Synchronization (locking) in the 
logSync() method prevents different handler threads to write the same 
information (edits) multiple times into the same edits file. Suppose you have 1 
file stream (one edits file) and 2 threads call logSync() at the same time. If 
you remove locking as you did in your patch then both threads may write the 
same edits transactions into the same file twice. This will simply corrupt the 
edits file.



Konstantin> logSync() may be modified to write in parallel into different 
streams. If you have 2 file streams then it makes sense to write into 2 files 
in parallel. But the information should be written into each file only once and 
in the right order. And your patch breaks that, afaiu.



I agree with you that the logSync() in the patch would not work with the 
file-based logging, and that's why I put in my comments that this patch breaks 
it, but the purpose of the patch is to work with BookKeeper! 



In the prototype included in the patch we do indeed remove synchronisation in 
logSync(), but at the same time each thread has its own local buffer (a 
ThreadLocal variable) where it's writing log modifications, and this guarantees 
that requests are processes in the order they are received by the client, thus 
they do not violate the semantics of the file system. As a proof I decided to 
run tests where I check the number of bytes written onto bookkeeper with 
different logging systems and that number is the same which comforts me on this 
issue. 



Konstantin> My main objection to your approach though, is abstracting the whole 
EditsLog class rather than just the EditLog*Streams. This leads to replication 
of a lot of code. Say, both classes EditLog and BKEditLogThreadBuf have the 
loadFSEdits() methods, which are literally identical, right? We should avoid 
that at all cost.



For BookKeeper integration *I really need to be able to override logSync()*, so 
I have to keep going with the current abstract class unless a different way of 
overriding that method is provided.  

In general, think that the abstract class has exactly the opposite effect of 
what you state, allowing to place in the root class the common parts, but I 
agree that I might not have provided you the cleanest code in this sense; I'll 
do some cleanup and post a new version. In the specific case of loadFSEdits, if 
they are identical then you could remove the code from BKEditLogThreadBuf. 



I also think that the abstract classes for Input/Output streams are cool and 
they do not conflict with having an abstract class EditLog; in each test 
prototype in fact we used them


I'll try to have a patch that applies against trunk; and I'd also like to try 
to address dhruba request of multiple logging systems, maybe I could try with 
HADOOP-4539? Is that patch supporting multiple logging through multiple 
instances of the EditLogInput/OutputStream classes? I'd like to avoid 
duplicating efforts but, in my case, I really need to be able to Override the 
current logSync(). 


> Modifications to enable multiple types of logging 
> --------------------------------------------------
>
>                 Key: HADOOP-5188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5188
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: HADOOP-5188.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to