[
https://issues.apache.org/jira/browse/HADOOP-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675998#action_12675998
]
Luca Telloli commented on HADOOP-5188:
--------------------------------------
Konstantin> No I don't think logSync() method should be modified at least not
the way you did in your patch HADOOP-5189. Synchronization (locking) in the
logSync() method prevents different handler threads to write the same
information (edits) multiple times into the same edits file. Suppose you have 1
file stream (one edits file) and 2 threads call logSync() at the same time. If
you remove locking as you did in your patch then both threads may write the
same edits transactions into the same file twice. This will simply corrupt the
edits file.
Konstantin> logSync() may be modified to write in parallel into different
streams. If you have 2 file streams then it makes sense to write into 2 files
in parallel. But the information should be written into each file only once and
in the right order. And your patch breaks that, afaiu.
I agree with you that the logSync() in the patch would not work with the
file-based logging, and that's why I put in my comments that this patch breaks
it, but the purpose of the patch is to work with BookKeeper!
In the prototype included in the patch we do indeed remove synchronisation in
logSync(), but at the same time each thread has its own local buffer (a
ThreadLocal variable) where it's writing log modifications, and this guarantees
that requests are processes in the order they are received by the client, thus
they do not violate the semantics of the file system. As a proof I decided to
run tests where I check the number of bytes written onto bookkeeper with
different logging systems and that number is the same which comforts me on this
issue.
Konstantin> My main objection to your approach though, is abstracting the whole
EditsLog class rather than just the EditLog*Streams. This leads to replication
of a lot of code. Say, both classes EditLog and BKEditLogThreadBuf have the
loadFSEdits() methods, which are literally identical, right? We should avoid
that at all cost.
For BookKeeper integration *I really need to be able to override logSync()*, so
I have to keep going with the current abstract class unless a different way of
overriding that method is provided.
In general, think that the abstract class has exactly the opposite effect of
what you state, allowing to place in the root class the common parts, but I
agree that I might not have provided you the cleanest code in this sense; I'll
do some cleanup and post a new version. In the specific case of loadFSEdits, if
they are identical then you could remove the code from BKEditLogThreadBuf.
I also think that the abstract classes for Input/Output streams are cool and
they do not conflict with having an abstract class EditLog; in each test
prototype in fact we used them
I'll try to have a patch that applies against trunk; and I'd also like to try
to address dhruba request of multiple logging systems, maybe I could try with
HADOOP-4539? Is that patch supporting multiple logging through multiple
instances of the EditLogInput/OutputStream classes? I'd like to avoid
duplicating efforts but, in my case, I really need to be able to Override the
current logSync().
> Modifications to enable multiple types of logging
> --------------------------------------------------
>
> Key: HADOOP-5188
> URL: https://issues.apache.org/jira/browse/HADOOP-5188
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.19.0
> Reporter: Luca Telloli
> Attachments: HADOOP-5188.patch
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.