[jira] [Commented] (HDFS-11266) libhdfs++: Redesign block reader with simplicity and resource management in mind

Hadoop QA (JIRA) Fri, 26 Oct 2018 11:49:25 -0700


    [ https://issues.apache.org/jira/browse/HDFS-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665538#comment-16665538 ]


Hadoop QA commented on HDFS-11266:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} HDFS-11266 does not apply to HDFS-8707. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11266 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12844303/HDFS-11266.HDFS-8707.000.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25376/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> libhdfs++: Redesign block reader with simplicity and resource management 
> in mind
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-11266
>                 URL: https://issues.apache.org/jira/browse/HDFS-11266
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>            Priority: Major
>         Attachments: HDFS-11266.HDFS-8707.000.patch
>
>
> The goal here is to significantly simplify the block reader and make it much 
> harder to introduce issues.  There are plenty of examples of these issues in 
> the subtasks of HDFS-8707; the one that finally motivated a reimplementation 
> is HDFS-10931.
> Goals:
> -The read side protocol of the data transfer pipeline is fundamentally 
> simple (even if done asynchronously).  The code should be equally simple.
> -Get the code into a state that is easy to reason about for someone with a 
> solid understanding of HDFS and a basic understanding of C++, or vice versa: 
> improve comments and avoid using esoteric C++ constructs.  This is a 
> must-have in order to lower the bar to contribute.
> -Get rid of dependencies on the existing continuation code.  Others and I 
> have spent far too much time debugging both the continuation code and 
> bugs introduced because the continuation code was hard to reason about.  
> Notable issues:
>   -It's cool from a theoretical perspective, but after 18 months of working 
> on this it's still unclear what problem the continuation pattern helped solve 
> that callbacks couldn't.
>   -Continuations spend more time allocating memory than the rest of the code 
> spends doing real work - seriously, profile it.  This can't be fixed because 
> the Pipeline takes ownership of all Continuation objects and then deletes them.
>   -The way the block reader really uses them is a hybrid of a state machine, 
> continuations, and directly using asio callbacks to bounce between the two.
> Proposed approach:
> Still have a BlockReader class that owns a PacketReader class; the packet 
> reader is analogous to the ReadPacketContinuation that the BlockReader builds 
> now.  The difference is that none of this will be stitched together at 
> runtime using continuations; instead, the block reader has a member 
> packet reader that gets allocated up front.  The PacketReader can be recycled 
> in order to avoid allocations.  The block reader is only responsible for 
> requesting block info; after that it keeps invoking the PacketReader until 
> enough data has been read.
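A minimal sketch of the ownership layout in the quoted proposal, with no networking at all. The `Packet` struct, the in-memory packet source, and every method name here are hypothetical stand-ins, not the libhdfs++ API; the only point is that the packet reader is a plain member that is allocated once and reused for every packet.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one packet of the data transfer pipeline.
struct Packet {
  std::string payload;
};

// Hypothetical PacketReader: owns one buffer and reuses it for every packet.
class PacketReader {
 public:
  // Copy the packet payload into the reusable buffer; returns bytes read.
  std::size_t Read(const Packet &p) {
    buffer_.assign(p.payload.begin(), p.payload.end());
    ++packets_read_;
    return buffer_.size();
  }
  const std::vector<char> &buffer() const { return buffer_; }
  std::size_t packets_read() const { return packets_read_; }

 private:
  std::vector<char> buffer_;
  std::size_t packets_read_ = 0;
};

// Hypothetical BlockReader: the PacketReader is a member, so it is allocated
// together with the BlockReader and never rebuilt per packet.
class BlockReader {
 public:
  explicit BlockReader(std::vector<Packet> source)
      : source_(std::move(source)) {}

  // Keep invoking the packet reader until `wanted` bytes have been read
  // or the source runs out of packets.
  std::string ReadBlock(std::size_t wanted) {
    std::string out;
    while (out.size() < wanted && next_ < source_.size()) {
      std::size_t n = packet_reader_.Read(source_[next_++]);
      std::size_t take = std::min(n, wanted - out.size());
      out.append(packet_reader_.buffer().data(), take);
    }
    return out;
  }
  const PacketReader &packet_reader() const { return packet_reader_; }

 private:
  std::vector<Packet> source_;
  std::size_t next_ = 0;
  PacketReader packet_reader_;
};
```

In a real reader the `Read` call would be asynchronous; the loop is synchronous here only to keep the ownership structure visible.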
> Async chaining:
> Move to a state machine based approach.  This allows the readers to be pinned 
> in memory, where each state is represented as a method.  The asynchronous IO 
> becomes the state transitions.  A callback is supplied to the asio async call 
> that jumps to the next state upon completion of the IO operation.  Epsilon 
> transitions will be fairly rare, but if we need them to temporarily drop a 
> lock, as is done in the RPC code, io_service::post can be used rather than a 
> call that actually does IO.
> I'm fairly confident in this approach since I used the same one to implement 
> various hardware async bus interfaces in VHDL to good effect, i.e. high 
> performance and easy to understand.  An asio callback is roughly analogous to 
> a signal in a sensitivity list, as the methods are to process blocks.
> Below is an example state machine, built with the approach described above, 
> that sends some data and then waits for a response, much like the current 
> BlockReader::AsyncRequestBlock does.
> {code}
> #include <array>
> #include <asio.hpp>
> class ExampleHandshake {
>   // class would own any small buffers so they can be directly accessed
>  public:
>   ExampleHandshake() : socket_(service_) {}
>   void SendHandshake();
>  private:
>   void OnHandshakeSend();
>   void OnHandshakeDone();
>   asio::io_service service_;
>   asio::ip::tcp::socket socket_;
>   std::array<char, 64> request_;   // placeholder size, data to send
>   std::array<char, 64> response_;  // placeholder size, received data
> };
> void ExampleHandshake::SendHandshake() {
>   // trampoline to jump into read state once write completes
>   auto trampoline = [this](const asio::error_code &ec, size_t sz) {
>     // error checking here
>     this->OnHandshakeSend();
>   };
>   // async call returns immediately; trampoline runs when the write is done
>   asio::async_write(socket_, asio::buffer(request_), trampoline);
> }
> void ExampleHandshake::OnHandshakeSend() {
>   // when read completes bounce into handler
>   auto trampoline = [this](const asio::error_code &ec, size_t sz) {
>     // error checking here
>     this->OnHandshakeDone();
>   };
>   asio::async_read(socket_, asio::buffer(response_), trampoline);
> }
> void ExampleHandshake::OnHandshakeDone() {
>   // just finished sending request and receiving response, go do something
> }
> {code}
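
The epsilon transitions mentioned in the issue description (a state change that performs no IO and is scheduled with io_service::post) can be sketched without asio. `MiniLoop` below is a hypothetical stand-in for the post/run behaviour of an io_service, and `HandshakeMachine` and its state names are invented for illustration. Each state is a method, and each transition is a queued callback.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an io_service: queues handlers, runs them later.
class MiniLoop {
 public:
  void Post(std::function<void()> fn) { queue_.push_back(std::move(fn)); }
  // Run handlers in FIFO order until the queue is empty.
  void Run() {
    while (!queue_.empty()) {
      std::function<void()> fn = std::move(queue_.front());
      queue_.pop_front();
      fn();
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};

// Each state is a method; each transition is a callback handed to the loop.
class HandshakeMachine {
 public:
  explicit HandshakeMachine(MiniLoop &loop) : loop_(loop) {}

  void Start() {
    trace_.push_back("send");
    // Simulated IO completion: the loop later jumps into the next state.
    loop_.Post([this]() { OnSent(); });
  }
  const std::vector<std::string> &trace() const { return trace_; }

 private:
  void OnSent() {
    trace_.push_back("sent");
    // Epsilon transition: no IO, just return to the loop before continuing.
    // This is the point where a held lock could be dropped.
    loop_.Post([this]() { OnDone(); });
  }
  void OnDone() { trace_.push_back("done"); }

  MiniLoop &loop_;
  std::vector<std::string> trace_;
};
```

Nothing past the first state runs until `Run()` is called, which mirrors how posted handlers only execute inside the event loop.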



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
