[ https://issues.apache.org/jira/browse/HDFS-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665538#comment-16665538 ]
Hadoop QA commented on HDFS-11266:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HDFS-11266 does not apply to HDFS-8707. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11266 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12844303/HDFS-11266.HDFS-8707.000.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25376/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> libhdfs++: Redesign block reader with simplicity and resource management in mind
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-11266
>                 URL: https://issues.apache.org/jira/browse/HDFS-11266
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>            Priority: Major
>         Attachments: HDFS-11266.HDFS-8707.000.patch
>
>
> The goal here is to significantly simplify the block reader and make it much
> harder to introduce issues. There are plenty of examples of these issues in
> the subtasks of HDFS-8707; the one that finally motivated a reimplementation
> is HDFS-10931.
> Goals:
> -The read side protocol of the data transfer pipeline is fundamentally really
> simple (even if done asynchronously). The code should be equally simple.
> -Get the code in a state that should be easy enough to reason about with a
> solid understanding of HDFS and a basic understanding of C++, or vice versa:
> improve comments and avoid using esoteric C++ constructs.
> This is a must-have in order to lower the bar to contribute.
> -Get rid of dependencies on the existing continuation stuff. Others and I
> have spent far too much time debugging both the continuation code and
> bugs introduced because the continuation code was hard to reason about.
> Notable issues:
> -It's cool from a theoretical perspective, but after 18 months of working
> on this it's still unclear what problem the continuation pattern helped solve
> that callbacks couldn't.
> -Continuations spend more time allocating memory than the rest of the code
> does doing real work - seriously, profile it. This can't be fixed because the
> Pipeline takes ownership of all Continuation objects and then deletes them.
> -The way the block reader really uses them is a hybrid of a state machine,
> continuations, and directly using asio callbacks to bounce between the two.
> Proposed approach:
> Still have a BlockReader class that owns a PacketReader class. The packet
> reader is analogous to the ReadPacketContinuation that the BlockReader builds
> now. The difference is that none of this will be stitched together at
> runtime using continuations; instead, the block reader has a member packet
> reader that gets allocated up front. The PacketReader can be recycled
> in order to avoid allocations. The block reader is only responsible for
> requesting block info; after that it keeps invoking the PacketReader until
> enough data has been read.
> Async chaining:
> Move to a state machine based approach, where each state is represented as a
> method. This allows the readers to be pinned in memory. The asynchronous IO
> becomes the state transitions: a callback is supplied to the asio async call
> that jumps to the next state upon completion of the IO operation. Epsilon
> transitions will be fairly rare, but if we need them to temporarily drop a
> lock, as is done in the RPC code, io_service::post can be used rather than a
> call that actually does IO.
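The state-machine idea described above can be sketched without asio at all. This is only an illustration under stated assumptions, not the design from the patch: `TaskQueue` is a hypothetical stand-in for `asio::io_service` (its `post` only mimics the queue-a-callback behaviour), `ExampleReader` and its state names are invented for the example, and the "IO" is simulated by pretending each packet read delivers a fixed number of bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for asio::io_service: callbacks are queued by post()
// and executed later, in FIFO order, by run().
class TaskQueue {
 public:
  void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }
  void run() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical reader: one method per state, queued callbacks are the
// transitions. The object stays at one address for its whole lifetime, so the
// callbacks can safely capture `this` as long as it outlives the queue run.
class ExampleReader {
 public:
  ExampleReader(TaskQueue &queue, std::size_t bytes_wanted, std::size_t packet_size)
      : queue_(queue), bytes_wanted_(bytes_wanted), packet_size_(packet_size) {}

  void Start() { RequestBlock(); }
  std::size_t bytes_read() const { return bytes_read_; }
  const std::vector<std::string> &trace() const { return trace_; }

 private:
  // State 1: ask for the block. Real code would start an async write here and
  // pass the lambda as its completion handler.
  void RequestBlock() {
    trace_.push_back("RequestBlock");
    queue_.post([this]() { ReadPacket(); });
  }

  // State 2: consume one packet, then either loop or move on.
  void ReadPacket() {
    trace_.push_back("ReadPacket");
    std::size_t remaining = bytes_wanted_ - bytes_read_;
    bytes_read_ += std::min(packet_size_, remaining);  // simulated IO result
    assert(bytes_read_ <= bytes_wanted_);
    if (bytes_read_ < bytes_wanted_) {
      queue_.post([this]() { ReadPacket(); });
    } else {
      // Epsilon transition: no IO is needed, so just hop to the next state.
      queue_.post([this]() { Done(); });
    }
  }

  // State 3: terminal state.
  void Done() { trace_.push_back("Done"); }

  TaskQueue &queue_;
  std::size_t bytes_wanted_;
  std::size_t packet_size_;
  std::size_t bytes_read_ = 0;
  std::vector<std::string> trace_;
};
```

Constructing an `ExampleReader` for 10 bytes with 4-byte packets, calling `Start()`, and then `run()` on the queue walks through `RequestBlock`, three `ReadPacket` states, and `Done`. Nothing is allocated per transition beyond the queued callback itself, which is the property the proposal is after.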
> I'm fairly confident in this approach since I used the same one to implement
> various hardware async bus interfaces in VHDL to good effect, i.e. high
> performance and easy to understand. An asio callback is roughly analogous to
> a signal in a sensitivity list, just as the methods are analogous to process
> blocks.
> Example state machine that would send some stuff, then wait to get something
> back, like what the current BlockReader::AsyncRequestBlock does, using the
> approach described above.
> {code}
> #include <array>
> #include <asio.hpp>
>
> class ExampleHandshake {
>   // class would own any small buffers so they can be directly accessed
>  public:
>   ExampleHandshake() : socket_(service_) {}
>   void SendHandshake();
>  private:
>   void OnHandshakeSend();
>   void OnHandshakeDone();
>   asio::io_service service_;      // declared before socket_, which uses it
>   asio::ip::tcp::socket socket_;
>   std::array<char, 64> request_;  // placeholder buffer for data to send
>   std::array<char, 64> response_; // placeholder buffer for received data
> };
>
> void ExampleHandshake::SendHandshake() {
>   // trampoline to jump into read state once write completes
>   auto trampoline = [this](asio::error_code ec, size_t sz) {
>     // error checking here
>     this->OnHandshakeSend();
>   };
>   asio::async_write(socket_, asio::buffer(request_), trampoline);
> }
>
> void ExampleHandshake::OnHandshakeSend() {
>   // when read completes bounce into handler
>   auto trampoline = [this](asio::error_code ec, size_t sz) {
>     this->OnHandshakeDone();
>   };
>   asio::async_read(socket_, asio::buffer(response_), trampoline);
> }
>
> void ExampleHandshake::OnHandshakeDone() {
>   // just finished sending request, and receiving response, go do something
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)