[jira] [Commented] (HDFS-10679) libhdfs++: Implement parallel find with wildcards tool

Anatoli Shein (JIRA) Tue, 16 Aug 2016 08:36:00 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422897#comment-15422897
 ]


Anatoli Shein commented on HDFS-10679:
--------------------------------------

Thanks for the review, [~James C].

I addressed your comments as follows:
1) Using this many arguments for a large function makes it really hard to 
distinguish what are local variables and what was passed in. Could you bundle 
these up into a struct/class that represents the state? That way any time a 
developer sees some_arg_struct->lock they can infer that it was passed in as an 
argument. The other benefit that this gives is later on, when you do lambda 
capture by [=] you could just explicitly bind to that struct type for the 
recursion. This code tends to be very dense so being explicit with capture 
lists and type names e.g. avoiding "auto" in non-trivial statements can do a 
lot to improve maintainability.
(/) I changed it to populate a struct with the state and pass it to the shim.
(/) I updated lambda captures to be explicit.
(/) I replaced "auto" with explicit types in non-trivial statements.

2) You have a lot of good comments about the control flow. Could you add a few 
about higher level design?
(/) I added a big high-level comment just above the FindShim.

(/) Also, I discovered that there was a race condition on the user side in both 
Find and GetListing:
{code}
    if (!s.ok() || !has_more_results) {
      callstate->set_value(s);
      return false;
    }
{code}
Sometimes the above code failed when "return false" did not happen immediately 
after the "callstate->set_value(s)" and the function exited first. I added 
locks to fix this problem in both Find and GetListing.


> libhdfs++: Implement parallel find with wildcards tool
> ------------------------------------------------------
>
>                 Key: HDFS-10679
>                 URL: https://issues.apache.org/jira/browse/HDFS-10679
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Anatoli Shein
>            Assignee: Anatoli Shein
>         Attachments: HDFS-10679.HDFS-8707.000.patch, 
> HDFS-10679.HDFS-8707.001.patch, HDFS-10679.HDFS-8707.002.patch, 
> HDFS-10679.HDFS-8707.003.patch, HDFS-10679.HDFS-8707.004.patch, 
> HDFS-10679.HDFS-8707.005.patch, HDFS-10679.HDFS-8707.006.patch, 
> HDFS-10679.HDFS-8707.007.patch, HDFS-10679.HDFS-8707.008.patch, 
> HDFS-10679.HDFS-8707.009.patch
>
>
> The find tool will issue the GetListing namenode operation on a given 
> directory, and filter the results using posix globbing library.
> If the recursive option is selected, for each returned entry that is a 
> directory the tool will issue another asynchronous call GetListing and repeat 
> the result processing in a recursive fashion.
> One implementation issue that needs to be addressed is the way how results 
> are returned back to the user: we can either buffer the results and return 
> them to the user in bulk, or we can return results continuously as they 
> arrive. While buffering would be an easier solution, returning results as 
> they arrive would be more beneficial to the user in terms of performance, 
> since the result processing can start as soon as the first results arrive 
> without any delay. In order to do that we need the user to use a loop to 
> process arriving results, and we need to send a special message back to the 
> user when the search is over.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-10679) libhdfs++: Implement parallel find with wildcards tool

Reply via email to