[ https://issues.apache.org/jira/browse/HDFS-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anatoli Shein updated HDFS-10785: --------------------------------- Attachment: HDFS-10785.HDFS-8707.005.patch Thank you for the review [~James C]. I have addressed it in my new patch (attached) as follows: 1) FsInfo::str takes a URI object. Can we have the str method just format the contents of the FsInfo struct? If a tool wants to prefix those contents with a URI it can do it in a more application specific way elsewhere. The main issue here is that because you include common/uri.h you're pulling implementation details into the public headers, so those headers alone wouldn't be enough to link against the library. (/) I removed the URI object from being passed to the str() method of FsInfo, however it still takes string fs_name in order to output the FsInfo object using the correct df format. Maybe we should consider adding another str method separately to output just the struct? 2) In the headers for FsInfo, StatInfo, and ContentSummary you include iomanip and sstream. You can push these includes down to the implementation since they only need std::string. content_summary.h also includes stdint.h, but c++11 should have all the (u)int<width>_t types built in, was there some other reason you needed to include this? Maybe older compiler with partial c++11 support? (/) Done. I moved iomanip and sstream includes to the implementation files, and removed the stdint.h include. 3) in hdfs_ls your handler takes an std::promise by reference. This gets into race condition issues described in HDFS-11556 which you might be less likely to hit if you're using a single worker thread. For now just wrap the promise in a shared_ptr and capture the shared_ptr by copy. That issue is amplified here (from same block of code): {code:language=c++} if (!has_more_results) { promise.set_value(); //set promise return false; //request stop sending results } return true; //request more results {code} If you have something blocked on that promise it's possible that it unblocks right away and you have a race for assigning values. It looks like you can get away with it here but I'd really try to avoid that by having the lambda take a bool&. Tracing if it's possible to hit that race condition for every usage is very time consuming and easy to get wrong. (i) Actually, it this situation I don't think the code has a race condition. We first pass the promise to the handler by reference, and block until it is set. The function that will use this handler (either GetListing or Find) guarantees that the handler will only be called once at a time (there is locking in place for that). The function also guarantees that the handler will be called only once with the exit status set (meaning has_more_results set to false), therefore it is impossible for the promise to be set twice or referenced when it was deallocated from stack. Therefore I think this code is safe, and adding a shared pointer would add unnecessary overhead. I did also test it with multiple threads and never got an issue. Other: This is just personal preference so I'm not going to push 1 way or the other, but IMHO sprintf can be a lot more concise than iostream+iomanip for places you're doing fixed width output. If you prefer iomanip that's cool too. (i) I chose iostream+iomanip because it is easily extensible and is typesafe, however it does add more code, so we can convert to sprintf later if needed. Some of the handlers for the recursive async calls may be better off as structs with an overloaded operator() (lambdas are just shorthand for that anyway). Lambdas are good for tiny chunks of code but can become confusing once there's a mix of local variables along with stuff being captured particularly when you have them inside main(). Doesn't need to be fixed in this patch since the other tools do it but if you can think of a nice way to encapsulate that functionality (at least pull if from main) it'd make it easier to new contributors (and reviewers) to understand and debug. (i) Yes, we will have to think of a good way to do this. The recursive functions like chmod/chgrp/chown/du could be simplified a lot by generalizing FileSystem::Find into a file tree walk (http://man7.org/linux/man-pages/man3/ftw.3.html) and just plugging in a lambda that did the modification to change permissions or whatever else. Find does most of this work already but it seems like things that use it have a lot of glue code in common. (i) We can try and do this, but it would significantly increase the complexity of Find, since it allows wild cards in both path and name, while other methods just recurse once. Another thing that's not specific to your patch but might be worth considering in the future is when you have long function or method signatures it's hard to tell what bool flags are doing; especially once that are just at the end to toggle something like printing extra info. A 2 value enum e.g. {INCLUDE_QUOTA, OMIT_QUOTA} in the case of ContentSummary::str(<flag_type>) makes this more clear at the call site and machine code the compiler emits will generally look the same. (i) Normally I think it makes sense to use enums when we have more than two possible values of the variable in order to keep things simple, but we can always change this. For the time being I just renamed the boolean 'quota' to 'include_quota' so that it is more intuitive. > libhdfs++: Implement the rest of the tools > ------------------------------------------ > > Key: HDFS-10785 > URL: https://issues.apache.org/jira/browse/HDFS-10785 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client > Reporter: Anatoli Shein > Assignee: Anatoli Shein > Attachments: HDFS-10785.HDFS-8707.000.patch, > HDFS-10785.HDFS-8707.001.patch, HDFS-10785.HDFS-8707.002.patch, > HDFS-10785.HDFS-8707.003.patch, HDFS-10785.HDFS-8707.004.patch, > HDFS-10785.HDFS-8707.005.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org