[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667607#comment-15667607 ]
Sean Mackrory commented on HDFS-10702:
--------------------------------------

{quote}My concern is that if a significant portion of read requests follow this scenario (needs a fresher TxId), that will cause a high writeLock contention on SbNN.{quote}

Yes, this certainly isn't for every scenario. I view it as being useful for offloading some workloads from the active NameNode. I was hoping to have some precise measurements by now of how this performs relative to other HA proxy methods for various workloads, but I found a bug where RequestHedgingProxyProvider was broadcasting more traffic than it needed to with more than 2 NameNodes, so I'll need to revisit that.

{quote}In the case of multiple standbys, one is the checkpointer, thus you can consider allowing client to connect to standbys not doing checkpoint.{quote}

That's a good idea - I'd certainly like to make the logic for deciding which NameNodes are in standby more robust. Perhaps this should be included in the 'SyncInfo' structure?

{quote}After NN failover, does StaleReadProxyProvider#standbyProxies get refreshed? If not, a long running client could keep using the old standby.{quote}

It does not. It will reevaluate which proxies to use in the event of a failure (specifically, a failure of the active NN when writing, or a failure of all standby NNs when reading). I had considered that possibility and decided to ignore it for now. The worst that can happen is that the client keeps using a non-optimal NameNode and loses the benefit of the optimization. I was fine with that, since the very nature of this feature is accepting sub-optimal results within reasonable bounds. But we could add the ability to reevaluate after a certain time period or number of requests.

{quote}I am interested in knowing more how the applications plan to use it, specifically when they will decide to call getSyncInfo.
In a multi-tenant environment, an application might care about specific files/directories, not necessarily whether the namespace has changed at a global level.{quote}

That's an interesting idea to explore, and I think it fits the use case I had in mind. I'm picturing cases where someone will be doing (almost entirely) read-only analytics on a dataset that is known to be complete (or close enough). We can assume the metadata won't be changing, and either speed up the analysis or minimize its impact on other workloads. In that case, restricting stale reads to a specific subtree seems perfectly reasonable (if it helps: tailing the edit log is already implemented). I suppose this might also be used by someone who wants to search the whole filesystem for something and is okay with approximate results, but I would think that is less common, and one could always set '/' as the subtree they're concerned with.

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jiayi Zhou
> Assignee: Jiayi Zhou
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
>
> Currently, clients must always talk to the active NameNode when performing any metadata operation, which means the active NameNode could be a bottleneck for scalability. One way to solve this problem is to send read-only operations to the Standby NameNode. The disadvantage is that the read might be stale.
> Here, I'm thinking of adding a Client API to enable/disable stale reads from the Standby, which gives the client the power to set the staleness restriction.
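The client API proposed in the issue summary, enabling or disabling stale reads with a client-chosen staleness restriction, could be sketched roughly as follows. This is an assumption-laden illustration for discussion, not the patch's actual API: the class name, method names, and the use of a transaction-id lag as the staleness bound are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch (NOT the HDFS-10702 patch) of a client-side staleness
 * bound: the client opts in to stale reads and tolerates a standby only if
 * the standby's last applied TxId is within a chosen lag of the freshest
 * TxId the client has observed.
 */
public class StalenessBound {

    /** Freshest transaction id the client has seen from the active NN. */
    private final AtomicLong lastSeenTxId = new AtomicLong(0);
    private volatile boolean staleReadEnabled = false;
    private volatile long maxLagTxIds = 0;

    /** Client API: opt in to stale reads, tolerating at most maxLag txids of lag. */
    public void enableStaleRead(long maxLag) {
        this.maxLagTxIds = maxLag;
        this.staleReadEnabled = true;
    }

    /** Client API: revert to active-only reads. */
    public void disableStaleRead() {
        this.staleReadEnabled = false;
    }

    /** Record a txid observed from the active NN (monotonically increasing). */
    public void observeTxId(long txId) {
        lastSeenTxId.accumulateAndGet(txId, Math::max);
    }

    /** Decide whether a standby at standbyTxId is fresh enough to serve a read. */
    public boolean standbyIsFreshEnough(long standbyTxId) {
        return staleReadEnabled
            && lastSeenTxId.get() - standbyTxId <= maxLagTxIds;
    }
}
```

A subtree-restricted variant, as suggested in the comment above, might track the last TxId that touched a given path prefix instead of the global namespace TxId, so that churn elsewhere in the namespace does not disqualify an otherwise fresh standby.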
--
This message was sent by Atlassian JIRA (v6.3.4#6332)