[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685075#comment-15685075 ]
Zhe Zhang commented on HDFS-10702:
----------------------------------

Thanks for the discussion Ming, Sean, Andrew.

bq. Refreshing the metadata for a table or partition is a very RPC heavy operation. This is typically done when some new data has been written to HDFS. So, an ingest application would write the data, call getSyncInfo, then refresh metadata using the txid from getSyncInfo.

Agreed that in this use case, the designed approach should work. But do Hive/Impala usually have several seconds of delay between ingestion and querying? Actually, a more common use case for us is one where data ingestion and consumption belong to different apps. I guess in that use case, the ingestion app should send the txID to the consumer?

bq. For apps that do not cache input streams, they can call getSyncInfo at job submission time, then pass this to the job's tasks. Since a couple seconds typically passes between submission and execution, we should be able to offload a lot from the SbNN.

This is also a good use case. _Acquiring syncInfo_ will become a standard operation at job startup (done by workflow managers like Oozie or Azkaban), similar to acquiring a delegation token from the NN.

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jiayi Zhou
>            Assignee: Jiayi Zhou
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch,
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch,
> HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means active NameNode could be a bottleneck for
> scalability. One way to solve this problem is to send read-only operations to
> Standby NameNode.
> The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> Standby which gives Client the power to set the staleness restriction.
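
To make the txID hand-off discussed in the comment concrete, here is a minimal, self-contained Java sketch. It is not the API from the HDFS-10702 patches: `StaleReadSketch`, `NameNode`, `SyncInfo`, `chooseReadTarget` and the in-memory `lastAppliedTxId` counter are all hypothetical stand-ins, and only the name `getSyncInfo` is borrowed from the thread. The sketch assumes that a consumer holding a txID from the ingestion app reads from the Standby only once the Standby has applied at least that transaction, and otherwise falls back to the Active. That fallback is an assumption of this sketch, not something the thread specifies.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; none of these types exist in Hadoop. */
final class StaleReadSketch {

    /** Stand-in for a NameNode that tracks the last transaction ID it has applied. */
    static final class NameNode {
        final String name;
        private final AtomicLong lastAppliedTxId;

        NameNode(String name, long initialTxId) {
            this.name = name;
            this.lastAppliedTxId = new AtomicLong(initialTxId);
        }

        long lastAppliedTxId() {
            return lastAppliedTxId.get();
        }

        /** Simulates edit-log tailing; the applied txID never moves backwards. */
        void applyUpTo(long txId) {
            lastAppliedTxId.accumulateAndGet(txId, Math::max);
        }
    }

    /** Stand-in for the value an ingestion app would pass to a consumer. */
    static final class SyncInfo {
        final long txId;

        SyncInfo(long txId) {
            this.txId = txId;
        }
    }

    /** Assumed behaviour: capture the Active's current txID after a write completes. */
    static SyncInfo getSyncInfo(NameNode active) {
        return new SyncInfo(active.lastAppliedTxId());
    }

    /**
     * Picks the Standby if it has caught up to the required txID,
     * otherwise falls back to the Active (an assumption of this sketch).
     */
    static NameNode chooseReadTarget(NameNode active, NameNode standby, SyncInfo required) {
        return standby.lastAppliedTxId() >= required.txId ? standby : active;
    }
}
```

In the two-app scenario from the comment, the ingestion app would call `getSyncInfo` after writing and send the resulting `SyncInfo` to the consumer, which would then call `chooseReadTarget` before issuing its metadata reads. A workflow manager could do the same once at job submission and pass the value to each task.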