[jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

Zhe Zhang (JIRA) Mon, 21 Nov 2016 15:27:24 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685075#comment-15685075
 ]


Zhe Zhang commented on HDFS-10702:
----------------------------------

Thanks for the discussion Ming, Sean, Andrew.

bq. Refreshing the metadata for a table or partition is a very RPC heavy 
operation. This is typically done when some new data has been written to HDFS. 
So, an ingest application would write the data, call getSyncInfo, then refresh 
metadata using the txid from getSyncInfo.
Agreed that in this use case, the designed approach should work. But do 
Hive/Impala usually have several seconds of delay between ingestion and 
querying? Actually, a more common use case for us is where data ingestion and 
consumption belong to different apps. I guess in that use case, the ingestion 
app should send the txID to the consumer?
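
A minimal sketch of that handoff, written as a toy Python simulation rather than HDFS client code. Only getSyncInfo and the txid idea come from this thread; the classes ActiveNN and StandbyNN and the methods write, tail_edits, and read_at_least are hypothetical names made up for illustration, and the actual patch may expose this differently.

```python
# Toy model of the txid handoff; NOT the HDFS-10702 API.
class ActiveNN:
    def __init__(self):
        self.edits = []          # edit log: list of (txid, path)

    def write(self, path):
        txid = len(self.edits) + 1
        self.edits.append((txid, path))
        return txid

    def get_sync_info(self):
        # Stand-in for getSyncInfo: the last txid the active has written.
        return self.edits[-1][0] if self.edits else 0


class StandbyNN:
    def __init__(self):
        self.applied_txid = 0
        self.paths = set()

    def tail_edits(self, active, up_to=None):
        # Apply edits from the active's log, optionally stopping early
        # to model a standby that lags behind.
        for txid, path in active.edits:
            if txid <= self.applied_txid:
                continue
            if up_to is not None and txid > up_to:
                break
            self.paths.add(path)
            self.applied_txid = txid

    def read_at_least(self, path, min_txid):
        # Serve the read only if the standby has caught up to min_txid.
        if self.applied_txid < min_txid:
            raise RuntimeError("standby too stale")
        return path in self.paths


active, standby = ActiveNN(), StandbyNN()
active.write("/warehouse/t/part-0")
active.write("/warehouse/t/part-1")

# Ingestion app: after writing, capture the txid and send it to the consumer.
txid_for_consumer = active.get_sync_info()

# Consumer app: the standby has only applied the first edit so far.
standby.tail_edits(active, up_to=1)
try:
    standby.read_at_least("/warehouse/t/part-1", txid_for_consumer)
    served_while_stale = True
except RuntimeError:
    served_while_stale = False

# Once the standby catches up, the same read succeeds.
standby.tail_edits(active)
visible = standby.read_at_least("/warehouse/t/part-1", txid_for_consumer)
```

In this model the consumer never needs to contact the active NN; it only needs the txid that the ingestion app hands over out of band.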

bq. For apps that do not cache input streams, they can call getSyncInfo at job 
submission time, then pass this to the job's tasks. Since a couple seconds 
typically passes between submission and execution, we should be able to offload 
a lot from the SbNN.
This is also a good use case. _Acquiring syncInfo_ will become a standard 
operation at job startup (done by workflow managers like Oozie or Azkaban), 
similar to acquiring a delegation token from the NN.
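
The job-startup pattern can be sketched in the same spirit. This is a toy Python model, not Oozie, Azkaban, or HDFS code: submit_job, run_task, the sync_txid key, and the fallback to the active NN when the standby is behind are all assumptions made for illustration.

```python
# Toy model: fetch sync info once at submission, reuse it in every task.
def submit_job(active_txid, num_tasks):
    # One call at submission time (analogous to fetching a delegation
    # token); the value travels to tasks in the job configuration.
    conf = {"sync_txid": active_txid}
    return [dict(conf, task_id=i) for i in range(num_tasks)]


def run_task(task_conf, standby_applied_txid):
    # A task may read from the standby only if the standby has applied
    # the txid captured at submission; otherwise it falls back to the
    # active (hypothetical policy, not taken from the patch).
    if standby_applied_txid >= task_conf["sync_txid"]:
        return "standby"
    return "active"


tasks = submit_job(active_txid=100, num_tasks=4)

# The standby keeps tailing edits while tasks are scheduled. Assume it
# is at txid 98 for the first task and has reached 100 or later after.
standby_progress = [98, 100, 103, 107]
served_by = [run_task(t, s) for t, s in zip(tasks, standby_progress)]
```

Under these assumed numbers only the first task goes to the active NN, which is the offloading effect described in the quoted comment.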

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jiayi Zhou
>            Assignee: Jiayi Zhou
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, 
> HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means active NameNode could be a bottleneck for 
> scalability. One way to solve this problem is to send read-only operations to 
> Standby NameNode. The disadvantage is that it might be a stale read. 
> Here, I'm thinking of adding a Client API to enable/disable stale read from 
> Standby which gives Client the power to set the staleness restriction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
