[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745093#comment-14745093
 ] 

Walter Su commented on HDFS-9040:
---------------------------------

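2. If a streamer fails immediately after you add it to healthySet, can the 
code below wait() forever? expectedNum is fixed on entry, so if a counted 
streamer dies, updateStreamerMap may never reach that size. Maybe we could 
recalculate healthySet and use a wait with a timeout? (Race condition between 
the streamer and the main thread.)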
{code}
  private List<StripedDataStreamer> waitCreatingNewStreams(
      Set<StripedDataStreamer> healthyStreamers) throws IOException {
    final int expectedNum = healthyStreamers.size();
    synchronized (coordinator) {
      while (coordinator.updateStreamerMap.size() != expectedNum) {
        try {
          coordinator.wait();
{code}
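
A minimal sketch of the "recalculate healthySet plus timed wait" idea, not 
the patch's code. The Coordinator class, reportUpdated(), streamerFailed() and 
awaitUpdates() below are hypothetical stand-ins; streamers are represented by 
their indexes only.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class BoundedWaitSketch {
  /** Hypothetical stand-in for the coordinator monitor and its update map. */
  static final class Coordinator {
    private final Set<Integer> updated = new HashSet<>();

    /** A streamer reports that it has finished its update. */
    synchronized void reportUpdated(int streamerIndex) {
      updated.add(streamerIndex);
      notifyAll();
    }

    /** A failing streamer wakes the waiter so it re-reads the healthy set. */
    synchronized void streamerFailed() {
      notifyAll();
    }

    /**
     * Waits until every currently healthy streamer has reported, re-reading
     * the healthy set on each wakeup and giving up after timeoutMs.
     */
    synchronized Set<Integer> awaitUpdates(Supplier<Set<Integer>> healthy,
        long timeoutMs) throws IOException, InterruptedException {
      final long deadline =
          System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (true) {
        // Snapshot the healthy set instead of trusting a count taken earlier.
        Set<Integer> current = new HashSet<>(healthy.get());
        if (updated.containsAll(current)) {
          return current;
        }
        long remainingMs =
            TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMs <= 0) {
          throw new IOException("Timed out waiting for streamers " + current);
        }
        wait(remainingMs); // never wait(0), which would block indefinitely
      }
    }
  }
}
```

With this shape, a streamer that fails after being counted only has to drop 
out of the healthy set and call streamerFailed(); the timeout is a backstop 
for the case where nobody notifies.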

3. Again an issue with the last stripe (race condition between the streamer 
and the main thread). Once you trust that a streamer is healthy, you wait on 
it indefinitely; if the streamer then fails, the wait never returns. Maybe use 
a wait with a timeout?
{code}
  private void allocateNewBlock() throws IOException {
    if (currentBlockGroup != null) {
      for (int i = 0; i < numAllBlocks; i++) {
        if (getStripedDataStreamer(i).isHealthy()) {
          // sync all the healthy streamers before writing to the new block
          final ExtendedBlock b = coordinator.takeEndBlock(i);
{code}
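
One way to bound this wait, sketched under assumptions: the end blocks are 
modelled as a plain java.util.concurrent.BlockingQueue, and takeWhileHealthy() 
is a hypothetical helper, not a method of the coordinator in the patch. It 
polls in short intervals and re-checks the streamer's health between polls.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class PollEndBlockSketch {
  /**
   * Returns the next end block, or null if the streamer is (or becomes)
   * unhealthy before a block arrives.
   */
  static <T> T takeWhileHealthy(BlockingQueue<T> endBlocks,
      BooleanSupplier isHealthy, long pollMs) throws InterruptedException {
    while (isHealthy.getAsBoolean()) {
      // poll() returns null on timeout, so health is re-checked each round.
      T block = endBlocks.poll(pollMs, TimeUnit.MILLISECONDS);
      if (block != null) {
        return block;
      }
    }
    return null;
  }
}
```

The caller would treat a null result as "this streamer failed while we were 
waiting" and skip it, rather than blocking the main thread.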

4. (Race condition between the streamer and the main thread.) You trust that 
the streamer is healthy, then it fails immediately, and you still call 
setExternalError(). Does {{internalError}} get cleared by mistake?
{code}
  private Set<StripedDataStreamer> markExternalErrorOnStreamers() {
    Set<StripedDataStreamer> healthySet = new HashSet<>();
    for (StripedDataStreamer streamer : streamers) {
      if (streamer.isHealthy() &&
          streamer.getStage() == BlockConstructionStage.DATA_STREAMING) {
        streamer.setExternalError();
{code}
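
A sketch of how the check-then-set race could be closed, assuming the error 
state can be held in a single atomic field. ErrorStateSketch, ErrorType and 
trySetExternalError() are hypothetical names for illustration; this is not the 
error-state class in the patch.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ErrorStateSketch {
  enum ErrorType { NONE, INTERNAL, EXTERNAL }

  private final AtomicReference<ErrorType> error =
      new AtomicReference<>(ErrorType.NONE);

  /** Called by the streamer thread; an internal error always wins. */
  void setInternalError() {
    error.set(ErrorType.INTERNAL);
  }

  /**
   * Called by the main thread. Marks an external error only if the streamer
   * has no error yet; returns false if it had already failed on its own.
   */
  boolean trySetExternalError() {
    return error.compareAndSet(ErrorType.NONE, ErrorType.EXTERNAL);
  }

  ErrorType get() {
    return error.get();
  }
}
```

Here the health check and the update happen in one compareAndSet step, so a 
streamer that fails between the two cannot have its internal error 
overwritten, and the main thread can leave it out of healthySet when the call 
returns false.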

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>         Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040.00.patch, 
> HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer communicates with the NN to allocate/update blocks, 
> and the StripedDataStreamers only have to stream blocks to DNs.
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



