[ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745093#comment-14745093 ]
Walter Su commented on HDFS-9040:
---------------------------------

2. If a streamer fails immediately after you add it to healthySet, does the code below wait() endlessly? Maybe we could recalculate healthySet and wait with a timeout? (Race condition between the streamer and the main thread.)
{code}
private List<StripedDataStreamer> waitCreatingNewStreams(
    Set<StripedDataStreamer> healthyStreamers) throws IOException {
  final int expectedNum = healthyStreamers.size();
  synchronized (coordinator) {
    while (coordinator.updateStreamerMap.size() != expectedNum) {
      try {
        coordinator.wait();
{code}

3. Again an issue about the last stripe. (Race condition between the streamer and the main thread.) Once you trust that a streamer is healthy, you wait endlessly; then the streamer fails and betrays you. Maybe wait with a timeout?
{code}
private void allocateNewBlock() throws IOException {
  if (currentBlockGroup != null) {
    for (int i = 0; i < numAllBlocks; i++) {
      if (getStripedDataStreamer(i).isHealthy()) {
        // sync all the healthy streamers before writing to the new block
        final ExtendedBlock b = coordinator.takeEndBlock(i);
{code}

4. (Race condition between the streamer and the main thread.) You trust that it is a healthy streamer. Then it fails immediately. You call setExternalError. Does {{internalError}} get cleared by mistake?
{code}
private Set<StripedDataStreamer> markExternalErrorOnStreamers() {
  Set<StripedDataStreamer> healthySet = new HashSet<>();
  for (StripedDataStreamer streamer : streamers) {
    if (streamer.isHealthy() &&
        streamer.getStage() == BlockConstructionStage.DATA_STREAMING) {
      streamer.setExternalError();
{code}

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Walter Su
> Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update block, and StripedDataStreamers only have to stream blocks to DNs.
> Proposal 2:
> See below the [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388] from [~jingzhao].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
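
To make the suggestion in points 2 and 3 of the comment concrete, here is a minimal, standalone sketch of a wait that uses a timeout and recalculates the healthy set on every wakeup. It is not the patch's code. {{Streamer}}, {{Coordinator}}, {{waitForHealthyStreamers}}, {{pollMillis}} and {{timeoutMillis}} are hypothetical stand-ins invented for illustration, and the real StripedDataStreamer and coordinator have very different shapes. The only library behaviour relied on is {{Object.wait(long)}}, which returns after the timeout even if nobody calls notify.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

class TimedWaitSketch {

  /** Hypothetical stand-in for a streamer; only the health flag matters here. */
  static final class Streamer {
    private volatile boolean healthy = true;
    boolean isHealthy() { return healthy; }
    void fail() { healthy = false; }
  }

  /** Hypothetical stand-in for the coordinator: a monitor plus the streamers that reported. */
  static final class Coordinator {
    private final Set<Streamer> reported = new HashSet<>(); // guarded by "this"

    synchronized void report(Streamer s) {
      reported.add(s);
      notifyAll();
    }
  }

  /**
   * Waits until every candidate that is still healthy has reported, or until
   * timeoutMillis has elapsed. The healthy set is recomputed on every wakeup,
   * so a streamer that fails after being counted no longer blocks the caller.
   */
  static Set<Streamer> waitForHealthyStreamers(Coordinator coordinator,
      Set<Streamer> candidates, long pollMillis, long timeoutMillis) {
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    synchronized (coordinator) {
      while (true) {
        // Recalculate instead of trusting a size captured before the wait.
        Set<Streamer> stillHealthy = new HashSet<>();
        for (Streamer s : candidates) {
          if (s.isHealthy()) {
            stillHealthy.add(s);
          }
        }
        if (coordinator.reported.containsAll(stillHealthy)) {
          return stillHealthy;
        }
        long remainingMillis =
            TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMillis <= 0) {
          // Deadline passed: keep only the healthy streamers that did report.
          stillHealthy.retainAll(coordinator.reported);
          return stillHealthy;
        }
        try {
          // wait(0) would block forever, so never pass less than 1 ms.
          coordinator.wait(Math.max(1L, Math.min(pollMillis, remainingMillis)));
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting", e);
        }
      }
    }
  }

  public static void main(String[] args) {
    Coordinator coordinator = new Coordinator();
    Streamer s0 = new Streamer();
    Streamer s1 = new Streamer();
    Streamer s2 = new Streamer();
    Set<Streamer> candidates = new HashSet<>();
    candidates.add(s0);
    candidates.add(s1);
    candidates.add(s2);
    coordinator.report(s0);
    coordinator.report(s1);
    s2.fail(); // fails right after being counted as healthy, never reports
    Set<Streamer> usable = waitForHealthyStreamers(coordinator, candidates, 10, 1000);
    System.out.println(usable.size()); // prints 2
  }
}
```

With the fixed {{expectedNum}} from the quoted snippet, the same scenario would never leave the loop, because the failed streamer never reports. Polling with a short {{pollMillis}} is one possible design choice. Having a failing streamer call notifyAll() on the coordinator would avoid the polling delay, but whether that fits the patch is an open question.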