[ https://issues.apache.org/jira/browse/HDFS-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344536#comment-16344536 ]
Weiwei Yang commented on HDFS-12528:
------------------------------------

Revised the Jira title a bit to describe the fix better.

> Add an option to not disable short-circuit reads on failures
> -------------------------------------------------------------
>
>                 Key: HDFS-12528
>                 URL: https://issues.apache.org/jira/browse/HDFS-12528
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, performance
>    Affects Versions: 2.6.0
>            Reporter: Andre Araujo
>            Assignee: Xiao Chen
>            Priority: Major
>         Attachments: HDFS-12528.000.patch, HDFS-12528.01.patch, HDFS-12528.02.patch, HDFS-12528.03.patch, HDFS-12528.04.patch, HDFS-12528.05.patch
>
> We have scenarios where data ingestion uses the -appendToFile operation to add new data to existing HDFS files. In these situations, we frequently run into the problem described below.
> We use Impala to query the HDFS data with short-circuit reads (SCR) enabled. After each file read, Impala "unbuffers" the HDFS file to reduce its memory footprint. In some cases, though, Impala still keeps the HDFS file handle open for reuse.
> The "unbuffer" call, however, closes the file's current block reader, which makes the associated ShortCircuitReplica evictable from the ShortCircuitCache. When the cluster is under load, the ShortCircuitReplica can therefore be purged from the cache quite quickly, which closes the file descriptor to the underlying storage file.
> When Impala re-reads the file, it then has to re-open the storage files associated with the ShortCircuitReplicas that were evicted from the cache. If there were no appends to those blocks, the re-open succeeds without problems. If a block was appended to since its ShortCircuitReplica was created, the re-open fails with the following error:
> {code}
> Meta file for BP-810388474-172.31.113.69-1499543341726:blk_1074012183_273087 not found
> {code}
> This error is handled as an "unknown response" by the BlockReaderFactory [1], which disables short-circuit reads for 10 minutes [2] for the client.
> These 10 minutes without SCR can have a big performance impact on client operations. In this particular case ("Meta file not found") it would suffice to return null without disabling SCR. That single block read would fall back to the normal, non-short-circuited path, and other SCR requests would continue to work as expected.
> It might also be useful to be able to control how long SCR is disabled for in the "unknown response" case. 10 minutes seems a bit too long, and not being able to change that is a problem.
> [1] https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java#L646
> [2] https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/shortcircuit/DomainSocketFactory.java#L97
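
For readers following along, below is a minimal, hypothetical Java sketch (not from the issue or any of the attached patches) of the Impala-like access pattern described in the report: open a file with SCR enabled, read, unbuffer() the stream while keeping the handle open, then re-read after the block may have been appended to and the replica evicted. It assumes a Hadoop client where FSDataInputStream#unbuffer() is available; the class name, path, and buffer size are made up for illustration.

{code}
// Hypothetical sketch of the access pattern described above; names and sizes are illustrative.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ScrUnbufferReread {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.client.read.shortcircuit", true); // SCR enabled on this client
    // (a working SCR setup also needs dfs.domain.socket.path; omitted here)
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/data/ingest/events.log"); // hypothetical file that is still being appended to

    try (FSDataInputStream in = fs.open(file)) {
      byte[] buf = new byte[4096];
      in.readFully(0, buf); // first read goes through a ShortCircuitReplica

      // Drop buffers but keep the handle open for reuse, as Impala does. This closes the
      // current block reader, so the replica becomes evictable from the ShortCircuitCache
      // and may be purged (closing its file descriptors) when the cluster is under load.
      in.unbuffer();

      // ... another process appends to the file in the meantime ...

      // The next read must re-open the block's storage files; if the block was appended to,
      // this is where the "Meta file ... not found" error surfaces and SCR is disabled
      // for 10 minutes for the whole client.
      in.readFully(0, buf);
    }
  }
}
{code}

Under the change proposed here, the "Meta file not found" case would instead fall back to a normal, non-short-circuit read for that block without tripping the client-wide SCR disable.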