[ 
https://issues.apache.org/jira/browse/HDFS-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299515#comment-16299515
 ] 

Xiao Chen commented on HDFS-12528:
----------------------------------

Thanks all for the discussions, and [~jzhuge] for a test patch.

Hi John,
I tried to run {{TestBlockReaderFactory}} with your patch on my mac, with 
freshly built native libs. Got:
{noformat}
java.net.SocketException: error computing UNIX domain socket path: path too 
long.  The longest UNIX domain socket path possible on this host is 103 bytes.

        at org.apache.hadoop.net.unix.DomainSocket.bind0(Native Method)
        at 
org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:211)
        at 
org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:1189)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:1156)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1401)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:497)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2778)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2681)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1643)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:885)
        at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:497)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:456)
        at 
org.apache.hadoop.hdfs.client.impl.TestBlockReaderFactory.testShortCircuitCacheTemporaryFailure(TestBlockReaderFactory.java:241)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
Searched around and found HDFS-3296, but even with the patches there I still 
run into this error. Any ideas how to get over it?

Regarding the behavior, not sure who's the SCR expert we should ping. I'm 
inclined to agree with [~cheersyang] that we can make the disable behavior 
configurable, and not changing the default behavior. (Don't think there is a 
good way to differentiate between the Meta file not found IOE and other IOEs.) 
We can also make the 10 minute disable interval configurable as Andre suggested 
on the initial report.

> Short-circuit reads unnecessarily disabled for a long time
> ----------------------------------------------------------
>
>                 Key: HDFS-12528
>                 URL: https://issues.apache.org/jira/browse/HDFS-12528
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, performance
>    Affects Versions: 2.6.0
>            Reporter: Andre Araujo
>            Assignee: John Zhuge
>         Attachments: HDFS-12528.000.patch
>
>
> We have scenarios where data ingestion makes use of the -appendToFile 
> operation to add new data to existing HDFS files. In these situations, we're 
> frequently running into the problem described below.
> We're using Impala to query the HDFS data with short-circuit reads (SCR) 
> enabled. After each file read, Impala "unbuffer"'s the HDFS file to reduce 
> the memory footprint. In some cases, though, Impala still keeps the HDFS file 
> handle open for reuse.
> The "unbuffer" call, however, causes the file's current block reader to be 
> closed, which makes the associated ShortCircuitReplica evictable from the 
> ShortCircuitCache. When the cluster is under load, this means that the 
> ShortCircuitReplica can be purged off the cache pretty fast, which closes the 
> file descriptor to the underlying storage file.
> That means that when Impala re-reads the file it has to re-open the storage 
> files associated with the ShortCircuitReplica's that were evicted from the 
> cache. If there were no appends to those blocks, the re-open will succeed 
> without problems. If one block was appended since the ShortCircuitReplica was 
> created, the re-open will fail with the following error:
> {code}
> Meta file for BP-810388474-172.31.113.69-1499543341726:blk_1074012183_273087 
> not found
> {code}
> This error is handled as an "unknown response" by the BlockReaderFactory [1], 
> which disables short-circuit reads for 10 minutes [2] for the client.
> These 10 minutes without SCR can have a big performance impact for the client 
> operations. In this particular case ("Meta file not found") it would suffice 
> to return null without disabling SCR. This particular block read would fall 
> back to the normal, non-short-circuited, path and other SCR requests would 
> continue to work as expected.
> It might also be interesting to be able to control how long SCR is disabled 
> for in the "unknown response" case. 10 minutes seems a bit to long and not 
> being able to change that is a problem.
> [1] 
> https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java#L646
> [2] 
> https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/shortcircuit/DomainSocketFactory.java#L97



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to