[ https://issues.apache.org/jira/browse/HDFS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin Patrick McCabe updated HDFS-8070:
---------------------------------------
    Priority: Blocker  (was: Major)
    Affects Version/s:     (was: 2.8.0)
                       2.7.0

> Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-8070
>                 URL: https://issues.apache.org/jira/browse/HDFS-8070
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.7.0
>            Reporter: Gopal V
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HDFS-8070.001.patch
>
> The HDFS ShortCircuitShm layer keeps the task locked up during multi-threaded split generation.
> I hit this immediately after I upgraded the data, so I wonder whether the ShortCircuitShm wire protocol has trouble when a 2.8.0 DN talks to a 2.7.0 client?
> {code}
> 2015-04-06 00:04:30,780 INFO [ORC_GET_SPLITS #3] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,781 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=2, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
>     at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-04-06 00:04:30,781 INFO [ORC_GET_SPLITS #5] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL ss_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,781 WARN [ShortCircuitCache_SlotReleaser] shortcircuit.DfsClientShmManager: EndpointShmManager(172.19.128.60:50010, parent=ShortCircuitShmManager(5e763476)): error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
> java.nio.channels.ClosedChannelException
>     at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57)
>     at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387)
>     at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378)
>     at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-04-06 00:04:30,783 INFO [ORC_GET_SPLITS #7] orc.OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL cs_sold_date_sk)
> expr = (not leaf-0)
> 2015-04-06 00:04:30,785 ERROR [ShortCircuitCache_SlotReleaser] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x29e82045): failed to release short-circuit shared memory slot Slot(slotIdx=4, shm=DfsClientShm(a86ee34576d93c4964005d90b0d97c38)) by sending ReleaseShortCircuitAccessRequestProto to /grid/0/cluster/hdfs/dn_socket. Closing shared memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId a86ee34576d93c4964005d90b0d97c38
>     at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Looks like a double free-fd condition?
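> While the wire-protocol incompatibility is being investigated, a possible mitigation (my assumption, not something verified on this cluster) would be to disable short-circuit reads on the 2.7.0 client side, so reads fall back to the ordinary TCP path and never touch the ShortCircuitShm slot-release code. In hdfs-site.xml on the client:
> {code}
> <property>
>   <!-- Standard HDFS property; setting it to false disables short-circuit
>        local reads, avoiding the shared-memory slot protocol entirely. -->
>   <name>dfs.client.read.shortcircuit</name>
>   <value>false</value>
> </property>
> {code}
> This trades away the short-circuit read performance, but may keep mixed 2.7.0/2.8.0 clusters working until the incompatibility is fixed.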
> {code}
> 2015-04-02 18:58:47,653 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088-172.18.0.41-1370508013893:blk_1076973408_1099515627985]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unregistering SlotId(3bd7fd9aed791e95acfb5034e6617d83:0) because the requestShortCircuitFdsForRead operation failed.
> 2015-04-02 18:58:47,653 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088-<ip>-1370508013893:blk_1076973408_1099515627985]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_FDS, blockid: 1076973408, srvID: ba7b6f19-47e0-4b86-af50-23981649318c, success: false
> 2015-04-02 18:58:47,654 [DataXceiver for client unix:/grid/0/cluster/hdfs/dn_socket [Passing file descriptors for block BP-942051088-172.18.0.41-1370508013893:blk_1076973408_1099515627985]] ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: cn060-10.l42scl.hortonworks.com:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_FDS operation src: unix:/grid/0/cluster/hdfs/dn_socket dst: <local>
> java.io.EOFException
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitFds(DataXceiver.java:352)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitFds(Receiver.java:187)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:89)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Investigating more, since the exact exception from the DataNode call is not logged.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)