Mostafa Mokhtar created IMPALA-6364:
---------------------------------------

             Summary: Lock contention in FileHandleCache results in >2x 
slowdown for remote HDFS reads
                 Key: IMPALA-6364
                 URL: https://issues.apache.org/jira/browse/IMPALA-6364
             Project: IMPALA
          Issue Type: Bug
    Affects Versions: Impala 2.10.0, Impala 2.11.0
            Reporter: Mostafa Mokhtar
            Assignee: Joe McDonnell
            Priority: Blocker
         Attachments: d2402_cdh5.12_profile.txt, d2402_cdh5.13_profile.txt, 
remote_hdfs_scan_pstack.txt

IMPALA-4623 introduced a locking scheme for the file handle cache that uses 16 
partitions. This results in lock contention between IO threads, which limits 
system throughput. 

Most IO threads end up in one of these two stacks:

{code}
#0  0x0000000002085d47 in base::internal::SpinLockDelay(int volatile*, int, 
int) ()
#1  0x0000000002085c29 in base::SpinLock::SlowLock() ()
#2  0x00000000010fa76d in 
impala::io::FileHandleCache<16ul>::GetFileHandle(hdfs_internal* const&, 
std::string*, long, bool, bool*) ()
#3  0x00000000010f6e22 in 
impala::io::DiskIoMgr::GetCachedHdfsFileHandle(hdfs_internal* const&, 
std::string*, long, impala::io::RequestContext*, bool) ()
#4  0x00000000010fd514 in impala::io::ScanRange::Open(bool) ()
#5  0x00000000010f691f in 
impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, 
impala::io::RequestContext*, impala::io::ScanRange*) ()
#6  0x00000000010f6dc4 in 
impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
#7  0x0000000000d13333 in impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
#8  0x0000000000d13a74 in boost::detail::thread_data<boost::_bi::bind_t<void, 
void (*)(std::string const&, std::string const&, boost::function<void ()>, 
impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
boost::_bi::value<impala::Promise<long>*> > > >::run() ()
#9  0x000000000128ea3a in thread_proxy ()
#10 0x00007f49f2bbadc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f49f28e976d in clone () from /lib64/libc.so.6
{code}

{code}
#0  0x0000000002085d47 in base::internal::SpinLockDelay(int volatile*, int, 
int) ()
#1  0x0000000002085c29 in base::SpinLock::SlowLock() ()
#2  0x00000000010f9929 in 
impala::io::FileHandleCache<16ul>::ReleaseFileHandle(std::string*, 
impala::io::HdfsFileHandle*, bool) ()
#3  0x00000000010fe69e in impala::io::ScanRange::Close() ()
#4  0x00000000010f6565 in 
impala::io::DiskIoMgr::HandleReadFinished(impala::io::DiskIoMgr::DiskQueue*, 
impala::io::RequestContext*, std::unique_ptr<impala::io::BufferDescriptor, 
std::default_delete<impala::io::BufferDescriptor> >) ()
#5  0x00000000010f695b in 
impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, 
impala::io::RequestContext*, impala::io::ScanRange*) ()
#6  0x00000000010f6dc4 in 
impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
#7  0x0000000000d13333 in impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
#8  0x0000000000d13a74 in boost::detail::thread_data<boost::_bi::bind_t<void, 
void (*)(std::string const&, std::string const&, boost::function<void ()>, 
impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
boost::_bi::value<impala::Promise<long>*> > > >::run() ()
#9  0x000000000128ea3a in thread_proxy ()
#10 0x00007f49f2bbadc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f49f28e976d in clone () from /lib64/libc.so.6
{code}

Increasing the number of partitions to 256 made the contention go away. A 
simple fix would be to make the number of partitions a startup flag and set 
it to 256. 
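
The proposed fix could be sketched roughly as follows. This is a minimal, 
self-contained illustration and not Impala's actual code: 
PartitionedHandleCache, FakeHandle, Get and Release are hypothetical names, 
std::mutex stands in for the SpinLock seen in the stacks above, and the 
partition count is a constructor argument (which a startup flag could supply) 
instead of the template parameter in FileHandleCache<16ul>.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached HDFS file handle.
struct FakeHandle {
  std::string filename;
};

// Sketch of a handle cache whose partition count is chosen at runtime
// (for example from a startup flag) rather than fixed at compile time.
class PartitionedHandleCache {
 public:
  explicit PartitionedHandleCache(size_t num_partitions)
      : partitions_(num_partitions) {
    assert(num_partitions > 0);
  }

  size_t num_partitions() const { return partitions_.size(); }

  // Maps a filename to a partition by hashing. More partitions means two
  // threads are less likely to need the same lock at the same time.
  size_t PartitionIndex(const std::string& filename) const {
    return std::hash<std::string>()(filename) % partitions_.size();
  }

  // Returns a cached handle if one exists, otherwise creates a new one.
  // Only the lock of the partition that owns 'filename' is taken.
  std::unique_ptr<FakeHandle> Get(const std::string& filename, bool* cache_hit) {
    Partition& p = partitions_[PartitionIndex(filename)];
    std::lock_guard<std::mutex> l(p.lock);
    auto it = p.handles.find(filename);
    if (it != p.handles.end() && !it->second.empty()) {
      std::unique_ptr<FakeHandle> h = std::move(it->second.back());
      it->second.pop_back();
      *cache_hit = true;
      return h;
    }
    *cache_hit = false;
    return std::unique_ptr<FakeHandle>(new FakeHandle{filename});
  }

  // Returns a handle to the cache so a later Get() can reuse it.
  void Release(std::unique_ptr<FakeHandle> h) {
    Partition& p = partitions_[PartitionIndex(h->filename)];
    std::lock_guard<std::mutex> l(p.lock);
    std::vector<std::unique_ptr<FakeHandle>>& slot = p.handles[h->filename];
    slot.push_back(std::move(h));
  }

 private:
  struct Partition {
    std::mutex lock;  // Guards 'handles' for this partition only.
    std::unordered_map<std::string, std::vector<std::unique_ptr<FakeHandle>>>
        handles;
  };

  std::vector<Partition> partitions_;
};
```

With this shape, a caller would construct the cache once at startup, e.g. 
PartitionedHandleCache cache(256), and every Get/Release pair for a given file 
would touch only the one partition that the filename hashes to.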




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
