[ 
https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782500#comment-13782500
 ] 

Colin Patrick McCabe commented on HDFS-5276:
--------------------------------------------

bq. The counts from the threads, even though they are not running any more, 
should be included in the stats count. Currently the statistics object is 
passed from the client to the file system. This implementation may need 
incompatible changes.

There's nothing incompatible about it.  The objects used for thread-local 
storage are not the same objects that the client is passing around.  My point 
is that, if you keep adding objects whenever a thread is created, you also have 
to get rid of them when the thread is destroyed.  Otherwise, you have a memory 
leak.

It would be really simple to come up with a patch that does thread-local 
counters.  I don't have time today, but maybe later this week.
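
For illustration only, here is a minimal sketch of what thread-local counters 
with cleanup could look like. It is not the patch mentioned above and not the 
FileSystem.Statistics API: the class name ThreadLocalByteCounter, the Cell 
helper, and the choice to reap finished threads lazily on the read path are all 
assumptions made for this example.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical names throughout; this is not the FileSystem.Statistics API.
class ThreadLocalByteCounter {
    /** One cell per thread: written only by its owner, readable by anyone. */
    private static final class Cell {
        // Weak, so a cell never keeps a finished thread reachable.
        final WeakReference<Thread> owner =
            new WeakReference<Thread>(Thread.currentThread());
        volatile long bytesRead;
    }

    private final ConcurrentLinkedQueue<Cell> cells =
        new ConcurrentLinkedQueue<Cell>();
    private long retired; // totals of finished threads; guarded by "this"

    private final ThreadLocal<Cell> local = new ThreadLocal<Cell>() {
        @Override protected Cell initialValue() {
            Cell c = new Cell(); // runs on the thread that first calls get()
            cells.add(c);
            return c;
        }
    };

    /** Hot path: no CAS and no lock, just a write to this thread's cell. */
    void incrementBytesRead(long n) {
        Cell c = local.get();
        c.bytesRead = c.bytesRead + n; // single writer, so no lost updates
    }

    /** Slow path: sum live cells and fold finished threads into "retired". */
    synchronized long getBytesRead() {
        long total = retired;
        for (Iterator<Cell> it = cells.iterator(); it.hasNext();) {
            Cell c = it.next();
            long value = c.bytesRead;
            Thread t = c.owner.get();
            if (t == null || !t.isAlive()) {
                retired += value; // keep the count of the finished thread...
                it.remove();      // ...but drop its per-thread object
            }
            total += value;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        final ThreadLocalByteCounter stats = new ThreadLocalByteCounter();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) stats.incrementBytesRead(4096);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(stats.getBytesRead()); // 8 * 1000 * 4096 = 32768000
    }
}
```

In this sketch the counts from finished threads survive in the retired total, 
while the per-thread cells themselves are removed the next time the total is 
read, so the list of cells does not grow with the number of threads ever 
created.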

bq. Controlling issues such as cache alignment and synchronization in the JVM 
is also essential to avoid contention. Since that information is simply 
unavailable to Java programs, in my personal opinion the problem might be 
better addressed in the JVM, or at an even lower abstraction level.

The JVM has some problems, but this isn't one of them.  Accessing the same 
memory from many different threads at once is inherently slow on modern 
multicore CPUs because of cache coherency issues.  It's up to software 
designers to avoid this if they want the best performance.
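
As a rough illustration of the point above: when every thread calls 
getAndAdd() on one AtomicLong, they all write to the same memory location, so 
its cache line has to move between cores on each update. The sketch below 
compares that with java.util.concurrent.atomic.LongAdder, a JDK 8+ class that 
spreads updates over several internal cells and sums them on read. LongAdder is 
swapped in here purely as an example of avoiding a shared hot spot; the class 
ContentionSketch, the Counter interface, and the thread and iteration counts 
are hypothetical, and the timings depend heavily on the machine and JVM.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical micro-sketch; not a rigorous benchmark (no warm-up control).
class ContentionSketch {
    interface Counter {
        void add(long n);
        long sum();
    }

    /** Every thread updates the same memory location. */
    static Counter shared() {
        final AtomicLong v = new AtomicLong();
        return new Counter() {
            public void add(long n) { v.getAndAdd(n); }
            public long sum() { return v.get(); }
        };
    }

    /** Updates are spread over internal cells and combined on read. */
    static Counter striped() {
        final LongAdder v = new LongAdder();
        return new Counter() {
            public void add(long n) { v.add(n); }
            public long sum() { return v.sum(); }
        };
    }

    /** Runs "threads" workers that each add 1 "opsPerThread" times; returns elapsed nanoseconds. */
    static long hammer(final Counter counter, int threads, final int opsPerThread)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < opsPerThread; j++) counter.add(1);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 32;        // matches the thread count in the report
        int ops = 1000000;       // arbitrary
        Counter a = shared();
        Counter b = striped();
        long sharedNs = hammer(a, threads, ops);
        long stripedNs = hammer(b, threads, ops);
        System.out.println("AtomicLong: " + sharedNs / 1000000 + " ms, sum=" + a.sum());
        System.out.println("LongAdder:  " + stripedNs / 1000000 + " ms, sum=" + b.sum());
    }
}
```

Both counters end with the same sum; only the cost of getting there differs, 
and by how much is something to measure rather than assume.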

> FileSystem.Statistics got performance issue on multi-thread read/write.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5276
>                 URL: https://issues.apache.org/jira/browse/HDFS-5276
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.4-alpha
>            Reporter: Chengxiang Li
>         Attachments: DisableFSReadWriteBytesStat.patch, 
> HDFSStatisticTest.java, hdfs-test.PNG, jstack-trace.PNG
>
>
> FileSystem.Statistics is a singleton variable for each FS scheme, and each 
> read/write on HDFS leads to an AtomicLong.getAndAdd(). AtomicLong does 
> not perform well with many threads (say, more than 30), so it may 
> cause serious performance issues. In our Spark test profile, with 32 threads 
> reading data from HDFS, about 70% of CPU time is spent in 
> FileSystem.Statistics.incrementBytesRead().



--
This message was sent by Atlassian JIRA
(v6.1#6144)
