Re: Updated 2.8.0-SNAPSHOT artifact
I would also prefer releasing the current 2.8 branch sooner. There are several incomplete features in branch-2, such as YARN-914 and HDFS-7877, that are better served if we can complete them in the next major release. Letting them span multiple releases might not be desirable, as there could be potential compatibility issues involved. Therefore, if we recut 2.8 it means we have to work on those items before the new 2.8 is released, which could cause a major delay in the schedule.

On Mon, Nov 7, 2016 at 10:37 AM, Sangjin Lee wrote:
> +1. Resetting the 2.8 effort and the branch at this point may be
> counter-productive. IMO we should focus on resolving the remaining blockers
> and getting it out the door. I also think that we should seriously consider
> 2.9 as well, as a fairly large number of changes have accumulated in
> branch-2 (over branch-2.8).
>
> Sangjin
>
> On Fri, Nov 4, 2016 at 3:38 PM, Jason Lowe wrote:
> > At this point my preference would be to do the most expeditious thing to
> > release 2.8, whether that's sticking with the branch-2.8 we have today or
> > re-cutting it on branch-2. Doing a quick JIRA query, there have been almost
> > 2,400 JIRAs resolved in 2.8.0 (1). For many of them, it's well past time
> > they saw a release vehicle. If re-cutting the branch means we have to wrap
> > up a few extra things that are still in progress on branch-2 or add a few
> > more blockers to the list before we release, then I'd rather stay where
> > we're at and ship it ASAP.
> >
> > Jason
> > (1) https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2C%20yarn%2C%20mapreduce%2C%20hdfs%29%20and%20resolution%20%3D%20Fixed%20and%20fixVersion%20%3D%202.8.0
> >
> > On Tuesday, October 25, 2016 5:31 PM, Karthik Kambatla <ka...@cloudera.com> wrote:
> > Is there value in releasing current branch-2.8? Aren't we better off
> > re-cutting the branch off of branch-2?
> > On Tue, Oct 25, 2016 at 12:20 AM, Akira Ajisaka <ajisa...@oss.nttdata.co.jp> wrote:
> > > It's almost a year since branch-2.8 was cut.
> > > I'm thinking we need to release 2.8.0 ASAP.
> > >
> > > According to the following list, there are 5 blocker and 6 critical issues.
> > > https://issues.apache.org/jira/issues/?filter=12334985
> > >
> > > Regards,
> > > Akira
> > >
> > > On 10/18/16 10:47, Brahma Reddy Battula wrote:
> > >> Hi Vinod,
> > >>
> > >> Any plan on a first RC for branch-2.8? I think it has been a long time.
> > >>
> > >> --Brahma Reddy Battula
> > >>
> > >> -Original Message-
> > >> From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
> > >> Sent: 20 August 2016 00:56
> > >> To: Jonathan Eagles
> > >> Cc: common-dev@hadoop.apache.org
> > >> Subject: Re: Updated 2.8.0-SNAPSHOT artifact
> > >>
> > >> Jon,
> > >>
> > >> That is around the time when I branched 2.8, so I guess you were getting
> > >> SNAPSHOT artifacts till then from the branch-2 nightly builds.
> > >>
> > >> If you need it, we can set up SNAPSHOT builds. Or just wait for the first
> > >> RC, which is around the corner.
> > >>
> > >> +Vinod
> > >>
> > >> On Jul 28, 2016, at 4:27 PM, Jonathan Eagles wrote:
> > >>> Latest snapshot was uploaded in Nov 2015, but checkins are still coming
> > >>> in quite frequently.
> > >>> https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-yarn-api/
> > >>>
> > >>> Are there any plans to start producing updated SNAPSHOT artifacts for
> > >>> current hadoop development lines?
-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
Re: [VOTE] Release Apache Hadoop 2.6.5 (RC0)
+1 Successfully compiled a standalone HDFS app using the 2.6.5 jars extracted from the release tar.gz.

On Thu, Sep 29, 2016 at 10:33 AM, Chris Trezzo wrote:
> +1
>
> Thanks Sangjin!
>
> 1. Verified md5 checksums and signature on src, and release tar.gz.
> 2. Built from source.
> 3. Started up a pseudo distributed cluster.
> 4. Successfully ran a PI job.
> 5. Ran the balancer.
> 6. Inspected UI for RM, NN, JobHistory.
>
> On Tue, Sep 27, 2016 at 4:11 PM, Lei Xu wrote:
> > +1
> >
> > The steps I've done:
> >
> > * Downloaded release tar and source tar, verified MD5.
> > * Ran a HDFS cluster, and copied files between the local filesystem and HDFS.
> >
> > On Tue, Sep 27, 2016 at 1:28 PM, Sangjin Lee wrote:
> > > Hi folks,
> > >
> > > I have created a release candidate RC0 for the Apache Hadoop 2.6.5 release
> > > (the next maintenance release in the 2.6.x release line). Below are the
> > > details of this release candidate:
> > >
> > > The RC is available for validation at:
> > > http://home.apache.org/~sjlee/hadoop-2.6.5-RC0/
> > >
> > > The RC tag in git is release-2.6.5-RC0 and its git commit is
> > > 6939fc935fba5651fdb33386d88aeb8e875cf27a.
> > >
> > > The maven artifacts are staged via repository.apache.org at:
> > > https://repository.apache.org/content/repositories/orgapachehadoop-1048/
> > >
> > > You can find my public key at
> > > http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
> > >
> > > Please try the release and vote. The vote will run for the usual 5 days.
> > > Huge thanks to Chris Trezzo for spearheading the release management and
> > > doing all the work!
> > >
> > > Thanks,
> > > Sangjin
> >
> > --
> > Lei (Eddy) Xu
> > Software Engineer, Cloudera
> >
> > -
> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-13029) Have FairCallQueue try all lower priority sub queues before backoff
Ming Ma created HADOOP-13029:
Summary: Have FairCallQueue try all lower priority sub queues before backoff
Key: HADOOP-13029
URL: https://issues.apache.org/jira/browse/HADOOP-13029
Project: Hadoop Common
Issue Type: Sub-task
Reporter: Ming Ma

Currently, if FairCallQueue and backoff are enabled, backoff kicks in as soon as the assigned sub-queue fills up.

{noformat}
/**
 * Put and offer follow the same pattern:
 * 1. Get the assigned priorityLevel from the call by scheduler
 * 2. Get the nth sub-queue matching this priorityLevel
 * 3. delegate the call to this sub-queue.
 *
 * But differ in how they handle overflow:
 * - Put will move on to the next queue until it lands on the last queue
 * - Offer does not attempt other queues on overflow
 */
{noformat}

It seems better to try lower-priority sub-queues when the assigned sub-queue is full, just like the case when backoff is disabled. This would give regular users more opportunities and allow the cluster to be configured with a smaller call queue length. [~chrili], [~arpitagarwal], what do you think?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
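[Editor's note] The proposed overflow behavior could look roughly like this. This is a hypothetical, self-contained sketch, not the actual FairCallQueue code: on overflow, offer falls through to lower-priority (higher-index) sub-queues and only reports failure, triggering backoff, when every candidate queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the proposed offer semantics; class and method names are illustrative.
public class OverflowFairQueue<E> {
    private final BlockingQueue<E>[] queues;

    @SuppressWarnings("unchecked")
    public OverflowFairQueue(int levels, int capacityPerLevel) {
        queues = new BlockingQueue[levels];
        for (int i = 0; i < levels; i++) {
            queues[i] = new ArrayBlockingQueue<>(capacityPerLevel);
        }
    }

    /**
     * Try the assigned priority level first; on overflow, fall through to
     * lower-priority sub-queues. Only when every queue from the assigned
     * level down is full does this return false (i.e., back off).
     */
    public boolean offer(int priorityLevel, E call) {
        for (int i = priorityLevel; i < queues.length; i++) {
            if (queues[i].offer(call)) {
                return true;
            }
        }
        return false; // all candidate sub-queues full -> back off
    }

    public int sizeOf(int level) {
        return queues[level].size();
    }
}
```

With this behavior, a burst from a high-priority user spills into lower-priority queues before any client is told to back off, which is why a smaller per-queue capacity becomes workable.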
Re: Local repo sharing for maven builds
The increase in frequency might have been due to the refactoring of hadoop-hdfs-client-*.jar out of the main hadoop-hdfs-*.jar. I don't have overall metrics on how often this happens when anyone changes protobuf, but based on HDFS-9004, 4 of 5 runs have this issue, which is a lot for any patch that changes APIs. This isn't limited to HDFS; there are cases of YARN API changes causing MR unit tests to fail.

So far, the workaround I use is to keep resubmitting the build until it succeeds. Another approach we can consider is to provide an option for the patch submitter to use its own local repo when it submits the patch. That way, the majority of patches can still use the shared local repo.

On Fri, Sep 18, 2015 at 3:14 PM, Andrew Wang wrote:
> Okay, some browsing of Jenkins docs [1] says that we could key the
> maven.repo.local off of $EXECUTOR_NUMBER to do per-executor repos like
> Bernd recommended, but that still requires some hook into test-patch.sh.
>
> Regarding install, I thought all we needed to install was
> hadoop-maven-plugins, but we do more than that now in test-patch.sh. Not
> sure if we can reduce that.
>
> [1] https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables
>
> On Fri, Sep 18, 2015 at 2:42 PM, Allen Wittenauer wrote:
> > The collisions have been happening for about a year now. The frequency
> > is increasing, but not enough to be particularly worrisome. (So I'm
> > slightly amused that one blowing up is suddenly a major freakout.)
> >
> > Making changes to the configuration without knowing what one is doing is
> > probably a bad idea. For example, if people are removing the shared cache,
> > I hope they're also prepared for the bitching that is going to go with the
> > extremely significant slowdown caused by downloading the java prereqs for
> > building for every test...
> > As far as Yetus goes, we've got a JIRA open to provide for per-instance
> > caches when using the docker container code. I've got it in my head how I
> > think we can do it, but just haven't had a chance to code it. So once that
> > gets written up, turning on containers should make the problem go away
> > without any significant impact on test time. Of course, that won't help
> > the scheduled builds, but those happen at an even smaller rate.
> >
> > On Sep 18, 2015, at 12:19 PM, Andrew Wang wrote:
> > > Sangjin, you should have access to the precommit jobs if you log in with
> > > your Apache credentials, even as a branch committer.
> > >
> > > https://builds.apache.org/job/PreCommit-HDFS-Build/configure
> > >
> > > The actual maven invocation is managed by test-patch.sh though.
> > > test-patch.sh has a MAVEN_ARGS which looks like what we want, but I don't
> > > think we can just set it before calling test-patch, since it'd get squashed
> > > by setup_defaults.
> > >
> > > Allen/Chris/Yetus folks, any guidance here?
> > >
> > > Thanks,
> > > Andrew
> > >
> > > On Fri, Sep 18, 2015 at 11:55 AM, wrote:
> > >> You can use one per build processor, that reduces concurrent updates but
> > >> still keeps the cache function. And then try to avoid using install.
> > >>
> > >> --
> > >> http://bernd.eckenfels.net
> > >>
> > >> -Original Message-
> > >> From: Andrew Wang
> > >> To: "common-dev@hadoop.apache.org"
> > >> Cc: Andrew Bayer, Sangjin Lee <sj...@twitter.com>, Lei Xu, infrastruct...@apache.org
> > >> Sent: Fr., 18 Sep. 2015 20:42
> > >> Subject: Re: Local repo sharing for maven builds
> > >>
> > >> I think each job should use a maven.repo.local within its workspace like
> > >> abayer said. This means lots of downloading, but it's isolated.
> > >> If we care about download time, we could also bootstrap with a tarred
> > >> .m2/repository after we've run a `mvn compile`, so before it installs the
> > >> hadoop artifacts.
> > >>
> > >> On Fri, Sep 18, 2015 at 11:02 AM, Ming Ma wrote:
> > >>> +hadoop common dev. Any suggestions?
> > >>>
> > >>> On Fri, Sep 18, 2015 at 10:41 AM, Andrew Bayer <andrew.ba...@gmail.com>
Re: Local repo sharing for maven builds
+hadoop common dev. Any suggestions?

On Fri, Sep 18, 2015 at 10:41 AM, Andrew Bayer wrote:
> You can change your maven call to use a different repository - I believe
> you do that with -Dmaven.repository.local=path/to/repo
>
> On Sep 18, 2015 19:39, "Ming Ma" wrote:
>> Hi,
>>
>> We are seeing some strange behaviors in HDFS precommit builds. They seem
>> to be caused by the local repo on the same machine being shared by
>> different concurrent jobs.
>>
>> In HDFS, the build and test of "hadoop-hdfs-project/hdfs" depend on
>> "hadoop-hdfs-project/hdfs-client"'s hadoop-hdfs-client-3.0.0-SNAPSHOT.jar.
>> HDFS-9004 adds some new methods to hadoop-hdfs-client-3.0.0-SNAPSHOT.jar.
>> In the precommit build for HDFS-9004, unit tests for "hadoop-hdfs-project/hdfs"
>> complain that the method isn't defined:
>> https://builds.apache.org/job/PreCommit-HDFS-Build/12522/testReport/.
>> Interestingly, sometimes it just works fine:
>> https://builds.apache.org/job/PreCommit-HDFS-Build/12507/testReport/.
>>
>> So we suspect that another job running at the same time published a
>> different version of hadoop-hdfs-client-3.0.0-SNAPSHOT.jar, one which
>> doesn't have the new methods defined, to the local repo shared by all
>> jobs on that machine.
>>
>> If the above analysis is correct, what is the best way to fix the issue
>> so that different jobs can use their own maven local repo for build and
>> test?
>>
>> Thanks.
>>
>> Ming
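[Editor's note] The per-executor-repo idea discussed in this thread could be sketched as below. The repo path is illustrative; $EXECUTOR_NUMBER is set by Jenkins. Note the standard Maven property is `maven.repo.local` (as Andrew Wang's later message uses), not the `-Dmaven.repository.local` spelling quoted above.

```shell
# Give each Jenkins executor its own Maven local repository so concurrent
# precommit jobs cannot clobber each other's SNAPSHOT artifacts.
REPO="${WORKSPACE:-$PWD}/.m2-executor-${EXECUTOR_NUMBER:-0}"
mkdir -p "$REPO"
# The actual build invocation would then be (echoed here for illustration):
echo mvn -Dmaven.repo.local="$REPO" clean install
```

The trade-off, as noted later in the thread, is that an empty per-executor repo re-downloads every dependency on first use.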
[jira] [Created] (HADOOP-11916) TestStringUtils#testLowerAndUpperStrings failed on MAC due to a JVM bug
Ming Ma created HADOOP-11916:
Summary: TestStringUtils#testLowerAndUpperStrings failed on MAC due to a JVM bug
Key: HADOOP-11916
URL: https://issues.apache.org/jira/browse/HADOOP-11916
Project: Hadoop Common
Issue Type: Bug
Reporter: Ming Ma
Priority: Minor

The test fails with the exception below. It turns out there is a JVM bug for MAC: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8047340.

{noformat}
testLowerAndUpperStrings(org.apache.hadoop.util.TestStringUtils) Time elapsed: 0.205 sec <<< ERROR!
java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
at java.lang.UNIXProcess$1.run(UNIXProcess.java:104)
at java.lang.UNIXProcess$1.run(UNIXProcess.java:93)
at java.security.AccessController.doPrivileged(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:91)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.util.Shell.isSetsidSupported(Shell.java:391)
at org.apache.hadoop.util.Shell.(Shell.java:381)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:80)
at org.apache.hadoop.util.TestStringUtils.testLowerAndUpperStrings(TestStringUtils.java:432)
{noformat}

Perhaps we can disable this test case on MAC.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
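[Editor's note] The suggested mitigation, skipping the test on Mac, needs a platform check. A minimal hypothetical sketch (class and method names are illustrative; a real test would feed the result to something like JUnit's Assume):

```java
// Detect Mac OS X from the os.name system property so the affected test
// can be skipped on platforms hit by JDK bug 8047340.
public class PlatformCheck {
    // Core check separated out so it is testable with arbitrary inputs.
    public static boolean isMac(String osName) {
        return osName != null && osName.toLowerCase().contains("mac");
    }

    public static boolean isMac() {
        return isMac(System.getProperty("os.name", ""));
    }
}
```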
[jira] [Created] (HADOOP-11305) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used
Ming Ma created HADOOP-11305:
Summary: RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used
Key: HADOOP-11305
URL: https://issues.apache.org/jira/browse/HADOOP-11305
Project: Hadoop Common
Issue Type: Bug
Reporter: Ming Ma

This might be a known issue. Given FileSystemRMStateStore isn't used for the HA scenario, it might not be that important, unless there is something we need to fix at the RM layer to make it more tolerant of RMStateStore issues.

When the RM machine was hard shutdown, the OS might not get a chance to persist blocks. Some of the stored application data ended up with size zero after reboot, and the RM didn't like that.

{noformat}
ls -al /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
total 156
drwxr-xr-x.    2 x y   4096 Nov 13 16:45 .
drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
-rw-r--r--.    1 x y      0 Nov 13 16:45 appattempt_1412702189634_324351_01
-rw-r--r--.    1 x y      0 Nov 13 16:45 .appattempt_1412702189634_324351_01.crc
-rw-r--r--.    1 x y      0 Nov 13 16:45 application_1412702189634_324351
-rw-r--r--.    1 x y      0 Nov 13 16:45 .application_1412702189634_324351.crc
{noformat}

When the RM starts up:

{noformat}
2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem opening checksum file: file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351. Ignoring exception:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)
...

2014-11-13 17:40:48,876 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to load/recover state
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
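[Editor's note] One mitigation implied by the report is to treat zero-length state files as corrupt during recovery instead of crashing with an NPE. A hypothetical sketch (not the actual FileSystemRMStateStore code; names are illustrative):

```java
import java.io.File;

// During RM recovery, skip state files left empty by a hard shutdown that
// never flushed the blocks, and ignore the '.'-prefixed CRC sidecar files.
public class StateFileFilter {
    // Core check separated out for testing.
    public static boolean isRecoverable(String name, long length) {
        return !name.startsWith(".") && length > 0;
    }

    public static boolean isRecoverable(File f) {
        // File.length() returns 0 for an empty (or nonexistent) file.
        return isRecoverable(f.getName(), f.length());
    }
}
```

Recovery would then log and skip unrecoverable entries rather than propagate a null ApplicationState.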
[jira] [Created] (HADOOP-11295) RPC Reader thread can't be shut down if RPCCallQueue is full
Ming Ma created HADOOP-11295:
Summary: RPC Reader thread can't be shut down if RPCCallQueue is full
Key: HADOOP-11295
URL: https://issues.apache.org/jira/browse/HADOOP-11295
Project: Hadoop Common
Issue Type: Bug
Reporter: Ming Ma

If the RPC server is asked to stop when the RPCCallQueue is full, {{reader.join()}} will just wait there. That is because:

1. The reader thread is blocked on {{callQueue.put(call);}}.
2. When the RPC server is asked to stop, it interrupts all handler threads, and thus no threads will drain the callQueue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
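[Editor's note] One possible shape of a fix, sketched hypothetically (not the actual Server.java code): replace the unbounded {{put}} with bounded {{offer}} attempts that poll a shutdown flag, so the reader thread can exit and {{join()}} can complete.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Instead of callQueue.put(call), which blocks forever on a full queue,
// retry a timed offer() while a running flag is set.
public class StoppableEnqueuer<E> {
    private final BlockingQueue<E> callQueue;
    private volatile boolean running = true;

    public StoppableEnqueuer(int capacity) {
        callQueue = new ArrayBlockingQueue<>(capacity);
    }

    public void stop() { running = false; }

    /** Returns false if the server was stopped before the call fit. */
    public boolean enqueue(E call) {
        while (running) {
            try {
                if (callQueue.offer(call, 100, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```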
[jira] [Created] (HADOOP-11000) HAServiceProtocol's health state is incorrectly transitioned to SERVICE_NOT_RESPONDING
Ming Ma created HADOOP-11000:
Summary: HAServiceProtocol's health state is incorrectly transitioned to SERVICE_NOT_RESPONDING
Key: HADOOP-11000
URL: https://issues.apache.org/jira/browse/HADOOP-11000
Project: Hadoop Common
Issue Type: Bug
Reporter: Ming Ma

When HAServiceProtocol.monitorHealth throws a HealthCheckFailedException, the actual exception from the protocol buffer RPC is a RemoteException that wraps the real exception. Thus the state is incorrectly transitioned to SERVICE_NOT_RESPONDING.

{noformat}
HealthMonitor.java doHealthChecks

try {
  status = proxy.getServiceStatus();
  proxy.monitorHealth();
  healthy = true;
} catch (HealthCheckFailedException e) {
  ...
  enterState(State.SERVICE_UNHEALTHY);
} catch (Throwable t) {
  ...
  enterState(State.SERVICE_NOT_RESPONDING);
  ...
}
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
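[Editor's note] The natural fix is to unwrap the RPC-layer wrapper before classifying the failure. A hypothetical, self-contained sketch — `RemoteWrapper` and `HealthCheckFailed` stand in for Hadoop's RemoteException and HealthCheckFailedException:

```java
// Unwrap the remote-exception wrapper so a wrapped HealthCheckFailed
// classifies as SERVICE_UNHEALTHY rather than SERVICE_NOT_RESPONDING.
public class HealthClassifier {
    static class RemoteWrapper extends RuntimeException {
        RemoteWrapper(Throwable cause) { super(cause); }
    }
    static class HealthCheckFailed extends RuntimeException {}

    public static String classify(Throwable t) {
        // Peel off the RPC wrapper to reach the real cause, if any.
        Throwable real = (t instanceof RemoteWrapper && t.getCause() != null)
            ? t.getCause() : t;
        return (real instanceof HealthCheckFailed)
            ? "SERVICE_UNHEALTHY" : "SERVICE_NOT_RESPONDING";
    }
}
```

In Hadoop itself, RemoteException provides an unwrapRemoteException helper that serves this purpose.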
[jira] [Created] (HADOOP-10673) Update RPC metrics when the call throws an exception
Ming Ma created HADOOP-10673:
Summary: Update RPC metrics when the call throws an exception
Key: HADOOP-10673
URL: https://issues.apache.org/jira/browse/HADOOP-10673
Project: Hadoop Common
Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma

Currently RPC metrics aren't updated when the call throws an exception. We can either update the existing metrics or have a new set of metrics for the exception case.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HADOOP-10598) Support configurable RPC fair share
Ming Ma created HADOOP-10598:
Summary: Support configurable RPC fair share
Key: HADOOP-10598
URL: https://issues.apache.org/jira/browse/HADOOP-10598
Project: Hadoop Common
Issue Type: Sub-task
Reporter: Ming Ma

It will be useful if we can support an RPC minimum fair share on a per-user or per-group basis. That will be useful for SLA jobs in a shared cluster, and complementary to the history-based soft policy that the fair call queue applies in the RPC server.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load
Ming Ma created HADOOP-10597:
Summary: Evaluate if we can have RPC client back off when server is under heavy load
Key: HADOOP-10597
URL: https://issues.apache.org/jira/browse/HADOOP-10597
Project: Hadoop Common
Issue Type: Sub-task
Reporter: Ming Ma

Currently, if an application hits the NN too hard, RPC requests end up in a blocking state, assuming OS connections don't run out. Alternatively, RPC or the NN could throw some well-defined exception back to the client, based on certain policies, when it is under heavy load; the client would understand such an exception and do exponential backoff, as another implementation of RetryInvocationHandler.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
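[Editor's note] The exponential backoff the client would perform could be sketched as below (hypothetical helper, not Hadoop's RetryPolicy code): the delay doubles per retry and is clamped to a cap.

```java
// delay = base * 2^retry, capped; the shift count is also clamped so the
// left shift cannot overflow for large retry counts.
public class Backoff {
    public static long delayMillis(int retry, long baseMillis, long capMillis) {
        long d = baseMillis << Math.min(retry, 20);
        return Math.min(d, capMillis);
    }
}
```

A production policy would typically add jitter so that backed-off clients don't retry in lockstep.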
[jira] [Created] (HADOOP-10599) Support prioritization of DN RPCs over client RPCs
Ming Ma created HADOOP-10599:
Summary: Support prioritization of DN RPCs over client RPCs
Key: HADOOP-10599
URL: https://issues.apache.org/jira/browse/HADOOP-10599
Project: Hadoop Common
Issue Type: Sub-task
Reporter: Ming Ma

We might need to prioritize DN RPCs over client RPCs so that, no matter what applications do to the NN RPC queues and FSNamesystem's global lock, DN requests will be processed in a timely manner. Once a cluster is configured to have the service RPC server separated from the client RPC server, this is mitigated to some degree, as it is with a fair FSNamesystem global lock. Also, if the NN global lock can be made more fine-grained, such a need becomes less important. Still, it would be good to evaluate whether this is a good option.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HADOOP-10157) move doRead method from IPC Listener class to IPC Reader class
Ming Ma created HADOOP-10157:
Summary: move doRead method from IPC Listener class to IPC Reader class
Key: HADOOP-10157
URL: https://issues.apache.org/jira/browse/HADOOP-10157
Project: Hadoop Common
Issue Type: Bug
Components: ipc
Reporter: Ming Ma
Priority: Minor

Currently the doRead method belongs to the Listener class. Semantically, it is better to move doRead from the Listener class to the Reader class.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
[jira] [Created] (HADOOP-10125) no need to process RPC request if the client connection has been dropped
Ming Ma created HADOOP-10125:
Summary: no need to process RPC request if the client connection has been dropped
Key: HADOOP-10125
URL: https://issues.apache.org/jira/browse/HADOOP-10125
Project: Hadoop Common
Issue Type: Bug
Reporter: Ming Ma

If the client has dropped the connection before the RPC is processed, the RPC server doesn't need to process the call. We have encountered issues where bad applications can bring down the NN; https://issues.apache.org/jira/browse/HADOOP-9640 tries to address that. When this occurs, the NN's RPC queues are filled up with client requests and DN requests, and sometimes we want to stop the flooding by stopping the bad applications and/or DNs. Some RPC processing, like DatanodeProtocol::blockReport, can take a couple hundred milliseconds, so it is worthwhile to have the NN skip the RPC calls if the DNs have been stopped.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
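[Editor's note] The proposed check amounts to a cheap liveness gate before a handler does expensive work on a dequeued call. A hypothetical sketch (the `Connection` interface stands in for the RPC server's connection object):

```java
// Drop a dequeued call if its originating connection is already gone,
// rather than spending hundreds of milliseconds processing it.
public class CallGate {
    interface Connection { boolean isOpen(); }

    public static boolean shouldProcess(Connection c) {
        return c != null && c.isOpen();
    }
}
```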
[jira] [Created] (HADOOP-10106) Incorrect thread name in RPC log message
Ming Ma created HADOOP-10106:
Summary: Incorrect thread name in RPC log message
Key: HADOOP-10106
URL: https://issues.apache.org/jira/browse/HADOOP-10106
Project: Hadoop Common
Issue Type: Bug
Reporter: Ming Ma
Priority: Minor

INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: readAndProcess from client 10.115.201.46 threw exception org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647

This is thrown by a reader thread, so the message should be like:

INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client 10.115.201.46 threw exception org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647

Another example is Responder.processResponse, which can also be called by a handler thread. When that happens, the thread name should be the handler thread, not the responder thread.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (HADOOP-8706) Provide rate metrics based on counter value
Ming Ma created HADOOP-8706:
Summary: Provide rate metrics based on counter value
Key: HADOOP-8706
URL: https://issues.apache.org/jira/browse/HADOOP-8706
Project: Hadoop Common
Issue Type: Improvement
Components: metrics
Reporter: Ming Ma

In production clusters, it is more useful to have ops/sec than an ever-increasing counter value. Take NameNodeMetrics.getBlockLocations as an example: its current type is MutableCounterLong, so the value increases over time. Quite often "number of getBlockLocations per second" is more interesting for analysis. Furthermore, most of the MutableCounterLong metrics in NameNodeMetrics and DataNodeMetrics would be more useful expressed in terms of ops/sec. I looked at all the metrics objects provided in metrics 2.0 and couldn't find such a type. FYI, HBase has its own MetricsRate object, based on metrics 1.0, for this purpose.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
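[Editor's note] The requested rate metric is just the counter delta divided by the sampling interval. A hypothetical sketch (not the metrics2 API; the class name is illustrative):

```java
// Derive ops/sec from a monotonically increasing counter:
// rate = (current - previous) / intervalSeconds, resetting the baseline
// on each sample.
public class CounterRate {
    private long lastValue;
    private long lastTimeMillis;

    public CounterRate(long initialValue, long nowMillis) {
        lastValue = initialValue;
        lastTimeMillis = nowMillis;
    }

    /** Returns ops/sec since the previous sample. */
    public double sample(long counterValue, long nowMillis) {
        long deltaOps = counterValue - lastValue;
        double deltaSec = (nowMillis - lastTimeMillis) / 1000.0;
        lastValue = counterValue;
        lastTimeMillis = nowMillis;
        return deltaSec > 0 ? deltaOps / deltaSec : 0.0;
    }
}
```

A metrics sink would call sample() once per reporting interval; this is essentially what MutableRate-style metrics compute internally.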