Re: Updated 2.8.0-SNAPSHOT artifact

2016-11-09 Thread Ming Ma
I would also prefer releasing the current 2.8 branch sooner. There are several
incomplete features in branch-2, such as YARN-914 and HDFS-7877, that are
better served if we can complete them in the next major release. Letting
them span multiple releases might not be desirable, as there could be
some potential compatibility issues involved. Therefore, if we recut 2.8, it
means we have to finish those items before the new 2.8 is released, which
could cause a major delay in the schedule.

On Mon, Nov 7, 2016 at 10:37 AM, Sangjin Lee  wrote:

> +1. Resetting the 2.8 effort and the branch at this point may be
> counter-productive. IMO we should focus on resolving the remaining blockers
> and getting it out the door. I also think that we should seriously consider
> 2.9 as well, as a fairly large number of changes have accumulated in
> branch-2 (over branch-2.8).
>
>
> Sangjin
>
> On Fri, Nov 4, 2016 at 3:38 PM, Jason Lowe 
> wrote:
>
> > At this point my preference would be to do the most expeditious thing to
> > release 2.8, whether that's sticking with the branch-2.8 we have today or
> > re-cutting it on branch-2.  Doing a quick JIRA query, there's been almost
> > 2,400 JIRAs resolved in 2.8.0 (1).  For many of them, it's well-past time
> > they saw a release vehicle.  If re-cutting the branch means we have to wrap
> > up a few extra things that are still in-progress on branch-2 or add a few
> > more blockers to the list before we release then I'd rather stay where
> > we're at and ship it ASAP.
> >
> > Jason
> > (1) https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2C%20yarn%2C%20mapreduce%2C%20hdfs%29%20and%20resolution%20%3D%20Fixed%20and%20fixVersion%20%3D%202.8.0
> >
> >
> >
> >
> >
> > On Tuesday, October 25, 2016 5:31 PM, Karthik Kambatla <
> > ka...@cloudera.com> wrote:
> >
> >
> >  Is there value in releasing current branch-2.8? Aren't we better off
> > re-cutting the branch off of branch-2?
> >
> > On Tue, Oct 25, 2016 at 12:20 AM, Akira Ajisaka <
> > ajisa...@oss.nttdata.co.jp>
> > wrote:
> >
> > > It's almost a year since branch-2.8 was cut.
> > > I'm thinking we need to release 2.8.0 ASAP.
> > >
> > > According to the following list, there are 5 blocker and 6 critical issues:
> > > https://issues.apache.org/jira/issues/?filter=12334985
> > >
> > > Regards,
> > > Akira
> > >
> > >
> > > On 10/18/16 10:47, Brahma Reddy Battula wrote:
> > >
> > >> Hi Vinod,
> > >>
> > >> Any plan on the first RC for branch-2.8? I think it has been a long time.
> > >>
> > >>
> > >>
> > >>
> > >> --Brahma Reddy Battula
> > >>
> > >> -Original Message-
> > >> From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
> > >> Sent: 20 August 2016 00:56
> > >> To: Jonathan Eagles
> > >> Cc: common-dev@hadoop.apache.org
> > >> Subject: Re: Updated 2.8.0-SNAPSHOT artifact
> > >>
> > >> Jon,
> > >>
> > >> That is around the time when I branched 2.8, so I guess you were getting
> > >> SNAPSHOT artifacts till then from the branch-2 nightly builds.
> > >>
> > >> If you need it, we can set up SNAPSHOT builds. Or just wait for the first
> > >> RC, which is around the corner.
> > >>
> > >> +Vinod
> > >>
> > >> On Jul 28, 2016, at 4:27 PM, Jonathan Eagles 
> wrote:
> > >>>
> > >>> The latest snapshot was uploaded in Nov 2015, but checkins are still
> > >>> coming in quite frequently.
> > >>> https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-yarn-api/
> > >>>
> > >>> Are there any plans to start producing updated SNAPSHOT artifacts for
> > >>> current hadoop development lines?
> > >>>
> > >>
> > >>
> > >> -
> > >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> > >>
> > >>
> > >>
> > >>
> > >
> > > -
> > > To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
> > >
> > >
> >
> >
> >
> >
>


Re: [VOTE] Release Apache Hadoop 2.6.5 (RC0)

2016-09-30 Thread Ming Ma
+1

Successfully compiled a standalone HDFS app using the 2.6.5 jars extracted
from the release tar.gz.

On Thu, Sep 29, 2016 at 10:33 AM, Chris Trezzo  wrote:

> +1
>
> Thanks Sangjin!
>
> 1. Verified md5 checksums and signature on src, and release tar.gz.
> 2. Built from source.
> 3. Started up a pseudo distributed cluster.
> 4. Successfully ran a PI job.
> 5. Ran the balancer.
> 6. Inspected UI for RM, NN, JobHistory.
>
> On Tue, Sep 27, 2016 at 4:11 PM, Lei Xu  wrote:
>
> > +1
> >
> > The steps I've done:
> >
> > * Downloaded release tar and source tar, verified MD5.
> > * Run a HDFS cluster, and copy files between local filesystem and HDFS.
> >
> >
> > On Tue, Sep 27, 2016 at 1:28 PM, Sangjin Lee  wrote:
> > > Hi folks,
> > >
> > > I have created a release candidate RC0 for the Apache Hadoop 2.6.5 release
> > > (the next maintenance release in the 2.6.x release line). Below are the
> > > details of this release candidate:
> > >
> > > The RC is available for validation at:
> > > http://home.apache.org/~sjlee/hadoop-2.6.5-RC0/.
> > >
> > > The RC tag in git is release-2.6.5-RC0 and its git commit is
> > > 6939fc935fba5651fdb33386d88aeb8e875cf27a.
> > >
> > > The maven artifacts are staged via repository.apache.org at:
> > > https://repository.apache.org/content/repositories/orgapachehadoop-1048/.
> > >
> > > You can find my public key at
> > > http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS.
> > >
> > > Please try the release and vote. The vote will run for the usual 5 days.
> > > Huge thanks to Chris Trezzo for spearheading the release management and
> > > doing all the work!
> > >
> > > Thanks,
> > > Sangjin
> >
> >
> >
> > --
> > Lei (Eddy) Xu
> > Software Engineer, Cloudera
> >
> > -
> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >
> >
>


[jira] [Created] (HADOOP-13029) Have FairCallQueue try all lower priority sub queues before backoff

2016-04-15 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-13029:


 Summary: Have FairCallQueue try all lower priority sub queues 
before backoff
 Key: HADOOP-13029
 URL: https://issues.apache.org/jira/browse/HADOOP-13029
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma


Currently if FairCallQueue and backoff are enabled, backoff will kick in as 
soon as the assigned sub queue is filled up.

{noformat}
  /**
   * Put and offer follow the same pattern:
   * 1. Get the assigned priorityLevel from the call by scheduler
   * 2. Get the nth sub-queue matching this priorityLevel
   * 3. delegate the call to this sub-queue.
   *
   * But differ in how they handle overflow:
   * - Put will move on to the next queue until it lands on the last queue
   * - Offer does not attempt other queues on overflow
   */
{noformat}

It seems better to try the lower priority sub queues when the assigned sub queue 
is full, just like the case when backoff is disabled. This would give regular 
users more opportunities and allow the cluster to be configured with a smaller 
call queue length. [~chrili], [~arpitagarwal], what do you think?
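The proposed fall-through can be sketched as follows. This is a hypothetical, simplified model (class and method names are illustrative, not the actual org.apache.hadoop.ipc.FairCallQueue code): offer tries the assigned level first, then falls through to lower-priority sub-queues, and only signals backoff once every queue at or below the assigned level is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of a multi-level call queue, not the real FairCallQueue. */
public class FairQueueSketch {
    private final List<BlockingQueue<String>> queues = new ArrayList<>();

    public FairQueueSketch(int levels, int capacityPerQueue) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ArrayBlockingQueue<>(capacityPerQueue));
        }
    }

    /** Current behavior: offer only to the assigned level; a full sub-queue
     *  immediately means backoff. */
    public boolean offerAssignedOnly(int level, String call) {
        return queues.get(level).offer(call);
    }

    /** Proposed behavior: fall through to lower-priority sub-queues; only
     *  back off once every queue at or below the assigned level is full. */
    public boolean offerWithOverflow(int level, String call) {
        for (int i = level; i < queues.size(); i++) {
            if (queues.get(i).offer(call)) {
                return true;
            }
        }
        return false; // all lower-priority queues full -> back off
    }
}
```

With two levels of capacity 1, a second call at level 0 overflows into level 1 instead of triggering backoff, and backoff only happens once both are full.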



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Local repo sharing for maven builds

2015-09-18 Thread Ming Ma
The increase in frequency might be due to the refactoring of
hadoop-hdfs-client-*.jar out of the main hadoop-hdfs-*.jar. I don't have
overall metrics on how often this happens when anyone changes protobuf, but
based on HDFS-9004, 4 of 5 runs hit this issue, which is a lot for any patch
that changes APIs. This isn't limited to HDFS; there are cases where YARN API
changes cause MR unit tests to fail.

So far, the workaround I use is to keep resubmitting the build until it
succeeds. Another approach we could consider is to provide an option for the
patch submitter to use its own local repo when it submits the patch. That
way, the majority of patches can still use the shared local repo.

On Fri, Sep 18, 2015 at 3:14 PM, Andrew Wang 
wrote:

> Okay, some browsing of Jenkins docs [1] says that we could key the
> maven.repo.local off of $EXECUTOR_NUMBER to do per-executor repos like
> Bernd recommended, but that still requires some hook into test-patch.sh.
>
> Regarding install, I thought all we needed to install was
> hadoop-maven-plugins, but we do more than that now in test-patch.sh. Not
> sure if we can reduce that.
>
> [1]
>
> https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables
>
> On Fri, Sep 18, 2015 at 2:42 PM, Allen Wittenauer 
> wrote:
>
> >
> > The collisions have been happening for about a year now.   The frequency
> > is increasing, but not enough to be particularly worrisome. (So I'm
> > slightly amused that one blowing up is suddenly a major freakout.)
> >
> > Making changes to the configuration without knowing what one is doing is
> > probably a bad idea. For example, if people are removing the shared cache,
> > I hope they're also prepared for the bitching that is going to go with the
> > extremely significant slow down caused by downloading the java prereqs for
> > building for every test...
> >
> > As far as Yetus goes, we've got a JIRA open to provide for per-instance
> > caches when using the docker container code. I've got it in my head how I
> > think we can do it, but just haven't had a chance to code it.  So once that
> > gets written up + turning on containers should make the problem go away
> > without any significant impact on test time.  Of course, that won't help
> > the scheduled builds but those happen at an even smaller rate.
> >
> >
> > On Sep 18, 2015, at 12:19 PM, Andrew Wang 
> > wrote:
> >
> > > Sangjin, you should have access to the precommit jobs if you log in with
> > > your Apache credentials, even as a branch committer.
> > >
> > > https://builds.apache.org/job/PreCommit-HDFS-Build/configure
> > >
> > > The actual maven invocation is managed by test-patch.sh though.
> > > test-patch.sh has a MAVEN_ARGS which looks like what we want, but I don't
> > > think we can just set it before calling test-patch, since it'd get squashed
> > > by setup_defaults.
> > >
> > > Allen/Chris/Yetus folks, any guidance here?
> > >
> > > Thanks,
> > > Andrew
> > >
> > > On Fri, Sep 18, 2015 at 11:55 AM,  wrote:
> > >
> > >> You can use one per build processor, that reduces concurrent updates but
> > >> still keeps the cache function. And then try to avoid using install.
> > >>
> > >> --
> > >> http://bernd.eckenfels.net
> > >>
> > >> -Original Message-
> > >> From: Andrew Wang 
> > >> To: "common-dev@hadoop.apache.org" 
> > >> Cc: Andrew Bayer , Sangjin Lee <
> > sj...@twitter.com>,
> > >> Lei Xu , infrastruct...@apache.org
> > >> Sent: Fr., 18 Sep. 2015 20:42
> > >> Subject: Re: Local repo sharing for maven builds
> > >>
> > >> I think each job should use a maven.repo.local within its workspace like
> > >> abayer said. This means lots of downloading, but it's isolated.
> > >>
> > >> If we care about download time, we could also bootstrap with a tarred
> > >> .m2/repository after we've run a `mvn compile`, so before it installs the
> > >> hadoop artifacts.
> > >>
> > >> On Fri, Sep 18, 2015 at 11:02 AM, Ming Ma  >
> > >> wrote:
> > >>
> > >>> +hadoop common dev. Any suggestions?
> > >>>
> > >>>
> > >>> On Fri, Sep 18, 2015 at 10:41 AM, Andrew Bayer <
> andrew.ba...@gmail.com
> > >

Re: Local repo sharing for maven builds

2015-09-18 Thread Ming Ma
+hadoop common dev. Any suggestions?


On Fri, Sep 18, 2015 at 10:41 AM, Andrew Bayer 
wrote:

> You can change your maven call to use a different repository - I believe
> you do that with -Dmaven.repo.local=path/to/repo
> On Sep 18, 2015 19:39, "Ming Ma"  wrote:
>
>> Hi,
>>
>> We are seeing some strange behaviors in HDFS precommit build. It seems
>> like it is caused by the local repo on the same machine being used by
>> different concurrent jobs which can cause issues.
>>
>> In HDFS, the build and test of "hadoop-hdfs-project/hdfs" depend on
>> "hadoop-hdfs-project/hdfs-client"'s hadoop-hdfs-client-3.0.0-SNAPSHOT.jar.
>> HDFS-9004 adds some new methods to hadoop-hdfs-client-3.0.0-SNAPSHOT.jar.
>> In the precommit build for HDFS-9004, unit tests for "hadoop-hdfs-project/hdfs"
>> complain that the methods aren't defined:
>> https://builds.apache.org/job/PreCommit-HDFS-Build/12522/testReport/.
>> Interestingly sometimes it just works fine
>> https://builds.apache.org/job/PreCommit-HDFS-Build/12507/testReport/.
>>
>> So we suspect that another job running at the same time published a
>> different version of hadoop-hdfs-client-3.0.0-SNAPSHOT.jar, one without the
>> new methods, to the local repo that is shared by all jobs on that machine.
>>
>> If the above analysis is correct, what is the best way to fix the issue
>> so that different jobs can use their own maven local repo for build and
>> test?
>>
>> Thanks.
>>
>> Ming
>>
>


[jira] [Created] (HADOOP-11916) TestStringUtils#testLowerAndUpperStrings failed on MAC due to a JVM bug

2015-05-04 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-11916:


 Summary: TestStringUtils#testLowerAndUpperStrings failed on MAC 
due to a JVM bug
 Key: HADOOP-11916
 URL: https://issues.apache.org/jira/browse/HADOOP-11916
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ming Ma
Priority: Minor


The test fails with the exception below. It turns out there is a JVM bug on 
Mac: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8047340.

{noformat}
testLowerAndUpperStrings(org.apache.hadoop.util.TestStringUtils)  Time elapsed: 
0.205 sec  <<< ERROR!
java.lang.Error: posix_spawn is not a supported process launch mechanism on 
this platform.
at java.lang.UNIXProcess$1.run(UNIXProcess.java:104)
at java.lang.UNIXProcess$1.run(UNIXProcess.java:93)
at java.security.AccessController.doPrivileged(Native Method)
at java.lang.UNIXProcess.(UNIXProcess.java:91)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.util.Shell.isSetsidSupported(Shell.java:391)
at org.apache.hadoop.util.Shell.(Shell.java:381)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:80)
at 
org.apache.hadoop.util.TestStringUtils.testLowerAndUpperStrings(TestStringUtils.java:432)
{noformat}

Perhaps we can disable this test case on Mac.
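Skipping on Mac could be as simple as an OS guard. This is a hedged sketch with hypothetical names: the real fix would likely use Hadoop's own platform constant (e.g. in org.apache.hadoop.util.Shell) together with JUnit's Assume so the case is skipped rather than failed.

```java
/** Hypothetical helper for detecting the affected platform. */
public class MacCheck {
    /** Testable variant: detect Mac OS X from a given os.name value. */
    public static boolean isMac(String osName) {
        return osName != null && osName.toLowerCase().startsWith("mac");
    }

    /** Convenience variant reading the running JVM's os.name property. */
    public static boolean isMac() {
        return isMac(System.getProperty("os.name"));
    }
}
```

In the test itself this would look like `Assume.assumeFalse(MacCheck.isMac());` at the top of testLowerAndUpperStrings.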





[jira] [Created] (HADOOP-11305) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used

2014-11-13 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-11305:


 Summary: RM might not start if the machine was hard shutdown and 
FileSystemRMStateStore was used
 Key: HADOOP-11305
 URL: https://issues.apache.org/jira/browse/HADOOP-11305
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ming Ma


This might be a known issue. Given that FileSystemRMStateStore isn't used for 
HA scenarios, it might not be that important, unless there is something we need 
to fix at the RM layer to make it more tolerant of RMStateStore issues.

When the RM machine was hard shutdown, the OS might not have had a chance to 
persist blocks. Some of the stored application data ended up with size zero 
after reboot, and the RM didn't like that.

{noformat}
ls -al 
/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
total 156
drwxr-xr-x.2 x y   4096 Nov 13 16:45 .
drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
-rw-r--r--.1 x y  0 Nov 13 16:45 appattempt_1412702189634_324351_01
-rw-r--r--.1 x y  0 Nov 13 16:45 
.appattempt_1412702189634_324351_01.crc
-rw-r--r--.1 x y  0 Nov 13 16:45 application_1412702189634_324351
-rw-r--r--.1 x y  0 Nov 13 16:45 .application_1412702189634_324351.crc
{noformat}


When RM starts up

{noformat}

2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem 
opening checksum file: 
file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351.
  Ignoring exception:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)

...

2014-11-13 17:40:48,876 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
load/recover state
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)

{noformat}
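One possible mitigation is to skip zero-length state files during recovery instead of failing with the EOFException/NullPointerException above. The sketch below is illustrative only (hypothetical class and method names; the real change would go into FileSystemRMStateStore's load path):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: filter out state files truncated by a hard shutdown. */
public class StateRecovery {
    /** Return only the state files that have content and are worth parsing. */
    public static List<Path> recoverable(Path appRoot) throws IOException {
        List<Path> good = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(appRoot)) {
            for (Path f : files) {
                if (Files.size(f) == 0) {
                    continue; // truncated by the hard shutdown; skip, don't crash
                }
                good.add(f);
            }
        }
        return good;
    }
}
```

Whether to silently skip or to fail with a clear error naming the corrupt file is a policy question; either beats an NPE deep inside recovery.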





[jira] [Created] (HADOOP-11295) RPC Reader thread can't be shutdowned if RPCCallQueue is full

2014-11-10 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-11295:


 Summary: RPC Reader thread can't be shutdowned if RPCCallQueue is 
full
 Key: HADOOP-11295
 URL: https://issues.apache.org/jira/browse/HADOOP-11295
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ming Ma


If the RPC server is asked to stop when the RPCCallQueue is full, 
{{reader.join()}} will just wait there. That is because:

1. The reader thread is blocked on {{callQueue.put(call);}}.
2. When the RPC server is asked to stop, it interrupts all handler threads, so 
no threads will drain the callQueue.
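The deadlock and its fix can be reproduced with a plain BlockingQueue. This is a hedged, self-contained sketch (not the actual ipc.Server code): put() blocks while the queue is full but responds to interrupt, so stop() must interrupt the reader thread too, not only the handlers that drain the queue.

```java
import java.util.concurrent.BlockingQueue;

/** Hypothetical reader-thread sketch, standing in for the IPC Reader. */
public class ReaderSketch {
    public static Thread startReader(BlockingQueue<String> callQueue) {
        Thread reader = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    callQueue.put("call"); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                // interrupted by stop(): fall through so join() can return
            }
        });
        reader.start();
        return reader;
    }
}
```

Without the interrupt, `reader.join()` would hang forever because no handler is left to drain the queue and unblock put().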





[jira] [Created] (HADOOP-11000) HAServiceProtocol's health state is incorrectly transitioned to SERVICE_NOT_RESPONDING

2014-08-25 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-11000:


 Summary: HAServiceProtocol's health state is incorrectly 
transitioned to SERVICE_NOT_RESPONDING
 Key: HADOOP-11000
 URL: https://issues.apache.org/jira/browse/HADOOP-11000
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ming Ma


When HAServiceProtocol.monitorHealth throws a HealthCheckFailedException, the 
actual exception seen over protocol buffer RPC is a RemoteException that wraps 
the real exception. Thus the state is incorrectly transitioned to 
SERVICE_NOT_RESPONDING.

{noformat}
HealthMonitor.java
doHealthChecks

  try {
status = proxy.getServiceStatus();
proxy.monitorHealth();
healthy = true;
  } catch (HealthCheckFailedException e) {
.
enterState(State.SERVICE_UNHEALTHY);
  } catch (Throwable t) {
.
enterState(State.SERVICE_NOT_RESPONDING);
.
  }

{noformat}
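The likely fix is to unwrap the RPC-layer wrapper before classifying the failure. The sketch below uses stub classes standing in for o.a.h.ipc.RemoteException and HealthCheckFailedException (names and structure are illustrative assumptions, not the actual HealthMonitor code):

```java
/** Hypothetical sketch of unwrap-then-classify for health-check failures. */
public class HealthCheckSketch {
    static class RemoteExceptionStub extends RuntimeException {
        RemoteExceptionStub(Throwable cause) { super(cause); }
    }
    static class HealthCheckFailedStub extends Exception {}

    enum State { SERVICE_UNHEALTHY, SERVICE_NOT_RESPONDING }

    static State classify(Throwable t) {
        // Unwrap the remote wrapper so the real cause drives the transition.
        Throwable real = (t instanceof RemoteExceptionStub && t.getCause() != null)
            ? t.getCause() : t;
        return (real instanceof HealthCheckFailedStub)
            ? State.SERVICE_UNHEALTHY
            : State.SERVICE_NOT_RESPONDING;
    }
}
```

With the unwrap, a wrapped HealthCheckFailedException lands in SERVICE_UNHEALTHY as intended; anything else still maps to SERVICE_NOT_RESPONDING.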





[jira] [Created] (HADOOP-10673) Update rpc metrics when the call throw an exception

2014-06-09 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-10673:


 Summary: Update rpc metrics when the call throw an exception
 Key: HADOOP-10673
 URL: https://issues.apache.org/jira/browse/HADOOP-10673
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma


Currently, RPC metrics aren't updated when the call throws an exception. We 
can either update the existing metrics or add a new set of metrics for the 
exception case.
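The "new set of metrics" option amounts to counting the failure path too. A minimal sketch (hypothetical counters; the real code would use Hadoop's metrics2 MutableCounterLong rather than AtomicLong):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of counting RPC outcomes on both paths. */
public class RpcMetricsSketch {
    public final AtomicLong rpcProcessed = new AtomicLong();
    public final AtomicLong rpcFailed = new AtomicLong();

    public Object call(Callable<Object> handler) throws Exception {
        try {
            Object result = handler.call();
            rpcProcessed.incrementAndGet();
            return result;
        } catch (Exception e) {
            rpcFailed.incrementAndGet(); // update metrics on the failure path too
            throw e;
        }
    }
}
```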





[jira] [Created] (HADOOP-10598) Support configurable RPC fair share

2014-05-13 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-10598:


 Summary: Support configurable RPC fair share
 Key: HADOOP-10598
 URL: https://issues.apache.org/jira/browse/HADOOP-10598
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma


It would be useful to support an RPC minimum fair share on a per-user or 
per-group basis. That would help SLA jobs in a shared cluster, and it would be 
complementary to the history-based soft policy defined in the fair call 
queue's RPC server.





[jira] [Created] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2014-05-12 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-10597:


 Summary: Evaluate if we can have RPC client back off when server 
is under heavy load
 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma


Currently, if an application hits the NN too hard, RPC requests will sit in a 
blocking state, assuming the OS doesn't run out of connections. Alternatively, 
the RPC layer or the NN could throw some well-defined exception back to the 
client, based on certain policies, when it is under heavy load; the client 
would understand such an exception and do exponential back off, as another 
implementation of RetryInvocationHandler.
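The client-side policy can be sketched as a simple delay function (hypothetical names; the real change would plug into a RetryInvocationHandler-style retry policy): wait base * 2^retry, capped, before retrying after a server-busy exception.

```java
/** Hypothetical sketch of capped exponential back-off. */
public class BackoffSketch {
    public static long backoffMillis(int retry, long baseMillis, long capMillis) {
        // Cap the shift to avoid long overflow for very large retry counts.
        long delay = baseMillis << Math.min(retry, 20);
        return Math.min(delay, capMillis);
    }
}
```

In practice one would also add jitter so that backed-off clients don't all retry in lockstep.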





[jira] [Created] (HADOOP-10599) Support prioritization of DN RPCs over client RPCs

2014-05-12 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-10599:


 Summary: Support prioritization of DN RPCs over client RPCs
 Key: HADOOP-10599
 URL: https://issues.apache.org/jira/browse/HADOOP-10599
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma


We might need to prioritize DN RPCs over client RPCs so that, no matter what 
applications do to the NN RPC servers and FSNamesystem's global lock, DN 
requests will be processed in a timely manner. After a cluster is configured 
to have the service RPC server separated from the client RPC server, this is 
mitigated to some degree by a fair FSNamesystem global lock. Also, if the NN 
global lock can be made more fine-grained, such a need becomes less important. 
Still, it will be good to evaluate whether this is a good option.





[jira] [Created] (HADOOP-10157) move doRead method from IPC Listener class to IPC Reader class

2013-12-10 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-10157:


 Summary: move doRead method from IPC Listener class to IPC Reader 
class
 Key: HADOOP-10157
 URL: https://issues.apache.org/jira/browse/HADOOP-10157
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Reporter: Ming Ma
Priority: Minor


Currently the doRead method belongs to the Listener class. Semantically it is 
better to move doRead from the Listener class to the Reader class.





[jira] [Created] (HADOOP-10125) no need to process RPC request if the client connection has been dropped

2013-11-22 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-10125:


 Summary: no need to process RPC request if the client connection 
has been dropped
 Key: HADOOP-10125
 URL: https://issues.apache.org/jira/browse/HADOOP-10125
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ming Ma


If the client has dropped the connection before the RPC is processed, the RPC 
server doesn't need to process the call. We have encountered issues where bad 
applications can bring down the NN; 
https://issues.apache.org/jira/browse/HADOOP-9640 tries to address that. When 
this occurs, the NN's RPC queues are filled up with client requests and DN 
requests, and sometimes we want to stop the flooding by stopping the bad 
applications and/or DNs. Some RPC processing, like 
DatanodeProtocol::blockReport, can take a couple hundred milliseconds, so it 
is worthwhile to have the NN skip RPC calls whose DNs have been stopped.
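The guard itself is cheap: before a handler dequeues and dispatches a call, check whether the client's channel is still usable. A hedged sketch (hypothetical helper; the real check would live in the server's handler loop against the connection object):

```java
import java.nio.channels.SocketChannel;

/** Hypothetical pre-dispatch check: don't burn handler time on dead clients. */
public class SkipDroppedSketch {
    public static boolean shouldProcess(SocketChannel client) {
        return client != null && client.isOpen() && client.isConnected();
    }
}
```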





[jira] [Created] (HADOOP-10106) Incorrect thread name RPC log message

2013-11-15 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-10106:


 Summary: Incorrect thread name RPC log message
 Key: HADOOP-10106
 URL: https://issues.apache.org/jira/browse/HADOOP-10106
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ming Ma
Priority: Minor


INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: readAndProcess 
from client 10.115.201.46 threw exception 
org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647

This is thrown by a reader thread, so the message should be like

INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: 
readAndProcess from client 10.115.201.46 threw exception 
org.apache.hadoop.ipc.RpcServerException: Unknown out of band call #-2147483647

Another example is Responder.processResponse, which can also be called by a 
handler thread. When that happens, the thread name should be the handler 
thread, not the responder thread.





[jira] [Created] (HADOOP-8706) Provide rate metrics based on counter value

2012-08-15 Thread Ming Ma (JIRA)
Ming Ma created HADOOP-8706:
---

 Summary: Provide rate metrics based on counter value
 Key: HADOOP-8706
 URL: https://issues.apache.org/jira/browse/HADOOP-8706
 Project: Hadoop Common
  Issue Type: Improvement
  Components: metrics
Reporter: Ming Ma


In production clusters, it is more useful to have ops/sec than an 
ever-increasing counter value. Take NameNodeMetrics.getBlockLocations as an 
example: its current type is MutableCounterLong, so the value increases all 
the time. Quite often "number of getBlockLocations per second" is more 
interesting for analysis. Further, I found that most of the MutableCounterLong 
metrics in NameNodeMetrics and DataNodeMetrics would be more useful if they 
were expressed in terms of ops/sec.

I looked at all the metrics objects provided in metrics 2.0 and couldn't find 
such a type.

FYI, HBase has its own MetricsRate object, based on metrics 1.0, for this 
purpose.
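A rate-from-counter gauge boils down to differencing counter snapshots over time. This is an illustrative sketch only (a real implementation would be a metrics2 mutable metric, and would handle counter resets):

```java
/** Hypothetical sketch: derive ops/sec from successive counter snapshots. */
public class RateSketch {
    private long lastCount;
    private long lastTimeMillis;

    public RateSketch(long startCount, long startTimeMillis) {
        this.lastCount = startCount;
        this.lastTimeMillis = startTimeMillis;
    }

    /** Ops/sec between the previous snapshot and this one. */
    public double sample(long count, long timeMillis) {
        long deltaOps = count - lastCount;
        long deltaMs = timeMillis - lastTimeMillis;
        lastCount = count;
        lastTimeMillis = timeMillis;
        return deltaMs <= 0 ? 0.0 : deltaOps * 1000.0 / deltaMs;
    }
}
```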
   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira