[jira] [Created] (HADOOP-17758) NPE and excessive warnings after HADOOP-17728

2021-06-10 Thread Jim Brennan (Jira)
Jim Brennan created HADOOP-17758:


 Summary: NPE and excessive warnings after HADOOP-17728
 Key: HADOOP-17758
 URL: https://issues.apache.org/jira/browse/HADOOP-17758
 Project: Hadoop Common
  Issue Type: Bug
  Components: common
Affects Versions: 3.4.0
Reporter: Jim Brennan


I'm noticing these warnings and NPEs when just running a simple pi test on a 
one-node cluster:
{noformat}
2021-06-09 21:51:12,334 WARN  
[org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner] 
fs.FileSystem (FileSystem.java:run(4025)) - Exception in the cleaner thread but 
it will continue to run
java.lang.NullPointerException
at 
org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:4020)
at java.lang.Thread.run(Thread.java:748){noformat}
This appears to be due to [HADOOP-17728].
I'm not sure I understand why that change was made.  Wasn't it by design that 
the remove should wait until something is queued?
[~kaifeiYi] can you please investigate?
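For context, here is a minimal sketch (my own illustration, not the actual FileSystem code) of why the blocking behavior matters: {{ReferenceQueue.remove()}} with no timeout blocks until a reference is enqueued and never returns null, while {{remove(timeout)}} can time out and return null, which then NPEs if the result is used unguarded.
{code:java}
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class CleanerSketch implements Runnable {
  private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

  @Override
  public void run() {
    while (true) {
      try {
        // Blocking form: waits until something is queued; never null.
        Reference<?> ref = queue.remove();
        ref.clear();

        // Timed form: can return null on timeout, so it must be guarded,
        // otherwise ref.clear() throws the NullPointerException seen above.
        // Reference<?> maybe = queue.remove(1000);
        // if (maybe != null) { maybe.clear(); }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }
}
{code}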




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [E] [VOTE] Hadoop 3.1.x EOL

2021-06-03 Thread Jim Brennan
+1


On Thu, Jun 3, 2021 at 1:14 AM Akira Ajisaka  wrote:

> Dear Hadoop developers,
>
> Given the feedback from the discussion thread [1], I'd like to start
> an official vote
> thread for the community to vote and start the 3.1 EOL process.
>
> What this entails:
>
> (1) an official announcement that no further regular Hadoop 3.1.x releases
> will be made after 3.1.4.
> (2) resolve JIRAs that specifically target 3.1.5 as won't fix.
>
> This vote will run for 7 days and conclude by June 10th, 16:00 JST [2].
>
> Committers are eligible to cast binding votes. Non-committers are welcomed
> to cast non-binding votes.
>
> Here is my vote, +1
>
> [1]
> https://s.apache.org/w9ilb
> [2]
> https://www.timeanddate.com/worldclock/fixedtime.html?msg=4&iso=20210610T16&p1=248
>
> Regards,
> Akira
>
> -
> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>


Re: [E] Re: Java 8 Lambdas

2021-04-29 Thread Jim Brennan
I just think that we should be cognizant of changes (particularly bug
fixes) that will need to be ported to branch-2.10.  Since it is still on
Java 7, any lambda used in code on trunk has to be rewritten for
branch-2.10.  While not difficult, this is extra work, and it increases the
differences between branches, which can also cause more conflicts when
porting bug fixes back.
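To make the porting cost concrete, here is an illustrative example (my own
sketch, with hypothetical names) of the mechanical rewrite such a backport
requires: each trunk lambda becomes an anonymous inner class.

import java.util.Comparator;

public class LambdaBackportExample {
  // trunk (Java 8): concise lambda
  static final Comparator<String> BY_LENGTH_JAVA8 =
      (a, b) -> Integer.compare(a.length(), b.length());

  // branch-2.10 (Java 7): the same logic as an anonymous inner class
  static final Comparator<String> BY_LENGTH_JAVA7 = new Comparator<String>() {
    @Override
    public int compare(String a, String b) {
      return Integer.compare(a.length(), b.length());
    }
  };
}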


On Wed, Apr 28, 2021 at 9:28 PM Ahmed Hussein  wrote:

> Thanks Eric for raising this issue!
>
> The debate about lambda is very complicated and won't be resolved any time
> soon.
>
>  I don't personally know a lot about the
> > performance of lambdas and welcome arguments on behalf of why lambdas
>
> No one probably knows :)
> - Lambda performance depends on the JVM implementation, which changes
> between releases.
> - Some Java 8+ APIs force lambdas; for example,
> ConcurrentHashMap.computeIfAbsent().
>
> I believe that we can transform this discussion into specific action items
> for future commits. For instance, a couple of those specifications would be:
> - No refactoring just for the sake of using lambdas, unless there is a
> strong technical justification.
> - Usage of lambdas in unit tests should be fine. If a lambda makes the test
>   more readable and allows passing method references, then it should make
>   it into the unit tests.
> - We put sample code in the "how-to-contribute" guide to elaborate
>   "capturing vs non-capturing" lambda expressions and the implications of
>   each type on performance (an illustrative sketch follows below).
> - Without getting into much detail, IMHO, streams should be committed into
>   the code only in exceptional cases. The possibility of executing code in
>   parallel makes debugging a nightmare; i.e., usage of forEach needs to be
>   justified: what does it bring to the table?
>
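A minimal illustration of the "capturing vs non-capturing" distinction
mentioned above (my own sketch; names are hypothetical):

import java.util.function.IntUnaryOperator;

public class CapturingVsNonCapturing {
  public static void main(String[] args) {
    // Non-capturing: uses only its parameter; the JVM can reuse a single
    // cached instance for every evaluation.
    IntUnaryOperator doubler = x -> x * 2;

    int factor = 3;
    // Capturing: closes over the local variable 'factor', so each
    // evaluation of the lambda expression may allocate a new object.
    IntUnaryOperator scaler = x -> x * factor;

    System.out.println(doubler.applyAsInt(21)); // 42
    System.out.println(scaler.applyAsInt(21));  // 63
  }
}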
> On Tue, Apr 27, 2021 at 3:07 PM Eric Badger
>  wrote:
>
> > Hello all,
> >
> > I'd like to gauge the community on the usage of lambdas within Hadoop
> code.
> > I've been reviewing a lot of patches recently that either add or modify
> > lambdas and I'm beginning to think that sometimes we, as a community, are
> > writing lambdas because we can rather than because we should. To me, it
> > seems that lambdas often decrease the readability of the code, making it
> > more difficult to understand. I don't personally know a lot about the
> > performance of lambdas and welcome arguments on behalf of why lambdas
> > should be used. An additional argument is that lambdas aren't available
> in
> > Java 7, and branch-2.10 currently supports Java 7. So any code going back
> > to branch-2.10 has to be redone upon backporting. Anyway, my main point
> > here is to encourage us to rethink whether we should be using lambdas in
> > any given circumstance just because we can.
> >
> > Eric
> >
> > p.s. I'm also happy to accept this as my personal "old man yells at
> cloud"
> > issue if everyone else thinks lambdas are the greatest
> >
>
>
> --
> Best Regards,
>
> *Ahmed Hussein, PhD*
>


Re: [E] Re: Problems with precommit builds

2021-03-04 Thread Jim Brennan
Thanks Ayush.  I was able to work around the problem for YARN-10664 by
re-submitting patch 002 as patch 003.  I will bookmark that page and try
re-running via Jenkins if this happens in the future.


On Thu, Mar 4, 2021 at 2:02 PM Ayush Saxena  wrote:

> Hey Jim,
> Seems that got triggered :
>
>
> https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/709/console
>
> It has a mention of YARN-10664.002.patch
>
> It failed with :
>
> time="2021-03-03T00:00:31Z" level=error msg="error waiting for container:
> unexpected EOF"
>
> I guess it was some infra issue; I don’t think the Yetus upgrade or any
> change on our side would cause this.
>
> Can’t say much about PRs.  The Yetus upgrade does have some effect: it
> doesn’t put up a comment now, so you need to look in the GitHub actions or
> similar. Not sure if that was the case with Ahmed’s PR.
>
> Note: to retrigger the build, instead of canceling and resubmitting, simply
> log in using your Apache creds:
>
>
> https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/build?delay=0sec
>
> Click on Build with Parameters, and then enter just the Jira ID. That works in
> most cases :-)
>
> -Ayush
>
>
> > On 05-Mar-2021, at 12:35 AM, Jim Brennan
>  wrote:
> >
> > Pre-commit builds in trunk seem to be very unreliable lately.  I know
> > Ahmed Hussein ran into an issue last week with one of his PRs. When he
> > rebased and updated it, the precommit builds failed to run.  We ended up
> > having to close and submit a new PR.
> >
> > And just this week I have been having similar problems with patches for
> > https://issues.apache.org/jira/browse/YARN-10664.  The precommit build ran
> > fine on the first patch.  But when I submitted a second patch it never
> > appeared to run (or possibly it ran but failed to provide any output to
> the
> > Jira).  I tried canceling and resubmitting and that did not work either.
> >
> > Have there been any changes in this area recently?  I seem to recall
> seeing
> > an update to Yetus - could that be the problem?
> >
> > Jim
>


Re: Problems with precommit builds

2021-03-04 Thread Jim Brennan
The PR where we were having problems was:
https://github.com/apache/hadoop/pull/2722


On Thu, Mar 4, 2021 at 1:04 PM Jim Brennan 
wrote:

> Pre-commit builds in trunk seem to be very unreliable lately.  I know
> Ahmed Hussein ran into an issue last week with one of his PRs. When he
> rebased and updated it, the precommit builds failed to run.  We ended up
> having to close and submit a new PR.
>
> And just this week I have been having similar problems with patches for
> https://issues.apache.org/jira/browse/YARN-10664.  The precommit build
> ran fine on the first patch.  But when I submitted a second patch it never
> appeared to run (or possibly it ran but failed to provide any output to the
> Jira).  I tried canceling and resubmitting and that did not work either.
>
> Have there been any changes in this area recently?  I seem to recall
> seeing an update to Yetus - could that be the problem?
>
> Jim
>


Problems with precommit builds

2021-03-04 Thread Jim Brennan
Pre-commit builds in trunk seem to be very unreliable lately.  I know Ahmed
Hussein ran into an issue last week with one of his PRs. When he rebased
and updated it, the precommit builds failed to run.  We ended up having to
close and submit a new PR.

And just this week I have been having similar problems with patches for
https://issues.apache.org/jira/browse/YARN-10664.  The precommit build ran
fine on the first patch.  But when I submitted a second patch it never
appeared to run (or possibly it ran but failed to provide any output to the
Jira).  I tried canceling and resubmitting and that did not work either.

Have there been any changes in this area recently?  I seem to recall seeing
an update to Yetus - could that be the problem?

Jim


[jira] [Created] (HADOOP-17486) Provide fallbacks for callqueue ipc namespace properties

2021-01-21 Thread Jim Brennan (Jira)
Jim Brennan created HADOOP-17486:


 Summary: Provide fallbacks for callqueue ipc namespace properties
 Key: HADOOP-17486
 URL: https://issues.apache.org/jira/browse/HADOOP-17486
 Project: Hadoop Common
  Issue Type: Improvement
  Components: common
Affects Versions: 3.1.4
Reporter: Jim Brennan


Filing this proposal on behalf of [~daryn], based on comments he made in one of 
our internal Jiras.

The following settings are currently specified per port:
{noformat}
  /**
   * CallQueue related settings. These are not used directly, but rather
   * combined with a namespace and port. For instance:
   * IPC_NAMESPACE + ".8020." + IPC_CALLQUEUE_IMPL_KEY
   */
  public static final String IPC_NAMESPACE = "ipc";
  public static final String IPC_CALLQUEUE_IMPL_KEY = "callqueue.impl";
  public static final String IPC_SCHEDULER_IMPL_KEY = "scheduler.impl";
  public static final String IPC_IDENTITY_PROVIDER_KEY = 
"identity-provider.impl";
  public static final String IPC_COST_PROVIDER_KEY = "cost-provider.impl";
  public static final String IPC_BACKOFF_ENABLE = "backoff.enable";
  public static final boolean IPC_BACKOFF_ENABLE_DEFAULT = false;
 {noformat}
If one of these properties is not specified for the port, the defaults are 
hard-coded.
It would be nice to provide a way to specify a fallback default property that 
would be used for all ports.  If the property for a specific port is not 
defined, the fallback would be used, and if the fallback is not defined it 
would use the hard-coded defaults.

We would likely need to make the same change for properties defined by related 
classes as well; for example, the properties used in WeightedTimeCostProvider.

The fallback properties could be specified by dropping the port from the 
property name.  For example, the fallback for {{ipc.8020.cost-provider.impl}} 
would be {{ipc.cost-provider.impl}}.
Another option would be to use something more explicit like 
{{ipc.default.cost-provider.impl}}.
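A minimal sketch of the proposed lookup order (not a committed implementation; names are illustrative):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class CallQueueFallbackSketch {
  static final String IPC_NAMESPACE = "ipc";

  // e.g. resolve(conf, 8020, "cost-provider.impl", HARD_CODED_DEFAULT)
  static String resolve(Configuration conf, int port, String key,
      String hardCodedDefault) {
    String perPort = IPC_NAMESPACE + "." + port + "." + key; // ipc.8020.cost-provider.impl
    String fallback = IPC_NAMESPACE + "." + key;             // ipc.cost-provider.impl
    String value = conf.get(perPort);
    // The per-port property wins; then the fallback; then the hard-coded default.
    return value != null ? value : conf.get(fallback, hardCodedDefault);
  }
}
{code}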




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-17408) Optimize NetworkTopology while sorting of block locations

2021-01-08 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan resolved HADOOP-17408.
--
Fix Version/s: 3.4.0
   3.3.1
   Resolution: Fixed

Thanks for the contribution [~ahussein] and [~daryn]!  I have committed this to 
trunk and branch-3.3.

The patch does not apply cleanly to branch-3.2 or earlier.  Please provide a 
patch for 3.2 if desired.



> Optimize NetworkTopology while sorting of block locations
> -
>
> Key: HADOOP-17408
> URL: https://issues.apache.org/jira/browse/HADOOP-17408
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common, net
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In {{NetworkTopology}}, I noticed that there is some low-hanging fruit for 
> improving performance.
> Inside {{sortByDistance}}, {{Collections.shuffle}} is performed on the list 
> before calling {{secondarySort}}.
> {code:java}
> Collections.shuffle(list, r);
> if (secondarySort != null) {
>   secondarySort.accept(list);
> }
> {code}
> However, at several call sites, {{Collections.shuffle}} is also passed as the 
> secondarySort to {{sortByDistance}}. This means that the shuffle is executed 
> twice on each list.
> Also, logic-wise, it is useless to shuffle before applying a tie breaker, 
> which can render the shuffle work obsolete.
> In addition, [~daryn] reported that:
> * the topology is unnecessarily locking/unlocking to calculate the distance 
> for every node
> * shuffling uses a seeded Random, which is heavily synchronized, instead of 
> ThreadLocalRandom
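To illustrate the double-shuffle call pattern described above, here is a simplified sketch (simplified signatures, not the actual NetworkTopology API):
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

public class DoubleShuffleDemo {
  static <T> void sortByDistance(List<T> list, Random r,
      Consumer<List<T>> secondarySort) {
    Collections.shuffle(list, r);   // first shuffle, always performed
    if (secondarySort != null) {
      secondarySort.accept(list);   // may shuffle again (see below)
    }
  }

  public static void main(String[] args) {
    List<Integer> nodes = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
    Random r = new Random();
    // The reported anti-pattern: the call site passes shuffle as the
    // secondary sort, so the list is shuffled twice.
    sortByDistance(nodes, r, l -> Collections.shuffle(l, r));
  }
}
{code}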



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [E] Re: [VOTE] Release Apache Hadoop 3.2.2 - RC4

2020-12-21 Thread Jim Brennan
I put up a patch for https://issues.apache.org/jira/browse/YARN-10540.
Thanks for bringing it to my attention.
Jim

On Mon, Dec 21, 2020 at 10:36 AM Sunil Govindan  wrote:

> I had some offline talks with a few folks.
> This issue is happening only in Mac, hence ideally it does not cause much
> of a problem in the supported OS.
>
> I will wait for feedback here to see whether we need another RC by fixing
> this. And will continue the discussion in the jira.
>
> Thanks
> Sunil
>
> On Sat, Dec 19, 2020 at 11:07 PM Sunil Govindan  wrote:
>
> > Thanks, Xiaoqiao.
> > All files are looking good.
> >
> > However, while I did the tests to verify the RC, I ran into a serious NPE
> > in YARN.
> > I raised YARN-10540 <https://issues.apache.org/jira/browse/YARN-10540> to
> > analyze this further. I think this issue is due to YARN-10450
> > <https://issues.apache.org/jira/browse/YARN-10450>.
> > In trunk, I am not able to see this issue, so it could be possible
> > that some patches were not backported to branch-3.2.2.
> >
> > The UI1 & UI2 nodes pages are not working at the moment. I will check a bit
> > more and update here.
> >
> > Thanks
> > Sunil
> >
> > On Sat, Dec 19, 2020 at 5:36 PM Xiaoqiao He 
> wrote:
> >
> >> Thanks Sunil, md5 files have been removed from RC4. Please have a look.
> >> Thanks & Regards.
> >>
> >> - He Xiaoqiao
> >>
> >> On Sat, Dec 19, 2020 at 7:22 PM Sunil Govindan 
> wrote:
> >>
> >>> Hi Xiaoqiao,
> >>>
> >>> Please remove the md5 files from your shared RC4 repo. Thanks, @Akira
> >>> Ajisaka  for sharing this input.
> >>>
> >>> Thanks
> >>> Sunil
> >>>
> >>> On Sat, Dec 19, 2020 at 10:21 AM Sunil Govindan 
> >>> wrote:
> >>>
>  Reference:
>  https://www.apache.org/dev/release-distribution.html#sigs-and-sums
>  Also, we had a Jira to track this: HADOOP-15930
>  <https://issues.apache.org/jira/browse/HADOOP-15930>.
> 
>  Thanks
>  Sunil
> 
>  On Sat, Dec 19, 2020 at 10:16 AM Sunil Govindan 
>  wrote:
> 
> > Hi Xiaoqiao and Wei-chiu
> >
> > I am a bit confused after seeing both *.sha512 and *.md5 files in the
> > RC directory.
> > Are we releasing both now?
> >
> > Thanks
> > Sunil
> >
> > On Wed, Dec 9, 2020 at 10:32 PM Xiaoqiao He 
> > wrote:
> >
> >> Hi folks,
> >>
> >> The release candidate (RC4) for Hadoop-3.2.2 is available now.
> >> There are 10 commits [1] of difference between RC4 and RC3 [2].
> >>
> >> The RC4 is available at:
> >>
> http://people.apache.org/~hexiaoqiao/hadoop-3.2.2-RC4
> >> The RC4 tag in github is here:
> >>
> https://github.com/apache/hadoop/tree/release-3.2.2-RC4
> >> The maven artifacts are staged at:
> >>
> >>
> https://repository.apache.org/content/repositories/orgapachehadoop-1296
> >>
> >> You can find my public key at:
> >>
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> or
> >>
> https://people.ap

[jira] [Resolved] (HADOOP-17417) Reduce UGI overhead in token ops

2020-12-11 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan resolved HADOOP-17417.
--
Resolution: Later

This needs to be done as part of a larger feature that we are not ready to put 
up yet.

 

> Reduce UGI overhead in token ops 
> -
>
> Key: HADOOP-17417
> URL: https://issues.apache.org/jira/browse/HADOOP-17417
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common, kms, performance, rpc-server, security
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>
> {{DelegationTokenIdentifier}} has a {{ugiCache}}, but 
> AbstractDelegationTokenManager calls a static method {{getRemoteUser()}} 
> which bypasses the cache.
> Performance analysis of the KMS revealed the RPC server layer is creating 
> many redundant UGI instances. UGIs are not cheap to instantiate, require 
> synchronization, and waste memory. Reducing instantiations will improve the 
> performance of the IPC readers.
>  
>  
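As a rough illustration of the intended direction (hypothetical shape, not the withdrawn patch), a per-owner cache avoids re-instantiating UGIs:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiCacheSketch {
  private final ConcurrentHashMap<String, UserGroupInformation> ugiCache =
      new ConcurrentHashMap<>();

  UserGroupInformation getUgi(String owner) {
    // Create the UGI at most once per owner; later lookups avoid the
    // costly, synchronized instantiation on the reader threads.
    return ugiCache.computeIfAbsent(owner,
        UserGroupInformation::createRemoteUser);
  }
}
{code}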



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-17416) Reduce synchronization in the token secret manager

2020-12-11 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan resolved HADOOP-17416.
--
Resolution: Later

This change is part of a larger feature that we are not ready to put up yet.

 

> Reduce synchronization in the token secret manager
> --
>
> Key: HADOOP-17416
> URL: https://issues.apache.org/jira/browse/HADOOP-17416
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common, performance, security
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>
> [~daryn] reported that reducing synchronization in the ZK secret manager is 
> complicated by excessive and unnecessary global synchronization in the 
> AbstractDelegationTokenSecretManager.  All RPC services, not just the KMS, 
> will benefit from the reduced synchronization.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-17392) Remote exception messages should not include the exception class

2020-12-03 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan resolved HADOOP-17392.
--
Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed

Thanks [~daryn] and [~ahussein]!  I have committed this to trunk, branch-3.3, 
branch-3.2, and branch-3.1.

 

> Remote exception messages should not include the exception class
> 
>
> Key: HADOOP-17392
> URL: https://issues.apache.org/jira/browse/HADOOP-17392
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HADOOP-9844 added a change that caused some remote SASL exceptions to 
> redundantly include the exception class, causing the client to see "{{Class: 
> Class: message}}" from an unwrapped RemoteException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [E] Yetus is failing with Java unable to create threads

2020-12-02 Thread Jim Brennan
This is still happening.
Latest build:
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/343/#showFailuresLink

Looks like we are running out of threads in the containers where the unit
tests run.  Anyone know where this is set up?

On Wed, Oct 21, 2020 at 5:51 PM Ahmed Hussein  wrote:

> Hey folks,
>
> Yetus has been failing miserably over the last couple of days.
> In the latest qbt report
> <
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/301/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> >,
> hundreds of JUnit tests fail after Java fails to acquire resources
> to create new threads.
>
> [ERROR]
> >
> testRecoverAllDataBlocks1(org.apache.hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy)
> >  Time elapsed: 8.509 s  <<< ERROR!
> > java.lang.OutOfMemoryError: unable to create new native thread
>
>
> Any thoughts on what could trigger that in the last few days? Do we need
> more resources for the image?
>
> --
> Best Regards,
>
> *Ahmed Hussein, PhD*
>


[jira] [Resolved] (HADOOP-17367) Add InetAddress api to ProxyUsers.authorize

2020-11-19 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan resolved HADOOP-17367.
--
Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed

Thanks [~ahussein] and [~daryn]!
I've committed this to trunk, branch-3.3, branch-3.2, and branch-3.1.

> Add InetAddress api to ProxyUsers.authorize
> ---
>
> Key: HADOOP-17367
> URL: https://issues.apache.org/jira/browse/HADOOP-17367
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: performance, security
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Improve the ProxyUsers implementation by passing the address of the remote 
> peer to avoid resolving the hostname.
> Similarly, this requires adding an InetAddress API to MachineList.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-17362) Doing hadoop ls on Har file triggers too many RPC calls

2020-11-16 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan resolved HADOOP-17362.
--
Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed

> Doing hadoop ls on Har file triggers too many RPC calls
> ---
>
> Key: HADOOP-17362
> URL: https://issues.apache.org/jira/browse/HADOOP-17362
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [~daryn] has noticed that invoking hadoop ls on a HAR file takes too much 
> time.
> The har system has multiple deficiencies that significantly impact 
> performance:
> # Parsing the master index references ranges within the archive index. Each 
> range required re-opening the hdfs input stream and seeking to the same 
> location where it previously stopped.
> # Listing a har stats the archive index for every "directory". The per-call 
> cache used a unique key for each stat, rendering the cache useless and 
> significantly increasing memory pressure.
> # Determining the children of a directory scans the entire archive contents 
> and filters out children. The cached metadata already stores the exact child 
> list.
> # Globbing a har's contents resulted in unnecessary stats for every leaf path.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Reopened] (HADOOP-17306) RawLocalFileSystem's lastModifiedTime() loses milliseconds in JDK < 10.b09

2020-11-05 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reopened HADOOP-17306:
--

I have reverted this commit from trunk, branch-3.3, branch-3.2, and 
branch-3.2.2.
[~vinayakumarb] please address all of the unit test failures when you resubmit.
I also think we need to review references to modified times in the source base 
to be sure we are not breaking things with this change.  Yarn Resource 
Localization is one area, but there may be others.  Timestamps are sometimes 
stored in state-stores, so there may be compatibility issues with this change 
as well.


> RawLocalFileSystem's lastModifiedTime() loses milliseconds in JDK < 10.b09
> 
>
> Key: HADOOP-17306
> URL: https://issues.apache.org/jira/browse/HADOOP-17306
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> RawLocalFileSystem's FileStatus uses the {{File.lastModified()}} API from the JDK.
> This API loses milliseconds due to a JDK bug.
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8177809]
> This bug is fixed in JDK 10 b09 onwards but still exists in JDK 8, which is 
> still used in many production environments.
> Apparently, {{Files.getLastModifiedTime()}} from Java's nio package returns 
> the correct time.
> Use {{Files.getLastModifiedTime()}} instead of {{File.lastModified}} as a 
> workaround. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17342) Creating a token identifier should not do kerberos name resolution

2020-11-03 Thread Jim Brennan (Jira)
Jim Brennan created HADOOP-17342:


 Summary: Creating a token identifier should not do kerberos name 
resolution
 Key: HADOOP-17342
 URL: https://issues.apache.org/jira/browse/HADOOP-17342
 Project: Hadoop Common
  Issue Type: Improvement
  Components: common
Affects Versions: 2.10.1, 3.4.0
Reporter: Jim Brennan


This problem was found and fixed internally for us by [~daryn].

Creating a token identifier tries to do auth_to_local short username 
translation. The authentication process creates a blank token identifier for 
deserializing the wire format. Attempting to resolve an empty username is 
useless work.

Discovered the issue during fair call queue backoff testing. The readers are 
unnecessarily slowed down by this bug.
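As a rough sketch of the wasted work (hypothetical shape, not the actual patch): the blank identifier built for wire deserialization carries an empty principal, so any auth_to_local rule evaluation on it is pure overhead.
{code:java}
public class TokenIdentifierSketch {
  private String shortUser;

  // Called with "" while constructing a blank identifier for deserialization.
  void setOwner(String principal) {
    if (principal == null || principal.isEmpty()) {
      shortUser = principal;            // fix: skip resolution for empty names
      return;
    }
    shortUser = toShortName(principal); // expensive rule evaluation only when needed
  }

  private String toShortName(String principal) {
    // stand-in for KerberosName-style auth_to_local translation
    int idx = principal.indexOf('/');
    return idx < 0 ? principal : principal.substring(0, idx);
  }
}
{code}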





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [E] [VOTE] Release Apache Hadoop 2.10.1 (RC0)

2020-09-16 Thread Jim Brennan
Thanks for your work on this Masatake!  I am +1 (non-binding) on this
2.10.1 release.
I built from source and ran hdfs, resourcemanager, and nodemanager unit
tests.
I set up a one-node cluster and ran some example jobs (pi, sleep).
I tested NM/RM recovery by killing NM/RM during jobs and verifying they
completed successfully after restarting them.

On Mon, Sep 14, 2020 at 12:59 PM Masatake Iwasaki <
iwasak...@oss.nttdata.co.jp> wrote:

> Hi folks,
>
> This is the first release candidate for the second release of Apache
> Hadoop 2.10.
> It contains 218 fixes/improvements since 2.10.0 [1].
>
> The RC0 artifacts are at:
>
> http://home.apache.org/~iwasakims/hadoop-2.10.1-RC0/
>
> RC tag is release-2.10.1-RC0:
>
> https://github.com/apache/hadoop/tree/release-2.10.1-RC0
>
> The maven artifacts are hosted here:
>
> https://repository.apache.org/content/repositories/orgapachehadoop-1279/
>
> My public key is available here:
>
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> The vote will run for 5 days, until Saturday, September 19 at 10:00 am PDT.
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.1
>
> Thanks,
> Masatake Iwasaki
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


Re: [E] Re: [DISCUSS] Hadoop 2.10.1 release

2020-09-02 Thread Jim Brennan
Thanks Masatake Iwasaki!
I am willing to help out with the Hadoop 2.10.1 release.

Jim Brennan

On Tue, Sep 1, 2020 at 2:13 AM Masatake Iwasaki 
wrote:

> Thanks, Mingliang Liu.
>
> I volunteer to take the RM role then.
> I will appreciate advice from who have the experience.
>
> Masatake Iwasaki
>
> On 2020/09/01 10:38, Mingliang Liu wrote:
> > I can see how I can help, but I can not take the RM role this time.
> >
> > Thanks,
> >
> > On Mon, Aug 31, 2020 at 12:15 PM Wei-Chiu Chuang
> >  wrote:
> >
> >> Hello,
> >>
> >> I see that Masatake graciously agreed to volunteer with the Hadoop
> 2.10.1
> >> release work in the 2.9 branch EOL discussion thread
> >>
> https://s.apache.org/hadoop2.9eold
> >>
> >> Anyone else likes to contribute also?
> >>
> >> Thanks
> >>
> >
>
> -
> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>


[jira] [Created] (HADOOP-17127) Use RpcMetrics.TIMEUNIT to initialize rpc queueTime and processingTime

2020-07-13 Thread Jim Brennan (Jira)
Jim Brennan created HADOOP-17127:


 Summary: Use RpcMetrics.TIMEUNIT to initialize rpc queueTime and 
processingTime
 Key: HADOOP-17127
 URL: https://issues.apache.org/jira/browse/HADOOP-17127
 Project: Hadoop Common
  Issue Type: Improvement
  Components: common
Reporter: Jim Brennan
Assignee: Jim Brennan


While making an internal change to use {{TimeUnit.MICROSECONDS}} instead of 
{{TimeUnit.MILLISECONDS}} for rpc details, we found that we also had to modify 
this code in DecayRpcScheduler.addResponseTime() to initialize {{queueTime}} 
and {{processingTime}} with the correct units.
{noformat}
long queueTime = details.get(Timing.QUEUE, TimeUnit.MILLISECONDS);
long processingTime = details.get(Timing.PROCESSING, TimeUnit.MILLISECONDS);
{noformat}
If we change these to use {{RpcMetrics.TIMEUNIT}} it is simpler.
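Something like this (a sketch of the described simplification, not the exact committed diff):
{noformat}
long queueTime = details.get(Timing.QUEUE, RpcMetrics.TIMEUNIT);
long processingTime = details.get(Timing.PROCESSING, RpcMetrics.TIMEUNIT);
{noformat}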

We also found one test case in TestRPC that was assuming the units were 
milliseconds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Branch 2.10 QBT builds

2020-05-21 Thread Jim Brennan
Does anyone know what's up with the qbt builds for branch-2.10?
https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch-2.10-java7-linux-x86/692/console

Last few have failed with:

cd /testptch/hadoop
/usr/bin/mvn --batch-mode
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/hadoop-qbt-branch-2.10-java7-linux-x86/yetus-m2/hadoop-branch-2.10-full-0
checkstyle:checkstyle -Dcheckstyle.consoleOutput=true -Ptest-patch
-DskipTests -Ptest-patch >
/testptch/patchprocess/maven-patch-checkstyle-root.txt 2>&1
FATAL: command execution failed
java.io.EOFException
at 
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
at 
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
at 
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
at java.io.ObjectInputStream.(ObjectInputStream.java:358)
at 
hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at 
hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Channel "unknown":
Remote call on H18 failed. The channel is closing down or has closed
down


[jira] [Created] (HADOOP-16789) In TestZKFailoverController, restore changes from HADOOP-11149 that were dropped by HDFS-6440

2020-01-03 Thread Jim Brennan (Jira)
Jim Brennan created HADOOP-16789:


 Summary: In TestZKFailoverController, restore changes from 
HADOOP-11149 that were dropped by HDFS-6440
 Key: HADOOP-16789
 URL: https://issues.apache.org/jira/browse/HADOOP-16789
 Project: Hadoop Common
  Issue Type: Test
Reporter: Jim Brennan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [DISCUSS] Making 2.10 the last minor 2.x release

2019-12-31 Thread Jim Brennan
It looks like QBT tests are still being run on branch-2 (
https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/),
and they are not very helpful at this point.
Can we change the QBT tests to run against branch-2.10 instead?

Jim

On Mon, Dec 23, 2019 at 7:44 PM Akira Ajisaka  wrote:

> Thank you, Ayush.
>
> I understand we should keep branch-2 as is, as well as master.
>
> -Akira
>
>
> On Mon, Dec 23, 2019 at 9:14 PM Ayush Saxena  wrote:
>
> > Hi Akira
> > Seems there was an INFRA ticket for that, INFRA-19581,
> > but the INFRA people closed it as won't-do. And yes, the branch is
> > protected; we can’t delete it directly.
> >
> > Ref: https://issues.apache.org/jira/browse/INFRA-19581
> >
> > -Ayush
> >
> > On 23-Dec-2019, at 5:03 PM, Akira Ajisaka  wrote:
> >
> > Thank you for your work, Jonathan.
> >
> > I found branch-2 has been unintentionally pushed again. Would you remove
> > it?
> > I think the branch should be protected if possible.
> >
> > -Akira
> >
> > On Tue, Dec 10, 2019 at 5:17 AM Jonathan Hung 
> > wrote:
> >
> > It's done. The new commit chain is: trunk -> branch-3.2 -> branch-3.1 ->
> > branch-2.10 -> branch-2.9 -> branch-2.8 (branch-2 no longer exists,
> > please don't try to commit to it)
> >
> > Completed procedure:
> >
> >   - Verified everything in old branch-2.10 was in old branch-2
> >   - Deleted old branch-2.10
> >   - Renamed branch-2 to (new) branch-2.10
> >   - Set version in new branch-2.10 to 2.10.1-SNAPSHOT
> >   - Renamed fix versions from 2.11.0 to 2.10.1
> >   - Removed 2.11.0 as a version in HADOOP/YARN/HDFS/MAPREDUCE
> >
> > Jonathan Hung
> >
> > On Wed, Dec 4, 2019 at 10:55 AM Jonathan Hung 
> > wrote:
> >
> > FYI, starting the rename process, beginning with INFRA-19521.
> >
> > Jonathan Hung
> >
> >
> >
> > On Wed, Nov 27, 2019 at 12:15 PM Konstantin Shvachko <
> >
> > shv.had...@gmail.com>
> >
> > wrote:
> >
> >
> > Hey guys,
> >
> > I think we diverged a bit from the initial topic of this discussion,
> > which is removing branch-2.10 and changing the version of branch-2 from
> > 2.11.0-SNAPSHOT to 2.10.1-SNAPSHOT.  Sounds like the subject line for
> > this thread, "Making 2.10 the last minor 2.x release", confused people.
> > It is in fact a wider matter that can be discussed when somebody actually
> > proposes to release 2.11, which I understand nobody does at the moment.
> >
> > So if anybody objects to removing branch-2.10, please make an argument.
> > Otherwise we should go ahead and just do it next week.  I see people
> > still struggling to keep branch-2 and branch-2.10 in sync.
> >
> > Thanks,
> > --Konstantin
> >
> >
> > On Thu, Nov 21, 2019 at 3:49 PM Jonathan Hung 
> >
> > wrote:
> >
> >
> > Thanks for the detailed thoughts, everyone.
> >
> > Eric (Badger), my understanding is the same as yours re. minor vs patch
> > releases. As for putting features into minor/patch releases, if we keep
> > the convention of putting new features only into minor releases, my
> > assumption is still that it's unlikely people will want to get them into
> > branch-2 (based on the 2.10.0 release process). For the java 11 issue, we
> > haven't even really removed support for java 7 in branch-2 (much less
> > java 8), so I feel moving to java 11 would go along with a move to branch
> > 3. And as you mentioned, if people really want to use java 11 on
> > branch-2, we can always revive branch-2. But for now I think the
> > convenience of not needing to port to both branch-2 and branch-2.10 (and
> > below) outweighs the cost of potentially needing to revive branch-2.
> >
> > Jonathan Hung
> >
> >
> >
> > On Wed, Nov 20, 2019 at 10:50 AM Eric Yang  wrote:
> >
> >
> > +1 for 2.10.x as the last release of the 2.x line.
> >
> > Software becomes more compatible when more companies stress test the same
> > software and make improvements in trunk.  Some may be extra cautious
> > about moving up the version because of internal obligations to keep
> > things running.  Company obligation should not be the driving force for
> > maintaining Hadoop branches.  There is no proper collaboration in the
> > community when every name-brand company maintains its own Hadoop 2.x
> > version.  I think it would be healthier for the community to reduce the
> > branch forking and spend energy on trunk to harden the software.  This
> > will give more confidence to move up the version than trying to fix n
> > permutations of breakage like the Flash fixing the timeline.
> >
> > The Apache license states there is no warranty of any kind for code
> > contributions.  Fewer community release processes should improve software
> > quality when eyes are on trunk, and help steer toward the same end goals.

Re: [VOTE] Release Apache Hadoop 2.10.0 (RC1)

2019-10-25 Thread Jim Brennan
+1 (non-binding) on RC1
I built from source on Mac and RHEL7, ran hdfs, nodemanager, and
resourcemanager unit tests, and set up a one-node cluster and ran some test
jobs (pi and sleep).
- Jim


On Tue, Oct 22, 2019 at 4:55 PM Jonathan Hung  wrote:

> Hi folks,
>
> This is the second release candidate for the first release of Apache Hadoop
> 2.10 line. It contains 362 fixes/improvements since 2.9 [1]. It includes
> features such as:
>
> - User-defined resource types
> - Native GPU support as a schedulable resource type
> - Consistent reads from standby node
> - Namenode port based selective encryption
> - Improvements related to rolling upgrade support from 2.x to 3.x
> - Cost based fair call queue
>
> The RC1 artifacts are at: http://home.apache.org/~jhung/hadoop-2.10.0-RC1/
>
> RC tag is release-2.10.0-RC1.
>
> The maven artifacts are hosted here:
> https://repository.apache.org/content/repositories/orgapachehadoop-1243/
>
> My public key is available here:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> The vote will run for 5 weekdays, until Tuesday, October 29 at 3:00 pm PDT.
>
> Thanks,
> Jonathan Hung
>
> [1]
>
> https://issues.apache.org/jira/issues/?jql=project%20in%20(HDFS%2C%20YARN%2C%20HADOOP%2C%20MAPREDUCE)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.10.0%20AND%20fixVersion%20not%20in%20(2.9.2%2C%202.9.1%2C%202.9.0)
>


Re: [VOTE] Merge YARN-8200 to branch-2 and branch-3.0

2019-08-26 Thread Jim Brennan
+1 (non-binding).
I have built branch-2 with the latest YARN-8200 patch
(YARN-8200-branch-2.003.patch).  I ran all of the NM/RM tests and ran a few
test jobs on a one-node cluster with default settings.


On Mon, Aug 26, 2019 at 3:51 PM Oliver Hu  wrote:

> +1 (non-binding)
>
> We have used this patch internally for more than a year to acquire GPU
> reliably at LinkedIn. I don't think it is necessary to merge this to
> branch-3.0 tho, as we are EOLing that branch.
>
> On Thu, Aug 22, 2019 at 4:43 PM Jonathan Hung 
> wrote:
>
> > Hi folks,
> >
> > As per [1], starting a vote to merge YARN-8200 (and YARN-8200.branch3)
> > feature branch to branch-2 (and branch-3.0).
> >
> > Vote runs for 7 days, to Thursday, Aug 29 5:00PM PDT.
> >
> > Thanks.
> >
> > [1]
> >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201908.mbox/%3cCAHzWLgcX7f5Tr3q=csrqgysvpdf7mh-iu17femgx89dhr+1...@mail.gmail.com%3e
> >
> > Jonathan Hung
> >
>


[jira] [Created] (HADOOP-16361) TestSecureLogins#testValidKerberosName fails on branch-2

2019-06-11 Thread Jim Brennan (JIRA)
Jim Brennan created HADOOP-16361:


 Summary: TestSecureLogins#testValidKerberosName fails on branch-2
 Key: HADOOP-16361
 URL: https://issues.apache.org/jira/browse/HADOOP-16361
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.8.5, 2.9.2, 2.10.0
Reporter: Jim Brennan


This test is failing in branch-2.
{noformat}
[ERROR] Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.917 
s <<< FAILURE! - in org.apache.hadoop.registry.secure.TestSecureLogins
[ERROR] 
testValidKerberosName(org.apache.hadoop.registry.secure.TestSecureLogins)  Time 
elapsed: 0.007 s  <<< ERROR!
org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No 
rules applied to zookeeper/localhost
at 
org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:401)
at 
org.apache.hadoop.registry.secure.TestSecureLogins.testValidKerberosName(TestSecureLogins.java:182)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [DISCUSS] Docker build process

2019-03-19 Thread Jim Brennan
I agree with Steve and Marton.   I am ok with having the docker build as an
option, but I don't want it to be the default.
Jim


On Tue, Mar 19, 2019 at 12:19 PM Eric Yang  wrote:

> Hi Marton,
>
> Thank you for your input.  I agree with most of what you said, with a few
> exceptions.  A security fix should result in a different version of the
> image instead of replacing a certain version.  The Dockerfile is most
> likely to change to apply the security fix.  If it did not change, the
> source is unstable over time and results in non-buildable code.  When the
> maven release is automated through Jenkins, this is a breeze of clicking a
> button.  Jenkins even increments the target version automatically, with an
> option to edit.  It makes the release manager's job easier than Homer
> Simpson's job.
>
> If versioning is done correctly, older branches can have the same docker
> subproject, and Hadoop 2.7.8 can be released for older Hadoop branches.  We
> don't generate a timeline paradox by allowing the history of Hadoop 2.7.1
> to change.  That release has passed; let it stay that way.
>
> There is mounting evidence that the Hadoop community wants a docker profile
> for the developer image.  The precommit build will not catch some build
> errors because more code is allowed to slip through when using a profile
> build process.  I will make adjustments accordingly unless 7 more people
> come out and say otherwise.
>
> Regards,
> Eric
>
> On 3/19/19, 1:18 AM, "Elek, Marton"  wrote:
>
>
>
> Thank you, Eric, for describing the problem.
>
> I have multiple small comments, trying to separate them.
>
> I. separated vs in-build container image creation
>
> > The disadvantages are:
> >
> > 1.  Requires developers to have access to docker.
> > 2.  The default build takes longer.
>
>
> These are not the only disadvantages (IMHO), as I wrote in the
> previous thread and in the issue [1]
>
> Using in-build container image creation doesn't enable:
>
> 1. modifying the image later (e.g. applying security fixes to the container
> itself or improvements to the startup scripts)
> 2. creating images for older releases (e.g. hadoop 2.7.1)
>
> I think there are two kinds of images:
>
> a) images for released artifacts
> b) developer images
>
> I would prefer to manage a) with separated branch repositories but b)
> with (optional!) in-build process.
>
> II. I agree with Steve. I think it's better to make it optional, as most of
> the time it's not required, and to support the default dev build with the
> default settings (= just enough to start).
>
> III. Maven best practices
>
> (https://dzone.com/articles/maven-profile-best-practices)
>
> I think this is a good article. But its argument is not against profiles as
> such; it is against creating multiple versions of the same artifact with
> the same name (e.g. jdk8/jdk11). In Hadoop, profiles are used to introduce
> optional steps. I think that's fine, as the maven lifecycle/phase model is
> very static (compare it with the tree-based approach in Gradle).
>
> Marton
>
> [1]: https://issues.apache.org/jira/browse/HADOOP-16091
>
> On 3/13/19 11:24 PM, Eric Yang wrote:
> > Hi Hadoop developers,
> >
> > In recent months, there were various discussions on creating a docker
> build process for Hadoop.  The mailing list converged last month on making
> the docker build process inline, when the Ozone team was planning a new
> repository for Hadoop/Ozone docker images.  New feature work has started to
> add an inline docker image build process to the Hadoop build.
> > A few lessons were learnt from making the docker build inline in
> YARN-7129.  The build environment must have docker for a successful docker
> build.  BUILD.txt states that for an easy build environment, use Docker.
> There is logic in place to ensure that the absence of docker does not
> trigger a docker build.  The inline process tries to be as non-disruptive
> as possible to the existing development environment, with one exception: if
> docker’s presence is detected but the user does not have rights to run
> docker, the build will fail.
> >
> > Now, some developers are pushing back on the inline docker build process
> because the existing environment did not make the docker build process
> mandatory.  However, there are benefits to using an inline docker build
> process:
> >
> > 1.  Source code tag, maven repository artifacts, and docker hub
> artifacts can all be produced in one build.
> > 2.  Less manual labor to tag different source branches.
> > 3.  Reduced intermediate build caches that may exist in multi-stage
> builds.
> > 4.  Release engineers and developers do not need to search a maze of
> build flags to acquire artifacts.
> >
> > The disadvantages are:
> >
> > 1.  Requires developers to have access to docker.
> > 2.  The default build takes longer.
> >
> > There is workaround for above di

Re: [ANNOUNCE] Eric Badger is now a committer!

2019-03-05 Thread Jim Brennan
Congratulations Eric!

On Tue, Mar 5, 2019 at 11:20 AM Eric Payne 
wrote:

> It is my pleasure to announce that Eric Badger has accepted an invitation
> to become a Hadoop Core committer.
>
> Congratulations, Eric! This is well-deserved!
>
> -Eric Payne
>


[jira] [Created] (HADOOP-15548) Randomize local dirs

2018-06-20 Thread Jim Brennan (JIRA)
Jim Brennan created HADOOP-15548:


 Summary: Randomize local dirs
 Key: HADOOP-15548
 URL: https://issues.apache.org/jira/browse/HADOOP-15548
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jim Brennan
Assignee: Jim Brennan


Shuffle LOCAL_DIRS, LOG_DIRS, and LOCAL_USER_DIRS when launching a container. 
Some applications process these in exactly the same way in every container 
(e.g. round-robin), which can cause disks to get unnecessarily overloaded 
(e.g. one output file always written to the first entry specified in the 
environment variable).

There are two paths for local dir allocation, depending on whether the size is 
unknown or known.  The unknown path already uses a random algorithm.  The known 
path initializes with a random starting point and then goes round-robin after 
that: when selecting a dir, it increments the last-used index by one and then 
checks sequentially until it finds a dir that satisfies the request.  The 
proposal is to increment by a random value between 1 and num_dirs - 1, and then 
check sequentially from there.  This should result in a more random selection 
in all cases.
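A rough sketch of the proposed selection (hypothetical names, not the committed patch):
{code:java}
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntPredicate;

public class LocalDirPickerSketch {
  private int lastUsed = 0;

  int pickDir(int numDirs, IntPredicate hasEnoughSpace) {
    // A random step in [1, numDirs - 1] breaks the strict round-robin pattern.
    int step = 1 + ThreadLocalRandom.current().nextInt(Math.max(1, numDirs - 1));
    lastUsed = (lastUsed + step) % numDirs;
    for (int i = 0; i < numDirs; i++) {
      int candidate = (lastUsed + i) % numDirs;  // then probe sequentially
      if (hasEnoughSpace.test(candidate)) {
        return candidate;
      }
    }
    return -1; // no dir satisfies the request
  }
}
{code}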



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15357) Configuration.getPropsWithPrefix no longer does variable substitution

2018-04-02 Thread Jim Brennan (JIRA)
Jim Brennan created HADOOP-15357:


 Summary: Configuration.getPropsWithPrefix no longer does variable 
substitution
 Key: HADOOP-15357
 URL: https://issues.apache.org/jira/browse/HADOOP-15357
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jim Brennan


Before [HADOOP-13556], Configuration.getPropsWithPrefix() used the 
Configuration.get() method to get the property values.  After 
[HADOOP-13556], it uses props.getProperty().

The difference is that Configuration.get() does deprecation handling and more 
importantly variable substitution on the value.  So if a property has a 
variable specified with ${variable_name}, it will no longer be expanded when 
retrieved via getPropsWithPrefix().

Was this change in behavior intentional?  I am using this function in the fix 
for [MAPREDUCE-7069], but we do want variable expansion to happen.
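A small demonstration of the behavior difference (the property names are examples):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class SubstitutionDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("base.dir", "/grid/0");
    conf.set("myprefix.log.dir", "${base.dir}/logs");

    // Configuration.get() expands the variable: prints "/grid/0/logs"
    System.out.println(conf.get("myprefix.log.dir"));

    // After HADOOP-13556, getPropsWithPrefix() reads props.getProperty()
    // directly, so this prints the raw value "${base.dir}/logs"
    System.out.println(conf.getPropsWithPrefix("myprefix.").get("log.dir"));
  }
}
{code}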



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Reopened] (HADOOP-15085) Output streams closed with IOUtils suppressing write errors

2017-12-15 Thread Jim Brennan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reopened HADOOP-15085:
--

Reopening so I can put up a patch for branch-2.

> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HADOOP-15085
> URL: https://issues.apache.org/jira/browse/HADOOP-15085
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Jason Lowe
>    Assignee: Jim Brennan
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HADOOP-15085.001.patch, HADOOP-15085.002.patch, 
> HADOOP-15085.003.patch, HADOOP-15085.004.patch, HADOOP-15085.005.patch
>
>
> There are a few places in hadoop-common that are closing an output stream 
> with IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.
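For illustration, the try-with-resources form that propagates close() failures (a fragment; createStream() is a placeholder):
{code:java}
try (OutputStream outStream = createStream()) {
  // ...write to outStream...
}  // an IOException thrown by close() now propagates to the caller
{code}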



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org