> In most cases, the OOM occurs when closing MiniDFSCluster.

@Akira Ajisaka <aajis...@apache.org>, I see some usage of parallelStream() in RouterRpcServer. Between the executor pools in the code and parallelStream(), could a large number of threads be created in the MiniCluster and MiniRouterCluster, causing the JVM to crash?
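For context on why parallelStream() matters here: every parallel stream in a JVM shares the single common ForkJoinPool, so stream workers pile on top of whatever dedicated executors the clusters already start. A minimal sketch using only plain JDK APIs (the connection to MiniDFSCluster/Router thread counts is my assumption, not something verified against the Hadoop code):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelStreamThreads {
    public static void main(String[] args) {
        // All parallelStream()/IntStream.parallel() calls in the JVM share
        // this one pool; its default parallelism scales with CPU cores.
        int parallelism = ForkJoinPool.commonPool().getParallelism();
        System.out.println("common pool parallelism = " + parallelism);

        // Count the distinct worker threads a parallel stream actually uses.
        // On a many-core CI host this can be dozens of threads, before any
        // executor that a MiniDFSCluster or Router creates on its own.
        long distinctThreads = IntStream.range(0, 1000)
                .parallel()
                .mapToObj(i -> Thread.currentThread().getName())
                .distinct()
                .count();
        System.out.println("worker threads used = " + distinctThreads);
    }
}
```

Several test JVMs running in parallel, each with its own common pool plus cluster executors, could plausibly hit the native-thread limit of the container.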
On Thu, Dec 17, 2020 at 9:13 PM Akira Ajisaka <aajis...@apache.org> wrote:

> Thank you for your reply.
>
> > Will reducing the number of threads not increase the build time?
> Yes, but the difference is 30 ~ 60 mins. Not so much, I think.
>
> > Can we not ask for more resources?
> Now the machines in https://ci-hadoop.apache.org/ are physical, and
> the memory size is fixed.
> (They are donated from Y!.
> https://cwiki.apache.org/confluence/display/INFRA/Build+nodes+-+node+name+to+hostname+mappings
> )
>
> I'll ask the infrastructure team how much memory we can use. If the
> size is not 20GB, we can update it in
> https://github.com/apache/hadoop/pull/2560
>
> > I think RBF builds are quite stable
> Actually not:
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/357/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
> I attached the log because it will be deleted in 2 weeks.
>
> -Akira
>
> On Thu, Dec 17, 2020 at 7:19 PM Ayush Saxena <ayush...@gmail.com> wrote:
> >
> > Hi Akira,
> > Will reducing the number of threads not increase the build time? I guess
> > it takes in general 2.5-3.5 hrs in the present scenario. Moreover, the
> > thread count hasn't been increased recently; would that be the root of
> > all evil?
> >
> > Can we not ask for more resources?
> >
> > Anyway, I think RBF builds are quite stable; I don't remember seeing OOM
> > there, so in case we decide to reduce the thread count, maybe we can
> > keep RBF as is?
> >
> > -Ayush
> >
> > > On 17-Dec-2020, at 2:15 PM, Akira Ajisaka <aajis...@apache.org> wrote:
> > >
> > > Sorry, now I think the above comment is wrong. Please ignore.
> > > In hadoop-common, hadoop-hdfs, and hadoop-hdfs-rbf, the unit tests are
> > > executed in parallel. I'd like to reduce the number of tests running
> > > at the same time to avoid OOM.
> > > Filed https://issues.apache.org/jira/browse/HDFS-15731
> > >
> > >> On Thu, Dec 17, 2020 at 4:17 PM Akira Ajisaka <aajis...@apache.org> wrote:
> > >>
> > >> In most cases, the OOM occurs when closing MiniDFSCluster.
> > >> Added a detailed comment in
> > >> https://issues.apache.org/jira/browse/HDFS-13579 and created a PR:
> > >> https://github.com/apache/hadoop/pull/2555
> > >>
> > >> -Akira
> > >>
> > >>> On Fri, Dec 4, 2020 at 12:43 AM Ahmed Hussein <a...@ahussein.me> wrote:
> > >>>
> > >>> I remember this error has been there for more than 6 months. It
> > >>> significantly slows down the progress of collaboration.
> > >>> Eventually, the community will develop another habit of ignoring the
> > >>> prebuilds (out of despair).
> > >>>
> > >>> I am willing to help get this fixed.
> > >>> Does anyone know who owns and has experience with the Yetus environment?
> > >>>
> > >>> On Wed, Dec 2, 2020 at 4:43 PM Jim Brennan <james.bren...@verizonmedia.com> wrote:
> > >>>
> > >>>> This is still happening.
> > >>>> Latest build:
> > >>>> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/343/#showFailuresLink
> > >>>>
> > >>>> It looks like we are running out of threads in the containers where
> > >>>> the unit tests run. Does anyone know where this is set up?
> > >>>>
> > >>>> On Wed, Oct 21, 2020 at 5:51 PM Ahmed Hussein <a...@ahussein.me> wrote:
> > >>>>
> > >>>>> Hey folks,
> > >>>>>
> > >>>>> Yetus has been failing miserably over the last couple of days.
> > >>>>> In the latest qbt report
> > >>>>> <https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/301/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt>,
> > >>>>> hundreds of JUnit tests fail after Java fails to acquire resources
> > >>>>> to create new threads.
> > >>>>>
> > >>>>> [ERROR] testRecoverAllDataBlocks1(org.apache.hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy)
> > >>>>> Time elapsed: 8.509 s <<< ERROR!
> > >>>>> java.lang.OutOfMemoryError: unable to create new native thread
> > >>>>>
> > >>>>> Any thoughts on what could trigger that in the last few days? Do
> > >>>>> we need more resources for the image?
> > >>>>>
> > >>>>> --
> > >>>>> Best Regards,
> > >>>>>
> > >>>>> *Ahmed Hussein, PhD*
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org

-- 
Best Regards,

*Ahmed Hussein, PhD*
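Since the failure mode in the logs above is "java.lang.OutOfMemoryError: unable to create new native thread", one way to narrow down which tests leak threads is to compare live thread counts before and after a suspect test. A rough sketch using the standard JDK ThreadMXBean; the helper and the leak threshold are illustrative assumptions, not anything the Hadoop build actually uses:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountCheck {
    private static final ThreadMXBean THREADS =
            ManagementFactory.getThreadMXBean();

    /** Current number of live threads, for before/after comparison. */
    static int liveThreads() {
        return THREADS.getThreadCount();
    }

    public static void main(String[] args) {
        int before = liveThreads();
        // ... run a test here, e.g. one that starts and closes a
        // MiniDFSCluster (omitted; this sketch is plain JDK only) ...
        int after = liveThreads();

        // Illustrative threshold: flag tests that leave many threads behind.
        if (after - before > 100) {
            System.err.println("possible thread leak: " + (after - before)
                    + " threads not cleaned up");
        }
        System.out.println("live threads = " + after);
    }
}
```

Wired into a surefire run listener or a test watcher, this kind of check could point at the test classes (or the MiniDFSCluster shutdown path) that push the container over its native-thread limit.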