> In most cases, the OOM occurs when closing MiniDFSCluster.

@Akira Ajisaka <aajis...@apache.org>, I see some usage of parallelStream() in RouterRpcServer. Between the executor pools in the code and parallelStream(), could a large number of threads be created in the MiniCluster and MiniRouterCluster, causing the JVM to crash?
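For context on why parallelStream() matters here: every parallel stream in a JVM shares the single common ForkJoinPool, so stream workers pile on top of whatever dedicated executors the clusters already start. A minimal sketch using only plain JDK APIs (the connection to MiniDFSCluster/Router thread counts is my assumption, not something verified against the Hadoop code):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelStreamThreads {
    public static void main(String[] args) {
        // All parallelStream()/IntStream.parallel() calls in the JVM share
        // this one pool; its default parallelism scales with CPU cores.
        int parallelism = ForkJoinPool.commonPool().getParallelism();
        System.out.println("common pool parallelism = " + parallelism);

        // Count the distinct worker threads a parallel stream actually uses.
        // On a many-core CI host this can be dozens of threads, before any
        // executor that a MiniDFSCluster or Router creates on its own.
        long distinctThreads = IntStream.range(0, 1000)
                .parallel()
                .mapToObj(i -> Thread.currentThread().getName())
                .distinct()
                .count();
        System.out.println("worker threads used = " + distinctThreads);
    }
}
```

Several test JVMs running in parallel, each with its own common pool plus cluster executors, could plausibly hit the native-thread limit of the container.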
On Thu, Dec 17, 2020 at 9:13 PM Akira Ajisaka <aajis...@apache.org> wrote:

> Thank you for your reply.
>
> > Will reducing the number of threads not increase the build time?
> Yes, but the difference is 30 ~ 60 mins. Not so much, I think.
>
> > Can we not ask for more resources?
> Now the machines in https://ci-hadoop.apache.org/ are physical, and
> the memory size is fixed.
> (They are donated from Y!.
> https://cwiki.apache.org/confluence/display/INFRA/Build+nodes+-+node+name+to+hostname+mappings
> )
>
> I'll ask the infrastructure team how much memory we can use. If the
> size is not 20GB, we can update it in
> https://github.com/apache/hadoop/pull/2560
>
> > I think RBF builds are quite stable
> Actually not:
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/357/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
> I attached the log because it will be deleted in 2 weeks.
>
> -Akira
>
> On Thu, Dec 17, 2020 at 7:19 PM Ayush Saxena <ayush...@gmail.com> wrote:
> >
> > Hi Akira,
> > Will reducing the number of threads not increase the build time? I guess
> > it takes in general 2.5-3.5 hrs in the present scenario. Moreover, the
> > thread count hasn't been increased recently; would that be the root of
> > all evil?
> >
> > Can we not ask for more resources?
> >
> > Anyway, I think RBF builds are quite stable; I don't remember seeing OOM
> > there, so in case we decide to reduce the thread count, maybe we can
> > keep RBF as is?
> >
> > -Ayush
> >
> > > On 17-Dec-2020, at 2:15 PM, Akira Ajisaka <aajis...@apache.org> wrote:
> > >
> > > Sorry, now I think the above comment is wrong. Please ignore.
> > > In hadoop-common, hadoop-hdfs, and hadoop-hdfs-rbf, the unit tests are
> > > executed in parallel. I'd like to reduce the number of tests running
> > > at the same time to avoid OOM.
> > > Filed https://issues.apache.org/jira/browse/HDFS-15731
> > >
> > >> On Thu, Dec 17, 2020 at 4:17 PM Akira Ajisaka <aajis...@apache.org> wrote:
> > >>
> > >> In most cases, the OOM occurs when closing MiniDFSCluster.
> > >> Added a detailed comment in
> > >> https://issues.apache.org/jira/browse/HDFS-13579 and created a PR:
> > >> https://github.com/apache/hadoop/pull/2555
> > >>
> > >> -Akira
> > >>
> > >>> On Fri, Dec 4, 2020 at 12:43 AM Ahmed Hussein <a...@ahussein.me> wrote:
> > >>>
> > >>> I remember this error has been there for more than 6 months. It
> > >>> significantly slows down the progress of collaboration.
> > >>> Eventually, the community will develop another habit of ignoring the
> > >>> prebuilds (out of despair).
> > >>>
> > >>> I am willing to help get this fixed.
> > >>> Does anyone know who owns and has experience with the Yetus environment?
> > >>>
> > >>> On Wed, Dec 2, 2020 at 4:43 PM Jim Brennan <james.bren...@verizonmedia.com> wrote:
> > >>>
> > >>>> This is still happening.
> > >>>> Latest build:
> > >>>> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/343/#showFailuresLink
> > >>>>
> > >>>> It looks like we are running out of threads in the containers where
> > >>>> the unit tests run. Does anyone know where this is set up?
> > >>>>
> > >>>> On Wed, Oct 21, 2020 at 5:51 PM Ahmed Hussein <a...@ahussein.me> wrote:
> > >>>>
> > >>>>> Hey folks,
> > >>>>>
> > >>>>> Yetus has been failing miserably over the last couple of days.
> > >>>>> In the latest qbt report
> > >>>>> <https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/301/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt>,
> > >>>>> hundreds of JUnit tests fail after Java fails to acquire resources
> > >>>>> to create new threads.
> > >>>>>
> > >>>>> [ERROR] testRecoverAllDataBlocks1(org.apache.hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy)
> > >>>>> Time elapsed: 8.509 s <<< ERROR!
> > >>>>> java.lang.OutOfMemoryError: unable to create new native thread
> > >>>>>
> > >>>>> Any thoughts on what could trigger that in the last few days? Do
> > >>>>> we need more resources for the image?
> > >>>>>
> > >>>>> --
> > >>>>> Best Regards,
> > >>>>>
> > >>>>> *Ahmed Hussein, PhD*
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org

-- 
Best Regards,

*Ahmed Hussein, PhD*
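Since the failure mode in the logs above is "java.lang.OutOfMemoryError: unable to create new native thread", one way to narrow down which tests leak threads is to compare live thread counts before and after a suspect test. A rough sketch using the standard JDK ThreadMXBean; the helper and the leak threshold are illustrative assumptions, not anything the Hadoop build actually uses:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountCheck {
    private static final ThreadMXBean THREADS =
            ManagementFactory.getThreadMXBean();

    /** Current number of live threads, for before/after comparison. */
    static int liveThreads() {
        return THREADS.getThreadCount();
    }

    public static void main(String[] args) {
        int before = liveThreads();
        // ... run a test here, e.g. one that starts and closes a
        // MiniDFSCluster (omitted; this sketch is plain JDK only) ...
        int after = liveThreads();

        // Illustrative threshold: flag tests that leave many threads behind.
        if (after - before > 100) {
            System.err.println("possible thread leak: " + (after - before)
                    + " threads not cleaned up");
        }
        System.out.println("live threads = " + after);
    }
}
```

Wired into a surefire run listener or a test watcher, this kind of check could point at the test classes (or the MiniDFSCluster shutdown path) that push the container over its native-thread limit.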