There is currently a bug in NaRPC that increases the likelihood of hangs in Crail/TCP as data sizes grow. We have identified the actual problem in NaRPC but haven't gotten around to fixing it yet. I can look into this.
-Patrick

On Wed, Aug 21, 2019 at 12:35 AM 'Ben Sidhom' via zrlio-users <[email protected]> wrote:

> I've been experimenting with getting Crail over TCP to work with the
> crail-spark-io <https://github.com/zrlio/crail-spark-io> shuffle
> extensions.
>
> It seems to work fine for small shuffle sizes (up to about 10 gigabytes),
> but anything larger than that seems to hang. I've investigated this and the
> hangs seem to happen due to a few reasons, mostly contained to the NaRPC
> layer.
>
> The benchmark numbers here
> <https://crail.incubator.apache.org/blog/2019/03/disaggregation.html> seem
> to imply that this has worked for at least 200 gigabyte shuffles (I'm not
> certain because that second experiment does not explicitly give the test
> parameters). Has anybody had success with Crail over TCP or were pretty
> much all of the tests run over RDMA/NVMe?
>
> --
> -Ben
