Re: Blade testing 3.2.0-SNAPSHOT (master/)

Dylan Millikin Tue, 05 Apr 2016 08:40:51 -0700

Cheers Marko, good work.

On Tue, Apr 5, 2016 at 4:32 PM, Marko Rodriguez <okramma...@gmail.com>
wrote:


> Hi,
>
> So yesterday and this morning I manually tested TinkerPop 3.2.0-SNAPSHOT
> for our VOTE release on Friday on 4 Blades using Friendster (2.5 billion
> edges). I noticed that Spark 1.6.1 is fickle and Netty-based network errors
> occur "easily." I dropped back down to 1.5.2 and no errors. I think one of
> the problems is GC in Spark 1.6.1 and using MEMORY_XXX storage levels. I
> did DISK_ONLY and the issues went away on the simple query of g.V().count()
> (which only repartitions -- no message passing). In 1.5.2 you get GC stalls
> with MEMORY_XXX storage levels, but no [ERROR]s (and no stack traces w/
> failed tasks). Next, I did a more complex query --
> g.V().out().out().count() -- and Spark 1.6.1 had failed tasks even with
> DISK_ONLY. Bummer. As a last check, I changed the proportion of
> SPARK_WORKER_INSTANCES to SPARK_WORKER_CORES from 4/6 to 6/4 and everything
> started to work again with Spark 1.6.1.
>
> In short, memory management and the worker/core ratio in Spark 1.6.1 are
> "different" from Spark 1.5.2. I was able to get the same speeds on 1.6.1 as
> with 1.5.2, I just had to do things a little differently. In fact, 1.6.1
> seems a bit faster -- a 55 minute job on 1.5.2 taking 50 minutes on 1.6.1.
>
> I think it is safe to release TinkerPop 3.2.0 with Spark 1.6.1, but we
> will just have to be ready to tell people to reduce the number of workers
> and to use DISK_ONLY if they are GC stalling a lot. Finally, with this
> testing, I ensured that our bump to Hadoop 2.7.2 didn't cause any problems
> and moreover, there were a few knick-knack bugs around FileSystemStorage that
> I was able to confirm no longer exist.
>
> Thanks,
> Marko.
>
> http://markorodriguez.com
>
>
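
For reference, the worker/core change described in the quoted message could be expressed in conf/spark-env.sh roughly as below. This is only a sketch: it assumes "4/6 to 6/4" follows the order given in the sentence (SPARK_WORKER_INSTANCES first, SPARK_WORKER_CORES second), and the message does not show the actual files used on the blades. The storage-level note in the comments assumes the Hadoop-Gremlin property gremlin.spark.graphStorageLevel is the setting that was switched to DISK_ONLY.

```shell
# conf/spark-env.sh (sketch, values taken from the quoted message)

# Before (task failures on Spark 1.6.1, assumed reading of "4/6"):
#   export SPARK_WORKER_INSTANCES=4
#   export SPARK_WORKER_CORES=6

# After (reported to work on Spark 1.6.1, assumed reading of "6/4"):
export SPARK_WORKER_INSTANCES=6   # worker processes per machine
export SPARK_WORKER_CORES=4       # cores each worker may use

# Storage level is set on the TinkerPop side, not here. Assumed
# Hadoop-Gremlin properties entry for the DISK_ONLY runs:
#   gremlin.spark.graphStorageLevel=DISK_ONLY
```

Both layouts use the same 24 cores per machine (4 x 6 versus 6 x 4), so the change redistributes cores across worker JVMs rather than adding capacity.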
