Hi Yi
Thanks for the reply.
The other thing I just recalled is "how much STOP THE WORLD time can a
program tolerate...".
One of our teams ran into a situation where they had to use CMS in a
throughput pipeline. They maintain a heavy-workload Storm cluster. Parallel
full GC took so long that ZooKeeper thought the worker node was dead
and kicked it out. This led to a lot of Kafka rebalancing... They had to use
CMS to reduce STOP THE WORLD time for quite some time, until G1 came out...
If long full GC pauses (tens of seconds, for example) aren't a problem for
Samza on the framework side, Serial + Serial Old sounds good to me. ;-)
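To check which collector a container is actually running after changing task.opts (for example task.opts=-XX:+UseSerialGC), and how much time it has spent in GC, the standard java.lang.management API is enough. This is a minimal sketch; the class name GcReport and where you call it from are hypothetical. The reported figure is the JVM's accumulated collection time, which for concurrent collectors such as CMS or G1 may not equal stop-the-world pause time, so GC logs remain the better source for individual pause lengths.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports the garbage collectors active in this JVM. */
public class GcReport {

    /**
     * Returns collector name -> accumulated collection time in milliseconds.
     * getCollectionTime() returns -1 if the JVM does not report it.
     */
    public static Map<String, Long> collectionTimesMillis() {
        Map<String, Long> times = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            times.put(gc.getName(), gc.getCollectionTime());
        }
        return times;
    }

    public static void main(String[] args) {
        // Allocate short-lived garbage so at least one young collection is likely.
        // The ring buffer keeps allocations reachable briefly so they are not optimized away.
        byte[][] ring = new byte[64][];
        for (int i = 0; i < 100_000; i++) {
            ring[i % ring.length] = new byte[1024];
        }

        // Print each collector with its collection count and accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running it with different flags (say -XX:+UseSerialGC versus -XX:+UseParallelGC) should change the collector names printed, which is a quick sanity check that the setting actually reached the container JVM.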
On 1 February 2016 at 16:02, Yi Pan wrote:
> Hi, Bo,
>
> That's an interesting question. Since we have opened up the task.opts
> option so users can set any GC configuration they like for their Samza jobs,
> we don't really have a "recommended" GC for users. It would probably also
> depend on the application's usage pattern. Our perf partner Tao
> Feng @LinkedIn may have some more insights.
>
> @Tao, do you have any comments on this?
>
> -Yi
>
> On Sun, Jan 31, 2016 at 7:58 PM, Liu Bo wrote:
>
> > Hi group
> >
> > We are trying to migrate our current streaming pipeline to Samza. Our
> > pipeline has several NLP modules, such as segmentation, POS tagging, and
> > a lot of score calculation. Each process normally needs 8-10 GB of memory.
> >
> > Our goal is high throughput, so we use Parallel Scavenge + Parallel Old in
> > our current setup. We've tried G1 on Java 8u65, and its throughput was
> > not as good.
> >
> > My question is: since Samza is designed to run on a single core, does that
> > mean Serial + Serial Old is the best garbage collector for Samza? On paper,
> > the serial collector is more efficient on a single core.
> >
> > If not, could someone share their experience with Samza GC tuning for
> > discussion? Thanks in advance.
> >
> > --
> > All the best
> >
> > Liu Bo
> >
>
--
All the best
Liu Bo