When using Samza to process streaming data (Kafka/Databus), we deploy to
YARN clusters dedicated to Samza workloads. The machines in these clusters
are configured roughly like the benchmark hardware I listed.

When using Samza to process batch data (files on Hadoop
<https://reviews.apache.org/r/52570/>), we deploy to our Hadoop clusters,
which are shared with other MapReduce workloads. I believe these clusters use
spinning disks.

In the future, we plan to explore the trade-offs between storage cost and
performance, and we will continue to share what we learn with the community.

Thanks,
Jagadish


On Tue, Jan 31, 2017 at 1:38 PM, Ankit Malhotra <[email protected]> wrote:

> Hi Jagadish,
>
> Thanks for your reply. Is it safe to assume that you are running similar
> machines in production YARN clusters where only SAMZA workloads run?
>
> Ankit
>
> > On Jan 31, 2017, at 3:49 PM, Jagadish Venkatraman <[email protected]> wrote:
> >
> > Hi Ankit,
> >
> > We have benchmarked Samza on the following hardware configuration:
> >
> >   - Processor: Intel Xeon 2.67 GHz processor (with 24 cores)
> >   - 48GB of RAM
> >   - 1Gbps Ethernet
> >   - SSD: 1.65TB Fusion-IO SSD
> >
> > Please check out the perf numbers and the methodology here:
> > https://engineering.linkedin.com/performance/benchmarking-apache-samza-12-million-messages-second-single-node
> >
> > Thanks,
>
>


-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
