Hi Yevhenii,

Could you clarify how you want to consume the "materialized views"? Are you planning to access them only within the real-time analytics pipeline (i.e. Samza), or to serve them so that an online application outside Samza can read them? In the latter case, the "materialized view" is usually stored in a separate persistent store. At LinkedIn, the destination store is usually a remote KV store, and we have two patterns for loading/updating the materialized view: a) write directly to the KV store; b) write to Kafka and let the remote KV store consume the updates asynchronously. The latter usually gives better performance and has less impact on applications reading the "materialized view".
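
To make the two patterns concrete, here is a rough in-memory sketch. It is only an illustration, not Samza or Kafka code: `KVStore` and `ChangeLog` are hypothetical stand-ins for a remote KV store and a Kafka topic, and all function names and keys are made up for the example.

```python
from collections import deque


class KVStore:
    """Hypothetical stand-in for the remote KV store that serves the view."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class ChangeLog:
    """Hypothetical stand-in for a Kafka topic: an append-only queue of updates."""

    def __init__(self):
        self.messages = deque()

    def send(self, key, value):
        self.messages.append((key, value))


def update_view_direct(store, key, value):
    # Pattern (a): the stream job writes each view update straight to the store.
    store.put(key, value)


def update_view_via_log(log, key, value):
    # Pattern (b): the stream job only appends the update to the log and returns.
    log.send(key, value)


def drain_log_into_store(log, store):
    # The store side applies queued updates at its own pace, decoupled from the job.
    applied = 0
    while log.messages:
        key, value = log.messages.popleft()
        store.put(key, value)
        applied += 1
    return applied


# Pattern (a): the update is visible to readers as soon as the job writes it.
store_a = KVStore()
update_view_direct(store_a, "user:1", {"orders": 3})

# Pattern (b): the update becomes visible only after the store consumes the log.
log = ChangeLog()
store_b = KVStore()
update_view_via_log(log, "user:1", {"orders": 3})
update_view_via_log(log, "user:1", {"orders": 4})
visible_before_drain = store_b.get("user:1")
applied = drain_log_into_store(log, store_b)
```

In pattern (b) the job never waits on the store, and the store can apply updates at a rate that does not disturb its readers. The trade-off is that readers may briefly see an older version of the view until the log has been consumed.
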
As for setting up a Samza application to run, it is pretty straightforward. The Hello Samza example on the web should let you start a Samza application within 15 minutes. At LinkedIn, with all the internal tools and deployment requirements, we have a guide for users to build and deploy a Samza job in 30 minutes. The biggest overhead is probably not Samza itself, but setting up Kafka and YARN. Let us know if you encounter any difficulties.

Cheers!

-Yi

On Tue, Jan 10, 2017 at 9:08 PM, Yevhenii Kurtov <yevhenii.kur...@gmail.com> wrote:

> Hello,
>
> Recently I watched "Turning the database inside out with Apache Samza" and
> was very impressed with the "Fully precomputed cache" part, as it seems to hold
> a remedy for the exact problem that our company is currently facing.
> We build niche-specific software and are nowhere near LinkedIn or Uber,
> but the data we operate on is growing steadily.
> Like probably almost everybody, we normalized our database as much as
> possible from the very beginning, and now, years later, read performance
> is becoming less and less satisfactory.
>
> The idea is to feed the MySQL log into Samza and build "materialized views"
> for all the use cases that we want. The part that I don't understand is
> where those "materialized views"/"caches" will be stored. In
> Samza itself, or will Samza write them back to, say, a Kafka queue, another
> MySQL database, or something else?
>
> Does anyone have experience implementing such a scenario in production?
> It would be great to hear about your experience, as this is my first encounter
> with stream processors and thus I don't have any clues about the difficulties
> and challenges that introducing Apache Samza into an application stack may
> bring along.
>