Hi Massimiliano,

From my understanding, you want to process the messages and then store the results in, say, Cassandra. You don't need to implement a custom MessageCollector for this. All you need to do is put the write logic in the process method of your StreamTask; see the API doc <https://samza.incubator.apache.org/learn/documentation/latest/api/overview.html>. That method is called once for every incoming message, so you can process the message and write the result to the remote DB right there.
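As a rough illustration of "put the write logic in the process method", here is a minimal sketch in plain Java. It deliberately does not depend on the Samza or Cassandra jars: ResultStore and StoringTask are hypothetical names made up for this example, and the upper-casing stands in for whatever processing you actually do. In a real job the body of process below would sit inside your StreamTask's process method.

```java
import java.util.ArrayList;
import java.util.List;

class WriteInProcessSketch {

    /** Hypothetical stand-in for a DB client, e.g. a thin wrapper around a Cassandra session. */
    interface ResultStore {
        void write(String key, String value);
    }

    /** Mirrors the shape of a stream task: one process call per incoming message. */
    static class StoringTask {
        private final ResultStore store;

        StoringTask(ResultStore store) {
            this.store = store;
        }

        // In a Samza job this logic would live in StreamTask.process(...),
        // with key and message taken from the incoming envelope.
        void process(String key, String message) {
            String result = message.trim().toUpperCase(); // placeholder processing step
            store.write(key, result);                     // one write per message
        }
    }

    public static void main(String[] args) {
        // An in-memory list plays the role of the remote DB for the demo.
        List<String> rows = new ArrayList<>();
        StoringTask task = new StoringTask((key, value) -> rows.add(key + "=" + value));

        task.process("k1", "  hello ");
        System.out.println(rows); // [k1=HELLO]
    }
}
```

Keeping the store behind a small interface like this also makes the task easy to test without a running database.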
Assuming you have already tried the hello-samza <https://samza.incubator.apache.org/startup/hello-samza/latest/> project, have a look at WikipediaFeedStreamTask <https://github.com/apache/incubator-samza-hello-samza/blob/master/samza-wikipedia/src/main/java/samza/examples/wikipedia/task/WikipediaFeedStreamTask.java> and WikipediaParserStreamTask <https://github.com/apache/incubator-samza-hello-samza/blob/master/samza-wikipedia/src/main/java/samza/examples/wikipedia/task/WikipediaParserStreamTask.java>. You can put your logic in the process method in the same way.

To write to the DB in batches for better performance, keep a buffer (such as a List or Map) as a field in the StreamTask and add each processed result to it. Then write the buffered results to the DB after a certain number of messages, or after a certain amount of time by implementing the WindowableTask interface (see windowing <https://samza.incubator.apache.org/learn/documentation/latest/container/windowing.html>).

Hope that helps.

Thanks,

Fang, Yan
[email protected]
+1 (206) 849-4108

On Thu, Sep 4, 2014 at 2:58 PM, Massimiliano Tomassi <[email protected]> wrote:

> Hello all,
> I was trying to figure out what's the way to implement and use a custom
> MessageCollector. Let's say I want to send messages to a system different
> from Kafka. How should I do that? Is there any tutorial explaining this?
>
> I was also thinking at the following use case, not sure if it makes sense
> at all but here it is: let's say we receive messages, process them somehow
> and then we want to store the results in a remote DB, Cassandra for
> example. Does it make sense to create an implementation of MessageCollector
> that stores to Cassandra? Also, given that performing a write for every
> single message can be not very efficient, would it be possible to collect
> some data and then write them to Cassandra as a single batch operation?
>
> I hope to have explained myself decently...and I hope to receive some
> suggestions.
>
> All the best.
> Max
>
>
> --
> ------------------------------------------------
> Massimiliano Tomassi
> ------------------------------------------------
> e-mail: [email protected]
> ------------------------------------------------
>
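
To make the batching idea from the reply a bit more concrete, here is a minimal sketch in plain Java, again without the Samza or Cassandra jars so it runs on its own. BatchStore and BatchingTask are hypothetical names for this example only. The intent is that process would be called from StreamTask.process and window from WindowableTask.window, so a batch is written either when the buffer reaches a size limit or when the window timer fires, whichever comes first.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {

    /** Hypothetical stand-in for a DB client that can write many rows in one call. */
    interface BatchStore {
        void writeBatch(List<String> rows);
    }

    static class BatchingTask {
        private final BatchStore store;
        private final int maxBatchSize;
        private final List<String> buffer = new ArrayList<>();

        BatchingTask(BatchStore store, int maxBatchSize) {
            this.store = store;
            this.maxBatchSize = maxBatchSize;
        }

        // Would be called from StreamTask.process(...) for every message.
        void process(String message) {
            buffer.add(message.toUpperCase()); // placeholder processing step
            if (buffer.size() >= maxBatchSize) {
                flush();                       // size-based trigger
            }
        }

        // Would be called from WindowableTask.window(...) on the window timer.
        void window() {
            flush();                           // time-based trigger
        }

        private void flush() {
            if (buffer.isEmpty()) {
                return;                        // nothing to write
            }
            store.writeBatch(new ArrayList<>(buffer)); // hand over a copy of the batch
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        // An in-memory list of batches plays the role of the remote DB.
        List<List<String>> written = new ArrayList<>();
        BatchingTask task = new BatchingTask(written::add, 2);

        task.process("a");
        task.process("b"); // buffer reaches 2, first batch is written
        task.process("c");
        task.window();     // timer fires, remaining message is written

        System.out.println(written); // [[A, B], [C]]
    }
}
```

In Samza the window interval is, as far as I know, set with the task.window.ms job property. Note that anything still sitting in the buffer is lost if the container dies before the next flush, so the batch size and window interval are worth keeping small enough for your use case.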
