Felix/Jordan,

1 - 2 is exactly what I was looking for as well. I want to expose web
service calls to Kafka/Samza. Since there is no concept of a session, I was
wondering how to send the enriched data back to the web service request.
Or am I way off on this? Meaning, is this a completely wrong use case for
Kafka/Samza?
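
For context, the pattern I'm imagining is a correlation ID plus a reply
topic: the web tier tags each request with a unique ID, the stream job
copies that ID onto its enriched output, and a reply consumer in the web
tier completes the matching pending request. Here is a minimal in-memory
sketch of that idea (the BlockingQueues stand in for a "requests" and a
"replies" Kafka topic; all names are hypothetical, not anything Samza
provides out of the box):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class RequestReplySketch {
    // Stand-ins for the "requests" and "replies" Kafka topics.
    private final BlockingQueue<String[]> requests = new ArrayBlockingQueue<>(16);
    private final BlockingQueue<String[]> replies  = new ArrayBlockingQueue<>(16);
    // Pending web requests, keyed by correlation ID.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Web tier: publish the request with a correlation ID, return a future.
    public CompletableFuture<String> ask(String payload) throws InterruptedException {
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        requests.put(new String[] { correlationId, payload });
        return future;
    }

    // Stream job: consume a request, enrich it, echo the correlation ID.
    public void processOne() throws InterruptedException {
        String[] msg = requests.take();
        replies.put(new String[] { msg[0], "enriched:" + msg[1] });
    }

    // Web tier: consume a reply and complete the matching pending request.
    public void dispatchOne() throws InterruptedException {
        String[] msg = replies.take();
        CompletableFuture<String> future = pending.remove(msg[0]);
        if (future != null) future.complete(msg[1]);
    }

    public static void main(String[] args) throws Exception {
        RequestReplySketch s = new RequestReplySketch();
        CompletableFuture<String> answer = s.ask("user-42");
        s.processOne();   // the stream job's turn
        s.dispatchOne();  // the web tier's reply consumer
        System.out.println(answer.join()); // prints "enriched:user-42"
    }
}
```

In a real deployment I assume each web server instance would consume its
own reply topic (or partition) so it only sees replies to its own
requests, but I'd love to hear how others handle this.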

- Shekar

On Fri, Mar 27, 2015 at 12:42 PM, Jordan Shaw <jor...@pubnub.com> wrote:

> Felix,
> Here are my thoughts below
>
> 1 - 2) I think a majority of Samza applications are internal so far.
> However I've developed a Samza Publisher for PubNub that would allow you to
> send data from process or window out over a Data Stream Network. Right now
> it looks something like this:
>
> (.send collector (OutgoingMessageEnvelope. (SystemStream.
> "pubnub.some-channel") {:pub_key demo :sub_key demo} some-data))
>
> At smaller scale you could do the same with socket.io etc. If you're
> interested in this I can send you the src or jar. If there is wider
> interest I can open source it on GitHub, but it needs some cleanup first.
>
> 3) We currently don't have the need to warehouse our stream, but we have
> thought about piping Samza-generated data into some Hadoop-based system
> for longer-term analysis, then running Hive queries or something similar
> over that data.
>
> 4) I can't comment on the throughput of the other systems (HBase etc.),
> but our Kafka/Samza throughput is pretty impressive considering the
> single-threaded nature of the system. We are seeing raw throughput per
> partition well over 10 MB/s.
>
> 5) I haven't run into this. To prevent data loss/backup when we can't
> process a message, we have considered dropping it into an "unprocessed
> topic", but we haven't really run into this need. If you needed to
> reprocess all raw data it would be pretty straightforward; you could just
> add a partition to support the extra load.
>
> 6) Kafka is pretty good at ingesting things, so could you elaborate more
> on this?
>
> On Fri, Mar 27, 2015 at 9:52 AM, Felix GV <fville...@linkedin.com.invalid>
> wrote:
>
> > Hi Samza devs, users and enthusiasts,
> >
> > I've kept an eye on the Samza project for a while and I think it's super
> > cool! I hope it continues to mature and expand as it seems very
> > promising (:
> >
> > One thing I've been wondering for a while is: how do people serve the
> > data they computed on Samza? More specifically:
> >
> >   1.  How do you expose the output of Samza jobs to online applications
> > that need low-latency reads?
> >   2.  Are these online apps mostly internal (i.e.: analytics, dashboards,
> > etc.) or public/user-facing?
> >   3.  What systems do you currently use (or plan to use in the
> > short-term) to host the data generated in Samza? HBase? Cassandra?
> > MySQL? Druid? Others?
> >   4.  Are you satisfied or are you facing challenges in terms of the
> > write throughput supported by these storage/serving systems? What about
> > read throughput?
> >   5.  Are there situations where you wish to re-process all historical
> > data when making improvements to your Samza job, which results in the
> > need to re-ingest all of the Samza output into your online serving
> > system (as described in the Kappa Architecture:
> > http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
> > )? Is this easy breezy or painful? Do you need to throttle it lest your
> > serving system fall over?
> >   6.  If there was a highly-optimized and reliable way of ingesting
> > partitioned streams quickly into your online serving system, would that
> > help you leverage Samza more effectively?
> >
> > Your insights would be much appreciated!
> >
> >
> > Thanks (:
> >
> >
> > --
> > Felix
> >
>
>
>
> --
> Jordan Shaw
> Full Stack Software Engineer
> PubNub Inc
> 1045 17th St
> San Francisco, CA 94107
>
