You can also look at Amazon Kinesis if you don't want the pain of
maintaining Kafka/Flume infrastructure.

Thanks
Best Regards

On Thu, Nov 20, 2014 at 3:32 AM, Guillermo Ortiz <konstt2...@gmail.com>
wrote:

> Thank you for your answer. I don't know if I phrased the question
> correctly, but your answer helps me.
>
> Let me ask the question again to check whether you understood me.
>
> I have this topology:
>
> DataSource1, ..., DataSourceN --> Kafka --> SparkStreaming --> HDFS
>                                   Kafka --> HDFS (raw data)
>
> DataSource1, ..., DataSourceN --> Flume --> SparkStreaming --> HDFS
>                                   Flume --> HDFS (raw data)
>
>
> All data will be processed and written to HDFS as both raw and
> processed data. I don't know if it makes sense to use Kafka in this
> case if the data are just going to HDFS. I guess that, given this, the
> Flume Spark sink makes more sense for feeding Spark Streaming with a
> real-time flow of data. It doesn't seem to make much sense to have
> Spark Streaming read the data from HDFS.
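
The layout described above, where every record reaches HDFS twice (once
raw, once processed), can be modeled in a few lines. This is a plain-Python
sketch, not Spark, Kafka, or Flume code: `run_topology`, the in-memory lists
standing in for HDFS, and the sample records are all assumptions made for
illustration.

```python
def run_topology(records, process):
    """Send each record to a raw sink and, after processing, to a processed sink."""
    raw_sink = []        # stands in for "broker --> HDFS (raw data)"
    processed_sink = []  # stands in for "broker --> SparkStreaming --> HDFS"
    for record in records:
        raw_sink.append(record)
        processed_sink.append(process(record))
    return raw_sink, processed_sink


# Hypothetical input: comma-separated events; processing keeps the upper-cased key.
events = ["a,1", "b,2"]
raw, processed = run_topology(events, lambda r: r.split(",")[0].upper())
```

Whichever ingest layer is chosen (Kafka or Flume), it has to serve both
paths, which is the property the question is really comparing.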
>
> 2014-11-19 22:55 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
> >
> > 2014-11-19 21:50 GMT+01:00 Hari Shreedharan <hshreedha...@cloudera.com>:
> >> Btw, if you want to write to Spark Streaming from Flume -- there is a
> >> sink (it is a part of Spark, not Flume). See Approach 2 here:
> >> http://spark.apache.org/docs/latest/streaming-flume-integration.html
> >>
> >>
> >>
> >> On Wed, Nov 19, 2014 at 12:41 PM, Hari Shreedharan
> >> <hshreedha...@cloudera.com> wrote:
> >>>
> >>> As of now, you can feed Spark Streaming from both Kafka and Flume.
> >>> Currently, though, there is no API to write data back to either of
> >>> the two directly.
> >>>
> >>> I sent a PR which should eventually add something like this:
> >>>
> >>> https://github.com/harishreedharan/spark/blob/Kafka-output/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala
> >>> that would allow Spark Streaming to write back to Kafka. This will
> >>> likely be reviewed and committed after 1.2.
> >>>
> >>> I would consider writing something similar to push data to Flume as
> >>> well, if there is a sufficient use case for it. I have seen people
> >>> talk about writing back to Kafka quite a bit, hence the above patch.
> >>>
> >>> Which one is better is up to your use case, existing infrastructure,
> >>> and preference. Both would work as is, but you would usually write
> >>> back to Flume if you want to write to HDFS/HBase/Solr etc., which you
> >>> could also do directly from Spark Streaming. There are benefits to
> >>> writing back through Flume, such as the additional buffering Flume
> >>> gives, but it is still possible to do so from Spark Streaming itself.
> >>>
> >>> But for Kafka, the usual use case is a variety of custom applications
> >>> reading the same data, for which it makes a whole lot of sense to
> >>> write back to Kafka. An example is to sanitize incoming data in Spark
> >>> Streaming (from Flume or Kafka or something else) and make it
> >>> available for a variety of apps via Kafka.
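
The "sanitize once, let many apps read" pattern described above can be
sketched with a toy in-memory log. This is not Kafka's API or Spark code:
`SharedLog`, `sanitize`, the consumer names, and the sample lines are
hypothetical, and the only idea being shown is that each reader keeps its
own offset into the same data.

```python
from collections import defaultdict


class SharedLog:
    """Toy append-only log with one read offset per consumer (not Kafka's API)."""

    def __init__(self):
        self._records = []
        self._offsets = defaultdict(int)  # consumer name -> next index to read

    def append(self, record):
        self._records.append(record)

    def poll(self, consumer):
        """Return every record this consumer has not seen yet."""
        start = self._offsets[consumer]
        self._offsets[consumer] = len(self._records)
        return self._records[start:]


def sanitize(line):
    """Hypothetical cleanup step standing in for the Spark Streaming job."""
    cleaned = line.strip().lower()
    return cleaned or None  # blank lines are dropped


raw_input = ["  Login OK ", "", "LOGIN failed", "  "]
clean_topic = SharedLog()
for line in raw_input:
    cleaned = sanitize(line)
    if cleaned is not None:
        clean_topic.append(cleaned)

# Two independent apps read the same sanitized data without affecting each other.
alerts_view = clean_topic.poll("alerting-app")
archive_view = clean_topic.poll("archive-app")
```

Reading does not remove anything from the log, so adding another consumer
later needs no change to the sanitizing step.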
> >>>
> >>> Hope this helps!
> >>>
> >>> Hari
> >>>
> >>>
> >>> On Wed, Nov 19, 2014 at 8:10 AM, Guillermo Ortiz <konstt2...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm starting with Spark and I'm just trying to understand: if I want
> >>>> to use Spark Streaming, should I feed it from Flume or Kafka? I think
> >>>> there's no official sink from Flume to Spark Streaming, and it seems
> >>>> that Kafka fits better since it gives you readability.
> >>>>
> >>>> Could someone give a good scenario for each alternative? When would it
> >>>> make sense to use Kafka and when Flume for Spark Streaming?
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >>>> For additional commands, e-mail: user-h...@spark.apache.org
> >>>>
> >>>
> >>
>
