Re: Spark streaming persist to hdfs question
We are also doing transformations; that's the reason we are using Spark Streaming. Does Spark Streaming support tumbling windows? I was thinking I could use a window operation to write into HDFS.

Thanks

On Sun, Jun 25, 2017 at 10:23 PM, ayan guha wrote:
> I would suggest using Flume, if possible, as it has built-in HDFS log
> rolling capabilities.
>
> On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire wrote:
>> Hi,
>>
>> I am using Spark Streaming with a 1-minute batch duration to read data
>> from a Kafka topic, apply transformations, and persist into HDFS.
>>
>> The application is creating a new directory every 1 minute with many
>> partition files (= number of partitions). What parameter do I need to
>> change/configure so it persists to and creates an HDFS directory, say,
>> *every 30 minutes* instead of every batch duration of the Spark
>> Streaming application?
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Naveen
>
> --
> Best Regards,
> Ayan Guha
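[Editor's note: yes, Spark Streaming supports tumbling windows — a tumbling window is simply a window whose slide interval equals its length, e.g. `dstream.window(Minutes(30), Minutes(30))` on a DStream. The effect on 1-minute micro-batches can be sketched without Spark in plain Python; the epoch timestamps below are illustrative assumptions, not values from the thread:]

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # tumbling window length: 30 minutes


def window_start(epoch_seconds: int) -> int:
    """Align a timestamp to the start of its 30-minute tumbling window."""
    return epoch_seconds - (epoch_seconds % WINDOW_SECONDS)


# Simulated 1-minute micro-batches (epoch seconds), as Spark Streaming
# would produce them with a 60-second batch duration: 90 minutes of data.
batches = [1498450000 + 60 * i for i in range(90)]

# Bucket the per-minute batches into tumbling windows. Each bucket would
# correspond to one write (one HDFS directory) instead of one per batch.
buckets = defaultdict(list)
for t in batches:
    buckets[window_start(t)].append(t)

for start in sorted(buckets):
    print(start, len(buckets[start]))
```

With `window(Minutes(30), Minutes(30))`, each window fires once and contains all the batches that fell inside it, so an output operation such as `saveAsTextFiles` on the windowed stream writes once per half hour rather than once per minute.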
Re: Spark streaming persist to hdfs question
I would suggest using Flume, if possible, as it has built-in HDFS log rolling capabilities.

On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire wrote:
> Hi,
>
> I am using Spark Streaming with a 1-minute batch duration to read data
> from a Kafka topic, apply transformations, and persist into HDFS.
>
> The application is creating a new directory every 1 minute with many
> partition files (= number of partitions). What parameter do I need to
> change/configure so it persists to and creates an HDFS directory, say,
> *every 30 minutes* instead of every batch duration of the Spark
> Streaming application?
>
> Any help would be appreciated.
>
> Thanks,
> Naveen

--
Best Regards,
Ayan Guha
Spark streaming persist to hdfs question
Hi,

I am using Spark Streaming with a 1-minute batch duration to read data from a Kafka topic, apply transformations, and persist into HDFS.

The application is creating a new directory every 1 minute with many partition files (= number of partitions). What parameter do I need to change/configure so it persists to and creates an HDFS directory, say, *every 30 minutes* instead of every batch duration of the Spark Streaming application?

Any help would be appreciated.

Thanks,
Naveen
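[Editor's note: there is no single Spark configuration parameter for this. One common workaround, besides windowing, is to keep the 1-minute batches but derive each batch's output path from the 30-minute window it falls in, so output accumulates under one directory per half hour. A minimal sketch of that path logic follows; the `hdfs:///data/events` base path and `yyyyMMdd/HHmm` layout are illustrative assumptions:]

```python
import time


def output_dir(base: str, epoch_seconds: int, window_minutes: int = 30) -> str:
    """Build an HDFS-style output path aligned to a 30-minute window.

    Batches whose timestamps fall in the same window get the same path,
    so successive batch writes land in one directory per half hour.
    """
    window = window_minutes * 60
    aligned = epoch_seconds - (epoch_seconds % window)  # window start
    stamp = time.strftime("%Y%m%d/%H%M", time.gmtime(aligned))
    return f"{base}/{stamp}"


# Two batches 10 minutes apart resolve to the same 30-minute directory.
a = output_dir("hdfs:///data/events", 1498450000)
b = output_dir("hdfs:///data/events", 1498450000 + 600)
print(a, b, a == b)
```

In a Spark Streaming job this path would be computed inside `foreachRDD` from the batch time, with each RDD appended (or written as a subpath) under the aligned directory. Note that many small files per directory may still warrant a periodic compaction step.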