Re: Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire
We are also doing transformations; that's the reason we are using Spark
Streaming. Does Spark Streaming support tumbling windows? I was thinking I
could use a window operation to write into HDFS.
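A tumbling window should work for this. A minimal sketch, assuming the Spark 2.x DStream API (the Kafka source is stubbed out with a socket stream; paths and names are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object TumblingWindowToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TumblingWindowToHdfs")
    // Batch interval: 1 minute, as in the original job.
    val ssc = new StreamingContext(conf, Minutes(1))

    // Stand-in for the Kafka DStream created with
    // KafkaUtils.createDirectStream(...); details omitted.
    val lines = ssc.socketTextStream("localhost", 9999)

    // A tumbling window is window() with windowLength == slideInterval.
    // Both must be integer multiples of the 1-minute batch interval.
    val windowed = lines.window(Minutes(30), Minutes(30))

    // One output directory per 30-minute window instead of per batch.
    windowed.saveAsTextFiles("hdfs:///user/naveen/output/part")

    ssc.start()
    ssc.awaitTermination()
  }
}
```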

Thanks

On Sun, Jun 25, 2017 at 10:23 PM, ayan guha  wrote:

> I would suggest using Flume, if possible, as it has built-in HDFS log
> rolling capabilities
>
> On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire 
> wrote:
>
>> Hi,
>>
>> I am using Spark Streaming with a 1-minute batch duration to read data
>> from a Kafka topic, apply transformations, and persist into HDFS.
>>
>> The application is creating a new directory every minute with many
>> partition files (one per partition). What parameter do I need to
>> change/configure so that it persists to a new HDFS directory, say, *every
>> 30 minutes* instead of every batch interval of the Spark Streaming
>> application?
>>
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Naveen
>>
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>


Re: Spark streaming persist to hdfs question

2017-06-25 Thread ayan guha
I would suggest using Flume, if possible, as it has built-in HDFS log
rolling capabilities
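For reference, the rolling is controlled by the HDFS sink's roll properties. A minimal config sketch (the agent name `a1` and the path are placeholders; the source and channel are omitted):

```properties
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs:///data/events/%Y-%m-%d
# Roll a new file every 30 minutes (1800 s); disable size- and
# count-based rolling so only the time trigger applies.
a1.sinks.k1.hdfs.rollInterval = 1800
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```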

On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire 
wrote:

> Hi,
>
> I am using Spark Streaming with a 1-minute batch duration to read data
> from a Kafka topic, apply transformations, and persist into HDFS.
>
> The application is creating a new directory every minute with many
> partition files (one per partition). What parameter do I need to
> change/configure so that it persists to a new HDFS directory, say, *every
> 30 minutes* instead of every batch interval of the Spark Streaming
> application?
>
>
> Any help would be appreciated.
>
> Thanks,
> Naveen
>
>
>


-- 
Best Regards,
Ayan Guha


Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire
Hi,

I am using Spark Streaming with a 1-minute batch duration to read data
from a Kafka topic, apply transformations, and persist into HDFS.

The application is creating a new directory every minute with many
partition files (one per partition). What parameter do I need to
change/configure so that it persists to a new HDFS directory, say, *every
30 minutes* instead of every batch interval of the Spark Streaming
application?


Any help would be appreciated.

Thanks,
Naveen
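For context, the per-minute directories come from the output action itself: saveAsTextFiles writes one directory per batch, named prefix-<batchTimeMs>[.suffix], so a 1-minute batch interval produces a new directory each minute. A minimal sketch of the job as described, assuming the Spark 2.x DStream API (the Kafka source is stubbed out; paths are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object PerBatchOutput {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("PerBatchOutput"), Minutes(1))

    // Stand-in for the Kafka DStream
    // (KafkaUtils.createDirectStream(...)); details omitted.
    val records = ssc.socketTextStream("localhost", 9999)

    // Each 1-minute batch writes hdfs:///user/naveen/out-<timestampMs>,
    // which is why a new directory appears every minute.
    records.saveAsTextFiles("hdfs:///user/naveen/out")

    ssc.start()
    ssc.awaitTermination()
  }
}
```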