Re: Spark Streaming - Problem managing Kafka offsets and starting from the beginning.

2019-02-27 Thread Gabor Somogyi
I mixed it up with the Spark version. It seems the issue is different,
based on Guillermo's last mail.


Re: Spark Streaming - Problem managing Kafka offsets and starting from the beginning.

2019-02-27 Thread Gabor Somogyi
Where exactly? In the Kafka broker configuration section here it's 10080:
https://kafka.apache.org/documentation/

offsets.retention.minutes: After a consumer group loses all its consumers
(i.e. becomes empty), its offsets will be kept for this retention period
before getting discarded. For standalone consumers (using manual
assignment), offsets will be expired after the time of last commit plus
this retention period. (Type: int; Default: 10080; Valid values: [1,...];
Importance: high; Update mode: read-only)
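
For reference, a minimal sketch of this broker-side setting as it would
appear in server.properties (the value shown is just the documented default
written out explicitly):

# server.properties (Kafka broker configuration)
# How long committed offsets are kept after a consumer group becomes
# empty. The default was raised from 1440 minutes (24 hours) to
# 10080 minutes (7 days) in Kafka 2.0 (KIP-186).
offsets.retention.minutes=10080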


Re: Spark Streaming - Problem managing Kafka offsets and starting from the beginning.

2019-02-27 Thread Akshay Bhardwaj
Hi Gabor,

I guess you are looking at Kafka 2.1, but Guillermo mentioned initially
that they are working with Kafka 1.0.

Akshay Bhardwaj
+91-97111-33849


Re: Spark Streaming - Problem managing Kafka offsets and starting from the beginning.

2019-02-27 Thread Guillermo Ortiz
I'm going to check the value, but I didn't change it. Normally the process
is always running, but sometimes I have to restart it to apply some changes.
Sometimes it starts from the beginning and other times it continues from the
last offset.


Re: Spark Streaming - Problem managing Kafka offsets and starting from the beginning.

2019-02-27 Thread Akshay Bhardwaj
Hi Gabor,

I am talking about offsets.retention.minutes, which has a default of 1440
(i.e. 24 hours).

Akshay Bhardwaj
+91-97111-33849


Re: Spark Streaming - Problem managing Kafka offsets and starting from the beginning.

2019-02-27 Thread Gabor Somogyi
Hi Akshay,

The feature you've mentioned has a default value of 7 days...

BR,
G


Re: Spark Streaming - Problem managing Kafka offsets and starting from the beginning.

2019-02-26 Thread Akshay Bhardwaj
Hi Guillermo,

What was the interval between restarts of the Spark job? As a feature in
Kafka, the broker deletes the offsets of a consumer group after 24 hours of
inactivity.
In such a case, the newly started Spark streaming job will read offsets from
the beginning for the same groupId.
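
Where the restarted job begins in that case is governed by the consumer's
auto.offset.reset setting. A minimal sketch of the parameters typically
passed to the direct stream (the broker address and group id below are
placeholders, not values from this thread):

import org.apache.kafka.common.serialization.StringDeserializer

// Sketch: consumer parameters for the Kafka 0.10 direct stream.
// If the broker has already expired this group's committed offsets,
// the consumer falls back to "auto.offset.reset": "earliest" replays
// the topic from the beginning, "latest" starts at the tail.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",              // placeholder
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-streaming-group",                 // placeholder
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)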

Akshay Bhardwaj
+91-97111-33849


Re: Spark Streaming - Problem managing Kafka offsets and starting from the beginning.

2019-02-21 Thread Gabor Somogyi
From the info you've provided there is not much to say.
Maybe you could collect a sample app, logs, etc., open a JIRA, and we can
take a deeper look at it...

BR,
G


On Thu, Feb 21, 2019 at 4:14 PM Guillermo Ortiz wrote:

> I'm working with Spark Streaming 2.0.2 and Kafka 1.0.0 using Direct Stream
> as the connector. I consume data from Kafka and save the offsets
> automatically.
> I can see in the logs that Spark commits the last offsets processed.
> Sometimes when I restart Spark it starts from the beginning, even though
> I'm using the same groupId.
>
> Why could this happen? It only happens rarely.
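
For reference, a minimal sketch of the offset-commit pattern being described,
based on the spark-streaming-kafka-0-10 API ("ssc", "kafkaParams" and the
topic name are placeholders, not the original application's values):

import org.apache.spark.streaming.kafka010._

// Sketch: a direct stream that commits offsets back to Kafka after
// each batch. Assumes an existing StreamingContext "ssc" and a
// kafkaParams map like the one sketched earlier in the thread.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
)

stream.foreachRDD { rdd =>
  // Capture this batch's offset ranges before any transformation
  // that loses the Kafka partition information.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... process the batch here ...

  // Commit the offsets back to Kafka asynchronously. These commits
  // are what the broker discards after offsets.retention.minutes of
  // group inactivity, which is why a restarted job can fall back to
  // auto.offset.reset and re-read from the beginning.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}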