Re: Official Kafka Disaster Recovery is insufficient - Suggestions needed

2018-09-03 Thread Ryanne Dolan
Sorry to have misspelled your name, Henning.

On Mon, Sep 3, 2018, 1:26 PM Ryanne Dolan  wrote:

> Hanning,
>
> In mission-critical (and indeed GDPR-related) applications, I've ETL'd
> Kafka to a secondary store e.g. HDFS, and built tooling around recovering
> state back into Kafka. I've had situations where data is accidentally or
> incorrectly ingested into Kafka, causing downstream systems to process bad
> data. This, in my experience, is astronomically more likely than the other
> DR scenarios you describe. But my approach is the same:
>
> - don't treat Kafka as a source-of-truth. It is hard to fix data in an
> append-only log, so we can't trust it to always be correct.
>
> - ETL Kafka topics to a read-only, append-only, indexable log e.g. in
> HDFS, and then build tooling to reingest data from HDFS back into Kafka.
> That way in the event of disaster, data can be recovered from cold storage.
> Don't rely on unlimited retention in Kafka.
>
> - build everything around log compaction, keys, and idempotency. People
> think compaction is just to save space, but it is also the only way to
> layer new records over existing records in an otherwise append-only
> environment. I've built pipelines that let me surgically remove or fix
> records at rest and then submit them back to Kafka. Since these records
> have the same key, processors will treat them as replacements to earlier
> records. Additionally, processors should honor timestamps and/or sequence
> IDs and not the actual order of records in a partition. That way old data
> can be ingested from HDFS -> Kafka idempotently.
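[Editorial note: the replacement semantics described above can be sketched in plain Python — no broker involved, and all names and record shapes here are illustrative, not an actual Kafka API. Processors keep the record with the newest timestamp per key, so arrival order in the partition does not matter.]

```python
# Last-write-wins materialization by key, honoring record timestamps
# rather than partition order. Names and record shapes are illustrative.

def materialize(records):
    """Fold (key, timestamp, value) records into latest-value-per-key."""
    state = {}
    for key, ts, value in records:
        current = state.get(key)
        # Keep whichever record carries the newer timestamp,
        # regardless of where it sits in the partition.
        if current is None or ts >= current[0]:
            state[key] = (ts, value)
    return {k: v for k, (ts, v) in state.items()}

# A correction for key "a" (same key, newer timestamp/sequence ID) is
# submitted months later; the final state is as if the bad record had
# never been produced, no matter the arrival order.
live = [("a", 1, "bad"), ("b", 2, "ok")]
late_fix = [("a", 5, "fixed")]
assert materialize(live + late_fix) == materialize(late_fix + live)
print(materialize(live + late_fix))  # {'a': 'fixed', 'b': 'ok'}
```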
>
> Imagine that one record out of millions is bad and you don't notice it for
> months. You can grab this record from HDFS, modify it, and then submit it
> back to Kafka. Even though it is in the stream months later than real-time,
> processors will treat it as a replacement for the old bad record and the
> entire system will end up exactly as if the record was never bad. If you
> can achieve these semantics consistently, DR is straightforward.
>
> - don't worry too much about offsets wrt consumer progress. Timestamps are
> usually more important. If the above is in place, it doesn't matter if you
> skip some records during failover. Just reprocess a few hours from cold
> storage and it's like the failure never happened.
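[Editorial note: the failover step above can be sketched as follows — the cold store is simulated in memory here; in practice it would be an HDFS (or similar) extract of the topic, and all names are illustrative. Rather than chasing exact offsets, reprocess everything at or after a timestamp safely before the failure, and rely on idempotent handling to absorb the overlap.]

```python
# Timestamp-driven replay from cold storage after a failover.

def replay_since(cold_store, cutoff_ts, process):
    """Re-submit every archived record at or after cutoff_ts."""
    for record in cold_store:
        if record["ts"] >= cutoff_ts:
            process(record)

cold_store = [
    {"ts": 100, "key": "a", "value": 1},
    {"ts": 200, "key": "b", "value": 2},
    {"ts": 300, "key": "a", "value": 3},
]

state = {}

def process(record):
    # Idempotent upsert: applying the same record twice is harmless,
    # so overlapping the replay window with already-seen data is safe.
    state[record["key"]] = record["value"]

replay_since(cold_store, 150, process)  # reprocess a window before the failure
replay_since(cold_store, 150, process)  # accidental double replay changes nothing
print(state)  # {'b': 2, 'a': 3}
```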
>
> Ryanne
>
> On Mon, Sep 3, 2018, 9:49 AM Henning Røigaard-Petersen 
> wrote:
>
>> I am looking for advice on how to handle disasters not covered by the
>> official methods of replication, whether intra-cluster replication (via
>> replication factor and producer acks) or multi-cluster replication (using
>> Confluent Replicator).
>>
>>
>>
>> We are looking into using Kafka not only as a broker for decoupling, but
>> also as an event store.
>>
>> For us, data is mission critical and has unlimited retention with the
>> option to compact (required by GDPR).
>>
>>
>>
>> We are especially interested in two types of disasters:
>>
>> 1.   Unintentional or malicious use of the Kafka API to create or
>> compact messages, as well as deletions of topics or logs.
>>
>> 2.   Loss of tail data from partitions if all ISR fail in the
>> primary cluster. A special case of this is the loss of the tail from the
>> commit topic, which results in loss of consumer progress.
>>
>>
>>
>> Ad (1)
>>
>> Replication does not guard against disaster (1), as the resulting
>> messages and deletions spread to the replicas.
>>
>> A naive solution is to simply secure the cluster, and ensure that there
>> are no errors in the code (…), but anyone with an ounce of experience with
>> software knows that stuff goes wrong.
>>
>>
>>
>> Ad (2)
>>
>> The potential of disaster (2) is much worse. In case of total data center
>> loss, the secondary cluster will lose the tail of every partition not fully
>> replicated. Besides the data loss itself, there are now inconsistencies
>> between related topics and partitions, which breaks the state of the system.
>>
>> Granted, the likelihood of total data center loss is not great, but there is a
>> reason we have multi-center setups.
>>
>> The special case with loss of consumer offsets results in double
>> processing of events, once we resume processing from an older offset. While
>> idempotency is a solution, it might not always be possible or desirable.
>>
>>
>>
>> Once any of these types of disaster has occurred, there is no means to
>> restore lost data, and even worse, we cannot restore the cluster to a point
>> in time where it is consistent. We could probably look our customers in the
>> eyes and tell them that they lost a day's worth of progress, but we cannot
>> inform them of a state of perpetual inconsistency.
>>
>>
>>
>> The only solution we can think of right now is to shut the primary
>> cluster down (ensuring that no new events are produced), and then copy all
>> files to some secure location, i.e. effectively creating a backup, allowing
>> restoration to a consistent point in time. Though the tail (backup-wise)
>> will be lost in case of disaster, we are ensured a consistent state to
>> restore to.
>>
>> As a useful side effect, such backup files can also be used to create
>> environments for testing or other destructive purposes.
>>
>> Does anyone have a better idea?

KAFKA-7093 - Warn Messages in Kafka 1.1.0

2018-09-03 Thread Debraj Manna
Hi

I am also observing a lot of the log messages discussed in KAFKA-7093
(https://issues.apache.org/jira/browse/KAFKA-7093). Does anyone have any
thoughts? What do these warnings denote, what do they affect, and how can
one recover from them?

Thanks,


After stopping and restarting Kafka broker, ISR doesn't show newly started broker

2018-09-03 Thread suresh sargar
Hi All,

I have a 3-node cluster running Kafka in a container environment, with RF=3.
When we stop any broker, we see the ISR updated to the remaining running
brokers. But when we restart a broker, the ZooKeeper log shows that the
broker is up and connected; however, the ISR does not reflect the recently
started broker.

I couldn't see any errors in the broker logs about any connection issue.
Please advise.

Thanks,
Suresh.


Official Kafka Disaster Recovery is insufficient - Suggestions needed

2018-09-03 Thread Henning Røigaard-Petersen
I am looking for advice on how to handle disasters not covered by the official 
methods of replication, whether intra-cluster replication (via replication 
factor and producer acks) or multi-cluster replication (using Confluent 
Replicator).

We are looking into using Kafka not only as a broker for decoupling, but also 
as an event store.
For us, data is mission critical and has unlimited retention with the option to 
compact (required by GDPR).

We are especially interested in two types of disasters:

1.   Unintentional or malicious use of the Kafka API to create or compact 
messages, as well as deletions of topics or logs.

2.   Loss of tail data from partitions if all ISR fail in the primary 
cluster. A special case of this is the loss of the tail from the commit topic, 
which results in loss of consumer progress.

Ad (1)
Replication does not guard against disaster (1), as the resulting messages and 
deletions spread to the replicas.
A naive solution is to simply secure the cluster, and ensure that there are no 
errors in the code (...), but anyone with an ounce of experience with software 
knows that stuff goes wrong.

Ad (2)
The potential of disaster (2) is much worse. In case of total data center loss, 
the secondary cluster will lose the tail of every partition not fully 
replicated. Besides the data loss itself, there are now inconsistencies between 
related topics and partitions, which breaks the state of the system.
Granted, the likelihood of total data center loss is not great, but there is a 
reason we have multi-center setups.
The special case with loss of consumer offsets results in double processing of 
events, once we resume processing from an older offset. While idempotency is a 
solution, it might not always be possible or desirable.
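[Editorial note: where the processing itself cannot be made idempotent, one common workaround — sketched here in plain Python with illustrative names, not an actual Kafka API — is to deduplicate on a per-key sequence ID kept alongside the consumer's own state, so rewinding to an older offset never applies the same event twice.]

```python
# Dedupe-by-sequence-ID wrapper for a non-idempotent handler.

def make_deduping_handler(apply):
    """Wrap a non-idempotent handler so replayed events are skipped."""
    seen = {}  # key -> highest sequence ID already applied

    def handle(key, seq, payload):
        if seq <= seen.get(key, -1):
            return False  # already applied; skip the replayed event
        apply(key, seq, payload)
        seen[key] = seq
        return True

    return handle

total = {"balance": 0}

def apply_event(key, seq, amount):
    total["balance"] += amount  # deliberately non-idempotent

handle = make_deduping_handler(apply_event)
events = [("acct", 0, 10), ("acct", 1, 5)]
for e in events:
    handle(*e)
# Committed offsets are lost; the consumer rewinds and re-reads everything:
for e in events:
    handle(*e)
print(total["balance"])  # 15, not 30
```

The sequence map must be stored transactionally with the derived state for this to survive a crash of the consumer itself.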

Once any of these types of disaster has occurred, there is no means to restore 
lost data, and even worse, we cannot restore the cluster to a point in time 
where it is consistent. We could probably look our customers in the eyes and 
tell them that they lost a day's worth of progress, but we cannot inform them of 
a state of perpetual inconsistency.

The only solution we can think of right now is to shut the primary cluster down 
(ensuring that no new events are produced), and then copy all files to some 
secure location, i.e. effectively creating a backup, allowing restoration to a 
consistent point in time. Though the tail (backup-wise) will be lost in case 
of disaster, we are ensured a consistent state to restore to.
As a useful side effect, such backup files can also be used to create 
environments for testing or other destructive purposes.

Does anyone have a better idea?


Venlig hilsen / Best regards

Henning Røigaard-Petersen
Principal Developer
MSc in Mathematics

h...@edlund.dk 
Direct phone

+45 36150272



Edlund A/S
La Cours Vej 7
DK-2000 Frederiksberg
Tel +45 36150630
www.edlund.dk


Re: resetting consumer group offset to earliest and to-latest not working

2018-09-03 Thread Joseph M'BIMBI-BENE
After checking at work, it seems to work fine.

The reset would fail because of other consumers joining the group due
to deployment config errors, and I would afterwards check that everything
was fine using the incorrect command, forgetting "--execute".

Thank you
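[Editorial note: for reference, a typical reset invocation looks like the following — the bootstrap address, group, and topic names are placeholders. Without --execute, the tool runs in dry-run mode and only prints the offsets it *would* set.]

```shell
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-group --topic my-topic \
  --reset-offsets --to-earliest --execute
```

Note also that the reset is rejected while the group still has active members, which matches the failure described above.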

On Sat, 1 Sep 2018 at 15:15, Joseph M'BIMBI-BENE 
wrote:

> Oh thank you for pointing that out, in the screenshot i sent i indeed
> forgot that parameter.
> It was an attempt to reproduce an error encountered on another system
>
> On that system i can see that the source code includes the --execute
> command.
> I will send you logs, screenshots etc. using the original system when i'm
> back at work.
>
> Thank you
>
> On Sat, 1 Sep 2018 at 15:02, Patrik Kleindl  wrote:
>
>> Hello
>> Did you add --execute to the command?
>> Which command did you use?
>> Best regards
>> Patrik
>>
>> > Am 01.09.2018 um 14:54 schrieb Joseph M'BIMBI-BENE <
>> joseph.mbi...@gmail.com>:
>> >
>> > Hello everyone,
>> >
>> > Hopefully this is the appropriate mailing list for my message.
>> > When I try to reset the offset of some consumer group, I get some
>> output telling me that the offset has indeed been reset to earliest or
>> latest, but checking right afterwards, the offset is still at its previous
>> position; restarting the consumers in the group, they continue
>> to consume messages even after issuing the command to reset the offsets of
>> the partitions to the latest offset.
>> >
>> > I am using Kafka 1.1.1 for Scala 2.11, and I will put a screenshot of
>> the terminal if that could help you help me.
>> >
>> > Thank you in advance. Best regards
>>
>


Best way to manage topics using Java API

2018-09-03 Thread Robin Perice

Hi,

I'm currently using kafka_2.11 1.0.0 and I want to list, create, and
delete topics directly from my Java code.


When searching on Stack Overflow, all examples use AdminUtils
(kafka.admin.AdminUtils), which is not really documented.


I found in the documentation that I should use the AdminClient API
(http://kafka.apache.org/10/documentation.html#adminapi), which is well
documented. But its interface stability is marked as 'Evolving' (and I
don't think we are going to update the Kafka version on my project soon).


So I'm asking: what is the best way to manage topics from Java?


Regards,

Robin

--
Robin PERICE

SOFTWARE DEVELOPMENT ENGINEER
Tel.: +33 (0) 5 62 88 80 39



Could not find or load main class kafka.Kafka (Eclipse Startup)

2018-09-03 Thread M. Manna
Hello,

I completed the setup according to the cwiki instructions:
https://cwiki.apache.org/confluence/display/KAFKA/Developer+Setup

I have also built all the sources and jars using `gradlew jarAll`

After that, I created a Scala application configuration in Eclipse for
debugging. I provided the log4j config file and server.properties file
locations as arguments. I also included the projects
"Clients", "Tools", and "Core" in the classpath. But the broker couldn't be
started in debug mode. I always get the following error:

Error: Could not find or load main class kafka.Kafka

Could someone guide me in fixing this issue? I was able to run this in the
past, but for some reason it has stopped working and I am struggling to get
this back up. I hope I've followed the correct process above.

Regards,


Re: Determining broker max buffer size from producer.

2018-09-03 Thread Satish Duggana
Are you asking about the maximum message size that can be sent from a
producer?
max.message.bytes (topic level; the broker-wide equivalent is
message.max.bytes) is used to control that. You can find more details
at the below links.

"The largest record batch size allowed by Kafka. If this is increased and
there are consumers older than 0.10.2, the consumers' fetch size must also
be increased so that they can fetch record batches this large. In the
latest message format version, records are always grouped into batches for
efficiency. In previous message format versions, uncompressed records are
not grouped into batches and this limit only applies to a single record in
that case."


https://kafka.apache.org/documentation/#topicconfigs
https://kafka.apache.org/documentation/#brokerconfigs
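[Editorial note: the related size limits live in several places; the following config fragment is only a sketch, with illustrative values near the defaults, and each setting labeled by where it belongs.]

```properties
# Broker-wide cap on record batch size (server.properties)
message.max.bytes=1000012

# Per-topic override of the broker-wide cap (topic-level config)
max.message.bytes=1000012

# Producer side: keep the maximum request size within the broker's cap
max.request.size=1048576

# Consumers older than 0.10.2 (and, similarly, replica.fetch.max.bytes for
# follower replication) must be able to fetch at least one maximal batch
fetch.message.max.bytes=1048576
```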

Thanks,
Satish.

On Mon, Sep 3, 2018 at 2:12 PM, claude.war...@wipro.com <
claude.war...@wipro.com> wrote:

> Greetings,
>
>
> I have spent several hours looking through documentation and historical
> email chains and have been unable to find a solution to my problem.
>
>
> I have a case where I need to construct a producer and I need to know the
> broker max buffer size so that I don't get any nasty buffer too big errors
> from the server when serializing large objects.  I have been unable to find
> any way to do this short of reading the broker configuration and I can't do
> that from a remote call (as far as I can tell).  Does anybody have a
> solution?
>
>
> Claude
>
> The information contained in this electronic message and any attachments
> to this message are intended for the exclusive use of the addressee(s) and
> may contain proprietary, confidential or privileged information. If you are
> not the intended recipient, you should not disseminate, distribute or copy
> this e-mail. Please notify the sender immediately and destroy all copies of
> this message and any attachments. WARNING: Computer viruses can be
> transmitted via email. The recipient should check this email and any
> attachments for the presence of viruses. The company accepts no liability
> for any damage caused by any virus transmitted by this email.
> www.wipro.com
>


Re: After stopping and restarting Kafka broker, ISR doesn't show newly started broker

2018-09-03 Thread M. Manna
Hi Suresh,

Can you please share your startup logs, and broker/zookeeper config with
us? Also, which kafka version are you using?

Regards,

On Mon, 3 Sep 2018 at 10:18, suresh.sar...@gmail.com <
suresh.sar...@gmail.com> wrote:

> Hi All,
>
> I have a 3-node cluster running Kafka in a container environment, with RF=3.
> When we stop any broker, we see the ISR updated to the remaining running
> brokers. But when we restart a broker, the ZooKeeper log shows that the
> broker is up and connected; however, the ISR does not reflect the recently
> started broker.
>
> I couldn't see any errors in the broker logs about any connection issue.
> Please advise.
>
> Thanks,
> Suresh.
>


Determining broker max buffer size from producer.

2018-09-03 Thread claude.war...@wipro.com
Greetings,


I have spent several hours looking through documentation and historical email 
chains and have been unable to find a solution to my problem.


I have a case where I need to construct a producer and I need to know the 
broker max buffer size so that I don't get any nasty buffer too big errors from 
the server when serializing large objects.  I have been unable to find any way 
to do this short of reading the broker configuration and I can't do that from a 
remote call (as far as I can tell).  Does anybody have a solution?


Claude
