Unable to create new topics, "no brokers available"

2016-12-28 Thread Alex Eftimie
Hello,

We recently migrated from Kafka 0.8 to 0.10, while keeping the log format and 
internal communication at 0.9 version[1]. We have a cluster of two nodes which 
is working correctly for 10 topics (producers/consumers work fine). 

Trying to create a new topic raises: 
kafka-topics.sh --zookeeper zknode:2181 --create --replication-factor 1 
--partitions 2 --topic newtopic
Error while executing topic command : replication factor: 1 larger than 
available brokers: 0
[2016-12-28 09:30:02,177] ERROR 
org.apache.kafka.common.errors.InvalidReplicationFactorException: replication 
factor: 1 larger than available brokers: 0
 (kafka.admin.TopicCommand$)

There are two available brokers, so this must be a different error hidden 
behind this error message. There’s nothing in the log file. 

Where should we start investigating?

Thanks,

Alex Eftimie
Software Engineer DevOps

[1]
kafka version: 0.10.1.0
# config:
inter.broker.protocol.version=0.9.0.1
log.message.format.version=0.9.0.1



Re: How to get and reset Kafka stream application's offsets

2016-12-28 Thread Matthias J. Sax
Hi Sachin,

What do you mean by "with this commented"? Did you set auto.offset.reset
to "earliest" or not? The default value is "latest", and if you do not set
it to "earliest", the application will start consuming from the end of the
topic if no committed offsets are found.
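
For example, a minimal Streams setup would look roughly like this
(application id, broker address, and topic names below are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder
// only takes effect when no committed offsets exist for the application's group
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

KStreamBuilder builder = new KStreamBuilder();
builder.stream("input-topic").to("output-topic");                   // placeholder topology

KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();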

For default values of Kafka Streams parameters see
http://docs.confluent.io/3.0.1/streams/developer-guide.html#configuring-a-kafka-streams-application

For default consumer parameters see
http://kafka.apache.org/0100/documentation.html#newconsumerconfigs


-Matthias


On 12/27/16 3:27 PM, Sachin Mittal wrote:
> Hi,
> I started my new streams app with this commented
> //props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
> 
> What I observed was that it started consuming message from latest offset
> from the source topic.
> Based on comments by Eno I thought that if offset do not exist then
> 
> So in practice an app will include the property below (e.g., set to
> "earliest")
> 
> However it picked the latest offset. Also I check this props
> auto.offset.reset in the latest doc and I see it as largest.
> 
> So now I am confused that in streams application what is the default offset
> if none exist?
> 
> note I am using version kafka_2.10-0.10.0.1.
> 
> Thanks
> Sachin
> 
> 
> 
> On Mon, Nov 21, 2016 at 7:08 PM, Eno Thereska 
> wrote:
> 
>> Hi Sachin,
>>
>> There is no need to check within the app whether the offset exists or not,
>> since the consumer code will do that check automatically for you. So in
>> practice an app will include the property below (e.g., set to "earliest"),
>> but that will only have an effect if the consumer detects that the offsets
>> do not exist anymore. If the offset exist, then that line is a noop.
>>
>> So in summary, I'd just include that property, and no more code changes
>> are required.
>>
>> Thanks
>> Eno
>>
>>> On 21 Nov 2016, at 12:11, Sachin Mittal  wrote:
>>>
>>> So in my java code how can I check
>>> when there is no initial offset in Kafka or if the current offset does
>> not
>>> exist any more on the server (e.g. because that data has been deleted)
>>>
>>> So in this case as you have said I can set offset as
>>> props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); //or
>> latest
>>>
>>> Thanks
>>> Sachin
>>>
>>>
>>> On Mon, Nov 21, 2016 at 4:16 PM, Eno Thereska > >
>>> wrote:
>>>
 Hi Sachin,

 Currently you can only change the following global configuration by
>> using
 "earliest" or "latest", as shown in the code snippet below. As the
>> Javadoc
 mentions: "What to do when there is no initial offset in Kafka or if the
 current offset does not exist any more on the server (e.g. because that
 data has been deleted)":

 ...
 props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
 ...
 return new KafkaStreams(builder, props)



 In addition, there is a tool to reset the offsets of all topics to the
 beginning. This is useful for reprocessing:
 https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/

 However, there is no option currently for resetting the offset to an
 arbitrary offset.

 Thanks
 Eno

> On 21 Nov 2016, at 10:37, Sachin Mittal  wrote:
>
> Hi
> I am running a streaming application with
> streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
>
> How do I find out the offsets for each of the source, intermediate and
> internal topics associated with this application.
>
> And how can I reset them to some specific value via shell of otherwise.
>
> Thanks
> Sachin
>>
>>
> 





Re: How to get and reset Kafka stream application's offsets

2016-12-28 Thread Sachin Mittal
Understood. So if we want it to start consuming from the earliest offset we
should add
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

So when we start the app for the first time it will start from the earliest
offset. Later, when we stop this app and restart it, it will resume from the
last committed offset.
(No need to comment out this line and restart.)

Hence the earliest (or latest) offset is used only when no offsets are found
for that consumer group.
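
As a side note, I suppose we can inspect the committed offsets of the
application's consumer group (the group id equals the application id, "test"
in my case) with the consumer groups tool, and reset everything for
reprocessing with the streams reset tool. The broker address is a placeholder
and the exact flags may differ per version, so please check --help:

bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server broker:9092 \
  --describe --group test

bin/kafka-streams-application-reset.sh --application-id test \
  --input-topics my-input-topic --bootstrap-servers broker:9092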

Thanks
Sachin


On Wed, Dec 28, 2016 at 3:02 PM, Matthias J. Sax 
wrote:

> Hi Sachin,
>
> What do you mean by "with this commented"? Did you set auto.offset.reset
> to "earliest" or not? The default value is "latest", and if you do not set
> it to "earliest", the application will start consuming from the end of the
> topic if no committed offsets are found.
>
> For default values of Kafka Streams parameters see
> http://docs.confluent.io/3.0.1/streams/developer-guide.html#configuring-a-kafka-streams-application
>
> For default consumer parameters see
> http://kafka.apache.org/0100/documentation.html#newconsumerconfigs
>
>
> -Matthias
>
>
> On 12/27/16 3:27 PM, Sachin Mittal wrote:
> > Hi,
> > I started my new streams app with this commented
> > //props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
> >
> > What I observed was that it started consuming message from latest offset
> > from the source topic.
> > Based on comments by Eno I thought that if offset do not exist then
> >
> > So in practice an app will include the property below (e.g., set to
> > "earliest")
> >
> > However it picked the latest offset. Also I check this props
> > auto.offset.reset in the latest doc and I see it as largest.
> >
> > So now I am confused that in streams application what is the default
> offset
> > if none exist?
> >
> > note I am using version kafka_2.10-0.10.0.1.
> >
> > Thanks
> > Sachin
> >
> >
> >
> > On Mon, Nov 21, 2016 at 7:08 PM, Eno Thereska 
> > wrote:
> >
> >> Hi Sachin,
> >>
> >> There is no need to check within the app whether the offset exists or
> not,
> >> since the consumer code will do that check automatically for you. So in
> >> practice an app will include the property below (e.g., set to
> "earliest"),
> >> but that will only have an effect if the consumer detects that the
> offsets
> >> do not exist anymore. If the offset exist, then that line is a noop.
> >>
> >> So in summary, I'd just include that property, and no more code changes
> >> are required.
> >>
> >> Thanks
> >> Eno
> >>
> >>> On 21 Nov 2016, at 12:11, Sachin Mittal  wrote:
> >>>
> >>> So in my java code how can I check
> >>> when there is no initial offset in Kafka or if the current offset does
> >> not
> >>> exist any more on the server (e.g. because that data has been deleted)
> >>>
> >>> So in this case as you have said I can set offset as
> >>> props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); //or
> >> latest
> >>>
> >>> Thanks
> >>> Sachin
> >>>
> >>>
> >>> On Mon, Nov 21, 2016 at 4:16 PM, Eno Thereska  >> >
> >>> wrote:
> >>>
>  Hi Sachin,
> 
>  Currently you can only change the following global configuration by
> >> using
>  "earliest" or "latest", as shown in the code snippet below. As the
> >> Javadoc
>  mentions: "What to do when there is no initial offset in Kafka or if
> the
>  current offset does not exist any more on the server (e.g. because
> that
>  data has been deleted)":
> 
>  ...
>  props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
>  ...
>  return new KafkaStreams(builder, props)
> 
> 
> 
>  In addition, there is a tool to reset the offsets of all topics to the
>  beginning. This is useful for reprocessing:
>  https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/
> 
>  However, there is no option currently for resetting the offset to an
>  arbitrary offset.
> 
>  Thanks
>  Eno
> 
> > On 21 Nov 2016, at 10:37, Sachin Mittal  wrote:
> >
> > Hi
> > I am running a streaming application with
> > streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
> >
> > How do I find out the offsets for each of the source, intermediate
> and
> > internal topics associated with this application.
> >
> > And how can I reset them to some specific value via shell of
> otherwise.
> >
> > Thanks
> > Sachin
> >>
> >>
> >
>
>


Re: Unable to create new topics, "no brokers available"

2016-12-28 Thread Alex Eftimie
Quick update: digging into the ZooKeeper data, we observed that the /brokers/ids
path was empty. Restarting the Kafka nodes repopulated the ZooKeeper data, and now
the error is gone (we are able to create new topics).

We didn’t alter data in ZooKeeper manually, but recently we added 2 nodes to a
3-node ZooKeeper cluster. Also, we had a network partitioning issue in the
past. Could any of these be the reason for the lost ZK data?

Any other hints on what may have generated the data loss in ZK?

Thanks,
Alex

> On 28 Dec 2016, at 10:00, Alex Eftimie  wrote:
> 
> Hello,
> 
> We recently migrated from Kafka 0.8 to 0.10, while keeping the log format and 
> internal communication at 0.9 version[1]. We have a cluster of two nodes 
> which is working correctly for 10 topics (producers/consumers work fine). 
> 
> Trying to create a new topic raises: 
> kafka-topics.sh --zookeeper zknode:2181 --create --replication-factor 1 
> --partitions 2 --topic newtopic
> Error while executing topic command : replication factor: 1 larger than 
> available brokers: 0
> [2016-12-28 09:30:02,177] ERROR 
> org.apache.kafka.common.errors.InvalidReplicationFactorException: replication 
> factor: 1 larger than available brokers: 0
>  (kafka.admin.TopicCommand$)
> 
> There are two available brokers, so this must be a different error hidden 
> behind this error message. There’s nothing in the log file. 
> 
> Where should we start investigating?
> 
> Thanks,
> 
> Alex Eftimie
> Software Engineer DevOps
> 
> [1]
> kafka version: 0.10.1.0
> # config:
> inter.broker.protocol.version=0.9.0.1
> log.message.format.version=0.9.0.1
> 




Re: Unable to create new topics, "no brokers available"

2016-12-28 Thread Stevo Slavić
Hello Alex,

The ZooKeeper nodes that Kafka brokers create under the /brokers/ids path,
to register themselves as available, are ephemeral, i.e. not persistent.
Ephemeral ZooKeeper nodes live only as long as the ZooKeeper session that
created them is active. The /brokers/ids child nodes disappear by design
when the broker-to-ZooKeeper connection is down long enough (e.g. due to a
network partition, or because the broker actually stopped/crashed). They are
meant to signal the availability of a particular broker in the Kafka
cluster. A broker announces its availability so that cluster members can be
discovered dynamically (as brokers join or leave the cluster) by other Kafka
brokers and by Kafka clients. So losing that data is expected by design.
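
For example, you can check which brokers currently have a live registration
using the ZooKeeper shell shipped with Kafka (host/port and broker ids below
are only examples):

bin/zookeeper-shell.sh zknode:2181
ls /brokers/ids
# e.g. [0, 1] when both brokers hold an active ZooKeeper session;
# an empty list [] means no broker is currently registered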

Maybe the ZooKeeper node type could be mentioned in
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper

Kind regards,
Stevo Slavic.

On Wed, Dec 28, 2016 at 10:57 AM, Alex Eftimie <
alex.efti...@getyourguide.com> wrote:

> Quick update: digging into the ZooKeeper data, we observed that the
> /brokers/ids path was empty. Restarting the Kafka nodes repopulated the
> ZooKeeper data, and now the error is gone (we are able to create new
> topics).
>
> We didn’t alter data in ZooKeeper manually, but recently we added 2 nodes
> to a 3-node ZooKeeper cluster. Also, we had a network partitioning issue
> in the past. Could any of these be the reason for the lost ZK data?
>
> Any other hints on what may have generated the data loss in ZK?
>
> Thanks,
> Alex
>
> > On 28 Dec 2016, at 10:00, Alex Eftimie 
> wrote:
> >
> > Hello,
> >
> > We recently migrated from Kafka 0.8 to 0.10, while keeping the log
> format and internal communication at 0.9 version[1]. We have a cluster of
> two nodes which is working correctly for 10 topics (producers/consumers
> work fine).
> >
> > Trying to create a new topic raises:
> > kafka-topics.sh --zookeeper zknode:2181 --create
> --replication-factor 1 --partitions 2 --topic newtopic
> > Error while executing topic command : replication factor: 1 larger
> than available brokers: 0
> > [2016-12-28 09:30:02,177] ERROR org.apache.kafka.common.errors.
> InvalidReplicationFactorException: replication factor: 1 larger than
> available brokers: 0
> >  (kafka.admin.TopicCommand$)
> >
> > There are two available brokers, so this must be a different error
> hidden behind this error message. There’s nothing in the log file.
> >
> > Where should we start investigating?
> >
> > Thanks,
> >
> > Alex Eftimie
> > Software Engineer DevOps
> >
> > [1]
> > kafka version: 0.10.1.0
> > # config:
> > inter.broker.protocol.version=0.9.0.1
> > log.message.format.version=0.9.0.1
> >
>
>


Re: How to get and reset Kafka stream application's offsets

2016-12-28 Thread Matthias J. Sax
Exactly.


On 12/28/16 10:47 AM, Sachin Mittal wrote:
> Understood. So if we want it to start consuming from the earliest offset we
> should add
> props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
> 
> So when we start the app for the first time it will start from the earliest
> offset. Later, when we stop this app and restart it, it will resume from the
> last committed offset.
> (No need to comment out this line and restart.)
> 
> Hence the earliest (or latest) offset is used only when no offsets are found
> for that consumer group.
> 
> Thanks
> Sachin
> 
> 
> On Wed, Dec 28, 2016 at 3:02 PM, Matthias J. Sax 
> wrote:
> 
>> Hi Sachin,
>>
>> What do you mean by "with this commented"? Did you set auto.offset.reset
>> to "earliest" or not? The default value is "latest", and if you do not set
>> it to "earliest", the application will start consuming from the end of the
>> topic if no committed offsets are found.
>>
>> For default values of Kafka Streams parameters see
>> http://docs.confluent.io/3.0.1/streams/developer-guide.html#configuring-a-kafka-streams-application
>>
>> For default consumer parameters see
>> http://kafka.apache.org/0100/documentation.html#newconsumerconfigs
>>
>>
>> -Matthias
>>
>>
>> On 12/27/16 3:27 PM, Sachin Mittal wrote:
>>> Hi,
>>> I started my new streams app with this commented
>>> //props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
>>>
>>> What I observed was that it started consuming message from latest offset
>>> from the source topic.
>>> Based on comments by Eno I thought that if offset do not exist then
>>>
>>> So in practice an app will include the property below (e.g., set to
>>> "earliest")
>>>
>>> However it picked the latest offset. Also I check this props
>>> auto.offset.reset in the latest doc and I see it as largest.
>>>
>>> So now I am confused that in streams application what is the default
>> offset
>>> if none exist?
>>>
>>> note I am using version kafka_2.10-0.10.0.1.
>>>
>>> Thanks
>>> Sachin
>>>
>>>
>>>
>>> On Mon, Nov 21, 2016 at 7:08 PM, Eno Thereska 
>>> wrote:
>>>
 Hi Sachin,

 There is no need to check within the app whether the offset exists or
>> not,
 since the consumer code will do that check automatically for you. So in
 practice an app will include the property below (e.g., set to
>> "earliest"),
 but that will only have an effect if the consumer detects that the
>> offsets
 do not exist anymore. If the offset exist, then that line is a noop.

 So in summary, I'd just include that property, and no more code changes
 are required.

 Thanks
 Eno

> On 21 Nov 2016, at 12:11, Sachin Mittal  wrote:
>
> So in my java code how can I check
> when there is no initial offset in Kafka or if the current offset does
 not
> exist any more on the server (e.g. because that data has been deleted)
>
> So in this case as you have said I can set offset as
> props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); //or
 latest
>
> Thanks
> Sachin
>
>
> On Mon, Nov 21, 2016 at 4:16 PM, Eno Thereska >>> >
> wrote:
>
>> Hi Sachin,
>>
>> Currently you can only change the following global configuration by
 using
>> "earliest" or "latest", as shown in the code snippet below. As the
 Javadoc
>> mentions: "What to do when there is no initial offset in Kafka or if
>> the
>> current offset does not exist any more on the server (e.g. because
>> that
>> data has been deleted)":
>>
>> ...
>> props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
>> ...
>> return new KafkaStreams(builder, props)
>>
>>
>>
>> In addition, there is a tool to reset the offsets of all topics to the
>> beginning. This is useful for reprocessing:
>> https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/
>>
>> However, there is no option currently for resetting the offset to an
>> arbitrary offset.
>>
>> Thanks
>> Eno
>>
>>> On 21 Nov 2016, at 10:37, Sachin Mittal  wrote:
>>>
>>> Hi
>>> I am running a streaming application with
>>> streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
>>>
>>> How do I find out the offsets for each of the source, intermediate
>> and
>>> internal topics associated with this application.
>>>
>>> And how can I reset them to some specific value via shell of
>> otherwise.
>>>
>>> Thanks
>>> Sachin


>>>
>>
>>
> 





Re: [ANNOUCE] Apache Kafka 0.10.1.1 Released

2016-12-28 Thread Ismael Juma
The website has now been updated with the 0.10.1.1 release details.

Ismael

On Sat, Dec 24, 2016 at 9:29 AM, Ismael Juma  wrote:

> Hi Guozhang and Allen,
>
> I filed an INFRA ticket about this:
>
> https://issues.apache.org/jira/browse/INFRA-13172
>
> This has happened to me before and it typically requires human
> intervention if the mirroring doesn't happen in a few minutes.
>
> Ismael
>
> On Fri, Dec 23, 2016 at 8:27 PM, Guozhang Wang  wrote:
>
>> Allen,
>>
>> Please see my previous email. The asf-site repo has been updated, but we
>> cannot control when it will be reload and reflected in the web site yet.
>> Usually it takes 5 min to a couple hours, but this time it somehow has not
>> refreshed yet.
>>
>>
>> Guozhang
>>
>>
>> On Fri, Dec 23, 2016 at 12:07 PM, allen chan <
>> allen.michael.c...@gmail.com>
>> wrote:
>>
>> > From what i can tell, it looks like the main kafka website is not
>> updated
>> > with this release. Download page shows 0.10.1.0 as latest release.
>> > The above link for release notes does not work either.
>> >
>> > Not Found
>> >
>> > The requested URL /dist/kafka/0.10.1.1/RELEASE_NOTES.html was not
>> found on
>> > this server.
>> >
>> > On Wed, Dec 21, 2016 at 7:50 PM, Guozhang Wang 
>> wrote:
>> >
>> > > The Apache Kafka community is pleased to announce the release for
>> Apache
>> > > Kafka 0.10.1.1. This is a bug fix release that fixes 30 issues in
>> > 0.10.1.0.
>> > >
>> > > All of the changes in this release can be found in the release notes:
>> > > https://archive.apache.org/dist/kafka/0.10.1.1/RELEASE_NOTES.html
>> > >
>> > > Apache Kafka is a distributed streaming platform with four core APIs:
>> > >
>> > > ** The Producer API allows an application to publish a stream of records
>> > > to one or more Kafka topics.
>> > >
>> > > ** The Consumer API allows an application to subscribe to one or more
>> > > topics and process the stream of records produced to them.
>> > >
>> > > ** The Streams API allows an application to act as a stream processor,
>> > > consuming an input stream from one or more topics and producing an
>> output
>> > > stream to one or more output topics, effectively transforming the
>> input
>> > > streams to output streams.
>> > >
>> > > ** The Connector API allows building and running reusable producers or
>> > > consumers that connect Kafka topics to existing applications or data
>> > > systems. For example, a connector to a relational database might
>> > > capture every change to a table.
>> > >
>> > >
>> > > With these APIs, Kafka can be used for two broad classes of
>> application:
>> > >
>> > > ** Building real-time streaming data pipelines that reliably get data
>> > > between systems or applications.
>> > >
>> > > ** Building real-time streaming applications that transform or react
>> to
>> > the
>> > > streams of data.
>> > >
>> > >
>> > > You can download the source release from
>> > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.1.1/kafka-0.10.1.1-src.tgz
>> > >
>> > > and binary releases from
>> > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.1.1/kafka_2.10-0.10.1.1.tgz
>> > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz
>> > >
>> > > A big thank you for the following 21 contributors to this release!
>> > >
>> > > Alexey Ozeritsky, Anton Karamanov, Ben Stopford, Bernard Leach, Bill
>> > > Bejeck, Damian Guy, Dan Norwood, Eno Thereska, Ewen Cheslack-Postava,
>> > > Guozhang Wang, Jason Gustafson, Jiangjie Qin, Jun He, Jun Rao, Kim
>> > > Christensen, Manikumar Reddy O, Matthias J. Sax, Mayuresh Gharat,
>> Rajini
>> > > Sivaram, Sumant Tambe, Vahid Hashemian
>> > >
>> > > We welcome your help and feedback. For more information on how to
>> > > report problems, and to get involved, visit the project website at
>> > > http://kafka.apache.org/
>> > >
>> > >
>> > > Thanks,
>> > > -- Guozhang
>> > >
>> >
>> >
>> >
>> > --
>> > Allen Michael Chan
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>
>


Re: Processing time series data in order

2016-12-28 Thread Ali Akhtar
This will only ensure the order of delivery though, not the actual order of
the events, right?

I.e. if, due to network lag or any other reason, the producer sends A,
then B, but B arrives before A, then B will be returned before A even if
they both went to the same partition. Am I correct about that?

Or can I use KTables to ensure A is processed before B? (Both messages will
have a timestamp which is being extracted by a TimestampExtractor.)
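
For context, the producer side would key each event by its item id, roughly
like this sketch (broker address, topic name, serializers and the
itemId/payload variables are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "broker:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// all events with the same key hash to the same partition, so the broker
// stores them in the order this producer sends them
producer.send(new ProducerRecord<>("item-events", itemId, payload));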

On Tue, Dec 27, 2016 at 8:15 PM, Tauzell, Dave  wrote:

> If you specify a key with each message then all messages with the same key
> get sent to the same partition.
>
> > On Dec 26, 2016, at 23:32, Ali Akhtar  wrote:
> >
> > How would I route the messages to a specific partition?
> >
> >> On 27 Dec 2016 10:25 a.m., "Asaf Mesika"  wrote:
> >>
> >> There is a much easier approach: your can route all messages of a given
> Id
> >> to a specific partition. Since each partition has a single writer you
> get
> >> the ordering you wish for. Of course this won't work if your updates
> occur
> >> in different hosts.
> >> Also maybe Kafka streams can help shard the based on item Id to a second
> >> topic
> >>> On Thu, 22 Dec 2016 at 4:31 Ali Akhtar  wrote:
> >>>
> >>> The batch size can be large, so in memory ordering isn't an option,
> >>> unfortunately.
> >>>
> >>> On Thu, Dec 22, 2016 at 7:09 AM, Jesse Hodges 
> >>> wrote:
> >>>
>  Depending on the expected max out of order window, why not order them
> >> in
>  memory? Then you don't need to reread from Cassandra, in case of a
> >>> problem
>  you can reread data from Kafka.
> 
>  -Jesse
> 
> > On Dec 21, 2016, at 7:24 PM, Ali Akhtar 
> >> wrote:
> >
> > - I'm receiving a batch of messages to a Kafka topic.
> >
> > Each message has a timestamp, however the messages can arrive / get
>  processed out of order. I.e event 1's timestamp could've been a few
> >>> seconds
>  before event 2, and event 2 could still get processed before event 1.
> >
> > - I know the number of messages that are sent per batch.
> >
> > - I need to process the messages in order. The messages are basically
>  providing the history of an item. I need to be able to track the
> >> history
>  accurately (i.e, if an event occurred 3 times, i need to accurately
> log
> >>> the
>  dates of the first, 2nd, and 3rd time it occurred).
> >
> > The approach I'm considering is:
> >
> > - Creating a cassandra table which is ordered by the timestamp of the
>  messages.
> >
> > - Once a batch of messages has arrived, writing them all to
> >> cassandra,
>  counting on them being ordered by the timestamp even if they are
> >>> processed
>  out of order.
> >
> > - Then iterating over the messages in the cassandra table, to process
>  them in order.
> >
> > However, I'm concerned about Cassandra's eventual consistency. Could
> >> it
>  be that even though I wrote the messages, they are not there when I
> try
> >>> to
>  read them (which would be almost immediately after they are written)?
> >
> > Should I enforce consistency = ALL to make sure the messages will be
>  available immediately after being written?
> >
> > Is there a better way to handle this thru either Kafka streams or
>  Cassandra?
> 
> >>>
> >>


broker disconnected from cluster

2016-12-28 Thread Alessandro De Maria
Hello,

I would like to get some help/advice on some issues I am having with my
Kafka cluster.

I am running Kafka (kafka_2.11-0.10.1.0) on a 5-broker cluster (Ubuntu
16.04).

configuration is here: http://pastebin.com/cPch8Kd7

Today one of the 5 brokers (id: 1) appeared to disconnect from the others.

The log shows this around that time:
[2016-12-28 16:18:30,575] INFO Partition [aki_reload5yl_5,11] on broker 1:
Shrinking ISR for partition [aki_reload5yl_5,11] from 2,3,1 to 1
(kafka.cluster.Partition)
[2016-12-28 16:18:30,579] INFO Partition [ale_reload5yl_1,0] on broker 1:
Shrinking ISR for partition [ale_reload5yl_1,0] from 5,1,2 to 1
(kafka.cluster.Partition)
[2016-12-28 16:18:30,580] INFO Partition [hl7_staging,17] on broker 1:
Shrinking ISR for partition [hl7_staging,17] from 4,1,5 to 1
(kafka.cluster.Partition)
[2016-12-28 16:18:30,581] INFO Partition [hes_reload_5,37] on broker 1:
Shrinking ISR for partition [hes_reload_5,37] from 1,2,5 to 1
(kafka.cluster.Partition)
[2016-12-28 16:18:30,582] INFO Partition [aki_live,38] on broker 1:
Shrinking ISR for partition [aki_live,38] from 5,2,1 to 1
(kafka.cluster.Partition)
[2016-12-28 16:18:30,582] INFO Partition [hl7_live,51] on broker 1:
Shrinking ISR for partition [hl7_live,51] from 1,3,4 to 1
(kafka.cluster.Partition)

(the other brokers logged)
java.io.IOException: Connection to 1 was disconnected before the response
was read
at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
at scala.Option.foreach(Option.scala:257)
at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
at
kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
at
kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
at
kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
at
kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
at
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
at
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
at
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)


While this was happening, the ConsumerOffsetChecker was reporting only a few
of the 128 partitions configured for some of the topics, and consumers
started crashing.

I then used KafkaManager to reassign partitions from broker 1 to other
brokers.

I could then see the following errors in the kafka1 log:
[2016-12-28 17:23:51,816] ERROR [ReplicaFetcherThread-0-4], Error for
partition [aki_live,86] to broker
4:org.apache.kafka.common.errors.UnknownServerException: The server
experienced an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2016-12-28 17:23:51,817] ERROR [ReplicaFetcherThread-0-4], Error for
partition [aki_live,21] to broker
4:org.apache.kafka.common.errors.UnknownServerException: The server
experienced an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2016-12-28 17:23:51,817] ERROR [ReplicaFetcherThread-0-4], Error for
partition [aki_live,126] to broker
4:org.apache.kafka.common.errors.UnknownServerException: The server
experienced an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2016-12-28 17:23:51,818] ERROR [ReplicaFetcherThread-0-4], Error for
partition [aki_live,6] to broker
4:org.apache.kafka.common.errors.UnknownServerException: The server
experienced an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)


I thought I would restart broker 1, but as soon as I did, most of my topics
ended up with some empty partitions, and their consumer offsets were wiped
out completely.

I understand that because of unclean.leader.election.enable = true an
unclean leader would be elected, but why were the partitions wiped out if
there were at least 3 replicas for each?
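
(For what it's worth, I assume the relevant broker settings to avoid this
kind of data loss would be along these lines; this is just a sketch, not my
actual config, which is in the pastebin above:)

# server.properties (sketch)
unclean.leader.election.enable=false   # prefer unavailability over data loss
min.insync.replicas=2                  # combined with acks=all on the producers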

What do you think caused the disconnection in the first place, and how can I
recover from situations like this in the future?

Regards
Alessandro





-- 
Alessandro De Maria
alessandro.dema...@gmail.com


Errors while moving partitions

2016-12-28 Thread Stephane Maarek
Hi,

Using the partition reassignment tool, moving 30 partitions at a time, I’m
getting the following errors en masse:

kafka.common.NotAssignedReplicaException: Leader 12 failed to record
follower 9's position 924392 since the replica is not recognized to be one
of the assigned replicas 10,12,13 for partition [topic,9].

I’m getting a LOT of those, and they won’t settle until the migration is
complete. Is that a big issue?
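
For what it's worth, I assume the way to confirm the moves have completed is
the --verify mode of the same tool (host and file name below are placeholders):

bin/kafka-reassign-partitions.sh --zookeeper zknode:2181 \
  --reassignment-json-file reassignment.json --verify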

Thanks,
Stephane