Re: Kafka running on AWS - how to retain broker.id on new instance spun-up in-place of instance/broker failed

2018-11-14 Thread naresh Goud
A static IP may help, for example an Elastic IP attached to the replacement instance. I am not an AWS expert, though.
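As a sketch of one way to do it (untested; the tag name "kafka-broker-id" and the server.properties path are assumptions for illustration), the replacement instance can look up a broker id kept on an EC2 tag at boot and write it into its config before Kafka starts, with automatic id generation disabled:

# bootstrap step on the replacement instance (sketch)
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
BROKER_ID=$(aws ec2 describe-tags \
  --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=kafka-broker-id" \
  --query "Tags[0].Value" --output text)
echo "broker.id=$BROKER_ID" >> /etc/kafka/server.properties
echo "broker.id.generation.enable=false" >> /etc/kafka/server.properties

With the same id in place, the existing partition assignments still point at that broker id, so replication should catch the new broker up without a manual partition reassignment.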

On Wed, Nov 14, 2018 at 12:47 PM Srinivas Rapolu  wrote:

> Hello Kafka experts,
>
> We are running Kafka on AWS. The main question is: what is the best way to
> retain the broker.id on a new instance spun up in place of a failed
> instance/broker?
>
> We are currently running Kafka in AWS with broker.id auto-generated.
> The issue is that when a broker fails, the new broker/instance spun up in
> AWS gets assigned a new broker.id, and with this approach we have to
> re-assign the topics/replicas to the new broker manually.
>
> We learned that replication can be recovered automatically by Kafka if we
> manage to get the same broker.id onto the new AWS instance spun up in place
> of the failed broker/instance.
>
> I have read that we can set broker.id.generation.enable=false, but what is
> the best way to identify and retain the broker.id? Any links/help are
> appreciated.
> Thanks and Regards,
> Cnu
>
-- 
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


Re: Deleting Kafka consumer offset topic.log files

2018-10-01 Thread naresh Goud
Does this help?
https://stackoverflow.com/questions/42546501/the-retention-period-for-offset-topic-of-kafka/44277227#44277227
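For what it is worth, __consumer_offsets is a compacted topic, so its old segments are reclaimed by the log cleaner rather than by time-based deletion. A sketch of the broker settings worth checking first (illustrative values, not recommendations for your cluster):

log.cleaner.enable=true                 # the log cleaner must be running for __consumer_offsets to shrink
offsets.retention.minutes=10080         # how long offsets of dead consumer groups are kept (7 days here)
offsets.topic.segment.bytes=104857600   # smaller segments roll, and become cleanable, sooner

Deleting the .log files by hand while the broker is running is risky; checking log-cleaner.log for why compaction is not happening is usually the safer first step.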



On Mon, Oct 1, 2018 at 2:31 PM Kaushik Nambiar 
wrote:

> Hello,
> Any updates on the issue?
>
>
> Regards,
> Kaushik Nambiar
>
> On Wed, Sep 26, 2018, 12:37 PM Kaushik Nambiar  >
> wrote:
>
> > Hello,
> >
> > I am using Kafka v0.11.x with SSL on a Linux operating system.
> >
> > I can see in the log files that the topic segments are getting deleted
> > regularly.
> > My concern is the internal topic __consumer_offsets, whose segments are
> > not getting deleted.
> >
> > That is causing a large amount of data to accumulate on our disks.
> > I can see many fairly old .log files of around 100 MB each. Because of the
> > number of such files in the __consumer_offsets topic, the disk is getting
> > maxed out.
> >
> > I had raised similar queries earlier to check whether there is a
> > misconfiguration, but those requests were never resolved.
> >
> > So, as a quick fix, is it safe to delete the .log files of the
> > __consumer_offsets topic?
> > If yes, can you tell me how that can be done?
> >
> > If not, can you suggest another fix that could be tried?
> >
> > I have attached my server.properties for reference.
> >
> > Your responses are highly appreciated.
> >
> >
> > Thankyou,
> > Kaushik Nambiar
> >
> >
> >
> >
> >
> >
>
-- 
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


Re: Restricting client access to zookeeper metadata

2018-02-26 Thread naresh Goud
Thanks a lot Sönke,
your explanation makes a lot of sense.
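For anyone finding this later, a minimal sketch of the Java admin client approach Sönke describes (assuming Kafka clients 0.11.0+ on the classpath, where the public AdminClient API exists, and a broker reachable at localhost:9092):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

import java.util.Properties;
import java.util.Set;

public class ListVisibleTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Talks to the broker, not Zookeeper, so topic ACLs apply:
            // only topics this principal may DESCRIBE come back.
            Set<String> topics = admin.listTopics().names().get();
            topics.forEach(System.out::println);
        }
    }
}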



On Mon, Feb 26, 2018 at 10:05 PM Sönke Liebau wrote:

> Hi Reema, hi Naresh,
>
> I'll try and answer both your questions together by expanding on the
> topic a bit. Also, rereading my message, I realize that I phrased it
> somewhat ambiguously, since a few of the terms in there are
> overloaded.
>
> First off, if you are using the Java consumer or producer (which you
> most probably are) then there is no need for these to have access to
> the Zookeeper nodes. Only the old Scala clients needed to talk to
> Zookeeper. This allows you to firewall your Zookeeper cluster so that
> only the Kafka brokers can connect to it.
>
> Moving on to the topic of listing topics, things become a bit more
> complex because both things are possible. If you run the shell command
> "kafka-topics --list", it will connect to Zookeeper and retrieve a
> list of topics. And this is black and white: you either see all topics
> if you can access Zookeeper, or none if you can't.
> There is also the Java admin client, which can list topics and talks
> to a Kafka broker to retrieve them. In this case ACLs apply and you
> will only see the topics you are allowed to access. The main drawback
> of this method is that there is no command line tool for it yet; it is
> "just" a Java API.
>
> When I said "access the Kafka nodes" I meant being able to connect to
> the Kafka broker's port on those machines; that would be enough to use
> the Java admin client as described above.
>
> Hope this helps.
>
> Best regards,
> Sönke
>
>
> On Mon, Feb 26, 2018 at 5:25 PM, naresh Goud 
> wrote:
> > A Zookeeper connection is always required internally, because the Kafka
> > brokers interact with Zookeeper for all metadata about topics.
> > But it is interesting: how would you give departments access to the Kafka
> > nodes?
> >
> > @Sönke,
> >
> > Could you please shed some light on giving departments access to the Kafka
> > nodes? Would departments be able to ssh to the Kafka nodes and run the
> > describe command, so it would show metadata only for topics on that node?
> >
> > Apologies, if my question is very basic.
> >
> > Thank you,
> > Naresh
> >
> >
> >
> > Thanks,
> > Naresh
> > www.linkedin.com/in/naresh-dulam
> > http://hadoopandspark.blogspot.com/
> >
> >
> > On Mon, Feb 26, 2018 at 5:30 PM, Reema Chugani  >
> > wrote:
> >
> >> Hi Sönke,
> >>
> >> Thanks for the info, it is helpful!
> >>
> >> I can change things so that the departments can only access the Kafka nodes
> >> themselves. However, how would the consumers connect to the topics then?
> >> Don't the consumer clients need to connect via Zookeeper?
> >>
> >> Thanks,
> >> Reema
> >>
> >> On Fri, Feb 23, 2018 at 10:50 PM, Sönke Liebau
> >> <soenke.lie...@opencore.com.invalid> wrote:
> >> Hi Reema,
> >>
> >> if your departments have access to Zookeeper then there probably is not
> >> much you can do about them accessing data on other departments' topics. I
> >> assume that you have enabled Zookeeper ACLs, but even with those in place,
> >> the topic metadata is world-readable, so listing topics can be done by
> >> anyone who has access to Zookeeper.
> >>
> >> If your departments can only access the Kafka nodes themselves, then the
> >> DESCRIBE action on Topics is, I believe, what you are looking for; without
> >> an ACL in place to grant it, a topic should not be listed in Metadata
> >> responses.
> >>
> >> I hope that helps, if you need more information let us know!
> >>
> >> Best regards,
> >> Sönke
> >>
> >> On 24.02.2018 06:32, "Reema Chugani" <reemachug...@outlook.com> wrote:
> >>
> >> Hi,
> >>
> >> I am using Kafka 0.10.2.
> >>
> >> I have multiple topics & consumers set up with ACLs such that each consumer
> >> can only read from a particular topic. I am wondering how I can prevent a
> >> consumer from accessing metadata in Zookeeper about other topics, i.e.
> >> prevent consumers from listing or getting info about topics in the cluster
> >> (not let the marketing dept see the topic names of finance topics).
> >>
> >> Thanks,
> >> Reema
> >>
> >>
> >>
>
>
>
> --
> Sönke Liebau
> Partner
> Tel. +49 179 7940878
> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>
-- 
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


Re: Restricting client access to zookeeper metadata

2018-02-26 Thread naresh Goud
A Zookeeper connection is always required internally, because the Kafka
brokers interact with Zookeeper for all metadata about topics.
But it is interesting: how would you give departments access to the Kafka
nodes?

@Sönke,

Could you please shed some light on giving departments access to the Kafka
nodes? Would departments be able to ssh to the Kafka nodes and run the
describe command, so it would show metadata only for topics on that node?

Apologies, if my question is very basic.

Thank you,
Naresh



Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


On Mon, Feb 26, 2018 at 5:30 PM, Reema Chugani 
wrote:

> Hi Sönke,
>
> Thanks for the info, it is helpful!
>
> I can change things so that the departments can only access the Kafka nodes
> themselves. However, how would the consumers connect to the topics then?
> Don't the consumer clients need to connect via Zookeeper?
>
> Thanks,
> Reema
>
> On Fri, Feb 23, 2018 at 10:50 PM, Sönke Liebau <soenke.lie...@opencore.com.invalid> wrote:
> Hi Reema,
>
> if your departments have access to Zookeeper then there probably is not
> much you can do about them accessing data on other departments' topics. I
> assume that you have enabled Zookeeper ACLs, but even with those in place,
> the topic metadata is world-readable, so listing topics can be done by
> anyone who has access to Zookeeper.
>
> If your departments can only access the Kafka nodes themselves, then the
> DESCRIBE action on Topics is, I believe, what you are looking for; without
> an ACL in place to grant it, a topic should not be listed in Metadata
> responses.
>
> I hope that helps, if you need more information let us know!
>
> Best regards,
> Sönke
>
> On 24.02.2018 06:32, "Reema Chugani" <reemachug...@outlook.com> wrote:
>
> Hi,
>
> I am using Kafka 0.10.2.
>
> I have multiple topics & consumers set up with ACLs such that each consumer
> can only read from a particular topic. I am wondering how I can prevent a
> consumer from accessing metadata in Zookeeper about other topics, i.e.
> prevent consumers from listing or getting info about topics in the cluster
> (not let the marketing dept see the topic names of finance topics).
>
> Thanks,
> Reema
>
>
>


Re: Doubts about multiple instance in kafka

2018-02-22 Thread naresh Goud
Hi Pravin,

You're correct.
You can run the application multiple times on one machine, so the instances
start in separate JVMs (run 1: java YourClass, which runs in one JVM;
run 2: java YourClass, which runs in another JVM),

or else

you can run the application on multiple machines, i.e. multiple application
instances running in multiple JVMs (run 1: java YourClass in one JVM on
machine 1; run 2: java YourClass in another JVM on machine 2). A small
sketch of one such instance is below.
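A minimal sketch of one such instance (assuming the Kafka Streams 1.0+ API and hypothetical topic names; the only requirement for running several copies is that every instance uses the same application.id, so the library treats them as one group and spreads the tasks across them):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class StreamsInstance {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Same application.id on every instance: they form one group and share partitions.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Trivial topology: copy records from an input topic to an output topic.
        builder.stream("streams-plaintext-input").to("streams-pipe-output");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Starting this same class twice (on one machine or two) is exactly the "multiple instances" the documentation quoted below is describing.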



Thank you,
Naresh



On Thu, Feb 22, 2018 at 12:15 AM, pravin kumar  wrote:

> I have read the Kafka Confluent documentation.
>
> But I can't understand the following line:
>
> "It is important to understand that Kafka Streams is not a resource
> manager, but a library that “runs” anywhere its stream processing
> application runs. Multiple instances of the application are executed either
> on the same machine, or spread across multiple machines and tasks can be 
> distributed
> automatically by the library
> 
> to those running application instances"
>
> I have tried to run it on the same machine with multiple JVMs and multiple
> consumers.
>
> Is that the correct way to run on the same machine, using multiple consumers?
> Or is there another way?
> I have attached the code below.
>


Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
Thanks Ted.

I see that createDirectStream is experimental, as it is annotated with
"org.apache.spark.annotation.Experimental".

Is it possible that this API will be removed in the future? We want to use it
in one of our production jobs and are afraid it might not be supported going
forward.
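For reference, the kafka-0-10 integration replaces createStream() with createDirectStream(); a rough Java sketch of its use (matching the spark-streaming-kafka-0-10_2.11:2.2.1 dependency quoted below, with hypothetical topic and group names):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class DirectStreamExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("direct-stream-example");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "example-group");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("example-topic"), kafkaParams));

        // Do something trivial with each micro-batch.
        stream.foreachRDD(rdd -> System.out.println("batch size: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}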

Thank you,
Naresh




On Sun, Feb 18, 2018 at 7:47 PM, Ted Yu  wrote:

> createStream() is still in
> external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
> but it is not in
> external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaUtils.scala
>
> FYI
>
> On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud 
> wrote:
>
>> Hello Team,
>>
>> I see that the "KafkaUtils.createStream()" method is not available in Spark 2.2.1.
>>
>> Can someone please confirm whether this method has been removed?
>>
>> Below are my pom.xml entries.
>>
>>
>> <properties>
>>   <scala.version>2.11.8</scala.version>
>>   <scala.tools.version>2.11</scala.tools.version>
>> </properties>
>>
>> <dependencies>
>>   <dependency>
>>     <groupId>org.apache.spark</groupId>
>>     <artifactId>spark-streaming_${scala.tools.version}</artifactId>
>>     <version>2.2.1</version>
>>     <scope>provided</scope>
>>   </dependency>
>>   <dependency>
>>     <groupId>org.apache.spark</groupId>
>>     <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
>>     <version>2.2.1</version>
>>     <scope>provided</scope>
>>   </dependency>
>>   <dependency>
>>     <groupId>org.apache.spark</groupId>
>>     <artifactId>spark-core_2.11</artifactId>
>>     <version>2.2.1</version>
>>     <scope>provided</scope>
>>   </dependency>
>> </dependencies>
>>
>>
>>
>>
>>
>> Thank you,
>> Naresh
>>
>
>


KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
Hello Team,

I see that the "KafkaUtils.createStream()" method is not available in Spark 2.2.1.

Can someone please confirm whether this method has been removed?

Below are my pom.xml entries.



<properties>
  <scala.version>2.11.8</scala.version>
  <scala.tools.version>2.11</scala.tools.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_${scala.tools.version}</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>
</dependencies>





Thank you,
Naresh


Re: Choosing Kafka for Real Time Dashboard Application

2018-02-05 Thread naresh Goud
Kafka is a good tool for your requirement.
You probably want to look at the Kafka Connect / Kafka Streams APIs.
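A quick back-of-envelope check, based only on the numbers in the question below:

  1 message every 20 ms             = 50 messages/s per producer
  40-50 producers x 50 messages/s   = 2,000-2,500 messages/s
  at 1 KB per message               = roughly 2-2.5 MB/s of produce traffic

That write volume is modest for a Kafka cluster; the fan-out of every message to ~100 web clients is the part that needs the most design attention.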



Thank you,
Naresh

On Fri, Feb 2, 2018 at 8:50 PM, Matan Givo  wrote:

> Hi,
>
> My name is Matan Givoni and I am a team leader in a small startup company.
>
> We are starting development on a cloud-based solution for multimedia and
> telemetry streaming applications, and we aren't sure if Kafka is the right
> tool for our use case.
>
> We need to create a client-side application that will produce multiple
> real-time data streams, which will eventually be displayed using gauges on
> the screen.
>
> In our use case we will have:
> • Binary message format
> • 1 KB messages
> • A message produced every 20 ms
> • Around 40 – 50 data producers
> • Around 100 web clients
> • Each message needs to be received by each client
>
> Is Kafka the right tool for this kind of task?
>
> Best Regards,
> Matan,
>


Re: Strange Topic ...

2018-02-04 Thread naresh Goud
This is a topic created and used internally by Kafka to store consumer
offsets while consumer programs are running.
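The 50 partitions come from a broker-side default. A sketch of the relevant settings (they are only applied when the topic is first created):

offsets.topic.num.partitions=50      # default, hence the 50 partitions in the describe output below
offsets.topic.replication.factor=3   # default is 3; older brokers fall back to the number of live
                                     # brokers at creation time, which is why a single-node setup shows 1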

Thank you,
Naresh

On Sun, Feb 4, 2018 at 1:38 PM Ted Yu  wrote:

> Which Kafka version are you using ?
> Older versions of kafka (0.10 and prior) had some bugs in the log-cleaner
> thread that might sometimes cause it to crash.
>
> Please check the log-cleaner.log file to see if there was some clue.
>
> Cheers
>
> On Sun, Feb 4, 2018 at 11:14 AM, adrien ruffie 
> wrote:
>
> > Hello all,
> >
> >
> > I'm a beginner with Kafka. This morning, while trying some tests, I ran the
> > following command:
> >
> > ./bin/kafka-topics.sh --zookeeper localhost:2181 --describe
> >
> >
> > I can see the 3 topics I created: "customer-topic",
> > "streams-plaintext-input", and "streams-wordcount-output".
> >
> >
> > But I also get the following output. Why does __consumer_offsets have 50
> > partitions? I never created it ... do you know this behavior?
> >
> >
> > Topic:__consumer_offsets   PartitionCount:50   ReplicationFactor:1
> > Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=produ$
> > Topic: __consumer_offsets   Partition: 0Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 1Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 2Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 3Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 4Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic: __consumer_offsets   Partition: 5Leader: 0
> >  Replicas: 0 Isr: 0
> > 
> > Topic: __consumer_offsets   Partition: 49   Leader: 0
> >  Replicas: 0 Isr: 0
> >
> >
> > Topic:customer-topic   PartitionCount:1   ReplicationFactor:1   Configs:
> > Topic: customer-topic   Partition: 0Leader: 0   Replicas:
> > 0 Isr: 0
> > Topic:streams-plaintext-input   PartitionCount:1
> > ReplicationFactor:1 Configs:
> > Topic: streams-plaintext-input  Partition: 0Leader: 0
> >  Replicas: 0 Isr: 0
> > Topic:streams-wordcount-output  PartitionCount:1
> > ReplicationFactor:1 Configs:cleanup.policy=compact
> > Topic: streams-wordcount-output Partition: 0Leader: 0
> >  Replicas: 0 Isr: 0
> >
> >
> > Thank and bests regards,
> >
> > Adrien
> >
> >
> >
> >
>


Re: __consumer_offsets too big

2018-01-16 Thread naresh Goud
Can you check if jira KAFKA-3894 helps?
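For context, the error below says the cleaner's offset map cannot hold even one segment's worth of messages, which appears to be what KAFKA-3894 addresses; the usual workaround on 0.9 is to give the cleaner a much larger dedupe buffer. A sketch with illustrative, untuned values:

log.cleaner.dedupe.buffer.size=536870912   # 512 MB, shared across cleaner threads
log.cleaner.threads=1                      # fewer threads leave more of the buffer per thread

The broker needs a restart for these settings to take effect.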


Thank you,
Naresh

On Tue, Jan 16, 2018 at 10:28 AM Shravan R  wrote:

> We are running Kafka 0.9 and I am seeing large __consumer_offsets data on
> some of the partitions, on the order of 100 GB or more. Some of the log and
> index files are more than a year old. I see the following properties that
> are of interest:
>
> offsets.retention.minutes=5769 (4 Days)
> log.cleaner.dedupe.buffer.size=25600 (256MB)
> num.recovery.threads.per.data.dir=4
> log.cleaner.enable=true
> log.cleaner.threads=1
>
>
> Upon restarting the broker, I see the exception below, which clearly
> indicates a problem with the dedupe buffer size. However, the dedupe
> buffer size is set to 256MB, which is far more than what the log complains
> about (37MB). What could be the problem here? How can I get the offsets
> topic down to a manageable size?
>
>
> 2018-01-15 21:26:51,434 ERROR kafka.log.LogCleaner:
> [kafka-log-cleaner-thread-0], Error due to
> java.lang.IllegalArgumentException: requirement failed: 990238234 messages
> in segment __consumer_offsets-33/.log but offset map
> can
>  fit only 3749. You can increase log.cleaner.dedupe.buffer.size or
> decrease log.cleaner.threads
> at scala.Predef$.require(Predef.scala:219)
> at
> kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:591)
> at
> kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:587)
> at
>
> scala.collection.immutable.Stream$StreamWithFilter.foreach(Stream.scala:570)
> at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:587)
> at kafka.log.Cleaner.clean(LogCleaner.scala:329)
> at
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:237)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:215)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> 2018-01-15 21:26:51,436 INFO kafka.log.LogCleaner:
> [kafka-log-cleaner-thread-0], Stopped
>
>
>
> Thanks,
> -SK
>