Re: Contributor Access to JIRA

2020-09-03 Thread Nag Y
Here it is

username : nag9s

On Thu, Sep 3, 2020 at 8:36 PM Matthias J. Sax  wrote:

> Please create an account (self-service) and share your account info here,
> so we can add you.
>
> On 9/3/20 6:49 AM, Kp k wrote:
> > Hi,
> >
> > Can you please provide me Contributor access to Kafka JIRA, as I am
> > interested in contributing.
> >
> > Thanks,
> > Kalpitha
> >
>
>


Re: Contributor Access to JIRA

2020-09-03 Thread Nag Y
Hi,

Please approve Contributor access to Kafka JIRA, as I am
interested in contributing.

Thanks,
Nag


Confluent Docker Images

2020-07-22 Thread Nag Y
The Docker images for each Confluent component - ZooKeeper, Schema Registry,
etc. - are huge.

Is there any other place where I can download a single image that contains
all the components?

REPOSITORY                                  TAG           IMAGE ID       CREATED        SIZE
confluentinc/ksqldb-examples                5.5.1         88f3d11247f3   3 weeks ago    646MB
confluentinc/cp-ksqldb-server               5.5.1         d2f03e1e91d8   3 weeks ago    679MB
confluentinc/cp-ksqldb-cli                  5.5.1         c2768c7e4cc5   3 weeks ago    663MB
confluentinc/cp-enterprise-control-center   5.5.1         870dffa09a38   4 weeks ago    888MB
confluentinc/cp-server                      5.5.1         f9758c92d7b4   4 weeks ago    981MB
confluentinc/cp-schema-registry             5.5.1         f51e4f854dc1   4 weeks ago    1.19GB
confluentinc/cp-kafka-rest                  5.5.1         2632bb34f956   4 weeks ago    1.15GB
confluentinc/cp-zookeeper                   5.5.1         7149731cc563   4 weeks ago    598MB
cnfldemos/cp-server-connect-datagen         0.3.2-5.5.0   8b1a9577099c   2 months ago   1.53GB


Confluent Platform- KTable clarification

2020-07-22 Thread Nag Y
I understand that a KStream is an abstraction of a record stream and a KTable
is an abstraction of a changelog stream (updates or inserts), and the
semantics around them.

However, this is where some confusion arises. From the Confluent documentation:


To illustrate, let’s imagine the following two data records are being sent
to the stream:

("alice", 1) --> ("alice", 3)

*If your stream processing application were to sum the values per user*, it
would return 3 for alice. Why? Because the second data record would be
considered an update of the previous record. Compare this behavior of
KTable with the illustration for KStream above, which would return 4 for
alice.

Coming to the highlighted part, *if we were to sum the values*, it should be
4, right? However, *if we were to look at the "updated" view of the log*,
yes, it is 3, as a KTable maintains only updates or inserts. Did I get it
right?


Confluent Kafka - Schema Registry on windows

2020-07-22 Thread Nag Y
I happened to see an example of how to run Schema Registry on Windows using
"schema-registry-start.bat" in 5.0.1.

I don't see that file in 5.5.0. Is Schema Registry no longer supported on
Windows? It seems the only way to run Schema Registry on Windows now is
through Docker. Can someone please confirm?


Kafka - Controller Broker

2020-07-22 Thread Nag Y
I came across this phrase in https://niqdev.github.io/devops/kafka/ and
https://livebook.manning.com/book/kafka-streams-in-action/chapter-2/109
(Kafka Streams in Action):

The controller broker is responsible for setting up leader/follower
relationships for all partitions of a topic. If a Kafka node dies or is
unresponsive (to ZooKeeper heartbeats), all of its assigned partitions *(both
leader and follower)* are reassigned by the controller broker.

I think the part about reassigning follower partitions to other brokers is
not correct, as the partitions won't heal themselves unless the broker comes
back. I know this ONLY happens for the leader replica: if the broker that
holds the leader replica goes down, one of the brokers that holds a follower
will become the leader. But I don't think "reassignment" of followers happens
automatically unless a reassignment is initiated manually. Please share your
inputs.
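
If my understanding is right, it can be observed with a small AdminClient
sketch like the one below (the topic name and bootstrap server are only
placeholders): after a broker dies, leader() moves and isr() shrinks, but
replicas() keeps listing the dead broker until a reassignment is run
manually.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;
import java.util.stream.Collectors;

public class PartitionAssignmentWatcher {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singletonList("stock-prices"))
                .all().get().get("stock-prices");
            for (TopicPartitionInfo p : desc.partitions()) {
                // The assignment (replicas) stays the same when a broker goes down;
                // only the leader and the in-sync replica set change.
                String replicas = p.replicas().stream().map(Node::idString).collect(Collectors.joining(","));
                String isr = p.isr().stream().map(Node::idString).collect(Collectors.joining(","));
                System.out.printf("partition %d leader=%s replicas=[%s] isr=[%s]%n",
                    p.partition(), p.leader() == null ? "none" : p.leader().idString(), replicas, isr);
            }
        }
    }
}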


Re: NotEnoughReplicasException: The size of the current ISR Set(2) is insufficient to satisfy the min.isr requirement of 3

2020-07-13 Thread Nag Y
Thanks Liam, so the phrase "current ISR set" in the warning refers to the ISR
set that is shown by the kafka-topics describe command?
And also, should the "maximum" value of "min.insync.replicas" be
(replication factor - 1)? I mean, min.insync.replicas should not be the same
as the replication factor.

Please confirm
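
As a side note, this is a rough sketch of the setup I understand you are
recommending (the topic name, bootstrap server, and record contents are only
placeholders): replication factor 3 with a topic-level min.insync.replicas of
2, so a single down broker does not fail acks=all writes.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class MinIsrExample {
    public static void main(String[] args) throws Exception {
        Properties common = new Properties();
        common.put("bootstrap.servers", "localhost:9092");

        // Replication factor 3, but only 2 in-sync replicas are required for acks=all writes.
        try (AdminClient admin = AdminClient.create(common)) {
            NewTopic topic = new NewTopic("topic-ack-all", 4, (short) 3)
                .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.putAll(common);
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for all in-sync replicas
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (Producer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("topic-ack-all", "key", "value")).get();
        }
    }
}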

On Tue, Jul 7, 2020 at 1:22 PM Liam Clarke-Hutchinson <
liam.cla...@adscale.co.nz> wrote:

> Hi Nag,
>
> ISR is the replicas that are in sync with the leader, and there's a
> different ISR set for each partition of a given topic. If you use
> `kafka/bin/kafka-topics --describe --topic ` it'll show you the replicas
> and ISR for each partition.
>
> min.insync.replicas and replication factor are all about preventing data
> loss. Generally I set min ISR to 2 for a topic with a replication factor of
> 3 so that one down or struggling broker doesn't prevent producers writing
> to topics, but I still have a replica of the data in case the broker acting
> as leader goes down - a new partition leader can only be elected from the
> insync replicas.
>
> On Tue, Jul 7, 2020 at 7:39 PM Nag Y  wrote:
>
> > I had the following setup Brokers : 3 - all are up and running with
> > min.insync.replicas=3.
> >
> > I created a topic with the following configuration
> >
> > bin\windows\kafka-topics --zookeeper 127.0.0.1:2181 --topic
> topic-ack-all
> > --create --partitions 4 --replication-factor 3
> >
> > I triggered the producer with "ack = all" and producer is able to send
> the
> > message. However, the problem starts when i start the consumer
> >
> > bin\windows\kafka-console-consumer --bootstrap-server
> > localhost:9094,localhost:9092 --topic topic-ack-all --from-beginning
> >
> > The error is
> >
> > NotEnoughReplicasException: The size of the current ISR Set(2) is
> > insufficient to satisfy the min.isr requirement of 3
> > NotEnoughReplicasException:The size of the current ISR Set(3) is
> > insufficient to satisfy the min.isr requirement of 3 for partition __con
> >
> > I see two kinds of errors here . I went though the documentation and had
> > also understaning about "min.isr", However, these error messages are not
> > clear .
> >
> >1. What does it mean by current ISR set ? Is it different for each
> topic
> >and what it signifies ?
> >2. I guess min.isr is same as min.insync.replicas . I hope is should
> >have value at least same as "replication factor" ?
> >
>


NotEnoughReplicasException: The size of the current ISR Set(2) is insufficient to satisfy the min.isr requirement of 3

2020-07-07 Thread Nag Y
I had the following setup Brokers : 3 - all are up and running with
min.insync.replicas=3.

I created a topic with the following configuration

bin\windows\kafka-topics --zookeeper 127.0.0.1:2181 --topic topic-ack-all
--create --partitions 4 --replication-factor 3

I triggered the producer with "acks=all" and the producer is able to send the
message. However, the problem starts when I start the consumer

bin\windows\kafka-console-consumer --bootstrap-server
localhost:9094,localhost:9092 --topic topic-ack-all --from-beginning

The error is

NotEnoughReplicasException: The size of the current ISR Set(2) is
insufficient to satisfy the min.isr requirement of 3
NotEnoughReplicasException:The size of the current ISR Set(3) is
insufficient to satisfy the min.isr requirement of 3 for partition __con

I see two kinds of errors here. I went through the documentation and have
some understanding of "min.isr". However, these error messages are not clear.

   1. What is meant by the current ISR set? Is it different for each topic,
   and what does it signify?
   2. I guess min.isr is the same as min.insync.replicas. Should its value be
   at least the same as the replication factor?


Re: Highwater mark interpretation

2020-06-20 Thread Nag Y
Thanks a lot, D C. That is quite a detailed explanation.
If I understand correctly (ignoring the case where producers create
transactions), since the replica is down and never comes back, the high
watermark CANNOT advance and the consumer CANNOT read the messages that were
sent after the replica went down, as those messages are NOT committed - I
hope this is correct?

To address this situation, we should either make sure the replica is up again
or reduce the replication factor, so that the messages become committed and
the consumer can start reading them ...
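
Just to connect this back to your point about the LSO: here is a minimal
consumer sketch (the topic name and group id are only examples) showing where
isolation.level comes in. With read_committed the consumer stops at the LSO;
in either mode it never reads past the high watermark.

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "hwm-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // read_uncommitted (the default) reads up to the high watermark;
        // read_committed additionally stops at the last stable offset (LSO).
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("stock-prices"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}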

Regards,
 Nag


On Sun, Jun 21, 2020 at 3:25 AM D C  wrote:

> The short answer is : yes, a consumer can only consume messages up to the
> High Watermark.
>
> The long answer is not exactly, for the following reasons:
>
> At the partition level you have 3 major offsets that are important to the
> health of the partition and accessibility from the consumer pov:
> LeO (log end offset) - which represents the highest offset in the highest
> segment
> High Watermark - which represents the latest offset that has been
> replicated to all the followers
> LSO (Last stable offset) - which is important when you use producers that
> create transactions - which represents the highest offset that has been
> committed by a transaction and that is allowed to be read with isolation
> level = read_committed.
>
> The LeO can only be higher or equal to the High Watermark (for obvious
> reasons)
> The High Watermark can only be higher or equal to the LSO (the messages up
> to this point may have been committed to all the followers but the
> transaction isn't yet finished)
> And coming to your question, in case the transaction hasn't finished, the
> LSO may be lower than the High Watermark so if your consumer is accessing
> the data in Read_Committed, it won't be able to surpass the LSO.
>
> Cheers,
> D
>
> On Sat, Jun 20, 2020 at 9:05 PM Nag Y  wrote:
>
> > As I understand it, the consumer can only read "committed" messages -
> which
> > I believe, if we look at internals of it, committed messages are nothing
> > but messages which are upto the high watermark.
> > *The high watermark is the offset of the last message that was
> successfully
> > copied to all of the log’s replicas. *
> >
> > *Having said that, if one of the replica is down, will high water mark
> be*
> > *advanced?*
> >
> > *If replica can't come forever, can we consider this message cant be
> > consumed by the consumer since it is never committed *
> >
>


Highwater mark interpretation

2020-06-20 Thread Nag Y
As I understand it, the consumer can only read "committed" messages - and I
believe, if we look at the internals, committed messages are nothing but
messages up to the high watermark.
*The high watermark is the offset of the last message that was successfully
copied to all of the log’s replicas.*

*Having said that, if one of the replicas is down, will the high watermark be
advanced?*

*If the replica can never come back, can we consider that such a message
can't be consumed by the consumer, since it is never committed?*


Kafka - Failed to clean up log for __consumer_offsets-10 in dir

2020-06-18 Thread Nag Y
I am seeing the following exception in one of the broker log files.

Set up contains 3 brokers.

Environment - Windows

I am OK with removing the files in the C:\tmp directory. However, I'm a
little curious to know why this broker got into this state, and whether there
is a way to rectify the issue without deleting the directory in question.


log4j:ERROR Failed to rename [C:\confluent-5.5.0/logs/log-cleaner.log] to
[C:\confluent-5.5.0/logs/log-cleaner.log.2020-06-18-09].
[2020-06-18 14:10:41,361] ERROR Failed to clean up log for
__consumer_offsets-10 in dir C:\tmp\kafka-logs-3 due to IOException
(kafka.server.LogDirFailureChannel)

java.nio.file.FileSystemException:
C:\tmp\kafka-logs-3\__consumer_offsets-10\.timeindex.cleaned ->
C:\tmp\kafka-logs-3\__consumer_offsets-10\.timeindex.swap: The process
cannot access the file because it is being used by another process.



        at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
        at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
        at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
        at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:834)
        at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:207)
        at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497)
        at kafka.log.Log.$anonfun$replaceSegments$4(Log.scala:2288)
        at kafka.log.Log.$anonfun$replaceSegments$4$adapted(Log.scala:2288)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at kafka.log.Log.replaceSegments(Log.scala:2288)
        at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:605)
        at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:530)
        at kafka.log.Cleaner.doClean(LogCleaner.scala:529)
        at kafka.log.Cleaner.clean(LogCleaner.scala:503)
        at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:372)
        at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:345)
        at kafka.log.LogCleaner$CleanerThread.tryCleanFilthiestLog(LogCleaner.scala:325)
        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:314)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
        Suppressed: java.nio.file.FileSystemException:
        C:\tmp\kafka-logs-3\__consumer_offsets-10\.timeindex.cleaned ->
        C:\tmp\kafka-logs-3\__consumer_offsets-10\.timeindex.swap:
        The process cannot access the file because it is being used by another process.

                at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
                at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
                at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
                at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
                at java.nio.file.Files.move(Files.java:1395)
                at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:831)
                ... 15 more

[2020-06-18 14:10:41,441] WARN [ReplicaManager broker=3] Stopping serving
replicas in dir C:\tmp\kafka-logs-3 (kafka.server.ReplicaManager)

[2020-06-18 14:10:41,445] INFO [ReplicaFetcherManager on broker 3] Removed
fetcher for partitions Set(__consumer_offsets-22, __consumer_offsets-4,
stock-prices-2, __consumer_offsets-7, __consumer_offsets-46, stock-prices-1,
__consumer_offsets-25, __consumer_offsets-49, __consumer_offsets-16,
__consumer_offsets-28, __consumer_offsets-31, __consumer_offsets-37,
stock-prices-0, __consumer_offsets-19, stock_topic-0, __consumer_offsets-13,
__consumer_offsets-43, __consumer_offsets-1, __consumer_offsets-34,
__consumer_offsets-10, __consumer_offsets-40) (kafka.server.ReplicaFetcherManager)

[2020-06-18 14:10:41,448] INFO [ReplicaAlterLogDirsManager on broker 3]
Removed fetcher for partitions Set(__consumer_offsets-22, __consumer_offsets-4,
stock-prices-2, __consumer_offsets-7, __consumer_offsets-46, stock-prices-1,
__consumer_offsets-25, __consumer_offsets-49, __consumer_offsets-16,
__consumer_offsets-28, __consumer_offsets-31, __consumer_offsets-37,
stock-prices-0, __consumer_offsets-19, stock_topic-0, __consumer_offsets-13,
__consumer_offsets-43, __consumer_offsets-1, __consumer_offsets-34,
__consumer_offsets-10, __consumer_offsets-40) (kafka.server.ReplicaAlterLogDirsManager)

[2020-06-18 14:10:41,492] WARN [ReplicaManager broker=3] Broker 3 stopped
fetcher for partitions __consumer_offsets-22,__consumer_offsets-4,stock-prices-2,__consumer_offsets-7,__consumer_offsets-46,stock-prices-1,__consumer_offsets-25,__consumer_offsets-49,__consumer_of

Kafka - replicas do not heal themselves by default

2020-06-14 Thread Nag Y
I am going through Kafka in Action and came across the following phrase:

*One of the things to note with Kafka is that replicas do not heal
themselves by default. If you lose a broker on which one of your copies of
a partition existed, Kafka does not currently create a new copy. I mention
this since some users are used to filesystems like HDFS that will maintain
that replication number if a block is seen as corrupted or failed. So an
important item to look at with monitoring the health of your system might
be how many of your ISRs are indeed matching your intended number.*


It looks interesting, since in most distributed systems the system will try
to create additional replicas if some replicas are not available. I found it
strange. Is there any reason for Kafka to behave this way?
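
If it helps anyone else, this is a small AdminClient sketch of the kind of
monitoring the author suggests (the topic name and bootstrap server are only
placeholders): it flags partitions whose in-sync replica count has dropped
below the intended replica count.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class UnderReplicatedCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, TopicDescription> topics =
                admin.describeTopics(Collections.singletonList("stock-prices")).all().get();
            for (TopicDescription desc : topics.values()) {
                for (TopicPartitionInfo p : desc.partitions()) {
                    // Kafka will not re-create a lost replica on its own, so alert when
                    // the ISR size drops below the intended number of replicas.
                    if (p.isr().size() < p.replicas().size()) {
                        System.out.printf("%s-%d is under-replicated: ISR=%d of %d%n",
                            desc.name(), p.partition(), p.isr().size(), p.replicas().size());
                    }
                }
            }
        }
    }
}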


Kafka - how to know whether it is broker property or topic property or producer property

2020-06-14 Thread Nag Y
I am going through the documentation, and often it is either not clear, or I
need to look in multiple places to see which entity a particular property
belongs to, whether it is specific to one entity, etc.

To give an example, consider *"min.insync.replicas"* - this is just one
example. In the Apache documentation it is listed under
https://kafka.apache.org/documentation/#brokerconfigs . In the Confluent
documentation it is listed under
https://docs.confluent.io/current/installation/configuration/topic-configs.html .
Later, I came to know that this property is available under both and follows
inheritance based on where it is configured. It took looking in multiple
places to understand more about this property and to see where it belongs.

But isn't there documentation about where each property belongs, and whether
it is inherited or not?

I don't think the answer needs to be complex, like looking into the source
code; it should be simple enough - perhaps I am missing something.


Also posted here
https://stackoverflow.com/questions/62369238/kafka-how-to-know-whether-it-is-broker-property-or-topic-property-or-producer
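
For reference, here is a small AdminClient sketch of what I was hoping the
documentation would tell me directly (the topic name is only an example):
describeConfigs() reports, for every property, the source the effective value
came from, which shows whether it was set as a topic override, inherited from
the broker configuration, or left at the default.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Properties;

public class ConfigSourceExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "topic-ack-all");
            Config config = admin.describeConfigs(Collections.singleton(topic)).all().get().get(topic);
            for (ConfigEntry entry : config.entries()) {
                // source() is e.g. DYNAMIC_TOPIC_CONFIG (topic override),
                // STATIC_BROKER_CONFIG / DYNAMIC_BROKER_CONFIG (inherited from the broker),
                // or DEFAULT_CONFIG (hard-coded default).
                System.out.printf("%s = %s (source: %s)%n", entry.name(), entry.value(), entry.source());
            }
        }
    }
}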


Re: Kafka - entity-default Vs entity-name

2020-06-12 Thread Nag Y
Thanks Brian, that answers my question
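
For my own notes, here is a rough AdminClient equivalent of the
"| grep DYNAMIC" suggestion (broker id 1 and the bootstrap server are just
placeholders): it keeps only the dynamically overridden entries and shows
whether each came from --entity-name 1 or --entity-default.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.EnumSet;
import java.util.Properties;
import java.util.Set;

public class BrokerOverridesExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        // Dynamic overrides only: per-broker (--entity-name 1) and cluster default (--entity-default).
        Set<ConfigEntry.ConfigSource> overrides = EnumSet.of(
            ConfigEntry.ConfigSource.DYNAMIC_BROKER_CONFIG,
            ConfigEntry.ConfigSource.DYNAMIC_DEFAULT_BROKER_CONFIG);

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
            Config config = admin.describeConfigs(Collections.singleton(broker)).all().get().get(broker);
            for (ConfigEntry entry : config.entries()) {
                if (overrides.contains(entry.source())) {
                    System.out.printf("%s = %s (%s)%n", entry.name(), entry.value(), entry.source());
                }
            }
        }
    }
}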

On Fri, 12 Jun, 2020, 9:16 PM Brian Byrne,  wrote:

> Hi Nag,
>
> Correct, --all will include both. Removing --all should give you only the
> overridden values.
>
> Are you asking if you can see overridden values for a broker with both the
> default config and specific broker config in one command? Unfortunately the
> commands only give you the overrides for that specific entity, but you can
> try:
>
>   kafka-configs ... --entity-type brokers --entity-name 1 --describe | grep
> DYNAMIC
>
> This will show you overridden --entity-defaults
> (DYNAMIC_DEFAULT_BROKER_CONFIG) and --entity-name 1
> (DYNAMIC_BROKER_CONFIG).
>
> Brian
>
> On Fri, Jun 12, 2020 at 8:12 AM Nag Y  wrote:
>
> > Thanks Brian. I applied the following command with --all, it works fine.
> >
> > kafka-configs --bootstrap-server localhost:9092 --entity-type brokers
> > --entity-name 1 --describe --all.
> >
> > I believe this includes *both* default  config values as well as config
> > values  those were overridden . Is there any command if i wanted to query
> > only the overridden values and the rest will be default values
> >
> > On Fri, Jun 12, 2020 at 7:58 PM Brian Byrne  wrote:
> >
> > > Hi Nag,
> > >
> > > To address (2) first, the --entity-default flag requests the default
> > > configuration that all brokers inherit. An individual broker may
> override
> > > any of the default config's entries, which is done by specifying the
> > broker
> > > ID to the --entity-name flag.
> > >
> > > The reason you're getting blank output for --entity-default is because
> > all
> > > entries in the config are using their default values. If you wish to
> see
> > > all config entries, you can pass flag --all which is included in the
> 2.5
> > > release (KIP-524:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-524%3A+Allow+users+to+choose+config+source+when+describing+configs
> > > ).
> > >
> > > Please let me know if you have any other questions,
> > >
> > > Brian
> > >
> > > On Fri, Jun 12, 2020 at 5:18 AM Nag Y 
> > wrote:
> > >
> > > > I applied kafka-config to get the default settings for the brokers,
> > > >
> > > > kafka-configs --bootstrap-server localhost:9092 --entity-type brokers
> > > > --entity-default --describe
> > > >
> > > > The command responded with the following response, without any
> > *complete*
> > > >  output.
> > > >
> > > > Default configs for brokers in the cluster are:
> > > >
> > > >
> > > >1. So, how to get the all the default settings across all brokers
> > > >2. I understood from the command line what it means, didnt get the
> > > >context completely. What is the real difference between
> > entity-default
> > > >,entity-name
> > > >
> > > > From documentation:
> > > >
> > > > --entity-default   Default entity name for
> > > >clients/users/brokers (applies
> > to
> > > >corresponding entity type in
> > > command
> > > >line)
> > > >
> > > > --entity-name  Name of entity (topic
> > name/client
> > > >id/user principal name/broker
> > id)
> > > >
> > >
> >
>


Re: Kafka - entity-default Vs entity-name

2020-06-12 Thread Nag Y
Thanks Brian. I applied the following command with --all, it works fine.

kafka-configs --bootstrap-server localhost:9092 --entity-type brokers
--entity-name 1 --describe --all.

I believe this includes *both* the default config values and the config
values that were overridden. Is there a command to query only the overridden
values, with everything else left at its default?

On Fri, Jun 12, 2020 at 7:58 PM Brian Byrne  wrote:

> Hi Nag,
>
> To address (2) first, the --entity-default flag requests the default
> configuration that all brokers inherit. An individual broker may override
> any of the default config's entries, which is done by specifying the broker
> ID to the --entity-name flag.
>
> The reason you're getting blank output for --entity-default is because all
> entries in the config are using their default values. If you wish to see
> all config entries, you can pass flag --all which is included in the 2.5
> release (KIP-524:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-524%3A+Allow+users+to+choose+config+source+when+describing+configs
> ).
>
> Please let me know if you have any other questions,
>
> Brian
>
> On Fri, Jun 12, 2020 at 5:18 AM Nag Y  wrote:
>
> > I applied kafka-config to get the default settings for the brokers,
> >
> > kafka-configs --bootstrap-server localhost:9092 --entity-type brokers
> > --entity-default --describe
> >
> > The command responded with the following response, without any *complete*
> >  output.
> >
> > Default configs for brokers in the cluster are:
> >
> >
> >1. So, how to get the all the default settings across all brokers
> >2. I understood from the command line what it means, didnt get the
> >context completely. What is the real difference between entity-default
> >,entity-name
> >
> > From documentation:
> >
> > --entity-default   Default entity name for
> >clients/users/brokers (applies to
> >corresponding entity type in
> command
> >line)
> >
> > --entity-name  Name of entity (topic name/client
> >id/user principal name/broker id)
> >
>


Kafka - entity-default Vs entity-name

2020-06-12 Thread Nag Y
I applied kafka-configs to get the default settings for the brokers:

kafka-configs --bootstrap-server localhost:9092 --entity-type brokers
--entity-default --describe

The command responded with the following, without any *complete* output:

Default configs for brokers in the cluster are:


   1. So, how do I get all the default settings across all brokers?
   2. I understood from the command-line help what it means, but I didn't get
   the context completely. What is the real difference between entity-default
   and entity-name?

From the documentation:

--entity-default   Default entity name for
   clients/users/brokers (applies to
   corresponding entity type in command
   line)

--entity-name  Name of entity (topic name/client
   id/user principal name/broker id)


Kafka - Auto commit

2020-06-12 Thread Nag Y
While I was going through the documentation, I came across the following:

Automatic Commit The easiest way to commit offsets is to allow the consumer
to do it for you. If you configure enable.auto.commit=true, then every five
seconds the consumer will commit the largest offset your client received
from poll(). The five-second interval is the default and is controlled by
setting auto.commit.interval.ms. Just like everything else in the consumer,
the automatic commits are driven by the poll loop. Whenever you poll, the
consumer checks if it is time to commit, and if it is, it will commit the
offsets it returned in the last poll.

And also, from "How does kafka consumer auto commit work?":


The auto-commit check is called in every poll and it checks that the "time
elapsed is greater than the configured time". If so, the offset is
committed.

In case the commit interval is 5 seconds and poll is happening in 7
seconds, the commit will happen after 7 seconds only.

However, if we take a close look, the auto commit doesn't actually seem to
happen every 5 seconds (or whatever interval is configured through
"auto.commit.interval.ms"), but rather at the first poll() after "time
elapsed" exceeds "auto.commit.interval.ms" -- which means it doesn't
necessarily commit the offsets at every interval configured through
"auto.commit.interval.ms".

Please add your thoughts

Update #1

It is adding to my confusion after going through more details. Can someone
add more details about this: does a poll() happen in the background every
5 seconds, separately from the poll() issued by the consumer?

The poll() call is issued in the background at the set
auto.commit.interval.ms.
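
For completeness, here is a minimal consumer sketch with auto-commit enabled
(the topic name and group id are only examples). As far as I can tell, for
the Java consumer the auto-commit check runs inside poll() itself rather than
in a separate background thread, which is why a slow poll loop stretches the
effective commit interval.

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AutoCommitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "auto-commit-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");

        try (Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("stock-prices"));
            while (true) {
                // The auto-commit check happens inside poll(): if at least
                // auto.commit.interval.ms has elapsed since the last commit, the offsets
                // returned by the previous poll are committed. If the loop body takes 7s,
                // commits effectively happen every ~7s, not every 5s.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
                // ... processing that may take longer than the commit interval ...
            }
        }
    }
}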