Re: Unable to run kafka on ec2 free tier instance

2021-05-16 Thread Alexandre Dupriez
Hi Satendra,

The JVM error report indicates you are running out of system memory.
Increasing the heap size of your JVM will not help; if anything, it
will make things worse.
You need to check which processes occupy system memory (look for their
resident set size) and work on reducing memory consumption
accordingly.
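
As a first step you could check what is holding memory and shrink
Kafka's own footprint. The 1073741824 bytes the JVM failed to map most
likely corresponds to the 1 GB heap the stock start script sets by
default. A rough sketch, assuming a standard Linux environment and the
stock startup scripts (the 512 MB heap is only an example):

  # Largest resident processes first
  ps -eo pid,rss,comm --sort=-rss | head -n 10

  # Start Kafka with a smaller heap than the scripts' 1 GB default
  export KAFKA_HEAP_OPTS="-Xms512m -Xmx512m"
  bin/kafka-server-start.sh config/server.properties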

Thanks,
Alexandre

On Sun, 18 Apr 2021 at 09:53, Satendra Pratap Singh wrote:
>
> Hi Team,
>
> I have set up a single-node Kafka on an EC2 free tier instance which has 8 GB
> RAM and a 256 GB HDD. I have installed Java 8 on the EC2 instance. When I try
> to start Kafka I get an error like:
>
> OpenJDK 64-Bit Server VM warning: INFO:
> os::commit_memory(0x0007, 1073741824, 0) failed; error='Cannot
> allocate memory' (errno=12)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (mmap) failed to map 1073741824 bytes for
> committing reserved memory.
> # An error report file with more information is saved as:
> # /home/ubuntu/kafka_2.12-2.7.0/hs_err_pid1920.log
>
> I have increased the JAVA_HEAP setting but nothing changed. How can I solve
> this problem?
>
> Looking forward to hearing from you.


Re: Kafka Definitive guide v2 states auto.leader.rebalance.enable = true is not recommended

2021-05-16 Thread Alexandre Dupriez
Hi Liam,

The property you referred to controls partition leadership on the
brokers, not partition ownership by consumers. See
https://issues.apache.org/jira/browse/KAFKA-4084 for a discussion of
why a post-incident leader rebalance can sometimes impact foreground
traffic.
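
For reference, these are the broker-side knobs involved, together with
the manual alternative when the automatic rebalance is disabled. A
minimal sketch, assuming Kafka 2.4+; the broker address is illustrative
and the property values shown are the defaults:

  # server.properties - controller-side settings behind this behaviour
  auto.leader.rebalance.enable=true
  leader.imbalance.check.interval.seconds=300
  leader.imbalance.per.broker.percentage=10

  # With auto rebalance disabled, trigger a preferred leader election
  # manually during a quiet window
  bin/kafka-leader-election.sh --bootstrap-server broker1:9092 \
      --election-type preferred --all-topic-partitions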

Thanks,
Alexandre

On Mon, 12 Apr 2021 at 15:37, Liam Clarke-Hutchinson wrote:
>
> Ah, thanks Todd :)
>
> Was it causing issues back in the days when consumer rebalances were always
> stop-the-world? I was wondering if the statement had perhaps predated the
> cooperative / sticky assignors we're able to run now.
>
> Cheers,
>
> Liam Clarke-Hutchinson
>
>
>
> On Tue, Apr 13, 2021 at 2:34 AM Todd Palino  wrote:
>
> > As a note, that part of the second edition has not been updated yet. This
> > setting used to cause significant problems, but more recent updates to the
> > controller code have made the auto leader rebalancing usable.
> >
> > -Todd
> >
> > On Mon, Apr 12, 2021 at 10:20 AM Liam Clarke-Hutchinson <
> > liam.cla...@adscale.co.nz> wrote:
> >
> > > Hi all,
> > >
> > > This question arose elsewhere, and I'm also going to fire it off to
> > > O'Reilly in the hopes that they'll clarify, but on page 180 of the
> > > Definitive Guide v2
> > > <https://assets.confluent.io/m/2849a76e39cda2bd/original/20201119-EB-Kafka_The_Definitive_Guide-Preview-Chapters_1_thru_6.pdf>
> > > it states:
> > >
> > > *Kafka brokers do not automatically take partition leadership back
> > > (unless auto leader rebalance is enabled, but this configuration is not
> > > recommended)*
> > >
> > > The original commenter raised the point that this defaults to true, and
> > > it sounds like a good idea to have auto leader rebalancing.
> > >
> > > So I'm curious: in anyone's war stories or experience, has enabling this
> > > property been harmful? From the context the paragraph was written in, I'm
> > > assuming the writers were perhaps intending to emphasise that Cruise
> > > Control or Confluent's self-balancing-cluster / auto-balancing features
> > > are preferable, but in my very brief Google search I didn't see any
> > > advice to set auto.leader.rebalance.enable to false to use those tools.
> > >
> > > So yeah, just curious if this rings any bells.
> > >
> > > Cheers,
> > >
> > > Liam Clarke-Hutchinson
> > >
> > --
> > *Todd Palino*
> > Senior Staff Engineer, Site Reliability
> > Capacity Engineering
> >
> >
> >
> > linkedin.com/in/toddpalino
> >


Re: Last Stable Offset (LSO) stuck for specific topic partition after Broker issues

2021-05-16 Thread Alexandre Dupriez
Hi Pieter,

FWIW, you may have encountered the following bug:
https://issues.apache.org/jira/browse/KAFKA-12671 .
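
If you want to confirm that it is the LSO rather than the log end
offset that is stuck, one way is to compare how far a read_committed
consumer gets against the partition's end offset. A rough sketch with
illustrative broker and topic names:

  # Log end offset of the affected partition
  bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
      --broker-list broker1:9092 --topic my-topic --partitions 0 --time -1

  # A read_committed consumer stops at the last stable offset instead
  bin/kafka-console-consumer.sh --bootstrap-server broker1:9092 \
      --topic my-topic --partition 0 --offset earliest \
      --isolation-level read_committed --timeout-ms 10000 > /dev/null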

Thanks,
Alexandre

On Fri, 12 Jun 2020 at 00:43, D C wrote:
>
> Hey peeps,
>
> Anyone else encountered this and got to the bottom of it?
>
> I'm facing a similar issue: the LSO is stuck for some partitions in a topic
> and the consumers can't get data out of it (we're using
> isolation.level=read_committed).
>
> When this issue started happening we were on Kafka 2.3.1.
> I tried:
> - restarting the consumers
> - deleting the partition from the leader and letting it get in sync with
> the new leader
> - rolling restart of the brokers
> - shutting down the whole cluster and starting it again
> - tried deleting the txnindex files (after backing them up) and restarting
> the brokers
> - tried putting down the follower brokers of a partition and resyncing that
> partition on them from scratch
> - upgraded both kafka broker and client to 2.5.0
>
> Now the following questions arise:
> Where is the LSO actually stored? (Even if you get rid of the txnindex
> files, the LSO stays the same.)
> Is there any way that the LSO can be reset?
> Is there any way to manually abort and clean up the state of a stuck
> transaction? (I suspect that this is the reason why the LSO is stuck.)
> Is there any way to manually trigger a consistency check on the logfiles
> that would fix any existing issues with either the logs or the indexes in
> the partition?
>
> Cheers,
> Dragos
>
> On 2019/11/20 13:26:54, Pieter Hameete  wrote:
> > Hello,
> >
> > After having some broker issues (too many open files) we managed to recover
> > our brokers, but read_committed consumers are stuck for a specific topic
> > partition. It seems like the LSO is stuck at a specific offset. The
> > transactional producer for the topic partition is working without errors,
> > so the latest offset is incrementing correctly and so is transactional
> > producing.
> >
> > What could be wrong here? And how can we get this specific LSO to
> > increment again?
> >
> > Thank you in advance for any advice.
> >
> > Best,
> >
> > Pieter
> >


Re: kafka metric to monitor for consumer FETCH using disk caching and not going to disk

2021-05-16 Thread Alexandre Dupriez
Hi Pushkar,

If you are using Linux and Kafka 2.6.0+, the closest metric to what
you are looking for is TotalDiskReadBytes [1], which measures data
transfer at the block layer.
Assuming your consumers are doing tail reads and there is no other
activity which requires loading pages from the disk on your system
(including log compaction from Kafka), you can determine whether you
are actually hitting the disk or not.

[1] 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-551%3A+Expose+disk+read+and+write+metrics
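
If JMX is enabled on the brokers you can sample it with the bundled
JmxTool, e.g. as sketched below; the object name is the one proposed in
KIP-551 and may differ in your broker version, so please check the
broker's JMX tree, and the host and port are illustrative:

  bin/kafka-run-class.sh kafka.tools.JmxTool \
      --jmx-url service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi \
      --object-name 'kafka.server:type=KafkaServer,name=linux-disk-read-bytes' \
      --reporting-interval 5000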

Thanks,
Alexandre

On Sat, 15 May 2021 at 05:49, Pushkar Deole wrote:
>
> Hi All,
>
> Is there any metric that I can use to check whether the memory allocated
> for Kafka is sufficient for the given load on the brokers, and whether Kafka
> is making optimal use of the page cache, so that consumer fetch reads do not
> go to disk on every read, slowing down overall consumer processing and thus
> increasing consumer lag?
>
> Which metric can tell me that I should assign more memory to the brokers?


Re: kafka metric to monitor for consumer FETCH using disk caching and not going to disk

2021-05-16 Thread Pushkar Deole
Thanks Alexandre... we are currently using Kafka 2.5.0, so is there any
metric that can be used from 2.5.0?

On Sun, May 16, 2021 at 6:02 PM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi Pushkar,
>
> If you are using Linux and Kafka 2.6.0+, the closest metric to what
> you are looking for is TotalDiskReadBytes [1], which measures data
> transfer at the block layer.
> Assuming your consumers are doing tail reads and there is no other
> activity which requires loading pages from the disk on your system
> (including log compaction from Kafka), you can determine whether you
> are actually hitting the disk or not.
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-551%3A+Expose+disk+read+and+write+metrics
>
> Thanks,
> Alexandre
>
> On Sat, 15 May 2021 at 05:49, Pushkar Deole wrote:
> >
> > Hi All,
> >
> > Is there any metric that I can use to check whether the memory allocated
> > for Kafka is sufficient for the given load on the brokers, and whether
> > Kafka is making optimal use of the page cache, so that consumer fetch
> > reads do not go to disk on every read, slowing down overall consumer
> > processing and thus increasing consumer lag?
> >
> > Which metric can tell me that I should assign more memory to the brokers?
>


Re: kafka metric to monitor for consumer FETCH using disk caching and not going to disk

2021-05-16 Thread Alexandre Dupriez
Not that I know of - but others may advise otherwise.
The change from KIP-551 is fairly self-contained, though, and should
be straightforward to backport.
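
In the meantime, on 2.5.0 you can approximate the same signal at the
operating system level, for example as sketched below, assuming Linux
with the sysstat tools installed (replace the PID with your broker's):

  # Device-level read throughput, refreshed every 5 seconds
  iostat -xd 5

  # Per-process disk reads for the broker process
  pidstat -d -p <kafka-broker-pid> 5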

Thanks,
Alexandre

On Sun, 16 May 2021 at 14:51, Pushkar Deole wrote:
>
> Thanks Alexandre... we are currently using Kafka 2.5.0, so is there any
> metric that can be used from 2.5.0?
>
> On Sun, May 16, 2021 at 6:02 PM Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
> > Hi Pushkar,
> >
> > If you are using Linux and Kafka 2.6.0+, the closest metric to what
> > you are looking for is TotalDiskReadBytes [1], which measures data
> > transfer at the block layer.
> > Assuming your consumers are doing tail reads and there is no other
> > activity which requires loading pages from the disk on your system
> > (including log compaction from Kafka), you can determine whether you
> > are actually hitting the disk or not.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-551%3A+Expose+disk+read+and+write+metrics
> >
> > Thanks,
> > Alexandre
> >
> > On Sat, 15 May 2021 at 05:49, Pushkar Deole wrote:
> > >
> > > Hi All,
> > >
> > > Is there any metric that I can use to check whether the memory allocated
> > > for Kafka is sufficient for the given load on the brokers, and whether
> > > Kafka is making optimal use of the page cache, so that consumer fetch
> > > reads do not go to disk on every read, slowing down overall consumer
> > > processing and thus increasing consumer lag?
> > >
> > > Which metric can tell me that I should assign more memory to the brokers?
> >