Why is compression disabled by default?

2017-12-10 Thread Dmitry Minkovsky
This is hopefully my final question for a while.

I noticed that compression is disabled by default. Why is this? My best
guess is that compression doesn't work well for short messages, which was
perhaps identified as the majority use case for Kafka. But producers batch
records based on buffer size and linger, and in my understanding the whole
record batch is compressed together.

So, what's the deal? Should I turn on compression in production? Does it
depend on my anticipated batch size? I am using Kafka Streams with a very
low linger, so most of my batches will likely be very small.
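For reference, here is roughly how I would turn it on for the Streams-internal
producer (just a sketch; the application id, bootstrap servers, linger value,
and the lz4 choice are my own placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class CompressionConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // Compression is applied to whole record batches by the producer.
            props.put(StreamsConfig.producerPrefix(ProducerConfig.COMPRESSION_TYPE_CONFIG), "lz4");
            // A low linger keeps batches small, which reduces the compression benefit.
            props.put(StreamsConfig.producerPrefix(ProducerConfig.LINGER_MS_CONFIG), "5");
            System.out.println(props);
        }
    }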

Thank you!


Re: How can I repartition/rebalance topics processed by a Kafka Streams topology?

2017-12-10 Thread Dmitry Minkovsky
Matthias,

Thank you for your detailed response.

Yes—of course I can use the record timestamp when copying from topic to
topic. For some reason that always slips my mind.

> This will always be computed correctly, even if both records are not in
> the buffer at the same time :)

This is music to my ears! I will review the blog post you sent.

Thank you again. And for your work on this incredible software.

Dmitry




On Sat, Dec 9, 2017 at 7:52 PM, Matthias J. Sax 
wrote:

> About timestamps: embedding timestamps in the payload itself is not
> really necessary IMHO. Each record has a metadata timestamp that provides
> the exact same semantics. If you just copy data from one topic to
> another, the timestamp can be preserved (using a plain consumer/producer
> and setting the timestamp of the input record explicitly as the timestamp
> for the output record -- for Streams, it could be that "some" timestamps
> get altered, as we apply slightly different timestamp inference
> logic -- but there are plans to improve this with better inference that
> would preserve the timestamp exactly in Streams, too).
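(A minimal sketch of such a timestamp-preserving copy with a plain consumer and
producer; the topic names, group id, and String serdes are placeholders, not
anything from this thread:)

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TimestampPreservingCopy {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "topic-copier");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(Collections.singletonList("old-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        // Carry the input record's timestamp over to the output record.
                        producer.send(new ProducerRecord<>("new-topic", null,
                                record.timestamp(), record.key(), record.value()));
                    }
                }
            }
        }
    }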
>
> With regard to flow control: it depends on the operators you use. Some
> are fully deterministic, others have some runtime dependencies. Fully
> deterministic are all aggregations (non-windowed and windowed), as well
> as the inner KStream-KStream join and all variants (inner/left/outer) of the
> KTable-KTable join.
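(For illustration only, a sketch of one of the deterministic operations named
here, a windowed inner KStream-KStream join; the topic names, serdes, and the
5-minute window are placeholders:)

    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.Joined;
    import org.apache.kafka.streams.kstream.KStream;

    public class InnerJoinSketch {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            // Hypothetical input topics, both keyed by String.
            KStream<String, String> left = builder.stream("left-topic");
            KStream<String, String> right = builder.stream("right-topic");

            // Windowed inner KStream-KStream join: pairs records whose
            // timestamps are within 5 minutes of each other.
            KStream<String, String> joined = left.join(
                    right,
                    (l, r) -> l + "|" + r,
                    JoinWindows.of(TimeUnit.MINUTES.toMillis(5)),
                    Joined.with(Serdes.String(), Serdes.String(), Serdes.String()));

            joined.to("joined-topic");
        }
    }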
>
> > If the consumer reads P2 before P1, will the task still
> > properly align these two records given their timestamps for the correct
> > inner join, assuming both records are within the record buffer?
>
> This will always be computed correctly, even if both records are not in
> the buffer at the same time :)
>
>
> Thus, only the left/outer KStream-KStream joins and the KStream-KTable join
> have some runtime dependencies. For more details about joins, check out this blog
> post: https://www.confluent.io/blog/crossing-streams-joins-apache-kafka/
>
> Btw: we are aware of some weaknesses in the current implementation, and
> it's on our roadmap to strengthen our guarantees, also with regard to
> the internally used record buffer, time management in general, and
> operator semantics.
>
> Note though: Kafka guarantees offset-based ordering, not
> timestamp-based ordering, and thus Kafka Streams also processes records
> in offset order. This implies that records might be out of order with
> regard to their timestamps, but our operators are implemented to handle
> this case correctly (minus some known issues, as mentioned above, that we
> are going to fix in future releases).
>
>
> Stateless: I mean a program that only uses stateless
> operators like filter/map, but no aggregations/joins.
>
>
>
> -Matthias
>
>
> On 12/9/17 11:59 AM, Dmitry Minkovsky wrote:
> >> How large is the record buffer? Is it configurable?
> >
> > I seem to have just discovered the answer to this:
> > buffered.records.per.partition
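(For reference, a minimal sketch of setting that config via StreamsConfig; the
application id, servers, and the value 2000 are placeholders:)

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class BufferSizeConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // Maximum number of records buffered per input partition (default 1000).
            props.put(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, 2000);
            System.out.println(new StreamsConfig(props).getInt(
                    StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG));
        }
    }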
> >
> > On Sat, Dec 9, 2017 at 2:48 PM, Dmitry Minkovsky 
> > wrote:
> >
> >> Hi Matthias, yes that definitely helps. A few thoughts inline below.
> >>
> >> Thank you!
> >>
> >> On Fri, Dec 8, 2017 at 4:21 PM, Matthias J. Sax 
> >> wrote:
> >>
> >>> Hard to give a generic answer.
> >>>
> >>> 1. We recommend over-partitioning your input topics to start with (to
> >>> avoid having to add new partitions later on); problem avoidance
> >>> is the best strategy. There is obviously some overhead for this on
> >>> the broker side, but it's not too big.
> >>>
> >>
> >> Yes, I will definitely be doing this.
> >>
> >>
> >>>
> >>> 2. Not sure why you would need a new cluster? You can just create a new
> >>> topic in the same cluster and let Kafka Streams read from there.
> >>>
> >>
> >> Motivated by fear of disturbing/manipulating a production cluster and the
> >> relative ease of putting up a new cluster. Perhaps that fear is irrational.
> >> I could alternatively just prefix topics.
> >>
> >>
> >>>
> >>> 3. Depending on your state requirements, you could also run two
> >>> applications in parallel -- the new one reads from the new input topic
> >>> with more partitions, and you configure your producer to write to the new
> >>> topic (or maybe even to dual-write to both). Once your new application
> >>> is ramped up, you can stop the old one.
> >>>
> >>
> >> Yes, this is my plan for migrations. If I could run it past you:
> >>
> >> (i) Copy the input topics from the old prefix to the new prefix.
> >> (ii) Start the new Kafka Streams application against the new prefix.
> >> (iii) When the two applications are in sync, stop writing to the old
> >> topics.
> >>
> >> Since I will be copying from the old prefix to the new prefix, it seems
> >> essential here to have timestamps embedded in the data records, along
> >> with a custom timestamp extractor.
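(A minimal sketch of such a custom extractor, assuming a hypothetical payload
type that carries its own event timestamp; none of these names come from the
thread:)

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.streams.processor.TimestampExtractor;

    public class PayloadTimestampExtractor implements TimestampExtractor {

        // Placeholder payload type with an embedded event timestamp.
        public interface TimestampedEvent {
            long eventTimestamp();
        }

        @Override
        public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
            Object value = record.value();
            if (value instanceof TimestampedEvent) {
                // Use the timestamp embedded in the payload.
                return ((TimestampedEvent) value).eventTimestamp();
            }
            // Fall back to the record's metadata timestamp.
            return record.timestamp();
        }
    }

    // Registered via (sketch):
    // props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
    //           PayloadTimestampExtractor.class);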
> >>
> >> I really wish I could get some more flavor on "Flow Control With
> >> Timestamps
> >> 

Re: Consumer Offsets Being Deleted by Broker

2017-12-10 Thread Xin Li
Hey,

I think what you're looking for is offsets.retention.minutes in the server-side
configuration.
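For example, in the broker's server.properties (a sketch; the value below is
just an illustration, and the default in 1.0 is 1440 minutes, i.e. 24 hours):

    # server.properties (broker side)
    # How long committed offsets are retained before the broker deletes them.
    offsets.retention.minutes=10080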

Best,
Xin

On 10.12.17, 07:13, "M. Musheer"  wrote:

Hi,
We are using "kafka_2.11-1.0.0" kafka version with default "offset" related
configurations.
Issue:
Consumer offsets are being deleted and we are not using auto commits at
consumer side.
Is there any configuration we need to add for consumer offset retention ??

Please help us.



Thanks,
Musheer




Re: Consumer Offsets Being Deleted by Broker

2017-12-10 Thread Saïd Bouras
Hi M.Musheer,

Can you share some code? It's difficult to tell without it.
Are you using synchronous or asynchronous commits?

Thanks,

Best regards

On Sun, Dec 10, 2017 at 07:13, M. Musheer  wrote:

> Hi,
> We are using "kafka_2.11-1.0.0" kafka version with default "offset" related
> configurations.
> Issue:
> Consumer offsets are being deleted and we are not using auto commits at
> consumer side.
> Is there any configuration we need to add for consumer offset retention ??
>
> Please help us.
>
>
>
> Thanks,
> Musheer
>
-- 

Saïd BOURAS

Consultant Big Data
Mobile: 0662988731
Zenika Paris
10 rue de Milan 75009 Paris
Standard : +33(0)1 45 26 19 15 - Fax : +33(0)1 72 70 45 10