Doing something wrong

2019-05-08 Thread Pavel Molchanov
Hi,

I created a simple Spring Boot application with Kafka and added a
dependency on Kafka Streams. The application can send and receive
messages and works fine.

In the same app, I want to use Kafka Streams to calculate statistics about
the topic.

I wrote the following code from the word count example:

import java.util.Arrays;
import java.util.Properties;
import java.util.regex.Pattern;

import javax.annotation.PostConstruct;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class StartupService {

    @Value(value = "${kafka.events.topic.name}")
    private String eventsTopicName;

    @Value(value = "${kafka.bootstrap.servers}")
    private String bootstrapAddress;

    @PostConstruct
    public void init() {
        Properties streamsConfiguration = new Properties();
        streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
        streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                Serdes.String().getClass().getName());
        streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                Serdes.String().getClass().getName());
        streamsConfiguration.put(StreamsConfig.STATE_DIR_CONFIG,
                System.getProperty("java.io.tmpdir") + "\\count");

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> textLines = builder.stream(eventsTopicName);
        Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

        // Split each value into lowercased words, group by word and count.
        KTable<String, Long> wordCounts = textLines
                .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
                .groupBy((key, word) -> word)
                .count();

        Topology topology = builder.build();
        KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.setUncaughtExceptionHandler((Thread thread, Throwable throwable) ->
                System.out.println("Exception in thread: " + thread.getId()
                        + ", Message: " + throwable.getMessage()));
        streams.cleanUp();
        streams.start();
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // Check the result here:
    }
}

How can I check the result of the calculation? The last lines in the log
file are:

2019-05-08 19:19:36.091  INFO 18380 --- [-StreamThread-1]
o.a.k.c.c.internals.AbstractCoordinator  : [Consumer
clientId=test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1-consumer,
groupId=test] (Re-)joining group
2019-05-08 19:19:36.163  INFO 18380 --- [-StreamThread-1]
o.a.k.s.p.i.StreamsPartitionAssignor : stream-thread
[test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1-consumer]
Assigned tasks to clients as
{a2d0314e-c872-469e-b34c-c08e2fdf422a=[activeTasks: ([0_0, 1_0])
standbyTasks: ([]) assignedTasks: ([0_0, 1_0]) prevActiveTasks: ([])
prevStandbyTasks: ([]) prevAssignedTasks: ([]) capacity: 1]}.
2019-05-08 19:19:36.205  INFO 18380 --- [-StreamThread-1]
o.a.k.c.c.internals.AbstractCoordinator  : [Consumer
clientId=test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1-consumer,
groupId=test] Successfully joined group with generation 21
2019-05-08 19:19:36.209  INFO 18380 --- [-StreamThread-1]
o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer
clientId=test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1-consumer,
groupId=test] Setting newly assigned partitions [events-0,
test-KSTREAM-AGGREGATE-STATE-STORE-03-repartition-0]
2019-05-08 19:19:36.209  INFO 18380 --- [-StreamThread-1]
o.a.k.s.p.internals.StreamThread : stream-thread
[test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1] State transition
from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
2019-05-08 19:19:36.234  INFO 18380 --- [-StreamThread-1]
o.a.k.s.p.internals.StreamThread : stream-thread
[test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1] partition
assignment took 25 ms.
current active tasks: [0_0, 1_0]
current standby tasks: []
previous active tasks: []

2019-05-08 19:19:36.635  INFO 18380 --- [-StreamThread-1]
org.apache.kafka.clients.Metadata: Cluster ID:
I6GfqZSORWGKqPHE7zA_cQ
2019-05-08 19:19:36.659  INFO 18380 --- [-StreamThread-1]
o.a.k.s.p.i.StoreChangelogReader : stream-thread
[test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1] Restoring task
1_0's state store KSTREAM-AGGREGATE-STATE-STORE-03 from beginning
of the changelog test-KSTREAM-AGGREGATE-STATE-STORE-03-changelog-0
2019-05-08 19:19:36.668  INFO 18380 --- [-StreamThread-1]
o.a.k.c.consumer.internals.Fetcher   : [Consumer
clientId=test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1-restore-consumer,
groupId=] Resetting offset for partition
test-KSTREAM-AGGREGATE-STATE-STORE-03-changelog-0 to offset 0.
2019-05-08 19:19:36.852  INFO 18380 --- [-StreamThread-1]
o.a.k.s.p.internals.StreamThread : stream-thread
[test-a2d0314e-c872-469e-b34c-c08e2fdf422a-StreamThread-1] State transition
from PARTITIONS_ASSIGNED to RUNNING
2019-05-08 19:19:36.853  INFO 18380 --- [-StreamThread-1]
org.apache.kafka.streams.KafkaStreams: stream-client
[test-a2d0314e-c872-469e-b34c-c08e2fdf422a] State transition from
REBALANCING to RUNNING

It's staying in the RUNNING state after that.
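
(For reference, one way to check the result of such a count is to give the
state store a name, e.g. .count(Materialized.as("word-counts")) instead of
the plain count() above, and read it with interactive queries once the
application reaches RUNNING. A minimal sketch; the store name "word-counts"
and this helper class are assumptions for illustration, not part of the
original post:)

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class WordCountReader {

    // Prints the current contents of the "word-counts" store. Assumes the
    // count() was materialized as "word-counts" and that `streams` has
    // reached RUNNING (otherwise store() throws InvalidStateStoreException).
    static void printCounts(KafkaStreams streams) {
        ReadOnlyKeyValueStore<String, Long> store =
                streams.store("word-counts", QueryableStoreTypes.keyValueStore());
        try (KeyValueIterator<String, Long> all = store.all()) {
            while (all.hasNext()) {
                KeyValue<String, Long> entry = all.next();
                System.out.println(entry.key + " -> " + entry.value);
            }
        }
    }
}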

Re: Event Sourcing question

2019-05-08 Thread Pavel Molchanov
Ryanne,

Thank you for the advice. It's exactly what we have now. Different topics
for different processing steps.

However, I would like to build an extensible architecture with multiple
processing steps.

We will introduce other transformers later on. I was thinking that if all
of them send events on the same bus, it will be easier to replay the
event sequence later.

Say, I will have 4 other steps later on in the pipeline. Should I create 4
more topics? Or should all other step transformers listen to the same bus
and pick up events they need?

*Pavel Molchanov*

On Wed, May 8, 2019 at 3:45 PM Ryanne Dolan  wrote:

> Pavel, one thing I'd recommend: don't jam multiple event types into a
> single topic. You are better served with multiple topics, each with a
> single schema and event type. In your case, you might have a received topic
> and a transformed topic, with an app consuming received and producing
> transformed.
>
> If your transformer process consumes, produces, and commits in the right
> order, your app can crash and restart without skipping records. Consider
> using Kafka Streams for this purpose, as it takes care of the semantics you
> need to do this correctly.
>
> Ryanne
>
> On Wed, May 8, 2019 at 12:06 PM Pavel Molchanov <
> pavel.molcha...@infodesk.com> wrote:
>
> > I have an architectural question.
> >
> > I am planning to create a data transformation pipeline for document
> > transformation. Each component will send processing events to the Kafka
> > 'events' topic.
> >
> > It will have the following steps:
> >
> > 1) Upload data to the repository (S3 or other storage). Get public URL to
> > the uploaded document. Create 'received' event with the document URL and
> > send the event to the Kafka 'events' topic.
> >
> > 2) Tranformer process will be listening to the Kafka 'events' topic. It
> > will react on the 'received' event in the 'events' topic, will download
> the
> > document, transform it, push the transformed document to the repository
> (S3
> > or other storage), create 'transformed' event and send 'transformed'
> event
> > to the same 'events' topic.
> >
> > Tranformer process can break in the middle (exception, died, crashed,
> > etc.). Upon startup, Tranformer process needs to check 'events' topic for
> > documents that were received but not transformed.
> >
> > Should it read all events from the 'events' topic? Should it join
> > 'received' and 'transformed' events somehow to understand what was
> received
> > but not transformed?
> >
> > I don't have a clear idea of how it should behave.
> >
> > Please help.
> >
> > *Pavel Molchanov*
> >
>


Re: Event Sourcing question

2019-05-08 Thread Raman Gupta
If ordering of these events is important, then putting them in the
same topic is not only desired, it's necessary. See
https://www.confluent.io/blog/put-several-event-types-kafka-topic/.
However, think hard about whether ordering is actually important in
your use case or not, as things are certainly simpler when a topic
contains a single message type.

To the original question: your transformer process can be in a stream
as Ryanne suggests, which should take care of most crash situations --
Kafka won't advance the stream consumption offset unless your
transformer completes successfully. Make the transformer idempotent,
i.e. if the transformed record already exists in S3, it should just
overwrite it. If you do put both types of events in the same topic,
then the transformer stream can skip "transformed" events and just
execute its transformation on "received" events.
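
A minimal sketch of that transformer as a Streams topology, assuming (purely
for illustration) that events are strings of the form "received:<url>" or
"transformed:<url>" keyed by document id; transformAndUpload() is a
placeholder, not a real API:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class TransformerTopology {

    static void wire(StreamsBuilder builder) {
        KStream<String, String> events = builder.stream("events");

        events
            // Skip 'transformed' events; only act on 'received' ones, so a
            // replay after a crash just redoes idempotent work.
            .filter((docId, value) -> value.startsWith("received:"))
            // Download, transform and (re-)upload the document, then emit
            // the follow-up event back to the same topic.
            .mapValues(value -> "transformed:"
                    + transformAndUpload(value.substring("received:".length())))
            .to("events");
    }

    static String transformAndUpload(String documentUrl) {
        // Placeholder for: download documentUrl, transform it, overwrite the
        // object in S3, return the URL of the transformed document.
        return documentUrl;
    }
}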

Think hard about how you want to handle failures in the transformer
stream. Some failures are unrecoverable, and you don't want the stream
process to die if they occur, otherwise it won't be able to make
progress past the failing message. For these cases, you'll likely want
to send the data to a dead-letter queue or something similar, so you
can examine why the failure occurred, determine if it is fixable, and
reprocess the event if it is. In our case we actually don't send the
original data to the DLQ, but rather just the meta-data of the failing
message (topic, partition, offset) and error information. For
temporary failures, e.g. S3 being unavailable or OutOfMemory errors,
it's ok for the stream to die, so that the process will start
over at the current offset and retry. This is a great intro to DLQs
with Kafka: https://eng.uber.com/reliable-reprocessing/.

Finally, if you need an aggregated "status" view, you can use KTables
to aggregate all the event information together.
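
A small sketch of such an aggregated status view, keeping the latest event
per document id in a table; the event encoding and the store name
"doc-status" are assumptions for illustration:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class StatusView {

    // "Last event wins" status table: for each document id (the record key)
    // keep the most recent event, so a document whose latest event is still
    // "received" has not been transformed yet.
    static KTable<String, String> build(StreamsBuilder builder) {
        return builder.<String, String>stream("events")
                .groupByKey()
                .reduce((previous, latest) -> latest, Materialized.as("doc-status"));
    }
}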

Regards,
Raman

On Wed, May 8, 2019 at 3:44 PM Ryanne Dolan  wrote:
>
> Pavel, one thing I'd recommend: don't jam multiple event types into a
> single topic. You are better served with multiple topics, each with a
> single schema and event type. In your case, you might have a received topic
> and a transformed topic, with an app consuming received and producing
> transformed.
>
> If your transformer process consumes, produces, and commits in the right
> order, your app can crash and restart without skipping records. Consider
> using Kafka Streams for this purpose, as it takes care of the semantics you
> need to do this correctly.
>
> Ryanne
>
> On Wed, May 8, 2019 at 12:06 PM Pavel Molchanov <
> pavel.molcha...@infodesk.com> wrote:
>
> > I have an architectural question.
> >
> > I am planning to create a data transformation pipeline for document
> > transformation. Each component will send processing events to the Kafka
> > 'events' topic.
> >
> > It will have the following steps:
> >
> > 1) Upload data to the repository (S3 or other storage). Get public URL to
> > the uploaded document. Create 'received' event with the document URL and
> > send the event to the Kafka 'events' topic.
> >
> > 2) Tranformer process will be listening to the Kafka 'events' topic. It
> > will react on the 'received' event in the 'events' topic, will download the
> > document, transform it, push the transformed document to the repository (S3
> > or other storage), create 'transformed' event and send 'transformed' event
> > to the same 'events' topic.
> >
> > Tranformer process can break in the middle (exception, died, crashed,
> > etc.). Upon startup, Tranformer process needs to check 'events' topic for
> > documents that were received but not transformed.
> >
> > Should it read all events from the 'events' topic? Should it join
> > 'received' and 'transformed' events somehow to understand what was received
> > but not transformed?
> >
> > I don't have a clear idea of how it should behave.
> >
> > Please help.
> >
> > *Pavel Molchanov*
> >


[VOTE] 2.2.1 RC0

2019-05-08 Thread Vahid Hashemian
Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 2.2.1, which
includes many bug fixes for Apache Kafka 2.2.

Release notes for the 2.2.1 release:
https://home.apache.org/~vahid/kafka-2.2.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Monday, May 13, 6:00 pm PT.

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~vahid/kafka-2.2.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~vahid/kafka-2.2.1-rc0/javadoc/

* Tag to be voted upon (off 2.2 branch) is the 2.2.1 tag:
https://github.com/apache/kafka/releases/tag/2.2.1-rc0

* Documentation:
https://kafka.apache.org/22/documentation.html

* Protocol:
https://kafka.apache.org/22/protocol.html

* Successful Jenkins builds for the 2.2 branch:
Unit/integration tests: https://builds.apache.org/job/kafka-2.2-jdk8/106/

Thanks,
--Vahid


Re: Event Sourcing question

2019-05-08 Thread Ryanne Dolan
Pavel, one thing I'd recommend: don't jam multiple event types into a
single topic. You are better served with multiple topics, each with a
single schema and event type. In your case, you might have a received topic
and a transformed topic, with an app consuming received and producing
transformed.

If your transformer process consumes, produces, and commits in the right
order, your app can crash and restart without skipping records. Consider
using Kafka Streams for this purpose, as it takes care of the semantics you
need to do this correctly.

Ryanne

On Wed, May 8, 2019 at 12:06 PM Pavel Molchanov <
pavel.molcha...@infodesk.com> wrote:

> I have an architectural question.
>
> I am planning to create a data transformation pipeline for document
> transformation. Each component will send processing events to the Kafka
> 'events' topic.
>
> It will have the following steps:
>
> 1) Upload data to the repository (S3 or other storage). Get public URL to
> the uploaded document. Create 'received' event with the document URL and
> send the event to the Kafka 'events' topic.
>
> 2) Tranformer process will be listening to the Kafka 'events' topic. It
> will react on the 'received' event in the 'events' topic, will download the
> document, transform it, push the transformed document to the repository (S3
> or other storage), create 'transformed' event and send 'transformed' event
> to the same 'events' topic.
>
> Tranformer process can break in the middle (exception, died, crashed,
> etc.). Upon startup, Tranformer process needs to check 'events' topic for
> documents that were received but not transformed.
>
> Should it read all events from the 'events' topic? Should it join
> 'received' and 'transformed' events somehow to understand what was received
> but not transformed?
>
> I don't have a clear idea of how it should behave.
>
> Please help.
>
> *Pavel Molchanov*
>


Re: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

2019-05-08 Thread Jose Lopez
Hi John,

Thank you for your reply. Indeed, your explanation makes sense to me. Thank
you again for taking the time to reply.

Regards,
Jose

On Tue, 30 Apr 2019 at 22:22, John Roesler  wrote:

> Hi Ashok,
>
> I think some people may be able to give you advice, but please start a new
> thread instead of replying to an existing message. This just helps keep all
> the messages organized.
>
> Thanks!
> -John
>
> On Thu, Apr 25, 2019 at 6:12 AM ASHOK MACHERLA 
> wrote:
>
> > Hii,
> >
> > what I asking
> >
> > I want to know about kafka partitions
> >
> >
> > we have getting data about 200GB+ from sources to kafka for daily .
> >
> > I need to know how many partitions are required to pull data from source
> > without pileup.
> >
> > please suggest us to fix this issue.
> >
> > is there any mathematical rules to create specific no.of partitions for
> > Topic.???
> >
> >
> > please help me
> >
> > Sent from Outlook
> > 
> > From: Jose Lopez 
> > Sent: 25 April 2019 16:34
> > To: users@kafka.apache.org
> > Subject: [Streams] TimeWindows ignores gracePeriodMs in
> > windowsFor(timestamp)
> >
> > Hi all,
> >
> > Given that gracePeriodMs is "the time to admit late-arriving events after
> > the end of the window", I'd expect it is taken into account in
> > windowsFor(timestamp). E.g.:
> >
> > sizeMs = 5
> > gracePeriodMs = 2
> > advanceMs = 3
> > timestamp = 6
> >
> > | window | windowStart | windowEnd | windowEnd + gracePeriod |
> > | 1      | 0           | 5         | 7                       |
> > | 2      | 5           | 10        | 12                      |
> > ...
> >
> > Current output:
> > windowsFor(timestamp) returns window 2 only.
> >
> > Expected output:
> > windowsFor(timestamp) returns both window 1 and window 2
> >
> > Do you agree with the expected output? Am I missing something?
> >
> > Regards,
> > Jose
> >
>


Event Sourcing question

2019-05-08 Thread Pavel Molchanov
I have an architectural question.

I am planning to create a data transformation pipeline for document
transformation. Each component will send processing events to the Kafka
'events' topic.

It will have the following steps:

1) Upload data to the repository (S3 or other storage). Get public URL to
the uploaded document. Create 'received' event with the document URL and
send the event to the Kafka 'events' topic.

2) The Transformer process will listen to the Kafka 'events' topic. It
will react to a 'received' event in the 'events' topic, download the
document, transform it, push the transformed document to the repository (S3
or other storage), create a 'transformed' event and send it
to the same 'events' topic.

The Transformer process can break in the middle (exception, crash, etc.).
Upon startup, the Transformer process needs to check the 'events' topic for
documents that were received but not transformed.

Should it read all events from the 'events' topic? Should it join
'received' and 'transformed' events somehow to understand what was received
but not transformed?

I don't have a clear idea of how it should behave.

Please help.

*Pavel Molchanov*


Re: Using processor API via DSL

2019-05-08 Thread Alessandro Tagliapietra
Hi Bruno,

No worries.
No, that was an old problem; the latest code is in the gist from the last
email.
Anyway, I've pushed the master branch with the same code. I'm not sure I've
done the right thing with the jars, but the code should be there.
The gist https://gist.github.com/alex88/43b72e23bda9e15657b008855e1904db is
the one with most information and the logs on what I was seeing.

Thank you for your help

--
Alessandro Tagliapietra


On Wed, May 8, 2019 at 2:18 AM Bruno Cadonna  wrote:

> Hi Alessandro,
>
> Apologies for the late reply.
>
> I tried the code from your repository under
> https://github.com/alex88/kafka-test/tree/master and I run into a
> `ClassCastException`. I think this is a bug that is described here
> https://issues.apache.org/jira/browse/KAFKA-8317 .
>
> Should I have tried one of the other branches?
>
> Best regards,
> Bruno
>
> On Fri, May 3, 2019 at 9:33 AM Alessandro Tagliapietra <
> tagliapietra.alessan...@gmail.com> wrote:
>
> > Ok so I'm not sure if I did this correctly,
> >
> > I've upgraded both the server (by replacing the JARs in the confluent
> > docker image with those built from kafka source) and the client (by using
> > the built JARs as local file dependencies).
> > I've used this as source:
> https://github.com/apache/kafka/archive/2.2.zip
> > When the server runs it prints:
> >
> > INFO Kafka version: 2.2.1-SNAPSHOT
> > (org.apache.kafka.common.utils.AppInfoParser).
> >
> > and regarding the client I don't see any kafka jars in the "External
> > libraries" of the IntelliJ project tab so I think it's using the local
> JARs
> > (2.2.1-SNAPSHOT).
> >
> > The problem is that the window isn't keeping the old values and still
> emits
> > values with partially processed intervals.
> >
> > Just to summarize:
> > https://gist.github.com/alex88/43b72e23bda9e15657b008855e1904db
> >
> >  - consumer emits one message per second with production = 1
> >  - windowing stream should emit one message each 10 seconds with the sum
> of
> > productions (so production = 10)
> >
> > If I restart the stream processor, it emits window functions with partial
> > data (production < 10) as you can see from the logs.
> > I've checked the JAR file and it seems to include changes from
> > https://github.com/apache/kafka/pull/6623 (it has the newly
> > added FixedOrderMap class)
> >
> > Even after removing the suppress() the error seems to persist (look at
> > consumer_nosuppress), here it seems it loses track of the contents of the
> > window:
> >
> > S1 with computed metric {"timestamp": 5, "production": 10}
> > S1 with computed metric {"timestamp": 6, "production": 1}
> > S1 with computed metric {"timestamp": 6, "production": 2}
> > S1 with computed metric {"timestamp": 6, "production": 3}
> > S1 with computed metric {"timestamp": 6, "production": 4}
> > -- RESTART --
> > S1 with computed metric {"timestamp": 6, "production": 1}
> > S1 with computed metric {"timestamp": 6, "production": 2}
> > S1 with computed metric {"timestamp": 6, "production": 3}
> > S1 with computed metric {"timestamp": 6, "production": 4}
> > S1 with computed metric {"timestamp": 6, "production": 5}
> > S1 with computed metric {"timestamp": 6, "production": 6}
> > S1 with computed metric {"timestamp": 7, "production": 1}
> >
> > after restart during the 60 seconds window the sum restarts.
> >
> > Is it something wrong with my implementation?
> >
> > --
> > Alessandro Tagliapietra
> >
> > On Thu, May 2, 2019 at 7:58 PM Alessandro Tagliapietra <
> > tagliapietra.alessan...@gmail.com> wrote:
> >
> > > Hi Bruno,
> > >
> > > thank you for your help, glad to hear that those are only bugs and not
> a
> > > problem on my implementation,
> > > I'm currently using confluent docker images, I've checked their master
> > > branch which seems to use the SNAPSHOT version however those
> > > images/packages aren't publicly available. Are there any snapshot
> builds
> > > available?
> > > In the meantime I'm trying to create a custom docker image from kafka
> > > source.
> > >
> > > Thanks
> > >
> > > --
> > > Alessandro Tagliapietra
> > >
> > > On Tue, Apr 23, 2019 at 8:52 AM Bruno Cadonna 
> > wrote:
> > >
> > >> Hi Alessandro,
> > >>
> > >> It seems that the behaviour you described regarding the window
> > aggregation
> > >> is due to bugs. The good news is that the bugs have been already
> fixed.
> > >>
> > >> The relevant bug reports are
> > >> https://issues.apache.org/jira/browse/KAFKA-7895
> > >> https://issues.apache.org/jira/browse/KAFKA-8204
> > >>
> > >> The fixes for both bugs have been already merged to the 2.2 branch.
> > >>
> > >> Could you please build from the 2.2 branch and confirm that the fixes
> > >> solve
> > >> your problem?
> > >>
> > >> Best,
> > >> Bruno
> > >>
> > >>
> > >> On Sat, Apr 20, 2019 at 2:16 PM Alessandro Tagliapietra <
> > >> tagliapietra.alessan...@gmail.com> wrote:
> > >>
> > >> > Thanks Matthias, one less thing to worry about in the future :)
> > >> >

Re: Using processor API via DSL

2019-05-08 Thread Bruno Cadonna
Hi Alessandro,

Apologies for the late reply.

I tried the code from your repository under
https://github.com/alex88/kafka-test/tree/master and I run into a
`ClassCastException`. I think this is a bug that is described here
https://issues.apache.org/jira/browse/KAFKA-8317 .

Should I have tried one of the other branches?

Best regards,
Bruno

On Fri, May 3, 2019 at 9:33 AM Alessandro Tagliapietra <
tagliapietra.alessan...@gmail.com> wrote:

> Ok so I'm not sure if I did this correctly,
>
> I've upgraded both the server (by replacing the JARs in the confluent
> docker image with those built from kafka source) and the client (by using
> the built JARs as local file dependencies).
> I've used this as source: https://github.com/apache/kafka/archive/2.2.zip
> When the server runs it prints:
>
> INFO Kafka version: 2.2.1-SNAPSHOT
> (org.apache.kafka.common.utils.AppInfoParser).
>
> and regarding the client I don't see any kafka jars in the "External
> libraries" of the IntelliJ project tab so I think it's using the local JARs
> (2.2.1-SNAPSHOT).
>
> The problem is that the window isn't keeping the old values and still emits
> values with partially processed intervals.
>
> Just to summarize:
> https://gist.github.com/alex88/43b72e23bda9e15657b008855e1904db
>
>  - consumer emits one message per second with production = 1
>  - windowing stream should emit one message each 10 seconds with the sum of
> productions (so production = 10)
>
> If I restart the stream processor, it emits window functions with partial
> data (production < 10) as you can see from the logs.
> I've checked the JAR file and it seems to include changes from
> https://github.com/apache/kafka/pull/6623 (it has the newly
> added FixedOrderMap class)
>
> Even after removing the suppress() the error seems to persist (look at
> consumer_nosuppress), here it seems it loses track of the contents of the
> window:
>
> S1 with computed metric {"timestamp": 5, "production": 10}
> S1 with computed metric {"timestamp": 6, "production": 1}
> S1 with computed metric {"timestamp": 6, "production": 2}
> S1 with computed metric {"timestamp": 6, "production": 3}
> S1 with computed metric {"timestamp": 6, "production": 4}
> -- RESTART --
> S1 with computed metric {"timestamp": 6, "production": 1}
> S1 with computed metric {"timestamp": 6, "production": 2}
> S1 with computed metric {"timestamp": 6, "production": 3}
> S1 with computed metric {"timestamp": 6, "production": 4}
> S1 with computed metric {"timestamp": 6, "production": 5}
> S1 with computed metric {"timestamp": 6, "production": 6}
> S1 with computed metric {"timestamp": 7, "production": 1}
>
> after restart during the 60 seconds window the sum restarts.
>
> Is it something wrong with my implementation?
>
> --
> Alessandro Tagliapietra
>
> On Thu, May 2, 2019 at 7:58 PM Alessandro Tagliapietra <
> tagliapietra.alessan...@gmail.com> wrote:
>
> > Hi Bruno,
> >
> > thank you for your help, glad to hear that those are only bugs and not a
> > problem on my implementation,
> > I'm currently using confluent docker images, I've checked their master
> > branch which seems to use the SNAPSHOT version however those
> > images/packages aren't publicly available. Are there any snapshot builds
> > available?
> > In the meantime I'm trying to create a custom docker image from kafka
> > source.
> >
> > Thanks
> >
> > --
> > Alessandro Tagliapietra
> >
> > On Tue, Apr 23, 2019 at 8:52 AM Bruno Cadonna 
> wrote:
> >
> >> Hi Alessandro,
> >>
> >> It seems that the behaviour you described regarding the window
> aggregation
> >> is due to bugs. The good news is that the bugs have been already fixed.
> >>
> >> The relevant bug reports are
> >> https://issues.apache.org/jira/browse/KAFKA-7895
> >> https://issues.apache.org/jira/browse/KAFKA-8204
> >>
> >> The fixes for both bugs have been already merged to the 2.2 branch.
> >>
> >> Could you please build from the 2.2 branch and confirm that the fixes
> >> solve
> >> your problem?
> >>
> >> Best,
> >> Bruno
> >>
> >>
> >> On Sat, Apr 20, 2019 at 2:16 PM Alessandro Tagliapietra <
> >> tagliapietra.alessan...@gmail.com> wrote:
> >>
> >> > Thanks Matthias, one less thing to worry about in the future :)
> >> >
> >> > --
> >> > Alessandro Tagliapietra
> >> >
> >> >
> >> > On Sat, Apr 20, 2019 at 11:23 AM Matthias J. Sax <
> matth...@confluent.io
> >> >
> >> > wrote:
> >> >
> >> > > Just a side note. There is currently work in progress on
> >> > > https://issues.apache.org/jira/browse/KAFKA-3729 that should fix
> the
> >> > > configuration problem for Serdes.
> >> > >
> >> > > -Matthias
> >> > >
> >> > > On 4/19/19 9:12 PM, Alessandro Tagliapietra wrote:
> >> > > > Hi Bruno,
> >> > > > thanks a lot for checking the code, regarding the
> SpecificAvroSerde
> >> > I've
> >> > > > found that using
> >> > > >
> >> > > > final Serde valueSpecificAvroSerde = new
> >> > > SpecificAvroSerde<>();
> >> > > > final Map serdeConfig =
> >> > > > 

Re: Topic Creation two different config(s)

2019-05-08 Thread Nayanjyoti Deka
You can use AdminClient.createTopics with the necessary configuration.
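
For example, a minimal sketch with the Java AdminClient (topic names,
partition/replication counts and the retention values are just
placeholders):

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class TopicCreator {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 24-hour retention for free-tier topics.
            NewTopic freeTier = new NewTopic("events-free", 3, (short) 1)
                    .configs(Collections.singletonMap(
                            TopicConfig.RETENTION_MS_CONFIG,
                            String.valueOf(24L * 60 * 60 * 1000)));
            // 72-hour retention for paid-tier topics.
            NewTopic paidTier = new NewTopic("events-paid", 3, (short) 1)
                    .configs(Collections.singletonMap(
                            TopicConfig.RETENTION_MS_CONFIG,
                            String.valueOf(72L * 60 * 60 * 1000)));

            admin.createTopics(Arrays.asList(freeTier, paidTier)).all().get();
        }
    }
}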

On Wed, May 8, 2019 at 12:08 PM SenthilKumar K 
wrote:

> Hello Experts , We have a requirement to create topic dynamically with two
> different config(s). Is this possible in Kafka ?
>
> Kafka Version : 2.2.0
>
> Topics with different settings:
> #1 - Set retention as 24 hours for free tier customers
> # 2 - Set retention as 72 hours for paid customers
>
> Note : Right now Java Producer is auto creating topics with broker default
> config(s).
>
> --Senthil
>


Topic Creation two different config(s)

2019-05-08 Thread SenthilKumar K
Hello Experts, We have a requirement to create topics dynamically with two
different config(s). Is this possible in Kafka?

Kafka Version : 2.2.0

Topics with different settings:
#1 - Set retention as 24 hours for free tier customers
# 2 - Set retention as 72 hours for paid customers

Note : Right now Java Producer is auto creating topics with broker default
config(s).

--Senthil