Memory Leak in Kafka

2018-01-23 Thread Avinash Herle
Hi,

I'm using Kafka as a messaging system in my data pipeline. I have a couple of
producer processes in my pipeline, with Spark Streaming and Druid's Kafka
indexing service as consumers of Kafka. The indexing service spawns 40 new
indexing tasks (Kafka consumers) every 15 minutes.

The heap memory used by Kafka stays fairly constant for about an hour, after
which it shoots up to the maximum allocated space. The garbage collection logs
seem to indicate a memory leak in Kafka. Please find attached the plots
generated from the GC logs.

*Kafka Deployment:*
3 nodes, with 3 topics and 64 partitions per topic

*Kafka runtime JVM parameters:*
8 GB heap memory
1 GB swap memory
Using G1GC
MaxGCPauseMillis=20
InitiatingHeapOccupancyPercent=35
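
For reference, these flags are typically supplied to the broker via the startup
environment; a rough sketch (the exact invocation is an assumption, not taken
from this cluster):

export KAFKA_HEAP_OPTS="-Xms8g -Xmx8g"
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"
bin/kafka-server-start.sh config/server.properties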

*Kafka Versions Used:*
I've used Kafka versions 0.10.0, 0.11.0.2, and 1.0.0 and see similar behavior with all of them.

*Questions:*
1) Is this a memory leak on the Kafka side or a misconfiguration of my
Kafka cluster? Does Kafka stably handle a large number of consumers being
added periodically?
2) As a knock-on effect, we also notice Kafka partitions going offline
periodically after some time, with the following error:
ERROR [ReplicaFetcherThread-18-2], Error for partition [topic1,2] to
broker 2:*org.apache.kafka.common.errors.UnknownTopicOrPartitionException*:
This server does not host this topic-partition.
(kafka.server.ReplicaFetcherThread)

Can someone shed some light on the behavior being seen in my cluster?

Please let me know if more details are needed to root-cause it.

Thanks in advance.

Avinash
[image: Screen Shot 2018-01-23 at 2.29.04 PM.png][image: Screen Shot
2018-01-23 at 2.29.21 PM.png]




-- 

Excuse brevity and typos. Sent from mobile device.


best practices for replication factor / partitions __consumer_offsets

2018-01-23 Thread Dennis
Hi,

Are there any best practices on how to size __consumer_offsets and how to
choose the associated replication factor?
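
For context, the settings I'm aware of that control this internal topic are
roughly the following broker configs: the partition count (fixed once the topic
has been created), the replication factor used when the topic is first created,
and how long committed offsets are retained. Defaults below are as I understand
them, so please correct me if they are off for recent versions:

offsets.topic.num.partitions=50
offsets.topic.replication.factor=3
offsets.retention.minutes=1440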

Regards,

Dennis O.


Non-contiguous offset for consumer seek

2018-01-23 Thread namesuperwood
Hi all


Kafka version: kafka_2.11-0.11.0.2


A topic-partition "adn-tracking,15" in kafka who's earliest offset is1255644602 
andlatest offset is1271253441. 
While starting a spark streaming to process the data from the topic , we got a 
exception with "Got wrong record  even after seeking to offset 1266921577”. 
[(earliest offset) 1255644602  1266921577   1271253441 ( latest offset ) ]
Iimplemented a simple project to use consumer to seek offset 1266921577. But it 
return the offset1266921578. Then while seek to1266921576, it return 
the1266921576exactly。
Why ? How to fix that ?


Here is the code:
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerDemo {
    public static void main(String[] argv) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "172.31.29.31:9091");
        props.put("group.id", "consumer-tutorial-demo");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);

        // assign the single partition and seek to the offset in question
        TopicPartition tp = new TopicPartition("adn-tracking-click", 15);
        Collection<TopicPartition> collection = new ArrayList<TopicPartition>();
        collection.add(tp);
        consumer.assign(collection);
        consumer.seek(tp, 1266921576);

        // poll and print the offset of the first record returned
        ConsumerRecords<String, String> consumerRecords = consumer.poll(1);
        List<ConsumerRecord<String, String>> listR = consumerRecords.records(tp);
        Iterator<ConsumerRecord<String, String>> iter = listR.iterator();
        ConsumerRecord<String, String> record = iter.next();
        System.out.println(" the next record " + record.offset() + " record topic " + record.topic());
    }
}
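
A minimal way to confirm whether offset 1266921577 is simply absent from the
log (for example, removed by compaction or occupied by an internal marker) is
to seek to it and compare the first returned offset with the requested one; a
sketch reusing the consumer set up in the code above:

        consumer.seek(tp, 1266921577L);
        ConsumerRecords<String, String> afterSeek = consumer.poll(1000);
        for (ConsumerRecord<String, String> r : afterSeek.records(tp)) {
            if (r.offset() > 1266921577L) {
                // the requested offset is not in the log; the fetch skips ahead to r.offset()
                System.out.println("offset 1266921577 is missing, first available is " + r.offset());
            }
            break;
        }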

Re: Memory Leak in Kafka

2018-01-23 Thread Ted Yu
Did you attach two .png files?

Please use a third-party site, since the attachments didn't come through.

On Tue, Jan 23, 2018 at 5:20 PM, Avinash Herle 
wrote:

>
> Hi,
>
> I'm using Kafka as a messaging system in my data pipeline. I've a couple
> of producer processes in my pipeline and Spark Streaming
> 
> and Druid's Kafka indexing service
> 
> as consumers of Kafka. The indexing service spawns 40 new indexing tasks
> (Kafka consumers) every 15 mins.
>
> The heap memory used on Kafka seems fairly constant for an hour after
> which it seems to shoot up to the max allocated space. The garbage
> collection logs of Kafka seems to indicate a memory leak in Kafka. Find
> attached the plots generated from the GC logs.
>
> *Kafka Deployment:*
> 3 nodes, with 3 topics and 64 partitions per topic
>
> *Kafka Runtime jvm parameters:*
> 8GB Heap Memory
> 1GC swap Memory
> Using G1GC
> MaxGCPauseMilllis=20
> InitiatingHeapOccupancyPercent=35
>
> *Kafka Versions Used:*
> I've used Kafka version 0.10.0, 0.11.0.2 and 1.0.0 and find similar
> behavior
>
> *Questions:*
> 1) Is this a memory leak on the Kafka side or a misconfiguration of my
> Kafka cluster?
> 2) Druid creates new indexing tasks periodically. Does Kafka stably handle
> large number of consumers being added periodically?
> 3) As a knock on effect, We also notice kafka partitions going offline
> periodically after some time with the following error:
> ERROR [ReplicaFetcherThread-18-2], Error for partition [topic1,2] to
> broker 2:*org.apache.kafka.common.errors.UnknownTopicOrPartitionException*:
> This server does not host this topic-partition. (kafka.server.
> ReplicaFetcherThread)
>
> Can someone shed some light on the behavior being seen in my cluster?
>
> Please let me know if more details are needed to root cause the behavior
> being seen.
>
> Thanks in advance.
>
> Avinash
> [image: Screen Shot 2018-01-23 at 2.29.04 PM.png][image: Screen Shot
> 2018-01-23 at 2.29.21 PM.png]
>
>
>
>
> --
>
> Excuse brevity and typos. Sent from mobile device.
>
>
> --
>
> Excuse brevity and typos. Sent from mobile device.
>



Re: can't feed remote broker with producer demo

2018-01-23 Thread Manoj Khangaonkar
In your server.properties, in either listeners or advertised.listeners,
replace localhost with the machine's IP address.
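
For example, something along these lines in server.properties; the IP below is
taken from your producer command and may need adjusting:

listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://172.16.40.125:9092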

regards

On Mon, Jan 22, 2018 at 7:16 AM, Rotem Jacobi  wrote:

> Hi,
> When running the quickstart guide (producer, broker (with zookeeper) and
> consumer on the same machine) it works perfectly.
> When trying to run the producer from another machine to feed the broker
> I'm getting an error message:
>
> C:\Development\kafka_2.11-0.11.0.0\bin\windows>kafka-console-producer.bat
> --broker-list 172.16.40.125:9092 --topic test
> >test_message
> >[2018-01-22 17:14:44,240] ERROR Error when sending message to topic test
> with key: null, value: 12 bytes with error: (org.apache.kafka.clients.
> producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for
> test-0: 2298 ms has passed since batch creation plus linger time
> Any idea?
> Is there any broker configuration that I'm missing here?
>
> Thanks,
> Rotem.
>



-- 
http://khangaonkar.blogspot.com/


Re: [VOTE] KIP-247: Add public test utils for Kafka Streams

2018-01-23 Thread Matthias J. Sax
One minor change to the KIP. The class TopologyTestDriver will be in
package `org.apache.kafka.streams` (instead of `o.a.k.streams.test`).
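
For anyone who has not followed the KIP, usage is expected to look roughly
like the sketch below. Class and method names follow the KIP discussion and
the pull request, so details may still change before the release:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.test.ConsumerRecordFactory;

public class TopologyTestDriverSketch {
    public static void main(String[] args) {
        // a trivial topology: copy records from "input" to "output"
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input").to("output");

        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "test-driver-sketch");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the driver
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // drive the topology entirely in-process: pipe one record in, read one record out
        TopologyTestDriver driver = new TopologyTestDriver(builder.build(), config);
        ConsumerRecordFactory<String, String> factory =
            new ConsumerRecordFactory<>("input", new StringSerializer(), new StringSerializer());
        driver.pipeInput(factory.create("input", "key", "value"));
        ProducerRecord<String, String> result =
            driver.readOutput("output", new StringDeserializer(), new StringDeserializer());
        System.out.println(result.key() + " -> " + result.value());
        driver.close();
    }
}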


+1 (binding).


I am closing this vote as accepted with 3 binding votes (Damian,
Guozhang, Matthias) and 2 non-binding votes (Bill, James).

Thanks for the discussion and voting!


-Matthias


On 1/18/18 4:17 PM, Matthias J. Sax wrote:
> I added the new method to the KIP and also updated the PR.
> 
> -Matthias
> 
> On 1/18/18 10:48 AM, Guozhang Wang wrote:
>> @Matthias
>>
>> This comes to me while reviewing another using the test driver: could we
>> add a `Map<String, StateStore> allStateStores()` to the
>> `TopologyTestDriver` besides all the get-store-by-name functions? This is
>> because some of the internal state stores may be implicitly created but
>> users may still want to check its state.
>>
>>
>> Guozhang
>>
>>
>> On Thu, Jan 18, 2018 at 8:40 AM, James Cheng  wrote:
>>
>>> +1 (non-binding)
>>>
>>> -James
>>>
>>> Sent from my iPhone
>>>
 On Jan 17, 2018, at 6:09 PM, Matthias J. Sax 
>>> wrote:

 Hi,

 I would like to start the vote for KIP-247:
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 247%3A+Add+public+test+utils+for+Kafka+Streams


 -Matthias

>>>
>>
>>
>>
> 





Kafka Connect high CPU / Memory usage

2018-01-23 Thread Ziliang Chen
Hi,

I've been running Kafka Connect 0.11.2 (Kafka version) with only one instance
and 6 tasks in it for a while. It then showed very high CPU usage (~250%) and
memory consumption (~10 GB), and it reported consumer group rebalance
warnings. After the Kafka Connect process was killed and restarted, it showed
the same symptoms: very high CPU usage and quickly consuming ~10 GB of memory,
even though no tasks were running. The startup log is shown below.

...
Jan 23, 2018 1:07:58 AM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: The
(sub)resource method createConnector in org.apache.kafka.connect.
runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectors in
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains
empty path annotation.
WARNING: The (sub)resource method listConnectorPlugins in
org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource
contains empty path annotation.
WARNING: The (sub)resource method serverInfo in org.apache.kafka.connect.
runtime.rest.resources.RootResource contains empty path annotation.

[2018-01-23 01:07:58,388] INFO Started
o.e.j.s.ServletContextHandler@5b936410{/,null,AVAILABLE}
(org.eclipse.jetty.server.handler.ContextHandler:744)
[2018-01-23 01:07:58,404] INFO Started ServerConnector@33bef845{HTTP/1.1}{
0.0.0.0:8083} (org.eclipse.jetty.server.ServerConnector:266)
[2018-01-23 01:07:58,405] INFO Started @3854ms (org.eclipse.jetty.server.
Server:379)
[2018-01-23 01:07:58,406] INFO REST server listening at
http://172.17.0.2:8083/, advertising URL http://172.17.0.2:8083/
 (org.apache.kafka.connect.runtime.rest.RestServer:150)
[2018-01-23 01:07:58,406] INFO Kafka Connect started
(org.apache.kafka.connect.runtime.Connect:55)

A screenshot of the CPU and memory usage is attached below.

[image: Inline image 1]

Could you please shed some light here?

Thank you !
-- 
Regards, Zi-Liang

Mail:zlchen@gmail.com


Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Justin Miller
Hi Matthias and Guozhang,

Given that information, I think I’m going to try out the following in our data 
lake persisters (spark-streaming-kafka): 
https://issues.apache.org/jira/browse/SPARK-17147 


Skipping one message out of 10+ billion a day won’t be the end of the world for 
this topic and it’ll save me from having to manually restart the process. :)

These topics aren’t compacted, and we’re still only on 0.10 (switched to 0.10 
today), but we were able to reproduce the issue when we restarted the Kafka 
brokers while migrating from the 0.9.0.0 message format to 0.10.2.

Thanks,
Justin

> On Jan 23, 2018, at 2:31 PM, Guozhang Wang  wrote:
> 
> Hello Justin,
> 
> There are actually multi reasons that can cause incontinuous offsets, or
> "holes" in the Kafka partition logs:
> 
> 1. compaction, you knew it already.
> 2. when transactions are turned on, then some offsets are actually taken by
> the "transaction marker" messages, which will not be exposed by the
> consumer since they are only used internally. So from the reader's pov
> there are holes in the offsets.
> 
> 
> 
> Guozhang
> 
> 
> 
> 
> On Tue, Jan 23, 2018 at 9:52 AM, Justin Miller <
> justin.mil...@protectwise.com> wrote:
> 
>> Greetings,
>> 
>> We’ve seen a strange situation where-in the topic is not compacted but the
>> offset numbers inside the partition (#93) are not contiguous. This only
>> happens once a day though, on a topic with billions of messages per day.
>> 
>> next offset = 1786997223
>> next offset = 1786997224
>> next offset = 1786997226
>> next offset = 1786997227
>> next offset = 1786997228
>> 
>> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:
>> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets <
>> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets>
>> 
>> Specifically: “In Kafka 0.8, each message is assigned a monotonically
>> increasing, contiguous sequence number per partition,starting with 1.”
>> 
>> We’re on Kafka 1.0 with logs at version 0.9.0.0.
>> 
>> Thanks!
>> Justin
> 
> 
> 
> 
> -- 
> -- Guozhang



Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Guozhang Wang
Hello Justin,

There are actually multiple reasons that can cause non-contiguous offsets, or
"holes", in the Kafka partition logs:

1. compaction, which you know about already.
2. when transactions are turned on, some offsets are taken by the "transaction
marker" messages, which are not exposed to the consumer since they are only
used internally. So from the reader's point of view there are holes in the
offsets.



Guozhang




On Tue, Jan 23, 2018 at 9:52 AM, Justin Miller <
justin.mil...@protectwise.com> wrote:

> Greetings,
>
> We’ve seen a strange situation where-in the topic is not compacted but the
> offset numbers inside the partition (#93) are not contiguous. This only
> happens once a day though, on a topic with billions of messages per day.
>
> next offset = 1786997223
> next offset = 1786997224
> next offset = 1786997226
> next offset = 1786997227
> next offset = 1786997228
>
> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:
> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets <
> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets>
>
> Specifically: “In Kafka 0.8, each message is assigned a monotonically
> increasing, contiguous sequence number per partition,starting with 1.”
>
> We’re on Kafka 1.0 with logs at version 0.9.0.0.
>
> Thanks!
> Justin




-- 
-- Guozhang


Re: Merging Two KTables

2018-01-23 Thread Guozhang Wang
Hi Sameer, Dmitry:

Just a side note that for KStream.merge() we do not guarantee timestamp
ordering, so the resulting KStream will likely contain out-of-order records
with respect to their timestamps. If you do want a merge operation that
respects the timestamps of the input streams because you believe they are well
aligned, you need to either assume that none of the input streams contain
out-of-order data, so that an online merge-sort can be applied, or assume that
the out-of-order time range has some upper bound in practice so you can
bookkeep and wait. As said, there is no gold-standard rule for merging, and
hence we leave it to users to customize it in the process(Processor) API, or
use merge() if they can tolerate arbitrary timestamp ordering in the resulting
stream.
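
As a sketch of the stream-based route discussed in this thread (placeholder
key/value types; whichever record arrives last wins, regardless of timestamp):

KTable<String, String> merged = tableA.toStream()
    .merge(tableB.toStream())
    .groupByKey()
    .reduce((currentValue, newValue) -> newValue);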


Guozhang


On Tue, Jan 23, 2018 at 1:12 PM, Matthias J. Sax 
wrote:

> Well. That is one possibility I guess. But some other way might be to
> "merge both values" into a single one... There is no "straight forward"
> best semantics IMHO.
>
> If you really need this, you can build it via Processor API.
>
>
> -Matthias
>
>
> On 1/23/18 7:46 AM, Dmitry Minkovsky wrote:
> >> Merging two tables does not make too much sense because each table might
> > contain an entry for the same key. So it's unclear, which of both values
> > the merged table should contain.
> >
> > Which of both values should the table contain? Seems straightforward: it
> > should contain the value with the highest timestamp, with
> non-deterministic
> > behavior when two timestamps are the same.
> >
> >
> > ср, 26 июля 2017 г. в 9:42, Matthias J. Sax :
> >
> >> Merging two tables does not make too much sense because each table might
> >> contain an entry for the same key. So it's unclear, which of both values
> >> the merged table should contain.
> >>
> >> KTable.toStream() is just a semantic change and has no runtime overhead.
> >>
> >> -Matthias
> >>
> >>
> >> On 7/26/17 1:34 PM, Sameer Kumar wrote:
> >>> Hi,
> >>>
> >>> Is there a way I can merge two KTables just like I have in KStreams
> api.
> >>> KBuilder.merge().
> >>>
> >>> I understand I can use KTable.toStream(), if I choose to use it, is
> there
> >>> any performance cost associated with this conversion or is it just a
> API
> >>> conversion.
> >>>
> >>> -Sameer.
> >>>
> >>
> >>
> >
>
>


-- 
-- Guozhang


Re: Contiguous Offsets on non-compacted topics

2018-01-23 Thread Matthias J. Sax
In general, offsets should be consecutive. However, this is not an "official
guarantee" and you should not build applications that rely on consecutive
offsets.

Also note that with Kafka 0.11 and transactions, commit/abort markers require
an offset in the partition, and thus having "offset gaps" is normal in that
case.

Not sure at the moment why you have an "offset gap", as your 0.9 log format
does not support transactions.


-Matthias


On 1/23/18 9:52 AM, Justin Miller wrote:
> Greetings, 
> 
> We’ve seen a strange situation where-in the topic is not compacted but the 
> offset numbers inside the partition (#93) are not contiguous. This only 
> happens once a day though, on a topic with billions of messages per day.
> 
> next offset = 1786997223
> next offset = 1786997224
> next offset = 1786997226
> next offset = 1786997227
> next offset = 1786997228
> 
> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:   
> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets 
> 
> 
> Specifically: “In Kafka 0.8, each message is assigned a monotonically 
> increasing, contiguous sequence number per partition,starting with 1.”
> 
> We’re on Kafka 1.0 with logs at version 0.9.0.0.
> 
> Thanks!
> Justin
> 





Re: What is the best way to re-key a KTable?

2018-01-23 Thread Matthias J. Sax
Exactly. Because you might get multiple values with the same key, you
need to specify an aggregation to get a single value for each key.
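
For example, a minimal sketch of that pattern, with placeholder types and a
reducer that simply keeps the latest value seen for each new key (newKeyFor()
is a placeholder):

KTable<String, String> rekeyed = table
    .toStream()
    .selectKey((oldKey, value) -> newKeyFor(value))
    .groupByKey()
    .reduce((currentValue, newValue) -> newValue);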

-Matthias

On 1/23/18 9:14 AM, Dmitry Minkovsky wrote:
> KStream has a simple `#selectKey()` method, but it appears the only way to
> re-key a KTable is by doing `.toStream(mapper).groupByKey().reduce()`. Is
> this correct? I'm guessing this is because an attempt to re-key a table
> might result in multiple values at the new key.
> 





Re: Merging Two KTables

2018-01-23 Thread Matthias J. Sax
Well, that is one possibility, I guess. But another way might be to "merge
both values" into a single one... There is no single "straightforward" best
semantics, IMHO.

If you really need this, you can build it via Processor API.


-Matthias


On 1/23/18 7:46 AM, Dmitry Minkovsky wrote:
>> Merging two tables does not make too much sense because each table might
> contain an entry for the same key. So it's unclear, which of both values
> the merged table should contain.
> 
> Which of both values should the table contain? Seems straightforward: it
> should contain the value with the highest timestamp, with non-deterministic
> behavior when two timestamps are the same.
> 
> 
> ср, 26 июля 2017 г. в 9:42, Matthias J. Sax :
> 
>> Merging two tables does not make too much sense because each table might
>> contain an entry for the same key. So it's unclear, which of both values
>> the merged table should contain.
>>
>> KTable.toStream() is just a semantic change and has no runtime overhead.
>>
>> -Matthias
>>
>>
>> On 7/26/17 1:34 PM, Sameer Kumar wrote:
>>> Hi,
>>>
>>> Is there a way I can merge two KTables just like I have in KStreams api.
>>> KBuilder.merge().
>>>
>>> I understand I can use KTable.toStream(), if I choose to use it, is there
>>> any performance cost associated with this conversion or is it just a API
>>> conversion.
>>>
>>> -Sameer.
>>>
>>
>>
> 





Re: Kafka/zookeeper logs in every command

2018-01-23 Thread Xin Li
Look for log4j.properties in the config directories.
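
For the command-line tools the relevant file is usually
config/tools-log4j.properties; a sketch of a quieter setting is below. Note
this assumes the log4j 1.x binding is the one actually in effect, which the
SLF4J warnings in the quoted output suggest may not be the case while the Hive
log4j-slf4j-impl jar is on the classpath:

log4j.rootLogger=WARN, stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.stderr.Target=System.err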

Best,
Xin
On 23.01.18, 16:58, "José Ribeiro"  wrote:

Good morning.


I have a problem about kafka logs showing in my outputs.

When i started to work with kafka, the outputs were normal. For example:


kafka-topics.sh --list --zookeeper localhost:2181

returned

test


Now, with the same command line, i get this:


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/kafka_2.11-1.0.0/libs/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
explanation.
SLF4J: Actual binding is of type 
[org.apache.logging.slf4j.Log4jLoggerFactory]
2018-01-23T11:26:53,964 INFO [ZkClient-EventThread-12-localhost:2181] 
org.I0Itec.zkclient.ZkEventThread - Starting ZkClient event thread.
2018-01-23T11:26:53,973 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2018-01-23T11:26:53,974 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:host.name=ubuntu
2018-01-23T11:26:53,974 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:java.version=1.8.0_151
2018-01-23T11:26:53,974 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:java.vendor=Oracle Corporation
2018-01-23T11:26:53,975 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:java.home=/usr/lib/jvm/java-8-oracle/jre
2018-01-23T11:26:53,975 INFO [main] org.apache.zookeeper.ZooKeeper - Client 

JSONSchema Kafka Connect Converter

2018-01-23 Thread Andrew Otto
Hi all,

I've been thinking a lot recently about JSON and Kafka. Because JSON is not
strongly typed, it isn't treated as a first-class citizen of the Kafka
ecosystem. At Wikimedia, we use JSONSchema-validated JSON for Kafka messages.
This makes it easy for our many disparate teams and services to consume data
from Kafka without having to consult a remote schema registry to read data.
(Yes, we have to worry about schema evolution, but we handle this on the
producer side by requiring that the only schema change allowed is adding
optional fields.)

There's been discussion about JSONSchema support in Confluent's Schema
Registry, or perhaps even support for producing validated Avro JSON (not
binary) from the Kafka REST proxy.

However, the more I think about this, the more I realize that I don't really
care about JSON support in Confluent products. What I (and, I'd bet, most of
the folks who commented on the issue) really want is the ability to use Kafka
Connect with JSON data. Kafka Connect does sort of support this, but only if
your JSON messages conform to its very specific envelope schema format.

What if Kafka Connect provided a JSONSchemaConverter (*not* Connect's
JsonConverter) that knew how to convert between a provided JSONSchema and
Kafka Connect's internal Schemas? Would this enable what I think it would?
Would this allow Connectors to be configured with JSONSchemas so they can read
JSON messages directly from a Kafka topic? Once read and converted to a
ConnectRecord, the messages could be used with any Connector out there, right?
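
For the sake of discussion, here is a rough skeleton of what such a converter
could look like against Connect's public Converter interface. The config key
and the three private helpers are made up for illustration; the JSONSchema
translation and validation are exactly the pieces that would need to be
written:

import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.storage.Converter;

public class JSONSchemaConverter implements Converter {
    private Schema connectSchema;  // Connect schema derived from the configured JSONSchema

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // hypothetical config key pointing at the JSONSchema to use
        String schemaLocation = (String) configs.get("jsonschema.location");
        this.connectSchema = translateJsonSchema(schemaLocation);
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        // serialize the Connect value as plain JSON (no envelope), optionally validating it first
        return serializeAsJson(schema, value);
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        // parse the JSON bytes, validate against the JSONSchema, and pair with the Connect schema
        return new SchemaAndValue(connectSchema, parseAndValidate(value));
    }

    // stubs for illustration only
    private Schema translateJsonSchema(String location) { throw new UnsupportedOperationException(); }
    private byte[] serializeAsJson(Schema schema, Object value) { throw new UnsupportedOperationException(); }
    private Object parseAndValidate(byte[] value) { throw new UnsupportedOperationException(); }
}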

I might have space in the next year to work on something like this, but I
thought I’d ask here first to see what others thought.  Would this be
useful?  If so, is this something that might be upstreamed into Apache
Kafka?

- Andrew Otto
  Senior Systems Engineer
  Wikimedia Foundation


Adding company CICS to "Powered by" page?

2018-01-23 Thread Christoffer Ramm
Hi,
I would like for Swedish software company CICS (https://www.cics.se) to be 
added to the “Powered by Kafka” reference page.

Thanks in advance!

Cheers,
Chris


CICS provides Customer Experience software to Communications Service
Providers. Apache Kafka is used as the queue for input data from telecom
networks, managing billions of transactions in near real time for each
customer every day.




Christoffer Ramm
SVP Head of Marketing



christoffer.r...@cics.se 
+46 733 27 27 70
 
CICS Office
Skeppsbron 28
111 30 Stockholm, Sweden

www.cics.se  




Regarding : Store stream for infinite time

2018-01-23 Thread Aman Rastogi
Hi All,

We have a use case that requires storing a stream for an infinite time (given
we have enough storage).

We are planning to solve this with log compaction. If each message key is
unique and log compaction is enabled, the whole stream will be stored for an
infinite time. I just wanted to check whether my assumption is correct and
whether this is an appropriate way to solve the problem.
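
For reference, the plan is to create the topic with compaction enabled,
roughly like this (topic name and counts are placeholders):

kafka-topics.sh --zookeeper localhost:2181 --create --topic events \
  --partitions 64 --replication-factor 3 \
  --config cleanup.policy=compact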

Thanks in advance.

Regards,
Aman


Re: Regarding : Store stream for infinite time

2018-01-23 Thread Aman Rastogi
Thanks Svante.

Regards,
Aman

On Tue, Jan 23, 2018 at 11:38 PM, Svante Karlsson 
wrote:

> Yes, it will store the last value for each key
>
> 2018-01-23 18:30 GMT+01:00 Aman Rastogi :
>
> > Hi All,
> >
> > We have a use case to store stream for infinite time (given we have
> enough
> > storage).
> >
> > We are planning to solve this by Log Compaction. If each message key is
> > unique and Log compaction is enabled, it will store whole stream for
> > infinite time. Just wanted to check if my assumption is correct and this
> is
> > an appropriate way to solve this.
> >
> > Thanks in advance.
> >
> > Regards,
> > Aman
> >
>


Re: Regarding : Store stream for infinite time

2018-01-23 Thread Svante Karlsson
Yes, it will store the last value for each key

2018-01-23 18:30 GMT+01:00 Aman Rastogi :

> Hi All,
>
> We have a use case to store stream for infinite time (given we have enough
> storage).
>
> We are planning to solve this by Log Compaction. If each message key is
> unique and Log compaction is enabled, it will store whole stream for
> infinite time. Just wanted to check if my assumption is correct and this is
> an appropriate way to solve this.
>
> Thanks in advance.
>
> Regards,
> Aman
>


Contiguous Offsets on non-compacted topics

2018-01-23 Thread Justin Miller
Greetings, 

We've seen a strange situation wherein the topic is not compacted but the
offset numbers inside the partition (#93) are not contiguous. This only
happens about once a day, on a topic with billions of messages per day.

next offset = 1786997223
next offset = 1786997224
next offset = 1786997226
next offset = 1786997227
next offset = 1786997228

I was wondering if this still holds with Kafka 0.10, 0.11, 1.0: 
http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets 


Specifically: “In Kafka 0.8, each message is assigned a monotonically 
increasing, contiguous sequence number per partition,starting with 1.”

We’re on Kafka 1.0 with logs at version 0.9.0.0.

Thanks!
Justin

What is the best way to re-key a KTable?

2018-01-23 Thread Dmitry Minkovsky
KStream has a simple `#selectKey()` method, but it appears the only way to
re-key a KTable is by doing `.toStream(mapper).groupByKey().reduce()`. Is
this correct? I'm guessing this is because an attempt to re-key a table
might result in multiple values at the new key.


Kafka/zookeeper logs in every command

2018-01-23 Thread José Ribeiro
Good morning.


I have a problem with Kafka/ZooKeeper log messages showing up in the output
of every command.

When I started working with Kafka, the output was normal. For example:


kafka-topics.sh --list --zookeeper localhost:2181

returned

test


Now, with the same command line, I get this:


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/kafka_2.11-1.0.0/libs/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2018-01-23T11:26:53,964 INFO [ZkClient-EventThread-12-localhost:2181] 
org.I0Itec.zkclient.ZkEventThread - Starting ZkClient event thread.
2018-01-23T11:26:53,973 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2018-01-23T11:26:53,974 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:host.name=ubuntu
2018-01-23T11:26:53,974 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:java.version=1.8.0_151
2018-01-23T11:26:53,974 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:java.vendor=Oracle Corporation
2018-01-23T11:26:53,975 INFO [main] org.apache.zookeeper.ZooKeeper - Client 
environment:java.home=/usr/lib/jvm/java-8-oracle/jre
2018-01-23T11:26:53,975 INFO [main] org.apache.zookeeper.ZooKeeper - Client 

Re: Merging Two KTables

2018-01-23 Thread Dmitry Minkovsky
> Merging two tables does not make too much sense because each table might
contain an entry for the same key. So it's unclear, which of both values
the merged table should contain.

Which of both values should the table contain? Seems straightforward: it
should contain the value with the highest timestamp, with non-deterministic
behavior when two timestamps are the same.


ср, 26 июля 2017 г. в 9:42, Matthias J. Sax :

> Merging two tables does not make too much sense because each table might
> contain an entry for the same key. So it's unclear, which of both values
> the merged table should contain.
>
> KTable.toStream() is just a semantic change and has no runtime overhead.
>
> -Matthias
>
>
> On 7/26/17 1:34 PM, Sameer Kumar wrote:
> > Hi,
> >
> > Is there a way I can merge two KTables just like I have in KStreams api.
> > KBuilder.merge().
> >
> > I understand I can use KTable.toStream(), if I choose to use it, is there
> > any performance cost associated with this conversion or is it just a API
> > conversion.
> >
> > -Sameer.
> >
>
>


bug config port / listeners

2018-01-23 Thread Elias Abacioglu
Hi

I just upgraded one node from Kafka 0.10.1 to 1.0.0, and I've discovered a
bug.

Having this config

listeners=PLAINTEXT://0.0.0.0:6667

and NOT setting port will result in port falling back to its default of 9092.
When the broker starts, it advertises the wrong port:
[2018-01-23 12:08:56,694] INFO Registered broker 10321017 at path
/brokers/ids/10321017 with addresses:
EndPoint(kafka01.se-ix.delta.prod,9092,ListenerName(PLAINTEXT),PLAINTEXT)
(kafka.utils.ZkUtils)

And the documentation states this about *port*:
DEPRECATED: only used when `listeners` is not set. Use `listeners` instead.
the port to listen and accept connections on

So either the broker is doing it wrong, or the documentation is.
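
A workaround that appears to sidestep the problem is to set the advertised
endpoint explicitly (hostname taken from the log line above):

listeners=PLAINTEXT://0.0.0.0:6667
advertised.listeners=PLAINTEXT://kafka01.se-ix.delta.prod:6667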

Regards,
Elias Abacioglu