Re: Timestamps unique?

2022-01-13 Thread Svante Karlsson
No guarantee, /svante On Thu, 13 Jan 2022 at 20:21, Chad Preisler wrote: > > Hello, > > For ConsumerRecord.timestamp(), is the timestamp guaranteed to be > unique within the topic's partition, or can there be records inside the > topic's partition that have the same timestamp? > > Thanks. > Chad
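If you need a unique identity per record, the (partition, offset) pair is what is guaranteed unique within a topic, while timestamps can repeat. A minimal consumer sketch illustrating the difference; the bootstrap address, group id and topic name are hypothetical:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TimestampDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("group.id", "ts-demo");                  // hypothetical
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // hypothetical topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                // timestamp() may collide between records; (partition(), offset()) never does
                System.out.printf("ts=%d partition=%d offset=%d%n",
                        r.timestamp(), r.partition(), r.offset());
            }
        }
    }
}
```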

Re: guidelines for replacing a lost Kafka Broker

2019-09-13 Thread Svante Karlsson
Just bring a new broker up and give it the id of the lost one. It will sync itself. /svante On Fri, 13 Sep 2019 at 13:51, saurav suman wrote: > Hi, > > When the old data is lost and another broker is added to the cluster then > it is a new fresh broker with no data. You can reassign the

Re: Same key found on 2 different partitions of compacted topic (kafka-streams)

2019-05-17 Thread Svante Karlsson
Yes, that sounds likely: if you changed the number of partitions, then the hashing of the keys will change their destination. You need to either clear the data (i.e. change retention to very small and roll the logs) or recreate the topic. /svante On Fri, 17 May 2019 at 12:32, Nitay Kufert wrote: > I
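For reference, the Java client's default partitioner maps a keyed record with murmur2 over the serialized key modulo the partition count, so the target partition changes as soon as the partition count does. A small sketch of that mapping (the key value is hypothetical, the formula is the one the default partitioner uses for keyed records):

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {
    // Same hash formula the Java default partitioner applies to keyed records.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        String key = "user-42"; // hypothetical key
        // The same key can land on a different partition once the count changes.
        System.out.println(partitionFor(key, 8));   // old partition count
        System.out.println(partitionFor(key, 12));  // after adding partitions
    }
}
```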

Re: Streaming Data

2019-04-09 Thread Svante Karlsson
I would stream to InfluxDB and visualize with Grafana. Works great for dashboards. But I would rethink your line format. It's very convenient to have tags (or labels) that are key/value pairs on each metric if you ever want to aggregate over a group of similar metrics. Svante

Re: Kafka Deployment Using Kubernetes (on Cloud) - settings for log.dirs

2018-10-22 Thread Svante Karlsson
Different directories; they cannot share a path. A broker will delete everything under its log directory that it does not know about. On Mon, 22 Oct 2018 at 17:47, M. Manna wrote: > Hello, > > We are thinking of rolling out Kafka on Kubernetes deployed on public cloud > (AWS or GCP, or other). We

Re: Have connector be paused from start

2018-09-28 Thread Svante Karlsson
Sounds like a workflow/pipeline thing in Jenkins (or equivalent) to me. On Wed, 26 Sep 2018 at 17:27, Rickard Cardell wrote: > Hi > Is there a way to have a Kafka Connect connector begin in state 'PAUSED'? > I.e. I would like to have the connector set to paused before it can process > any data
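As a lighter-weight building block for such a pipeline step, the Kafka Connect REST API exposes a pause endpoint (PUT /connectors/{name}/pause) that an orchestration job could call right after creating the connector. A sketch using Java's built-in HttpClient; the worker host, port and connector name are hypothetical:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PauseConnector {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // PUT /connectors/{name}/pause on the Connect worker's REST port
        HttpRequest pause = HttpRequest.newBuilder()
                .uri(URI.create("http://connect-worker:8083/connectors/my-connector/pause")) // hypothetical
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> resp = client.send(pause, HttpResponse.BodyHandlers.ofString());
        System.out.println("pause returned HTTP " + resp.statusCode()); // typically 202 on success
    }
}
```

Note this still leaves a small window between connector creation and the pause call, which is why a workflow tool that sequences the two steps is the pragmatic answer.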

Re: Low level kafka consumer API to KafkaStreams App.

2018-09-13 Thread Svante Karlsson
You are doing something wrong if you need 10k threads to produce 800k messages per second. It feels like you are a factor of 1000 off. What size are your messages? On Thu, Sep 13, 2018, 21:04 Praveen wrote: > Hi there, > > I have a kafka application that uses kafka consumer low-level api to help >

Re: Reliability against rack failure

2018-08-05 Thread Svante Karlsson
solution, but for this specific deployment adding a rack > is out of the question. > Is there a way to resolve this with 2 racks? > > Regards, > Sanjay > > On 05/08/18, 11:57 PM, "Svante Karlsson" wrote: > > >3 racks, Replication Factor = 3, min.insync.replicas=2,

Re: Reliability against rack failure

2018-08-05 Thread Svante Karlsson
3 racks, Replication Factor = 3, min.insync.replicas=2, acks=all 2018-08-05 20:21 GMT+02:00 Sanjay Awatramani : > Hi, > > I have done some experiments and gone through the kafka documentation, which > makes me conclude that there is a small chance of data loss or availability > in a rack scenario.
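A sketch of what that recommendation looks like in practice, assuming the three brokers already carry broker.rack settings so replicas are spread across racks: create the topic with replication factor 3 and min.insync.replicas=2, and produce with acks=all. Topic name, partition count and broker addresses are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RackSafeTopic {
    public static void main(String[] args) throws Exception {
        String bootstrap = "broker1:9092,broker2:9092,broker3:9092"; // hypothetical

        Properties admin = new Properties();
        admin.put("bootstrap.servers", bootstrap);
        try (AdminClient client = AdminClient.create(admin)) {
            NewTopic topic = new NewTopic("events", 12, (short) 3)          // replication factor 3
                    .configs(Map.of("min.insync.replicas", "2"));            // need 2 replicas in sync
            client.createTopics(List.of(topic)).all().get();
        }

        Properties prod = new Properties();
        prod.put("bootstrap.servers", bootstrap);
        prod.put("acks", "all"); // wait for the in-sync replica set, not just the leader
        prod.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prod.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prod)) {
            producer.send(new ProducerRecord<>("events", "key", "value")).get();
        }
    }
}
```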

Re: log compaction v log rotation - best of the two worlds

2018-03-21 Thread Svante Karlsson
alt1) If you can store a generation counter in the value of the "latest value" topic, you could do as follows: topic latest_value with key [id]; topic full_history with key [id, generation]. On delete, get latest_value.generation_counter and issue deletes on full_history for keys [id, 0..generation_counter].
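A sketch of that delete step, assuming both topics are compacted and the composite key is encoded as a simple "id:generation" string (the topic names follow the email; the id, generation count and encoding are hypothetical): write null-value tombstones for every generation of the id, then tombstone the latest_value entry itself.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeleteFullHistory {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        String id = "order-17";       // hypothetical id
        long latestGeneration = 42L;  // in a real flow, read back from the latest_value topic

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Tombstone (null value) every generation ever written for this id.
            for (long gen = 0; gen <= latestGeneration; gen++) {
                producer.send(new ProducerRecord<>("full_history", id + ":" + gen, null));
            }
            // Finally tombstone the latest_value entry itself.
            producer.send(new ProducerRecord<>("latest_value", id, null));
        }
    }
}
```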

Re: Suggestion over architecture

2018-03-10 Thread Svante Karlsson
ly better to check that before sending > message to our infrastructure side, but the webapp is unaware if it is allowed > or not ... > > > > thanks for your reply > > Adrien > > > From: Svante Karlsson <svante.karls...@csi.se> > Sent: Sat

Re: Suggestion over architecture

2018-03-10 Thread Svante Karlsson
You do not want to expose the Kafka instance to your different clients. Put some API endpoint in between: REST/gRPC or whatever. 2018-03-10 19:01 GMT+01:00 Nick Vasilyev : > Hard to say without more info, but why not just deploy something like a > REST API and expose it to

Re: Consultant Help

2018-03-02 Thread Svante Karlsson
try https://www.confluent.io/ - that's what they do /svante 2018-03-02 21:21 GMT+01:00 Matt Stone : > We are looking for a consultant or contractor that can come onsite to our > Ogden, Utah location in the US, to help with a Kafka set up and maintenance > project. What we

Re: Hardware Guidance

2018-03-01 Thread Svante Karlsson
It's per broker. Usually you run with 4-6 GB of Java heap. The rest is used as disk cache, and it's more that 64 GB seems like a sweet spot between memory cost and performance. /svante 2018-03-01 18:30 GMT+01:00 Michal Michalski : > I'm quite sure it's per broker (it's

Re: Regarding : Store stream for infinite time

2018-01-23 Thread Svante Karlsson
Yes, it will store the last value for each key. 2018-01-23 18:30 GMT+01:00 Aman Rastogi : > Hi All, > > We have a use case to store a stream for infinite time (given we have enough > storage). > > We are planning to solve this by Log Compaction. If each message key is >
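A minimal sketch of creating such a topic with cleanup.policy=compact via the AdminClient (topic name, partition count and bootstrap address are hypothetical); with compaction the log keeps at least the latest value per key instead of deleting by age:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical

        try (AdminClient client = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("entity-state", 6, (short) 3)
                    // Compact instead of delete: retain the latest record per key indefinitely.
                    .configs(Map.of("cleanup.policy", "compact"));
            client.createTopics(List.of(topic)).all().get();
        }
    }
}
```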

Re: Kafka Replication Factor

2018-01-17 Thread Svante Karlsson
What's your config for min.insync.replicas? 2018-01-17 13:37 GMT+01:00 Sameer Kumar : > Hi, > > I have a cluster of 3 Kafka brokers, and the replication factor is 2. This > means I can tolerate failure of 1 node without data loss. > > Recently, one of my nodes crashed and some

Re: one machine that have four network.....

2018-01-16 Thread Svante Karlsson
Even if you bind your socket to an IP of a specific card, when the packet is about to leave your host it hits the routing table and gets routed through the interface with the least cost (arbitrary but static, since all interfaces have the same cost because they are on the same subnet); thus you will not reach

Re: Broker won't exit...

2018-01-10 Thread Svante Karlsson
If you really want all the brokers to die, try changing server.properties: controlled.shutdown.enable=false. I had a similar problem on a dev laptop with a single broker. It refused to die on system shutdowns (or took a very long time). 2018-01-10 12:57 GMT+01:00 Ted Yu : >

Re: Multiple brokers - do they share the load?

2017-11-28 Thread Svante Karlsson
You are connecting to a single seed node - your Kafka library will then, under the hood, connect to the partition leaders for each partition you subscribe or post to. The load is no different than if you gave all nodes as the connect parameter. However, if your seed node crashes, then your client
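This is why bootstrap.servers is usually given several brokers: any one of them is only used for the initial metadata fetch, but listing more than one keeps that first connection working if a seed node is down. A small sketch (broker addresses are hypothetical):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class BootstrapServers {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Several seed nodes: only used to discover the cluster; after that the
        // client talks to each partition leader directly.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // hypothetical
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // producer.send(...) goes to whichever broker leads the target partition
        }
    }
}
```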

Re: Building news feed of social app using kafka

2017-11-01 Thread Svante Karlsson
Nope, that's the wrong design. It does not scale. You would end up with a wide and shallow thing. Too few messages per partition to make sense. You want many thousands per partition per second to amortize the consumer-to-broker round trip. On Nov 1, 2017 21:12, "Anshuman Ghosh"

Re: Kafka Streams Avro SerDe version/id caching

2017-10-03 Thread Svante Karlsson
I've implemented the same logic for a C++ client - caching is the only way to go, since the performance impact of not doing it would be too big. So bet on caching in all clients. 2017-10-03 18:12 GMT+02:00 Damian Guy : > If you are using the confluent schema registry then the
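On the Java side, Confluent's client already wraps the registry in a caching layer, so schema ids are resolved over HTTP only the first time they are seen. A sketch; the registry URL and cache capacity are hypothetical:

```java
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.serializers.KafkaAvroSerializer;

public class RegistryCacheDemo {
    public static void main(String[] args) {
        // Caches id <-> schema lookups in memory, so the HTTP round trip to the
        // registry only happens the first time a given schema/id is encountered.
        SchemaRegistryClient registry =
                new CachedSchemaRegistryClient("http://schema-registry:8081", 1000); // hypothetical URL, cache size
        KafkaAvroSerializer serializer = new KafkaAvroSerializer(registry);
        // The serializer is then handed to a producer via its configuration.
    }
}
```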

Re: Is there a way to increase number of partitions

2017-08-21 Thread Svante Karlsson
Short answer - you cannot. The existing data is not reprocessed, since Kafka itself has no knowledge of how you did your partitioning. The normal workaround is that you stop producers and consumers. Create a new topic with the desired number of partitions. Consume the old topic from the beginning and
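A sketch of that copy step, assuming producers are already stopped (topic names and the bootstrap address are hypothetical): consume the old topic from the beginning and republish every record, keys included, to the new wider topic so the keys get rehashed across the new partition count.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RepartitionCopy {
    public static void main(String[] args) {
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092"); // hypothetical
        c.put("group.id", "repartition-copy");
        c.put("auto.offset.reset", "earliest");       // start from the beginning of old-topic
        c.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {
            consumer.subscribe(List.of("old-topic"));
            while (true) { // stop manually once the consumer lag on old-topic reaches zero
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> r : records) {
                    // Same key, so the default partitioner rehashes it over the new partition count.
                    producer.send(new ProducerRecord<>("new-topic", r.key(), r.value()));
                }
            }
        }
    }
}
```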

Re: Kafka rack-id and min in-sync replicas

2017-08-20 Thread Svante Karlsson
I think you are right. The rack awareness is used to spread the partitions on creation, assignment etc., so get as many racks as your replication count. /svante 2017-08-20 13:33 GMT+02:00 Carl Samuelson : > Hi > > I asked this question on SO here: >

Re: Using JMXMP to access Kafka metrics

2017-07-19 Thread Svante Karlsson
I've used Jolokia, which gets JMX metrics without RMI (actually JSON over HTTP): https://jolokia.org/ It integrates nicely with Telegraf (and InfluxDB). 2017-07-19 20:47 GMT+02:00 Vijay Prakash < vijay.prak...@microsoft.com.invalid>: > Hey, > > Is there a way to use JMXMP instead of RMI to access

Re: Issue in Kafka running for few days

2017-04-30 Thread Svante Karlsson
KA%20AND%20text%20~%20%22ZK%20expired%3B%20shut%20down%20all%20controller%22 > > searching for text like "ZK expired; shut down all controller" or "No > broker in ISR is alive for" or other interesting events form the log. > > Hope that helps, > Michal &g

Re: Issue in Kafka running for few days

2017-04-26 Thread Svante Karlsson
You are not supposed to run an even number of zookeepers. Fix that first. On Apr 26, 2017 20:59, "Abhit Kalsotra" wrote: > Any pointers please > > > Abhi > > On Wed, Apr 26, 2017 at 11:03 PM, Abhit Kalsotra > wrote: > > > Hi * > > > > My kafka setup >

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Svante Karlsson
What kind of disk are you using for the RocksDB store? I.e. spinning or SSD? 2016-11-25 12:51 GMT+01:00 Damian Guy : > Hi Frank, > > Is this on a restart of the application? > > Thanks, > Damian > > On Fri, 25 Nov 2016 at 11:09 Frank Lyaruu wrote: > > > Hi

Re: kafka connect(copycat) question

2015-12-03 Thread Svante Karlsson
Hi, I tried building this today and the problem seems to remain. /svante [INFO] Building kafka-connect-hdfs 2.0.0-SNAPSHOT [INFO] Downloading:

Re: Locality question

2015-11-12 Thread Svante Karlsson
If you have a Kafka partition that is replicated to 3 nodes, the partition leadership varies (in time), thus making the colocation pointless. You can only produce and consume to/from the leader. /svante 2015-11-12 9:00 GMT+01:00 Young, Ben : > Hi, > > Any thoughts on this? Perhaps

Re: How to correctly handle offsets?

2015-06-01 Thread svante karlsson
1) correlationId is just a number that you get back in your reply. You can safely set it to anything. If you have some kind of call identification in your system that you want to trace through logs, this is what you would use. 2) You can safely use any external offset management you like. Just

Re: Kafka still aware of old zookeeper nodes

2015-04-30 Thread svante karlsson
Have you changed zookeeper.connect= in server.properties? A better procedure for replacing zookeeper nodes would be to shut down one and install the new one with the same IP. This can easily be done on a running cluster. /svante 2015-04-30 20:08 GMT+02:00 Dillian Murphey

hive output to kafka

2015-04-28 Thread Svante Karlsson
What's the best way of exporting contents (Avro encoded) from Hive queries to Kafka? Kind of Camus, the other way around. Best regards, svante

Re: Kafka Consumer

2015-03-31 Thread svante karlsson
Your consumer might belong to a consumer group. Just commit offsets to that consumer group/topic/partition and it will work. That said, if you want to figure out the consumer groups that exist you have to look in ZooKeeper. There is no Kafka API to get or create them. In the Java client it is
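For the modern Java client (not the 0.8-era consumer this thread is about), that boils down to setting a group.id and committing; the group effectively comes into existence once offsets are committed under it. A sketch with hypothetical names:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupCommit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("group.id", "billing-consumers");       // the group "exists" once offsets are committed for it
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("invoices")); // hypothetical topic
            consumer.poll(Duration.ofSeconds(1));
            consumer.commitSync(); // stores this group's offsets for the subscribed partitions
        }
    }
}
```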

Re: Producer Behavior When one or more Brokers' Disk is Full.

2015-03-26 Thread svante karlsson
4. As for recovering broker from disk full, if replication is enabled one can just bring it down (the leader of the partition will then migrate to other brokers), clear the disk space, and bring it up again; if replication is not enabled then you can first move the partitions away from this broker

Re: Kafka 0.8.2 log cleaner

2015-03-02 Thread svante karlsson
Wouldn't it be rather simple to add a retention time on deleted items, i.e. keys with a null value, for topics that are compacted? The retention time would then be set to some large time to allow all consumers to understand that a previous k/v is being deleted. 2015-03-02 17:30 GMT+01:00 Ivan
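Kafka's compacted topics do have such a knob: the delete.retention.ms topic config, the minimum time a tombstone (null value) is kept before the cleaner may drop it. A sketch of setting it at topic creation; the topic name, sizing and the 7-day value are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TombstoneRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical

        try (AdminClient client = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("compacted-state", 6, (short) 3)
                    .configs(Map.of(
                            "cleanup.policy", "compact",
                            // Keep delete markers for 7 days so slow consumers still see them.
                            "delete.retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            client.createTopics(List.of(topic)).all().get();
        }
    }
}
```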

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread svante karlsson
Do you have to separate the snapshot from the normal update flow? I've used a compacted Kafka topic as the source of truth for a Solr database and fed the topic both with real-time updates and snapshots from a Hive job. This worked very well. The nice point is that there is a seamless transition

Re: kafka.server.ReplicaManager error

2015-02-05 Thread svante karlsson
I believe I've had the same problem on the 0.8.2 rc2. We had an idle test cluster with unknown health status, and I applied rc3 without checking if everything was OK before. Since that cluster had been doing nothing for a couple of days and the retention time was 48 hours, it's reasonable to assume

Re: kafka sending duplicate content to consumer

2015-01-23 Thread svante karlsson
A Kafka broker never pushes data to a consumer. It's the consumer that does a long fetch, and it provides the offset to read from. The problem lies in how your consumer handles the, for example, 1000 messages that it just got. If you handle 500 of them and crash without committing the offsets
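A sketch of the at-least-once pattern this answer implies, using the modern Java consumer with hypothetical names: process a batch, then commit; a crash before the commit means the uncommitted messages are fetched and seen again, which is the "duplicate content" the original poster observed.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("group.id", "worker");                   // hypothetical
        props.put("enable.auto.commit", "false");          // commit only after processing succeeds
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : batch) {
                    process(r); // crash here => the whole uncommitted batch is re-delivered
                }
                consumer.commitSync(); // only now does the group's offset move forward
            }
        }
    }

    static void process(ConsumerRecord<String, String> r) {
        System.out.println(r.offset() + ": " + r.value());
    }
}
```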

Re: Isr difference between Metadata Response vs /kafka-topics.sh --describe

2015-01-21 Thread svante karlsson
thanks, svante 2015-01-21 16:30 GMT+01:00 Joe Stein joe.st...@stealth.ly: Sounds like you are bumping into this https://issues.apache.org/jira/browse/KAFKA-1367

Re: How to handle broker disk failure

2015-01-21 Thread svante karlsson
data over automatically. Thanks, Jun On Tue, Jan 20, 2015 at 1:02 AM, svante karlsson s...@csi.se wrote: I'm trying to figure out the best way to handle a disk failure in a live environment. The obvious (and naive) solution is to decommission the broker and let other brokers take

Isr difference between Metadata Response vs /kafka-topics.sh --describe

2015-01-21 Thread svante karlsson
We are running an external (as in non-supported) C++ client library against 0.8.2-rc2 and see differences in the Isr vector in the Metadata Response compared to what ./kafka-topics.sh --describe returns. We have a triple-replicated topic that is not updated during the test. kafka-topics.sh returns

typo in wiki

2015-01-20 Thread svante karlsson
In the wiki there is a statement that a partition must fit on a single machine; while technically true, isn't it so that a partition must fit on a single disk on that machine? https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowmanytopicscanIhave A partition is basically a

How to handle broker disk failure

2015-01-20 Thread svante karlsson
I'm trying to figure out the best way to handle a disk failure in a live environment. The obvious (and naive) solution is to decommission the broker and let other brokers take over and create new followers. Then replace the disk, clean the remaining log directories and add the broker again.

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-16 Thread svante karlsson
I upgraded two small test clusters and I had two small issues, but I'm not clear yet as to whether those were an issue due to us using Ansible to configure and deploy the cluster. The first issue could be us doing something bad when distributing the update (I updated, not reinstalled) but it should be

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-16 Thread svante karlsson
the issue reliably? Also, is what you saw an issue with the mbean itself or graphite? Thanks, Jun On Fri, Jan 16, 2015 at 4:38 AM, svante karlsson s...@csi.se wrote: I upgraded two small test clusters and I had two small issues, but I'm not clear yet as to whether those were an issue due to us

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-16 Thread svante karlsson
or have to find another explanation. Regard it as an observation if someone else reports issues. Thanks, svante 2015-01-16 20:56 GMT+01:00 svante karlsson s...@csi.se: Jun, I don't know if it was an issue with Graphite or the MBean, but I have not seen it since - and we have tried

Re: how to order message between different partition

2015-01-08 Thread svante karlsson
The messages are ordered per partition. There is no order between partitions. If you really need ordering, use one partition. 2015-01-08 9:44 GMT+01:00 YuanJia Li yuanjia8...@163.com: Hi all, I have a topic with 3 partitions, and each partition has its sequence in kafka. How to order messages between

Re: mirrormaker tool in 0.82beta

2015-01-07 Thread svante karlsson
No, I missed that. thanks, svante 2015-01-07 6:44 GMT+01:00 Jun Rao j...@confluent.io: Did you set offsets.storage to kafka in the consumer of mirror maker? Thanks, Jun On Mon, Jan 5, 2015 at 3:49 PM, svante karlsson s...@csi.se wrote: I'm using 0.82beta and I'm trying

mirrormaker tool in 0.82beta

2015-01-05 Thread svante karlsson
I'm using 0.8.2 beta and I'm trying to push data with the MirrorMaker tool from several remote sites to two datacenters. I'm testing this from a node containing ZK, a broker and MirrorMaker, and the data is pushed to a normal cluster: 3 ZK nodes and 4 brokers with replication. While the configuration seems

Re: Increase in Kafka replication fetcher thread not reducing log replication

2014-12-22 Thread svante karlsson
What kind of network do you have? Gigabit? If so, 90 MB/s would make sense. Also, since you have one partition, what's your raw transfer speed to the disk? 90 MB/s makes sense here as well... If I were looking for rapid replica catch-up I'd have at least 2x Gbit and partitioned topics spread out
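For reference, the arithmetic behind that figure: 1 Gbit/s divided by 8 is 125 MB/s of raw capacity, and after Ethernet/TCP framing and protocol overhead only roughly 110-115 MB/s of that is usable for payload, so a sustained replica fetch of about 90 MB/s is consistent with a single gigabit link that is also carrying client traffic, and it is likewise in the range of a single spinning disk's sequential throughput.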

Re: How do I create a consumer group

2014-12-16 Thread svante karlsson
Yes - see the offsets.topic.num.partitions and offsets.topic.replication.factor broker configs. Joel, that's exactly what I was looking for. I'll look into that and the source for OffsetsMessageFormatter later today! Thanks, svante

Re: How do I create a consumer group

2014-12-15 Thread svante karlsson
, Jun On Fri, Dec 12, 2014 at 2:45 AM, svante karlsson s...@csi.se wrote: Disregard the creation question - we must have done something wrong because now our code is working without obvious changes (on another set of brokers). However it turns out to be difficult to know the existing

Re: How do I create a consumer group

2014-12-12 Thread svante karlsson
in any way, or is there a better way of listing the existing group names? svante 2014-12-11 20:59 GMT+01:00 svante karlsson s...@csi.se: We're using 0.8.2 beta and a homegrown C++ async library based on Boost.Asio that has support for the offset API (api keys OffsetCommitRequest = 8

Re: How do I create a consumer group

2014-12-12 Thread svante karlsson
If I understand KAFKA-1476, it is only a command-line tool that gives access by using ZKUtils, not an API to Kafka. We're looking for a Kafka API, so I guess that this functionality is missing. Thanks for the pointer. Svante Karlsson 2014-12-12 19:03 GMT+01:00 Jiangjie Qin j

How do I create a consumer group

2014-12-11 Thread svante karlsson
We're using 0.8.2 beta and a homegrown C++ async library based on Boost.Asio that has support for the offset API (api keys OffsetCommitRequest = 8, OffsetFetchRequest = 9, ConsumerMetadataRequest = 10). If we use a Java client and commit an offset, then the consumer group shows up in the response

Re: KafkaException: Should not set log end offset on partition

2014-12-04 Thread svante karlsson
, 2014 at 5:54 AM, svante karlsson s...@csi.se wrote: I've installed (for ansible scripting testing purposes) 3 VMs, each containing Kafka and ZooKeeper clustered together, on Ubuntu 14.04. The ZooKeepers are 3.4.6 and Kafka is 2.11-0.8.2-beta. The Kafka servers have broker ids 2, 4, 6. The zookeepers

Re: KafkaException: Should not set log end offset on partition

2014-12-03 Thread svante karlsson
I found some logs like this before everything started to go wrong ... [2014-12-02 07:08:11,722] WARN Partition [test3,13] on broker 2: No checkpointed highwatermark is found for partition [test3,7] (kafka.cluster.Partition) [2014-12-02 07:08:11,722] WARN Partition [test3,7] on broker 2: No

Re: Partition key not working properly

2014-11-25 Thread svante karlsson
By default, the partition key is used for hashing, and then the record is placed in a partition that owns the appropriate hashed keyspace. If you have three physical partitions and then give the partition key 5, it has nothing to do with physical partition 5 (which does not exist), similar to physical: partition
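In current Java client terms the distinction looks like this (a hedged sketch; the topic name, key and values are hypothetical): a key is only input to the hash-based default partitioner, whereas an explicit partition number is a separate constructor argument and must refer to a partition that actually exists.

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyVsPartition {
    public static void main(String[] args) {
        // Key "5": hashed by the default partitioner, lands on some partition 0..N-1.
        ProducerRecord<String, String> byKey =
                new ProducerRecord<>("events", "5", "payload");

        // Explicit physical partition 2 (must exist on the topic); the key is still
        // stored with the record but no longer decides placement.
        ProducerRecord<String, String> byPartition =
                new ProducerRecord<>("events", 2, "5", "payload");

        System.out.println(byKey.partition());       // null until the partitioner assigns one
        System.out.println(byPartition.partition()); // 2
    }
}
```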

Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread svante karlsson
Both variants will work well (if your Kafka cluster can handle the full volume of the transmitted data for the duration of the TTL on each topic). I would run the whole thing through Kafka, since you will be stress-testing your production flow - consider if you at some later time lost your

Re: C/C++ kafka client API's

2014-10-14 Thread svante karlsson
Magnus, Do you have any plans to update the protocol to 0.9? I built a Boost.Asio-based version half a year ago, but that only implemented v0.8 and I have not found time to upgrade it. It is quite a big job to have something equal to the Java high- and low-level APIs. /svante

Re: How to use RPC mechanism in Kafka?

2014-09-22 Thread svante karlsson
Wrong use case. Kafka is a queue (in the normal case with a TTL (time to live) on messages). There is no correlation between producers and consumers. There is no concept of a consumed message. There is no request and no response. You can produce messages (to another topic) as a result of your processing, but

Re: How to use RPC mechanism in Kafka?

2014-09-22 Thread svante karlsson
should we move to some other message broker? If yes, Can you please tell me the name which is best for this use case and can handle large amount of requests? Is there any workaround in Kafka? If Yes, Please tell me. Thanks Warm Regards Lavish Goel On Mon, Sep 22, 2014 at 3:41 PM, svante

Re: How to use RPC mechanism in Kafka?

2014-09-22 Thread svante karlsson
Broker.* 1. We have to handle 30,000 TPS. 2. We need to prioritize the requests. 3. Request Data should not be lost. Thanks Regards Lavish Goel On Mon, Sep 22, 2014 at 4:20 PM, svante karlsson s...@csi.se wrote: Why do you want a message broker for RPC? What is large

Re: Reply: kafka performance question

2014-05-26 Thread svante karlsson
Do you read from the file in the callback from Kafka? I just implemented C++ bindings, and in one of the tests I did I got the following results: 1000 messages per batch (fairly small messages, ~150 bytes) and then wait for the network layer to ack the sends (not server acks) before putting another