Re: Dealing with large messages

2015-10-06 Thread Gwen Shapira
Storing large blobs in S3 or HDFS and placing URIs in Kafka is the most common solution I've seen in use. On Tue, Oct 6, 2015 at 8:32 AM, Joel Koshy wrote: > The best practice I think is to just put large objects in a blob store > and have messages embed references to those

Re: Dealing with large messages

2015-10-06 Thread Joel Koshy
The best practice I think is to just put large objects in a blob store and have messages embed references to those blobs. Interestingly we ended up having to implement large-message-support at LinkedIn but for various reasons were forced to put messages inline (i.e., against the above
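
A minimal sketch of the blob-reference approach described in this thread, assuming an in-memory dictionary as a stand-in for S3 or HDFS. The names `InMemoryBlobStore`, `encode_message`, `decode_message`, the `mem://` URI scheme, and the 1 KiB threshold are all hypothetical and chosen only for illustration; no Kafka client is involved, the functions only build and read the bytes that would be sent as a message value.

```python
import hashlib
import json


class InMemoryBlobStore:
    """Hypothetical stand-in for a blob store such as S3 or HDFS."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key, so storing the same payload twice is idempotent.
        uri = "mem://blobs/" + hashlib.sha256(data).hexdigest()
        self._blobs[uri] = data
        return uri

    def get(self, uri: str) -> bytes:
        return self._blobs[uri]


# Hypothetical cutoff; a real value would depend on the broker's message size limit.
MAX_INLINE_BYTES = 1024


def encode_message(payload: bytes, store: InMemoryBlobStore) -> bytes:
    """Return the message value: small payloads inline, large ones by reference."""
    if len(payload) <= MAX_INLINE_BYTES:
        envelope = {"inline": payload.hex()}
    else:
        envelope = {"ref": store.put(payload), "size": len(payload)}
    return json.dumps(envelope).encode("utf-8")


def decode_message(message: bytes, store: InMemoryBlobStore) -> bytes:
    """Resolve a message value back to the original payload."""
    envelope = json.loads(message.decode("utf-8"))
    if "ref" in envelope:
        return store.get(envelope["ref"])
    return bytes.fromhex(envelope["inline"])
```

With this shape, the consumer has to reach the blob store as well as the brokers, which is the side channel the thread is weighing against inline messages.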

log compaction scaling with ~100m messages

2015-10-06 Thread Feroze Daud
Hi! We have a use case where we want to store ~100m keys in Kafka. Is there any problem with this approach? I have heard from some Kafka users that Kafka has problems doing log compaction with that many keys. Another topic might have around 10 different K/V pairs for

Re: Datacenter to datacenter over the open internet

2015-10-06 Thread Pradeep Gollakota
At Lithium, we have multiple datacenters and we distcp our data across our Hadoop clusters. We have 2 DCs in NA and 1 in EU. We have a non-redundant direct connect from our EU cluster to one of our NA DCs. If and when this fails, we have automatic failover to a VPN that goes over the internet. The

Re: Dealing with large messages

2015-10-06 Thread Pradeep Gollakota
Thanks for the replies! I was rather hoping not to have to implement a side channel solution. :/ If we have to do this, we may use an HBase table with a TTL the same as our topic so the large objects are "gc'ed"... thoughts? On Tue, Oct 6, 2015 at 8:45 AM, Gwen Shapira
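
A rough sketch of the TTL idea above, assuming an in-memory table rather than HBase. `TtlBlobStore` and the seven-day retention value are hypothetical; the caller passes the current time explicitly so the expiry behaviour is easy to follow.

```python
class TtlBlobStore:
    """Hypothetical side-channel store whose rows expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._rows = {}  # key -> (written_at, data)

    def put(self, key: str, data: bytes, now: float) -> None:
        self._rows[key] = (now, data)

    def get(self, key: str, now: float):
        """Return the blob, or None if it was never written or has expired."""
        row = self._rows.get(key)
        if row is None:
            return None
        written_at, data = row
        if now - written_at >= self.ttl_seconds:
            # Expired: drop the row, mimicking TTL-based garbage collection.
            del self._rows[key]
            return None
        return data


# Assumed topic retention of seven days; the blob TTL is set to match it.
TOPIC_RETENTION_SECONDS = 7 * 24 * 60 * 60
store = TtlBlobStore(ttl_seconds=TOPIC_RETENTION_SECONDS)
store.put("blob-1", b"large object", now=0.0)
```

One thing to watch in a design like this: if the blob TTL is shorter than the topic retention, a message can still be readable after the object it points to is gone, so the TTL probably should not be lower than the retention period.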

Partition ownership with high-level consumer

2015-10-06 Thread Joey Echeverria
Hi! Is there a way to track current partition ownership when using the high-level consumer? It looks like the rebalance callback only tells me the partitions I'm (potentially) losing. -Joey

Re: [ANNOUCE] Apache Kafka 0.8.2.2 Released

2015-10-06 Thread Ismael Juma
On Sat, Oct 3, 2015 at 4:36 PM, Jun Rao wrote: > > We will update the download link in our website shortly. > The download page has been updated: http://kafka.apache.org/downloads.html Ismael

Re: Partition ownership with high-level consumer

2015-10-06 Thread Gwen Shapira
Zookeeper will have this information under /consumers//owners On Tue, Oct 6, 2015 at 12:22 PM, Joey Echeverria wrote: > Hi! > > Is there a way to track current partition ownership when using the > high-level consumer? It looks like the rebalance callback only tells me the >
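
A hedged sketch of how a client could turn that ZooKeeper information into a list of its own partitions. It assumes the 0.8 high-level consumer layout `/consumers/[group]/owners/[topic]/[partition]`, with each znode's data identifying the owning consumer thread. The function name `owned_partitions` and the sample IDs are hypothetical, and the input is a plain dictionary standing in for whatever a ZooKeeper client would return.

```python
def owned_partitions(owner_znodes: dict, group: str, consumer_id: str) -> dict:
    """Map topic -> sorted partition ids owned by consumer_id.

    owner_znodes maps a znode path to its data (the owner's id), as it
    might be read from ZooKeeper; this layout is an assumption.
    """
    prefix = f"/consumers/{group}/owners/"
    owned = {}
    for path, owner in owner_znodes.items():
        if not path.startswith(prefix) or owner != consumer_id:
            continue
        parts = path[len(prefix):].split("/")
        if len(parts) != 2:
            # Skip anything that is not a topic/partition leaf.
            continue
        topic, partition = parts
        owned.setdefault(topic, []).append(int(partition))
    for partitions in owned.values():
        partitions.sort()
    return owned


# Hypothetical snapshot of the owners subtree for group "g1".
snapshot = {
    "/consumers/g1/owners/events/0": "g1_host-a-0",
    "/consumers/g1/owners/events/1": "g1_host-b-0",
    "/consumers/g1/owners/events/2": "g1_host-a-0",
    "/consumers/g2/owners/events/0": "g2_host-a-0",
}
mine = owned_partitions(snapshot, "g1", "g1_host-a-0")
```

A snapshot like this can go stale during a rebalance, so it is better suited to monitoring than to driving consumer logic.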

Re: Partition ownership with high-level consumer

2015-10-06 Thread Joey Echeverria
But nothing in the API? -Joey On Tue, Oct 6, 2015 at 3:43 PM, Gwen Shapira wrote: > Zookeeper will have this information under /consumers//owners > > > > On Tue, Oct 6, 2015 at 12:22 PM, Joey Echeverria wrote: > > > Hi! > > > > Is there a way to track

Re: Partition ownership with high-level consumer

2015-10-06 Thread Gwen Shapira
I don't think so. AFAIK, even the new API won't send this information to every consumer, because in some cases it can be huge. On Tue, Oct 6, 2015 at 1:44 PM, Joey Echeverria wrote: > But nothing in the API? > > -Joey > > On Tue, Oct 6, 2015 at 3:43 PM, Gwen Shapira

Datacenter to datacenter over the open internet

2015-10-06 Thread Tom Brown
Hello, How do you consume a kafka topic from a remote location without a dedicated connection? How do you protect the server? The setup: data streams into our datacenter. We process it, and publish it to a kafka cluster. The consumer is located in a different datacenter with no direct

Re: Call to Future.get() hangs for producer

2015-10-06 Thread Guozhang Wang
Jason, What are the config values for your producer, especially "acks"? And what replication scheme were you using on the broker side? Guozhang On Tue, Oct 6, 2015 at 6:25 AM, Jason Kania wrote: > Hello, > I am using 8.2.1 and getting a scenario where my send

Dual commit with zookeeper and kafka

2015-10-06 Thread Рябков Алексей Николаевич
Hello! Can somebody explain to me how to use multiple consumers with different commit storage? For example, Java-based consumers use Kafka commit storage and Python-based consumers use ZooKeeper commit storage. My question is: is it true that when one consumer commits to Kafka, the server also commits

Re: How to verify offsets topic exists?

2015-10-06 Thread Stevo Slavić
Thanks, Grant, for the quick reply! I've used the AdminUtils.topicExists("__consumer_offsets") check and even 10 sec after Kafka broker startup, the check fails. When, on which event, does this internal topic get created? Is there some broker config property preventing it from being created? Does one have

mapping events to topics

2015-10-06 Thread Mark Drago
Hello, At my organization we are already using kafka in a few areas, but we're looking to expand our use and we're struggling with how best to distribute our events on to topics. We have on the order of 30 different kinds of events that we'd like to distribute via kafka. We have one or two

Re: mapping events to topics

2015-10-06 Thread Gwen Shapira
I usually approach this question by looking at possible consumers. You usually want each consumer to read from relatively few topics, use most of the messages it receives and have fairly cohesive logic for using these messages. Signs that things went wrong with too few topics: * Consumers that

Re: Datacenter to datacenter over the open internet

2015-10-06 Thread Gwen Shapira
You can configure "advertised.host.name" for each broker, which is the name external consumers and producers will use to refer to the brokers. On Tue, Oct 6, 2015 at 3:31 PM, Tom Brown wrote: > Hello, > > How do you consume a kafka topic from a remote location without a

Re: How to verify offsets topic exists?

2015-10-06 Thread Stevo Slavić
Debugged, and found in KafkaApis.handleConsumerMetadataRequest that the consumer offsets topic gets created on first lookup of offsets topic metadata, even when auto topic creation is disabled. In that method there is the following call: // get metadata (and create the topic if necessary) val

Re: Partition ownership with high-level consumer

2015-10-06 Thread Joey Echeverria
I really only want them for the partitions I own. The client should know that in order to acquire the zookeeper locks and could potentially execute a callback to tell me the partitions I own after a rebalance. -Joey On Tue, Oct 6, 2015 at 4:08 PM, Gwen Shapira wrote: > I

Call to Future.get() hangs for producer

2015-10-06 Thread Jason Kania
Hello, I am using 8.2.1 and getting a scenario where my send works fine but the subsequent call to Future.get() does not return - it hangs for at least 5 minutes. When I kill the client with the producer I get a Connection reset by peer message in the server.log. I am not sure what to check to