Re: New Producer - ONLY sync mode?

2015-02-02 Thread Otis Gospodnetic
Hi, Thanks for the info. Here's the use case. We have something up stream sending data, say a log shipper called X. It sends it to some remote component Y. Y is the Kafka Producer and it puts data into Kafka. But Y needs to send a reply to X and tell it whether it successfully put all its

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Mathias Söderberg
regards, Mathias -- Thanks, Neha -- Thanks, Neha kafka-0.8.1.1-20150202.hprof.gz Description: GNU Zip compressed data kafka-0.8.2.0-20150202.hprof.gz Description: GNU Zip compressed data

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-02 Thread Joe Stein
Huzzah! Thanks Jun for preparing the release candidates and getting this out to the community. - Joe Stein On Mon, Feb 2, 2015 at 2:27 PM, Jun Rao j...@confluent.io wrote: The following are the results of the votes. +1 binding = 3 votes +1 non-binding = 1 votes -1 = 0 votes 0 = 0 votes

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Gwen Shapira
If I understood the code and Jay correctly - if you wait for the future it will be a similar delay to that of the old sync producer. Put another way, if you test it out and see longer delays than the sync producer had, we need to find out why and fix it. Gwen On Mon, Feb 2, 2015 at 1:27 PM,

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Mathias Söderberg
Jun, Yeah, sure, I'll take it for a spin tomorrow. On Mon Feb 02 2015 at 11:08:42 PM Jun Rao j...@confluent.io wrote: Mathias, Thanks for the info. I took a quick look. The biggest difference I saw is the org.xerial.snappy.SnappyNative.rawCompress() call. In 0.8.1.1, it uses about 0.05% of

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Gwen Shapira
Can Y have a callback that will handle the notification to X? In this case, perhaps Y can be async and X can buffer the data until the callback triggers and says all good (or resend if the callback indicates an error) On Mon, Feb 2, 2015 at 12:56 PM, Otis Gospodnetic otis.gospodne...@gmail.com

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Jay Kreps
Yeah as Gwen says there is no sync/async mode anymore. There is a new configuration which does a lot of what async did in terms of allowing batching: batch.size - This is the target amount of data per partition the server will attempt to batch together. linger.ms - This is the time the producer

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jun Rao
Mathias, Thanks for the info. I took a quick look. The biggest difference I saw is the org.xerial.snappy.SnappyNative.rawCompress() call. In 0.8.1.1, it uses about 0.05% of the CPU. In 0.8.2.0, it uses about 0.10% of the CPU. We did upgrade snappy from 1.0.5 in 0.8.1.1 to 1.1.1.6 in 0.8.2.0.

Re: Any Java 7 compatibility issues for 0.8.1.1?

2015-02-02 Thread Mark Reddy
I don't think there are any issues. +1, I've been running Kafka with Java 7 for quite some time now and haven't experienced any issues. Regards, Mark On 2 February 2015 at 19:09, Otis Gospodnetic otis.gospodne...@gmail.com wrote: I don't think there are any issues. We use 0.8.1.1 under

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-02 Thread Neha Narkhede
Great! Thanks Jun for helping with the release and everyone involved for your contributions. On Mon, Feb 2, 2015 at 1:32 PM, Joe Stein joe.st...@stealth.ly wrote: Huzzah! Thanks Jun for preparing the release candidates and getting this out to the community. - Joe Stein On Mon, Feb 2,

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Pradeep Gollakota
This is a great question Otis. Like Gwen said, you can accomplish Sync mode by setting the batch size to 1. But this does highlight a shortcoming of the new producer API. I really like the design of the new API and it has really great properties and I'm enjoying working with it. However, once API

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Gwen Shapira
I've been thinking about that too, since both Flume and Sqoop rely on send(List) API of the old API. I'd like to see this API come back, but I'm debating how we'd handle errors. IIRC, the old API would fail an entire batch on a single error, which can lead to duplicates. Having N callbacks lets

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Pradeep Gollakota
I looked at the newly added batch API to Kinesis for inspiration. The response on the batch put is a list of message-ids and their status (offset if success else a failure code). Ideally, I think the server should fail the entire batch or succeed the entire batch (i.e. no duplicates), but this is

Any Java 7 compatibility issues for 0.8.1.1?

2015-02-02 Thread Yury Ruchin
Hello, I wonder if there are any known issues with running Kafka 0.8.1.1 against Oracle JDK 7? Any unsupported JVM options in startup scripts, runtime issues, etc.? I'm trying to understand how easy Kafka migration from JDK 6 to 7 would be. Thanks, Yury

Re: create topic does not really executed successfully

2015-02-02 Thread Gwen Shapira
IIRC, the directory is only created after you send data to the topic. Do you get errors when your producer sends data? Another common issue is that you specify replication-factor 3 when you have fewer than 3 brokers. Gwen On Mon, Feb 2, 2015 at 2:34 AM, Xinyi Su xiny...@gmail.com wrote: Hi,

create topic does not really executed successfully

2015-02-02 Thread Xinyi Su
Hi, I am using Kafka_2.9.2-0.8.2-beta. When I use kafka-topic.sh to create topic, I observed sometimes the topic is not really created successfully as the output shows in console. Below is my command line: # bin/kafka-topics.sh --zookeeper xxx:2181 --create --topic zerg.hydra --partitions 3

Re: Replication stop working

2015-02-02 Thread Jun Rao
One thing that you need to be aware is that if a broker goes down, the affected partitions will remain under replicated until the broker is restarted and catches up again. Thanks, Jun On Tue, Jan 27, 2015 at 10:59 AM, Dong, John zunhai.d...@ebay.com wrote: Hi, I am new to this forum and I

Re: Question on ETL while replau

2015-02-02 Thread Jun Rao
You can't change existing messages. You can republish messages with new fields and manually set the consumer offsets. Thanks, Jun On Thu, Jan 29, 2015 at 1:12 PM, Joshua Schumacher friedhardw...@gmail.com wrote: What's the best way to add two 'fields' to my kafka messages once they are

Re: Kafka ETL Camus Question

2015-02-02 Thread Bhavesh Mistry
Hi Jun, Thanks for info. I did not get answer to my question there so I thought I try my luck here :) Thanks, Bhavesh On Mon, Feb 2, 2015 at 9:46 PM, Jun Rao j...@confluent.io wrote: You can probably ask the Camus mailing list. Thanks, Jun On Thu, Jan 29, 2015 at 1:59 PM, Bhavesh

Re: WARN Error in I/O with NetworkReceive.readFrom(NetworkReceive.java

2015-02-02 Thread Jun Rao
Any error on the broker log? Thanks, Jun On Wed, Jan 28, 2015 at 8:36 AM, Dillian Murphey crackshotm...@gmail.com wrote: Running the performance test. What is the nature of this error?? I'm running a very high end cluster on aws. Tried this even within the same subnet on aws.

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jaikiran Pai
On Monday 02 February 2015 11:03 PM, Jun Rao wrote: Jaikiran, The fix you provided in probably unnecessary. The channel that we use in SimpleConsumer (BlockingChannel) is configured to be blocking. So even though the read from the socket is in a loop, each read blocks if there is no bytes

Re: create topic does not really executed successfully

2015-02-02 Thread Harsha
Xinyl, Do you have any logs when the kafka-topics.sh unable to create topic dirs. Apart from this make sure you point to a different dir other than /tmp/kafka-logs since this dir gets delete when your machine restarts and not a place to store your topic data.

Re: Consumer not getting data when there is a big lag in the topic

2015-02-02 Thread Guozhang Wang
Dinesh, I took a look at your logs, first it seems error_logs_kafka_request.log is also from the consumer side, not the server side. And the error logs are pointing to an EOF on the server side while reading the data, one possibility is that your socket buffer size is configured to be not as

Re: Consumers closing sockets abruptly?

2015-02-02 Thread Jun Rao
Is there another broker running on that ip? If the replication factor is larger than 1, the follower will be fetching data from the leader just like a regular consumer. Thanks, Jun On Tue, Jan 27, 2015 at 9:52 AM, Scott Reynolds sreyno...@twilio.com wrote: On my brokers I am seeing this error

subscribe email

2015-02-02 Thread Xinyi Su

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jay Kreps
Actually that fetch call blocks on the server side. That is, if there is no data, the server will wait until data arrives or the timeout occurs to send a response. This is done to help simplify the client development. If that isn't happening it is likely a bug or a configuration change in the

Re: Kafka ETL Camus Question

2015-02-02 Thread Pradeep Gollakota
Hi Bhavesh, At Lithium, we don't run Camus in our pipelines yet, though we plan to. But I just wanted to comment regarding speculative execution. We have it disabled at the cluster level and typically don't need it for most of our jobs. Especially with something like Camus, I don't see any need

Re: create topic does not really executed successfully

2015-02-02 Thread Xinyi Su
Hi, Harsha I have not collected logs during creating topics. I will continue to pay attention to this issue and collect logs if it occurs again. Thanks. Xinyi On 3 February 2015 at 12:46, Harsha st...@harsha.io wrote: Xinyl, Do you have any logs when the kafka-topics.sh unable

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Otis Gospodnetic
Hi, Nope, unfortunately it can't do that. X is a remote app, doesn't listen to anything external, calls Y via HTTPS. So X has to decide what to do with its data based on Y's synchronous response. It has to block until Y responds. And it wouldn't be pretty, I think, because nobody wants to run

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jay Kreps
Ah, yeah, you're right. That is just wait time not CPU time. We should check that profile it must be something else on the list. -Jay On Mon, Feb 2, 2015 at 9:33 AM, Jun Rao j...@confluent.io wrote: Hi, Mathias, From the hprof output, it seems that the top CPU consumers are socketAccept()

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jun Rao
Hi, Mathias, From the hprof output, it seems that the top CPU consumers are socketAccept() and epollWait(). As far as I am aware, there hasn't been any significant changes in the socket server code btw 0.8.1 and 0.8.2. Could you run the same hprof test on 0.8.1 so that we can see the difference?

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-02 Thread Jay Kreps
Yay! -Jay On Mon, Feb 2, 2015 at 2:23 PM, Neha Narkhede n...@confluent.io wrote: Great! Thanks Jun for helping with the release and everyone involved for your contributions. On Mon, Feb 2, 2015 at 1:32 PM, Joe Stein joe.st...@stealth.ly wrote: Huzzah! Thanks Jun for preparing the

Re: create topic does not really executed successfully

2015-02-02 Thread Harsha
Xinyl, Do you have any logs when the kafka-topics.sh unable to create topic dirs. Apart from this make sure you point to a different dir other than /tmp/kafka-logs since this dir gets delete when your machine restarts and not a place to store your topic data.

Re: Detecting lost connection in high level consumer

2015-02-02 Thread Jun Rao
That's actually how consumer.timeout.ms works. The iterator only gets an exception is there is no message after the timeout. The error handling of the connection is done in the consumer client library for you. Thanks, Jun On Wed, Jan 28, 2015 at 6:21 PM, harikiran harihawk...@gmail.com wrote:

Re: Can't start Zookeeper on a EC2 instance in a public subnet

2015-02-02 Thread Jun Rao
It seems there is another process using the Zookeeper port. Thanks, Jun On Thu, Jan 29, 2015 at 11:57 AM, Su She suhsheka...@gmail.com wrote: Hello Everyone, I previously had my EC2 instances in a private subnet, but I spun up a new cluster in a public subnet. However, it seems to have

Re: Kafka ETL Camus Question

2015-02-02 Thread Jun Rao
You can probably ask the Camus mailing list. Thanks, Jun On Thu, Jan 29, 2015 at 1:59 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Kafka Team or Linked-In Team, I would like to know if you guys run Camus ETL job with speculative execution true or false. Does it make sense to

New Producer - ONLY sync mode?

2015-02-02 Thread Otis Gospodnetic
Hi, Is the plan for New Producer to have ONLY async mode? I'm asking because of this info from the Wiki: - The producer will always attempt to batch data and will always immediately return a SendResponse which acts as a Future to allow the client to await the completion of the

Re: Any Java 7 compatibility issues for 0.8.1.1?

2015-02-02 Thread Otis Gospodnetic
I don't think there are any issues. We use 0.8.1.1 under Oracle Java 7. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Mon, Feb 2, 2015 at 5:02 AM, Yury Ruchin yuri.ruc...@gmail.com wrote: Hello, I wonder

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Gwen Shapira
If you want to emulate the old sync producer behavior, you need to set the batch size to 1 (in producer config) and wait on the future you get from Send (i.e. future.get) I can't think of good reasons to do so, though. Gwen On Mon, Feb 2, 2015 at 11:08 AM, Otis Gospodnetic

Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-02 Thread Jun Rao
The following are the results of the votes. +1 binding = 3 votes +1 non-binding = 1 votes -1 = 0 votes 0 = 0 votes The vote passes. I will release artifacts to maven central, update the dist svn and download site. Will send out an announce after that. Thanks everyone that contributed to the

how to fetch old message from kafka

2015-02-02 Thread Snehalata Nagaje
Hi , We are using kafka for storing messages in chat application. Currently we divided each topic in multiple partitions. each partition stores data for given customer who uses the application. Right now on very first request, application fetches log from kafka from earliest valid offset to