[jira] [Commented] (KAFKA-2750) Sender.java: handleProduceResponse does not check protocol version
[ https://issues.apache.org/jira/browse/KAFKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139784#comment-15139784 ]

Felix GV commented on KAFKA-2750:
---------------------------------

Shouldn't the producer automatically do a graceful degradation of its protocol, without even bubbling anything up to the user?

> Sender.java: handleProduceResponse does not check protocol version
> ------------------------------------------------------------------
>
> Key: KAFKA-2750
> URL: https://issues.apache.org/jira/browse/KAFKA-2750
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.0.0
> Reporter: Geoff Anderson
>
> If you try to run a 0.9 producer against a 0.8.2.2 Kafka broker, you get a fairly cryptic error message:
>
> [2015-11-04 18:55:43,583] ERROR Uncaught error in kafka producer I/O thread: (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'throttle_time_ms': java.nio.BufferUnderflowException
>     at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
>     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:462)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:279)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:141)
>
> Although we shouldn't expect a 0.9 producer to work against a 0.8.x broker, since the protocol version has been increased, perhaps the error could be clearer.
>
> The cause seems to be that in Sender.java, handleProduceResponse does not have any mechanism for checking the protocol version of the received produce response -- it just calls a constructor which blindly tries to grab the throttle time field, which in this case fails.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
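To illustrate the failure mode described in the ticket, here is a minimal sketch (not Kafka's actual parser; the field layout and class names are simplified assumptions) of a response reader that consults the API version before attempting to read the throttle_time_ms field, so that an old broker's shorter response produces a clear error rather than a raw BufferUnderflowException:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch only: a version-aware reader for a simplified
// response body of the form [errorCode:int16][throttleTimeMs:int32 (v1+ only)].
public class VersionAwareParser {

    public static int readThrottleTimeMs(ByteBuffer body, short apiVersion) {
        body.getShort(); // error code (ignored in this sketch)
        if (apiVersion >= 1) {
            if (body.remaining() < 4) {
                // Fail with an explanatory message instead of an underflow
                throw new IllegalStateException(
                    "Response too short for v" + apiVersion
                    + "; broker may be speaking an older protocol version");
            }
            return body.getInt();
        }
        return 0; // v0 responses carry no throttle time
    }
}
```

The point is only that the parser branches on the version it sent the request with, which is exactly the check the ticket says handleProduceResponse lacks.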
Merge improvements back into Kafka Metrics?
Hi,

We've been using the new Kafka Metrics within Voldemort for a little while now, and we have made some improvements to the library that you might like to copy back into Kafka proper. You can view the changes that went in after we forked here: https://github.com/tehuti-io/tehuti/commits/master

The most critical ones are probably these two:

- A pretty simple yet nasty bug in Percentile that pretty much made Histograms useless: https://github.com/tehuti-io/tehuti/commit/913dcc0dcc79e2ce87a4c3e52a1affe2aaae9948
- A few improvements to SampledStat (unfortunately littered across several commits) were made to prevent spurious values from being measured out of a disproportionately small time window (either initially, or because all windows expired in the case of an infrequently used stat): https://github.com/tehuti-io/tehuti/blob/master/src/main/java/io/tehuti/metrics/stats/SampledStat.java

There were other minor changes here and there to make the APIs more usable (IMHO), though that may be a matter of personal taste more than correctness.

If you're interested in the above changes, I could put together a patch and file a JIRA. Or someone else can do it if they prefer.

On an unrelated note, if you do merge the changes back into Kafka, it would be nice if you considered releasing kafka-metrics as a standalone artifact. Voldemort could depend on kafka-metrics rather than tehuti if it was fixed properly, but it would be a stretch for Voldemort to depend on all of Kafka (or even Kafka clients...). The fork was just a way to iterate quicker at the time we needed this, but it would be nice to bring it back together.

Let me know if I can help in any way.

--
*Felix GV*
Senior Software Engineer
Data Infrastructure
LinkedIn
f...@linkedin.com
linkedin.com/in/felixgv
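The SampledStat problem described above can be sketched as follows. This is an illustrative toy, not Tehuti's or Kafka's actual classes (the names WindowedRate, minWindowMs, etc. are invented for this example): if a rate is computed over whatever sliver of the window has elapsed so far, early measurements divide by a tiny interval and look enormous; refusing to report until a minimum window has been observed avoids that.

```java
// Toy sketch of guarding a rate measurement against a disproportionately
// small observed time window (the bug class described in the email above).
public class WindowedRate {
    private final long windowMs;
    private final long minWindowMs; // don't report until this much time is covered
    private long windowStartMs = -1;
    private double total = 0.0;

    public WindowedRate(long windowMs, long minWindowMs) {
        this.windowMs = windowMs;
        this.minWindowMs = minWindowMs;
    }

    public void record(double value, long nowMs) {
        if (windowStartMs < 0) windowStartMs = nowMs;
        total += value;
    }

    // Returns events/second, or NaN while the observed window is too small to trust.
    public double measure(long nowMs) {
        if (windowStartMs < 0) return Double.NaN;
        long elapsed = Math.min(nowMs - windowStartMs, windowMs);
        if (elapsed < minWindowMs) return Double.NaN;
        return total / (elapsed / 1000.0);
    }
}
```

Without the minWindowMs guard, measuring one millisecond after the first record would report a rate a thousand times too high.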
RE: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker
Next time the protocol is evolved and new error codes can be introduced, would it make sense to add a new one called Deprecated (or Deprecation, or DeprecatedOperation, or whatever sounds best)? This would act as a more precise form of the Unknown error. It could help identify what the problem is more easily when debugging clients.

Of course, this is the kind of lever one would prefer never pulling, but when you need it, you're better off having it than not, and if you end up having it and never using it, it does not do much harm either.

--
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
f...@linkedin.com
linkedin.com/in/felixgv

From: kafka-clie...@googlegroups.com [kafka-clie...@googlegroups.com] on behalf of Magnus Edenhill [mag...@edenhill.se]
Sent: Thursday, January 15, 2015 10:40 AM
To: dev@kafka.apache.org
Cc: kafka-clie...@googlegroups.com
Subject: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

I very much agree with what Joe is saying: let's use the version field as intended and be very strict about not removing nor altering existing behaviour without bumping the version. Old API versions could be deprecated (documentation only?) immediately and removed completely in the next minor version bump (0.8-0.9).

An API to query supported API versions would be a good addition in the long run, but it doesn't help current clients much, since such a request to an older broker version will kill the connection without any error reporting to the client, thus making it rather useless in the short term.

Regards,
Magnus

--
You received this message because you are subscribed to the Google Groups kafka-clients group. To unsubscribe from this group and stop receiving emails from it, send an email to kafka-clients+unsubscr...@googlegroups.com.
To post to this group, send email to kafka-clie...@googlegroups.com. Visit this group at http://groups.google.com/group/kafka-clients. To view this discussion on the web visit https://groups.google.com/d/msgid/kafka-clients/CAHCQUcBtJ1nXi5_dEaHyR2QcRycQHh03rUCY%2BRo2Ussg9kM6UQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
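Felix's Deprecated error-code idea above can be sketched like this. The numeric codes and the enum itself are made up for illustration (this is not Kafka's actual error-code table): a dedicated DEPRECATED code lets a broker reject a retired API version with a precise reason, while older clients that don't know the code still fall back to the generic Unknown error.

```java
// Hypothetical error-code enum showing a dedicated DEPRECATED code
// as a more precise alternative to the catch-all UNKNOWN.
public enum ErrorCode {
    NONE((short) 0),
    UNKNOWN((short) -1),
    DEPRECATED((short) 99); // "this API/version was retired; upgrade the client"

    private final short code;

    ErrorCode(short code) { this.code = code; }

    public short code() { return code; }

    public static ErrorCode fromCode(short code) {
        for (ErrorCode e : values())
            if (e.code == code) return e;
        return UNKNOWN; // unrecognized codes degrade to the generic error
    }
}
```

The fromCode fallback is what makes the scheme backward compatible: a client built before DEPRECATED existed simply maps it to UNKNOWN, which is no worse than today.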
Re: Kafka/Hadoop consumers and producers
The contrib code is simple and probably wouldn't require too much work to fix, but it's a lot less robust than Camus, so you would ideally need to do some work to make it solid against all edge cases, failure scenarios and performance bottlenecks...

I would definitely recommend investing in Camus instead, since it already covers a lot of the challenges I'm mentioning above, and also has more community support behind it at the moment (as far as I can tell, anyway), so it is more likely to keep getting improvements than the contrib code.

--
Felix

On Thu, Aug 8, 2013 at 9:28 AM, psaltis.and...@gmail.com wrote:

We also have a need today to ETL from Kafka into Hadoop, and we do not currently use Avro, nor have any plans to. So is the official direction, based on this discussion, to ditch the Kafka contrib code and direct people to use Camus without Avro as Ken described, or are both solutions going to survive? I can put time into the contrib code and/or work on documenting the tutorial on how to make Camus work without Avro. Which is the preferred route, for the long term?

Thanks,
Andrew

On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:

Hi Andrew,

Camus can be made to work without Avro. You will need to implement a message decoder and a data writer. We need to add a better tutorial on how to do this, but it isn't that difficult. If you decide to go down this path, you can always ask questions on this list. I try to make sure each email gets answered, but it can take me a day or two.

-Ken

On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:

Hi all,

Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop. We don't currently use Avro and I'm not sure if we are going to. I came across this post. If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?

Thanks!
-Andrew

--
You received this message because you are subscribed to the Google Groups Camus - Kafka ETL for Hadoop group. To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
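Ken's "message decoder" suggestion above amounts to a small pluggable interface. The sketch below is illustrative only: Camus defines its own decoder API, and the interface and class names here are not it. It just shows the shape of the work, i.e. turning raw Kafka message bytes into a record with no Avro involved.

```java
import java.nio.charset.StandardCharsets;

// Illustrative decoder sketch (NOT Camus's actual API): a generic
// byte[]-to-record decoder, plus a concrete decoder for tab-separated
// "timestamp<TAB>payload" messages.
public class PlainTextDecoder {

    public interface Decoder<T> {
        T decode(byte[] payload);
    }

    public static class TsvDecoder implements Decoder<String[]> {
        @Override
        public String[] decode(byte[] payload) {
            String text = new String(payload, StandardCharsets.UTF_8);
            // Split into at most two fields: the timestamp and the rest of the line
            return text.split("\t", 2);
        }
    }
}
```

A matching "data writer" would do the inverse on the Hadoop side, serializing decoded records into whatever file format the ETL job targets.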
Re: Kafka/Hadoop consumers and producers
IMHO, I think Camus should probably be decoupled from Avro before the simpler contribs are deleted. We don't actually use the contribs, so I'm not saying this for our sake, but it seems like the right thing to do to provide simple examples for this type of stuff, no...?

--
Felix

On Wed, Jul 3, 2013 at 4:56 AM, Cosmin Lehene cleh...@adobe.com wrote:

If the Hadoop consumer/producer use-case will remain relevant for Kafka (I assume it will), it would make sense to have the core components (Kafka input/output format at least) as part of Kafka, so that they could be built, tested and versioned together to maintain compatibility. This would also make it easier to build custom MR jobs on top of Kafka, rather than having to decouple stuff from Camus. It would also be less confusing for users, at least when starting to use Kafka. Camus could use those instead of providing its own.

That being said, we did some work on the consumer side (0.8 and the new(er) MR API). We could probably try to rewrite them to use Camus, or fix Camus, or whatever, but please consider this alternative as well.

Thanks,
Cosmin

On 7/3/13 11:06 AM, Sam Meder sam.me...@jivesoftware.com wrote:

I think it makes sense to kill the hadoop consumer/producer code in Kafka given, as you said, Camus and the simplicity of the Hadoop producer.

/Sam

On Jul 2, 2013, at 5:01 PM, Jay Kreps jay.kr...@gmail.com wrote:

We currently have a contrib package for consuming and producing messages from mapreduce ( https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tree;f=contrib;h=e53e1fb34893e733b10ff27e79e6a1dcbb8d7ab0;hb=HEAD ). We keep running into problems (e.g. KAFKA-946) that are basically due to the fact that the Kafka committers mostly don't seem to be Hadoop developers and aren't doing a good job of maintaining this code (keeping it tested, improving it, documenting it, writing tutorials, getting it moved over to the more modern APIs, getting it working with newer Hadoop versions, etc.).

A couple of options:
1. We could try to get someone in the Kafka community (either a current committer or not) who would adopt this as their baby (it's not much code).
2. We could just let Camus take over this functionality. They already have a more sophisticated consumer, and the producer is pretty minimal.

So are there any people who would like to adopt the current Hadoop contrib code? Conversely, would it be possible to provide the same or similar functionality in Camus and just delete these?

-Jay
Re: kafka replication blog
Thanks Jun! I hadn't been following the discussions regarding 0.8 and replication for a little while, and this was a great post to refresh my memory and get up to speed on the current replication architecture's design.

--
Felix

On Tue, Feb 5, 2013 at 2:21 PM, Jun Rao jun...@gmail.com wrote:

I just posted the following blog on Kafka replication. This may answer some of the questions that a few people have asked on the mailing list before.

http://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka

Thanks,
Jun
[jira] [Commented] (KAFKA-260) Add audit trail to kafka
[ https://issues.apache.org/jira/browse/KAFKA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556958#comment-13556958 ]

Felix GV commented on KAFKA-260:
--------------------------------

It would be possible to have optional timestamps by using the magic byte at the beginning of the Kafka Messages, no? If the Message contains the old (current) magic byte, then there's no timestamp; if it's the new magic byte, then there is a timestamp (without needing a key) somewhere in the header...

Add audit trail to kafka
------------------------

Key: KAFKA-260
URL: https://issues.apache.org/jira/browse/KAFKA-260
Project: Kafka
Issue Type: New Feature
Affects Versions: 0.8
Reporter: Jay Kreps
Assignee: Jay Kreps
Attachments: kafka-audit-trail-draft.patch, Picture 18.png

LinkedIn has a system that does monitoring on top of our data flow to ensure all data is delivered to all consumers of data. This works by having each logical tier through which data passes produce messages to a central audit-trail topic; these messages give a time period and the number of messages that passed through that tier in that time period. Examples of tiers for data might be producer, broker, hadoop-etl, etc. This makes it possible to compare the total events for a given time period to ensure that all events that are produced are consumed by all consumers. This turns out to be extremely useful.

We also have an application that balances the books and checks that all data is consumed in a timely fashion. This gives graphs for each topic and shows any data loss and the lag at which the data is consumed (if any).

This would be an optional feature that would allow you to do this kind of reconciliation automatically for all the topics Kafka hosts, against all the tiers of applications that interact with the data.
Some details: the proposed format of the data is JSON, using the following format for messages:

{
  "time": 1301727060032,       // the timestamp at which this audit message is sent
  "topic": "my_topic_name",    // the topic this audit data is for
  "tier": "producer",          // a user-defined tier name
  "bucket_start": 130172640,   // the beginning of the time bucket this data applies to
  "bucket_end": 130172700,     // the end of the time bucket this data applies to
  "host": "my_host_name.datacenter.linkedin.com", // the server that this was sent from
  "datacenter": "hlx32",       // the datacenter this occurred in
  "application": "newsfeed_service", // a user-defined application name
  "guid": "51656274-a86a-4dff-b824-8e8e20a6348f", // a unique identifier for this message
  "count": 43634
}

DISCUSSION

Time is complex:
1. The audit data must be based on a timestamp in the events, not the time on the machine processing the event. Using this timestamp means that all downstream consumers will report audit data on the right time bucket. This means that there must be a timestamp in the event, which we don't currently require. Arguably we should just add a timestamp to the events, but I think it is sufficient for now just to allow the user to provide a function to extract the time from their events.
2. For counts to reconcile exactly we can only do analysis at a granularity based on the least common multiple of the bucket sizes used by all tiers. The simplest is just to configure them all to use the same bucket size. We currently use a bucket size of 10 mins, but anything from 1-60 mins is probably reasonable.

For analysis purposes one tier is designated as the source tier and we do reconciliation against this count (e.g. if another tier has less, that is treated as loss; if another tier has more, that is treated as duplication).

Note that this system makes false positives possible, since you can lose an audit message. It also makes false negatives possible, since if you lose both normal messages and the associated audit messages, it will appear that everything adds up.
The latter problem is astronomically unlikely to happen exactly, though.

This would integrate into the client (producer and consumer both) in the following way:
1. The user provides a way to get timestamps from messages (required).
2. The user configures the tier name, host name, datacenter name, and application name as part of the consumer and producer config. We can provide reasonable defaults if not supplied (e.g. if it is a Producer, then set the tier to producer and get the hostname from the OS).

The application that processes this data is currently a Java Jetty app and talks to MySQL. It feeds off the audit topic in Kafka and runs both automatic monitoring checks and graphical displays of data against this. The data layer is not terribly scalable, but because the audit data is sent only periodically, this is enough to allow us to audit thousands of servers on very modest hardware.
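The reconciliation step described above can be sketched in a few lines. This is a simplified illustration under assumed names (AuditReconciler, reconcile; the real LinkedIn application is richer): for one time bucket, each tier's count is compared against the designated source tier, where a lower count indicates apparent loss and a higher one apparent duplication.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of per-bucket audit reconciliation against a source tier.
public class AuditReconciler {

    // Returns tier -> (count - sourceCount): negative = loss, positive = duplication.
    public static Map<String, Long> reconcile(Map<String, Long> countsByTier,
                                              String sourceTier) {
        long source = countsByTier.get(sourceTier);
        Map<String, Long> delta = new HashMap<>();
        for (Map.Entry<String, Long> e : countsByTier.entrySet()) {
            if (!e.getKey().equals(sourceTier))
                delta.put(e.getKey(), e.getValue() - source);
        }
        return delta;
    }
}
```

As the discussion notes, this only reconciles exactly when all tiers report on compatible bucket boundaries, which is why configuring a single shared bucket size is the simplest deployment.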