Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo
Hi, I’m with Robin and Michael here. What this decision needs is a good design brief. This article seems decent: https://yourcreativejunkie.com/logo-design-brief-the-ultimate-guide-for-designers/

Robin is right about the usage requirements. It goes a bit beyond resolution. How does the logo work when it’s on a sticker on someone’s laptop? Might there be cases where you want to print it in black and white? And how would it look if you put the Kafka, ksqlDB, and Streams stickers on a laptop?

Of the two, I prefer the first option. The brown on black is a bit subdued – it might not work well on a t-shirt or a laptop sticker. Maybe that could be improved by using a bolder color, but once it gets smaller or lower-resolution, it may not work any longer.

Regards,
Philip

P.S.: Another article about what makes a good logo: https://vanschneider.com/what-makes-a-good-logo

P.P.S.: If I were to pick a logo for Streams, I’d choose something that fits well with Kafka and ksqlDB. ksqlDB has the rocket. I can’t remember (or find) the reasoning behind the Kafka logo (aside from representing a K). Was there something about planets orbiting the sun? Or was it the atom? So I might stick with a space/science metaphor. Could Streams be a comet? UFO? Star? Eclipse? ... Maybe a satellite logo for Connect. Space inspiration: https://thenounproject.com/term/space/

From: Robin Moffatt
Sent: Wednesday, August 19, 2020 6:24 PM
To: users@kafka.apache.org
Cc: d...@kafka.apache.org
Subject: Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

I echo what Michael says here. Another consideration is that logos are often shrunk (when used on slides) and need to work at lower resolution (think: printing swag, stitching socks, etc.), so whatever logo we come up with must not be too fiddly in its level of detail - something that I think both of the currently proposed options fall foul of, IMHO.

On Wed, 19 Aug 2020 at 15:33, Michael Noll wrote:
> Hi all!
>
> Great to see we are in the process of creating a cool logo for Kafka
> Streams. First, I apologize for sharing feedback so late -- I just learned
> about it today. :-)
>
> Here's my *personal, subjective* opinion on the two current logo
> candidates for Kafka Streams.
>
> TL;DR: Sorry, but I really don't like either of the proposed "otter" logos.
> Let me try to explain why.
>
> - The choice to use an animal, regardless of which specific animal,
>   seems random and doesn't fit Kafka. (What's the purpose? To show that
>   KStreams is 'cute'?) In comparison, the O'Reilly books always have an
>   animal cover, that's their style, and it is very recognizable. Kafka,
>   however, has its own, different style. The Kafka logo has clear, simple
>   lines to achieve an abstract and 'techy' look, which also alludes nicely
>   to its architectural simplicity. Its logo is also a smart play on the
>   Kafka-identifying letter "K" and alludes to it being a distributed system
>   (the circles and links that make the K).
> - The proposed logos, however, make it appear as if KStreams is a
>   third-party technology that was bolted onto Kafka. For me, they certainly
>   do not convey the message "Kafka Streams is an official part of Apache
>   Kafka".
> - I, too, don't like the way the main Kafka logo is obscured (a concern
>   already voiced in this thread). Also, the Kafka 'logo' embedded in the
>   proposed KStreams logos is not the original one.
> - None of the proposed KStreams logos visually match the Kafka logo.
>   They have a totally different style, font, line art, and color scheme.
> - Execution-wise, the main Kafka logo looks great at all sizes. The
>   style of the otter logos, in comparison, becomes undecipherable at
>   smaller sizes.
>
> What I would suggest is to first agree on what the KStreams logo is
> supposed to convey to the reader. Here's my personal take:
>
> Objective 1: First and foremost, the KStreams logo should make it clear and
> obvious that KStreams is an official and integral part of Apache Kafka.
> This applies to both what is depicted and how it is depicted (font,
> line art, colors).
> Objective 2: The logo should allude to the role of KStreams in the Kafka
> project, which is the processing part. That is, "doing something useful to
> the data in Kafka".
>
> The "circling arrow" aspect of the current otter logos does allude to
> "continuous processing", which goes in the direction of (2), but the
> logos do not meet (1) in my opinion.
>
> -Michael
>
> On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax wrote:
>
> > Adding the user mailing list -- I think we should accept votes on both
> > lists for this special case, as it's not a technical decision.
> >
> > @Boyang: as mentioned by Bruno, can we maybe add black/white options for
> > both proposals, too?
> >
> > I also agree that Design B is not ideal with regard to the Kafka logo.
Re: Custom converter with Kafka Connect ?
Hi Jehan,

I ran into the same issue last week and also got a "class could not be found" error. Konstantine Karantasis helpfully pointed me towards https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-6007

To test this, I simply copied my SMT jar into the folder of the connector I was using and adjusted the plugin.path property. I haven't fully tested it, but in my very quick initial test it seemed to have moved past that "class could not be found" issue.

Regards,
Philip

From: jeh...@gmail.com on behalf of Jehan Bruggeman
Sent: Monday, October 16, 2017 8:17 AM
To: users@kafka.apache.org
Subject: Re: Custom converter with Kafka Connect ?

Hi Randall,

thanks for your reply. I'm not sure about this; what detail could I add that would help you figure it out?

Concerning the classpath: as described in my original email, I'm pretty sure the jars are correctly added to the classpath, since the classes in the jar are recognized by Kafka Connect when it starts (they are mentioned in the logs, at least).

(It's easier to read here, where I asked the same question: https://stackoverflow.com/questions/46712095/using-a-custom-converter-with-kafka-connect )

thanks for your help!
Jehan

On 13 October 2017 at 16:07, Randall Hauch wrote:
> On Tue, Oct 10, 2017 at 8:31 AM, Jehan Bruggeman wrote:
> >
> > Hello,
> >
> > I'm trying to use a custom converter with Kafka Connect and I cannot seem
> > to get it right. I'm hoping someone has experience with this and could
> > help me figure it out!
> >
> > Initial situation
> >
> > - my custom converter's class path is 'custom.CustomStringConverter'.
> >
> > - to avoid any mistakes, my custom converter is currently just a copy/paste
> > of the pre-existing StringConverter (of course, this will change when I
> > get it to work).
> > https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org/apache/kafka/connect/storage/StringConverter.java
> >
> > - I have a Kafka Connect cluster of 3 nodes. The nodes are running
> > Confluent's official docker images (confluentinc/cp-kafka-connect:3.3.0).
> >
> > - Each node is configured to load a jar with my converter in it (using a
> > docker volume).
>
> Can you explain this in more detail? Make sure that you add the JAR to the
> classpath.
>
> > What happens?
> >
> > When the connectors start, they correctly load the jars and find the custom
> > converter. Indeed, this is what I see in the logs:
> >
> > [2017-10-10 13:06:46,274] INFO Registered loader:
> > PluginClassLoader{pluginLocation=file:/opt/custom-connectors/custom-converter-1.0-SNAPSHOT.jar}
> > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:199)
> > [2017-10-10 13:06:46,274] INFO Added plugin 'custom.CustomStringConverter'
> > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
> > [...]
> > [2017-10-10 13:07:43,454] INFO Added aliases 'CustomStringConverter' and
> > 'CustomString' to plugin 'custom.CustomStringConverter'
> > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:293)
> >
> > I then POST a JSON config to one of the connector nodes to create my
> > connector:
> >
> > {
> >   "name": "hdfsSinkCustom",
> >   "config": {
> >     "topics": "yellow",
> >     "tasks.max": "1",
> >     "key.converter": "org.apache.kafka.connect.storage.StringConverter",
> >     "value.converter": "custom.CustomStringConverter",
> >     "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
> >     "hdfs.url": "hdfs://hdfs-namenode:8020/hdfs-sink",
> >     "topics.dir": "yellow_storage",
> >     "flush.size": "1",
> >     "rotate.interval.ms": "1000"
> >   }
> > }
> >
> > And receive the following reply:
> >
> > {
> >   "error_code": 400,
> >   "message": "Connector configuration is invalid and contains the
> > following 1 error(s):\nInvalid value custom.CustomStringConverter for
> > configuration value.converter: Class custom.CustomStringConverter could not
> > be found.\nYou can also find the above list of errors at the endpoint
> > `/{connectorType}/config/validate`"
> > }
> >
> > If I try running Kafka Connect standalone, the error message is the same.
> >
> > Has anybody faced this already? What am I missing?
> >
> > Many thanks to anybody reading this!
> >
> > Jehan
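A minimal sketch of the workaround described above (copy the SMT/converter jar next to the connector and point plugin.path at the parent directory). The directory names, the connector folder name, and the worker file name here are illustrative assumptions, not taken from the thread; only /opt/custom-connectors and custom-converter-1.0-SNAPSHOT.jar appear in the logs above.

    # connect-distributed.properties (or connect-standalone.properties)
    # plugin.path lists parent directories that Connect scans for plugins;
    # each sub-directory or jar underneath gets its own isolated classloader.
    plugin.path=/usr/share/java,/opt/custom-connectors

    # Example layout: the converter jar sits inside the connector's own directory,
    # so the connector and the converter share one PluginClassLoader (see KAFKA-6007):
    #
    # /opt/custom-connectors/
    #   kafka-connect-hdfs/                       <- hypothetical connector folder
    #     kafka-connect-hdfs-3.3.0.jar
    #     (other connector dependencies)
    #     custom-converter-1.0-SNAPSHOT.jar       <- SMT/converter jar copied here

After restarting the workers, the connector config can then reference custom.CustomStringConverter as value.converter, as in Jehan's JSON above.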
Re: Reliably producing records to remote cluster: what are my options?
road :)

From: Hagen Rother
Sent: Wednesday, September 13, 2017 10:17 PM
To: users@kafka.apache.org
Subject: Re: Reliably producing records to remote cluster: what are my options?

In my experience, 7 is the easiest route. Just make sure to run the mirror-maker on the consumer side of the WAN; it's an order of magnitude faster this way.

If you put

receive.buffer.bytes=33554432
send.buffer.bytes=33554432

in your consumer config and adjust the remote brokers' server.properties to

socket.receive.buffer.bytes=33554432
socket.send.buffer.bytes=33554432

you can reliably mirror large volumes across the Atlantic (we do).

It would be so much nicer to run the mirror-maker on the producer side of the WAN (enable compression in the mirror-maker and have compressed data on the WAN, with the CPU cost for that outside the hot path), but like I said, that's an order of magnitude slower for unknown (but reproducible) reasons.

Cheers,
Hagen

On Tue, Sep 12, 2017 at 9:19 PM, Philip Schmitt wrote:
> Hi!
>
> We want to reliably produce events into a remote Kafka cluster in (mostly)
> near real-time. We have to provide an at-least-once guarantee.
>
> Examples are a "Customer logged in" event that will be consumed by a data
> warehouse for reporting (numbers should be correct) or a "Customer
> unsubscribed from newsletter" event that determines whether the customer
> gets emails (if she unsubscribes, but the message is lost, she will not be
> happy).
>
> Context:
>
> * We run an ecommerce website on a cluster of up to ten servers and an
>   Oracle database.
> * We have a small Kafka cluster at a different site. We have in the past
>   had a small number of network issues, where the web servers could not
>   reach the other site for maybe an hour.
> * We don't persist all events in the database. If the application is
>   restarted, events that occurred before the restart cannot be sent to
>   Kafka. The row of a customer might have a newer timestamp, but we
>   couldn't tell which columns were changed.
>
> Concerns:
>
> * In case of, for example, a network outage between the web servers and
>   the Kafka cluster, we may accumulate thousands of events on each web
>   server that cannot be sent to Kafka. If a server is shut down during
>   that time, the messages would be lost.
> * If we produce to Kafka from within the application in addition to
>   writing to the database, the data may become inconsistent if one of the
>   writes fails.
>
> The more I read about Kafka, the more options I see, but I cannot assess
> how well the options might work and what the trade-offs between the
> options are.
>
> 1. produce records directly within the application
> 2. produce records from the Oracle database via Kafka Connect
> 3. produce records from the Oracle database via a CDC solution
>    (GoldenGate, Attunity, Striim, others?)
> 4. persist events in log files and produce to Kafka via Elastic
>    Logstash/Filebeat
> 5. persist events in log files and produce to Kafka via a Kafka Connect
>    source connector
> 6. persist events in a local, embedded database and produce to Kafka via
>    an existing source connector
> 7. produce records directly within the application to a new Kafka cluster
>    in the same network and mirror to the remote cluster
> 8. ?
>
> These are all the options I could gather so far. Some of the options
> probably won't work for my situation -- for example, Oracle GoldenGate
> might be too expensive -- but I don't want to rule anything out just yet.
>
> How would you approach this, and why? Which options might work? Which
> options would you advise against?
>
> I appreciate any advice. Thank you in advance.
>
> Thanks,
> Philip

--
Hagen Rother
Lead Architect | LiquidM

LiquidM Technology GmbH
Rosenthaler Str. 36 | 10178 Berlin | Germany
Phone: +49 176 15 00 38 77
Internet: www.liquidm.com | LinkedIn

Managing Directors | André Bräuer, Philipp Simon, Thomas Hille
Jurisdiction | Local Court Berlin-Charlottenburg HRB 152426 B
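For reference, a minimal sketch of what option 7 plus the buffer tuning Hagen describes might look like with the legacy MirrorMaker tool that shipped with Kafka at the time. Host names, the topic whitelist, and file names are made up for illustration; the buffer values are the ones from Hagen's mail.

    # mm-consumer.properties -- MirrorMaker's consumer, fetching from the source cluster across the WAN
    bootstrap.servers=local-kafka-1:9092,local-kafka-2:9092
    group.id=mirror-maker-events
    # large socket buffers to keep throughput up over the high-latency link
    receive.buffer.bytes=33554432
    send.buffer.bytes=33554432

    # mm-producer.properties -- MirrorMaker's producer, writing into the destination cluster it sits next to
    bootstrap.servers=remote-kafka-1:9092
    acks=all

    # run MirrorMaker on the consumer side of the WAN, i.e. co-located with the destination cluster
    bin/kafka-mirror-maker.sh \
      --consumer.config mm-consumer.properties \
      --producer.config mm-producer.properties \
      --whitelist "customer-events"

The socket.receive.buffer.bytes / socket.send.buffer.bytes settings go into server.properties on the brokers on the far side of the WAN, as Hagen notes.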
Reliably producing records to remote cluster: what are my options?
Hi!

We want to reliably produce events into a remote Kafka cluster in (mostly) near real-time. We have to provide an at-least-once guarantee.

Examples are a "Customer logged in" event that will be consumed by a data warehouse for reporting (numbers should be correct) or a "Customer unsubscribed from newsletter" event that determines whether the customer gets emails (if she unsubscribes, but the message is lost, she will not be happy).

Context:

* We run an ecommerce website on a cluster of up to ten servers and an Oracle database.
* We have a small Kafka cluster at a different site. We have in the past had a small number of network issues, where the web servers could not reach the other site for maybe an hour.
* We don't persist all events in the database. If the application is restarted, events that occurred before the restart cannot be sent to Kafka. The row of a customer might have a newer timestamp, but we couldn't tell which columns were changed.

Concerns:

* In case of, for example, a network outage between the web servers and the Kafka cluster, we may accumulate thousands of events on each web server that cannot be sent to Kafka. If a server is shut down during that time, the messages would be lost.
* If we produce to Kafka from within the application in addition to writing to the database, the data may become inconsistent if one of the writes fails.

The more I read about Kafka, the more options I see, but I cannot assess how well the options might work and what the trade-offs between the options are.

1. produce records directly within the application
2. produce records from the Oracle database via Kafka Connect
3. produce records from the Oracle database via a CDC solution (GoldenGate, Attunity, Striim, others?)
4. persist events in log files and produce to Kafka via Elastic Logstash/Filebeat
5. persist events in log files and produce to Kafka via a Kafka Connect source connector
6. persist events in a local, embedded database and produce to Kafka via an existing source connector
7. produce records directly within the application to a new Kafka cluster in the same network and mirror to the remote cluster
8. ?

These are all the options I could gather so far. Some of the options probably won't work for my situation -- for example, Oracle GoldenGate might be too expensive -- but I don't want to rule anything out just yet.

How would you approach this, and why? Which options might work? Which options would you advise against?

I appreciate any advice. Thank you in advance.

Thanks,
Philip
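For illustration, a minimal sketch of the producer settings behind option 1 with the Java client of that era; the broker addresses, topic, class name, and payload are made up. It shows the at-least-once basics (acks=all, retries, checking the send result), not a full solution: it does not address the concern above about events that accumulate while the remote cluster is unreachable, which is what pushes toward options 4 to 7.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class AtLeastOnceProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Hypothetical brokers of the remote (or local mirror-source) cluster
            props.put("bootstrap.servers", "kafka-1:9092,kafka-2:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            // At-least-once basics: wait for all in-sync replicas and retry transient errors
            props.put("acks", "all");
            props.put("retries", Integer.MAX_VALUE);
            // Avoid reordering when retries kick in
            props.put("max.in.flight.requests.per.connection", 1);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record = new ProducerRecord<>(
                        "customer-events", "customer-42", "{\"type\":\"CustomerLoggedIn\"}");
                // Block on the future (or use a callback) so a failed send is noticed
                // and can be handled, rather than silently dropped.
                producer.send(record).get();
            } catch (Exception e) {
                // This is exactly the hard part of option 1: if the cluster is
                // unreachable, the event must be kept somewhere durable (database,
                // log file, local cluster) or it is lost when the server shuts down.
                e.printStackTrace();
            }
        }
    }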