Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Philip Schmitt
Hi,

I’m with Robin and Michael here.

What this decision needs is a good design brief.
This article seems decent: 
https://yourcreativejunkie.com/logo-design-brief-the-ultimate-guide-for-designers/

Robin is right about the usage requirements.
It goes a bit beyond resolution. How does the logo work when it’s on a sticker
on someone’s laptop? Might there be cases where you want to print it in
black and white?
And how would it look if you put the Kafka, ksqlDB, and Streams stickers on a 
laptop?

Of the two, I prefer the first option.
The brown on black is a bit subdued – it might not work well on a t-shirt or a
laptop sticker. Maybe that could be improved by using a bolder color, but once
the logo is rendered smaller or at lower resolution, it may not work any longer.


Regards,
Philip


P.S.:
Another article about what makes a good logo: 
https://vanschneider.com/what-makes-a-good-logo

P.P.S.:

If I were to pick a logo for Streams, I’d choose something that fits well with 
Kafka and ksqlDB.

ksqlDB has the rocket.
I can’t remember (or find) the reasoning behind the Kafka logo (aside from 
representing a K). Was there something about planets orbiting the sun? Or was 
it the atom?

So I might stick with a space/science metaphor.
Could Streams be a comet? UFO? Star? Eclipse? ...
Maybe a satellite logo for Connect.

Space inspiration: https://thenounproject.com/term/space/





From: Robin Moffatt 
Sent: Wednesday, August 19, 2020 6:24 PM
To: users@kafka.apache.org 
Cc: d...@kafka.apache.org 
Subject: Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

I echo what Michael says here.

Another consideration is that logos are often shrunk (when used on slides)
and need to work at lower resolution (think: printing swag, stitching
socks, etc.), so whatever logo we come up with must not be too fiddly
in its level of detail - something that I think both of the currently
proposed options will fall foul of, IMHO.


On Wed, 19 Aug 2020 at 15:33, Michael Noll  wrote:

> Hi all!
>
> Great to see we are in the process of creating a cool logo for Kafka
> Streams.  First, I apologize for sharing feedback so late -- I just learned
> about it today. :-)
>
> Here's my *personal, subjective* opinion on the currently two logo
> candidates for Kafka Streams.
>
> TL;DR: Sorry, but I really don't like either of the proposed "otter" logos.
> Let me try to explain why.
>
>- The choice to use an animal, regardless of which specific animal,
>seems random and doesn't fit Kafka. (What's the purpose? To show that
>KStreams is 'cute'?) In comparison, the O’Reilly books always have an
>animal cover, that’s their style, and it is very recognizable.  Kafka
>however has its own, different style.  The Kafka logo has clear, simple
>lines to achieve an abstract and ‘techy’ look, which also alludes
> nicely to
>its architectural simplicity. Its logo is also a smart play on the
>Kafka-identifying letter “K” and alluding to it being a distributed
> system
>(the circles and links that make the K).
>- The proposed logos, however, make it appear as if KStreams is a
>third-party technology that was bolted onto Kafka. They certainly, for
> me,
>do not convey the message "Kafka Streams is an official part of Apache
>Kafka".
>- I, too, don't like the way the main Kafka logo is obscured (a concern
>already voiced in this thread). Also, the Kafka 'logo' embedded in the
>proposed KStreams logos is not the original one.
>- None of the proposed KStreams logos visually match the Kafka logo.
>They have a totally different style, font, line art, and color scheme.
>- Execution-wise, the main Kafka logo looks great at all sizes.  The
>style of the otter logos, in comparison, becomes undecipherable at
> smaller
>sizes.
>
> What I would suggest is to first agree on what the KStreams logo is
> supposed to convey to the reader.  Here's my personal take:
>
> Objective 1: First and foremost, the KStreams logo should make it clear and
> obvious that KStreams is an official and integral part of Apache Kafka.
> This applies to both what is depicted and how it is depicted (like font,
> line art, colors).
> Objective 2: The logo should allude to the role of KStreams in the Kafka
> project, which is the processing part.  That is, "doing something useful to
> the data in Kafka".
>
> The "circling arrow" aspect of the current otter logos does allude to
> "continuous processing", which is going in the direction of (2), but the
> logos do not meet (1) in my opinion.
>
> -Michael
>
>
>
>
> On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax  wrote:
>
> > Adding the user mailing list -- I think we should accept votes on both
> > lists for this special case, as it's not a technical decision.
> >
> > @Boyang: as mentioned by Bruno, can we maybe add black/white options for
> > both proposals, too?
> >
> > I also agree that Design B is not ideal with regard to the Kafka logo.
> 

Re: Custom converter with Kafka Connect ?

2017-10-16 Thread Philip Schmitt
Hi Jehan,

I ran into the same issue last week and also got a "class could not be
found" error.

Konstantine Karantasis helpfully pointed me towards 
https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-6007

To test this, I simply copied my SMT jar to the folder of the connector I was 
using and adjusted the plugin.path property.

I haven't fully tested it, but in a quick initial test it seemed
to move past that "class could not be found" issue.
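
In my case that looked roughly like this (paths and jar names are illustrative,
not necessarily what you have):

  # worker config, e.g. connect-distributed.properties
  plugin.path=/opt/connectors

  # layout: the converter/SMT jar sits inside the connector's own plugin folder
  /opt/connectors/my-hdfs-connector/
      (the connector's own jars)
      custom-converter-1.0-SNAPSHOT.jar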

Regards,
Philip

From: jeh...@gmail.com  on behalf of Jehan Bruggeman 

Sent: Monday, October 16, 2017 8:17 AM
To: users@kafka.apache.org
Subject: Re: Custom converter with Kafka Connect ?

Hi Randall,

thanks for your reply. I'm not sure about this; what detail could I add
that would help you figure it out?

Concerning the classpath: as described in my original email, I'm pretty
sure the jars are correctly added to the classpath since the classes in the
jar are recognized by Kafka Connect when it starts (they are mentioned in
the logs, at least).

(It's easier to read here, where I asked the same question:
https://stackoverflow.com/questions/46712095/using-a-custom-converter-with-kafka-connect
)

Thanks for your help!

Jehan

On 13 October 2017 at 16:07, Randall Hauch  wrote:

> On Tue, Oct 10, 2017 at 8:31 AM, Jehan Bruggeman  >
> wrote:
>
> > Hello,
> >
> > I'm trying to use a custom converter with Kafka Connect and I cannot seem
> > to get it right. I'm hoping someone has experience with this and could
> help
> > me figure it out!
> >
> >
> > Initial situation
> > 
> >
> > - my custom converter's class path is 'custom.CustomStringConverter'.
> >
> > - to avoid any mistakes, my custom converter is currently just a copy/paste
> > of the pre-existing StringConverter (of course, this will change when I
> > get it to work).
> > https://github.com/apache/kafka/blob/trunk/connect/api/
> > src/main/java/org/apache/kafka/connect/storage/StringConverter.java
> >
> > - I have a Kafka Connect cluster of 3 nodes. The nodes are running
> > Confluent's official Docker image (confluentinc/cp-kafka-connect:3.3.0).
> >
> > - Each node is configured to load a jar with my converter in it (using a
> > docker volume).
> >
>
> Can you explain this in more detail? Make sure that you add the JAR to the
> classpath.
>
>
> >
> >
> >
> > What happens?
> > 
> >
> > When the connectors start, they correctly load the jars and find the
> > custom converter. Indeed, this is what I see in the logs:
> >
> > [2017-10-10 13:06:46,274] INFO Registered loader:
> > PluginClassLoader{pluginLocation=file:/opt/custom-connectors/custom-
> > converter-1.0-SNAPSHOT.jar}
> > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:199)
> > [2017-10-10 13:06:46,274] INFO Added plugin
> 'custom.CustomStringConverter'
> > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
> > [...]
> > [2017-10-10 13:07:43,454] INFO Added aliases 'CustomStringConverter' and
> > 'CustomString' to plugin 'custom.CustomStringConverter'
> > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:293)
> >
> > I then POST a JSON config to one of the connector nodes to create my
> > connector:
> >
> > {
> >   "name": "hdfsSinkCustom",
> >   "config": {
> > "topics": "yellow",
> > "tasks.max": "1",
> > "key.converter": "org.apache.kafka.connect.storage.StringConverter",
> > "value.converter": "custom.CustomStringConverter",
> > "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
> > "hdfs.url": "hdfs://hdfs-namenode:8020/hdfs-sink",
> > "topics.dir": "yellow_storage",
> > "flush.size": "1",
> > "rotate.interval.ms": "1000"
> >   }
> > }
> >
> > And receive the following reply :
> >
> > {
> >"error_code": 400,
> >"message": "Connector configuration is invalid and contains the
> > following 1 error(s):\nInvalid value custom.CustomStringConverter for
> > configuration value.converter: Class custom.CustomStringConverter could
> not
> > be found.\nYou can also find the above list of errors at the endpoint
> > `/{connectorType}/config/validate`"
> > }
> >
> > 
> >
> > If I try running Kafka Connect standalone, the error message is the same.
> >
> > Has anybody faced this already ? What am I missing ?
> >
> > Many thanks to anybody reading this !
> >
> > Jehan
> >
>


Re: Reliably producing records to remote cluster: what are my options?

2017-09-15 Thread Philip Schmitt
 road :)



From: Hagen Rother 
Sent: Wednesday, September 13, 2017 10:17 PM
To: users@kafka.apache.org
Subject: Re: Reliably producing records to remote cluster: what are my options?

In my experience, 7 is the easiest route. Just make sure to run the
mirror-maker on the consumer side of the WAN; it's an order of magnitude
faster this way.

If you put
receive.buffer.bytes=33554432
send.buffer.bytes=33554432
in your consumer config and adjust the remote broker's server config to
socket.receive.buffer.bytes=33554432
socket.send.buffer.bytes=33554432

you can reliably mirror large volumes across the Atlantic (we do). It would
be so much nicer to run the mirror-maker on the producer side of the WAN
(enable compression in the mirror-maker and have compressed data on the WAN,
with the CPU cost for that outside the hot path), but like I said, that's an
order of magnitude slower for unknown (but reproducible) reasons.
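
For reference, the invocation itself looks roughly like this (file names are
placeholders):

  bin/kafka-mirror-maker.sh \
    --consumer.config source-cluster-consumer.properties \
    --producer.config target-cluster-producer.properties \
    --whitelist ".*"

with the buffer settings above going into the consumer properties file (the
one pointing at the source cluster).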

Cheers,
Hagen

On Tue, Sep 12, 2017 at 9:19 PM, Philip Schmitt 
wrote:

> Hi!
>
>
>
> We want to reliably produce events into a remote Kafka cluster in (mostly)
> near real-time. We have to provide an at-least-once guarantee.
>
> Examples are a "Customer logged in" event that will be consumed by a data
> warehouse for reporting (numbers should be correct), or a "Customer
> unsubscribed from newsletter" event that determines whether the customer
> gets emails (if she unsubscribes but the message is lost, she will not be
> happy).
>
>
>
> Context:
>
>   *   We run an ecommerce website on a cluster of up to ten servers and an
> Oracle database.
>   *   We have a small Kafka cluster at a different site. We have in the
> past had a small number of network issues, where the web servers could not
> reach the other site for maybe an hour.
>   *   We don't persist all events in the database. If the application is
> restarted, events that occurred before the restart cannot be sent to Kafka.
> The row of a customer might have a newer timestamp, but we couldn't tell
> which columns were changed.
>
>
>
> Concerns:
>
>   *   In case of, for example, a network outage between the web servers
> and the Kafka cluster, we may accumulate thousands of events on each web
> server that cannot be sent to Kafka. If a server is shut down during that
> time, the messages would be lost.
>   *   If we produce to Kafka from within the application in addition to
> writing to the database, the data may become inconsistent if one of the
> writes fails.
>
>
>
>
>
> The more I read about Kafka, the more options I see, but I cannot assess
> how well the options might work or what the trade-offs between them
> are.
>
>
>
>   1.  produce records directly within the application
>   2.  produce records from the Oracle database via Kafka Connect
>   3.  produce records from the Oracle database via a CDC solution
> (GoldenGate, Attunity, Striim, others?)
>   4.  persist events in log files and produce to Kafka via elastic
> Logstash/Filebeat
>   5.  persist events in log files and produce to Kafka via a Kafka Connect
> source connector
>   6.  persist events in a local, embedded database and produce to Kafka
> via an existing source connector
>   7.  produce records directly within the application to a new Kafka
> cluster in the same network and mirror to remote cluster
>   8.  ?
>
>
>
> These are all the options I could gather so far. Some of the options
> probably won't work for my situation -- for example Oracle Golden Gate
> might be too expensive -- but I don't want to rule anything out just yet.
>
>
>
>
>
> How would you approach this, and why? Which options might work? Which
> options would you advise against?
>
>
>
>
> I appreciate any advice. Thank you in advance.
>
>
> Thanks,
>
> Philip
>



--
*Hagen Rother*
Lead Architect | LiquidM
--
LiquidM Technology GmbH
Rosenthaler Str. 36 | 10178 Berlin | Germany
Phone: +49 176 15 00 38 77
Internet: www.liquidm.com | LinkedIn: www.linkedin.com/company/3488199
--
Managing Directors | André Bräuer, Philipp Simon, Thomas Hille
Jurisdiction | Local Court Berlin-Charlottenburg HRB 152426 B


Reliably producing records to remote cluster: what are my options?

2017-09-12 Thread Philip Schmitt
Hi!



We want to reliably produce events into a remote Kafka cluster in (mostly) near 
real-time. We have to provide an at-least-once guarantee.

Examples are a "Customer logged in" event that will be consumed by a data
warehouse for reporting (numbers should be correct), or a "Customer unsubscribed
from newsletter" event that determines whether the customer gets emails (if
she unsubscribes but the message is lost, she will not be happy).
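
To be concrete, by "at-least-once" I mean, on the producer side, roughly the
following settings (an illustrative sketch, not a finished config):

  # never lose a record just because a broker was briefly unavailable
  acks=all
  retries=2147483647
  # avoid reordering when a retry happens
  max.in.flight.requests.per.connection=1

plus handling failed sends in the application instead of silently dropping
them. The harder part is the architecture around the producer, which is what
the options below are about.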



Context:

  *   We run an ecommerce website on a cluster of up to ten servers and an 
Oracle database.
  *   We have a small Kafka cluster at a different site. We have in the past 
had a small number of network issues, where the web servers could not reach the 
other site for maybe an hour.
  *   We don't persist all events in the database. If the application is 
restarted, events that occurred before the restart cannot be sent to Kafka. The 
row of a customer might have a newer timestamp, but we couldn't tell which 
columns were changed.



Concerns:

  *   In case of, for example, a network outage between the web servers and the 
Kafka cluster, we may accumulate thousands of events on each web server that 
cannot be sent to Kafka. If a server is shut down during that time, the 
messages would be lost.
  *   If we produce to Kafka from within the application in addition to writing 
to the database, the data may become inconsistent if one of the writes fails.





The more I read about Kafka, the more options I see, but I cannot assess how
well the options might work or what the trade-offs between them are.



  1.  produce records directly within the application
  2.  produce records from the Oracle database via Kafka Connect
  3.  produce records from the Oracle database via a CDC solution (GoldenGate, 
Attunity, Striim, others?)
  4.  persist events in log files and produce to Kafka via elastic 
Logstash/Filebeat
  5.  persist events in log files and produce to Kafka via a Kafka Connect 
source connector
  6.  persist events in a local, embedded database and produce to Kafka via an 
existing source connector
  7.  produce records directly within the application to a new Kafka cluster in 
the same network and mirror to remote cluster
  8.  ?



These are all the options I could gather so far. Some of the options probably 
won't work for my situation -- for example Oracle GoldenGate might be too
expensive -- but I don't want to rule anything out just yet.





How would you approach this, and why? Which options might work? Which options 
would you advise against?




I appreciate any advice. Thank you in advance.


Thanks,

Philip