I've spent some time looking at this and testing various characters and it
would appear that Randall's suspicion was spot on. I think we can support a
fairly large set of characters with very minor changes.

I was put of by the exceptions that were thrown when creating connectors
with certain characters and suspected a larger underlying problem when in
fact the only issue is, that the URL in the rest request used to retrieve
the response for the create connector request needs to be percent encoded
[1].

I've fixed this and done some local testing which worked out quite nicely,
apart from two special cases, I've not been able to find characters that
created issues, even space and slash work.
The mentioned special cases are:
  \  - if the name contains a backslash that is not the beginning of a
valid escape sequence the request fails before we ever get it in
ConnectorsResource, so a backslash would need to be escaped: \\
  "  - Quotation marks need to be escaped as well to keep the json body of
the request legal: \"
In both cases the escape character will be part of the connector name and
need to be specified in the url to retrieve the connector as well, even
though we could URL encode it in a legal way without escaping here. So they
work, not sure if I'd recommend using those characters, but no real reason
to prohibit people from using them that I can see either.


What I'd do going forward is:
- withdraw the KIP, as I don't see a real need for one, since this is not
changing anything, just fixing things.
- add a section to the documentation around legal characters, specify the
ones I tested explicitly (url encoded %20 - %7F) and mention that most
other characters should work as well but no guarantees are given
- update the pull request for KAFKA-4930 to allow all characters but still
prohibit creating a connector with an empty name. I'd propose to keep the
validator though as it'll give us a central location to do any checking
that might turn out to be necessary later on.
- add some integration tests to check connectors with special characters in
their names work
- fix the url encoding line in ConnectorsResource

Does that sound fair to everybody?

Kind regards,
Sönke

[1]
https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/resources/ConnectorsResource.java#L102

On Tue, Oct 24, 2017 at 8:40 PM, Colin McCabe <cmcc...@apache.org> wrote:

> On Tue, Oct 24, 2017, at 11:28, Sönke Liebau wrote:
> > Hi,
> >
> > after reading your messages I'll grant that I might have picked a
> > somewhat
> > draconic option to solve these issues.
> >
> > In general I believe that properly encoding the URLs after having created
> > the connectors should solve a lot of the issues already. For some
> > characters the rest api returns an error on creating the connector as
> > well,
> > so for that URL encoding won't help. However the connectors do get
> > created
> > even though an error is returned, I've never investigated if they are in
> > a
> > consistent state tbh - I'll give this another look.
> >
> > @colin: Entity encoding would allow us to encode a lot of characters,
> > however I am unsure whether we should prefer it over url encoding in this
> > case, as mostly the end user would have to encode the characters himself.
> > And due to entity encoding ending every character with a ; which causes
> > the
> > embedded jetty server to cut the connector name at that character we'd
> > probably need to encode that character in URL encoding again for that to
> > work out - which might get a bit too complex tbh.
>
> Sorry, I meant to write percent-encoding, not entity refs.
> https://en.wikipedia.org/wiki/Percent-encoding
>
> best,
> Colin
>
>
> > I will further investigate which characters the url decoding that jetty
> > brings to the table will let us use and if all of these are correctly
> > handled during connector creation and report back with a new list of
> > characters that I think we can support fairly easily.
> >
> > Kind regards,
> > Sönke
> >
> >
> > On Tue, Oct 24, 2017 at 6:42 PM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >
> > > It should be possible to use entity references to encode these
> > > characters in URLs.  See https://dev.w3.org/html5/html-author/charref
> > > Maybe I'm misunderstanding the problem, but can we simply encode the
> > > URLs, rather than restricting the names?
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Oct 23, 2017, at 14:12, Randall Hauch wrote:
> > > > Here's the link to KIP-212:
> > > > https://cwiki.apache.org/confluence/pages/viewpage.
> > > action?pageId=74684586
> > > >
> > > > I do think it's worthwhile to define the rules for connector names.
> > > > However, I think it would be better to describe the current
> restrictions
> > > > for names outside of them appearing within URLs. For example, if we
> can
> > > > keep connector names relatively free of constraints but instead
> define
> > > > how
> > > > names should be encoded when used within URLs (e.g., URL encoding),
> then
> > > > we
> > > > may not have (m)any backward compatibility issues other than fixing
> some
> > > > bugs related to proper encoding/decoding.
> > > >
> > > > Thoughts?
> > > >
> > > >
> > > > On Mon, Oct 23, 2017 at 3:44 PM, Sönke Liebau <
> > > > soenke.lie...@opencore.com.invalid> wrote:
> > > >
> > > > > All,
> > > > >
> > > > > I've created a KIP to discuss enforcing of rules on what
> characters are
> > > > > allowed in connector names.
> > > > >
> > > > > Since this may break api calls that are currently working I
> figured a
> > > KIP
> > > > > is the better way to go than to just create a jira.
> > > > >
> > > > > I'd love to hear your input on this!
> > > > >
> > >
> >
> >
> >
> > --
> > Sönke Liebau
> > Partner
> > Tel. +49 179 7940878
> > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany

Reply via email to