Perfect, fully agree. To rephrase (and reorder).
1. We tell users that these are not really URIs.
2. Users need to encode.
3. Change components and redesign URIs to make them valid.

I'll comment on these below.

Well, there aren't many solutions.  Either users need to encode uri or we
need to change the components to not use reserved characters.  I don't
really see any other solution for now, and both are not backward
compatible.  But I'm still not sure what you have really have in mind as a
technical solution.

Also I don't think we've ever transformed any endpoint uri into a
java.net.URI, so as it was already stated, one solution is also to say
those are not uris anymore  (and I don't say I want this solution, but that
still is one to consider).
But first, yes, we *always* transformed String uris into java.net.URI. That code was there from camel 1.0. The issue is that we first encoded unsafe characters, thus hiding the *potential* problem, a bit of a hack. Actually, it wasn't really a serious problem for the first components, because they mostly used valid uri. (As an aside, there are a few other unnecessary/wrong hacks in there, like transforming direct:foo to direct://foo, which is not the same, one is a urn, the other a url, but I digress).

But then when more components came in, nobody realized the problem, myself included. We didn't think it through well enough, and more importantly didn't document any guidelines. We got into ugly URI designs like quartz and sql. For quartz we redesigned the URI, actually a few times, replacing the ' ' with '+' and other things, we still have edge cases that haunt us, look at the recent jiras. The camel-sql uri is a skeleton in the closet. Also we

All in all there are only a handful of components that are bad offenders. To me this is an opportunity we have, based on a lot of experience now, to clean up and simplify this part of Camel.

Now back to the options:

1. Yuck.
2. Totally sucks.
3. IMO the only option.

Yes, we must keep both syntax flavors and stay backwards compatible for a reasonable period of time.
Yes, we must provide a simple migration path.
Yes, we need to consider and not break components developed outside the ASF.


The relevant code (already there, unchanged for a good while) in DefaultComponent.java:

    public Endpoint createEndpoint(String uri) throws Exception {
        ObjectHelper.notNull(getCamelContext(), "camelContext");
        // check URI string to the unsafe URI characters
        String encodedUri = preProcessUri(uri);
        URI u = new URI(encodedUri);
        String path = u.getSchemeSpecificPart();
        ...

and the preProcessUri method:


    @Deprecated
    protected String preProcessUri(String uri) {
// Give components a chance to preprocess URIs and migrate to URI syntax that discourages invalid URIs
        // (see CAMEL-4425)
        // check URI string to the unsafe URI characters
        String encodedUri = UnsafeUriCharactersEncoder.encode(uri);
        if (!encodedUri.equals(uri)) {
            // uri supplied is not really valid
LOG.warn("Supplied URI '{}' contains unsafe characters, please check encoding", uri);
        }
        return encodedUri;
    }

So my solution would be something like creating another method, say oldSyntax(uri), move UnsafeUriCharactersEncoder.encode() and all the hacks there, change preProcessUri to first try to instantiate the URI, if there is an exception convert the old syntax to new syntax, issue a WARN in the log showing both the old syntax and the new syntax (this is how users could easily upgrade) and then return the valid, newly designed URI and pass it to createEndpoint.

We also need to add test support for testing/validating URIs, fairly easy to do.

There are however other things we could (should?) do I am thinking about. For instance, what about having a set of camel specific uri parameters, such as 'id' or 'conf' usable across all endpoints/components. For instance the Endpoint ids in jmx are based on the uri (key) and are totally ugly. Users could redefine that with an optional "direct:foo?id=bar", so the Endpoint id would be "bar". Or we could have "jms:foo?conf=property:bar" so bar would be a configuration object for the jms object. We could prefix them with 'camel-' or something to avoid potential conflicts. There may be other predefined params we may want to use and create conventions around. There are some problems with my examples above, I am fully aware of it, but constructive proposals are appreciated. Next week I'll be in vacation, hopefully with a clearer head and able to propose something that would fully work.

Cheers,
Hadrian




Reply via email to