I found the root cause of the fishiness. It was not a bug (and was not actually 
in ChainResolver) but is in my opinion a rather unfortunate and poorly chosen 
default setting that is to blame (the latest-strategy).

During resolution in the chain resolver, the current sub-resolver tries to 
determine whether it should be skipped and the previous artifact be selected 
over the current artifact. Assuming that the "force" and "dynamic" tests do not 
result in an early rejection of the current artifact, the "latest" of the two 
artifacts is selected by sorting the artifacts in a List using a Comparator 
suited to the current latest-strategy (latest-revision, latest-time, etc).

The problem here is quite simple. The default latest-strategy is 
"latest-revision". So when a second artifact has been resolved, and it is of 
the same revision as the first, nothing gets sorted. The result is that you 
will get the first artifact found based upon the ordering of your repositories 
in the chain instead of the newer of the two artifacts.

I think that either latest-time needs to be the default strategy or the latest 
revision comparator needs to do a secondary sort to sort by lastModified time.

Possibly allow configuration of this behavior (should there be a secondary sort 
by time to avoid stale artifacts, or not so that repository order breaks the 
tie). This is important (critical) behavior and should be configurable.

Defaulting to latest-revision will not only deliver undesired stale artifacts, 
but it is unclear to the user why they are getting stale artifacts or how to 
make it stop happening. The latest-time strategy will give you the latest 
revision 99% of the time and the latest artifact 100% of the time. But the 
latest-revision strategy will give you the latest artifact 100%, 50%, 33%, or 
25% of the time when the revision numbers are the same, depending upon how many 
resolvers you have (1/n), and assuming that any repository may contain the 
latest artifact.

Furthermore, the docs do not give a great description of "force". I learned 
much about the actual behavior of this attribute while debugging. First thing I 
learned was that it is not a good name.

What force actually does is it allows a resolver to be considered when a 
previous resolver has found an artifact. After the first artifact has been 
found, only repositories with force=true will have a chance of competing (for 
instance, they might have a newer version of 1.0.0-SNAPSHOT). Otherwise, they 
are discarded immediately and no date comparison is attempted.

Force should actually be named "considerAlways" or "considerAnyway". That seems 
to be a more suitable name. No action requested here, but pointing out that 
this attribute has a misleading name.

Summary:
1 - Please reconsider changing the default latest-strategy to be latest-time.

2 - Please consider adding a secondary "lastModified" sort to 
LatestRevisionStrategy.ArtifactInfoComparator whether or not you change the 
default latest-strategy to latest-time or not.

3 - Please document, illustrate, and demonstrate in one place the behaviors of 
chain resolver in combination with force, returnFirst, defaultLatestStrategy, 
ivy.resolver.default.check.modified, useOrigin, and other settings and 
attributes that affect resolution behavior. (I am working on this document now.)

4 - Please consider creating independent caches by default for each repository. 
I have not drilled down on this issue yet, but I suspect that it fixes serious 
cache collision issues that I think I saw while debugging (found and selected 
local repo artifact, checked cache before delivery, ended up delivering cached 
stale artifact that came from a totally different repo :( ).

Thanks,

L.K.

From: Loren Kratzke
Sent: Tuesday, March 24, 2015 1:09 PM
To: 'dev@ant.apache.org'
Subject: Possible Ivy bug (and suggested fix) in ChainResolver

I have a some observations about how the chain resolver selects a dependency. I 
think this may be a bug but I am not sure because the intent of the source code 
is not entirely clear. It reads one way, but behaves in a different way. I have 
pinpointed the exact spots in code where this happens.

Here is my simple test setup used to debug this issue. I have two resolvers 
(Filesystem and URL) configured in a ChainResolver in that order. I publish to 
one resolver and then the other repeatedly and consume the result in another 
project. I use checkModified=true and changingPattern=".*" on both resolvers.

My artifact is simply a text file with the current date and time so it is easy 
to see whether you get fresh or stale artifacts from the repos.

When I consume the published artifact from the other project, I will get the 
artifact from the first configured resolver in the chain (Filesystem in this 
case). But I know from debugging that the second resolver is also evaluated. So 
as an experiment, I added force="true" on the second resolver to see if I could 
force Ivy to ignore the first result and favor an artifact returned by the 
second resolver. Instead, Ivy returned the artifact from the first resolver 
even though the second artifact was newer AND the second resolver had 
force="true".

When I debugged this to see why the first artifact was chosen over the second 
artifact, I found something very fishy.

ChainResolver.getDependency() iterates over each resolver in the chain. First 
it found the Filesystem resolver and the artifact and next it found the URL 
resolver and artifact. Next it calls BasicResolver.getDependency() which will 
compare the previously resolved artifact with the current artifact.

This is where it gets very fishy. At the end of the getDependency() method it 
calls AbstractResolver.checkLatest() which I assume is intended to return the 
latest of the two artifacts. But that comparison never happens. 
AbstractResolver.isAfter is invoked with two artifacts to be compared and a 
null Date. Since the date is null, the two artifacts are never compared and no 
matter what, the first artifact will be returned and the second one discarded 
and a verbose message will be emitted stating that the second artifact is older 
than the first artifact, every time. The message is on line 533 of 
AbstractResolver. (I am looking at Ivy-2.3.0 so if that line does not make 
sense on trunk then let me know.)

    Message.debug("\tmodule revision kept as younger: " + newModuleDesc);
    saveModuleRevisionIfNeeded(dd, newModuleFound);
    return newModuleFound;

The message is not true. The artifact that was kept was the older of the two 
and a comparison of lastModified never happened (and never can happen in the 
current code as far as I can tell).

So the actual logic in AbstractResolver.checkLatest() simply returns the first 
artifact found. While this is not a bad behavior, it does not seem like it is 
the intended behavior. I mean, why go through all the trouble of pretending to 
compare two artifacts using date methods when the logic never executes because 
the passed in Date object is null. And why emit a message stating that one was 
determined to be older than the other. That is super fishy.

Furthermore, the next line in ChainResolver.getDependency() after 
resolver.getDependency() is called (ChainResolver line105) references 
isReturnFirst(). That is fishy because none of that matters any more. The 
current artifact was rejected on the previous line of code and the previous 
(aka first) artifact is now the current artifact and is the one that will be 
returned (without a date comparison, and for the arbitrary reason that is was 
found before the other one).

I think that the intent of the null Date object is to compare each artifact to 
a static Date configured elsewhere (I have no idea where), but if the code were 
to actually compare the lastModified dates of the two artifacts, a useful 
result would happen - Ivy would return the latest artifact from across multiple 
repositories.

That is huge because I have never been able to get Ivy to do this. I have never 
seen anybody get Ivy to search multiple repositories and return the latest 
artifact. This is useful for local development when you publish locally to 
consume locally modified artifacts. It would be nice to have the option of 
picking up newer artifacts from a central repo when those occur without having 
to blow away a local repository and its cache.

(By the way, giving my local repo its own cache seems to have solved some other 
strange issues I was having. I recommend this to everybody and I think it 
should be a default in Ivy, but that is debatable and would need some more 
research and concensus.)

I think that this is a good feature and should be configurable. I think 
possibly it was intended to be configured via 
ChainResolver.returnFirst="true|false" but that code executed when it was too 
late and the decision had already been made. If I were to make this a feature, 
and make it configurable, I would configure this using an attribute named 
returnFirst because that is the exact facet of functionality that we are 
talking about here.

Thanks for your attention. Hope I am helping here. I am considering coding this 
to see if it works as expected. I would be happy to report my results and 
provide a patch if anybody is interested in evaluating this.

L.K.

Reply via email to